Description of the current intra-process communication mechanism in ROS 2 and of its drawbacks. Design proposal for an improved implementation. Experimental results.
Authors: Alberto Soragna, Juan Oxoby, Dhiraj Goel
Date Written: 2020-03
Last Modified: 2020-03
The subscription and publication mechanisms in ROS 2 fall into two categories: intra-process, where messages are sent from a publisher to subscriptions within the same process via in-process memory, and inter-process, where messages are sent through the underlying ROS 2 middleware layer.
This design document presents a new implementation for intra-process communication.
Although ROS 2 supports intra-process communication, the current implementation of this mechanism still leaves much room for improvement. Until ROS 2 Crystal, major performance issues and the lack of support for shared pointer messages prevented the use of this feature in real applications.
With the ROS 2 Dashing release, most of these issues have been addressed and the intra-process communication behavior has improved greatly (see ticket).
The current implementation is based on the creation of a ring buffer for each `Publisher` and on the publication of meta-messages through the middleware layer.
When a `Publisher` has to publish intra-process, it passes the message to the `IntraProcessManager`. Here the message is stored in the ring buffer associated with the `Publisher`.
In order to extract a message from the `IntraProcessManager`, two pieces of information are needed: the id of the `Publisher` (in order to select the correct ring buffer) and the position of the message within its ring buffer.
A meta-message with this information is created and sent through the ROS 2 middleware to all the `Subscription`s, which can then retrieve the original message from the `IntraProcessManager`.
Several shortcomings of the current implementation are listed below.
The current implementation can't be used when the QoS durability value is set to `Transient Local`.
The current implementation does not enforce the depth of the QoS history in a correct way.
The reason is that there is a single ring buffer per `Publisher` and its size is equal to the depth of the `Publisher`'s history.
A `Publisher` stores a message in the ring buffer and then sends a meta-message to allow a `Subscription` to retrieve it.
The `Subscription` correctly stores meta-messages up to the number indicated by the depth of its history but, depending on the frequency at which messages are published and callbacks are triggered, it may happen that a meta-message processed by the `Subscription` no longer corresponds to a valid message in the ring buffer, because the message has already been overwritten.
This results in the loss of the message, and it is also a difference in behavior between intra- and inter-process communication, since with the latter the message would have been received.
Moreover, even if the use of meta-messages allows the enforcement of other QoS settings to be delegated to the RMW layer, every time a message is added to the ring buffer the `IntraProcessManager` has to compute how many `Subscription`s will need it.
This potentially negates the advantage of having the meta-messages.
For example, the `IntraProcessManager` has to take into account that potentially all the known `Subscription`s will take the message, regardless of their reliability QoS.
If a `Publisher` or a `Subscription` is best-effort, it may not receive the meta-message, thus preventing the `IntraProcessManager` from releasing the memory in the buffer.
More details here.
TODO: take into account also new QoS: Deadline, Liveliness and Lifespan reference.
The current implementation of intra-process communication has to send meta-messages from the `Publisher` to the `Subscription`s.
This is done using the `rmw_publish` function, the implementation of which depends on the chosen middleware.
As a consequence, the performance of a ROS 2 application with intra-process communication enabled depends heavily on the chosen RMW implementation.
Given that these meta-messages only have to be received by entities within the same process, there is room for optimizing how they are transmitted by each RMW. However, at the moment none of the supported RMW implementations is actively tackling this issue. The result is that the performance of a single-process ROS 2 application with intra-process communication enabled is still worse than what you could expect from a non-ROS application sharing memory between its components.
Some experimental evidence is presented below.
When a `Node` creates a `Publisher` or a `Subscription` to a topic `/MyTopic`, it also creates an additional one to the topic `/MyTopic/_intra`.
The second topic is the one where meta-messages travel.
Our experimental results show that creating a `Publisher` or a `Subscription` has a non-negligible memory cost.
This is particularly true for the default RMW implementation, Fast-RTPS, where the memory requirement increases almost exponentially with the number of participants and entities.
Publishing a meta-message has the same overhead as publishing a small inter-process message.
However, comparing the publication/reception of an intra- and an inter-process message, the former requires several additional operations: it has to store the message in the ring buffer, monitor the number of `Subscription`s, and extract the message.
The result is that, from the latency and CPU utilization point of view, it is convenient to use intra-process communication only when the message size is at least 5KB.
Currently, ROS 2 does not provide any API for making nodes, `Publisher`s, or `Subscription`s ignore each other.
This feature would be useful when both inter- and intra-process communication are needed.
The reason is that the current implementation of the ROS 2 middleware will try to deliver inter-process messages also to the nodes within the same process as the `Publisher`, even if they should have received an intra-process message.
Note that these messages will be discarded, but they still cause an overhead.
The DDS specification provides ways to potentially fix this problem, i.e. the `ignore_participant`, `ignore_publication` and `ignore_subscription` operations.
Each of these can be used to ignore a remote participant or entity, allowing an application to behave as if that remote participant did not exist.
The current intra-process communication uses meta-messages that are sent through the RMW between nodes in the same process.
This has two consequences. First, it does not allow participants in the same process to be directly "ignored", because they still have to communicate in order to send and receive meta-messages; this requires a more fine-grained control, ignoring specific `Publisher`s and `Subscription`s.
Second, the meta-messages could also be delivered to nodes in different processes if they have intra-process communication enabled. As before, the messages would be discarded immediately after being received, but they would still affect performance. The overhead caused by the additional publication of meta-messages can potentially be reduced by appending a process-specific identifier to the intra-process topic names.
The new proposal for intra-process communication addresses the issues previously mentioned. It has been designed with performance in mind, so it avoids any communication through the middleware between nodes in the same process.
Consider a simple scenario consisting of `Publisher`s and `Subscription`s all in the same process and with the durability QoS set to `volatile`.
The proposed implementation creates one buffer per `Subscription`.
When a message is published to a topic, its `Publisher` pushes the message into the buffer of each of the `Subscription`s related to that topic and raises a notification, waking up the executor.
The executor can then pop the message from the buffer and trigger the callback of the `Subscription`.
The choice of having independent buffers for each `Subscription` leads to the following advantages:

- it is easy to support the different QoS for each `Subscription`, while, at the same time, simplifying the implementation;
- multiple `Subscription`s can extract messages from their own buffer in parallel without blocking each other, thus providing higher throughput.

The only drawback is that the system does not reuse resources as much as possible, compared to sharing the buffers between entities. However, from a practical point of view, the memory overhead caused by the proposed implementation with respect to the current one will always be only a tiny delta compared to the overall memory usage of the application.
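To make the overall structure more concrete, the following is a minimal, self-contained sketch of the "one buffer per `Subscription`" idea: a publisher pushes a shared message into every subscription's own queue and then notifies a waiting consumer, which pops the messages and runs the callbacks. This is an illustration only, written in plain standard C++ rather than with the rclcpp classes; all names in it are made up.

```cpp
#include <condition_variable>
#include <functional>
#include <iostream>
#include <memory>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

// Illustrative stand-in for a message type.
struct StringMsg { std::string data; };

// One buffer per subscription: a queue plus the subscription callback.
struct ToySubscription {
  std::queue<std::shared_ptr<const StringMsg>> buffer;
  std::function<void(std::shared_ptr<const StringMsg>)> callback;
};

int main() {
  std::mutex mtx;
  std::condition_variable cv;  // stand-in for the executor wake-up mechanism
  bool done = false;

  // Two subscriptions to the same topic, each with its own buffer.
  std::vector<ToySubscription> subs(2);
  subs[0].callback = [](auto msg) { std::cout << "sub0: " << msg->data << "\n"; };
  subs[1].callback = [](auto msg) { std::cout << "sub1: " << msg->data << "\n"; };

  // "Executor" thread: waits for notifications, pops from the buffers, runs callbacks.
  std::thread executor([&] {
    std::unique_lock<std::mutex> lock(mtx);
    while (true) {
      cv.wait(lock, [&] {
        if (done) { return true; }
        for (auto & s : subs) { if (!s.buffer.empty()) { return true; } }
        return false;
      });
      // Drain every subscription buffer and trigger the callbacks.
      for (auto & s : subs) {
        while (!s.buffer.empty()) {
          auto msg = s.buffer.front();
          s.buffer.pop();
          s.callback(msg);
        }
      }
      if (done) { break; }
    }
  });

  // "Publisher": push the same shared message into every subscription buffer,
  // then notify (this mimics raising the notification that wakes the executor).
  {
    std::lock_guard<std::mutex> lock(mtx);
    auto msg = std::make_shared<const StringMsg>(StringMsg{"hello intra-process"});
    for (auto & s : subs) { s.buffer.push(msg); }
  }
  cv.notify_one();

  // Shut down the toy executor.
  {
    std::lock_guard<std::mutex> lock(mtx);
    done = true;
  }
  cv.notify_one();
  executor.join();
}
```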
There are three possible data-types that can be stored in the buffer:

- `MessageT`
- `shared_ptr<const MessageT>`
- `unique_ptr<MessageT>`

The choice of the buffer data-type is controlled through an additional field in the `SubscriptionOptions`.
The default value for this option is denominated `CallbackDefault`, which corresponds to selecting the type between `shared_ptr<const MessageT>` and `unique_ptr<MessageT>` that better fits the callback type.
This is deduced by looking at the output of `AnySubscriptionCallback::use_take_shared_method()`.
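As an illustration of this deduction, the following sketch shows how a buffer type could be selected from the callback signature: callbacks that can be invoked with a `shared_ptr<const MessageT>` map to a shared buffer, while the others require ownership and map to a `unique_ptr` buffer. This is a simplified stand-in, not the actual `AnySubscriptionCallback` code.

```cpp
#include <iostream>
#include <memory>
#include <type_traits>

struct ImageMsg { /* fields omitted */ };

// Simplified stand-in for the kind of deduction performed through
// AnySubscriptionCallback::use_take_shared_method(): if the user callback
// accepts a shared pointer, a shared_ptr buffer fits best; otherwise the
// callback requires ownership and a unique_ptr buffer is preferred.
template <typename CallbackT, typename MessageT>
constexpr bool use_take_shared_method()
{
  return std::is_invocable<CallbackT, std::shared_ptr<const MessageT>>::value;
}

int main()
{
  auto shared_cb = [](std::shared_ptr<const ImageMsg>) { /* read-only access */ };
  auto unique_cb = [](std::unique_ptr<ImageMsg>) { /* takes ownership */ };

  std::cout << std::boolalpha
            << use_take_shared_method<decltype(shared_cb), ImageMsg>() << "\n"   // true
            << use_take_shared_method<decltype(unique_cb), ImageMsg>() << "\n";  // false
}
```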
If the history QoS is set to `keep all`, the buffers are dynamically adjusted in size up to the maximum resource limits specified by the underlying middleware.
On the other hand, if the history QoS is set to `keep last`, the buffers have a size equal to the depth of the history and they act as ring buffers (overwriting the oldest data when pushing while full).
Note that in the case of publishers with `keep all` and `reliable` communication, the behavior can differ from that of inter-process communication.
In the inter-process case, the middlewares use buffers on both the publisher and the subscription side: if the subscription queue is full, the publisher queue starts to fill, and the publish call eventually blocks when that queue is also full.
Since intra-process communication uses a single queue on the subscription side, this behavior can't be exactly emulated.
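A minimal example of the `keep last` behavior described above is the following ring buffer, which overwrites the oldest element when pushed while full. It is an illustrative sketch, not the buffer implementation used by the proposal.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Minimal ring buffer with "keep last" semantics: when full, pushing
// overwrites the oldest element, mimicking a QoS history depth of `capacity`.
template <typename T>
class RingBuffer {
public:
  explicit RingBuffer(std::size_t capacity) : storage_(capacity) {}

  void push(T value) {
    storage_[write_index_] = std::move(value);
    write_index_ = (write_index_ + 1) % storage_.size();
    if (size_ == storage_.size()) {
      // Buffer was full: the oldest element has just been overwritten.
      read_index_ = (read_index_ + 1) % storage_.size();
    } else {
      ++size_;
    }
  }

  std::optional<T> pop() {
    if (size_ == 0) { return std::nullopt; }
    T value = std::move(storage_[read_index_]);
    read_index_ = (read_index_ + 1) % storage_.size();
    --size_;
    return value;
  }

  bool empty() const { return size_ == 0; }

private:
  std::vector<T> storage_;
  std::size_t write_index_ = 0;
  std::size_t read_index_ = 0;
  std::size_t size_ = 0;
};

int main() {
  RingBuffer<int> buffer(2);   // QoS history: keep last 2
  buffer.push(1);
  buffer.push(2);
  buffer.push(3);              // overwrites 1, the oldest message
  assert(*buffer.pop() == 2);
  assert(*buffer.pop() == 3);
  assert(buffer.empty());
}
```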
Buffers are not only used in `Subscription`s but also in each `Publisher` with a durability QoS of type `transient local`.
The data-type stored in the `Publisher` buffer is always `shared_ptr<const MessageT>`.
A new class derived from `rclcpp::Waitable` is defined, named `SubscriptionIntraProcessWaitable`.
An object of this type is created by each `Subscription` with intra-process communication enabled, and it is used to notify the `Subscription` that a new message has been pushed into its ring buffer and needs to be processed.
The `IntraProcessManager` class stores information about each `Publisher` and each `Subscription`, together with pointers to these structures.
This allows the system to know which entities can communicate with each other and to have access to methods for pushing data into the buffers.
The decision whether to publish inter-process, intra-process, or both is made every time the `Publisher::publish()` method is called.
For example, if `NodeOptions::use_intra_process_comms_` is enabled and all the known `Subscription`s are in the same process, then the message is only published intra-process.
This remains identical to the current implementation.
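A rough sketch of this decision could look as follows; the function and parameter names are illustrative and do not correspond to the actual rclcpp API.

```cpp
#include <cassert>
#include <cstddef>

enum class PublishMode { IntraOnly, InterOnly, Both };

// Sketch of the decision taken at every Publisher::publish() call.
// `intra_process_enabled` stands in for NodeOptions::use_intra_process_comms_;
// the subscription counts would come from the IntraProcessManager and the RMW.
PublishMode select_publish_mode(
  bool intra_process_enabled,
  std::size_t intra_process_subscriptions,
  std::size_t total_subscriptions)
{
  if (!intra_process_enabled || intra_process_subscriptions == 0) {
    return PublishMode::InterOnly;
  }
  if (intra_process_subscriptions == total_subscriptions) {
    // All known subscriptions live in this process: skip the middleware entirely.
    return PublishMode::IntraOnly;
  }
  return PublishMode::Both;
}

int main()
{
  assert(select_publish_mode(true, 3, 3) == PublishMode::IntraOnly);
  assert(select_publish_mode(true, 2, 3) == PublishMode::Both);
  assert(select_publish_mode(false, 2, 3) == PublishMode::InterOnly);
}
```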
The steps for creating a `Publisher` are the following:

1. The user calls `Node::create_publisher<MessageT>(...)`.
2. This results in a call to `NodeTopics::create_publisher(...)`, where a `Publisher` is created through the factory.
3. Intra-process related variables are initialized through the `Publisher::SetupIntraProcess(...)` method.
4. The `IntraProcessManager` is notified about the existence of the new `Publisher` through the method `IntraProcessManager::add_publisher(PublisherBase::SharedPtr publisher, PublisherOptions options)`.
5. `IntraProcessManager::add_publisher(...)` stores the `Publisher` information in an internal structure of type `PublisherInfo`. The structure contains information about the `Publisher`, such as its QoS and its topic name, and a weak pointer to the `Publisher` object. A `uint64_t pub_id`, unique within the `rclcpp::Context`, is assigned to the `Publisher`. The `IntraProcessManager` contains a `std::map<uint64_t, PublisherInfo>` object where it is possible to retrieve the `PublisherInfo` of a specific `Publisher` given its id (a sketch of this bookkeeping is shown after this list). The function returns the `pub_id`, which is stored within the `Publisher`.
6. If the `Publisher` QoS is set to `transient local`, the `Publisher::SetupIntraProcess(...)` method also creates a ring buffer of the size specified by the depth of the QoS history.
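Under the naming assumptions above (a `PublisherInfo` structure, a `pub_id` unique within the context), the bookkeeping performed by `add_publisher(...)` could be sketched as follows. This is an illustration of the described data structures, not the actual rclcpp code.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <string>

// Illustrative stand-ins for rclcpp types.
struct PublisherBase { std::string topic_name; /* QoS, ... */ };
struct QoS { std::size_t depth = 10; bool transient_local = false; };

// Sketch of the per-publisher information stored by the IntraProcessManager.
struct PublisherInfo {
  std::weak_ptr<PublisherBase> publisher;  // weak pointer to the Publisher object
  std::string topic_name;
  QoS qos;
};

class IntraProcessManagerSketch {
public:
  // Assigns an id unique within this manager (standing in for the rclcpp::Context)
  // and stores the publisher information; the id is returned to the Publisher.
  uint64_t add_publisher(std::shared_ptr<PublisherBase> publisher, QoS qos) {
    const uint64_t pub_id = next_id_++;
    publishers_[pub_id] = PublisherInfo{publisher, publisher->topic_name, qos};
    return pub_id;
  }

private:
  uint64_t next_id_ = 1;
  std::map<uint64_t, PublisherInfo> publishers_;
};

int main() {
  IntraProcessManagerSketch manager;
  auto pub = std::make_shared<PublisherBase>(PublisherBase{"/my_topic"});
  uint64_t pub_id = manager.add_publisher(pub, QoS{10, false});
  (void)pub_id;  // stored back inside the Publisher in the described design
}
```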
The steps for creating a `Subscription` are the following:

1. The user calls `Node::create_subscription<MessageT>(...)`.
2. This results in a call to `NodeTopics::create_subscription(...)`, where a `Subscription` is created through the factory.
3. Intra-process related variables are initialized through the `Subscription::SetupIntraProcess(...)` method, the most relevant ones being the ring buffer and the waitable object.
4. The `IntraProcessManager` is notified about the existence of the new `Subscription` through the method `IntraProcessManager::add_subscription(SubscriptionBase::SharedPtr subscription, SubscriptionOptions options)`.
5. `IntraProcessManager::add_subscription(...)` stores the `Subscription` information in an internal structure of type `SubscriptionInfo`. The structure contains information about the `Subscription`, such as its QoS, its topic name and the type of its callback, and a weak pointer to the `Subscription` object. A `uint64_t sub_id`, unique within the `rclcpp::Context`, is assigned to the `Subscription`. The `IntraProcessManager` contains a `std::map<uint64_t, SubscriptionInfo>` object where it is possible to retrieve the `SubscriptionInfo` of a specific `Subscription` given its id. There is also an additional structure `std::map<uint64_t, std::pair<std::set<uint64_t>, std::set<uint64_t>>>`. The key of the map is the unique id of a `Publisher` and the value is a pair of sets of ids. These sets contain the ids of the `Subscription`s that can communicate with the `Publisher`. We have two different sets because we want to differentiate the `Subscription`s depending on whether they request ownership of the received messages or not (note that this decision is made by looking at their buffer, since the `Publisher` does not have to interact with the `Subscription` callback). A sketch of this structure is shown after this list.
6. The `SubscriptionIntraProcessWaitable` object is added to the list of Waitable interfaces of the node through `node_interfaces::NodeWaitablesInterface::add_waitable(...)`. It is added to the same callback group used for the standard inter-process communication of that topic.
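The pair-of-sets structure described in step 5 can be sketched as follows: for every publisher id, the first set holds the ids of `Subscription`s whose buffer requires ownership and the second set the ids of those accepting shared access. The code is illustrative only; the names and the QoS-matching logic are simplified assumptions.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <utility>

// For each publisher id, the pair of sets of subscription ids that can
// communicate with it: ownership-requesting subscriptions first, shared second.
using SubscriptionSets = std::pair<std::set<uint64_t>, std::set<uint64_t>>;

struct SubscriptionInfoSketch {
  std::string topic_name;
  bool buffer_requires_ownership;  // deduced from the buffer data-type
};

struct PublisherInfoSketch {
  std::string topic_name;
};

// Called when a new subscription is added: link it to every publisher on the
// same topic (a real implementation would also check QoS compatibility).
void link_subscription(
  uint64_t sub_id,
  const SubscriptionInfoSketch & sub_info,
  const std::map<uint64_t, PublisherInfoSketch> & publishers,
  std::map<uint64_t, SubscriptionSets> & pub_to_subs)
{
  for (const auto & [pub_id, pub_info] : publishers) {
    if (pub_info.topic_name != sub_info.topic_name) { continue; }
    if (sub_info.buffer_requires_ownership) {
      pub_to_subs[pub_id].first.insert(sub_id);
    } else {
      pub_to_subs[pub_id].second.insert(sub_id);
    }
  }
}

int main() {
  std::map<uint64_t, PublisherInfoSketch> publishers{{1, {"/my_topic"}}};
  std::map<uint64_t, SubscriptionSets> pub_to_subs;
  link_subscription(2, {"/my_topic", false}, publishers, pub_to_subs);
  // Subscription 2 now appears in the "shared" set of publisher 1.
}
```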
The steps for publishing only intra-process are the following:

1. The user calls `Publisher::publish(std::unique_ptr<MessageT> msg)`.
2. `Publisher::publish(std::unique_ptr<MessageT> msg)` calls `IntraProcessManager::do_intra_process_publish(uint64_t pub_id, std::unique_ptr<MessageT> msg)`.
3. `IntraProcessManager::do_intra_process_publish(...)` uses the `uint64_t pub_id` to call `IntraProcessManager::get_subscription_ids_for_pub(uint64_t pub_id)`. This returns the ids corresponding to the `Subscription`s that have a QoS compatible for receiving the message. These ids are divided into two sublists, according to the data-type stored in the buffer of each `Subscription`: requesting ownership (`unique_ptr<MessageT>`) or accepting shared (`shared_ptr<MessageT>`, but also `MessageT`, since it will copy the data in any case).
4. The message is "added" to the buffer of each of these `Subscription`s.
5. The `rcl_guard_condition_t` member of the `SubscriptionIntraProcessWaitable` of each `Subscription` is triggered (this wakes up `rclcpp::spin`).

The way in which the `std::unique_ptr<MessageT>` message is "added" to a buffer depends on the type of the buffer:

- `BufferT = unique_ptr<MessageT>`: the buffer receives a copy of `MessageT` and has ownership of it; for the last buffer, a copy is not necessary as ownership can be transferred.
- `BufferT = shared_ptr<const MessageT>`: every buffer receives a shared pointer to the same `MessageT`; no copies are required.
- `BufferT = MessageT`: a copy of the message is added to every buffer.

The `Publisher::publish(...)` method is overloaded to support different message types:
- `unique_ptr<MessageT>`
- `MessageT &`
- `MessageT *`
- `const shared_ptr<const MessageT>`
The last two of them are actually deprecated since ROS 2 Dashing.
All these methods are unchanged with respect to the current implementation: they end up creating a `unique_ptr` and calling the `Publisher::publish(std::unique_ptr<MessageT> msg)` overload described above.
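The following sketch summarizes the copy-minimizing behavior described above for the case of publishing only intra-process: buffers accepting shared access share a single instance, while buffers requiring ownership receive copies, except the last one, which receives the published message itself. It is a simplified illustration, not the actual `do_intra_process_publish(...)` implementation.

```cpp
#include <cstddef>
#include <memory>
#include <queue>
#include <vector>

// Illustrative message and buffer types (not the rclcpp ones).
struct ImageMsg { std::vector<unsigned char> data; };
using OwnedBuffer  = std::queue<std::unique_ptr<ImageMsg>>;        // BufferT = unique_ptr<MessageT>
using SharedBuffer = std::queue<std::shared_ptr<const ImageMsg>>;  // BufferT = shared_ptr<const MessageT>

// Sketch of the copy-minimizing logic for the "publish only intra-process" case.
void do_intra_process_publish_sketch(
  std::unique_ptr<ImageMsg> msg,
  std::vector<OwnedBuffer *> & owned_buffers,
  std::vector<SharedBuffer *> & shared_buffers)
{
  if (owned_buffers.empty()) {
    // Nobody requires ownership: promote to shared_ptr, zero copies.
    std::shared_ptr<const ImageMsg> shared_msg = std::move(msg);
    for (auto * buffer : shared_buffers) { buffer->push(shared_msg); }
    return;
  }
  if (!shared_buffers.empty()) {
    // One copy is shared among all the buffers that accept shared access.
    auto shared_msg = std::make_shared<const ImageMsg>(*msg);
    for (auto * buffer : shared_buffers) { buffer->push(shared_msg); }
  }
  // Buffers requiring ownership: copies for all but the last one.
  for (std::size_t i = 0; i + 1 < owned_buffers.size(); ++i) {
    owned_buffers[i]->push(std::make_unique<ImageMsg>(*msg));
  }
  owned_buffers.back()->push(std::move(msg));  // the last one takes the original
}

int main() {
  OwnedBuffer owned;
  SharedBuffer shared_a, shared_b;
  std::vector<OwnedBuffer *> owned_buffers{&owned};
  std::vector<SharedBuffer *> shared_buffers{&shared_a, &shared_b};
  // One copy in total: shared_a and shared_b share it, owned takes the original.
  do_intra_process_publish_sketch(
    std::make_unique<ImageMsg>(), owned_buffers, shared_buffers);
}
```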
As previously described, whenever messages are added to the ring buffer of a `Subscription`, a condition variable specific to that `Subscription` is triggered.
This condition variable has been added to the `Node` waitset, so it is monitored by `rclcpp::spin`.
Remember that the `SubscriptionIntraProcessWaitable` object has access to the ring buffer and to the callback function pointer of its related `Subscription`.

1. The `SubscriptionIntraProcessWaitable` object awakes `rclcpp::spin`.
2. The `SubscriptionIntraProcessWaitable::is_ready()` condition is checked. This has to ensure that the ring buffer is not empty.
3. The `SubscriptionIntraProcessWaitable::execute()` function is triggered. Here the first message is extracted from the buffer and then the `SubscriptionIntraProcessWaitable` calls the `AnySubscriptionCallback::dispatch_intra_process(...)` method. There are different implementations for this method, depending on the data-type stored in the buffer.
4. The `AnySubscriptionCallback::dispatch_intra_process(...)` method triggers the associated callback. Note that in this step, if the type of the buffer is a smart pointer one, no message copies occur, as ownership has already been taken into account when pushing a message into the queue.
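A minimal sketch of the role played by the `SubscriptionIntraProcessWaitable` is shown below: it has access to the buffer and the callback, reports readiness when the buffer is not empty, and on execution pops one message and dispatches it. The wait set and guard condition machinery is omitted; names and types are illustrative, not the rclcpp ones.

```cpp
#include <functional>
#include <iostream>
#include <memory>
#include <queue>
#include <string>

struct StringMsg { std::string data; };

// Sketch of the is_ready()/execute() flow described above.
class SubscriptionWaitableSketch {
public:
  explicit SubscriptionWaitableSketch(
    std::function<void(std::shared_ptr<const StringMsg>)> callback)
  : callback_(std::move(callback)) {}

  void push(std::shared_ptr<const StringMsg> msg) { buffer_.push(std::move(msg)); }

  bool is_ready() const { return !buffer_.empty(); }

  void execute() {
    // Extract the first message and trigger the subscription callback;
    // no copy is made here, the stored smart pointer is handed over directly.
    auto msg = buffer_.front();
    buffer_.pop();
    callback_(std::move(msg));
  }

private:
  std::queue<std::shared_ptr<const StringMsg>> buffer_;
  std::function<void(std::shared_ptr<const StringMsg>)> callback_;
};

int main() {
  SubscriptionWaitableSketch waitable(
    [](std::shared_ptr<const StringMsg> msg) { std::cout << msg->data << "\n"; });
  waitable.push(std::make_shared<const StringMsg>(StringMsg{"wake up and dispatch"}));
  // A real executor would block on the guard condition; here we just poll.
  if (waitable.is_ready()) { waitable.execute(); }
}
```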
The steps for publishing both intra- and inter-process are the following:

1. The user calls `Publisher::publish(std::unique_ptr<MessageT> msg)`.
2. The message is converted into a shared pointer: `std::shared_ptr<MessageT> shared_msg = std::move(msg)`.
3. `Publisher::publish(std::unique_ptr<MessageT> msg)` calls `IntraProcessManager::do_intra_process_publish(uint64_t pub_id, std::shared_ptr<MessageT> shared_msg)`.

The following steps are identical to steps 3, 4, and 5 applied when publishing only intra-process.

4. `IntraProcessManager::do_intra_process_publish(...)` uses the `uint64_t pub_id` to call `IntraProcessManager::get_subscription_ids_for_pub(uint64_t pub_id)`. Then it calls `IntraProcessManager::find_matching_subscriptions(PublisherInfo pub_info)`. This returns the ids corresponding to the `Subscription`s that have a QoS compatible for receiving the message. These ids are divided into two sublists, according to the data-type stored in the buffer of each `Subscription`: requesting ownership (`unique_ptr<MessageT>`) or accepting shared (`shared_ptr<MessageT>`, but also `MessageT`, since it will copy the data in any case).
5. The message is "added" to the buffer of each of these `Subscription`s.
6. The `rcl_guard_condition_t` member of the `SubscriptionIntraProcessWaitable` of each `Subscription` is triggered (this wakes up `rclcpp::spin`).

After the intra-process publication, the inter-process one takes place:

7. `Publisher::publish(std::unique_ptr<MessageT> msg)` calls `Publisher::do_inter_process_publish(const MessageT & inter_process_msg)`, where `MessageT inter_process_msg = *shared_msg`.

The difference from the previous case is that here a `std::shared_ptr<const MessageT>` is being "added" to the buffers. Note that this `std::shared_ptr` has just been created from a `std::unique_ptr` and it is only used by the `IntraProcessManager` and by the RMW; the user application has no access to it.

- `BufferT = unique_ptr<MessageT>`: the buffer receives a copy of `MessageT` and has ownership of it.
- `BufferT = shared_ptr<const MessageT>`: every buffer receives a shared pointer to the same `MessageT`, so no copies are required.
- `BufferT = MessageT`: a copy of the message is added to every buffer.

The difference with respect to publishing a `unique_ptr` is that here it is not possible to save a copy: if the ownership of the published message is moved to one of the `Subscription`s (potentially saving a copy, as done in the previous case), a new copy of the message has to be created for the inter-process publication.
The proposed implementation can handle all the different QoS settings:

- History: if the history is set to `keep_last`, then the depth of the history corresponds to the size of the ring buffer. On the other hand, if the history is set to `keep_all`, the buffer becomes a standard FIFO queue with an unbounded size.
- Reliability: this QoS is not used by the `IntraProcessManager` in order to understand if a `Publisher` and a `Subscription` are compatible. The use of buffers ensures that all the messages are delivered without the need to resend them. Thus, both options, `reliable` and `best-effort`, are satisfied.
- Durability: this QoS is used to understand if a `Publisher` and a `Subscription` are compatible. How this QoS is handled is described in detail in the following paragraph.

If the `Publisher` durability is set to `transient_local`, an additional buffer on the `Publisher` side is used to store the sent intra-process messages.
Late-joiner `Subscription`s will have to extract messages from this buffer once they are added to the `IntraProcessManager`.
In this case the `IntraProcessManager` has to check if the recently created `Subscription` is a late-joiner, and, if it is, it has to retrieve messages from the `Transient Local` `Publisher`s.
In order to handle a late-joiner, the following steps are needed:

1. The `IntraProcessManager` calls `IntraProcessManager::find_matching_publishers(SubscriptionInfo sub_info)`, which returns a list of the stored `PublisherInfo` that have a QoS compatible for sending messages to this new `Subscription`. These will all be `Transient Local` `Publisher`s, so they have a ring buffer.
2. The messages stored in the ring buffers of these `Publisher`s are added to the buffer of the new `Subscription`. TODO: are there any constraints on the order in which old messages have to be retrieved? (i.e. 1 publisher at a time; all the firsts of each publisher, then all the seconds, ...).
3. The `rcl_guard_condition_t` member of the `SubscriptionIntraProcessWaitable` associated with the new `Subscription` is triggered.

However, this is not enough, as it does not handle the scenario in which a `transient local` `Publisher` has only intra-process `Subscription`s when it is created, but, eventually, a `transient local` `Subscription` in a different process joins.
Initially, published messages are not passed to the middleware, since all the `Subscription`s are in the same process.
This means that the middleware is not able to store old messages for eventual late-joiners.
The solution to this issue consists in always publishing both intra- and inter-process when a `Publisher` has `transient local` durability.
For this reason, when `transient local` is enabled, the `do_intra_process_publish(...)` function will always process a shared pointer.
This allows the logic for storing the published messages into the buffers to be added in only one of the two `do_intra_process_publish(...)` cases, and it also allows the use of buffers that only have to store shared pointers.
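The late-joiner handling described above could be sketched as follows: when a new `transient local` subscription appears, the messages stored in the buffers of compatible `transient local` publishers are pushed into its buffer. The code is an illustration under simplified assumptions (topic-name matching only, shared-pointer buffers); a real implementation would also trigger the subscription's guard condition.

```cpp
#include <cstdint>
#include <deque>
#include <map>
#include <memory>
#include <queue>
#include <string>

struct StringMsg { std::string data; };

// Publisher-side storage used only when durability is transient_local:
// sent intra-process messages are kept as shared_ptr<const MessageT>.
struct TransientLocalPublisherSketch {
  std::string topic_name;
  std::deque<std::shared_ptr<const StringMsg>> stored_messages;
};

// Subscription-side buffer (shared_ptr flavour, for simplicity).
using SubscriptionBufferSketch = std::queue<std::shared_ptr<const StringMsg>>;

// Sketch of the late-joiner handling: retrieve the messages stored by every
// compatible transient_local publisher and push them into the new buffer.
void deliver_old_messages_sketch(
  const std::string & sub_topic,
  SubscriptionBufferSketch & sub_buffer,
  const std::map<uint64_t, TransientLocalPublisherSketch> & publishers)
{
  for (const auto & [pub_id, pub] : publishers) {
    if (pub.topic_name != sub_topic) { continue; }
    for (const auto & old_msg : pub.stored_messages) {
      sub_buffer.push(old_msg);  // no copies: shared pointers are pushed
    }
  }
}

int main() {
  std::map<uint64_t, TransientLocalPublisherSketch> publishers;
  publishers[1] = {"/my_topic", {std::make_shared<const StringMsg>(StringMsg{"old news"})}};
  SubscriptionBufferSketch late_joiner_buffer;
  deliver_old_messages_sketch("/my_topic", late_joiner_buffer, publishers);
  // late_joiner_buffer now contains the previously published message.
}
```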
In the previous sections, it has been briefly described how a message can be added to a buffer, i.e. whether or not it is necessary to copy it.
Here are some details about how this proposal addresses some more complex cases.
As previously stated, regardless of the data-type published by the user, the flow always goes through `Publisher::publish(std::unique_ptr<MessageT> msg)`.
The `std::unique_ptr<MessageT> msg` is passed to the `IntraProcessManager`, which decides how to add this message to the buffers.
The decision is taken by looking at the number and the type of the `Subscription`s, i.e. whether or not they want ownership of the messages.

If all the `Subscription`s want ownership of the message, then a total of `N-1` copies of the message are required, where `N` is the number of `Subscription`s.
The last one will receive ownership of the published message, thus saving a copy.

If none of the `Subscription`s want ownership of the message, `0` copies are required: it is possible to convert the message into a `std::shared_ptr<MessageT> msg` and to add it to every buffer.

If there is 1 `Subscription` that does not want ownership while the others do, the situation is equivalent to the case of everyone requesting ownership: `N-1` copies of the message are required.
As before, the last `Subscription` will receive ownership.

If there is more than 1 `Subscription` that does not want ownership while the others do, a total of `M` copies of the message are required, where `M` is the number of `Subscription`s that want ownership: `1` copy is shared among all the `Subscription`s that do not want ownership, while `M-1` copies are for the others.

As in the current implementation, if both inter- and intra-process communication are needed, the `std::unique_ptr<MessageT> msg` is converted into a `std::shared_ptr<MessageT> msg` and passed respectively to the `do_intra_process_publish` and `do_inter_process_publish` functions.
A copy of the message is given to all the `Subscription`s requesting ownership, while the others can copy the published shared pointer.
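The copy-counting rules above can be summarized in a small helper; this function is only an illustration of those rules, not part of the proposed API.

```cpp
#include <cassert>
#include <cstddef>

// Sketch of the copy-counting rules described above. `ownership_subs` is the
// number of Subscriptions whose buffer requires ownership, `shared_subs` the
// number of those accepting shared access.
std::size_t intra_process_copies(
  std::size_t ownership_subs, std::size_t shared_subs, bool also_inter_process)
{
  if (also_inter_process) {
    // The published message becomes a shared_ptr used by the middleware and by
    // the shared buffers; every ownership buffer needs its own copy.
    return ownership_subs;
  }
  if (ownership_subs == 0) {
    return 0;  // everybody shares the published message
  }
  // One ownership buffer takes the original; the other ownership buffers need
  // a copy each, plus one copy shared by all the buffers accepting shared access.
  return (ownership_subs - 1) + (shared_subs > 0 ? 1 : 0);
}

int main()
{
  assert(intra_process_copies(2, 0, false) == 1);  // N-1 copies, all ownership
  assert(intra_process_copies(0, 3, false) == 0);  // zero copies, all shared
  assert(intra_process_copies(2, 2, false) == 2);  // M copies
  assert(intra_process_copies(2, 2, true) == 2);   // intra + inter: M copies
}
```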
The following tables recap when the proposed implementation has to create a new copy of a message.
The notation `@` indicates a memory address where the message is stored; different memory addresses correspond to different copies of the message.
publish<T> | BufferT | Results |
---|---|---|
unique_ptr<MsgT> @1 | unique_ptr<MsgT> | @1 |
unique_ptr<MsgT> @1 | unique_ptr<MsgT>, unique_ptr<MsgT> | @1, @2 |
unique_ptr<MsgT> @1 | shared_ptr<MsgT> | @1 |
unique_ptr<MsgT> @1 | shared_ptr<MsgT>, shared_ptr<MsgT> | @1, @1 |
unique_ptr<MsgT> @1 | unique_ptr<MsgT>, shared_ptr<MsgT> | @1, @2 |
unique_ptr<MsgT> @1 | unique_ptr<MsgT>, shared_ptr<MsgT>, shared_ptr<MsgT> | @1, @2, @2 |
unique_ptr<MsgT> @1 | unique_ptr<MsgT>, unique_ptr<MsgT>, shared_ptr<MsgT>, shared_ptr<MsgT> | @1, @2, @3, @3 |
publish<T> | BufferT | Results |
---|---|---|
shared_ptr<MsgT> @1 | unique_ptr<MsgT> | @2 |
shared_ptr<MsgT> @1 | unique_ptr<MsgT>, unique_ptr<MsgT> | @2, @3 |
shared_ptr<MsgT> @1 | shared_ptr<MsgT> | @1 |
shared_ptr<MsgT> @1 | shared_ptr<MsgT>, shared_ptr<MsgT> | @1, @1 |
shared_ptr<MsgT> @1 | unique_ptr<MsgT>, shared_ptr<MsgT> | @2, @1 |
shared_ptr<MsgT> @1 | unique_ptr<MsgT>, shared_ptr<MsgT>, shared_ptr<MsgT> | @2, @1, @1 |
shared_ptr<MsgT> @1 | unique_ptr<MsgT>, unique_ptr<MsgT>, shared_ptr<MsgT>, shared_ptr<MsgT> | @2, @3, @1, @1 |
The possibility of setting the data-type stored in each buffer becomes helpful when dealing with more particular scenarios.
Consider a scenario with N `Subscription`s, all taking a unique pointer.
If the `Subscription`s don't actually take the message (e.g. they are busy and the message is being overwritten due to the QoS settings), the default buffer type (`unique_ptr`, since the callbacks require ownership) would result in the copies taking place anyway.
By setting the buffer type to `shared_ptr`, no copies are needed when the `Publisher` pushes messages into the buffers.
Eventually, the `Subscription`s will copy the data only when they are ready to process it.
On the other hand, if the published data are very small, it can be advantageous not to use C++ smart pointers, but to directly store the data into the buffers.
In all these situations, the number of copies is always smaller than or equal to the one required by the current intra-process implementation.
However, there is a particular scenario where having multiple buffers makes it much more difficult to save a copy.
Consider two `Subscription`s, one taking a shared pointer and the other taking a unique pointer.
With a more centralized system, if the first `Subscription` requests its shared pointer and then releases it before the second `Subscription` takes the message, it is potentially possible to optimize the system to handle this situation without requiring any copy.
The proposed implementation, on the other hand, will immediately create one copy of the message for the `Subscription` requiring ownership.
Even when using a `shared_ptr` buffer as previously described, it becomes more difficult to ensure that the other `Subscription` is not using the pointer anymore.
The `IntraProcessManager::do_intra_process_publish(...)` function knows whether the intra-process buffer of each `Subscription` requires ownership or not.
For this reason it can perform the minimum number of copies required, by looking at the total number of `Subscription`s and their types.
The buffer does not perform any copy when receiving a message, but directly stores it.
When extracting a message from the buffer, the `Subscription` can require any particular data-type.
The intra-process buffer will perform a copy of the message whenever necessary, for example in the previously described cases where the data-type stored in the buffer is different from the one of the callback.
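As an example of this copy-on-extraction behavior, consider a buffer that stores `shared_ptr<const MessageT>`: handing out shared access requires no copy, while handing out ownership requires one. The sketch below is illustrative; the method names are not the actual rclcpp ones.

```cpp
#include <memory>
#include <queue>
#include <string>

struct StringMsg { std::string data; };

// Sketch of a buffer that stores shared_ptr<const MessageT> and converts on
// extraction: shared access is free, ownership requires a copy.
class SharedBufferSketch {
public:
  void push(std::shared_ptr<const StringMsg> msg) { queue_.push(std::move(msg)); }

  // Callback wants shared access: no copy is performed.
  std::shared_ptr<const StringMsg> take_shared() {
    auto msg = queue_.front();
    queue_.pop();
    return msg;
  }

  // Callback wants ownership: a copy is performed here (an optimized
  // implementation could avoid it when the stored pointer is not shared).
  std::unique_ptr<StringMsg> take_unique() {
    auto msg = queue_.front();
    queue_.pop();
    return std::make_unique<StringMsg>(*msg);
  }

private:
  std::queue<std::shared_ptr<const StringMsg>> queue_;
};

int main() {
  SharedBufferSketch buffer;
  buffer.push(std::make_shared<const StringMsg>(StringMsg{"stored once"}));
  auto owned = buffer.take_unique();  // the copy happens here, at extraction time
}
```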
The implementation of the presented new intra-process communication mechanism is hosted on GitHub here.
This section contains experimental results obtained comparing the current intra-process communication implementation with an initial implementation of the proposed one. The tests span multiple ROS 2 applications and use-cases and have been validated on different machines.
All the following experiments have been run using ROS 2 Dashing with `-O2` optimization enabled.
colcon build --cmake-args -DCMAKE_CXX_FLAGS="-O2" -DCMAKE_C_FLAGS="-O2"
The first test has been carried out using the `intra_process_demo` package contained in the ROS 2 demos repository.
A first application, called `image_pipeline_all_in_one`, is made of 3 nodes: the first one publishes a `unique_ptr<Image>` message, a second node subscribes to the topic and republishes the image on a new topic after modifying it, and a third node subscribes to this last topic.
A variant of the application, `image_pipeline_with_two_image_view`, has also been tested; it has 2 consumers at the end of the pipeline.
In these tests the latency is computed as the total pipeline duration, i.e. the time from when the first node publishes the image to when the last node receives it.
The CPU usage and the latency have been obtained from the `top` command and averaged over the experiment duration.
Performance evaluation on a laptop computer with Intel i7-6600U CPU @ 2.60GHz.
ROS 2 system | IPC | RMW | Latency [us] | CPU [%] | RAM [Mb] |
---|---|---|---|---|---|
image_pipeline_all_in_one | off | Fast-RTPS | 1800 | 23 | 90 |
image_pipeline_all_in_one | standard | Fast-RTPS | 920 | 20 | 90 |
image_pipeline_all_in_one | new | Fast-RTPS | 350 | 15 | 90 |
image_pipeline_with_two_image_view | off | Fast-RTPS | 2900 | 24 | 94 |
image_pipeline_with_two_image_view | standard | Fast-RTPS | 2000 | 20 | 95 |
image_pipeline_with_two_image_view | new | Fast-RTPS | 1400 | 16 | 94 |
From this simple experiment it is immediately possible to see the improvement in latency when using the proposed intra-process communication. However, an even bigger improvement can be observed when analyzing the results of more complex applications.
The next results have been obtained running the iRobot benchmark application. This allows the user to specify the topology of a ROS 2 graph that will be entirely run in a single process.
The application has been run with the topologies Sierra Nevada and Mont Blanc. Sierra Nevada is a 10-node topology and it contains 10 publishers and 13 subscriptions. One topic has a message size of 10KB, while all the others have message sizes between 10 and 100 bytes.
Mont Blanc is a bigger 20-node topology, containing 23 publishers and 35 subscriptions. Two topics have a message size of 250KB, three topics have message sizes between 1KB and 25KB, and the rest of the topics have message sizes smaller than 1KB.
A detailed description and the source code for these applications and topologies can be found here.
Note that, differently from the previous experiment, where the ownership of the messages was moved from the publisher to the subscription, here the nodes use `const std::shared_ptr<const MessageT>` messages for their callbacks.
Performance evaluation on a laptop computer with Intel i7-6600U CPU @ 2.60GHz.
ROS 2 system | IPC | RMW | Latency [us] | CPU [%] | RAM [Mb] |
---|---|---|---|---|---|
Sierra Nevada | off | Fast-RTPS | 600 | 14 | 63 |
Sierra Nevada | standard | Fast-RTPS | 650 | 16 | 73->79 |
Sierra Nevada | new | Fast-RTPS | 140 | 8 | 63 |
Mont Blanc | off | Fast-RTPS | 1050 | 22 | 180 |
Mont Blanc | standard | Fast-RTPS | 750 | 18 | 213->220 |
Mont Blanc | new | Fast-RTPS | 160 | 8 | 180 |
A similar behavior can be observed also when running the application on resource-constrained platforms. The following results have been obtained on a Raspberry Pi 2.
ROS 2 system | IPC | RMW | Latency [us] | CPU [%] | RAM [Mb] |
---|---|---|---|---|---|
Sierra Nevada | off | Fast-RTPS | 800 | 18 | 47 |
Sierra Nevada | standard | Fast-RTPS | 725 | 20 | 54->58 |
Sierra Nevada | new | Fast-RTPS | 170 | 10 | 47 |
Mont Blanc | off | Fast-RTPS | 1500 | 30 | 130 |
Mont Blanc | standard | Fast-RTPS | 950 | 26 | 154->159 |
Mont Blanc | new | Fast-RTPS | 220 | 14 | 130 |
As far as latency and CPU usage are concerned, Sierra Nevada behaves almost the same regardless of whether standard IPC is enabled or not. This is due to the fact that most of its messages are very small. On the other hand, there are noticeable improvements in Mont Blanc, where several messages of non-negligible size are used.
From the memory point of view, there is an almost constant increase in utilization during the execution of the program when the standard intra-process communication mechanism is used. Since the experiments have been run for 120 seconds, this is an increase of approximately 60KB per second. However, even considering only the initial memory usage, it is possible to see how it is affected by the presence of the additional publishers and subscriptions needed for intra-process communication: there is a difference of 10MB in Sierra Nevada and of 33MB in Mont Blanc between standard intra-process communication on and off.
The last experiment shows how the current implementation performs when both intra- and inter-process communication are needed.
The test consists of running Sierra Nevada on a Raspberry Pi 2 and, on a separate desktop machine, a single node subscribing to all the available topics coming from Sierra Nevada.
This use-case is common when using tools such as `rosbag` or `rviz`.
ROS 2 system | IPC | RMW | Latency [us] | CPU [%] | RAM [Mb] |
---|---|---|---|---|---|
Sierra Nevada + debug node | off | Fast-RTPS | 800 | 22 | 50 |
Sierra Nevada + debug node | standard | Fast-RTPS | 1100 | 35 | 60->65 |
Sierra Nevada + debug node | new | Fast-RTPS | 180 | 15 | 32 |
These results show that, with the current implementation, it is better to keep intra-process communication disabled if there is at least one node in a different process. The proposed implementation does not require the ROS 2 middleware when publishing intra-process. This makes it easy to remove the connections between nodes in the same process when inter-process publication is also required, potentially resulting in a very small overhead with respect to the intra-process-only case.
There are some open issues that are addressed neither by the current implementation nor by the proposed one.

If both inter- and intra-process communication are used, a `Publisher` or a `Subscription` with a history depth of 10 will be able to store up to 20 messages without processing them (10 intra-process and 10 inter-process).
This issue is also present in the current implementation, since each `Subscription` is doubled.