Adding Machine Learning Based Perception to Objectives

In this tutorial, you will learn about integrating machine learning (ML) for perception into MoveIt Studio. This kind of perception can enable applications like pick-and-place, opening doors, pressing buttons, part inspection, and more. This tutorial will cover the system architecture, including containerization, ROS interfaces, and creating custom MoveIt Studio Behaviors.


Some of the Objectives in MoveIt Studio require user input, such as clicking a point on an image to identify an object in the scene. In contrast, this tutorial will show how to use machine learning to detect and estimate the pose of objects in the scene so that your Objectives can execute fully autonomously. These capabilities are useful in applications where a user or operator is not always present.

The type of ML we will be using here is instance segmentation, which is a type of image segmentation that identifies and classifies each object in the image. The output of the segmentation is a mask for each object, which can be used to extract the 3D pose of the object. See this tutorial for additional information on instance segmentation.

System Architecture


In this example, the ML pipeline we want to integrate two related capabilities:

  1. Instance segmentation.

  2. Pose estimation for the lever handle of a door, which uses the output of segmentation.

The two main components of this system are the Behaviors which are built and run in MoveIt Studio, and a separate Docker container which runs the ML code. Separating the two means we do not need to share dependencies between the two, and allows the ML pipeline to be run on separate hardware, such as GPUs in the cloud. Each component is run inside a Docker container. Running the ML container inside Docker is one way to create an isolated development environment or runtime, but it not a strict requirement.

The figure below illustrates how the packages, Docker containers, and ROS nodes are related. Text in blue is the name of ROS a package.


Suppose you’re creating three new ROS packages:

  1. ml_behaviors will define new MoveIt Studio Behaviors that call the ML pipeline as ROS services.

  2. ml_server will host the ML pipeline as a ROS service, which runs in a separate container with GPU compute capabilities.

  3. ml_interface will define the ROS message and service files which describe the interface between the ML and the MoveIt Studio container.

The two containers communicate via ROS. The package ml_interface containers ROS interface definitions for two ROS Services, segment_image and estimate_pose. Inside the ML container, there is a ROS Node that provides these services.

On your host system, create a User Workspace at a chosen location (for example, ~/moveit_studio_ws/). This will contain both the ml_interface package, and another package ml_behaviors which calls the ROS services provided by the ML container. This workspace will be mounted into the MoveIt Studio Docker containers.

The ml_behaviors package contains two behaviors:

  1. Detect Objects
    • Detects Objects in the RGB image and outputs instance segmentation masks and bounding boxes with class names and confidence scores.

  2. Get Lever Handle From Detections
    • Takes the output of Detect Objects and outputs the 3D representation of the lever handle.

For information on how to create your own custom behaviors, see the Objective Building Examples tutorials.

When building an Objective, we use the Detect Objects Behavior followed the Get Lever Handle From Detections Behavior. As illustrated by the green lines in the figure above, the ports detect_objects_goal and detect_objects_result are passed from the Detect Objects Behavior to the Get Lever Handle From Detections Behavior. These behaviors call the ROS services segment_image and estimate_pose to perform the ML tasks. Building custom behaviors that call ROS services or actions can be done by inheriting from the Built-In Behaviors ActionClientBehaviorBase or ServiceClientBehaviorBase.

Elsewhere on your host system, create a ROS workspace for the ML code (for example, ~/ml_ws). This new workspace will contain ml_server and ml_interface.

ROS Interface Definitions

The ROS interface definitions for the segment_image and estimate_pose services might look like this:

# segment_image.srv
# Request
sensor_msgs/Image source_rgb_image
sensor_msgs/Image source_depth_image
sensor_msgs/CameraInfo source_rgb_camera_info

# Response
# Contains class ID and probability score
vision_msgs/ObjectHypothesis[] result_hypothesis
sensor_msgs/Image[] masks

bool success
string error_message
# estimate_pose.srv
# Request
# original inputs plus the outputs from /segment_image
sensor_msgs/Image source_rgb_image
sensor_msgs/Image source_depth_image
sensor_msgs/CameraInfo source_rgb_camera_info
vision_msgs/ObjectHypothesis[] result_hypothesis
sensor_msgs/Image[] masks

# Response
vision_msgs/ObjectHypothesis[] result_hypothesis
geometry_msgs/PoseStamped[] poses

bool success
string error_message

Launching the ML Pipeline along side MoveIt Studio

To run the ML alongside MoveIt Studio, the ML Docker container needs to be launched. Launch the pipeline in with a one-line command by creating a service in a docker-compose.yaml file that builds and launches the ROS nodes. These Docker files could be placed in the ml_server package inside the ~/ml_ws ROS workspace.

For example:

docker compose up detect_and_fit
# docker-compose.yaml
# an example Docker Compose file
    image: ml_base
    context: .
    dockerfile: .docker/Dockerfile
    # Interactive shell
    stdin_open: true
    tty: true
    # Networking and IPC for ROS 2
    network_mode: host
    ipc: host
    # add GPU capabilities
        - ROS_DOMAIN_ID
        # Allows graphical programs in the container
        - DISPLAY=${DISPLAY}
        - QT_X11_NO_MITSHM=1
        # Mount the ROS workspace containing the source code
        - ../../:/ml_ws/:rw
        # Allows graphical programs in the container
        - /tmp/.X11-unix:/tmp/.X11-unix:rw
        - ${XAUTHORITY:-$HOME/.Xauthority}:/root/.Xauthority
        # Mount the host's DDS config(s)
        - ${CYCLONEDDS_URI:-/dev/null}:/opt/.ros/cyclonedds.xml
                - capabilities: [gpu]
        # command gets passed to the, which ends by executes it
    command: /bin/bash

# build the ROS 2 workspace in a separate service,
# so we ensure the workspace is built before we launch the nodes.
    extends: base
    entrypoint: /opt/.ros/
    command: bash -c "source /ml_ws/src/ml_behaviors/.docker/"

# add a service that runs the ROS launch command to start the ML pipeline
# this service will be named `detect_and_fit`
    extends: base
        condition: service_completed_successfully
    entrypoint: /opt/.ros/
    command: ros2 launch ml_server detect_and_fit.launch.xml

Then, MoveIt Studio can be launched as usual:

./moveit_studio run

Video Demonstration of Opening Lever Handle Door

The video below shows the result of integrating a custom ML pipeline for detecting lever handles into MoveIt Studio. The behaviors Detect Objects and Get Lever Handle From Detections are used to detect the lever handle, then built-in behaviors are used to plan and execute a trajectory to open the door.


This tutorial showed an example of how an ML pipeline can be integrated into MoveIt Studio. From here, you can apply these ideas to your own ML pipeline.