Version: 8

3. Perception & ML

🕒 Duration: 1-2 hours
💪 Level: Intermediate

Tutorial Overview

This tutorial dives into the advanced perception and machine learning capabilities of MoveIt Pro. You'll learn how to use computer vision tools like AprilTags, point cloud registration, and segmentation to enable your robot to perceive its environment. You'll also explore how to structure complex task plans using Behavior Trees. Whether you're stacking blocks, identifying parts, or debugging planning failures, this tutorial will guide you through practical workflows and best practices.

Pre-reqs

You should have already installed MoveIt Pro. We will assume you have already completed Tutorial 1 and have basic familiarity with the software.

Start MoveIt Pro

Launch the application using:

moveit_pro run -c lab_sim

Perception Approaches In MoveIt Pro

In highly structured environments you might be able to program your robot arm to do things like basic pick and place without any perception. However, for most robotics applications today having cameras and computer vision is crucial.

In our world view of robotics there are roughly 4 main categories of perception:

Fiducial Marker Detection (e.g. AprilTags)
Classic computer vision (e.g. OpenCV)
Point Cloud Segmentation (e.g. PCL)
Machine Learning (e.g. SAM2)

MoveIt Pro no longer ships with examples of classic computer vision, due to the rapidly evolving AI landscape. However we do ship with many examples of the other three approaches to perception. We will demonstrate these in the following sections:

Fiducial Marker Detection with AprilTags

In this exercise we will build a new application that will use AprilTags to detect and stack blocks.

info

AprilTags are a type of fiducial marker system commonly used in robotics and computer vision. They consist of black-and-white square patterns that can be easily detected and uniquely identified by a camera. In robotics, AprilTags are often used for precise localization and pose estimation of objects in the environment. AprilTags are particularly useful in applications requiring low-cost, reliable visual tracking without the need for complex sensors.

Initialize the Robot and Scene

First, create a new Objective called Stack Blocks and set the category to Tutorials. If you’re unsure how to create a new Objective, please refer to the Tutorial 1.

New Objective

Once created, remember to delete the AlwaysSuccess Behavior from your new empty Objective.

Take make this tutorial quick and easy, we will use a lot of pre-built subtrees that ship with MoveIt Pro. Add the following four subtrees to the default Sequence:

Open Gripper
Look at Table
Clear Snapshot
Take Wrist Camera Snapshot

Subtrees

When finished, Run the Stack Blocks Objective. You should see the robot reset its end effector, move to its home position, clear out any previous depth sensor readings, then take a fresh depth sensor reading using its wrist camera (not the scene camera).

Get Object Pose from AprilTag

The next step for us in the Stack Blocks Objective is to detect the pose of a block, by using the block's AprilTag to locate it.

Edit the Stack Blocks Objective that you began above.
Add a Sequence node to help us organize our Behavior Tree as it grows.

Add the following three subtrees/Behaviors to the newly added Sequence, tweaking the parameters to some of the Behaviors as described below:

tip

Use the dropdown selectors in the behavior input ports to make building your Behavior Tree faster.

Take Wrist Camera Image
Get AprilTag Pose from Image
1. Set the camera_info to {wrist_camera_info}.
2. Set the image to {wrist_camera_image}
TransformPose to align the detection pose with the end effector grasp frame.
1. Set the input_pose to {detection_pose}.
2. Set the output_pose to {grasp_pose}.
3. Set the quaternion_xyzw to 1;0;0;0 to rotate the pose 180 degrees about the X axis.
  1. You can use the Orientation Converter by clicking Edit icon next to the input.
4. Set the translation_xyz to 0;0;0 or leave it empty.
VisualizePose
1. Set the pose to {grasp_pose}

Your tree should look similar to this now (zoomed in):

Partial Progress BTree

tip

If you want to know more about how each Behavior works, you can see the built-in descriptions in two locations. Either:

When you hover over a Behavior in the left sidebar
When you click on a Behavior and the parameter sidebar opens on the right. The right sidebar is also useful in that it shows you all the input and output ports, along with their port descriptions.

Now Run your Stack Blocks Objective, and you should see a 3-axis colored pose marker appear on the detected block:

Stack Blocks

Your simple AprilTag perception approach is in-place, great job so far!

Pick from Fiducial Marker Detection

Next, to pick the block we detected with the Fiducial Marker Detection:

Press the Edit button in your Stack Blocks Objective to begin editing again.
Add the Pick from Pose subtree to the bottom of the root Sequence (not the child Sequence).
Run the Objective again

You can't really see if the grasp was successful in the Visualization pane since we are not leveraging attached collision objects to add grasp information to the planning scene in this example, but if you look at the /wrist_camera/color view pane you should see that the block is between the robot's two fingers:

Stack Blocks

Can the camera feeds be higher resolution?

Some folks have asked us if the above image could be higher quality, and the answer is yes! We've purposefully turned down the default rendering quality to make MoveIt Pro more performant for the most generic set of customer's computers. We don't require that you have a GPU for many of our applications, for example. It's too advanced for this tutorial, but know that the rendering can certainly be increased if your computer can handle it.

Create a Subtree

At this point our behavior tree became more complex, so let's convert the previous Sequence into a subtree.

tip

As your application gets more complex, we recommend you use sequences and subtrees to manage the complexity with nicely labeled abstractions.

info

Subtrees are modular components within a larger Behavior Tree that encapsulate a set of actions or decision logic. By allowing one Behavior Tree to reference another, subtrees enable greater reusability, organization, and maintainability in complex robotic applications. They help abstract away lower-level details, reduce duplication of nodes, and make it easier to manage common sequences or Behaviors across different Objectives. In MoveIt Pro, subtrees are especially useful for structuring sophisticated task plans in a clean and scalable way.

Edit the Objective, then:

Click on the child Sequence (not the parent Sequence)
Click on the Create Subtree icon that appears on top of the node
Name the subtree Get Object Pose
Set the category again to Tutorials
Keep the Subtree-only Objective checkbox checked
Click Create

info

The Subtree-only Objective checkbox means that this Objective can only be run as a subtree within another Objective, not as a stand-alone Objective.

Get Object Pose Modal

After converting to a subtree, the Stack Blocks Objective should look like:

Stack Blocks Progress

Run the Objective to see if it works. Did we miss anything?

You should see an error message that looks like this:

Error Because of Subtree

This is expected because we forgot to setup port remapping for our new subtree. A great segue to our next lession.

Port Remapping

info

Port remapping in Behavior Trees allows you to reuse generic nodes by dynamically assigning their input/output ports to different blackboard entries, depending on the context in which they're used. This makes Behavior Trees more flexible and modular, enabling the same node or subtree to be used in different contexts across various parts of the tree without changing its internal logic.

Go into edit mode of the Get Object Pose subtree
1. We recommend you do this by first editing Stacked Blocks Objective then clicking the pencil icon on the Get Object Pose subtree. However you can also search for the subtree directly in the list of available Objectives, you just won't be able to switch between the parent and child Behavior Tree as easily.
Choose the root node (also called Get Object Pose).
In the popup sidebar, click the + button to add an in/out port, to allow the sharing of the AprilTag pose that was detected.
Set the Name of the port to grasp_pose
Set the Default Value of the port to {grasp_pose}
Optionally, set the Type to geometry_msgs::PoseStamped so that {grasp_pose} shows up in other geometry_msgs::PoseStamped input port dropdown menus.
Optionally add a description.

It should look something like this:

Get Object Pose

info

The Port Name is the Blackboard variable name that will be passed to and from the inner contents of this subtree. For example, here we set it to grasp_pose (no brackets) so that it outputs the blackboard variable {grasp_pose} from its interior TransformPose Behavior.

The Default Value is the text that will populate in this port's value anywhere this subtree is added into an Objective. This default text can always be changed after the subtree is added.

Click out of the sidebar, somewhere in the Behavior Tree editor area, to close the sidebar.

Now go back to editing the parent Stack Blocks Objective by using the back arrow button located to the left of the Objective name at the top of the screen.

tip

Another way to go back to editing the parent Objective is to use your browser back button.

The Stack Blocks Objective should now look like this:

Stack Blocks

info

Notice the In/Out port icon next to grasp_pose. All ports that are remapped into a subtree are always both in and out ports, because they are shared memory pointers between the parent Objective's and the subtree's blackboard.

Place the Object

Finally, let's put the finishing touches on our Stack Blocks Objective by adding these existing, pre-built subtrees:

Add Look at Table
Add Place Object
Add Open Gripper

note

Place Object is an example subtree that will put something in the robot's end effector at a pre-defined waypoint. Adding the Look at Table subtree before placing the object is needed so that the planner will approach the placement point from above.

Your fully finished Stack Blocks Objective should look like this:

Stack Blocks

Run the Objective and you should see the robot pick up a block, move to the Look at Table waypoint, then plan a placement trajectory and ask for approval:

Look at Table

User Approval

The approval step is optional but can be nice for visualizing what the robot is about to do. MoveIt Pro provides various built-in Behaviors for user interaction for applications that require human in the loop. In our example, your Place Object subtree includes this capability by using the Wait for Trajectory Approval if User Available subtree, which checks if there is a UI attached, and if so asks the user to approve the trajectory.

Once you approve the trajectory, the robot should stack the block on top of the other:

Blocks

tip

You won’t see the blocks being stacked in the “Visualization” pane without updating the planning scene, but they can be seen live in the simulated camera feeds, for example under /scene_camera/color and /wrist_camera/color.

Now let's learn how to implement some more advanced perception to identify the block.

Point Cloud Registration with Iterative Closest Point (ICP)

There are many other perception capabilities within MoveIt Pro beyond AprilTags, and in this section, we’ll learn about point cloud registration.

info

Point cloud registration is the process of localizing an object within a point cloud, given a CAD mesh file as an input. This is used in robotics for locating a part within a workspace, or as an input to manipulation flows like polishing and grinding parts.

Typically, point cloud registration starts with an initial guess pose, which might be from an ML perception model, or based on where an object should be by the design of the robot workspace. This initial guess pose should be close to the object being registered, but does not have to be exact. The registration process then will find the exact pose using one of several algorithms, such as Iterative Closest Point (ICP).

info

Iterative Closest Point (ICP) is a foundational algorithm in robotics used to align 3D point clouds by estimating the best-fit rigid transformation between two sets of data. In robotic applications, ICP plays a critical role in tasks like localization, mapping, object tracking, and sensor fusion by helping a robot match its current sensor data to a known map or model. The algorithm works by iteratively refining the alignment based on minimizing the distance between corresponding points. While powerful, ICP requires a reasonable initial guess to avoid converging to an incorrect local minimum and is most effective when there is significant overlap between point clouds.

In MoveIt Pro, the RegisterPointClouds Behavior provides this capability:

RegisterPointClouds

We'll explore an example application:

First, restart MoveIt Pro from the command line to reset the stacked block to its original position. Then edit the Register CAD Part Objective so that we can skim its architecture:

You’ll see the following overall flow:

Move the camera on the end effector to look at the area of interest and capture a point cloud and add it to the visualization
Create an initial guess pose
Load a cube mesh as a pointcloud and and visualize it in red at our initial guess
Register the cube in the wrist camera point cloud using ICP
Load a cube mesh as a pointcloud and and visualize it in green at the registered pose

Next, Run the Objective, and you should see two point clouds appear, first a red one above the table (the initial guess), then a green one that matches the closest cube to the initial guess.

Objective

Next we will modify the guess pose.

Edit the Objective
Change the x, y, and z values in the CreateStampedPose Behavior to:

-0.2; 0.75; 0.6

Run the Objective again and notice how the new guess will register a different cube.

Advanced: As an additional hands-on exercise, you can replace the Get Object Pose subtree in the Stack Blocks Objective with this Register CAD Part Objective. Hint: You will need to add a port to the Register CAD Part subtree. You will also need to use the TransformPoseWithPose inside Register CAD Part to get the target pose in the world frame. Finally, You will need to use TransformPose to adjust the pose to match the gripper frame.

note

You can refer to the Stack Blocks with ICP Objective to see how to modify Register CAD Part for use with Stack Blocks.

Point Cloud Segmentation using ML

info

Point Cloud Segmentation with Machine Learning (ML) refers to the process of automatically dividing a 3D point cloud into meaningful regions or object parts based on learned patterns. Instead of relying solely on hand-tuned geometric rules (like plane fitting or clustering), ML-based segmentation trains models to recognize complex structures and variations directly from data. These models can classify points individually (semantic segmentation) or group them into distinct object instances (instance segmentation).

In this section we will explore the example ML Segment Point Cloud from Clicked Point Objective that demonstrates using ML for perception. It uses the GetMasks2DFromPointQuery Behavior which calls the ML model Segment Anything Model (SAM2) to find objects of interest.

The Objective:

Prompts the user to click on three objects in the color wrist camera image. The number three is arbitrary, it can be one or more.
Creates a 2D mask of the object using SAM2

What are Masks?
A 2D binary image where each pixel indicates whether it belongs to the segmented object (foreground) or the background, typically generated by models like SAM for isolating objects.
Converts the 2D mask to a 3D mask, mapping the mask into the 3D point cloud
Applies the 3D mask to the point cloud, removing everything except the chosen object(s)

Run this yourself:

Run the ML Segment Point Cloud from Clicked Point Objective.
Ensure the view port /wrist_camera/color is visible
It will prompt you to click three points on an object of interest
It will then segment out the point cloud for that object and visualize it. Note that results may vary depending on the points selected.

Advanced: For another hands-on exercise, instead of using fiducial marker perception or point cloud segmentation to get the pose of the block in your Stack Blocks Objective, try using ML segmentation. Hint: consider the GetPointCloudFromMask3D and GetCentroidFromPointCloud Behaviors.

info

Why does the point cloud of the beaker appear flat?

In this environment we have modeled the beaker in the way laser sensors would perceive them. That means that the simulated camera will see the beaker, but the simulated depth sensor will not.

Resetting & Manipulating the Simulation

In previous sections the robot was moving blocks around on the table. If your blocks get into a state you don’t want them, you may desire to reset the simulaton. There are two options for this:

You can restart MoveIt Pro from the command line, or
Run the MuJoCo Interactive Viewer to reset the simulation.

Performance Note:

Running the MuJoCo Interactive Viewer alongside MoveIt Pro can impact system performance, and may not be feasible for lower-powered systems.

To enable the MuJoCo Interactive Viewer:

Exit MoveIt Pro using CTRL-C at the command line
Navigate through the command line to the lab_sim robot config folder:
```
cd ~/moveit_pro/moveit_pro_example_ws/src/lab_sim/config/
```
Open config.yaml using your favorite editor / IDE. We recommend VS Code.
Search for the urdf_params tag
Find the line that says mujoco_viewer and flip the boolean to true.

hardware:
  robot_description:
    urdf:
      package: "lab_sim"
      path: "description/picknik_ur.xacro"
    srdf:
      package: "lab_sim"
      path: "config/moveit/picknik_ur.srdf"
    urdf_params:
      - mujoco_model: "description/scene.xml"
      - mujoco_viewer: true

Re-launch MoveIt Pro and the MuJoCo Interactive Viewer should launch alongside MoveIt Pro.

Within the viewer, you can move objects around manually with a "hand of god"-like functionality:

Double-click the object you want to move
To lift and move: CTRL+Right Mouse
To drag horizontally: CTRL+SHIFT+Right Mouse

HandOfGod

You can reset the simulation using the Reset button on the bottom left menu under the section "Simulation".

Reset

Finally, you can see useful debug information about the collision bodies by switching the "Geom Groups" under the section "Group enable":

Toggle off Geom 0-2
Toggle on Geom 3-5
Under the Rendering tab, enable Convex Hull

Now we can visualize the collision geometry, instead of the visual representation.

CollisionShapes

tip

Learn more about configuring the simulator at the MuJoCo configuration guide

Add a Breakpoint

To debug what is occurring in an Objective, insert a breakpoint using a BreakpointSubscriber.

Breakpoint Subscriber

Edit your Stack Blocks Objective, and add a BreakpointSubscriber in the middle, right after Pick from pose

Stack Blocks

Run the Objective, and you should see the robot pick the cube, then the Objective will wait at the breakpoint until the Resume button in the top right corner is pressed.

Resume

At this point, you can move the Visualization pane to get a better look at the scene, dig into the Blackboard parameter values such as the current value of grasp_pose, or other debugging needs.

Press Resume to finish running the Objective.

Summary

By completing this tutorial, you've gained hands-on experience with powerful tools in MoveIt Pro for perception-driven manipulation. You've learned how to integrate fiducial detection, point cloud processing, and modular Behavior Trees to create reusable, intelligent robotic Objectives. You also explored simulation reset tools and debugging techniques that support more robust development workflows.

🎉 Congratulations, we're now ready to move to the next tutorial!

Tutorial Overview​

Pre-reqs​

Start MoveIt Pro​

Perception Approaches In MoveIt Pro​

Fiducial Marker Detection with AprilTags​

Initialize the Robot and Scene​

Get Object Pose from AprilTag​

Pick from Fiducial Marker Detection​

Create a Subtree​

Port Remapping​

Place the Object​

Point Cloud Registration with Iterative Closest Point (ICP)​

Point Cloud Segmentation using ML​

Resetting & Manipulating the Simulation​

Add a Breakpoint​

Summary​

Tutorial Overview

Pre-reqs

Start MoveIt Pro

Perception Approaches In MoveIt Pro

Fiducial Marker Detection with AprilTags

Initialize the Robot and Scene

Get Object Pose from AprilTag

Pick from Fiducial Marker Detection

Create a Subtree

Port Remapping

Place the Object

Point Cloud Registration with Iterative Closest Point (ICP)

Point Cloud Segmentation using ML

Resetting & Manipulating the Simulation

Add a Breakpoint

Summary