Tutorial 3 - Perception & Machine Learning
Duration: 1-2 hours
Level: Intermediate
Tutorial Overview
This tutorial dives into the advanced perception and machine learning capabilities of MoveIt Pro. You'll learn how to use computer vision tools like AprilTags, point cloud registration, and segmentation to enable your robot to perceive its environment. You'll also explore how to structure complex task plans using Behavior Trees. Whether you're stacking blocks, identifying parts, or debugging planning failures, this tutorial will guide you through practical workflows and best practices.
Prerequisites
You should have already installed MoveIt Pro. We will assume you have already completed Tutorial 1 and have basic familiarity with the software.
Start MoveIt Pro
Launch the application using:
moveit_pro run -c lab_sim
Perception Approaches in MoveIt Pro
In highly structured environments, you might be able to program your robot arm to do basic pick-and-place without any perception. For most robotics applications today, however, cameras and computer vision are crucial.
In our world view of robotics, there are roughly four main categories of perception:
- Fiducial Marker Detection (e.g. AprilTags)
- Classic computer vision (e.g. OpenCV)
- Point Cloud Segmentation (e.g. PCL)
- Machine Learning (e.g. SAM2)
MoveIt Pro no longer ships with examples of classic computer vision, due to the rapidly evolving AI landscape. However, we do ship many examples of the other three approaches to perception. We will demonstrate them in the following sections:
Fiducial Marker Detection with AprilTags
In this exercise we will build a new application that will use AprilTags to detect and stack blocks.
AprilTags are a fiducial marker system commonly used in robotics and computer vision. They are black-and-white square patterns that a camera can easily detect and uniquely identify, and they are often used for precise localization and pose estimation of objects in the environment. AprilTags are particularly useful in applications that require low-cost, reliable visual tracking without complex sensors.
Initialize the Robot and Scene
First, create a new Objective called Stack Blocks and choose or create the category Tutorials. If you're unsure how to create a new Objective, refer to Tutorial 1.
Once created, remember to delete the AlwaysSuccess Behavior from your new empty Objective.
To make this tutorial quick and easy, we will use a lot of pre-built Subtrees that ship with MoveIt Pro. Add the following four Subtrees to the default Sequence:
- Open Gripper
- Look at Table
- Clear Snapshot
- Take Wrist Camera Snapshot
When finished, Run the Stack Blocks Objective. You should see the robot reset its end effector, move to its home position, clear out any previous depth sensor readings, then take a fresh depth sensor reading using its wrist camera (not the scene camera).
Get Object Pose from AprilTag
The next step in the Stack Blocks Objective is to detect the pose of a block by using the block's AprilTag to locate it.
1. Edit the Stack Blocks Objective that you began above.
2. Using the blue plus button, add a Sequence node to help us organize our Behavior Tree as it grows.

Using the blue plus button, add the following Subtrees/Behaviors to the newly added Sequence, tweaking the parameters of some of the Behaviors as described below:

Use the dropdown selectors in the Behavior input ports to make building your Behavior Tree faster.

1. Take Wrist Camera Image
2. Get AprilTag Pose from Image
   - Set the camera_info to {wrist_camera_info}.
   - Set the image to {wrist_camera_image}.
3. TransformPose, to align the detection pose with the end effector grasp frame.
   - Set the input_pose to {detection_pose}.
   - Set the output_pose to {grasp_pose}.
   - Set the quaternion_xyzw to 1;0;0;0 to rotate the pose 180 degrees about the X axis.
     - You can use the Orientation Converter by clicking the Edit icon next to the input.
   - Set the translation_xyz to 0;0;0 or leave it empty.
4. VisualizePose
   - Set the pose to {grasp_pose}.

TransformPose has a built-in visualize_pose port you can set to true to publish a coordinate-frame marker for the output pose directly in the 3D view. This lets you skip the separate VisualizePose Behavior when debugging transforms. You can also set marker_text, marker_size, and marker_lifetime to customize the marker.
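To build intuition for that quaternion_xyzw value, here is a small standalone Python sketch (illustrative only, not MoveIt Pro code) showing that the xyzw quaternion 1;0;0;0 really is a 180-degree rotation about the X axis: it flips the Y and Z axes while leaving X unchanged.

```python
def quat_rotate(q_xyzw, v):
    """Rotate a 3D vector by a unit quaternion given in (x, y, z, w) order."""
    x, y, z, w = q_xyzw
    # Expansion of q * v * q_conjugate: t = 2 * cross(q_vec, v)
    tx = 2.0 * (y * v[2] - z * v[1])
    ty = 2.0 * (z * v[0] - x * v[2])
    tz = 2.0 * (x * v[1] - y * v[0])
    # v' = v + w * t + cross(q_vec, t)
    return (
        v[0] + w * tx + (y * tz - z * ty),
        v[1] + w * ty + (z * tx - x * tz),
        v[2] + w * tz + (x * ty - y * tx),
    )

flip_about_x = (1.0, 0.0, 0.0, 0.0)  # xyzw, as entered in the quaternion_xyzw port
print(quat_rotate(flip_about_x, (0.0, 0.0, 1.0)))  # (0.0, 0.0, -1.0): Z flips
```

This is why the detected tag pose (Z axis pointing up out of the block) becomes a usable grasp pose (Z axis pointing down, matching the gripper's approach direction).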
Your tree should look similar to this now (zoomed in):
If you want to know more about how each Behavior works, you can see the built-in descriptions in two locations. Either:
- When you hover over a Behavior in the left sidebar
- When you click on a Behavior and the parameter sidebar opens on the right. The right sidebar is also useful in that it shows you all the input and output ports, along with their port descriptions.
Now Run your Stack Blocks Objective, and you should see a 3-axis colored pose marker appear on the detected block:
Your simple AprilTag perception approach is in place. Great job so far!
Pick from Fiducial Marker Detection
Next, to pick the block we detected with the Fiducial Marker Detection:
1. Press the Edit button in your Stack Blocks Objective to begin editing again.
2. Add the Pick from Pose Subtree to the bottom of the root Sequence (not the child Sequence).
3. Run the Objective again.
You can't really see whether the grasp succeeded in the 3D Visualization pane, since we are not leveraging attached collision objects to add grasp information to the planning scene in this example. However, if you look at the /wrist_camera/color view pane, you should see that the block is between the robot's two fingers:
Some folks have asked us if the above image could be higher quality, and the answer is yes! We've purposely turned down the default rendering quality to keep MoveIt Pro performant on the widest range of customer computers; for example, we don't require a GPU for many of our applications. Raising the rendering quality is beyond the scope of this tutorial, but know that it can certainly be increased if your computer can handle it.
Create a Subtree
At this point our Behavior Tree has become more complex, so let's convert the previous Sequence into a Subtree.
As your application gets more complex, we recommend you use sequences and Subtrees to manage the complexity with nicely labeled abstractions.
Subtrees were introduced in Tutorial 1. For a refresher, see the About Subtrees section.
Edit the Objective, then:
1. Click on the child Sequence (not the parent Sequence).
2. Click on the Create Subtree icon that appears on top of the node.
3. Name the Subtree Get Object Pose.
4. Set the category again to Tutorials.
5. Keep the Subtree-only Objective checkbox checked.
6. Click Create.
The Subtree-only Objective checkbox means that this Objective can only be run as a Subtree within another Objective, not as a stand-alone Objective.
After converting to a Subtree, the Stack Blocks Objective should look like:
Run the Objective to see if it works. Did we miss anything?
You should see an error message that looks like this:
This is expected because we forgot to set up port remapping for our new Subtree. A great segue to our next lesson.
Port Remapping
Port remapping in Behavior Trees allows you to reuse generic nodes by dynamically assigning their input/output ports to different blackboard entries, depending on the context in which they're used. This makes Behavior Trees more flexible and modular, enabling the same node or Subtree to be used in different contexts across various parts of the tree without changing its internal logic.
1. Go into edit mode of the Get Object Pose Subtree.
   - We recommend doing this by first editing the Stack Blocks Objective and then clicking the pencil icon on the Get Object Pose Subtree. You can also search for the Subtree directly in the left Objectives sidebar, but then you won't be able to switch between the parent and child Behavior Trees as easily.
2. Choose the root node (also called Get Object Pose).
3. In the popup sidebar, click the + button to add an in/out port, to allow sharing of the detected AprilTag pose.
4. Set the Name of the port to grasp_pose.
5. Set the Default Value of the port to {grasp_pose}.
6. Optionally, set the Type to geometry_msgs::msg::PoseStamped_<std::allocator<void> > so that {grasp_pose} shows up in other geometry_msgs::PoseStamped input port dropdown menus. To get the correct type, click into a Behavior in the Editor that outputs {grasp_pose} and click the blue copy button next to the port type above the port value.
7. Optionally, add a description.
It should look something like this:
The Port Name is the Blackboard variable name that will be passed to and from the inner contents of this Subtree.
For example, here we set it to grasp_pose (no brackets) so that it outputs the blackboard variable {grasp_pose} from its interior TransformPose Behavior.

The Default Value is the text that will initially populate this port's value whenever the Subtree is added to an Objective.
You can always change this default after adding the Subtree.
For example, the port's value does not need to match the port name, and can be changed in the parent Objective.
The following screenshot demonstrates that the port value and port name can be different, and this is still valid.

As long as the port name matches the desired blackboard variable name created by a Behavior within the Subtree, the output variable will be mapped correctly.
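The name-versus-value distinction can be sketched in a few lines of plain Python. This is purely illustrative (MoveIt Pro's real in/out ports share memory between blackboards rather than copying, and its Behavior Trees are not written in Python); the function and variable names here are hypothetical.

```python
def run_subtree(parent_bb, port_map, subtree_body):
    """Run a subtree body with remapped ports.
    port_map: {subtree_port_name: parent_blackboard_key}."""
    child_bb = {}
    # Copy remapped entries into the Subtree's blackboard...
    for port_name, parent_key in port_map.items():
        if parent_key in parent_bb:
            child_bb[port_name] = parent_bb[parent_key]
    subtree_body(child_bb)
    # ...and write them back out (in/out ports flow both ways)
    for port_name, parent_key in port_map.items():
        if port_name in child_bb:
            parent_bb[parent_key] = child_bb[port_name]

def get_object_pose(bb):
    # Stand-in for the interior TransformPose Behavior writing {grasp_pose}
    bb["grasp_pose"] = (0.4, 0.0, 0.2)

parent = {}
# Port name "grasp_pose" in the Subtree, port value "{target_pose}" in the parent
run_subtree(parent, {"grasp_pose": "target_pose"}, get_object_pose)
print(parent["target_pose"])  # (0.4, 0.0, 0.2)
```

The Subtree only ever sees the key grasp_pose; the parent Objective decides which of its own blackboard entries that key is bound to.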
Click out of the sidebar, somewhere in the Behavior Tree editor area, to close the sidebar.
Now go back to editing the parent Stack Blocks Objective by using the back arrow button located to the left of the Objective name at the top of the screen.
Another way to go back to editing the parent Objective is to use your browser back button.
The Stack Blocks Objective should now look like this:
Notice the In/Out port icon next to grasp_pose. All ports that are remapped into a Subtree are both in and out ports, because they are shared memory pointers between the parent Objective's and the Subtree's blackboards.
Place the Object
Finally, let's put the finishing touches on our Stack Blocks Objective by adding these existing, pre-built Subtrees:
1. Add Look at Table
2. Add Place Object
3. Add Open Gripper
Place Object is an example Subtree that places whatever is in the robot's end effector at a pre-defined waypoint. Adding the Look at Table Subtree before placing the object is needed so that the planner approaches the placement point from above.
Your fully finished Stack Blocks Objective should look like this:
Run the Objective and you should see the robot pick up a block, move to the Look at Table waypoint, then plan a placement trajectory and ask for approval:
The approval step is optional but can be nice for visualizing what the robot is about to do. MoveIt Pro provides various built-in Behaviors for user interaction for applications that require human in the loop. In our example, your Place Object Subtree includes this capability by using the Wait for Trajectory Approval if User Available Subtree, which checks if there is a UI attached, and if so asks the user to approve the trajectory.
Once you approve the trajectory, the robot should stack the block on top of the other:
You won't see the blocks being stacked in the "3D Visualization" pane without updating the planning scene, but they can be seen live in the simulated camera feeds, for example under /scene_camera/color and /wrist_camera/color.
Adding Teleop Recovery to your Objective
In real-world environments, automatic motion planning doesn't always succeed: an unexpected obstacle, imprecise perception, or a cluttered workspace can cause a placement to fail. Rather than having the Objective simply stop, MoveIt Pro lets you add human-in-the-loop recovery so an operator can step in and manually guide the robot through the difficult part.
A Fallback node (also known as a Selector) tries each of its children in order. If the first child succeeds, it stops. If the first child fails, it moves on to the next child as a recovery strategy.
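These tick semantics can be sketched in a few lines of Python. This is a toy illustration of the Fallback concept, not MoveIt Pro's implementation, and the two child functions are hypothetical stand-ins for the Behaviors we are about to add:

```python
def fallback(children):
    """Fallback/Selector: tick children left to right."""
    for child in children:
        if child():       # child returned SUCCESS
            return True   # first SUCCESS stops the Fallback
    return False          # every child failed

log = []

def place_object():
    log.append("place")
    return False  # simulate a planning failure

def request_teleoperation():
    log.append("teleop")
    return True   # the operator completes the placement

result = fallback([place_object, request_teleoperation])
print(result, log)  # True ['place', 'teleop']
```

Note that the recovery child is only ticked because the first child failed; on a successful placement, Request Teleoperation would never run.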
Let's add teleop recovery to the placement step in our Stack Blocks Objective:
1. Open your Stack Blocks Objective in edit mode.
2. Using the blue plus button, add a Fallback Behavior and connect it to the Sequence node, placing it just above the Place Object Subtree.
3. Delete the line connecting Place Object to the parent Sequence node by clicking on the line and pressing Delete.
4. Drag a new line from Place Object to the new Fallback node, making it the first child.
5. Add the Request Teleoperation Behavior as the second child of the Fallback node. This is the recovery Behavior that will activate if Place Object fails.
6. Click on the Request Teleoperation Behavior and modify its parameters:
   - Set enable_user_interaction to true
   - Set user_interaction_prompt to Place the object manually
Now when the placement motion fails, instead of the Objective stopping, a Teleoperation menu will appear allowing you to manually guide the robot to a valid placement location:
Once you've teleoperated the robot to the desired location, click the Continue button to resume the Objective. It will then open the gripper and complete the remaining steps automatically.
This pattern of wrapping an action in a Fallback with Request Teleoperation as the recovery child is a powerful design pattern you can reuse throughout your Objectives. It gives autonomous execution a chance to succeed first, but seamlessly falls back to human guidance when needed.
Adding Comments to your Behavior Tree
As your Behavior Trees grow more complex, it becomes important to document what different sections do. MoveIt Pro provides a Comment Behavior that lets you add notes directly in the tree, similar to code comments.
Using the blue plus button, add a Comment Behavior above the Fallback node we just created. Click on it and set the text parameter to:
Teleop recovery: If Place Object fails due to obstacles or planning errors,
fall back to manual teleoperation so the operator can guide the placement.
The Comment Behavior has no effect on execution; it always returns SUCCESS immediately. It exists purely for documentation purposes, making your Objectives easier to understand for yourself and your teammates.
Get in the habit of adding Comment nodes to explain non-obvious logic in your Behavior Trees, especially around Fallback and Parallel patterns where the intent may not be immediately clear.
Now let's learn how to implement some more advanced perception to identify the block.
Point Cloud Registration with Iterative Closest Point (ICP)
There are many other perception capabilities within MoveIt Pro beyond AprilTags, and in this section, we'll learn about point cloud registration.
Point cloud registration is the process of localizing an object within a point cloud, given a CAD mesh file as an input. This is used in robotics for locating a part within a workspace, or as an input to manipulation flows like polishing and grinding parts.
Typically, point cloud registration starts with an initial guess pose, which might be from an ML perception model, or based on where an object should be by the design of the robot workspace. This initial guess pose should be close to the object being registered, but does not have to be exact. The registration process then will find the exact pose using one of several algorithms, such as Iterative Closest Point (ICP).
Iterative Closest Point (ICP) is a foundational algorithm in robotics used to align 3D point clouds by estimating the best-fit rigid transformation between two sets of data. In robotic applications, ICP plays a critical role in tasks like localization, mapping, object tracking, and sensor fusion by helping a robot match its current sensor data to a known map or model. The algorithm works by iteratively refining the alignment based on minimizing the distance between corresponding points. While powerful, ICP requires a reasonable initial guess to avoid converging to an incorrect local minimum and is most effective when there is significant overlap between point clouds.
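The alignment step at the heart of each ICP iteration can be sketched in a few lines of NumPy. This is an illustrative toy, not the RegisterPointClouds implementation, and it assumes the point correspondences are already known; real ICP re-estimates correspondences with a nearest-neighbor search on every iteration and repeats until convergence.

```python
import numpy as np

def best_fit_transform(source, target):
    """Best-fit rigid transform (Kabsch/SVD) mapping source onto target,
    assuming source[i] corresponds to target[i]."""
    mu_s, mu_t = source.mean(axis=0), target.mean(axis=0)
    # Cross-covariance of the centered point sets
    H = (source - mu_s).T @ (target - mu_t)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:   # guard against a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_t - R @ mu_s
    return R, t

# Toy example: the "measured" cloud is the CAD cloud shifted 10 cm in X
rng = np.random.default_rng(42)
cad_points = rng.random((50, 3))
measured = cad_points + np.array([0.1, 0.0, 0.0])
R, t = best_fit_transform(cad_points, measured)
print(np.round(t, 3))  # recovers the translation, approximately [0.1, 0, 0]
```

The initial guess matters because the nearest-neighbor matching (omitted here) only finds the right correspondences when the two clouds start out roughly aligned; a bad guess converges to a wrong local minimum.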
In MoveIt Pro, the RegisterPointClouds Behavior provides this capability:
We'll explore an example application:
First, restart MoveIt Pro from the command line to reset the stacked block to its original position. Then edit the Register CAD Part Objective so that we can skim its architecture:
You'll see the following overall flow:
1. Move the camera on the end effector to look at the area of interest, capture a point cloud, and add it to the visualization.
2. Create an initial guess pose.
3. Load a cube mesh as a point cloud and visualize it in red at our initial guess.
4. Register the cube in the wrist camera point cloud using ICP.
5. Load a cube mesh as a point cloud and visualize it in green at the registered pose.
Next, Run the Objective, and you should see two point clouds appear, first a red one above the table (the initial guess), then a green one that matches the closest cube to the initial guess.
Next we will modify the guess pose.
1. Edit the Objective.
2. Change the x, y, and z values in the CreateStampedPose Behavior to -0.2; 0.75; 0.6.
3. Run the Objective again and notice how the new guess registers a different cube.
Advanced: As an additional hands-on exercise, you can replace the Get Object Pose Subtree in the Stack Blocks Objective with this Register CAD Part Objective.
Hint: You will need to add a port to the Register CAD Part Subtree. You will also need to use the TransformPoseWithPose inside Register CAD Part to get the target pose in the world frame.
Finally, you will need to use TransformPose to adjust the pose to match the gripper frame.
You can refer to the Stack Blocks with ICP Objective to see how to modify Register CAD Part for use with Stack Blocks.
Point Cloud Segmentation using ML
Point Cloud Segmentation with Machine Learning (ML) refers to the process of automatically dividing a 3D point cloud into meaningful regions or object parts based on learned patterns. Instead of relying solely on hand-tuned geometric rules (like plane fitting or clustering), ML-based segmentation trains models to recognize complex structures and variations directly from data. These models can classify points individually (semantic segmentation) or group them into distinct object instances (instance segmentation).
In this section, we will explore the example ML Segment Point Cloud from Clicked Point Objective, which demonstrates using ML for perception. It uses the GetMasks2DFromPointQuery Behavior, which calls the Segment Anything Model 2 (SAM2) to find objects of interest.
The Objective:
1. Prompts the user to click on three objects in the color wrist camera image. The number three is arbitrary; it could be one or more.
2. Creates a 2D mask of each object using SAM2.
   What are masks? A mask is a 2D binary image in which each pixel indicates whether it belongs to the segmented object (foreground) or the background, typically generated by models like SAM for isolating objects.
3. Converts the 2D mask to a 3D mask, mapping the mask into the 3D point cloud.
4. Applies the 3D mask to the point cloud, removing everything except the chosen object(s).
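The last two steps above (mapping a 2D mask into the point cloud and applying it) are easy to picture for an organized point cloud, where there is one 3D point per image pixel. This NumPy toy is purely illustrative, with made-up dimensions and values:

```python
import numpy as np

h, w = 4, 6
cloud = np.zeros((h, w, 3))        # stand-in for an organized wrist camera cloud
cloud[..., 2] = 1.0                # pretend every point is 1 m from the camera
mask = np.zeros((h, w), dtype=bool)
mask[1:3, 2:4] = True              # pretend SAM2 segmented this 2x2 pixel region

object_points = cloud[mask]        # keep only the 3D points behind mask pixels
print(object_points.shape)         # (4, 3): four masked pixels -> four 3D points
```

Everything outside the mask is discarded, leaving a point cloud containing only the chosen object.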
Run this yourself:
1. Run the ML Segment Point Cloud from Clicked Point Objective.
2. Ensure the /wrist_camera/color view pane is visible.
3. It will prompt you to click three points on an object of interest.
4. It will then segment out the point cloud for that object and visualize it. Note that results may vary depending on the points selected.
Advanced: For another hands-on exercise, instead of using fiducial marker perception or point cloud segmentation to get the pose of the block in your Stack Blocks Objective, try using ML segmentation.
Hint: consider the GetPointCloudFromMask3D and GetCentroidFromPointCloud Behaviors.
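The last step of that hinted pipeline, reducing the segmented object's points to a single grasp position, is just an average. This pure-Python sketch is illustrative of what a GetCentroidFromPointCloud-style Behavior computes; the points are made up:

```python
def centroid(points):
    """Mean of a list of (x, y, z) points: a simple grasp-position estimate."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

block_points = [(0.1, 0.2, 0.05), (0.3, 0.2, 0.05), (0.2, 0.2, 0.15)]
print(centroid(block_points))  # roughly (0.2, 0.2, 0.083)
```

For a roughly symmetric object like a block, the centroid of its segmented points is a reasonable grasp target; you would still need to choose an orientation, as in the AprilTag exercise.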
In this environment, we have modeled the beaker the way laser sensors would perceive it: the simulated camera will see the beaker, but the simulated depth sensor will not.
Resetting Your Robot Simulation
As you learned in Tutorial 1, you can reset the simulation from the MoveIt Pro settings menu.
For more advanced use cases, there are additional ways to reset the simulation:
- Programmatically: Use the ResetMujocoKeyframe Behavior, such as by running the Reset MuJoCo Sim Objective, to reset the scene to the "default" keyframe.
- Third-party UI: Run the MuJoCo Interactive Viewer to reset the simulation (details in the next section).
- Command line: Restart MoveIt Pro to completely reset the simulation scene and robot state.

To enable MuJoCo Behaviors and Objectives in your config:
- Add MujocoBehaviorsLoader to the core behavior_loader_plugins
- Add mujoco_objectives to the objective_library_paths
See the lab_sim config.yaml for an example.
Interactive Manipulation of Sim
You can "reach in" and manually manipulate objects in the simulation using the MuJoCo Interactive Viewer. This is a graphical tool for visualizing, debugging, and interacting with MuJoCo physics simulations in real time.
Running the MuJoCo Interactive Viewer alongside MoveIt Pro can impact system performance, and may not be feasible for lower-powered systems.
The MuJoCo Interactive Viewer is only supported when MoveIt Pro is installed locally, not for MoveIt Pro Cloud, due to the need to access your terminal.
To enable the MuJoCo Interactive Viewer:
1. Exit MoveIt Pro using CTRL-C at the command line.
2. Navigate to the lab_sim robot config folder: cd ~/moveit_pro/moveit_pro_example_ws/src/lab_sim/config/
3. Open config.yaml using your favorite editor/IDE. We recommend VS Code.
4. Search for the urdf_params tag.
5. Find the line that says mujoco_viewer and flip the boolean to true.
hardware:
  robot_description:
    urdf:
      package: "lab_sim"
      path: "description/picknik_ur.xacro"
    srdf:
      package: "lab_sim"
      path: "config/moveit/picknik_ur.srdf"
    urdf_params:
      - mujoco_model: "description/scene.xml"
      - mujoco_viewer: true
- Re-launch MoveIt Pro, and the MuJoCo Interactive Viewer should start alongside it.
Within the viewer, you can move objects around manually with a "hand of god"-like functionality:
- Double-click the object you want to move
- To lift and move: CTRL+Right Mouse
- To drag horizontally: CTRL+SHIFT+Right Mouse
You can reset the simulation within the Interactive Viewer using the Reset button on the bottom left menu under the section "Simulation".
Finally, you can see useful debug information about the collision bodies by switching the "Geom Groups" under the section "Group enable":
- Toggle off Geom 0-2
- Toggle on Geom 3-5
- Under the Rendering tab, enable Convex Hull
Once you've toggled those settings, you can visualize the collision geometry instead of the visual representation.
Learn more about configuring the simulator in the MuJoCo configuration guide.
Looping an Objective Until Failure
So far, our Objectives run once and stop. But in many real-world applications, such as picking items off a conveyor belt or sorting parts in a bin, you want the robot to repeat the same task continuously until something goes wrong (e.g., no more objects to pick).
Behavior Trees make this easy with the KeepRunningUntilFailure decorator node. This node repeatedly ticks its child. As long as the child returns SUCCESS, it re-runs it. When the child returns FAILURE, the loop stops.
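Those semantics can be sketched in plain Python. This is a conceptual toy, not MoveIt Pro's implementation, and the pick-and-place function is a hypothetical stand-in for the whole Sequence:

```python
def keep_running_until_failure(tick_child):
    """Re-tick the child while it returns SUCCESS; stop at the first FAILURE."""
    successes = 0
    while tick_child():
        successes += 1
    return successes

blocks = ["a", "b", "c"]  # pretend three blocks remain on the table

def pick_and_place():
    if not blocks:
        return False  # no block detected -> FAILURE ends the loop
    blocks.pop()      # pick a block and stack it
    return True       # SUCCESS -> the decorator ticks the child again

completed = keep_running_until_failure(pick_and_place)
print(completed)  # 3 cycles succeeded before the Objective stopped
```

The loop terminates naturally: once the perception step finds no block, the Sequence fails and the decorator stops re-ticking it.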
Let's modify our Stack Blocks Objective to loop:
1. Open your Stack Blocks Objective in edit mode.
2. Using the blue plus button, add a KeepRunningUntilFailure Behavior above the main Sequence node.
3. Reparent the Sequence node so it becomes the child of KeepRunningUntilFailure.
Your tree structure should now look like:
KeepRunningUntilFailure
└── Sequence
    └── (your existing Behaviors...)
- Run the Objective. The robot will now continuously detect blocks, pick them up, and place them, looping back to the beginning each time it succeeds.
- The Objective will stop automatically when any step in the Sequence fails (e.g., no more blocks are detected by the AprilTag perception).
MoveIt Pro also provides a Repeat decorator that runs its child a fixed number of times, which is useful when you know exactly how many iterations you need. You can find it using the blue plus button search.
This looping pattern combined with the teleop recovery Fallback from the previous section creates a robust autonomous system: the robot keeps working, and if it ever gets stuck, a human can step in to help before the loop continues.
Recording Training Data with ROS Bag
Now that we have a looping Objective that autonomously picks and places blocks, we can use it as an oracle policy (an automated demonstration generator) to collect training data for machine learning models such as Diffusion Policy.
An oracle policy is simply a scripted or planned Objective that performs the task correctly, as opposed to a human teleoperating the robot. The advantage is that you can collect large amounts of consistent training data without manual effort.
Recording a ROS Bag
To record the robot's joint states and camera feeds while the Objective runs, open a terminal inside the MoveIt Pro container and run:
ros2 bag record --max-cache-size 32000000000 --snapshot-mode -s mcap \
/robot_description \
/joint_states_synchronized \
/joint_commands_synchronized \
/demonstration_indicator \
/wrist_camera/color_synchronized \
/scene_camera/color_synchronized \
-o ~/rosbag_training_data
This records all the key topics needed for training an end-to-end model:
- Joint states and commands for learning the robot's motion
- Camera feeds (wrist and scene cameras) for visual perception
- Demonstration indicator for marking the start and end of each demonstration
With the ROS bag recording running, start the Move Flasks to Burners Objective. You are now recording training data for your end-to-end model as the robot performs the task autonomously.
While training a model from this data is beyond the scope of this tutorial, you could follow the full Diffusion Policy how-to guide to learn how to train and deploy a neural network policy from your recorded demonstrations.
LLM-Powered Behavior Tree Builder
MoveIt Pro includes an experimental LLM-powered Behavior Tree Builder that can help you create Objectives using natural language prompts. This feature leverages large language models to automatically generate Behavior Trees based on your descriptions.
The LLM Behavior Tree Builder is currently in beta. Generated Objectives may require testing and modification for your specific use case.
Opening the AI Assistant
1. To use the AI Assistant, first open the left Objectives sidebar if it's not already open, then either create a new Objective or open an existing one.
2. Once editing a Behavior Tree, look for the AI Assistant icon on the bottom left, alongside the other icons.
Setting Up API Keys
To use the AI Assistant, you'll need to provide your own API key for one of the supported LLM providers. Anthropic's Claude models have provided the best results internally and are selected by default.
You can quickly obtain an API key from one of these providers:
- Anthropic Console (recommended)
- OpenAI Platform
- Google AI Studio
1. Click on the settings gear icon to manage your API keys.
2. In the popup window, add your API key; it saves automatically.
API keys are stored locally in your browser for security. By entering an API key, you acknowledge that data from this application may be sent to the third-party AI service, and you agree to comply with its privacy policy and terms of use.
Using the LLM Feature
Once your API key is configured, you can start creating Objectives with natural language:
1. Open the AI Assistant as explained above.
2. Describe your Objective: use natural language to describe what you want the robot to do. Example prompt: "Create an Objective that picks up a block on the table"
3. The LLM will generate a Behavior Tree structure for your request; click the green "Accept Changes" button to apply it.
4. Test and refine: run the generated Objective and test it in your environment.
- Be specific: Provide clear, detailed descriptions of what you want the robot to accomplish.
- Mention context: Include information about your robot setup, environment, or constraints.
- Start simple: Begin with basic tasks and build complexity incrementally.
- Iterate: Refine your prompts based on the generated results.
Summary
By completing this tutorial, you've gained hands-on experience with powerful tools in MoveIt Pro for perception-driven manipulation. You've learned how to integrate fiducial detection, point cloud processing, and modular Behavior Trees to create reusable, intelligent robotic Objectives. You also explored teleop recovery, looping patterns, recording training data for ML, and the LLM-powered Behavior Tree Builder.
Congratulations, you're now ready to move on to the next tutorial!