Version: 9

Tutorial 3 - Perception & Machine Learning

🕒 Duration: 1-2 hours
💪 Level: Intermediate

Tutorial Overview

This tutorial dives into the advanced perception and machine learning capabilities of MoveIt Pro. You'll learn how to use computer vision tools like AprilTags, point cloud registration, and ML-based segmentation to enable your robot to perceive its environment. You'll also explore how to structure complex task plans using Behavior Trees. Whether you're picking medicine bottles, registering CAD parts, or debugging planning failures, this tutorial will guide you through practical workflows and best practices.

Pre-reqs

You should have already installed MoveIt Pro. We will assume you have already completed Tutorial 1 and have basic familiarity with the software.

Start MoveIt Pro

Launch the application using:

moveit_pro run -c lab_sim

The lab_sim robot configuration package contains a table with several medicine bottles, each with an AprilTag on its cap, plus a placement tray. You'll use these bottles throughout the tutorial.

Perception Approaches in MoveIt Pro

In highly structured environments, you might be able to program your robot arm to do basic pick and place without any perception. However, for most robotics applications today, cameras and computer vision are crucial.

In our view of robotics, there are roughly four main categories of perception:

  • Fiducial Marker Detection (e.g. AprilTags - fiducial markers)
  • Classic computer vision (e.g. OpenCV — Open Source Computer Vision Library)
  • Point Cloud Registration (e.g. ICP — Iterative Closest Point)
  • Machine Learning (e.g. SAM3 — Segment Anything Model 3)

MoveIt Pro no longer ships with examples of classic computer vision, due to the rapidly evolving AI landscape. However, we do ship many examples of the other three approaches. We will introduce each one individually and then put them all together in an Objective that fuses ICP with ML to pick up bottles in a loop.

Fiducial Marker Detection with AprilTags

In this exercise we will build a new Objective that uses AprilTags to detect a medicine bottle by the tag on its cap and pick it up.

info

AprilTags are a type of fiducial marker system commonly used in robotics and computer vision. They consist of black-and-white square patterns that can be easily detected and uniquely identified by a camera. In robotics, AprilTags are often used for precise localization and pose estimation of objects in the environment. AprilTags are particularly useful in applications requiring low-cost, reliable visual tracking without the need for complex sensors.

Initialize the Robot and Scene

First, create a new Objective. If you're unsure how to do this, refer to Tutorial 1.

  1. Name the Objective Pick One Bottle with AprilTag.
  2. Choose or create the category Tutorials.
  3. Set the Description to Use fiducial markers to detect and grasp a bottle.

Once created, remember to delete the AlwaysSuccess Behavior from your new empty Objective.

To make this tutorial quick and easy, we will use a lot of pre-built Subtrees that ship with MoveIt Pro. Add the following four Subtrees to the default Sequence using the blue + button:

  1. Open Gripper
  2. Look at Table
  3. Clear Snapshot
  4. Take Wrist Camera Snapshot

When finished, Run the Pick One Bottle with AprilTag Objective. You should see the robot open its gripper, move to a pose looking down at the table, clear out any previous depth sensor readings, then take a fresh depth sensor reading using its wrist camera (not the scene camera).

Get Object Pose from AprilTag

The next step for us in the Pick One Bottle with AprilTag Objective is to detect the pose of a bottle by reading the AprilTag on its cap.

  1. Edit the Pick One Bottle with AprilTag Objective that you began above.
  2. Add a Sequence node to help us organize our Behavior Tree as it grows.

Add the following four Subtrees/Behaviors to the newly added Sequence, tweaking the parameters to some of the Behaviors as described below:

  1. Take Wrist Camera Image

  2. Get AprilTag Pose from Image

    1. Set the camera_info to {wrist_camera_info}.

    2. Set the image to {wrist_camera_image}.

      Using Parameter Dropdown Selectors

      Use the dropdown selectors in the Behavior input ports to make building your Behavior Tree faster.

  3. TransformPose to align the detection pose with the end effector grasp frame.

    1. Set the input_pose to {detection_pose}.

    2. Set the output_pose to {grasp_pose}.

    3. Set the quaternion_xyzw to 1;0;0;0 to rotate the pose 180 degrees about the X axis so the gripper approaches the cap from above.

      Orientation Converter

      You can instead use the Orientation Converter if you are not as comfortable with Quaternions. Click the Edit icon next to the quaternion_xyzw parameter input and set the Roll to 180.

    4. Set the translation_xyz to 0;0;0 or leave it empty.

  4. VisualizePose

    1. Set the pose to {grasp_pose}.
Inline visualization

TransformPose has a built-in visualize_pose port you can set to true to publish a coordinate-frame marker for the output pose directly in the 3D view. This lets you skip the separate VisualizePose Behavior when debugging transforms. You can also set marker_text, marker_size, and marker_lifetime to customize the marker.
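Why does 1;0;0;0 mean "flip 180 degrees about X"? A unit quaternion (x, y, z, w) for a rotation of angle θ about a unit axis is (axis · sin(θ/2), cos(θ/2)), so θ = 180° about X gives (1, 0, 0, 0). The standalone sketch below uses only the Python standard library; its helper names are hypothetical and are not part of MoveIt Pro. It builds that quaternion and rotates the pose's local +Z axis with it, assuming (as is common) that the tag frame's +Z points out of the cap and the gripper approaches along its own +Z.

```python
import math


def quat_from_axis_angle(axis, angle_deg):
    """Unit quaternion (x, y, z, w) for a rotation of angle_deg about a unit axis."""
    half = math.radians(angle_deg) / 2.0
    s = math.sin(half)
    return (axis[0] * s, axis[1] * s, axis[2] * s, math.cos(half))


def quat_multiply(a, b):
    """Hamilton product a*b, with both quaternions stored as (x, y, z, w)."""
    ax, ay, az, aw = a
    bx, by, bz, bw = b
    return (
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
        aw * bw - ax * bx - ay * by - az * bz,
    )


def rotate_vector(q, v):
    """Rotate a 3D vector v by unit quaternion q using q * v * conj(q)."""
    q_conj = (-q[0], -q[1], -q[2], q[3])
    x, y, z, _ = quat_multiply(quat_multiply(q, (v[0], v[1], v[2], 0.0)), q_conj)
    return (x, y, z)


# Roll = 180 degrees, the same rotation the Orientation Converter describes.
flip_x = quat_from_axis_angle((1.0, 0.0, 0.0), 180.0)
# Where the pose's local +Z axis points after the flip.
flipped_z = rotate_vector(flip_x, (0.0, 0.0, 1.0))
```

flip_x comes out as approximately (1, 0, 0, 0), matching the value entered above, and flipped_z is approximately (0, 0, -1): the local Z axis now points the opposite way while the X axis is unchanged.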

Your tree should look similar to this now (zoomed in):

tip

If you want to know more about how each Behavior works, you can find its built-in description in two places:

  • When you hover over a Behavior in the left sidebar
  • When you click on a Behavior and the parameter sidebar opens on the right. The right sidebar is also useful in that it shows you all the input and output ports, along with their port descriptions.

Now Run your Pick One Bottle with AprilTag Objective, and you should see a 3-axis colored pose marker appear on the cap of the detected bottle:

Your simple AprilTag perception approach is in place. Great job so far!

Pick from Fiducial Marker Detection

Next, to pick the bottle we detected with the Fiducial Marker Detection:

  • Press the Edit button in your Pick One Bottle with AprilTag Objective to begin editing again.
  • Add the Pick from Pose Subtree to the bottom of the root Sequence (not the child Sequence).
  • Run the Objective again.

You can't easily confirm the grasp in the 3D Visualization pane, because this example doesn't use attached collision objects to add the grasped bottle to the planning scene. However, the /wrist_camera/color view pane should show the bottle between the robot's two fingers:

Can the camera feeds be higher resolution?

Yes — the defaults are tuned for performance on machines without a lot of available CPU. See Adjusting the Simulated Camera Resolution to bump it up.

Create a Subtree

At this point our Behavior Tree is becoming more complex, so let's convert the previous Sequence into a Subtree.

tip

As your application gets more complex, we recommend using Sequences and Subtrees to manage the complexity with clearly labeled abstractions.

info

Subtrees were introduced in Tutorial 1. For a refresher, see the About Subtrees section.

Edit the Objective, then:

  1. Click on the child Sequence (not the parent Sequence)

  2. Click on the Create Subtree icon that appears on top of the node

  3. Name the Subtree Get Bottle Pose from AprilTag

  4. Set the category again to Tutorials

  5. Set the description to Get a detected pose from the AprilTag

  6. Keep the Subtree-only Objective checkbox checked

  7. Click Create

info

The Subtree-only Objective checkbox means that this Objective can only be run as a Subtree within another Objective, not as a stand-alone Objective.

After converting to a Subtree, the Pick One Bottle with AprilTag Objective should look like:

Run the Objective to see if it works. Did we miss anything?

You should see an error message that looks like this:

This is expected, because we forgot to set up port remapping for our new Subtree, which is a great segue to our next lesson.

Port Remapping

info

Port remapping in Behavior Trees allows you to reuse generic nodes by dynamically assigning their input/output ports to different blackboard entries, depending on the context in which they're used. This makes Behavior Trees more flexible and modular, enabling the same node or Subtree to be used in different contexts across various parts of the tree without changing its internal logic.

  1. Go into edit mode of the Get Bottle Pose from AprilTag Subtree

    1. We recommend doing this by first editing the Pick One Bottle with AprilTag Objective, then clicking the pencil icon on the Get Bottle Pose from AprilTag Subtree. You can also search for the Subtree directly in the left Objectives sidebar, but then you won't be able to switch between the parent and child Behavior Trees as easily.
  2. Choose the root node (also called Get Bottle Pose from AprilTag).

  3. In the popup sidebar, click the + button to add an in/out port so the detected AprilTag pose can be shared with the parent Objective.

  4. Set the Name of the port to grasp_pose

  5. Set the Default Value of the port to {grasp_pose}

  6. Optionally, set the Type to geometry_msgs::msg::PoseStamped_<std::allocator<void> >.

    Why set the Type?

    Setting the Type makes {grasp_pose} show up in other geometry_msgs::PoseStamped input port dropdown menus, which is more user friendly. To get the correct type string, click into a Behavior in the Editor that outputs {grasp_pose} and click the blue copy button next to the port type above the port value.

  7. Optionally add a description, e.g. Target pose of the graspable object.

It should look something like this:

Click out of the sidebar, somewhere in the Behavior Tree editor area, to close the sidebar.

info

The Port Name is the blackboard variable name that will be passed to and from the inner contents of this Subtree. For example, here we set it to grasp_pose (no brackets) so that it outputs the blackboard variable {grasp_pose} from its interior TransformPose Behavior.

The Default Value is the text that will initially populate this port's value whenever the Subtree is added to an Objective. You can always change this default after adding the Subtree. For example, the port's value does not need to match the port name, and can be changed in the parent Objective.
The following screenshot demonstrates that the port value and port name can be different, and this is still valid.

As long as the port name matches the desired blackboard variable name created by a Behavior within the Subtree, the output variable will be mapped correctly.

Now go back to editing the parent Pick One Bottle with AprilTag Objective by using the back button at the top of the MoveIt Pro window, in the center.

tip

Another way to go back to editing the parent Objective is to use your browser back button.

The Pick One Bottle with AprilTag Objective should now look like this:

info

Notice the In/Out port icon next to grasp_pose. All ports that are remapped into a Subtree are always both in and out ports, because they are shared memory pointers between the parent Objective's blackboard and the Subtree's blackboard.
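To make the remapping idea concrete, here is a toy Python model of it. It is a hypothetical sketch, not how MoveIt Pro or any Behavior Tree library actually stores its blackboards: the Blackboard class and its remapping argument are invented for illustration. The Subtree reads and writes its own port name (grasp_pose), and the remapping forwards that key to whichever parent entry was chosen as the port value, so both sides see the same data while unmapped keys stay private to the Subtree.

```python
class Blackboard:
    """Toy blackboard: a dict plus optional port remapping onto a parent board."""

    def __init__(self, parent=None, remapping=None):
        self._data = {}
        self._parent = parent
        # Maps a port name on this board to an entry name on the parent board.
        self._remapping = dict(remapping or {})

    def _resolve(self, key):
        """Return the (board, key) pair that actually stores this entry."""
        if self._parent is not None and key in self._remapping:
            return self._parent._resolve(self._remapping[key])
        return self, key

    def set(self, key, value):
        board, real_key = self._resolve(key)
        board._data[real_key] = value

    def get(self, key):
        board, real_key = self._resolve(key)
        return board._data[real_key]

    def has(self, key):
        board, real_key = self._resolve(key)
        return real_key in board._data


# The Subtree's grasp_pose port is wired to a parent entry with a different name,
# just as the port value and port name may differ in the parent Objective.
parent = Blackboard()
subtree = Blackboard(parent, remapping={"grasp_pose": "bottle_grasp"})

# Inside the Subtree, the Behaviors write both entries.
subtree.set("detection_pose", "pose of tag")
subtree.set("grasp_pose", "flipped pose of tag")
```

After these writes, parent.get("bottle_grasp") returns the Subtree's grasp pose, while detection_pose never reaches the parent. A write from the parent to bottle_grasp is likewise visible as grasp_pose inside the Subtree, which is the in/out behavior described above.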

Place the Object

Finally, let's put the finishing touches on our Pick One Bottle with AprilTag Objective. We'll move the bottle to the tray in two stages — first to a hover pose above the tray, then down to the tray surface — using two Move to Waypoint Subtrees. Both consult the planning scene during planning, so any obstacles you've added (we'll add one in the next section) are respected automatically.

Before adding the Behaviors, you need two saved waypoints in the UI:

  • Above Tray Place Simple — the robot hovering above the tray, gripper pointing down.
  • Tray Place Simple — the robot at the tray surface, gripper pointing down, ready to release.

Use the Pose and Waypoints features of Teleoperation to move the arm into each of these two positions and save them as waypoints. Do this now, before proceeding.

Now let's go back to editing the Pick One Bottle with AprilTag Objective then add the following Subtrees to the root Sequence after Pick from Pose:

  1. Move to Waypoint — go to the hover pose above the tray.
    • Set waypoint_name to Above Tray Place Simple.
  2. Move to Waypoint — descend to the tray surface.
    • Set waypoint_name to Tray Place Simple.
  3. Add Open Gripper.
  4. Duplicate the Above Tray Place Simple Move to Waypoint and drag it to the bottom of the Sequence — this is the post-place retract motion that lifts the gripper back up off the tray.
Move to Waypoint respects the planning scene

Move to Waypoint is built on our planners, which check the planning scene for collisions during planning. If you add a keep-out zone or any other obstacle that blocks the path, the Behavior fails up front instead of needing a separate ValidateTrajectory step.


Run the Objective and you should see the robot pick up a bottle, move to the hover pose above the tray, descend to the tray surface, and open the gripper to drop the bottle in.

tip

You can see the bottle drop into the tray live in the simulated camera feeds, for example under /scene_camera/color and /wrist_camera/color.

For a more comprehensive reference, see the example Pick All Bottles with AprilTags Objective.

Adding Teleop Recovery to your Objective

In real-world environments, automatic motion planning doesn't always succeed — an unexpected obstacle, imprecise perception, or a cluttered workspace can cause a place motion to fail mid-flight. Rather than having the Objective simply stop, MoveIt Pro lets you add human-in-the-loop recovery so an operator can step in and manually guide the robot through the difficult part.

Force a Place Failure with a Keep-Out Zone

To force the place motion to fail, we'll add a keep-out zone between the bottles and the tray — same approach you used in Tutorial 1.

  1. Run Reset Simulation from the settings menu (as you learned in Tutorial 1) to clear any leftover state from previous runs.

  2. In the 3D Visualization pane, click the Keep-Out Zones icon at the top-left to open the Planning Scene Editor sidebar.

  3. Click + Keep-Out Zone to add a new zone. A yellow planar surface (e.g. a wall) appears in the workspace.

  4. Adjust the pitch to 90 degrees so the surface stands up as a wall.

  5. Use the x, y, z position inputs to move the keep-out zone to be hovering over the red tray, in the path between the robot and its desired place location. Values around x = 1, y = 0.6, z = 0.70 work well in lab_sim.

Run your Pick One Bottle with AprilTag Objective.

  • The pick succeeds, and the move to Above Tray Place Simple succeeds (the hover pose sits above the keep-out zone).
  • The second Move to Waypoint (Tray Place Simple) fails.
Why this fails

The motion planner reports a PlanToJointGoal Error because the goal joint positions cause a collision between the gripper and the keep-out zone on the way down to the tray. You can also see the potential collisions as red spheres in the 3D Visualizer.

Without recovery, the Objective stops at this failure — exactly the situation we'll fix next.

The Behavior Tree view also highlights failing Behaviors with red lines and borders.

Reuse it later

Once you're happy with the zone's position, click Export Scene in the sidebar to save it to a file.

For example, name it Keepout Zone Over Red Tray.

Re-running this exercise later just needs a single Load Scene click.

Locking keep-out zones

To avoid inadvertently editing a keep-out zone in the 3D Visualizer, click the lock icon to prevent mis-clicks.

Wrap the Place in a Fallback

The place motion currently consists of two Move to Waypoint Subtrees. We'll wrap both of them in a Sequence, put that Sequence inside a Fallback, and add Request Teleoperation as the Fallback's recovery child.

Fallback Behaviors

A Fallback node (also known as a Selector) tries each of its children in order. If the first child succeeds, it stops. If the first child fails, it moves on to the next child as a recovery strategy.

Here's an overview of the structure we're building:

TopLevelSequence
  ...pick steps...
  Pick from Pose
  Fallback
    Sequence (autonomous place attempt)
      Move to Waypoint (Above Tray Place Simple)
      Move to Waypoint (Tray Place Simple)
    Request Teleoperation
  Open Gripper
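The control flow of this tree can be simulated in a few lines of Python. This is a simplified, hypothetical model: real Behavior Tree nodes can also report RUNNING while an action is in progress, and each leaf here is a stand-in that just records that it ran and returns a fixed status. It shows that when Move to Waypoint (Tray Place Simple) fails, the Fallback moves on to Request Teleoperation, the Fallback as a whole succeeds, and Open Gripper still runs.

```python
from enum import Enum


class Status(Enum):
    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"


def sequence(*children):
    """Tick children in order and stop at the first FAILURE."""
    def tick(log):
        for child in children:
            if child(log) is Status.FAILURE:
                return Status.FAILURE
        return Status.SUCCESS
    return tick


def fallback(*children):
    """Tick children in order and stop at the first SUCCESS."""
    def tick(log):
        for child in children:
            if child(log) is Status.SUCCESS:
                return Status.SUCCESS
        return Status.FAILURE
    return tick


def leaf(name, succeeds=True):
    """Stand-in for a Behavior: records that it ran and returns a fixed status."""
    def tick(log):
        log.append(name)
        return Status.SUCCESS if succeeds else Status.FAILURE
    return tick


def build_place_tree(tray_place_succeeds):
    """Mirror of the overview tree, with the pick steps collapsed into one leaf."""
    return sequence(
        leaf("Pick from Pose"),
        fallback(
            sequence(
                leaf("Move to Waypoint (Above Tray Place Simple)"),
                leaf("Move to Waypoint (Tray Place Simple)", tray_place_succeeds),
            ),
            leaf("Request Teleoperation"),
        ),
        leaf("Open Gripper"),
    )


log = []
result = build_place_tree(tray_place_succeeds=False)(log)
```

With the keep-out zone simulated as a failing Tray Place Simple move, log lists the pick, both place attempts, Request Teleoperation, and Open Gripper, and result is SUCCESS. When the place succeeds, Request Teleoperation is never ticked.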

To build it:

  1. With the keep-out zone still in place, edit the Objective again.
  2. On the top-level Sequence, click its blue + button to add a Fallback Behavior as a child, then drag the Fallback so it sits just after Pick from Pose and just before the first Move to Waypoint of the place sequence.
  3. Add a Sequence node as the first child of the Fallback.
  4. Move the two place Move to Waypoint Subtrees (Above Tray Place Simple and Tray Place Simple) into that new child Sequence, dragging them so they hang off the inner Sequence instead of the top-level Sequence. Leave the retract Move to Waypoint at the end of the top-level Sequence.
  5. Add the Request Teleoperation Behavior as the second child of the Fallback. This is the recovery Behavior that activates if the autonomous place sequence fails.
Keep Open Gripper outside the Fallback

Make sure Open Gripper is still a child of the top-level Sequence, after the Fallback rather than inside it. It should run after the Fallback succeeds, regardless of which branch ran.

  6. Click on the Request Teleoperation Behavior and modify its parameters:
    • Set enable_user_interaction to true
    • Set user_interaction_prompt to Place the bottle manually

Now when the place motion fails, instead of the Objective stopping, a Teleoperation menu will appear allowing you to manually guide the robot to a valid place position with the bottle still in its grasp:

Once you've teleoped the robot to the desired place position, click the Continue button to resume the Objective. The final Open Gripper will run automatically and release the bottle into the tray.

tip

This pattern of wrapping an action in a Fallback with Request Teleoperation as the recovery child is a powerful design pattern you can reuse throughout your Objectives. It gives autonomous execution a chance to succeed first, but seamlessly falls back to human guidance when needed.

Reverting the obstacle

The keep-out zone we added is purely a teaching aid — it lives only in the planning scene, not in the simulator. To restore the autonomous flow once you're done experimenting, open the Planning Scene Editor sidebar, select the zone, and press Delete (or use the delete button in the panel). The place motion will succeed on its own again.

Adding Comments to your Behavior Tree

As your Behavior Trees grow more complex, it becomes important to document what different sections do. MoveIt Pro provides a Comment Behavior that lets you add notes directly in the tree — similar to code comments.

Using the blue plus button, add a Comment Behavior above the Fallback node we just created. Click on it and set the text parameter to:

Teleop recovery: If the place motion fails due to obstacles or planning errors,
fall back to manual teleoperation so the operator can place the bottle.

The Comment Behavior has no effect on execution — it always returns SUCCESS immediately. It exists purely for documentation purposes, making your Objectives easier to understand for yourself and your teammates.

tip

Get in the habit of adding Comment nodes to explain non-obvious logic in your Behavior Trees, especially around Fallback and Parallel patterns where the intent may not be immediately clear.

Now let's learn how to implement some more advanced perception to identify objects.

Point Cloud Registration with Iterative Closest Point (ICP)

There are many other perception capabilities within MoveIt Pro beyond AprilTags, and in this section, we’ll learn about point cloud registration.

info

Point cloud registration is the process of localizing an object within a point cloud, given a CAD mesh file as an input. This is used in robotics for locating a part within a workspace, or as an input to manipulation flows like polishing and grinding parts.

Typically, point cloud registration starts with an initial guess pose, which might come from an ML perception model or from where an object should be according to the design of the robot workspace. This initial guess should be close to the object being registered, but does not have to be exact. The registration process then finds the exact pose using one of several algorithms, such as Iterative Closest Point (ICP).

info

Iterative Closest Point (ICP) is a foundational algorithm in robotics used to align 3D point clouds by estimating the best-fit rigid transformation between two sets of data. In robotic applications, ICP plays a critical role in tasks like localization, mapping, object tracking, and sensor fusion by helping a robot match its current sensor data to a known map or model. The algorithm works by iteratively refining the alignment based on minimizing the distance between corresponding points. While powerful, ICP requires a reasonable initial guess to avoid converging to an incorrect local minimum and is most effective when there is significant overlap between point clouds.
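To see the match, align, repeat loop in code, here is a deliberately small point-to-point ICP in pure Python. It is a 2D teaching sketch, not the algorithm inside RegisterPointClouds: correspondences come from brute-force nearest neighbors, pairs farther apart than max_correspondence_distance are dropped (the name mirrors the Behavior's port, but its exact semantics there may differ), and each step solves for the best rotation and translation in closed form.

```python
import math


def nearest(point, cloud):
    """Brute-force nearest neighbour of point in cloud (fine for a tiny demo)."""
    return min(cloud, key=lambda q: math.dist(point, q))


def best_rigid_2d(src, dst):
    """Closed-form angle and translation minimizing sum |R(theta) s + t - d|^2."""
    n = len(src)
    sx, sy = sum(p[0] for p in src) / n, sum(p[1] for p in src) / n
    dx, dy = sum(q[0] for q in dst) / n, sum(q[1] for q in dst) / n
    cross = dot = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay, bx, by = px - sx, py - sy, qx - dx, qy - dy
        cross += ax * by - ay * bx
        dot += ax * bx + ay * by
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # The translation carries the rotated source centroid onto the target centroid.
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)


def icp_2d(model, scene, max_iterations=100, max_correspondence_distance=0.5, tolerance=1e-12):
    """Point-to-point ICP. Returns ((theta, tx, ty), mean squared match distance)."""
    current = list(model)
    theta_total, tx_total, ty_total = 0.0, 0.0, 0.0
    prev_error = error = float("inf")
    for _ in range(max_iterations):
        # 1. Match each model point to its closest scene point; drop distant pairs.
        pairs = [(p, nearest(p, scene)) for p in current]
        pairs = [(p, q) for p, q in pairs if math.dist(p, q) <= max_correspondence_distance]
        if len(pairs) < 2:
            break  # too little overlap to estimate a rigid transform
        error = sum(math.dist(p, q) ** 2 for p, q in pairs) / len(pairs)
        # 2. Stop once the error no longer changes meaningfully.
        if abs(prev_error - error) < tolerance:
            break
        prev_error = error
        # 3. Solve the best rigid step for these matches and move the model points.
        src, dst = zip(*pairs)
        theta, tx, ty = best_rigid_2d(src, dst)
        c, s = math.cos(theta), math.sin(theta)
        current = [(c * x - s * y + tx, s * x + c * y + ty) for x, y in current]
        # 4. Accumulate: new total = this step applied after the previous total.
        tx_total, ty_total = c * tx_total - s * ty_total + tx, s * tx_total + c * ty_total + ty
        theta_total += theta
    return (theta_total, tx_total, ty_total), error
```

For example, rotating a small L-shaped model by 5° and shifting it a few centimeters produces a scene from which icp_2d recovers that rotation and translation. If the initial offset grows beyond the spacing between points, the first matches can pair the wrong points, which is the local-minimum failure described above.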

In MoveIt Pro, the RegisterPointClouds Behavior provides this capability:

See ICP in Action with Register CAD Part

Before we build our own, let's see ICP working end-to-end in an existing example of detecting a microscope.

  1. First, reset the simulation using the Reset Simulation button in the MoveIt Pro settings menu (as you learned in Tutorial 1) so the scene is in its starting state.
  2. Open the Register CAD Part Objective to inspect its architecture.

The overall flow is:

  1. Move the wrist camera to look at the area of interest and capture a point cloud.
  2. Create an initial guess pose.
  3. Load the microscope STL as a point cloud and visualize it in red at the initial guess.
  4. Register the microscope STL against the wrist camera point cloud using ICP.
  5. Load the microscope STL again and visualize it in green at the registered pose.

Run the Objective. You should see two point clouds appear: first a red one above the table (the initial guess), then a green one snapped onto the microscope on the table.

Try editing the Objective and shifting the CreateStampedPose x/y/z by 10 cm or so. A small change still converges; a guess that's too far off lands on the wrong feature or fails outright.

ICP needs a seed

ICP is sensitive to the initial guess. In production Objectives, we typically use a fast detector — an AprilTag, an ML segmentation centroid, or a known workspace pose — to seed ICP, then let ICP nail down the final pose. We're about to use exactly that pattern.

Build a Reusable ICP Subtree for Bottles

Now let's package the ICP fit into a Subtree we can drop into any pick Objective. The Subtree will take a masked point cloud (just the bottle, no background) plus an initial guess pose, run ICP against the bottle CAD model, and output a grasp pose ready for Pick from Pose.

Create a new Objective called Fit Bottle to Cloud via ICP:

  1. Choose the category Tutorials.
  2. Check the Subtree-only Objective checkbox.
  3. Click Create, then delete the default AlwaysSuccess Behavior.

First, wire up the Subtree's ports so the parent Objective can pass data in and out:

  1. Click the root Fit Bottle to Cloud via ICP node.
  2. Add three in/out ports:
    • masked_cloud_world (default {masked_cloud_world}) — the segmented bottle cloud you'll feed in.
    • initial_guess_pose (default {initial_guess_pose}) — the seed for ICP.
    • grasp_pose (default {grasp_pose}) — the output the parent Objective will pick from.

Now add the Behaviors to the root Sequence. We'll build it in four small groups so it's easy to follow what each part is doing.

1. Show the initial guess

Load the bottle STL as a red point cloud, place it at the seed pose, and publish it. This is what ICP will refine — visualizing it first makes a bad seed obvious.

  1. LoadPointCloudFromFile — load the bottle STL as a red model cloud.
    • Set package_name to picknik_accessories.
    • Set file_path to mujoco_assets/assets/bottle.stl.
    • Set color to 255;0;0.
    • Set num_sampled_points to 10000.
    • Set frame_id to world.
    • Set point_cloud to {model_cloud}.
  2. TransformPointCloud — place the model cloud at the initial guess.
    • Set input_cloud to {model_cloud}.
    • Set transform_pose to {initial_guess_pose}.
    • Set output_cloud to {model_cloud}.
  3. SendPointCloudToUI — visualize the red guess.
    • Set point_cloud to {model_cloud}.
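LoadPointCloudFromFile turns the STL mesh into num_sampled_points points. The tutorial doesn't say how the Behavior samples the mesh, so treat the following as an illustration of one common approach rather than its implementation: pick triangles with probability proportional to their area, then draw a uniformly random point inside each chosen triangle. Sampling this way keeps point density even across large and small faces, which matters because plain point-to-point ICP gives every matched point equal weight. The function names are hypothetical.

```python
import math
import random


def triangle_area(a, b, c):
    """Area of a 3D triangle: half the length of the cross product of two edges."""
    ux, uy, uz = (b[i] - a[i] for i in range(3))
    vx, vy, vz = (c[i] - a[i] for i in range(3))
    cx, cy, cz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)


def sample_mesh_surface(triangles, num_points, seed=0):
    """Draw num_points points uniformly over the surface of a triangle mesh."""
    rng = random.Random(seed)
    areas = [triangle_area(*tri) for tri in triangles]
    # Pick triangles in proportion to their area so density is uniform per unit area.
    chosen = rng.choices(triangles, weights=areas, k=num_points)
    points = []
    for a, b, c in chosen:
        # Square-root trick: uniform barycentric coordinates (u, v, w) inside the triangle.
        s, r = math.sqrt(rng.random()), rng.random()
        u, v, w = 1.0 - s, s * (1.0 - r), s * r
        points.append(tuple(u * a[i] + v * b[i] + w * c[i] for i in range(3)))
    return points
```

With one triangle three times the area of another, roughly three quarters of the samples land on the larger one.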

2. Run ICP and compose the result

Register the model cloud against the segmented target, then compose ICP's delta on top of the seed to get the bottle's full pose in world coordinates.

  1. RegisterPointClouds — run ICP between the model and the segmented target.
    • Set base_point_cloud to {model_cloud}.
    • Set target_point_cloud to {masked_cloud_world}.
    • Set max_correspondence_distance to 0.1.
    • Set max_iterations to 100.
    • Set target_pose_in_base_frame to {icp_target_pose}.
  2. TransformPoseWithPose — compose the ICP delta on top of the initial guess.
    • Set input_pose to {initial_guess_pose}.
    • Set transform_pose to {icp_target_pose}.
    • Set output_pose to {bottle_in_world}.

3. Show the fit

Load the same STL again as a green cloud, place it at the fitted pose, and publish it. Eyeballing the green cloud against the live segmented points is the fastest way to confirm ICP converged.

  1. LoadPointCloudFromFile — load the STL again, this time as a green aligned cloud.
    • Same package_name, file_path, num_sampled_points, frame_id as the red load above.
    • Set color to 0;255;0.
    • Set point_cloud to {aligned_cloud}.
  2. TransformPointCloud — place the green cloud at the fitted bottle pose.
    • Set input_cloud to {aligned_cloud}.
    • Set transform_pose to {bottle_in_world}.
    • Set output_cloud to {aligned_cloud}.
  3. SendPointCloudToUI — visualize the green fit.
    • Set point_cloud to {aligned_cloud}.

4. Build the grasp pose

Turn the bottle's world-frame pose into a gripper grasp pose by shifting the centroid up to the shoulder, then flipping the orientation so the gripper approaches from above.

  1. TransformPose — shift the fit pose up from the model's origin (the bottle's centroid) to the shoulder of the bottle, where the gripper actually closes.
    • Set input_pose to {bottle_in_world}.
    • Set translation_xyz to 0;0;0.057.
    • Set output_pose to {grasp_point_pose}.
  2. TransformPose — flip 180° about the local Y axis so the gripper approach points world-down.
    • Set input_pose to {grasp_point_pose}.
    • Set quaternion_xyzw to 0;1;0;0.
    • Set output_pose to {grasp_pose}.
Grasp poses are offsets from a reference point

ICP gives you the bottle's pose at the CAD model's origin — which for bottle.stl sits at the geometric centroid. That centroid almost never coincides with where you actually want the gripper to close, so a fixed TransformPose offset takes you from "where the model is" to "where the gripper grasps." This is the common pattern: detection gives you a reference point on the object (centroid, AprilTag, mask centroid, …); a hand-tuned offset defines the grasp on top of that. Whenever you swap the perception layer, only the reference-point step changes — the offset stays the same.
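Assuming each TransformPose applies its translation_xyz and quaternion_xyzw in the input pose's own (local) frame, and that the gripper approaches along its local +Z axis, the two grasp steps are a chain of rigid-transform compositions. The standalone sketch below uses only the Python standard library; compose, grasp_from_fit, and the two constants are hypothetical names for illustration, not MoveIt Pro API. It turns a fitted bottle pose into a grasp pose and checks that, for an upright bottle, the approach axis ends up pointing world-down.

```python
import math


def quat_multiply(a, b):
    """Hamilton product a*b, quaternions stored as (x, y, z, w) like quaternion_xyzw."""
    ax, ay, az, aw = a
    bx, by, bz, bw = b
    return (
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
        aw * bw - ax * bx - ay * by - az * bz,
    )


def rotate_vector(q, v):
    """Rotate 3D vector v by unit quaternion q."""
    q_conj = (-q[0], -q[1], -q[2], q[3])
    x, y, z, _ = quat_multiply(quat_multiply(q, (v[0], v[1], v[2], 0.0)), q_conj)
    return (x, y, z)


def compose(pose, offset):
    """Apply offset (expressed in pose's local frame) to pose; poses are (xyz, xyzw)."""
    (p, q), (dp, dq) = pose, offset
    moved = rotate_vector(q, dp)
    return tuple(p[i] + moved[i] for i in range(3)), quat_multiply(q, dq)


IDENTITY = (0.0, 0.0, 0.0, 1.0)
# Hypothetical constants mirroring the two TransformPose steps above.
SHOULDER_OFFSET = ((0.0, 0.0, 0.057), IDENTITY)         # translation_xyz 0;0;0.057
FLIP_ABOUT_Y = ((0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0))  # quaternion_xyzw 0;1;0;0


def grasp_from_fit(bottle_in_world):
    """Reference point (model origin) -> fixed grasp offset -> gripper grasp pose."""
    grasp_point_pose = compose(bottle_in_world, SHOULDER_OFFSET)
    return compose(grasp_point_pose, FLIP_ABOUT_Y)


upright_bottle = ((0.8, 0.2, 0.75), IDENTITY)
grasp_position, grasp_orientation = grasp_from_fit(upright_bottle)
approach = rotate_vector(grasp_orientation, (0.0, 0.0, 1.0))
```

Because the shoulder offset is expressed in the bottle's frame, it rotates with the bottle: for a bottle tipped 90° about X, the same 0.057 m offset moves the grasp point along world -Y instead of world +Z.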


You now have a self-contained ICP perception block. Next we'll feed it from an ML segmentation step.

Point Cloud Segmentation using ML

info

Point Cloud Segmentation with Machine Learning (ML) refers to the process of automatically dividing a 3D point cloud into meaningful regions or object parts based on learned patterns. Instead of relying solely on hand-tuned geometric rules (like plane fitting or clustering), ML-based segmentation trains models to recognize complex structures and variations directly from data. These models can classify points individually (semantic segmentation) or group them into distinct object instances (instance segmentation).

MoveIt Pro ships examples that use two generations of the Segment Anything Model:

  • SAM2 — prompted with clicked points on an image. Best when a human is in the loop and can pick the object of interest interactively.
  • SAM3 — prompted with a text prompt and/or an image exemplar. Best for autonomous flows where you know in advance what kind of object to look for.
SAM3 vs SAM2

| Aspect | SAM2 | SAM2 Automasking | SAM3 | CLIPSeg |
|---|---|---|---|---|
| Prompt types | Points + Boxes | None (grid-based) | Text + Boxes + Exemplars | Text |
| Text support | No | No | Yes | Yes |
| Models | 3 (encoder, prompt encoder, decoder) | 3 (same as SAM2) | 4 (vision, text, geometry, decoder) | 2 (CLIP encoder, CLIPSeg decoder) |
| Input resolution | 1024x1024 | 1024x1024 | 1008x1008 | 352x352 |
| Best for | Interactive point-click segmentation | Promptless scene discovery | Flexible multimodal detection | Legacy text segmentation |

For a deeper treatment of when to use each model, see the ML Exemplar Segmentation guide.

In this section you'll combine SAM3 (text prompt) with the ICP Subtree you just built to create a fully autonomous bottle picker. Along the way, you'll also run two existing examples that show SAM2 clicked-point segmentation and SAM3 image-exemplar segmentation.

SAM2 with Clicked Points

The ML Segment Point Cloud from Clicked Point Objective uses the GetMasks2DFromPointQuery Behavior to call SAM2. Instead of a text prompt, the model takes user-clicked points on the wrist camera image. This is the right tool when a human is in the loop and can pick objects interactively.

While clicking points by hand is convenient for demonstration, point prompts are most useful when another system generates the points automatically. Common upstream sources include the centroid of a region of interest from classical computer vision (color thresholding, blob detection), the center of a bounding box returned by an object detector, or pixel coordinates extracted from a vision-language model like Gemini. Anywhere upstream perception can produce a 2D pixel guess for "the thing I want to pick," SAM2 can refine that guess into a precise segmentation mask — no text prompt required.

Run this yourself:

  1. Run the Look at Table Objective first to position the wrist camera over the work area.
  2. Run the ML Segment Point Cloud from Clicked Point Objective.
  3. Ensure the view port /wrist_camera/color is visible.
  4. When prompted, click three points on the pill bottle in the lower right.

After your three clicks, the prompt should look like this, with green markers on the bottle:

The Objective then converts the 2D mask to 3D, applies it to the point cloud, and visualizes only the clicked object.

What are Masks?

A 2D binary image where each pixel indicates whether it belongs to the segmented object (foreground) or the background, typically generated by models like SAM for isolating objects.

Why do glass objects look flat in the point cloud?

In this environment we model glass beakers and flasks the way real laser depth sensors perceive them — the simulated camera sees them as transparent glass, but the simulated depth sensor doesn't return points for their surfaces. If you click on a beaker instead of a bottle, the resulting point cloud will look flat for the same reason.
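The mask-to-3D step can be sketched with the standard pinhole camera model. The Python below is a hypothetical, simplified stand-in for what the Objective does, not its implementation: it assumes a depth image in meters that is aligned pixel-for-pixel with the color image and mask, plus intrinsics fx, fy, cx, cy such as those in a camera_info K matrix. Pixels with no depth return (NaN), like the glass surfaces described above, are skipped, so such objects contribute few or no points.

```python
import math


def masked_depth_to_points(depth, mask, fx, fy, cx, cy):
    """Back-project foreground pixels of a depth image into 3D camera-frame points.

    depth: rows of depth values in meters (NaN where the sensor returned nothing).
    mask:  rows of booleans, True where the segmentation marked the object.
    fx, fy, cx, cy: pinhole intrinsics (focal lengths and principal point, in pixels).
    """
    points = []
    for v, (depth_row, mask_row) in enumerate(zip(depth, mask)):
        for u, (z, is_object) in enumerate(zip(depth_row, mask_row)):
            # Skip background pixels and pixels with no valid depth (e.g. glass).
            if not is_object or math.isnan(z) or z <= 0.0:
                continue
            # Pinhole model: pixel (u, v) at depth z maps to (x, y, z) in the optical frame.
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points
```

The resulting points are in the camera's optical frame; producing something like {masked_cloud_world} would additionally require transforming them into the world frame.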

Build Pick One Bottle with SAM3

Create a new Objective called Pick One Bottle with SAM3:

  1. Choose the category Tutorials.
  2. Click Create, then delete the default AlwaysSuccess Behavior.

Add the following Behaviors and Subtrees to the root Sequence:

  1. Open Gripper

  2. Move to Waypoint — set waypoint_name to Look at Bottles Left.

  3. Segment Bottle Subtree — segments a bottle using SAM3. Wire the ports as follows:

    • camera_image_topic: /wrist_camera/color
    • camera_info_topic: /wrist_camera/camera_info
    • camera_points_topic: /wrist_camera/points
    • object_prompt: a small white or grey rectangular pill bottle with a purple blue top cap
    • masked_cloud_world: {masked_cloud_world}
    Text-prompt SAM3

    Segment Bottle Subtree calls GetMasks2DFromExemplar under the hood. By feeding it just a text prompt (no exemplar image), SAM3 runs in pure text-prompt mode — no human input, no reference image, just a sentence describing what to find.

  4. GetCentroidFromPointCloud — derive an ICP initial guess from the segmented cloud's centroid.

    • Set point_cloud to {masked_cloud_world}.
    • Set output_pose to {initial_guess_pose}.
  5. Fit Bottle to Cloud via ICP — the Subtree you built in the previous section.

    • Set masked_cloud_world to {masked_cloud_world}.
    • Set initial_guess_pose to {initial_guess_pose}.
    • Set grasp_pose to {grasp_pose}.
  6. VisualizePose — publish the grasp frame as a labeled marker in 3D Visualization so you can see where the gripper is going to go before it moves.

    • Set pose to {grasp_pose}.
    • Set marker_name to grasp_pose.
    • Set marker_text to grasp_pose.
    • Set marker_size to 0.1.
  7. Pick from Pose — uses the {grasp_pose} produced above by default.

Now build the Cartesian up → over → down drop sequence. A naive Move to Waypoint here can let the joint-space planner spin the wrist mid-flight, swinging the bottle. Going up, over, and down along an explicit Cartesian path keeps the gripper pointing down for the entire move.

  1. CreateStampedPose — origin in the gripper frame, used to read the post-pick pose.
    • Set position_xyz to 0;0;0.
    • Set orientation_xyzw to 0;0;0;1.
    • Set reference_frame to grasp_link.
    • Set stamped_pose to {gripper_local_pose}.
  2. TransformPoseFrame — resolve the gripper origin into the world frame.
    • Set input_pose to {gripper_local_pose}.
    • Set target_frame_id to world.
    • Set output_pose to {post_pick_world_pose}.
  3. TransformPose — Up: lift 10 cm in the gripper's local frame (= up in world, because the gripper is pointing down post-pick).
    • Set input_pose to {post_pick_world_pose}.
    • Set translation_xyz to 0;0;-0.1.
    • Set quaternion_xyzw to 0;0;0;1.
    • Set output_pose to {up_pose}.
  4. CreateStampedPose — Over: a hover pose above the placement tray, gripper-down.
    • Set position_xyz to 0.95;0.5568;0.85.
    • Set orientation_xyzw to -0.0436;0.9990;0;0.
    • Set reference_frame to world.
    • Set stamped_pose to {over_pose}.
  5. CreateStampedPose — Down: the actual drop pose just above the tray surface.
    • Set position_xyz to 0.95;0.5568;0.605.
    • Set orientation_xyzw to -0.0436;0.9990;0;0.
    • Set reference_frame to world.
    • Set stamped_pose to {down_pose}.
  6. CreateVector — initialize the Cartesian path container.
    • Set vector to {drop_path}.
  7. AddToVector — push the lift waypoint.
    • Set element to {up_pose}.
    • Set input_vector and output_vector to {drop_path}.
  8. AddToVector — push the hover waypoint. Same ports, set element to {over_pose}.
  9. AddToVector — push the drop waypoint. Same ports, set element to {down_pose}.
  10. PlanCartesianPath — plan the up → over → down trajectory.
    • Set path to {drop_path}.
    • Set position_only to true (orientation tracked as a soft nullspace task — prevents wrist swing).
    • Set planning_group_name to manipulator and tip_links to grasp_link.
    • Set joint_trajectory_msg to {drop_traj}.
  11. ExecuteTrajectory — run the planned trajectory.
    • Set joint_trajectory_msg to {drop_traj}.
    • Set controller_action_name to /joint_trajectory_admittance_controller/follow_joint_trajectory.
    • Set execution_pipeline to jtac.
  12. Open Gripper — release the bottle into the tray.
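The sketch below shows why a local-frame offset of 0;0;-0.1 lifts the bottle. It reproduces the Up step's arithmetic in plain Python, then collects the three waypoints into a list, mirroring CreateVector and AddToVector. It assumes TransformPose applies translation_xyz in the input pose's own frame, as the step description says. The post-pick position is made up for illustration, and none of these functions are MoveIt Pro APIs.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # (x, y, z, w), the orientation_xyzw order
Pose = Tuple[Vec3, Quat]


def normalize(q: Quat) -> Quat:
    n = math.sqrt(sum(c * c for c in q))
    return (q[0] / n, q[1] / n, q[2] / n, q[3] / n)


def rotate(q: Quat, v: Vec3) -> Vec3:
    """Rotate v by unit quaternion q using v' = v + w*t + u x t, with t = 2(u x v)."""
    ux, uy, uz, w = q
    tx = 2.0 * (uy * v[2] - uz * v[1])
    ty = 2.0 * (uz * v[0] - ux * v[2])
    tz = 2.0 * (ux * v[1] - uy * v[0])
    return (
        v[0] + w * tx + (uy * tz - uz * ty),
        v[1] + w * ty + (uz * tx - ux * tz),
        v[2] + w * tz + (ux * ty - uy * tx),
    )


def offset_in_local_frame(pose: Pose, local: Vec3) -> Pose:
    """Translate a pose by `local`, expressed in the pose's own frame; keep orientation."""
    position, q = pose
    d = rotate(normalize(q), local)
    return ((position[0] + d[0], position[1] + d[1], position[2] + d[2]), q)


GRIPPER_DOWN: Quat = (-0.0436, 0.9990, 0.0, 0.0)  # orientation used for Over and Down

# Hypothetical post-pick gripper pose in the world frame (illustrative numbers).
post_pick: Pose = ((0.60, 0.20, 0.62), GRIPPER_DOWN)

up = offset_in_local_frame(post_pick, (0.0, 0.0, -0.1))  # TransformPose (Up)
over: Pose = ((0.95, 0.5568, 0.85), GRIPPER_DOWN)          # CreateStampedPose (Over)
down: Pose = ((0.95, 0.5568, 0.605), GRIPPER_DOWN)         # CreateStampedPose (Down)

drop_path: List[Pose] = []            # CreateVector
for waypoint in (up, over, down):     # three AddToVector steps, in order
    drop_path.append(waypoint)
```

The quaternion (-0.0436, 0.9990, 0, 0) has w = 0, so it is a 180° rotation about an axis lying in the world XY plane. That rotation maps the gripper's local +z to world -z, so -0.1 along local z becomes +0.1 along world z.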
Why position-only?

Cartesian paths planned with full 6-DOF tracking can fail or replan when the planner thinks it can't hit every orientation along the way, which is what produces the wrist swing. Setting position_only="true" relaxes orientation to a soft nullspace task — the planner keeps the gripper close to the requested orientation but won't swing it 90° to chase a stricter constraint. The Pick All Pill Bottles Objective uses the same trick.


Run the Objective. You should see:

  1. The robot move to the look pose and capture an image.

  2. SAM3 segment a bottle (visible as a colored mask on /masks_visualization).

  3. The red model cloud appear at the mask centroid (initial guess).

  4. ICP snap the green cloud onto the bottle, and the labeled grasp_pose axis marker appear at the shoulder.

  5. The robot grasp the bottle and lift straight up. The wrist camera view shows the bottle held between the gripper fingers.

  6. The robot traverse over to the tray, descend, and open the gripper to drop the bottle in.

Why segmentation + ICP?

Segmentation alone gives you a bag of points, but the bottle's exact pose — and especially its grasp axis — needs the geometric constraint of the CAD model. ICP supplies that. The two complement each other: ML handles "where in the image" robustly, ICP handles "what 6-DOF pose, precisely."
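To give some intuition for what the ICP step computes, here is a minimal textbook point-to-point ICP in NumPy. It starts from a centroid-based initial guess (the role GetCentroidFromPointCloud plays in your Objective), then repeats brute-force nearest-neighbor matching followed by an SVD (Kabsch) rigid fit. It is a sketch for illustration, not MoveIt Pro's ICP implementation, which may differ in correspondence search, outlier handling, and stopping criteria.

```python
import numpy as np


def best_rigid_transform(src: np.ndarray, dst: np.ndarray):
    """Rotation R and translation t minimizing sum ||R @ src_i + t - dst_i||^2 (Kabsch)."""
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)  # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:  # SVD gave a reflection: flip the last axis
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = dst_c - R @ src_c
    return R, t


def icp(model: np.ndarray, scene: np.ndarray, iterations: int = 50, tol: float = 1e-10) -> np.ndarray:
    """Point-to-point ICP. Returns a 4x4 transform mapping `model` (N x 3) onto `scene` (M x 3)."""
    # Initial guess: move the model's centroid onto the scene cloud's centroid.
    T = np.eye(4)
    T[:3, 3] = scene.mean(axis=0) - model.mean(axis=0)
    current = model + T[:3, 3]
    prev_rms = np.inf
    for _ in range(iterations):
        # Brute-force nearest neighbor in the scene for every model point.
        d2 = ((current[:, None, :] - scene[None, :, :]) ** 2).sum(axis=2)
        matches = scene[d2.argmin(axis=1)]
        rms = np.sqrt(d2.min(axis=1).mean())  # residual before this update
        if abs(prev_rms - rms) < tol:
            break
        prev_rms = rms
        # Best rigid fit for the current correspondences, then apply it.
        R, t = best_rigid_transform(current, matches)
        current = current @ R.T + t
        step = np.eye(4)
        step[:3, :3] = R
        step[:3, 3] = t
        T = step @ T
    return T
```

ICP only converges to the right pose from a reasonably close starting point, which is why seeding it with the segmented cloud's centroid matters. Brute-force matching costs O(NM) per iteration; practical implementations typically use a k-d tree instead.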

SAM3 with an Image Exemplar

A pure text prompt isn't always specific enough — sometimes you want SAM3 to find "things shaped like this" rather than "things matching this description." For that, SAM3 accepts an image exemplar: a small reference image of the target object, optionally with a bounding box around the relevant region.

The example ML Find Bottles on Table from Image Exemplar Objective:

  • Loads a small reference image of a square pill bottle from disk.
  • Wraps it in a bounding box with CreateBoundingBoxFromOffset.
  • Moves the wrist camera to Look at Bottles Left and grabs a fresh image.
  • Calls GetMasks2DFromExemplar against SAM3 with the exemplar plus the bounding box.
  • Publishes the resulting masks to /masks_visualization for inspection.

Run this yourself:

  1. Run the ML Find Bottles on Table from Image Exemplar Objective.
  2. Switch the primary view to /masks_visualization (the Objective also does this automatically).
  3. You should see the square pill bottles highlighted with confidence scores. The exemplar reference image with its bounding box appears in /bboxes_visualization.
Combining text and image

The Objective in the next section feeds SAM3 both a text prompt and an image exemplar at the same time. The two prompts reinforce each other and produce more reliable masks than either alone.

Putting It All Together: Pick All Pill Bottles

The Pick All Pill Bottles Objective is the headline example of perception-driven manipulation in lab_sim. It scales the same SAM3 + ICP pattern you just built up into a full pick-and-place loop:

  • SAM3 segmentation with both an image exemplar and a text prompt for higher-quality masks.
  • ICP to refine the segmented point cloud against the bottle CAD model and produce a precise grasp pose.
  • A loop that iterates over a list of placement poses loaded from pill_bottle_place_poses.yaml, picks one bottle per iteration, and places it on the tray at the next location.

Open and run the Pick All Pill Bottles Objective. The robot will:

  1. Load the placement grid and visualize a marker at every place pose.
  2. For each place pose: look at the bottles, segment one with SAM3, register it with ICP, pick it, and place it at the current target.
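The loop structure can be outlined as follows. This is a hypothetical Python outline, not the Objective's Behavior Tree. find_grasp, pick, and place are stand-in callables for the look/segment/ICP, pick, and place stages. Stopping on the first failure is a choice made for this sketch and is not necessarily how Pick All Pill Bottles handles errors.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical pose representation, e.g. (x, y, z) or (x, y, z, qx, qy, qz, qw).
Pose = Tuple[float, ...]


def pick_all(
    place_poses: Sequence[Pose],
    find_grasp: Callable[[], Optional[Pose]],
    pick: Callable[[Pose], bool],
    place: Callable[[Pose], bool],
) -> List[Pose]:
    """Run one perceive -> pick -> place cycle per placement pose.

    Returns the placement poses that were filled, in order.
    """
    filled: List[Pose] = []
    for target in place_poses:
        # Look at the bottles, segment one with SAM3, refine the grasp with ICP.
        grasp = find_grasp()
        if grasp is None:
            break  # nothing found: stop (a choice made for this sketch)
        if not pick(grasp) or not place(target):
            break  # manipulation failed: stop rather than retry
        filled.append(target)
    return filled


# Usage with stub callables: three bottles on the table, four tray slots.
bottles = [(0.3, 0.1, 0.6), (0.4, 0.1, 0.6), (0.5, 0.1, 0.6)]
slots = [(0.95, 0.50, 0.6), (0.95, 0.55, 0.6), (0.95, 0.60, 0.6), (0.95, 0.65, 0.6)]
filled = pick_all(slots, lambda: bottles.pop() if bottles else None,
                  lambda grasp: True, lambda target: True)
```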

The 3D Visualization shows the labeled place_pose_* markers and the red collision-volume slab the Objective adds to the planning scene to keep the robot off the table; /masks_visualization shows SAM3's text-and-exemplar segmentation in action.

GPU Recommended

This Objective is in the Application - ML (GPU Recommended) subcategory. It will run on CPU but is dramatically faster with a GPU. See the ML segmentation guide for hardware recommendations.

Suggested Hands-On Exercises

  • Open Pick All Pill Bottles and find the Get Bottle Grasp via ICP Subtree. Compare it to the Fit Bottle to Cloud via ICP Subtree you built — they share the same core ICP fit but the production version cleans up the masked cloud before fitting.
  • Modify pill_bottle_place_poses.yaml to add or remove a placement pose and re-run the Objective.
  • Swap your Fit Bottle to Cloud via ICP Subtree into Pick All Pill Bottles in place of Fit Bottle Model Subtree and see what differences show up.

Resetting Your Robot Simulation

As you learned in Tutorial 1, you can reset the simulation from the MoveIt Pro settings menu.

For more advanced use cases, there are additional ways to reset the simulation:

  1. Programmatically: Use the ResetMujocoKeyframe Behavior in your own Objective to reset the scene to the "default" keyframe.
  2. Third-party UI: Use the MuJoCo Interactive Viewer to reset the simulation (see the next section).
  3. Command line: Restart MoveIt Pro to completely reset the simulation scene and robot state.
Advanced

To enable MuJoCo control in your own robot configuration package, you must do the following:

  • Add MujocoBehaviorsLoader to the core behavior_loader_plugins
  • Add mujoco_objectives to the objective_library_paths

See the lab_sim config.yaml for an example.

Interactive Manipulation of Sim

You can "reach in" and manually manipulate objects in the simulation using the MuJoCo Interactive Viewer. This is a graphical tool for visualizing, debugging, and interacting with MuJoCo physics simulations in real time.

Performance Note

Running the MuJoCo Interactive Viewer alongside MoveIt Pro can impact system performance, and may not be feasible for lower-powered systems.

Local installations only

The MuJoCo Interactive Viewer is only supported when MoveIt Pro is installed locally, not for MoveIt Pro Cloud, due to the need to access your terminal.

To enable the MuJoCo Interactive Viewer:

  1. Exit MoveIt Pro using CTRL-C at the command line

  2. In a terminal, navigate to the lab_sim robot configuration package folder:

    cd ~/moveit_pro/moveit_pro_example_ws/src/lab_sim/config/
  3. Open config.yaml using your favorite editor / IDE. We recommend VS Code.

  4. Search for the urdf_params key.

  5. Find the line that says mujoco_viewer and flip the boolean to true.

    hardware:
      robot_description:
        urdf:
          package: "lab_sim"
          path: "description/picknik_ur.xacro"
        srdf:
          package: "lab_sim"
          path: "config/moveit/picknik_ur.srdf"
        urdf_params:
          - mujoco_model: "description/scene.xml"
          - mujoco_viewer: true
  6. Relaunch MoveIt Pro. The MuJoCo Interactive Viewer should open alongside it.

Within the viewer, you can move objects around manually with a "hand of god"-like functionality:

  1. Double-click the object you want to move
  2. To lift and move: CTRL+Right Mouse
  3. To drag horizontally: CTRL+SHIFT+Right Mouse

You can reset the simulation within the Interactive Viewer using the Reset button in the "Simulation" section of the bottom-left menu.

Finally, you can inspect the collision bodies by changing which geom groups are shown in the "Group enable" section:

  • Toggle off Geom 0-2
  • Toggle on Geom 3-5
  • Under the Rendering tab, enable Convex Hull

With those settings toggled, the viewer shows the collision geometry instead of the visual geometry.

tip

Learn more about configuring the simulator in the MuJoCo configuration guide.

Recording Training Data with ROS Bag

The Pick All Pill Bottles Objective autonomously picks and places bottles, which makes it a useful oracle policy — an automated demonstration generator — for collecting training data for machine learning models such as Diffusion Policy.

An oracle policy is simply a scripted or planned Objective that performs the task correctly, as opposed to a human teleoperating the robot. The advantage is that you can collect large amounts of consistent training data without manual effort.

Recording a ROS Bag

To record the robot's joint states and camera feeds while the Objective runs, open a terminal inside the MoveIt Pro container and run:

ros2 bag record --max-cache-size 32000000000 --snapshot-mode -s mcap \
/robot_description \
/joint_states_synchronized \
/joint_commands_synchronized \
/demonstration_indicator \
/wrist_camera/color_synchronized \
/scene_camera/color_synchronized \
-o ~/rosbag_training_data

This records all the key topics needed for training an end-to-end model:

  • Joint states and commands for learning the robot's motion
  • Camera feeds (wrist and scene cameras) for visual perception
  • Demonstration indicator for marking the start and end of each demonstration
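When you process the bag later, the demonstration indicator is what splits one long recording into individual episodes. The sketch below assumes the indicator can be read as time-stamped boolean samples (True while a demonstration is in progress). The actual message type on /demonstration_indicator is not specified here, and split_demonstrations is a hypothetical helper, not part of ros2 bag or MoveIt Pro.

```python
from typing import List, Optional, Sequence, Tuple


def split_demonstrations(
    indicator: Sequence[Tuple[float, bool]],
) -> List[Tuple[float, float]]:
    """Return (start, end) time spans during which the indicator is active.

    `indicator` is a time-ordered sequence of (timestamp_seconds, active) samples.
    """
    spans: List[Tuple[float, float]] = []
    start: Optional[float] = None
    for stamp, active in indicator:
        if active and start is None:
            start = stamp  # rising edge: a demonstration begins
        elif not active and start is not None:
            spans.append((start, stamp))  # falling edge: it ends
            start = None
    if start is not None:
        # Recording stopped mid-demonstration: close the span at the last sample.
        spans.append((start, indicator[-1][0]))
    return spans
```

Joint-state and camera samples whose timestamps fall inside a returned span would then form one training episode.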

With the ROS bag recording running, start the Pick All Pill Bottles Objective. You are now recording training data for your end-to-end model as the robot performs the task autonomously.

Training a model from this data is beyond the scope of this tutorial. To learn how to train and deploy a neural network policy from your recorded demonstrations, follow the full Diffusion Policy how-to guide.

LLM-Powered Behavior Tree Builder

MoveIt Pro includes an experimental LLM-powered Behavior Tree Builder that can help you create Objectives using natural language prompts. This feature leverages large language models to automatically generate Behavior Trees based on your descriptions.

Beta Feature

The LLM Behavior Tree Builder is currently in beta. Generated Objectives may require testing and modification for your specific use case.

Opening the AI Assistant

  1. To use the AI Assistant, you must first open the left Objectives sidebar if it's not already open, then either create a new Objective or open an existing one.

  2. While editing the Behavior Tree, click the AI Assistant icon among the other icons in the bottom-left corner.

Setting Up API Keys

To use the AI Assistant, you'll need to provide your own API key for one of the supported LLM providers. In internal testing, Anthropic's Claude models have produced the best results, and a Claude model is selected by default.

Once you have obtained an API key from one of the supported providers, add it to MoveIt Pro:

  1. Click the settings gear icon to manage your API keys.

  2. In the pop-up window, add your API key. It is saved automatically.

Legal Notice

API keys are stored locally in your browser for security. By entering an API key, you acknowledge that data from this application may be sent to the third-party AI service, and you agree to comply with its privacy policy and terms of use.

Using the LLM Feature

Once your API key is configured, you can start creating Objectives with natural language:

  1. Open the AI Assistant as explained above.

  2. Describe your Objective: Use natural language to describe what you want the robot to do. Example prompt:

    Create an Objective that picks up a bottle on the table
  3. The LLM generates a Behavior Tree for your request. Click the green "Accept Changes" button to apply it.

  4. Test and refine: run the generated Objective and test it in your environment.

Tips for Better Results
  • Be specific: Provide clear, detailed descriptions of what you want the robot to accomplish.
  • Mention context: Include information about your robot setup, environment, or constraints.
  • Start simple: Begin with basic tasks and build complexity incrementally.
  • Iterate: Refine your prompts based on the generated results.

Summary

By completing this tutorial, you've gained hands-on experience with the perception tools in MoveIt Pro that drive real manipulation work: AprilTag fiducial detection, ICP-based point cloud registration, and SAM2/SAM3 ML segmentation — and how they combine in the Pick All Pill Bottles Objective. You also explored teleop recovery and recorded training data for ML.

For grasping novel or deformable objects without a CAD model or fiducial, MoveIt Pro also ships Learning to Grasp (L2G) — an ML model that proposes candidate grasp poses directly from an object point cloud, exposed as the GetGraspPoseFromPointCloud Behavior. See the ML Grasping guide for a full walkthrough.

🎉 Congratulations, you're now ready to move on to the next tutorial!