How to Train an End-to-End Model
This guide walks you through collecting ML training data and then using a Diffusion Policy for end-to-end ML control.
By following this guide, you'll learn how to:
- Collect demonstration data using teleoperation (or an ML oracle)
- Train and export a diffusion-based policy model
- Integrate the policy into a MoveIt Pro Objective for deployment
This is a feature exclusive to the MoveIt Pro Ultimate tier.
Diffusion policy overview
Diffusion policy is a cutting-edge method for learning robot control policies using diffusion models, offering strong empirical performance and solid theoretical foundations. It outperforms traditional behavior cloning approaches, especially in tasks requiring long-horizon planning or complex sensorimotor coordination. The approach benefits from a relatively simple training process and stable convergence, avoiding the complexities of adversarial or reinforcement learning. Its compatibility with standard datasets and frameworks, along with several open-source implementations, has contributed to its rapid adoption in the robotics community. Thanks to its reliability, efficiency, and generalization ability, diffusion policy is increasingly used as a default baseline in state-of-the-art robotics control pipelines.
Teleoperation Setup
MoveIt Pro supports many types of teleoperation, built into the UI, for human-in-the-loop demonstrations. In this tutorial, we recommend you use a Meta Quest device. See our tutorial on setting up the Quest here. Follow that tutorial up to and including the Run the Meta Quest APK section; the rest of it is not necessary for this guide.
MoveIt Pro also supports using Oracles: automated, scripted demonstrations built with our standard Objective approach.
Data collection and processing
PyTorch integration
PyTorch is an open-source machine learning framework that provides flexible tools for building, training, and deploying deep learning models using dynamic computation graphs and GPU acceleration. The framework provides two key concepts for integrating user data into the training process: Datasets and DataLoaders. The PyTorch DataLoader utility efficiently loads data from a dataset in batches, with optional shuffling, parallelism, and preprocessing. The Datasets are responsible for providing access to individual data samples and their labels, defining how data is read, preprocessed, and returned to the DataLoader. We provide the MCAPDataset, which loads ROS bag MCAP-formatted data into a compatible format.
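As an illustration, the following sketch shows how a dataset like this plugs into a DataLoader. The MCAPDataset import path and constructor arguments here are assumptions for illustration only; refer to the training scripts in diffusion_model_trainer for the actual usage.

from torch.utils.data import DataLoader

# NOTE: the import path and constructor argument below are illustrative
# assumptions; see the diffusion_model_trainer scripts for the real API.
from diffusion_model_trainer.dataset import MCAPDataset

dataset = MCAPDataset("{PATH_TO_DATA_DIRECTORY}")

# The DataLoader batches, shuffles, and parallelizes loading of the
# samples that the Dataset exposes.
loader = DataLoader(dataset, batch_size=64, shuffle=True, num_workers=4)

for batch in loader:
    # Each batch holds synchronized observations (joint states, images)
    # and the corresponding action targets used to train the policy.
    pass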
For the MCAPDataset to function correctly, the recorded ROS bag must contain the following topics:
- /joint_states_synchronized (sensor_msgs/msg/JointState)
- /joint_commands_synchronized (sensor_msgs/msg/JointState)
- /wrist_camera/color_synchronized (sensor_msgs/msg/Image)
- /scene_camera/color_synchronized (sensor_msgs/msg/Image)
- /demonstration_indicator (std_msgs/msg/String)
The topics should be synchronized. The best way to achieve this is to run a node that subscribes to all needed topics and republishes them. We provide the diffusion_policy_observation_recorder node to do exactly that.
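For intuition, one common way to synchronize topics in ROS 2 is message_filters. The following is a conceptual sketch only: the provided node additionally resamples at a fixed period dt, which this sketch omits, and the camera topic name is a placeholder.

import rclpy
from rclpy.node import Node
from message_filters import Subscriber, ApproximateTimeSynchronizer
from sensor_msgs.msg import Image, JointState

class SyncRepublisher(Node):
    def __init__(self):
        super().__init__("sync_republisher")
        joint_sub = Subscriber(self, JointState, "/joint_states")
        image_sub = Subscriber(self, Image, "/wrist_camera/color")  # placeholder topic
        self.joint_pub = self.create_publisher(JointState, "/joint_states_synchronized", 10)
        self.image_pub = self.create_publisher(Image, "/wrist_camera/color_synchronized", 10)
        # Pair up messages whose timestamps fall within 50 ms of each other.
        self.sync = ApproximateTimeSynchronizer([joint_sub, image_sub], queue_size=10, slop=0.05)
        self.sync.registerCallback(self.republish)

    def republish(self, joint_msg, image_msg):
        # Republish the time-aligned pair on the synchronized topics.
        self.joint_pub.publish(joint_msg)
        self.image_pub.publish(image_msg)

def main():
    rclpy.init()
    rclpy.spin(SyncRepublisher())

if __name__ == "__main__":
    main()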
diffusion_policy_observation_recorder node
The diffusion_policy_observation_recorder node is responsible for subscribing to sensor topics, including joint states and camera images, and republishing the data on synchronized topics. The following parameters are configurable.
- joints: The names of the joints to record. The configured joints must be contained in the robot model defined by the SRDF.
- wrist_camera_topic: The name of the wrist camera topic to record. The images will be republished to the /wrist_camera/color_synchronized topic.
- scene_camera_topic: The name of the scene camera topic to record. The images will be republished to the /scene_camera/color_synchronized topic.
- joint_states_topic: The name of the topic providing the joint state data, usually /joint_states. The joint states will be republished to the /joint_states_synchronized topic.
- joint_commands_topic: The name of the topic providing the joint command data. The default value is /joint_states, but model performance is generally better if you publish the commanded values to a dedicated topic, e.g., /joint_commands, and configure that topic here. The joint commands will be republished to the /joint_commands_synchronized topic.
- dt: The publishing period for the synchronized topics. A value of 0.1 seconds is a reasonable default. Tasks requiring higher reactivity, e.g., pushing a block, benefit from a smaller dt; tasks that are more forgiving of slower reaction can tolerate a larger dt. Some trial and error will be necessary to determine the best value for this parameter.
- image_size: Defaults to [240, 320]. This parameter rescales the image topics before republishing. The default is likely a good value for most applications, though you can experiment to find the best size for a given application.
As an example, the following can be added to a site config launch file to include this node:
<node pkg="moveit_pro_ml_ros" exec="diffusion_policy_observation_recorder_main">
<param name="joints" value="[joint_1, joint_2, joint_3, joint_4, joint_5, joint_6, joint_7, robotiq_85_left_knuckle_joint]" />
<param name="wrist_camera_topic" value="/zed2i_camera/right/image_raw_color" />
<param name="scene_camera_topic" value="/zed2i_camera/left/image_raw_color" />
<param name="joint_states_topic" value="/joint_states" />
<param name="joint_commands_topic" value="/joint_commands" />
<param name="dt" value="0.1" />
<param name="image_size" value="[240, 320]" />
</node>
ROS Bag Data Collection
Record the data with the following ROS bag command.
ros2 bag record --max-cache-size 32000000000 --snapshot-mode -s mcap /robot_description /joint_states_synchronized /joint_commands_synchronized /demonstration_indicator /wrist_camera/color_synchronized /scene_camera/color_synchronized -o {PATH_TO_DATA_DIRECTORY}
You can now begin data collection:
- Manually perturb the scene, e.g., move the robot to a novel start state and randomize the positions of any objects to be interacted with.
- Publish a string with data “Starting demonstration” to the demonstration indicator topic.
- Provide a demonstration, either by teleoperation or an oracle policy.
- Publish a string with data “Stopping demonstration” to the demonstration indicator topic.
- Go to step 1 and repeat.
A general rule is to collect between 50 and 150 demonstrations for a task. The data should cover a wide range of conditions, and it is important to include demonstrations of recovering from various failure modes.
About training with ROS topic interfaces
The control logic of data collection is designed to be implemented via ROS topics and services. The ROS bag MCAP data loader used in our PyTorch training scripts relies on a string topic called /demonstration_indicator. Whenever a demonstration begins, a string with the data Starting demonstration is expected to be published to the topic. When the demonstration finishes, a string with the data Stopping demonstration is expected to be published. If a start demonstration message is published while a demonstration is already active, that demonstration will be restarted.
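As a sketch, assuming you are publishing the boundary markers manually, a minimal rclpy script could look like this:

import time

import rclpy
from rclpy.node import Node
from std_msgs.msg import String

def main():
    rclpy.init()
    node = Node("demonstration_indicator_publisher")
    pub = node.create_publisher(String, "/demonstration_indicator", 10)
    time.sleep(1.0)  # give the bag recorder time to discover the publisher

    # Mark the start of the demonstration.
    pub.publish(String(data="Starting demonstration"))

    input("Demonstrate the task, then press Enter to stop...")

    # Mark the end of the demonstration.
    pub.publish(String(data="Stopping demonstration"))
    time.sleep(0.5)  # let the last message flush before shutting down

    node.destroy_node()
    rclpy.shutdown()

if __name__ == "__main__":
    main()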
The provided ROS bag record command runs in snapshot mode, which creates a service named /rosbag2_recorder/snapshot. When the service is called, the data currently buffered in the bag will be saved to a chunk. This is beneficial because we want the chunks to line up with the demonstrations, so we avoid collecting too much or too little data.
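For example, after each Stopping demonstration message you can trigger a snapshot from another terminal. This assumes the default recorder node name from the command above and the Snapshot service type that ships with rosbag2:

ros2 service call /rosbag2_recorder/snapshot rosbag2_interfaces/srv/Snapshot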
Finally, our data collection Objective subscribes to a topic for end effector control. We recommend a VR headset for use with the Objective. We have also configured a data collection Objective for direct joint control, which can be used with leader-follower setups or GELLO models.
Collecting Data with Oracle Policies
In this guide, we explored collecting teleoperation data, but MoveIt Pro can also be used to create Objectives that automatically generate training data without human interaction.
To do this, create and run an Objective and record the trajectory and camera data like you would with normal teleop.
We recommend using Model Predictive Control (MPC) Behaviors to get smooth, consistent data. See the example Pushing Sim Objective in the diffusion policy workspace.
Train the model
Local setup
Prerequisites for your training PC
- NVIDIA GPU with 16 GB VRAM or better (e.g., RTX 4080)
- 32 GB RAM recommended
- 1 TB disk space recommended for collecting data
- MoveIt Pro Ultimate license
You will need around 10 hours to train a new skill.
Cloud instance
We do not officially support any particular GPU cloud provider. MoveIt Pro requires access to either a virtual machine or a bare metal instance. The minimum requirements are an NVIDIA GPU with at least 16 GB of VRAM, 32 GB of CPU RAM, and 50 GB of disk space. We have verified hyperbolic.ai as a viable service for GPU-powered model training. Training a new diffusion policy on an NVIDIA H100 80GB GPU should cost around $10. You will need to transfer your data to the instance for training. The data is expected to exceed 1 GB, so a remote storage option is likely needed. One option is to add the training data to the git repository for your workspace. You must set up git LFS for this to be a viable option.
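For example, to track the recorded MCAP files with git LFS before committing them (illustrative commands; adjust the pattern and path to your workspace layout):

git lfs install
git lfs track "*.mcap"
git add .gitattributes {PATH_TO_DATA_DIRECTORY}
git commit -m "Add diffusion policy training data"
git push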
Install MoveIt Pro by following the MoveIt Pro installation instructions. You will likely need to follow the steps for installing Docker. Make sure to follow the steps for installing the NVIDIA Container Toolkit, which enables the Docker container to access the GPU for training.
Training the model
Once MoveIt Pro is installed and configured, clone the diffusion_model_trainer GitHub repository and add your workspace as a submodule. Make sure to clone the repository recursively to pull in all of the submodules. You also need to set up git LFS for the large model files.
git clone https://github.com/PickNikRobotics/diffusion_model_trainer.git
cd diffusion_model_trainer
git submodule update --init --recursive
git lfs install
git lfs pull
Contact support if you are an Ultimate licensed customer and need help accessing this repository.
After the first iteration of data collection is complete, you can run a dev container with the following command.
moveit_pro dev
This command will automatically configure the environment specified with pixi.
Please note that the Python version in the environment may differ from the installed Python version of ROS 2, so you may run into errors when running ROS 2 commands. It is best to use the dev container only for training and exporting the model. If you need to run ROS 2 commands, you can run moveit_pro shell in another terminal.
To train the model, run the following command.
ros2 run diffusion_model_trainer train_diffusion_model -- --config-name=train_diffusion_model task.dataset_path={PATH_TO_DATA_DIRECTORY}
This will train the model and create checkpoints every 50 epochs. We recommend periodically exporting the ONNX model and testing on the robot or in simulation to validate the model.
The diffusion model training is configured via Hydra. Hydra is a Python framework for managing complex configurations, enabling you to compose, override, and organize settings dynamically from multiple sources like YAML files and command-line arguments. Hydra has become widely adopted specifically for configuring model hyperparameters. The Hydra YAML should be configured on a per-application basis.
You need to set the following parameters for your application (an example follows the list):
- In joint_based_task.yaml:
  - dataset.joint_names
  - dataset.joint_states_topic
  - dataset.joint_commands_topic
  - shape_meta.obs./joint_states_synchronized.shape
  - shape_meta.obs.action.shape
- Note: We only support the DDIMScheduler.
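For illustration, these settings for the eight-joint arm configured earlier might look as follows. The exact schema comes from the provided joint_based_task.yaml; treat every value here as a placeholder to adapt to your robot and recorder configuration.

# Illustrative values only; match these to your robot and recorder settings.
dataset:
  joint_names: [joint_1, joint_2, joint_3, joint_4, joint_5, joint_6, joint_7, robotiq_85_left_knuckle_joint]
  joint_states_topic: /joint_states_synchronized
  joint_commands_topic: /joint_commands_synchronized
shape_meta:
  obs:
    /joint_states_synchronized:
      shape: [8]  # one entry per configured joint
    action:
      shape: [8]  # action dimension; here, one command per joint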
Deploy the model
MoveIt Pro uses ONNX to execute models within its Behavior Tree framework. Since model publishers typically do not provide scripts for exporting PyTorch models, we include ONNX export scripts to enable the use of diffusion policies with MoveIt Pro.
Prerequisites for Robot PC
- NVIDIA GPU with 8 GB VRAM or better (e.g., RTX 4080)
- MoveIt Pro Ultimate license
Exporting the model
In a MoveIt Pro dev container, run the following command.
ros2 run diffusion_model_trainer export_onnx_diffusion_unet_image_workspace -- --config-name=export_onnx_model task.dataset_path={PATH_TO_DATA_DIRECTORY}
This will create two ONNX models in the data directory. You can copy those into your site config's models directory for use with MoveIt Pro. Create the models directory if necessary. You will need to run colcon build if this is your first time copying the models into that directory.
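Before deploying, you can sanity-check an exported model with onnxruntime in the dev container. A minimal sketch follows; the file name is a placeholder for one of the exported models.

# Verify that an exported model loads and inspect its input/output signature.
import onnxruntime as ort

session = ort.InferenceSession("policy.onnx")  # placeholder file name

for tensor in session.get_inputs():
    print("input:", tensor.name, tensor.shape, tensor.type)
for tensor in session.get_outputs():
    print("output:", tensor.name, tensor.shape, tensor.type)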
Running in MoveIt Pro
Create an Objective that uses the DiffusionPolicy Behavior. It depends on the JTAC controller, so you will need to make sure that controller is activated before running the DiffusionPolicy Behavior. Make sure the exported ONNX models are installed in a site config; we recommend putting them in a directory called models. Run the Objective and verify the policy in action.
Conclusion
You have now completed the end-to-end setup for the Diffusion Policy in MoveIt Pro. Starting from demonstration data collection using either teleoperation or an ML oracle, through training and exporting the model as ONNX, and finally integrating the policy into a MoveIt Pro Objective, you have transformed raw demonstrations into a deployable learned controller.