Skip to main content
Version: 9

Transforms Troubleshooting

A Behavior that performs a TF lookup (for example, GetMasks3DFromMasks2D, TransformPoseFrame, CalculatePoseOffset) fails when the requested transform is not available at the requested time. The specific TF2 error text tells you exactly which failure mode occurred, and the remedies are different. Find the error text in the list below and follow the guidance for that specific case.

Read the full error message

Every TF2 lookup failure includes a message from the underlying tf2::TransformException. The message varies — match on the exact phrase to figure out which section applies:


Extrapolation into the Past

Error text: Lookup would require extrapolation into the past. Requested time {t0} but the earliest data is at time {t1}

What it means: The Behavior asked for a transform at a time ({t0}) that is older than anything currently in the TF buffer for at least one edge of the lookup chain. The buffer retains at most cache_time (default 10 s) of history per dynamic transform, so any query older than that gets rejected.

Common causes

  • Consumer-side CPU starvation. The Behavior's pipeline (ML inference, planning, point-cloud processing, etc.) is slow enough that by the time it calls lookupTransform with a sensor message's header.stamp, the TF subscription has already received newer transforms and pruned the relevant old ones out of the buffer. The bigger the processing delay, the further in the "past" the query looks.
  • Stale header.stamp consumption. The Behavior is using a header.stamp from a message that was queued upstream (for example, a point cloud sitting in a queue between two Behaviors) longer than cache_time.
  • use_sim_time mismatch. If the publisher runs on wall-clock time but the consumer is using simulation time (/clock), or vice versa, timestamps from one clock appear in the distant past of the other.
  • Clock desync on a multi-PC setup. If the publisher's clock is significantly ahead of the consumer's clock, the publisher's stamps look as if they are in the future of the buffer — paradoxically reported as "into the past" from the consumer's perspective when the buffer fills with those future-looking messages and then the consumer's own lookup time falls below the buffer's earliest.

Diagnose

# CPU load at the moment of failure — if cores are saturated, this is consumer starvation.
top -b -n1 -H | head -40

# Measure pipeline latency: for the affected sensor, compare header.stamp to now().
ros2 topic echo /<sensor_topic> --field header.stamp --once
date +%s.%N

# use_sim_time parity — all TF subscribers and publishers must agree.
ros2 param get /objective_server_node use_sim_time
ros2 param get /robot_state_publisher use_sim_time

# Clock sync on multi-PC setups (run on each host, compare).
chronyc tracking

Mitigate

  • Reduce pipeline latency so the Behavior consumes each message promptly. Profile the hot loop and move expensive work off the critical path if possible.
  • Increase the TF buffer's cache_time when constructing tf2_ros::Buffer if transient slowness is unavoidable and the Behavior logic tolerates older lookups. Default is 10 s.
  • Use tf2::TimePointZero in the lookup if "most recent" is acceptable (bypasses the stamp entirely).
  • Fix clock parity between publishers and consumers — either all use wall clock or all use sim time; run chrony on every host and confirm sub-millisecond offsets.

Extrapolation into the Future

Error text: Lookup would require extrapolation into the future. Requested time {t0} but the latest data is at time {t1}

What it means: The Behavior asked for a transform at a time ({t0}) that is newer than the newest entry in the TF buffer for at least one dynamic edge of the chain. The publisher is either behind, stopped, or on a clock that lags the consumer's.

Common causes

  • Publisher-side CPU starvation. The node that publishes the dynamic transform (for example, robot_state_publisher fed by joint_state_broadcaster) is falling behind its publish rate. {t1} slips backward from wall-clock now() until the gap exceeds the Behavior's lookup timeout.
  • Publisher died or wedged. {t1} is frozen at the time the publisher last produced a message. The gap grows linearly with wall-clock time.
  • Network backpressure on multi-PC setups. TF messages are queued up on the publisher host or in DDS transport, and the consumer's subscription falls behind.
  • use_sim_time mismatch in the opposite direction from the "into the past" case — publisher on sim time, consumer on wall clock.
  • Clock desync with the publisher's clock behind the consumer's.

Diagnose

# Is the /tf stream live at the expected rate?
ros2 topic hz /tf

# Who publishes to /tf on the consumer's domain?
ros2 topic info /tf --verbose

# Per-edge publish rate and average delay — shows the offender's exact edge.
ros2 run tf2_ros tf2_monitor <source_frame> <target_frame>

# CPU load on the publisher host and on the consumer host during failure.
top -b -n1 -H | head -40

# Clock sync between hosts on multi-PC setups.
chronyc tracking

Mitigate

  • Identify the falling-behind publisher from tf2_monitor and give it more CPU (lower the load on its host, raise its priority with chrt, or move it to a less contended host).
  • Increase the Behavior's lookupTransform timeout modestly if the publisher is just slightly slow. This is a tactical fix — it does not solve a wedged publisher.
  • Restart the publisher if it has stopped producing messages. Wedged /tf publishers are often a symptom of a deadlocked driver or node; investigate the stuck process rather than just restarting.
  • Fix clock / sim-time parity across all hosts.

Extrapolation at Time

Error text: Lookup would require extrapolation at time {t0}, but only time {t1} is in the buffer

What it means: The TF buffer has exactly one entry in a TimeCache for at least one frame in the lookup chain, and it does not match the requested time. A cache with only one entry cannot interpolate. Critically, a frame published on /tf_static would not fail this way — /tf_static stores its data in a StaticCache that never raises extrapolation. The presence of this error means a dynamic publisher seeded (or overwrote) the cache with a single one-shot message.

Common cause: one-shot dynamic publisher corrupts a semantically-static frame

A Behavior uses tf2_ros::TransformBroadcaster (dynamic, /tf) to publish a calibration-like pose once and then returns. The child_frame_id happens to match a frame that the system otherwise expects to be static (for example, a Realsense optical frame like scene_camera_color_optical_frame). The dynamic publish writes a single entry to the subscriber's TF buffer; the lookup then throws extrapolation against the frozen {t1} stamp and keeps doing so forever, because the publisher does not emit any more messages to advance the newest stamp.

Giveaways:

  • {t1} does not change across repeated failures within a session.
  • Restarting the Behavior's host process (or agent_bridge) temporarily "fixes" the issue because the TF buffer is rebuilt from scratch — until the offending publisher runs again.
  • The offending frame is one you would otherwise expect to be static (a camera extrinsic, a calibrated mount, a URDF-derived frame).

Other cause: publisher stopped and all but its last message aged out

A previously healthy dynamic publisher stopped publishing. Over time, cache_time pruned older entries until only the most recent remains. Less common in practice because most healthy dynamic publishers run at rates much higher than a 10 s window.

Diagnose

# Which publishers are on /tf?  Compare against expectations — spurious publishers are the usual suspect.
ros2 topic info /tf --verbose

# Capture a few seconds of /tf and see which child_frame_ids appear. Anything that should be static (camera optical frames, calibrated mounts) is a bug.
timeout 10 ros2 topic echo /tf > /tmp/tf.txt 2>/dev/null
awk '/^[[:space:]]*frame_id:/{p=$2} /^[[:space:]]*child_frame_id:/{print p" -> "$2}' /tmp/tf.txt | sort | uniq -c | sort -rn

# Compare against /tf_static — any frame appearing in BOTH topics with different parents is a poisoning candidate.
timeout 3 ros2 topic echo /tf_static | grep -E "child_frame_id" | sort -u

# Per-edge latest-stamp view. Frozen delays across multiple iterations name the stuck edge.
ros2 run tf2_ros tf2_monitor <source_frame> <target_frame>

Mitigate

  • Find the Behavior that publishes the offending dynamic transform and either remove the broadcast (if it was a side effect not part of the Behavior's documented contract) or switch it to tf2_ros::StaticTransformBroadcaster (if the transform is semantically static). StaticTransformBroadcaster publishes on /tf_static with TRANSIENT_LOCAL durability and allocates a StaticCache on subscribers, which never throws extrapolation.
  • Do not use tf2_ros::TransformBroadcaster with a child_frame_id that might match an already-published frame. Any one-shot dynamic publish will overwrite the subscriber's cached value. If you need to publish a detection or a user-supplied pose into TF, use a frame name that cannot collide with real URDF or driver-owned frames.

Missing Frames

Error text: canTransform(...) returned false, or "{frame}" does not exist, or messages from tf2::LookupException that state a frame is not known.

What it means: The buffer has no record of a frame by that name — no publisher has announced it on /tf or /tf_static that the consumer has seen.

Common causes

  • Typo in the frame name (header.frame_id or a Behavior input port).
  • Publisher not running — the node that owns this frame was not launched, crashed on startup, or exited. Common with camera drivers that publish their optical frames on /tf_static once at startup and then die on a USB error.
  • DDS discovery failure on multi-PC setups — the consumer's DDS domain did not see the publisher. Check ROS_DOMAIN_ID, ROS_LOCALHOST_ONLY, and firewall rules.
  • /tf_static transient-local QoS mismatch — if a subscriber uses a QoS profile incompatible with /tf_static's TRANSIENT_LOCAL, the latched message is silently dropped. The stock tf2_ros::TransformListener does this correctly; custom subscribers may not.

Diagnose

# Does the frame show up anywhere?
ros2 run tf2_tools view_frames # produces frames.pdf with the full current tree

# Who publishes the expected static frames?
ros2 topic info /tf_static --verbose

# Any late-binding QoS issues? Look for incompatibility warnings.
ros2 topic info /tf_static --verbose # compare Durability/Reliability across publishers and subscribers

Mitigate

  • Fix the frame name if it was a typo. Frame names are case-sensitive and free-form strings.
  • Restart the publisher, or fix its crash-on-startup condition. dmesg | grep -i usb helps for camera drivers.
  • Verify DDS domain settings match across hosts, and check ros2 node list from the consumer's host to confirm the publisher is discoverable.

Disconnected TF Tree

Error text: Could not find a connection between '{target}' and '{source}' because they are not part of the same tree

What it means: Both frames exist in the buffer, but there is no path of transforms connecting them. Usually one side is an isolated sub-tree.

Common causes

  • Missing root-linking transform. For example, a calibrated camera has been added to the tree with a parent frame that itself is not connected back to world or the robot's base.
  • Multiple publishers disagree on a frame's parent. The last publisher to broadcast a given frame "wins" the parent relationship, and the others' sub-trees become orphaned. The TF tree only allows one parent per frame.
  • A node that provided a bridging transform (e.g., a calibration publisher) died and the child sub-tree is now floating.

Diagnose

# Generate the full tree visualization and look for the disconnected island.
ros2 run tf2_tools view_frames
# Open frames.pdf — the island containing your source frame will be visibly detached from the robot tree.

# Enumerate every edge currently on /tf_static.
timeout 3 ros2 topic echo /tf_static | grep -E "^\s*(frame_id|child_frame_id):" | paste - -

Mitigate

  • Add the missing bridging transform. For example, publish world → <calibrated_camera_link> as a static transform so the camera's optical sub-tree attaches to the robot tree.
  • Enforce a single publisher per child frame. If two nodes both want to publish scene_camera_link, consolidate them — put the transform in the URDF (published by robot_state_publisher), or assign each node a distinct child frame name.

General diagnostics and patterns

tf2_monitor is the single most useful command

ros2 run tf2_ros tf2_monitor <source_frame> <target_frame>

Prints every edge in the chain with its publish frequency, average delay, and most recent transform time. Frozen delays across multiple iterations name the stuck edge; drifting delays show a publisher falling behind.

Inspect /tf and /tf_static publishers

ros2 topic info /tf --verbose
ros2 topic info /tf_static --verbose

Look for: unexpected publishers, QoS mismatches (/tf_static must be TRANSIENT_LOCAL), and publisher counts larger than intended.

Watch the dynamic /tf stream live

timeout 10 ros2 topic echo /tf > /tmp/tf.txt 2>/dev/null
awk '/^[[:space:]]*frame_id:/{p=$2} /^[[:space:]]*child_frame_id:/{print p" -> "$2}' /tmp/tf.txt | sort | uniq -c | sort -rn

Shows each parent→child edge that was broadcast and how many times. Low-count edges (1–2 messages in 10 s) are one-shot publishers — the root cause of most Extrapolation at time errors.

Converting the frozen {t1} value to a timestamp

When {t1} appears in an extrapolation error, convert it to a wall-clock time:

date -u -d @<seconds>

Compare against recent events — driver startup, agent_bridge restart, the last time the suspect Behavior ticked. The event matching {t1} names the publisher.

Design patterns to avoid

  • Do not use tf2_ros::TransformBroadcaster to publish semantically-static data. Every dynamic publication seeds a TimeCache on every subscriber, with a fixed stamp. If the publisher stops, the cache entry goes stale and lookups throw extrapolation. Use tf2_ros::StaticTransformBroadcaster for calibrated offsets and other fixed relationships.
  • Do not reuse existing frame names as child_frame_id in a dynamic publication. Overwriting a frame that the system relies on (for example, a Realsense optical frame) is a silent way to corrupt the TF tree for every other subscriber.
  • Prefer tf2::TimePointZero in internal Behavior-to-Behavior TF lookups when "most recent" is acceptable. It bypasses the stamp-matching path entirely and avoids all three extrapolation modes.