Observation History and Delay#
Observations have two temporal features: history and delay. History stacks past frames to provide temporal context, while delay models sensor latency and slower-than-control refresh rates.
TL;DR#
Add history to stack frames:
from mjlab.managers.manager_term_config import ObservationTermCfg

joint_vel: ObservationTermCfg = ObservationTermCfg(
    func=joint_vel,
    history_length=5,           # Keep the last 5 frames
    flatten_history_dim=True,   # Flatten for MLP: 12 dims × 5 frames → (60,)
)
Add delay to model sensor latency:
# At 50Hz control (20ms/step): lag=2-3 → 40-60ms latency
camera: ObservationTermCfg = ObservationTermCfg(
    func=camera_obs,
    delay_min_lag=2,
    delay_max_lag=3,
)
Combine both:
joint_pos: ObservationTermCfg = ObservationTermCfg(
    func=joint_pos,
    delay_min_lag=1,
    delay_max_lag=3,            # Delayed observations
    history_length=5,           # Stack 5 delayed frames
    flatten_history_dim=True,
)
# Pipeline: compute → delay → stack → flatten
Observation History#
History stacks past observations to provide temporal context.
Basic Usage#
Flattened history (for MLPs):
joint_vel: ObservationTermCfg = ObservationTermCfg(
    func=joint_vel,             # Returns (num_envs, 12)
    history_length=3,
    flatten_history_dim=True,   # Output: (num_envs, 36)
)
Structured history (for RNNs):
joint_vel: ObservationTermCfg = ObservationTermCfg(
    func=joint_vel,             # Returns (num_envs, 12)
    history_length=3,
    flatten_history_dim=False,  # Output: (num_envs, 3, 12)
)
Group-Level Override#
Apply history to all terms in a group:
@dataclass
class PolicyCfg(ObservationGroupCfg):
    concatenate_terms: bool = True
    history_length: int = 5             # Applied to all terms
    flatten_history_dim: bool = True

    joint_pos: ObservationTermCfg = ObservationTermCfg(func=joint_pos)
    joint_vel: ObservationTermCfg = ObservationTermCfg(func=joint_vel)
    # Both terms get 5-frame history, flattened
Term-level settings override group settings:
@dataclass
class PolicyCfg(ObservationGroupCfg):
    history_length: int = 3             # Default for group

    joint_pos: ObservationTermCfg = ObservationTermCfg(
        func=joint_pos,
        history_length=5,               # Override: use 5 instead of 3
    )
Reset Behavior#
History buffers are cleared on environment reset. The first observation after reset is backfilled across all history slots, ensuring valid data from step 0.
# At reset
buffer = [obs_0, obs_0, obs_0] # Backfilled
# After 2 steps
buffer = [obs_0, obs_1, obs_2] # Normal accumulation
History Flattening Order (Term-Major vs Time-Major)#
When flatten_history_dim=True and concatenate_terms=True, mjlab uses
term-major ordering, where each term’s full history is flattened before
concatenating terms:
Term A: shape (num_envs, obs_dim_A) with history_length=3
Term B: shape (num_envs, obs_dim_B) with history_length=3
mjlab output (TERM-MAJOR):
[A_t0, A_t1, A_t2, B_t0, B_t1, B_t2, ...]
└─ all A history ─┘ └─ all B history ─┘
An alternative approach is time-major (or frame-major) ordering, where complete observation frames are built at each timestep before concatenating across time:
TIME-MAJOR (alternative approach):
[A_t0, B_t0, ..., A_t1, B_t1, ..., A_t2, B_t2, ...]
└─ frame t0 ──┘ └─ frame t1 ──┘ └─ frame t2 ──┘
Sim2sim compatibility: If you need to transfer policies to/from frameworks that use time-major ordering, you will need to reorder observations. This affects policies trained with history but not those without.
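For instance, a minimal PyTorch sketch of such a reordering (the term dimensions, history length, and function name here are illustrative, not part of mjlab's API):

import torch

def term_major_to_time_major(obs: torch.Tensor, term_dims: list[int], history_length: int) -> torch.Tensor:
    """Reorder a flattened term-major observation into time-major order.

    obs is laid out per environment as [A_t0..A_tN, B_t0..B_tN, ...];
    the result is laid out as [A_t0, B_t0, ..., A_tN, B_tN, ...].
    """
    num_envs = obs.shape[0]
    per_term = []
    offset = 0
    for dim in term_dims:
        # Slice out this term's full history and view it as (envs, time, dim).
        block = obs[:, offset:offset + dim * history_length]
        per_term.append(block.reshape(num_envs, history_length, dim))
        offset += dim * history_length
    # Rebuild complete frames timestep by timestep, then concatenate across time.
    frames = [term[:, t, :] for t in range(history_length) for term in per_term]
    return torch.cat(frames, dim=-1)

# Example: a 12-dim term and a 3-dim term, each with a 5-frame history.
# time_major_obs = term_major_to_time_major(obs, term_dims=[12, 3], history_length=5)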
Observation Delay#
Real robots have sensors with communication delays (WiFi, USB) and varying refresh rates (30Hz camera, 100Hz encoders). The delay system models both sensor latency and slower-than-control refresh rates.
Delay Parameters#
delay_min_lag / delay_max_lag (default: 0)
Lag range in steps. Uniformly samples an integer lag from [min_lag, max_lag] (both inclusive) each update. lag=0 means current observation, lag=2 means 2 steps ago.
delay_per_env (default: True)
If True, each environment gets a different lag. If False, all environments share the same lag.
delay_hold_prob (default: 0.0)
Probability [0, 1] of keeping the previous lag instead of resampling.
delay_update_period (default: 0)
How often (in steps) to resample the lag and potentially get a new observation. If 0, resample every step. If > 0, the observation may repeat for N steps (models sensors that refresh slower than the control rate).
delay_per_env_phase (default: True)
If True, each environment has a different phase offset for the update period (staggers refresh times).
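Taken together, a term that exercises all of these knobs might look like the following sketch (the term name and observation function are placeholders):

# Hypothetical noisy sensor at 50Hz control: 20-60ms latency, ~25Hz refresh,
# independent per-env lags that are held (not resampled) 30% of the time.
range_sensor: ObservationTermCfg = ObservationTermCfg(
    func=range_sensor_obs,      # placeholder observation function
    delay_min_lag=1,            # 20ms
    delay_max_lag=3,            # 60ms
    delay_per_env=True,         # each env samples its own lag
    delay_hold_prob=0.3,        # 30% chance to keep the previous lag
    delay_update_period=2,      # new reading every 2 steps (25Hz)
    delay_per_env_phase=True,   # stagger refresh phases across envs
)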
Understanding Delay vs Multi-Rate#
Delay and multi-rate are orthogonal concepts that model different real-world phenomena:
Delay (delay_min_lag / delay_max_lag): Models sensor latency / communication delay. Controls how old the observation is.
Multi-rate (delay_update_period): Models sensor refresh rate. Controls how often the sensor produces a new reading.
Visualizing the difference (50Hz control = 20ms/step):
Sensor captures: A B C D E F G H
↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓
Control steps: 0 1 2 3 4 5 6 7
20ms 40ms 60ms 80ms 100ms 120ms 140ms 160ms
No delay, no multi-rate (baseline - perfect sensor):
You receive: A B C D E F G H
↑ current observation every step
Delay only (lag=2, no update_period):
You receive: - - A B C D E F
↑ ↑ ↑ ↑ ↑ ↑
40ms 40ms 40ms 40ms 40ms 40ms delay
Every step gets a NEW observation, just 40ms old
Multi-rate only (update_period=2, no lag):
You receive: A A C C E E G G
↑same ↑ ↑same ↑ ↑same ↑ ↑same ↑
Observations update every 2 steps (25Hz refresh)
Steps 1,3,5,7 repeat previous observation
Both delay + multi-rate (lag=2, update_period=2):
Sensor captures: A B C D E F G H
You receive: - - A A C C E E
↑same ↑ ↑same ↑ ↑same ↑
40ms delayed + only refreshes every 2 steps
Models 25Hz camera with 40ms latency
Real-world example - 30Hz camera at 50Hz control with 40ms latency:
camera: ObservationTermCfg = ObservationTermCfg(
    func=camera_obs,
    delay_min_lag=2,            # 40ms latency
    delay_max_lag=2,
    delay_update_period=2,      # 25Hz refresh (approximates 30Hz)
)
Common mistake: Using only delay_min_lag=2, delay_max_lag=2 gives you
40ms latency but you still get 50 different camera frames per second. You need
delay_update_period=2 to model the slower refresh rate.
Computing Delays from Real-World Latency#
Convert real-world latency to simulation steps:
delay_steps = latency_ms / (1000 / control_hz)
Example at 50Hz control (20ms per step):
40ms latency = 40 / 20 = 2 steps
60ms latency = 60 / 20 = 3 steps
100ms latency = 100 / 20 = 5 steps
Example at 100Hz control (10ms per step):
40ms latency = 40 / 10 = 4 steps
60ms latency = 60 / 10 = 6 steps
Note
Delays are quantized to control timesteps. At 50Hz control (20ms/step),
you can only represent 0ms, 20ms, 40ms, 60ms, etc. To approximate a 45ms sensor,
use delay_min_lag=2, delay_max_lag=3 which uniformly samples lag ∈ {2, 3}
(both inclusive), giving either 40ms or 60ms delay.
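If you prefer to script the conversion, a small helper along these lines works (this is just a sketch, not an mjlab utility):

import math

def latency_to_lag_range(latency_min_ms: float, latency_max_ms: float, control_hz: float) -> tuple[int, int]:
    """Convert a latency range in ms to (delay_min_lag, delay_max_lag) in control steps.

    Rounds outward so the sampled lag range brackets the true latency.
    """
    step_ms = 1000.0 / control_hz
    return math.floor(latency_min_ms / step_ms), math.ceil(latency_max_ms / step_ms)

print(latency_to_lag_range(40, 60, 50))    # (2, 3) at 50Hz control
print(latency_to_lag_range(45, 45, 50))    # (2, 3): a 45ms sensor straddles two steps
print(latency_to_lag_range(40, 60, 100))   # (4, 6) at 100Hz control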
Computing Multi-Rate Updates#
Convert sensor refresh rate to update period:
update_period = control_hz / sensor_hz
Example at 50Hz control:
30Hz camera: update_period = 50 / 30 ≈ 2 steps → actual 25Hz (error: -17%)
25Hz LiDAR: update_period = 50 / 25 = 2 steps → actual 25Hz (exact)
10Hz GPS: update_period = 50 / 10 = 5 steps → actual 10Hz (exact)
Example at 100Hz control:
30Hz camera: update_period = 100 / 30 ≈ 3 steps → actual 33.3Hz (error: +11%)
50Hz IMU: update_period = 100 / 50 = 2 steps → actual 50Hz (exact)
Note
Since update_period must be an integer, sensor rates that don’t evenly
divide the control frequency can only be approximated. For example, 30Hz at 50Hz
control needs update_period=1.67, so round to 2 → 25Hz (17% error). Higher control
frequencies reduce quantization error (100Hz control approximates 30Hz as 33.3Hz
with only 11% error).
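The same kind of helper can cover refresh rates (again a sketch, and it assumes the sensor is no faster than the control loop):

def refresh_to_update_period(sensor_hz: float, control_hz: float) -> int:
    """Convert a sensor refresh rate to delay_update_period in control steps (rounded)."""
    return max(1, round(control_hz / sensor_hz))

print(refresh_to_update_period(30, 50))    # 2 → effective 25Hz
print(refresh_to_update_period(30, 100))   # 3 → effective ~33.3Hz
print(refresh_to_update_period(10, 50))    # 5 → exact 10Hz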
Examples#
Joint encoders (100Hz, no delay) at 50Hz control:
joint_pos: ObservationTermCfg = ObservationTermCfg(func=joint_pos)
# delay_min_lag=delay_max_lag=0 by default.
Camera (30Hz, 40-60ms latency) at 50Hz control:
# 30Hz camera: update_period = 50/30 ≈ 2 → actually 25Hz (17% error, acceptable)
# 40-60ms latency = 2-3 steps at 50Hz (20ms/step)
camera: ObservationTermCfg = ObservationTermCfg(
    func=camera_obs,
    delay_min_lag=2,            # 40ms
    delay_max_lag=3,            # 60ms
    delay_update_period=2,      # 25Hz (approximates 30Hz)
    delay_per_env_phase=True,   # Staggered refresh across envs
)
Mixed-rate system at 50Hz control:
@dataclass
class PolicyCfg(ObservationGroupCfg):
    # Fast encoders (no delay)
    joint_pos: ObservationTermCfg = ObservationTermCfg(
        func=joint_pos,
        # delay_min_lag=0, delay_max_lag=0 (default)
    )

    # 25Hz camera (40-80ms latency)
    camera: ObservationTermCfg = ObservationTermCfg(
        func=camera_obs,
        delay_min_lag=2,          # 40ms
        delay_max_lag=4,          # 80ms
        delay_update_period=2,    # 25Hz (50Hz control / 2)
    )
Processing Pipeline#
Observations flow through this pipeline:
compute → noise → clip → scale → delay → history → flatten
Why delay before history? History stacks the delayed observations. This matches real systems, where you buffer the old sensor readings you actually received, not readings that have yet to arrive.
Example with both:
joint_vel: ObservationTermCfg = ObservationTermCfg(
    func=joint_vel,
    scale=0.1,                  # Scale raw values
    delay_min_lag=1,            # 20ms delay at 50Hz
    delay_max_lag=2,            # 40ms delay at 50Hz
    history_length=3,           # Stack 3 delayed frames
    flatten_history_dim=True,
)
# Pipeline:
# 1. compute() returns (num_envs, 12)
# 2. scale: multiply by 0.1
# 3. delay: return observation from 1-2 steps ago
# 4. history: stack last 3 delayed frames → (num_envs, 3, 12)
# 5. flatten: reshape → (num_envs, 36)
Performance#
Delay buffers are only created when delay_max_lag > 0. Terms with no delay
(the default) have zero overhead. Similarly, history buffers are only created
when history_length > 0.