FAQ & Troubleshooting#
This page collects common questions about platform support, performance, training stability, and visualization, along with practical debugging tips and links to further resources.
Platform Support#
Does it work on macOS?#
Yes, but with limited performance. mjlab runs on macOS using CPU-only execution through MuJoCo Warp.
Training is not recommended on macOS, as it lacks GPU acceleration.
Evaluation works, but is significantly slower than on Linux with CUDA.
For serious training workloads, we strongly recommend Linux with an NVIDIA GPU.
Does it work on Windows?#
We have performed preliminary testing on Windows and WSL, but some workflows are not guaranteed to be stable.
Windows support may lag behind Linux and will be tested less frequently, since Linux is the primary development and deployment platform.
Community contributions that improve Windows support are very welcome.
CUDA Compatibility#
Not all CUDA versions are supported by MuJoCo Warp.
See mujoco_warp#101 for details on CUDA compatibility.
Recommended: CUDA 12.4+ (for conditional execution support in CUDA graphs).
Performance#
Is it faster than Isaac Lab?#
Based on our experience over the last few months, mjlab is on par with or faster than Isaac Lab.
What GPU do you recommend?#
RTX 40-series GPUs (or newer)
L40S, H100
Does mjlab support multi-GPU training?#
Yes, mjlab supports multi-GPU distributed training using torchrunx.
Use --gpu-ids 0 1 (or --gpu-ids all) when running the train command. See the Distributed Training documentation for configuration details and examples.
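For example, to launch training on the first two GPUs (task-specific arguments omitted):
uv run train.py --gpu-ids 0 1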
Training & Debugging#
My training crashes with NaN errors#
A typical error when using rsl_rl looks like:
RuntimeError: normal expects all elements of std >= 0.0
This occurs when NaN/Inf values in the physics state propagate to the policy network, causing its output standard deviation to become negative or NaN.
There are many possible causes, including potential bugs in MuJoCo Warp (which is still in beta). mjlab offers two complementary mechanisms to help you handle this:
For training stability - NaN termination
Add a nan_detection termination to reset environments that hit NaN:
from dataclasses import dataclass, field
from mjlab.envs.mdp.terminations import nan_detection
from mjlab.managers.termination_manager import TerminationTermCfg
@dataclass
class TerminationCfg:
    # Your other terminations...
    nan_term: TerminationTermCfg = field(
        default_factory=lambda: TerminationTermCfg(
            func=nan_detection,
            time_out=False,
        )
    )
This marks NaN environments as terminated so they can reset while training
continues. Terminations are logged as
Episode_Termination/nan_term in your metrics.
Warning
This is a band-aid solution. If NaNs correlate with your task objective
(for example, NaNs occur exactly when the agent tries to grasp an object),
the policy will never learn to complete that part of the task. Always
investigate the root cause using nan_guard in addition to this
termination.
For debugging - NaN guard
Enable nan_guard to capture the simulation state when NaNs occur:
uv run train.py --enable-nan-guard True
See the NaN Guard documentation for details.
The nan_guard tool makes it easier to:
Inspect the simulation state at the moment NaNs appear.
Build a minimal reproducible example (MRE).
Report potential framework bugs to the MuJoCo Warp team.
Reporting well-isolated issues helps improve the framework for everyone.
When do I need to call sim.forward()?#
Short answer: you almost certainly don’t.
sim.forward() wraps MuJoCo’s mj_forward, which runs the full forward
dynamics pipeline (kinematics, contacts, forces, constraint solving, sensors)
but skips integration, leaving qpos/qvel unchanged. It brings all
derived quantities in mjData (xpos, xquat, site_xpos,
cvel, sensordata, etc.) into a consistent state with the current
qpos/qvel.
The environment’s step() method calls it once per step, right before
observation computation, so observations, commands, and interval events
always see fresh derived quantities. Termination and reward managers run
before this call and therefore see derived quantities that are stale by
one physics substep, a deliberate tradeoff that avoids a second
forward() call while keeping the MDP well-defined (the staleness is
consistent across all envs and all steps).
The one case where this matters is if you write an event or command that
both writes state and reads derived quantities in the same function. For
example, if Event A calls entity.write_root_velocity_to_sim() (which
modifies qvel) and then immediately reads entity.data.root_link_vel_w
(which comes from cvel), the read will see stale values from before the
write.
Warning
Write methods (write_root_state_to_sim, write_joint_state_to_sim,
etc.) modify qpos/qvel directly. Read properties
(root_link_pose_w, body_link_vel_w, etc.) return derived
quantities that are only current as of the last sim.forward() call.
If you need to write then read in the same function, call
env.sim.forward() between them.
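For illustration, here is a minimal sketch of such a write-then-read event. The function name is hypothetical, and it assumes the entity is registered in the scene as "robot", that env.device exists, and that write_root_velocity_to_sim accepts an env_ids argument:
import torch

def push_and_inspect(env, env_ids: torch.Tensor) -> None:
    # Hypothetical event that both writes state and reads derived quantities.
    robot = env.scene["robot"]

    # Write: modifies qvel directly (6-dim root velocity per env).
    zero_vel = torch.zeros(len(env_ids), 6, device=env.device)
    robot.write_root_velocity_to_sim(zero_vel, env_ids=env_ids)

    # Refresh derived quantities; without this call, the read below would
    # return values computed before the write above.
    env.sim.forward()

    # Read: cvel-derived quantity, now consistent with the new qvel.
    root_vel_w = robot.data.root_link_vel_w[env_ids]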
For a deeper explanation, see Discussion #289.
Why aren’t my training runs reproducible even with a fixed seed?#
MuJoCo Warp does not yet guarantee determinism, so running the same simulation with identical inputs may produce slightly different outputs. This is a known limitation being tracked in mujoco_warp#562.
Until determinism is implemented upstream, mjlab training runs will not be perfectly reproducible even when setting a seed.
Rendering & Visualization#
What visualization options are available?#
mjlab currently supports two visualizers for policy evaluation and debugging:
Native MuJoCo visualizer - the built-in visualizer that ships with MuJoCo.
Viser - a web-based 3D visualization tool.
We are exploring training-time visualization (e.g., live rollout viewers), but this is not yet available.
As an alternative, mjlab supports video logging to Weights & Biases (W&B), so you can monitor rollout videos directly in the experiment dashboard.
What about camera/pixel rendering for vision-based RL?#
Camera rendering for pixel-based agents is not yet available.
The MuJoCo Warp team is actively developing camera support. Once mature, it will be integrated into mjlab for vision-based RL workflows.
How many environments can I visualize at once?#
Visualizers are limited to 32 environments maximum for performance reasons.
- Offscreen renderer (for video recording): hard-capped at 32 envs (see _MAX_ENVS in viewer/offscreen_renderer.py:12).
- Native/Viser viewers: limited by MuJoCo's geometry buffer (default 10,000 geoms, configurable via the max_geom parameter).
With thousands of environments, only a subset will be rendered. The viewer shows whichever environments fit within the geometry budget.
Why are my fixed-base robots all stacked at the origin instead of in a grid?#
Fixed-base robots require an explicit reset event to position them at
their env_origins. If your robots appear stacked at (0, 0, 0):
Common causes:
- Missing reset event - the most common issue.
- env_spacing is 0 or very small - check your SceneCfg(env_spacing=...). Even with proper reset events, if env_spacing=0.0, all robots will be at the same position. If env_spacing is very small (e.g., 0.01), they'll be clustered in a tiny area that looks like a line from a distance.
Solution: Add a reset event that calls reset_root_state_uniform:
# In your ManagerBasedRlEnvCfg
events = {
    # For positioning the base of the robot at env_origins.
    "reset_base": EventTermCfg(
        func=mdp.reset_root_state_uniform,
        mode="reset",
        params={
            "pose_range": {},  # Empty = use default pose + env_origins
            "velocity_range": {},
        },
    ),
    # ... other events
}
This pattern is used in the example manipulation task (see lift_cube_env_cfg.py:84-93).
Why this is needed: Fixed-base robots are automatically wrapped in mocap
bodies by auto_wrap_fixed_base_mocap(), but mocap positioning only happens
when you explicitly call a reset event. The env_origins offset is applied
inside reset_root_state_uniform() at line 127 of envs/mdp/events.py.
See issue #560 for examples.
How does env_origins determine robot layout?#
Robot spacing depends on your terrain configuration:
- Plane terrain (terrain_type="plane"): creates an approximately square grid automatically.
  - Grid size: ceil(sqrt(num_envs)) rows x cols.
  - Spacing is controlled by the env_spacing parameter (default: 2.0m).
  - Examples with env_spacing=2.0: 32 envs → 7x5 grid spanning 12m x 8m; 4096 envs → 64x64 grid spanning 126m x 126m.
  - Important: if env_spacing=0, all robots will be at (0, 0, 0).
  - Implementation: terrain_importer.py:_compute_env_origins_grid() (see the sketch after this list).
- Procedural terrain (terrain_type="generator"): origins are loaded from pre-generated terrain sub-patches.
  - Grid size: TerrainGeneratorCfg.num_rows x num_cols.
  - Row index = difficulty level (curriculum mode).
  - Column index = terrain type variant.
  - Important allocation behavior: columns (terrain types) are evenly distributed across environments, but rows (difficulty levels) are randomly sampled. This means multiple environments can spawn on the same (row, col) patch, leaving others unoccupied, even when num_envs > num_patches. Example: a 5x5 grid (25 patches) with 100 envs → each column gets exactly 20 envs, but those 20 are randomly distributed across 5 rows, so some patches remain empty.
  - Supports randomize_env_origins() to shuffle positions during training.
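For intuition, here is a minimal sketch of how an approximately square, origin-centered grid can be laid out from num_envs and env_spacing. It is an illustration consistent with the examples above, not the actual implementation (which lives in terrain_importer.py:_compute_env_origins_grid()):
import math

import numpy as np

def compute_grid_origins(num_envs: int, env_spacing: float) -> np.ndarray:
    # Approximately square: num_cols = floor(sqrt(n)), num_rows = ceil(n / num_cols).
    num_cols = math.floor(math.sqrt(num_envs))
    num_rows = math.ceil(num_envs / num_cols)
    rows, cols = np.unravel_index(np.arange(num_envs), (num_rows, num_cols))
    origins = np.zeros((num_envs, 3))
    origins[:, 0] = env_spacing * (rows - (num_rows - 1) / 2)  # x, centered on 0
    origins[:, 1] = env_spacing * (cols - (num_cols - 1) / 2)  # y, centered on 0
    return origins

# 32 envs with env_spacing=2.0 -> 7x5 grid spanning 12m x 8m.
origins = compute_grid_origins(32, 2.0)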
How do I ensure each terrain type gets its own column?#
Set curriculum=True in your TerrainGeneratorCfg. This makes column
allocation deterministic, with each column getting one terrain type based on
normalized proportions.
Example with 2 terrain types:
TerrainGeneratorCfg(
    num_rows=3,
    num_cols=2,
    curriculum=True,  # Required for deterministic column allocation!
    sub_terrains={
        "flat": BoxFlatTerrainCfg(proportion=0.5),  # Gets column 0
        "pillars": HfDiscreteObstaclesTerrainCfg(
            proportion=0.5,  # Gets column 1
        ),
    },
)
Without curriculum=True, every patch is randomly sampled and you’ll get
a random mix of both terrain types scattered across all patches.
Note: When num_cols equals the number of terrain types, each terrain
gets exactly one column regardless of proportion values (they’re normalized).
When num_cols > num_terrain_types, proportions determine how many columns
each terrain type occupies.
What is flat patch sampling and how does it affect robot spawning?#
Flat patch sampling detects flat regions on heightfield terrains where robots can safely spawn. It uses morphological filtering on the heightfield to find circular areas where height variation is within a tolerance.
Configure it on any sub-terrain via flat_patch_sampling:
from mjlab.terrains.terrain_generator import FlatPatchSamplingCfg
"obstacles": HfDiscreteObstaclesTerrainCfg(
...,
flat_patch_sampling={
"spawn": FlatPatchSamplingCfg(
num_patches=10, # patches to sample per sub-terrain
patch_radius=0.5, # flatness check radius (meters)
max_height_diff=0.05, # max height variation within radius
),
},
)
Then use reset_root_state_from_flat_patches as your reset event to spawn
robots on detected patches instead of at the sub-terrain center.
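As a sketch, the reset event could mirror the reset_base example shown earlier; the exact parameters accepted by reset_root_state_from_flat_patches may differ, so check its signature before use:
"reset_base": EventTermCfg(
    func=mdp.reset_root_state_from_flat_patches,
    mode="reset",
    # Parameters omitted; consult the function's docstring for the exact
    # arguments it accepts (e.g., which flat-patch key to spawn from).
    params={},
),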
Key details:
- Only heightfield (Hf*) terrains support actual flat patch detection. Box terrains (Box*) don't have heightfield data to analyze.
- If any sub-terrain in the grid configures flat_patch_sampling, the flat patches array is allocated for all cells. Sub-terrains that don't produce patches have their slots filled with the sub-terrain's spawn origin, so reset_root_state_from_flat_patches always gets valid positions.
- Without flat_patch_sampling, use reset_root_state_uniform, which spawns at the sub-terrain origin (env_origins) plus an optional random offset.
Development & Extensions#
Can I develop custom tasks in my own repository?#
Yes, mjlab has a plugin system that lets you develop tasks in separate repositories while still integrating seamlessly with the core:
- Your tasks appear as regular entries for the train and play commands.
- You can version and maintain your task repositories independently.
A complete guide will be available in a future release.
Assets & Compatibility#
What robots are included?#
mjlab includes two reference robots:
Unitree Go1 (quadruped).
Unitree G1 (humanoid).
These robots serve as:
Minimal examples for robot integration.
Stable, well-tested baselines for benchmark tasks.
To keep the core library lean, we do not plan to aggressively expand the built-in robot library. Additional robots may be provided in separate repositories or community-maintained packages.
Can I use USD or URDF models?#
No, mjlab expects MJCF (MuJoCo XML) models.
You will need to convert USD or URDF assets to MJCF.
For many common robots, you can directly use MuJoCo Menagerie, which ships high-quality MJCF models and assets.
Getting Help#
GitHub Issues#
Use GitHub issues for:
Bug reports
Performance regressions
Documentation gaps
When filing a bug, please include:
CUDA driver and runtime versions
GPU model
A minimal reproduction script
Complete error logs and stack traces
Appropriate labels (for example: bug, performance, docs)
Discussions#
Use GitHub Discussions for:
Usage questions (config, debugging, best practices)
Performance tuning tips
Asset conversion and modeling questions
Design discussions and roadmap ideas
Known Limitations#
We’re tracking missing features for the stable release in mujocolab/mjlab#100. Check our open issues to see what’s actively being worked on.
If something isn’t working or if we’ve missed something, please file a bug report.
Important
mjlab has reached v1.0.0! While we strive for stability, please note that MuJoCo Warp (our physics backend) is still in beta. Feedback and contributions are very welcome.