Heterogeneous Worlds
mjlab can run a single batched simulation in which different parallel
worlds use different mesh assets for the same logical entity. World 0
may simulate a cube, world 1 a sphere, world 2 a bowl. All worlds
share the same compiled scene and the same body and joint structure;
only the meshes and the per-geom attributes that travel with them
(friction, contact bits, mass, density, and a few more) differ across
worlds. Articulated props work too (you can have a hinge or slide
below the variant’s root), as long as the joint topology matches
across variants. The feature is exposed through VariantEntityCfg.
The full breakdown of what can and cannot vary across variants is in
the next section.
Quickstart
Say you want some parallel worlds to hold a sphere and others to hold
a cone, with a single shared scene running both at once. Define each
variant as a function that returns an MjSpec, then group them
under one VariantEntityCfg:
import mujoco

from mjlab.entity import EntityCfg, VariantEntityCfg


def make_sphere_spec() -> mujoco.MjSpec:
    spec = mujoco.MjSpec()
    mesh = spec.add_mesh(name="visual")
    mesh.make_sphere(subdivision=3)
    mesh.scale[:] = (0.05,) * 3
    body = spec.worldbody.add_body(name="prop")
    body.add_freejoint()
    body.add_geom(type=mujoco.mjtGeom.mjGEOM_MESH, meshname="visual")
    return spec


def make_cone_spec() -> mujoco.MjSpec:
    spec = mujoco.MjSpec()
    mesh = spec.add_mesh(name="visual")
    mesh.make_cone(nedge=16, radius=0.04)
    body = spec.worldbody.add_body(name="prop")
    body.add_freejoint()
    body.add_geom(type=mujoco.mjtGeom.mjGEOM_MESH, meshname="visual")
    return spec


object_cfg = VariantEntityCfg(
    variants={
        "sphere": make_sphere_spec,
        "cone": make_cone_spec,
    },
    assignment={"cone": 2.0},  # twice as many cones as spheres
    init_state=EntityCfg.InitialStateCfg(pos=(0.0, 0.0, 0.2)),
)
Plug the variant entity into a Scene exactly like a regular
EntityCfg:
from mjlab.scene import SceneCfg

scene_cfg = SceneCfg(
    num_envs=4096,
    entities={"object": object_cfg},
)
Twice as many worlds will hold a cone as a sphere. Variants not listed
in the assignment dict default to weight 1.0; omit assignment
entirely for uniform allocation across all variants.
What variants can differ in
Free to vary across variants: the mesh asset assigned to each
slot, the number of mesh geoms per (body, role) bucket on the
variant body (one variant can have more collision meshes than
another), the per-mesh-geom attributes that travel with the mesh
(friction, contact bits, mass, density, condim, and a handful of
others), and explicit body inertial values within whichever single
inertial mode the variants agree on per body.
Must match across variants: the body tree, joint topology,
primitive (non-mesh) geoms, and any actuators / sensors / tendons /
equalities. Variants must also agree on the inertial representation
per body (mesh-derived, diagonal, or fullinertia), and may not use the
reserved mjlab/pad/ name prefix on any element. Variant entities
must also be floating-base: the root body declares a freejoint.
The validator runs at entity build time and raises ValueError
naming the offending variant and the exact mismatch.
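The kind of check described above can be pictured with a minimal sketch. It assumes each variant has already been reduced to a hypothetical "structural signature" (a plain dict of body paths and joints); `validate_variants` and the signature layout are illustrative names and are not part of mjlab's API.

```python
def validate_variants(signatures):
    """Compare every variant's structural signature against the first one.

    `signatures` maps variant name -> dict of structural fields. This layout
    is hypothetical and only illustrates the "must match" rule.
    """
    names = list(signatures)
    ref_name = names[0]
    ref = signatures[ref_name]
    for name in names[1:]:
        for key, expected in ref.items():
            actual = signatures[name].get(key)
            if actual != expected:
                # Name the offending variant and the exact mismatch.
                raise ValueError(
                    f"variant {name!r}: {key} = {actual!r} does not match "
                    f"variant {ref_name!r}: {key} = {expected!r}"
                )


SIGNATURES = {
    "sphere": {"bodies": ("/prop",), "joints": (("/prop", "free"),)},
    "cone": {"bodies": ("/prop",), "joints": (("/prop", "free"),)},
}
validate_variants(SIGNATURES)  # passes: same body tree and joint topology
```

A variant that adds a hinge below the root while the others do not would fail this check, which mirrors the rule that joint topology must match across variants.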
How variants are assembled
mjlab merges every variant’s mesh assets into a single MjSpec and
gives the variant body enough mesh-geom slots to cover the maximum
mesh count any variant uses for each (body, role) bucket. A slot
is identified by (body_path, role, ordinal). role is “visual”
or “collision”, derived from contype/conaffinity;
mujoco_warp’s geom_contype/geom_conaffinity are 1D shared
(not per-world), so a slot’s role is fixed across worlds by
construction.
A worked example
Say variant sphere has 1 visual mesh geom and 2 collision mesh
geoms on the prop body, and variant cone has 1 visual mesh geom
and 4 collision mesh geoms on the same body.
sphere variant body          cone variant body
-------------------          -------------------
prop body                    prop body
  [visual] sphere_vis          [visual] cone_vis
  [coll]   sphere_col_0        [coll]   cone_col_0
  [coll]   sphere_col_1        [coll]   cone_col_1
                               [coll]   cone_col_2
                               [coll]   cone_col_3
mjlab walks each variant’s body tree, buckets mesh geoms by
(body_path, role), and lays the union out as slots:
| Slot | body_path | role | ordinal | sphere fills with | cone fills with |
|---|---|---|---|---|---|
| 0 | /prop | visual | 0 | sphere_vis | cone_vis |
| 1 | /prop | collision | 0 | sphere_col_0 | cone_col_0 |
| 2 | /prop | collision | 1 | sphere_col_1 | cone_col_1 |
| 3 | /prop | collision | 2 | (unfilled) | cone_col_2 |
| 4 | /prop | collision | 3 | (unfilled) | cone_col_3 |
Five slots total. The merged scene’s prop body has five mesh geoms:
one visual slot (slot 0) plus four collision slots (the larger of
sphere’s two and cone’s four). At merge time, every variant’s mesh
asset is added to the merged spec under a unique name (e.g.
sphere/sphere_vis, cone/cone_col_2).
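The bucketing and slot layout described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not mjlab's implementation: the input format (a list of `(body_path, role, mesh_name)` tuples per variant) and the helper name `build_slots` are hypothetical.

```python
from collections import defaultdict

# Hypothetical input: per-variant lists of (body_path, role, mesh_name).
VARIANT_MESH_GEOMS = {
    "sphere": [
        ("/prop", "visual", "sphere_vis"),
        ("/prop", "collision", "sphere_col_0"),
        ("/prop", "collision", "sphere_col_1"),
    ],
    "cone": [
        ("/prop", "visual", "cone_vis"),
        ("/prop", "collision", "cone_col_0"),
        ("/prop", "collision", "cone_col_1"),
        ("/prop", "collision", "cone_col_2"),
        ("/prop", "collision", "cone_col_3"),
    ],
}


def build_slots(variant_geoms):
    """Return (slots, fills): slot keys and, per variant, the mesh in each slot."""
    # Bucket each variant's meshes by (body_path, role), preserving order.
    buckets = {}
    for variant, geoms in variant_geoms.items():
        per_bucket = defaultdict(list)
        for body_path, role, mesh in geoms:
            per_bucket[(body_path, role)].append(mesh)
        buckets[variant] = per_bucket

    # Collect bucket keys in first-seen order across variants.
    bucket_order = []
    for per_bucket in buckets.values():
        for key in per_bucket:
            if key not in bucket_order:
                bucket_order.append(key)

    # Each bucket gets as many slots as the largest variant needs.
    slots = []
    for key in bucket_order:
        width = max(len(b.get(key, [])) for b in buckets.values())
        slots.extend((key[0], key[1], ordinal) for ordinal in range(width))

    # A variant fills a slot if it has a mesh at that ordinal, else None.
    fills = {}
    for variant, per_bucket in buckets.items():
        row = []
        for body_path, role, ordinal in slots:
            meshes = per_bucket.get((body_path, role), [])
            filled = ordinal < len(meshes)
            row.append(f"{variant}/{meshes[ordinal]}" if filled else None)
        fills[variant] = row
    return slots, fills


slots, fills = build_slots(VARIANT_MESH_GEOMS)
```

For the sphere/cone example, `slots` has five entries and `fills["sphere"]` ends in two `None` values, matching the two unfilled rows in the table.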
The merged scene compiles once into a single canonical MjModel
that every world in the batch agrees on layout-wise: same nbody,
ngeom, same body and geom IDs. mjlab’s per-world overrides on top of
that one model are what make worlds heterogeneous.
What each world sees at runtime
Worlds where sphere is active see only its three meshes; the two
extra collision slots are disabled via per-world geom_dataid = -1,
and mujoco_warp skips them. Worlds where cone is active see all
five meshes wired up.
| World | variant | slot 0 | slot 1 | slot 2 | slot 3 | slot 4 |
|---|---|---|---|---|---|---|
| 0 | sphere | sphere_vis | sphere_col_0 | sphere_col_1 | off (-1) | off (-1) |
| 1 | cone | cone_vis | cone_col_0 | cone_col_1 | cone_col_2 | cone_col_3 |
Three categories of per-world override carry the variation:
- geom_dataid is a (num_envs, ngeom) table. Its row for world W picks which compiled mesh each slot points at. -1 is the “skip me” sentinel mujoco_warp already understands.
- Mesh-derived fields (geom_size, geom_rbound, geom_aabb, geom_pos, geom_quat, body_mass, body_subtreemass, body_inertia, body_invweight0, body_ipos, body_iquat) are stored as (num_envs, ...) arrays. The values for sphere worlds reflect a sphere-shaped inertia tensor and sphere-sized AABBs; the values for cone worlds reflect the cone. The full list is in mjlab.entity.variants.VARIANT_DEPENDENT_FIELDS.
- Per-mesh-geom attributes (contact bits, friction, mass, density, condim, group, priority, rgba, solref, solimp, margin, gap) are captured per variant in VariantGeomSpec at merge time and restored verbatim on the slot geom during the per-variant reference compile. So if sphere’s collision geoms have friction=0.5 and cone’s have friction=1.2, world W’s per-step friction reflects the assigned variant’s source value. The one exception is material, which is not propagated across variants; if you need per-world appearance variation use DR on geom_rgba / mat_rgba.
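To make the first category concrete, here is a minimal sketch of how a per-world table of mesh IDs could be filled from a world-to-variant mapping. The mesh ID numbers, `SLOT_FILLS`, and `build_geom_dataid` are hypothetical, and the sketch covers only the five slot geoms of the worked example rather than every geom in the model.

```python
# Hypothetical compiled mesh IDs in the merged model.
MESH_ID = {
    "sphere/sphere_vis": 0,
    "sphere/sphere_col_0": 1,
    "sphere/sphere_col_1": 2,
    "cone/cone_vis": 3,
    "cone/cone_col_0": 4,
    "cone/cone_col_1": 5,
    "cone/cone_col_2": 6,
    "cone/cone_col_3": 7,
}

# Mesh name per slot for each variant index; None marks an unfilled slot.
SLOT_FILLS = [
    ["sphere/sphere_vis", "sphere/sphere_col_0", "sphere/sphere_col_1", None, None],
    ["cone/cone_vis", "cone/cone_col_0", "cone/cone_col_1",
     "cone/cone_col_2", "cone/cone_col_3"],
]


def build_geom_dataid(world_to_variant, slot_fills, mesh_id):
    """Return one row per world; -1 disables a slot the variant leaves unfilled."""
    rows = []
    for variant in world_to_variant:
        row = [-1 if name is None else mesh_id[name] for name in slot_fills[variant]]
        rows.append(row)
    return rows


# World 0 holds the sphere, world 1 holds the cone.
table = build_geom_dataid([0, 1], SLOT_FILLS, MESH_ID)
```

Here `table[0]` ends in two `-1` entries (the sphere leaves slots 3 and 4 off) while `table[1]` points every slot at a cone mesh.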
If sphere adds a body that cone lacks (or vice versa), the
validator rejects the configuration before any of the merge logic
runs. The slot mechanism only flexes mesh geom counts within
matching bodies; everything structural above the geom level must
agree.
Note
Doesn’t compiling the merged scene ruin the prop body’s inertia?
No, but it’s worth understanding why, because the naive intuition
says it should. If you stuck every variant’s mesh geoms on the
prop body and called spec.compile(), MuJoCo would sum each
geom’s inertial contribution, and you would get a body whose mass
and inertia tensor are a meaningless mix of every variant’s shape.
mjlab avoids this in two layers:
- The merged scene does not stick every variant’s geoms on the body. The prop body in the merged spec carries variant 0’s mesh geoms (with their original mass and density) plus, for any slot variant 0 doesn’t fill, a synthesized padding geom that has mass = 0 and density = 0. Padding contributes nothing to body inertia. Other variants’ meshes are present in the merged spec only as mesh assets (in the assets section, not as geoms on any body). They get wired in at runtime via per-world geom_dataid and never affect the host compile’s inertial sums.
- Per-world overrides come from per-variant source compiles. Even with the above, the merged-scene compile’s prop body inertia is only correct for variant 0. For every other variant, mjlab compiles that variant’s original source spec in isolation (one body, one variant’s worth of meshes), reads the resulting body_mass, body_inertia, body_ipos, body_iquat, body_invweight0, and body_subtreemass, and writes them into the per-world arrays at the prop body’s index.
Net result: world W’s prop body inertia is byte-equal to what you
would get by compiling its assigned variant’s source spec on its own. There
is a regression test
(test_visual_collision_split_inertia_matches_independent_compile
in tests/test_variants.py) that asserts exactly this against
independent per-variant compiles.
World assignment
How worlds get mapped to variants is controlled by the assignment
field on VariantEntityCfg. It accepts three shapes:
- None (default): uniform allocation across variants.
- dict[str, float]: per-variant weights. Variants not listed default to weight 1.0.
- Callable[[int], Sequence[int]]: an explicit assignment function called with num_envs at simulation init.
Both the None and dict cases use the
largest remainder method. Each
variant’s quota is q_i = (w_i / sum(w)) * num_envs; each variant
first receives floor(q_i) worlds, and the remaining
num_envs - sum(floors) worlds go to the variants with the largest
fractional remainders, with ties broken by declaration order. For
num_envs = 10 and weights (1.0, 2.0, 1.0) this gives
(3, 5, 2) worlds per variant. Weights are normalized internally,
so {"a": 1, "b": 2, "c": 1} and {"a": 0.25, "b": 0.5, "c": 0.25}
produce identical assignments. A weight of zero is allowed and
produces zero worlds for that variant; at least one variant must end
up with positive weight.
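The allocation rule above is small enough to write out. The following is a minimal sketch of the largest remainder method as described, not mjlab's own code; `largest_remainder` and `expand` are illustrative names, and laying variants out as contiguous blocks in declaration order is an assumption taken from the examples in this section.

```python
import math


def largest_remainder(weights, num_envs):
    """Split num_envs worlds across variants in proportion to weights."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one variant must have positive weight")
    quotas = [w / total * num_envs for w in weights]
    counts = [math.floor(q) for q in quotas]
    leftover = num_envs - sum(counts)
    # sorted() is stable, so equal remainders keep declaration order.
    order = sorted(range(len(weights)), key=lambda i: -(quotas[i] - counts[i]))
    for i in order[:leftover]:
        counts[i] += 1
    return counts


def expand(counts):
    """Turn per-variant counts into a per-world list of variant indices."""
    return [variant for variant, count in enumerate(counts) for _ in range(count)]


counts = largest_remainder([1.0, 2.0, 1.0], 10)
```

With weights (1.0, 2.0, 1.0) and ten worlds the quotas are 2.5, 5.0, and 2.5; the floors sum to nine, and the single leftover world goes to the first variant because the two 0.5 remainders tie and declaration order breaks the tie.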
The default and dict paths are purely deterministic given
(assignment, num_envs). With assignment={"a": 1, "b": 1} and
num_envs = 8 you always get [0, 0, 0, 0, 1, 1, 1, 1]. There is
no seed involved; rerunning the same config produces the same
partition every time. Note that the partition’s boundaries depend
on num_envs, so world W’s variant is not necessarily stable when
you change num_envs. If you need explicit per-world stability
across batch sizes (e.g. “world 0 is always variant 0, world 1 is
always variant 1, regardless of how many envs I launch”), use a
callable assignment as below.
Variant assignment is fixed at Simulation initialization and does
not resample on episode reset. The intended use is heterogeneous
training across the batch, not per-episode mesh randomization.
Read the resolved assignment from user code via
env.sim.world_to_variant:
>>> env.sim.world_to_variant["object"]
tensor([0, 0, 0, 1, 1, 1, 1, 1, 1, 1])
The mapping is keyed by entity name (without trailing slash) and
returns a (num_envs,) tensor of variant indices in the order
variants were declared in VariantEntityCfg.variants. The dict is
empty for non-variant scenes.
Custom assignment with a callable
When the weighted default is not what you want, pass a callable to
assignment. The callable receives num_envs and must return a
length-num_envs sequence of variant indices in
[0, len(variants)). The returned sequence’s length and bounds are
validated at sim init; mismatches raise a ValueError naming the
offending entity.
A few patterns:
Round-robin - cycle through variants by world index.
cfg = VariantEntityCfg(
    variants={"a": make_a, "b": make_b, "c": make_c},
    assignment=lambda n: [w % 3 for w in range(n)],
)
Stratified halves - first half is variant 0, second half is variant 1.
cfg = VariantEntityCfg(
    variants={"easy": make_easy, "hard": make_hard},
    assignment=lambda n: [0] * (n // 2) + [1] * (n - n // 2),
)
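The length and bounds checks applied to a callable's output can be sketched as follows. `resolve_callable_assignment` is a hypothetical helper written for illustration; the error messages are invented and the real validation in mjlab may differ in detail.

```python
def resolve_callable_assignment(entity_name, assignment, num_variants, num_envs):
    """Call the user function and check its output's length and index bounds."""
    indices = list(assignment(num_envs))
    if len(indices) != num_envs:
        raise ValueError(
            f"entity {entity_name!r}: assignment returned {len(indices)} "
            f"entries, expected {num_envs}"
        )
    for world, index in enumerate(indices):
        if not 0 <= index < num_variants:
            raise ValueError(
                f"entity {entity_name!r}: world {world} has variant index "
                f"{index}, expected a value in [0, {num_variants})"
            )
    return indices


# The two patterns above, resolved for small batch sizes.
round_robin = resolve_callable_assignment(
    "object", lambda n: [w % 3 for w in range(n)], num_variants=3, num_envs=7
)
halves = resolve_callable_assignment(
    "object", lambda n: [0] * (n // 2) + [1] * (n - n // 2), num_variants=2, num_envs=5
)
```

Because the round-robin callable depends only on the world index, world W keeps the same variant no matter how many environments are launched, which is the stability property the weighted default does not provide.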
Domain randomization
Domain randomization on variant scenes preserves per-variant baselines
automatically. When the simulation initializes, mjlab snapshots the
variant-dependent fields as (num_envs, ...) tensors and registers
them in sim.per_world_default_fields. DR operations that read
defaults (scale, additive offsets) detect this registration and index
the per-world default array by environment, so a 10% mass scale
applied across a batch containing a 100 g sphere variant and a 1 kg
cube variant produces 10% perturbations around each variant’s own
mass, not 10% of a shared template mass.
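The difference between a per-world baseline and a shared template can be shown with a small sketch. It uses plain Python lists in place of tensors, and `scale_from_defaults` is an illustrative stand-in for a DR scale operation, not an mjlab function.

```python
import random


def scale_from_defaults(defaults, low, high, rng):
    """Scale each world's value relative to that world's own default."""
    return [default * rng.uniform(low, high) for default in defaults]


world_to_variant = [0, 0, 1, 1]
variant_mass = [0.1, 1.0]  # 100 g sphere variant, 1 kg cube variant

# Per-world baseline: each world looks up its assigned variant's mass.
per_world_default = [variant_mass[v] for v in world_to_variant]

rng = random.Random(0)
masses = scale_from_defaults(per_world_default, 0.9, 1.1, rng)
```

Each entry of `masses` stays within 10% of its own variant's mass. Had the scale been applied to a single shared template of 0.1 kg, the cube worlds would have ended up near 0.1 kg instead of near 1 kg.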
Fields that are not variant-dependent (geom_friction,
dof_armature, dof_damping, and so on) behave identically on
variant and non-variant scenes.
For inertial randomization the recommended path is
dr.pseudo_inertia, which jointly randomizes mass, COM offset,
principal moments of inertia, and principal frame orientation through
the pseudo-inertia matrix factorization of Rucker and Wensing (2022). It is exact for any
perturbation magnitude and remains physically consistent across
variants of different scale. dr.body_mass modifies body_mass
without touching the inertia tensor and emits a UserWarning when
called; it is appropriate only for modeling a point mass added at the
COM, not for density-like randomization. The distinction matters more
on variant scenes than on single-asset scenes because variants often
differ in mass by an order of magnitude.
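For orientation, the pseudo-inertia matrix is commonly defined in the rigid-body dynamics literature as shown below. This is the textbook definition, not a description of mjlab's internals; the symbols (density ρ, mass m, center of mass c, rotational inertia I about the body-frame origin, scale factor s) are introduced here for illustration.

```latex
J \;=\; \int_{\mathcal{B}}
      \begin{bmatrix} x \\ 1 \end{bmatrix}
      \begin{bmatrix} x \\ 1 \end{bmatrix}^{\!\top} \rho(x)\, dV
  \;=\; \begin{bmatrix} \Sigma & h \\ h^{\top} & m \end{bmatrix},
\qquad
\Sigma = \tfrac{1}{2}\operatorname{tr}(I)\,\mathbf{1}_3 - I,
\qquad
h = m\,c .

% Physical consistency (positive density) corresponds to J \succ 0.
% A density-like change scales the whole matrix:
J \mapsto s\,J
\;\;\Longrightarrow\;\;
m \mapsto s\,m, \qquad I \mapsto s\,I, \qquad c \mapsto c .
```

Scaling the density scales mass and inertia together and leaves the COM in place. Changing m alone leaves I untouched, which is why a mass-only edit corresponds to a point mass added at the COM rather than to a heavier version of the same shape.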
Viewers
The native viewer, offscreen renderer, and Viser viewer all sync the
selected environment’s per-world fields into the host MjModel
before rendering, so the rendered geometry matches the variant
assigned to the viewed environment. Switching environments in the
native viewer (the “,” and “.” keys) updates the displayed mesh
accordingly.
Viser bakes mesh data into batched handles and cannot rely on a live
view of geom_dataid. It groups worlds by visual fingerprint (mesh
selection, local geom frames, baked appearance) and builds one batched
handle per group, with each environment assigned to its handle. A
scene with N variants typically produces up to N handles per body.
Convex hull visualization is computed per variant from the variant’s
mesh vertices.
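The grouping step can be pictured as a dictionary keyed by fingerprint. In this sketch a fingerprint is any hashable tuple (here, just the visual mesh ID per world); the real fingerprint contents and the helper name `group_by_fingerprint` are assumptions made for illustration.

```python
from collections import defaultdict


def group_by_fingerprint(fingerprints):
    """Map each distinct fingerprint to the list of worlds that share it."""
    groups = defaultdict(list)
    for world, fingerprint in enumerate(fingerprints):
        groups[fingerprint].append(world)
    return dict(groups)


# Hypothetical fingerprints: worlds 0 and 2 show mesh 0, worlds 1 and 3 show mesh 3.
handles = group_by_fingerprint([(0,), (3,), (0,), (3,)])
```

Each key in `handles` would correspond to one batched handle, so two visually distinct variants yield two groups regardless of how many worlds there are.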
Performance
Per-step cost is unaffected by variant count. Variant-dependent fields are stored as per-world arrays accessed by world index in the existing kernels, with no branching or dispatch on variant.
Construction cost is linear in the total variant count. mjlab
compiles the merged scene once to produce the canonical MjModel,
then compiles each variant’s original (un-merged) source spec in
isolation to recover that variant’s per-body and per-geom mesh-derived
fields. Each per-variant compile sees only that variant’s single body
and mesh, so its cost is independent of the total number of variants
in the scene.
For a scene with one variant entity declaring k variants, construction
runs 1 + k compiles. With multiple variant entities, compiles
decouple across entities: two variant entities of 5 variants each cost
1 + 5 + 5 = 11 compiles, not 1 + 5 * 5 = 26. As an order of
magnitude on CPU with typical procedural meshes, each per-variant
compile takes around 1-2 ms, so a scene with 100 variants pays a few
hundred milliseconds at startup and a scene with 1000 variants pays
roughly two seconds.
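The compile-count arithmetic above reduces to two one-line helpers. Both function names are illustrative, and the 1-2 ms range is the order-of-magnitude figure quoted in the text rather than a measured constant.

```python
def construction_compiles(variants_per_entity):
    """One merged-scene compile plus one isolated compile per variant."""
    return 1 + sum(variants_per_entity)


def startup_estimate_ms(num_variants, per_compile_ms=(1.0, 2.0)):
    """Rough (low, high) cost in ms of the per-variant compiles only."""
    low, high = per_compile_ms
    return num_variants * low, num_variants * high


two_entities = construction_compiles([5, 5])
```

Two entities with five variants each give `two_entities == 11`, and `startup_estimate_ms(1000)` gives a range of 1000 to 2000 ms, in line with the figure of roughly two seconds.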
The merged spec contains every variant’s mesh assets simultaneously, so memory at scene-build time scales with the total mesh vertex / face count across all variants. This is paid once at startup and does not affect training throughput.
Limitations
Floating-base only. Each variant’s root body must declare a free joint. Fixed-base variants are rejected; mocap auto-wrapping that applies to non-variant entities is not applied here.
Material assets are not propagated. Each variant’s contype,
conaffinity, condim, friction, mass, density,
group, priority, rgba, solref, solimp, margin,
and gap are restored per-world during compile, but the
material reference on slot geoms inherits whichever material the
template variant set. Use DR on geom_rgba / mat_rgba for
per-world appearance variation.
Assignment is fixed at sim init. There is no API to swap a world to a different variant on episode reset. World W’s mesh asset is whatever it was assigned at init for the lifetime of the simulation. Per-episode mesh randomization is not supported today; DR can vary scalar properties (mass, friction, color, scale) on a fixed variant but cannot swap one mesh for another.
No support for per-world differing kinematic topology. Variants must share the same body tree, joints, and actuator/sensor counts, so you cannot configure things like:
a different number of objects per world (world 0 has two props on the table, world 1 has three);
different articulation per world (world 0’s prop is an articulated drawer with a slider joint, world 1’s prop is a rigid block).
True heterogeneous topology requires upstream support in mujoco_warp that does not currently exist.