mjlab.managers#
Environment managers.
Classes
Base#
- class mjlab.managers.ManagerBase[source]#
Bases: ABC
Base class for all managers.
- __init__(env: ManagerBasedRlEnv)[source]#
- class mjlab.managers.ManagerTermBase[source]#
Bases: object
- __init__(env: ManagerBasedRlEnv)[source]#
- class mjlab.managers.ManagerTermBaseCfg[source]#
Base configuration for manager terms.
This is the base config for terms in observation, reward, termination, curriculum, and event managers. It provides a common interface for specifying a callable and its parameters.
The func field accepts either a function or a class.
Function-based terms are simpler and suitable for stateless computations:
```python
RewardTermCfg(func=mdp.joint_torques_l2, weight=-0.01)
```
Class-based terms are instantiated with (cfg, env) and are useful when you need to:
- Cache computed values at initialization (e.g., resolve regex patterns to indices)
- Maintain state across calls
- Perform expensive setup once rather than every call
```python
class posture:
    def __init__(self, cfg: RewardTermCfg, env: ManagerBasedRlEnv):
        # Resolve std dict to tensor once at init
        self.std = resolve_std_to_tensor(cfg.params["std"], env)

    def __call__(self, env, **kwargs) -> torch.Tensor:
        # Use cached self.std
        return compute_posture_reward(env, self.std)

RewardTermCfg(func=posture, params={"std": {".*knee.*": 0.3}}, weight=1.0)
```
Class-based terms can optionally implement reset(env_ids) for per-episode state.
- class mjlab.managers.SceneEntityCfg[source]#
Configuration for a scene entity that is used by the manager’s term.
This configuration allows flexible specification of entity components either by name or by ID. During resolution, it ensures consistency between names and IDs, and can optimize to slice(None) when all components are selected.
- joint_names: str | tuple[str, ...] | None = None#
Names of joints to include. Can be a single string or tuple.
- body_names: str | tuple[str, ...] | None = None#
Names of bodies to include. Can be a single string or tuple.
- geom_names: str | tuple[str, ...] | None = None#
Names of geometries to include. Can be a single string or tuple.
- site_names: str | tuple[str, ...] | None = None#
Names of sites to include. Can be a single string or tuple.
- actuator_names: str | list[str] | None = None#
Names of actuators to include. Can be a single string or list.
- tendon_names: str | tuple[str, ...] | None = None#
Names of tendons to include. Can be a single string or tuple.
- camera_names: str | tuple[str, ...] | None = None#
Names of cameras to include. Can be a single string or tuple.
- light_names: str | tuple[str, ...] | None = None#
Names of lights to include. Can be a single string or tuple.
- material_names: str | tuple[str, ...] | None = None#
Names of materials to include. Can be a single string or tuple.
- pair_names: str | tuple[str, ...] | None = None#
Names of contact pairs to include. Can be a single string or tuple.
- resolve(scene: Scene) → None[source]#
Resolve names and IDs for all configured fields.
This method ensures consistency between names and IDs for each field type. It handles three cases:
1. Both names and IDs provided: validates that they match.
2. Only names provided: computes IDs (optimizes to slice(None) if all are selected).
3. Only IDs provided: computes names.
- Parameters:
scene – The scene containing the entity to resolve against.
- Raises:
ValueError – If provided names and IDs are inconsistent.
KeyError – If the entity name is not found in the scene.
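As a minimal sketch of the three resolution cases above, here is the name-to-ID logic in plain Python. The helper function, the joint names, and the entity list are illustrative stand-ins; mjlab's actual resolve() operates on a Scene object.

```python
def resolve_names(requested, all_names):
    """Map requested names to indices; collapse to slice(None) when all are selected."""
    if requested is None:
        return slice(None), tuple(all_names)
    if isinstance(requested, str):
        requested = (requested,)
    missing = [n for n in requested if n not in all_names]
    if missing:
        raise KeyError(f"Unknown names: {missing}")
    ids = tuple(all_names.index(n) for n in requested)
    if len(ids) == len(all_names):
        # Optimization from the docstring: everything selected -> slice(None).
        return slice(None), tuple(all_names)
    return ids, tuple(requested)

all_joints = ["hip", "knee", "ankle"]
print(resolve_names(("knee",), all_joints))   # ((1,), ('knee',))
print(resolve_names(None, all_joints)[0])     # slice(None, None, None)
```

Selecting every component with slice(None) lets downstream code index tensors without a gather, which is the optimization the docstring refers to.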
Action Manager#
- class mjlab.managers.ActionManager[source]#
Bases: ManagerBase
Manages action processing for the environment.
The action manager aggregates multiple action terms, each controlling a different entity or aspect of the simulation. It splits the policy’s action tensor and routes each slice to the appropriate action term.
- __init__(cfg: dict[str, ActionTermCfg], env: ManagerBasedRlEnv)[source]#
- property action: Tensor#
Raw policy output from the current step, before per-term scale/offset. Shape: (num_envs, total_action_dim).
- property prev_action: Tensor#
Raw policy output from the previous step, before per-term scale/offset. Shape: (num_envs, total_action_dim).
- property prev_prev_action: Tensor#
Raw policy output from two steps ago, before per-term scale/offset. Shape: (num_envs, total_action_dim).
- reset(env_ids: Tensor | slice | None = None) → dict[str, float][source]#
Resets the manager and returns logging info for the current step.
- process_action(action: Tensor) → None[source]#
Store the raw policy output and route slices to each action term.
Called once per policy step. The raw action tensor is saved into the history buffers (action, prev_action, prev_prev_action) before any per-term scale/offset is applied. Each term then receives its slice and independently applies its own affine transformation via ActionTerm.process_actions().
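The splitting-and-routing behavior can be sketched as follows. The term names and dimensions are hypothetical, and the slicing logic is a plain-torch illustration rather than mjlab's actual implementation:

```python
import torch

# Hypothetical action terms and their per-term dimensions.
term_dims = {"joint_pos": 12, "gripper": 1}

# The manager lays terms out contiguously in the policy's action tensor.
offsets, start = {}, 0
for name, dim in term_dims.items():
    offsets[name] = (start, start + dim)
    start += dim
total_action_dim = start  # 13

num_envs = 4
action = torch.randn(num_envs, total_action_dim)  # raw policy output

# Each term receives its slice; scale/offset is applied per term afterwards.
slices = {name: action[:, lo:hi] for name, (lo, hi) in offsets.items()}
print(slices["joint_pos"].shape)  # torch.Size([4, 12])
```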
- class mjlab.managers.ActionTerm[source]#
Bases: ManagerTermBase
Base class for action terms.
The action term is responsible for processing the raw actions sent to the environment and applying them to the entity managed by the term.
- __init__(cfg: ActionTermCfg, env: ManagerBasedRlEnv)[source]#
- class mjlab.managers.ActionTermCfg[source]#
Configuration for an action term.
Action terms process raw actions from the policy and apply them to entities in the scene (e.g., setting joint positions, velocities, or efforts).
- clip: dict[str, tuple] | None = None#
Optional clipping bounds applied to processed actions (after scale and offset). Dict maps actuator name regex patterns to (min, max) tuples, resolved the same way as scale and offset.
- abstractmethod build(env: ManagerBasedRlEnv) → ActionTerm[source]#
Build the action term from this config.
Observation Manager#
- class mjlab.managers.ObservationManager[source]#
Bases: ManagerBase
Manages observation computation for the environment.
The observation manager computes observations from multiple terms organized into groups. Each term can have noise, clipping, scaling, delay, and history applied. Groups can optionally concatenate their terms into a single tensor.
- __init__(cfg: dict[str, ObservationGroupCfg], env)[source]#
- class mjlab.managers.ObservationGroupCfg[source]#
Configuration for an observation group.
An observation group bundles multiple observation terms together. Groups are typically used to separate observations for different purposes (e.g., “actor” for the actor, “critic” for the value function).
- terms: dict[str, ObservationTermCfg]#
Dictionary mapping term names to their configurations.
- concatenate_terms: bool = True#
Whether to concatenate all terms into a single tensor. If False, returns a dict mapping term names to their individual tensors.
- enable_corruption: bool = False#
Whether to apply noise corruption to observations. Set to True during training for domain randomization, False during evaluation.
- history_length: int | None = None#
Group-level history length override. If set, applies to all terms in this group. If None, each term uses its own history_length setting.
- flatten_history_dim: bool = True#
Whether to flatten history into the observation dimension. If True, observations have shape (num_envs, obs_dim * history_length). If False, the shape is (num_envs, history_length, obs_dim).
- nan_policy: Literal['disabled', 'warn', 'sanitize', 'error'] = 'disabled'#
NaN/Inf handling policy for observations in this group.
- 'disabled': no checks (default, fastest)
- 'warn': log a warning with the term name and env IDs, then sanitize (debugging)
- 'sanitize': silently sanitize to 0.0, like the reward manager (safe for production)
- 'error': raise ValueError on NaN/Inf (strict development mode)
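A hypothetical group configuration tying these fields together might look like the following. The term functions (mdp.joint_pos, mdp.root_lin_vel) and the keyword-style construction are assumptions for illustration, not verified mjlab APIs:

```python
from mjlab.managers import ObservationGroupCfg, ObservationTermCfg

# "mdp" stands in for a module of observation functions in your task code.
actor_obs = ObservationGroupCfg(
    terms={
        "joint_pos": ObservationTermCfg(func=mdp.joint_pos, history_length=5),
        "root_vel": ObservationTermCfg(func=mdp.root_lin_vel),
    },
    concatenate_terms=True,   # one (num_envs, obs_dim) tensor per step
    enable_corruption=True,   # apply noise during training
    nan_policy="warn",        # log and sanitize NaN/Inf while debugging
)
```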
- class mjlab.managers.ObservationTermCfg[source]#
Configuration for an observation term.
Processing pipeline: compute → noise → clip → scale → delay → history. Delay models sensor latency. History provides temporal context. Both are optional and can be combined.
- scale: tuple[float, ...] | float | Tensor | None = None#
Scaling factor(s) to multiply the observation by.
- delay_min_lag: int = 0#
Minimum lag (in steps) for delayed observations. Lag sampled uniformly from [min_lag, max_lag]. Convert to ms: lag * (1000 / control_hz).
- delay_max_lag: int = 0#
Maximum lag (in steps) for delayed observations. Use min=max for constant delay.
- delay_per_env: bool = True#
If True, each environment samples its own lag. If False, all environments share the same lag at each step.
- delay_hold_prob: float = 0.0#
Probability of reusing the previous lag instead of resampling. Useful for temporally correlated latency patterns.
- delay_update_period: int = 0#
Resample lag every N steps (models multi-rate sensors). If 0, update every step.
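The interaction of the delay_* fields can be sketched as a lag-sampling routine in plain torch. This mirrors the documented semantics (uniform sampling in [min_lag, max_lag], hold probability, update period) but is not mjlab's internal implementation:

```python
import torch

def sample_lag(prev_lag, min_lag, max_lag, hold_prob, step, update_period, num_envs):
    # delay_update_period: keep the lag fixed between updates (multi-rate sensor).
    if update_period > 0 and step % update_period != 0:
        return prev_lag
    # Uniform resample in [min_lag, max_lag], per environment.
    new_lag = torch.randint(min_lag, max_lag + 1, (num_envs,))
    # delay_hold_prob: reuse the previous lag for temporally correlated latency.
    hold = torch.rand(num_envs) < hold_prob
    return torch.where(hold, prev_lag, new_lag)

lag = torch.zeros(8, dtype=torch.long)
for step in range(4):
    lag = sample_lag(lag, min_lag=1, max_lag=3, hold_prob=0.2,
                     step=step, update_period=2, num_envs=8)
print(lag)  # final per-env lags
```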
Reward Manager#
- class mjlab.managers.RewardManager[source]#
Bases: ManagerBase
Manages reward computation by aggregating weighted reward terms.
- Reward Scaling Behavior:
By default, rewards are scaled by the environment step duration (dt). This normalizes cumulative episodic rewards across different simulation frequencies. The scaling can be disabled via the scale_by_dt parameter.
- When scale_by_dt=True (default):
  - reward_buf (returned by compute()) = raw_value * weight * dt
  - _episode_sums (cumulative rewards) are scaled by dt
  - Episode_Reward/* logged metrics are scaled by dt
- When scale_by_dt=False:
  - reward_buf = raw_value * weight (no dt scaling)
- Regardless of the scaling setting:
  - _step_reward (via get_active_iterable_terms()) always contains the unscaled reward rate (raw_value * weight)
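A worked example of the scaling rules, assuming a reward term with raw value 2.0, weight 0.5, and a 50 Hz control rate (dt = 0.02 s):

```python
dt = 0.02
raw_value, weight = 2.0, 0.5

# scale_by_dt=True (default): reward_buf integrates the rate over dt.
reward_scaled = raw_value * weight * dt      # 0.02
# scale_by_dt=False: plain weighted value.
reward_unscaled = raw_value * weight         # 1.0
# _step_reward always holds the unscaled rate, regardless of the setting.
step_reward = raw_value * weight             # 1.0
print(reward_scaled, reward_unscaled)
```

Because episodic sums are also dt-scaled by default, the cumulative episode reward is comparable across simulation frequencies: halving dt doubles the step count but halves each step's contribution.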
- __init__(cfg: dict[str, RewardTermCfg], env: ManagerBasedRlEnv, *, scale_by_dt: bool = True)[source]#
- reset(env_ids: Tensor | slice | None = None) → dict[str, Tensor][source]#
Resets the manager and returns logging info for the current step.
Termination Manager#
- class mjlab.managers.TerminationManager[source]#
Bases: ManagerBase
Manages termination conditions for the environment.
The termination manager aggregates multiple termination terms to compute episode done signals. Terms can be either truncations (time-based) or terminations (failure conditions).
- __init__(cfg: dict[str, TerminationTermCfg], env: ManagerBasedRlEnv)[source]#
Command Manager#
- class mjlab.managers.CommandManager[source]#
Bases: ManagerBase
Manages command generation for the environment.
The command manager generates and updates goal commands for the agent (e.g., target velocity, target position). Commands are resampled at configurable intervals and can track metrics for logging.
- __init__(cfg: dict[str, CommandTermCfg], env: ManagerBasedRlEnv)[source]#
- create_gui(server: viser.ViserServer, get_env_idx: Callable[[], int], on_change: Callable[[], None] | None = None, request_action: Callable[[str, Any], None] | None = None) → None[source]#
Let each command term create its GUI controls.
- apply_gui_reset(env_ids: Tensor) → bool[source]#
Apply GUI-selected state from all terms. Returns True if any applied.
- class mjlab.managers.NullCommandManager[source]#
Bases: object
Placeholder for an absent command manager that safely no-ops all operations.
- class mjlab.managers.CommandTerm[source]#
Bases: ManagerTermBase
Base class for command terms.
- __init__(cfg: CommandTermCfg, env: ManagerBasedRlEnv)[source]#
- create_gui(name: str, server: viser.ViserServer, get_env_idx: Callable[[], int], on_change: Callable[[], None] | None = None, request_action: Callable[[str, Any], None] | None = None) → None[source]#
Create interactive GUI controls for this command term.
Override in subclasses to add task-specific controls (e.g., velocity sliders) to the Viser viewer. Called once during viewer setup.
The name argument is the term's key in the command manager config (e.g., "twist").
- class mjlab.managers.CommandTermCfg[source]#
Configuration for a command generator term.
Command terms generate goal commands for the agent (e.g., target velocity, target position). Commands are automatically resampled at configurable intervals and can track metrics for logging.
- resampling_time_range: tuple[float, float]#
Time range in seconds for command resampling. When the timer expires, a new command is sampled and the timer is reset to a value uniformly drawn from [min, max]. Set both values equal for fixed-interval resampling.
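The resampling behavior can be sketched in plain torch. The 3-component command and uniform goal range are placeholders; mjlab's actual command terms define their own sampling:

```python
import torch

num_envs, dt = 4, 0.02
t_min, t_max = 2.0, 4.0                    # resampling_time_range
time_left = torch.zeros(num_envs)          # expired at start -> resample all
command = torch.zeros(num_envs, 3)

def update(time_left, command):
    time_left = time_left - dt
    expired = time_left <= 0.0
    n = int(expired.sum())
    if n:
        # Draw new goals and reset timers uniformly in [t_min, t_max].
        command[expired] = torch.empty(n, 3).uniform_(-1.0, 1.0)
        time_left[expired] = torch.empty(n).uniform_(t_min, t_max)
    return time_left, command

time_left, command = update(time_left, command)
print(time_left)  # freshly sampled timers in [t_min, t_max]
```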
- debug_vis: bool = False#
Whether to enable debug visualization for this command term. When True, the command term's _debug_vis_impl method is called each frame to render visual aids (e.g., velocity arrows, target markers).
- abstractmethod build(env: ManagerBasedRlEnv) → CommandTerm[source]#
Build the command term from this config.
Curriculum Manager#
- class mjlab.managers.CurriculumManager[source]#
Bases: ManagerBase
Manages curriculum learning for the environment.
The curriculum manager updates environment parameters during training based on agent performance. Each term can modify different aspects of the task difficulty (e.g., terrain complexity, command ranges).
- __init__(cfg: dict[str, CurriculumTermCfg], env: ManagerBasedRlEnv)[source]#
Event Manager#
- class mjlab.managers.EventManager[source]#
Bases: ManagerBase
Manages event-based operations for the environment.
The event manager triggers operations at different simulation events: startup (once at initialization), reset (on episode reset), or interval (periodically during simulation). Common uses include domain randomization and state resets.
- __init__(cfg: dict[str, EventTermCfg], env: ManagerBasedRlEnv)[source]#
- get_term_cfg(term_name: str) → EventTermCfg[source]#
Get the configuration of a specific event term by name.
- class mjlab.managers.EventTermCfg[source]#
Configuration for an event term.
Event terms trigger operations at specific simulation events. They’re commonly used for domain randomization, state resets, and periodic perturbations.
The four modes determine when the event fires:
"startup": Once when the environment initializes. Use for parameters that should be randomized per-environment but stay constant within an episode (e.g., domain randomization)."reset": On every episode reset. Use for parameters that should vary between episodes (e.g., initial robot pose, domain randomization)."interval": Periodically during simulation, controlled byinterval_range_s. Use for perturbations that should happen during episodes (e.g., pushing the robot, external disturbances)."step": Every environment step, unconditionally on all envs. Use for terms that manage per-step state such as force lifetimes (e.g.,apply_body_impulse).
- mode: EventMode#
When the event triggers: "startup" (once at init), "reset" (every episode), "interval" (periodically during simulation), or "step" (every environment step).
- interval_range_s: tuple[float, float] | None = None#
Time range in seconds for interval mode. The next trigger time is uniformly sampled from [min, max]. Required when mode="interval".
Metrics Manager#
- class mjlab.managers.MetricsManager[source]#
Bases: ManagerBase
Accumulates per-step metric values and reports episode averages.
Unlike rewards, metrics have no weight, no dt scaling, and no normalization by episode length. Episode values are true per-step averages (sum / step_count), so a metric in [0,1] stays in [0,1] in the logger.
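A quick arithmetic illustration of the averaging rule above (sum / step_count, no weight or dt scaling):

```python
# A [0, 1] success indicator recorded at each of 4 steps in an episode.
values = [0.0, 1.0, 1.0, 1.0]
episode_metric = sum(values) / len(values)
print(episode_metric)  # 0.75 -- stays in [0, 1], unlike a dt-scaled reward sum
```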
- __init__(cfg: dict[str, MetricsTermCfg], env: ManagerBasedRlEnv)[source]#
- class mjlab.managers.NullMetricsManager[source]#
Bases: object
Placeholder for an absent metrics manager that safely no-ops all operations.
- class mjlab.managers.MetricsTermCfg[source]#
Configuration for a metrics term.
Metric terms are evaluated inside the decimation loop and report the per-step mean. Only the integrated state (qpos, qvel, act) is current mid-loop; all derived quantities (xpos, xquat, site_xpos, actuator_force, contacts, ...) are stale.
- reduce#
How to aggregate per-step values into an episode metric.
- Type:
Literal['mean', 'last']
"mean" (the default) reports the per-step average; "last" reports the value from the final step of the episode, which is useful for binary success metrics that should not be averaged over timesteps.
Recorder Manager#
- class mjlab.managers.RecorderManager[source]#
Bases: ManagerBase
Orchestrates recorder terms during environment rollouts.
Holds a collection of RecorderTerm instances and calls their lifecycle methods at the appropriate points in the environment loop. The manager has no opinion on how data is stored; each term handles its own I/O entirely.
Register terms by adding them to the recorders dict on ManagerBasedRlEnvCfg. If the dict is empty, the environment substitutes a NullRecorderManager with zero overhead.
- __init__(cfg: dict[str, RecorderTermCfg], env: ManagerBasedRlEnv)[source]#
- record_pre_reset(env_ids: Tensor) → None[source]#
Forward to each term's RecorderTerm.record_pre_reset().
- record_post_reset(env_ids: Tensor) → None[source]#
Forward to each term's RecorderTerm.record_post_reset().
- record_post_step() → None[source]#
Forward to each term's RecorderTerm.record_post_step().
- close() → None[source]#
Forward to each term's RecorderTerm.close().
- class mjlab.managers.NullRecorderManager[source]#
Bases: object
No-op fallback used when no recorder terms are configured.
All methods are no-ops. This class is not a ManagerBase subclass, so it carries zero overhead.
- class mjlab.managers.RecorderTerm[source]#
Bases: ManagerTermBase
Base class for recorder terms.
Override only the lifecycle methods you need. Each method is a no-op by default so subclasses are not required to implement all of them.
The environment is available as self._env, giving access to self._env.obs_buf, self._env.action_manager.action, and all other environment state.
Example:
```python
class CsvRecorder(RecorderTerm):
    def __init__(self, cfg, env):
        super().__init__(cfg, env)
        self._file = open(cfg.params["path"], "w", newline="")
        self._writer = csv.writer(self._file)

    def record_pre_reset(self, env_ids):
        # Terminal transition: action is still intact here.
        # It will be zeroed by _reset_idx immediately after this returns.
        obs = self._env.obs_buf["actor"][env_ids].cpu().numpy()
        act = self._env.action_manager.action[env_ids].cpu().numpy()
        for o, a in zip(obs, act):
            self._writer.writerow(o.tolist() + a.tolist())

    def record_post_step(self):
        # Skip envs that just reset: their terminal pair was written in
        # record_pre_reset and their action is now zeroed.
        mask = ~self._env.reset_buf
        obs = self._env.obs_buf["actor"][mask].cpu().numpy()
        act = self._env.action_manager.action[mask].cpu().numpy()
        for o, a in zip(obs, act):
            self._writer.writerow(o.tolist() + a.tolist())

    def close(self):
        self._file.close()
```
- __init__(cfg: RecorderTermCfg, env: ManagerBasedRlEnv)[source]#
- record_pre_reset(env_ids: Tensor) → None[source]#
Called in env.step() before terminated environments are reset.
What is available:
- obs_buf contains the observation from the end of the previous step (the input the agent used to choose the terminal action). It does not contain the post-action terminal observation (the state reached after applying the action), which is never computed for resetting environments.
- action_manager.action contains the action applied during this step. This is the correct terminal action. It will be zeroed for these environments by _reset_idx immediately after this hook returns, so capture it here if you need it later.
- reward_buf contains the reward for this terminal step.
- reset_terminated and reset_time_outs reflect why each environment is resetting.
This is the right hook to record the terminal transition (obs_t, action_t, reward_t, done=True) for each resetting environment.
- Parameters:
env_ids – Indices of environments that are about to be reset.
- record_post_reset(env_ids: Tensor) → None[source]#
Called after a reset completes with fresh observations computed.
Fires at the end of env.reset() (covering all environments on the initial call) and within env.step() for each batch of environments that terminates, after state has been overwritten and new observations computed.
At this point obs_buf[env_ids] holds the initial observation of the new episode and action_manager.action[env_ids] is zero (no action has been taken in the new episode yet).
Use this hook to initialize per-episode state or record the first observation of a new episode.
- Parameters:
env_ids – Indices of environments that were reset.
- record_post_step() → None[source]#
Called at the end of every env.step() with fresh observations.
At this point obs_buf holds the new observation for every environment and action_manager.action holds the action that was applied during this step. Exception: for environments that reset during this step, action_manager.action has been zeroed by _reset_idx and obs_buf holds the initial observation of the new episode rather than the post-action terminal observation. Use record_pre_reset to capture the terminal (obs, action) pair for those environments. Resetting environments are identified by self._env.reset_buf.
- class mjlab.managers.RecorderTermCfg[source]#
Configuration for a recorder term.
func must be a RecorderTerm subclass. Function-based terms are not supported because recorder terms are stateful (file handles, buffers, etc.).