mjlab.managers#
Environment managers for actions, observations, rewards, terminations, commands, and curriculum.
- class mjlab.managers.ActionManager[source]#
Bases: ManagerBase
Manages action processing for the environment.
The action manager aggregates multiple action terms, each controlling a different entity or aspect of the simulation. It splits the policy’s action tensor and routes each slice to the appropriate action term.
- __init__(cfg: dict[str, ActionTermCfg], env: ManagerBasedRlEnv)[source]#
- property action: Tensor#
- get_term(name: str) ActionTerm[source]#
- property prev_action: Tensor#
- property prev_prev_action: Tensor#
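The slicing behavior described above can be sketched in plain Python. This is a standalone illustration, not the mjlab implementation; the term names and dimensions are made-up assumptions.

```python
# Sketch of how an action manager routes slices of the flat policy action
# to per-term processors. Term names/dims here are illustrative only.

def split_actions(action: list, term_dims: dict) -> dict:
    """Split a flat action vector into per-term slices, in term order."""
    slices, start = {}, 0
    for name, dim in term_dims.items():
        slices[name] = action[start:start + dim]
        start += dim
    if start != len(action):
        raise ValueError(f"expected {start} action dims, got {len(action)}")
    return slices

out = split_actions([0.1, 0.2, 0.3, 0.4, 0.5], {"joint_pos": 3, "gripper": 2})
# out["joint_pos"] -> [0.1, 0.2, 0.3]; out["gripper"] -> [0.4, 0.5]
```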
- class mjlab.managers.ActionTerm[source]#
Bases: ManagerTermBase
Base class for action terms.
The action term is responsible for processing the raw actions sent to the environment and applying them to the entity managed by the term.
- __init__(cfg: ActionTermCfg, env: ManagerBasedRlEnv)[source]#
- abstract property raw_action: Tensor#
- class mjlab.managers.ActionTermCfg[source]#
Bases: ABC
Configuration for an action term.
Action terms process raw actions from the policy and apply them to entities in the scene (e.g., setting joint positions, velocities, or efforts).
- abstractmethod build(env: ManagerBasedRlEnv) ActionTerm[source]#
Build the action term from this config.
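The config-builds-term pattern can be sketched without mjlab. The class names below mirror the API above, but the bodies and the `scale` parameter are illustrative assumptions, not the real implementation.

```python
from abc import ABC
from dataclasses import dataclass

class ActionTerm(ABC):
    """Minimal stand-in for the real ActionTerm base class."""
    def __init__(self, cfg, env):
        self.cfg, self.env = cfg, env

@dataclass
class JointPositionActionCfg:
    """Hypothetical config; real configs subclass ActionTermCfg."""
    scale: float = 1.0

    def build(self, env) -> ActionTerm:
        # The config knows how to construct its matching term.
        return JointPositionAction(self, env)

class JointPositionAction(ActionTerm):
    def process(self, raw):
        # Apply the configured scale to the raw policy action slice.
        return [self.cfg.scale * a for a in raw]

term = JointPositionActionCfg(scale=0.5).build(env=None)
# term.process([1.0, -2.0]) -> [0.5, -1.0]
```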
- class mjlab.managers.CommandManager[source]#
Bases: ManagerBase
Manages command generation for the environment.
The command manager generates and updates goal commands for the agent (e.g., target velocity, target position). Commands are resampled at configurable intervals and can track metrics for logging.
- __init__(cfg: dict[str, CommandTermCfg], env: ManagerBasedRlEnv)[source]#
- get_term(name: str) CommandTerm[source]#
- get_term_cfg(name: str) CommandTermCfg[source]#
- class mjlab.managers.CommandTerm[source]#
Bases: ManagerTermBase
Base class for command terms.
- __init__(cfg: CommandTermCfg, env: ManagerBasedRlEnv)[source]#
- abstract property command#
- class mjlab.managers.CommandTermCfg[source]#
Bases: ABC
Configuration for a command generator term.
Command terms generate goal commands for the agent (e.g., target velocity, target position). Commands are automatically resampled at configurable intervals and can track metrics for logging.
- abstractmethod build(env: ManagerBasedRlEnv) CommandTerm[source]#
Build the command term from this config.
- class mjlab.managers.NullCommandManager[source]#
Bases: object
Placeholder for absent command manager that safely no-ops all operations.
- class mjlab.managers.CurriculumManager[source]#
Bases: ManagerBase
Manages curriculum learning for the environment.
The curriculum manager updates environment parameters during training based on agent performance. Each term can modify different aspects of the task difficulty (e.g., terrain complexity, command ranges).
- __init__(cfg: dict[str, CurriculumTermCfg], env: ManagerBasedRlEnv)[source]#
- get_term_cfg(term_name: str) CurriculumTermCfg[source]#
- class mjlab.managers.CurriculumTermCfg[source]#
Bases: ManagerTermBaseCfg
Configuration for a curriculum term.
Curriculum terms modify environment parameters during training to implement curriculum learning strategies (e.g., gradually increasing task difficulty).
- class mjlab.managers.NullCurriculumManager[source]#
Bases: object
Placeholder for absent curriculum manager that safely no-ops all operations.
- class mjlab.managers.EventManager[source]#
Bases: ManagerBase
Manages event-based operations for the environment.
The event manager triggers operations at different simulation events: startup (once at initialization), reset (on episode reset), or interval (periodically during simulation). Common uses include domain randomization and state resets.
- __init__(cfg: dict[str, EventTermCfg], env: ManagerBasedRlEnv)[source]#
- apply(mode: Literal['startup', 'reset', 'interval'], env_ids: Tensor | slice | None = None, dt: float | None = None, global_env_step_count: int | None = None)[source]#
- get_term_cfg(term_name: str) EventTermCfg[source]#
Get the configuration of a specific event term by name.
- class mjlab.managers.EventTermCfg[source]#
Bases: ManagerTermBaseCfg
Configuration for an event term.
Event terms trigger operations at specific simulation events. They’re commonly used for domain randomization, state resets, and periodic perturbations.
The three modes determine when the event fires:
"startup": Once when the environment initializes. Use for parameters that should be randomized per-environment but stay constant within an episode ( e.g., domain randomization)."reset": On every episode reset. Use for parameters that should vary between episodes (e.g., initial robot pose, domain randomization)."interval": Periodically during simulation, controlled byinterval_range_s. Use for perturbations that should happen during episodes (e.g., pushing the robot, external disturbances).
- __init__(func: Any, params: dict[str, Any] = <factory>, *, mode: EventMode, interval_range_s: tuple[float, float] | None = None, is_global_time: bool = False, min_step_count_between_reset: int = 0, domain_randomization: bool = False) None#
- domain_randomization: bool = False#
Whether this event performs domain randomization. If True, the field name from params["field"] is tracked and exposed via EventManager.domain_randomization_fields for logging/debugging.
- interval_range_s: tuple[float, float] | None = None#
Time range in seconds for interval mode. The next trigger time is uniformly sampled from [min, max]. Required when mode="interval".
- is_global_time: bool = False#
Whether all environments share the same timer. If True, all envs trigger simultaneously. If False (default), each env has an independent timer that resets on episode reset. Only applies to mode="interval".
- min_step_count_between_reset: int = 0#
Minimum environment steps between triggers. Prevents the event from firing too frequently when episodes reset rapidly. Only applies to mode="reset". Set to 0 (default) to trigger on every reset.
- mode: EventMode#
"startup"(once at init),"reset"(every episode), or"interval"(periodically during simulation).- Type:
When the event triggers
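The interval mode can be pictured as a per-env countdown timer that fires and resamples its next trigger time from interval_range_s. This is a standalone sketch in plain Python; the exact resampling logic in mjlab may differ.

```python
import random

def step_interval_timer(time_left, dt, interval_range_s, rng):
    """Advance one interval-event timer by dt; fire and resample on expiry.

    Returns (fired, new_time_left)."""
    time_left -= dt
    if time_left <= 0.0:
        lo, hi = interval_range_s
        # Next trigger time uniformly sampled from [min, max], per the docs.
        return True, rng.uniform(lo, hi)
    return False, time_left

rng = random.Random(0)
fired, t = step_interval_timer(0.01, 0.02, (5.0, 10.0), rng)
# fired -> True, and t is resampled into [5.0, 10.0]
```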
- class mjlab.managers.ManagerBase[source]#
Bases: ABC
Base class for all managers.
- __init__(env: ManagerBasedRlEnv)[source]#
- class mjlab.managers.ManagerTermBase[source]#
Bases: object
- __init__(env: ManagerBasedRlEnv)[source]#
- class mjlab.managers.ManagerTermBaseCfg[source]#
Bases: object
Base configuration for manager terms.
This is the base config for terms in observation, reward, termination, curriculum, and event managers. It provides a common interface for specifying a callable and its parameters.
The func field accepts either a function or a class.
Function-based terms are simpler and suitable for stateless computations:
RewardTermCfg(func=mdp.joint_torques_l2, weight=-0.01)
Class-based terms are instantiated with (cfg, env) and are useful when you need to:
- Cache computed values at initialization (e.g., resolve regex patterns to indices)
- Maintain state across calls
- Perform expensive setup once rather than every call
class posture:
    def __init__(self, cfg: RewardTermCfg, env: ManagerBasedRlEnv):
        # Resolve std dict to tensor once at init
        self.std = resolve_std_to_tensor(cfg.params["std"], env)

    def __call__(self, env, **kwargs) -> torch.Tensor:
        # Use cached self.std
        return compute_posture_reward(env, self.std)

RewardTermCfg(func=posture, params={"std": {".*knee.*": 0.3}}, weight=1.0)
Class-based terms can optionally implement reset(env_ids) for per-episode state.
- class mjlab.managers.MetricsManager[source]#
Bases: ManagerBase
Accumulates per-step metric values, reports episode averages.
Unlike rewards, metrics have no weight, no dt scaling, and no normalization by episode length. Episode values are true per-step averages (sum / step_count), so a metric in [0,1] stays in [0,1] in the logger.
- __init__(cfg: dict[str, MetricsTermCfg], env: ManagerBasedRlEnv)[source]#
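The sum / step_count behavior described above reduces to a true per-step mean; a tiny standalone illustration (not the mjlab implementation):

```python
# Sketch of the metrics accumulation: per-step values are summed and
# divided by the step count at episode end, with no weight or dt scaling.

def episode_metric(values):
    """True per-step average: sum / step_count."""
    return sum(values) / len(values)

avg = episode_metric([0.25, 0.5, 0.75, 0.5])
# avg -> 0.5, so a metric in [0, 1] stays in [0, 1] in the logger
```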
- class mjlab.managers.MetricsTermCfg[source]#
Bases: ManagerTermBaseCfg
Configuration for a metrics term.
- class mjlab.managers.NullMetricsManager[source]#
Bases: object
Placeholder for absent metrics manager that safely no-ops all operations.
- class mjlab.managers.ObservationGroupCfg[source]#
Bases: object
Configuration for an observation group.
An observation group bundles multiple observation terms together. Groups are typically used to separate observations for different purposes (e.g., “actor” for the actor, “critic” for the value function).
- __init__(terms: dict[str, ObservationTermCfg], concatenate_terms: bool = True, concatenate_dim: int = -1, enable_corruption: bool = False, history_length: int | None = None, flatten_history_dim: bool = True, nan_policy: Literal['disabled', 'warn', 'sanitize', 'error'] = 'disabled', nan_check_per_term: bool = True) None#
- concatenate_terms: bool = True#
Whether to concatenate all terms into a single tensor. If False, returns a dict mapping term names to their individual tensors.
- enable_corruption: bool = False#
Whether to apply noise corruption to observations. Set to True during training for domain randomization, False during evaluation.
- flatten_history_dim: bool = True#
Whether to flatten history into the observation dimension. If True, observations have shape (num_envs, obs_dim * history_length). If False, shape is (num_envs, history_length, obs_dim).
- history_length: int | None = None#
Group-level history length override. If set, applies to all terms in this group. If None, each term uses its own history_length setting.
- nan_check_per_term: bool = True#
If True, check each observation term individually to identify NaN source. If False, check only the final concatenated output (faster but less informative). Only applies when nan_policy != ‘disabled’.
- nan_policy: Literal['disabled', 'warn', 'sanitize', 'error'] = 'disabled'#
NaN/Inf handling policy for observations in this group.
‘disabled’: No checks (default, fastest)
‘warn’: Log warning with term name and env IDs, then sanitize (debugging)
‘sanitize’: Silent sanitization to 0.0 like reward manager (safe for production)
‘error’: Raise ValueError on NaN/Inf (strict development mode)
- terms: dict[str, ObservationTermCfg]#
Dictionary mapping term names to their configurations.
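The two history layouts governed by flatten_history_dim can be sketched with nested lists. This is a standalone shape illustration, not mjlab code:

```python
# One env's history buffer: (history_length, obs_dim) as nested lists.
# flatten_history_dim=True corresponds to concatenating the steps into a
# single vector of length obs_dim * history_length.

def flatten_history(history):
    """Flatten (history_length, obs_dim) into one observation vector."""
    return [x for step in history for x in step]

history = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # history_length=3, obs_dim=2
flat = flatten_history(history)
# len(flat) -> 6 == obs_dim * history_length
```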
- class mjlab.managers.ObservationManager[source]#
Bases: ManagerBase
Manages observation computation for the environment.
The observation manager computes observations from multiple terms organized into groups. Each term can have noise, clipping, scaling, delay, and history applied. Groups can optionally concatenate their terms into a single tensor.
- __init__(cfg: dict[str, ObservationGroupCfg], env)[source]#
- get_term_cfg(group_name: str, term_name: str) ObservationTermCfg[source]#
- class mjlab.managers.ObservationTermCfg[source]#
Bases: ManagerTermBaseCfg
Configuration for an observation term.
Processing pipeline: compute → noise → clip → scale → delay → history. Delay models sensor latency. History provides temporal context. Both are optional and can be combined.
- __init__(func: Any, params: dict[str, Any] = <factory>, noise: ~mjlab.utils.noise.noise_cfg.NoiseCfg | ~mjlab.utils.noise.noise_cfg.NoiseModelCfg | None = None, clip: tuple[float, float] | None = None, scale: tuple[float, ...] | float | ~torch.Tensor | None = None, delay_min_lag: int = 0, delay_max_lag: int = 0, delay_per_env: bool = True, delay_hold_prob: float = 0.0, delay_update_period: int = 0, delay_per_env_phase: bool = True, history_length: int = 0, flatten_history_dim: bool = True) None#
- delay_hold_prob: float = 0.0#
Probability of reusing the previous lag instead of resampling. Useful for temporally correlated latency patterns.
- delay_max_lag: int = 0#
Maximum lag (in steps) for delayed observations. Use min=max for constant delay.
- delay_min_lag: int = 0#
Minimum lag (in steps) for delayed observations. Lag sampled uniformly from [min_lag, max_lag]. Convert to ms: lag * (1000 / control_hz).
- delay_per_env: bool = True#
If True, each environment samples its own lag. If False, all environments share the same lag at each step.
- delay_per_env_phase: bool = True#
If True and update_period > 0, stagger update timing across envs to avoid synchronized resampling.
- delay_update_period: int = 0#
Resample lag every N steps (models multi-rate sensors). If 0, update every step.
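The lag sampling and millisecond conversion described in these fields can be sketched standalone; the hold/resample logic below is an assumption pieced together from the field docs, not the mjlab implementation.

```python
import random

def sample_lag(prev_lag, min_lag, max_lag, hold_prob, rng):
    """Sample a new lag in steps, possibly holding the previous one."""
    if rng.random() < hold_prob:
        return prev_lag  # reuse previous lag: temporally correlated latency
    return rng.randint(min_lag, max_lag)  # uniform in [min_lag, max_lag]

def lag_to_ms(lag, control_hz):
    """Convert a lag in control steps to milliseconds: lag * (1000 / control_hz)."""
    return lag * (1000.0 / control_hz)

rng = random.Random(0)
lag = sample_lag(prev_lag=2, min_lag=1, max_lag=4, hold_prob=0.0, rng=rng)
# lag is in [1, 4]; at 50 Hz control, a lag of 2 steps is 40.0 ms
```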
- class mjlab.managers.RewardManager[source]#
Bases: ManagerBase
Manages reward computation by aggregating weighted reward terms.
- Reward Scaling Behavior:
By default, rewards are scaled by the environment step duration (dt). This normalizes cumulative episodic rewards across different simulation frequencies. The scaling can be disabled via the scale_by_dt parameter.
- When scale_by_dt=True (default): reward_buf (returned by compute()) = raw_value * weight * dt; _episode_sums (cumulative rewards) are scaled by dt; Episode_Reward/* logged metrics are scaled by dt.
- When scale_by_dt=False: reward_buf = raw_value * weight (no dt scaling).
- Regardless of the scaling setting: _step_reward (via get_active_iterable_terms()) always contains the unscaled reward rate (raw_value * weight).
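The scaling rules above reduce to simple arithmetic; a standalone sketch (not the mjlab implementation):

```python
def step_reward(raw_value, weight, dt, scale_by_dt=True):
    """Per-step reward as described: raw_value * weight, optionally * dt."""
    r = raw_value * weight  # the unscaled reward rate
    return r * dt if scale_by_dt else r

# With dt = 0.02 s, raw value 2.0, weight 1.5:
#   scaled:   2.0 * 1.5 * 0.02 = 0.06 (what compute() returns by default)
#   unscaled: 2.0 * 1.5        = 3.0  (the reward rate in _step_reward)
```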
- __init__(cfg: dict[str, RewardTermCfg], env: ManagerBasedRlEnv, *, scale_by_dt: bool = True)[source]#
- get_term_cfg(term_name: str) RewardTermCfg[source]#
- class mjlab.managers.RewardTermCfg[source]#
Bases: ManagerTermBaseCfg
Configuration for a reward term.
- class mjlab.managers.SceneEntityCfg[source]#
Bases: object
Configuration for a scene entity that is used by the manager’s term.
This configuration allows flexible specification of entity components either by name or by ID. During resolution, it ensures consistency between names and IDs, and can optimize to slice(None) when all components are selected.
- __init__(name: str, joint_names: str | tuple[str, ...] | None = None, joint_ids: list[int] | slice = <factory>, body_names: str | tuple[str, ...] | None = None, body_ids: list[int] | slice = <factory>, geom_names: str | tuple[str, ...] | None = None, geom_ids: list[int] | slice = <factory>, site_names: str | tuple[str, ...] | None = None, site_ids: list[int] | slice = <factory>, actuator_names: str | list[str] | None = None, actuator_ids: list[int] | slice = <factory>, preserve_order: bool = False) None#
- actuator_names: str | list[str] | None = None#
Names of actuators to include. Can be a single string or list.
- body_names: str | tuple[str, ...] | None = None#
Names of bodies to include. Can be a single string or tuple.
- geom_names: str | tuple[str, ...] | None = None#
Names of geometries to include. Can be a single string or tuple.
- joint_names: str | tuple[str, ...] | None = None#
Names of joints to include. Can be a single string or tuple.
- resolve(scene: Scene) None[source]#
Resolve names and IDs for all configured fields.
This method ensures consistency between names and IDs for each field type. It handles three cases:
1. Both names and IDs provided: Validates they match
2. Only names provided: Computes IDs (optimizes to slice(None) if all selected)
3. Only IDs provided: Computes names
- Parameters:
scene – The scene containing the entity to resolve against.
- Raises:
ValueError – If provided names and IDs are inconsistent.
KeyError – If the entity name is not found in the scene.
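The three resolution cases can be sketched for a single field type. This standalone sketch (names are illustrative, not mjlab code) follows the description above, including the slice(None) optimization when everything is selected:

```python
def resolve_ids(all_names, requested_names, requested_ids):
    """Resolve names/IDs per the three cases described for resolve()."""
    if requested_names is not None and requested_ids is not None:
        # Case 1: both provided -> validate consistency.
        if [all_names[i] for i in requested_ids] != list(requested_names):
            raise ValueError("provided names and IDs are inconsistent")
        return list(requested_names), requested_ids
    if requested_names is not None:
        # Case 2: names only -> compute IDs (slice(None) if all selected).
        ids = [all_names.index(n) for n in requested_names]
        return list(requested_names), slice(None) if len(ids) == len(all_names) else ids
    # Case 3: IDs only -> compute names.
    return [all_names[i] for i in requested_ids], requested_ids

names, ids = resolve_ids(["hip", "knee", "ankle"], ["knee"], None)
# names -> ["knee"], ids -> [1]
```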
- class mjlab.managers.TerminationManager[source]#
Bases: ManagerBase
Manages termination conditions for the environment.
The termination manager aggregates multiple termination terms to compute episode done signals. Terms can be either truncations (time-based) or terminations (failure conditions).
- __init__(cfg: dict[str, TerminationTermCfg], env: ManagerBasedRlEnv)[source]#
- property dones: Tensor#
- get_term_cfg(term_name: str) TerminationTermCfg[source]#
- reset(env_ids: Tensor | slice | None = None) dict[str, Tensor][source]#
Resets the manager and returns logging info for the current step.
- property terminated: Tensor#
- property time_outs: Tensor#
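Per the description above, the done signal aggregates truncations and terminations; a standalone boolean sketch (not mjlab code), with one entry per environment:

```python
def compute_dones(terminated, time_outs):
    """Episode done = terminated (failure condition) OR time_out (truncation)."""
    return [a or b for a, b in zip(terminated, time_outs)]

dones = compute_dones([False, True, False], [False, False, True])
# dones -> [False, True, True]
```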