mjlab.managers#

Environment managers for actions, observations, rewards, terminations, commands, and curriculum.

Classes:

ActionManager: Manages action processing for the environment.
ActionTerm: Base class for action terms.
ActionTermCfg: Configuration for an action term.
CommandManager: Manages command generation for the environment.
CommandTerm: Base class for command terms.
CommandTermCfg: Configuration for a command generator term.
NullCommandManager: Placeholder for an absent command manager that safely no-ops all operations.
CurriculumManager: Manages curriculum learning for the environment.
CurriculumTermCfg: Configuration for a curriculum term.
NullCurriculumManager: Placeholder for an absent curriculum manager that safely no-ops all operations.
EventManager: Manages event-based operations for the environment.
EventTermCfg: Configuration for an event term.
ManagerBase: Base class for all managers.
ManagerTermBase: Base class for manager terms.
ManagerTermBaseCfg: Base configuration for manager terms.
MetricsManager: Accumulates per-step metric values and reports episode averages.
MetricsTermCfg: Configuration for a metrics term.
NullMetricsManager: Placeholder for an absent metrics manager that safely no-ops all operations.
ObservationGroupCfg: Configuration for an observation group.
ObservationManager: Manages observation computation for the environment.
ObservationTermCfg: Configuration for an observation term.
RewardManager: Manages reward computation by aggregating weighted reward terms.
RewardTermCfg: Configuration for a reward term.
SceneEntityCfg: Configuration for a scene entity used by a manager term.
TerminationManager: Manages termination conditions for the environment.
TerminationTermCfg: Configuration for a termination term.

class mjlab.managers.ActionManager[source]#

Bases: ManagerBase

Manages action processing for the environment.

The action manager aggregates multiple action terms, each controlling a different entity or aspect of the simulation. It splits the policy’s action tensor and routes each slice to the appropriate action term.

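For example, with two action terms the policy emits one flat tensor and the manager routes each slice according to the terms' action_dim. A minimal sketch (the term names and dimensions are illustrative):

action_manager = ActionManager(
  cfg={"arm": arm_term_cfg, "gripper": gripper_term_cfg},
  env=env,
)
# Suppose the arm term has action_dim=7 and the gripper term action_dim=1:
assert action_manager.total_action_dim == 8
assert action_manager.action_term_dim == [7, 1]
action_manager.process_action(actions)  # actions: (num_envs, 8); cols 0-6 -> arm, 7 -> gripper
action_manager.apply_action()           # each term writes its processed slice to the simulation
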
Methods:

__init__(cfg, env)

get_term(name)

reset([env_ids])

Resets the manager and returns logging info for the current step.

process_action(action)

apply_action()

get_active_iterable_terms(env_idx)

Attributes:

__init__(cfg: dict[str, ActionTermCfg], env: ManagerBasedRlEnv)[source]#
property total_action_dim: int#
property action_term_dim: list[int]#
property action: Tensor#
property prev_action: Tensor#
property prev_prev_action: Tensor#
property active_terms: list[str]#
get_term(name: str) → ActionTerm[source]#
reset(env_ids: Tensor | slice | None = None) → dict[str, float][source]#

Resets the manager and returns logging info for the current step.

process_action(action: Tensor) → None[source]#
apply_action() → None[source]#
get_active_iterable_terms(env_idx: int) → Sequence[tuple[str, Sequence[float]]][source]#
class mjlab.managers.ActionTerm[source]#

Bases: ManagerTermBase

Base class for action terms.

The action term is responsible for processing the raw actions sent to the environment and applying them to the entity managed by the term.

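A minimal subclass sketch, assuming a fixed 7-dimensional action (the dimension and the storage scheme are illustrative, not part of this API):

import torch

class JointOffsetAction(ActionTerm):
  def __init__(self, cfg: ActionTermCfg, env):
    super().__init__(cfg, env)
    self._raw = torch.zeros(self.num_envs, self.action_dim, device=self.device)

  @property
  def action_dim(self) -> int:
    return 7  # illustrative

  @property
  def raw_action(self) -> torch.Tensor:
    return self._raw

  def process_actions(self, actions: torch.Tensor) -> None:
    self._raw = actions.clone()  # cache this step's slice of the policy output

  def apply_actions(self) -> None:
    ...  # write targets to the controlled entity (entity API not shown)
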
Methods:

__init__(cfg, env)

process_actions(actions)

apply_actions()

Attributes:

__init__(cfg: ActionTermCfg, env: ManagerBasedRlEnv)[source]#
abstract property action_dim: int#
abstractmethod process_actions(actions: Tensor) → None[source]#
abstractmethod apply_actions() → None[source]#
abstract property raw_action: Tensor#
class mjlab.managers.ActionTermCfg[source]#

Bases: ABC

Configuration for an action term.

Action terms process raw actions from the policy and apply them to entities in the scene (e.g., setting joint positions, velocities, or efforts).

Attributes:

entity_name

Name of the entity in the scene that this action term controls.

clip

Optional clipping bounds per transmission type.

Methods:

build(env)

Build the action term from this config.

__init__(*, entity_name[, clip])

entity_name: str#

Name of the entity in the scene that this action term controls.

clip: dict[str, tuple] | None = None#

Optional clipping bounds per transmission type. Maps transmission name (e.g., ‘position’, ‘velocity’) to (min, max) tuple.

abstractmethod build(env: ManagerBasedRlEnv) → ActionTerm[source]#

Build the action term from this config.

__init__(*, entity_name: str, clip: dict[str, tuple] | None = None) → None#
class mjlab.managers.CommandManager[source]#

Bases: ManagerBase

Manages command generation for the environment.

The command manager generates and updates goal commands for the agent (e.g., target velocity, target position). Commands are resampled at configurable intervals and can track metrics for logging.

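Typical per-step use (the term name "base_velocity" is illustrative):

command_manager.compute(dt=0.02)                    # advance timers; resample expired commands
cmd = command_manager.get_command("base_velocity")  # current command tensor for that term
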
Methods:

__init__(cfg, env)

debug_vis(visualizer)

get_active_iterable_terms(env_idx)

reset(env_ids)

Resets the manager and returns logging info for the current step.

compute(dt)

get_command(name)

get_term(name)

get_term_cfg(name)

Attributes:

__init__(cfg: dict[str, CommandTermCfg], env: ManagerBasedRlEnv)[source]#
debug_vis(visualizer: DebugVisualizer) → None[source]#
property active_terms: list[str]#
get_active_iterable_terms(env_idx: int) → Sequence[tuple[str, Sequence[float]]][source]#
reset(env_ids: Tensor | None) → dict[str, Tensor][source]#

Resets the manager and returns logging info for the current step.

compute(dt: float)[source]#
get_command(name: str) → Tensor[source]#
get_term(name: str) → CommandTerm[source]#
get_term_cfg(name: str) → CommandTermCfg[source]#
class mjlab.managers.CommandTerm[source]#

Bases: ManagerTermBase

Base class for command terms.

Methods:

__init__(cfg, env)

debug_vis(visualizer)

reset(env_ids)

Resets the manager term.

compute(dt)

Attributes:

__init__(cfg: CommandTermCfg, env: ManagerBasedRlEnv)[source]#
debug_vis(visualizer: DebugVisualizer) → None[source]#
abstract property command#
reset(env_ids: Tensor | slice | None) → dict[str, float][source]#

Resets the manager term.

compute(dt: float) → None[source]#
class mjlab.managers.CommandTermCfg[source]#

Bases: ABC

Configuration for a command generator term.

Command terms generate goal commands for the agent (e.g., target velocity, target position). Commands are automatically resampled at configurable intervals and can track metrics for logging.

Attributes:

resampling_time_range

Time range in seconds for command resampling.

debug_vis

Whether to enable debug visualization for this command term.

Methods:

build(env)

Build the command term from this config.

__init__(*, resampling_time_range[, debug_vis])

resampling_time_range: tuple[float, float]#

Time range in seconds for command resampling. When the timer expires, a new command is sampled and the timer is reset to a value uniformly drawn from [min, max]. Set both values equal for fixed-interval resampling.

debug_vis: bool = False#

Whether to enable debug visualization for this command term. When True, the command term’s _debug_vis_impl method is called each frame to render visual aids (e.g., velocity arrows, target markers).

abstractmethod build(env: ManagerBasedRlEnv) → CommandTerm[source]#

Build the command term from this config.

__init__(*, resampling_time_range: tuple[float, float], debug_vis: bool = False) → None#
class mjlab.managers.NullCommandManager[source]#

Bases: object

Placeholder for an absent command manager that safely no-ops all operations.

Methods:

__init__()

debug_vis(visualizer)

get_active_iterable_terms(env_idx)

reset([env_ids])

compute(dt)

get_command(name)

get_term(name)

get_term_cfg(name)

__init__()[source]#
debug_vis(visualizer: DebugVisualizer) → None[source]#
get_active_iterable_terms(env_idx: int) → Sequence[tuple[str, Sequence[float]]][source]#
reset(env_ids: Tensor | None = None) → dict[str, Tensor][source]#
compute(dt: float) → None[source]#
get_command(name: str) → None[source]#
get_term(name: str) → None[source]#
get_term_cfg(name: str) → None[source]#
class mjlab.managers.CurriculumManager[source]#

Bases: ManagerBase

Manages curriculum learning for the environment.

The curriculum manager updates environment parameters during training based on agent performance. Each term can modify different aspects of the task difficulty (e.g., terrain complexity, command ranges).

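A sketch of a curriculum term and its config; the call signature (env, env_ids, plus keyword params) and the attributes touched below are assumptions, not part of this documented API:

def terrain_difficulty(env, env_ids, max_level: int):
  # Hypothetical: raise task difficulty as training progresses.
  level = min(env.common_step_counter // 10_000, max_level)  # assumed step counter
  env.scene.terrain_level = level                            # assumed parameter
  return level  # returned value shows up in curriculum logs

curriculum = {
  "terrain": CurriculumTermCfg(func=terrain_difficulty, params={"max_level": 9}),
}
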
Methods:

__init__(cfg, env)

get_term_cfg(term_name)

get_active_iterable_terms(env_idx)

reset([env_ids])

Resets the manager and returns logging info for the current step.

compute([env_ids])

Attributes:

__init__(cfg: dict[str, CurriculumTermCfg], env: ManagerBasedRlEnv)[source]#
property active_terms: list[str]#
get_term_cfg(term_name: str) → CurriculumTermCfg[source]#
get_active_iterable_terms(env_idx: int) → Sequence[tuple[str, Sequence[float]]][source]#
reset(env_ids: Tensor | slice | None = None) → dict[str, float][source]#

Resets the manager and returns logging info for the current step.

compute(env_ids: Tensor | slice | None = None)[source]#
class mjlab.managers.CurriculumTermCfg[source]#

Bases: ManagerTermBaseCfg

Configuration for a curriculum term.

Curriculum terms modify environment parameters during training to implement curriculum learning strategies (e.g., gradually increasing task difficulty).

Methods:

__init__(func[, params])

__init__(func: Any, params: dict[str, Any] = <factory>) → None#
class mjlab.managers.NullCurriculumManager[source]#

Bases: object

Placeholder for an absent curriculum manager that safely no-ops all operations.

Methods:

__init__()

get_active_iterable_terms(env_idx)

reset([env_ids])

compute([env_ids])

__init__()[source]#
get_active_iterable_terms(env_idx: int) → Sequence[tuple[str, Sequence[float]]][source]#
reset(env_ids: Tensor | None = None) → dict[str, float][source]#
compute(env_ids: Tensor | None = None) → None[source]#
class mjlab.managers.EventManager[source]#

Bases: ManagerBase

Manages event-based operations for the environment.

The event manager triggers operations at different simulation events: startup (once at initialization), reset (on episode reset), or interval (periodically during simulation). Common uses include domain randomization and state resets.

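A sketch of how an environment typically drives the three modes (exact call sites vary):

event_manager.apply(mode="startup")                   # once, after initialization
event_manager.apply(mode="reset", env_ids=reset_ids)  # for the envs being reset
event_manager.apply(mode="interval", dt=0.02)         # every control step, with step duration
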
Methods:

__init__(cfg, env)

get_term_cfg(term_name)

Get the configuration of a specific event term by name.

reset([env_ids])

Resets the manager and returns logging info for the current step.

apply(mode[, env_ids, dt, global_env_step_count])

Attributes:

__init__(cfg: dict[str, EventTermCfg], env: ManagerBasedRlEnv)[source]#
property active_terms: dict[Literal['startup', 'reset', 'interval'], list[str]]#
property available_modes: list[Literal['startup', 'reset', 'interval']]#
property domain_randomization_fields: tuple[str, ...]#
get_term_cfg(term_name: str) → EventTermCfg[source]#

Get the configuration of a specific event term by name.

reset(env_ids: Tensor | None = None)[source]#

Resets the manager and returns logging info for the current step.

apply(mode: Literal['startup', 'reset', 'interval'], env_ids: Tensor | slice | None = None, dt: float | None = None, global_env_step_count: int | None = None)[source]#
class mjlab.managers.EventTermCfg[source]#

Bases: ManagerTermBaseCfg

Configuration for an event term.

Event terms trigger operations at specific simulation events. They’re commonly used for domain randomization, state resets, and periodic perturbations.

The three modes determine when the event fires (see the config sketch after this list):

  • "startup": Once when the environment initializes. Use for parameters that should be randomized per-environment but stay constant within an episode (e.g., domain randomization).

  • "reset": On every episode reset. Use for parameters that should vary between episodes (e.g., initial robot pose, domain randomization).

  • "interval": Periodically during simulation, controlled by interval_range_s. Use for perturbations that should happen during episodes (e.g., pushing the robot, external disturbances).

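One sketch per mode; the mdp.* functions are hypothetical placeholders:

events = {
  # startup: sampled once per environment, constant within episodes.
  "friction": EventTermCfg(func=mdp.randomize_friction, mode="startup",
                           params={"range": (0.6, 1.2)}),
  # reset: re-sampled at every episode reset.
  "root_pose": EventTermCfg(func=mdp.reset_root_state, mode="reset"),
  # interval: fires every 5-10 s of simulation time within episodes.
  "push": EventTermCfg(func=mdp.push_robot, mode="interval",
                       interval_range_s=(5.0, 10.0)),
}
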
Attributes:

mode

"startup" (once at init), "reset" (every episode), or "interval" (periodically during simulation).

interval_range_s

Time range in seconds for interval mode.

is_global_time

Whether all environments share the same timer.

min_step_count_between_reset

Minimum environment steps between triggers.

domain_randomization

Whether this event performs domain randomization.

Methods:

__init__(func[, params, interval_range_s, ...])

mode: EventMode#

When the event triggers: "startup" (once at init), "reset" (every episode), or "interval" (periodically during simulation).

interval_range_s: tuple[float, float] | None = None#

Time range in seconds for interval mode. The next trigger time is uniformly sampled from [min, max]. Required when mode="interval".

is_global_time: bool = False#

Whether all environments share the same timer. If True, all envs trigger simultaneously. If False (default), each env has an independent timer that resets on episode reset. Only applies to mode="interval".

min_step_count_between_reset: int = 0#

Minimum environment steps between triggers. Prevents the event from firing too frequently when episodes reset rapidly. Only applies to mode="reset". Set to 0 (default) to trigger on every reset.

domain_randomization: bool = False#

Whether this event performs domain randomization. If True, the field name from params["field"] is tracked and exposed via EventManager.domain_randomization_fields for logging/debugging.

__init__(func: Any, params: dict[str, Any] = <factory>, *, mode: EventMode, interval_range_s: tuple[float, float] | None = None, is_global_time: bool = False, min_step_count_between_reset: int = 0, domain_randomization: bool = False) → None#
func: Any#

The callable that computes this term’s value. Can be a function or a class. Classes are auto-instantiated with (cfg=term_cfg, env=env).

params: dict[str, Any]#

Additional keyword arguments passed to func when called.

class mjlab.managers.ManagerBase[source]#

Bases: ABC

Base class for all managers.

Methods:

__init__(env)

reset(env_ids)

Resets the manager and returns logging info for the current step.

get_active_iterable_terms(env_idx)

Attributes:

__init__(env: ManagerBasedRlEnv)[source]#
property num_envs: int#
property device: str#
abstract property active_terms: list[str] | dict[Any, list[str]]#
reset(env_ids: Tensor) → dict[str, Any][source]#

Resets the manager and returns logging info for the current step.

get_active_iterable_terms(env_idx: int) → Sequence[tuple[str, Sequence[float]]][source]#
class mjlab.managers.ManagerTermBase[source]#

Bases: object

Base class for manager terms.

Methods:

__init__(env)

reset(env_ids)

Resets the manager term.

Attributes:

__init__(env: ManagerBasedRlEnv)[source]#
property num_envs: int#
property device: str#
property name: str#
reset(env_ids: Tensor | slice | None) → Any[source]#

Resets the manager term.

class mjlab.managers.ManagerTermBaseCfg[source]#

Bases: object

Base configuration for manager terms.

This is the base config for terms in observation, reward, termination, curriculum, and event managers. It provides a common interface for specifying a callable and its parameters.

The func field accepts either a function or a class:

Function-based terms are simpler and suitable for stateless computations:

RewardTermCfg(func=mdp.joint_torques_l2, weight=-0.01)

Class-based terms are instantiated with (cfg, env) and useful when you need to:

  • Cache computed values at initialization (e.g., resolve regex patterns to indices)

  • Maintain state across calls

  • Perform expensive setup once rather than every call

class posture:
  def __init__(self, cfg: RewardTermCfg, env: ManagerBasedRlEnv):
    # Resolve std dict to tensor once at init
    self.std = resolve_std_to_tensor(cfg.params["std"], env)

  def __call__(self, env, **kwargs) -> torch.Tensor:
    # Use cached self.std
    return compute_posture_reward(env, self.std)

RewardTermCfg(func=posture, params={"std": {".*knee.*": 0.3}}, weight=1.0)

Class-based terms can optionally implement reset(env_ids) for per-episode state.

Attributes:

func

The callable that computes this term's value.

params

Additional keyword arguments passed to func when called.

Methods:

__init__(func[, params])

func: Any#

The callable that computes this term’s value. Can be a function or a class. Classes are auto-instantiated with (cfg=term_cfg, env=env).

params: dict[str, Any]#

Additional keyword arguments passed to func when called.

__init__(func: Any, params: dict[str, Any] = <factory>) → None#
class mjlab.managers.MetricsManager[source]#

Bases: ManagerBase

Accumulates per-step metric values and reports episode averages.

Unlike rewards, metrics have no weight, no dt scaling, and no normalization by episode length. Episode values are true per-step averages (sum / step_count), so a metric in [0,1] stays in [0,1] in the logger.

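For example, a metric that averages 0.8 over a 400-step episode logs sum / step_count = 320 / 400 = 0.8. A minimal config sketch (mdp.tracking_accuracy is a hypothetical per-step function returning values in [0, 1]):

metrics = {
  "tracking_accuracy": MetricsTermCfg(func=mdp.tracking_accuracy),
}
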
Methods:

__init__(cfg, env)

reset([env_ids])

Resets the manager and returns logging info for the current step.

compute()

get_active_iterable_terms(env_idx)

Attributes:

__init__(cfg: dict[str, MetricsTermCfg], env: ManagerBasedRlEnv)[source]#
property active_terms: list[str]#
reset(env_ids: Tensor | slice | None = None) → dict[str, Tensor][source]#

Resets the manager and returns logging info for the current step.

compute() → None[source]#
get_active_iterable_terms(env_idx: int) → Sequence[tuple[str, Sequence[float]]][source]#
class mjlab.managers.MetricsTermCfg[source]#

Bases: ManagerTermBaseCfg

Configuration for a metrics term.

Methods:

__init__(func[, params])

__init__(func: Any, params: dict[str, Any] = <factory>) → None#
class mjlab.managers.NullMetricsManager[source]#

Bases: object

Placeholder for an absent metrics manager that safely no-ops all operations.

Methods:

__init__()[source]#
get_active_iterable_terms(env_idx: int) → Sequence[tuple[str, Sequence[float]]][source]#
reset(env_ids: Tensor | None = None) → dict[str, float][source]#
compute() → None[source]#
class mjlab.managers.ObservationGroupCfg[source]#

Bases: object

Configuration for an observation group.

An observation group bundles multiple observation terms together. Groups are typically used to separate observations for different purposes (e.g., “actor” for the actor, “critic” for the value function).

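A sketch of an asymmetric actor-critic setup (the term configs are assumed defined elsewhere):

obs_groups = {
  "actor": ObservationGroupCfg(
    terms={"joint_pos": joint_pos_term, "joint_vel": joint_vel_term},
    enable_corruption=True,   # noisy inputs for the policy during training
  ),
  "critic": ObservationGroupCfg(
    terms={"joint_pos": joint_pos_term, "root_lin_vel": root_lin_vel_term},
    enable_corruption=False,  # privileged, noise-free inputs for the value function
  ),
}
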
Attributes:

terms

Dictionary mapping term names to their configurations.

concatenate_terms

Whether to concatenate all terms into a single tensor.

concatenate_dim

Dimension along which to concatenate terms.

enable_corruption

Whether to apply noise corruption to observations.

history_length

Group-level history length override.

flatten_history_dim

Whether to flatten history into the observation dimension.

nan_policy

NaN/Inf handling policy for observations in this group.

nan_check_per_term

If True, check each observation term individually to identify NaN source.

Methods:

__init__(terms[, concatenate_terms, ...])

terms: dict[str, ObservationTermCfg]#

Dictionary mapping term names to their configurations.

concatenate_terms: bool = True#

Whether to concatenate all terms into a single tensor. If False, returns a dict mapping term names to their individual tensors.

concatenate_dim: int = -1#

Dimension along which to concatenate terms. Default -1 (last dimension).

enable_corruption: bool = False#

Whether to apply noise corruption to observations. Set to True during training for domain randomization, False during evaluation.

history_length: int | None = None#

Group-level history length override. If set, applies to all terms in this group. If None, each term uses its own history_length setting.

flatten_history_dim: bool = True#

Whether to flatten history into the observation dimension. If True, observations have shape (num_envs, obs_dim * history_length). If False, shape is (num_envs, history_length, obs_dim).

nan_policy: Literal['disabled', 'warn', 'sanitize', 'error'] = 'disabled'#

NaN/Inf handling policy for observations in this group.

  • ‘disabled’: No checks (default, fastest)

  • ‘warn’: Log warning with term name and env IDs, then sanitize (debugging)

  • ‘sanitize’: Silent sanitization to 0.0 like reward manager (safe for production)

  • ‘error’: Raise ValueError on NaN/Inf (strict development mode)

__init__(terms: dict[str, ObservationTermCfg], concatenate_terms: bool = True, concatenate_dim: int = -1, enable_corruption: bool = False, history_length: int | None = None, flatten_history_dim: bool = True, nan_policy: Literal['disabled', 'warn', 'sanitize', 'error'] = 'disabled', nan_check_per_term: bool = True) → None#
nan_check_per_term: bool = True#

If True, check each observation term individually to identify NaN source. If False, check only the final concatenated output (faster but less informative). Only applies when nan_policy != ‘disabled’.

class mjlab.managers.ObservationManager[source]#

Bases: ManagerBase

Manages observation computation for the environment.

The observation manager computes observations from multiple terms organized into groups. Each term can have noise, clipping, scaling, delay, and history applied. Groups can optionally concatenate their terms into a single tensor.

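With the default concatenate_terms=True, each group comes back as a single tensor (a sketch):

obs = observation_manager.compute(update_history=True)
actor_obs = obs["actor"]    # Tensor (num_envs, obs_dim), or a dict of per-term tensors
critic_obs = obs["critic"]  # when that group sets concatenate_terms=False
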
Methods:

__init__(cfg, env)

get_active_iterable_terms(env_idx)

get_term_cfg(group_name, term_name)

reset([env_ids])

Resets the manager and returns logging info for the current step.

compute([update_history])

compute_group(group_name[, update_history])

Attributes:

__init__(cfg: dict[str, ObservationGroupCfg], env)[source]#
get_active_iterable_terms(env_idx: int) → Sequence[tuple[str, Sequence[float]]][source]#
property active_terms: dict[str, list[str]]#
property group_obs_dim: dict[str, tuple[int, ...] | list[tuple[int, ...]]]#
property group_obs_term_dim: dict[str, list[tuple[int, ...]]]#
property group_obs_concatenate: dict[str, bool]#
get_term_cfg(group_name: str, term_name: str) → ObservationTermCfg[source]#
reset(env_ids: Tensor | slice | None = None) → dict[str, float][source]#

Resets the manager and returns logging info for the current step.

compute(update_history: bool = False) → dict[str, Tensor | dict[str, Tensor]][source]#
compute_group(group_name: str, update_history: bool = False) → Tensor | dict[str, Tensor][source]#
class mjlab.managers.ObservationTermCfg[source]#

Bases: ManagerTermBaseCfg

Configuration for an observation term.

Processing pipeline: compute → noise → clip → scale → delay → history. Delay models sensor latency. History provides temporal context. Both are optional and can be combined.

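A term exercising most of the pipeline, in order (mdp.joint_vel and the noise config are illustrative):

joint_vel = ObservationTermCfg(
  func=mdp.joint_vel,     # 1. compute (hypothetical observation function)
  noise=noise_cfg,        # 2. noise (a NoiseCfg instance, construction not shown)
  clip=(-100.0, 100.0),   # 3. clip
  scale=0.05,             # 4. scale
  delay_min_lag=0,
  delay_max_lag=2,        # 5. delay: 0-2 control steps of simulated sensor latency
  history_length=5,       # 6. history: keep the last 5 observations
)
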
Attributes:

noise

Noise model to apply to the observation.

clip

Range (min, max) to clip the observation values.

scale

Scaling factor(s) to multiply the observation by.

delay_min_lag

Minimum lag (in steps) for delayed observations.

delay_max_lag

Maximum lag (in steps) for delayed observations.

delay_per_env

If True, each environment samples its own lag.

delay_hold_prob

Probability of reusing the previous lag instead of resampling.

delay_update_period

Resample lag every N steps (models multi-rate sensors).

delay_per_env_phase

If True and update_period > 0, stagger update timing across envs to avoid synchronized resampling.

history_length

Number of past observations to keep in history.

flatten_history_dim

Whether to flatten the history dimension into observation.

Methods:

__init__(func[, params, noise, clip, scale, ...])

noise: NoiseCfg | NoiseModelCfg | None = None#

Noise model to apply to the observation.

clip: tuple[float, float] | None = None#

Range (min, max) to clip the observation values.

scale: tuple[float, ...] | float | Tensor | None = None#

Scaling factor(s) to multiply the observation by.

delay_min_lag: int = 0#

Minimum lag (in steps) for delayed observations. Lag sampled uniformly from [min_lag, max_lag]. Convert to ms: lag * (1000 / control_hz).

delay_max_lag: int = 0#

Maximum lag (in steps) for delayed observations. Use min=max for constant delay.

delay_per_env: bool = True#

If True, each environment samples its own lag. If False, all environments share the same lag at each step.

delay_hold_prob: float = 0.0#

Probability of reusing the previous lag instead of resampling. Useful for temporally correlated latency patterns.

delay_update_period: int = 0#

Resample lag every N steps (models multi-rate sensors). If 0, update every step.

delay_per_env_phase: bool = True#

If True and update_period > 0, stagger update timing across envs to avoid synchronized resampling.

history_length: int = 0#

Number of past observations to keep in history. 0 = no history.

flatten_history_dim: bool = True#

Whether to flatten the history dimension into observation.

When True and concatenate_terms=True, uses term-major ordering: [A_t0, A_t1, …, A_tH-1, B_t0, B_t1, …, B_tH-1, …]. See docs/source/observation.rst for details on ordering.

__init__(func: Any, params: dict[str, Any] = <factory>, noise: NoiseCfg | NoiseModelCfg | None = None, clip: tuple[float, float] | None = None, scale: tuple[float, ...] | float | Tensor | None = None, delay_min_lag: int = 0, delay_max_lag: int = 0, delay_per_env: bool = True, delay_hold_prob: float = 0.0, delay_update_period: int = 0, delay_per_env_phase: bool = True, history_length: int = 0, flatten_history_dim: bool = True) → None#
func: Any#

The callable that computes this term’s value. Can be a function or a class. Classes are auto-instantiated with (cfg=term_cfg, env=env).

params: dict[str, Any]#

Additional keyword arguments passed to func when called.

class mjlab.managers.RewardManager[source]#

Bases: ManagerBase

Manages reward computation by aggregating weighted reward terms.

Reward Scaling Behavior:

By default, rewards are scaled by the environment step duration (dt). This normalizes cumulative episodic rewards across different simulation frequencies. The scaling can be disabled via the scale_by_dt parameter.

When scale_by_dt=True (default):
  • reward_buf (returned by compute()) = raw_value * weight * dt

  • _episode_sums (cumulative rewards) are scaled by dt

  • Episode_Reward/* logged metrics are scaled by dt

When scale_by_dt=False:
  • reward_buf = raw_value * weight (no dt scaling)

Regardless of the scaling setting:
  • _step_reward (via get_active_iterable_terms()) always contains the unscaled reward rate (raw_value * weight)

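For example, with dt = 0.02, weight = -0.01, and a raw term value of 2.0: compute() contributes 2.0 * -0.01 * 0.02 = -0.0004 to reward_buf when scale_by_dt=True, and 2.0 * -0.01 = -0.02 when scale_by_dt=False; _step_reward reports -0.02 either way.
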
Methods:

__init__(cfg, env, *[, scale_by_dt])

reset([env_ids])

Resets the manager and returns logging info for the current step.

compute(dt)

get_active_iterable_terms(env_idx)

get_term_cfg(term_name)

Attributes:

__init__(cfg: dict[str, RewardTermCfg], env: ManagerBasedRlEnv, *, scale_by_dt: bool = True)[source]#
property active_terms: list[str]#
reset(env_ids: Tensor | slice | None = None) → dict[str, Tensor][source]#

Resets the manager and returns logging info for the current step.

compute(dt: float) → Tensor[source]#
get_active_iterable_terms(env_idx)[source]#
get_term_cfg(term_name: str) → RewardTermCfg[source]#
class mjlab.managers.RewardTermCfg[source]#

Bases: ManagerTermBaseCfg

Configuration for a reward term.

Attributes:

func

The callable that computes this reward term's value.

weight

Weight multiplier for this reward term.

Methods:

__init__([params])

func: Any#

The callable that computes this reward term’s value.

weight: float#

Weight multiplier for this reward term.

__init__(params: dict[str, Any] = <factory>, *, func: Any, weight: float) → None#
params: dict[str, Any]#

Additional keyword arguments passed to func when called.

class mjlab.managers.SceneEntityCfg[source]#

Bases: object

Configuration for a scene entity used by a manager term.

This configuration allows flexible specification of entity components either by name or by ID. During resolution, it ensures consistency between names and IDs, and can optimize to slice(None) when all components are selected.

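A sketch resolving knee joints by name pattern (the entity and joint names are illustrative; the data access at the end is an assumption, not part of this API):

cfg = SceneEntityCfg(name="robot", joint_names=(".*_knee_joint",))
cfg.resolve(env.scene)  # validates names and fills cfg.joint_ids
knee_pos = env.scene["robot"].data.joint_pos[:, cfg.joint_ids]  # assumed accessor
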
Attributes:

name

The name of the entity in the scene.

joint_names

Names of joints to include.

joint_ids

IDs of joints to include.

body_names

Names of bodies to include.

body_ids

IDs of bodies to include.

geom_names

Names of geometries to include.

geom_ids

IDs of geometries to include.

site_names

Names of sites to include.

site_ids

IDs of sites to include.

actuator_names

Names of actuators to include.

actuator_ids

IDs of actuators to include.

preserve_order

If True, maintains the order of components as specified.

Methods:

__init__(name[, joint_names, joint_ids, ...])

resolve(scene)

Resolve names and IDs for all configured fields.

name: str#

The name of the entity in the scene.

joint_names: str | tuple[str, ...] | None = None#

Names of joints to include. Can be a single string or tuple.

joint_ids: list[int] | slice#

IDs of joints to include. Can be a list or slice.

body_names: str | tuple[str, ...] | None = None#

Names of bodies to include. Can be a single string or tuple.

body_ids: list[int] | slice#

IDs of bodies to include. Can be a list or slice.

geom_names: str | tuple[str, ...] | None = None#

Names of geometries to include. Can be a single string or tuple.

geom_ids: list[int] | slice#

IDs of geometries to include. Can be a list or slice.

site_names: str | tuple[str, ...] | None = None#

Names of sites to include. Can be a single string or tuple.

site_ids: list[int] | slice#

IDs of sites to include. Can be a list or slice.

__init__(name: str, joint_names: str | tuple[str, ...] | None = None, joint_ids: list[int] | slice = <factory>, body_names: str | tuple[str, ...] | None = None, body_ids: list[int] | slice = <factory>, geom_names: str | tuple[str, ...] | None = None, geom_ids: list[int] | slice = <factory>, site_names: str | tuple[str, ...] | None = None, site_ids: list[int] | slice = <factory>, actuator_names: str | list[str] | None = None, actuator_ids: list[int] | slice = <factory>, preserve_order: bool = False) → None#
actuator_names: str | list[str] | None = None#

Names of actuators to include. Can be a single string or list.

actuator_ids: list[int] | slice#

IDs of actuators to include. Can be a list or slice.

preserve_order: bool = False#

If True, maintains the order of components as specified.

resolve(scene: Scene) → None[source]#

Resolve names and IDs for all configured fields.

This method ensures consistency between names and IDs for each field type. It handles three cases:

  1. Both names and IDs provided: Validates that they match.

  2. Only names provided: Computes IDs (optimizes to slice(None) if all components are selected).

  3. Only IDs provided: Computes names.

Parameters:

scene – The scene containing the entity to resolve against.

Raises:
  • ValueError – If provided names and IDs are inconsistent.

  • KeyError – If the entity name is not found in the scene.

class mjlab.managers.TerminationManager[source]#

Bases: ManagerBase

Manages termination conditions for the environment.

The termination manager aggregates multiple termination terms to compute episode done signals. Terms can be either truncations (time-based) or terminations (failure conditions).

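A sketch distinguishing the two kinds of terms (the mdp.* functions are hypothetical placeholders):

terminations = {
  # Truncation: episode cut short by the time limit, not by failure.
  "time_out": TerminationTermCfg(func=mdp.time_out, time_out=True),
  # Termination: a genuine failure condition.
  "fell_over": TerminationTermCfg(func=mdp.bad_orientation,
                                  params={"limit_angle": 1.0}),
}
dones = termination_manager.compute()  # dones == time_outs | terminated
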
Methods:

__init__(cfg, env)

reset([env_ids])

Resets the manager and returns logging info for the current step.

compute()

get_term(name)

get_term_cfg(term_name)

get_active_iterable_terms(env_idx)

Attributes:

__init__(cfg: dict[str, TerminationTermCfg], env: ManagerBasedRlEnv)[source]#
property active_terms: list[str]#
property dones: Tensor#
property time_outs: Tensor#
property terminated: Tensor#
reset(env_ids: Tensor | slice | None = None) → dict[str, Tensor][source]#

Resets the manager and returns logging info for the current step.

compute() → Tensor[source]#
get_term(name: str) → Tensor[source]#
get_term_cfg(term_name: str) → TerminationTermCfg[source]#
get_active_iterable_terms(env_idx: int) → Sequence[tuple[str, Sequence[float]]][source]#
class mjlab.managers.TerminationTermCfg[source]#

Bases: ManagerTermBaseCfg

Configuration for a termination term.

Attributes:

time_out

Whether the term contributes towards episodic timeouts.

Methods:

__init__(func[, params, time_out])

time_out: bool = False#

Whether the term contributes towards episodic timeouts.

__init__(func: Any, params: dict[str, Any] = <factory>, time_out: bool = False) → None#
func: Any#

The callable that computes this term’s value. Can be a function or a class. Classes are auto-instantiated with (cfg=term_cfg, env=env).

params: dict[str, Any]#

Additional keyword arguments passed to func when called.