mjlab.envs#

RL environment classes.

Classes:

ManagerBasedRlEnv

Manager-based RL environment.

ManagerBasedRlEnvCfg

Configuration for a manager-based RL environment.

class mjlab.envs.ManagerBasedRlEnv[source]#

Bases: object

Manager-based RL environment.

Attributes:

is_vector_env

metadata

cfg

num_envs

Number of parallel environments.

physics_dt

Physics simulation step size.

step_dt

Environment step size (physics_dt * decimation).

device

Device for computation.

max_episode_length_s

Maximum episode length in seconds.

max_episode_length

Maximum episode length in steps.

unwrapped

Get the unwrapped environment (base case for wrapper chains).

Methods:

__init__(cfg, device[, render_mode])

setup_manager_visualizers()

load_managers()

Load and initialize all managers.

reset(*[, seed, env_ids, options])

step(action)

render()

close()

seed([seed])

update_visualizers(visualizer)

is_vector_env = True#
metadata = {'mujoco_version': '3.4.1', 'render_modes': [None, 'rgb_array'], 'warp_version': warp.config.version}#
__init__(cfg: ManagerBasedRlEnvCfg, device: str, render_mode: str | None = None, **kwargs) → None[source]#
cfg: ManagerBasedRlEnvCfg#
property num_envs: int#

Number of parallel environments.

property physics_dt: float#

Physics simulation step size.

property step_dt: float#

Environment step size (physics_dt * decimation).

property device: str#

Device for computation.

property max_episode_length_s: float#

Maximum episode length in seconds.

property max_episode_length: int#

Maximum episode length in steps.

property unwrapped: ManagerBasedRlEnv#

Get the unwrapped environment (base case for wrapper chains).
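The timing and sizing properties above are related to one another and to the configuration. A small sketch of those relationships, assuming an already constructed environment named env (see the rollout sketch further below); the tolerance value is illustrative:

import math

num_envs = env.num_envs      # number of parallel environments
device = env.device          # e.g. "cuda:0" or "cpu"

# One environment step advances the physics `decimation` times.
assert abs(env.step_dt - env.physics_dt * env.cfg.decimation) < 1e-9

# Episode length in steps follows from the length in seconds.
assert env.max_episode_length == math.ceil(env.max_episode_length_s / env.step_dt)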

setup_manager_visualizers() → None[source]#
load_managers() → None[source]#

Load and initialize all managers.

Order is important! Event and command managers must be loaded first, then action and observation managers, then other RL managers.

reset(*, seed: int | None = None, env_ids: Tensor | None = None, options: dict[str, Any] | None = None) → tuple[Dict[str, Tensor | Dict[str, Tensor]], dict][source]#
step(action: Tensor) → tuple[Dict[str, Tensor | Dict[str, Tensor]], Tensor, Tensor, Tensor, dict][source]#
render() → ndarray | None[source]#
close() → None[source]#
static seed(seed: int = -1) → int[source]#
update_visualizers(visualizer: DebugVisualizer) → None[source]#
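A minimal rollout sketch using the constructor and the reset/step/render/close methods above. The config helper make_task_cfg(), the action dimension, and the interpretation of the two boolean tensors returned by step() as (terminated, truncated) are assumptions for illustration, not part of this reference:

import torch

from mjlab.envs import ManagerBasedRlEnv, ManagerBasedRlEnvCfg

cfg: ManagerBasedRlEnvCfg = make_task_cfg()  # hypothetical helper returning a populated config

env = ManagerBasedRlEnv(cfg, device="cuda:0", render_mode="rgb_array")
obs, extras = env.reset(seed=42)

action_dim = 12  # illustrative; determined by the configured action terms
for _ in range(1000):
    action = torch.zeros(env.num_envs, action_dim, device=env.device)
    obs, reward, terminated, truncated, extras = env.step(action)
    frame = env.render()  # ndarray when render_mode="rgb_array", otherwise None

env.close()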
class mjlab.envs.ManagerBasedRlEnvCfg[source]#

Bases: object

Configuration for a manager-based RL environment.

Attributes:

decimation

Number of simulation steps per environment step.

scene

observations

actions

events

seed

sim

viewer

episode_length_s

Duration of an episode (in seconds).

rewards

Reward terms configuration.

terminations

Termination terms configuration.

commands

Command terms configuration.

curriculum

Curriculum terms configuration.

is_finite_horizon

Whether the task has a finite or infinite horizon.

Methods:

__init__(*, decimation, scene, observations, ...)

decimation: int#

Number of simulation steps per environment step.

scene: SceneCfg#
observations: dict[str, ObservationGroupCfg]#
actions: dict[str, ActionTermCfg]#
events: dict[str, EventTermCfg]#
seed: int | None = None#
sim: SimulationCfg#
viewer: ViewerConfig#
episode_length_s: float = 0.0#

Duration of an episode (in seconds).

Episode length in steps is computed as:

ceil(episode_length_s / (sim.mujoco.timestep * decimation))
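For example, with an assumed simulation timestep of 0.005 s and a decimation of 4, a 20-second episode corresponds to 1000 environment steps:

import math

timestep = 0.005         # sim.mujoco.timestep (assumed value)
decimation = 4           # simulation steps per environment step (assumed value)
episode_length_s = 20.0

episode_length_steps = math.ceil(episode_length_s / (timestep * decimation))
print(episode_length_steps)  # 1000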

rewards: dict[str, RewardTermCfg]#

Reward terms configuration.

terminations: dict[str, TerminationTermCfg]#

Termination terms configuration.

commands: dict[str, CommandTermCfg] | None = None#

Command terms configuration. If None, no commands are used.

curriculum: dict[str, CurriculumTermCfg] | None = None#

Curriculum terms configuration. If None, no curriculum is used.

is_finite_horizon: bool = False#

Whether the task has a finite or infinite horizon. Defaults to False (infinite).

  • Finite horizon (True): The time limit defines the task boundary. When reached, no future value exists beyond it, so the agent receives a terminal done signal.

  • Infinite horizon (False): The time limit is an artificial cutoff. The agent receives a truncated done signal to bootstrap the value of continuing beyond the limit.
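A short sketch of how this flag typically plays out in value bootstrapping at the time-limit step; the variable names and the treatment of the cutoff are assumptions for illustration:

import torch

gamma = 0.99
reward = torch.tensor([1.0])
next_value = torch.tensor([5.0])

# Suppose the episode hits the time limit on this step.
is_finite_horizon = False

# Finite horizon: the cutoff is a true terminal, so bootstrapping stops.
# Infinite horizon: the cutoff is reported as truncation, so the target
# still bootstraps from the value of the next state.
terminated = torch.tensor([is_finite_horizon])
value_target = reward + gamma * (~terminated).float() * next_value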

__init__(*, decimation: int, scene: SceneCfg, observations: dict[str, ObservationGroupCfg], actions: dict[str, ActionTermCfg], events: dict[str, EventTermCfg] = <factory>, seed: int | None = None, sim: SimulationCfg = <factory>, viewer: ViewerConfig = <factory>, episode_length_s: float = 0.0, rewards: dict[str, RewardTermCfg] = <factory>, terminations: dict[str, TerminationTermCfg] = <factory>, commands: dict[str, CommandTermCfg] | None = None, curriculum: dict[str, CurriculumTermCfg] | None = None, is_finite_horizon: bool = False) → None#
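A hedged sketch of adjusting the scalar fields documented above on an existing configuration; make_task_cfg() is a hypothetical helper, and the scene, observation, action, and event term configs are assumed to come from the task definition:

from mjlab.envs import ManagerBasedRlEnvCfg

cfg: ManagerBasedRlEnvCfg = make_task_cfg()  # hypothetical helper

cfg.decimation = 4
cfg.episode_length_s = 20.0
cfg.seed = 0
cfg.is_finite_horizon = False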