mjlab.envs#
RL environment classes.
Classes:
- ManagerBasedRlEnv: Manager-based RL environment.
- ManagerBasedRlEnvCfg: Configuration for a manager-based RL environment.
- class mjlab.envs.ManagerBasedRlEnv[source]#
Bases: object
Manager-based RL environment.
Attributes:
- Number of parallel environments.
- Physics simulation step size.
- Environment step size (physics_dt * decimation).
- Device for computation.
- Maximum episode length in seconds.
- Maximum episode length in steps.
- Get the unwrapped environment (base case for wrapper chains).
Methods:
- __init__(cfg, device[, render_mode])
- load_managers(): Load and initialize all managers.
- reset(*[, seed, env_ids, options])
- step(action)
- render()
- close()
- seed([seed])
- update_visualizers(visualizer)
- is_vector_env = True#
- metadata = {'mujoco_version': '3.4.1', 'render_modes': [None, 'rgb_array'], 'warp_version': warp.config.version}#
- __init__(cfg: ManagerBasedRlEnvCfg, device: str, render_mode: str | None = None, **kwargs) → None[source]#
- property unwrapped: ManagerBasedRlEnv#
Get the unwrapped environment (base case for wrapper chains).
- load_managers() → None[source]#
Load and initialize all managers.
Order is important! Event and command managers must be loaded first, then action and observation managers, then other RL managers.
- reset(*, seed: int | None = None, env_ids: Tensor | None = None, options: dict[str, Any] | None = None) → tuple[Dict[str, Tensor | Dict[str, Tensor]], dict][source]#
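A minimal usage sketch (not part of this reference): build the environment from a task config, reset it, and step it with random actions. my_task_cfg, action_dim, the num_envs/device attribute names, and the Gymnasium-style five-tuple returned by step() are assumptions, not confirmed by this page:
import torch
from mjlab.envs import ManagerBasedRlEnv

env = ManagerBasedRlEnv(cfg=my_task_cfg, device="cuda:0")  # my_task_cfg: a ManagerBasedRlEnvCfg (placeholder)
obs, info = env.reset(seed=42)
for _ in range(1000):
    # Random actions for every parallel environment; num_envs and action_dim
    # are assumed names/shapes.
    actions = torch.randn(env.num_envs, action_dim, device=env.device)
    obs, rew, terminated, truncated, info = env.step(actions)  # assumed return convention
env.close()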
- class mjlab.envs.ManagerBasedRlEnvCfg[source]#
Bases: object
Configuration for a manager-based RL environment.
Attributes:
- decimation: Number of simulation steps per environment step.
- episode_length_s: Duration of an episode (in seconds).
- rewards: Reward terms configuration.
- terminations: Termination terms configuration.
- commands: Command terms configuration.
- curriculum: Curriculum terms configuration.
- is_finite_horizon: Whether the task has a finite or infinite horizon.
Methods:
- __init__(*, decimation, scene, observations, ...)
- sim: SimulationCfg#
- viewer: ViewerConfig#
- episode_length_s: float = 0.0#
Duration of an episode (in seconds).
Episode length in steps is computed as:
ceil(episode_length_s / (sim.mujoco.timestep * decimation))
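For example, with an assumed 5 ms physics timestep and a decimation of 4, a 20-second episode lasts 1000 environment steps:
import math

episode_length_s = 20.0   # cfg.episode_length_s
timestep = 0.005          # cfg.sim.mujoco.timestep (assumed value)
decimation = 4            # cfg.decimation

max_episode_length = math.ceil(episode_length_s / (timestep * decimation))
print(max_episode_length)  # 1000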
- commands: dict[str, CommandTermCfg] | None = None#
Command terms configuration. If None, no commands are used.
- curriculum: dict[str, CurriculumTermCfg] | None = None#
Curriculum terms configuration. If None, no curriculum is used.
- is_finite_horizon: bool = False#
Whether the task has a finite or infinite horizon. Defaults to False (infinite).
Finite horizon (True): The time limit defines the task boundary. When reached, no future value exists beyond it, so the agent receives a terminal done signal.
Infinite horizon (False): The time limit is an artificial cutoff. The agent receives a truncated done signal to bootstrap the value of continuing beyond the limit.
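The distinction matters when computing value targets. A generic TD(0) sketch (not part of the mjlab API) illustrating why only a terminal done masks the bootstrap term:
import torch

def td_target(reward, next_value, terminated, gamma=0.99):
    # terminated is True only at a real task boundary (finite horizon or failure).
    # At a truncation the mask stays 1, so the critic's estimate of the next
    # state is bootstrapped as if the episode continued past the time limit.
    return reward + gamma * next_value * (~terminated).float()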
- __init__(*, decimation: int, scene: mjlab.scene.scene.SceneCfg, observations: dict[str, mjlab.managers.manager_term_config.ObservationGroupCfg], actions: dict[str, mjlab.managers.manager_term_config.ActionTermCfg], events: dict[str, mjlab.managers.manager_term_config.EventTermCfg] = <factory>, seed: int | None = None, sim: mjlab.sim.sim.SimulationCfg = <factory>, viewer: mjlab.viewer.viewer_config.ViewerConfig = <factory>, episode_length_s: float = 0.0, rewards: dict[str, mjlab.managers.manager_term_config.RewardTermCfg] = <factory>, terminations: dict[str, mjlab.managers.manager_term_config.TerminationTermCfg] = <factory>, commands: dict[str, mjlab.managers.manager_term_config.CommandTermCfg] | None = None, curriculum: dict[str, mjlab.managers.manager_term_config.CurriculumTermCfg] | None = None, is_finite_horizon: bool = False) → None#
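A hypothetical construction sketch based on the signature above; the scene, observation, action, reward, and termination term objects are placeholders whose contents depend on the task:
from mjlab.envs import ManagerBasedRlEnvCfg

cfg = ManagerBasedRlEnvCfg(
    decimation=4,                        # 4 physics steps per environment step
    scene=my_scene_cfg,                  # SceneCfg (placeholder)
    observations=my_observation_groups,  # dict[str, ObservationGroupCfg] (placeholder)
    actions=my_action_terms,             # dict[str, ActionTermCfg] (placeholder)
    rewards=my_reward_terms,             # dict[str, RewardTermCfg] (placeholder)
    terminations=my_termination_terms,   # dict[str, TerminationTermCfg] (placeholder)
    episode_length_s=20.0,
    is_finite_horizon=False,             # treat the time limit as an artificial cutoff
)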