mjlab.rl#
Classes
Runner#
- class mjlab.rl.MjlabOnPolicyRunner[source]#
Bases: OnPolicyRunner
Base runner that persists environment state across checkpoints.
- __init__(env: rsl_rl.env.VecEnv, train_cfg: dict, log_dir: str | None = None, device: str = 'cpu') None[source]#
- export_policy_to_onnx(path: str, filename: str = 'policy.onnx', verbose: bool = False) None[source]#
Export policy to ONNX format using legacy export path.
Overrides the base implementation to set dynamo=False, avoiding warnings about dynamic_axes being deprecated with the new TorchDynamo export path (torch>=2.9 default).
- save(path: str, infos=None) None[source]#
Save checkpoint.
Extends the base implementation to persist the environment’s common_step_counter and to respect the upload_model config flag.
- load(path: str, load_cfg: dict | None = None, strict: bool = True, map_location: str | None = None) dict[source]#
Load checkpoint.
Extends the base implementation to:
1. Restore common_step_counter to preserve curriculum state.
2. Migrate legacy checkpoints (actor.* -> mlp.*, actor_obs_normalizer.* -> obs_normalizer.*) to the current format (rsl-rl>=4.0).
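The legacy-key migration described above amounts to a prefix rename over the checkpoint's parameter names. The following is a minimal sketch of that idea, not mjlab's actual implementation: the helper name migrate_legacy_keys and the flat-dictionary layout are assumptions made for illustration, and only the two prefix mappings are taken from the description.

```python
# Hypothetical helper (not part of mjlab): rename legacy key prefixes
# in a flat, state-dict-like mapping.
LEGACY_PREFIXES = {
    "actor_obs_normalizer.": "obs_normalizer.",
    "actor.": "mlp.",
}


def migrate_legacy_keys(state_dict: dict) -> dict:
    """Return a copy of state_dict with legacy prefixes replaced."""
    migrated = {}
    for key, value in state_dict.items():
        new_key = key
        for old, new in LEGACY_PREFIXES.items():
            if key.startswith(old):
                new_key = new + key[len(old):]
                break
        migrated[new_key] = value
    return migrated


# Illustrative keys only; real checkpoints hold tensors, not integers.
legacy = {
    "actor.0.weight": 1,
    "actor_obs_normalizer.mean": 2,
    "critic.0.weight": 3,
}
migrated = migrate_legacy_keys(legacy)
```

Keys that match neither legacy prefix pass through unchanged, so in this sketch a checkpoint already in the current format is returned as-is.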
Configuration#
- class mjlab.rl.RslRlOnPolicyRunnerCfg[source]#
- actor: RslRlModelCfg#
The actor configuration.
- critic: RslRlModelCfg#
The critic configuration.
- algorithm: RslRlPpoAlgorithmCfg#
The algorithm configuration.
- class mjlab.rl.RslRlPpoAlgorithmCfg[source]#
Config for the PPO algorithm.
- num_mini_batches: int = 4#
The number of mini-batches per update. The mini-batch size is num_envs * num_steps / num_mini_batches.
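As a worked example of the mini-batch size formula, the sketch below evaluates it for illustrative values. The values num_envs=4096 and num_steps=24 are assumptions chosen for the example, not defaults from this page, and the use of integer division is likewise an assumption.

```python
def mini_batch_size(num_envs: int, num_steps: int, num_mini_batches: int = 4) -> int:
    """Samples per mini-batch: num_envs * num_steps / num_mini_batches."""
    total_samples = num_envs * num_steps  # transitions collected per update
    return total_samples // num_mini_batches


# 4096 environments * 24 steps = 98304 samples, split into 4 mini-batches.
size = mini_batch_size(num_envs=4096, num_steps=24, num_mini_batches=4)
print(size)  # 24576
```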
- normalize_advantage_per_mini_batch: bool = False#
Whether to normalize the advantage per mini-batch. Default is False. If True, the advantage is normalized within each mini-batch separately. Otherwise, the advantage is normalized once over all collected trajectories.
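The difference between the two normalization modes can be seen on a small example. This is a pure-Python sketch of the idea rather than the library's code: the normalize helper, the epsilon value, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def normalize(values, eps=1e-8):
    """Shift to zero mean and scale to (approximately) unit std."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]


advantages = [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0]

# Flag False: normalize once over everything, then split into mini-batches.
global_norm = normalize(advantages)
global_batches = [global_norm[:4], global_norm[4:]]

# Flag True: split first, then normalize each mini-batch on its own.
per_batch = [normalize(advantages[:4]), normalize(advantages[4:])]
```

With global normalization the first mini-batch keeps a negative mean, since its advantages are small relative to the whole rollout. With per-mini-batch normalization every mini-batch has zero mean, so the relative scale between mini-batches is discarded.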
Share CNN encoders between actor and critic.
- class mjlab.rl.RslRlModelCfg[source]#
Config for a single neural network model (Actor or Critic).
The hidden dimensions of the network.
- class mjlab.rl.RslRlBaseRunnerCfg[source]#
- experiment_name: str = 'exp1'#
Directory name used to group runs under
logs/rsl_rl/{experiment_name}/.
- run_name: str = ''#
Optional label appended to the timestamped run directory (e.g.
2025-01-27_14-30-00_{run_name}). Also becomes the display name for the run in wandb.
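To make the directory layout concrete, the sketch below builds a run path from experiment_name and run_name following the pattern described above. The helper run_dir is hypothetical and not part of mjlab; the timestamp format is inferred from the example 2025-01-27_14-30-00, and omitting the trailing underscore when run_name is empty is an assumption.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def run_dir(experiment_name: str, run_name: str = "",
            now: Optional[datetime] = None) -> Path:
    """Build logs/rsl_rl/{experiment_name}/{timestamp}[_{run_name}]."""
    now = now or datetime.now()
    stamp = now.strftime("%Y-%m-%d_%H-%M-%S")
    # Assumption: no suffix is appended when run_name is empty.
    name = f"{stamp}_{run_name}" if run_name else stamp
    return Path("logs") / "rsl_rl" / experiment_name / name


path = run_dir("exp1", "walk", now=datetime(2025, 1, 27, 14, 30, 0))
print(path.as_posix())  # logs/rsl_rl/exp1/2025-01-27_14-30-00_walk
```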
- load_run: str = '.*'#
The run directory to load. Default is “.*”, which matches all runs. If a regular expression is given, the latest matching run (in alphabetical order) is loaded.
- load_checkpoint: str = 'model_.*.pt'#
The checkpoint file to load. Default is “model_.*.pt”, which matches all checkpoints. If a regular expression is given, the latest matching file (in alphabetical order) is loaded.
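The selection rule shared by load_run and load_checkpoint (filter by regular expression, then take the last name in alphabetical order) can be sketched as follows. The helper latest_match is hypothetical, and matching the full name with re.fullmatch is an assumption; the library's own lookup may differ in detail.

```python
import re


def latest_match(names, pattern):
    """Return the alphabetically last name that fully matches pattern."""
    matches = sorted(n for n in names if re.fullmatch(pattern, n))
    if not matches:
        raise ValueError(f"no name matches {pattern!r}")
    return matches[-1]


# Illustrative run directory names; timestamps sort chronologically.
runs = [
    "2025-01-27_14-30-00_walk",
    "2025-01-27_16-00-00_run",
    "2025-01-28_09-00-00",
]
newest_run = latest_match(runs, ".*")
newest_walk = latest_match(runs, ".*_walk")

# Caveat: plain alphabetical order is not numeric order.
checkpoints = ["model_0.pt", "model_500.pt", "model_1000.pt"]
picked = latest_match(checkpoints, "model_.*.pt")  # "model_500.pt"
```

Timestamped run directories sort chronologically, so the default pattern picks the most recent run. Checkpoint names with unpadded iteration numbers do not sort numerically under plain string ordering, as the last line shows; whether the library pads or naturally sorts these names is not stated here, so pass an exact file name when a specific checkpoint is required.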