Curriculum#
The curriculum manager adjusts training conditions based on policy performance. Training begins with an easier problem and difficulty increases as the policy demonstrates it can handle the current conditions. Common uses include advancing robots to harder terrain, widening command velocity ranges, and ramping reward penalty weights over the course of training.
Curriculum terms are called at each environment reset. Each term receives the environment and the set of resetting environment IDs, examines some performance signal, and applies changes to environment parameters directly.
from mjlab.managers.curriculum_manager import CurriculumTermCfg
curriculum = {
"terrain_levels": CurriculumTermCfg(
func=mdp.terrain_levels_vel,
params={"command_name": "twist"},
),
}
The return value of a curriculum function is logged under
Curriculum/<term_name> in the training metrics.
Built-in curriculum functions#
Function |
Description |
|---|---|
|
Measures how far each robot traveled during the episode. Robots that covered enough distance move up one difficulty row in the terrain grid; those that fell short move down. See the terrain curriculum section below. |
|
Widens velocity command ranges based on training step count. Each stage specifies a step threshold and the new ranges to apply once that threshold is exceeded. |
|
Ramps a reward term’s weight up or down according to training step thresholds. Useful for introducing penalty terms gradually after the policy has learned basic locomotion. |
Terrain curriculum#
The terrain grid used with procedural terrain is a
num_rows x num_cols matrix of patches. Columns represent terrain
type variants; rows represent difficulty levels, with row 0 being the
easiest and row num_rows - 1 the hardest. When
TerrainGeneratorCfg.curriculum=True, each column is assigned exactly
one terrain type so that difficulty increases monotonically along rows.
At environment construction each environment is assigned a random
starting row within [0, max_init_terrain_level]. The
terrain_levels_vel curriculum term promotes or demotes environments
on each reset based on distance traveled during the episode.
Environments that reach the maximum level are randomly reassigned to
any row, maintaining coverage across all difficulty levels. See
Terrain for details on configuring the terrain grid itself.
Writing custom curriculum functions#
A curriculum function accepts env and env_ids, applies
parameter changes, and returns a value to log (a scalar tensor, a dict
of tensors, or None). A typical implementation reads a performance
metric, decides whether to increase or decrease difficulty, mutates the
relevant configuration in place, and returns the current difficulty
level. See Term configuration pattern for the general pattern.