Actions#
Actions define how the policy controls the simulation. The action manager receives the policy’s output tensor each step, splits it across registered action terms, and routes each slice to the appropriate entity’s actuators. Each term maps a contiguous segment of the policy output to a control mode (position, velocity, effort) on a set of joints, tendons, or sites.
from mjlab.envs.mdp.actions import JointPositionActionCfg
actions = {
"joint_pos": JointPositionActionCfg(
entity_name="robot",
actuator_names=(".*",), # regex matching actuator names
scale=0.5,
use_default_offset=True, # action 0 = default pose
),
}
Common parameters#
All action types share a base set of parameters inherited from
BaseActionCfg.
entity_name identifies the scene entity to control. actuator_names
is a tuple of regex patterns matched against actuator (or tendon/site)
names to select the controlled targets.
scale multiplies the raw policy output before any offset is applied.
It accepts a scalar or a dict mapping actuator name patterns to
per-target values. This keeps policy outputs in a normalized range while
mapping to physically meaningful units. offset is added after
scaling; joint action types also provide use_default_offset, which
automatically loads the entity’s default joint positions or velocities
as the offset so that a raw output of zero produces the default pose.
clip optionally clamps the processed action before it reaches the
actuator.
Actions are written to actuator targets on every decimation substep (physics step), not just once per policy step. This is in contrast to observation delay, which operates in units of policy steps.
Action types#
Type |
Description |
|---|---|
|
Sets joint position targets. With |
|
Sets joint velocity targets. |
|
Sets joint effort (torque) targets directly. No default offset. |
|
Sets tendon length targets. Targets are resolved by matching
|
|
Sets tendon velocity targets. |
|
Sets tendon effort targets. |
|
Applies forces and torques at named sites. Useful for quadrotors and drones where thrust is applied at rotor sites rather than through joint actuators. |
Task-space actions#
DifferentialIKAction converts Cartesian position and orientation
commands into joint-space position targets via damped least-squares
inverse kinematics. One IK step is executed per decimation substep, so
the end-effector tracks the target continuously across substeps rather
than only at policy frequency.
The action dimension is selected automatically based on configuration:
orientation_weight == 0: 3D (position only)orientation_weight > 0, use_relative_mode=True: 6D (delta position + delta axis-angle)orientation_weight > 0, use_relative_mode=False: 7D (absolute position + quaternion)
All objectives (position, orientation, joint limits, posture) are stacked into a single DLS system. Setting a weight to zero disables that objective with no overhead in the solve.
The compute_dq() method returns joint displacements without writing
to actuator targets, enabling multi-iteration IK in standalone scripts
outside of RL training.
Action dimensions and history#
The total action dimension presented to the policy is the sum of each
registered term’s action_dim. For joint, tendon, and site actions
this equals the number of matched targets. For DifferentialIKAction
it is 3, 6, or 7 depending on the active objectives.
The action manager tracks the three most recent action vectors:
action, prev_action, and prev_prev_action. Observation terms
such as last_action and reward terms such as action_rate_l2 and
action_acc_l2 read from these buffers. Action history is zeroed on
environment reset so that episode boundaries do not leak information.
Multiple action terms#
An environment can register any number of terms. The action manager concatenates their dimensions in registration order, splits the policy’s output tensor at the corresponding boundaries, and routes each slice independently.
from mjlab.envs.mdp.actions import (
JointPositionActionCfg,
JointVelocityActionCfg,
)
actions = {
"arm_joints": JointPositionActionCfg(
entity_name="robot",
actuator_names=(".*_arm_.*",),
scale=0.5,
),
"wheel_joints": JointVelocityActionCfg(
entity_name="robot",
actuator_names=(".*_wheel_.*",),
scale=10.0,
),
}
The policy outputs a tensor whose width equals the total number of matched targets across all terms. Terms can also target different entities, for example one term for a robot and another for an object being manipulated.