Environment API¶
Our environments differ slightly from pettingzoo in two respects. First, our environments operate on vectorized batches; the intention is to take advantage of the batched computation provided by SIMD architectures and to allow multithreaded operations to make environment transitions faster. Second, our environments use standard Configuration objects to organize environment parameters.
Directory Structure¶
Each environment follows the same directory structure for defining environment dynamics.
envs
├── <environment>                  # Environment implementation
│   ├── configs                    # Benchmark configurations
│   └── env                        # Environment definitions
│       ├── spaces                 # Action / observation spaces
│       ├── structures             # Environment data structures
│       │   ├── configuration.py   # "Configuration" classes for env settings
│       │   └── state.py           # "State" class storing the current state
│       ├── transitions            # Environment transition functions
│       ├── utils                  # Misc. tools
│       └── <environment>.py       # Main environment definition
└── <environment>_vX.py            # Environment import file
Importing Environments¶
The method for importing environments is identical to pettingzoo.
# free-range-zoo
from free_range_zoo.envs import wildfire_v0
# vs. pettingzoo
from pettingzoo.butterfly import pistonball_v6
The environment can then be used in a manner similar to pettingzoo environments.
Configuring Environments¶
In order to make environments more flexible and to parameterize transition functions, free-range-zoo utilizes Configuration classes to group parameters for the environment. Each environment's respective Configuration objects and structure are documented on its Specification page, and implementations are included in envs/<environment_name>/env/structures/configuration.py.
For simplicity, generators for configurations have been included with the competition materials. Configuration generators for each environment are documented here.
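To illustrate the pattern only: a Configuration is simply a grouped set of environment parameters. The class and fields below are hypothetical, not taken from any real environment; consult each environment's Specification page for the actual fields.

from dataclasses import dataclass

@dataclass
class ExampleConfiguration:  # hypothetical stand-in for a real Configuration class
    grid_width: int = 10     # illustrative parameter only
    grid_height: int = 10    # illustrative parameter only
    num_agents: int = 2      # illustrative parameter only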
Each environment module exposes two methods, parallel_env() and aec_env(). parallel_env is more commonly used and follows the parallel paradigm, in which all agents' actions are submitted simultaneously.
Both functions take several parameters:

configuration: Configuration - the configuration object for the environment to use at initialization.
parallel_envs: int - the number of parallel episodes to run simultaneously within the environment.
max_steps: int - the number of environment steps to run before termination. wildfire has extra termination criteria.
device: torch.DeviceObjType - the device to use for environment data storage and processing. All outputs from the environment will also be on this device.
single_seeding: bool - whether all environments should operate on a single seed. This parallelizes random generation across environments rather than giving each environment its own seed, which significantly increases the speed of each environment step.
buffer_size: int - the size of the buffer for random generation. Generally this should be set to max_steps * parallel_envs so that random generation is only done once per complete episode.
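For instance, a vectorized wildfire environment might be constructed as follows. This is a hedged sketch: the configuration object is left abstract (see the generators above), and the specific numbers are illustrative only.

import torch
from free_range_zoo.envs import wildfire_v0

configuration = ...  # build or generate a wildfire Configuration (see above)

env = wildfire_v0.parallel_env(
    configuration=configuration,
    parallel_envs=4,             # run 4 episodes in lockstep
    max_steps=100,               # stop after 100 environment steps
    device=torch.device('cpu'),  # all environment outputs will live on this device
    single_seeding=True,         # one shared seed across all parallel environments
    buffer_size=100 * 4,         # max_steps * parallel_envs
)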
There are additional parameters available in each environment file. Note that action-space and observability parameters will be kept at their default values during evaluation, so it is advised to do the same during training so that observations remain consistent with evaluation.
To see more details about how to interact with environments, see Environment Usage.
Actions¶
Actions in free-range-zoo have a consistent structure; however, they must be represented in a way that is compatible with task openness. Actions fall into two categories. Task actions are represented by the action input (task id, action id), where the action id is subject to change based on which actions are available to the agent. Task-agnostic actions are represented by the action input (padding, action id); action ids for task-agnostic actions are always consistent and always negative.
We represent action spaces this way because task openness implies an unbounded number of tasks. To handle this, action spaces are non-static and must grow and shrink during execution to accommodate additional tasks.
Task-agnostic actions represent actions for which the association with an individual task is not known. An example of this is noop in all domains: noop has an effect within the environment, but that effect is independent of any individual task. Task actions are actions for which a direct relationship can be drawn between the task and the associated action. For example, pick within rideshare has a clear association with an individual task, and thus must be targeted towards a specific task.
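To make the encoding concrete, the sketch below shows what one agent's batched action tensor could look like across three parallel environments; the specific task and action ids are illustrative assumptions.

import torch

# One (task id, action id) pair per parallel environment.
actions = torch.tensor([
    [2, 0],   # task action: act on task 2 with action id 0
    [0, 1],   # task action: act on task 0 with action id 1
    [0, -1],  # task-agnostic action (e.g. noop): task slot is padding, action id is negative
], dtype=torch.int32)  # shape: (parallel_envs, 2) with parallel_envs = 3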
Additional Useful Attributes¶
Methods¶
reset(seed: List[int], *args, **kwargs): Used for resetting the environment state to the initial state defined on initialization by configuration. Takes initial seeds for each environment.
step(actions: torch.Tensor, *args, **kwargs): Used to advance the entire environment a single step. Actions must be provided as an IntTensor with shape (parallel_envs, 2), where the first element of each action represents the task the agent is acting on and the second represents the action taken on that task.
action_space(agent: str): Used to retrieve the action space of agent. Will return spaces which are similar to gymnasium.Space objects, but provide faster functionality.
observation_space(agent: str): Used to retrieve the observation space of agent. Will return spaces which are similar to gymnasium.Space objects, but provide faster functionality.
state(): Used to retrieve the current state of the environment. The state will be the vectorized state of all parallel environments at the time of execution.
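Putting these together, a basic interaction loop might look like the sketch below. The exact return values of reset() and step(), the per-agent dict form of the actions argument, and the sample() helper on spaces are assumptions here, modeled on the pettingzoo parallel API that these environments mirror; 'agent_0' is a hypothetical agent name.

import torch

# Hedged sketch; return structures are assumed to mirror pettingzoo's parallel API.
observations, infos = env.reset(seed=[1, 2, 3, 4])  # one seed per parallel environment
while not torch.all(env.finished['agent_0']):       # 'agent_0' is hypothetical
    actions = {
        agent: env.action_space(agent).sample()     # sample() is assumed, as in gymnasium.Space
        for agent in env.agents
    }
    observations, rewards, terminations, truncations, infos = env.step(actions)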
Attributes¶
agents: The list of all agent names in all environments.
agent_name_mapping: The mapping of agent names to agent indices for all environments.
environment_task_count: An IntTensor representing the number of tasks currently present in each parallel environment. The tensor has shape (parallel_envs).
agent_task_count: The number of tasks available to each agent. Holds the form Dict[str, IntTensor] and is keyed by agent name, where each IntTensor has shape (parallel_envs).
task_store: A ragged tensor of the task observations for all environments, individualized per environment.
agent_action_mapping: A mapping of task indices from an agent's action space to task_store. This is used to map actions from an individual agent's action space back to the global task store. Holds the form Dict[str, IntTensor] and is keyed by agent name, where each IntTensor is ragged with shape (parallel_envs, tasks) and contains indices into task_store.
agent_observation_mapping: A mapping of task indices from an agent's observation space to task_store. This is used to map observations from an individual agent's observation space back to the global task store. Holds the form Dict[str, IntTensor] and is keyed by agent name, where each IntTensor is ragged with shape (parallel_envs, tasks) and contains indices into task_store.
terminated: A dict of the termination state of every current agent in every environment at the time called. The dict is keyed by agent name and maps to a BoolTensor of the agent's termination state in each parallel environment. The inner tensors have shape (parallel_envs).
truncated: A dict of the truncation state of every current agent in every environment at the time called. The dict is keyed by agent name and maps to a BoolTensor of the agent's truncation state in each parallel environment. The inner tensors have shape (parallel_envs).
finished: A dict of terminated | truncated. Holds the same form as terminated and truncated. The inner tensors have shape (parallel_envs).
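As a quick illustration of how these attributes fit together, the sketch below inspects the task bookkeeping of an already-constructed environment env for one agent; 'agent_0' is a hypothetical agent name.

n_tasks = env.environment_task_count             # IntTensor, shape (parallel_envs,)
my_tasks = env.agent_task_count['agent_0']       # IntTensor, shape (parallel_envs,)
to_global = env.agent_action_mapping['agent_0']  # ragged IntTensor mapping local task indices to task_store indices
done = env.finished['agent_0']                   # BoolTensor: terminated | truncated per parallel environment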