Specification

Import	`from free_range_zoo.envs import rideshare_v0`
Actions	Discrete & Deterministic
Observations	Discrete and fully observed with private observations
Parallel API	Yes
Manual Control	No
Agent Names	[\(driver\)_0, … , \(driver\)_n]
# Agents	\(n\)
Action Shape	(\(envs\), 2)
Action Values	[\([accept (0)\|pick (1)\|drop (2)]\_0\), …, \([accept (0)\|pick (1)\|drop (2)]\_{tasks}\), \(noop\) (-1)]
Observation Shape	TensorDict: { self: \(<y, x, num_{accepted}, num_{riding}>\) others: \(<y, x, num_{accepted}, num_{riding}>\) tasks: \(<y, x, y_{dest}, x_{dest}, accepted_by, riding_by, entered_step>\) batch_size: \(num\_envs\) }
Observation Values	self: \(y\): \([0, max_y]\) \(x\): \([0, max_x]\) \(num\_accepted\): \([0, pooling\_limit]\) \(num\_riding\): \([0, pooling\_limit]\) others: \(y\): \([0, max_y]\) \(x\): \([0, max_x]\) \(num\_accepted\): \([0, pooling\_limit]\) \(num\_riding\): \([0, pooling\_limit]\) tasks: \(y\): \([0, max_y]\) \(x\): \([0, max_x]\) \(y_{dest}\): \([0, max_y]\) \(x_{dest}\): \([0, max_x]\) \(accepted\_by\): \([0, num_{agents}]\) \(riding\_by\): \([0, num_{agents}]\) \(entered\_step\): \([0, max_{steps}]\)
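
As a minimal sketch of the action encoding in the table above, the helper below builds one batch of actions as plain Python lists. It assumes each row is `[task_index, action_type]`, which the table does not state explicitly, and it uses 0 as a placeholder task index for `noop`. `encode_action` and `ACTION_TYPES` are hypothetical names, not part of `free_range_zoo`.

```python
# Hypothetical helper: not part of free_range_zoo.
# Assumption: each action row is [task_index, action_type].
ACTION_TYPES = {"accept": 0, "pick": 1, "drop": 2, "noop": -1}


def encode_action(kind, task_index=0):
    """Return one [task_index, action_type] row for a single environment."""
    if kind not in ACTION_TYPES:
        raise ValueError(f"unknown action type: {kind!r}")
    return [task_index, ACTION_TYPES[kind]]


# One row per parallel environment gives the (envs, 2) action shape.
batch = [encode_action("accept", task_index=3), encode_action("noop")]
print(batch)
```

Passing `batch` to `torch.tensor` would produce a tensor with the documented `(envs, 2)` shape, here with two environments.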

Usage

Parallel API

import logging

import torch

from free_range_zoo.envs import rideshare_v0

main_logger = logging.getLogger(__name__)

# Initialize and reset environment to initial state
env = rideshare_v0.parallel_env(render_mode="human")
observations, infos = env.reset()

# Initialize agents and give initial observations
# Maps each name in env.agents to a policy object exposing act() and observe()
agents = {}

cumulative_rewards = {agent: 0 for agent in env.agents}

current_step = 0
while not torch.all(env.finished):
    agent_actions = {
        agent_name: torch.stack([agents[agent_name].act()])
        for agent_name in env.agents
    }  # Policy action determination here

    observations, rewards, terminations, truncations, infos = env.step(agent_actions)
    rewards = {agent_name: rewards[agent_name].item() for agent_name in env.agents}

    for agent_name, agent in agents.items():
        agent.observe(observations[agent_name][0])  # Policy observation processing here
        cumulative_rewards[agent_name] += rewards[agent_name]

    main_logger.info(f"Step {current_step}: {rewards}")
    current_step += 1

env.close()
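
The `observe` hook in the loop above is where a policy would interpret its observation. As a rough sketch, the code below unpacks one `tasks` row using the field order listed under Observation Shape. `TaskRow`, `parse_tasks`, and `trip_length` are hypothetical names, the rows are plain Python sequences rather than tensors, and the Manhattan distance ignores the `use_diagonal_travel` setting.

```python
from typing import List, NamedTuple, Sequence


class TaskRow(NamedTuple):
    """One task entry, in the field order given under Observation Shape."""
    y: int
    x: int
    y_dest: int
    x_dest: int
    accepted_by: int
    riding_by: int
    entered_step: int


def parse_tasks(rows: Sequence[Sequence[int]]) -> List[TaskRow]:
    """Convert raw 7-element rows into named TaskRow records."""
    return [TaskRow(*(int(v) for v in row)) for row in rows]


def trip_length(task: TaskRow) -> int:
    """Manhattan distance from the pickup cell to the destination cell."""
    return abs(task.y_dest - task.y) + abs(task.x_dest - task.x)


# Illustrative values only: pickup at (0, 1), destination at (4, 3).
tasks = parse_tasks([[0, 1, 4, 3, 0, 0, 2]])
print(tasks[0], trip_length(tasks[0]))
```

Named fields keep policy code readable compared with indexing columns by position.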

AEC API

import logging

import torch

from free_range_zoo.envs import rideshare_v0

main_logger = logging.getLogger(__name__)

# Initialize and reset environment to initial state
# The AEC example uses the AEC-wrapped constructor rather than parallel_env
env = rideshare_v0.env(render_mode="human")
env.reset()

# Initialize agents and give initial observations
agents = []

cumulative_rewards = {agent: 0 for agent in env.agents}

current_step = 0
while not torch.all(env.finished):
    for agent in env.agent_iter():
        observations, rewards, terminations, truncations, infos = env.last()

        # Policy action determination here
        action = env.action_space(agent).sample()

        env.step(action)

    rewards = {agent: rewards[agent].item() for agent in env.agents}
    # Accumulate for every agent, not only the last one iterated
    for agent in env.agents:
        cumulative_rewards[agent] += rewards[agent]

    main_logger.info(f"Step {current_step}: {rewards}")
    current_step += 1

env.close()

Configuration

class free_range_zoo.envs.rideshare.env.structures.configuration.AgentConfiguration(start_positions: IntTensor, pool_limit: int, use_diagonal_travel: bool, use_fast_travel: bool)

Agent settings for rideshare.

Variables:

start_positions (torch.IntTensor) – Starting positions of the agents
pool_limit (int) – Maximum number of passengers that can be in a car
use_diagonal_travel (bool) – Whether to enable diagonal travel for agents
use_fast_travel (bool) – Whether to enable fast travel for agents

class free_range_zoo.envs.rideshare.env.structures.configuration.PassengerConfiguration(schedule: IntTensor)

Task settings for rideshare.

Variables:

schedule (torch.IntTensor) – Tensor of shape <tasks, 7>, where each row is (timestep, batch, y, x, y_dest, x_dest, fare). Set batch to -1 as a wildcard that applies the row to all batches.
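
To illustrate the schedule layout and the `-1` batch wildcard, here is a small sketch in plain Python. `ScheduleRow` and `rows_for` are hypothetical names rather than library API, and the sketch assumes `timestep` is the step at which a passenger enters the environment, which the description above implies but does not state.

```python
from typing import List, NamedTuple

WILDCARD_BATCH = -1  # per the description above, -1 applies a row to all batches


class ScheduleRow(NamedTuple):
    """One schedule entry, in the column order of the schedule tensor."""
    timestep: int
    batch: int
    y: int
    x: int
    y_dest: int
    x_dest: int
    fare: int


def rows_for(schedule: List[ScheduleRow], timestep: int, batch: int) -> List[ScheduleRow]:
    """Select rows scheduled at `timestep` that apply to `batch`."""
    return [
        row for row in schedule
        if row.timestep == timestep and row.batch in (batch, WILDCARD_BATCH)
    ]


schedule = [
    ScheduleRow(0, -1, 0, 0, 4, 4, 10),  # every batch, step 0
    ScheduleRow(0, 1, 2, 2, 0, 3, 6),    # batch 1 only, step 0
    ScheduleRow(5, 0, 1, 1, 3, 3, 8),    # batch 0 only, step 5
]
print(rows_for(schedule, timestep=0, batch=0))
print(rows_for(schedule, timestep=0, batch=1))
```

A list like this could then be converted with `torch.tensor([list(row) for row in schedule], dtype=torch.int32)` to obtain a `<tasks, 7>` integer tensor for `PassengerConfiguration`.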

class free_range_zoo.envs.rideshare.env.structures.configuration.RewardConfiguration(pick_cost: float, move_cost: float, drop_cost: float, noop_cost: float, accept_cost: float, pool_limit_cost: float, use_pooling_rewards: bool, use_variable_move_cost: bool, use_waiting_costs: bool, wait_limit: IntTensor, long_wait_time: int, general_wait_cost: float, long_wait_cost: float)

Reward settings for rideshare.

Variables:

pick_cost (float) – Cost of picking up a passenger
move_cost (float) – Cost of moving to a new location
drop_cost (float) – Cost of dropping off a passenger
noop_cost (float) – Cost of taking no action
accept_cost (float) – Cost of accepting a passenger
pool_limit_cost (float) – Cost of exceeding the pool limit
use_variable_move_cost (bool) – Whether to use the variable move cost
use_variable_pick_cost – Whether to use the variable pick cost
use_waiting_costs (bool) – Whether to use waiting costs
wait_limit (torch.IntTensor) – Wait limits for each passenger state, ordered [unaccepted, accepted, riding]
long_wait_time (int) – Time after which a passenger is considered to have been waiting for a long time (defaults to the maximum of wait_limit)
general_wait_cost (float) – Cost of waiting for a passenger
long_wait_cost (float) – Cost of waiting for a passenger for a long time (added to the general wait cost)
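
The waiting parameters above could combine roughly as follows. This is only a sketch of one plausible reading, not the environment's actual reward code: `wait_penalty` is a hypothetical function, and it assumes the cost is charged per step, that the long-wait surcharge starts once the wait strictly exceeds `long_wait_time`, and that no cost applies when `use_waiting_costs` is disabled.

```python
def wait_penalty(waited_steps, general_wait_cost, long_wait_cost,
                 long_wait_time, use_waiting_costs=True):
    """Return an illustrative waiting cost for one passenger at one step."""
    if not use_waiting_costs:
        return 0.0
    penalty = general_wait_cost
    # Assumption: the surcharge is added on top of the general wait cost
    # once the passenger has waited longer than long_wait_time.
    if waited_steps > long_wait_time:
        penalty += long_wait_cost
    return penalty


# Illustrative values only; the sign convention for costs is not specified here.
short_wait = wait_penalty(3, 0.25, 0.5, long_wait_time=5)
long_wait = wait_penalty(8, 0.25, 0.5, long_wait_time=5)
print(short_wait, long_wait)
```

The additive form follows the note that `long_wait_cost` is "added to" the wait cost; the threshold comparison and per-step accounting are guesses.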

class free_range_zoo.envs.rideshare.env.structures.configuration.RideshareConfiguration(grid_height: int, grid_width: int, agent_config: AgentConfiguration, passenger_config: PassengerConfiguration, reward_config: RewardConfiguration)

Configuration settings for rideshare environment.

Variables:

grid_height (int) – Grid height for the rideshare environment space
grid_width (int) – Grid width for the rideshare environment space
agent_config (free_range_zoo.envs.rideshare.env.structures.configuration.AgentConfiguration) – Agent settings for the rideshare environment
passenger_config (free_range_zoo.envs.rideshare.env.structures.configuration.PassengerConfiguration) – Passenger settings for the rideshare environment
reward_config (free_range_zoo.envs.rideshare.env.structures.configuration.RewardConfiguration) – Reward settings for the rideshare environment

API

class free_range_zoo.envs.rideshare.env.rideshare.env(wrappers: List[Callable] = [], **kwargs)

AEC wrapped version of the rideshare environment.

Parameters:

wrappers – List[Callable[[BatchedAECEnv], BatchedAECEnv]] - the wrappers to apply to the environment

Returns:

BatchedAECEnv – the rideshare environment

class free_range_zoo.envs.rideshare.env.rideshare.raw_env(*args, **kwargs)

Implementation of the dynamic rideshare environment.

Initialize the simulation.