Specification


Import:           from free_range_zoo.envs import rideshare_v0
Actions:          Discrete & Deterministic
Observations:     Discrete and fully observed with private observations
Parallel API:     Yes
Manual Control:   No
Agent Names:      [driver_0, …, driver_n]
# Agents:         n
Action Shape:     (envs, 2)
Action Values:    [[accept (0) | pick (1) | drop (2)]_0, …, [accept (0) | pick (1) | drop (2)]_tasks, noop (-1)]

Observation Shape:
    TensorDict: {
        self: <y, x, num_accepted, num_riding>,
        others: <y, x, num_accepted, num_riding>,
        tasks: <y, x, y_dest, x_dest, accepted_by, riding_by, entered_step>,
        batch_size: num_envs
    }

Observation Values:
    self:
        y: [0, max_y]
        x: [0, max_x]
        num_accepted: [0, pool_limit]
        num_riding: [0, pool_limit]
    others:
        y: [0, max_y]
        x: [0, max_x]
        num_accepted: [0, pool_limit]
        num_riding: [0, pool_limit]
    tasks:
        y: [0, max_y]
        x: [0, max_x]
        y_dest: [0, max_y]
        x_dest: [0, max_x]
        accepted_by: [0, num_agents]
        riding_by: [0, num_agents]
        entered_step: [0, max_steps]
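
To make the layout above concrete, the following sketch unpacks one agent's observation, given an `observations` mapping returned by env.reset() or env.step(). The key names come from the specification, but the exact tensor shapes are assumptions based on the batched (num_envs) layout and should be verified against the environment.

# A hedged sketch of unpacking a single agent's observation TensorDict.
obs = observations['driver_0']

self_state = obs['self']  # assumed shape (num_envs, 4): <y, x, num_accepted, num_riding>
others = obs['others']    # assumed shape (num_envs, num_other_agents, 4)
tasks = obs['tasks']      # assumed shape (num_envs, num_tasks, 7):
                          # <y, x, y_dest, x_dest, accepted_by, riding_by, entered_step>

# Example: pickup coordinates of every task in the first parallel environment.
pickup_yx = tasks[0, :, 0:2]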


Usage

Parallel API

import logging

import torch
from free_range_zoo.envs import rideshare_v0

main_logger = logging.getLogger(__name__)

# Initialize and reset environment to initial state
env = rideshare_v0.parallel_env(render_mode="human")
observations, infos = env.reset()

# Initialize agents and give initial observations
agents = {}  # Mapping from agent name to policy; populate with your own policies

cumulative_rewards = {agent: 0 for agent in env.agents}

current_step = 0
while not torch.all(env.finished):
    agent_actions = {
        agent_name: torch.stack([agents[agent_name].act()])
        for agent_name in env.agents
    }  # Policy action determination here

    observations, rewards, terminations, truncations, infos = env.step(agent_actions)
    rewards = {agent_name: rewards[agent_name].item() for agent_name in env.agents}

    for agent_name, agent in agents.items():
        agent.observe(observations[agent_name][0])  # Policy observation processing here
        cumulative_rewards[agent_name] += rewards[agent_name]

    main_logger.info(f"Step {current_step}: {rewards}")
    current_step += 1

env.close()
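
Note that the environment is batched: per-agent rewards arrive as tensors with one entry per parallel environment, which is why the example above calls .item() (this assumes a single parallel environment). For a concrete sense of the action format, the sketch below builds action tensors by hand instead of calling a policy. It assumes each row is a <task index, verb> pair with verbs accept = 0, pick = 1, drop = 2 and with noop encoded as -1, per the specification above; verify the exact encoding against env.action_space.

import torch

# A hedged sketch of hand-built actions for a single parallel environment;
# these would be passed to env.step(agent_actions) inside the loop above.
agent_actions = {
    'driver_0': torch.tensor([[0, 0]]),    # accept task 0 (assumed <task index, verb> encoding)
    'driver_1': torch.tensor([[-1, -1]]),  # noop (assumed encoding)
}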

AEC API

import logging

import torch
from free_range_zoo.envs import rideshare_v0

main_logger = logging.getLogger(__name__)

# Initialize and reset environment to initial state
env = rideshare_v0.env(render_mode="human")
observations, infos = env.reset()

# Initialize agents and give initial observations
agents = {}  # Mapping from agent name to policy; populate with your own policies

cumulative_rewards = {agent: 0 for agent in env.agents}

current_step = 0
while not torch.all(env.finished):
    for agent in env.agent_iter():
        observations, rewards, terminations, truncations, infos = env.last()

        # Policy action determination here
        action = env.action_space(agent).sample()

        env.step(action)

    rewards = {agent_name: rewards[agent_name].item() for agent_name in env.agents}
    for agent_name in env.agents:
        cumulative_rewards[agent_name] += rewards[agent_name]

    current_step += 1
    main_logger.info(f"Step {current_step}: {rewards}")

env.close()

Configuration

class free_range_zoo.envs.rideshare.env.structures.configuration.AgentConfiguration(start_positions: IntTensor, pool_limit: int, use_diagonal_travel: bool, use_fast_travel: bool)[source]

Agent settings for rideshare.

Variables:
  • start_positions (torch.IntTensor) – Starting positions of the agents

  • pool_limit (int) – Maximum number of passengers that can be in a car

  • use_diagonal_travel (bool) – Whether to enable diagonal travel for agents

  • use_fast_travel (bool) – Whether to enable fast travel for agents
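
A minimal sketch of constructing this configuration is shown below; the values are illustrative, and the <y, x> ordering of start_positions rows is an assumption consistent with the observation layout above.

import torch

from free_range_zoo.envs.rideshare.env.structures.configuration import AgentConfiguration

# Two drivers with a pooling limit of two passengers each.
agent_config = AgentConfiguration(
    start_positions=torch.tensor([[0, 0], [4, 4]], dtype=torch.int32),  # assumed <y, x> rows
    pool_limit=2,
    use_diagonal_travel=False,  # movement restricted to the four cardinal directions
    use_fast_travel=True,
)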

class free_range_zoo.envs.rideshare.env.structures.configuration.PassengerConfiguration(schedule: IntTensor)[source]

Task settings for rideshare.

Variables:
  • schedule (torch.IntTensor) – Task schedule of shape <tasks, (timestep, batch, y, x, y_dest, x_dest, fare)>, where batch can be set to -1 as a wildcard for all batches
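
A sketch of a two-task schedule following the documented <tasks, (timestep, batch, y, x, y_dest, x_dest, fare)> layout; the coordinates and fares are illustrative.

import torch

from free_range_zoo.envs.rideshare.env.structures.configuration import PassengerConfiguration

schedule = torch.tensor([
    [0, -1, 0, 0, 3, 3, 10],  # step 0, all batches (-1): pickup (0, 0) -> dropoff (3, 3), fare 10
    [2, 0, 1, 2, 4, 0, 7],    # step 2, batch 0 only: pickup (1, 2) -> dropoff (4, 0), fare 7
], dtype=torch.int32)

passenger_config = PassengerConfiguration(schedule=schedule)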

class free_range_zoo.envs.rideshare.env.structures.configuration.RewardConfiguration(pick_cost: float, move_cost: float, drop_cost: float, noop_cost: float, accept_cost: float, pool_limit_cost: float, use_pooling_rewards: bool, use_variable_move_cost: bool, use_waiting_costs: bool, wait_limit: IntTensor, long_wait_time: int, general_wait_cost: float, long_wait_cost: float)[source]

Reward settings for rideshare.

Variables:
  • pick_cost (float) – Cost of picking up a passenger

  • move_cost (float) – Cost of moving to a new location

  • drop_cost (float) – Cost of dropping off a passenger

  • noop_cost (float) – Cost of taking no action

  • accept_cost (float) – Cost of accepting a passenger

  • pool_limit_cost (float) – Cost of exceeding the pool limit

  • use_pooling_rewards (bool) – Whether to use pooling rewards

  • use_variable_move_cost (bool) – Whether to use the variable move cost

  • use_variable_pick_cost (bool) – Whether to use the variable pick cost

  • use_waiting_costs (bool) – Whether to use waiting costs

  • wait_limit (torch.IntTensor) – Wait limits for each state of the passenger [unaccepted, accepted, riding]

  • long_wait_time (int) – Time after which a passenger is considered to be waiting for a long time (defaults to the maximum of wait_limit)

  • general_wait_cost (float) – Cost of waiting for a passenger

  • long_wait_cost (float) – Cost of waiting for a passenger for a long time (added to the general wait cost)
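
The sketch below constructs a reward configuration matching the signature above; every numeric value, including the sign conventions, is an illustrative assumption rather than a recommended setting.

import torch

from free_range_zoo.envs.rideshare.env.structures.configuration import RewardConfiguration

reward_config = RewardConfiguration(
    pick_cost=-0.1,
    move_cost=-0.05,
    drop_cost=1.0,
    noop_cost=-0.01,
    accept_cost=-0.05,
    pool_limit_cost=-1.0,
    use_pooling_rewards=True,
    use_variable_move_cost=False,
    use_waiting_costs=True,
    wait_limit=torch.tensor([5, 5, 10], dtype=torch.int32),  # [unaccepted, accepted, riding]
    long_wait_time=10,
    general_wait_cost=-0.02,
    long_wait_cost=-0.1,
)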

class free_range_zoo.envs.rideshare.env.structures.configuration.RideshareConfiguration(grid_height: int, grid_width: int, agent_config: AgentConfiguration, passenger_config: PassengerConfiguration, reward_config: RewardConfiguration)[source]

Configuration settings for rideshare environment.

Variables:
  • grid_height (int) – Height of the grid

  • grid_width (int) – Width of the grid

  • agent_config (AgentConfiguration) – Agent settings for the environment

  • passenger_config (PassengerConfiguration) – Passenger (task) settings for the environment

  • reward_config (RewardConfiguration) – Reward settings for the environment
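
A sketch combining the component configurations built above into a full environment configuration; the 5x5 grid is arbitrary.

from free_range_zoo.envs.rideshare.env.structures.configuration import RideshareConfiguration

config = RideshareConfiguration(
    grid_height=5,
    grid_width=5,
    agent_config=agent_config,
    passenger_config=passenger_config,
    reward_config=reward_config,
)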

API

class free_range_zoo.envs.rideshare.env.rideshare.env(wrappers: List[Callable] = [], **kwargs)[source]

AEC wrapped version of the rideshare environment.

Parameters:

wrappers (List[Callable[[BatchedAECEnv], BatchedAECEnv]]) – The wrappers to apply to the environment

Returns:

BatchedAECEnv – The rideshare environment
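
Since the entry point accepts a list of wrappers, a hedged sketch of applying one is shown below; the no-op wrapper is purely illustrative and stands in for any real BatchedAECEnv -> BatchedAECEnv transform.

from free_range_zoo.envs import rideshare_v0

def identity_wrapper(aec_env):
    return aec_env  # illustrative no-op wrapper

env = rideshare_v0.env(wrappers=[identity_wrapper], render_mode="human")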

class free_range_zoo.envs.rideshare.env.rideshare.raw_env(*args, **kwargs)[source]

Implementation of the dynamic rideshare environment.

Initialize the simulation.