Quickstart - Rideshare¶

Overview¶

Here we show a brief introduction on how to use a free-range-zoo environment, and how to use our baselines. For further detail see Basic Usage and Logging.

Here we use Rideshare as a example domain, and we use Random and noop as example agents. We provide the full script of this tutorial at the bottom of this page for convenient copy and pasting.

Step #1: Environment Configurations¶

Configurations for each environment can be found at this Kaggle link. Configurations must be loaded using pickle. An example loading is shown below.

import pickle
import torch

with open('<path to configuration>.pkl', 'rb') as f:
    configuration = pickle.load(f)

Step #2: Environment Creation¶

Now we create our environment giving it the configuration, and a number of parallel environments to create. Here log_directory is the location of a empty or nonexistant directory which environment logs will be saved. If not given (or if None) automatic logging will not occur.

from free_range_zoo.envs import rideshare_v0
env = rideshare_v0.parallel_env(
    max_steps = 100,
    parallel_envs = 1,
    configuration = rideshare_configuration,
    device=torch.device('cpu'),
    log_directory = "test_logging"
)

Our baselines use the action_mapping_wrapper. This modifies the output observation to be a uple space.Dict -> (space.Dict, space.Dict). The second dictionary, t_mapping allow us to identify which task is associated with which observation in cases where all agents do not observe the same tasks. Like with accepted passengers in rideshare.

from free_range_zoo.wrappers.action_task import action_mapping_wrapper_v0
env = action_mapping_wrapper_v0(env)

Step #3: Environment Step¶

Now we can create our baseline agents and execute our policy. Here each agent must perform observe before each act which stores and process the prior observation.

from free_range_zoo.envs.rideshare.baselines import NoopBaseline, RandomBaseline

agents = {
    env.agents[0]: NoopBaseline(agent_name = "driver_1", parallel_envs = 1),
    env.agents[1]: NoopBaseline(agent_name = "driver_2", parallel_envs = 1),
    env.agents[2]: NoopBaseline(agent_name = "driver_3", parallel_envs = 1),
}

while not torch.all(env.finished):
    for agent_name, agent in agents.items():
        agent.observe(observations[agent_name])  # Policy observation 

    agent_actions = {
        agent_name: agents[agent_name].act(env.action_space(agent_name))
        for agent_name in env.agents
    }  # Policy action determination here

    observations, rewards, terminations, truncations, infos = env.step(agent_actions)

env.close()

Now you should see the directory test_logging with test_logging/0.csv where the number indicates the parallel_env index.

Full Quickstart Script¶

from free_range_zoo.envs import rideshare_v0
from free_range_zoo.wrappers.action_task import action_mapping_wrapper_v0
import torch
import pickle

with open('<path to configuration>.pkl','rb') as f:
    rideshare_configuration = pickle.load(f)

env = rideshare_v0.parallel_env(
    max_steps = 100,
    parallel_envs = 1,
    configuration = rideshare_configuration,
    device=torch.device('cpu'),
    log_directory = "test_logging"
)
env.reset()
env = action_mapping_wrapper_v0(env)
observations, infos = env.reset()

from free_range_zoo.envs.rideshare.baselines import NoopBaseline, RandomBaseline

# Modify agents based on loaded pkl configuration file  
agents = {
    env.agents[0]: NoopBaseline(agent_name = "driver_1", parallel_envs = 1),
    env.agents[1]: NoopBaseline(agent_name = "driver_2", parallel_envs = 1),
    env.agents[2]: NoopBaseline(agent_name = "driver_3", parallel_envs = 1),
}

while not torch.all(env.finished):
    for agent_name, agent in agents.items():
        agent.observe(observations[agent_name])  # Policy observation 

    agent_actions = {
        agent_name: agents[agent_name].act(env.action_space(agent_name))
        for agent_name in env.agents
    }  # Policy action determination here

    observations, rewards, terminations, truncations, infos = env.step(agent_actions)

env.close()