Specification

Import	`from free_range_zoo.envs import cybersecurity_v0`
Actions	Discrete & Stochastic
Observations	Discrete and Partially Observed with Private Observations
Parallel API	Yes
Manual Control	No
Agent Names	[\(attacker_0\), …, \(attacker_n\), \(defender_0\), …, \(defender_n\)]
# Agents	[0, \(n_{attackers}\) + \(n_{defenders}\)]
Action Shape	(\(envs\), 2)
Action Values	Attackers: [\(attack_0\), …, \(attack_{tasks}\), \(noop\) (-1)]
	Defenders: [\(move_0\), …, \(move_{tasks}\), \(noop\) (-1), \(patch\) (-2), \(monitor\) (-3)]
Observation Shape	Attackers: TensorDict {
	    self: \(<power, presence>\)
	    others: \(<power, presence>\)
	    tasks: \(<state>\)
	    batch_size: \(num\_envs\)
	}
	Defenders: TensorDict {
	    self: \(<power, presence, location>\)
	    others: \(<power, presence, location>\)
	    tasks: \(<state>\)
	    batch_size: \(num\_envs\)
	}
Observation Values	Attackers:
	    self: \(power\): [\(0\), \(max\_power_{attacker}\)], \(presence\): [\(0\), \(1\)]
	    others: \(power\): [\(0\), \(max\_power_{attacker}\)], \(presence\): [\(0\), \(1\)]
	    tasks: \(state\): [\(0\), \(n_{network\_states}\)]
	Defenders:
	    self: \(power\): [\(0\), \(max\_power_{defender}\)], \(presence\): [\(0\), \(1\)], \(location\): [\(0\), \(n_{subnetworks}\)]
	    others: \(power\): [\(0\), \(max\_power_{defender}\)], \(presence\): [\(0\), \(1\)], \(location\): [\(0\), \(n_{subnetworks}\)]
	    tasks: \(state\): [\(0\), \(n_{network\_states}\)]
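
The defender action codes listed under Action Values can be made concrete with a small decoder. This is only an illustrative sketch: `describe_defender_action` is a hypothetical helper, not part of free_range_zoo, and it reads the table literally (non-negative values are move targets, -1 is noop, -2 is patch, -3 is monitor). It makes no assumption about which column of the (\(envs\), 2) action tensor carries the value.

```python
NOOP, PATCH, MONITOR = -1, -2, -3  # defender codes as listed in the table


def describe_defender_action(value: int) -> str:
    """Return a readable label for a single defender action value."""
    if value >= 0:
        # Non-negative values are read here as the index of a move target
        return f"move_{value}"
    labels = {NOOP: "noop", PATCH: "patch", MONITOR: "monitor"}
    if value not in labels:
        raise ValueError(f"unknown defender action value: {value}")
    return labels[value]


if __name__ == "__main__":
    for value in (2, NOOP, PATCH, MONITOR):
        print(value, describe_defender_action(value))
```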

Usage

Parallel API

import logging

import torch

from free_range_zoo.envs import cybersecurity_v0

main_logger = logging.getLogger(__name__)

# Initialize and reset environment to initial state
env = cybersecurity_v0.parallel_env(render_mode="human")
observations, infos = env.reset()

# Initialize agents and give initial observations.
# Populate this dict with one policy object per name in env.agents;
# each policy is expected to expose act() and observe().
agents = {}

cumulative_rewards = {agent: 0 for agent in env.agents}

current_step = 0
while not torch.all(env.finished):
    agent_actions = {
        agent_name: torch.stack([agents[agent_name].act()])
        for agent_name in env.agents
    }  # Policy action determination here

    observations, rewards, terminations, truncations, infos = env.step(agent_actions)
    rewards = {agent_name: rewards[agent_name].item() for agent_name in env.agents}

    for agent_name, agent in agents.items():
        agent.observe(observations[agent_name][0])  # Policy observation processing here
        cumulative_rewards[agent_name] += rewards[agent_name]

    main_logger.info(f"Step {current_step}: {rewards}")
    current_step += 1

env.close()

AEC API

import logging

import torch

from free_range_zoo.envs import cybersecurity_v0

main_logger = logging.getLogger(__name__)

# Initialize and reset environment to initial state
env = cybersecurity_v0.env(render_mode="human")
env.reset()

cumulative_rewards = {agent: 0 for agent in env.agents}

current_step = 0
while not torch.all(env.finished):
    step_rewards = {}
    for agent in env.agent_iter():
        # last() returns the data for the agent that is about to act
        observation, reward, termination, truncation, info = env.last()
        step_rewards[agent] = reward.item()
        cumulative_rewards[agent] += step_rewards[agent]

        # Policy action determination here
        action = env.action_space(agent).sample()

        env.step(action)

    main_logger.info(f"Step {current_step}: {step_rewards}")
    current_step += 1

env.close()

Configuration

class free_range_zoo.envs.cybersecurity.env.structures.configuration.AttackerConfiguration(initial_presence: BoolTensor, threat: FloatTensor, persist_probs: FloatTensor, return_probs: FloatTensor)

Configuration for the attacker in the cybersecurity environment.

Variables:

initial_presence (torch.BoolTensor) – Initial presence of each attacking agent
threat (torch.FloatTensor) – Threat values for each attacking agent
persist_probs (torch.FloatTensor) – Probability for each attacking agent to leave the environment
return_probs (torch.FloatTensor) – Probability for each attacking agent to return to the environment
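
To illustrate how a pair of leave and return probabilities could drive an agent's presence over time, here is a minimal sketch. It is not the library's implementation: `step_presence` is a hypothetical helper, and it assumes one independent Bernoulli draw per agent per step, with the first probability read as "probability to leave" exactly as the `persist_probs` description states.

```python
import random


def step_presence(present: bool, leave_prob: float, return_prob: float,
                  rng: random.Random) -> bool:
    """Advance one agent's presence flag by a single step."""
    if present:
        # The agent stays unless the "leave" draw succeeds
        return rng.random() >= leave_prob
    # An absent agent comes back when the "return" draw succeeds
    return rng.random() < return_prob


if __name__ == "__main__":
    rng = random.Random(0)
    present = True
    history = []
    for _ in range(10):
        present = step_presence(present, leave_prob=0.2, return_prob=0.5, rng=rng)
        history.append(present)
    print(history)
```

With a leave probability of 0 a present agent never leaves, and with a return probability of 0 an absent agent never comes back, which is a quick way to sanity-check configured values.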

class free_range_zoo.envs.cybersecurity.env.structures.configuration.CybersecurityConfiguration(attacker_config: AttackerConfiguration, defender_config: DefenderConfiguration, network_config: NetworkConfiguration, reward_config: RewardConfiguration, stochastic_config: StochasticConfiguration)

Configuration for the cybersecurity environment.

Variables:

attacker_config (AttackerConfiguration) – Configuration for the attacker agent properties
defender_config (DefenderConfiguration) – Configuration for the defender agent properties
network_config (NetworkConfiguration) – Configuration for the network nodes
reward_config (RewardConfiguration) – Configuration for the environment rewards
stochastic_config (StochasticConfiguration) – Configuration for the stochastic components of the environment

class free_range_zoo.envs.cybersecurity.env.structures.configuration.DefenderConfiguration(initial_location: IntTensor, initial_presence: BoolTensor, mitigation: FloatTensor, persist_probs: FloatTensor, return_probs: FloatTensor)

Configuration for the defender in the cybersecurity environment.

Variables:

initial_location (torch.IntTensor) – Initial location of each defending agent
initial_presence (torch.BoolTensor) – Initial presence of each defending agent
mitigation (torch.FloatTensor) – Mitigation values for each defending agent
persist_probs (torch.FloatTensor) – Probability for each defending agent to leave the environment
return_probs (torch.FloatTensor) – Probability for each defending agent to return to the environment

class free_range_zoo.envs.cybersecurity.env.structures.configuration.NetworkConfiguration(patched_states: int, vulnerable_states: int, exploited_states: int, temperature: float, initial_state: IntTensor, adj_matrix: BoolTensor)

Configuration for the network components of the cybersecurity simulation.

The home node for the simulation is automatically defined as node -1.

Variables:

patched_states (int) – Number of patched states in the network
vulnerable_states (int) – Number of vulnerable states in the network
exploited_states (int) – Number of exploited states in the network
temperature (float) – Temperature for the softmax function for the danger score
initial_state (torch.IntTensor) – Subnetwork-parallel array representing the exploitation state of each subnetwork
adj_matrix (torch.BoolTensor) – 2D array representing the adjacency matrix for all subnetwork connections
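
The effect of the `temperature` parameter can be seen with a standard temperature-scaled softmax, \(p_i = e^{s_i / T} / \sum_j e^{s_j / T}\). The sketch below is generic and hedged: how the library derives the danger scores that feed its softmax is not described here, so the input scores are made-up values and `softmax_with_temperature` is a hypothetical helper.

```python
import math
from typing import List, Sequence


def softmax_with_temperature(scores: Sequence[float], temperature: float) -> List[float]:
    """Return softmax probabilities of `scores` scaled by `temperature`."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in scores]
    # Subtract the maximum before exponentiating for numerical stability
    peak = max(scaled)
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


if __name__ == "__main__":
    scores = [1.0, 2.0, 3.0]  # hypothetical danger scores
    print(softmax_with_temperature(scores, temperature=0.5))
    print(softmax_with_temperature(scores, temperature=5.0))
```

A low temperature concentrates probability on the highest score, while a high temperature flattens the distribution toward uniform.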

class free_range_zoo.envs.cybersecurity.env.structures.configuration.RewardConfiguration(bad_action_penalty: float, patch_reward: float, network_state_rewards: FloatTensor)

Configuration for the rewards in the cybersecurity environment.

Variables:

bad_action_penalty (float) – Penalty for committing a bad action (patching while at the home node)
patch_reward (float) – Reward (or penalty) for patching a node
network_state_rewards (torch.FloatTensor) – Subnetwork-parallel array representing the rewards for each

class free_range_zoo.envs.cybersecurity.env.structures.configuration.StochasticConfiguration(network_state: bool)

Configuration for the stochastic components of the cybersecurity simulation.

Variables:

network_state (bool) – Whether the subnetwork states degrade / repair stochastically
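
The configuration classes above compose into a single `CybersecurityConfiguration`. The following is a minimal sketch using only the constructor signatures documented in this section. Every size and value is an illustrative assumption (2 attackers, 2 defenders, 3 subnetworks), the length chosen for `network_state_rewards` is a guess, and any validation the library performs on these values is not covered here.

```python
import torch

from free_range_zoo.envs.cybersecurity.env.structures.configuration import (
    AttackerConfiguration,
    CybersecurityConfiguration,
    DefenderConfiguration,
    NetworkConfiguration,
    RewardConfiguration,
    StochasticConfiguration,
)

# Assumed sizes: 2 attackers, 2 defenders, 3 subnetworks
attacker_config = AttackerConfiguration(
    initial_presence=torch.tensor([True, True]),
    threat=torch.tensor([1.0, 1.0]),
    persist_probs=torch.tensor([0.5, 0.5]),
    return_probs=torch.tensor([0.5, 0.5]),
)

defender_config = DefenderConfiguration(
    initial_location=torch.tensor([-1, -1], dtype=torch.int32),  # -1 is the home node
    initial_presence=torch.tensor([True, True]),
    mitigation=torch.tensor([1.0, 1.0]),
    persist_probs=torch.tensor([0.5, 0.5]),
    return_probs=torch.tensor([0.5, 0.5]),
)

network_config = NetworkConfiguration(
    patched_states=1,
    vulnerable_states=2,
    exploited_states=3,
    temperature=1.0,
    initial_state=torch.zeros(3, dtype=torch.int32),  # one entry per subnetwork
    adj_matrix=torch.tensor([
        [True, True, False],
        [True, True, True],
        [False, True, True],
    ]),
)

reward_config = RewardConfiguration(
    bad_action_penalty=-10.0,
    patch_reward=-1.0,
    # Assumption: one reward per network state (1 + 2 + 3 = 6 entries)
    network_state_rewards=torch.tensor([0.0, 0.0, 0.0, -1.0, -2.0, -3.0]),
)

stochastic_config = StochasticConfiguration(network_state=True)

configuration = CybersecurityConfiguration(
    attacker_config=attacker_config,
    defender_config=defender_config,
    network_config=network_config,
    reward_config=reward_config,
    stochastic_config=stochastic_config,
)
```

How the resulting object is handed to the environment constructor is not specified in this section, so that step is deliberately left out.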

API

class free_range_zoo.envs.cybersecurity.env.cybersecurity.env(wrappers: List[Callable] = [], **kwargs)

AEC wrapped version of the cybersecurity environment.

Parameters:

wrappers (List[Callable[[BatchedAECEnv], BatchedAECEnv]]) – The wrappers to apply to the environment

Returns:

BatchedAECEnv – The cybersecurity environment

class free_range_zoo.envs.cybersecurity.env.cybersecurity.raw_env(*args, observe_other_location: bool = False, observe_other_presence: bool = False, observe_other_power: bool = True, partially_observable: bool = True, show_bad_actions: bool = True, **kwargs)

Environment definition for the cybersecurity environment.

Initialize the cybersecurity environment.

Parameters:

observe_other_location (bool) – Whether to observe the location of other agents
observe_other_presence (bool) – Whether to observe the presence of other agents
observe_other_power (bool) – Whether to observe the power of other agents
partially_observable (bool) – Whether observations of subnetwork states should only be returned on monitor
show_bad_actions (bool) – Whether to show bad actions (patch at home node)
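
The `partially_observable` flag can be pictured as a gate on the task states a defender sees. The sketch below is a conceptual illustration only, not the environment's code: `visible_task_states` and the `HIDDEN` placeholder are hypothetical, and the real environment may encode unobserved states differently. The monitor code -3 is taken from the specification table.

```python
from typing import List, Optional, Sequence

MONITOR = -3   # defender monitor code from the specification table
HIDDEN = None  # hypothetical placeholder for an unobserved subnetwork state


def visible_task_states(task_states: Sequence[int], action_value: int,
                        partially_observable: bool = True) -> List[Optional[int]]:
    """Return the subnetwork states a defender would see after its action."""
    if partially_observable and action_value != MONITOR:
        # Without a monitor action, every subnetwork state stays hidden
        return [HIDDEN] * len(task_states)
    return list(task_states)


if __name__ == "__main__":
    states = [0, 2, 1]
    print(visible_task_states(states, MONITOR))
    print(visible_task_states(states, -2))
    print(visible_task_states(states, -2, partially_observable=False))
```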