artlib.reinforcement.FALCON

FALCON and TD-FALCON models for reinforcement learning [16], [17].

Classes

`FALCON`	FALCON for Reinforcement Learning.
`TD_FALCON`	TD-FALCON for Reinforcement Learning.

Module Contents

class artlib.reinforcement.FALCON.FALCON(state_art: artlib.common.BaseART.BaseART, action_art: artlib.common.BaseART.BaseART, reward_art: artlib.common.BaseART.BaseART, gamma_values: List[float] | numpy.ndarray = np.array([0.33, 0.33, 0.34]), channel_dims: List[int] | numpy.ndarray = list[int])

FALCON for Reinforcement Learning.

This class implements the reactive FALCON first described in [16].

FALCON is built on a FusionART backbone but accepts exactly three channels: State, Action, and Reward. It adds methods for predicting rewards and for selecting optimal actions.
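As a rough illustration of how three channels can share one backbone, the sketch below combines per-channel activations with the default gamma_values from the signature above. It assumes the usual fusion ART convention that a category's overall activation is the gamma-weighted sum of its channel activations. The activation numbers are made up for illustration and no artlib code is called.

```python
import numpy as np

# Default channel weights from the FALCON signature (state, action, reward).
gamma_values = np.array([0.33, 0.33, 0.34])

# Hypothetical per-channel activations for two candidate categories.
# Columns: state channel, action channel, reward channel.
channel_activations = np.array([
    [0.9, 0.2, 0.5],  # category 0
    [0.4, 0.8, 0.7],  # category 1
])

# Assumed fusion rule: weighted sum of channel activations per category.
combined = channel_activations @ gamma_values

# The category with the highest combined activation wins.
winner = int(np.argmax(combined))
```

Because the weights sum to 1, the combined activation stays on the same scale as the individual channels.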

fusion_art

prepare_data(states: numpy.ndarray, actions: numpy.ndarray, rewards: numpy.ndarray) → Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]

Prepare data for clustering.

Parameters:

states (np.ndarray) – The state data.
actions (np.ndarray) – The action data.
rewards (np.ndarray) – The reward data.

Returns:

Normalized state, action, and reward data.

Return type:

tuple of np.ndarray

restore_data(states: numpy.ndarray, actions: numpy.ndarray, rewards: numpy.ndarray) → Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]

Restore data to its original form before preparation.

Parameters:

states (np.ndarray) – The state data.
actions (np.ndarray) – The action data.
rewards (np.ndarray) – The reward data.

Returns:

Restored state, action, and reward data.

Return type:

tuple of np.ndarray
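The round trip between prepare_data and restore_data can be pictured with the sketch below. It assumes FuzzyART-style channel modules, for which preparation typically means min-max scaling to [0, 1] followed by complement coding. The helpers prepare_channel and restore_channel are hypothetical names written for this example and are not part of artlib.

```python
import numpy as np

def prepare_channel(x):
    """Min-max scale each column to [0, 1], then append the complement 1 - x."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(axis=0), x.max(axis=0)
    # Guard against division by zero for constant columns.
    span = np.where(hi > lo, hi - lo, 1.0)
    scaled = (x - lo) / span
    return np.hstack([scaled, 1.0 - scaled]), lo, span

def restore_channel(prepared, lo, span):
    """Invert prepare_channel: drop the complement half and undo the scaling."""
    dim = prepared.shape[1] // 2
    return prepared[:, :dim] * span + lo

states = np.array([[0.0, 10.0], [2.0, 30.0], [4.0, 20.0]])
prepared, lo, span = prepare_channel(states)
restored = restore_channel(prepared, lo, span)
```

In this sketch the same treatment would be applied independently to the state, action, and reward arrays, which is why both methods take and return three arrays.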

fit(states: numpy.ndarray, actions: numpy.ndarray, rewards: numpy.ndarray)

Fit the FALCON model to the data.

Parameters:

states (np.ndarray) – The state data.
actions (np.ndarray) – The action data.
rewards (np.ndarray) – The reward data.

Returns:

The fitted FALCON model.

Return type:

FALCON

partial_fit(states: numpy.ndarray, actions: numpy.ndarray, rewards: numpy.ndarray)

Partially fit the FALCON model to the data.

Parameters:

states (np.ndarray) – The state data.
actions (np.ndarray) – The action data.
rewards (np.ndarray) – The reward data.

Returns:

The partially fitted FALCON model.

Return type:

FALCON

get_actions_and_rewards(state: numpy.ndarray, action_space: numpy.ndarray | None = None) → Tuple[numpy.ndarray, numpy.ndarray]

Get possible actions and their associated rewards for a given state.

Parameters:

state (np.ndarray) – The current state.
action_space (np.ndarray, optional) – The available action space, by default None.

Returns:

The possible actions and their corresponding rewards.

Return type:

tuple of np.ndarray

get_action(state: numpy.ndarray, action_space: numpy.ndarray | None = None, optimality: Literal['min', 'max'] = 'max') → numpy.ndarray

Get the best action for a given state based on optimality.

Parameters:

state (np.ndarray) – The current state.
action_space (np.ndarray, optional) – The available action space, by default None.
optimality ({"min", "max"}, optional) – Whether to choose the action with the minimum or maximum reward, by default "max".

Returns:

The optimal action.

Return type:

np.ndarray
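The selection step described above can be sketched as follows, given candidate actions and their predicted rewards such as the pair returned by get_actions_and_rewards. The helper pick_action is a hypothetical stand-in, not the library's implementation.

```python
import numpy as np

def pick_action(actions, rewards, optimality="max"):
    """Return the candidate action with the largest or smallest predicted reward."""
    rewards = np.asarray(rewards, dtype=float).ravel()
    if optimality == "max":
        idx = int(np.argmax(rewards))
    elif optimality == "min":
        idx = int(np.argmin(rewards))
    else:
        raise ValueError("optimality must be 'min' or 'max'")
    return np.asarray(actions)[idx]

# Illustrative values: three one-dimensional actions and their predicted rewards.
candidate_actions = np.array([[0], [1], [2]])
predicted_rewards = np.array([0.2, 0.9, 0.4])

best = pick_action(candidate_actions, predicted_rewards, "max")
worst = pick_action(candidate_actions, predicted_rewards, "min")
```

Using "min" is useful when the reward channel actually stores a cost to be minimized.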

get_probabilistic_action(state: numpy.ndarray, action_space: numpy.ndarray | None = None, offset: float = 0.1, optimality: Literal['min', 'max'] = 'max') → numpy.ndarray

Get a probabilistic action for a given state based on reward distribution.

Parameters:

state (np.ndarray) – The current state.
action_space (np.ndarray, optional) – The available action space, by default None.
offset (float, optional) – The reward offset used to adjust the probability distribution, by default 0.1.
optimality ({"min", "max"}, optional) – Whether to prefer minimum or maximum rewards, by default "max".

Returns:

The chosen action based on probability.

Return type:

np.ndarray
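One plausible reading of reward-proportional sampling is sketched below. The source does not state exactly how offset enters the distribution, so this example assumes it is added to every reward before normalizing, which keeps zero-reward actions selectable. It also assumes rewards lie in [0, 1]. The helper sample_action and its rng parameter are hypothetical.

```python
import numpy as np

def sample_action(actions, rewards, offset=0.1, optimality="max", rng=None):
    """Sample an action with probability proportional to (reward + offset)."""
    rng = np.random.default_rng() if rng is None else rng
    rewards = np.asarray(rewards, dtype=float).ravel()
    if optimality == "min":
        # Assumes rewards are normalized to [0, 1], so low rewards get high weight.
        rewards = 1.0 - rewards
    weights = rewards + offset
    probs = weights / weights.sum()
    idx = rng.choice(len(probs), p=probs)
    return np.asarray(actions)[idx], probs

candidate_actions = np.array([[0], [1], [2]])
predicted_rewards = np.array([0.0, 0.9, 0.4])

action, probs = sample_action(
    candidate_actions, predicted_rewards, offset=0.1,
    rng=np.random.default_rng(0),
)
```

Under this assumption a larger offset flattens the distribution toward uniform exploration, while an offset near zero approaches purely reward-proportional selection.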

get_rewards(states: numpy.ndarray, actions: numpy.ndarray) → numpy.ndarray

Get the rewards for given states and actions.

Parameters:

states (np.ndarray) – The state data.
actions (np.ndarray) – The action data.

Returns:

The rewards corresponding to the given state-action pairs.

Return type:

np.ndarray

class artlib.reinforcement.FALCON.TD_FALCON(state_art: artlib.common.BaseART.BaseART, action_art: artlib.common.BaseART.BaseART, reward_art: artlib.common.BaseART.BaseART, gamma_values: List[float] | numpy.ndarray = np.array([0.33, 0.33, 0.34]), channel_dims: List[int] | numpy.ndarray = list[int], td_alpha: float = 1.0, td_lambda: float = 1.0)

Bases: FALCON

TD-FALCON for Reinforcement Learning.

This class implements TD-FALCON as first described in [17].

TD-FALCON is built on a FALCON backbone but adds functions specific to temporal-difference learning. Currently, only SARSA is implemented and only FuzzyART base modules are supported.

td_alpha = 1.0

td_lambda = 1.0

abstract fit(states: numpy.ndarray, actions: numpy.ndarray, rewards: numpy.ndarray)

Fit the TD-FALCON model to the data.

Raises:

NotImplementedError – TD-FALCON can only be trained with partial_fit.

calculate_SARSA(states: numpy.ndarray, actions: numpy.ndarray, rewards: numpy.ndarray, single_sample_reward: float | None = None)

Calculate the SARSA values for reinforcement learning.

Parameters:

states (np.ndarray) – The state data.
actions (np.ndarray) – The action data.
rewards (np.ndarray) – The reward data.
single_sample_reward (float, optional) – The reward for a single sample, if applicable, by default None.

Returns:

The state, action, and SARSA-adjusted reward data to be used for fitting.

Return type:

tuple of np.ndarray
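For orientation, the textbook SARSA update is Q(s, a) ← Q(s, a) + α [r + γ Q(s', a') − Q(s, a)]. The sketch below applies it along a trajectory, assuming that td_alpha plays the role of the learning rate α and td_lambda the role of the discount γ. That mapping is an assumption, since the source does not describe the parameters. The helper sarsa_targets and the arrays q_est and r_obs are hypothetical.

```python
import numpy as np

def sarsa_targets(q_values, rewards, td_alpha=1.0, td_lambda=1.0):
    """Compute SARSA-adjusted values for every step except the last.

    q_values[t] is the current estimate for the pair (s_t, a_t);
    rewards[t] is the reward observed after taking a_t in s_t.
    """
    q = np.asarray(q_values, dtype=float)
    r = np.asarray(rewards, dtype=float)
    # Temporal-difference error between the bootstrapped target and the estimate.
    td_error = r[:-1] + td_lambda * q[1:] - q[:-1]
    return q[:-1] + td_alpha * td_error

q_est = np.array([0.5, 0.6, 0.8])   # illustrative current estimates
r_obs = np.array([0.0, 0.1, 1.0])   # illustrative observed rewards

targets = sarsa_targets(q_est, r_obs, td_alpha=0.5, td_lambda=0.9)
```

With the default values of 1.0 for both parameters, the update in this sketch reduces to the observed reward plus the next step's estimate. The last step has no successor, which is one reason a separate single-sample reward argument can be useful.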

partial_fit(states: numpy.ndarray, actions: numpy.ndarray, rewards: numpy.ndarray, single_sample_reward: float | None = None)

Partially fit the TD-FALCON model using SARSA.

Parameters:

states (np.ndarray) – The state data.
actions (np.ndarray) – The action data.
rewards (np.ndarray) – The reward data.
single_sample_reward (float, optional) – The reward for a single sample, if applicable, by default None.

Returns:

The partially fitted TD-FALCON model.

Return type:

TD_FALCON