artlib.reinforcement.FALCON
Classes
FALCON | FALCON for Reinforcement Learning.
TD_FALCON | TD-FALCON for Reinforcement Learning.
Module Contents
- class artlib.reinforcement.FALCON.FALCON(state_art: artlib.common.BaseART.BaseART, action_art: artlib.common.BaseART.BaseART, reward_art: artlib.common.BaseART.BaseART, gamma_values: List[float] | numpy.ndarray = np.array([0.33, 0.33, 0.34]), channel_dims: List[int] | numpy.ndarray = list[int])
FALCON for Reinforcement Learning.
This module implements the reactive FALCON as first described in: [16].
FALCON is based on a FusionART backbone but only accepts 3 channels: State, Action, and Reward. Specific functions are implemented for getting optimal reward and action predictions.
- fusion_art
- prepare_data(states: numpy.ndarray, actions: numpy.ndarray, rewards: numpy.ndarray) → Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]
Prepare data for clustering.
- Parameters:
states (np.ndarray) – The state data.
actions (np.ndarray) – The action data.
rewards (np.ndarray) – The reward data.
- Returns:
Normalized state, action, and reward data.
- Return type:
tuple of np.ndarray
- restore_data(states: numpy.ndarray, actions: numpy.ndarray, rewards: numpy.ndarray) → Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]
Restore data to its original form before preparation.
- Parameters:
states (np.ndarray) – The state data.
actions (np.ndarray) – The action data.
rewards (np.ndarray) – The reward data.
- Returns:
Restored state, action, and reward data.
- Return type:
tuple of np.ndarray
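As a rough illustration of the round trip between prepare_data and restore_data, the sketch below min-max scales each column into [0, 1] and then inverts the scaling. The helper names `minmax_scale` and `minmax_restore` are hypothetical, plain lists stand in for NumPy arrays, and the normalization that the underlying ART modules actually apply may differ.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def minmax_scale(data: Matrix) -> Tuple[Matrix, List[float], List[float]]:
    """Scale each column into [0, 1]; return scaled rows plus per-column bounds.

    Hypothetical helper, not part of artlib.
    """
    columns = list(zip(*data))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = [
        [
            # A constant column has no range, so map it to 0.0.
            (x - lo) / (hi - lo) if hi > lo else 0.0
            for x, lo, hi in zip(row, lows, highs)
        ]
        for row in data
    ]
    return scaled, lows, highs


def minmax_restore(scaled: Matrix, lows: List[float], highs: List[float]) -> Matrix:
    """Invert minmax_scale using the stored per-column bounds."""
    return [
        [x * (hi - lo) + lo for x, lo, hi in zip(row, lows, highs)]
        for row in scaled
    ]
```

Restoring requires the bounds seen during preparation, which is why the sketch returns them alongside the scaled data.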
- fit(states: numpy.ndarray, actions: numpy.ndarray, rewards: numpy.ndarray)
Fit the FALCON model to the data.
- Parameters:
states (np.ndarray) – The state data.
actions (np.ndarray) – The action data.
rewards (np.ndarray) – The reward data.
- Returns:
The fitted FALCON model.
- Return type:
FALCON
- partial_fit(states: numpy.ndarray, actions: numpy.ndarray, rewards: numpy.ndarray)
Partially fit the FALCON model to the data.
- Parameters:
states (np.ndarray) – The state data.
actions (np.ndarray) – The action data.
rewards (np.ndarray) – The reward data.
- Returns:
The partially fitted FALCON model.
- Return type:
FALCON
- get_actions_and_rewards(state: numpy.ndarray, action_space: numpy.ndarray | None = None) → Tuple[numpy.ndarray, numpy.ndarray]
Get possible actions and their associated rewards for a given state.
- Parameters:
state (np.ndarray) – The current state.
action_space (np.ndarray, optional) – The available action space, by default None.
- Returns:
The possible actions and their corresponding rewards.
- Return type:
tuple of np.ndarray
- get_action(state: numpy.ndarray, action_space: numpy.ndarray | None = None, optimality: Literal['min', 'max'] = 'max') → numpy.ndarray
Get the best action for a given state based on optimality.
- Parameters:
state (np.ndarray) – The current state.
action_space (np.ndarray, optional) – The available action space, by default None.
optimality ({"min", "max"}, optional) – Whether to choose the action with the minimum or maximum reward, by default “max”.
- Returns:
The optimal action.
- Return type:
np.ndarray
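The selection rule described for get_action can be pictured as an argmax or argmin over the rewards paired with each candidate action. The sketch below assumes the candidates and rewards are already available as parallel sequences (for example, the pair returned by get_actions_and_rewards); `pick_action` is a hypothetical helper, not the library's implementation.

```python
from typing import Sequence


def pick_action(actions: Sequence, rewards: Sequence[float], optimality: str = "max"):
    """Return the action paired with the largest ("max") or smallest ("min") reward.

    Hypothetical helper illustrating greedy selection; ties go to the first match.
    """
    if optimality not in ("min", "max"):
        raise ValueError("optimality must be 'min' or 'max'")
    if len(actions) == 0 or len(actions) != len(rewards):
        raise ValueError("actions and rewards must be non-empty and equal in length")
    chooser = max if optimality == "max" else min
    best_index = chooser(range(len(rewards)), key=lambda i: rewards[i])
    return actions[best_index]
```

With `optimality="min"` the same routine suits problems where the reward channel encodes a cost.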
- get_probabilistic_action(state: numpy.ndarray, action_space: numpy.ndarray | None = None, offset: float = 0.1, optimality: Literal['min', 'max'] = 'max') → numpy.ndarray
Get a probabilistic action for a given state based on reward distribution.
- Parameters:
state (np.ndarray) – The current state.
action_space (np.ndarray, optional) – The available action space, by default None.
offset (float, optional) – The reward offset to adjust probability distribution, by default 0.1.
optimality ({"min", "max"}, optional) – Whether to prefer minimum or maximum rewards, by default “max”.
- Returns:
The chosen action based on probability.
- Return type:
np.ndarray
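One plausible reading of reward-based sampling is to draw an action with probability proportional to its offset reward. The sketch below makes that assumption explicit: it assumes rewards lie in [0, 1], adds `offset` so that zero-reward actions can still be drawn, and inverts the rewards when `optimality="min"`. `sample_action` and its `rng` parameter are hypothetical; the library's exact weighting may differ.

```python
import random
from typing import Optional, Sequence


def sample_action(
    actions: Sequence,
    rewards: Sequence[float],
    offset: float = 0.1,
    optimality: str = "max",
    rng: Optional[random.Random] = None,
):
    """Draw one action with probability proportional to its (offset) reward.

    Hypothetical helper; assumes rewards are normalized to [0, 1].
    """
    rng = rng or random.Random()
    if optimality == "max":
        weights = [r + offset for r in rewards]
    else:
        # Prefer low rewards: 1 - r flips the ranking for rewards in [0, 1].
        weights = [(1.0 - r) + offset for r in rewards]
    return rng.choices(actions, weights=weights, k=1)[0]
```

A larger offset flattens the distribution toward uniform exploration, while an offset near zero approaches purely reward-proportional sampling.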
- get_rewards(states: numpy.ndarray, actions: numpy.ndarray) → numpy.ndarray
Get the rewards for given states and actions.
- Parameters:
states (np.ndarray) – The state data.
actions (np.ndarray) – The action data.
- Returns:
The rewards corresponding to the given state-action pairs.
- Return type:
np.ndarray
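The methods documented for FALCON suggest a prepare, fit, then query workflow. The sketch below expresses that order against any object exposing the documented method names; `train_and_choose` is a hypothetical helper, and it assumes `query_state` is already in the same prepared representation as the training states.

```python
def train_and_choose(model, states, actions, rewards, query_state, action_space=None):
    """Prepare the data, fit the model, then ask for the highest-reward action.

    `model` is any object exposing prepare_data, fit, and get_action with the
    signatures documented above. Hypothetical helper for illustration only.
    """
    # Normalize the three channels before they reach the ART modules.
    states_p, actions_p, rewards_p = model.prepare_data(states, actions, rewards)
    # Learn the state-action-reward associations from the prepared batch.
    model.fit(states_p, actions_p, rewards_p)
    # Query the fitted model for the action with the maximum predicted reward.
    return model.get_action(query_state, action_space=action_space, optimality="max")
```

For incremental learning, the same pattern would apply with partial_fit in place of fit.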
- class artlib.reinforcement.FALCON.TD_FALCON(state_art: artlib.common.BaseART.BaseART, action_art: artlib.common.BaseART.BaseART, reward_art: artlib.common.BaseART.BaseART, gamma_values: List[float] | numpy.ndarray = np.array([0.33, 0.33, 0.34]), channel_dims: List[int] | numpy.ndarray = list[int], td_alpha: float = 1.0, td_lambda: float = 1.0)
Bases: FALCON
TD-FALCON for Reinforcement Learning.
This module implements TD-FALCON as first described in: [17].
TD-FALCON is based on a FALCON backbone but includes specific functions for temporal-difference learning. Currently, only SARSA is implemented and only FuzzyART base modules are supported.
- td_alpha = 1.0
- td_lambda = 1.0
- abstract fit(states: numpy.ndarray, actions: numpy.ndarray, rewards: numpy.ndarray)
Fit the TD-FALCON model to the data.
- Raises:
NotImplementedError – TD-FALCON can only be trained with partial fit.
- calculate_SARSA(states: numpy.ndarray, actions: numpy.ndarray, rewards: numpy.ndarray, single_sample_reward: float | None = None)
Calculate the SARSA values for reinforcement learning.
- Parameters:
states (np.ndarray) – The state data.
actions (np.ndarray) – The action data.
rewards (np.ndarray) – The reward data.
single_sample_reward (float, optional) – The reward for a single sample, if applicable, by default None.
- Returns:
The state, action, and SARSA-adjusted reward data to be used for fitting.
- Return type:
tuple of np.ndarray
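For orientation, the textbook SARSA update moves the current value estimate toward the observed reward plus the discounted estimate for the next state-action pair. The sketch below applies that rule element-wise; it assumes `td_alpha` acts as the learning rate and `td_lambda` as the discount factor, which this page does not confirm, and `sarsa_targets` is a hypothetical helper rather than the body of calculate_SARSA.

```python
from typing import List, Sequence


def sarsa_targets(
    q_current: Sequence[float],
    q_next: Sequence[float],
    rewards: Sequence[float],
    td_alpha: float = 1.0,
    td_lambda: float = 1.0,
) -> List[float]:
    """Return Q(s, a) + alpha * (r + lambda * Q(s', a') - Q(s, a)) per sample.

    Assumption: td_alpha is the learning rate and td_lambda the discount factor.
    """
    return [
        q + td_alpha * (r + td_lambda * q_n - q)
        for q, q_n, r in zip(q_current, q_next, rewards)
    ]
```

Under these assumptions, the defaults `td_alpha = 1.0` and `td_lambda = 1.0` reduce the target to the reward plus the next value estimate.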
- partial_fit(states: numpy.ndarray, actions: numpy.ndarray, rewards: numpy.ndarray, single_sample_reward: float | None = None)
Partially fit the TD-FALCON model using SARSA.
- Parameters:
states (np.ndarray) – The state data.
actions (np.ndarray) – The action data.
rewards (np.ndarray) – The reward data.
single_sample_reward (float, optional) – The reward for a single sample, if applicable, by default None.
- Returns:
The partially fitted TD-FALCON model.
- Return type:
TD_FALCON