artlib.reinforcement.FALCON
===========================

.. py:module:: artlib.reinforcement.FALCON

.. autoapi-nested-parse::

   FALCON :cite:`tan2004falcon`, :cite:`tan2008integrating`.


Classes
-------

.. autoapisummary::

   artlib.reinforcement.FALCON.FALCON
   artlib.reinforcement.FALCON.TD_FALCON


Module Contents
---------------

.. py:class:: FALCON(state_art: artlib.common.BaseART.BaseART, action_art: artlib.common.BaseART.BaseART, reward_art: artlib.common.BaseART.BaseART, gamma_values: Union[List[float], numpy.ndarray] = np.array([0.33, 0.33, 0.34]), channel_dims: Union[List[int], numpy.ndarray] = list[int])

   FALCON for Reinforcement Learning.

   This module implements the reactive FALCON as first described in:
   :cite:`tan2004falcon`.

   .. # Tan, A.-H. (2004).
   .. # FALCON: a fusion architecture for learning, cognition, and navigation.
   .. # In Proc. IEEE International Joint Conference on Neural Networks (IJCNN)
   .. # (pp. 3297–3302). volume 4. doi:10.1109/IJCNN.2004.1381208.

   FALCON is based on a :class:`~artlib.fusion.FusionART.FusionART` backbone but only
   accepts 3 channels: State, Action, and Reward. Specific functions are implemented for
   getting optimal reward and action predictions.


   .. py:attribute:: fusion_art


   .. py:method:: prepare_data(states: numpy.ndarray, actions: numpy.ndarray, rewards: numpy.ndarray) -> Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]

      Prepare data for clustering.

      :param states: The state data.
      :type states: np.ndarray
      :param actions: The action data.
      :type actions: np.ndarray
      :param rewards: The reward data.
      :type rewards: np.ndarray

      :returns: Normalized state, action, and reward data.
      :rtype: tuple of np.ndarray


   .. py:method:: restore_data(states: numpy.ndarray, actions: numpy.ndarray, rewards: numpy.ndarray) -> Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]

      Restore data to its original form before preparation.

      :param states: The state data.
      :type states: np.ndarray
      :param actions: The action data.
      :type actions: np.ndarray
      :param rewards: The reward data.
      :type rewards: np.ndarray

      :returns: Restored state, action, and reward data.
      :rtype: tuple of np.ndarray


   .. py:method:: fit(states: numpy.ndarray, actions: numpy.ndarray, rewards: numpy.ndarray)

      Fit the FALCON model to the data.

      :param states: The state data.
      :type states: np.ndarray
      :param actions: The action data.
      :type actions: np.ndarray
      :param rewards: The reward data.
      :type rewards: np.ndarray

      :returns: The fitted FALCON model.
      :rtype: FALCON


   .. py:method:: partial_fit(states: numpy.ndarray, actions: numpy.ndarray, rewards: numpy.ndarray)

      Partially fit the FALCON model to the data.

      :param states: The state data.
      :type states: np.ndarray
      :param actions: The action data.
      :type actions: np.ndarray
      :param rewards: The reward data.
      :type rewards: np.ndarray

      :returns: The partially fitted FALCON model.
      :rtype: FALCON


   .. py:method:: get_actions_and_rewards(state: numpy.ndarray, action_space: Optional[numpy.ndarray] = None) -> Tuple[numpy.ndarray, numpy.ndarray]

      Get possible actions and their associated rewards for a given state.

      :param state: The current state.
      :type state: np.ndarray
      :param action_space: The available action space, by default None.
      :type action_space: np.ndarray, optional

      :returns: The possible actions and their corresponding rewards.
      :rtype: tuple of np.ndarray


   .. py:method:: get_action(state: numpy.ndarray, action_space: Optional[numpy.ndarray] = None, optimality: Literal['min', 'max'] = 'max') -> numpy.ndarray

      Get the best action for a given state based on optimality.

      :param state: The current state.
      :type state: np.ndarray
      :param action_space: The available action space, by default None.
      :type action_space: np.ndarray, optional
      :param optimality: Whether to choose the action with the minimum or maximum reward,
                         by default "max".
      :type optimality: {"min", "max"}, optional

      :returns: The optimal action.
      :rtype: np.ndarray


   .. py:method:: get_probabilistic_action(state: numpy.ndarray, action_space: Optional[numpy.ndarray] = None, offset: float = 0.1, optimality: Literal['min', 'max'] = 'max') -> numpy.ndarray

      Get a probabilistic action for a given state based on reward distribution.

      :param state: The current state.
      :type state: np.ndarray
      :param action_space: The available action space, by default None.
      :type action_space: np.ndarray, optional
      :param offset: The reward offset to adjust probability distribution, by default 0.1.
      :type offset: float, optional
      :param optimality: Whether to prefer minimum or maximum rewards, by default "max".
      :type optimality: {"min", "max"}, optional

      :returns: The chosen action based on probability.
      :rtype: np.ndarray


   .. py:method:: get_rewards(states: numpy.ndarray, actions: numpy.ndarray) -> numpy.ndarray

      Get the rewards for given states and actions.

      :param states: The state data.
      :type states: np.ndarray
      :param actions: The action data.
      :type actions: np.ndarray

      :returns: The rewards corresponding to the given state-action pairs.
      :rtype: np.ndarray


.. py:class:: TD_FALCON(state_art: artlib.common.BaseART.BaseART, action_art: artlib.common.BaseART.BaseART, reward_art: artlib.common.BaseART.BaseART, gamma_values: Union[List[float], numpy.ndarray] = np.array([0.33, 0.33, 0.34]), channel_dims: Union[List[int], numpy.ndarray] = list[int], td_alpha: float = 1.0, td_lambda: float = 1.0)

   Bases: :py:obj:`FALCON`

   TD-FALCON for Reinforcement Learning.

   This module implements TD-FALCON as first described in:
   :cite:`tan2008integrating`.

   .. # Tan, A.-H., Lu, N., & Xiao, D. (2008).
   .. # Integrating Temporal Difference Methods and Self-Organizing Neural Networks for
   .. # Reinforcement Learning With Delayed Evaluative Feedback.
   .. # IEEE Transactions on Neural Networks, 19, 230–244. doi:10.1109/TNN.2007.905839

   TD-FALCON is based on a :class:`FALCON` backbone but includes specific functions for
   temporal-difference learning. Currently, only SARSA is implemented and only
   :class:`~artlib.elementary.FuzzyART.FuzzyART` base modules are supported.


   .. py:attribute:: td_alpha
      :value: 1.0


   .. py:attribute:: td_lambda
      :value: 1.0


   .. py:method:: fit(states: numpy.ndarray, actions: numpy.ndarray, rewards: numpy.ndarray)
      :abstractmethod:

      Fit the TD-FALCON model to the data.

      :raises NotImplementedError: TD-FALCON can only be trained with ``partial_fit``.


   .. py:method:: calculate_SARSA(states: numpy.ndarray, actions: numpy.ndarray, rewards: numpy.ndarray, single_sample_reward: Optional[float] = None)

      Calculate the SARSA values for reinforcement learning.

      :param states: The state data.
      :type states: np.ndarray
      :param actions: The action data.
      :type actions: np.ndarray
      :param rewards: The reward data.
      :type rewards: np.ndarray
      :param single_sample_reward: The reward for a single sample, if applicable,
                                   by default None.
      :type single_sample_reward: float, optional

      :returns: The state, action, and SARSA-adjusted reward data to be used for fitting.
      :rtype: tuple of np.ndarray


   .. py:method:: partial_fit(states: numpy.ndarray, actions: numpy.ndarray, rewards: numpy.ndarray, single_sample_reward: Optional[float] = None)

      Partially fit the TD-FALCON model using SARSA.

      :param states: The state data.
      :type states: np.ndarray
      :param actions: The action data.
      :type actions: np.ndarray
      :param rewards: The reward data.
      :type rewards: np.ndarray
      :param single_sample_reward: The reward for a single sample, if applicable,
                                   by default None.
      :type single_sample_reward: float, optional

      :returns: The partially fitted TD-FALCON model.
      :rtype: TD_FALCON
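
The SARSA-adjusted rewards described for ``calculate_SARSA`` can be pictured with the
textbook SARSA temporal-difference update. The following is a minimal sketch in plain
NumPy, not artlib's implementation: ``sarsa_targets`` is a hypothetical helper, and it
assumes that ``td_alpha`` plays the role of the learning rate and ``td_lambda`` the role
of the discount factor, which this page does not state explicitly.

.. code-block:: python

   import numpy as np


   def sarsa_targets(q_values, rewards, td_alpha=1.0, td_lambda=1.0):
       """Hypothetical helper: SARSA targets for one trajectory.

       q_values[t] is the current estimate Q(s_t, a_t); rewards[t] is the
       reward observed after taking a_t in s_t.
       """
       q = np.asarray(q_values, dtype=float)
       r = np.asarray(rewards, dtype=float)
       if q.ndim != 1 or q.shape != r.shape:
           raise ValueError("q_values and rewards must be 1-D arrays of equal length")
       # Textbook SARSA update for every step that has a successor:
       # Q(s_t, a_t) + alpha * (r_t + lambda * Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t))
       td_error = r[:-1] + td_lambda * q[1:] - q[:-1]
       return q[:-1] + td_alpha * td_error


   # Three-step trajectory; the last step has no successor, so two targets result.
   q = np.array([0.2, 0.5, 0.9])
   r = np.array([0.0, 0.0, 1.0])
   targets = sarsa_targets(q, r, td_alpha=0.5, td_lambda=0.9)
   print(targets)

Under these assumptions, the default values ``td_alpha=1.0`` and ``td_lambda=1.0`` reduce
each target to the observed reward plus the next step's estimate.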