artlib.reinforcement.FALCON
===========================

.. py:module:: artlib.reinforcement.FALCON

.. autoapi-nested-parse::

   FALCON :cite:`tan2004falcon`, :cite:`tan2008integrating`.


Classes
-------

.. autoapisummary::

   artlib.reinforcement.FALCON.FALCON
   artlib.reinforcement.FALCON.TD_FALCON


Module Contents
---------------

.. py:class:: FALCON(state_art: artlib.common.BaseART.BaseART, action_art: artlib.common.BaseART.BaseART, reward_art: artlib.common.BaseART.BaseART, gamma_values: Union[List[float], numpy.ndarray] = np.array([0.33, 0.33, 0.34]), channel_dims: Union[List[int], numpy.ndarray] = list[int])

   FALCON for Reinforcement Learning.

   This class implements the reactive FALCON model first described in
   :cite:`tan2004falcon`.

   .. # Tan, A.-H. (2004).
   .. # FALCON: a fusion architecture for learning, cognition, and navigation.
   .. # In Proc. IEEE International Joint Conference on Neural Networks (IJCNN)
   .. # (Vol. 4, pp. 3297–3302). doi:10.1109/IJCNN.2004.1381208.

   FALCON is built on a :class:`~artlib.fusion.FusionART.FusionART` backbone restricted
   to exactly three channels: state, action, and reward. On top of this backbone, it
   provides methods for predicting rewards and selecting optimal actions.
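
   The three-channel fusion can be illustrated with a short NumPy sketch. It assumes
   that each channel scores a category with the fuzzy ART choice function (the summed
   element-wise minimum of input and weight, divided by ``alpha`` plus the summed
   weight) and that the channel scores are combined as a sum weighted by
   ``gamma_values``. The helper ``fused_activation`` is hypothetical and not part of
   artlib.

   .. code-block:: python

      import numpy as np


      def fused_activation(channels, weights, gamma_values, alpha=1e-10):
          """Gamma-weighted sum of per-channel fuzzy ART choice values.

          Hypothetical helper: ``channels`` and ``weights`` hold one 1-D array per
          channel (state, action, reward) for a single sample and a single category.
          """
          total = 0.0
          for x, w, gamma in zip(channels, weights, gamma_values):
              # fuzzy ART choice |x AND w| / (alpha + |w|), with AND = element-wise min
              total += gamma * np.minimum(x, w).sum() / (alpha + w.sum())
          return total


      # complement-coded sample: state (2 features), action (1), reward (1)
      x = [np.array([0.2, 0.6, 0.8, 0.4]), np.array([1.0, 0.0]), np.array([0.5, 0.5])]
      # weights of one category, laid out the same way
      w = [np.array([0.3, 0.5, 0.7, 0.4]), np.array([1.0, 0.0]), np.array([0.4, 0.5])]

      t = fused_activation(x, w, gamma_values=[0.33, 0.33, 0.34])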


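   .. py:attribute:: fusion_art

      The :class:`~artlib.fusion.FusionART.FusionART` instance that combines the state,
      action, and reward channels.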


   .. py:method:: prepare_data(states: numpy.ndarray, actions: numpy.ndarray, rewards: numpy.ndarray) -> Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]

      Prepare data for clustering.

      :param states: The state data.
      :type states: np.ndarray
      :param actions: The action data.
      :type actions: np.ndarray
      :param rewards: The reward data.
      :type rewards: np.ndarray

      :returns: Normalized state, action, and reward data.
      :rtype: tuple of np.ndarray
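
      When a channel is handled by a :class:`~artlib.elementary.FuzzyART.FuzzyART`
      module, preparation usually amounts to min-max scaling into [0, 1] followed by
      complement coding, which doubles the number of columns. The sketch below assumes
      that behavior and known per-channel bounds; ``prepare_channel`` is a hypothetical
      helper, not the artlib implementation.

      .. code-block:: python

         import numpy as np


         def prepare_channel(x, lower, upper):
             """Min-max scale ``x`` into [0, 1], then complement-code it.

             Hypothetical helper; ``lower`` and ``upper`` are assumed data bounds.
             """
             scaled = (np.asarray(x, dtype=float) - lower) / (upper - lower)
             # complement coding appends 1 - value for every feature
             return np.hstack([scaled, 1.0 - scaled])


         # one reward column with raw values in [-1, 1]
         rewards = np.array([[-1.0], [0.0], [1.0]])
         prepared = prepare_channel(rewards, lower=-1.0, upper=1.0)
         # prepared -> [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]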


   .. py:method:: restore_data(states: numpy.ndarray, actions: numpy.ndarray, rewards: numpy.ndarray) -> Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]

      Restore prepared data to its original, unprepared form.

      :param states: The state data.
      :type states: np.ndarray
      :param actions: The action data.
      :type actions: np.ndarray
      :param rewards: The reward data.
      :type rewards: np.ndarray

      :returns: Restored state, action, and reward data.
      :rtype: tuple of np.ndarray
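
      If each channel is prepared FuzzyART-style (min-max scaling followed by complement
      coding), restoring it means keeping the first half of the columns and undoing the
      scaling. ``restore_channel`` below is a hypothetical helper written under that
      assumption, not the artlib implementation.

      .. code-block:: python

         import numpy as np


         def restore_channel(prepared, lower, upper):
             """Undo complement coding and min-max scaling (hypothetical helper)."""
             prepared = np.asarray(prepared, dtype=float)
             n_features = prepared.shape[1] // 2
             # the first half holds the scaled values; the second half is 1 - value
             scaled = prepared[:, :n_features]
             return scaled * (upper - lower) + lower


         prepared_rewards = np.array([[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]])
         restored = restore_channel(prepared_rewards, lower=-1.0, upper=1.0)
         # restored -> [[-1.0], [0.0], [1.0]]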


   .. py:method:: fit(states: numpy.ndarray, actions: numpy.ndarray, rewards: numpy.ndarray)

      Fit the FALCON model to the data.

      :param states: The state data.
      :type states: np.ndarray
      :param actions: The action data.
      :type actions: np.ndarray
      :param rewards: The reward data.
      :type rewards: np.ndarray

      :returns: The fitted FALCON model.
      :rtype: FALCON


   .. py:method:: partial_fit(states: numpy.ndarray, actions: numpy.ndarray, rewards: numpy.ndarray)

      Partially fit the FALCON model to the data.

      :param states: The state data.
      :type states: np.ndarray
      :param actions: The action data.
      :type actions: np.ndarray
      :param rewards: The reward data.
      :type rewards: np.ndarray

      :returns: The partially fitted FALCON model.
      :rtype: FALCON


   .. py:method:: get_actions_and_rewards(state: numpy.ndarray, action_space: Optional[numpy.ndarray] = None) -> Tuple[numpy.ndarray, numpy.ndarray]

      Get the candidate actions and their predicted rewards for a given state.

      :param state: The current state.
      :type state: np.ndarray
      :param action_space: The available action space, by default None.
      :type action_space: np.ndarray, optional

      :returns: The candidate actions and their predicted rewards.
      :rtype: tuple of np.ndarray
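
      Conceptually, this pairs the single state with every candidate action and predicts
      a reward for each pair. The sketch below shows that pairing step; ``score_actions``
      and the ``predict_reward`` callable it receives are hypothetical stand-ins for the
      model's internal reward prediction.

      .. code-block:: python

         import numpy as np


         def score_actions(state, action_space, predict_reward):
             """Pair one state with each candidate action and predict their rewards.

             Hypothetical helper: ``predict_reward`` maps an (n, d_state + d_action)
             array of state-action pairs to an (n, 1) array of rewards.
             """
             state = np.asarray(state, dtype=float).reshape(1, -1)
             action_space = np.asarray(action_space, dtype=float)
             # repeat the state once per action, then append the actions column-wise
             pairs = np.hstack([np.repeat(state, len(action_space), axis=0), action_space])
             return action_space, predict_reward(pairs)


         def toy_predict(pairs):
             # toy model: reward is higher when the action is close to the first state feature
             return 1.0 - np.abs(pairs[:, [0]] - pairs[:, [-1]])


         actions, rewards = score_actions([0.7, 0.1], [[0.0], [0.5], [1.0]], toy_predict)
         # rewards -> approx. [[0.3], [0.8], [0.7]]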


   .. py:method:: get_action(state: numpy.ndarray, action_space: Optional[numpy.ndarray] = None, optimality: Literal['min', 'max'] = 'max') -> numpy.ndarray

      Get the action with the best predicted reward for a given state.

      :param state: The current state.
      :type state: np.ndarray
      :param action_space: The available action space, by default None.
      :type action_space: np.ndarray, optional
      :param optimality: Whether to choose the action with the minimum or maximum reward,
                         by default "max".
      :type optimality: {"min", "max"}, optional

      :returns: The optimal action.
      :rtype: np.ndarray
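
      Given predicted rewards for each candidate action, picking the optimal action is an
      arg-max (or arg-min) over the reward column. The stand-alone sketch below shows
      only that selection step, with rewards supplied by hand; ``pick_action`` is a
      hypothetical helper.

      .. code-block:: python

         from typing import Literal

         import numpy as np


         def pick_action(action_space, rewards, optimality: Literal["min", "max"] = "max"):
             """Return the row of ``action_space`` with the best reward (hypothetical helper)."""
             if optimality not in ("min", "max"):
                 raise ValueError("optimality must be 'min' or 'max'")
             rewards = np.asarray(rewards, dtype=float).ravel()
             idx = np.argmax(rewards) if optimality == "max" else np.argmin(rewards)
             return np.asarray(action_space)[idx]


         action_space = np.array([[0.0], [0.5], [1.0]])
         predicted = np.array([[0.3], [0.8], [0.7]])
         best = pick_action(action_space, predicted)          # array([0.5])
         worst = pick_action(action_space, predicted, "min")  # array([0.])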


   .. py:method:: get_probabilistic_action(state: numpy.ndarray, action_space: Optional[numpy.ndarray] = None, offset: float = 0.1, optimality: Literal['min', 'max'] = 'max') -> numpy.ndarray

      Sample an action for a given state from a probability distribution derived from
      the predicted rewards.

      :param state: The current state.
      :type state: np.ndarray
      :param action_space: The available action space, by default None.
      :type action_space: np.ndarray, optional
      :param offset: Offset used to adjust the reward-based probability distribution, by default 0.1.
      :type offset: float, optional
      :param optimality: Whether to prefer minimum or maximum rewards, by default "max".
      :type optimality: {"min", "max"}, optional

      :returns: The sampled action.
      :rtype: np.ndarray
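
      One simple way to turn predicted rewards into an action distribution is sketched
      below. It is an illustrative assumption, not the artlib formula: rewards are
      assumed to lie in [0, 1], they are flipped to ``1 - reward`` for ``"min"``,
      ``offset`` is added so that every action keeps a nonzero probability, and the
      scores are normalized before sampling. ``sample_action`` is a hypothetical helper.

      .. code-block:: python

         import numpy as np


         def sample_action(action_space, rewards, offset=0.1, optimality="max", rng=None):
             """Sample an action with probability proportional to its score (hypothetical)."""
             rng = np.random.default_rng() if rng is None else rng
             scores = np.asarray(rewards, dtype=float).ravel()
             if optimality == "min":
                 scores = 1.0 - scores  # low rewards become high scores
             scores = scores + offset  # keep every action reachable
             probs = scores / scores.sum()
             idx = rng.choice(len(probs), p=probs)
             return np.asarray(action_space)[idx], probs


         action_space = np.array([[0.0], [0.5], [1.0]])
         predicted = np.array([[0.3], [0.8], [0.7]])
         action, probs = sample_action(action_space, predicted, rng=np.random.default_rng(0))
         # probs is proportional to [0.4, 0.9, 0.8]

      Compared with always taking the arg-max, sampling keeps some exploration in the
      policy; in this sketch, a larger ``offset`` flattens the distribution further.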


   .. py:method:: get_rewards(states: numpy.ndarray, actions: numpy.ndarray) -> numpy.ndarray

      Predict the rewards for the given state-action pairs.

      :param states: The state data.
      :type states: np.ndarray
      :param actions: The action data.
      :type actions: np.ndarray

      :returns: The rewards corresponding to the given state-action pairs.
      :rtype: np.ndarray
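
      As a simplified mental model, reward prediction can be viewed as matching each
      state-action pair against the learned categories on the state and action channels
      only, then reading the reward encoded by the winning category. The sketch below
      assumes complement-coded channels, an unweighted fuzzy ART choice function, and
      that the reward is the center of the winner's complement-coded reward weights;
      ``predict_rewards`` and its weight layout are hypothetical, not artlib internals.

      .. code-block:: python

         import numpy as np


         def predict_rewards(pairs, w_state_action, w_reward, alpha=1e-10):
             """Predict one reward per state-action pair (hypothetical helper).

             ``w_state_action``: (n_categories, d) complement-coded state+action weights.
             ``w_reward``: (n_categories, 2 * d_reward) complement-coded reward weights.
             """
             pairs = np.asarray(pairs, dtype=float)
             w_sa = np.asarray(w_state_action, dtype=float)
             w_r = np.asarray(w_reward, dtype=float)
             # fuzzy ART choice for every (sample, category) combination
             overlap = np.minimum(pairs[:, None, :], w_sa[None, :, :]).sum(axis=2)
             winners = np.argmax(overlap / (alpha + w_sa.sum(axis=1)), axis=1)
             # center of a complement-coded weight [u, v] is (u + (1 - v)) / 2
             n = w_r.shape[1] // 2
             return (w_r[winners, :n] + 1.0 - w_r[winners, n:]) / 2.0


         # two categories with layout [state, 1 - state, action, 1 - action]
         w_sa = [[0.2, 0.8, 0.0, 1.0], [0.8, 0.2, 1.0, 0.0]]
         w_r = [[0.9, 0.1], [0.1, 0.9]]  # rewards 0.9 and 0.1
         queries = [[0.2, 0.8, 0.0, 1.0], [0.8, 0.2, 1.0, 0.0], [0.3, 0.7, 0.0, 1.0]]
         rewards = predict_rewards(queries, w_sa, w_r)  # approx. [[0.9], [0.1], [0.9]]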


.. py:class:: TD_FALCON(state_art: artlib.common.BaseART.BaseART, action_art: artlib.common.BaseART.BaseART, reward_art: artlib.common.BaseART.BaseART, gamma_values: Union[List[float], numpy.ndarray] = np.array([0.33, 0.33, 0.34]), channel_dims: Union[List[int], numpy.ndarray] = list[int], td_alpha: float = 1.0, td_lambda: float = 1.0)

   Bases: :py:obj:`FALCON`


   TD-FALCON for Reinforcement Learning.

   This class implements TD-FALCON as first described in
   :cite:`tan2008integrating`.

   .. # Tan, A.-H., Lu, N., & Xiao, D. (2008).
   .. # Integrating Temporal Difference Methods and Self-Organizing Neural Networks for
   .. # Reinforcement Learning With Delayed Evaluative Feedback.
   .. # IEEE Transactions on Neural Networks, 19, 230–244. doi:10.1109/TNN.2007.905839

   TD-FALCON extends the :class:`FALCON` backbone with methods for temporal-difference
   learning. Currently, only SARSA is implemented, and only
   :class:`~artlib.elementary.FuzzyART.FuzzyART` base modules are supported.
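
   For reference, the standard SARSA update is shown below, where :math:`r_t` is the
   reward received after taking action :math:`a_t` in state :math:`s_t`. Mapping
   ``td_alpha`` to the learning rate :math:`\alpha` and ``td_lambda`` to the discount
   factor :math:`\gamma` is an assumption made for this illustration. The original
   TD-FALCON formulation in :cite:`tan2008integrating` also keeps the learned values
   within :math:`[0, 1]`.

   .. math::

      Q(s_t, a_t) \leftarrow Q(s_t, a_t)
      + \alpha \left[ r_t + \gamma \, Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right]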


   .. py:attribute:: td_alpha
      :value: 1.0


   .. py:attribute:: td_lambda
      :value: 1.0


   .. py:method:: fit(states: numpy.ndarray, actions: numpy.ndarray, rewards: numpy.ndarray)
      :abstractmethod:


      Fit the TD-FALCON model to the data.

      :raises NotImplementedError: TD-FALCON can only be trained incrementally with :meth:`partial_fit`.


   .. py:method:: calculate_SARSA(states: numpy.ndarray, actions: numpy.ndarray, rewards: numpy.ndarray, single_sample_reward: Optional[float] = None)

      Calculate the SARSA values for reinforcement learning.

      :param states: The state data.
      :type states: np.ndarray
      :param actions: The action data.
      :type actions: np.ndarray
      :param rewards: The reward data.
      :type rewards: np.ndarray
      :param single_sample_reward: The reward to use when only a single sample is passed, by default None.
      :type single_sample_reward: float, optional

      :returns: The state, action, and SARSA-adjusted reward data to be used for fitting.
      :rtype: tuple of np.ndarray
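
      A minimal NumPy sketch of SARSA targets for one trajectory follows. It assumes
      ``td_alpha`` is the learning rate, ``td_lambda`` the discount factor, current Q
      estimates are available for every step, and targets are clipped to [0, 1]. The last
      step is dropped because no next state-action pair follows it. ``sarsa_targets`` is a
      hypothetical helper, not the artlib implementation.

      .. code-block:: python

         import numpy as np


         def sarsa_targets(q, rewards, td_alpha=1.0, td_lambda=1.0):
             """SARSA-updated Q values for steps 0..T-2 of one trajectory (hypothetical).

             ``q[t]`` is the current estimate of Q(s_t, a_t) and ``rewards[t]`` is the
             reward observed after taking a_t in s_t; both have length T.
             """
             q = np.asarray(q, dtype=float)
             rewards = np.asarray(rewards, dtype=float)
             # the TD error uses the next step's Q value, so the final step has no target
             td_error = rewards[:-1] + td_lambda * q[1:] - q[:-1]
             updated = q[:-1] + td_alpha * td_error
             # keep targets inside the [0, 1] range of the reward channel
             return np.clip(updated, 0.0, 1.0)


         targets = sarsa_targets([0.2, 0.4, 0.6], [0.1, 0.0, 1.0], td_alpha=0.5, td_lambda=0.9)
         # targets -> approx. [0.33, 0.47]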


   .. py:method:: partial_fit(states: numpy.ndarray, actions: numpy.ndarray, rewards: numpy.ndarray, single_sample_reward: Optional[float] = None)

      Partially fit the TD-FALCON model using SARSA.

      :param states: The state data.
      :type states: np.ndarray
      :param actions: The action data.
      :type actions: np.ndarray
      :param rewards: The reward data.
      :type rewards: np.ndarray
      :param single_sample_reward: The reward to use when only a single sample is passed, by default None.
      :type single_sample_reward: float, optional

      :returns: The partially fitted TD-FALCON model.
      :rtype: TD_FALCON