QLearner

new QLearner(parameters)

Description:
  • Implements a Q-learning reinforcement learning mechanism on a symboling space.

    QLearner parameters

    General parameter

    • state_type: ... The type element used to calculate the state distance, mandatory, default is "value".
    • action_type: ... The type element used to calculate the action distance, mandatory, default is "value".
    • initial_state: ... The initial current state, mandatory, default is "empty".
    • initial_action: ... The default and initial current action, mandatory, default is "empty".
    • initial_time: ... The initial discrete time step, default is 0.
    • gamma: ... The Q-Learner application-dependent discount factor, default is 0.5.
    • alpha: ... The Q-Learner table learning rate, default is 0.1.
    • epsilon: ... The threshold below which two numerical values are considered indistinguishable, default is 1e-3.
    • K_max: ... The maximal cardinality of the interpolation neighborhood, default is -1 (i.e., no bound).
    • d_max: ... The maximal distance of a neighborhood, default is DBL_MAX (i.e., no bound).
    • bounds_count: ... The maximal number of bounds to take into account for local neighborhood extrapolation, default is 0 (i.e., all bounds).
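    • For illustration, a hypothetical complete parameter set could look as follows (the type key names the derived implementation, as in the construct given further below; the numerical values are purely indicative):

    {
      type: "MyType",          # The derived implementation name (see the construct below).
      state_type: "value",     # The type element used to calculate the state distance.
      action_type: "value",    # The type element used to calculate the action distance.
      initial_state: "empty",  # The initial current state.
      initial_action: "empty", # The default and initial current action.
      initial_time: 0,
      gamma: 0.9,              # Indicative discount factor.
      alpha: 0.1,              # Learning rate.
      epsilon: 1e-3,
      K_max: 10,               # Indicative bound on the interpolation neighborhood cardinality.
      d_max: 1.0,              # Indicative bound on the neighborhood distance.
      bounds_count: 0          # 0 means that all bounds are taken into account.
    }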

    Implementation of a Q-Learner problem solver

A Q-Learner problem solver implements:

    • doAction(action): The implementation of the environment feedback to a given action.

    • getReward(state, action): The calculation of the reward given the last action and updated state.

      • Depending on the paradigm, the reward can be considered as given by the environment (external reward), or evaluated by the agent (internal reward).
    • A typical implementation of a derived type named "MyType" is given by the following construct:

    #include "QLearner.hpp"
    namespace symboling {
      class MyQLearner : public QLearner {
      public:
        MyQLearner() : QLearner("{type: \"MyType\", ...}") {}
    
        // Considers the current state and action to evaluate the next state and reward accordingly.
        virtual void doAction(JSON action, unsigned int time, wjson::Value& state, double& reward) {
         ../..
        }
      };
    }
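
    • As a complement, here is a minimal hedged sketch of such an override, for a hypothetical toy environment in which the chosen action directly becomes the next state (this assumes that a wjson::Value can be assigned from the JSON action; the reward scheme is arbitrary):

    #include "QLearner.hpp"
    namespace symboling {
      class ToyQLearner : public QLearner {
      public:
        ToyQLearner() : QLearner("{type: \"ToyType\", ...}") {}

        // Toy environment: the next state is simply the chosen action (sketch assumption),
        // and the reward is a constant step cost until an arbitrary goal time is reached.
        virtual void doAction(JSON action, unsigned int time, wjson::Value& state, double& reward) {
          state = action;                     // The chosen action becomes the next state.
          reward = (time < 10) ? -1.0 : 0.0;  // Purely illustrative reward values.
        }
      };
    }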
    
Parameters:
Name Type Description
parameters JSON | String

The QLearner parameters.

Methods

getParameters() → {JSON}

Description:
  • Returns the QLearner parameters.

Returns:

The QLearner parameters.

Type
JSON

setParameters(parameters) → {QLearner}

Description:
  • Modifies some QLearner parameters.

    • All parameters but the state_type and action_type can be modified.
    • The Qtable is preserved when changing the parameters.
Parameters:
Name Type Description
parameters JSON

The QLearner parameters.

Returns:

This object, allowing method calls to be chained.

Type
QLearner
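
For instance, a hedged chaining sketch, assuming that new_parameters is a JSON value holding only modifiable keys (e.g., gamma or alpha) and that qlearner is an existing instance:

    // Hypothetical chained call: updates some parameters, then performs the next action.
    qlearner.setParameters(new_parameters).doNext();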

getQtable() → {ScalarField}

Description:
  • Returns the Qtable.

Returns:

A field in the format proposed by ScalarField::getValues():

[
  {
    input: {
      state: …      # The chosen state corresponding to this reward, of type `state_type`.
      action: …     # The chosen action for this reward, of type `action_type`.
      # Additional items, not taken into account for the field indexing:
      reward: …     # The reward resulting from the action and the related state, of type double.
      time: …       # The time step of the last modification of this element, of type unsigned integer,
                    # starting at 1 after the 1st action (time 0 is the initial state).
    },
    output: …       # The reward resulting from the action and the related state, of type double.
  },
  ../..
]
Type
ScalarField

doAction(result)

Description:
  • Defines the interaction with the environment.

    • This method is to be overloaded to implement a given reinforcement learning environment.
Parameters:
Name Type Description
result Value

The actual result with:

  • Input:
    • action The action to be output in the environment.
    • time The current time at the beginning of the action.
  • Output"
    • state The updated next state value resulting of the action.
    • reward The updated obtained reward resulting of the action, given the present state.
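
Gathered in one structure, the result value thus has the following layout, mirroring the input and output fields listed above:

    {
      action: …   # Input: the action to be output in the environment.
      time: …     # Input: the current time at the beginning of the action.
      state: …    # Output: the next state value resulting from the action.
      reward: …   # Output: the reward resulting from the action, given the present state.
    }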

doNext() → {Value}

Description:
  • Performs the next action.

    • A typical usage could be of the form:
     // Memorizes the Q-Learner mechanism trajectory.
     std::vector<wjson::Value> path;
     for(unsigned int t = 0; t < T; t++) {
       path.push_back(qlearning.doNext());
     }
    
Returns:

A temporary data structure, in the form of a Qtable element as detailed in getQtable().

Type
Value

doPath(max_path, max_reward) → {Value}

Description:
  • Performs a sequence of actions with this QLearner.

Parameters:
Name Type Attributes Default Description
max_path uint <optional>
0

The maximal path length, default is 0 (no bound); it can also be specified as the "max_path" QLearner parameter.

max_reward double <optional>
DBL_MAX

The maximal reward value, after which the goal is considered attained. Default is DBL_MAX; it can also be specified as the "max_reward" QLearner parameter.

Returns:

A temporary data structure, in the form of a list of Qtable elements, as detailed in getQtable().

Type
Value
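
For instance, a hedged usage sketch (the qlearner instance and the numerical bounds are illustrative; the returned Value is assumed to be a wjson::Value, as in the doNext() usage above):

    // Runs at most 100 actions, stopping earlier if a reward of 10.0 is reached.
    wjson::Value path = qlearner.doPath(100, 10.0);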

(static) newQLearner(parameters, trajectory)

Description:
  • Creates a QLearner module from a Trajectory specification.

Parameters:
Name Type Description
parameters JSON | String

The QLearner and related Trajectory parameters.

  • The state corresponds to the (fully observable) trajectory current position.
  • The action corresponds to the (fully controllable) trajectory next position.
  • The reward corresponds to the complete trajectory subharmonic potential.
  • The QLearner action_type and state_type are equal to the trajectory data type, and thus are not to be specified.
trajectory Trajectory | JSON | String

The trajectory specification.

Returns:

A QLearner with the objective of reaching a trajectory goal. It is to be deleted after use.
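
For instance, a hedged construction sketch (the parameter and trajectory strings are mere placeholders; the QLearner:: scoping and the heap allocation are assumptions drawn from the "(static)" qualifier and from the deletion requirement above):

    // Builds a trajectory-driven QLearner from placeholder specifications, uses it, then deletes it.
    QLearner* qlearner = QLearner::newQLearner("{ ... }", "{ ... }");
    wjson::Value step = qlearner->doNext();
    delete qlearner;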