QLearner

new QLearner(parameters)

Description:
  • Implements a Q-learning reinforcement learning mechanism on a symboling space.

    QLearner parameters

    General parameter

    • state_type: ... The type element used to calculate the state distance, mandatory, default is "value".
    • action_type: ... The type element used to calculate the action distance, mandatory, default is "value".
    • initial_state: ... The initial current state, mandatory, default is "empty".
    • initial_action: ... The default and initial current action, mandatory, default is "empty".
    • initial_time: ... The initial discrete time step, default is 0.
    • gamma: ... The Q-Learner application-dependent discount factor, default is 0.5.
    • alpha: ... The Q-Learner table learning rate, default is 0.1.
    • epsilon: ... The threshold below which two numerical values are considered indistinguishable, default is 1e-3.
    • K_max: ... The maximal cardinality of the interpolation neighborhood, default is -1 (i.e., no bound).
    • d_max: ... The maximal distance of a neighborhood, default is DBL_MAX (i.e., no bound).
    • bounds_count: ... The maximal number of bounds to take into account for local neighborhood extrapolation, default is 0 (i.e., all bounds).
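    • For illustration, a hypothetical complete parameter set could look as follows (the type key names the derived implementation, as in the construct given further below; the numerical values are purely indicative):

    {
      type: "MyType",          # The derived implementation name (see the construct below).
      state_type: "value",     # The type element used to calculate the state distance.
      action_type: "value",    # The type element used to calculate the action distance.
      initial_state: "empty",  # The initial current state.
      initial_action: "empty", # The default and initial current action.
      initial_time: 0,
      gamma: 0.9,              # Indicative discount factor.
      alpha: 0.1,              # Learning rate.
      epsilon: 1e-3,
      K_max: 10,               # Indicative bound on the interpolation neighborhood cardinality.
      d_max: 1.0,              # Indicative bound on the neighborhood distance.
      bounds_count: 0          # 0 means that all bounds are taken into account.
    }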

    Implementation of a Q-Learner problem solver

A Q-Learner problem solver implements:

    • doAction(action): The implementation of the environment feedback to a given action.

    • getReward(state, action): The calculation of the reward given the last action and updated state.

      • Depending on the paradigm, the reward can be considered as given by the environment (external reward), or evaluated by the agent (internal reward).
    • A typical implementation of a derived type named "MyType" is given by the following construct:

    #include "QLearner.hpp"
    namespace symboling {
      class MyQLearner : public QLearner {
      public:
        MyQLearner() : QLearner("{type: \"MyType\", ...}") {}
    
        // Considers the current state and action to evaluate the next state and reward accordingly.
        virtual void doAction(JSON action, unsigned int time, wjson::Value& state, double& reward) {
         ../..
        }
      };
    }
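
    • As a complement, here is a minimal hedged sketch of such an override, for a hypothetical toy environment in which the chosen action directly becomes the next state (this assumes that a wjson::Value can be assigned from the JSON action; the reward scheme is arbitrary):

    #include "QLearner.hpp"
    namespace symboling {
      class ToyQLearner : public QLearner {
      public:
        ToyQLearner() : QLearner("{type: \"ToyType\", ...}") {}

        // Toy environment: the next state is simply the chosen action (sketch assumption),
        // and the reward is a constant step cost until an arbitrary goal time is reached.
        virtual void doAction(JSON action, unsigned int time, wjson::Value& state, double& reward) {
          state = action;                     // The chosen action becomes the next state.
          reward = (time < 10) ? -1.0 : 0.0;  // Purely illustrative reward values.
        }
      };
    }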
    
Parameters:
Name Type Description
parameters JSON | String

The QLearner parameters.

Methods

getParameters() → {JSON}

Description:
  • Returns the QLearner parameters.

Returns:

The QLearner parameters.

Type
JSON

setParameters(parameters) → {QLearner}

Description:
  • Modifies some QLearner parameters.

    • All parameters but the state_type and action_type can be modified.
    • The Qtable is preserved when changing the parameters.
Parameters:
Name Type Description
parameters JSON

The QLearner parameters.

Returns:

This object, allowing method calls to be chained.

Type
QLearner
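
For instance, a hedged chaining sketch, assuming that new_parameters is a JSON value holding only modifiable keys (e.g., gamma or alpha) and that qlearner is an existing instance:

    // Hypothetical chained call: updates some parameters, then performs the next action.
    qlearner.setParameters(new_parameters).doNext();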

getQtable() → {ScalarField}

Description:
  • Returns the Qtable.

Returns:

A field in the format proposed by ScalarField::getValues():

[
  {
    input: {
      state: …      # The chosen state corresponding to this reward, of type `state_type`.
      action: …     # The chosen action for this reward, of type `action_type`.
      # Additional items, not taken into account for the field indexing:
      reward: …     # The reward resulting from the action and the related state, of type double.
      time: …       # The time step of the last modification of this element, of type unsigned integer,
                    # starting at 1 after the 1st action (time 0 is the initial state).
    },
    output: …       # The reward resulting from the action and the related state, of type double.
  },
  ../..
]
Type
ScalarField

doAction(result)

Description:
  • Defines the interaction with the environment.

    • This method is to be overloaded to implement a given reinforcement learning environment.
Parameters:
Name Type Description
result Value

The actual result with:

  • Input:
    • action The action to be output in the environment.
    • time The current time at the beginning of the action.
  • Output"
    • state The updated next state value resulting of the action.
    • reward The updated obtained reward resulting of the action, given the present state.
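
Gathered in one structure, the result value thus has the following layout, mirroring the input and output fields listed above:

    {
      action: …   # Input: the action to be output in the environment.
      time: …     # Input: the current time at the beginning of the action.
      state: …    # Output: the next state value resulting from the action.
      reward: …   # Output: the reward resulting from the action, given the present state.
    }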

doNext() → {Value}

Description:
  • Performs the next action.

    • A typical usage could be of the form:
     // Memorizes the Q-Learner mechanism trajectory.
     std::vector<wjson::Value> path;
     for(unsigned int t = 0; t < T; t++) {
       path.push_back(qlearning.doNext());
     }
    
Returns:

A temporary data structure, in the form of a Qtable element as detailed in getQtable().

Type
Value

doPath(max_path, max_reward) → {Value}

Description:
  • Performs a sequence of actions with this QLearner.

Parameters:
Name Type Attributes Default Description
max_path uint <optional>
0

The maximal path length, default is 0 (no bound); it can also be specified as the "max_path" QLearner parameter.

max_reward double <optional>
DBL_MAX

The maximal reward value, after which the goal is considered attained. Default is DBL_MAX; it can also be specified as the "max_reward" QLearner parameter.

Returns:

A temporary data structure, in the form of a list of Qtable elements, as detailed in getQtable().

Type
Value
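
For instance, a hedged usage sketch (the qlearner instance and the numerical bounds are illustrative; the returned Value is assumed to be a wjson::Value, as in the doNext() usage above):

    // Runs at most 100 actions, stopping earlier if a reward of 10.0 is reached.
    wjson::Value path = qlearner.doPath(100, 10.0);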

(static) newQLearner(parameters, trajectory)

Description:
  • Creates a QLearner module from a Trajectory specification.

Parameters:
Name Type Description
parameters JSON | String

The QLearner and related Trajectory parameters.

  • The state corresponds to the (fully observable) trajectory current position.
  • The action corresponds to the (fully controllable) trajectory next position.
  • The reward corresponds to the complete trajectory subharmonic potential.
  • The QLearner action_type and state_type are equal to the trajectory data type, and thus are not to be specified.
trajectory Trajectory | JSON | String

The trajectory specification.

Returns:

A QLearner with the objective of reaching a trajectory goal. It is to be deleted after use.
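
For instance, a hedged construction sketch (the parameter and trajectory strings are mere placeholders; the QLearner:: scoping and the heap allocation are assumptions drawn from the "(static)" qualifier and from the deletion requirement above):

    // Builds a trajectory-driven QLearner from placeholder specifications, uses it, then deletes it.
    QLearner* qlearner = QLearner::newQLearner("{ ... }", "{ ... }");
    wjson::Value step = qlearner->doNext();
    delete qlearner;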