new QLearner(parameters)
- Description:
Implements a Q-learning reinforcement learning mechanism on a symboling space.
QLearner parameters
General parameters
- `state_type`: The type element used to compute the state distance; mandatory, default is "value".
- `action_type`: The type element used to compute the action distance; mandatory, default is "value".
- `initial_state`: The initial current state; mandatory, default is "empty".
- `initial_action`: The default and initial current action; mandatory, default is "empty".
- `initial_time`: The initial discrete time step; default is 0.
- `gamma`: The application-dependent Q-Learner discount factor; default is 0.5.
- `alpha`: The Q-Learner table learning rate; default is 0.1.
- `epsilon`: The threshold below which two numerical values are considered indistinguishable; default is 1e-3.
- `K_max`: The maximal cardinality of the interpolation neighborhood; default is -1 (i.e., no bound).
- `d_max`: The maximal distance of a neighborhood; default is DBL_MAX (i.e., no bound).
- `bounds_count`: The maximal number of bounds to take into account for local neighborhood extrapolation; default is 0 (i.e., all bounds).
Implementation of a Q-Learner problem solver
A Q-Learner problem solver defines:
- `doAction(action)`: The implementation of the environment feedback to a given action.
- `getReward(state, action)`: The calculation of the reward given the last action and the updated state. Depending on the paradigm, the reward can be considered as given by the environment (external reward) or evaluated by the agent (internal reward).

A typical implementation of a derived type of name "MyType" is given by the following construct:
```cpp
#include "QLearner.hpp"

namespace symboling {
  class MyQLearner : public QLearner {
  public:
    MyQLearner() : QLearner("{type: \"MyType\", ...}") {}
    // Considers the current state and action to evaluate the next state and reward accordingly.
    virtual void doAction(JSON action, unsigned int time, wjson::Value& state, double& reward) {
      ../..
    }
  };
}
```
Parameters:
Name | Type | Description |
---|---|---|
parameters | JSON \| String | The QLearner parameters. |
Methods
getParameters() → {JSON}
- Description:
Returns the QLearner parameters.
Returns:
The QLearner parameters.
- Type
- JSON
setParameters(parameters) → {QLearner}
- Description:
Modifies some QLearner parameters.
- All parameters but the `state_type` and `action_type` can be modified.
- The Qtable is preserved when changing the parameters.
Parameters:
Name | Type | Description |
---|---|---|
parameters | JSON | The QLearner parameters. |
Returns:
This object, allowing method calls to be chained.
- Type
- QLearner
getQtable() → {ScalarField}
- Description:
Returns the Qtable.
Returns:
A field in the format proposed by ScalarField::getValues():

```
[
  {
    input: {
      state:  … # The chosen state corresponding to this reward, of type `state_type`.
      action: … # The chosen action for this reward, of type `action_type`.
      # Additional items, not taken into account for the field indexing:
      reward: … # The reward resulting from the action and the related state, of type double.
      time:   … # The time step of this element's last modification, of type unsigned integer,
                # starting at 1 after the 1st action (time 0 is the initial state).
    }
    output: … # The reward resulting from the action and the related state, of type double.
  }
  ../..
]
```
- Type
- ScalarField
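The element format above can be mirrored in C++ for inspection or post-processing. This is a hypothetical sketch assuming string-typed states and actions; `QtableEntry` and `Qtable` are illustrative names, not part of the library's ScalarField API:

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical mirror of one Qtable element: the field is indexed by the
// (state, action) pair, while `reward` and `time` are carried as additional
// items not used for the field indexing.
struct QtableEntry {
  double reward;      // the value also exposed as `output`
  unsigned int time;  // last modification step; 0 denotes the initial state
};

using Qtable = std::map<std::pair<std::string, std::string>, QtableEntry>;
```

Such a map reproduces the indexing rule stated above: two elements with the same (state, action) pair occupy the same slot, whatever their reward and time.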
doAction(result)
- Description:
Defines the interaction with the environment.
- This method is to be overloaded to implement a given reinforcement learning environment.
Parameters:
Name | Type | Description |
---|---|---|
result | Value | The actual result. |
doNext() → {Value}
- Description:
Performs the next action.
- A typical usage could be of the form:

```cpp
// Memorizes the Q-Learner mechanism trajectory.
std::vector<wjson::Value> path;
for(unsigned int t = 0; t < T; t++) {
  path.push_back(qlearning.doNext());
}
```
Returns:
A temporary data structure in the form of a Qtable element, as detailed in getQtable().
- Type
- Value
doPath(max_path, max_reward) → {Value}
- Description:
Performs a sequence of actions with this QLearner.
Parameters:
Name | Type | Attributes | Default | Description |
---|---|---|---|---|
max_path | uint | <optional> | 0 | The maximal path length, default is 0 (no bound); it can also be specified as the "max_path" QLearner parameter. |
max_reward | double | <optional> | DBL_MAX | The maximal reward value, after which the goal is considered as attained. Default is DBL_MAX; it can also be specified as the "max_reward" QLearner parameter. |
Returns:
A temporary data structure in the form of a list of Qtable elements, as detailed in getQtable().
- Type
- Value
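The stopping semantics described above (stop after `max_path` steps, with 0 meaning no bound, or once the reward reaches `max_reward`) can be sketched as follows. `run_path` and `step` are hypothetical stand-ins for illustration, not the library's API:

```cpp
// Sketch of doPath()-style stopping logic: iterate steps until either
// max_path actions have been performed (0 meaning no bound) or the last
// reward reaches max_reward. Note that with max_path == 0 the loop only
// terminates if the reward eventually reaches max_reward.
template <typename Step>
unsigned int run_path(Step step, unsigned int max_path, double max_reward) {
  unsigned int t = 0;
  double reward = 0.0;
  while ((max_path == 0 || t < max_path) && reward < max_reward) {
    reward = step(t);  // stand-in for one doNext()-style action
    t++;
  }
  return t;  // number of actions actually performed
}
```

For example, with a reward growing by 0.5 per step and `max_reward = 1.0`, the loop performs two actions before the goal is considered attained.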
(static) newQLearner(parameters, trajectory)
- Description:
Creates a QLearner module from a Trajectory specification.
Parameters:
Name | Type | Description |
---|---|---|
parameters | JSON \| String | The QLearner and related Trajectory parameters. |
trajectory | Trajectory \| JSON \| String | The trajectory specification. |
Returns:
A QLearner whose objective is to reach the trajectory goal. To be deleted after use.