Variables | |
const real | stepSize = (real)0.1 |
const real | eTraceTimeConstant = (real)0.1 |
const real | TDDiscountTimeConstant = (real)1.0 |
const real | valueFunctionLearningTimeConstant = (real)0.1 |
const real | policyLearningMultiplier = 5 |
const real | modelLearningTimeConstant = (real)0.001 |
const real | activeETraceThreshold = (real)0.01 |
const real | minActionSelectionProb = (real)0.0 |
const AgentArchitecture | agentArchitecture = RL |
const unsigned int | maxNumPlanningSteps = 10 |
const real | planningUncertaintyThreshold = (real)0.1 |
const real activeETraceThreshold = (real)0.01 |
TDConnections with eligibility traces below this threshold are considered inactive, and their eligibility traces are set to zero.
Definition at line 135 of file Defines.h. Referenced by verve::ActiveTDConnectionList::decayETraces(). |
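The thresholding rule described above can be sketched as follows. This is a minimal illustration, not the body of verve::ActiveTDConnectionList::decayETraces(): the function name `decayTracesSketch`, the plain vector of traces, and the multiplicative `decayFactor` are all assumptions made for the example, as is the `real` typedef.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef double real; // assumption: stands in for the library's `real` type

// Hypothetical helper: scales every trace by decayFactor, then zeroes any
// trace that has fallen below threshold. Returns how many traces remain
// active (at or above the threshold).
std::size_t decayTracesSketch(std::vector<real>& traces, real decayFactor,
                              real threshold)
{
    assert(decayFactor >= 0 && decayFactor <= 1);
    std::size_t numActive = 0;
    for (real& trace : traces)
    {
        trace *= decayFactor;
        if (trace < threshold)
        {
            trace = 0; // considered inactive
        }
        else
        {
            ++numActive;
        }
    }
    return numActive;
}
```

With a threshold of 0.01 and a decay factor of 0.5, a trace of 0.015 decays to 0.0075 and is zeroed, while a trace of 1.0 decays to 0.5 and stays active.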
const AgentArchitecture agentArchitecture = RL |
The default Agent architecture to use.
Definition at line 144 of file Defines.h. Referenced by verve::AgentDescriptor::AgentDescriptor(). |
const real eTraceTimeConstant = (real)0.1 |
Determines how fast eligibility traces change.
|
const unsigned int maxNumPlanningSteps = 10 |
The maximum number of steps to take during a planning sequence.
Definition at line 147 of file Defines.h. Referenced by verve::AgentDescriptor::AgentDescriptor(). |
const real minActionSelectionProb = (real)0.0 |
The minimum probability of choosing each action. This should be kept above zero to ensure that exploratory actions are never ruled out entirely. This value multiplied by the number of actions must be less than 1. |
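The constraint above can be checked with a few lines of code. The sketch below is only illustrative: `isValidMinProb` and `applyMinProb` are hypothetical names, and redistributing the remaining probability mass in proportion to the original distribution is one possible way to enforce the floor, not necessarily what the library does.

```cpp
#include <cassert>
#include <vector>

typedef double real; // assumption: stands in for the library's `real` type

// True when giving every action minProb still leaves probability mass
// for the policy to distribute (minProb * numActions < 1).
bool isValidMinProb(real minProb, unsigned int numActions)
{
    return minProb >= 0 && minProb * numActions < 1;
}

// Hypothetical floor: every action receives minProb, and the remaining
// mass (1 - n * minProb) is split according to the input probabilities,
// which are assumed to sum to 1.
std::vector<real> applyMinProb(const std::vector<real>& probs, real minProb)
{
    const unsigned int n = static_cast<unsigned int>(probs.size());
    assert(isValidMinProb(minProb, n));
    const real freeMass = 1 - minProb * n;
    std::vector<real> result(n);
    for (unsigned int i = 0; i < n; ++i)
    {
        result[i] = minProb + freeMass * probs[i];
    }
    return result;
}
```

With the default value of 0.0 the floor has no effect and the input distribution is returned unchanged. With four actions, a value of 0.25 is already invalid, because 0.25 × 4 is not less than 1.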
const real modelLearningTimeConstant = (real)0.001 |
Determines the learning rate for the predictive model. This is how long it takes (in seconds) for errors to be reduced to about 37% (1/e) of their initial values. |
const real planningUncertaintyThreshold = (real)0.1 |
The maximum amount of estimated uncertainty to tolerate before ending a planning sequence.
Definition at line 151 of file Defines.h. Referenced by verve::AgentDescriptor::AgentDescriptor(). |
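The two planning limits (a maximum step count and an uncertainty threshold) can be combined in a loop like the one below. This is a sketch under stated assumptions: `runPlanningSequence` and the `stepUncertainty` callback are hypothetical, and whether the step that exceeds the threshold is itself counted is a choice made here for illustration.

```cpp
#include <cassert>
#include <functional>

typedef double real; // assumption: stands in for the library's `real` type

// Runs at most maxSteps planning steps. stepUncertainty is a hypothetical
// callback that performs one simulated step (given its index) and returns
// the estimated uncertainty of that step. The sequence ends early as soon
// as the uncertainty exceeds uncertaintyThreshold.
// Returns the number of steps completed before stopping.
unsigned int runPlanningSequence(
    unsigned int maxSteps, real uncertaintyThreshold,
    const std::function<real(unsigned int)>& stepUncertainty)
{
    assert(uncertaintyThreshold >= 0);
    unsigned int steps = 0;
    while (steps < maxSteps)
    {
        const real uncertainty = stepUncertainty(steps);
        if (uncertainty > uncertaintyThreshold)
        {
            break; // predictions are too unreliable to continue
        }
        ++steps;
    }
    return steps;
}
```

With the defaults listed on this page (10 steps, threshold 0.1), a model whose uncertainty grows by 0.03 per step would stop after 4 completed steps, while a perfectly confident model would run all 10.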
const real policyLearningMultiplier = 5 |
The policy's learning rate is a combination of this multiplier and the value function's learning rate.
|
const real stepSize = (real)0.1 |
Update step size.
|
const real TDDiscountTimeConstant = (real)1.0 |
Determines how much future rewards are discounted. For example, a discount time constant of 1 means that rewards received 1 second in the future are worth only about 37% (1/e) of what they are worth right now. |
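The 37% figure is consistent with exponential discounting, where a reward delayed by t seconds is weighted by exp(-t / timeConstant). The sketch below assumes that form; `discountFactor` is a hypothetical helper written for this example, not a function of the library.

```cpp
#include <cassert>
#include <cmath>

typedef double real; // assumption: stands in for the library's `real` type

// Fraction of a reward's present value that remains when the reward is
// received `delay` seconds in the future, assuming exponential discounting.
real discountFactor(real delay, real timeConstant)
{
    assert(timeConstant > 0);
    assert(delay >= 0);
    return std::exp(-delay / timeConstant);
}
```

Under this assumption, `discountFactor(1.0, 1.0)` is exp(-1), roughly 0.37, which matches the example in the description. Evaluated over one update of 0.1 seconds with a time constant of 1.0, the per-step factor is exp(-0.1), roughly 0.90.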
const real valueFunctionLearningTimeConstant = (real)0.1 |
Determines the learning rate for TDConnections in the value functions. This is how long it takes (in seconds) for errors to be reduced to about 37% (1/e) of their initial values. |
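One way to turn a learning time constant into a per-update learning rate is shown below. It assumes the error decays exponentially, so that a rate of 1 - exp(-stepSize / timeConstant) per update leaves about 37% of the error after one time constant has elapsed. Both helper names are hypothetical, and the conversion actually used by the library may differ.

```cpp
#include <cassert>
#include <cmath>

typedef double real; // assumption: stands in for the library's `real` type

// Per-update learning rate implied by a time constant, assuming the error
// shrinks by a factor of exp(-stepSize / timeConstant) on every update.
real learningRate(real stepSize, real timeConstant)
{
    assert(stepSize > 0 && timeConstant > 0);
    return 1 - std::exp(-stepSize / timeConstant);
}

// Simulates repeated updates over `elapsed` seconds and returns the error
// that remains, each update removing a fraction `rate` of the error.
real remainingError(real initialError, real elapsed, real stepSize,
                    real timeConstant)
{
    const real rate = learningRate(stepSize, timeConstant);
    const long numUpdates = std::lround(elapsed / stepSize);
    real error = initialError;
    for (long i = 0; i < numUpdates; ++i)
    {
        error *= (1 - rate);
    }
    return error;
}
```

For example, with a step size of 0.1 and a time constant of 0.5, five updates (0.5 seconds) leave exp(-1), roughly 37%, of the initial error. When the time constant is much smaller than the step size, as with a step size of 0.1 and a time constant of 0.001, the rate is effectively 1 and nearly all of the error is removed in a single update.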