#include <Agent.h>
Public Member Functions

VERVE_DECL Agent (const AgentDescriptor &desc)
virtual VERVE_DECL ~Agent ()
virtual VERVE_DECL void VERVE_CALL destroy ()
virtual VERVE_DECL void VERVE_CALL resetShortTermMemory ()
virtual VERVE_DECL unsigned int VERVE_CALL update (real reinforcement, const Observation &obs, real dt)
virtual VERVE_DECL unsigned int VERVE_CALL getNumDiscreteSensors () const
virtual VERVE_DECL unsigned int VERVE_CALL getNumContinuousSensors () const
virtual VERVE_DECL void VERVE_CALL setETraceTimeConstant (real timeConstant)
virtual VERVE_DECL void VERVE_CALL setTDDiscountTimeConstant (real timeConstant)
virtual VERVE_DECL void VERVE_CALL setTDLearningRate (real valueFunctionTimeConstant, real policyLearningMultiplier)
virtual VERVE_DECL void VERVE_CALL setModelLearningRate (real timeConstant)
virtual VERVE_DECL void VERVE_CALL setLearningEnabled (bool enabled)
virtual VERVE_DECL long unsigned int VERVE_CALL getAge () const
virtual VERVE_DECL std::string VERVE_CALL getAgeString () const
virtual VERVE_DECL real VERVE_CALL getTDError () const
virtual VERVE_DECL real VERVE_CALL getModelMSE () const
virtual VERVE_DECL unsigned int VERVE_CALL getLastPlanLength () const
virtual VERVE_DECL real VERVE_CALL computeValueEstimation (const Observation &obs)
virtual VERVE_DECL const AgentDescriptor *VERVE_CALL getDescriptor () const
virtual VERVE_DECL void VERVE_CALL saveValueData (unsigned int continuousResolution, const std::string &filename="")
virtual VERVE_DECL void VERVE_CALL saveStateRBFData (const std::string &filename="")

Protected Member Functions

void setStepSize (real value)
unsigned int planningSequence (const Observation &predCurrentObs, real predCurrentReward, real currentUncertainty)
void incrementAge ()

Protected Attributes

AgentDescriptor mDescriptor
RLModule *mRLModule
PredictiveModel *mPredictiveModel
bool mFirstStep
unsigned int mActionIndex
Observation mActualPrevObs
Observation mPredCurrentObs
Observation mTempPlanningObs
bool mLearningEnabled
real mStepSize
long unsigned int mAgeHours
unsigned int mAgeMinutes
real mAgeSeconds
unsigned int mLastPlanningSequenceLength
Definition at line 39 of file Agent.h.

Agent::Agent (const AgentDescriptor &desc)

Creates an Agent using the given AgentDescriptor and adds initial noise to the trainable weights. Never create an Agent dynamically with "new Agent"; use the global factory functions instead. This ensures that memory is allocated from the correct heap.
Definition at line 45 of file Agent.cpp.
References verve::CURIOUS_MODEL_RL, verve::AgentDescriptor::getArchitecture(), verve::AgentDescriptor::getNumOutputs(), verve::Observation::init(), verve::AgentDescriptor::isDynamicRBFEnabled(), mActualPrevObs, verve::MODEL_RL, mPredCurrentObs, mPredictiveModel, mRLModule, mTempPlanningObs, and verve::RL.

Agent::~Agent ()  [virtual]

Note that "delete Agent" should never be called on a dynamically-allocated Agent. Use the destroy function instead; this ensures that memory is deallocated from the correct heap.
Definition at line 95 of file Agent.cpp.
References mPredictiveModel, and mRLModule.

real Agent::computeValueEstimation (const Observation &obs)  [virtual]

Computes and returns the value estimation for the given Observation. This should not be done regularly, as it is fairly expensive.
Definition at line 610 of file Agent.cpp.
References verve::RLModule::computeValueEstimation(), and mRLModule.

void Agent::destroy ()  [virtual]

Deallocates a dynamically-allocated Agent. Use this instead of "delete Agent" to ensure that memory is deallocated from the correct heap.
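
A minimal lifecycle sketch. The factory name verve::createAgent is an assumption for illustration; this page only says that global factory functions exist, so substitute whichever one the library actually exports:

    // Hypothetical factory name; never call "new Agent" directly.
    verve::AgentDescriptor desc;
    verve::Agent* agent = verve::createAgent(desc);

    // ... run the agent ...

    // Never call "delete agent"; destroy() returns the memory to the
    // heap the library allocated it from.
    agent->destroy();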

long unsigned int Agent::getAge () const  [virtual]

Returns the age of the Agent in seconds. Note that the age is only incremented while the Agent is learning.
Definition at line 620 of file Agent.cpp.
References mAgeHours, mAgeMinutes, and mAgeSeconds.

std::string Agent::getAgeString () const  [virtual]

Returns the age of the Agent as a string containing the hours, minutes, and seconds. Note that the age is only incremented while the Agent is learning.
Definition at line 626 of file Agent.cpp.
References mAgeHours, mAgeMinutes, and mAgeSeconds.

const AgentDescriptor *Agent::getDescriptor () const  [virtual]

Returns a pointer to the Agent's descriptor.
Definition at line 615 of file Agent.cpp.
References mDescriptor.
Referenced by verve::Observation::init().

unsigned int Agent::getLastPlanLength () const  [virtual]

Returns the length of the most recent planning sequence (in number of steps).
Definition at line 650 of file Agent.cpp.
References mLastPlanningSequenceLength.

real Agent::getModelMSE () const  [virtual]

Returns the most recent mean squared error from the predictive model. Returns zero if this Agent was not constructed with a predictive model.
Definition at line 638 of file Agent.cpp.
References verve::PredictiveModel::getPredictionMSE(), and mPredictiveModel.

unsigned int Agent::getNumContinuousSensors () const  [virtual]

Returns the number of continuous sensors.
Definition at line 563 of file Agent.cpp.
References verve::AgentDescriptor::getNumContinuousSensors(), and mDescriptor.

unsigned int Agent::getNumDiscreteSensors () const  [virtual]

Returns the number of discrete sensors.
Definition at line 558 of file Agent.cpp.
References verve::AgentDescriptor::getNumDiscreteSensors(), and mDescriptor.

real Agent::getTDError () const  [virtual]

Returns the most recent TD error.
Definition at line 633 of file Agent.cpp.
References verve::RLModule::getTDError(), and mRLModule.

void Agent::incrementAge ()  [protected]

Increases the Agent's age by one time step.
Definition at line 666 of file Agent.cpp.
References mAgeHours, mAgeMinutes, mAgeSeconds, and mStepSize.
Referenced by update().
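
A rough sketch of the rollover implied by the three age components, assuming mAgeSeconds accumulates the step size and spills into the minutes and hours components (the actual arithmetic in Agent.cpp may differ):

    // Assumed implementation sketch, not the actual source.
    mAgeSeconds += mStepSize;
    while (mAgeSeconds >= 60)
    {
        mAgeSeconds -= 60;
        ++mAgeMinutes;
        if (mAgeMinutes >= 60)
        {
            mAgeMinutes = 0;
            ++mAgeHours;
        }
    }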

unsigned int Agent::planningSequence (const Observation &predCurrentObs, real predCurrentReward, real currentUncertainty)  [protected]

Performs a single planning sequence which trains the RLModule. This proceeds until either the prediction uncertainty grows too high or the sequence grows too long. Returns the length of the planning sequence.
Definition at line 253 of file Agent.cpp.
References verve::Observation::copyInputData(), verve::CURIOUS_MODEL_RL, verve::AgentDescriptor::getArchitecture(), verve::AgentDescriptor::getMaxNumPlanningSteps(), verve::AgentDescriptor::getPlanningUncertaintyThreshold(), mDescriptor, mPredictiveModel, mRLModule, mTempPlanningObs, verve::PredictiveModel::predict(), verve::RLModule::resetShortTermMemory(), and verve::RLModule::update().
Referenced by update().
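
In outline, the loop might look like the following sketch. The control flow is reconstructed from the description above, and the RLModule::update and PredictiveModel::predict signatures shown here are guesses, not the real ones:

    // Reconstructed sketch, not the actual Agent.cpp source.
    mTempPlanningObs.copyInputData(predCurrentObs);
    real reward = predCurrentReward;
    real uncertainty = currentUncertainty;
    unsigned int numSteps = 0;
    while (numSteps < mDescriptor.getMaxNumPlanningSteps()
        && uncertainty < mDescriptor.getPlanningUncertaintyThreshold())
    {
        // Train the RLModule on the predicted state and pick an action.
        unsigned int action = mRLModule->update(reward, mTempPlanningObs);
        // Predict the next state, reward, and uncertainty, then plan
        // onward from that predicted state.
        mPredictiveModel->predict(action, mTempPlanningObs, reward,
            uncertainty);
        ++numSteps;
    }
    mRLModule->resetShortTermMemory();
    return numSteps;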

void Agent::resetShortTermMemory ()  [virtual]

Resets temporary dynamics without affecting learned parameters.
Definition at line 109 of file Agent.cpp.
References mActionIndex, mActualPrevObs, mFirstStep, mLastPlanningSequenceLength, mPredCurrentObs, mPredictiveModel, mRLModule, mTempPlanningObs, verve::PredictiveModel::resetShortTermMemory(), verve::RLModule::resetShortTermMemory(), and verve::Observation::zeroInputData().

void Agent::saveStateRBFData (const std::string &filename="")  [virtual]

Outputs a data file containing the position of every RBF in the state representation, including both discrete and continuous data. Passing in an empty filename string automatically generates a unique filename and saves the file in the current working directory. This does nothing if the Agent has no inputs.
Definition at line 661 of file Agent.cpp.
References mRLModule, and verve::RLModule::saveStateRBFData().

void Agent::saveValueData (unsigned int continuousResolution, const std::string &filename="")  [virtual]

Outputs a data file containing estimated values for every possible state. The continuousResolution parameter determines how many values to check along each continuous input dimension. Passing in an empty filename string automatically generates a unique filename and saves the file in the current working directory. This does nothing if the Agent has no inputs. The output file format is: the first line gives the number of distinct values along each input dimension; every other line gives the inputs in each dimension followed by the value of the corresponding state.
Definition at line 655 of file Agent.cpp.
References mRLModule, and verve::RLModule::saveValueData().
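
For example, with a single continuous sensor sampled at continuousResolution 3, the file might contain (the numbers here are illustrative, not real output):

    3
    -1 0.12
    0 0.38
    1 0.75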

void Agent::setETraceTimeConstant (real timeConstant)  [virtual]

Sets how fast the eligibility traces decay. The time constant must be greater than zero.
Definition at line 580 of file Agent.cpp.
References mRLModule, mStepSize, and verve::RLModule::setETraceTimeConstant().

void Agent::setLearningEnabled (bool enabled)  [virtual]

Enables and disables learning. Once the Agent performs adequately, learning can be disabled to improve runtime performance.
Definition at line 605 of file Agent.cpp.
References mLearningEnabled.

void Agent::setModelLearningRate (real timeConstant)  [virtual]

Sets the learning rate for the predictive model. The time constant (which must be greater than zero) specifies how many seconds it takes for the prediction errors to be reduced to 37% of their initial values. This does nothing if the Agent was not constructed with a predictive model.
Definition at line 597 of file Agent.cpp.
References mPredictiveModel, mStepSize, and verve::PredictiveModel::setDeltaLearningRate().
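
Since 37% is approximately 1/e, the per-step rate presumably follows the usual exponential-decay conversion from a time constant. This is an assumption about the implementation, not the exact expression used in Agent.cpp:

    // Assumed conversion from time constant (seconds) to per-step rate;
    // requires <cmath>.
    real rate = 1 - std::exp(-mStepSize / timeConstant);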

void Agent::setStepSize (real value)  [protected]

Sets the length of the time step used for each simulation step.
Definition at line 568 of file Agent.cpp.
References verve::PredictiveModel::changeStepSize(), verve::RLModule::changeStepSize(), mPredictiveModel, mRLModule, and mStepSize.
Referenced by update().

void Agent::setTDDiscountTimeConstant (real timeConstant)  [virtual]

Sets how much future rewards are discounted. The time constant must be greater than zero.
Definition at line 585 of file Agent.cpp.
References mRLModule, mStepSize, and verve::RLModule::setTDDiscountTimeConstant().

void Agent::setTDLearningRate (real valueFunctionTimeConstant, real policyLearningMultiplier)  [virtual]

Sets the TD learning rate for the value function and the policy. The time constant (which must be greater than zero) specifies how many seconds it takes for the value function's prediction errors to be reduced to 37% of their initial values. The policy learning multiplier, combined with the value function's learning rate, determines the policy's learning rate; the multiplier usually ranges from 1 to 100.
Definition at line 590 of file Agent.cpp.
References mRLModule, mStepSize, and verve::RLModule::setTDLearningRate().
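
A typical configuration sequence might look like the following; all of these setters are documented on this page, but the numeric values are purely illustrative, not recommendations from the library:

    // Illustrative tuning only; appropriate constants depend on the task.
    agent->setETraceTimeConstant((verve::real)0.1);
    agent->setTDDiscountTimeConstant((verve::real)1.0);
    agent->setTDLearningRate((verve::real)0.1, (verve::real)10.0);
    agent->setModelLearningRate((verve::real)0.1);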

unsigned int Agent::update (real reinforcement, const Observation &obs, real dt)  [virtual]

Gives the Agent the reinforcement for the current state, the current Observation (i.e. sensory input data from the current state), and the amount of time that has elapsed since the previous update. Allows the Agent to learn (if learning is enabled) and returns the index of the action to perform. The reinforcement value must be within the range [-1, 1]; this ensures that the reward magnitude does not affect the TD learning rate. It is best to pass in the same dt each time this is called, since several things must be recomputed internally whenever dt changes between successive calls.
Definition at line 125 of file Agent.cpp.
References verve::AgentDescriptor::getArchitecture(), incrementAge(), mActionIndex, mActualPrevObs, mDescriptor, mFirstStep, mLastPlanningSequenceLength, mLearningEnabled, mPredCurrentObs, mPredictiveModel, mRLModule, mStepSize, planningSequence(), verve::PredictiveModel::predictAndTrain(), verve::RL, setStepSize(), verve::RLModule::update(), and verve::RLModule::updatePolicyOnly().
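
A sketch of a typical simulation loop. The environment-side helpers (simulationRunning, computeReward, fillObservation, applyAction) are hypothetical stand-ins, and the Observation::init call shown here matches the cross-references on this page but its exact signature is a guess:

    verve::Observation obs;
    obs.init(*agent);                          // size obs for this Agent
    const verve::real dt = (verve::real)0.01;  // keep dt constant if possible
    while (simulationRunning)
    {
        verve::real reward = computeReward();  // must be within [-1, 1]
        fillObservation(obs);                  // copy sensor data into obs
        unsigned int action = agent->update(reward, obs, dt);
        applyAction(action);                   // perform the chosen action
    }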

unsigned int Agent::mActionIndex  [protected]

A stored copy of the most recent action index.
Definition at line 217 of file Agent.h.
Referenced by resetShortTermMemory(), and update().

Observation Agent::mActualPrevObs  [protected]

A copy of the previous actual Observation. This must be stored across time steps.
Definition at line 221 of file Agent.h.
Referenced by Agent(), resetShortTermMemory(), and update().

long unsigned int Agent::mAgeHours  [protected]

The hours component of the Agent's age.
Definition at line 238 of file Agent.h.
Referenced by getAge(), getAgeString(), and incrementAge().

unsigned int Agent::mAgeMinutes  [protected]

The minutes component of the Agent's age.
Definition at line 241 of file Agent.h.
Referenced by getAge(), getAgeString(), and incrementAge().

real Agent::mAgeSeconds  [protected]

The seconds component of the Agent's age.
Definition at line 244 of file Agent.h.
Referenced by getAge(), getAgeString(), and incrementAge().

AgentDescriptor Agent::mDescriptor  [protected]

A saved copy of the AgentDescriptor used to create this Agent.
Definition at line 203 of file Agent.h.
Referenced by getDescriptor(), getNumContinuousSensors(), getNumDiscreteSensors(), planningSequence(), and update().

bool Agent::mFirstStep  [protected]

Used to handle the first step differently. This is necessary because the Agent is trained using the state representation from the previous step.
Definition at line 214 of file Agent.h.
Referenced by resetShortTermMemory(), and update().

unsigned int Agent::mLastPlanningSequenceLength  [protected]

The length of the most recent planning sequence (in number of steps).
Definition at line 248 of file Agent.h.
Referenced by getLastPlanLength(), resetShortTermMemory(), and update().

bool Agent::mLearningEnabled  [protected]

Determines whether the Agent learns.
Definition at line 232 of file Agent.h.
Referenced by setLearningEnabled(), and update().

Observation Agent::mPredCurrentObs  [protected]

A pre-allocated Observation, kept mainly for convenience. It does not need to remain valid across time steps.
Definition at line 225 of file Agent.h.
Referenced by Agent(), resetShortTermMemory(), and update().

PredictiveModel *Agent::mPredictiveModel  [protected]

The predictive model component.
Definition at line 209 of file Agent.h.
Referenced by Agent(), getModelMSE(), planningSequence(), resetShortTermMemory(), setModelLearningRate(), setStepSize(), update(), and ~Agent().

RLModule *Agent::mRLModule  [protected]

The main reinforcement learning component.
Definition at line 206 of file Agent.h.
Referenced by Agent(), computeValueEstimation(), getTDError(), planningSequence(), resetShortTermMemory(), saveStateRBFData(), saveValueData(), setETraceTimeConstant(), setStepSize(), setTDDiscountTimeConstant(), setTDLearningRate(), update(), and ~Agent().

real Agent::mStepSize  [protected]

The current step size being used.
Definition at line 235 of file Agent.h.
Referenced by incrementAge(), setETraceTimeConstant(), setModelLearningRate(), setStepSize(), setTDDiscountTimeConstant(), setTDLearningRate(), and update().

Observation Agent::mTempPlanningObs  [protected]

A pre-allocated Observation, kept mainly for convenience. It does not need to remain valid across time steps.
Definition at line 229 of file Agent.h.
Referenced by Agent(), planningSequence(), and resetShortTermMemory().