verve::RLModule Class Reference

An RLModule learns from reinforcements to improve its action selection in order to increase its future reinforcement intake.

#include <RLModule.h>


Public Member Functions

VERVE_DECL RLModule (const Observation &obs, bool isDynamicRBFEnabled, unsigned int numActions)
virtual VERVE_DECL ~RLModule ()
virtual VERVE_DECL void VERVE_CALL resetShortTermMemory ()
virtual VERVE_DECL unsigned int VERVE_CALL update (const Observation &obs, real reinforcement)
virtual VERVE_DECL unsigned int VERVE_CALL updatePolicyOnly (const Observation &obs)
virtual VERVE_DECL void VERVE_CALL changeStepSize (real newValue)
virtual VERVE_DECL void VERVE_CALL setETraceTimeConstant (real timeConstant, real stepSize)
virtual VERVE_DECL void VERVE_CALL setTDDiscountTimeConstant (real timeConstant, real stepSize)
virtual VERVE_DECL void VERVE_CALL setTDLearningRate (real valueFunctionTimeConstant, real policyLearningMultiplier, real stepSize)
virtual VERVE_DECL real VERVE_CALL getTDError ()
virtual VERVE_DECL void VERVE_CALL resetState (const Observation &obs)
virtual VERVE_DECL real VERVE_CALL computeValueEstimation (const Observation &obs)
virtual VERVE_DECL void VERVE_CALL saveValueData (unsigned int continuousResolution, const std::string &filename="")
virtual VERVE_DECL void VERVE_CALL saveStateRBFData (const std::string &filename="")

Protected Member Functions

void updateActiveTDConnectionList ()
void trainTDRule ()
real updateCriticOutput ()
unsigned int updateActorOutput ()

Protected Attributes

RBFInputData mLatestInputData
RBFPopulation * mStateRepresentation
WinnerTakeAllPopulation * mActorPopulation
Population * mCriticPopulation
std::vector< Population * > mAllPopulations
ActiveTDConnectionList mActiveValueFunctionTDConnections
ActiveTDConnectionList mActivePolicyTDConnections
bool mFirstStep
real mTDError
real mOldValueEstimation
real mNewValueEstimation
real mETraceTimeConstant
real mTDDiscountTimeConstant
real mTDDiscountFactor
real mValueFunctionLearningTimeConstant
real mValueFunctionLearningFactor
real mPolicyLearningMultiplier


Detailed Description

An RLModule learns from reinforcements to improve its action selection in order to increase its future reinforcement intake.

Definition at line 41 of file RLModule.h.


Constructor & Destructor Documentation

verve::RLModule::RLModule (const Observation &obs, bool isDynamicRBFEnabled, unsigned int numActions)

Sets up the RLModule to work with the given type of Observation.

Applies initial noise to trainable Connection weights.

Definition at line 38 of file RLModule.cpp.

References verve::RBFPopulation::computeMaxActivationSum(), verve::Population::init(), verve::UltraSparseCodePopulation::init(), verve::RBFPopulation::init(), verve::RBFInputData::init(), mActorPopulation, mAllPopulations, mCriticPopulation, mETraceTimeConstant, mFirstStep, mLatestInputData, mNewValueEstimation, mOldValueEstimation, mPolicyLearningMultiplier, mStateRepresentation, mTDDiscountFactor, mTDDiscountTimeConstant, mTDError, mValueFunctionLearningFactor, mValueFunctionLearningTimeConstant, verve::POLICY_TDCONNECTION, verve::Population::projectTD(), verve::VALUE_FUNCTION_TDCONNECTION, and verve::WEIGHTS_NEAR_0.

verve::RLModule::~RLModule () [virtual]

Definition at line 85 of file RLModule.cpp.

References mAllPopulations.


Member Function Documentation

void verve::RLModule::changeStepSize (real newValue) [virtual]

Updates all step size-dependent factors using the new step size.

Definition at line 227 of file RLModule.cpp.

References mETraceTimeConstant, mPolicyLearningMultiplier, mTDDiscountTimeConstant, mValueFunctionLearningTimeConstant, setETraceTimeConstant(), setTDDiscountTimeConstant(), and setTDLearningRate().

Referenced by verve::Agent::setStepSize().
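The recomputation pattern described for changeStepSize() (keep the stored time constants, then call the setters again with the new step size) can be sketched as follows. This is a standalone illustration, not Verve's code: the struct StepSizeDependentFactors is hypothetical, and the assumption that each factor equals exp(-stepSize / timeConstant) stands in for verve::globals::calcDecayConstant(), whose exact formula is not documented here.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the step size-dependent state of RLModule.
// Assumption: each derived factor is exp(-stepSize / timeConstant); Verve's
// calcDecayConstant() may compute it differently.
struct StepSizeDependentFactors {
    double eTraceTimeConstant = 0.1;
    double tdDiscountTimeConstant = 1.0;
    double eTraceDecayFactor = 0.0;
    double tdDiscountFactor = 0.0;

    static double decay(double timeConstant, double stepSize) {
        assert(timeConstant > 0.0);  // time constants must be positive
        return std::exp(-stepSize / timeConstant);
    }

    void setETraceTimeConstant(double timeConstant, double stepSize) {
        eTraceTimeConstant = timeConstant;
        eTraceDecayFactor = decay(timeConstant, stepSize);
    }

    void setTDDiscountTimeConstant(double timeConstant, double stepSize) {
        tdDiscountTimeConstant = timeConstant;
        tdDiscountFactor = decay(timeConstant, stepSize);
    }

    // Keep the stored time constants and recompute every derived factor
    // for the new step size.
    void changeStepSize(double newStepSize) {
        setETraceTimeConstant(eTraceTimeConstant, newStepSize);
        setTDDiscountTimeConstant(tdDiscountTimeConstant, newStepSize);
    }
};
```

Storing time constants (in seconds) rather than per-step factors is what makes this possible: the step size can change while the learning behavior, measured in simulated time, stays the same.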

real verve::RLModule::computeValueEstimation (const Observation &obs) [virtual]

Computes and returns the value estimation for the given observation.

Avoid calling this frequently, because it is fairly expensive.

Definition at line 296 of file RLModule.cpp.

References verve::RBFInputData::init(), mLatestInputData, mStateRepresentation, updateCriticOutput(), and verve::RBFPopulation::updateFiringRatesRBF().

Referenced by verve::Agent::computeValueEstimation().

real verve::RLModule::getTDError () [virtual]

Returns the current TD error.

Definition at line 281 of file RLModule.cpp.

References mTDError.

Referenced by verve::Agent::getTDError().

void verve::RLModule::resetShortTermMemory () [virtual]

Resets temporary dynamics without affecting learned parameters.

Definition at line 96 of file RLModule.cpp.

References mAllPopulations, mLatestInputData, and verve::RBFInputData::zeroInputData().

Referenced by verve::Agent::planningSequence(), and verve::Agent::resetShortTermMemory().

void verve::RLModule::resetState (const Observation &obs) [virtual]

"Resets" the internal state representation to represent the given Observation.

This is useful when outside functions (such as planning) must change the state temporarily and then restore it to how it was before. Note: this function does not let the RBF state representation dynamically allocate new RBFs.

Definition at line 286 of file RLModule.cpp.

References verve::RBFInputData::copyInputData(), verve::Observation::getContinuousInputData(), verve::Observation::getDiscreteInputData(), mLatestInputData, mStateRepresentation, and verve::RBFPopulation::updateFiringRatesRBF().

void verve::RLModule::saveStateRBFData (const std::string &filename = "") [virtual]

Outputs a data file containing the position of all RBFs in the state representation, including discrete and continuous data.

Passing in an empty filename string will automatically generate a unique filename and save the file in the current working directory. This does nothing if the Agent uses no inputs.

Definition at line 402 of file RLModule.cpp.

References mLatestInputData, verve::RBFInputData::numContInputs, and verve::RBFInputData::numDiscInputs.

Referenced by verve::Agent::saveStateRBFData().

void verve::RLModule::saveValueData (unsigned int continuousResolution, const std::string &filename = "") [virtual]

Outputs a data file containing estimated values for every possible state.

The continuousResolution parameter determines how many values to check along each continuous input dimension. Passing in an empty filename string will automatically generate a unique filename and save the file in the current working directory. This does nothing if the Agent uses no inputs. The output file format is:

First line: the number of distinct values along each input dimension.
All other lines: the inputs in each dimension, followed by the value of the corresponding state.

Definition at line 320 of file RLModule.cpp.

References mLatestInputData, verve::RBFInputData::numContInputs, and verve::RBFInputData::numDiscInputs.

Referenced by verve::Agent::saveValueData().
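The file layout described for saveValueData() can be illustrated with a small writer that walks a regular grid. This is a minimal sketch under several assumptions: only continuous inputs are handled (discrete inputs are omitted), each continuous input is sampled evenly over [-1, 1], and the hypothetical callback valueOf stands in for computeValueEstimation(). The function name writeValueGrid is also hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Writes a header line with the number of samples per dimension, then one line
// per grid point: the input values followed by the estimated value.
// Assumptions: continuous inputs only, sampled evenly over [-1, 1].
void writeValueGrid(std::ostream& out, std::size_t numDims,
                    unsigned int resolution,
                    const std::function<double(const std::vector<double>&)>& valueOf) {
    if (numDims == 0) return;  // nothing to write when there are no inputs
    assert(resolution >= 2);   // need at least both endpoints of [-1, 1]

    // First line: number of distinct values along each input dimension.
    for (std::size_t d = 0; d < numDims; ++d)
        out << resolution << (d + 1 < numDims ? ' ' : '\n');

    std::vector<unsigned int> index(numDims, 0);
    std::vector<double> inputs(numDims);
    while (true) {
        for (std::size_t d = 0; d < numDims; ++d) {
            inputs[d] = -1.0 + 2.0 * index[d] / (resolution - 1);
            out << inputs[d] << ' ';
        }
        out << valueOf(inputs) << '\n';

        // Advance the multi-dimensional index like an odometer.
        std::size_t d = 0;
        while (d < numDims && ++index[d] == resolution) {
            index[d] = 0;
            ++d;
        }
        if (d == numDims) break;  // every combination has been written
    }
}
```

With one input dimension, a resolution of 3 and a value function that returns its input, the output is the header `3` followed by the lines `-1 -1`, `0 0` and `1 1`. The total number of lines grows as resolution to the power of the number of dimensions, which is why a modest resolution is advisable for high-dimensional inputs.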

void verve::RLModule::setETraceTimeConstant (real timeConstant, real stepSize) [virtual]

Sets how fast the eligibility traces will decay.

The time constant must be greater than zero. Automatically updates the eligibility trace decay factors in all TDConnections.

Definition at line 235 of file RLModule.cpp.

References verve::globals::calcDecayConstant(), mETraceTimeConstant, mStateRepresentation, and verve::Population::setPostETraceDecayFactors().

Referenced by changeStepSize(), and verve::Agent::setETraceTimeConstant().
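To make the role of the eligibility trace time constant concrete, here is a minimal sketch of an accumulating trace on a single connection. The struct EligibilityTrace is hypothetical, and the per-step decay factor exp(-stepSize / timeConstant) is an assumption standing in for calcDecayConstant(); it is chosen so that a trace left alone for one time constant falls to about 37% (1/e) of its value.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical accumulating eligibility trace for one TDConnection.
// Assumption: the per-step decay factor is exp(-stepSize / timeConstant).
struct EligibilityTrace {
    double value = 0.0;
    double decayFactor = 1.0;

    void setTimeConstant(double timeConstant, double stepSize) {
        assert(timeConstant > 0.0);  // the documented precondition
        decayFactor = std::exp(-stepSize / timeConstant);
    }

    // Applied once per step to every tracked connection.
    void decay() { value *= decayFactor; }

    // Applied when the connection's input is active.
    void increase(double activation) { value += activation; }
};
```

For example, with a time constant of 0.5 s and a step size of 0.01 s, a trace of 1.0 decays to roughly 0.37 after 50 idle steps.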

void verve::RLModule::setTDDiscountTimeConstant (real timeConstant, real stepSize) [virtual]

Sets how much future rewards are discounted.

The time constant must be greater than zero. Automatically updates the discount factors in all TDConnections.

Definition at line 248 of file RLModule.cpp.

References verve::globals::calcDecayConstant(), mStateRepresentation, mTDDiscountFactor, mTDDiscountTimeConstant, and verve::Population::setPostTDDiscountFactors().

Referenced by changeStepSize(), and verve::Agent::setTDDiscountTimeConstant().
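The sketch below shows why a discount time constant is convenient: if the per-step discount factor is exp(-stepSize / timeConstant) (an assumption, since the exact formula of calcDecayConstant() is not given here), then the total discount applied over a fixed span of simulated time is independent of the step size. The functions tdDiscountFactor and discountedReturn are hypothetical helpers for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Assumption: per-step discount factor derived from a time constant in seconds.
double tdDiscountFactor(double timeConstant, double stepSize) {
    assert(timeConstant > 0.0);
    return std::exp(-stepSize / timeConstant);
}

// Discounted sum of per-step rewards: r0 + g*r1 + g^2*r2 + ...
double discountedReturn(const std::vector<double>& rewards, double gamma) {
    double total = 0.0;
    double weight = 1.0;
    for (double r : rewards) {
        total += weight * r;
        weight *= gamma;
    }
    return total;
}
```

With a 2 s time constant, ten steps of 0.1 s and one hundred steps of 0.01 s both discount a reward by about exp(-0.5) over one second.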

void verve::RLModule::setTDLearningRate (real valueFunctionTimeConstant, real policyLearningMultiplier, real stepSize) [virtual]

Sets the TD learning rate for the value function and policy.

The time constant (which must be greater than zero) specifies how many seconds it takes for the value function's prediction errors to be reduced to 37% (approximately 1/e) of their initial values. The policy's learning rate is determined by combining the policy learning multiplier with the value function's learning rate (the multiplier typically ranges from 1 to 100).

Definition at line 261 of file RLModule.cpp.

References verve::globals::calcDecayConstant(), verve::RBFPopulation::computeMaxActivationSum(), mPolicyLearningMultiplier, mStateRepresentation, mValueFunctionLearningFactor, and mValueFunctionLearningTimeConstant.

Referenced by changeStepSize(), and verve::Agent::setTDLearningRate().
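One way to obtain the documented "reduced to 37% after one time constant" behavior is sketched below. This is a hypothetical derivation, not necessarily Verve's formula. It assumes that each step removes the fraction 1 - exp(-stepSize / timeConstant) of the prediction error, scaled down by the maximum total activation of the state representation (suggested by the reference to computeMaxActivationSum()). It also assumes that the policy rate is the value rate times the multiplier. The names LearningRates and computeLearningRates are hypothetical.

```cpp
#include <cassert>
#include <cmath>

struct LearningRates {
    double valueRate;
    double policyRate;
};

// Hypothetical: per-step learning rates derived from a time constant.
// Assumptions: error fraction removed per step is 1 - exp(-dt / tau),
// normalized by the maximum activation sum; policy rate = value rate * multiplier.
LearningRates computeLearningRates(double valueFunctionTimeConstant,
                                   double policyLearningMultiplier,
                                   double stepSize,
                                   double maxActivationSum) {
    assert(valueFunctionTimeConstant > 0.0 && maxActivationSum > 0.0);
    double fraction = 1.0 - std::exp(-stepSize / valueFunctionTimeConstant);
    double valueRate = fraction / maxActivationSum;
    return {valueRate, valueRate * policyLearningMultiplier};
}
```

As a check, consider a single weight with activation 1.0 that learns a constant target of 1.0 with these rates. With a 0.5 s time constant and a 0.01 s step, after 50 steps (0.5 s) the remaining error is about 37% of its initial value.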

void verve::RLModule::trainTDRule () [protected]

Trains the active TDConnections.

Definition at line 507 of file RLModule.cpp.

References mActivePolicyTDConnections, mActiveValueFunctionTDConnections, mPolicyLearningMultiplier, mTDError, mValueFunctionLearningFactor, and verve::ActiveTDConnectionList::trainConnections().

Referenced by update().

unsigned int verve::RLModule::update (const Observation &obs, real reinforcement) [virtual]

Updates and trains the RLModule using the given reinforcement and corresponding Observation.

Returns the index of the next action to perform. The reinforcement value SHOULD be within the range [-1, 1], but values outside this range are tolerated because the predictive model might output invalid reward values.

Definition at line 116 of file RLModule.cpp.

References verve::RBFInputData::copyInputData(), verve::ActiveTDConnectionList::decayETraces(), verve::Observation::getContinuousInputData(), verve::Observation::getDiscreteInputData(), verve::ActiveTDConnectionList::increaseETraces(), mActivePolicyTDConnections, mActiveValueFunctionTDConnections, mFirstStep, mLatestInputData, mNewValueEstimation, mOldValueEstimation, mStateRepresentation, mTDDiscountFactor, mTDError, trainTDRule(), updateActiveTDConnectionList(), updateActorOutput(), updateCriticOutput(), and verve::RBFPopulation::updateFiringRatesRBF().

Referenced by verve::Agent::planningSequence(), and verve::Agent::update().
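The TD error bookkeeping suggested by the members update() references (mFirstStep, mOldValueEstimation, mNewValueEstimation, mTDDiscountFactor, mTDError) can be sketched with the standard TD(0) error, reinforcement + gamma * V(new state) - V(old state). The class TDErrorTracker is hypothetical. The exact ordering of operations inside Verve's update() may differ from this sketch.

```cpp
#include <cassert>

// Hypothetical tracker for the standard TD(0) error.
class TDErrorTracker {
public:
    explicit TDErrorTracker(double discountFactor) : gamma_(discountFactor) {}

    // newValue is the critic's estimate for the newly observed state.
    double step(double reinforcement, double newValue) {
        if (firstStep_) {
            // No previous estimate exists yet, so there is nothing to compare.
            tdError_ = 0.0;
            firstStep_ = false;
        } else {
            tdError_ = reinforcement + gamma_ * newValue - oldValue_;
        }
        oldValue_ = newValue;  // becomes the "old" estimate for the next step
        return tdError_;
    }

    double tdError() const { return tdError_; }

private:
    double gamma_;
    bool firstStep_ = true;
    double oldValue_ = 0.0;
    double tdError_ = 0.0;
};
```

For example, with a discount factor of 0.5, a first call with estimate 2.0 yields 0, and a second call with reinforcement 1.0 and estimate 4.0 yields 1.0 + 0.5 * 4.0 - 2.0 = 1.0.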

void verve::RLModule::updateActiveTDConnectionList () [protected]

Adds new active TDConnections to the ActiveTDConnectionList.

Definition at line 452 of file RLModule.cpp.

References verve::RBFPopulation::getActiveNeuron(), verve::RBFPopulation::getNumActiveNeurons(), verve::Neuron::getNumAxons(), and mStateRepresentation.

Referenced by update().
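Judging from the references to getNumActiveNeurons(), getActiveNeuron() and getNumAxons(), this routine walks the active state-representation neurons and registers their outgoing connections. A minimal sketch follows. SketchNeuron, ActiveConnectionList and updateActiveList are hypothetical stand-ins, connections are identified by integer IDs, and the duplicate check is an assumption about how ActiveTDConnectionList avoids tracking a connection twice.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical neuron: owns the IDs of its outgoing connections ("axons").
struct SketchNeuron {
    std::vector<int> axonIds;
};

// Simplified active-connection list that ignores duplicates.
class ActiveConnectionList {
public:
    void add(int connId) {
        // insert().second is false if the ID is already tracked.
        if (members_.insert(connId).second) order_.push_back(connId);
    }
    std::size_t size() const { return order_.size(); }
    const std::vector<int>& ids() const { return order_; }

private:
    std::unordered_set<int> members_;
    std::vector<int> order_;  // insertion order, for iteration during training
};

// Register every outgoing connection of each currently active neuron.
void updateActiveList(const std::vector<const SketchNeuron*>& activeNeurons,
                      ActiveConnectionList& list) {
    for (const SketchNeuron* n : activeNeurons)
        for (int id : n->axonIds) list.add(id);
}
```

Because the RBF state representation typically has only a few active neurons per step, this keeps the set of connections that training must visit small.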

unsigned int verve::RLModule::updateActorOutput () [protected]

Updates the actor's output and returns the next action index.

Definition at line 522 of file RLModule.cpp.

References verve::UltraSparseCodePopulation::getActiveOutput(), mActorPopulation, and verve::WinnerTakeAllPopulation::updateFiringRatesWTA().

Referenced by update(), and updatePolicyOnly().
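In its simplest form, winner-take-all action selection returns the index of the actor neuron with the highest firing rate. The sketch below shows only that final step. The function selectWinner is hypothetical, and it assumes ties go to the lowest index and no exploration noise is added. WinnerTakeAllPopulation::updateFiringRatesWTA() may use different dynamics.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Returns the index of the most active actor neuron.
// Assumptions: deterministic argmax, ties favor the lowest index.
unsigned int selectWinner(const std::vector<double>& firingRates) {
    assert(!firingRates.empty());
    // std::max_element returns the first maximum, so ties favor the lowest index.
    auto winner = std::max_element(firingRates.begin(), firingRates.end());
    return static_cast<unsigned int>(std::distance(firingRates.begin(), winner));
}
```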

real verve::RLModule::updateCriticOutput () [protected]

Updates the critic's output (i.e., the value estimation) and returns the current value estimation.

Definition at line 516 of file RLModule.cpp.

References verve::Neuron::getFiringRate(), verve::Population::getNeuron(), mCriticPopulation, and verve::Population::updateFiringRatesLinear().

Referenced by computeValueEstimation(), and update().
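The reference to Population::updateFiringRatesLinear() suggests a linear critic, whose value estimate is a weighted sum of the state representation's firing rates. A minimal sketch follows, assuming a single critic neuron with no bias term. The function name linearCriticOutput is hypothetical.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Linear critic: V(s) = sum_i w_i * r_i, where r_i are state firing rates.
// Assumptions: one critic neuron, no bias, identity activation.
double linearCriticOutput(const std::vector<double>& stateFiringRates,
                          const std::vector<double>& weights) {
    assert(stateFiringRates.size() == weights.size());
    return std::inner_product(stateFiringRates.begin(), stateFiringRates.end(),
                              weights.begin(), 0.0);
}
```

For example, firing rates {1.0, 0.5} with weights {2.0, -2.0} give a value estimate of 1.0.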

unsigned int verve::RLModule::updatePolicyOnly (const Observation &obs) [virtual]

Returns the index of the next action to perform based on the given Observation.

Definition at line 212 of file RLModule.cpp.

References verve::RBFInputData::copyInputData(), verve::Observation::getContinuousInputData(), verve::Observation::getDiscreteInputData(), mLatestInputData, mStateRepresentation, updateActorOutput(), and verve::RBFPopulation::updateFiringRatesRBF().

Referenced by verve::Agent::update().


Member Data Documentation

ActiveTDConnectionList verve::RLModule::mActivePolicyTDConnections [protected]
 

A dynamic list of the active policy TDConnections.

Definition at line 171 of file RLModule.h.

Referenced by trainTDRule(), and update().

ActiveTDConnectionList verve::RLModule::mActiveValueFunctionTDConnections [protected]
 

A dynamic list of the active value function TDConnections.

Definition at line 168 of file RLModule.h.

Referenced by trainTDRule(), and update().

WinnerTakeAllPopulation* verve::RLModule::mActorPopulation [protected]
 

The actor Population.

Definition at line 159 of file RLModule.h.

Referenced by RLModule(), and updateActorOutput().

std::vector<Population*> verve::RLModule::mAllPopulations [protected]
 

A list of all Populations.

Definition at line 165 of file RLModule.h.

Referenced by resetShortTermMemory(), RLModule(), and ~RLModule().

Population* verve::RLModule::mCriticPopulation [protected]
 

The critic Population.

Definition at line 162 of file RLModule.h.

Referenced by RLModule(), and updateCriticOutput().

real verve::RLModule::mETraceTimeConstant [protected]
 

Time constant that determines the rate of eligibility trace decay.

Definition at line 191 of file RLModule.h.

Referenced by changeStepSize(), RLModule(), and setETraceTimeConstant().

bool verve::RLModule::mFirstStep [protected]
 

Used to handle the first step differently.

This is necessary because temporal difference learning compares values between two steps, which isn't possible on the first step.

Definition at line 176 of file RLModule.h.

Referenced by RLModule(), and update().

RBFInputData verve::RLModule::mLatestInputData [protected]
 

The most recent sensory input data.

We store a copy of this so that we can, for example, compute the estimated value of a user-supplied Observation, then set the state representation back to the actual latest input data before the next update.

Definition at line 153 of file RLModule.h.

Referenced by computeValueEstimation(), resetShortTermMemory(), resetState(), RLModule(), saveStateRBFData(), saveValueData(), update(), and updatePolicyOnly().
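The save-and-restore pattern described above can be sketched as follows. A copy of the latest input is kept so that another observation can be evaluated temporarily, after which the representation is rebuilt from that copy. The class StateHolder is hypothetical. Its vector-valued "representation" and summing "critic" are placeholders for the RBF state representation and the critic Population.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified model of keeping mLatestInputData alongside the
// state representation so the representation can be restored after a probe.
class StateHolder {
public:
    // Normal update path: remember the input and drive the representation with it.
    void update(const std::vector<double>& obs) {
        latestInput_ = obs;
        representation_ = obs;  // placeholder for updating RBF firing rates
    }

    // Temporarily represent another observation, compute a placeholder value,
    // then restore the representation from the stored latest input.
    double computeValueEstimation(const std::vector<double>& obs) {
        representation_ = obs;
        double value = 0.0;
        for (double x : representation_) value += x;  // placeholder critic
        representation_ = latestInput_;                // restore actual state
        return value;
    }

    const std::vector<double>& representation() const { return representation_; }

private:
    std::vector<double> latestInput_;
    std::vector<double> representation_;
};
```

The key design point is that the probe never overwrites the stored copy, so the next real update still sees the state that corresponds to the most recent actual input.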

real verve::RLModule::mNewValueEstimation [protected]
 

The value estimation of the state based on the new sensory inputs.

Definition at line 187 of file RLModule.h.

Referenced by RLModule(), and update().

real verve::RLModule::mOldValueEstimation [protected]
 

The value estimation of the state based on the old sensory inputs.

Definition at line 183 of file RLModule.h.

Referenced by RLModule(), and update().

real verve::RLModule::mPolicyLearningMultiplier [protected]
 

The policy's learning rate is a combination of this multiplier and the value function's learning rate.

Definition at line 210 of file RLModule.h.

Referenced by changeStepSize(), RLModule(), setTDLearningRate(), and trainTDRule().

RBFPopulation* verve::RLModule::mStateRepresentation [protected]
 

The state representation Population.

Definition at line 156 of file RLModule.h.

Referenced by computeValueEstimation(), resetState(), RLModule(), setETraceTimeConstant(), setTDDiscountTimeConstant(), setTDLearningRate(), update(), updateActiveTDConnectionList(), and updatePolicyOnly().

real verve::RLModule::mTDDiscountFactor [protected]
 

A precomputed discount factor using the TD discount time constant and step size.

Definition at line 198 of file RLModule.h.

Referenced by RLModule(), setTDDiscountTimeConstant(), and update().

real verve::RLModule::mTDDiscountTimeConstant [protected]
 

Time constant that determines the TD discount rate.

Definition at line 194 of file RLModule.h.

Referenced by changeStepSize(), RLModule(), and setTDDiscountTimeConstant().

real verve::RLModule::mTDError [protected]
 

The current TD error signal.

Definition at line 179 of file RLModule.h.

Referenced by getTDError(), RLModule(), trainTDRule(), and update().

real verve::RLModule::mValueFunctionLearningFactor [protected]
 

A precomputed learning factor derived from the value function learning time constant and step size.

Definition at line 206 of file RLModule.h.

Referenced by RLModule(), setTDLearningRate(), and trainTDRule().

real verve::RLModule::mValueFunctionLearningTimeConstant [protected]
 

Time constant that determines the learning rate for the value function.

Definition at line 202 of file RLModule.h.

Referenced by changeStepSize(), RLModule(), and setTDLearningRate().


The documentation for this class was generated from the following files:
RLModule.h
RLModule.cpp
Generated on Tue Jan 24 21:46:39 2006 for Verve by doxygen 1.4.6-NO