Paper

Using Free Energies to Represent Q-values in a Multiagent Reinforcement Learning Task

The problem of reinforcement learning in large factored Markov decision processes is explored. The Q-value of a state-action pair is approximated by the free energy of a product of experts network. Network parameters are learned on-line using a modified SARSA algorithm which minimizes the inconsistency of the Q-values of consecutive state-action pairs. Actions are chosen based on the current value estimates by fixing the current state and sampling actions from the network using Gibbs sampling. The algorithm is tested on a co-operative multi-agent task. The product of experts model is found to perform comparably to table-based Q-learning for small instances of the task, and continues to perform well when the problem becomes too large for a table-based representation. 1 Introduction Online Reinforcement Learning (RL) algorithms try to find a policy which maximizes the expected time-discounted reward provided by the environment. They do this by performing sample backups to lea...

http://www.cs.cmu.edu/Web/Groups/NIPS/00papers-pub-on-web/SallansHinton.ps.gzPublished 2000-01-01Paper link

Authors: Brian Sallans · Geoffrey E. Hinton

Topics

Relevant entities

People

Related coverage

Linked coverage will appear here.

Related events

Linked events will appear here.

Related discussions

Related discussion nodes will appear here.