Model-based planning has several key advantages over model-free methods: models support generalization to states not previously experienced, help express the relationship between present actions and future rewards, and can resolve states that are aliased in value-based approximations. These advantages are especially pronounced in problems with complex and stochastic environmental dynamics, sparse rewards, and restricted trial-and-error experience. Yet even with an accurate model, planning is often very challenging: while a model can be used to *evaluate* a plan, it does not prescribe how to *construct* one.

To explore efficiently, the first step is to quantify uncertainty in value estimates so that the agent can judge the potential benefits of exploratory actions. The neural network literature presents a sizable body of work on uncertainty quantification founded on parametric Bayesian inference [3, 7]. In our experiments, however, we found the simple non-parametric bootstrap with random initialization [5] more effective, though the main ideas of this paper would apply with any other approach to uncertainty in DNNs.
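
The following is a minimal PyTorch sketch of that idea, not the paper's reference implementation: K randomly initialized value heads on a shared torso, whose disagreement serves as an estimate of epistemic uncertainty. The layer sizes, head count, and the `value_uncertainty` helper are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BootstrappedQNet(nn.Module):
    """K Q-value heads on a shared torso (sizes are illustrative)."""

    def __init__(self, obs_dim, n_actions, n_heads=10):
        super().__init__()
        self.torso = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU())
        # Each head starts from its own random initialization, so heads
        # disagree most where training data are scarce -- that
        # disagreement is the exploration signal.
        self.heads = nn.ModuleList(
            [nn.Linear(64, n_actions) for _ in range(n_heads)]
        )

    def forward(self, obs):
        h = self.torso(obs)
        # Shape: (n_heads, batch, n_actions)
        return torch.stack([head(h) for head in self.heads])

def value_uncertainty(net, obs):
    """Per-action standard deviation across heads: a simple
    bootstrap-style estimate of uncertainty in the Q-values."""
    with torch.no_grad():
        q = net(obs)       # (n_heads, batch, n_actions)
    return q.std(dim=0)    # (batch, n_actions)
```

Because the only source of diversity here is random initialization plus which data each head sees, the ensemble plays the role that a posterior distribution plays in parametric Bayesian approaches.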

Bootstrapped DQN
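
In Bootstrapped DQN [5], each head is trained on its own bootstrapped subsample of experience: every transition stores a binary mask drawn when it enters replay, and head k receives gradient only from transitions whose k-th mask bit is set. Below is a hedged sketch of that masked TD loss, reusing the `BootstrappedQNet` from above; the batch layout and field names are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def masked_td_loss(q_net, target_net, batch, gamma=0.99):
    # Assumed batch fields and shapes: obs (B, obs_dim), actions (B,) int64,
    # rewards (B,), next_obs (B, obs_dim), done (B,) float in {0, 1},
    # masks (B, K) float in {0, 1}, drawn once when the transition is stored.
    q_all = q_net(batch["obs"])                                   # (K, B, A)
    with torch.no_grad():
        next_q = target_net(batch["next_obs"]).max(dim=-1).values  # (K, B)
        target = batch["rewards"] + gamma * (1 - batch["done"]) * next_q
    K = q_all.shape[0]
    chosen = q_all.gather(
        -1, batch["actions"].view(1, -1, 1).expand(K, -1, 1)
    ).squeeze(-1)                                                  # (K, B)
    per_head = F.smooth_l1_loss(chosen, target, reduction="none")  # (K, B)
    masks = batch["masks"].t()                                     # (K, B)
    # Bootstrap: zero out the loss for transitions a head was not assigned.
    return (per_head * masks).sum() / masks.sum().clamp(min=1.0)
```

At the start of each episode a single head is sampled uniformly and followed greedily for the whole episode, which yields the temporally extended ("deep") exploration that per-step dithering strategies such as epsilon-greedy lack.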
