# Reinforcement Learning: An Introduction - BSTU Laboratory of

398 Pages · 2005 · 5.23 MB · English


Contents

Reinforcement Learning: An Introduction

Richard S. Sutton and Andrew G. Barto

A Bradford Book

The MIT Press, Cambridge, Massachusetts; London, England

In memory of A. Harry Klopf

- Contents
  - Preface
  - Series Foreword
  - Summary of Notation
- I The Problem
  - 1 Introduction
    - 1.1 Reinforcement Learning
    - 1.2 Examples
    - 1.3 Elements of Reinforcement Learning
    - 1.4 An Extended Example: Tic-Tac-Toe
    - 1.5 Summary
    - 1.6 History of Reinforcement Learning
    - 1.7 Bibliographical Remarks
  - 2 Evaluative Feedback
    - 2.1 An n-Armed Bandit Problem
    - 2.2 Action-Value Methods
    - 2.3 Softmax Action Selection
    - 2.4 Evaluation Versus Instruction
    - 2.5 Incremental Implementation
    - 2.6 Tracking a Nonstationary Problem
    - 2.7 Optimistic Initial Values
    - 2.8 Reinforcement Comparison
    - 2.9 Pursuit Methods
    - 2.10 Associative Search
    - 2.11 Conclusions
    - 2.12 Bibliographical and Historical Remarks
  - 3 The Reinforcement Learning Problem
    - 3.1 The Agent-Environment Interface
    - 3.2 Goals and Rewards
    - 3.3 Returns
    - 3.4 Unified Notation for Episodic and Continuing Tasks
    - 3.5 The Markov Property
    - 3.6 Markov Decision Processes
    - 3.7 Value Functions
    - 3.8 Optimal Value Functions
    - 3.9 Optimality and Approximation
    - 3.10 Summary
    - 3.11 Bibliographical and Historical Remarks
- II Elementary Solution Methods
  - 4 Dynamic Programming
    - 4.1 Policy Evaluation
    - 4.2 Policy Improvement
    - 4.3 Policy Iteration
    - 4.4 Value Iteration
    - 4.5 Asynchronous Dynamic Programming
    - 4.6 Generalized Policy Iteration
    - 4.7 Efficiency of Dynamic Programming
    - 4.8 Summary
    - 4.9 Bibliographical and Historical Remarks
  - 5 Monte Carlo Methods
    - 5.1 Monte Carlo Policy Evaluation
    - 5.2 Monte Carlo Estimation of Action Values
    - 5.3 Monte Carlo Control
    - 5.4 On-Policy Monte Carlo Control
    - 5.5 Evaluating One Policy While Following Another
    - 5.6 Off-Policy Monte Carlo Control
    - 5.7 Incremental Implementation
    - 5.8 Summary
    - 5.9 Bibliographical and Historical Remarks
  - 6 Temporal-Difference Learning
    - 6.1 TD Prediction

