Reinforcement Learning and Stochastic Optimization

The new book by Warren B. Powell, Reinforcement Learning and Stochastic Optimization: A Unified Framework for Sequential Decisions, is due to be published on 21 July 2022.

Sequential decision problems, which consist of the sequence “decision, information, decision, information, …”, are ubiquitous. They span virtually every human activity, including business applications, health (personal and public health, and medical decision making), energy, the sciences, all fields of engineering, finance, and e-commerce. This diversity of applications has attracted the attention of at least 15 distinct fields of research, which use eight distinct notational systems and have produced a vast array of analytical tools. A byproduct is that powerful tools developed in one community may be unknown to other communities.

Reinforcement Learning and Stochastic Optimization: A Unified Framework for Sequential Decisions offers a single canonical framework that can model any sequential decision problem using five core components: state variables, decision variables, exogenous information variables, a transition function, and an objective function. It highlights twelve types of uncertainty that might enter any model. It then organizes the diverse set of methods for making decisions, known as policies, into four fundamental classes that span every method suggested in the academic literature or used in practice.
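To make the five components concrete, here is a minimal sketch of a toy inventory problem written in that spirit. The problem itself, the prices, the uniform demand model, and every name in the code (State, exogenous_info, transition, contribution, order_up_to, simulate) are hypothetical illustrations, not taken from the book; the S_t, x_t, W_{t+1} labels in the comments are our assumption about notation. The objective here is the total contribution along a single simulated sample path, whereas a full objective would average over many paths and search over policies.

```python
import random
from dataclasses import dataclass

# Hypothetical toy inventory model; prices, demand, and names are assumptions.

@dataclass
class State:
    inventory: int  # state variable S_t: what we know at time t

def exogenous_info(rng: random.Random) -> int:
    """Exogenous information W_{t+1}: demand revealed after we decide (assumed uniform 0..10)."""
    return rng.randint(0, 10)

def transition(state: State, order: int, demand: int) -> State:
    """Transition function S_{t+1} = S^M(S_t, x_t, W_{t+1}); unmet demand is lost (assumption)."""
    return State(inventory=max(0, state.inventory + order - demand))

def contribution(state: State, order: int, demand: int,
                 price: float = 5.0, cost: float = 3.0) -> float:
    """Contribution C(S_t, x_t, W_{t+1}): revenue from sales minus ordering cost."""
    sales = min(state.inventory + order, demand)
    return price * sales - cost * order

def order_up_to(state: State, target: int = 8) -> int:
    """Decision x_t = X^pi(S_t): a simple illustrative policy."""
    return max(0, target - state.inventory)

def simulate(policy, horizon: int = 50, seed: int = 0) -> float:
    """Objective (one sample path): total contribution earned by a policy."""
    rng = random.Random(seed)
    state, total = State(inventory=0), 0.0
    for _ in range(horizon):
        x = policy(state)          # decision
        w = exogenous_info(rng)    # information
        total += contribution(state, x, w)
        state = transition(state, x, w)
    return total

print(simulate(order_up_to))
```

The loop mirrors the “decision, information, decision, information” pattern: the policy acts on the current state, then new information arrives and the transition function produces the next state.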

Reinforcement Learning and Stochastic Optimization is the first book to provide a balanced treatment of the different methods for modelling and solving sequential decision problems, following the style used by most books on machine learning, optimization, and simulation. The presentation is designed for readers who have taken a course in probability and statistics and who have an interest in modelling and applications. Linear programming is used occasionally for specific problem classes. The book is written both for readers who are new to the field and for those with some background in optimization under uncertainty. Many sections are marked with an asterisk (*) to indicate that they can be skipped on a first pass through the book.

Readers will find references to over 100 different applications, spanning pure learning problems, dynamic resource allocation problems, general state-dependent problems, and hybrid learning/resource allocation problems such as those that arose during the COVID pandemic. There are 370 exercises, organized into seven groups: review questions, modelling, computation, problem solving, theory, programming exercises, and a “diary problem” that the reader chooses at the beginning of the book and that serves as the basis for questions throughout the rest of the book.

From the author:

Reinforcement Learning and Stochastic Optimization is the first book to offer a unified framework for modeling and solving sequential decision problems (decision, information, decision, information, …). The byproduct of 40 years of research at Princeton University, the framework spans 15 different fields that deal with making decisions under uncertainty, including reinforcement learning and stochastic optimization in addition to stochastic search, dynamic programming, (stochastic) optimal control, simulation-optimization, and approximate dynamic programming, as well as active learning problems such as the multiarmed bandit problem. This is modern decision analytics, which is the next generation of artificial intelligence.

The book uses a “model first, then solve” strategy, where we begin with a standard canonical modeling framework that captures any sequential decision problem. This leads to two challenges: modeling uncertainty, and the design of policies for making decisions. The presentation organizes the wide diversity of methods for making decisions into four fundamental classes called policy function approximations (PFAs), cost function approximations (CFAs), value function approximations (VFAs), and direct lookahead approximations (DLAs). These four classes are universal, and cover any method proposed in the literature or used in practice, bringing visibility to the contributions of different communities.
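Of the four classes, a policy function approximation (PFA) is the easiest to show in a few lines. The minimal sketch below, assuming the same kind of hypothetical toy inventory problem (prices, demand range, and all function names are our own illustrations), uses an order-up-to rule with one tunable parameter theta and picks theta by simulating the objective. Each candidate theta sees the same demand samples (common random numbers) so the comparison is fair. A CFA, VFA, or DLA would replace the analytic rule with, respectively, a parameterized optimization, an approximation of the value of the downstream state, or an optimization over an approximate model of the future; those are not shown here.

```python
import random
import statistics

# Hypothetical toy inventory problem; prices, demand, and the theta grid are assumptions.
PRICE, COST = 5.0, 3.0

def pfa_order_up_to(inventory: int, theta: int) -> int:
    """PFA: an analytic rule mapping the state to a decision, parameterized by theta."""
    return max(0, theta - inventory)

def simulate_policy(theta: int, horizon: int, rng: random.Random) -> float:
    """Total contribution of the PFA along one simulated sample path."""
    inventory, total = 0, 0.0
    for _ in range(horizon):
        order = pfa_order_up_to(inventory, theta)
        demand = rng.randint(0, 10)              # assumed demand model
        sales = min(inventory + order, demand)
        total += PRICE * sales - COST * order
        inventory = inventory + order - sales    # unmet demand is lost
    return total

def tune_theta(thetas, n_paths: int = 200, horizon: int = 50, seed: int = 42) -> int:
    """Policy search: return the theta with the best average simulated contribution."""
    def estimate(theta: int) -> float:
        rng = random.Random(seed)                # same demand samples for every theta
        return statistics.mean(
            simulate_policy(theta, horizon, rng) for _ in range(n_paths))
    return max(thetas, key=estimate)

best = tune_theta(range(0, 16))
print("best order-up-to level:", best)
```

Searching over a small grid keeps the sketch transparent; with more parameters, the same simulate-and-compare loop would typically be paired with a stochastic search method instead of enumeration.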

The book has a complete discussion of online learning models and algorithms that are used for approximating value functions, policies, response surfaces for stochastic search, and estimating system models. We cover both passive and active learning, and therefore provide a bridge from predictive to prescriptive analytics.
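The distinction between passive and active learning can be illustrated with a small sketch, assuming a hypothetical setting with a few alternatives whose mean performance is unknown and observations have known Gaussian noise. Updating an estimate recursively from each new observation is the passive part; choosing which alternative to observe next is the active part. The update below uses a stepsize of 1/n, which reproduces the sample mean, and the selection rule is an interval-estimation-style score (estimate plus z standard errors). The true means, the noise level, z, and all function names are illustrative assumptions, not the book's specific algorithms.

```python
import math
import random

# Hypothetical learning problem; true means, sigma, and z are assumptions.

def update_estimate(mean: float, n: int, observation: float) -> float:
    """Passive learning: online update with stepsize 1/n (equals the sample mean)."""
    alpha = 1.0 / n
    return (1 - alpha) * mean + alpha * observation

def interval_estimation(means, counts, sigma: float, z: float) -> int:
    """Active learning: observe the alternative with the highest optimistic estimate."""
    def score(i: int) -> float:
        if counts[i] == 0:
            return math.inf                      # try every alternative once
        return means[i] + z * sigma / math.sqrt(counts[i])
    return max(range(len(means)), key=score)

def run(true_means, budget: int = 500, sigma: float = 1.0, z: float = 2.0, seed: int = 0):
    rng = random.Random(seed)
    means = [0.0] * len(true_means)
    counts = [0] * len(true_means)
    for _ in range(budget):
        i = interval_estimation(means, counts, sigma, z)    # decision
        y = rng.gauss(true_means[i], sigma)                 # information
        counts[i] += 1
        means[i] = update_estimate(means[i], counts[i], y)  # learning
    return means, counts

means, counts = run([1.0, 1.5, 2.0])
print("estimates:", [round(m, 2) for m in means], "samples:", counts)
```

Setting z = 0 turns the rule into pure exploitation of the current estimates; larger z spends more of the budget exploring alternatives that have been observed only a few times.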

The presentation illustrates the universal framework with dozens of application domains, including transportation and logistics, energy systems, e-commerce, health, engineering, economics, finance, laboratory experimentation, and supply chain management. The book touches on over 100 different applications to motivate the modeling framework and demonstrate the handling of different types of decisions, including binary (optimal stopping, A/B testing), discrete choices (drugs, materials, products), continuous choices (inventories, prices, concentrations), as well as continuous or discrete vectors for complex resource allocation problems.

The book is designed for people coming from any of a wide range of application domains with an introductory course in probability and statistics who want to develop their own models and algorithms. There are selected topics that use linear programming, but the presentation does not require formal training in math programming. The material is supported by over 350 exercises organized into seven classes: review questions, modeling questions, computational exercises, problem solving questions, theory questions, questions that draw on a companion volume, Sequential Decision Analytics and Modeling (which is accompanied by a library of Python modules on GitHub), and a “diary” problem chosen by the reader in chapter 1 that serves as a basis for questions from each chapter.

For more information, please visit the book webpage.
