The use of estimation techniques on stochastic models to solve control problems is an emerging paradigm that falls under the rubric of Active Inference (AI) and Control as Inference (CAI). In this work, we use probability propagation on factor graphs to show that various algorithms proposed in the literature can be seen as specific composition rules in a factor graph. We show how this unified approach, presented both in probability space and in log of the probability space, provides a very general framework that includes the Sum-product, the Max-product, Dynamic programming and mixed Reward/Entropy criteria-based algorithms. The framework also expands algorithmic design options that lead to new smoother or sharper policy distributions. We propose original recursions such as: a generalized Sum/Max-product algorithm, a Smooth Dynamic programming algorithm and a modified versions of the Reward/Entropy algorithm. The discussion is carried over with reference to a path planning problem where the recursions that arise from various cost functions, although they may appear similar in scope, bear noticeable differences. We provide a comprehensive table of composition rules and a comparison through simulations, first on a synthetic small grid with a single goal with obstacles, and then on a grid extrapolated from a real-world scene with multiple goals and a semantic map.
A Unifying View of Estimation and Control Using Belief Propagation With Application to Path Planning
Palmieri, F.;Di Gennaro, G;
2022
Abstract
The use of estimation techniques on stochastic models to solve control problems is an emerging paradigm that falls under the rubric of Active Inference (AI) and Control as Inference (CAI). In this work, we use probability propagation on factor graphs to show that various algorithms proposed in the literature can be seen as specific composition rules in a factor graph. We show how this unified approach, presented both in probability space and in log of the probability space, provides a very general framework that includes the Sum-product, the Max-product, Dynamic programming and mixed Reward/Entropy criteria-based algorithms. The framework also expands algorithmic design options that lead to new smoother or sharper policy distributions. We propose original recursions such as: a generalized Sum/Max-product algorithm, a Smooth Dynamic programming algorithm and a modified versions of the Reward/Entropy algorithm. The discussion is carried over with reference to a path planning problem where the recursions that arise from various cost functions, although they may appear similar in scope, bear noticeable differences. We provide a comprehensive table of composition rules and a comparison through simulations, first on a synthetic small grid with a single goal with obstacles, and then on a grid extrapolated from a real-world scene with multiple goals and a semantic map.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.