Constrained Reinforcement Learning: A New Approach
Reinforcement learning (RL) is a powerful technique, but its application in real-world environments, such as robotics and autonomous driving, requires particular attention to safety. Constrained Markov Decision Processes (CMDPs) are a tool for enforcing safety constraints while optimizing performance.
A recent study presents a new algorithm for CMDPs that addresses the limitations of existing methods, which typically suffer either significant safety violations during learning or high sample complexity. The proposed primal-dual algorithm balances regret against constraint violations, drawing on techniques from online RL and constrained optimization.
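The primal-dual mechanism can be illustrated on a toy one-step constrained problem; this is a minimal sketch of the general technique, not the paper's exact algorithm, and the action names, rewards, costs, and step sizes below are illustrative assumptions. The primal step acts greedily on the Lagrangian r(a) − λ·c(a); the dual step adjusts the multiplier λ toward the constraint boundary.

```python
# Hypothetical one-step constrained problem: maximize reward subject
# to E[cost] <= budget. All values below are made up for illustration.
rewards = {"risky": 1.0, "safe": 0.0}   # reward per action
costs   = {"risky": 1.0, "safe": 0.0}   # safety cost per action
budget  = 0.5                           # constraint threshold

lam, eta, T = 0.0, 0.05, 2000           # dual variable, step size, iterations
total_reward = total_cost = 0.0

for _ in range(T):
    # Primal step: act greedily on the Lagrangian r(a) - lam * c(a).
    a = max(rewards, key=lambda a: rewards[a] - lam * costs[a])
    total_reward += rewards[a]
    total_cost += costs[a]
    # Dual step: projected gradient ascent on the constraint violation.
    lam = max(0.0, lam + eta * (costs[a] - budget))

avg_reward, avg_cost = total_reward / T, total_cost / T
print(f"avg reward = {avg_reward:.2f}, avg cost = {avg_cost:.2f}")
```

The averaged play converges to the optimal mixed policy (here, roughly a 50/50 mix, with average cost near the budget of 0.5), which mirrors how primal-dual methods trade off reward against constraint satisfaction over episodes.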
Algorithm Details and Results
The algorithm was analyzed in two settings: relaxed feasibility (where small violations are allowed) and strict feasibility (no violations allowed). The results show that, under relaxed feasibility, the algorithm returns an ε-optimal policy with an ε-bounded violation, requiring a number of learning episodes of the order of $\tilde{O}\left(\frac{SAH^3}{\varepsilon^2}\right)$. Under strict feasibility, the algorithm guarantees an ε-optimal policy with zero violations, with a sample complexity of $\tilde{O}\left(\frac{SAH^5}{\varepsilon^2\zeta^2}\right)$, where ζ is a problem-dependent Slater constant.
These results suggest that learning CMDPs in an online setting can be comparable to learning with a generative model and is no more challenging than learning unconstrained MDPs when small violations are allowed.