Safe Offline Reinforcement Learning: A New Approach

Reinforcement learning (RL) is increasingly deployed in real-world applications, where it must balance reward maximization against safety constraints. A new study addresses this problem in the setting of safe offline reinforcement learning, focusing on constraints over cumulative cost.
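The setting described above can be made concrete with a small sketch: each step of a trajectory incurs a cost alongside the usual reward, and a policy counts as safe when its cumulative (discounted) cost stays under a budget. The function names, discount factor, and budget below are illustrative assumptions, not the paper's formulation.

```python
# Illustrative sketch of a cumulative cost constraint (assumed setup, not
# the paper's implementation): a trajectory is safe if its discounted
# per-step costs sum to no more than a fixed budget.

def discounted_sum(values, gamma=0.99):
    """Discounted sum of a per-step signal along one trajectory."""
    total = 0.0
    for t, v in enumerate(values):
        total += (gamma ** t) * v
    return total

def is_safe(trajectory_costs, cost_budget, gamma=0.99):
    """The cumulative cost constraint: discounted cost <= budget."""
    return discounted_sum(trajectory_costs, gamma) <= cost_budget

# A short trajectory with sparse costs against a budget of 1.0.
costs = [0.0, 0.5, 0.0, 0.3]
print(is_safe(costs, cost_budget=1.0))  # -> True (total ~0.79)
```

The same check with a constant per-step cost over a long horizon would fail, which is what makes the constraint cumulative rather than per-step.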

Safety-Conditioned Reachability

The research defines a safety-conditioned reachability set, which decouples reward maximization from the cost constraints. This decoupling avoids the unstable optimization typical of methods that enforce hard constraints directly. The result is a safe offline RL algorithm that learns a safe policy from a fixed dataset, without any interaction with the environment.
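One way to picture this decoupling is a two-stage action selection: a learned feasibility score decides which actions keep the agent inside the safe set, and reward maximization runs only over the actions that pass that check. The tabular "critics", threshold, and fallback rule below are illustrative stand-ins, not the paper's models.

```python
# Hedged sketch of decoupled safety and reward (assumed design, not the
# paper's algorithm): filter actions by a feasibility score first, then
# maximize estimated reward among the survivors.

def select_action(actions, reward_q, feasibility_q, threshold=0.0):
    """Pick the highest-reward action among those judged feasible.

    reward_q[a]      -- estimated return of action a (e.g. fit offline)
    feasibility_q[a] -- estimated safety margin; feasible if >= threshold
    """
    feasible = [a for a in actions if feasibility_q[a] >= threshold]
    if not feasible:
        # No safe option: fall back to the least-unsafe action
        # (one possible design choice for this sketch).
        return max(actions, key=lambda a: feasibility_q[a])
    return max(feasible, key=lambda a: reward_q[a])

# Toy example: "b" has the best reward but fails the safety check.
actions = ["a", "b", "c"]
reward_q = {"a": 1.0, "b": 5.0, "c": 2.0}
feasibility_q = {"a": 0.2, "b": -0.5, "c": 0.1}
print(select_action(actions, reward_q, feasibility_q))  # -> "c"
```

Because the safety filter never trades off against reward inside a single objective, there is no Lagrange-style multiplier to tune, which is the intuition behind the stability claim above.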

Performance and Real-World Applications

Experiments on standard benchmarks and on a real-world maritime navigation use case show that the proposed method matches or outperforms existing solutions while maintaining safety. This makes it particularly well suited to applications where safety is a fundamental requirement.