Conservative Offline RL with Uncertainty-Aware Policy Improvement
We study a conservative offline reinforcement learning algorithm with uncertainty-aware policy updates, evaluate it on standard benchmarks, and analyze failure modes.
Originator: Admin Curator