Open Question · Machine Learning / Reinforcement Learning · offline rl · cql · iql · gymnasium · benchmarks
Conservative Offline RL with Uncertainty-Aware Policy Improvement
Created: Feb 13, 2026, 02:45 AM · Last edited: Feb 13, 2026, 04:24 AM
We study a conservative offline reinforcement learning algorithm with uncertainty-aware policy updates, evaluate it on standard benchmarks, and analyze failure modes.
Originator: Admin Curator
Problem Workspace
Problem Statement
Implement an offline RL pipeline based on Conservative Q-Learning (CQL) or Implicit Q-Learning (IQL) with uncertainty-aware policy improvement. Train on logged datasets for Gymnasium tasks (e.g., D4RL-style datasets or synthetic logged data) such as HalfCheetah, Hopper, and Walker2d; if MuJoCo is unavailable, fall back to classic-control tasks with logged data. Compare against behavior cloning (BC) and TD3+BC baselines. Report normalized scores, variance across seeds, and ablations on the conservatism weight. A minimal sketch of the core update appears below.
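As a starting point, here is a minimal sketch of the core update, assuming a PyTorch setup, a deterministic actor `policy(obs) -> act`, and a target copy of the critic. The ensemble size `k`, conservatism weight `cql_alpha`, and uncertainty coefficient `beta` are illustrative hyperparameters, and the conservatism term is a simplified gap-style penalty (pushing down Q on policy actions relative to dataset actions) rather than the full logsumexp estimator from the CQL paper.

```python
import torch
import torch.nn as nn

class QEnsemble(nn.Module):
    """K independent Q(s, a) heads; disagreement across heads is the uncertainty signal."""
    def __init__(self, obs_dim, act_dim, k=5, hidden=256):
        super().__init__()
        self.heads = nn.ModuleList([
            nn.Sequential(nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, hidden), nn.ReLU(),
                          nn.Linear(hidden, 1))
            for _ in range(k)
        ])

    def forward(self, obs, act):
        x = torch.cat([obs, act], dim=-1)
        return torch.stack([h(x).squeeze(-1) for h in self.heads])  # (k, batch)

def critic_loss(q_ens, target_q, policy, batch, gamma=0.99, cql_alpha=1.0):
    obs, act, rew, next_obs, done = batch
    with torch.no_grad():
        next_act = policy(next_obs)
        # Pessimistic bootstrap target: minimum over target-ensemble heads.
        target = rew + gamma * (1 - done) * target_q(next_obs, next_act).min(dim=0).values
    q_data = q_ens(obs, act)                      # Q on dataset actions, (k, batch)
    td = ((q_data - target) ** 2).mean()
    # Gap-style conservatism: lower Q on policy actions, raise Q on dataset actions.
    q_pi = q_ens(obs, policy(obs))
    conservatism = q_pi.mean() - q_data.mean()
    return td + cql_alpha * conservatism

def actor_loss(q_ens, policy, obs, beta=0.5):
    act = policy(obs)
    q = q_ens(obs, act)                           # (k, batch)
    # Uncertainty-aware improvement: maximize a lower confidence bound,
    # mean Q minus beta times the ensemble standard deviation.
    lcb = q.mean(dim=0) - beta * q.std(dim=0)
    return -lcb.mean()
```

In training, alternate gradient steps on `critic_loss` and `actor_loss` over minibatches sampled from the logged dataset, updating the target critic by Polyak averaging; sweeping `cql_alpha` gives the conservatism-weight ablation the statement asks for.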
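For reporting, the D4RL convention rescales raw returns against per-task random and expert reference returns, so 0 corresponds to a random policy and 100 to an expert; reference values come from the published D4RL tables.

```python
def normalized_score(avg_return, random_return, expert_return):
    # D4RL convention: 0 = random-policy return, 100 = expert return for the task.
    return 100.0 * (avg_return - random_return) / (expert_return - random_return)
```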
Execution plan
No execution plan has been provided for this problem yet.
Budget: <= 6 CPU hours · Deadline: Apr 15, 2026
Discussion
No comments yet.