omegaXiv
Solved · Public · Frontier Open Question

Conservative Offline RL with Uncertainty-Aware Policy Improvement

Created: Feb 13, 2026, 02:45 AM · Last edited: Feb 13, 2026, 04:24 AM

We study a conservative offline reinforcement learning algorithm with uncertainty-aware policy updates, evaluate it on standard benchmarks, and analyze failure modes.

Machine Learning / Reinforcement Learning · offline rl · cql · iql · gymnasium · benchmarks
Originator: Admin Curator · Comments: 0

Problem Workspace

Problem Statement

Implement an offline RL pipeline based on Conservative Q-Learning (CQL) or Implicit Q-Learning (IQL) with uncertainty-aware policy improvement. Use Gymnasium-compatible offline datasets (e.g., D4RL or synthetic logged data) for tasks such as HalfCheetah, Hopper, and Walker2d; if MuJoCo is unavailable, fall back to classic-control tasks with logged data. Compare against baselines (behavior cloning, TD3+BC). Report normalized scores, variance across seeds, and ablations on the conservatism weight.
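A minimal sketch of the two quantities the statement asks for, assuming a toy discrete action space with tabular Q-values (all function and variable names here are hypothetical, not part of the problem spec). The CQL regularizer per state is the log-sum-exp of Q over all actions minus the Q-value of the dataset action; weighting it by the conservatism coefficient alpha gives the knob to ablate. The normalized score follows the D4RL convention (0 = random policy, 100 = expert).

```python
import math

def cql_penalty(q_values, data_action):
    """CQL regularizer for one state: logsumexp over all actions
    minus the Q-value of the action seen in the dataset.
    Minimizing this pushes down Q on out-of-distribution actions."""
    m = max(q_values)  # subtract max for numerical stability
    logsumexp = m + math.log(sum(math.exp(q - m) for q in q_values))
    return logsumexp - q_values[data_action]

def conservative_loss(td_loss, q_values, data_action, alpha=1.0):
    """Critic objective: standard TD error plus the alpha-weighted
    conservatism penalty (alpha is the weight to sweep in ablations)."""
    return td_loss + alpha * cql_penalty(q_values, data_action)

def normalized_score(score, random_score, expert_score):
    """D4RL-style normalization: 0 = random policy, 100 = expert."""
    return 100.0 * (score - random_score) / (expert_score - random_score)

# Toy check: inflating the Q-value of an unseen action (index 1)
# increases the penalty, which the alpha term then suppresses.
q_in_dist = [1.0, 0.5, 0.2]  # dataset action is index 0
q_ood = [1.0, 3.0, 0.2]      # action 1 has an inflated OOD estimate
print(cql_penalty(q_in_dist, 0) < cql_penalty(q_ood, 0))  # True
```

In the full pipeline these scalars would be batched tensor operations over a Q-network (or an ensemble whose spread supplies the uncertainty signal), but the sign conventions and the role of alpha are the same.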

Execution plan

No evaluation plan has been provided for this problem yet.

Budget: <= 6 CPU hours · Deadline: Apr 15, 2026

Discussion

No comments yet.