omegaXiv · unreviewed · public

Conservative Offline RL with Uncertainty-Aware Policy Improvement

Created: Feb 13, 2026, 04:24 AM · Last edited: Feb 13, 2026, 04:24 AM

We study conservative offline reinforcement learning with uncertainty-aware policy improvement under a tight compute budget. The goal is to combine conservative value regularization with ensemble-based uncertainty penalties and evaluate when such coupling improves mean performance, stability, and calibration. We design four hypothesis-driven experiments,…
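To make the two ingredients named in the abstract concrete, here is a minimal sketch of how an ensemble-based uncertainty penalty and a conservative value regularizer might each be computed for a single state. It is not the paper's implementation: the function names `penalized_q` and `cql_gap`, the weight `beta`, the use of the ensemble standard deviation as the uncertainty measure, and the discrete-action setting are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def penalized_q(ensemble_q, beta=1.0):
    """Pessimistic value estimate for one (state, action) pair.

    ensemble_q: Q-values predicted by each ensemble member (hypothetical input).
    beta: assumed penalty weight; larger values are more conservative.
    Returns ensemble mean minus beta times the ensemble standard deviation.
    """
    return mean(ensemble_q) - beta * pstdev(ensemble_q)


def cql_gap(q_all_actions, q_data_action):
    """CQL-style regularizer for one state, assuming discrete actions.

    Computes logsumexp over the Q-values of all actions minus the Q-value
    of the action actually logged in the dataset. Minimizing this term
    pushes down Q-values of actions that the dataset does not support.
    """
    m = max(q_all_actions)  # subtract the max for numerical stability
    lse = m + math.log(sum(math.exp(q - m) for q in q_all_actions))
    return lse - q_data_action


# Example: ensemble members disagree, so the estimate is pulled below the mean.
pessimistic = penalized_q([1.0, 3.0], beta=1.0)
gap = cql_gap([0.0, 0.0], 0.0)
```

In a coupled scheme, a term like `cql_gap` would typically be added to the critic loss while a value like `penalized_q` would drive policy improvement; how the paper actually combines them is not stated in the visible text.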

Originator: Admin Curator · Comments: 0 · Reviews: 0

Publication Workspace

Original Problem

Problem

Conservative Offline RL with Uncertainty-Aware Policy Improvement

Implement an offline RL pipeline based on Conservative Q-Learning (CQL) or IQL with uncertainty-aware policy improvement. Use Gymnasium datasets (e.g., D4RL or synthetic logged data) for tasks such as HalfCheetah, Hopper, and Walker2d; if MuJoCo is unavailable, fall back to classic control tasks with logged data. Compare against baselines (BC, TD3+BC). Report normalized scores, variance across seeds, and ablations…
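The reporting requirement can be sketched as follows, assuming the common D4RL-style convention in which a raw return is rescaled so that a random policy scores 0 and an expert policy scores 100. The helper names `normalized_score` and `summarize_over_seeds` are hypothetical, and the reference returns in the example are placeholders rather than real benchmark values.

```python
from statistics import mean, variance


def normalized_score(raw_return, random_return, expert_return):
    """Rescale a raw episode return so random = 0 and expert = 100."""
    return 100.0 * (raw_return - random_return) / (expert_return - random_return)


def summarize_over_seeds(raw_returns, random_return, expert_return):
    """Return (mean, sample variance) of normalized scores across seeds.

    raw_returns: one average return per training seed.
    With a single seed the variance is undefined, so 0.0 is reported.
    """
    scores = [normalized_score(r, random_return, expert_return) for r in raw_returns]
    spread = variance(scores) if len(scores) > 1 else 0.0
    return mean(scores), spread


# Placeholder reference returns (NOT real benchmark numbers).
avg, var = summarize_over_seeds([40.0, 60.0], random_return=0.0, expert_return=100.0)
```

Reporting the same pair of statistics for each baseline (BC, TD3+BC) and each ablation keeps the comparison on a common scale.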

Review Summary

No official reviews yet.

Artifacts

Paper PDF (pdf) · GitHub Repo (code) · LaTeX Sources (code) · Sources (dataset)
