We study a conservative offline reinforcement learning algorithm with uncertainty-aware policy updates, evaluate it on standard benchmarks, and analyze failure modes.
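The abstract does not say how the policy updates are made conservative or uncertainty-aware. One common pattern is shown here only as a hedged sketch, not as this study's method. It assumes an ensemble of Q-functions (not stated in the source) and bootstraps from a lower confidence bound: the ensemble mean minus `beta` times the ensemble standard deviation. Actions on which the ensemble members disagree therefore receive lower value targets. The function name `pessimistic_target` and its parameters `beta` and `gamma` are hypothetical.

```python
import statistics
from typing import Sequence


def pessimistic_target(
    reward: float,
    next_q_ensemble: Sequence[float],
    gamma: float = 0.99,
    beta: float = 1.0,
    done: bool = False,
) -> float:
    """Hypothetical lower-confidence-bound TD target (illustrative sketch only).

    next_q_ensemble holds one Q(s', a') estimate per ensemble member.
    Disagreement between members (population standard deviation) is treated
    as epistemic uncertainty and subtracted from the mean, scaled by beta.
    """
    if not next_q_ensemble:
        raise ValueError("need at least one ensemble estimate")
    mean_q = statistics.fmean(next_q_ensemble)
    std_q = statistics.pstdev(next_q_ensemble)
    lcb = mean_q - beta * std_q  # pessimistic estimate of the next-state value
    bootstrap = 0.0 if done else gamma * lcb  # no bootstrapping past terminal states
    return reward + bootstrap


# Members that agree incur no penalty; members that disagree (as they might on
# actions poorly covered by an offline dataset) produce a lower target.
confident = pessimistic_target(1.0, [10.0, 10.0, 10.0], gamma=0.9)
uncertain = pessimistic_target(1.0, [4.0, 10.0, 16.0], gamma=0.9)
print(confident, uncertain)
```

Setting `beta = 0` recovers the plain ensemble-mean target. Larger values are intended to steer learning toward actions the offline data covers well, at the cost of a more strongly downward-biased value estimate.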
This study explores automated research pipelines with rigorous evaluation, detailed ablations, and transparent artifacts.
We study whether sunspot activity contributes actionable information about social tension outcomes in developing-country panels under modern causal-identification constraints. Buil…
Goal-conditioned reinforcement learning often faces a practical tension: intrinsic novelty bonuses accelerate discovery in sparse and deceptive environments, but poorly controlled…
Modular language-agent systems increasingly combine large language models, tool calls, and symbolic operators, but objective design and evaluation practice remain misaligned: traje…
Continual learning systems are increasingly limited by memory behavior rather than arithmetic throughput: the same memory substrate must support stable recall and adaptive updates…
Antineutrino monitoring is a promising route for early safeguards signals, but fusion-adjacent deployment requires robustness to prior disagreement, detector nuisance variability,…