Curiosity-Conditioned Goal-Optimal Reinforcement Learning (CCGO-RL)
We propose a reinforcement learning methodology that combines intrinsic novelty-driven exploration with explicit goal conditioning and extrinsic reward optimization. The core motivation is to avoid myopic exploitation while preserving path optimality for objective goals. The approach uses a dual-value design: one stream for task-optimal return and one for curiosity, with adaptive weighting that decays as goal confidence improves. We expect this to improve sample efficiency, robustness, and discovery of better trajectories in sparse- and deceptive-reward settings. The impact is a practical framework for agents that remain purpose-driven while retaining productive curiosity.
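The dual-value mixing described above can be sketched minimally in Python. This is an illustrative assumption about how the adaptive weighting might work, not the proposal's specified rule: the names `mixed_reward`, `goal_confidence`, and `beta0` are hypothetical, and the curiosity weight here simply decays linearly as goal confidence grows.

```python
def mixed_reward(r_ext: float, r_int: float,
                 goal_confidence: float, beta0: float = 1.0) -> float:
    """Blend extrinsic and intrinsic rewards.

    A hypothetical sketch: the curiosity weight `beta` starts at `beta0`
    and decays toward zero as `goal_confidence` (in [0, 1]) improves,
    so the agent shifts from exploration to strict extrinsic optimization.
    """
    beta = beta0 * (1.0 - goal_confidence)
    return r_ext + beta * r_int
```

With full goal confidence the intrinsic term vanishes entirely, which is one simple way to realize the "anneal to near zero" behavior tested in the acceptance criteria.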
Problem Workspace
Problem Statement
This project develops a new RL algorithmic framework where intrinsic motivation is not a standalone objective but a controlled exploration mechanism inside a goal-conditioned optimal control loop. The method will encode goals explicitly (e.g., goal vectors or target-conditioned policies), estimate novelty through state-action visitation uncertainty or prediction error, and combine intrinsic and extrinsic signals through an adaptive scheduler. The scheduler must prioritize curiosity during uncertainty and shift toward strict extrinsic optimization as confidence and performance stabilize. Scope includes single-agent continuous and discrete control, sparse-reward navigation/manipulation-style environments, and deceptive local-optima tasks where exploration quality is decisive. Key constraints are preserving asymptotic task optimality, avoiding intrinsic-reward hacking, and preventing instability from non-stationary reward mixing. The design must remain compatible…
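One of the novelty estimators mentioned above, state-action visitation counting, can be sketched as follows. This is a minimal illustrative example, not the project's implementation: the class name `CountNovelty` and the tuple-key discretization are assumptions, and the `1/sqrt(N(s, a))` bonus is the standard count-based form, which would need a pseudo-count or density model for continuous control.

```python
from collections import defaultdict
import math


class CountNovelty:
    """Count-based novelty bonus: reward 1 / sqrt(N(s, a)).

    A hypothetical sketch assuming states are discretizable into
    hashable tuples; rarely visited state-action pairs earn larger
    intrinsic rewards, and the bonus decays with repeated visits.
    """

    def __init__(self):
        self.counts = defaultdict(int)

    def reward(self, state, action) -> float:
        key = (tuple(state), action)
        self.counts[key] += 1  # update visitation count first
        return 1.0 / math.sqrt(self.counts[key])
```

A prediction-error variant would replace the count table with a learned forward model and use its loss on (s, a) as the bonus; both plug into the same adaptive scheduler slot.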
Execution plan
- Metrics: final success rate, normalized episodic return, sample efficiency (return vs. environment steps), first-success time, state coverage, regret, and stability variance across seeds.
- Baselines: goal-conditioned RL without intrinsic reward, curiosity-only exploration methods, entropy-regularized RL, and a static weighted intrinsic+extrinsic combination.
- Data/splits: fixed train/validation/test environment seeds; evaluate on both seen and held-out seeds, plus held-out task variants with altered layouts/dynamics for robustness.
- Acceptance criteria: final return at least non-inferior to the best goal-conditioned baseline; a 10-20% or greater improvement in sample efficiency on sparse/deceptive tasks; statistically significant gains in first-success time and coverage across at least 5 seeds; and no performance collapse when the intrinsic reward is annealed to near zero.
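Two of the metrics above, first-success time and state coverage, are straightforward to compute from per-episode logs. The helper names and input formats below are assumptions for illustration, not a prescribed evaluation harness.

```python
def first_success_time(success_flags):
    """Index of the first successful episode, or None if never achieved.

    `success_flags` is an assumed per-episode boolean log.
    """
    for t, ok in enumerate(success_flags):
        if ok:
            return t
    return None


def state_coverage(visited_states, num_states: int) -> float:
    """Fraction of distinct (discretized) states visited.

    Assumes states have been binned into hashable identifiers.
    """
    return len(set(visited_states)) / num_states
```

In practice each metric would be aggregated over the 5+ evaluation seeds before testing for statistical significance against the baselines.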
Related work
- Goal-conditioned reinforcement learning
- Intrinsic motivation and curiosity-driven exploration
- Count-based and prediction-error novelty methods
- Maximum-entropy RL
- Hierarchical RL for exploration