Continual learning systems are increasingly deployed in settings where data distributions evolve and labels, environments, and downstream requirements are themselves nonstationary. In these settings, the practical failure modes include not only catastrophic forgetting but also gradual loss of plasticity, dead-unit accumulation, and unstable internal statistics that make optimization increasingly brittle over long horizons. We study a model class in which continual-learning inductive bias is encoded directly in the activation function rather than in replay buffers, explicit task identifiers, or per-task masks. The proposed mechanism uses a dual-timescale activation parameterization: fast parameters adapt to novelty, while slow anchors preserve utility-weighted structure. We formalize the problem with explicit decision variables, feasible sets, and an online surrogate objective, and we provide two formal results: a bounded-moment theorem under bounded-drift and projection assumptions, and a lower-bound proposition establishing an impossibility region for static memoryless activations under persistent alternating conflict. We then test the formal chain with symbolic checks and synthetic continual-learning regimes designed to expose stress and boundary behavior. Across executed regimes, the method achieves bounded-variance compliance above 0.95 under bounded drift, reduces forgetting by roughly 49\% relative to a GELU baseline, and exhibits a lower conflict-regret slope than static baselines. We also report caveats: two planned experiment tracks, covering replay-free competitiveness breadth and compositional probes, have not yet been executed, so the associated claims are scoped as open rather than confirmed. The broader implication is that activation-level state can provide a lightweight, task-agnostic route toward continual adaptation, while keeping the formal assumptions explicit and testable.
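To make the mechanism concrete, the following is a minimal sketch of one possible dual-timescale activation update consistent with the description above; the symbols $\theta_{\mathrm{f}}$, $\theta_{\mathrm{s}}$, $\eta$, $\tau$, $u_t$, and $\Pi_{\Theta}$ are illustrative assumptions rather than the paper's exact notation:
\begin{align*}
  \phi_t(x) &= \sigma\!\big(x;\; \theta_{\mathrm{f},t} + \theta_{\mathrm{s},t}\big),
    && \text{(activation with fast + slow parameters)} \\
  \theta_{\mathrm{f},t+1} &= \Pi_{\Theta}\!\big(\theta_{\mathrm{f},t} - \eta\, \nabla_{\theta_{\mathrm{f}}} \ell_t\big),
    && \text{(fast update on the online surrogate loss } \ell_t\text{)} \\
  \theta_{\mathrm{s},t+1} &= (1-\tau)\,\theta_{\mathrm{s},t} + \tau\, u_t \odot \theta_{\mathrm{f},t},
    \quad 0 < \tau \ll 1. && \text{(slow, utility-weighted anchor)}
\end{align*}
Under this reading, the fast component $\theta_{\mathrm{f}}$ tracks novelty through the per-step surrogate loss $\ell_t$; the slow anchor $\theta_{\mathrm{s}}$ accumulates structure weighted by a utility signal $u_t$ on the slower timescale $\tau$; and the projection $\Pi_{\Theta}$ onto a feasible set $\Theta$ corresponds to the projection assumption invoked in the bounded-moment theorem.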