Pathway 2 — Algorithmic Paradigm Shifts

The thesis

A "true" paradigm shift = a dramatic architectural or training change that isn't just a smooth evolution of pretraining + fine-tuning + test-time scaling.

Why this pathway is the hardest to predict

By definition, paradigm shifts are unpredictable. The paper is upfront about this: most of the section covers evolutions of the current paradigm (which the authors consider insufficient for AGI on their own, though they may stack toward it). True shifts get only speculation.

The current-paradigm "missing ingredients" the field is actively chasing

These are the smooth evolutions, not the shifts:
- (Near-)unlimited context via recurrence, working memory, or retrieval
- Continual learning without catastrophic forgetting
- Robust decision-making in interactive environments (training models as agents)
- Tool-augmented planning that offloads subtasks to specialized engines (calculators, code interpreters, simulators)
- World models: internal causal representations that let agents simulate futures and plan
- Test-time scaling as a way to decouple capability from training-time scale
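The test-time-scaling item can be made concrete with a back-of-the-envelope calculation. The sketch below assumes best-of-N sampling, independent samples, and a perfect verifier that always recognizes a correct answer; these are simplifying assumptions for illustration, not claims from the paper, and `best_of_n_success` is a hypothetical name. Under them, a fixed model that succeeds with probability p per sample succeeds with probability 1 - (1 - p)^N once N samples are drawn, so capability rises with inference compute while the weights stay the same.

```python
def best_of_n_success(p: float, n: int) -> float:
    """Probability that at least one of n independent samples is correct.

    Assumes (hypothetically) a perfect verifier that always picks a correct
    sample whenever one exists.
    """
    if not 0.0 <= p <= 1.0 or n < 1:
        raise ValueError("need 0 <= p <= 1 and n >= 1")
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    # Same model (p fixed), more inference compute (n grows).
    for n in (1, 4, 16, 64):
        print(f"n={n:>2}  success={best_of_n_success(0.1, n):.3f}")
```

With p = 0.1, sixteen samples already push the success rate above 0.8. Real systems fall short of this bound because verifiers are imperfect and samples from one model are correlated.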

Candidate true paradigm shifts (admitted speculation)

Spiking neurons / neuromorphic hardware
Analog computing
RL-based pretraining replacing log-loss pretraining
Explicit world models replacing implicit ones
Architectures that overcome the complexity-theoretic limits of transformers (à la Neural Turing Machine attempts)
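To make "explicit world model" concrete, here is a minimal sketch of an agent that plans by simulating futures inside a model before acting. It assumes a toy one-dimensional environment; `world_model`, `plan`, `GOAL`, and `ACTIONS` are hypothetical names, and the model is hand-written rather than learned. Planning is exhaustive search over short action sequences, executing only the first action and then replanning (a receding-horizon, model-predictive-control style loop).

```python
from itertools import product
from typing import Tuple

ACTIONS = (-1, 0, 1)  # move left, stay, move right
GOAL = 5  # hypothetical target position


def world_model(state: int, action: int) -> Tuple[int, float]:
    """Hypothetical stand-in for a learned model: predict next state and reward."""
    next_state = state + action
    return next_state, -abs(GOAL - next_state)


def plan(state: int, horizon: int = 3) -> int:
    """Imagine every action sequence in the model; return the first action of the best one."""
    best_return, best_first = float("-inf"), 0
    for seq in product(ACTIONS, repeat=horizon):
        s, total = state, 0.0
        for a in seq:
            s, r = world_model(s, a)  # simulated step, no real environment call
            total += r
        if total > best_return:
            best_return, best_first = total, seq[0]
    return best_first


if __name__ == "__main__":
    state = 0
    for _step in range(6):
        action = plan(state)  # replan from the current state every step
        state, _reward = world_model(state, action)  # toy: the "real" env matches the model
    print(state)
```

Exhaustive search grows as 3^horizon, which is why practical planners in learned models rely on tree search or sampling rather than enumeration.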

The interesting insight: test-time scaling is already a kind of paradigm shift

Test-time scaling decouples "intelligence at inference" from "training scale". That decoupling is what makes recursive self-improvement via data possible: use test-time search to generate better training data, then distill it back into the prior.

This is the AlphaZero pattern, generalized: base model = prior, test-time search = improved policy, distillation = improved prior. Then the cycle repeats.
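The prior, search, distill cycle can be sketched as a toy loop in the spirit of expert iteration. Everything here is hypothetical and deliberately tiny: the "model" is a categorical distribution over ten candidate answers, search is best-of-N sampling filtered by a perfect verifier, and distillation is a convex step toward the verified answer rather than gradient training. None of this is AlphaZero's actual MCTS-plus-network machinery.

```python
import random
from typing import List, Optional

CANDIDATES = list(range(10))
CORRECT = 7  # hypothetical: the one answer the verifier accepts


def verifier(answer: int) -> bool:
    """Hypothetical checker (think unit tests, a proof checker, a game outcome)."""
    return answer == CORRECT


def search(prior: List[float], n: int, rng: random.Random) -> Optional[int]:
    """Test-time search: sample n answers from the prior, keep one the verifier accepts."""
    samples = rng.choices(CANDIDATES, weights=prior, k=n)
    accepted = [s for s in samples if verifier(s)]
    return accepted[0] if accepted else None


def distill(prior: List[float], target: int, lr: float = 0.5) -> List[float]:
    """Move the prior toward a one-hot on the search result; stays normalized."""
    return [(1 - lr) * p + lr * (1.0 if c == target else 0.0) for c, p in zip(CANDIDATES, prior)]


def improve(rounds: int = 10, n: int = 8, seed: int = 0) -> List[float]:
    """Run prior -> search -> distill -> new prior; return P(correct) after each round."""
    rng = random.Random(seed)
    prior = [1.0 / len(CANDIDATES)] * len(CANDIDATES)
    history = [prior[CORRECT]]
    for _ in range(rounds):
        found = search(prior, n, rng)
        if found is not None:  # only distill outputs that passed the verifier
            prior = distill(prior, found)
        history.append(prior[CORRECT])
    return history


if __name__ == "__main__":
    print([round(p, 3) for p in improve()])
```

The new signal comes from the verifier, not the model: the prior only absorbs answers that search found and the verifier accepted, so in this toy the probability of the correct answer can only rise from round to round. With a verifier that accepted wrong answers, the same loop would reinforce those instead.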

Sub-shifts to watch

Mamba / S4 / linear-time sequence models: eliminate the quadratic attention bottleneck
Retrieval-augmented generation: replace brittle memorization with reliable external recall
Robust internal world models (Dreamer, MuZero, diffusion-based planning): grounded reasoning beyond the training distribution
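The core of the S4 line of models is a linear recurrence over a hidden state. The sketch below is a deliberately minimal, assumed version: a single scalar channel with fixed (time-invariant) parameters `a`, `b`, `c`, which are illustrative names, not any library's API. The scan costs O(L) time with constant state per step, whereas causal self-attention over L tokens scores every pair (roughly L^2/2 scores). Because the parameters are fixed, the same map can also be written as a causal convolution, which the second function computes for comparison.

```python
from typing import List


def ssm_scan(xs: List[float], a: float, b: float, c: float) -> List[float]:
    """Recurrent form: h_t = a * h_{t-1} + b * x_t, y_t = c * h_t. One pass, O(L)."""
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x  # constant-size state summarizes the whole history
        ys.append(c * h)
    return ys


def ssm_convolution(xs: List[float], a: float, b: float, c: float) -> List[float]:
    """Equivalent convolution form with kernel K[k] = c * a**k * b (computed naively, O(L^2))."""
    return [sum(c * a**k * b * xs[t - k] for k in range(t + 1)) for t in range(len(xs))]


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, -1.0, 0.5]
    print(ssm_scan(xs, a=0.9, b=0.5, c=2.0))
    print(ssm_convolution(xs, a=0.9, b=0.5, c=2.0))
```

Mamba's selective state spaces make these parameters depend on the input, which gives up the convolution view but keeps the linear-time scan.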

Why "predicting beyond paradigm shifts is near-impossible"

The paper's framing here is itself useful: rather than trying to predict paradigm shifts, it recommends advancing fundamental, paradigm-agnostic theoretical understanding of intelligence (informally, the AIXI program from 05 - Universal AI (AIXI)), because that work survives any specific paradigm.
