Scaling Hypothesis #2: Are Humans Just More Over-Parameterized?
There are many mysteries about deep learning and human intelligence, but we could describe the biggest anomaly this way: why are artificial neural nets smart in such stupid ways, and biological brains stupid but in smart ways? I propose a major change in deep learning scaling paradigms: the architectural differences between human brains and NNs (particularly LLMs) may be due to a bias-variance tradeoff, where LLMs minimize variance and human brains minimize bias. Human brains do this by deep double descent-style overparameterization, and adopting a scaling strategy of extremely high-learning-rate training of extremely overparameterized models on small diverse highly-filtered datasets. This approach would lead to sample-efficiently and compute-efficiently traveling (or catapulting) to a highly-generalizing human-like basin in the model loss landscape, while performing poorly up until the end and failing to memorize much data. If true, this would explain a number of odd stylized facts about how humans/NNs perform well/poorly. Such a 'catapulted LLM' would generalize much better than existing NNs, be immune to adversarial attacks, have better economics and be more resistant to cloning, could potentially enable extremely efficient MLP architectures, and by giving true generalization, provide a sturdy foundation for AI safety in the form of useful NNs which are aligned & safe for the right reasons. This could be feasibly tested by training multi-trillion-parameter models for relatively few steps at high cyclical learning rate schedules, and benchmarking adversarial and hard examples on tasks like arithmetic and small-image classification. Discuss