Scientists have debated the origins of intelligence for centuries. A key question has been the nature of the learning machinery that, from birth, produces the powerful mental skills of humans and animals. Nativists argue that intelligence develops from innate domain-specific systems for learning about different kinds of things, such as objects, places, numbers and social partners. Empiricists argue that intelligence develops from domain-general systems, with domain-specific knowledge learned from experience. This fault line between nativism (nature) and empiricism (nurture) has persisted, producing vigorous debates and hundreds of studies testing the perceptual and cognitive abilities of babies and newborn animals. But the debate is far from resolved. Nativists and empiricists are entrenched, with each camp pointing to different sources of evidence to support their position. In a recent issue of Nature Machine Intelligence, Orhan and Lake1 proposed a rigorous computational approach to tackling the nature–nurture debate and characterizing the core learning algorithms of visual intelligence.

When theorizing about the origins of intelligence, scientists have been forced to rely on their intuitions about what is learnable and what is not. Consider trying to understand how babies learn to see. Nativists intuit that a baby’s visual experiences are sparse, noisy and impoverished. This intuition leads to proposals of innate domain-specific knowledge systems to explain how babies ‘get so much from so little’ and develop abstract knowledge from ‘low-quality’ sensory data. Empiricists often have the opposite intuition, assuming that a baby’s visual experiences are richly structured. If babies learn from high-quality visual data, then domain-general algorithms might suffice for learning how to perceive and understand the world. To date, the nature–nurture debate largely stems from different intuitions about the nature of the experiences available for learning.

It is difficult to change a scientist’s mind by arguing that their intuitions are wrong. But scientists do change when shown compelling evidence. This is what Orhan and Lake1 provide in their elegant study linking the nature–nurture debate to artificial intelligence (AI). The authors start from first principles, embracing the reality that visual learning is complex. The brain is a high-dimensional system (100 trillion adjustable synapses in the human brain2), and during learning, the brain changes as a function of high-dimensional sensory data (10⁶ optic nerve fibres) acquired across nested periods of development3. This is an impossible amount of complexity to capture using human intuition alone.

Orhan and Lake tame this complexity by turning to AI (Fig. 1). Like brains, today’s top-performing AI systems (for example, transformers) are high-dimensional systems that are capable of learning from raw sensory data4. These AI systems also lack hard-coded domain-specific knowledge about objects and space. Instead, the systems learn domain-specific knowledge from experience (training data), mirroring empiricist intuitions. To explore whether these domain-general systems can learn domain-specific object knowledge from raw and realistic experiences, Orhan and Lake trained the AI systems on headcam videos recorded from the perspective of children. This included hundreds of hours of longitudinal, natural videos recorded from three children across 26 months of development. After training, the authors evaluated the object-perception skills of the AI systems using rigorous real-world recognition tasks from the computer vision literature. By using video data from children to train domain-general systems, Orhan and Lake directly test whether object perception is learnable without hard-coded (innate) object knowledge.
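
To make this concrete, the sketch below shows, in PyTorch, one domain-general way such a system could learn from headcam video without any object labels: a self-supervised objective that treats temporally adjacent frames as two views of the same scene and pulls their embeddings together. This is an illustrative assumption, not Orhan and Lake’s actual training pipeline; the FrameEncoder, the temporal contrastive loss and the frame_pair_loader are hypothetical stand-ins.

```python
# Minimal sketch (not the authors' code): self-supervised learning from
# egocentric video frames, with no object labels and no hard-coded
# domain-specific knowledge.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vit_b_16


class FrameEncoder(nn.Module):
    """Generic vision transformer that maps a frame to a normalized embedding."""
    def __init__(self, dim: int = 128):
        super().__init__()
        backbone = vit_b_16(weights=None)      # trained from scratch, no pretrained weights
        backbone.heads = nn.Identity()         # drop the classification head
        self.backbone = backbone
        self.projector = nn.Linear(768, dim)   # small projection head for the loss

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (B, 3, 224, 224) frames
        return F.normalize(self.projector(self.backbone(x)), dim=-1)


def temporal_contrastive_loss(z_t, z_next, temperature: float = 0.1):
    """InfoNCE-style objective: frame t should match frame t+1 from the same clip."""
    logits = z_t @ z_next.t() / temperature                # (B, B) similarity matrix
    targets = torch.arange(z_t.size(0), device=z_t.device)
    return F.cross_entropy(logits, targets)


def train_on_headcam_video(encoder: FrameEncoder, frame_pair_loader, epochs: int = 1):
    """`frame_pair_loader` is a hypothetical DataLoader yielding (frame_t, frame_t+1) batches."""
    optimizer = torch.optim.AdamW(encoder.parameters(), lr=1e-4)
    for _ in range(epochs):
        for frames_t, frames_next in frame_pair_loader:
            loss = temporal_contrastive_loss(encoder(frames_t), encoder(frames_next))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```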

Fig. 1: AI models for studying the origins of intelligence.

Biological intelligence develops from the interaction of core learning algorithms (horizontal axis) and experience (vertical axis). Prior studies explored how domain-general algorithms learn from pictures4 (purple circle) and videos11 (blue circle). Orhan and Lake1 (red circle) improved the realism of visual experience by training AI systems on the first-person visual experiences of young children, captured through head-mounted cameras. They also extended prior work that trained convolutional neural networks (CNNs) on the first-person experiences of children5 (orange circle). Orhan and Lake provide compelling evidence that domain-general algorithms can learn object perception when trained ‘through the eyes of children’.

They found that the equivalent of a few weeks of visual experience from a child is sufficient for domain-general systems to develop core object-perception skills, including recognizing objects, localizing semantic categories and learning broad semantic categories. These skills developed without explicit supervision and without hard-coded knowledge about objects and space. This finding provides compelling evidence that domain-general systems can learn domain-specific knowledge from experience.
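
A common way to measure such skills, sketched below, is a ‘linear probe’ evaluation from the computer vision literature: the trained encoder is frozen and only a single linear layer is fit to predict object categories, so any recognition accuracy must come from information already present in the self-supervised embeddings. The encoder, data loaders and feature dimension here are assumptions for illustration and differ from the authors’ exact protocol.

```python
# Sketch of a linear-probe evaluation: freeze the self-supervised encoder and
# fit only a linear read-out of object categories on a labelled benchmark.

import torch
import torch.nn as nn


def linear_probe_accuracy(encoder, train_loader, test_loader, feat_dim, num_classes):
    encoder.eval()                                    # the learned representation stays fixed
    probe = nn.Linear(feat_dim, num_classes)
    opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    for images, labels in train_loader:               # fit only the linear read-out
        with torch.no_grad():
            feats = encoder(images)
        loss = loss_fn(probe(feats), labels)
        opt.zero_grad()
        loss.backward()
        opt.step()

    correct, total = 0, 0
    with torch.no_grad():                             # report held-out recognition accuracy
        for images, labels in test_loader:
            preds = probe(encoder(images)).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    return correct / total
```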

Orhan and Lake also trained a variety of domain-general systems, including embedding models (which learn high-level features for downstream tasks) and generative models (which learn to generate images). Both model types learned object perception. Accordingly, the discovery that domain-general systems learn domain-specific object knowledge is not unique to a particular model, but rather reflects a general principle across many learning algorithms and architectures. Object perception may simply be an emergent property of generic high-dimensional systems (such as brains or transformers) learning from the high-dimensional visual data acquired by embodied agents.
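
The distinction between the two families can be drawn schematically: an embedding model is optimized to produce useful high-level features (for example, by aligning embeddings of related frames), whereas a generative model is optimized to reproduce raw pixels (for example, by reconstructing masked patches of an image). The encoder, autoencoder and losses below are illustrative assumptions rather than the specific architectures and objectives used in the paper.

```python
# Schematic contrast between embedding and generative objectives
# (hypothetical modules; not the paper's architectures or losses).

import torch
import torch.nn.functional as F


def embedding_objective(encoder, frame_a, frame_b):
    """Embedding models learn high-level features, e.g. by pulling together
    the embeddings of two related views (or nearby frames) of a scene."""
    z_a, z_b = encoder(frame_a), encoder(frame_b)
    target = torch.ones(z_a.size(0), device=z_a.device)   # every pair is a positive pair
    return F.cosine_embedding_loss(z_a, z_b, target)


def generative_objective(autoencoder, frame, mask):
    """Generative models learn to produce images, e.g. by hiding patches
    (mask = 1 keeps a pixel, 0 hides it) and reconstructing the hidden pixels."""
    reconstruction = autoencoder(frame * mask)
    return F.mse_loss(reconstruction * (1 - mask), frame * (1 - mask))
```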

This study sets the stage for a rigorous computational approach to studying the origins of knowledge, in which high-performing tools from AI are used as formal scientific models of the core learning algorithms in brains. Rather than relying on intuitions, scientists can now directly test whether domain-general algorithms learn core mental skills from the raw sensory experiences available to babies5,6 and newborn animals7.

Of course, this is not the end of the debate. Orhan and Lake used disembodied AI systems that received data passively, which differs from the active learning of humans and animals8. Future research might build embodied models of visual learning, in which artificial agents actively choose their own data for learning. Orhan and Lake also evaluated only a handful of object-perception tasks, so other aspects of object knowledge might still require innate domain-specific systems to explain their rapid development in animals. Finally, the authors focused on object perception, leaving open the possibility that explaining other core mental skills might require innate knowledge systems. However, Orhan and Lake’s work provides a blueprint that could be applied across novel domains to leverage AI as scalable and functional models of development. Nativists and empiricists can now ask whether particular mental skills really are learnable by particular brain models when trained on the experiences available to developing organisms. Scientists across camps can then share their models (AI systems) and experiences (training data), building bridges across the nativist–empiricist divide.

Thomas Kuhn9 observed that paradigm changes are often spurred by advances from other fields. Orhan and Lake’s study shows how tools from AI can tackle classic questions in cognitive science. Armed with modelling tools for simulating high-dimensional systems learning from high-dimensional sensory data, scientists can now test nativist and empiricist claims about the learnability of mental skills10. This synergy between developmental psychology and AI will provide a robust foundation for building working models of the origins of intelligence.