Yiran Jiang ’26
AlphaFold, a neural network-based model that predicts a protein’s 3D structure from its amino acid sequence, has shown great potential to aid biochemical research and protein engineering. The breakthrough lies not only in its technical advance in predicting protein folding; the underlying logic of how AlphaFold works also calls back to the building blocks of machine learning (ML) and artificial neural networks.
Given AlphaFold’s broad application in the biochemical field, and specifically in drug discovery, it is worth exploring the similarities between AlphaFold’s internal network structure and human spatial representation. This comparison can give us insight into how artificial intelligence (AI) models, particularly those based on deep learning, mimic or parallel human cognitive processes.
In Dartmouth’s Introduction to Cognitive Science class, it is taught that the McCulloch-Pitts artificial neuron, a mathematical model of a biological neuron, serves as a foundational model for artificial neural networks. It multiplies its inputs by weights (e.g., w1 and w2), sums them, and adjusts the output with a bias term (b). Tokenization, an important concept in ML, refers to breaking input data into discrete units (tokens) and assigning an index to each one; this broken-down format is necessary for computation with artificial neurons because it lets the system handle raw data flexibly and numerically. A single such neuron, however, can only draw linear decision boundaries; introducing depth by adding hidden layers between the input and output allows neural networks to handle non-linear decision boundaries and solve more complex problems.
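The neuron described above can be written out in a few lines of Python. This is a minimal sketch of the weighted-sum-plus-bias idea with a step activation; the weight and bias values below are hand-picked for illustration, not learned.

```python
def artificial_neuron(x1, x2, w1, w2, b):
    """Weighted sum of two inputs plus a bias, passed through a step activation."""
    s = w1 * x1 + w2 * x2 + b
    return 1 if s > 0 else 0

# With these hand-picked weights the neuron behaves like a logical AND:
# it fires only when both inputs are 1 (1 + 1 - 1.5 > 0).
print(artificial_neuron(1, 1, w1=1.0, w2=1.0, b=-1.5))  # 1
print(artificial_neuron(1, 0, w1=1.0, w2=1.0, b=-1.5))  # 0
```

No single choice of w1, w2, and b can make this neuron compute XOR, which is exactly the kind of non-linear decision boundary that requires hidden layers.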
We can understand any AI system, including AlphaFold, as extending this artificial neuron foundation by incorporating more layers and non-linear decision boundaries to model complex relationships, such as those between amino acids in a protein chain. The architecture of a neural network functions as a pattern of nodes and connections. This leads us to think one step further about whether AI systems like AlphaFold build connections like human brains do.
The input that AlphaFold takes is the amino acid sequence of a protein. AlphaFold then searches genetic databases for similar sequences and aligns them into Multiple Sequence Alignments (MSAs), alignments of different biological sequences of similar length (Jumper et al., 2021). These give hints about how evolution has shaped the structure. AlphaFold then models how each part of the protein sequence interacts with the others by building a map of pairwise relationships between amino acids, which helps predict how close those amino acids will be in 3D space. Finally, AlphaFold uses learned knowledge of protein physics and geometry to predict the 3D structure; before producing a highly accurate 3D model of the folded protein, it runs the prediction process iteratively to optimize the final result (Jumper et al., 2021).
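The stages above can be sketched, very loosely, in Python. Everything in this sketch is a toy stand-in: the fake homolog, the chain-separation “pairwise map,” and the shrink-toward-origin “refinement” are illustrative placeholders for what are, in AlphaFold itself, large learned neural components.

```python
def predict_structure(sequence, num_recycles=3):
    """Toy sketch of AlphaFold's pipeline stages. Every step here is a
    placeholder, not the real algorithm."""
    # 1. Build an MSA: AlphaFold searches genetic databases for similar
    #    sequences; here we just fabricate one fake "homolog".
    msa = [sequence, sequence[::-1]]

    # 2. Pairwise representation: one feature per residue pair. As a toy
    #    stand-in we use separation along the chain.
    n = len(sequence)
    pair_rep = [[abs(i - j) for j in range(n)] for i in range(n)]

    # 3. Initial 3D guess: place residues along a straight line.
    coords = [(float(i), 0.0, 0.0) for i in range(n)]

    # 4. Iterative refinement ("recycling"): feed the prediction back in
    #    and adjust it; here we merely pull residues closer together.
    for _ in range(num_recycles):
        coords = [(x * 0.9, y, z) for (x, y, z) in coords]

    return msa, pair_rep, coords
```

The point of the sketch is the data flow, sequence → MSA → pairwise map → structure → recycled structure, which mirrors the order of stages described above.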
AlphaFold integrates information from MSAs to take into account evolutionary relationships between proteins, similar to how humans integrate outside data to understand spatial relationships. Humans take in sensory information such as visual cues and feedback to figure out the relative positions of objects. For instance, people use monocular cues such as linear perspective to perceive depth and distance. Similarly, AlphaFold takes in evolutionary data in the form of MSAs, extracting information and patterns of protein structure from genetic databases to form judgments or predictions, much like the evolutionary basis of humans’ visual perception system.
According to Jumper et al. (2021), AlphaFold utilizes “the Evoformer blocks that contain a number of attention-based and non-attention-based components” to determine which amino acids are most likely to be adjacent to or interact with one another. This attention to relationships is also found in human cognition. People tend to prioritize key spatial features, such as edges and angles, when identifying shapes or navigating. By selectively prioritizing the most relevant relationships and features, both systems are able to filter out irrelevant information and focus on the crucial factors of spatial reasoning.
AlphaFold is trained on a large dataset of proteins, and it learns to form abstract patterns across different protein families. Humans also use schemas and prototypes to generalize patterns based on memory and past experience. This capacity for abstraction and generalization is something humans and AlphaFold share. The end-to-end structure of AlphaFold ensures that the whole prediction process runs in one flow, starting with the input sequence of amino acids and ending with the output 3D protein structure. Unlike traditional protein modeling, AlphaFold doesn’t require distinct steps or separate algorithms but makes predictions in an integrated manner, which is similar to the domain-general quality of human cognition.
Humans usually create mental representations to understand spatial relationships. Shepard and Metzler (1971) coined the term “mental rotation” to describe how people mentally rotate 2D or 3D objects to determine whether two objects placed in different orientations are the same shape. Even though AlphaFold doesn’t perform mental rotations like humans do, there are some overlaps between AlphaFold’s geometric reasoning and mental representations in humans. AlphaFold creates representations of a protein structure by applying attention mechanisms to the aforementioned MSAs, which is similar to creating a mental map of internal relationships.
The process of refining predictions is also similar. AlphaFold’s initial prediction of the protein structure is iteratively examined and improved, a process called iterative refinement. Humans go through a comparable refinement process during prediction tasks such as estimating a numerical value or predicting patterns in data. Both humans and AlphaFold rely on feedback to refine their outputs: AlphaFold refines protein structures, whereas humans adjust their mental models based on trial and error in cognitive tasks.
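The general loop, divorced from protein structures, can be sketched as follows. This is not AlphaFold’s actual recycling procedure; Newton’s square-root update simply stands in for whatever domain-specific refinement step a system applies.

```python
def refine_until_stable(estimate, update, tol=1e-9, max_iters=100):
    """Generic iterative refinement: keep applying `update` until the
    estimate stops changing (within `tol`) or an iteration cap is hit."""
    for _ in range(max_iters):
        new = update(estimate)
        if abs(new - estimate) < tol:
            return new
        estimate = new
    return estimate

# Example: Newton's update iteratively refines a guess for sqrt(2),
# each pass using the previous output as feedback for the next.
root = refine_until_stable(1.0, lambda x: 0.5 * (x + 2.0 / x))
```

The shared structure is what matters: an initial rough answer, a feedback signal, and repeated small corrections until the answer stabilizes.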
Overall, in many aspects such as integration of outside data and feedback, attention to relationships, iterative refinement, and abstraction and generalization, AlphaFold and humans share similarities in building spatial representations. Indeed, much of the underlying neural structure of AlphaFold is based on the blueprint of human cognition. However, further investigation is needed to determine exactly where human spatial representation and AlphaFold’s prediction algorithms overlap.
Edited by Jay Nathan ‘27
Sources:
- Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., Tunyasuvunakool, K., Bates, R., Žídek, A., Potapenko, A., Bridgland, A., Meyer, C., Kohl, S. A. A., Ballard, A. J., Cowie, A., Romera-Paredes, B., Nikolov, S., Jain, R., Adler, J., & Back, T. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), 583–589. https://doi.org/10.1038/s41586-021-03819-2
- Shepard, R. N., & Metzler, J. (1971). Mental Rotation of Three-Dimensional Objects. Science, 171(3972), 701–703. https://doi.org/10.1126/science.171.3972.701
Image Reference: https://deepmind.google/discover/blog/a-glimpse-of-the-next-generation-of-alphafold/