Yiran Jiang ‘26
Since the early 1980s, the idea of modularity has been central to understanding how the mind works (Robbins). Fodor’s concept of modularity holds that certain mental processes, such as seeing and understanding language, happen in specific parts of the brain, or “modules,” each designed to handle a particular task. Every module works independently within its own specialized domain, such as perception or language. These modules are mandatory: they process appropriate inputs automatically, whether or not we want them to. They are also informationally encapsulated, operating independently of other cognitive processes and uninfluenced by outside knowledge. Fodor argues that only low-level cognitive processes are modular, whereas some post-Fodorian cognitive scientists believe that modularity extends much further into the mind than Fodor allowed (Robbins). While some object to this model, modularity remains an effective theoretical framework for describing the complex architecture of human minds and cognition.
We can also analyze and construct large language models (LLMs) with modularity in mind. Most LLMs still take a monolithic approach, in which all parts are handled as a single, unified system, making it hard to remove or replace any one specialized component (Park). A modular approach, by contrast, enhances workflow flexibility and promotes communication between modules. Researchers have already applied modular designs to LLMs, but novel research aims to further improve models’ degree of modularity.
In the paper “Unlocking Emergent Modularity in Large Language Models,” Qiu et al. explore how to tap into the hidden modular structures that form naturally during model training. Instead of treating a language model as one big system, they show that different parts of the model can be selectively activated for different inputs. Most modular neural networks display explicit modularity, with modules predefined for particular tasks, rather than implicit modularity, which refers to modular structures that emerge naturally during training.
The researchers aim to harness the power of implicit modularity. Mixture of Experts (MoE) is a neural network architecture in which multiple submodules, or “experts,” are designed to handle different tasks. Emergent Mixture of Experts (EMoE), built on MoE, is used to unlock the implicit modularity that arises during training. To achieve this, the researchers transform the Feed-Forward Networks (FFNs), one-way networks whose signals do not loop back, into modular components using clustering and gating mechanisms (Qiu et al. 2024). The procedure clusters the neurons of the FFNs into groups, treats those groups as MoE-style experts, and then uses a “gating strategy” so that only the relevant experts, rather than every parameter, are activated for a given input. When tested on different models and tasks, this approach helps the system handle new tasks better, both in domains it was trained on and in completely new ones.
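To make the mechanism concrete, here is a minimal sketch assuming a single pretrained dense FFN layer: its hidden neurons are clustered into expert groups with plain k-means, and a simple top-k gate, standing in for the paper’s gating strategy, decides which experts run on each input. The dimensions, the clustering method, and the scoring rule are illustrative assumptions, not Qiu et al.’s exact recipe.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d_model, d_hidden, n_experts, top_k = 16, 64, 4, 2

# A pretrained dense FFN computes x -> W2 @ relu(W1 @ x).
W1 = torch.randn(d_hidden, d_model)   # rows act as the FFN's "key" vectors
W2 = torch.randn(d_model, d_hidden)

def kmeans(points, k, iters=20):
    """Plain k-means: group similar key vectors into candidate experts."""
    centers = points[torch.randperm(points.size(0))[:k]].clone()
    for _ in range(iters):
        assign = torch.cdist(points, centers).argmin(dim=1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = points[assign == j].mean(dim=0)
    return assign

# Step 1: cluster the FFN neurons so that similar neurons share an expert.
assign = kmeans(W1, n_experts)
experts = [(W1[assign == j], W2[:, assign == j]) for j in range(n_experts)]

# Step 2: a simple top-k gate; only the selected experts run per input.
def emoe_ffn(x):
    scores = torch.stack([
        (k @ x).mean() if k.numel() else torch.tensor(-1e9)
        for k, _ in experts
    ])
    top = scores.topk(top_k).indices
    weights = F.softmax(scores[top], dim=0)
    out = torch.zeros(d_model)
    for w, j in zip(weights, top):
        k, v = experts[j]
        out = out + w * (v @ F.relu(k @ x))   # unselected experts stay idle
    return out

print(emoe_ffn(torch.randn(d_model)).shape)  # torch.Size([16])
```

The point of the sketch is the division of labor: clustering exposes the modular structure that was already latent in the dense layer, and the gate makes that structure operational by activating only part of the network per input.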
Qiu et al.’s findings allow us to develop Fodorian and post-Fodorian views on modularity. It is difficult to specify one all-purpose representational system for the mind, yet it is equally hard to disregard the concept of modules entirely. Instead, a more flexible framework of “modular flexibility” incorporates both the explicit and the implicit modularity that Qiu et al. describe.
As in Fodor’s original description, explicit modules handle automatic processing and are domain-specific. Implicit modules, on the other hand, are more dynamic and flexible because they are learned and adjusted according to circumstances. This paradigm would highlight the connection between explicit and implicit modularity, that is, how the two processing modes interact and hand off to one another. Implementing “modular flexibility” in LLMs has the potential to greatly augment their adaptability and deepen their language processing.
One possible application of modular flexibility in LLMs is translation. The two commonly used techniques are literal translation, a word-for-word rendering that retains surface-level meanings, and free translation, which conveys implications rather than translating everything exactly as written. Free translation accounts for cultural and contextual nuances as much as possible.
In this scheme, explicit modules that are domain-specific and predefined suit literal translation, while implicit modules handle higher-level, deeper linguistic translation. The challenge is to integrate these two types of modularity, allowing them to function in parallel and switch intelligently from one to the other. Moreover, because emergent modularity lives inside feed-forward networks, which cannot loop back, an LLM might be unable to run a contextual check on its initial translation.
Here is a general overview of how modular flexibility could work in translation. Input text first enters the system, where specific inputs activate certain neurons (keys). The keys are then grouped into experts, and this is where explicit modularization of word-for-word translation takes place: literal translation experts handle simple content with predefined modules and pre-trained transformers, performing domain-specific translation. When the language is figurative or context-dependent, however, the implicit modularity governing free translation experts comes into play. These modules work in tandem, each producing translations within its area of expertise.
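As a concrete illustration, here is a toy sketch of that routing pipeline. The lexicon, idiom table, expert functions, and routing rule are all hypothetical placeholders: in a real LLM the experts would be learned FFN clusters, not hand-written functions.

```python
# Hypothetical toy lexicon and idiom table (English -> French).
LITERAL_LEXICON = {"the": "le", "cat": "chat", "sleeps": "dort"}
IDIOMS = {"kick the bucket": "casser sa pipe"}

def literal_expert(text: str) -> str:
    """Explicit module: predefined, domain-specific word-for-word mapping."""
    return " ".join(LITERAL_LEXICON.get(w, w) for w in text.split())

def free_expert(text: str) -> str:
    """Implicit module: context-sensitive rendering of figurative phrases."""
    for phrase, rendering in IDIOMS.items():
        text = text.replace(phrase, rendering)
    return text

def route(text: str) -> str:
    """Gate: figurative input goes to the free expert, the rest stays literal."""
    if any(phrase in text for phrase in IDIOMS):
        return free_expert(text)
    return literal_expert(text)

print(route("the cat sleeps"))    # -> "le chat dort" (literal expert fires)
print(route("kick the bucket"))   # -> "casser sa pipe" (free expert fires)
```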
One possible limitation is how to create interactions between explicit and implicit modules and how to determine the algorithmic order of input detection and grouping. It cannot be a simple concatenation of functions; it should be a collaborative system that blends literal and free translations through intermodular communication. One possible solution is a gating system that clears unwanted variables and thus narrows the scope of context evaluation. However, the distinction between literal and free translation is rather blurry, and we currently lack a domain-general system to handle overlaps and ambiguities.
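A minimal sketch of that gating idea, under the assumption that each expert has already been scored against the current context: low-scoring experts are zeroed out so that only a narrow set shapes the final translation. The scores and the number of experts kept are illustrative.

```python
import torch
import torch.nn.functional as F

def narrow_gate(expert_scores: torch.Tensor, keep: int = 2) -> torch.Tensor:
    """Zero out all but the top-`keep` experts, then renormalize."""
    gated = torch.full_like(expert_scores, float("-inf"))
    top = expert_scores.topk(keep).indices
    gated[top] = expert_scores[top]      # only the survivors keep their scores
    return F.softmax(gated, dim=0)       # -inf entries become weight zero

# Illustrative scores for, say, two literal and two free translation experts.
scores = torch.tensor([0.2, 2.1, -0.5, 1.7])
print(narrow_gate(scores))   # nonzero weight only on experts 1 and 3
```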
In short, the future of modularity in AI is still largely unexplored, and conceptual proposals often precede practical applications. The deeper we delve into this field, the more exciting the results will be. Incorporating modularity into large language models, especially for AI translation, could substantially improve their efficiency and accuracy.
Edited by Jack Ranani ‘25
Sources:
- Park, Joseph. “Modularity in AI: Understanding the Building Blocks of Intelligence – Digital Architecture Lab.” Digital Architecture Lab, 2 Oct. 2024, dalab.xyz/en/blog/modularity-in-ai-understanding-the-building-blocks-of-intelligence/. Accessed 13 Oct. 2024.
- Qiu, Zihan, et al. “Unlocking Emergent Modularity in Large Language Models.” ACLWeb, Association for Computational Linguistics, 1 June 2024, aclanthology.org/2024.naacl-long.144/. Accessed 30 July 2024.
- Robbins, Philip. “Modularity of Mind.” The Stanford Encyclopedia of Philosophy, 2017, plato.stanford.edu/entries/modularity-mind/#ModuFodoStylModeProp.