Yiran Jiang ‘26

Since the early 1980s, the concept of modularity has been important in understanding how the mind works (Robbins). Fodor’s concept of modularity suggests that certain mental processes, like vision and language comprehension, happen in specific brain regions, or “modules,” each designed to handle particular tasks. Each module operates independently, focusing on its specialized area, such as perception or language, and processing information automatically and mandatorily in response to relevant stimuli. Additionally, these modules are informationally encapsulated: they function independently of other cognitive processes and are unaffected by external knowledge.

Fodor restricts modularity to low-level input systems such as perception, whereas some post-Fodorian cognitive scientists believe the extent of modularity in our minds is far more comprehensive than previously thought (Robbins). Though there are dissents from this model, modularity remains a potent theoretical framework for understanding the complex architecture of human minds and cognition. Modularity is therefore worth investigating further as a principle of structured programming, along with the potential of a modular approach to constructing Large Language Models (LLMs).

Most LLMs still take a monolithic approach, handling all tasks within a single, unified system, which complicates the removal or replacement of individual specialized components (Park). A modular approach, by contrast, enhances workflow flexibility and promotes communication between modules.

Interestingly, a modular approach already exists in LLMs, but researchers aim to extend the level of modularity further. In the paper “Unlocking Emergent Modularity in Large Language Models,” Qiu et al. tap into the hidden modular structures that form naturally during model training. Instead of treating a language model as one big system, they show that distinct parts of the model can be selectively activated. Most Modular Neural Networks display explicit modularity, with predefined tasks, as opposed to implicit modularity, which refers to modular structures that emerge naturally. The researchers’ task is thus to harness the power of implicit modularity.

Mixture of Experts (MoE) is a neural network architecture in which multiple submodules, or “experts,” are designed to handle different tasks. Emergent Mixture of Experts (EMoE), built on MoE, is used to unlock the implicit modularity that arises during training. To achieve this, the researchers transform Feed-Forward Networks (FFNs) – one-way neural networks that do not loop back – into modular components using clustering and gating mechanisms (Qiu et al., 2024). The process includes clustering FFN neurons into experts and applying a “gating strategy” that routes inputs without adding extra parameters. When tested across different models and tasks, this approach helps the system better handle new tasks, both those it was trained on and those from completely new domains.
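As a concrete illustration, here is a toy sketch of the idea, not the paper’s actual implementation: a small FFN’s hidden neurons are partitioned into expert groups, and a parameter-free gate keeps only the most-activated groups. The dimensions are invented for the example, and a crude stand-in replaces the proper clustering step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy FFN: y = W2 @ relu(W1 @ x). The hidden neurons ("keys") are rows of W1.
d_model, d_hidden = 8, 16
W1 = rng.normal(size=(d_hidden, d_model))
W2 = rng.normal(size=(d_model, d_hidden))

def ffn(x):
    return W2 @ np.maximum(W1 @ x, 0.0)

# Partition the hidden neurons into expert groups. (A crude stand-in for
# clustering the key vectors, which is what the paper actually does.)
n_experts = 4
order = np.argsort(W1 @ rng.normal(size=d_model))
experts = order.reshape(n_experts, d_hidden // n_experts)

def emoe(x, top_k=2):
    h = np.maximum(W1 @ x, 0.0)
    # Parameter-free gating: score each expert by its mean key activation
    # and zero out all but the top-k experts.
    scores = np.array([h[idx].mean() for idx in experts])
    mask = np.zeros_like(h)
    for e in np.argsort(scores)[-top_k:]:
        mask[experts[e]] = 1.0
    return W2 @ (h * mask)

x = rng.normal(size=d_model)
# With every expert active, the sparse module reproduces the dense FFN,
# showing the expert structure was already latent inside the FFN.
assert np.allclose(emoe(x, top_k=n_experts), ffn(x))
```

The point of the final assertion is that nothing new was trained: the “experts” are just a regrouping of the FFN that was already there, and sparsity comes only from the gate.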

Based on the findings of Qiu et al., Fodorian and post-Fodorian views on modularity may be refined. Accurately mapping the mind, or devising an all-encompassing representational system to describe it, remains challenging, yet completely discarding the concept of modularity is also unfeasible. Instead, a more flexible framework that incorporates both explicit and implicit modularity, as put forth by Qiu et al., may be advantageous.

The idea of “modular flexibility” blends implicit and explicit modularity. Following Fodor’s original descriptions, explicit modules are domain-specific and manage automatic processing. Implicit modules, on the other hand, are more dynamic and flexible because they are learned and adjusted according to circumstances. This paradigm would highlight the connection between explicit and implicit modularity: how the two processing modes exchange information and interact. Implementing “modular flexibility” in LLMs has the potential to greatly augment their adaptability and deepen their language processing.

One potential application of modular flexibility in LLMs is in the field of translation. The two common techniques in translation are literal translation, which involves word-for-word translation that retains surface-level meanings, and free translation, which focuses on the implications of the original text, accounting for cultural and contextual nuances.

In this case, explicit modules that are domain-specific and predefined are suitable for literal translation, while implicit modules can handle higher-level, deeper linguistic translation. The challenge lies in determining whether these two types of modularity can function in parallel and intelligently switch from one to the other. Moreover, since the emergent modularity above lives in feed-forward networks, which do not loop back, it raises the question of how much room there is to let LLMs perform a contextual check on their initial translation.

Here is a general overview of how modular flexibility might work in translation. The input text is first fed into the system. The input is then encoded, and certain neurons (keys) are activated by specific features of the input. The keys are grouped into experts, and this is where explicit modularization of word-for-word translation takes place: literal translation experts handle simple content, performing domain-specific translation with predefined modules and pre-trained transformers. When it comes to figurative or context-dependent language, however, the implicit modularity that governs free translation experts comes into play.

These modules work in tandem to process the information, each producing translations according to its area of expertise. One ongoing challenge is orchestrating interactions between explicit and implicit experts and determining the algorithmic sequence for input detection and grouping. It should not be a simple combination of functions but a collaborative system that merges literal and free translations through intermodular communication. One possible solution involves using a gating system to clear unnecessary variables, thus narrowing the scope of context evaluation. However, the distinction between literal and free translation is rather blurry, and we currently lack a domain-general system to address overlaps and ambiguities.

In short, the future of modularity in AI is yet to be explored, and conceptual proposals often precede practical applications. Yet the harder the problem, the more exciting the results will be. General optimism remains about the prospects of incorporating modularity into Large Language Models, especially in the field of AI translation.

Edited by Alice Wong ‘25


Sources:

  1. Park, Joseph. “Modularity in AI: Understanding the Building Blocks of Intelligence – Digital Architecture Lab.” Digital Architecture Lab, 2 Oct. 2024 
  2. Qiu, Zihan, et al. “Unlocking Emergent Modularity in Large Language Models.” ACLWeb, Association for Computational Linguistics, 1 June 2024
  3. Robbins, Philip. “Modularity of Mind (Stanford Encyclopedia of Philosophy).” Stanford.edu, 2017