New Genomic Formatting Software Could Greatly Expedite Future Research

Dev Kapadia ’23, 7/14/20, CS-Engineering

Genomics refers to the study of an individual’s genetic material. The discipline dates back to 1871 with Friedrich Miescher’s discovery of “nuclein,” which is now known as DNA.1, 6 In recent years, the idea of manipulating genes for human benefit has garnered a lot of attention. In 1990, the Human Genome Project set out to sequence the 3 billion base pairs that make up the human genome and completed the task 13 years later. In 1996, Dolly the Sheep became the first mammal successfully cloned from an adult cell.6

Despite the vast amount of research devoted to the subject, studying genomics remains an inefficient process. For instance, genomics databases now hold a plethora of information, but this data is often incomplete or inaccurate. Even the data that is accurate may be scattered across databases, papers, repositories, and more, which makes assembling and formatting a complete dataset a logistical nightmare for researchers.2 Altuna Akalin and his research team at the Max Delbrück Center for Molecular Medicine in the Helmholtz Association set out to address this inefficiency.4

Akalin’s team developed Janggu, a tool that converts a variety of genomics data into the formats required for analysis by deep learning models. Deep learning models are machine learning algorithms that find patterns and relevant features in input data, a capability that is heavily sought after in a field with such complex data. Until now, however, applying these methods usually meant spending countless hours formatting the data, and adding or removing even the smallest amount of data forced researchers to completely reformat it.3
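To make the formatting step concrete, here is a minimal sketch of one common conversion: turning a DNA sequence into the numeric "one-hot" table that deep learning models typically expect. It is written in plain Python for illustration only. The function name `one_hot_encode` and the A, C, G, T column order are assumptions made for this example and are not taken from Janggu's own interface.

```python
# Hypothetical helper for illustration; this is not part of Janggu's API.
BASES = "ACGT"  # assumed column order: A, C, G, T


def one_hot_encode(sequence):
    """Return one row per nucleotide, each with four 0/1 columns."""
    rows = []
    for base in sequence.upper():
        # A base outside A/C/G/T (for example N) becomes an all-zero row.
        rows.append([1 if base == b else 0 for b in BASES])
    return rows


if __name__ == "__main__":
    example = "ACGTN"
    for base, row in zip(example, one_hot_encode(example)):
        print(base, row)
```

Even in this toy version, changing the input sequence means regenerating the whole table, which hints at why doing such conversions by hand across many data sources becomes so time-consuming.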

Akalin’s team used principles of neural networks, a type of deep learning model, to implement the functions of Janggu. In neural networks, different characteristics of the data are represented by “neurons,” and the algorithm uses the training data to learn how strongly the neurons are connected to one another.2 By releasing Janggu along with several tutorials, the team hopes that the software will streamline these formatting efforts, also known as “pre-processing,” across the field of genomics. In fact, although the software is mainly a front-end formatting tool, it also provides basic data visualization and measures of similarity between nucleotides, which could make planning for drug development much easier than it is today.4
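As a rough illustration of how a network learns from training data, the sketch below trains a single artificial neuron on a made-up toy dataset. This is a generic textbook example, not the models used with Janggu. The data, the learning rate, and the names `train_neuron` and `predict` are all assumptions chosen for this example.

```python
import math


def sigmoid(x):
    """Squash any number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-x))


def predict(weights, bias, features):
    """Output of one neuron: weighted sum of the inputs passed through a sigmoid."""
    total = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(total)


def train_neuron(samples, labels, epochs=2000, learning_rate=0.5):
    """Adjust the weights step by step to shrink the prediction error."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            error = predict(weights, bias, features) - label
            # Nudge each weight against the error, scaled by its input.
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


# Made-up toy data: the label simply copies the first feature.
samples = [[0, 0], [0, 1], [1, 0], [1, 1]]
labels = [0, 0, 1, 1]
weights, bias = train_neuron(samples, labels)

if __name__ == "__main__":
    for features in samples:
        print(features, round(predict(weights, bias, features), 3))
```

After training, the weight attached to the first feature should dominate, because it is the only input that actually predicts the label. Real networks stack many such neurons in layers, but the learning principle is the same.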

Although genomic datasets face other challenges beyond their confusing organization, such as privacy and the need for more user-friendly functionality, the future is still bright for the field of genomics.2 Genomics could be used to create treatments tailored to the individual or to answer questions about the evolution and genetic diversity of species. It could also help researchers make huge strides in synthetic biology and bioengineering.5 With the attention that the field and its potential applications are getting, genomics is likely to be one of the most innovative fields of the 21st century.



(1) A Brief Guide to Genomics. (2019, November 7). Genome.Gov. 

(2) Cheng, M. L., & Solit, D. B. (2018). Opportunities and Challenges in Genomic Sequencing for Precision Cancer Care. Annals of Internal Medicine, 168(3), 221.

(3) Kopp, W., Monti, R., Tamburrini, A., Ohler, U., & Akalin, A. (2020). Deep learning for genomics using Janggu. Nature Communications, 11(1), 3488.

(4) Max Delbrück Center for Molecular Medicine in the Helmholtz Association. (2020, July 13). Janggu makes deep learning a breeze. ScienceDaily. Retrieved July 14, 2020 from

(5) Smith, Y. (2019, February 26). Applications of Genomics. News-Medical.Net.

(6) Timeline: History of genomics. (2016, February 5). Yourgenome.


