MIT Chemists Use Generative AI to Calculate 3D Genomic Structures

Every cell in your body carries the same genetic sequence, yet each cell expresses only a subset of its genes. These cell-specific expression patterns, which distinguish a brain cell from a skin cell, are determined in part by the three-dimensional (3D) structure of the genetic material, which controls how accessible each gene is.

MIT chemists have now developed a method that uses generative artificial intelligence to determine these 3D genome structures. The technique can predict thousands of chromatin structures in minutes, far faster than existing experimental approaches, which demand considerable time and resources.

With this technique, researchers could more easily study how the spatial organization of the genome affects gene expression and cellular function.

“Our aim was to predict the three-dimensional genome structure based on the underlying DNA sequence,” explains Bin Zhang, an associate professor of chemistry and senior author of the study. “Achieving this places our technique alongside cutting-edge experimental methodologies and opens the door to exciting new research opportunities.”

MIT graduate students Greg Schuette and Zhuohan Lao contributed as lead authors of the study, which appears in Science Advances.

Transforming Sequences into Structures

Within the nucleus of each cell, DNA and proteins form a complex called chromatin. This packaging allows cells to fit about two meters of DNA into a nucleus only one-hundredth of a millimeter across. Long strands of DNA wind around proteins known as histones, producing a structure that resembles beads on a string.

Chemical tags called epigenetic modifications can also be attached to DNA at specific locations. These tags, which vary by cell type, affect how the chromatin folds and how accessible nearby genes are. The resulting differences in chromatin conformation help determine which genes are expressed in different cell types, or at different points in a cell’s lifecycle.

Over the last two decades, scientists have developed several experimental techniques for determining chromatin structures. One widely used technique, Hi-C, works by linking together neighboring DNA strands inside the nucleus. Researchers then break the DNA into small fragments and sequence them to determine which segments were located near each other.

This approach can be applied to large populations of cells to obtain an average chromatin structure, or to single cells to determine the structure within an individual cell. However, Hi-C and similar techniques are labor-intensive, and it can take about a week to generate data from a single cell.
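As a rough illustration of what a Hi-C-style readout encodes, the sketch below turns 3D coordinates into a binary contact map and then averages such maps across cells, mirroring the single-cell and population views described above. The function names, the toy four-bead coordinates, and the distance cutoff are all illustrative assumptions; this is not the Hi-C processing pipeline.

```python
import math

def contact_map(coords, cutoff):
    """Return an n x n 0/1 matrix: 1 where two beads lie within `cutoff`."""
    n = len(coords)
    return [[1 if math.dist(coords[i], coords[j]) <= cutoff else 0
             for j in range(n)] for i in range(n)]

def average_maps(maps):
    """Average per-cell contact maps into population contact frequencies."""
    n = len(maps[0])
    return [[sum(m[i][j] for m in maps) / len(maps) for j in range(n)]
            for i in range(n)]

# Two hypothetical single-cell conformations of the same four-bead segment.
cell_a = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]  # extended
cell_b = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]  # folded into a loop

maps = [contact_map(c, cutoff=1.5) for c in (cell_a, cell_b)]
population = average_maps(maps)
```

In this toy case the two ends of the segment touch in only one of the two cells, so the averaged map reports a contact frequency of one half for that pair, a value that matches neither individual conformation. That loss of per-cell detail is the reason single-cell measurements are valuable.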

To address these challenges, Zhang and his team created a model that leverages advancements in generative AI, offering a quick, accurate means of predicting chromatin structures within single cells. Their AI model swiftly analyzes DNA sequences to forecast the chromatin structures those sequences may yield.

“Deep learning excels at recognizing patterns,” Zhang notes. “It enables us to analyze extensive DNA segments—thousands of base pairs long—and extract vital information encoded within those sequences.”

The researchers developed ChromoGen, a model with two components. The first is a deep learning network trained to "read" the genome: it analyzes the information encoded in the DNA sequence together with chromatin accessibility data, which are widely available and specific to each cell type.

The second component is a generative AI model that predicts chromatin conformations. It was trained on more than 11 million chromatin structures obtained from experiments that applied Dip-C, a variant of Hi-C, to human B lymphocytes.

By integrating these components, the model captures how the cell type-specific environment influences the formation of chromatin structures, and thereby learns the relationship between sequence and structure. For each DNA sequence, the researchers use the model to generate many possible structures, because DNA is a highly disordered molecule and a single sequence can give rise to many different conformations.

“Predicting genome structure is complicated because we don’t aim for just a single solution; there’s a distribution of possible structures for any given genomic segment,” explains Schuette.
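To make the idea of a distribution of structures concrete, here is a minimal sketch that samples many conformations and summarizes one quantity across the ensemble. The sampler is a plain 3D random walk, used purely as a toy stand-in; it is not ChromoGen and it ignores the DNA sequence entirely. The function name, bead count, and ensemble size are assumptions made for illustration.

```python
import math
import random
import statistics

def sample_conformation(n_beads, rng):
    """Toy stand-in for a generative model: a 3D random walk with unit steps."""
    pos = (0.0, 0.0, 0.0)
    coords = [pos]
    for _ in range(n_beads - 1):
        # Draw a direction uniformly on the unit sphere.
        z = rng.uniform(-1.0, 1.0)
        phi = rng.uniform(0.0, 2.0 * math.pi)
        r = math.sqrt(1.0 - z * z)
        step = (r * math.cos(phi), r * math.sin(phi), z)
        pos = tuple(p + s for p, s in zip(pos, step))
        coords.append(pos)
    return coords

rng = random.Random(0)  # fixed seed so the sketch is repeatable
ensemble = [sample_conformation(20, rng) for _ in range(500)]

# Distance between the first and last bead in each sampled conformation.
end_to_end = [math.dist(c[0], c[-1]) for c in ensemble]
mean_distance = statistics.mean(end_to_end)
spread = statistics.stdev(end_to_end)
```

The point of the sketch is that the useful output is the whole list `end_to_end`, along with summaries such as its mean and spread, and not any single conformation.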

Speedy Structural Analysis

Once the model is trained, it can make predictions significantly faster than Hi-C and similar experimental methods.

“While traditional methods might take six months to yield a few dozen structures for a cell type, our model can produce a thousand structures in 20 minutes using just one GPU,” Schuette adds.
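The figures in this quote can be turned into a back-of-the-envelope throughput comparison. The model numbers (1,000 structures, 20 minutes, one GPU) come from the quote; the experimental numbers require two assumptions that are not in the article: "a few dozen" is taken as 36 structures and "six months" as 182 days. The result should be read as an order-of-magnitude estimate only.

```python
# Figures quoted above: 1,000 structures in 20 minutes on one GPU.
MODEL_STRUCTURES = 1_000
MODEL_MINUTES = 20

# Assumptions for illustration only: "a few dozen" is taken as 36 structures
# and "six months" as 182 days.
EXPERIMENT_STRUCTURES = 36
EXPERIMENT_MINUTES = 182 * 24 * 60

model_min_per_structure = MODEL_MINUTES / MODEL_STRUCTURES
experiment_min_per_structure = EXPERIMENT_MINUTES / EXPERIMENT_STRUCTURES
speedup = experiment_min_per_structure / model_min_per_structure

print(f"model: {model_min_per_structure * 60:.1f} s per structure")
print(f"experiment: {experiment_min_per_structure / (24 * 60):.1f} days per structure")
print(f"speedup: about {speedup:,.0f}x")
```

Under these assumptions the model needs on the order of a second per structure while the experiment needs several days, a gap of roughly five orders of magnitude. Changing the assumed counts shifts the exact ratio but not the scale.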

After training, the researchers used the model to generate structure predictions for more than 2,000 DNA sequences and compared them with the experimentally determined structures for the same sequences. The predicted structures closely matched the experimental data.
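One simple way to compare a predicted structure with an experimental one is to correlate their pairwise distances. The article does not say which metric the researchers used, so the choice of Pearson correlation here is an assumption, as are the function names and the toy coordinates.

```python
import math

def pairwise_distances(coords):
    """Flatten the upper triangle of the distance matrix into a list."""
    n = len(coords)
    return [math.dist(coords[i], coords[j])
            for i in range(n) for j in range(i + 1, n)]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical coordinates; real inputs would be model output and Dip-C data.
predicted = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0.2)]
experimental = [(0, 0, 0), (1.1, 0, 0), (1, 0.9, 0), (0.1, 1, 0)]

score = pearson(pairwise_distances(predicted), pairwise_distances(experimental))
```

Comparing distances instead of raw coordinates has a practical advantage: pairwise distances do not change when a structure is rotated or translated, so the two structures need not be aligned first.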

“By examining hundreds or thousands of conformations per sequence, we can accurately depict the variety of structures that a specific region may exhibit,” Zhang states. “Repetition of experiments across various cells will likely produce diverse conformations, which is precisely what our model is designed to predict.”

The researchers also found that the model could make accurate predictions for cell types other than the one it was trained on. This suggests it could be used to analyze how chromatin structures differ between cell types and how those differences affect cell function, to explore the different chromatin states that can exist within a single cell, and to examine how specific DNA mutations change chromatin conformation, which could shed light on disease mechanisms.

“There are countless intriguing questions our model can help address,” concludes Zhang.

The research team has made their model and data available for anyone interested in further exploration.

This research received funding from the National Institutes of Health.

Photo credit & article inspired by: Massachusetts Institute of Technology
