An automated fortune teller predicts gene expression regulation

Gene expression affects all aspects of life, from the survival of bacteria in specific environments to human anatomy and physiology. The ability to predict a gene’s expression level from the DNA sequences that regulate it would transform how researchers study biology. However, the biochemical mechanisms that regulate gene expression are highly complex, and although scientists have been trying to predict gene expression for over 50 years, that goal has remained elusive. But in a paper recently published in the journal Nature, Vaishnav and his research team1 use two prominent techniques to create an automated “predictor” that successfully predicts gene expression in the baker’s yeast Saccharomyces cerevisiae.

The first technique used by these researchers is a way to measure the expression of the gene encoding yellow fluorescent protein (YFP) in every single cell of a large population of yeast cells2. Different cells in this population carried different promoters, the DNA sequences that regulate gene expression. These promoters are located next to the YFP gene on a small piece of circular DNA, and this proximity allows them to control the gene’s expression. The authors assembled a set of more than 30 million different promoter sequences, each 80 base pairs long, and measured the level of YFP production in the cells carrying each promoter.

Vaishnav and his research team then fed the resulting gene expression data into a convolutional neural network, an artificial-intelligence method that is the second of their two techniques. They trained this network to predict gene expression from promoter sequence using these data. They then demonstrated that the network predicts gene expression with an astonishing degree of accuracy (Fig. 1).
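To give a rough sense of what “convolutional” means for DNA, the sketch below encodes a sequence as numbers and slides a single filter along it, scoring each window. It is a toy illustration only: the filter, the motif TATA and the example sequence are hypothetical, and the authors’ actual network architecture is not described here.

```python
# Toy illustration: one-hot encoding plus a single convolutional filter.
# The motif "TATA" and the example sequence are hypothetical choices,
# not taken from the study.

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element 0/1 vectors (A, C, G, T)."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq]

def conv1d(encoded, kernel):
    """Slide a (width x 4) kernel along the sequence; one score per window."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        window = encoded[start:start + width]
        score = sum(window[i][j] * kernel[i][j]
                    for i in range(width) for j in range(4))
        scores.append(score)
    return scores

def motif_kernel(motif):
    """Hypothetical filter that adds 1 for every position matching `motif`."""
    return one_hot(motif)

promoter = "GGCTATAAAGGC"
scores = conv1d(one_hot(promoter), motif_kernel("TATA"))

# Index of the window that best matches the filter.
best_position = max(range(len(scores)), key=scores.__getitem__)
print(scores, best_position)
```

In a trained network the filter weights are learned from the expression data rather than written by hand, and many filters and layers are combined to produce a single predicted expression value.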

For example, the researchers created thousands of additional promoter sequences that had not been used to train the neural network, measured how well these promoters control gene expression, and demonstrated that the network predicts that ability with great accuracy. The researchers then presented random starting DNA sequences to the network and showed that its predictions could be used to turn these sequences into promoters. Through ten rounds of computational evolution, the starting sequences were converted into promoters predicted to drive very high or very low expression of the YFP gene. The research team then synthesized 500 of these promoter sequences and measured their ability to regulate expression of the YFP gene. Indeed, the computationally evolved sequences drove either very high or very low levels of gene expression. These and other validation experiments show that Vaishnav and his research team have succeeded in creating an automated fortune teller that predicts gene expression very well.
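The idea of computational evolution can be sketched as a simple loop: in each round, score every single-base mutant of the current sequence with the predictor and keep the best one. The code below is a minimal sketch under stated assumptions. The greedy selection rule is an assumption, and `toy_predictor` is a hypothetical stand-in for the trained network, not the authors’ model.

```python
import random

BASES = "ACGT"

def toy_predictor(seq):
    """Hypothetical stand-in for the trained network: fraction of A/T bases."""
    return sum(base in "AT" for base in seq) / len(seq)

def single_mutants(seq):
    """Yield every sequence that differs from `seq` at exactly one position."""
    for i, old in enumerate(seq):
        for new in BASES:
            if new != old:
                yield seq[:i] + new + seq[i + 1:]

def evolve(seq, predictor, rounds=10, maximize=True):
    """Greedy evolution: each round, move to the best-scoring single mutant."""
    pick = max if maximize else min
    trajectory = [seq]
    for _ in range(rounds):
        best = pick(single_mutants(seq), key=predictor)
        if maximize:
            improved = predictor(best) > predictor(seq)
        else:
            improved = predictor(best) < predictor(seq)
        if not improved:
            break  # no single mutation helps any further
        seq = best
        trajectory.append(seq)
    return trajectory

# Start from a random 80-base sequence (fixed seed for reproducibility).
rng = random.Random(0)
start = "".join(rng.choice(BASES) for _ in range(80))

high = evolve(start, toy_predictor, maximize=True)   # push expression up
low = evolve(start, toy_predictor, maximize=False)   # push expression down
print(toy_predictor(start), toy_predictor(high[-1]), toy_predictor(low[-1]))
```

Swapping `toy_predictor` for a real trained model would leave the loop unchanged, which is what makes an accurate predictor useful as a design tool.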

This fortune teller can also help explain many aspects of the evolution of gene expression. For example, the authors predicted computationally that three or four mutations in most starting DNA sequences are sufficient to evolve very high or very low levels of gene expression, which they confirmed experimentally. They also concluded that approximately 70% of yeast genes are subject to selection that stabilizes gene expression (i.e., selection tends to preserve mutations that do not cause significant changes in gene expression). In addition, they show that genes subject to this stabilizing selection become more robust to mutations in the DNA that regulates their expression. This means that mutations in the promoters of these genes have less impact on gene expression.

Figure 1 | Learning to predict gene expression. a. Vaishnav and his research team1 created a library of 30 million promoter sequences, which are DNA sequences that control gene expression, each 80 base pairs long. Next, the research team measured how efficiently each of these sequences controls the expression of the gene encoding yellow fluorescent protein (YFP) in yeast cells. b. The research team used these data to train a neural network to predict how different promoters control gene expression. c. The research team then tested the network’s predictive power by designing thousands of other promoters (only one is shown in the figure), and showed that the network predicts with great accuracy the ability of each promoter to control gene expression.


There are several reasons why this research is important. First, it may help to design genes with a specific level of gene expression. Second, it may help to explain many aspects of the evolution of gene regulation. Notably, this research will also enable scientists to answer a range of questions far wider than any single research team can tackle alone, as has happened with other deep learning applications used in biology over the past few years, such as tools for predicting protein folding3.

Despite the importance of this automated fortune teller, it has some shortcomings. First, it only models changes in promoters, which are just one of many kinds of DNA sequence that can affect gene expression. In addition, the predictor does not take into account the effect of changes in neighboring DNA sequences, such as the protein-coding region, which may also affect gene expression. Another shortcoming is that the fortune teller was developed to predict gene expression in yeast, in which gene regulation is less complex than in humans. For example, the DNA that regulates gene expression in yeast is usually a few hundred base pairs away from the gene whose expression it controls, whereas the DNA that regulates gene expression in animals can be millions of base pairs away from the gene it regulates. Therefore, it is not clear whether the approach of Vaishnav and his research team can be extended to more complex gene regulatory processes. There is, however, reason for cautious optimism: the method is largely successful even though the 30 million DNA sequences used to train the convolutional neural network represent only a tiny fraction (approximately 2 × 10⁻⁴¹) of the 4⁸⁰ possible DNA sequences that are 80 bases long and formed from the four DNA nucleotides. Thus, sampling only a small number of sequences for training may not be a major obstacle to the success of this method.
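The size of that fraction can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above (30 million training sequences, 80 bases, four nucleotides); the variable names are illustrative.

```python
from fractions import Fraction

library_size = 30_000_000      # promoter sequences used for training
promoter_length = 80           # bases per promoter
alphabet_size = 4              # A, C, G, T

# Number of distinct sequences of this length.
possible = alphabet_size ** promoter_length

# Exact fraction of sequence space covered by the training library.
fraction = Fraction(library_size, possible)

print(f"possible sequences: {float(possible):.2e}")
print(f"fraction sampled:   {float(fraction):.1e}")
```

The result is on the order of 10⁻⁴¹, consistent with the figure of approximately 2 × 10⁻⁴¹ given in the text.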

Finally, like any fortune teller in mythology, this machine predictor predicts gene expression but does not explain it. It does not explain why the level of gene expression driven by a promoter is high or low. Nor does it specify which transcription factors bind to the promoter, or how those transcription factors interact. In other words, this prediction model does not fully explain the mechanism of gene expression regulation. Overcoming this shortcoming requires a lot of work54.2. But since predicting gene expression has long been a problem, one does not need a fortune teller to predict that biologists will welcome and embrace this technology.
