Artificial intelligence is emerging as a tool for exploring the huge diversity of viruses on Earth, with the potential to speed up metagenomic studies that look for species unknown to science. Viruses have traditionally been difficult to study, both because they evolve quickly and because scientists struggle to grow most of them in the lab.
As a result, in recent years researchers have searched for unknown viruses by sequencing DNA in samples taken from various environments. To identify the microbes present, they search for the genetic signatures of known viruses and bacteria, hoping along the way to come across previously unknown microbial genomes.
Such a hit-or-miss approach often fails. However, machine learning algorithms – which can parse data, learn from them, and then classify information autonomously – promise to get around the problem by finding emergent patterns in mountains of information.
“Previously, people had no method to study viruses well,” says Jie Ren, a computational biologist at the University of Southern California in Los Angeles. “But now we have tools to find them.”
In the latest research, computers were trained to identify the genetic sequences of viruses from one unusual family, Inoviridae. These viruses live in bacteria and alter their hosts' behavior, for example by making the bacteria that cause cholera more toxic.
In the study, a machine-learning algorithm was presented with two sets of data – one containing 805 genomic sequences from known Inoviridae, and another containing about 2,000 sequences from bacteria and other types of viruses – so that the algorithm could find ways of distinguishing between them. The model was then fed massive metagenomic data sets.
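The general idea of training on two labeled sets and then classifying new sequences can be illustrated with a toy sketch. This is not the method used in the study, whose details are not given here. It assumes a simple nearest-centroid classifier over 3-letter DNA "words" (k-mers), and the sequences, labels, and function names are all invented for illustration.

```python
from collections import Counter
from itertools import product
import math

K = 3  # k-mer length: an illustrative assumption, not a value from the study
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]

def kmer_profile(seq, k=K):
    """Return the normalized frequency of every possible k-mer in seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in KMERS) or 1  # avoid dividing by zero
    return [counts[m] / total for m in KMERS]

def centroid(profiles):
    """Average a list of equal-length profiles position by position."""
    return [sum(column) / len(profiles) for column in zip(*profiles)]

def distance(a, b):
    """Euclidean distance between two profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def train(target_seqs, other_seqs):
    """Build one average profile per class from labeled sequences."""
    return {
        "inovirus_like": centroid([kmer_profile(s) for s in target_seqs]),
        "other": centroid([kmer_profile(s) for s in other_seqs]),
    }

def classify(model, seq):
    """Assign seq to the class whose average profile is closest."""
    profile = kmer_profile(seq)
    return min(model, key=lambda label: distance(profile, model[label]))

# Made-up toy sequences standing in for the two training sets.
target_seqs = ["ATATATATATATATAT", "TATATATTATATATAA", "ATATTATATATATATA"]
other_seqs = ["GCGCGCGCGCGGCGCG", "CGCGGCGCGCCGCGCG", "GGCGCGCGCGCGCGCC"]
model = train(target_seqs, other_seqs)

# Made-up "metagenomic" fragments to classify.
contigs = ["ATATATTATATATATT", "GCGCGCCGCGCGGCGC"]
labels = [classify(model, c) for c in contigs]
print(labels)
```

Real genomes are far longer and far less distinct than these toy strings, so a practical classifier would need richer features and a held-out test set to estimate how often it mislabels sequences.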
The computer recovered more than 10,000 Inoviridae genomes, and clustered them into groups indicative of different species. The genetic variation between some of these groups was so wide that Inoviridae probably comprises many families rather than one, says Simon Roux, a computational biologist at the DOE Joint Genome Institute, who conducted the research. According to Roux, perhaps fewer than 100 species had been identified before his research began.
The work was presented at a meeting organized by the U.S. Department of Energy’s Joint Genome Institute.