The AI program could help the search for new drug compounds. Pharmaceutical research tends to rely on software that exhaustively crawls through giant pools of candidate molecules using rules written by chemists, and simulations that try to identify or predict useful structures. The former relies on humans thinking of everything, while the latter is limited by the accuracy of simulations and the computing power required.
Aspuru-Guzik’s system can dream up structures more independently of humans and without lengthy simulations. It leverages its own experience, built up by training machine-learning algorithms with data on hundreds of thousands of drug-like molecules.
Generative models are more typically used to create images, speech, or text, for example in the case of Google’s Smart Reply feature that suggests responses to e-mails. But last month Aspuru-Guzik and colleagues at Harvard, the University of Toronto, and the University of Cambridge published results from creating a generative model trained on 250,000 drug-like molecules.
The system could generate plausible new structures by combining properties of existing drug compounds, and be asked to suggest molecules that strongly displayed certain properties such as solubility, and being easy to synthesize.
The researchers have already experimented with training their system on a database of organic LED molecules, which are important for displays. But making the technique into a practical tool will require improving its chemistry skills, because the structures it suggests are sometimes nonsensical.
Pande says one challenge for asking software to learn chemistry may be that researchers have not yet identified the best data format to use to feed chemical structures into deep-learning software. Images, speech, and text have proven to be a good fit—as evidenced by software that rivals humans at image and speech recognition and translation—but existing ways of encoding chemical structures may not be quite right.
… giving his system more data, to broaden its chemistry knowledge, will improve its power, in the same way that databases of millions of photos have helped image recognition become useful. The American Chemical Society’s database records around 100 million published chemical structures. Before long, Aspuru-Guzik hopes to feed all of them to a version of his AI program.