Testing the limits of SMILES-based de novo molecular generation with curriculum and deep reinforcement learning

Apr 1, 2023

Maranga Mokaya

Fergus Imrie

Willem P. Van Hoorn

Ola Kalisz

Anthony R. Bradley

Charlotte M. Deane

Abstract

Deep reinforcement learning methods have been shown to be potentially powerful tools for de novo design. Recurrent-neural-network-based techniques are the most widely used methods in this space. In this work we examine the behaviour of recurrent-neural-network-based methods when there are few (or no) examples of molecules with the desired properties in the training data. We find that targeted molecular generation is usually possible, but the diversity of generated molecules is often reduced and it is not possible to control the composition of generated molecular sets. To help overcome these issues, we propose a new curriculum-learning-inspired recurrent iterative optimization procedure that enables the optimization of generated molecules for seen and unseen molecular profiles, and allows the user to control whether a molecular profile is explored or exploited. Using our method, we generate specific and diverse sets of molecules with up to 18 times more scaffolds than standard methods for the same sample size; however, our results also point to substantial limitations of one-dimensional molecular representations, as used in this space. We find that the success or failure of a given molecular optimization problem depends on the choice of simplified molecular-input line-entry system (SMILES).
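The scaffold comparison quoted in the abstract ("up to 18 times more scaffolds than standard methods for the same sample size") can be read as a ratio of distinct scaffold counts between two equally sized samples. The following is a minimal sketch of that arithmetic only, not the authors' evaluation code. It assumes scaffold identifiers (for example, scaffold SMILES strings) have already been computed by a cheminformatics toolkit; the function names and the toy samples are hypothetical.

```python
from typing import Sequence


def unique_scaffold_count(scaffolds: Sequence[str]) -> int:
    """Number of distinct scaffold identifiers in a sample."""
    return len(set(scaffolds))


def scaffold_ratio(method: Sequence[str], baseline: Sequence[str]) -> float:
    """Ratio of distinct scaffolds (method / baseline) for equal-sized samples."""
    if len(method) != len(baseline):
        raise ValueError("samples must be the same size for a fair comparison")
    baseline_count = unique_scaffold_count(baseline)
    if baseline_count == 0:
        raise ValueError("baseline sample is empty")
    return unique_scaffold_count(method) / baseline_count


# Toy, made-up samples: one scaffold string per generated molecule.
baseline_sample = ["c1ccccc1", "c1ccccc1", "c1ccccc1", "c1ccccc1"]
method_sample = ["c1ccccc1", "c1ccncc1", "C1CCCCC1", "c1ccccc1"]

ratio = scaffold_ratio(method_sample, baseline_sample)
print(f"distinct scaffolds: {unique_scaffold_count(method_sample)} vs "
      f"{unique_scaffold_count(baseline_sample)} (ratio {ratio:.1f})")
```

Requiring equal sample sizes mirrors the "same sample size" condition in the abstract, since a larger sample would otherwise tend to contain more distinct scaffolds regardless of the method.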

Type

Journal article

Publication

Nature Machine Intelligence

Last updated on Apr 1, 2023

Chemical Libraries, Computational Biology and Bioinformatics, Drug Discovery and Development, Machine Learning, Structure-Based Drug Design

Authors

Ola Kalisz

PhD Student
