Paper
Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation
This paper is about the problem of learning a stochastic policy for generating an object (like a molecular graph) from a sequence of actions, such that the probability of generating an object is proportional to a given positive reward for that object. Whereas standard return maximization tends to converge to a single return-maximizing sequence, there are cases where we would like to sample a diverse set of high-return solutions. These arise, for example, in black-box function optimization when only a few rounds are possible, each with large batches of queries, where the batches should be diverse, e.g., in the design of new molecules. One can also see this as a problem of approximately converting an energy function to a generative distribution. While MCMC methods can achieve that, they are expensive and generally only perform local exploration. Instead, training a generative policy amortizes the cost of search during training and yields fast generation. Using insights from Temporal Difference learning, we propose GFlowNet, based on a view of the generative process as a flow network, making it possible to handle the tricky case where different trajectories can yield the same final state, e.g., there are many ways to sequentially add atoms to generate some molecular graph. We cast the set of trajectories as a flow and convert the flow consistency equations into a learning objective, akin to the casting of the Bellman equations into Temporal Difference methods. We prove that any global minimum of the proposed objectives yields a policy which samples from the desired distribution, and demonstrate the improved performance and diversity of GFlowNet on a simple domain where the reward function has many modes, and on a molecule synthesis task.
Authors: Bengio, Emmanuel · Jain, Moksh · Korablyov, Maksym · Precup, Doina · Bengio, Yoshua
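
The flow view described in the abstract can be illustrated on a toy problem. The sketch below is not the paper's training procedure: instead of learning flows with a neural network, it builds an exact flow by hand on a small hypothetical DAG in which the object `x` is reachable by two different trajectories. The graph, the reward values, the function names, and the choice to split a state's flow uniformly among its parents are all assumptions made for illustration. It then checks that a policy choosing each action in proportion to its edge flow samples terminal objects with probability proportional to reward.

```python
from collections import defaultdict

# Hypothetical toy DAG (assumption): "x" can be reached via "a" or via "b".
EDGES = [("s0", "a"), ("s0", "b"), ("a", "x"), ("b", "x"), ("b", "y")]
REWARD = {"x": 3.0, "y": 1.0}          # positive rewards on terminal objects
TOPO = ["s0", "a", "b", "x", "y"]      # a topological order, root first


def build_flow(edges, reward, topo):
    """Construct edge flows satisfying: in-flow(s) = reward(s) + out-flow(s)."""
    parents, children = defaultdict(list), defaultdict(list)
    for u, v in edges:
        parents[v].append(u)
        children[u].append(v)
    flow = {}
    for s in reversed(topo):  # children are processed before their parents
        total = reward.get(s, 0.0) + sum(flow[(s, c)] for c in children[s])
        # Arbitrary illustrative choice: split the in-flow evenly among parents.
        for p in parents[s]:
            flow[(p, s)] = total / len(parents[s])
    return flow, children


def policy_from_flow(flow, children):
    """Forward policy: pick a child in proportion to the flow on its edge."""
    pi = {}
    for s, cs in children.items():
        out = sum(flow[(s, c)] for c in cs)
        for c in cs:
            pi[(s, c)] = flow[(s, c)] / out
    return pi


def terminal_distribution(pi, children, topo, reward):
    """Exact probability of ending at each rewarded object under the policy."""
    prob = defaultdict(float)
    prob[topo[0]] = 1.0
    for s in topo:
        for c in children.get(s, []):
            prob[c] += prob[s] * pi[(s, c)]
    return {s: prob[s] for s in reward}


flow, children = build_flow(EDGES, REWARD, TOPO)
pi = policy_from_flow(flow, children)
dist = terminal_distribution(pi, children, TOPO, REWARD)
print(dist)
```

Because the flow into each terminal object equals its reward and flow is conserved at every interior state, the probability of reaching `x` or `y` is its reward divided by the total reward, regardless of how many trajectories lead to it. A learned GFlowNet aims to satisfy the same consistency conditions approximately when the state space is too large to enumerate.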