Advancing regulatory variant effect prediction with AlphaGenome
Deep learning models that predict functional genomic measurements from DNA sequences are powerful tools for deciphering the genetic regulatory code. Existing methods involve a trade-off between input sequence length and prediction resolution, which limits their modality scope and performance<sup>1-5</sup>. We present AlphaGenome, a unified DNA sequence model that takes 1 Mb of DNA sequence as input and predicts thousands of functional genomic tracks at up to single-base-pair resolution across diverse modalities. The modalities include gene expression, transcription initiation, chromatin accessibility, histone modifications, transcription factor binding, chromatin contact maps, splice site usage, and splice junction coordinates and strength. Trained on human and mouse genomes, AlphaGenome matches or exceeds the strongest available external models in 25 of 26 evaluations of variant effect prediction. By scoring variant effects across all modalities simultaneously, AlphaGenome accurately recapitulates the mechanisms of clinically relevant variants near the TAL1 oncogene<sup>6</sup>. To facilitate broader use, we provide tools for making genome track and variant effect predictions from sequence.
Authors: Žiga Avsec · Natasha Latysheva · Jun Cheng · Guido Novati · Kyle R. Taylor · Tom Ward · Clare Bycroft · Lauren Nicolaisen · Eirini Arvaniti · Joshua Pan · Raina Thomas · Vincent Dutordoir · Matteo Perino · Soham De · Alexander Karollus · Adam Gayoso · Toby Sargeant · Anne Mottram · Lai Hong Wong · Pavol Drotár · Adam Kosiorek · Andrew Senior · Richard Tanburn · Taylor Applebaum · Souradeep Basu · Demis Hassabis · Pushmeet Kohli
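The variant effect scoring described in the abstract can be illustrated with a minimal sketch: predict a track for the reference sequence, predict it again with the alternate allele substituted, and compare the two. This is not the AlphaGenome model or its API. The names `one_hot`, `toy_track_model` and `score_snv` are hypothetical, and the "model" is a stand-in that returns windowed GC content per base, chosen only so the example runs without trained weights.

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a (length, 4) array; unknown bases (e.g. N) stay all-zero."""
    index = {base: i for i, base in enumerate(BASES)}
    encoded = np.zeros((len(seq), 4), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        if base in index:
            encoded[pos, index[base]] = 1.0
    return encoded


def toy_track_model(x):
    """Hypothetical stand-in for a sequence-to-track model.

    Returns one value per base: the GC fraction in a 5-bp window
    centred on that base (zero-padded at the sequence ends).
    """
    gc = x[:, 1] + x[:, 2]  # columns 1 and 2 are C and G
    kernel = np.ones(5, dtype=np.float32) / 5
    return np.convolve(gc, kernel, mode="same")


def score_snv(ref_seq, pos, alt_base, model=toy_track_model):
    """Score a single-nucleotide variant as the alt-minus-ref track difference.

    Returns the per-base difference track and its summed absolute value.
    """
    if ref_seq[pos].upper() == alt_base.upper():
        raise ValueError("alt_base must differ from the reference base")
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    diff = model(one_hot(alt_seq)) - model(one_hot(ref_seq))
    return diff, float(np.abs(diff).sum())


if __name__ == "__main__":
    diff, total = score_snv("AAAAAAAAAAA", pos=5, alt_base="G")
    print(diff.round(2), total)
```

In a real setting the stand-in would be replaced by a trained model that returns many tracks per modality, and the difference would typically be aggregated per track and over a region of interest rather than summed over the whole input.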