Paper
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
We introduce RecurrentGemma, a family of open language models that uses Google's novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language. It has a fixed-size state, which reduces memory use and enables efficient inference on long sequences. We provide two model sizes, with 2B and 9B parameters, and release pre-trained and instruction-tuned variants of both. Our models achieve performance comparable to similarly sized Gemma baselines despite being trained on fewer tokens.
Authors: Botev, Aleksandar · De, Soham · Smith, Samuel L · Fernando, Anushan · Muraru, George-Cristian · Haroun, Ruba · Berrada, Leonard · Pascanu, Razvan · Sessa, Pier Giuseppe · Dadashi, Robert · Hussenot, Léonard · Ferret, Johan · Girgin, Sertan · Bachem, Olivier · Andreev, Alek · Kenealy, Kathleen · Mesnard, Thomas · Hardin, Cassidy · Bhupatiraju, Surya · Pathak, Shreya · Sifre, Laurent · Rivière, Morgane · Kale, Mihir Sanjay · Love, Juliette · Tafti, Pouya · Joulin, Armand · Fiedel, Noah · Senter, Evan · Chen, Yutian · Srinivasan, Srivatsan · Desjardins, Guillaume · Budden, David · Doucet, Arnaud · Vikram, Sharad · Paszke, Adam · Gale, Trevor · Borgeaud, Sebastian · Chen, Charlie · Brock, Andy · Paterson, Antonia · Brennan, Jenny · Risdal, Meg · Gundluru, Raj · Devanathan, Nesh · Mooney, Paul · Chauhan, Nilay · Culliton, Phil · Martins, Luiz Gustavo · Bandy, Elisa · Huntsperger, David · Cameron, Glenn · Zucker, Arthur · Warkentin, Tris · Peran, Ludovic · Giang, Minh · Ghahramani, Zoubin · Farabet, Clément · Kavukcuoglu, Koray · Hassabis, Demis · Hadsell, Raia · Teh, Yee Whye · de Freitas, Nando
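The abstract's claim that a fixed-size state reduces memory use on long sequences can be illustrated with a toy sketch. The code below is not Griffin or RecurrentGemma: it assumes a plain elementwise linear recurrence with a constant decay (the real recurrent layer is learned and gated) and stands in for local attention with a bounded buffer of recent tokens. The names `linear_recurrence_step`, `run`, and `memory_footprint`, and the values of `width`, `window`, and the decay, are all hypothetical and chosen only for illustration.

```python
from collections import deque


def linear_recurrence_step(state, x, decay):
    """One step of an elementwise linear recurrence: h_t = a * h_{t-1} + (1 - a) * x_t."""
    return [a * h + (1.0 - a) * xi for h, xi, a in zip(state, x, decay)]


def run(sequence, width, window):
    """Process a sequence while keeping only a fixed-size state and a bounded local buffer."""
    decay = [0.9] * width               # assumed constant decay, purely illustrative
    state = [0.0] * width               # recurrent state: size does not depend on sequence length
    local_cache = deque(maxlen=window)  # a local window only needs the most recent `window` tokens
    for x in sequence:
        state = linear_recurrence_step(state, x, decay)
        local_cache.append(x)           # oldest entry is dropped automatically once full
    return state, local_cache


def memory_footprint(seq_len, width=4, window=8):
    """Return (state size, cached tokens) after processing `seq_len` tokens."""
    sequence = [[1.0] * width for _ in range(seq_len)]
    state, cache = run(sequence, width, window)
    return len(state), len(cache)
```

Under these assumptions, `memory_footprint(100)` and `memory_footprint(10000)` report the same sizes, whereas a global-attention cache would grow with every token processed.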