Full metadata record
dc.contributor.author: McKenzie, Ian
dc.contributor.author: Bowman, Samuel R.
dc.contributor.author: Perez, Ethan
dc.date.accessioned: 2024-09-30T02:00:48Z
dc.date.available: 2024-09-30T02:00:48Z
dc.date.issued: 2023-06
dc.identifier.uri: http://hdl.handle.net/2451/74633
dc.description.abstract: Work on scaling laws has found that large language models (LMs) show predictable improvements to overall loss with increased scale (model size, training data, and compute). Here, we present evidence for the claim that LMs may show inverse scaling, or worse task performance with increased scale, e.g., due to flaws in the training objective and data. We present empirical evidence of inverse scaling on 11 datasets collected by running a public contest, the Inverse Scaling Prize, with a substantial prize pool. Through analysis of the datasets, along with other examples found in the literature, we identify four potential causes of inverse scaling: (i) preference to repeat memorized sequences over following in-context instructions, (ii) imitation of undesirable patterns in the training data, (iii) tasks containing an easy distractor task which LMs could focus on, rather than the harder real task, and (iv) correct but misleading few-shot demonstrations of the task. We release the winning datasets at https://github.com/inverse-scaling/prize/tree/main/data-release to allow for further investigation of inverse scaling. Our tasks have helped drive the discovery of U-shaped and inverted-U scaling trends, where an initial trend reverses, suggesting that scaling trends are less reliable at predicting the behavior of larger-scale models than previously understood. Overall, our results suggest that there are tasks for which increased model scale alone may not lead to progress, and that more careful thought needs to go into the data and objectives for training language models.
dc.description.sponsorship: We thank everyone who submitted tasks to the Inverse Scaling Prize. Thank you to all the volunteers who contributed to reviewing submissions: Ananya Harsh Jha, Beth Barnes, Jonas Pfeiffer, Joshua Landau, Kamile Lukosiute, Naomi Saphra, Nicholas Kees Dupuis, Nicholas Lourie, Peter Barnett, Quintin Pope, Rasika Bhalerao, Richard Pang, Rune Kvist, Sam Ringer, Tamera Lanham, Thomas Larsen, and William Merrill. We are grateful to Open Philanthropy for providing funding for the prize. Thanks to Hannah Betts, Karl Berzins, Josh Jacobson, and Adam Gleave from FAR AI for logistical support in all aspects of handling prize money, including funding applications and distributing prizes. Thanks to Mary Dowling and Julie Nguyen from Tovella Dowling. Thanks also to Jenna Webster, Andrew Morton, and Brandon Warehime from Players Philanthropy Fund. This project has benefited from financial support to SB by Eric and Wendy Schmidt (made by recommendation of the Schmidt Futures program) and Open Philanthropy, and from in-kind support by the NYU High-Performance Computing Center and Stability AI. This material is based upon work supported by the National Science Foundation under Grant Nos. 1922658 and 2046556. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. We would like to thank Anthropic for the use of their LMs, including Cameron McKinnon for help evaluating on Anthropic models, and OpenAI for API help and credits for participants. We would also like to thank Scott Heiner, Edwin Chen, and others from Surge AI for organizing human validation and offering support to participants, and Jason Phang, Stella Biderman, and HuggingFace for their help running evaluations on large public models. Thanks to Lama Ahmad and others from OpenAI for assistance to participants in running evaluations on the OpenAI API, and for providing API credits. We also thank Ilya Sutskever and others at OpenAI for sharing results on GPT-4 models. We thank DeepMind for running evaluations, in particular Matthew Rahtz for his work running evaluations on Gopher and Chinchilla in both rounds and for his quick turnaround and patience in re-running after data issues. From DeepMind, we also thank Nick Fernando, Sanah Choudhry, and Koray Kavukcuoglu, and the teams behind Gopher (Jack W. Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides, Sarah Henderson, Roman Ring, Susannah Young, Eliza Rutherford, Tom Hennigan, Jacob Menick, Albin Cassirer, Richard Powell, George van den Driessche, Lisa Anne Hendricks, Maribeth Rauh, Po-Sen Huang, Amelia Glaese, Johannes Welbl, Sumanth Dathathri, Saffron Huang, Jonathan Uesato, John Mellor, Irina Higgins, Antonia Creswell, Nat McAleese, Amy Wu, Erich Elsen, Siddhant Jayakumar, Elena Buchatskaya, David Budden, Esme Sutherland, Karen Simonyan, Michela Paganini, Laurent Sifre, Lena Martens, Xiang Lorraine Li, Adhiguna Kuncoro, Aida Nematzadeh, Elena Gribovskaya, Domenic Donato, Angeliki Lazaridou, Arthur Mensch, Jean-Baptiste Lespiau, Maria Tsimpoukelli, Nikolai Grigorev, Doug Fritz, Thibault Sottiaux, Mantas Pajarskas, Toby Pohlen, Zhitao Gong, Daniel Toyama, Cyprien de Masson d’Autume, Yujia Li, Tayfun Terzi, Vladimir Mikulik, Igor Babuschkin, Aidan Clark, Diego de Las Casas, Aurelia Guy, Chris Jones, James Bradbury, Matthew Johnson, Blake Hechtman, Laura Weidinger, Iason Gabriel, William Isaac, Ed Lockhart, Simon Osindero, Laura Rimell, Chris Dyer, Oriol Vinyals, Kareem Ayoub, Jeff Stanway, Lorrayne Bennett, Demis Hassabis, Koray Kavukcuoglu, Geoffrey Irving) and Chinchilla (Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, Tom Hennigan, Eric Noland, Katie Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, Laurent Sifre).
dc.language.iso: en_US
dc.publisher: TMLR
dc.title: Data for "Inverse Scaling: When Bigger Isn't Better"
dc.type: Dataset
Appears in Collections: Machine Learning for Language Lab

Files in This Item:
File            Size      Format
prize-main.zip  73.05 MB  Unknown
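
The archive above contains the winning task datasets, also published at the GitHub link in the abstract. As a minimal sketch of how one might inspect a task file after unzipping, assuming the classification tasks ship as CSV files with prompt, classes, and answer_index columns (the path, filename, and column names below are illustrative assumptions, not documented in this record):

```python
# Minimal sketch: peek at one task file from the unzipped archive.
# The directory layout and the column names ("prompt", "classes",
# "answer_index") are assumptions based on the classification format
# of the public data release, not documented in this record.
import ast

import pandas as pd

# Hypothetical path inside the unzipped prize-main.zip.
df = pd.read_csv("prize-main/data-release/example-task.csv")

print(f"{len(df)} examples, columns: {df.columns.tolist()}")

row = df.iloc[0]
print("Prompt:", row["prompt"])

# "classes" is assumed to be a stringified Python list of answer
# options, e.g. "[' Yes', ' No']"; answer_index selects the correct one.
options = ast.literal_eval(row["classes"])
print("Options:", options)
print("Correct answer:", options[int(row["answer_index"])])
```

To actually measure scaling behavior on such a task, one would compare a model's per-option log-probabilities for each prompt against answer_index across model sizes; the snippet above only verifies the file layout.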


Items in FDA are protected by copyright, with all rights reserved, unless otherwise indicated.