Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Michael, Julian | - |
dc.contributor.author | Rein, David | - |
dc.contributor.author | Bowman, Samuel | - |
dc.contributor.author | et al. | - |
dc.date.accessioned | 2024-09-30T01:57:14Z | - |
dc.date.available | 2024-09-30T01:57:14Z | - |
dc.date.issued | 2023-11 | - |
dc.identifier.other | arXiv:2311.08702 | - |
dc.identifier.uri | http://hdl.handle.net/2451/74632 | - |
dc.description.abstract | As AI systems are used to answer more difficult questions and potentially help create new knowledge, judging the truthfulness of their outputs becomes more difficult and more important. How can we supervise unreliable experts, which have access to the truth but may not accurately report it, to give answers that are systematically true and don't just superficially seem true, when the supervisor can't tell the difference between the two on their own? In this work, we show that debate between two unreliable experts can help a non-expert judge more reliably identify the truth. We collect a dataset of human-written debates on hard reading comprehension questions where the judge has not read the source passage, only ever seeing expert arguments and short quotes selectively revealed by 'expert' debaters who have access to the passage. In our debates, one expert argues for the correct answer, and the other for an incorrect answer. Comparing debate to a baseline we call consultancy, where a single expert argues for only one answer which is correct half of the time, we find that debate performs significantly better, with 84% judge accuracy compared to consultancy's 74%. Debates are also more efficient, being 68% of the length of consultancies. By comparing human to AI debaters, we find evidence that with more skilled (in this case, human) debaters, the performance of debate goes up but the performance of consultancy goes down. Our error analysis also supports this trend, with 46% of errors in human debate attributable to mistakes by the honest debater (which should go away with increased skill); whereas 52% of errors in human consultancy are due to debaters obfuscating the relevant evidence from the judge (which should become worse with increased skill). Overall, these results show that debate is a promising approach for supervising increasingly capable but potentially unreliable AI systems. | en |
dc.description.sponsorship | We thank NYU ARG for their helpful feedback, the NYU debate team for allowing us to advertise to and recruit their members, and the many hours of hard work by our hired debaters, including Adelle Fernando, Aliyaah Toussaint, Anuj Jain, Ethan Rosen, Max Layden, Reeya Kansra, Sam Jin, Sean Wang, and Shreeram Modi, among others. For helpful feedback on this draft, we thank Geoffrey Irving, Paul Christiano, Dan Valentine, John Hughes, Akbir Khan, and Beth Barnes. We thank Sunoo Park for guidance on the comparison to the adversarial system of litigation. Thanks also to Jess Smith and Joseph Miller for help with annotation platform development. This project has benefited from financial support to SB by Eric and Wendy Schmidt (made by recommendation of the Schmidt Futures program) and Open Philanthropy, and from OpenAI for API credits and access to gpt-4-32k. This material is based upon work supported by the National Science Foundation under Grant Nos. 1922658 and 2046556. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. | en |
dc.language.iso | en_US | en |
dc.publisher | arXiv | en |
dc.title | Data for "Debate Helps Supervise Unreliable Experts" | en |
dc.type | Dataset | en |
Appears in Collections: Machine Learning for Language Lab
Files in This Item:
File | Description | Size | Format
---|---|---|---
debate-2023-nyu-experiments.zip | | 19.41 MB | Unknown
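
The archive listed above holds the human-judged debate and consultancy transcripts described in the abstract. As a minimal sketch (not the authors' tooling), the snippet below shows how one might inspect the archive and tally judge accuracy per protocol; the archive name comes from the record, but the JSONL layout and the field names `setting` and `judge_correct` are assumptions about the data format, not documented structure.

```python
import json
import zipfile
from collections import defaultdict

ARCHIVE = "debate-2023-nyu-experiments.zip"  # file name from the record above

# Counters for per-setting judge accuracy, e.g. "debate" vs. "consultancy".
correct = defaultdict(int)
total = defaultdict(int)

with zipfile.ZipFile(ARCHIVE) as zf:
    print("Files in archive:", zf.namelist()[:10])  # peek at the layout first

    for name in zf.namelist():
        if not name.endswith(".jsonl"):
            continue  # skip anything that is not line-delimited JSON
        with zf.open(name) as f:
            for line in f:
                record = json.loads(line)
                # Hypothetical fields: `setting` names the protocol and
                # `judge_correct` flags whether the judge chose the true answer.
                setting = record.get("setting", "unknown")
                total[setting] += 1
                correct[setting] += bool(record.get("judge_correct"))

for setting in sorted(total):
    acc = correct[setting] / total[setting]
    print(f"{setting}: {acc:.1%} judge accuracy over {total[setting]} transcripts")
```

If the assumed fields are present, the human-debater results reported in the abstract would correspond to roughly 84% judge accuracy for debate and 74% for consultancy.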
Items in FDA are protected by copyright, with all rights reserved, unless otherwise indicated.