Issue Date | Title | Author(s) |
Jul-2020 | BLiMP: The Benchmark of Linguistic Minimal Pairs for English (Electronic Resources) | Warstadt, Alex; Parrish, Alicia; Liu, Haokun; Mohananey, Anhad; Peng, Wei; Wang, Sheng-Fu; Bowman, Samuel R. |
2019 | CoLA: The Corpus of Linguistic Acceptability (with added annotations) | Warstadt, Alex; Singh, Amanpreet; Bowman, Samuel R. |
2021 | Comparing Test Sets with Item Response Theory | Vania, Clara; Bowman, Samuel R. |
Nov-2023 | Data for "Debate Helps Supervise Unreliable Experts" | Michael, Julian; Rein, David; Bowman, Samuel R.; et al. |
Jun-2023 | Data for "Inverse Scaling: When Bigger Isn't Better" | McKenzie, Ian; Bowman, Samuel R.; Perez, Ethan |
2021 | Does Putting a Linguist in the Loop Improve NLU Data Collection? | Parrish, Alicia; Bowman, Samuel R. |
Nov-2023 | GPQA: A Graduate-Level Google-Proof Q&A Benchmark | Rein, David; Bowman, Samuel R.; et al. |
Nov-2019 | Investigating BERT’s Knowledge of Language: Five Analysis Methods with NPIs | Warstadt, Alex; Bowman, Samuel R.; et al. |
2020 | The Mixed Signals Generalization Set | Warstadt, Alex; Zhang, Yian; Li, Haau-Sing; Bowman, Samuel R. |
2018 | The Multi-Genre NLI Corpus | Williams, Adina; Nangia, Nikita; Bowman, Samuel R. |
2023 | Pretraining Language Models with Human Preferences | Korbak, Tomasz; Bowman, Samuel R.; Perez, Ethan |
2023 | (QA)^2: Question Answering with Questionable Assumptions | Bowman, Samuel R.; Htut, Phu Mon; Kim, Najoung |
2022 | QuALITY: Question Answering with Long Input Texts, Yes! | Pang, Richard Yuanzhe; Bowman, Samuel R. |
Jun-2015 | The SNLI Corpus | Bowman, Samuel R.; Angeli, Gabor; Potts, Christopher; Manning, Christopher D. |
2022 | SQuALITY: Building a Long-Document Summarization Dataset the Hard Way | Wang, Alex; Bowman, Samuel R. |