Tanya Goyal (@tanyaagoyal) / X

Tanya Goyal

@tanyaagoyal

Faculty @Cornell_CS. she/her

Ithaca, NY

tagoyal.github.io

Born July 9

Joined September 2019

Tanya Goyal
@tanyaagoyal
Sep 27, 2022
✨New preprint✨ Zero-shot GPT-3 does *better* at news summarization than any of our fine-tuned models. Humans like these summaries better. But all of our metrics think they’re MUCH worse. Work w/ @jessyjli, @gregd_nlp. Check it out here: arxiv.org/abs/2209.12356 [1/6]
Tanya Goyal
@tanyaagoyal
Jun 29, 2023
Replying to @tanyaagoyal
I will join Cornell CS @cs_cornell as an assistant professor in Fall 2024 after spending a year at @princeton_nlp working on all things language models. I will be posting about open positions in my lab soon but you can read about my research here: tagoyal.github.io
Tanya Goyal
@tanyaagoyal
Jun 29, 2023
I successfully defended my PhD @UTCompSci last week. A BIG thank you to my advisor @gregd_nlp and mentor @jessyjli for being incredibly supportive throughout my PhD, and esp. over the last few months on the job market! Next, ... 1/2
Tanya Goyal
@tanyaagoyal
May 20, 2022
Excited to share SNaC (Summary Narrative Coherence), a dataset of 9.6K error annotations across 150 long summaries + a data collection framework for fine-grained coherence errors in summarization. arxiv.org/abs/2205.09641 (work w/ @jessyjli @gregd_nlp)
Tanya Goyal
@tanyaagoyal
Sep 27, 2022
Replying to @tanyaagoyal
What does this mean for evaluation? All our metrics (ROUGE + factuality work from the last 2+ years, etc.) fail to evaluate GPT-3 summaries that look nothing like past generated or reference summaries of standard datasets! We need to rethink automatic evaluation going forward!
Tanya Goyal
@tanyaagoyal
May 7, 2020
New work "Neural Syntactic Preordering for Controlled Paraphrase Generation" (with @gregd_nlp) at #acl2020nlp! Basic idea: break paraphrasing into 2 steps: "soft" reordering of the input (like preordering in MT), followed by rearrangement-aware paraphrasing. arxiv.org/abs/2005.02013 1/
Tanya Goyal
@tanyaagoyal
Oct 15, 2020
Our paper "Evaluating Factuality in Generation with Dependency-level Entailment" (w/ @gregd_nlp) to appear in Findings of #EMNLP2020! arxiv.org/abs/2010.05478 We decompose sentence-level factuality into entailment evaluation of smaller units (dependency arcs) of the hypotheses.
Tanya Goyal
@tanyaagoyal
May 22, 2022
Presenting our work (w/ @JiachengNLP @jessyjli @gregd_nlp) “Training Dynamics for Text Summarization Models” at #acl2022 Findings: aclanthology.org/2022.findings-… In-person PS5-3 Summarization: 3:15p, May 24 (Dublin time) Virtual PS3 Summarization: 7:30a, May 25 (Dublin time)
Tanya Goyal
@tanyaagoyal
Dec 5, 2022
Excited to present our work on multi-decoder summarization models "HydraSum" later this week at #EMNLP! work w/ @nazneenrajani @owenhaoliu & @iam_wkr during my Salesforce internship! arxiv.org/abs/2110.04400 I will present this in person on 9th Dec, 11am in the summ session 🧵 1/
Tanya Goyal
@tanyaagoyal
Sep 27, 2022
Replying to @tanyaagoyal
We collect human preference annotations for news summaries generated by current SOTA and zero-shot GPT-3 models. For multiple settings (generic + keyword) and datasets (CNN + BBC), GPT-3 summaries beat prior fine-tuned models! [2/6]
Tanya Goyal
@tanyaagoyal
Dec 9, 2022
I will be presenting this (+ HydraSum tinyurl.com/hydrasum), both in the summarization oral session at 11 am today. Come say hi!
Tanya Goyal
@tanyaagoyal
Sep 27, 2022
Replying to @tanyaagoyal
Browse examples of generated summaries and human annotations at: tagoyal.github.io/zeroshot-news-… [6/6]
Tanya Goyal
@tanyaagoyal
Sep 27, 2022
Replying to @tanyaagoyal
Furthermore, GPT-3 can emulate multiple different styles and is keyword controllable. It does great in all the settings we tested it in, and doesn’t present the kinds of factual errors we’ve seen in the literature. [3/6]
Tanya Goyal
@tanyaagoyal
Sep 27, 2022
Replying to @tanyaagoyal
This also means we can now break away from noisy benchmark datasets, e.g. XSum, that (we observe) cannot produce systems for real settings. Instead, actual use cases and not data availability can now dictate future research directions (task goals, domains, etc.) [4/6]