This repository contains the code and data for the OOMB benchmark introduced in 📄 Can Large Language Models be Effective Online Opinion Miners? OOMB is designed to evaluate the capabilities of large language models (LLMs) to perform both extractive and abstractive opinion mining in realistic online environments.
All OOMB data is stored in `data/oomb_benchmark.pickle`. The dataset consists of 600 content instances in total, broken down by source as follows:
- **Blogs** (`content_type = blog`): 115 instances. Long-form posts describing products or services.
- **Review Sites** (`content_type = review_site`): 106 instances. Structured product/service reviews with explicit ratings.
- **Reddit** (`content_type = reddit`): 177 instances. Multi-user discussion threads with nested replies.
- **YouTube** (`content_type = youtube`): 202 instances. Single-thread comment chains featuring informal language, emojis, and slang.
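As a minimal sketch of working with the file, the helpers below assume the pickle stores the 600 instances as a list of dicts, each carrying the `content_type` field described above (the actual container type may differ; check the released file):

```python
import pickle
from collections import Counter

def load_oomb(path="data/oomb_benchmark.pickle"):
    """Load the OOMB benchmark. Assumes the pickle holds a list of
    instance dicts, each with a "content_type" field."""
    with open(path, "rb") as f:
        return pickle.load(f)

def count_by_source(instances):
    """Tally instances per content_type (blog, review_site, reddit, youtube)."""
    return Counter(item["content_type"] for item in instances)
```

Under these assumptions, `count_by_source(load_oomb())` should report the 115/106/177/202 split listed above.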
Each data instance in OOMB consists of:
- **Entity-feature-opinion tuples**: a list of structured tuples of the form `(entity, feature, opinion)`, each paired with the exact sentence supporting the opinion, capturing fine-grained user opinions.
- **Opinion-Centric Summary**: a concise, 3–5 sentence abstractive summary that groups key opinion topics and overall sentiment about the entity, reflecting major themes and insights found in the input content.
Example:

```json
{
  "content_type": "blog",
  "text": "Title: 2021 Honda Ridgeline Review: Why It May Just Be the Perfect Truck for You…",
  "tuple": [
    {
      "entity": "2021 Honda Ridgeline",
      "feature": "capabilities",
      "opinion": "remain the same",
      "evidence_sent": "..."
    },
    {
      "entity": "2021 Honda Ridgeline",
      "feature": "interior",
      "opinion": "reclaims a physical volume knob",
      "evidence_sent": "..."
    },
    "..."
  ],
  "summary": "The 2021 Honda Ridgeline is ..."
}
```

Based on this dataset, we design two complementary tasks:
- **Feature-Centric Opinion Extraction (FOE)**: extract structured `(entity, feature, opinion)` tuples that capture detailed user opinions grounded in the input content.
- **Opinion-Centric Insight Generation (OIG)**: generate a concise, 3–5 sentence abstractive summary that captures overall sentiment trends, recurring themes, and notable strengths or weaknesses about the entity.
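FOE outputs can be scored in several ways. As a minimal illustration only (not the paper's official evaluation protocol, which may use softer matching), exact-match F1 between predicted and gold tuples could be computed like this:

```python
def tuple_f1(predicted, gold):
    """Exact-match F1 between predicted and gold (entity, feature, opinion)
    tuples, given as lists of dicts with those three keys. Illustrative
    only -- see the paper for the official OOMB evaluation protocol."""
    pred = {(t["entity"], t["feature"], t["opinion"]) for t in predicted}
    ref = {(t["entity"], t["feature"], t["opinion"]) for t in gold}
    if not pred or not ref:
        return 0.0
    tp = len(pred & ref)          # tuples matched exactly
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(ref)
    return 2 * precision * recall / (precision + recall)
```

A prediction recovering one of two gold tuples, for example, scores F1 = 2/3 under this metric.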
```bibtex
@inproceedings{heo-etal-2025-large,
    title = "Can Large Language Models be Effective Online Opinion Miners?",
    author = "Heo, Ryang  and
      Seo, Yongsik  and
      Lee, Junseong  and
      Lee, Dongha",
    editor = "Christodoulopoulos, Christos  and
      Chakraborty, Tanmoy  and
      Rose, Carolyn  and
      Peng, Violet",
    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.emnlp-main.1178/",
    doi = "10.18653/v1/2025.emnlp-main.1178",
    pages = "23108--23147",
    ISBN = "979-8-89176-332-6",
    abstract = "The surge of user-generated online content presents a wealth of insights into customer preferences and market trends. However, the highly diverse, complex, and context-rich nature of such content poses significant challenges to traditional opinion mining approaches. To address this, we introduce Online Opinion Mining Benchmark (OOMB), a novel dataset and evaluation protocol designed to assess the ability of large language models (LLMs) to mine opinions effectively from diverse and intricate online environments. OOMB provides, for each content instance, an extensive set of (entity, feature, opinion) tuples and a corresponding opinion-centric insight that highlights key opinion topics, thereby enabling the evaluation of both the extractive and abstractive capabilities of models. Through our proposed benchmark, we conduct a comprehensive analysis of which aspects remain challenging and where LLMs exhibit adaptability, to explore whether they can effectively serve as opinion miners in realistic online scenarios. This study lays the foundation for LLM-based opinion mining and discusses directions for future research in this field."
}
```

For questions, suggestions, or issues, feel free to reach out:
- Email: ryang1119@yonsei.ac.kr
