OOMB: Online Opinion Mining Benchmark

This repository contains the code and data for the OOMB benchmark introduced in 📄 Can Large Language Models be Effective Online Opinion Miners? OOMB is designed to evaluate the capability of large language models (LLMs) to perform both extractive and abstractive opinion mining in realistic online environments.



Dataset Characteristics

The full OOMB dataset is stored in data/oomb_benchmark.pickle. It consists of 600 content instances in total, broken down by source as follows:

  • Blogs (content_type = blog): 115 instances — Long-form posts describing products or services.
  • Review Sites (content_type = review_site): 106 instances — Structured product/service reviews with explicit ratings.
  • Reddit (content_type = reddit): 177 instances — Multi-user discussion threads with nested replies.
  • YouTube (content_type = youtube): 202 instances — Single-thread comment chains featuring informal language, emojis, and slang.
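The per-source counts above can be reproduced with a small helper. This is a sketch under the assumption that the pickle deserializes to a list of dicts, each carrying a "content_type" key as in the example instance shown in this README; the function name is ours, not part of the released code.

```python
import pickle
from collections import Counter

def count_by_source(instances):
    """Tally OOMB instances per content_type (blog, review_site, reddit, youtube)."""
    return Counter(inst["content_type"] for inst in instances)

# Hypothetical usage -- assumes the pickle holds a list of instance dicts:
# with open("data/oomb_benchmark.pickle", "rb") as f:
#     data = pickle.load(f)
# print(count_by_source(data))  # expected: 115 / 106 / 177 / 202
```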

Each data instance in OOMB consists of:

  • Entity-feature-opinion tuple:
    A list of structured tuples in the form of (entity, feature, opinion), capturing fine-grained user opinions and the exact sentence supporting each opinion.

  • Opinion-Centric Summary:
    A concise, 3–5 sentence abstractive summary that groups key opinion topics and overall sentiment about the entity, reflecting major themes and insights found in the input content.

Example:

{
  "content_type": "blog",
  "text": "Title: 2021 Honda Ridgeline Review: Why It May Just Be the Perfect Truck for You…",
  "tuple": [
    {
      "entity": "2021 Honda Ridgeline",
      "feature": "capabilities",
      "opinion": "remain the same",
      "evidence_sent": "..."
    },
    {
      "entity": "2021 Honda Ridgeline",
      "feature": "interior",
      "opinion": "reclaims a physical volume knob",
      "evidence_sent": "..."
    },
    "..."
  ],
  "summary": "The 2021 Honda Ridgeline is ..."
}
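A minimal sketch for iterating over the opinion tuples of one instance, assuming dicts shaped like the example above (key names are taken from the example; the helper itself is ours and skips the "..." truncation placeholders):

```python
def iter_opinions(instance):
    """Yield (entity, feature, opinion, evidence) from one OOMB instance.

    Assumes the instance dict layout shown in the README example; entries
    that are not dicts (e.g. the "..." placeholder) are skipped.
    """
    for t in instance.get("tuple", []):
        if not isinstance(t, dict):
            continue  # truncation placeholder, not a real tuple
        yield t["entity"], t["feature"], t["opinion"], t.get("evidence_sent")
```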

Evaluation Tasks

Based on this dataset, we design two complementary tasks:

  • Feature-Centric Opinion Extraction (FOE):
    Extract structured (entity, feature, opinion) tuples from the input content, capturing detailed user opinions grounded in the input content.

  • Opinion-Centric Insight Generation (OIG):
    Generate a concise, 3–5 sentence summary that captures overall sentiment trends, recurring themes, and notable strengths or weaknesses about the entity in an abstractive manner.
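For the FOE task, a natural baseline score is exact-match F1 over predicted versus gold tuples. The sketch below is illustrative only: OOMB's official scoring protocol may use softer matching (see the paper), and the function name is ours.

```python
def tuple_f1(predicted, gold):
    """Exact-match precision/recall/F1 over (entity, feature, opinion) tuples.

    Illustrative baseline -- not necessarily OOMB's official FOE metric,
    which may credit partial or paraphrased matches.
    """
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)                      # tuples matched exactly
    p = tp / len(pred) if pred else 0.0       # precision
    r = tp / len(ref) if ref else 0.0         # recall
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```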

Citation

@inproceedings{heo-etal-2025-large,
    title = "Can Large Language Models be Effective Online Opinion Miners?",
    author = "Heo, Ryang  and
      Seo, Yongsik  and
      Lee, Junseong  and
      Lee, Dongha",
    editor = "Christodoulopoulos, Christos  and
      Chakraborty, Tanmoy  and
      Rose, Carolyn  and
      Peng, Violet",
    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.emnlp-main.1178/",
    doi = "10.18653/v1/2025.emnlp-main.1178",
    pages = "23108--23147",
    ISBN = "979-8-89176-332-6",
    abstract = "The surge of user-generated online content presents a wealth of insights into customer preferences and market trends. However, the highly diverse, complex, and context-rich nature of such content poses significant challenges to traditional opinion mining approaches. To address this, we introduce Online Opinion Mining Benchmark (OOMB), a novel dataset and evaluation protocol designed to assess the ability of large language models (LLMs) to mine opinions effectively from diverse and intricate online environments. OOMB provides, for each content instance, an extensive set of (entity, feature, opinion) tuples and a corresponding opinion-centric insight that highlights key opinion topics, thereby enabling the evaluation of both the extractive and abstractive capabilities of models. Through our proposed benchmark, we conduct a comprehensive analysis of which aspects remain challenging and where LLMs exhibit adaptability, to explore whether they can effectively serve as opinion miners in realistic online scenarios. This study lays the foundation for LLM-based opinion mining and discusses directions for future research in this field."
}

Contact

For questions, suggestions, or issues, feel free to reach out:

About

Repo for "Can Large Language Models be Effective Online Opinion Miners?" [EMNLP'2025 Main Conference]
