Yuxiang Lin edited this page Sep 16, 2025 · 2 revisions

Welcome to the MER-Factory wiki! The following is the development roadmap for MER-Factory.

TODO LIST:

  1. Batch Inference: Implement efficient batch inference for Hugging Face models to enable high-throughput processing.
  2. Model API: Develop a lightweight API server for model access, which will decouple dependencies and simplify integration.
  3. Expanded Modality Support: Introduce feature extraction tools for new modalities, beginning with audio.
  4. End-to-End Tutorial: Create a comprehensive tutorial to guide users through a complete workflow.
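For the batch-inference item above, the core of high-throughput processing is grouping inputs into fixed-size batches before each forward pass. A minimal sketch of that chunking step (the helper name `batched` and its usage are illustrative, not part of MER-Factory's current API):

```python
from typing import Iterable, Iterator, List

def batched(items: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield fixed-size chunks so a model can process many inputs per forward pass."""
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch

# Each chunk would then be passed to the model in one call, e.g. a
# Hugging Face pipeline constructed with a batch_size argument.
chunks = list(batched(["a", "b", "c", "d", "e"], batch_size=2))
```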

MER-Factory Technical Development Roadmap

Overview

This document outlines the technical development strategy for MER-Factory, an open-source initiative dedicated to advancing the field of Affective Computing. Our long-term vision is to establish MER-Factory as the leading platform for streamlined research and development of multimodal emotion reasoning models. The short-term goals are focused on building a robust and reproducible foundation for this vision.

Long-term Goal:

To engineer a comprehensive, end-to-end automated platform for affective computing, covering the entire development lifecycle—from advanced task definition (e.g., empathy, personality traits) to state-of-the-art model training, fine-tuning, evaluation, and deployment. The ultimate aim is to significantly lower the barrier for creating and analyzing complex human-centered AI.

Short-term Goals:

  • Task Support: Stably support core emotion tasks (Emotion Recognition, Cause Extraction, Sarcasm Detection) with both single-modality and multi-modality processing capabilities.
  • Core Workflow: Establish and solidify the dataset construction and model training pipeline.
  • Evaluation System: Establish an initial automated evaluation and review process based primarily on LLM-as-a-Judge, supplemented by Human-in-the-Loop review.

I. Core Emotion-Related Tasks

The factory will support the following core tasks, with the flexibility to handle both single-modality (e.g., text-only) and multi-modality (e.g., text, audio, visual) inputs.

  • Emotion Recognition: Identify and classify emotions expressed in a given context.
  • Emotion Cause Extraction (ECE): Identify and extract the specific utterances or events that trigger a particular emotion.
  • Sarcasm Detection: Determine whether an utterance is sarcastic by analyzing content and context, including potential inconsistencies across different modalities.

II. Dataset Exportation

  • LLaMA-Factory Formats:
    • Alpaca format.
    • ShareGPT format (with multimodal support).
  • MS-Swift Formats:
    • Standard `messages`-key JSONL format.
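The record shapes for these export targets can be sketched as follows. Field names follow the Alpaca, ShareGPT, and `messages` conventions as commonly documented by LLaMA-Factory and MS-Swift; the sample content and file paths are purely illustrative:

```python
import json

# Alpaca format: instruction / input / output triples.
alpaca_record = {
    "instruction": "Describe the speaker's emotion.",
    "input": "Transcript: 'I can't believe this happened again.'",
    "output": "The speaker expresses frustration.",
}

# ShareGPT format: turn-based conversations, with optional multimodal fields.
sharegpt_record = {
    "conversations": [
        {"from": "human", "value": "<image>\nWhat emotion does this face show?"},
        {"from": "gpt", "value": "The face shows surprise."},
    ],
    "images": ["samples/clip_001/frame_12.jpg"],  # illustrative path
}

# MS-Swift standard format: a `messages` key with role/content turns.
swift_record = {
    "messages": [
        {"role": "user", "content": "Is this utterance sarcastic?"},
        {"role": "assistant", "content": "Yes; the praise contradicts the context."},
    ]
}

# JSONL export: one JSON object serialized per line.
jsonl_line = json.dumps(swift_record, ensure_ascii=False)
```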

III. LLM Training Pipeline

Use the exported dataset to fine-tune models with:

  • LLaMA-Factory (Done)
  • MS-Swift (ModelScope Swift)
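For the LLaMA-Factory path, training is typically driven by a YAML config passed to `llamafactory-cli train`. A minimal sketch of such a config (key names follow LLaMA-Factory's published SFT examples; the model, dataset name, and paths are placeholders, not MER-Factory defaults):

```yaml
# Illustrative LLaMA-Factory SFT config; consult the LLaMA-Factory
# docs for the authoritative list of options.
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct  # placeholder model
stage: sft
do_train: true
finetuning_type: lora
dataset: mer_factory_export        # hypothetical dataset name registered in dataset_info.json
template: llama3
output_dir: saves/mer-sft-lora     # placeholder path
```

It would then be launched with `llamafactory-cli train <config>.yaml`.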

IV. Evaluation Pipeline

  • LLM-as-a-Judge: Utilize a powerful "judge" LLM to automatically assess the quality of generated data based on predefined criteria.
  • Human-in-the-Loop (HITL): Provide a lightweight review interface for manual auditing, correction, and approval of the generated data.
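The LLM-as-a-Judge step above reduces to two pieces: assembling a rubric-based prompt for the judge model, and parsing a machine-readable score out of its free-text reply. A minimal sketch, where the prompt wording, the 1-5 scale, and the `Score:` reply convention are all illustrative assumptions rather than a fixed MER-Factory protocol:

```python
import re

def build_judge_prompt(sample: str, criteria: str) -> str:
    """Assemble a rubric-based prompt for a judge LLM (wording illustrative)."""
    return (
        "You are a strict data-quality reviewer.\n"
        f"Criteria: {criteria}\n"
        f"Sample:\n{sample}\n"
        "Reply with 'Score: <1-5>' and a one-sentence justification."
    )

def parse_score(judge_reply: str) -> int:
    """Extract the 1-5 score from the judge's reply; raise if it is missing."""
    match = re.search(r"Score:\s*([1-5])", judge_reply)
    if match is None:
        raise ValueError("judge reply did not contain a score")
    return int(match.group(1))

# Example: the reply text here stands in for an actual judge-model response.
score = parse_score("Score: 4. The emotion label matches the transcript.")
```

Samples scoring below a chosen threshold could then be routed to the HITL review interface instead of being auto-approved.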