-
Notifications
You must be signed in to change notification settings - Fork 12
Home
Welcome to the MER-Factory wiki! The following is the development roadmap of MER-Factory
TODO LIST:
- Batch Inference: Implement efficient batch inference for Hugging Face models to enable high-throughput processing.
- Model API: Develop a lightweight API server for model access, which will decouple dependencies and simplify integration.
- Expanded Modality Support: Introduce feature extraction tools for new modalities, beginning with audio.
- End-to-End Tutorial: Create a comprehensive tutorial to guide users through a complete workflow.
This document outlines the technical development strategy for MER-Factory, an open-source initiative dedicated to advancing the field of Affective Computing. Our long-term vision is to establish MER-Factory as the leading platform for streamlined research and development of multimodal emotion reasoning models. The short-term goals are focused on building a robust and reproducible foundation for this vision.
Long-term Goal:
To engineer a comprehensive, end-to-end automated platform for affective computing, covering the entire development lifecycle—from advanced task definition (e.g., empathy, personality traits) to state-of-the-art model training, fine-tuning, evaluation, and deployment. The ultimate aim is to significantly lower the barrier for creating and analyzing complex human-centered AI.
Short-term Goals:
- Task Support: Stably support core emotion tasks (Emotion Recognition, Cause Extraction, Sarcasm Detection) with both single-modality and multi-modality processing capabilities.
- Core Workflow: Establish and solidify the dataset construction and model training pipeline.
- Evaluation System: Initially establish an automated evaluation and review process primarily based on LLM-as-a-Judge, supplemented by Human-in-the-Loop.
The factory will support the following core tasks, with the flexibility to handle both single-modality (e.g., text-only) and multi-modality (e.g., text, audio, visual) inputs.
- Emotion Recognition: Identify and classify emotions expressed in a given context.
- Emotion Cause Extraction (ECE): Identify and extract the specific utterances or events that trigger a particular emotion.
- Sarcasm Detection: Determine whether an utterance is sarcastic by analyzing content and context, including potential inconsistencies across different modalities.
-
LLaMA-Factory Formats:
- Alpaca format.
- ShareGPT format (with multimodal support).
-
MS-Swift Formats:
- Standard
messageskey JSONL format.
- Standard
Use the export dataset to fine-tune model with:
- LLaMA-Factory (Done)
- MS-Swift (ModelScope Swift)
- LLM-as-a-Judge: Utilize a powerful "judge" LLM to automatically assess the quality of generated data based on predefined criteria.
- Human-in-the-Loop (HITL): Provide a lightweight review interface for manual auditing, correction, and approval of the generated data.
Advancing together with the Affective Computing community.