jessicayjm/span_alignment_annotation_tool

Web tool for span and alignment annotation

This is a web tool that supports span annotation on a single text and one-to-one alignment annotation between spans in two texts. It was originally built for our paper, Modeling Empathetic Alignment in Conversation, but it can easily be adapted to other tasks that need span or alignment annotations.

Features

  • Span-level annotation.
  • Alignment annotation between spans.
  • A Note function that lets annotators make private notes on a specific segment of text or join a shared discussion with other annotators.
  • A Review function that gives annotators access to others' annotations.
  • Easy job management for project admins and users.
  • Data upload and download.
  • Built-in calculation of inter-annotator agreement using pygamma-agreement.

Here are some demos for the website: Demos

Installation

(The installation has been tested on Ubuntu 20.04 LTS.)

The website uses Flask, ReactJS, and SQLite3. The base architecture is based on EECS 485 at UMich.

Note: All commands should be executed from the base folder span_alignment_annotation_tool/.

Install Utilities

$ sudo apt-get install sqlite3 curl
$ pip install -r requirements.txt

Install Node.js before running the following command:

$ npm ci . --force

Initialize directories

$ ./bin/compAnninit

Add files under sql/uploads if needed. They will be copied to var/uploads.

Initialize database

Insert the project admin info in sql/init/sql to get started. DO NOT insert a plain-text password. You will be able to set the password on the login page of the website as a first-time user.

$ ./bin/compAnndb create

General ./bin/compAnndb usage:

Usage: ./bin/compAnndb (create|destroy|reset)

Note: the destroy command will DELETE the database under var and CLEAR var/uploads/, but nothing under sql will be affected. Use this command with caution.

Configure secret keys

Fill in SECRET_KEY and JWT_SECRET_KEY (they must be different) in compAnn/services/config.py.

You can generate random keys with the following command:

$ openssl rand -base64 32
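
If openssl is not available, the same kind of key can be produced from Python's standard library. This is a minimal sketch, not part of the tool: the helper name generate_key is hypothetical, and it only mirrors what the command above does (32 random bytes, base64-encoded). Paste the two printed values into compAnn/services/config.py by hand.

```python
import base64
import secrets


def generate_key(num_bytes: int = 32) -> str:
    """Return num_bytes of cryptographically strong randomness, base64-encoded."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


# Generate the two keys separately so that they differ.
secret_key = generate_key()
jwt_secret_key = generate_key()

print("SECRET_KEY =", repr(secret_key))
print("JWT_SECRET_KEY =", repr(jwt_secret_key))
```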

Create package

$ pip install -e .

Start the website

$ ./bin/compAnnrun

Data upload and download

All data files are in JSON format.

Project upload format

Here is a simple example of the project upload format:

{
    "agreement_score": 0,
    "annotators":
    [
        {
            "email": "admin@admin",
            "fullname": "Final Annotation",
            "isAdmin": 0,
            "username": "Final Annotation"
        },
        {
            "email": "admin@email.com",
            "fullname": "Admin",
            "isAdmin": 1,
            "username": "admin"
        },
        {
            "email": "user1@email.com",
            "fullname": "User 1",
            "isAdmin": 0,
            "username": "user1"
        },
        {
            "email": "user2@email.com",
            "fullname": "User 2",
            "isAdmin": 0,
            "username": "user2"
        },
        {
            "email": "output@model",
            "fullname": "Model",
            "isAdmin": 0,
            "username": "Model"
        }
    ],
    "data":
    {
    },
    "description": "description on the project",
    "labels":
    [
        {
            "color": "#E6B0AA",
            "name": "label1"
        },
        {
            "color": "#D7BDE2",
            "name": "label2"
        }
    ],
    "name": "name of the project"
}

The format of data is the same as the Data upload format described below.
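
Before uploading, it can help to check that a project file has the same shape as the example. The sketch below is an assumption-based helper, not the tool's own validation: the required keys are inferred only from the example above, and check_project is a hypothetical name.

```python
from typing import Any, Dict, List

# Keys inferred from the example project file; the tool itself may
# require more or fewer (assumption).
TOP_LEVEL_KEYS = ["agreement_score", "annotators", "data",
                  "description", "labels", "name"]
ANNOTATOR_KEYS = ["email", "fullname", "isAdmin", "username"]
LABEL_KEYS = ["color", "name"]


def check_project(project: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    for key in TOP_LEVEL_KEYS:
        if key not in project:
            problems.append(f"missing top-level key: {key}")
    for i, annotator in enumerate(project.get("annotators", [])):
        for key in ANNOTATOR_KEYS:
            if key not in annotator:
                problems.append(f"annotators[{i}] missing key: {key}")
    for i, label in enumerate(project.get("labels", [])):
        for key in LABEL_KEYS:
            if key not in label:
                problems.append(f"labels[{i}] missing key: {key}")
    return problems


example = {
    "agreement_score": 0,
    "annotators": [
        {"email": "admin@email.com", "fullname": "Admin",
         "isAdmin": 1, "username": "admin"},
    ],
    "data": {},
    "description": "description on the project",
    "labels": [{"color": "#E6B0AA", "name": "label1"}],
    "name": "name of the project",
}

print(check_project(example))
```

To check a file on disk, load it with json.load and pass the resulting dictionary to check_project.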

Data upload format
Upload texts

This specifies the format for uploading only the texts to annotate and their related information.

Create a DataFrame with the following fields:

Field              Type
id                 int
target_id          string
observer_id        string
parent_id          string
subreddit          string
target_text        string
observer_text      string
distress_score     float
condolence_score   float
empathy_score      float
full_text          string

  • id: the id of current instance
  • target_id (our project specific): the id of target text from Reddit API
  • observer_id (our project specific): the id of observer text from Reddit API
  • parent_id (our project specific): the parent_id of comment or post from Reddit API
  • subreddit (our project specific): Subreddit name
  • target_text: target text
  • observer_text: observer text
  • distress_score: the distress score for target (refer to the paper for getting the score)
  • condolence_score: the condolence score for observer (refer to the paper for getting the score)
  • empathy_score: the empathy score for target-observer pair (refer to the paper for getting the score)
  • full_text: the text to annotate (our project specific: the combination of target_text and observer_text)

Note: if applying to other tasks for span annotation, id and full_text are sufficient. All other fields can be left empty with only the column name present.

Convert it to a JSON file to get the desired data upload format.
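
For a span-only task, the rows can be prepared as sketched below, using only the standard library. Several things here are assumptions rather than the tool's documented behavior: "empty" is represented as None (JSON null), the rows are serialized as a list of records, and make_rows is a hypothetical helper. Check the layout against what the upload page accepts before relying on it.

```python
import json

# Column names taken from the field list above.
COLUMNS = [
    "id", "target_id", "observer_id", "parent_id", "subreddit",
    "target_text", "observer_text", "distress_score",
    "condolence_score", "empathy_score", "full_text",
]


def make_rows(texts):
    """Build one row per text, filling only id and full_text."""
    rows = []
    for i, text in enumerate(texts):
        row = {column: None for column in COLUMNS}  # every column present, left empty
        row["id"] = i
        row["full_text"] = text
        rows.append(row)
    return rows


rows = make_rows(["First text to annotate.", "Second text to annotate."])

# Assumed layout: a JSON list of records. The layout the tool expects
# is not specified in this README.
with open("upload.json", "w", encoding="utf-8") as f:
    json.dump(rows, f, indent=2)
```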

Upload with annotations

This specifies the format for uploading texts and their annotations (spans/alignments).

Create a DataFrame with the following fields:

Field              Type
id                 int
target_id          string
observer_id        string
parent_id          string
subreddit          string
target_text        string
observer_text      string
distress_score     float
condolence_score   float
empathy_score      float
full_text          string
annotations        [[email, start, end, label], ...]
annotator1         [[start, end, label], ...]
annotator2         [[start, end, label], ...]
...
alignments         [[email, (target_start, target_end), (observer_start, observer_end)], ...]
annotator1_align   [[(target_start, target_end), (observer_start, observer_end)], ...]
annotator2_align   [[(target_start, target_end), (observer_start, observer_end)], ...]
...

Newly added fields compared to Upload texts:

  • annotations: annotations from all annotators. The label in each array should be one of the labels specified in the Project upload format.
  • annotator?: the column name should be the annotator's email (the code automatically matches the email format). The values are the annotations that belong to this annotator.
  • alignments: alignments from all annotators.
  • annotator?_align: the column name should be the annotator's email followed by _align. This is a fixed format used in the code.

Convert it to a JSON file to get the desired data upload format.
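
The relationship between the per-annotator columns and the combined annotations and alignments columns can be sketched as follows. This is an illustration of the shapes listed above, not code from the tool: the function names are hypothetical, and it is an assumption that the combined columns are simply the per-annotator entries with the email prepended. Since JSON has no tuples, span pairs are written as two-element lists.

```python
def merge_annotations(per_annotator):
    """{email: [[start, end, label], ...]} -> [[email, start, end, label], ...]"""
    merged = []
    for email, spans in per_annotator.items():
        for start, end, label in spans:
            merged.append([email, start, end, label])
    return merged


def merge_alignments(per_annotator_align):
    """{email + "_align": [[target_span, observer_span], ...]}
    -> [[email, target_span, observer_span], ...]"""
    suffix = "_align"
    merged = []
    for column, pairs in per_annotator_align.items():
        email = column[:-len(suffix)] if column.endswith(suffix) else column
        for target_span, observer_span in pairs:
            merged.append([email, list(target_span), list(observer_span)])
    return merged


def unknown_labels(annotations, project_labels):
    """Return labels used in annotations that the project does not define."""
    return sorted({row[3] for row in annotations} - set(project_labels))


spans = {
    "user1@email.com": [[0, 5, "label1"]],
    "user2@email.com": [[0, 4, "label2"], [10, 15, "label3"]],
}
aligns = {"user1@email.com_align": [[(0, 5), (20, 30)]]}

annotations = merge_annotations(spans)
alignments = merge_alignments(aligns)
print(unknown_labels(annotations, ["label1", "label2"]))
```

The last call flags label3, which is not among the labels defined in the example project file.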

Data download format

Same as Project upload format with ALL information included.

Citation

Cite our paper if you use this tool.

Contact

If you need any help on the website, please contact: jiaminy@ttic.edu.
