
TRACE

This repository allows for easily constructing and running complex multimodal demonstrations.

A demo is organized as a collection of "features", each of which serves a specific purpose. Examples of features are body tracking, gaze, gesture, audio transcriptions, proposition extraction, common ground tracking, logging, and more.

All features specify an output "interface", which is a class representing the data a feature will output. Features also specify zero or more input interfaces, which they require in order to calculate the output. For example, the Proposition feature has PropositionInterface as its output interface and TranscriptionInterface as its only input interface.
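For illustration, an interface can be thought of as a small data class holding a feature's output. The following is only a minimal sketch; the import path and field names are hypothetical, so see mmdemo/interfaces/ for the real definitions.

from dataclasses import dataclass

from mmdemo.base_interface import BaseInterface  # hypothetical import path


# Hypothetical sketch: an interface simply holds the data a feature outputs.
# The fields below are illustrative, not the real TranscriptionInterface fields.
@dataclass
class TranscriptionInterface(BaseInterface):
    speaker_id: str
    text: str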

If a feature A needs input interface X, it can set another feature B with output interface X as a "dependency", and the output of feature B will be automatically passed into feature A. The full demo is structured as a directed graph with features as vertices and edges between a feature and all of its dependencies. This framework allows for easily creating, modifying, and running any multimodal demo which can be organized into modular features.

This repository contains a python package called "mmdemo" that provides a "Demo" class to run a demo according to its dependency graph structure. This package also contains premade features used in our common ground tracking demo and a framework to easily create new features. Another package in this repository is "mmdemo-azure-kinect", which provides features for interacting with Azure Kinect cameras and recordings (only available on Windows). Finally, we have comprehensive tests to make sure all of the premade features and demo logic work as expected.

Example Usage

Any number of "target" features can be given to the Demo constructor. These targets and their dependencies will be evaluated such that all dependencies of a feature are done evaluating before the feature itself evaluates. The following script will perform common ground tracking using microphone input.

from mmdemo.demo import Demo
from mmdemo.features import ( CommonGroundTracking, Log,
    MicAudio, Move, Proposition, VADUtteranceBuilder, 
    WhisperTranscription )

if __name__ == "__main__":
    mic = MicAudio(device_id=6)
    utterances = VADUtteranceBuilder(mic)
    transcription = WhisperTranscription(utterances)
    props = Proposition(transcription)
    moves = Move(transcription, utterances)
    cgt = CommonGroundTracking(moves, props)

    demo = Demo(targets=[Log(transcription, props, moves, cgt, stdout=True)])
    demo.run()

Dependency graph visualizations can also be generated automatically by calling demo.show_dependency_graph(), which can be useful for making sure the demo is structured correctly. In the example above, this produces an image of the dependency graph.
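For example, in the script above the graph could be inspected just before running the demo:

    demo = Demo(targets=[Log(transcription, props, moves, cgt, stdout=True)])
    demo.show_dependency_graph()  # render the dependency graph for inspection
    demo.run()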

Setup Instructions

Main package

Python 3.10 or higher is required if using conda because of this unresolved issue. The conda environment can be created with conda env create --file multimodalDemo.yaml.

Install the package with pip install -e . from the root directory of the repo.

Download the following models from here and save them at the given locations:

  • fasterrcnn-7-19-demo-finetuned.pth ==> mmdemo/features/objects/objectDetectionModels/best_model-objects.pth
  • steroid_model/ ==> mmdemo/features/proposition/data/prop_extraction_model/
  • production_move_classifier.pt ==> mmdemo/features/move/production_move_classifier.pt

CUDA Installation and Pathing (for Windows; Linux has not been tested)

WINDOWS OS: Ensure that you have CUDA Toolkit 12.4 or greater installed within Program Files for your architecture: https://developer.nvidia.com/cuda-12-4-0-download-archive?target_os=Windows

  • After installing, add the file path C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8 (often the default installation path) to your system environment variables under Path. Additionally, check that the paths for CUDA_PATH and CUDA_PATH_V12_X match the newly added file path.
  • If issues with CUDA arise, please check the "Solution for .dll File Errors" sub-section within the "Common Setup Issues" section of this README file.

Azure Kinect features (optional, only for Windows)

See mmdemo-azure-kinect/README.md.

Hugging Face Setup

Hugging Face is required to run the friction model. To connect to the Hugging Face endpoint, an account is required and a token must be added to the environment.

The base model for the friction model is this Llama version, and the license agreement must be accepted on the account used to set up the token: https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct

After setting up an account and accepting the license agreement, log in with the token in the multimodalDemo environment on the device. Authentication instructions can be found here:

#Log in using a token from huggingface.co/settings/tokens 
huggingface-cli login 

If you run into issues with the above command not recognizing/accepting your token, use this command instead: huggingface-cli login --token <TOKEN>
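If you prefer, the token can also be supplied programmatically through the huggingface_hub Python API. This is only a minimal sketch; the token string shown is a placeholder, so substitute your own.

from huggingface_hub import login

# Log in using a token from huggingface.co/settings/tokens (placeholder shown)
login(token="hf_your_token_here")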

Tarski SSH Setup Commands (Access to LLM)

Install GlobalConnect using your organizational login information.

The friction server needs to be started before the demo.

ssh traceteam@tarski.cs.colostate.edu
cd fact_server
conda activate frictionEnv

WTD: /home/traceteam/anaconda3/envs/frictionEnv/bin/python /home/traceteam/fact_server/friction_server.py

Currently in the works (DPIP: /home/traceteam/anaconda3/envs/frictionEnv/bin/python /home/traceteam/fact_server/dpip_friction_server.py)

In another local terminal, run the demo in the proper environment.

Setting up the planner

Install Docker for Windows from here. Make sure it is running before running the demo.

In mmdemo/features/planner/planner.py, change the path in check solution from C:\\Users\\benkh\\Documents\\GitHub\\TRACE\\mmdemo\\features\\planner\\benchmarks to the path of the benchmarks folder on your machine.

Multiple Mic Set-Up with VoiceMeeter Potato for Live Use

Ensure the live file you are running has the same number of MicAudio objects as microphones being used.

If you do not have VoiceMeeter Potato, you can download it here.

  • This software is free, but there is a wait time after opening. This wait time increases the more it is used, maxing out at a 300-second wait.
  • A purchased license is valid for only one PC.

Once downloaded, follow the set-up instructions below for the live demo you plan to use.

DPIP Live Audio Set-Up

Instructions for dpip_cgt_live.py audio set up.

Virtual Cable for Fourth Microphone

Move on to the Voicemeeter Settings subsection if you already have the virtual cable driver installed.

  • For this set-up you will need to download an additional VB-Cable, which can be found here
    • After downloading, extract the files from the zip folder, then run VBCABLE_Setup_x64.exe and install the driver.
    • Restart your computer after the installation process to ensure the cable functions properly.

Voicemeeter Settings

Open the VoiceMeeter Potato Application and set the following settings for each of the listed components:

  • Stereo Input 1:

    • Click "Select Input Device" and set to the first headset microphone.
    • Next to the Fader Gain bar, turn on the A2, A3, A4, and B1 buttons for Stereo Input 1. All other buttons should be off.
    • Set the Gate dial to 2.7
  • Stereo Input 2:

    • Click "Select Input Device" and set to the second headset microphone.
    • Next to the Fader Gain bar, turn on the A1, A3, A4, and B2 buttons for Stereo Input 2. All other buttons should be off.
    • Set the Gate dial to 2.7
  • Stereo Input 3:

    • Click "Select Input Device" and set to the third headset microphone.
    • Next to the Fader Gain bar, turn on the A1, A2, A4, and B3 buttons for Stereo Input 3. All other buttons should be off.
    • Set the Gate dial to 2.7
  • Stereo Input 4:

    • Click "Select Input Device" and set to the fourth headset microphone.
    • Next to the Fader Gain bar, turn on the A1, A2, A3, and A5 buttons for Stereo Input 4. All other buttons should be off.
    • Set the Gate dial to 2.7
  • HARDWARE OUT

    • A1 : Set to first headset (ensures D1 hears D2, D3, and Builder)
    • A2 : Set to second headset (ensures D2 hears D1, D3, and Builder)
    • A3 : Set to third headset (ensures D3 hears D1, D2, and Builder)
    • A4 : Set to fourth headset (ensures Builder hears D1, D2, and D3)
    • A5 : Set to CABLE Input (workaround virtual cable for the 4th microphone)

Once these settings have been configured, run TRACE\scripts\print_audio_devices.py to see the device IDs of each input (e.g. 9 is the device ID for the listed output 9 : Voicemeeter Out B1 (VB-Audio Vo)). Assign them as follows (see also the sketch after this list):

  • Device IDs
    • audio1 should use the device ID for Voicemeeter Out B1
    • audio2 should use the device ID for Voicemeeter Out B2
    • audio3 should use the device ID for Voicemeeter Out B3
    • audio4 should use the device ID for CABLE Output
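A minimal sketch of how those device IDs would be wired into the live script's MicAudio features follows. The IDs below are examples from one machine; use the values printed by print_audio_devices.py on yours.

from mmdemo.features import MicAudio

# Example device IDs only; replace with the values from print_audio_devices.py
audio1 = MicAudio(device_id=9)   # Voicemeeter Out B1
audio2 = MicAudio(device_id=10)  # Voicemeeter Out B2
audio3 = MicAudio(device_id=11)  # Voicemeeter Out B3
audio4 = MicAudio(device_id=12)  # CABLE Output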

WTD-CGT Live Audio Set-Up

  • Stereo Input 1:

    • Click "Select Input Device" and set to the first headset microphone.
    • Turn on the A2, A3, and B1 buttons. All other buttons should be off.
    • Set the Gate dial to 2.7
  • Stereo Input 2:

    • Click "Select Input Device" and set to the second headset microphone.
    • Turn on the A1, A3, and B2 buttons. All other buttons should be off.
    • Set the Gate dial to 2.7
  • Stereo Input 3:

    • Click "Select Input Device" and set to the third headset microphone.
    • Turn on the A1, A2, and B3 buttons. All other buttons should be off.
    • Set the Gate dial to 2.7
  • HARDWARE OUT

    • A1 : Set to first mic (ensures P1 hears P2 and P3)
    • A2 : Set to second mic (ensures P2 hears P1 and P3)
    • A3 : Set to third mic (ensures P3 hears P1 and P2)

Common Setup Issues

Solution for .dll File Errors

If you are experiencing errors related to .dll files (especially CUDA DLLs, e.g. cublasLt64_12.dll), you can try the following steps:

  1. Uninstall or Update CUDA:
    • CUDA version 12 or later is required by the demo; the latest version of CUDA can be found here
    • If you expect that your CUDA version should work and is up to date, begin by uninstalling CUDA from your system. Make sure to remove all associated components.
  2. Reinstall/Update CUDA:
    • Install CUDA to the directory C:/Program Files/NVIDIA GPU Computing Toolkit/ (which should be the default; verify that this path exists after installing)
    • This path is recommended to avoid potential conflicts with system variables.
  3. Update Environment Variables:
    • After reinstalling CUDA, restart your machine and verify that the installation path C:/Program Files/NVIDIA GPU Computing Toolkit/ has been added to your system's environment variables (see Path, CUDA_PATH, and CUDA_PATH_V12_6)
  4. If updating/reinstalling CUDA doesn't work, try to reinstall Miniconda/Anaconda:
    • Finally, reinstall Miniconda or Anaconda. A fresh installation can resolve conflicts that might arise from previous installations, especially those that affect .dll files.

Solution for NotImplementedError concerning torchvision or UserWarning: 1Torch was not compiled with flash attention

Uninstall Torch and Torchvision: pip uninstall torch torchvision

Go here to install the proper versions (CUDA 12.4).

Solution for remote agent NotImplementedError: Cannot copy out of metadata tensor error

If starting the agent yields this error, run nvidia-smi and make sure the agent isn't already running. If it is, use kill -9 with its PID to end it.

Directory structure

  • examples -- example demonstrations using different combinations of features. This includes our EMNLP submission demonstration in both live and prerecorded/ablation testing forms.
  • mmdemo -- the core package in this repo which provides demo logic and premade features.
    • features -- a collection of premade features we have used so far.
    • interfaces -- interface specifications for features to use as inputs / outputs.
    • utils -- helper functions and classes used across multiple features
  • mmdemo-azure-kinect -- a python wrapper library around the C++ code which interacts with Azure Kinect cameras and playback devices. This provides features which can be used alongside features in the mmdemo package.
    • _azure_kinect-stubs -- typing information for the wrapper library
    • mmdemo_azure_kinect -- the main module of the wrapper library which provides the features
    • src -- the C++ source code of the library
  • scripts -- scripts for performing auxiliary tasks to the demo
    • wtd_annotations -- scripts for processing WTD annotation files into a format which can be used as ground truth information during ablation testing
  • tests -- all of our tests to make sure the demo and features function correctly
    • data -- example data used in our tests
    • features -- tests for each premade feature
    • utils -- helper functions and classes to make writing tests easier
    • wtd_ablation -- tests which make sure the ground truth features work correctly

Development

Environment

After setting up the environment by following the instructions above, run pre-commit install to set up formatters to run automatically on commit. If the conda environment file changes, update the environment by running conda env update --file multimodalDemo.yaml --prune.

Creating new features

Every feature must inherit from BaseFeature[T], where T is an output interface which inherits from BaseInterface. The required methods are documented in mmdemo/base_feature.py. For example, if we wanted to create a feature which takes a color image as input and outputs a predicted depth image, we would do something along the lines of the following:

@final
class DepthPredictor(BaseFeature[DepthImageInterface]):
    def __init__(self, color: BaseFeature[ColorImageInterface]):
        super().__init__(color)

    def initialize(self):
        # Initialize model
        pass

    def get_output(self, color: ColorImageInterface) -> DepthImageInterface | None:
        if not color.is_new():
            return None
        # evaluate model on color.frame
        pred = ...
        return DepthImageInterface(frame=pred, frame_count=color.frame_count)

This feature could now seamlessly be used as a dependency to any feature that requires a depth image as input. See mmdemo/features/ for examples of how existing features are implemented. Also note that a feature should never directly modify any of its input interfaces or dependent features. This breaks the modularity of the program and could cause other features to break in unexpected ways.
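As a sketch of how the new feature would be wired into a demo: ColorSource below is a stand-in for any feature whose output interface is ColorImageInterface (for example, one of the Azure Kinect camera features), not a real class name.

from mmdemo.demo import Demo

color = ColorSource()          # hypothetical feature outputting ColorImageInterface
depth = DepthPredictor(color)  # the new feature from the example above

demo = Demo(targets=[depth])
demo.run()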

Testing

Pytest is used for all of the tests in this project. Tests which require our own machine learning models are marked as "model_dependent" and can be executed with pytest -m "model_dependent". These will likely not all pass. Other tests can be executed with pytest -m "not model_dependent", and these should all pass if there are no bugs. To execute all tests at once, just run pytest.

Contributing

See CONTRIBUTING.md for contribution guidelines.

Feel free to reach out to Hannah VanderHoeven (Hannah.VanderHoeven@colostate.edu) with any questions.

About

Code repository for the Transparency, Reflection, and Accountability in Collaborative Exchanges (TRACE) platform, published at NAACL 2025: System Demonstrations
