UConn-DSIS/Multi-modal-Time-Series-Analysis


[KDD 2025] Multi-modal-Time-Series-Analysis

🎉 News: This survey has been ACCEPTED to the Lecture-Style Tutorials Track of KDD 2025 as a HALF-DAY tutorial! 🎉

This is the official repository for "Multi-modal Time Series Analysis: A Tutorial and Survey". [Paper]

This repository is maintained by Yushan Jiang and Kanghui Ning from UConn DSIS.

Please consider citing our survey paper if you find it helpful :), and feel free to share this repository with others!

Motivation and Contribution:

This survey aims to provide a unique and systematic perspective on effectively leveraging cross-modal interactions from relevant real-world contexts to advance multi-modal time series analysis, addressing both foundational principles and practical solutions. Our assessment is threefold:

  • Reviewing multi-modal time series data
  • Analyzing cross-modal interactions between time series and other modalities (Fusion, Alignment, Transference)
  • Revealing the impact of multi-modal time series analysis in applications across diverse domains
Figure 1: The Framework of Our Survey
Figure 2: Categorization of cross-modal interaction methods and representative examples

Representative Open-Source Multi-Modal Time Series Datasets

| Domain | Dataset | Modalities |
|---|---|---|
| Healthcare | MIMIC-III[1], MIMIC-IV[2] | TS, Text, Table |
| | ICBHI[3], Coswara[4], KAUH[5], PTB-XL[6], ZuCo[7] | TS, Text |
| | Image-EEG[8] | TS, Image |
| Finance | FNSPID[9], ACL18[10], CIKM18[11], DOW30[12] | TS, Text |
| Multi-domain | MTBench[13], Time-MMD[14], TimeCAP[15], NewsForecast[16], TTC[17], CiK[18], TSQA[19] | TS, Text |
| Retail | VISUELLE[20] | TS, Image, Text |
| IoT | LEMMA-RCA[21] | TS, Text |
| Speech | LRS3[22], VoxCeleb2[23] | TS (Audio), Image |
| Traffic | NYC-taxi, NYC-bike[24] | ST, Text |
| Environment | Terra[25] | ST, Text |

Taxonomy of Representative Multi-Modal Time Series Methods

We define three fundamental types of interaction between time series and other modalities, namely Fusion, Alignment, and Transference, which occur at different stages of a framework: Input, Intermediate (i.e., representations or intermediate outputs), and Output.
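To make the three stages concrete, here is a minimal, hypothetical sketch of fusing a numeric series with a text context at each stage. The function names, the prompt template, and the fixed fusion weight are assumptions chosen for exposition, not code from any surveyed method; real systems use learned encoders and (large) language models in place of these toy steps.

```python
from typing import List, Sequence


def input_stage_prompt(series: Sequence[float], context: str) -> str:
    """Input stage (hypothetical): serialize the series and its text context into one prompt."""
    history = ", ".join(f"{v:.1f}" for v in series)
    return f"Context: {context}\nHistory: {history}\nForecast the next value:"


def intermediate_stage_concat(ts_emb: List[float], text_emb: List[float]) -> List[float]:
    """Intermediate stage (hypothetical): concatenate per-modality representations."""
    return ts_emb + text_emb


def output_stage_addition(ts_pred: float, text_pred: float, weight: float = 0.5) -> float:
    """Output stage (hypothetical): weighted addition of per-modality predictions."""
    return (1.0 - weight) * ts_pred + weight * text_pred


print(input_stage_prompt([1.0, 2.0, 3.5], "Holiday week"))
print(intermediate_stage_concat([0.1, 0.2], [0.3]))  # [0.1, 0.2, 0.3]
print(output_stage_addition(10.0, 12.0, weight=0.25))  # 10.5
```

Each function mirrors a fusion technique that the taxonomy table lists at that stage (Prompt at Input, Concat. at Intermediate, Addition at Output). In practice, the representations and any fusion weight would typically be learned rather than fixed by hand.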

  • Fusion refers to the process of integrating heterogeneous modalities in a way that captures complementary information across diverse sources to improve time series modeling.
  • Alignment ensures that the relationships between different modalities are preserved and semantically coherent when integrated into a unified learning framework.
  • Transference refers to the process of mapping between different modalities, which allows one modality to be inferred, translated, or synthesized from another.
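Alignment and transference can likewise be sketched under simplifying assumptions. Below, alignment is illustrated with a one-directional InfoNCE-style contrastive loss over paired time series and text embeddings (the i-th series and the i-th text form the positive pair), in the spirit of the "Contrastive" entries in the taxonomy table; transference is illustrated with a rule-based meta-description that maps a series to a short text summary. All function names, the temperature value, and the description template are hypothetical choices for illustration, not the implementation of any surveyed method.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def info_nce(ts_embs: List[List[float]], text_embs: List[List[float]],
             temperature: float = 0.1) -> float:
    """Alignment (sketch): InfoNCE loss, TS -> text direction only.

    Row i of ts_embs and row i of text_embs are assumed to describe the same
    sample (positive pair); all other texts in the batch act as negatives.
    """
    n = len(ts_embs)
    total = 0.0
    for i in range(n):
        logits = [cosine(ts_embs[i], text_embs[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max before exp() for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # equals -log softmax(logits)[i]
    return total / n


def meta_description(series: Sequence[float], name: str) -> str:
    """Transference (sketch): map a numeric series to a short text description."""
    if series[-1] > series[0]:
        trend = "increasing"
    elif series[-1] < series[0]:
        trend = "decreasing"
    else:
        trend = "flat"
    mean = sum(series) / len(series)
    return (f"{name}: {len(series)} steps, min {min(series):.2f}, "
            f"max {max(series):.2f}, mean {mean:.2f}, overall {trend}.")


ts = [[1.0, 0.0], [0.0, 1.0]]
aligned_text = [[1.0, 0.1], [0.1, 1.0]]
swapped_text = [[0.1, 1.0], [1.0, 0.1]]
print(info_nce(ts, aligned_text) < info_nce(ts, swapped_text))  # True
# prints: sales: 3 steps, min 1.00, max 4.00, mean 2.33, overall increasing.
print(meta_description([1.0, 2.0, 4.0], "sales"))
```

With well-aligned pairs the loss approaches zero, and it grows when pairs are mismatched. CLIP-style objectives commonly average this loss over both directions (time series to text and text to time series).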

Note:

  • F: Fusion; A: Alignment; T: Transference
  • TS: time series; ST: spatio-temporal data
| Method | Modality | Domain | Task | Stage | F | A | T | Technique | Large Model |
|---|---|---|---|---|---|---|---|---|---|
| Time-MMD (NeurIPS 2024) Code | TS, Text | General | Forecasting | Output | ✔ | ✘ | ✘ | Addition | Multiple |
| Wang et al. (NeurIPS 2024) Code | TS, Text | General | Forecasting | Input | ✔ | ✘ | ✘ | Prompt | LLaMa2, GPT-4 Turbo |
| | | | | Intermediate | ✔ | ✔ | ✘ | Prompt; LLM reasoning | |
| GPT4MTS (AAAI 2024) | TS, Text | General | Forecasting | Intermediate | ✔ | ✔ | ✘ | Addition; Self-attention | GPT-2 |
| TimeCMA (AAAI 2025) Code | TS, Text | General | Forecasting | Input | ✘ | ✘ | ✔ | Meta-description | GPT-2 |
| | | | | Intermediate | ✔ | ✔ | ✘ | Addition; Cross-attention | |
| MOAT (2024) | TS, Text | General | Forecasting | Intermediate | ✔ | ✔ | ✘ | Concat.; Self-attention | S-Bert |
| | | | | Output | ✔ | ✘ | ✘ | Offline synthesis | |
| TimeCAP (AAAI 2025) | TS, Text | General | Classification | Input | ✘ | ✘ | ✔ | LLM Generation | Bert, GPT-4 |
| | | | | Intermediate | ✔ | ✔ | ✘ | Concat.; Self-attention, Retrieval | |
| | | | | Output | ✔ | ✘ | ✘ | Addition | |
| TimeXL (NeurIPS 2025) | TS, Text | General | Classification | Intermediate | ✔ | ✔ | ✘ | Concat., Prompt; LLM Reasoning | Bert, S-Bert, GPT-4o |
| | | | Forecasting | Output | ✔ | ✘ | ✘ | Addition | |
| Hybrid-MMF (2024) Code | TS, Text | General | Forecasting | Intermediate | ✔ | ✘ | ✘ | Concat. | GPT-4o |
| Time-LLM (ICLR 2024) Code | TS, Text | General | Forecasting | Input | ✘ | ✘ | ✔ | Meta-description | LLaMA, GPT-2 |
| | | | | Intermediate | ✔ | ✔ | ✘ | Concat.; Self-attention | |
| Time-VLM (2025) | TS, Text, Image | General | Forecasting | Input | ✘ | ✘ | ✔ | Feat. Imaging, Meta-description | ViLT, CLIP, BLIP-2 |
| | | | | Intermediate | ✔ | ✔ | ✘ | Addition; Gating, Cross-attention | |
| UniTime (WWW 2024) | TS, Text | General | Forecasting | Input | ✘ | ✘ | ✔ | Meta-description | GPT-2 |
| | | | | Intermediate | ✔ | ✔ | ✘ | Concat.; Self-attention | |
| TESSA (2024) | TS, Text | General | Annotation | Intermediate | ✔ | ✔ | ✔ | Prompt; RL; LLM Generation | GPT-4o |
| InstrucTime (WSDM 2025) Code | TS, Text | General | Classification | Intermediate | ✔ | ✔ | ✘ | Concat.; Self-attention | GPT-2 |
| MATMCD (2024) | TS, Text, Graph | General | Causal Discovery | Intermediate | ✔ | ✔ | ✔ | Prompt; LLM Reasoning; Supervision | Multiple |
| STG-LLM (2024) | ST, Text | General | Forecasting | Intermediate | ✔ | ✔ | ✘ | Concat.; Self-attention | GPT-2 |
| TableTime (2024) Code | TS, Text | General | Classification | Input | ✔ | ✘ | ✔ | Prompt; Reformulate | Multiple |
| ContextFormer (2024) | TS, Table | General | Forecasting | Intermediate | ✔ | ✔ | ✘ | Addition; Cross-attention | No |
| Time-MQA (2025) Code | TS, Text | General | Multiple | Input | ✔ | ✘ | ✘ | Prompt | Multiple |
| MAN-SF (EMNLP 2020) | TS, Text, Graph | Finance | Classification | Intermediate | ✔ | ✔ | ✘ | Bilinear; Graph Convolution | USE |
| Bamford et al. (ICAIF 2023) | TS, Text | Finance | Retrieval | Intermediate | ✘ | ✔ | ✘ | Supervision | S-Bert |
| | TS, Image | | | Output | ✘ | ✘ | ✔ | | |
| Chen et al. (2023) | TS, Text, Graph | Finance | Classification | Input | ✘ | ✘ | ✔ | LLM Generation | ChatGPT |
| | | | | Intermediate | ✔ | ✔ | ✘ | Concat.; Graph Convolution | |
| Xie et al. (2023) | TS, Text | Finance | Classification | Input | ✔ | ✘ | ✘ | Prompt | ChatGPT |
| Yu et al. (EMNLP 2023) | TS, Text | Finance | Forecasting | Input | ✔ | ✘ | ✘ | Prompt | GPT-4, Open LLaMA |
| MedTsLLM (2024) Code | TS, Text, Table | Healthcare | Multiple | Intermediate | ✔ | ✔ | ✘ | Concat.; Self-attention | Llama2 |
| RespLLM (2024) Code | TS (Audio), Text | Healthcare | Classification | Intermediate | ✔ | ✔ | ✘ | Addition, Self-attention | OpenBioLLM-8B |
| METS (2023) | TS, Text | Healthcare | Classification | Output | ✘ | ✔ | ✘ | Contrastive | ClinicalBert |
| Wang et al. (AAAI 2022) | TS, Text | Healthcare | Classification | Intermediate | ✘ | ✘ | ✔ | Supervision | Bart, Bert, RoBerta |
| EEG2TEXT (BigData 2024) | TS, Text | Healthcare | Generation | Output | ✘ | ✘ | ✔ | Self-supervision, Supervision | Bart |
| MEDHMP (EMNLP 2023) Code | TS, Text | Healthcare | Classification | Intermediate | ✔ | ✔ | ✘ | Concat.; Self-attention, Contrastive | ClinicalT5 |
| Deznabi et al. (ACL 2021) Code | TS, Text | Healthcare | Classification | Intermediate | ✔ | ✘ | ✘ | Concat. | Bio+Clinical Bert |
| Niu et al. (2023) | TS, Text | Healthcare | Classification | Intermediate | ✔ | ✔ | ✘ | Concat.; Cross-attention | BioBERT |
| Yang et al. (EMNLP 2021) Code | TS, Text | Healthcare | Classification | Intermediate | ✔ | ✔ | ✘ | Concat., Addition; Gating | ClinicalBERT |
| Liu et al. (2023) Code | TS, Text | Healthcare | Classification, Regression | Input | ✔ | ✘ | ✘ | Prompt | PaLM |
| xTP-LLM (2024) Code | ST, Text | Traffic | Forecasting | Input | ✔ | ✘ | ✔ | Prompt; Meta-description | Llama2-7B-chat |
| UrbanGPT (2024) Code | ST, Text | Traffic | Forecasting | Input | ✔ | ✘ | ✔ | Prompt; Meta-description | Vicuna-7B |
| CityGPT (2024) Code | ST, Text | Mobility | Multiple | Input | ✔ | ✘ | ✘ | Prompt | Multiple |
| MULAN (WWW 2024) | TS, Text, Graph | IoT | Causal Discovery | Intermediate | ✔ | ✔ | ✔ | Addition; Contrastive; Supervision | No |
| MIA (2023) | TS, Image | IoT | Anomaly Detection | Intermediate | ✔ | ✔ | ✘ | Addition; Cross-attention, Gating | No |
| Ekambaram et al. (KDD 2020) Code | TS, Image, Text | Retail | Forecasting | Intermediate | ✔ | ✔ | ✘ | Concat.; Self & Cross-attention | No |
| Skenderi et al. (2024) Code | TS, Image, Text | Retail | Forecasting | Intermediate | ✔ | ✔ | ✘ | Concat.; Cross-attention | No |
| VIMTS (BigData 2022) | ST, Image | Environment | Imputation | Intermediate | ✔ | ✔ | ✘ | Concat.; Supervision | No |
| LITE (2024) Code | ST, Text, Image | Environment | Forecasting | Intermediate | ✔ | ✔ | ✘ | Concat.; Self-attention | LLaMA-2-7b |
| AV-HuBERT (ICLR 2022) Code | TS (Audio), Image | Speech | Classification | Intermediate | ✔ | ✔ | ✘ | Concat.; Self-attention | HuBert |
| SpeechGPT (EMNLP 2023) Code | TS (Audio), Text | Speech | Generation | Intermediate | ✔ | ✔ | ✘ | Concat.; Self-attention | LLaMA-13B |
| LA-GCN (2023) Code | ST, Text | Vision | Classification | Intermediate | ✘ | ✔ | ✘ | Supervision | Bert |

Citation

      title={Multi-modal Time Series Analysis: A Tutorial and Survey}, 
      author={Yushan Jiang and Kanghui Ning and Zijie Pan and Xuyang Shen and Jingchao Ni and Wenchao Yu and Anderson Schneider and Haifeng Chen and Yuriy Nevmyvaka and Dongjin Song},
      year={2025},
      eprint={2503.13709},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2503.13709}, 
}
