[KDD 2025] Multi-modal-Time-Series-Analysis

🎉 News: This survey has been ACCEPTED to the Lecture Style Tutorials Track of KDD 2025 as a HALF-DAY tutorial! 🎉

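[Chinese-language explainer by 时序人] [Chinese-language explainer by 圆圆的算法笔记] [Chinese-language explainer by 深度图学习与大模型LLM] [Chinese-language explainer by QuantML]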

This is the official repository for "Multi-modal Time Series Analysis: A Tutorial and Survey". [Paper]

This repository is maintained by Yushan Jiang and Kanghui Ning from UConn DSIS.

Please consider citing our survey paper if you find it helpful :), and feel free to share this repository with others!

Motivation and Contribution:

This survey aims to provide a unique and systematic perspective on effectively leveraging cross-modal interactions from relevant real-world contexts to advance multi-modal time series analysis, addressing both foundational principles and practical solutions. Our assessment is threefold:

Reviewing multi-modal time series data
Analyzing cross-modal interactions between time series and other modalities (Fusion, Alignment, Transference)
Demonstrating the impact of multi-modal time series analysis in applications across diverse domains


Figure 1: The Framework of Our Survey
Figure 2: Categorization of cross-modal interaction methods and representative examples

Representative Open-Source Multi-Modal Time Series Datasets

Domain	Dataset	Modalities
Healthcare	MIMIC-III^[1], MIMIC-IV^[2]	TS, Text, Table
	ICBHI^[3], Coswara^[4], KAUH^[5], PTB-XL^[6], ZuCo^[7]	TS, Text
	Image-EEG^[8]	TS, Image
Finance	FNSPID^[9], ACL18^[10], CIKM18^[11], DOW30^[12]	TS, Text
Multi-domain	MTBench^[13], Time-MMD^[14], TimeCAP^[15], NewsForecast^[16], TTC^[17], CiK^[18], TSQA^[19]	TS, Text
Retail	VISUELLE^[20]	TS, Image, Text
IoT	LEMMA-RCA^[21]	TS, Text
Speech	LRS3^[22], VoxCeleb2^[23]	TS (Audio), Image
Traffic	NYC-taxi, NYC-bike^[24]	ST, Text
Environment	Terra^[25]	ST, Text

Taxonomy of Representative Multi-Modal Time Series Methods

We define three fundamental types of interaction between time series and other modalities: Fusion, Alignment, and Transference. Each can occur at a different stage within a framework: Input, Intermediate (i.e., representations or intermediate outputs), or Output.

Fusion refers to the process of integrating heterogeneous modalities in a way that captures complementary information across diverse sources to improve time series modeling.
Alignment ensures that the relationships between different modalities are preserved and semantically coherent when integrated into a unified learning framework.
Transference refers to the process of mapping between different modalities, which allows one modality to be inferred, translated, or synthesized from another.
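
To make the three definitions concrete, here is a minimal pure-Python sketch on toy vectors. It is an illustration under stated assumptions, not the implementation of any surveyed method: the function names (`describe_series`, `fuse_concat`, `fuse_add`, `cross_attend`) and the toy embeddings are hypothetical, and cross-attention is used here as just one possible alignment mechanism.

```python
import math
from statistics import mean


def describe_series(name, values):
    """Transference (input stage): render a numeric series as a text meta-description."""
    if values[-1] > values[0]:
        trend = "rising"
    elif values[-1] < values[0]:
        trend = "falling"
    else:
        trend = "flat"
    return (f"{name}: {len(values)} steps, min {min(values):.1f}, "
            f"max {max(values):.1f}, mean {mean(values):.1f}, trend {trend}")


def fuse_concat(ts_emb, text_emb):
    """Fusion by concatenation: keep both embeddings side by side."""
    return ts_emb + text_emb


def fuse_add(ts_emb, text_emb):
    """Fusion by addition: element-wise sum of same-sized embeddings."""
    if len(ts_emb) != len(text_emb):
        raise ValueError("addition needs embeddings of equal dimension")
    return [a + b for a, b in zip(ts_emb, text_emb)]


def cross_attend(queries, keys, values):
    """Alignment via scaled dot-product cross-attention.

    Each time series query is rewritten as a softmax-weighted mix of the
    text value vectors, giving one text-informed row per query position.
    """
    scale = math.sqrt(len(keys[0]))
    attended = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        attended.append([sum(w * v[d] for w, v in zip(weights, values))
                         for d in range(len(values[0]))])
    return attended


# Toy inputs (hypothetical): 2 time series patch embeddings and
# 3 text token embeddings, all of dimension 2.
ts_patches = [[1.0, 0.0], [0.0, 1.0]]
text_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

prompt = describe_series("hourly load", [10.0, 12.0, 15.0, 15.0])
aligned = cross_attend(ts_patches, text_tokens, text_tokens)
fused = [fuse_add(p, a) for p, a in zip(ts_patches, aligned)]
print(prompt)
print(fused)
```

In this sketch the meta-description acts at the Input stage, while attention followed by addition acts at the Intermediate stage, which mirrors how the stages and interaction types combine in the taxonomy.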

Note:

F: Fusion; A: Alignment; T: Transference

Method	Modality	Domain	Task	Stage	F	A	T	Interaction Method	Large Model
Time-MMD (NeurIPS 2024) [Code]	TS, Text	General	Forecasting	Output	✔	✘	✘	Addition	Multiple
Wang et al. (NeurIPS 2024) [Code]	TS, Text	General	Forecasting	Input	✔	✘	✘	Prompt	LLaMa2, GPT-4 Turbo
Wang et al. (NeurIPS 2024) [Code]	TS, Text	General	Forecasting	Intermediate	✔	✔	✘	Prompt; LLM reasoning	LLaMa2, GPT-4 Turbo
GPT4MTS (AAAI 2024)	TS, Text	General	Forecasting	Intermediate	✔	✔	✘	Addition; Self-attention	GPT-2
TimeCMA (AAAI 2025) [Code]	TS, Text	General	Forecasting	Input	✘	✘	✔	Meta-description	GPT-2
TimeCMA (AAAI 2025) [Code]	TS, Text	General	Forecasting	Intermediate	✔	✔	✘	Addition; Cross-attention	GPT-2
MOAT (2024)	TS, Text	General	Forecasting	Intermediate	✔	✔	✘	Concat.; Self-attention	S-Bert
MOAT (2024)	TS, Text	General	Forecasting	Output	✔	✘	✘	Offline synthesis	S-Bert
TimeCAP (AAAI 2025)	TS, Text	General	Classification	Input	✘	✘	✔	LLM Generation	Bert, GPT-4
				Intermediate	✔	✔	✘	Concat.; Self-attention, Retrieval
				Output	✔	✘	✘	Addition
TimeXL (NeurIPS 2025)	TS, Text	General	Classification	Intermediate	✔	✔	✘	Concat., Prompt; LLM Reasoning	Bert, S-Bert, GPT-4o
TimeXL (NeurIPS 2025)	TS, Text	General	Forecasting	Output	✔	✘	✘	Addition	Bert, S-Bert, GPT-4o
Hybrid-MMF (2024) [Code]	TS, Text	General	Forecasting	Intermediate	✔	✘	✘	Concat.	GPT-4o
Time-LLM (ICLR 2024) [Code]	TS, Text	General	Forecasting	Input	✘	✘	✔	Meta-description	LLaMA, GPT-2
Time-LLM (ICLR 2024) [Code]	TS, Text	General	Forecasting	Intermediate	✔	✔	✘	Concat.; Self-attention	LLaMA, GPT-2
Time-VLM (2025)	TS, Text, Image	General	Forecasting	Input	✘	✘	✔	Feat. Imaging, Meta-description	ViLT, CLIP, BLIP-2
Time-VLM (2025)	TS, Text, Image	General	Forecasting	Intermediate	✔	✔	✘	Addition; Gating, Cross-attention	ViLT, CLIP, BLIP-2
Unitime (WWW 2024)	TS, Text	General	Forecasting	Input	✘	✘	✔	Meta-description	GPT-2
Unitime (WWW 2024)	TS, Text	General	Forecasting	Intermediate	✔	✔	✘	Concat.; Self-attention	GPT-2
TESSA (2024)	TS, Text	General	Annotation	Intermediate	✔	✔	✔	Prompt; RL; LLM Generation	GPT-4o
InstrucTime (WSDM 2025) [Code]	TS, Text	General	Classification	Intermediate	✔	✔	✘	Concat.; Self-attention	GPT-2
MATMCD (2024)	TS, Text, Graph	General	Causal Discovery	Intermediate	✔	✔	✔	Prompt; LLM Reasoning; Supervision	Multiple
STG-LLM (2024)	ST, Text	General	Forecasting	Intermediate	✔	✔	✘	Concat.; Self-attention	GPT-2
TableTime (2024) [Code]	TS, Text	General	Classification	Input	✔	✘	✔	Prompt; Reformulate	Multiple
ContextFormer (2024)	TS, Table	General	Forecasting	Intermediate	✔	✔	✘	Addition; Cross-attention	No
Time-MQA (2025) [Code]	TS, Text	General	Multiple	Input	✔	✘	✘	Prompt	Multiple
MAN-SF (EMNLP 2020)	TS, Text, Graph	Finance	Classification	Intermediate	✔	✔	✘	Bilinear; Graph Convolution	USE
Bamford et al. (ICAIF 2023)	TS, Text	Finance	Retrieval	Intermediate	✘	✔	✘	Supervision	S-Bert
Bamford et al. (ICAIF 2023)	TS, Image	Finance	Retrieval	Output	✘	✘	✔	Supervision	S-Bert
Chen et al. (2023)	TS, Text, Graph	Finance	Classification	Input	✘	✘	✔	LLM Generation	ChatGPT
Chen et al. (2023)	TS, Text, Graph	Finance	Classification	Intermediate	✔	✔	✘	Concat.; Graph Convolution	ChatGPT
Xie et al. (2023)	TS, Text	Finance	Classification	Input	✔	✘	✘	Prompt	ChatGPT
Yu et al. (EMNLP 2023)	TS, Text	Finance	Forecasting	Input	✔	✘	✘	Prompt	GPT-4, Open LLaMA
MedTsLLM (2024) [Code]	TS, Text, Table	Healthcare	Multiple	Intermediate	✔	✔	✘	Concat.; Self-attention	Llama2
RespLLM (2024) [Code]	TS (Audio), Text	Healthcare	Classification	Intermediate	✔	✔	✘	Addition, Self-attention	OpenBioLLM-8B
METS (2023)	TS, Text	Healthcare	Classification	Output	✘	✔	✘	Contrastive	ClinicalBert
Wang et al. (AAAI 2022)	TS, Text	Healthcare	Classification	Intermediate	✘	✘	✔	Supervision	Bart, Bert, RoBerta
EEG2TEXT (BigData 2024)	TS, Text	Healthcare	Generation	Output	✘	✘	✔	Self-supervision, Supervision	Bart
MEDHMP (EMNLP 2023) [Code]	TS, Text	Healthcare	Classification	Intermediate	✔	✔	✘	Concat.; Self-attention, Contrastive	ClinicalT5
Deznabi et al. (ACL 2021) [Code]	TS, Text	Healthcare	Classification	Intermediate	✔	✘	✘	Concat.	Bio+Clinical Bert
Niu et al. (2023)	TS, Text	Healthcare	Classification	Intermediate	✔	✔	✘	Concat.; Cross-attention	BioBERT
Yang et al. (EMNLP 2021) [Code]	TS, Text	Healthcare	Classification	Intermediate	✔	✔	✘	Concat., Addition; Gating	ClinicalBERT
Liu et al. (2023) [Code]	TS, Text	Healthcare	Classification, Regression	Input	✔	✘	✘	Prompt	PaLM
xTP-LLM (2024) [Code]	ST, Text	Traffic	Forecasting	Input	✔	✘	✔	Prompt; Meta-description	Llama2-7B-chat
UrbanGPT (2024) [Code]	ST, Text	Traffic	Forecasting	Input	✔	✘	✔	Prompt; Meta-description	Vicuna-7B
CityGPT (2024) [Code]	ST, Text	Mobility	Multiple	Input	✔	✘	✘	Prompt	Multiple
MULAN (WWW 2024)	TS, Text, Graph	IoT	Causal Discovery	Intermediate	✔	✔	✔	Addition; Contrastive; Supervision	No
MIA (2023)	TS, Image	IoT	Anomaly Detection	Intermediate	✔	✔	✘	Addition; Cross-attention, Gating	No
Ekambaram et al. (KDD 2020) [Code]	TS, Image, Text	Retail	Forecasting	Intermediate	✔	✔	✘	Concat.; Self & Cross-attention	No
Skenderi et al. (2024) [Code]	TS, Image, Text	Retail	Forecasting	Intermediate	✔	✔	✘	Concat.; Cross-attention	No
VIMTS (BigData 2022)	ST, Image	Environment	Imputation	Intermediate	✔	✔	✘	Concat.; Supervision	No
LITE (2024) [Code]	ST, Text, Image	Environment	Forecasting	Intermediate	✔	✔	✘	Concat.; Self-attention	LLaMA-2-7b
AV-HuBERT (ICLR 2022) [Code]	TS (Audio), Image	Speech	Classification	Intermediate	✔	✔	✘	Concat.; Self-attention	HuBert
SpeechGPT (EMNLP 2023) [Code]	TS (Audio), Text	Speech	Generation	Intermediate	✔	✔	✘	Concat.; Self-attention	LLaMA-13B
LA-GCN (2023) [Code]	ST, Text	Vision	Classification	Intermediate	✘	✔	✘	Supervision	Bert
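
For readers who want to slice the taxonomy programmatically, the following is a small sketch of how rows could be filtered by stage and by the F/A/T flags. The `Entry` dataclass and the `select` helper are hypothetical names introduced for illustration; only five rows are copied from the table above, and nothing here ships with this repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    method: str
    stage: str          # "Input", "Intermediate", or "Output"
    fusion: bool        # column F
    alignment: bool     # column A
    transference: bool  # column T
    mechanism: str


# A few rows copied from the table above; the rest would follow the same shape.
ENTRIES = [
    Entry("Time-MMD", "Output", True, False, False, "Addition"),
    Entry("Time-LLM", "Input", False, False, True, "Meta-description"),
    Entry("Time-LLM", "Intermediate", True, True, False, "Concat.; Self-attention"),
    Entry("TimeCMA", "Intermediate", True, True, False, "Addition; Cross-attention"),
    Entry("METS", "Output", False, True, False, "Contrastive"),
]


def select(entries, stage=None, **flags):
    """Return entries matching an optional stage and any F/A/T flag values."""
    allowed = {"fusion", "alignment", "transference"}
    unknown = set(flags) - allowed
    if unknown:
        raise ValueError(f"unknown flags: {sorted(unknown)}")
    return [
        e for e in entries
        if (stage is None or e.stage == stage)
        and all(getattr(e, name) == value for name, value in flags.items())
    ]


# Methods that align modalities at the intermediate stage.
aligned_mid = select(ENTRIES, stage="Intermediate", alignment=True)
print([e.method for e in aligned_mid])

# Rows that use transference only.
transfer_only = select(ENTRIES, fusion=False, alignment=False, transference=True)
print([(e.method, e.stage) for e in transfer_only])
```

Keeping one record per (method, stage) pair matches the table, where a single method can appear on several rows with different interaction types.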

Citation

      title={Multi-modal Time Series Analysis: A Tutorial and Survey}, 
      author={Yushan Jiang and Kanghui Ning and Zijie Pan and Xuyang Shen and Jingchao Ni and Wenchao Yu and Anderson Schneider and Haifeng Chen and Yuriy Nevmyvaka and Dongjin Song},
      year={2025},
      eprint={2503.13709},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2503.13709}, 
}
