
Conversation

@abdulfatir
Collaborator

Issue #, if available:

Description of changes:

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@shchur shchur self-requested a review October 15, 2025 10:35
import datasets
import numpy as np
import pandas as pd
import timesfm
Contributor

@rajatsen91, can you please let me know if this wrapper looks good to you? Based on google-research/timesfm@58e01ad and the latest model checkpoint on HF.


LGTM, thanks. Would it be possible to check the results with the context window for the eval to also be 2048 (to match the other models)? My feeling it is that won't change much.
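For reference, a minimal sketch of the kind of call the wrapper builds on, assuming the 2.5-style timesfm API (`TimesFM_2p5_200M_torch.from_pretrained`, `ForecastConfig`, `forecast`); the exact class and argument names should be verified against google-research/timesfm@58e01ad. Re-running the eval with a 2048 context would only require changing `max_context` below.

```python
import numpy as np
import timesfm

# Assumed 2.5-style API; verify the class/config names against the pinned timesfm commit.
model = timesfm.TimesFM_2p5_200M_torch.from_pretrained("google/timesfm-2.5-200m-pytorch")
model.compile(
    timesfm.ForecastConfig(
        max_context=16_000,  # set to 2048 to match the context window of the other models
        max_horizon=256,
    )
)
point_forecast, quantile_forecast = model.forecast(
    horizon=64,
    inputs=[np.sin(np.linspace(0.0, 20.0, 500))],  # list of 1-D context arrays
)
```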

Contributor

Thanks @rajatsen91, I will merge this PR with the updated results for context_length=16000, the new inference code, and the updated leakage indicator for ['favorita_transactions_1W', 'm5_1W'].

I will then open another PR with the results for TimesFM-2.5 with context length 2048 and let you decide if we should merge it.

model_name: str = "google/timesfm-2.5-200m-pytorch",  # Hugging Face checkpoint
batch_size: int = 256,
context_length: int = 16_000,
per_core_batch_size: int = 64,  # 128 causes OOM with this context length on a 24 GB A10G
Contributor

Larger per_core_batch_size values (e.g., 128) lead to OOM errors in combination with context_length=16_000 on an A10G GPU (24 GB), even on a dataset with a single time series. Should I keep per_core_batch_size=64 or reduce the context length?
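One option (not part of this PR, just a sketch) would be to keep the large context and back off the batch size only when a task actually runs out of memory; `forecast_with_backoff` and `predict_fn` below are hypothetical names:

```python
import torch


def forecast_with_backoff(predict_fn, per_core_batch_size=128, min_batch_size=1):
    """Halve the per-core batch size until the forecast fits in GPU memory."""
    while per_core_batch_size >= min_batch_size:
        try:
            # predict_fn is a placeholder for whatever runs the timesfm forecast
            # with the given per-core batch size.
            return predict_fn(per_core_batch_size)
        except torch.cuda.OutOfMemoryError:
            torch.cuda.empty_cache()  # release cached blocks before retrying
            per_core_batch_size //= 2
    raise RuntimeError("Could not fit even a single series in GPU memory.")
```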

Contributor

With the current setting (per_core_batch_size=64, context_length=16_000), the median inference time on the full fev-bench for TimesFM-2.5 is 16.9s (down from 117s), and the win rate / skill score numbers remain unchanged.
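For completeness, the per-task timing behind a number like this can be reproduced with a simple harness along these lines (`predictor` and `tasks` are placeholders for the fev-bench model wrapper and task list, which this comment does not spell out):

```python
import time

import numpy as np


def median_inference_time(predictor, tasks):
    """Median wall-clock prediction time in seconds across benchmark tasks."""
    times = []
    for task in tasks:
        start = time.perf_counter()
        predictor.predict(task)  # placeholder for the actual fev-bench evaluation call
        times.append(time.perf_counter() - start)
    return float(np.median(times))
```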


@shchur shchur merged commit 12d1eac into autogluon:main Oct 20, 2025