Interpretable time series autoregression for periodicity quantification (ints ↮ integers). [Slides]
Made by Xinyu Chen • 🌐 https://xinychen.github.io
Which datasets do we provide for experimental evaluation?
- 📦 NYC ridesharing dataset (2019-2025) (717 MB, see TLC trip record data)
- 📦 NYC yellow taxi dataset (2011-2024) (455 MB, see TLC trip record data)
- 📦 Manhattan subway ridership dataset (2024) (1.1 MB, see MTA subway hourly ridership: 2020-2024)
- 📦 Manhattan bikesharing dataset (2024) (1.7 MB, see Citi Bike system data - NYC)
- 📦 Chicago ridesharing dataset (2018-2024) (92 MB, see Transportation Network Providers (TNP) - Trips (2018 - 2022))
- 📦 Hangzhou metro passenger flow dataset (2019) (4.7 MB, see Hangzhou metro passenger data - 2019)
- 📦 North America climate variable dataset (1980-2019) (3.0 GB, see Daymet)
- 📦 Sea surface temperature dataset (1980-2019) (1.3 GB, see Sea surface temperature optimum interpolation)
- 📦 Wikipedia page view dataset (January 2024) (4.7 GB, see Analytics datasets: Pageviews)
These mobility and climate datasets are formatted as multidimensional tensors and saved as NumPy arrays in compressed form (i.e., .npz files).
Figure 1. Conceptual overview of the diverse open datasets for periodicity quantification.
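To get started with any of these datasets, a compressed .npz archive can be loaded back with NumPy. Below is a minimal sketch; the file name `dataset.npz` and the key `tensor` are hypothetical placeholders, not the repository's actual names.

```python
import numpy as np

# Load a compressed .npz archive (file and key names are placeholders)
archive = np.load('dataset.npz')
print(archive.files)        # list the arrays stored in the archive
tensor = archive['tensor']  # e.g., a (location, day, hour) tensor
print(tensor.shape)
```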
In urban systems, how can one align mobility datasets from different travel modes (e.g., ridesharing, taxi, subway, and bikesharing) at the same spatial resolution? For instance, Manhattan has hundreds of subway and bikesharing stations but only 69 taxi areas, so one can first project the subway and bikesharing stations onto the taxi areas and then aggregate the trip counts, as sketched after Figure 2.
Figure 2. (A) Subway stations are projected onto 52 areas in Manhattan. (B) Bikesharing stations are projected onto 67 areas in Manhattan.
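One possible implementation of this projection, sketched with geopandas under assumed inputs (`taxi_zones.shp` refers to the TLC taxi zone shapefile with its `LocationID` column; `stations.csv` and its `lon`, `lat`, `trip_count` columns are hypothetical placeholders):

```python
import geopandas as gpd
import pandas as pd

# Taxi zone polygons and station coordinates in a common coordinate system
zones = gpd.read_file('taxi_zones.shp').to_crs('EPSG:4326')
stations = pd.read_csv('stations.csv')  # assumed columns: lon, lat, trip_count
pts = gpd.GeoDataFrame(
    stations,
    geometry=gpd.points_from_xy(stations['lon'], stations['lat']),
    crs='EPSG:4326',
)

# Spatial join: each station inherits the ID of the taxi zone containing it
pts = gpd.sjoin(pts, zones[['LocationID', 'geometry']], predicate='within')

# Aggregate station-level trip counts into zone-level trip counts
zone_counts = pts.groupby('LocationID')['trip_count'].sum()
```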
What is time series periodicity? How can one get started with the modeling process using machine learning and optimization? One of the most intuitive ways is to annotate the time series periodicity in an interactive visualization tool, as sketched below.
Figure 3. Annotating the time series periodicity of the hourly ridesharing trip time series in Chicago since April 1, 2024.
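As a minimal sketch of such a tool (plotly is assumed here for illustration and is not necessarily the authors' annotation setup; the sample file rideshare_ts.txt is introduced below), one can zoom and pan over the hourly series to inspect its cycles:

```python
import pandas as pd
import plotly.express as px

# Hourly ridesharing trip counts (same sample file as introduced below)
data = pd.read_csv('rideshare_ts.txt', sep=' ', header=None, index_col=0, names=['trip_count'])

# Interactive line chart: hover, zoom, and pan to eyeball daily/weekly cycles
fig = px.line(data, y='trip_count', title='Hourly ridesharing trip counts in Chicago')
fig.show()
```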
While human mobility exhibits clear regularity in hourly, daily, and weekly cycles, the greatest challenge lies in accurately modeling these patterns. In addition, as shown in Figure 4, Wikipedia page view time series also demonstrate periodic patterns across multiple cycles.
Figure 4. Hourly time series of view counts over 3 million Wikipedia pages in January 2024. These pages account for up to 72% of total Wikipedia page views.
This work makes its practical contributions in the following ways:
- Classical autoregression can capture auto-correlations, but it does not reveal which auto-correlations are dominant.
- Sparse autoregression limits the number of nonzero auto-correlations by imposing a sparsity level, allowing one to identify the dominant auto-correlations (e.g., time series periodicity); see the model sketch after this list.
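In symbols (a sketch consistent with the code below, with $x_t$ the time series, $\boldsymbol{w}$ the coefficients, $d$ the order, and $\tau$ the sparsity level), the classical autoregression models each value as a weighted sum of its $d$ preceding values,

$$x_t = \sum_{k=1}^{d} w_k\, x_{t-k} + \epsilon_t,$$

while the sparse variant solves

$$\min_{\boldsymbol{w}}\;\sum_{t=d+1}^{T}\Bigl(x_t - \sum_{k=1}^{d} w_k\, x_{t-k}\Bigr)^2 \quad \text{s.t.} \quad \|\boldsymbol{w}\|_0 \le \tau,$$

where $\|\boldsymbol{w}\|_0$ counts the nonzero coefficients.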
Figure 5. Identification of the dominant auto-correlations from time series through sparse autoregression. The sparsity constraint allows one to find the dominant auto-correlated time lags.
The sample time series shown in Figures 3 and 5 is available at Chicago-ridesharing/rideshare_ts.txt.
```python
import pandas as pd
import numpy as np

# Read the hourly trip counts; the first column serves as the time index
data = pd.read_csv('rideshare_ts.txt', sep=' ', header=None, index_col=0, names=['trip_count'])
```

One can draw the two-week time series as follows:
```python
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(6, 1.4))
ax = fig.add_subplot(1, 1, 1)
# Plot the first two weeks (2 * 7 * 24 hours) of hourly trip counts
plt.plot(data['trip_count'].values[: 2 * 7 * 24], color='purple', alpha=0.75, linewidth=2)
plt.xticks(np.arange(0, 24 * 7 * 2 + 1, 7 * 24))  # one tick per week
plt.xlabel('Time (hour)')
plt.ylabel('Trip count')
plt.grid(axis='both', linestyle='dashed', linewidth=0.1, color='gray')
ax.tick_params(direction='in')
ax.set_xlim([-1, 24 * 7 * 2])
plt.show()
```

We use CPLEX as the mixed-integer optimization solver in our Python implementation. The setting of sparse autoregression includes `d` (order) and `tau` (sparsity level). By introducing binary decision variables $\beta_k \in \{0, 1\}$, the optimization problem of sparse autoregression can be reformulated as

$$\min_{\boldsymbol{w},\,\boldsymbol{\beta}}\;\sum_{t=d+1}^{T}\Bigl(x_t - \sum_{k=1}^{d} w_k\, x_{t-k}\Bigr)^2 \quad \text{s.t.} \quad \sum_{k=1}^{d} \beta_k \le \tau, \quad -\alpha \beta_k \le w_k \le \alpha \beta_k, \quad \beta_k \in \{0, 1\},$$

where $\alpha > 0$ is a big-M constant bounding the coefficient magnitudes (set to 1 in the code below).
```python
import numpy as np
from docplex.mp.model import Model

def obj(x, w, d):
    # Sum of squared one-step-ahead residuals of the fitted autoregression
    T = x.shape[0]
    loss = 0
    for t in range(d, T):
        loss += (x[t] - np.inner(w, np.flip(x[t - d : t]))) ** 2
    return loss

def sparse_ar(x, d, tau):
    # Fit an order-d autoregression with at most tau nonzero coefficients
    model = Model()
    alpha = 1  # big-M bound on the coefficient magnitudes
    T = x.shape[0]
    # Note: docplex continuous variables default to a lower bound of 0,
    # so we set lb = -alpha explicitly to allow negative coefficients
    w = [model.continuous_var(lb=-alpha, ub=alpha, name=f'w_{k}') for k in range(d)]
    beta = [model.binary_var(name=f'beta_{k}') for k in range(d)]
    # Least-squares objective over all time steps with a full lag window
    model.minimize(model.sum((x[t] - model.sum(w[k] * x[t - k - 1] for k in range(d))) ** 2 for t in range(d, T)))
    # Sparsity level: at most tau binary indicators can be active
    model.add_constraint(model.sum(beta[k] for k in range(d)) <= tau)
    for k in range(d):
        # Big-M linking constraints: w[k] must be 0 whenever beta[k] = 0
        model.add_constraint(w[k] <= alpha * beta[k])
        model.add_constraint(w[k] >= - alpha * beta[k])
    solution = model.solve()
    return np.array(solution.get_values(w))
```

On the sample time series mentioned above, please reproduce our results by running the following code:
```python
import numpy as np

x = data['trip_count'].values[: 2 * 7 * 24]  # two-week sample series
d = 168  # order: one week of hourly lags
for tau in range(1, 7):
    w = sparse_ar(x, d, tau)
    print('tau = {}'.format(tau))
    print('Objective function f = {}'.format(obj(x, w, d)))
    ind = np.where(w != 0)[0].tolist()
    print('Support set: {}'.format(ind))
    print('Nonzero coefficients: {}'.format(w[ind]))
    print()
```

Here, the result at the sparsity level `tau = 6` is given by
```
tau = 6
Objective function f = 50844056.30946854
Support set: [0, 22, 23, 33, 166, 167]
Nonzero coefficients: [0.29769501 0.00173922 0.03533629 0.00832573 0.16595001 0.48356377]
```

Since index `k` corresponds to time lag `k + 1`, the support set highlights lags 1, 23, 24, 34, 167, and 168, most notably the daily (24-hour) and weekly (168-hour) cycles of the trip time series.

- Xinyu Chen, Vassilis Digalakis Jr, Lijun Ding, Dingyi Zhuang, Jinhua Zhao (2025). Interpretable time series autoregression for periodicity quantification. arXiv preprint arXiv:2506.22895.
- Xinyu Chen, Qi Wang, Yunhan Zheng, Nina Cao, HanQin Cai, Jinhua Zhao (2025). Data-driven discovery of mobility periodicity for understanding urban systems. arXiv preprint arXiv:2508.03747.
- For any questions and feedback, please contact Dr. Xinyu Chen (chenxy346@gmail.com).
- If you like this repository, share it with your friends and colleagues.



