
integers

Interpretable time series autoregression for periodicity quantification (ints ⮕ integers). [Slides]


Made by Xinyu Chen • 🌍 https://xinychen.github.io


Which datasets do we provide for experimental evaluation?

These mobility and climate datasets are formatted as multidimensional tensors and saved as compressed NumPy arrays (.npz files).
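Such .npz files can be read back with np.load. A minimal, self-contained sketch of the round trip (the tensor shape below is illustrative, and an in-memory buffer stands in for an actual dataset file):

```python
import io
import numpy as np

# Synthetic tensor standing in for a real mobility dataset,
# e.g., (zones, hours, days).
tensor = np.arange(69 * 24 * 31, dtype=float).reshape(69, 24, 31)

buf = io.BytesIO()
np.savez_compressed(buf, tensor=tensor)  # same compressed .npz format
buf.seek(0)

loaded = np.load(buf)['tensor']
print(loaded.shape)  # (69, 24, 31)
```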


Figure 1. Conceptual overview of the diverse open datasets for periodicity quantification.


In urban systems, how can one align mobility datasets of different travel modes (e.g., ridesharing, taxi, subway, and bikesharing) to the same spatial resolution? For instance, Manhattan has hundreds of subway and bikesharing stations but only 69 taxi areas, so one can first project the subway and bikesharing stations onto the taxi areas and then aggregate the trip counts.
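The projection-and-aggregation step can be sketched as a point-in-polygon assignment. The coordinates and zones below are toy placeholders (with real data one would typically use a spatial join, e.g., in geopandas):

```python
import numpy as np

# Toy station coordinates (lon, lat) and their trip counts -- placeholders,
# not actual Manhattan data.
stations = np.array([[0.2, 0.3], [0.7, 0.8], [0.25, 0.35]])
trip_counts = np.array([10, 5, 7])

# Toy zones as axis-aligned bounding boxes: (xmin, ymin, xmax, ymax)
zones = np.array([[0.0, 0.0, 0.5, 0.5],
                  [0.5, 0.5, 1.0, 1.0]])

zone_totals = np.zeros(len(zones))
for (x, y), c in zip(stations, trip_counts):
    for j, (xmin, ymin, xmax, ymax) in enumerate(zones):
        if xmin <= x < xmax and ymin <= y < ymax:
            zone_totals[j] += c  # aggregate station-level counts per zone
            break

print(zone_totals)  # [17.  5.]
```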


Figure 2. (A) Subway stations are projected onto 52 areas in Manhattan. (B) Bikesharing stations are projected onto 67 areas in Manhattan.


Interactive Visualization Tool

What is time series periodicity? How can one get started with the modeling process using machine learning and optimization? One of the most intuitive ways is to annotate the time series periodicity in the interactive visualization tool.


Figure 3. Annotating the periodicity of the hourly ridesharing trip time series in Chicago since April 1, 2024.


While human mobility exhibits clear regularity in hourly, daily, and weekly cycles, the greatest challenge lies in accurately modeling these patterns. In addition, as shown in Figure 4, Wikipedia page view time series also demonstrate periodic patterns across multiple cycles.


Figure 4. Hourly time series of page views over the 3 million Wikipedia pages in January 2024. These pages account for up to 72% of the total Wikipedia page views.


Sparse Autoregression Explained

i) Statement

This work makes the following practical contributions:

  • Classical autoregression can capture auto-correlations, but it does not reveal which auto-correlations are dominant.
  • Sparse autoregression limits the number of nonzero auto-correlations by imposing a sparsity level, allowing one to identify the dominant auto-correlations (e.g., time series periodicity).
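To illustrate the first point, a classical AR(d) fit by ordinary least squares on a synthetic periodic series yields nonzero estimates at essentially every lag, so the dominant lags are not directly readable from the coefficients (the series and parameters below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 500, 24
t = np.arange(T)
# Synthetic periodic series with period 12 plus noise
x = np.sin(2 * np.pi * t / 12) + 0.1 * rng.standard_normal(T)

# Lagged design matrix: row for time t holds (x[t-1], ..., x[t-d])
X = np.column_stack([x[d - k - 1 : T - k - 1] for k in range(d)])
y = x[d:]
w = np.linalg.lstsq(X, y, rcond=None)[0]

# Typically every one of the d coefficients is nonzero
print(np.count_nonzero(np.abs(w) > 1e-6))
```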

Figure 5. Identification of the dominant auto-correlations from time series through sparse autoregression. The sparsity constraint allows one to find the dominant auto-correlated time lags.


ii) Sample Time Series

The sample time series shown in Figures 3 and 5 is available at Chicago-ridesharing/rideshare_ts.txt.

import pandas as pd
import numpy as np

data = pd.read_csv('rideshare_ts.txt', sep = ' ', header = None, index_col = 0, names = ['trip_count'])

One can draw the two-week time series as follows:

import matplotlib.pyplot as plt

fig = plt.figure(figsize = (6, 1.4))
ax = fig.add_subplot(1, 1, 1)
plt.plot(data['trip_count'].values[: 2 * 7 * 24], color = 'purple', alpha = 0.75, linewidth = 2)
plt.xticks(np.arange(0, 24 * 7 * 2 + 1, 7 * 24))
plt.xlabel('Time (hour)')
plt.ylabel('Trip count')
plt.grid(axis = 'both', linestyle='dashed', linewidth = 0.1, color = 'gray')
ax.tick_params(direction = 'in')
ax.set_xlim([-1, 24 * 7 * 2])
plt.show()

iii) Mixed-Integer Optimization for Sparse Autoregression

We use cplex as the mixed-integer optimization solver in our Python implementation. The setting of sparse autoregression includes d (order) and tau (sparsity level). By introducing binary decision variables, the optimization problem of sparse autoregression can be reformulated as follows:

$$\min_{\boldsymbol{w},\,\boldsymbol{\beta}}\;\sum_{t=d+1}^{T}\Bigl(x_{t}-\sum_{k=1}^{d}w_{k}x_{t-k}\Bigr)^{2}$$

$$\text{s.t.}\quad \sum_{k=1}^{d}\beta_{k}\le\tau,\qquad -\alpha\beta_{k}\le w_{k}\le\alpha\beta_{k},\qquad \beta_{k}\in\{0,1\},\; k=1,\ldots,d.$$


import numpy as np
from docplex.mp.model import Model

def obj(x, w, d):
    """Sum-of-squares loss of an order-d autoregression with coefficients w."""
    T = x.shape[0]
    loss = 0
    for t in range(d, T):
        # w[0] multiplies x[t - 1], ..., w[d - 1] multiplies x[t - d]
        loss += (x[t] - np.inner(w, np.flip(x[t - d : t]))) ** 2
    return loss

def sparse_ar(x, d, tau):
    model = Model()
    alpha = 1  # big-M bound on |w_k|; increase it if coefficients may exceed 1
    T = x.shape[0]
    # Note: docplex continuous variables default to a lower bound of 0,
    # so the bounds must be set explicitly to allow negative coefficients.
    w = [model.continuous_var(lb = - alpha, ub = alpha, name = f'w_{k}') for k in range(d)]
    beta = [model.binary_var(name = f'beta_{k}') for k in range(d)]
    # Least-squares objective of the autoregression
    model.minimize(model.sum((x[t] - model.sum(w[k] * x[t - k - 1] for k in range(d))) ** 2 for t in range(d, T)))
    # Sparsity level: at most tau active coefficients
    model.add_constraint(model.sum(beta[k] for k in range(d)) <= tau)
    # Big-M linking constraints: beta_k = 0 forces w_k = 0
    for k in range(d):
        model.add_constraint(w[k] <= alpha * beta[k])
        model.add_constraint(w[k] >= - alpha * beta[k])
    solution = model.solve()
    return np.array(solution.get_values(w))
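If cplex is not available, the same problem can be cross-checked for small d and tau by brute force: enumerate every support set of size tau and solve the restricted least-squares problem exactly. This is a sketch for validation only (it ignores the big-M bound alpha, so results may differ whenever that bound is active in the solver):

```python
import numpy as np
from itertools import combinations

def sparse_ar_bruteforce(x, d, tau):
    """Exact L0-constrained AR fit by enumerating all supports of size tau."""
    T = x.shape[0]
    # Lagged design matrix: row for time t holds (x[t-1], ..., x[t-d])
    X = np.column_stack([x[d - k - 1 : T - k - 1] for k in range(d)])
    y = x[d:]
    best_w, best_loss = None, np.inf
    for support in combinations(range(d), tau):
        cols = list(support)
        w_s = np.linalg.lstsq(X[:, cols], y, rcond=None)[0]
        loss = np.sum((y - X[:, cols] @ w_s) ** 2)
        if loss < best_loss:
            best_loss = loss
            best_w = np.zeros(d)
            best_w[cols] = w_s
    return best_w

# Toy usage on a short series (illustrative data, not the Chicago sample)
x = np.array([1.0, 0.5, 1.2, 0.7, 1.1, 0.6, 1.3, 0.8, 1.0, 0.5])
w = sparse_ar_bruteforce(x, d=4, tau=1)
print(np.count_nonzero(w))  # 1
```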

On the sample time series mentioned above, one can reproduce our results by running the following code:

import numpy as np

x = data['trip_count'].values[: 2 * 7 * 24]
d = 168
for tau in range(1, 7):
    w = sparse_ar(x, d, tau)
    print('tau = {}'.format(tau))
    print('Objective function f = {}'.format(obj(x, w, d)))
    ind = np.where(w != 0)[0].tolist()
    print('Support set: {}'.format(ind))
    print('Nonzero coefficients: {}'.format(w[ind]))
    print()

Here, the result at the sparsity level tau = 6 is given by

tau = 6
Objective function f = 50844056.30946854
Support set: [0, 22, 23, 33, 166, 167]
Nonzero coefficients: [0.29769501 0.00173922 0.03533629 0.00832573 0.16595001 0.48356377]
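Because w[k] multiplies x[t - k - 1], support index k corresponds to a time lag of k + 1 hours. Converting the support set above to lags makes the periodicity readable directly:

```python
# Support set reported at sparsity level tau = 6
support = [0, 22, 23, 33, 166, 167]
lags = [k + 1 for k in support]  # index k -> lag k + 1 hours
print(lags)  # [1, 23, 24, 34, 167, 168]
# Lags 24 and 168 reveal the daily and weekly cycles of ridesharing demand.
```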


Support

  • For any questions and feedback, please contact Dr. Xinyu Chen (chenxy346@gmail.com).
  • If you like this repository, share it with your friends and colleagues.
