HoWDe (Home and Work Detection) is a Python package designed to identify home and work locations from individual timestamped sequences of stop locations. It processes stop location data to label each location as 'Home', 'Work', or 'None' based on user-defined parameters and heuristics.
A complete description of the algorithm can be found in our pre-print.
- Processes stop location datasets to detect home and work locations.
- Allows customization through various parameters to fine-tune detection heuristics.
- Supports batch processing with multiple parameter configurations.
- Outputs results as a PySpark DataFrame for seamless integration with big data workflows.
HoWDe requires Python 3.6 or later and a functional PySpark environment.
1. Install PySpark
Before installing HoWDe, ensure PySpark and Java are properly configured. For detailed setup instructions, please refer to the official PySpark Installation Guidelines.
Installation Note:
PySpark may raise a `Py4JJavaError` if Java or Spark is not properly configured. We recommend checking the Debugging PySpark and `Py4JJavaError` Guidelines.
Compatibility Note:
Once PySpark/Java is correctly configured, HoWDe runs consistently across macOS, Ubuntu, and Windows. The following environments have been tested:
- Python 3.9 + PySpark 3.3 + Java 20.0
- Python 3.12 + PySpark 4.0 + Java 17.0
2. Install HoWDe
Once PySpark is installed and configured, you can install HoWDe via pip:
```
pip install HoWDe
```

The core function of the HoWDe package is `HoWDe_labelling`, which performs the detection of home and work locations.
```python
def HoWDe_labelling(
    input_data,
    edit_config_default=None,
    range_window_home=28,
    range_window_work=42,
    C_hours=0.4,
    C_days_H=0.4,
    C_days_W=0.5,
    f_hours_H=0.7,
    f_hours_W=0.4,
    f_days_W=0.6,
    output_format="stop",
    verbose=False,
):
    """
    Perform Home and Work Detection (HoWDe).
    """
```

HoWDe expects the input to be a PySpark DataFrame containing one row per user stop, with the following columns:
| Column | Type | Description |
|---|---|---|
| `useruuid` | str or int | Unique user identifier. |
| `loc` | str or int | Stop location ID (unique per `useruuid`). Use `-1` to label non-meaningful stops; these are dropped, following the Infostop convention. |
| `start` | long | Start time of the stop (Unix timestamp). |
| `end` | long | End time of the stop (Unix timestamp). |
| `tz_hour_start`, `tz_minute_start` | int | Optional. Time zone offsets (hours and minutes) used to convert UTC timestamps to local time, if applicable. |
| `country` | str | Optional. Country code; if not provided, a default "GL0B" label is assigned. |
```
+---------+-----+-------------+-------------+---------------+----------------+---------+
| useruuid| loc | start       | end         | tz_hour_start | tz_minute_start| country |
+---------+-----+-------------+-------------+---------------+----------------+---------+
| 1001    | 1   | 1704031200  | 1704034800  | 1             | 0              | DK      |
| 1001    | 2   | 1704056400  | 1704060000  | 1             | 0              | DK      |
+---------+-----+-------------+-------------+---------------+----------------+---------+
```

💡 **Scalability Tip:** This package involves heavy computations (e.g., window functions, UDFs). To ensure efficient parallel processing, use `df.repartition("useruuid")` to distribute data evenly across partitions. This reduces memory bottlenecks and improves resource utilization.
| Parameter | Type | Description | Suggested value and range |
|---|---|---|---|
| `range_window_home` | int or list | Sliding window size (in days) used to detect home locations. | 28 [14-112] |
| `range_window_work` | int or list | Sliding window size (in days) used to detect work locations. | 42 [14-112] |
| `C_hours` | float or list | Minimum fraction of night/business hourly bins with data in a day. | 0.4 [0.2-0.9] |
| `C_days_H` | float or list | Minimum fraction of days with data in a home-detection window. | 0.4 [0.1-0.6] |
| `C_days_W` | float or list | Minimum fraction of days with data in a work-detection window. | 0.5 [0.4-0.6] |
| `f_hours_H` | float or list | Minimum average fraction of night hourly bins (across days in the window) required for a location to qualify as Home. | 0.7 [0.5-0.9] |
| `f_hours_W` | float or list | Minimum average fraction of business hourly bins (across days in the window) required for a location to qualify as Work. | 0.4 [0.4-0.6] |
| `f_days_W` | float or list | Minimum fraction of days within the window a location must be visited to qualify as Work. | 0.6 [0.5-0.8] |
All parameters listed above can also be provided as lists to explore multiple configurations in a single run.
💡 Tuning Tip:
When adjusting detection parameters, start by refining the temporal coverage filters `C_days_H` and `C_days_W` to match the characteristics of your data.
Once these are well aligned, tune the estimation thresholds `f_hours_H`, `f_hours_W`, and `f_days_W` according to the specifics of your case study. These thresholds play a major role in determining how strictly the algorithm identifies consistent home and work locations.
While we provide recommended parameter ranges to guide your exploration, the hard-coded limits in `howde/config.py` are intentionally more relaxed: they simply prevent nonsensical values. Inputs falling outside these hard limits will raise an error.
- `edit_config_default` (dict, optional): Dictionary that overrides the default settings in `howde/config.py` to fine-tune preprocessing and detection behavior. It may include the parameters:
  - `is_time_local`: interpret timestamps as local time (True) or UTC (False)
  - `min_stop_t`: minimum stop duration (seconds)
  - `start_hour_day`, `end_hour_day`: hours used for home detection
  - `start_hour_work`, `end_hour_work`: hours used for work detection
  - `data_for_predict`: use only past data for estimation
- `output_format` (str): If `"stop"`, returns stop-level data with `location_type` and one row per stop. If `"change"`, returns a compact DataFrame with only one row per day on which the home/work location changes.
- `verbose` (bool): If True, reports processing steps.
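As an illustration, an override dictionary for `edit_config_default` might look like the following. The key names follow the list above; the values here are assumptions for illustration, not the package defaults (check `howde/config.py` for those):

```python
# Illustrative override dictionary for edit_config_default.
# All values below are assumptions, not the package defaults.
edit_config = {
    "is_time_local": True,      # timestamps are already in local time
    "min_stop_t": 300,          # drop stops shorter than 5 minutes
    "start_hour_day": 21,       # night window for home detection starts at 21:00
    "end_hour_day": 6,          # night window ends at 06:00
    "start_hour_work": 9,       # business window for work detection starts at 09:00
    "end_hour_work": 17,        # business window ends at 17:00
    "data_for_predict": False,  # use the full window, not only past data
}
```

Such a dictionary would then be passed as `HoWDe_labelling(input_data, edit_config_default=edit_config)`.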
If a single parameter configuration is used, the function returns a PySpark DataFrame with three additional columns:
- `detect_H_loc`: The location ID (`loc`) identified as Home, assigned if the location satisfies all filtering criteria. It represents a day-level assessment, taking into account observations within a sliding window of t ± `range_window_home` / 2 days.
- `detect_W_loc`: The location ID (`loc`) identified as Work, assigned if the location satisfies all filtering criteria. It represents a day-level assessment, taking into account observations within a sliding window of t ± `range_window_work` / 2 days.
- `location_type`: The detected location type for each stop (`'H'` for Home, `'W'` for Work, `'O'` for Other), based on matching the stop location to the inferred home/work labels.
If multiple parameter configurations are provided (as lists), the function returns a list of dictionaries, each with keys:
- `configs`: the parameter configuration used
- `res`: the resulting labeled PySpark DataFrame (as described above)
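When parameters are passed as lists, the returned list can be consumed as sketched below. The `results` value is mocked here purely to show the structure; in practice each `res` entry is the labeled PySpark DataFrame:

```python
# Mocked return value mirroring the multi-configuration output structure:
# a list of dicts with "configs" (parameters used) and "res" (DataFrame).
results = [
    {"configs": {"C_hours": 0.3, "f_days_W": 0.6}, "res": None},
    {"configs": {"C_hours": 0.4, "f_days_W": 0.6}, "res": None},
]

# Pull out the explored values of one parameter for inspection or comparison.
explored = [item["configs"]["C_hours"] for item in results]
```

Iterating this way makes it easy to compare labelings across the parameter grid.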
```python
from pyspark.sql import SparkSession
from howde import HoWDe_labelling

# Initialize Spark session
spark = SparkSession.builder.appName('HoWDeApp').getOrCreate()

# Load your stop location data
input_data = spark.read.parquet('path_to_your_data.parquet')

# Run HoWDe labelling
labeled_data = HoWDe_labelling(
    input_data,
    range_window_home=28,
    range_window_work=42,
    C_hours=0.4,
    C_days_H=0.4,
    C_days_W=0.5,
    f_hours_H=0.7,
    f_hours_W=0.4,
    f_days_W=0.6,
    output_format="stop",
    verbose=False,
)

# Show the results
labeled_data.show()
```

See more examples at `/tutorials`.
Anonymized stop location data with true home and work labels will be available at:
De Sojo Caso, Silvia; Lucchini, Lorenzo; Alessandretti, Laura (2025). Benchmark datasets for home and work location detection: stop sequences and annotated labels. Technical University of Denmark. Dataset. https://doi.org/10.11583/DTU.28846325
This project is licensed under the MIT License. See the License file for details.