A robust Node.js/TypeScript ETL (Extract, Transform, Load) tool that synchronizes visitor analytics data from Matomo (formerly Piwik) into a PostgreSQL database. Designed for organizations that need to centralize their web analytics data for advanced analysis, reporting, or integration with other systems.
- 🔄 Incremental Synchronization - Smart date range detection with automatic resume capability
- 📊 Complete Data Extraction - Captures visitor sessions, events, custom dimensions, and device information
- 🗄️ Automatic Schema Management - Kysely-based migrations with performance optimizations
- ⚡ High Performance - Controlled concurrency, pagination, and weekly table partitioning
- 🛡️ Type Safety - Full TypeScript implementation with comprehensive type definitions
- 🔍 Detailed Logging - Progress tracking and debug information for monitoring
- 📱 Device Analytics - Screen resolution, device model, and operating system data
- 🌍 Geographic Data - Country, region, and city information from visitor sessions
Run directly with npx:

```sh
npx @socialgouv/matomo-postgres
```

Or install as a dependency:

```sh
npm install @socialgouv/matomo-postgres
# or
pnpm add @socialgouv/matomo-postgres
```

| Variable | Description | Example |
|---|---|---|
| `MATOMO_KEY` | Matomo API authentication token | `your_api_token_here` |
| `MATOMO_SITE` | Numeric site ID in Matomo | `1` |
| `MATOMO_URL` | Base URL of your Matomo installation | `https://analytics.example.com/` |
| `PGDATABASE` | PostgreSQL connection string | `postgresql://user:pass@host:5432/db` |
| Variable | Default | Description |
|---|---|---|
| `DESTINATION_TABLE` | `matomo` | Selects which table to write to (standard or partitioned) |
| `MATOMO_TABLE_NAME` | `matomo` | Name of the standard table |
| `PARTITIONED_MATOMO_TABLE_NAME` | `matomo_partitioned` | Name of the partitioned table |
| `STARTDATE` | Auto-detected | Override start date for the initial import (`YYYY-MM-DD`) |
| `FORCE_STARTDATE` | `false` | When `true`, skip the database lookup and use `STARTDATE` unconditionally |
| `RESULTPERPAGE` | `500` | API pagination size (maximum results per request) |
| `INITIAL_OFFSET` | `3` | Number of days to look back on the first run |
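As a rough illustration of how these variables fit together, the sketch below collects them into a typed config object. The `loadConfig` helper and its field names are hypothetical, not the package's actual API; defaults mirror the tables above.

```typescript
// Illustrative sketch: reading the documented environment variables
// into a typed config object. Not the package's actual implementation.
interface Config {
  matomoUrl: string;
  matomoSite: number;
  matomoKey: string;
  destinationTable: string;
  resultPerPage: number;
  initialOffsetDays: number;
  forceStartDate: boolean;
}

function loadConfig(env: Record<string, string | undefined>): Config {
  // Required variables abort startup with a clear message when missing
  const required = (name: string): string => {
    const value = env[name];
    if (!value) throw new Error(`Missing required environment variable: ${name}`);
    return value;
  };
  return {
    matomoUrl: required("MATOMO_URL"),
    matomoSite: Number(required("MATOMO_SITE")),
    matomoKey: required("MATOMO_KEY"),
    // Optional variables fall back to the documented defaults
    destinationTable: env.DESTINATION_TABLE ?? "matomo",
    resultPerPage: Number(env.RESULTPERPAGE ?? "500"),
    initialOffsetDays: Number(env.INITIAL_OFFSET ?? "3"),
    forceStartDate: env.FORCE_STARTDATE === "true",
  };
}
```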
The tool implements a dual table system to optimize performance for different use cases:
The application creates both a standard table and a partitioned table:
- Standard table (`MATOMO_TABLE_NAME`): a traditional PostgreSQL table, suitable for smaller datasets or simpler deployments
- Partitioned table (`PARTITIONED_MATOMO_TABLE_NAME`): a weekly partitioned table, optimized for large datasets and faster queries
Use the `DESTINATION_TABLE` environment variable to specify which table receives the imported data:

```sh
# Write to the standard table
export DESTINATION_TABLE=matomo

# Write to the partitioned table
export DESTINATION_TABLE=matomo_partitioned

# Write to a custom table name
export DESTINATION_TABLE=my_custom_analytics_table
```

Consider using the partitioned table when:
- Large Data Volumes: Importing months or years of analytics data
- Query Performance: Need faster queries on specific date ranges
- Maintenance Operations: Easier to manage large datasets with partition pruning
- Storage Optimization: Better compression and maintenance of historical data
Both tables share the same schema structure, ensuring compatibility regardless of your choice.
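To make the weekly partitioning concrete, here is an illustrative sketch (not the project's actual code) of computing Monday-aligned UTC week bounds and the corresponding PostgreSQL range-partition DDL; the naming scheme is an assumption for illustration only.

```typescript
// Illustrative sketch: one partition per Monday-to-Monday UTC week.
// Table and partition names are hypothetical examples.
function weekPartitionFor(date: Date, tableName = "matomo_partitioned") {
  // Normalize to midnight UTC
  const d = new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
  // getUTCDay(): 0 = Sunday; shift so the week starts on Monday
  const shift = (d.getUTCDay() + 6) % 7;
  d.setUTCDate(d.getUTCDate() - shift);
  const from = d.toISOString().slice(0, 10);
  const to = new Date(d.getTime() + 7 * 86400_000).toISOString().slice(0, 10);
  const name = `${tableName}_${from.replace(/-/g, "")}`;
  return {
    name,
    // PostgreSQL declarative range-partitioning DDL for this week
    ddl: `CREATE TABLE IF NOT EXISTS ${name} PARTITION OF ${tableName} FOR VALUES FROM ('${from}') TO ('${to}');`,
  };
}
```

Partition pruning then lets date-range queries scan only the relevant weekly partitions instead of the whole table.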
When you need to temporarily override the automatic date detection (e.g., to re-import specific data or recover from errors):
```sh
export FORCE_STARTDATE=true
export STARTDATE=2024-01-01
```

This configuration will:
- Skip checking the database for the last imported event
- Use the specified `STARTDATE` unconditionally
- Be useful for one-time re-imports or data recovery scenarios
Important: Remember to unset `FORCE_STARTDATE` after your one-time import to restore normal automatic detection behavior.
Cronjob Example:
```sh
# Temporarily add to your cronjob environment variables:
FORCE_STARTDATE=true
STARTDATE=2024-10-15
MATOMO_URL=https://analytics.example.com/
MATOMO_SITE=1
MATOMO_KEY=your_api_token
PGDATABASE=postgresql://user:pass@host:5432/db

# After the import completes, remove FORCE_STARTDATE to resume normal operation
```

The tool follows a systematic ETL process:
- 📅 Date Range Detection - Determines import range based on last sync or configuration
- 📥 Data Extraction - Fetches visitor data from Matomo's `Live.getLastVisitsDetails` API
- 🔄 Data Transformation - Converts visits into structured events with proper typing
- 💾 Data Loading - Inserts events into PostgreSQL with conflict resolution
- 📈 Progress Tracking - Provides detailed logging and resumable operations
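The transformation step above can be sketched as flattening each Matomo visit into one row per action. The field names below are illustrative assumptions; the real table has many more columns.

```typescript
// Illustrative sketch of the transform step: one Matomo visit
// (as returned by Live.getLastVisitsDetails) becomes one row per action.
interface MatomoVisit {
  idVisit: string;
  country?: string;
  actionDetails: {
    type: string;
    eventCategory?: string;
    eventAction?: string;
    timestamp: number; // Unix seconds
  }[];
}

interface EventRow {
  idvisit: string;
  idx: number;              // position of the action within the visit
  country?: string;
  action_type: string;
  action_timestamp: string; // UTC ISO string
}

function visitToEvents(visit: MatomoVisit): EventRow[] {
  return visit.actionDetails.map((action, idx) => ({
    idvisit: visit.idVisit,
    idx,
    country: visit.country,
    action_type: action.type,
    action_timestamp: new Date(action.timestamp * 1000).toISOString(),
  }));
}
```

Each row carries the visit's identifier, so loading with conflict resolution on (visit, index) keeps re-imports idempotent.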
The tool creates a comprehensive table structure capturing:
- Visitor Information: IDs, geographic location, device details
- Session Metrics: Duration, visit count, visitor type
- Event Data: Actions, categories, values, timestamps (UTC)
- Custom Dimensions: Flexible JSON fields for custom tracking
- Performance Data: Screen resolution, time spent per action
- Start PostgreSQL:

  ```sh
  docker-compose up
  ```

- Set environment variables:

  ```sh
  export MATOMO_URL=https://your-matomo-instance/
  export MATOMO_SITE=your_site_id
  export MATOMO_KEY=your_api_token
  export PGDATABASE=postgres://postgres:postgres@127.0.0.1:5455/postgres
  ```

- Run the application:

  ```sh
  pnpm start
  ```
```sh
# Build TypeScript
pnpm build

# Run tests
pnpm test

# Update test snapshots
pnpm test -u

# Lint code
pnpm lint

# Fix linting issues
pnpm lint:fix

# Run database migrations
pnpm migrate
```

The database schema is managed through Kysely migrations located in `./src/migrations/`.
Migrations run automatically on each `pnpm start` to ensure schema compatibility.
- Initialization - Determines the import date range based on:
  - An explicit date parameter
  - The last event timestamp in the database
  - The `STARTDATE` environment variable
  - A default offset from the current date
- Sequential Processing - For each date:
  - Checks existing records to compute the pagination offset
  - Fetches visitor data in paginated chunks
  - Transforms visits into individual events
  - Inserts them with conflict resolution
- Concurrency Control:
  - Sequential date processing (one day at a time)
  - Parallel event insertion (configurable)
  - Automatic pagination for large datasets
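The pagination behavior described above can be sketched as a generic loop. Here `fetchPage` is a hypothetical callback wrapping the Matomo API; `limit` corresponds to `RESULTPERPAGE` and `startOffset` to the offset derived from records already in the database.

```typescript
// Illustrative sketch of the pagination loop, not the actual implementation.
// fetchPage(offset, limit) is assumed to wrap Live.getLastVisitsDetails.
async function fetchAllVisits<T>(
  fetchPage: (offset: number, limit: number) => Promise<T[]>,
  startOffset = 0,
  limit = 500
): Promise<T[]> {
  const visits: T[] = [];
  let offset = startOffset;
  while (true) {
    const page = await fetchPage(offset, limit);
    visits.push(...page);
    // A short page means the API has no further results for this date
    if (page.length < limit) break;
    offset += limit;
  }
  return visits;
}
```

Resuming from a stored offset is what makes interrupted imports restartable without re-fetching completed pages.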
API Authentication Errors
- Verify `MATOMO_KEY` has sufficient permissions
- Ensure the `MATOMO_SITE` ID is correct
- Check that `MATOMO_URL` includes a trailing slash

Database Connection Issues
- Verify PostgreSQL is running and accessible
- Check the `PGDATABASE` connection string format
- Ensure the database exists and the user has write permissions

Performance Issues
- Adjust `RESULTPERPAGE` for optimal API performance
- Monitor database indexes and partitioning
- Consider running large imports during off-peak hours
Enable detailed logging:
```sh
DEBUG=matomo-postgres* npx @socialgouv/matomo-postgres
```