logpare

Semantic log compression for LLM context windows. Reduces repetitive log output by 60-90% while preserving diagnostic information.

The Problem

AI assistants processing logs waste tokens on repetitive patterns. A 10,000-line log dump might contain 50 unique message templates repeated thousands of times — but the LLM sees (and bills for) every repetition.

The Solution

LogPare uses the Drain algorithm to identify log templates, then outputs a compressed format showing each template once with occurrence counts.

Input (10,847 lines):
INFO Connection from 192.168.1.1 established
INFO Connection from 192.168.1.2 established
INFO Connection from 10.0.0.55 established
... (10,844 more similar lines)

Output (23 templates):
=== Log Compression Summary ===
Input: 10,847 lines → 23 templates (99.8% reduction)

Top templates by frequency:
1. [4,521x] INFO Connection from <*> established
2. [3,892x] DEBUG Request <*> processed in <*>
3. [1,203x] WARN Retry attempt <*> for <*>
...
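The core idea can be sketched in a few lines of TypeScript. This is a deliberately simplified, flat version of Drain-style template mining, not logpare's implementation. It omits Drain's fixed-depth parse tree, treats any token containing a digit as a variable instead of using real masking patterns, and uses hypothetical names (`mineTemplates`, `similarity`). It keeps the central rule: a line is compared only with templates that have the same token count. It joins the most similar template if the share of matching tokens reaches the threshold, and every position that disagrees becomes `<*>`.

```typescript
// Simplified, flat sketch of Drain-style template mining (no parse tree).
// Not logpare's implementation; all names here are illustrative only.
type Template = { tokens: string[]; count: number };

const WILDCARD = "<*>";

// Share of positions where the template token equals the log token.
// Wildcard positions in the template are not counted as matches.
function similarity(template: string[], tokens: string[]): number {
  let same = 0;
  for (let i = 0; i < template.length; i++) {
    if (template[i] === WILDCARD) continue;
    if (template[i] === tokens[i]) same++;
  }
  return same / template.length;
}

function mineTemplates(lines: string[], simThreshold = 0.4): Template[] {
  const templates: Template[] = [];
  for (const line of lines) {
    // Crude masking: any token containing a digit is treated as a variable.
    const tokens = line.trim().split(/\s+/).map((t) => (/\d/.test(t) ? WILDCARD : t));
    // Only templates with the same token count are candidates, as in Drain.
    let best: Template | undefined;
    let bestSim = -1;
    for (const t of templates) {
      if (t.tokens.length !== tokens.length) continue;
      const sim = similarity(t.tokens, tokens);
      if (sim > bestSim) {
        best = t;
        bestSim = sim;
      }
    }
    if (best && bestSim >= simThreshold) {
      // Generalize: positions that disagree become wildcards.
      best.tokens = best.tokens.map((tok, i) => (tok === tokens[i] ? tok : WILDCARD));
      best.count++;
    } else {
      templates.push({ tokens, count: 1 });
    }
  }
  // Most frequent templates first.
  return templates.sort((a, b) => b.count - a.count);
}
```

Given the four lines from the Simple API example, this sketch returns `INFO Connection from <*> established` with a count of 3 and `ERROR Connection timeout after <*>` with a count of 1.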

Installation

As a CLI tool (recommended for command-line usage)

Install globally to use logpare directly from anywhere:

npm install -g logpare

# Now works directly
logpare server.log

As a library

Install locally in your project for programmatic usage:

npm install logpare
# or
pnpm add logpare

Note: Local installs require npx to run the CLI: npx logpare server.log

CLI Usage

LogPare includes a command-line interface for quick log compression:

# Compress a log file
logpare server.log

# Pipe from stdin
cat /var/log/syslog | logpare

# JSON output
logpare --format json app.log

# Custom algorithm parameters
logpare --depth 5 --threshold 0.5 access.log

# Write to file
logpare --output templates.txt error.log

# Multiple files
logpare access.log error.log server.log

Using a local install? Prefix commands with npx:
npx logpare server.log
cat /var/log/syslog | npx logpare

CLI Options

Option	Short	Description	Default
`--format`	`-f`	Output format: `summary`, `detailed`, `json`	`summary`
`--output`	`-o`	Write output to file	stdout
`--depth`	`-d`	Parse tree depth	`4`
`--threshold`	`-t`	Similarity threshold (0.0-1.0)	`0.4`
`--max-children`	`-c`	Max children per node	`100`
`--max-clusters`	`-m`	Max total clusters	`1000`
`--max-templates`	`-n`	Max templates in output	`50`
`--help`	`-h`	Show help
`--version`	`-v`	Show version

Programmatic Usage

Simple API

import { compress } from 'logpare';

const logs = [
  'INFO Connection from 192.168.1.1 established',
  'INFO Connection from 192.168.1.2 established',
  'ERROR Connection timeout after 30s',
  'INFO Connection from 10.0.0.1 established',
];

const result = compress(logs);
console.log(result.formatted);
// === Log Compression Summary ===
// Input: 4 lines → 2 templates (50.0% reduction)
// ...

Text Input

import { readFileSync } from 'node:fs';
import { compressText } from 'logpare';

const logFile = readFileSync('app.log', 'utf-8');
const result = compressText(logFile, { format: 'json' });

Advanced API

import { createDrain, defineStrategy } from 'logpare';

// Custom preprocessing strategy
const customStrategy = defineStrategy({
  patterns: {
    requestId: /req-[a-z0-9]+/gi,
  },
  getSimThreshold: (depth) => depth < 2 ? 0.5 : 0.4,
});

const drain = createDrain({
  depth: 4,
  maxClusters: 500,
  preprocessing: customStrategy,
});

drain.addLogLines(logs);
const result = drain.getResult('detailed');

Output Formats

Summary (default)

Compact overview with top templates and rare events:

=== Log Compression Summary ===
Input: 10,847 lines → 23 templates (99.8% reduction)

Top templates by frequency:
1. [4,521x] INFO Connection from <*> established
2. [3,892x] DEBUG Request <*> processed in <*>
3. [1,203x] WARN Retry attempt <*> for <*>

Rare events (≤5 occurrences):
- [1x] FATAL Database connection lost
- [2x] ERROR Out of memory exception in <*>

Detailed

Full template list with all diagnostic metadata:

Template #1: INFO Connection from <*> established
  Occurrences: 4,521
  Severity: info
  First seen: line 1
  Last seen: line 10,234
  Sample values: [["192.168.1.1"], ["10.0.0.55"], ["172.16.0.1"]]
  URLs: api.example.com, cdn.example.com
  Status codes: 200, 201
  Correlation IDs: req-abc123, trace-xyz789
  Durations: 45ms, 120ms, 2.5s

JSON

Machine-readable format with version field and complete metadata:

{
  "version": "1.1",
  "stats": {
    "inputLines": 10847,
    "uniqueTemplates": 23,
    "compressionRatio": 0.998,
    "estimatedTokenReduction": 0.95,
    "processingTimeMs": 234
  },
  "templates": [{
    "id": "abc123",
    "pattern": "INFO Connection from <*> established",
    "occurrences": 4521,
    "severity": "info",
    "isStackFrame": false,
    "firstSeen": 1,
    "lastSeen": 10234,
    "sampleVariables": [["192.168.1.1"], ["10.0.0.55"]],
    "urlSamples": ["api.example.com"],
    "fullUrlSamples": ["https://api.example.com/v1/users"],
    "statusCodeSamples": [200, 201],
    "correlationIdSamples": ["req-abc123"],
    "durationSamples": ["45ms", "120ms"]
  }]
}

To request this format programmatically:

compress(logs, { format: 'json' });
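The sample figures are consistent with a reduction ratio of `1 - uniqueTemplates / inputLines`. For example, 1 - 23/10,847 ≈ 0.9979 is shown as 0.998 and "99.8% reduction", and 1 - 2/4 = 0.5 gives the 50.0% in the Simple API example. The sketch below assumes that formula. It is inferred from the numbers in this README rather than taken from logpare's source, and it does not reproduce `estimatedTokenReduction`, which is a separate estimate.

```typescript
// Assumed formula, inferred from the sample numbers in this README:
// compressionRatio = 1 - uniqueTemplates / inputLines
function compressionRatio(inputLines: number, uniqueTemplates: number): number {
  if (inputLines <= 0) return 0; // empty input: nothing was compressed
  return 1 - uniqueTemplates / inputLines;
}

// Percentage with one decimal, as in the summary header ("99.8% reduction").
function reductionPercent(inputLines: number, uniqueTemplates: number): string {
  return (compressionRatio(inputLines, uniqueTemplates) * 100).toFixed(1);
}
```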

Diagnostic Metadata

LogPare automatically extracts diagnostic information from matching log lines:

Metadata	Description	Supported Formats
URLs	Hostnames and full URLs	`https://...`, `http://...`
Status codes	HTTP status codes	`status 404`, `HTTP/1.1 500`, `code=200`
Correlation IDs	Request/trace identifiers	`trace-id: xxx`, `request-id: xxx`, UUIDs
Durations	Timing values	`45ms`, `1.5s`, `200µs`, `2min`, `1h`

This metadata is preserved in templates and available in detailed/JSON output formats.
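A rough idea of how two of these metadata kinds might be extracted is shown below. The regular expressions only cover the example formats listed in the table. They are simplified guesses rather than logpare's actual patterns, and URLs and correlation IDs are left out for brevity.

```typescript
// Illustrative patterns for durations and HTTP status codes.
// Simplified guesses, not the regexes logpare actually uses.
const DURATION = /\b\d+(?:\.\d+)?(?:ms|µs|us|s|min|h)\b/g;
const STATUS = /(?:\bstatus[:=\s]+|\bHTTP\/\d(?:\.\d)?\s+|\bcode=)([1-5]\d{2})\b/gi;

// Durations: keep the raw text ("45ms", "1.5s"), as the JSON output does.
function extractDurations(line: string): string[] {
  return line.match(DURATION) ?? [];
}

// Status codes: return the captured three-digit code as a number.
function extractStatusCodes(line: string): number[] {
  const re = new RegExp(STATUS.source, STATUS.flags); // fresh copy: exec() with /g keeps state
  const codes: number[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(line)) !== null) {
    codes.push(Number(m[1]));
  }
  return codes;
}
```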

Severity Detection

Each template is automatically tagged with a severity level:

Severity	Detected Patterns
`error`	ERROR, FATAL, Exception, Failed, TypeError, ReferenceError, panic
`warning`	WARN, Warning, Deprecated, [Violation]
`info`	Default for other logs

Stack traces are also automatically detected (V8/Node.js, Firefox, Chrome DevTools formats) and marked with isStackFrame: true.
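The keyword table and stack-frame detection translate naturally into a few regular expressions. The sketch below makes several assumptions: matching is case-sensitive, error keywords are checked before warning keywords, and the frame patterns are rough shapes only (Chrome DevTools variants are not covered). It is not logpare's detector.

```typescript
type Severity = "error" | "warning" | "info";

// Keyword lists follow the table above. Case-sensitive matching and checking
// error keywords before warning keywords are assumptions of this sketch.
const ERROR_RE = /\b(?:ERROR|FATAL|Failed|TypeError|ReferenceError|panic)\b|Exception/;
const WARNING_RE = /\b(?:WARN|Warning|Deprecated)\b|\[Violation\]/;

function detectSeverity(line: string): Severity {
  if (ERROR_RE.test(line)) return "error";
  if (WARNING_RE.test(line)) return "warning";
  return "info"; // default for everything else
}

// Rough frame shapes only:
// V8/Node.js: "    at fn (/app/index.js:10:5)" or "    at /app/index.js:10:5"
// Firefox:    "fn@http://localhost:3000/app.js:42:13"
const V8_FRAME = /^\s*at\s+(?:.+\s+\()?.+:\d+:\d+\)?\s*$/;
const FIREFOX_FRAME = /^[^@\s]*@.+:\d+:\d+$/;

function isStackFrame(line: string): boolean {
  return V8_FRAME.test(line) || FIREFOX_FRAME.test(line);
}
```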

API Reference

`compress(lines, options?)`

Compress an array of log lines.

lines: string[] - Log lines to compress
options.format: 'summary' | 'detailed' | 'json' - Output format (default: 'summary')
options.maxTemplates: number - Max templates in output (default: 50)
options.drain: DrainOptions - Algorithm configuration

Returns CompressionResult with templates, stats, and formatted output.

`compressText(text, options?)`

Compress a multi-line string (splits on newlines).
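The splitting step might look like the sketch below. Two details are assumptions that the README does not confirm: both LF and CRLF line endings are accepted, and blank lines are dropped so they do not become a template of their own.

```typescript
// Sketch of the "splits on newlines" step. Assumptions, not confirmed for
// logpare: LF and CRLF endings are both accepted, and blank lines are dropped.
function splitLogText(text: string): string[] {
  return text.split(/\r?\n/).filter((line) => line.trim().length > 0);
}
```

Under these assumptions, `compressText()` would amount to running `compress()` on the resulting array.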

`createDrain(options?)`

Create a Drain instance for incremental processing.

options.depth: number - Parse tree depth (default: 4)
options.simThreshold: number - Similarity threshold 0-1 (default: 0.4)
options.maxChildren: number - Max children per node (default: 100)
options.maxClusters: number - Max total templates (default: 1000)
options.preprocessing: ParsingStrategy - Custom preprocessing
options.onProgress: ProgressCallback - Progress reporting callback

Progress Reporting

Track progress during long-running operations:

import { createDrain } from 'logpare';

const drain = createDrain({
  onProgress: (event) => {
    console.log(`${event.currentPhase}: ${event.processedLines} lines`);
    if (event.percentComplete !== undefined) {
      console.log(`Progress: ${event.percentComplete.toFixed(1)}%`);
    }
  }
});

drain.addLogLines(logs);
const result = drain.getResult();

The callback receives ProgressEvent with:

processedLines: Lines processed so far
totalLines: Total lines (if known)
currentPhase: 'parsing' | 'clustering' | 'finalizing'
percentComplete: 0-100 (only if totalLines known)
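The rule that `percentComplete` exists only when `totalLines` is known can be expressed as a small typed helper. `makeProgressEvent` is hypothetical and not part of logpare's exports. The interface mirrors the field list above but is named `LogProgressEvent` so it cannot merge with the DOM's global `ProgressEvent` interface in projects that include browser typings. Clamping to 100 is an assumption.

```typescript
type Phase = "parsing" | "clustering" | "finalizing";

// Mirrors the fields listed above (renamed to avoid the DOM ProgressEvent type).
interface LogProgressEvent {
  processedLines: number;
  totalLines?: number;
  currentPhase: Phase;
  percentComplete?: number; // 0-100, present only when totalLines is known
}

// Hypothetical helper: computes percentComplete only for a positive known
// total, clamped so over-counted lines never report more than 100%.
function makeProgressEvent(
  processedLines: number,
  currentPhase: Phase,
  totalLines?: number,
): LogProgressEvent {
  const event: LogProgressEvent = { processedLines, currentPhase };
  if (totalLines !== undefined && totalLines > 0) {
    event.totalLines = totalLines;
    event.percentComplete = Math.min(100, (processedLines / totalLines) * 100);
  }
  return event;
}
```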

`defineStrategy(overrides)`

Create a custom preprocessing strategy.

const strategy = defineStrategy({
  patterns: { customId: /id-\d+/g },
  tokenize: (line) => line.split(','),
  getSimThreshold: (depth) => 0.5,
});
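The name `defineStrategy(overrides)` suggests that a custom strategy starts from defaults and replaces only the fields you pass. The sketch below illustrates one plausible reading of that. `defineStrategySketch`, the `Strategy` shape, and the single default pattern are all hypothetical, and adding custom patterns to the defaults instead of replacing them is an assumption. logpare's real merge rules may differ.

```typescript
// Hypothetical shape of a preprocessing strategy (illustration only).
interface Strategy {
  patterns: Record<string, RegExp>;
  tokenize: (line: string) => string[];
  getSimThreshold: (depth: number) => number;
}

// Hypothetical defaults; the real built-in pattern set is much larger.
const defaultStrategy: Strategy = {
  patterns: { ipv4: /\b\d{1,3}(?:\.\d{1,3}){3}\b/g },
  tokenize: (line) => line.split(/\s+/).filter(Boolean),
  getSimThreshold: () => 0.4,
};

// Overridden fields replace the defaults; custom patterns are added to the
// default patterns rather than replacing them (an assumption).
function defineStrategySketch(overrides: Partial<Strategy>): Strategy {
  return {
    ...defaultStrategy,
    ...overrides,
    patterns: { ...defaultStrategy.patterns, ...overrides.patterns },
  };
}
```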

Built-in Patterns

LogPare automatically masks common variable types:

IPv4/IPv6 addresses
Port numbers (e.g., :443, :8080)
UUIDs
Timestamps (ISO, Unix)
File paths and URLs
Hex IDs
Block IDs (HDFS)
Numbers with units (e.g., 250ms, 1024KB)
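Masking usually means replacing each matched variable with the `<*>` wildcard before templates are compared. The regexes below cover a few of the listed types and are simplified guesses, not logpare's built-in patterns. The order matters in this sketch: UUIDs and IP addresses are masked before the generic number-with-unit rule runs, so that rule cannot match fragments of them.

```typescript
// Illustrative masks for a few variable types; not logpare's built-ins.
const MASKS: Array<[name: string, re: RegExp]> = [
  ["uuid", /\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b/gi],
  ["ipv4", /\b\d{1,3}(?:\.\d{1,3}){3}\b/g],
  ["hex", /\b0x[0-9a-f]+\b/gi],
  ["numberWithUnit", /\b\d+(?:\.\d+)?(?:ms|s|KB|MB|GB)\b/g],
];

// Apply each mask in order, replacing every match with the wildcard.
function maskLine(line: string, wildcard = "<*>"): string {
  return MASKS.reduce((acc, [, re]) => acc.replace(re, wildcard), line);
}
```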

Automatic detection features:

Severity tagging — Templates are tagged as error, warning, or info
Stack frame detection — Identifies stack traces (V8, Firefox, Chrome formats)
Diagnostic extraction — Captures URLs, HTTP status codes, correlation IDs, and durations

Performance

Speed: >10,000 lines/second
Memory: O(templates), not O(lines)
V8 Optimized: Uses Map for tree nodes, monomorphic constructors

Parameter Tuning Guide

When to Adjust Parameters

Symptom	Cause	Solution
Too many templates	Threshold too high	Lower `simThreshold` (e.g., 0.3)
Templates too generic	Threshold too low	Raise `simThreshold` (e.g., 0.5)
Similar logs not grouped	Depth too shallow	Increase `depth` (e.g., 5-6)
Too much memory usage	Too many clusters	Lower `maxClusters`

Recommended Settings by Log Type

Structured logs (JSON, CSV):

{ depth: 3, simThreshold: 0.5 }

Noisy application logs:

{ depth: 5, simThreshold: 0.3 }

System logs (syslog, journald):

{ depth: 4, simThreshold: 0.4 } // defaults work well

High-volume logs (>1M lines):

{ maxClusters: 500, maxChildren: 50 }

Troubleshooting

"Too many templates"

If you're getting more templates than expected:

Lower the similarity threshold: Templates that should group together may not meet the default 0.4 threshold
```
compress(logs, { drain: { simThreshold: 0.3 } })
```

Check for unmasked variables: Custom IDs or tokens may need masking

const strategy = defineStrategy({
  patterns: { customId: /your-pattern/g }
});

"Templates are too generic"

If templates are over-grouping different log types:

Raise the similarity threshold:

compress(logs, { drain: { simThreshold: 0.5 } })

Increase tree depth:
```
compress(logs, { drain: { depth: 5 } })
```

"Memory usage too high"

For very large log files:

Limit clusters: Set maxClusters to cap memory usage

compress(logs, { drain: { maxClusters: 500 } })

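Process in batches: Use createDrain() and feed the input in chunks through repeated drain.addLogLines() calls, so the whole file never has to be held in memory as one array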
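A minimal batching helper might look like this. `LineSink` is a hypothetical structural type. It assumes that the object returned by `createDrain()` exposes `addLogLines(string[])`, as in the examples above, so a drain instance could be passed in directly. The memory benefit only appears when `lines` is produced lazily, for example by a generator that reads a file, because then only the current batch is held in memory alongside the templates.

```typescript
// Hypothetical structural type matching drain.addLogLines(lines).
interface LineSink {
  addLogLines(lines: string[]): void;
}

// Yield fixed-size batches from any iterable, flushing the final partial batch.
function* chunk<T>(items: Iterable<T>, size: number): Generator<T[]> {
  if (size < 1) throw new RangeError("batch size must be at least 1");
  let batch: T[] = [];
  for (const item of items) {
    batch.push(item);
    if (batch.length === size) {
      yield batch;
      batch = [];
    }
  }
  if (batch.length > 0) yield batch;
}

// Feed lines to the sink batch by batch; returns the number of batches sent.
function feedInBatches(sink: LineSink, lines: Iterable<string>, batchSize = 10_000): number {
  let batches = 0;
  for (const batch of chunk(lines, batchSize)) {
    sink.addLogLines(batch);
    batches++;
  }
  return batches;
}
```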

"Some patterns not being masked"

Add custom patterns for domain-specific tokens:

const strategy = defineStrategy({
  patterns: {
    sessionId: /sess-[a-f0-9]+/gi,
    orderId: /ORD-\d{10}/g,
  }
});

Coming from Python Drain3?

See MIGRATION.md for a detailed comparison and migration guide.

License

MIT
