Releases: dathere/qsv

14.0.0

13 Jan 03:26

[14.0.0] - 2026-01-12 📦 "The qsv MCP for Everyone Release" 🎁

Building on our 13.0.0 "AI-native Agent" release last week, qsv 14.0.0 is dedicated to making AI integration seamless, reliable, and easy for everyone.

Previously, installing the qsv MCP Server required a full-fledged development environment and familiarity with command-line tools, so it was not readily usable by non-developers.

This release transforms the qsv MCP Server from a powerful developer tool into a user-friendly, transparently integrated Claude Desktop data-wrangling agent with robust cross-platform support, automatic updates, and comprehensive testing infrastructure.

MCP Desktop Extension (Bundle) - One-Click Installation

The new MCP Desktop Extension provides a streamlined installation experience for Claude Desktop users:

  • User-Friendly Package - Pre-configured bundle with automatic qsv binary detection; if the binary is not found, it provides installation guidance [1]
  • Cross-Platform Support - Works seamlessly on macOS, Windows, and Linux
  • Smart Data-wrangling - Its deep knowledge of qsv insulates the user from the nitty-gritty details of the comprehensive toolkit with its hundreds of options, while ensuring fast, effective operations
  • Token Efficient - Despite this deep knowledge, the MCP server is still token efficient by including intelligent contextual guidance to help Claude make optimal decisions (USE WHEN, COMMON PATTERNS, ERROR PREVENTION, PERFORMANCE HINTS prompt guidance along with lazy-loading of full qsv --help text when more info is required)
  • Security Enhanced - Raw data is not sent to Claude, only statistical metadata [2]
  • Welcome Experience - Includes prompts and examples to get started quickly
  • Works seamlessly with both Claude Code and the just-launched Claude Cowork! Take qsv beyond data-wrangling chats and unlock even greater potential with an agentic qsv.

The Desktop Extension follows the official MCP Bundle (MCPB) manifest specification v0.3, ensuring compatibility with Claude Desktop and future MCP-compatible applications.

See the MCP documentation for installation instructions.

Breaking Changes

  • MCP Skills: qsv-skill-gen binary removed - use qsv --update-mcp-skills instead (requires mcp feature flag)

Added

  • feat: MCP Desktop Extension - user friendly installation of qsv MCP Server #3296
  • feat: MCP Server: numerous QoL improvements to MCP Desktop Bundle #3298
  • feat: MCP skills auto update #3292
  • feat: MCP - add expert guidance, common patterns, MCP optimized descriptions & usage hints #3303
  • feat: MCP skills generator now extracts performance hints (📇 indexed, 🤯 memory-intensive, 😣 proportional memory) from README.md command table
  • feat: MCP Server automatically enables --stats-jsonl flag for stats command to create cache for smart commands
  • feat: MCP enhanced tool descriptions with intelligent guidance - USE WHEN, COMMON PATTERNS, ERROR PREVENTION hints
  • feat: MCP parameter enhancements with examples for common options (selection, delimiter, etc.)
  • feat: MCP comprehensive pipeline tool description with workflows and limitations
  • feat: MCP enhanced filesystem tools (list_files, set_working_dir, get_working_dir) with usage guidance
  • feat: MCP add auto-detection of qsv binary path for Desktop Extension 5c09672e
  • feat: MCP various Quality-of-Life UI/UX improvements b5b338f6
  • feat: MCP enhance Desktop Extension with validation and fixes e2e20551
  • feat: MCP add prompts for welcome message and examples 2672a74b
  • feat: Claude Code GitHub App integration - PR review and issue assistance workflows #3312
  • tests: MCP add CI test workflow for qsv MCP server 8732fee3
  • docs: MCP add comprehensive Claude Code (CLI) documentation 97a88c4e
  • docs: MCP add an MCP Server-specific CLAUDE.md e7e5f9e1
  • docs: add qsv pro download badges to README and update description #3295
  • docs: add alt text to all download badges cc1c3819
  • docs: add mise alternate installation documentation #3304
  • docs: MCP update skills markdown documentation #3308
  • docs: add MCP Server environment variables section to ENVIRONMENT_VARIABLES.md & dotenv.template

Changed

  • refactor: MCP Server - removed applydp command (datapusher+ specific, not needed for general use)
  • refactor: MCP use qsv --update-mcp-skill instead of separate qsv-skill-gen binary 13380ba1
  • refactor: MCP remove qsv-skill-gen binary, make it an option in qsv gated behind mcp feature flag 9c771ee6
  • refactor: MCP more robust output processing - use temp output file and stdout intelligently #3291
  • refactor: MCP qsv-skill-gen.rs to preserve positional docopt args when generating skills JSON file 9618a25c
  • refactor: MCP make output/temp file processing smarter 207274c7
  • refactor: MCP use directory type for filesystem config to clarify restricted access 9650fb41
  • refactor: MCP added null checks before iterating arrays 2d0747ab
  • refactor: MCP fixed TS output directory to account for prod and test builds b0b12a40
  • refactor: MCP address all issues identified during Copilot review 27027e50
  • refactor: MCP optimize tokens use - extract concise command descriptions from README #3307
  • refactor: MCP fine-tune select guidance 37964123
  • docs: with MCP fully implemented - update the logo to make the horse robotic 33f3b9f5
  • docs: comprehensive STATS_DEFINITION.md update b443ccc4
  • chore: address valid robustness issues in last Copilot review 55a5a300
  • chore: delete CITATION.cff file and just depend on Zenodo integration which auto-assigns a DOI on release 9b981b8c
  • deps: bump polars to 0.52.0 at py-1.37.1 tag 3bbad1ea
  • deps: bump atoi_simd and calamine c7cd928f
  • deps: bump data-encoding from 2.9.0 to 2.10.0 09bf3c33
  • deps: bump unicase from 2.8.1 to 2.9.0 99f66a3b
  • deps: bump csvlens to 15.1 and remove our patched fork d588e36e
  • deps: use latest csvlens with marked row export fd706255
  • deps: bump blake3 to 1.8.3 and remove our patched fork 05f0efbb
  • deps: bump toml from 0.9.10+spec-1.1.0 to 0.9.11+spec-1.1.0 2330b1d2
  • deps: bump zerocopy from 0.8.32 to 0.8.33 950564d1
  • build(deps): bump serde_json from 1.0.148 to 1.0.149 #3290
  • build(deps): bump @modelcontextprotocol/sdk from 1.25.1 to 1.25.2 #3293
  • build(deps): bump indexmap from 2.12.1 to 2.13.0 #3294
  • build(deps): bump libc from 0.2.179 to 0.2.180 #3299
  • build(deps): bump zmij from 1.0.12 to 1.0.13 #3305
  • build(deps): bump actions/checkout from 4 to 6 #3309
  • build(deps): bump actions/setup-node from 4 to 6 #3310
  • deps: bump nightly from 2025-10-24 to 2026-01-09; same as polars f77ea524
  • bumped several indirect dependencies
  • applied select clippy & Codacy suggestions
  • applied several GH Copilot and Claude review suggestions

Fixed

  • fix: stats use .get() instead of [] indexing to avoid panics on missing keys when using old stats cache file #3306
  • fix: MCP force add tsconfig.json #3301
  • fix: MCP correct manifest.json to match official spec v0.3 c783cf2c
  • fix: MCP expand template variables in config paths 3177cfe1
  • fix: MCP address Copilot review issues in package-mcpb.js ec37b7c7
  • fix: MCP replace execSync with execFileSync for security reasons 5209c751
  • fix: MCP add promise-based deduplication for metadata cache to prevent race conditions https...
  1. The qsv MCP Server is at v14.1.0, incorporating several fixes

  2. Note that statistical metadata is not anonymized and will disclose potentially sensitive information. See #3289

13.0.0

06 Jan 13:15

[13.0.0] - 2026-01-06 🦾 "The Statistical Data-Wrangling Agent Release" 🤖

We welcome 2026 with qsv 13.0.0 - a major milestone that transforms qsv into an AI-native Agent!

This is in addition to the online AI-Chatbot for CKAN portals we released last September and the expanded describegpt command we released last month. We continue our march towards even more AI/ML/Graph/FAIR and Data Librarian/Concierge/Advisor/Analyst capabilities across the datHere suite in the coming months, as we embark on a strategic partnership with the Open Knowledge Foundation to Strengthen Open, FAIR, AI-Ready Data Infrastructure powered by CKAN.

This release introduces first-class support for AI agents through three major new capabilities:

MCP Server - Model Context Protocol Integration

qsv now ships with a built-in Model Context Protocol (MCP) Server enabling seamless integration with AI Chatbots starting with Claude Desktop.

  • Local Data - Its "zero-copy"-inspired approach lets you wrangle very large datasets WITHOUT sending raw data [1]; only statistical metadata is sent to Claude! This is not only good for security and privacy, it also overcomes Claude's upload size limit, saves tokens and improves performance!
  • 22 MCP Tools: 20 common qsv commands as individual tools + 1 generic tool to access the other 46 commands + 1 pipeline tool
  • Natural Language Interface: No need to remember command syntax
  • Pipeline Support: Chain multiple operations together seamlessly

See the MCP documentation for detailed setup instructions.

Claude Agent SDK Helper Utilities

New Agent Skills infrastructure provides:

  • qsv-skill-gen CLI - Generate skill definitions for AI agents
  • Parses qsv USAGE text using qsv-docopt to generate JSON skill definitions. This allows quick update of Agent Skills as commands and options are added & modified.
  • Shell-safe example generation with proper quoting
  • Comprehensive AI agent integration documentation to help you build qsv into your own AI solutions!

moarstats - Massive Statistical Expansion

The moarstats command received substantial enhancements, adding 24+ MOAR statistical measures:

Advanced Univariate Statistics:

  • Bimodality Coefficient - Detect multimodal distributions
  • Normalized Entropy - Scaled information content measure (0-1)
  • Atkinson Index - Inequality measure with configurable epsilon parameter
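
As a rough illustration of two of the measures above, here is a minimal Python sketch using the textbook definitions. It is not qsv's implementation: the function names are hypothetical, normalizing entropy by the log of the number of distinct values is an assumption about how the 0-1 scaling is done, and the Atkinson formula assumes strictly positive values.

```python
import math
from collections import Counter


def normalized_entropy(values):
    """Shannon entropy divided by its maximum, log(k), giving a 0-1 score.

    Assumption: k is the number of distinct values in the column.
    """
    counts = Counter(values)
    k = len(counts)
    if k <= 1:
        return 0.0  # a constant (or empty) column carries no information
    n = len(values)
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(k)


def atkinson_index(values, epsilon=1.0):
    """Atkinson inequality index for strictly positive values.

    epsilon is the inequality-aversion parameter; epsilon == 1 uses the
    geometric-mean form of the formula.
    """
    n = len(values)
    mean = sum(values) / n
    if epsilon == 1.0:
        geo_mean = math.exp(sum(math.log(v) for v in values) / n)
        return 1.0 - geo_mean / mean
    # "Equally distributed equivalent" value for epsilon != 1
    ede = (sum(v ** (1.0 - epsilon) for v in values) / n) ** (1.0 / (1.0 - epsilon))
    return 1.0 - ede / mean
```

With these definitions, a perfectly even column such as `[5, 5, 5, 5]` has an Atkinson index of 0, and a larger epsilon weights the low end of the distribution more heavily.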

Bivariate Statistics:

  • Pearson's correlation - Linear correlation coefficient
  • Spearman's rank correlation - Monotonic relationship measure
  • Kendall's tau - Concordance-based correlation
  • Covariance - Joint variability measure
  • Mutual Information - Information-theoretic dependency
  • Normalized Mutual Information - Scaled mutual information (0-1)
  • Multi-dataset joins - --join-inputs for bivariate analysis ACROSS datasets
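
To make the difference between the three correlation measures concrete, here is a small pure-Python sketch based on the standard definitions. It is illustrative only and not taken from qsv: the function names are hypothetical, and the Kendall variant shown is tau-a (the release notes do not say which variant moarstats computes).

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson's r: covariance scaled by both standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (len(xs) * (len(xs) - 1) / 2)
```

For a monotonic but non-linear relationship such as `y = x**2` over `x = 1..5`, the two rank-based measures are 1 while Pearson's r is slightly below 1, which is why it helps to have all three reported.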

XSD Type Mapping:

  • Automatic inference of W3C XML Schema Definition (XSD) datatypes
  • Smart XSD Gregorian date type inferencing with "quick" and "thorough" modes (#3259)
  • Support for gYear, gMonth, gDay, gMonthDay, gYearMonth validation

See STATS_DEFINITIONS.md for a comprehensive list of the ~100 statistical metrics qsv compiles!


Breaking Changes

  • lens: Default behavior changed to NOT stream from stdin (use explicit flag if needed)
  • moarstats: Output now includes additional columns (xsd_type, bivariate stats)

Added

  • feat: qsv MCP server #3269
  • feat: MCP - expanded file selector for more supported tabular file formats; auto index for files larger than 10mb #3278
  • feat: added Claude Agent Skills SDK support 🤖 #3264
  • feat: moarstats add "xsd_type" column #3242
  • feat: moarstats add Atkinson Index with configurable inequality aversion parameter, Normalized Entropy & Bimodal Coefficient #3243
  • feat: moarstats add bivariate stats #3247
  • feat: moarstats add normalized mutual info #3256
  • feat: moarstats add --force and --jobs options #3253
  • feat: moarstats add "xsd_subtype" Gregorian date data types inferencing with --xsd-gdate-scan having fast (default) and comprehensive modes #3259
  • feat: qsvdp enable join command that moarstats uses #3252
  • docs: added comprehensive stats documentation #3240

Changed

  • refactor: describegpt - consolidate JSON response parsing; cache handling; and make DuckDB & Polars error handling more consistent #3241
  • refactor: frequency reduce duplication introduced by --weight option #3236
  • perf: frequency precompute other_prefix for performance 2dc75ee
  • perf: frequency simplify apply_limits* helper functions f0b7f9c
  • perf: pivotp convert directly to PlSmallStr for performance b7dbb3f
  • refactor MCP Server to optimize for Local Access to Files #3272
  • refactor: MCP Server improvements #3274
  • refactor: MCP Server remove examples from ci tests #3277
  • refactor: MCP Server add LIFO converted cache #3280
  • refactor: MCP Server moar refactoring after tests #3282
  • perf: moarstats much faster bivariate calculation #3248
  • perf: moarstats optimize non-streaming bivariate stats compilation #3250
  • refactor: qsv Skills Agent #3267
  • deps: polars bump to rev c241260 #3276
  • build(deps): bump itoa from 1.0.16 to 1.0.17 by @dependabot[bot] in #3239
  • build(deps): bump human-panic from 2.0.4 to 2.0.5 by @dependabot[bot] in #3234
  • build(deps): bump human-panic from 2.0.5 to 2.0.6 by @dependabot[bot] in #3249
  • build(deps): bump libc from 0.2.178 to 0.2.179 by @dependabot[bot] in #3265
  • build(deps): bump redis from 1.0.1 to 1.0.2 by @dependabot[bot] in #3232
  • build(deps): bump rfd from 0.16.0 to 0.17.0 by @dependabot[bot] in #3279
  • build(deps): bump rfd from 0.17.0 to 0.17.1 by @dependabot[bot] in #3284
  • build(deps): bump serde_json from 1.0.147 to 1.0.148 by @dependabot[bot] in #3238
  • build(deps): bump serial_test from 3.2.0 to 3.3.0 by @dependabot[bot] in #3273
  • build(deps): bump serial_test from 3.3.0 to 3.3.1 by @dependabot[bot] in #3275
  • build(deps): bump tokio from 1.48.0 to 1.49.0 by @dependabot[bot] in #3266
  • build(deps): bump url from 2.5.7 to 2.5.8 by @dependabot[bot] in #3286
  • build(deps): numerous zmij bumps from 0.1.7 to 1.0.12
  • bumped several indirect dependencies
  • applied select clippy & Codacy suggestions
  • applied several GH Copilot and Claude review suggestions

Fixed

  • fix: refresh_cpu_all() -> refresh_cpu_list(sysinfo::CpuRefreshKind::nothing())… #3261
  • fix: stats remove redundant check 0977ebf
  • fix: moarstats correct kendall_tau formula cf16543
  • fix: describegpt and util::run_qsv_cmd - add special case for sample as it expects output differently 6b6039f
  • fix: CVE-2025-66414 security vulnerability GHSA-w48q-cv73-mx4w
  • fix: RUSTSEC-2026-0001 (rkyv bump) c2d4937
  • typo: Portugese → Portuguese
  • typo: stats asummes → assumes

AI Contributors

  • @jqnatividad collaborated with and orchestrated @Copilot, Claude Code, Cursor and Gemini using various models

Full Changelog: 12.0.0...13.0.0

  1. Note that statistical metadata is not anonymized and will disclose potentially sensitive information. See #3289

12.0.0

24 Dec 14:14

[12.0.0] - 2025-12-24 🎄

Stuff your virtual stocking and jingle your data bells - qsv 12.0.0 slides down the chimney packed fuller than Santa’s sleigh! Unwrap delightful surprises like the shiny new moarstats command, gift-wrapped weighted statistics, and AI-powered FAIR metadata inferencing now speaking in multiple languages (no elf translation required). As the star on top, meet TOON - the brand new LLM-optimized, token-efficient format - ready to sleigh your AI projects all through 2026. Ho-ho-hold my data, this update’s a festive feast!

Special thanks to @kulnor for advocating, brainstorming & testing many of the new features below!

🌟 Major Features

NEW: moarstats Command

A powerful new command for "moar" advanced statistical analysis, providing statistics beyond what the stats command offers:

  • Comprehensive Statistics: 50+ advanced statistical measures including:

    • Detailed outlier analysis (count, sum, average)
    • Winsorized and trimmed means (5%, 10%, 20%, 25%)
    • Multiple dispersion measures (IQR to range ratio, quartile coefficient of dispersion)
    • Distribution statistics (skewness, multiple kurtosis measures)
  • Advanced Option (--advanced): Access computationally intensive statistics:

    • Gini coefficient for inequality measurement
    • Excess Kurtosis to measure "tailedness" of the distribution
    • Shannon Entropy for data diversity analysis
  • Available on all binary variants for universal access
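
For readers unfamiliar with some of these measures, the following Python sketch shows the textbook form of a trimmed mean, a winsorized mean and the Gini coefficient. It is a hedged illustration, not qsv's code: the function names are hypothetical, the proportion is applied to each tail (an assumption), and the proportion must stay below 0.5.

```python
def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the sorted values from each tail."""
    xs = sorted(values)
    k = int(len(xs) * proportion)
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)


def winsorized_mean(values, proportion):
    """Mean after clamping each tail to the nearest value that is kept."""
    xs = sorted(values)
    k = int(len(xs) * proportion)
    if k > 0:
        xs[:k] = [xs[k]] * k        # raise the lowest k values
        xs[-k:] = [xs[-k - 1]] * k  # lower the highest k values
    return sum(xs) / len(xs)


def gini(values):
    """Gini coefficient of non-negative values (0 means perfect equality)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n
```

With `[1, 2, 3, 4, 5, 6, 7, 8, 9, 100]` and a 10% proportion, both robust means come out at 5.5, while the plain mean (14.5) is dominated by the single outlier.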

Enhanced describegpt Command

Major enhancements to AI-powered data description capabilities:

  • ⛩️ Minijinja Template Engine Integration:

    • Custom prompt templating with full Minijinja and Minijinja-contrib filters
    • More powerful and flexible prompt customization
  • Multilingual Support:

  • Advanced Features:

    • --addl-columns option with detailed attribution and system metadata
    • --export-prompt <file> to save the default prompts to the specified file.
      This file can then be tailored and used with the --prompt-file <file> option.
    • Iterative, session-based SQL RAG with --prompt option
    • Sampling in prompt mode for better SQL generation
    • Lookup table and CKAN support for controlled vocabularies
    • Convenience values for --addl-cols-list
      (i.e., "everything", "everything!", "moar", "moar!")

Weighted Statistics Support

Comprehensive weighted statistics implementation across multiple commands:

  • stats Command (--weight <column>):

    • Weighted mean, standard deviation, variance
    • Weighted MAD (Median Absolute Deviation) and percentiles
    • Weighted modes and antimodes
    • Weighted harmonic and geometric means
    • All weighted calculations handle non-finite values gracefully
  • frequency Command (--weight <column>):

    • Weighted frequency distributions
    • Proper handling of weighted "Other" and "ALL UNIQUE" categories
    • Non-finite weights automatically skipped
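
The idea behind weighted statistics can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than qsv's implementation: the function names are hypothetical, rows with non-finite weights are skipped in both helpers (the notes state this only for frequency), and the variance shown is the population form divided by the total weight.

```python
import math


def weighted_mean_variance(values, weights):
    """Weighted mean and (population) variance, skipping non-finite weights."""
    pairs = [(v, w) for v, w in zip(values, weights) if math.isfinite(w)]
    total_w = sum(w for _, w in pairs)
    mean = sum(v * w for v, w in pairs) / total_w
    variance = sum(w * (v - mean) ** 2 for v, w in pairs) / total_w
    return mean, variance


def weighted_frequency(categories, weights):
    """Sum of weights per category instead of a plain row count."""
    freq = {}
    for category, w in zip(categories, weights):
        if not math.isfinite(w):
            continue  # NaN and infinite weights are ignored
        freq[category] = freq.get(category, 0.0) + w
    return freq
```

For example, `weighted_mean_variance([1, 2, 3], [1, 1, 2])` gives a mean of 2.25, because the value 3 counts twice as much as the others.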

Token Object Oriented Notation (TOON) Format Support

  • A compact, human-readable encoding of the JSON data model for LLM prompts

  • Commands Supporting TOON:

    • describegpt --format TOON
    • frequency --toon
  • Benefits: More readable than JSON, easier to parse than CSV for hierarchical data,
    and a terser, more token-efficient format targeted at LLMs

stats Command Enhancements

  • Percentile Improvements:

    • --percentile-list special values: "deciles" and "quintiles"
    • Percentile labels now include prefix before value (e.g., "p50: 42.5")
    • Validation of percentile-list on startup
  • New Columns: Added n_counts for more detailed count information

  • Performance Optimizations:

    • Optimized Stats struct layout
    • Eliminated redundant, unnecessary sorting
    • Removed redundant filtering for weighted stats functions
    • Microoptimizations throughout

transpose Command

  • New --long Option: Transform data from wide to long format
    • Column selection support using select syntax
    • Streaming implementation per GitHub Copilot review suggestions
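
A wide-to-long reshape can be pictured with the short Python sketch below. It is only a conceptual illustration: the function name, the `id_column` parameter and the `field`/`value` output column names are hypothetical, and the actual column layout produced by --long is not described in these notes.

```python
import csv
import io


def wide_to_long(csv_text, id_column):
    """Yield one (id, field, value) row per non-id cell, one input row at a time."""
    reader = csv.DictReader(io.StringIO(csv_text))
    yield [id_column, "field", "value"]  # hypothetical output header
    for row in reader:
        for name in reader.fieldnames:
            if name != id_column:
                yield [row[id_column], name, row[name]]
```

Because the function is a generator, each input row is reshaped and emitted before the next one is read, which is the general idea behind a streaming implementation.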

diff Command

  • upgraded csv-diff from 0.1.1 to faster 0.1.2, improving performance
    in optimal cases by up to 25% 🚀

lens Command

  • Aligned --no-streaming-stdin behavior with csvlens upstream

📊 Output Format Changes

schema Command

  • Updated $schema from Draft 7 to JSON Schema Draft 2020-12

⚡ Performance Improvements

suite-wide

stats Command

  • Optimized Stats struct memory layout
  • Eliminated redundant sorting operations
  • Removed unnecessary clone operations
  • Better handling of real-world data (assumes no infinity values)

frequency Command

  • Microoptimizations for faster frequency computation
  • Optimized top_n/bottom_n retrieval

🐛 Bug Fixes

frequency Command

  • Fixed behavior when compiling weighted frequencies with ALL_UNIQUE
  • Fixed issue where "Other (0),0,0,0" could appear in output
  • Proper handling of non-finite weights (automatically skipped)

🏗️ Infrastructure & Quality

Testing

  • Test suite expanded from 2,060 to 2,380 tests
  • Comprehensive test coverage for all new features
  • Weighted statistics thoroughly tested
  • Advanced moarstats options validated

Code Quality

  • Extensive GitHub Copilot review integration
  • Multiple refactoring passes for code clarity
  • Clippy suggestions incorporated throughout
  • Better error handling and edge case management

FAIR Principles

  • Added CITATION.cff (by @rzmk) for academic citation
  • Added Zenodo DOI badge for dataset citation
  • Enhanced FAIRification of qsv as a research tool

📚 Documentation Improvements

Statistical Documentation

  • Comprehensive documentation for statistics produced by stats command (by @kulnor) WIP
  • Enhanced usage text for stats, frequency, and moarstats
  • Better examples throughout documentation

Command Documentation

  • Updated describegpt with multilingual examples
  • Added controlled tag vocabulary examples
  • Enhanced TOON format documentation
  • Better SQL RAG workflow documentation

Migration Notes

Breaking Changes

  1. schema command: $schema output changed from Draft 7 to Draft 2020-12

    • Most schemas should be compatible
    • Validation tools must support JSON Schema Draft 2020-12
  2. stats command: Output now includes percentile label prefixes

    • Example: the 50th percentile value is now output as "p50: 10" instead of just the value "10"
    • May affect parsing scripts that expect raw numbers
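
Scripts that previously read raw percentile numbers may need a small adapter. The Python sketch below accepts both the old and the new form of a single percentile entry. It assumes the label format shown in the example above ("p50: 10") and numeric values; the helper name is hypothetical, and how multiple percentiles are separated within a cell is not covered here.

```python
import re

# Optional "pNN:" label followed by the value; the label form is an
# assumption based on the example in these release notes.
_ENTRY = re.compile(r"^\s*(?:(?P<label>p\d+(?:\.\d+)?)\s*:\s*)?(?P<value>\S.*?)\s*$")


def parse_percentile(entry):
    """Return (label or None, numeric value) for one percentile entry."""
    match = _ENTRY.match(entry)
    if match is None:
        raise ValueError(f"not a percentile entry: {entry!r}")
    return match.group("label"), float(match.group("value"))
```

`parse_percentile("p50: 10")` returns `("p50", 10.0)`, while a legacy entry such as `"10"` returns `(None, 10.0)`, so the same parser works against both old and new output.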

Added

  • feat: describegpt add --add-cols and --addl-cols-list <list> options #3179
  • feat: describegpt add --language option #3184
  • feat: describegpt use minijinja engine for prompt processing #3188
  • feat: describegpt add language autodetection in --prompt (chat) mode #3193
  • feat: describegpt sampling in prompt mode for better SQL generation… #3198
  • feat: describegpt add --prompt sessions for iterative SQL RAG refinement #3200
  • feat: describegpt add TOON format support #3205
  • feat: frequency add TOON format #3206
  • feat: frequency add weighted frequencies #3218
  • feat: add new moarstats command #3207
  • feat: moarstats add even moar! Now with detailed outliers info! #3208
  • feat: moarstats - add configurable ...
11.0.2

08 Dec 06:09

[11.0.2] - 2025-12-08

qsv 11.0.2 brings significant enhancements to larger-than-memory data processing, AI-powered metadata inferencing, JSON Schema inferencing & validation, and data viewing capabilities, along with important bug fixes and performance improvements.

All in preparation for at-scale, secure, interactive, "zero-copy" "Data Steward-in-the-Loop" FAIRification on the desktop in qsv pro.

🌟 Major Features

stats & frequency

  • Larger than Memory Files: stats & frequency can now handle arbitrarily large files, even when "advanced" statistics are enabled, thanks to a new dynamic parallel chunk sizing algorithm!
  • N Counts: Added "n_counts" (n_negative, n_zero and n_positive) columns to stats output for more detailed count information for numeric fields.

describegpt

The describegpt command has received substantial improvements for AI-powered metadata inferencing:

  • "Neuro-Procedural" Data Dictionaries: combines deterministically computed statistics and frequency distribution data with AI-inferred Human-Friendly Labels and Descriptions to compile an expanded Data Dictionary (not quite "neuro-symbolic" (YET!))

  • Chat with your Data!: Improved DuckDB and Polars SQL guidance means more reliable transformations of your Natural Language queries to SQL - leading to fast, deterministic, reproducible, hallucination-free answers!

  • Format Option: Replaced --json flag with --format option for more flexible output formatting

    • Supports multiple output formats - Markdown (default), TSV and JSON
    • Removed --jsonl option for cleaner API
  • Controlled Tag Vocabulary: New tag vocabulary system for consistent categorization

    • --tag-vocab option to specify controlled vocabulary
    • Lookup support for tag vocabularies - retrieve a tag vocabulary from a local or remote CSV
      using http://, https://, dathere:// and ckan:// URL schemes.
  • Enhanced Boolean Inference: --infer-boolean is now enabled by default for better data type detection

  • Performance Metrics: Added elapsed time tracking to monitor processing duration

  • Improved Prompt Templates: Updated default description prompt with PII/PHI alerts and better attribution metadata

schema & validate

Enhanced JSON Schema inference and validation capabilities:

  • Strict Formats: New --strict-formats option for stricter JSON Schema format validation,
    enforcing JSON Schema format constraints for email, hostname & IP address (IPv4/IPv6) formats.

  • Output Option: New --output option for specifying schema output destination

    • Polars schema now uses consistent naming conventions across commands
    • Updated joinp, pivotp, and sqlp commands to use new .pschema.json naming convention
  • Configurable Email Validation: validate has numerous options to tweak email validation
    - taking advantage of schema's email format constraint inferencing.

sample time-series sampling

A new --timeseries sampling method supports grouping by interval (hourly, daily, weekly),
adaptive sampling (prefer business hours or weekends), aggregation within each interval (mean, sum, min, max),
and configurable starting points (first, last or random).
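
The grouping-then-picking idea behind interval-based sampling can be sketched in Python as follows. This is a conceptual illustration only: the function and parameter names are hypothetical, it covers just the first/last starting points, and it does not reflect how qsv implements or names these options.

```python
from datetime import datetime


def sample_timeseries(rows, interval="hourly", pick="first"):
    """Keep one (timestamp, value) row per interval.

    rows: iterable of (ISO-8601 timestamp string, value) pairs.
    """
    def bucket(timestamp):
        dt = datetime.fromisoformat(timestamp)
        if interval == "hourly":
            return dt.strftime("%Y-%m-%dT%H")
        if interval == "daily":
            return dt.date().isoformat()
        if interval == "weekly":
            iso = dt.isocalendar()
            return f"{iso[0]}-W{iso[1]:02d}"
        raise ValueError(f"unknown interval: {interval}")

    chosen = {}
    # Sorting ISO-8601 strings of the same shape orders them chronologically.
    for timestamp, value in sorted(rows):
        key = bucket(timestamp)
        if pick == "first":
            chosen.setdefault(key, (timestamp, value))
        elif pick == "last":
            chosen[key] = (timestamp, value)
        else:
            raise ValueError(f"unknown pick: {pick}")
    return list(chosen.values())
```

An aggregating variant would collect all values per bucket and reduce them with mean, sum, min or max instead of keeping a single row.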

lens "real-time" Features

Enhanced CSV viewing capabilities with csvlens integration:

  • Auto-Reload: New --auto-reload option to automatically reload file when it changes

    • Useful for monitoring live data files
  • Streaming stdin: New --streaming-stdin option for real-time data viewing

    • Supports viewing data as it's being piped in
  • Row Marking: Updated csvlens dependency with row marking feature

Breaking Changes

  • describegpt: --json flag replaced with --format option
  • describegpt: --jsonl option removed
  • schema, joinp, pivotp, sqlp: Updated Polars schema naming conventions
    (existing workflows should work but output format may differ slightly)

Added

  • Created Event Logo Archive with AI-generated seasonal/version logos
  • describegpt: add controlled vocabulary support for tags #3122
  • describegpt: add elapsed time #3168
  • describegpt: add lookup support #3170
  • excel: add --cell option #3133
  • frequency: add dynamic parallel chunk sizing #3135
  • lens: add --auto-reload option #3128
  • lens: add --streaming-stdin option #3171
  • sample: add timeseries sampling options #3130
  • schema: infer addl JSON Schema predefined formats - email, ipv4, ipv6, hostname #3125
  • schema: add --output option and standardize Polars Schema file name #3126
  • stats: dynamic parallel chunk sizing with indexed files #3134
  • stats: add n_negative, n_zero, n_positive count columns #3157
  • validate: add email validation options #3148
  • tests: add tests for https://100.dathere.com/lessons/4 by @rzmk in #3151
  • Added Claude AI guidance for contributors
  • Enhanced --version output with more comprehensive system metadata

Changed

  • refactor: describegpt improve tags inferencing with Tag Vocabulary #3139
  • feat: describegpt - major refactor #3143
  • feat: describegpt improved Polars SQL processing #3147
  • feat: describegpt replace --json option with --format option supporting 3 formats - markdown, json and TSV; remove --jsonl option #3167
  • refactor: frequency & stats - parallel chunk sizing - allow forcing of cpu based chunking #3138
  • Align partition stdin handling with split/stats pattern by @Copilot in #3162
  • deps: use latest polars upstream with new SQL fixes and features (pola-rs/polars@e1be17f)
  • build(deps): bump actions/setup-python from 6.0.0 to 6.1.0 by @dependabot[bot] in #3120
  • build(deps): bump actix-web from 4.12.0 to 4.12.1 by @dependabot[bot] in #3127
  • build(deps): bump flate2 from 1.1.5 to 1.1.7 by @dependabot[bot] in #3159
  • build(deps): bump jsonschema from 0.37.1 to 0.37.2 by @dependabot[bot] in #3129
  • build(deps): bump jsonschema from 0.37.2 to 0.37.3 by @dependabot[bot] in #3131
  • build(deps): bump jsonschema from 0.37.3 to 0.37.4 by @dependabot[bot] in #3140
  • build(deps): bump log from 0.4.28 to 0.4.29 by @dependabot[bot] in #3150
  • build(deps): bump minijinja from 2.12.0 to 2.13.0 by @dependabot[bot] in #3142
  • build(deps): bump minijinja-contrib from 2.12.0 to 2.13.0 by @dependabot[bot] in #3141
  • build(deps): bump pyo3 from 0.27.1 to 0.27.2 by @dependabot[bot] in #3137
  • build(deps): bump qsv-stats from 0.40.0 to 0.41.0 by @dependabot[bot] in #3136
  • build(deps): bump qsv-stats from 0.41.0 to 0.42.0 by @dependabot[bot] in #3156
  • build(deps): bump qsv-stats from 0.42.0 to 0.43.0 by @dependabot[bot] in #3169
  • build(deps): bump rfd from 0.15.4 to 0.16.0 by @dependabot[bot] in #3121
  • build(deps): bump uuid from 1.18.1 to 1.19.0 by @dependabot[bot] in #3146
  • Improved qsvpy build process for Apple Silicon
  • Updated GitHub Actions workflows for better reliability
  • bumped several indirect dependencies
  • applied select clippy & Codacy suggestions
  • Improved dependency version management
  • Better feature flag handling

Fixed

  • fix: apply panic on empty selection #3165
  • fix: more robust snappy and file extension detection #3166
  • fix: partition add proper stdin handling regression introduced when --limit option was added #3161
  • Fix broken layout of environment variable documentation by @tmtmtmtm in #3163

Removed

  • describegpt: remove --jsonl option #3167
  • chore: remove jemalloc support #3153

New Contributors

  • @Copilot made their first contribution in #3162

10.0.0

23 Nov 22:43

[10.0.0] - 2025-11-23

Highlights:

  • Enhanced Data Dictionary: describegpt now features an expanded default prompt (v4.0) that generates more comprehensive data dictionaries.
  • Parallel Search/Replace Operations: search, searchset, and replace commands now support parallel execution when working with indexed CSV files, delivering significant performance improvements for large datasets.
  • Search/Replace Exact Match Options: Added --exact option to search, searchset, and replace commands for precise string matching without regex patterns.
  • Enhanced SQL Capabilities: sqlp now supports arbitrary expressions in SQL JOIN constraints, named window references, and new SQL functions including row_number, rank, dense_rank, and array_to_string.
  • Improved pivotp Performance: Updated to use Polars' new lazy pivot API with --maintain-order flag for predictable output ordering.
  • Luau 0.701: Updated embedded Luau from 0.697 to 0.701 with additional pattern matching documentation and tests.
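
To show how the three ranking functions named in the SQL highlight differ on ties, here is a small Python sketch. It uses SQLite (via the standard-library sqlite3 module, which needs SQLite 3.25 or later) purely as a stand-in SQL engine; sqlp itself runs on Polars SQL, and the table and column names are made up for the example.

```python
import sqlite3


def rank_demo():
    """Return (name, row_number, rank, dense_rank) rows for a tiny table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
    conn.executemany(
        "INSERT INTO scores VALUES (?, ?)",
        [("a", 90), ("b", 90), ("c", 80), ("d", 70)],
    )
    rows = conn.execute(
        """
        SELECT name,
               row_number() OVER w AS rn,
               rank()       OVER w AS rnk,
               dense_rank() OVER w AS drnk
        FROM scores
        WINDOW w AS (ORDER BY score DESC)  -- a named window reference
        ORDER BY rn
        """
    ).fetchall()
    conn.close()
    return rows
```

The two rows tied at 90 both get rank 1. The next row gets rank 3 but dense_rank 2, while row_number is always unique (its order among tied rows is arbitrary).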

Added

  • search & searchset: add --exact option for literal string matching #3094
  • search: parallel search when file is indexed #3096
  • searchset: parallel execution when indexed #3097
  • replace: add --exact option e73d9bf
  • replace: parallel execution when indexed #3098
  • sqlp: added support for arbitrary expressions in SQL JOIN constraints d47c44e & 0d2402b
  • sqlp: added support for row_number, rank, and dense_rank SQL window functions #3115
  • sqlp: added support for named window references #3118
  • sqlp: added support for array_to_string list evaluation 64cbf34
  • pivotp: added --maintain-order flag for predictable output ordering 02dca12
  • describegpt: default-prompt-file v4.0 with expanded Data Dictionary generation 4db0d18
  • luau: expanded documentation for string functions using pattern matching a7344e3 & 2dcc9a4
  • util::mem_file_check: added platform adjustment factor 421be84
  • benchmarks: v7.0 added search & searchset indexed parallel benchmarks 55df784
  • benchmarks: v7.1.0 added replace_indexed_parallel benchmark 05c89d8
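
The practical difference between regex matching and the literal matching offered by --exact can be sketched in Python. This is an illustration, not qsv's code: the function name is hypothetical, and because these notes do not say whether --exact must match the whole field, the `whole_field` parameter is an assumption added to show both behaviours.

```python
import re


def find_rows(rows, pattern, exact=False, whole_field=False):
    """Return rows where any field matches the pattern.

    exact=True escapes regex metacharacters so the pattern is literal.
    whole_field=True (hypothetical) requires the entire field to match.
    """
    if exact:
        pattern = re.escape(pattern)
    regex = re.compile(pattern)
    test = regex.fullmatch if whole_field else regex.search
    return [row for row in rows if any(test(field) for field in row)]
```

Given the rows `[["a.b", "1"], ["axb", "2"], ["$5.00", "3"]]`, the regex pattern `a.b` matches the first two rows because `.` matches any character, while the literal pattern matches only the first. A pattern like `$5.00` matches nothing as a regex but finds the third row when treated literally.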

Changed

  • describegpt: refactored for improved reliability 1433bf1 & b6190a4
  • frequency: special rank of 0 now assigned to <ALL_UNIQUE> rows effa13b
  • frequency: microoptimizations 775bb88 & 29ec7af
  • search, searchset & replace: now parallelizable with an index, with significant performance improvements 45fc83d
  • search: use faster, non-allocating par_sort_unstable_by_key for improved performance 5f50f23
  • search: optimize --quick option 1fc1b85
  • search: --preview-match option forces sequential search 017ca6f
  • search, searchset & replace: sort chunks instead of raw data for better performance 5b58cb8
  • searchset: microoptimizations for performance c4ce324
  • replace: remove unneeded index rebuild logic cfdba60
  • pivotp: refactored to adapt to Polars' new lazy pivot API #3102
  • excel: microoptimize hot loop and formula retrieval f141c1b & 17780b5
  • stats: cache repetitive expensive env_var access in hot path a6ad0ce
  • stats: multiple microoptimizations 2f41c33 & 9bf43e5 & 00958a1
  • validate: updated to jsonschema 0.37.x with improved error handling f45693d & c7ad5d2 & b9ea447
  • luau: updated embedded Luau from 0.697 to 0.701 8885dce
  • deps: bump polars to latest upstream with numerous SQL and LazyFrame improvements
  • deps: bump jsonschema from 0.34 to 0.37.1
  • deps: bump syn from 2.0.109 to 2.0.110 d207524
  • deps: bump quick-xml from 0.38.3 to 0.38.4 11a5ae4
  • deps: bump geosuggest-core from 0.8.1 to 0.8.2 baf3194
  • deps: bump geosuggest-utils from 0.8.1 to 0.8.2 c5bcd1b
  • deps: bump governor from 0.10.1 to 0.10.2 b0068ef
  • deps: bump gzp from 2.0.1 to 2.0.2 2a0b901
  • deps: bump indexmap from 2.12.0 to 2.12.1 afa9c1f
  • deps: bump mlua from 0.11.4 to 0.11.5 49eedb9
  • deps: bump signal-hook-registry from 1.4.6 to 1.4.7 5c2e705
  • deps: bump calamine to 0.32 (removed git dependency) 449f162
  • deps: bump cached to latest upstream (removed patched fork) 508d1ce
  • deps: bump actions/checkout from 5 to 6 f76e009
  • deps: removed hashbrown patched fork ad30460
  • deps: removed grex patched fork 88cd3fc
  • deps: updated Cargo.lock file multiple times with indirect dependency updates
  • docs: updated rust-version requirement to 1.91 c288d4d
  • docs: prebuilt binaries on Linux and Windows x86_64 are no longer compiled with target-cpu=native 5f892a1
  • docs: expanded note about Illegal Instruction (SIGILL) faults and portable builds e4df784
  • docs: describegpt update with expanded Data Dictionary example and link to defaults d722afd & cedcd41 & bba4f76
  • applied select clippy lint suggestions
  • bumped several indirect dependencies

Fixed

  • count: should still work with "broken" CSVs when polars feature is enabled #3104
  • describegpt: more robust SQL escaping to prevent SQL injection e958329
  • excel: formula retrieval bug on error b894515
  • excel: reverted mistaken alloc optimization for trim path b37361a
  • index: added check to confirm that only uncompressed CSV files can be indexed 1be485b
  • sqlp: unnest workaround for test compatibility 54d079b
  • sqlp: corrected array_to_string test 6c661ac
  • docs: fixed typo QSV_MEMORY_HEADROOM_PCT -> QSV_FREEMEMORY_HEADROOM_PCT f15d03e

Removed

  • deps: removed polars crates (polars-utils, polars-ops) that are no longer needed a7785f6
  • publish: removed target-cpu=native as it causes SIGILL on GitHub Action Runners fd74f8f

Full Changelog: 9.1.0...10.0.0

9.1.0

03 Nov 20:52

[9.1.0] - 2025-11-03

FAIRification continues to be a focus, as we tweak key commands that enable us to FAIRify raw data at blazing speed:

  • frequency received significant updates in this release, including several new options that make compiling frequency distribution tables easier.
  • describegpt now uses the much faster BLAKE3 hash as a cache key (10-20x faster than SHA256) and supports passing complex prompts more easily through the file system.
  • qsv-stats - the engine that powers both stats and frequency commands - has been further optimized with the 0.40.0 release, to compile summary statistics as fast as possible - even for very large files - often one to two orders of magnitude faster (10 to 100x faster) than typical Python-based tools.
  • Polars has been upgraded to 0.52.0. This vectorized query engine allows us to support more tabular formats & analyze/query millions of rows in seconds in situ - all without loading the data into a database.
  • the csv 1.4.0 crate has been tuned further to squeeze out even higher throughput - already ~2 million rows per second! [1]

These improvements prepare the ground for the upcoming MCP server on qsv pro, which will enable at-scale, configurable, interactive "Data Steward-in-the-loop", value-added FAIRification of privacy-sensitive files.

The qsv pro MCP server will handle not just CSVs but also other formats, including unstructured data - all processed locally on the desktop, without sending your raw data to the cloud.

It will produce AI-ready, standards-compliant metadata (starting with DCAT-US v3, Croissant and schema.org) - ideal context for AI applications and data governance efforts alike.


Added

  • frequency: add --pretty-json option c67fd06
  • frequency: add --rank-strategy option #3075
  • frequency: add --null-text option #3082

Changed

  • describegpt: explicitly use frequency's dense rank strategy dc3f270
  • describegpt: allow --prompt to be loaded from a text file b11a10c
  • describegpt: use much faster BLAKE3 hash for cache key
  • frequency: change default rank-strategy from min (AKA "1224" ranking) to dense (AKA "1223" ranking)
  • lens: bumped csvlens from 0.13.0 to 0.14.0
  • lens: automatically set to monochrome mode when using --find option 8539869
  • luau: bumped embedded Luau from 0.694 to 0.697 3e68e29
  • stats: fingerprint hash now uses much-faster, parallelizable BLAKE3 instead of SHA256
  • table: document that it also creates "aligned TSVs" and Fixed Width Format files aaa84b0
  • tests: change default Python to 3.13
  • docs: documented that Extended Input Support (🗄️) does .zip auto-decompression
  • docs: documented Limited Extended Input Support (🗃️)
  • use latest qsv-tuned csv crate with performance optimizations
  • build(deps): bump flate2 from 1.1.4 to 1.1.5 by @dependabot[bot] in #3071
  • build(deps): bump human-panic from 2.0.3 to 2.0.4 by @dependabot[bot] in #3077
  • deps: bump Polars from 0.51.0 at py-1.35.0-beta.1 to 0.52.0 618edf0
  • build(deps): bump qsv-stats from 0.39.1 to 0.40.0 by @dependabot[bot] in #3078
  • build(deps): bump actions/upload-artifact from 4 to 5 by @dependabot[bot] in #3074
  • applied several clippy lint suggestions
  • bumped several indirect dependencies
  • align nightly to 2025-10-24, the same nightly as Polars
  • bumped MSRV to Rust 1.91
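
The change of the default rank strategy from min ("1224") to dense ("1223") is easiest to see on a small example. The following Python sketch is illustrative only; `rank_counts` is a hypothetical helper, not part of qsv, and it ranks frequency counts in descending order.

```python
def rank_counts(counts, strategy="dense"):
    """Rank counts in descending order using 'min' or 'dense' tie handling."""
    if strategy not in ("min", "dense"):
        raise ValueError(f"unknown strategy: {strategy}")
    ranks = []
    rank = 0
    previous = None
    for position, count in enumerate(sorted(counts, reverse=True), start=1):
        if count != previous:
            # min: a new value takes its 1-based position, leaving a gap after ties.
            # dense: a new value takes the next consecutive rank, leaving no gap.
            rank = position if strategy == "min" else rank + 1
            previous = count
        ranks.append(rank)
    return ranks


counts = [50, 30, 30, 10]
print(rank_counts(counts, "min"))    # [1, 2, 2, 4]
print(rank_counts(counts, "dense"))  # [1, 2, 2, 3]
```

With min ranking, the value after a tie skips a rank; with dense ranking, ranks stay consecutive.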

Fixed

  • describegpt: add SQL escaping to eliminate SQL injection attack vector; add .csv extension to --sql-output when Polars SQL query runs successfully ad52a35
  • frequency: fix --select option always returning <ALL_UNIQUE> #3082
  • fixed some publishing workflows

Removed

  • Removed SHA256 and replaced with much faster, parallelizable BLAKE3 hash #3072 and #3080
  • publish: removed maximize-build-space step in workflows as it was not working as advertised
  • tests: removed target-cpu=native RUSTFLAG in CI tests to avoid intermittent SIGILL (Illegal Instruction) faults

Full Changelog: 8.1.1...9.1.0

  1. see validate_no_schema benchmark

8.1.1

22 Oct 02:28

[8.1.1] - 2025-10-22

Added

  • docs: Seeded developer documentation for index/stats/frequency modules by @kulnor in #3056

Changed

  • deps: use latest version of qsv-tuned csv crate 7523e08
  • deps: unpin zip from 4.6 and bump to 6 now that geosuggest uses it 957ad6d
  • build(deps): bump dns-lookup from 3.0.0 to 3.0.1 by @dependabot[bot] in #3057
  • build(deps): bump geosuggest-utils from 0.8.0 to 0.8.1 by @dependabot[bot] in #3058
  • build(deps): bump geosuggest-core from 0.8.0 to 0.8.1 by @dependabot[bot] in #3059
  • build(deps): bump memmap2 from 0.9.8 to 0.9.9 by @dependabot[bot] in #3060
  • build(deps): bump pyo3 from 0.27.0 to 0.27.1 by @dependabot[bot] in #3061
  • tweaked several publishing and test GH Actions workflows
  • applied clippy::to_string_in_format_args lint suggestion
  • bumped several indirect dependencies

Fixed

  • use latest csvlens patched fork that fixes panic when using stdin input 34154e6

New Contributors

Full Changelog: 8.1.0...8.1.1

8.1.0

20 Oct 10:43

[8.1.0] - 2025-10-20

This minor release features:

  • qsv on IBM Z mainframes (s390x)! Now that we have endianness detection, qsv handles big-endian platforms, and we've even added a prebuilt binary for s390x.
  • describegpt: Output Kind and Token Usage have been added to the output making it easier to parse responses and track LLM costs.
  • python: with the latest PyO3 0.27 crate, we're setting the stage to drop support for Python 3.12 and below, targeting free-threaded Python exclusively starting with the 9.0 release. This should allow us to massively boost performance by parallelizing py workloads.
    It will also power the upcoming FAIRification commands.
  • a tuned csv fork based on the just released csv 1.4 crate, increasing performance suite-wide.

Added

  • describegpt: add Kind and Token Usage to output a21e117
  • add big-endian handling for big-endian platforms (e.g. s390x-unknown-linux-gnu) #3045
  • add s390x prebuilt binary (qsv now runs on IBM Z Mainframes!) a3f455c

Changed

  • datefmt: Replace localzone crate with iana-time-zone crate #3048
  • geoconvert: Improved with the latest geozero fixes needed for Datapusher+ processing of GeoJSON and SHP files.
  • python: micro-optimize to remove unnecessary clone; use more idiomatic error_result handling - 777aa14
  • docs: update badges with PowerPC Linux GNU, Windows ARM64 MSVC, remove macOS Intel by @rzmk in #3036
  • deps: bump bitflags from 2.9.4 to 2.10.0 8d65c1b
  • deps: bumped csv crate to 1.4 and reapplied qsv optimizations. For more info, see 4e2f2a0
  • deps: bump csvs_convert patch fork 8aa398f
  • deps: bump geozero to latest upstream with unreleased fixes - 0a9d1b3
  • deps: bump polars to 0.51.0 at py-1.35.0-beta.1 tag
  • deps: bump socket2 from 0.6.0 to 0.6.1
  • deps: bump whatlang to 0.18 e80e9c0
  • build(deps): bump actions/setup-python from 5.0.0 to 6.0.0 by @dependabot[bot] in #3030
  • build(deps): bump actix-governor from 0.8.0 to 0.10.0 by @dependabot[bot] in #3046
  • build(deps): bump gzp from 1.0.1 to 2.0.0 by @dependabot[bot] in #3033
  • build(deps): bump github/codeql-action from 3 to 4 by @dependabot[bot] in #3034
  • build(deps): bump flexi_logger from 0.31.4 to 0.31.5 by @dependabot[bot] in #3032
  • build(deps): bump flexi_logger from 0.31.5 to 0.31.6 by @dependabot[bot] in #3035
  • build(deps): bump flexi_logger from 0.31.6 to 0.31.7 by @dependabot[bot] in #3038
  • build(deps): bump libc from 0.2.176 to 0.2.177 by @dependabot[bot] in #3040
  • build(deps): bump pyo3 from 0.26.0 to 0.27.0 by @dependabot[bot] in #3055
  • build(deps): bump qsv_docopt from 1.8.0 to 1.9.0 by @dependabot[bot] in #3041
  • build(deps): bump regex from 1.11.3 to 1.12.1 by @dependabot[bot] in #3043
  • build(deps): bump regex from 1.12.1 to 1.12.2 by @dependabot[bot] in #3050
  • build(deps): bump reqwest from 0.12.23 to 0.12.24 by @dependabot[bot] in #3049
  • build(deps): bump rust_decimal from 1.38.0 to 1.39.0 by @dependabot[bot] in #3047
  • build(deps): bump simd-json from 0.16.0 to 0.17.0 by @dependabot[bot] in #3031
  • build(deps): bump tikv-jemallocator from 0.6.0 to 0.6.1 by @dependabot[bot] in #3053
  • build(deps): bump tokio from 1.47.1 to 1.48.0 by @dependabot[bot] in #3052
  • applied select clippy lint suggestions
  • updated indirect dependencies

Fixed

  • headers: fix stdin handling without explicit - for stdin input #3039

Removed

  • removed Python 3.10 prebuilts as pyo3 0.27 no longer supports it and Python 3.10 is no longer maintained
  • deps: removed patched fork of time-rs now that 0.3.43 has been released fde03b3

Full Changelog: 8.0.0...8.1.0

8.0.0

06 Oct 00:43

[8.0.0] - 2025-10-06

[Image: FAIRdataAIREADYdataBanner] [1]
Findable, Accessible, Interoperable & Reusable (FAIR) Data is AI-Ready Data.

A week and a half after launching our "People's API" AI Chatbot and "AI-Ready" service, we fine-tune qsv further, as it powers the FAIRification engine that allows us to "open your data" (as a verb) - to infer and calculate AI-Ready, FAIR metadata at blazing speed even for large datasets.

This release features:

These changes set the stage for even more advanced, powerful, configurable FAIRification capabilities to make ALL your Data AI-Ready, Useful, Usable & Used by Machines & Humans alike.

Added

  • table: add leftendtab alignment option #3004
  • table: add leftfwf (Fixed Width Format) alignment option 590c861
  • validate: add Extended Input Support to RFC 4180 validation mode #3012
  • added PowerPC64 LE Linux prebuilt

Changed

  • describegpt: fine-tuned default LLM Prompt template (v3.1.0) 00e52a3 6b09b7e 5be7f2e
  • luau: bump embedded Luau from 0.690 to 0.693 #3017
  • schema: make Decimal Type Scale configurable for polars schema with QSV_POLARS_DECIMAL_SCALE env var - f20edd5
  • updated optimized csv crate, adding non-allocating StringRecord::trim() and more inline()s 4a1c82a
  • deps: bump calamine to 0.31.0 bd7a04c
  • deps: Bump polars to 0.51.0 from 0.50.0 at py-1.33.1 tag #2995
  • deps: bump polars to 0.51.0 at py-1.34.0-beta.4 tag at revision b973cac (latest upstream) #3022
  • deps: bump polars to 0.51.0 at py-1.35.0 tag revision b973cac 4164875
  • deps: replace tabwriter with renamed fork qsv-tabwriter #3010
  • deps: use patched fork of whatlang-rs. Though our PR was merged, there is still no new release 6afff4f
  • build(deps): bump base62 from 2.2.2 to 2.2.3 by @dependabot[bot] in #3003
  • build(deps): bump bytemuck from 1.23.2 to 1.24.0 by @dependabot[bot] in #3026
  • build(deps): bump chrono from 0.4.41 to 0.4.42 by @dependabot[bot] in #2974
  • build(deps): bump fancy-regex from 0.16.1 to 0.16.2 by @dependabot[bot] in #3000
  • build(deps): bump flate2 from 1.1.2 to 1.1.3 by @dependabot[bot] in #3027
  • build(deps): bump flexi_logger from 0.31.2 to 0.31.3 by @dependabot[bot] in #3005
  • build(deps): bump flexi_logger from 0.31.3 to 0.31.4 by @dependabot[bot] in #3008
  • build(deps): bump indexmap from 2.11.0 to 2.11.1 by @dependabot[bot] in #2973
  • build(deps): bump indexmap from 2.11.1 to 2.11.3 by @dependabot[bot] in #2993
  • build(deps): bump indexmap from 2.11.3 to 2.11.4 by @dependabot[bot] in #2999
  • build(deps): bump libc from 0.2.175 to 0.2.176 by @dependabot[bot] in #3009
  • build(deps): bump mlua from 0.11.3 to 0.11.4 by @dependabot[bot] in #3021
  • build(deps): bump regex from 1.11.2 to 1.11.3 by @dependabot[bot] in #3011
  • build(deps): bump redis from 0.32.5 to 0.32.6 by @dependabot[bot] in #3016
  • build(deps): bump qsv-stats from 0.38.0 to 0.39.0 by @dependabot[bot] in #3028
  • build(deps): bump qsv-stats from 0.39.0 to 0.39.1 by @dependabot[bot] in #3029
  • build(deps): bump redis from 0.32.6 to 0.32.7 by @dependabot[bot] in #3025
  • build(deps): bump serde from 1.0.219 to 1.0.223 by @dependabot[bot] in #2983
  • build(deps): bump serde from 1.0.223 to 1.0.224 by @dependabot[bot] in #2988
  • build(deps): bump serde from 1.0.224 to 1.0.225 by @dependabot[bot] in #2994
  • build(deps): bump serde from 1.0.225 to 1.0.226 by @dependabot[bot] in #3002
  • build(deps): bump serde from 1.0.226 to 1.0.227 by @dependabot[bot] in #3014
  • build(deps): bump serde from 1.0.227 to 1.0.228 by @dependabot[bot] in #3019
  • build(deps): bump serde_json from 1.0.143 to 1.0.145 by @dependabot[bot] in #2981
  • build(deps): bump semver from 1.0.26 to 1.0.27 by @dependabot[bot] in #2982
  • build(deps): bump sysinfo from 0.37.0 to 0.37.1 by @dependabot[bot] in #3015
  • build(deps): bump sysinfo from 0.37.1 to 0.37.2 by @dependabot[bot] in #3024
  • build(deps): bump tempfile from 3.21.0 to 3.22.0 by @dependabot[bot] in #2975
  • build(deps): bump tempfile from 3.22.0 to 3.23.0 by @dependabot[bot] in #3007
  • build(deps): bump toml from 0.9.6 to 0.9.7 by @dependabot[bot] in #3001
  • pin zip to 4.6, as zip 5 has features that are not widely adopted b231a23
  • applied select clippy lint suggestions
  • updated indirect dependencies
  • bumped MSRV to Rust 1.90

Fixed

  • describegpt: init cache vars even when --no-cache is used #2970
  • describegpt: --base-url option being ignored #2977
  • schema: delimiter detection #2998
  • extdedup: really use memmapped ondisk hash table #3020

Removed

  • removed powerpc64-le cross-compilation directive now that we have access to IBM-provided native PowerPC GH Action runner 9659bfc
  • removed macOS on Intel (x86_64-apple-darwin) prebuilt binaries

Full Changelog: 7.1.0...8.0.0


  1. SangyaPundir, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0, via Wikimedia Commons https://commons.wikimedia.org/wiki/File:FAIR_data_principles.jpg

7.1.0

06 Sep 16:07
df89a22

[7.1.0] - 2025-09-06

🇮🇹 csv,conf,v9 edition 🍝

Just in time for csv,conf,v9, we're Bologna-bound and will be talking all things qsv, CSV, open data, metadata standards, AI, POSE and CKAN!

For this feature release, we polished describegpt a bit more for the occasion...

Towards the "People's API"! Verso l'API del Popolo!
(Answering People/Policymaker Interface)

🚀 Enhanced describegpt Command

  • Configurable Frequency Limits: Make frequency distribution limit configurable for better control over data analysis
  • Few-shot Learning: Add --fewshot-examples option to improve LLM response quality with contextual examples
  • Advanced SQL Generation: Fine-tuned SQL generation guidance for better date handling and query optimization
  • Conditional SQL Results: Implement conditional --sql-results format for more efficient "SQL RAG" processing. If the generated SQL query executes successfully, the results are saved to the specified file with a .csv extension. If the query fails (a "SQL hallucination"), the file is saved with a .sql extension instead, for the user to tweak and edit.
  • TogetherAI Support: Add support for TogetherAI models endpoint, expanding LLM provider options
  • Enhanced Error Handling: Improved SQL parsing error handling and more informative error messages
  • Disk Cache by Default: The disk cache is now enabled by default for better performance
  • TOML Configuration: Migrate prompt files from JSON to the more readable, more easily edited TOML format.
    (see https://github.com/dathere/qsv/blob/master/resources/describegpt_defaults.toml)
  • Better Local LLM Support: --api-key can now be set to NONE for local LLM configurations that may not necessarily run on localhost (e.g. a shared Local LLM service running on the local network)
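
The conditional --sql-results behavior described above can be sketched in Python as follows. This is a hedged illustration, not describegpt's code: `save_sql_outcome` and the `run_query` callable are hypothetical, and the sketch assumes the extension is applied to an extension-less base path.

```python
from pathlib import Path
from typing import Callable


def save_sql_outcome(base: Path, sql: str, run_query: Callable[[str], str]) -> Path:
    """Run `sql`; save results as .csv on success, or the query as .sql on failure.

    `run_query` is a hypothetical callable that returns CSV text or raises
    an exception when the generated SQL cannot be executed.
    """
    try:
        csv_text = run_query(sql)
    except Exception:
        # The query failed: keep the SQL so the user can tweak and re-run it.
        target = base.with_suffix(".sql")
        target.write_text(sql)
        return target
    # The query succeeded: keep the results.
    target = base.with_suffix(".csv")
    target.write_text(csv_text)
    return target
```

A caller can then tell from the returned file extension whether it received query results or a query that still needs editing.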

partition Command Enhancements

  • New --limit Option: Implement --limit option to set the maximum number of open files
  • Streaming to Enhanced Batching Logic: Convert from streaming to a simplified, two-pass batched approach designed to partition on columns with high cardinality for very large datasets
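
One way to picture batching with a cap on open files is the Python sketch below. It is an assumption-laden illustration and not necessarily how partition works internally: `partition_batched` is a hypothetical function, in-memory lists stand in for output files, and the input is re-scanned once per batch of keys.

```python
def partition_batched(rows, key_index, limit):
    """Partition `rows` by the value at `key_index`, keeping at most
    `limit` outputs "open" at any time."""
    if limit < 1:
        raise ValueError("limit must be at least 1")

    # First pass: collect the distinct key values in first-seen order.
    keys = list(dict.fromkeys(row[key_index] for row in rows))

    partitions = {}
    # Later passes: handle the keys in batches of at most `limit`.
    for start in range(0, len(keys), limit):
        batch = keys[start:start + limit]
        open_outputs = {key: [] for key in batch}  # stand-in for open files
        for row in rows:
            key = row[key_index]
            if key in open_outputs:
                open_outputs[key].append(row)
        partitions.update(open_outputs)  # "close" this batch's outputs
    return partitions


rows = [["a", 1], ["b", 2], ["a", 3], ["c", 4]]
print(partition_batched(rows, key_index=0, limit=2))
# {'a': [['a', 1], ['a', 3]], 'b': [['b', 2]], 'c': [['c', 4]]}
```

The trade-off in this sketch is extra reads of the input in exchange for a bounded number of simultaneously open outputs, which matters when the partition column has high cardinality.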

Added

  • describegpt: add configurable frequency limit #2950
  • describegpt: migrate prompt file from JSON to easier-to-edit TOML format #2954
  • describegpt: refactor default prompt file; add --fewshot-examples option #2955
  • describegpt: add TogetherAI support for models endpoint #2965
  • partition: add --limit option #2960
  • added Windows ARM64 prebuilt binaries

Changed

  • describegpt: enable disk cache by default #2951
  • describegpt: Polars SQL generation tweaks #2958
  • python: replace deprecated with_gil with attach #2949. This sets the stage for "free-threaded" Python 3.14 support when it's released in October 2025. Buh-bye GIL!
  • deps: bump embedded Luau from 0.688 to 0.690 #2967
  • deps: bump Polars to 0.50.0 at py-1.33.0 tag
  • build(deps): bump actions/setup-python from 5.6.0 to 6.0.0 by @dependabot[bot] in #2962
  • build(deps): bump actions/stale from 9 to 10 by @dependabot[bot] in #2963
  • build(deps): bump log from 0.4.27 to 0.4.28 by @dependabot[bot] in #2961
  • build(deps): bump mlua from 0.11.2 to 0.11.3 by @dependabot[bot] in #2948
  • build(deps): bump pyo3 from 0.25.1 to 0.26.0 by @dependabot[bot] in #2946
  • build(deps): bump uuid from 1.18.0 to 1.18.1 by @dependabot[bot] in #2956
  • build(deps): bump zip from 4.5.0 to 4.6.0 by @dependabot[bot] in #2952
  • applied select clippy lints
  • updated indirect dependencies

Full Changelog: 7.0.1...7.1.0