Commit b443ccc

jqnatividad and claude committed
docs: STATS_DEFINITION.md comprehensive update
Key Updates:

1. stats section corrections:
   - Added sqlp and joinp to the list of "smart" commands that use the stats cache
   - Fixed the skewness formula to match actual implementation: (q3 - (2.0 * q2) + q1) / iqr
   - Added information about memory-aware chunking and the QSV_STATS_CHUNK_MEMORY_MB environment variable

2. moarstats section enhancements:
   - Added new "Bivariate Statistics" section documenting the 6 bivariate statistics:
     - Pearson correlation
     - Spearman correlation
     - Kendall's tau
     - Sample and population covariance
     - Mutual information
     - Normalized mutual information
   - Added performance optimizations (date parsing cache, string interning, early termination, streaming algorithms)
   - Documented multi-dataset join capabilities
   - Updated xsd_type definition to include Gregorian date type detection (gYear, gYearMonth, etc.) with confidence markers (? vs ??)

3. New frequency section:
   Created comprehensive documentation for the frequency command including:
   - Frequency table output format (field, value, count, percentage, rank)
   - Ranking strategies (dense, min, max, ordinal, average) with examples
   - Weighted frequencies support and weight handling rules
   - Stats cache integration explaining ID column detection and memory optimization
   - JSON/TOON output structure with example and list of 17 additional stats
   - Memory-aware processing with chunking behavior and environment variable configuration

[skip ci]

Co-Authored-By: Claude <81847+claude@users.noreply.github.com>
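For readers unfamiliar with the skewness formula quoted in the commit message, here is a minimal Python sketch of it. The function name `quartile_skewness` is hypothetical and this is not the project's code. Taking `iqr` to be `q3 - q1` and returning `None` when it is zero are both assumptions; the commit message does not say how the implementation handles that case.

```python
from typing import Optional


def quartile_skewness(q1: float, q2: float, q3: float) -> Optional[float]:
    """Quartile-based skewness: (q3 - (2.0 * q2) + q1) / iqr.

    q1, q2, q3 are the first quartile, median, and third quartile.
    """
    # Assumption: iqr is the interquartile range, q3 - q1.
    iqr = q3 - q1
    if iqr == 0:
        # Assumption: treat skewness as undefined when the IQR is zero,
        # rather than dividing by zero.
        return None
    return (q3 - (2.0 * q2) + q1) / iqr
```

A median exactly midway between the quartiles gives 0. A median closer to `q1` gives a positive value and a median closer to `q3` gives a negative one.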
1 parent f7644ad commit b443ccc

1 file changed: +238 -9 lines changed

0 commit comments