csvops is a Ruby CLI for working with CSV data through both guided interactive workflows and direct commands. It lets you extract columns and row ranges, randomize rows, split files into chunks, validate parity between two files, de-duplicate one file against another, and generate summary stats.
It is designed for practical use with interactive prompts, file-based outputs, and an architecture that keeps workflows composable and testable.
- Ruby 3.3.0
- Bundler
- rake
- minitest
Install dependencies:
bundle install
Launch the interactive menu:
csvtool menu
With Bundler:
bundle exec csvtool menu
CSV Tool Menu
1. Extract column
2. Extract rows (range)
3. Randomize rows
4. Dedupe using another CSV
5. Validate parity
6. Split CSV into chunks
7. CSV stats summary
8. Exit
>
Select 1 for column extraction, 2 for row-range extraction, 3 for row randomization, 4 for cross-CSV dedupe, 5 for parity validation, 6 for CSV splitting, 7 for CSV stats, or 8 to exit.
Each action prompts only for what it needs (file path, separator, and any action-specific options), then either prints results to the console or writes them to a file, depending on the output destination you choose.
Typical prompt pattern:
- choose source file(s)
- choose separator/header options when relevant
- choose action-specific options
- choose output destination (console or file)
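The separator prompt used throughout these workflows maps a numbered menu choice to a column separator. The sketch below illustrates that mapping with Ruby's standard csv library, assuming the menu numbering shown in the examples (blank input selects the default [1]). `SEPARATORS`, `separator_for`, and `read_rows` are hypothetical names for illustration, not csvops internals.

```ruby
require "csv"

# Hypothetical mapping from the separator menu numbers to column separators.
SEPARATORS = {
  "1" => ",",  # comma
  "2" => "\t", # tab
  "3" => ";",  # semicolon
  "4" => "|"   # pipe
}.freeze

# Resolve a menu answer to a separator; blank input means the default [1].
def separator_for(choice, custom = nil)
  choice = choice.to_s.strip
  choice = "1" if choice.empty?
  if choice == "5" # custom: use whatever separator the user typed
    raise ArgumentError, "custom separator required" if custom.nil? || custom.empty?
    return custom
  end
  SEPARATORS.fetch(choice) { raise ArgumentError, "unknown separator choice: #{choice}" }
end

# Parse CSV text with the resolved separator and an optional header row.
def read_rows(text, choice, headers: true, custom: nil)
  CSV.parse(text, col_sep: separator_for(choice, custom), headers: headers)
end
```

For example, `read_rows("name;city\nAlice;Paris\n", "3").first["city"]` returns `"Paris"`.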
For architecture and internal design details, see:
Extract a column interactively (menu option 1):
Legend: unprefixed lines = prompt/menu, + = user input, - = tool output
CSV file path: /path/to/file.csv
Choose separator:
1. comma (,)
2. tab (\t)
3. semicolon (;)
4. pipe (|)
5. custom
+Separator choice [1]: 1
Filter columns (optional):
Select column:
1. name
2. city
+Column number: 1
Skip blank values? [Y/n]:
Preview (first 3 values):
-Alice
-Bob
-Cara
Print all values? [y/N]:
+y
Output destination:
1. console
2. file
+Output destination [1]: 1
-Alice
-Bob
-Cara
To write the values to a file instead, choose option 2 at the output prompt:
Output destination:
1. console
2. file
+Output destination [1]: 2
+Output file path: /tmp/names.csv
-Wrote output to /tmp/names.csv
Extract a column without using the interactive menu:
csvtool column /path/to/file.csv column_name
With Bundler:
bundle exec csvtool column /path/to/file.csv column_name
Get CSV stats directly (default text output):
csvtool stats /path/to/file.csv
Optional output format and color mode:
csvtool stats /path/to/file.csv --format json
csvtool stats /path/to/file.csv --format csv
csvtool stats /path/to/file.csv --color auto
csvtool stats /path/to/file.csv --color always
csvtool stats /path/to/file.csv --color never
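The `--color` flag takes `auto`, `always`, or `never`. The sketch below shows one common way such a flag is resolved, assuming `auto` means "colorize only when writing to a terminal" (a widespread CLI convention, not confirmed here for csvtool). `color_enabled?` and `colorize` are hypothetical helpers.

```ruby
# Hypothetical resolution of a --color mode, assuming the common convention
# that "auto" enables color only when the output stream is a TTY.
def color_enabled?(mode, io = $stdout)
  case mode.to_s
  when "always" then true
  when "never"  then false
  when "auto"   then io.respond_to?(:tty?) && io.tty?
  else raise ArgumentError, "invalid --color mode: #{mode}"
  end
end

# Wrap text in an ANSI SGR color code only when color is enabled.
def colorize(text, code, enabled)
  enabled ? "\e[#{code}m#{text}\e[0m" : text
end
```

Under this assumption, piping `csvtool stats ... --color auto` into a file or another command would produce plain text, while `always` forces escape codes even when piped.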
Deduplicate a source CSV against a reference CSV by key column (menu option 4):
CSV Tool Menu
1. Extract column
2. Extract rows (range)
3. Randomize rows
4. Dedupe using another CSV
5. Validate parity
6. Split CSV into chunks
7. CSV stats summary
8. Exit
+> 4
CSV file path: /tmp/source.csv
Source CSV separator:
Choose separator:
1. comma (,)
2. tab (\t)
3. semicolon (;)
4. pipe (|)
5. custom
+Separator choice [1]: 1
Source headers present? [Y/n]:
Reference CSV file path: /tmp/reference.csv
Reference CSV separator:
Choose separator:
1. comma (,)
2. tab (\t)
3. semicolon (;)
4. pipe (|)
5. custom
+Separator choice [1]: 1
Reference headers present? [Y/n]:
Source key column name: customer_id
Reference key column name: external_id
Trim whitespace before matching? [Y/n]:
Case-insensitive matching? [y/N]:
Output destination:
1. console
2. file
+Output destination [1]: 1
-
-customer_id,name
-1,Alice
-3,Cara
-Summary: source_rows=5 removed_rows=3 kept_rows=2
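The dedupe run above keeps only source rows whose key has no match in the reference file, then reports source, removed, and kept counts. A minimal sketch of that idea, assuming header rows in both files and key normalization limited to the two prompts shown (trim whitespace, case-insensitive). `normalize_key` and `dedupe_rows` are hypothetical names; separators, headerless files, and file output are left out.

```ruby
require "csv"
require "set"

# Normalize a key according to the trim / case-insensitive prompt answers.
def normalize_key(value, trim: true, case_insensitive: false)
  key = value.to_s
  key = key.strip if trim
  key = key.downcase if case_insensitive
  key
end

# Keep source rows whose key is NOT present in the reference file.
# Reference keys go into a Set; source rows are read one at a time and
# the kept rows are collected for output.
def dedupe_rows(source_path, reference_path, source_key:, reference_key:,
                trim: true, case_insensitive: false)
  opts = { trim: trim, case_insensitive: case_insensitive }
  reference_keys = Set.new
  CSV.foreach(reference_path, headers: true) do |row|
    reference_keys << normalize_key(row[reference_key], **opts)
  end

  kept = []
  source_rows = 0
  CSV.foreach(source_path, headers: true) do |row|
    source_rows += 1
    key = normalize_key(row[source_key], **opts)
    kept << row unless reference_keys.include?(key)
  end

  summary = { source_rows: source_rows, removed_rows: source_rows - kept.size, kept_rows: kept.size }
  [kept, summary]
end
```

Looking keys up in a `Set` keeps each source-row check constant-time on average, so the cost is one pass over each file.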
Validate parity between two CSV files (menu option 5):
CSV Tool Menu
1. Extract column
2. Extract rows (range)
3. Randomize rows
4. Dedupe using another CSV
5. Validate parity
6. Split CSV into chunks
7. CSV stats summary
8. Exit
+> 5
Left CSV file path: /tmp/left.csv
Right CSV file path: /tmp/right.csv
Choose separator:
1. comma (,)
2. tab (\t)
3. semicolon (;)
4. pipe (|)
5. custom
+Separator choice [1]: 1
Headers present? [Y/n]:
-MISMATCH
-Summary: left_rows=10 right_rows=10 left_only=2 right_only=2
-Left-only examples:
- 4,Dina (count +1)
-Right-only examples:
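- 4,Dina-Updated (count +1)
- Parity uses a streaming count-delta strategy:
  - Stream left rows and increment each row key's count.
  - Stream right rows and decrement each row key's count.
  - After both passes, keys with a positive count are left-only and keys with a negative count are right-only; if every count is zero, the files are in parity.
- Exact-duplicate semantics are preserved because deltas are counted per normalized row value: a row that appears twice on the left and once on the right ends with a net count of +1.
- Memory scales with the number of distinct row keys in the parity map, not with the total input row count.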
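The count-delta strategy can be sketched in a few lines. This is an illustration, not the csvops implementation: it assumes both files have a header row and uses each row's fields re-serialized as a CSV line as the "normalized" key. `parity_deltas` is a hypothetical name.

```ruby
require "csv"

# Stream both files: +1 per left row, -1 per right row, keyed by row value.
# Duplicates are therefore compared by multiplicity, not just presence.
def parity_deltas(left_path, right_path, headers: true)
  counts = Hash.new(0)
  [[left_path, 1], [right_path, -1]].each do |path, delta|
    CSV.foreach(path, headers: headers) do |row|
      fields = headers ? row.fields : row
      counts[CSV.generate_line(fields, row_sep: "")] += delta
    end
  end
  counts.reject! { |_key, count| count.zero? } # balanced keys cancel out

  left_only = counts.select { |_key, count| count.positive? }
  right_only = counts.select { |_key, count| count.negative? }.transform_values(&:abs)
  { match: counts.empty?, left_only: left_only, right_only: right_only }
end
```

Only one hash entry exists per distinct row value, which is why memory tracks distinct keys rather than total rows.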
Split a CSV into fixed-size chunks (menu option 6):
CSV Tool Menu
1. Extract column
2. Extract rows (range)
3. Randomize rows
4. Dedupe using another CSV
5. Validate parity
6. Split CSV into chunks
7. CSV stats summary
8. Exit
+> 6
Source CSV file path: /tmp/people.csv
Choose separator:
1. comma (,)
2. tab (\t)
3. semicolon (;)
4. pipe (|)
5. custom
+Separator choice [1]: 1
Headers present? [Y/n]:
+Rows per chunk: 1000
Output directory [/tmp]:
Output file prefix [people]:
Overwrite existing chunk files? [y/N]:
Write manifest file? [y/N]:
-Split complete.
-Chunk size: 1000
-Data rows: 25000
-Chunks written: 25
-/tmp/people_part_001.csv
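With 25000 data rows and 1000 rows per chunk, the split writes 25000 / 1000 = 25 files; in general the chunk count is the row count divided by the chunk size, rounded up. A minimal sketch of that chunking, assuming a header row that is repeated in every chunk and the `<prefix>_part_NNN.csv` naming seen in the output above. `split_csv` is a hypothetical helper that ignores the overwrite and manifest options.

```ruby
require "csv"

# Hypothetical splitter: at most rows_per_chunk data rows per file, with the
# header row repeated in every chunk (an assumption), named
# <prefix>_part_001.csv, <prefix>_part_002.csv, ...
def split_csv(source_path, rows_per_chunk:, output_dir:, prefix:)
  raise ArgumentError, "rows_per_chunk must be positive" unless rows_per_chunk.positive?

  paths = []
  out = nil
  rows_in_chunk = 0
  data_rows = 0

  CSV.foreach(source_path, headers: true) do |row|
    # Start a new chunk file before the first row and whenever one fills up.
    if out.nil? || rows_in_chunk == rows_per_chunk
      out&.close
      path = File.join(output_dir, format("%s_part_%03d.csv", prefix, paths.size + 1))
      out = CSV.open(path, "w", headers: row.headers, write_headers: true)
      paths << path
      rows_in_chunk = 0
    end
    out << row
    rows_in_chunk += 1
    data_rows += 1
  end

  { data_rows: data_rows, chunks: paths }
ensure
  out&.close # close the last chunk, also if an error interrupts the loop
end
```

Because rows are streamed and only one chunk file is open at a time, memory stays flat regardless of input size.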
Print a stats summary for a CSV (menu option 7):
CSV Tool Menu
1. Extract column
2. Extract rows (range)
3. Randomize rows
4. Dedupe using another CSV
5. Validate parity
6. Split CSV into chunks
7. CSV stats summary
8. Exit
+> 7
CSV file path: /tmp/people.csv
Choose separator:
1. comma (,)
2. tab (\t)
3. semicolon (;)
4. pipe (|)
5. custom
+Separator choice [1]: 1
Headers present? [Y/n]:
Output destination:
1. console
2. file
+Output destination [1]: 1
-CSV Stats Summary
-Rows: 3
-Columns: 2
-Headers: name, city
-Column completeness:
- name: non_blank=3 blank=0
- city: non_blank=3 blank=0
- Stats scanning is streaming (CSV.foreach) and completes in a single pass.
- Memory grows with per-column aggregates (column_stats), not with total row count.
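The single-pass design can be sketched with `CSV.foreach` and a per-column aggregate hash. This sketch only computes what the example output shows (rows, columns, headers, blank/non-blank counts per column). `csv_stats` is a hypothetical name, and treating whitespace-only values as blank is an assumption.

```ruby
require "csv"

# One streaming pass: memory holds per-column counters, not the rows.
def csv_stats(path, col_sep: ",")
  rows = 0
  headers = []
  column_stats = Hash.new { |hash, name| hash[name] = { non_blank: 0, blank: 0 } }

  CSV.foreach(path, headers: true, return_headers: true, col_sep: col_sep) do |row|
    if row.header_row?
      headers = row.headers # first yielded row when return_headers is true
      next
    end
    rows += 1
    row.each do |name, value|
      # Assumption: nil, empty, and whitespace-only values count as blank.
      bucket = value.to_s.strip.empty? ? :blank : :non_blank
      column_stats[name][bucket] += 1
    end
  end

  { rows: rows, columns: headers.size, headers: headers, column_stats: column_stats }
end
```

Each data row touches only a fixed number of counters, so the work is linear in file size and memory grows only with the number of columns.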
Run tests:
rake test
Or:
bundle exec rake test
Current version: 1.0.0
Install from RubyGems:
gem install csvops
Release runbook:
docs/releases/release-v1.0.0.md
Full architecture and domain documentation lives in: