Lightweight Data Chunking for JavaScript – Chonkify

Category: JavaScript | June 6, 2025
Author: ushakov-igor
Last Update: June 6, 2025
License: MIT

Chonkify is a lightweight chunking utility that splits arrays, strings, buffers, typed arrays, sets, maps, and async iterables into evenly sized groups with minimal overhead.

Beyond plain splitting, it can chunk strings on grapheme cluster boundaries so that composite Unicode characters stay intact, and it can consume async iterables directly.

Features:

  • Works with arrays, strings, buffers, sets, maps, and typed arrays
  • Async iterable support with for await loops
  • Proper Unicode grapheme cluster handling for complex emojis
  • TypeScript-ready with ESM-first design

Use Cases:

  • Batch API Processing: Split large arrays of API requests into smaller batches to avoid rate limits and memory issues
  • Data Stream Processing: Handle large streaming datasets by processing them in manageable chunks rather than loading everything into memory
  • Text Processing: Break down large text strings while respecting Unicode boundaries, particularly useful for emoji-heavy content or internationalized text
  • Memory Management: Process large datasets without overwhelming memory by working with smaller chunks sequentially

Installation:

1. Install Chonkify and import it into your project.

# NPM
$ npm install chonkify

// ES module import
import { chonk, chonkAsync, chonkGraphemes } from 'chonkify';

API:

1. chonk(iterable, size): This is the general-purpose chunker. It takes any standard iterable and splits it into groups of at most size items. For strings, it operates on code points rather than grapheme clusters, which matches how JavaScript iterates strings by default but can split complex emojis that are built from several code points.

// Chunks an array of numbers
chonk([1, 2, 3, 4, 5], 2);
// Returns: [[1, 2], [3, 4], [5]]

// Chunks a simple string
chonk('abcdef', 3);
// Returns: ['abc', 'def']
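
For readers curious how a general-purpose chunker can accept any iterable, the sketch below uses a generator. It is illustrative only and is not Chonkify's source: `chunkIterable` is a hypothetical name, and unlike the `chonk('abcdef', 3)` example above it always yields arrays and does not rejoin strings.

```javascript
// Illustrative only: a generic chunker over any synchronous iterable.
// This is not Chonkify's source code.
function* chunkIterable(iterable, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  let current = [];
  for (const item of iterable) {
    current.push(item);
    if (current.length === size) {
      yield current;
      current = [];
    }
  }
  // Emit the trailing partial chunk, if any.
  if (current.length > 0) yield current;
}

console.log([...chunkIterable(new Set(['a', 'b', 'c']), 2)]);
// [['a', 'b'], ['c']]

console.log([...chunkIterable(new Map([['x', 1], ['y', 2], ['z', 3]]), 2)]);
// [[['x', 1], ['y', 2]], [['z', 3]]]
```

Because it relies only on the iteration protocol, the same function handles arrays, sets, maps, and typed arrays without special cases.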

2. chonkGraphemes(string, size): This function is specifically for strings where Unicode accuracy matters. It splits by grapheme clusters, not code points. This correctly handles emojis, flags, and other composite characters. In my projects, I default to this for any string that might come from user input.

// This emoji is a single grapheme made of multiple code points
const familyEmoji = '👨‍👩‍👧‍👦';
// Standard chonk would break it
chonk(familyEmoji, 1); // Splits the emoji apart: incorrect output
// chonkGraphemes handles it correctly
chonkGraphemes(familyEmoji, 1);
// Returns: ['👨‍👩‍👧‍👦']
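
The difference between code points and grapheme clusters can be checked with built-in JavaScript alone. The sketch below uses the standard `Intl.Segmenter` API; whether Chonkify uses it internally is an assumption I have not verified, and `chunkByGrapheme` is a hypothetical helper, not a library function. The emoji is written with escape sequences so the example survives copy and paste.

```javascript
// Family emoji: man, woman, girl, boy joined by zero-width joiners (U+200D).
const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}';

// UTF-16 code units: each emoji takes 2, each joiner takes 1.
console.log(family.length); // 11

// Iterating a string yields code points: 4 emoji plus 3 joiners.
console.log([...family].length); // 7

// Intl.Segmenter with 'grapheme' granularity yields user-perceived characters.
const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
const graphemes = Array.from(segmenter.segment(family), s => s.segment);
console.log(graphemes.length); // 1

// Hypothetical helper: chunk a string by grapheme clusters.
function chunkByGrapheme(text, size) {
  const parts = Array.from(segmenter.segment(text), s => s.segment);
  const chunks = [];
  for (let i = 0; i < parts.length; i += size) {
    chunks.push(parts.slice(i, i + size).join(''));
  }
  return chunks;
}

console.log(chunkByGrapheme('a' + family + 'b', 2).length); // 2
```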

3. chonkAsync(asyncIterable, size): Use this for asynchronous data sources like file streams or paginated API responses wrapped in an async generator. It returns an async iterator that yields chunks as they become available.

async function* numberStream() {
  for (let i = 0; i < 10; i++) {
    await new Promise(res => setTimeout(res, 100)); // Simulate network latency
    yield i;
  }
}
// Process the stream in chunks of 3
for await (const batch of chonkAsync(numberStream(), 3)) {
  console.log(batch);
  // Logs: [0, 1, 2], then [3, 4, 5], then [6, 7, 8], then [9]
}
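
As a rough picture of what an async chunker does, here is a minimal sketch built on an async generator. It is not Chonkify's implementation: `chunkAsyncIterable` and `fetchPages` are hypothetical names, and the paginated source is simulated with in-memory arrays.

```javascript
// Illustrative only; not Chonkify's implementation.
async function* chunkAsyncIterable(source, size) {
  let current = [];
  for await (const item of source) {
    current.push(item);
    if (current.length === size) {
      yield current; // hand over a full chunk as soon as it is ready
      current = [];
    }
  }
  // Emit the trailing partial chunk, if any.
  if (current.length > 0) yield current;
}

// Hypothetical paginated source: each "page" stands in for one API response.
async function* fetchPages() {
  const pages = [[1, 2], [3, 4], [5]];
  for (const page of pages) {
    yield* page; // flatten pages into a stream of items
  }
}

async function collect() {
  const batches = [];
  for await (const batch of chunkAsyncIterable(fetchPages(), 3)) {
    batches.push(batch);
  }
  return batches;
}

collect().then(batches => console.log(batches)); // [[1, 2, 3], [4, 5]]
```

Note that the chunk size is independent of the page size: items from different pages end up in the same chunk, and only one chunk is buffered at a time.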
