
SQLite fails when a UTF-8 multibyte character is split exactly at the 350,000-byte boundary #3533

@purupurupu


Environment

  • Operating System: Darwin
  • Node Version: v20.19.0
  • Nuxt Version: 3.16.2
  • Nuxt Content Version: 3.6.0
  • Nuxt Sitemap Version: 7.4.1
  • Better-SQLite3 Version: 11.10.0
  • Package Manager: yarn@4.2.2
  • Builder: -
  • User Config: -
  • Runtime Modules: -
  • Build Modules: -

Version

v3

Reproduction

  1. Create a sample.json file in the content directory
  2. Ensure that a UTF-8 multibyte character (e.g., the Japanese character "心" = 0xE5 0xBF 0x83) starts at byte position 349,999, so that its second byte lands at position 350,000

JSON structure

{
  "data": [
    {
      "id": "1",
      "content": "... [text padding to reach exactly 349,999 bytes] ... 中心 ..."
    }
  ]
}
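Padding the file by hand to an exact byte count is error-prone, so the reproduction file can also be generated. This is a minimal sketch, assuming Node.js and zero-based byte offsets; `buildRepro` is a hypothetical name, and the ASCII padding character `a` is an arbitrary choice (one byte per character keeps the arithmetic exact).

```typescript
import { Buffer } from 'node:buffer'
import { mkdirSync, writeFileSync } from 'node:fs'

// Hypothetical helper: build a JSON document in which "心" starts at the
// given zero-based byte offset (349,999 in this report).
function buildRepro(targetOffset = 349_999): Buffer {
  const prefix = '{"data":[{"id":"1","content":"'
  const suffix = '"}]}'
  const lead = '中' // 3 bytes in UTF-8, placed right before "心"
  const padBytes =
    targetOffset - Buffer.byteLength(prefix, 'utf8') - Buffer.byteLength(lead, 'utf8')
  if (padBytes < 0) throw new RangeError('targetOffset is too small for the JSON prefix')
  // ASCII padding: exactly one byte per character.
  const json = prefix + 'a'.repeat(padBytes) + lead + '心' + suffix
  return Buffer.from(json, 'utf8')
}

const buf = buildRepro()
console.log(buf[349_999].toString(16), buf[350_000].toString(16)) // e5 bf
mkdirSync('content', { recursive: true })
writeFileSync('content/sample.json', buf)
```

The generated file is valid single-line JSON; only the byte layout around offset 350,000 matters for the reproduction.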

content.config.ts

import { defineContentConfig, defineCollection, z } from '@nuxt/content'

export default defineContentConfig({
  collections: {
    content: defineCollection({
      type: 'data',
      source: '**/*.json',
      schema: z.object({}),
    }),
  },
})

Description

When byte position 350,000 of a JSON file falls inside a UTF-8 multibyte character (so the character is split across that boundary), Nuxt Content's SQLite implementation fails with:

SQLite3Error: SQLITE_ERROR: unrecognized token: "[token_hash]"

On the client side, it fails with:
Unterminated string in JSON at position 349999 (line 1 column 350000)

Additional context

Investigation Results

Proof:

  • Character "心" in UTF-8: 0xE5 0xBF 0x83 (3 bytes)
  • At position 349,999: 0xE5 (first byte of "心")
  • At position 350,000: 0xBF (second byte of "心") <- the buffer is cut here
  • At position 350,001: 0x83 (third byte of "心")
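The byte breakdown above can be checked directly in Node.js. This sketch only illustrates what happens to "心" when its bytes are separated; it does not reproduce Nuxt Content's code path, which this report does not show.

```typescript
import { Buffer } from 'node:buffer'

// Encode "心" and show its three UTF-8 bytes.
const bytes = Buffer.from('心', 'utf8')
console.log(Array.from(bytes, (b) => '0x' + b.toString(16).toUpperCase())) // [ '0xE5', '0xBF', '0x83' ]

// Cut after the first byte, mirroring a cut between offsets 349,999 and 350,000.
const head = bytes.subarray(0, 1) // 0xE5: lead byte of a 3-byte sequence
const tail = bytes.subarray(1) // 0xBF 0x83: orphaned continuation bytes

// Decoding each half separately cannot recover "心"; Node substitutes
// U+FFFD (REPLACEMENT CHARACTER) for the invalid sequences.
const decodedHead = head.toString('utf8')
const decodedTail = tail.toString('utf8')
console.log(decodedHead + decodedTail === '心') // false

// Joining the raw bytes before decoding restores the character.
console.log(Buffer.concat([head, tail]).toString('utf8') === '心') // true
```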

Verification

Tested with multiple files:

  • File A: 685,647 bytes total, position 350,000 inside a multibyte character → ❌ Error
  • File B: 3,150,308 bytes total, ASCII character at position 350,000 → ✅ Success
  • File C: 3,751,769 bytes total, ASCII character at position 350,000 → ✅ Success
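The classification in this list can be scripted by looking at the bit pattern of the byte at offset 350,000. A minimal sketch, assuming "position" means a zero-based byte offset; `classifyByte`, `cutSplitsCharacter` and `reportFile` are hypothetical names.

```typescript
import { Buffer } from 'node:buffer'
import { readFileSync } from 'node:fs'

type ByteKind = 'ascii' | 'lead' | 'continuation' | 'out-of-range'

// Classify one byte by its UTF-8 bit pattern.
function classifyByte(buf: Buffer, offset: number): ByteKind {
  if (offset < 0 || offset >= buf.length) return 'out-of-range'
  const b = buf[offset]
  if (b < 0x80) return 'ascii' // 0xxxxxxx: single-byte character
  if ((b & 0xc0) === 0x80) return 'continuation' // 10xxxxxx: inside a character
  return 'lead' // 11xxxxxx: first byte of a multibyte character
}

// A cut just before `offset` splits a character only if the byte at `offset`
// is a continuation byte (a lead byte there starts a new character).
function cutSplitsCharacter(buf: Buffer, offset: number): boolean {
  return classifyByte(buf, offset) === 'continuation'
}

// Report on a file on disk, e.g. files A, B and C above.
function reportFile(path: string, offset = 350_000): string {
  const buf = readFileSync(path)
  const kind = classifyByte(buf, offset)
  const note = cutSplitsCharacter(buf, offset) ? ' (cut splits a character)' : ''
  return `${path}: ${buf.length} bytes, byte ${offset} is ${kind}${note}`
}

// In-memory check: "a心" is 0x61 0xE5 0xBF 0x83.
const sample = Buffer.from('a心', 'utf8')
console.log(classifyByte(sample, 0), classifyByte(sample, 1), classifyByte(sample, 2)) // ascii lead continuation
```

For a real file, call `console.log(reportFile('content/sample.json'))`.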

The error only occurs when position 350,000 falls within a multibyte UTF-8 character.
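One possible direction for a fix, sketched under the assumption (not confirmed by this report) that the content is cut into fixed-size byte chunks: move each cut point left until it lands on a character boundary. `splitUtf8Safe` is a hypothetical helper, not Nuxt Content's implementation.

```typescript
import { Buffer } from 'node:buffer'

// Hypothetical helper: split `buf` into chunks of at most `maxBytes` bytes
// without cutting inside a UTF-8 sequence.
function splitUtf8Safe(buf: Buffer, maxBytes: number): Buffer[] {
  // A UTF-8 character is at most 4 bytes, so every chunk can hold at least one.
  if (maxBytes < 4) throw new RangeError('maxBytes must be at least 4')
  const chunks: Buffer[] = []
  let start = 0
  while (start < buf.length) {
    const hardEnd = Math.min(start + maxBytes, buf.length)
    let end = hardEnd
    // Move the cut left while the next chunk would begin with a continuation byte (10xxxxxx).
    while (end > start && end < buf.length && (buf[end] & 0xc0) === 0x80) end--
    // Malformed input (long run of continuation bytes): fall back to the hard cut.
    if (end === start) end = hardEnd
    chunks.push(buf.subarray(start, end))
    start = end
  }
  return chunks
}

// Same layout as the report: "心" starts at byte 349,999.
const input = Buffer.from('a'.repeat(349_996) + '中心' + 'z', 'utf8')
const chunks = splitUtf8Safe(input, 350_000)
console.log(chunks.map((c) => c.length)) // [ 349999, 4 ]
console.log(chunks.every((c) => !c.toString('utf8').includes('\uFFFD'))) // true
```

When bytes must be decoded chunk by chunk instead, Node's built-in `string_decoder` module (`StringDecoder`) handles the same problem by holding back an incomplete trailing sequence until the next `write()`.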

Logs
