SQLite fails when UTF-8 multibyte character is split at exactly 350,000 bytes boundary #3533
Closed
Environment
- Operating System: Darwin
- Node Version: v20.19.0
- Nuxt Version: 3.16.2
- Nuxt Content Version: 3.6.0
- Nuxt Sitemap Version: 7.4.1
- Better-SQLite3 Version: 11.10.0
- Package Manager: yarn@4.2.2
- Builder: -
- User Config: -
- Runtime Modules: -
- Build Modules: -
Version
v3
Reproduction
- Create a `sample.json` file in the `content` directory
- Ensure that a UTF-8 multibyte character (e.g., the Japanese character "心" = 0xE5 0xBF 0x83) starts at byte position 349,999
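The reproduction steps can be sketched as a small script. This is an illustrative helper, not part of the original setup: `buildSample` is a hypothetical name, and the JSON shape is assumed to be the minimal `data[].content` layout used in this report. ASCII padding is used because each padding character takes exactly one byte, which makes the byte offset of "心" easy to control.

```typescript
// Hypothetical helper: builds a JSON document in which the 3-byte character "心"
// starts at byte offset `targetOffset` (0-based) of its UTF-8 encoding.
const encoder = new TextEncoder();

function buildSample(targetOffset: number): string {
  const prefix = '{"data":[{"id":"1","content":"';
  const suffix = '"}]}';
  // "中" (3 bytes) precedes "心", matching the "... 中心 ..." text in the report.
  const used = encoder.encode(prefix + "中").length;
  const paddingLength = targetOffset - used;
  if (paddingLength < 0) throw new Error("targetOffset is too small");
  // ASCII padding: one byte per character.
  const padding = "a".repeat(paddingLength);
  return prefix + padding + "中心" + suffix;
}

const json = buildSample(349_999);
const bytes = encoder.encode(json);
// Print the three bytes of "心" as hex.
console.log(bytes[349_999].toString(16), bytes[350_000].toString(16), bytes[350_001].toString(16)); // e5 bf 83
```

In Node.js the string can then be saved with `writeFileSync("content/sample.json", json)` from `node:fs`.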
JSON structure:
```json
{
  "data": [
    {
      "id": "1",
      "content": "... [text padding so that 心 starts at byte position 349,999] ... 中心 ..."
    }
  ]
}
```
content.config.ts
```ts
import { defineContentConfig, defineCollection, z } from '@nuxt/content'

export default defineContentConfig({
  collections: {
    content: defineCollection({
      type: 'data',
      source: '**/*.json',
      schema: z.object({}),
    }),
  },
})
```
Description
When byte position 350,000 of a JSON file falls inside a UTF-8 multibyte character, Nuxt Content's SQLite implementation fails with:
SQLite3Error: SQLITE_ERROR: unrecognized token: "[token_hash]"
And on the client side:
Unterminated string in JSON at position 349999 (line 1 column 350000)
Additional context
Investigation Results
Proof:
- Character "心" in UTF-8: 0xE5 0xBF 0x83 (3 bytes)
- At position 349,999: 0xE5 (first byte of "心")
- At position 350,000: 0xBF (second byte of "心") <- the buffer is split just before this byte
- At position 350,001: 0x83 (third byte of "心")
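The byte layout above can be reproduced in isolation. The sketch below is an illustration, not Nuxt Content's code: it assumes the failure comes from decoding the two pieces of a split buffer independently, and it uses the standard `TextEncoder`/`TextDecoder` APIs (available in Node.js 20) on "中心", placing the cut right after the lead byte 0xE5 to mirror the cut between positions 349,999 and 350,000.

```typescript
// Illustration only: cut a UTF-8 buffer at a fixed byte offset and decode the
// two pieces independently (assumed failure mode).
const encoder = new TextEncoder();
const decoder = new TextDecoder("utf-8"); // default (non-fatal): invalid bytes become U+FFFD

const original = "中心"; // UTF-8: e4 b8 ad | e5 bf 83
const bytes = encoder.encode(original);
const cut = 4; // first piece ends with 0xE5, the lead byte of "心"

// Decoding each piece on its own corrupts the split character.
const head = decoder.decode(bytes.subarray(0, cut)); // "中" + U+FFFD (incomplete sequence)
const tail = decoder.decode(bytes.subarray(cut)); // U+FFFD U+FFFD (two stray continuation bytes)

const codePoints = (s: string) => Array.from(s, (c) => c.codePointAt(0)!.toString(16));
console.log(codePoints(head + tail)); // [ '4e2d', 'fffd', 'fffd', 'fffd' ]

// A streaming decoder keeps the incomplete sequence until the next piece arrives.
const streaming = new TextDecoder("utf-8");
const joined =
  streaming.decode(bytes.subarray(0, cut), { stream: true }) +
  streaming.decode(bytes.subarray(cut));
console.log(joined === original); // true
```

The `{ stream: true }` option is one standard way to reassemble chunked UTF-8 safely; whether it fits Nuxt Content's own chunking code is an assumption.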
Verification
Tested with multiple files:
- File A: 685,647 bytes total, multibyte character at position 350,000 → ❌ Error
- File B: 3,150,308 bytes total, ASCII character at position 350,000 → ✅ Success
- File C: 3,751,769 bytes total, ASCII character at position 350,000 → ✅ Success
The error only occurs when position 350,000 falls within a multibyte UTF-8 character.
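The condition above can be checked directly: a cut at byte offset n splits a character exactly when the byte at n is a UTF-8 continuation byte (bit pattern 10xxxxxx). The sketch below uses that rule in two hypothetical helpers, `cutsCharacter` and `splitUtf8`, which are not Nuxt Content APIs. `splitUtf8` moves each cut back to the nearest character start, which is one possible way to produce chunks of at most a given byte size that are each valid UTF-8.

```typescript
// Hypothetical helpers, not part of Nuxt Content.

// UTF-8 continuation bytes have the bit pattern 10xxxxxx.
function isContinuationByte(b: number): boolean {
  return (b & 0xc0) === 0x80;
}

// True when cutting `bytes` at `offset` would separate the bytes of one character.
function cutsCharacter(bytes: Uint8Array, offset: number): boolean {
  return offset > 0 && offset < bytes.length && isContinuationByte(bytes[offset]);
}

// Split into chunks of at most `maxBytes` bytes, never inside a character.
function splitUtf8(bytes: Uint8Array, maxBytes: number): Uint8Array[] {
  if (maxBytes < 4) throw new Error("maxBytes must hold a 4-byte UTF-8 sequence");
  const chunks: Uint8Array[] = [];
  let start = 0;
  while (start < bytes.length) {
    const limit = Math.min(start + maxBytes, bytes.length);
    let end = limit;
    // Move the cut back to the first byte of the character it would split.
    while (end > start && cutsCharacter(bytes, end)) end--;
    // Malformed input (only continuation bytes): fall back to a plain byte cut.
    if (end === start) end = limit;
    chunks.push(bytes.subarray(start, end));
    start = end;
  }
  return chunks;
}

// Usage: "aa中心" is 61 61 e4 b8 ad e5 bf 83; byte 6 (0xBF) is inside "心".
const encoder = new TextEncoder();
const decoder = new TextDecoder();
const bytes = encoder.encode("aa中心");
console.log(cutsCharacter(bytes, 6)); // true
console.log(splitUtf8(bytes, 6).map((c) => decoder.decode(c))); // [ 'aa中', '心' ]
```

For File A, where byte 350,000 is the continuation byte 0xBF, `cutsCharacter(bytes, 350_000)` would return `true`; for Files B and C, which have an ASCII byte at that position, it would return `false`.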