bug: unicode characters in responses are sometimes corrupted v4 (node) #1161

@vschoettke

Description

Response text that contains Unicode characters is sometimes corrupted in large responses when the client is used in Node. This behaviour can be reproduced in algoliasearch v4; version 3 works fine.

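I boiled the problem down to the NodeHttpRequester, which simply appends each buffer chunk to a string. Every chunk is therefore decoded on its own, which breaks when a chunk boundary falls in the middle of a multi-byte UTF-8 character and the buffer contains only part of that character. For example: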

> a = Buffer.from("öäü")
<Buffer c3 b6 c3 a4 c3 bc>
> b = Buffer.from([0xc3, 0xb6, 0xc3])
<Buffer c3 b6 c3>
> c = Buffer.from([0xa4, 0xc3, 0xbc])
<Buffer a4 c3 bc>
> d = "" + b + c;  // <--- This is the current implementation in @algolia/node-http-requester
'ö��ü'
> Buffer.concat([b,c]);
<Buffer c3 b6 c3 a4 c3 bc>
> Buffer.concat([b,c]).toString()
'öäü'

I already created a fix and will make a pull request shortly.
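
For illustration only, here is a minimal sketch of two common ways to avoid the per-chunk decoding problem. It is not necessarily what the pull request does. The helper names `naiveJoin`, `concatJoin` and `streamingJoin` are hypothetical and are not part of @algolia/node-http-requester; `Buffer.concat` and `StringDecoder` are standard Node APIs.

```javascript
const { StringDecoder } = require('string_decoder');

// Hypothetical helper: mirrors the problematic pattern described above.
// `text += chunk` implicitly calls chunk.toString(), so every chunk is
// decoded in isolation and a split character becomes U+FFFD.
function naiveJoin(chunks) {
  let text = '';
  for (const chunk of chunks) {
    text += chunk;
  }
  return text;
}

// Hypothetical helper: collect the raw bytes first, decode once at the end.
function concatJoin(chunks) {
  return Buffer.concat(chunks).toString('utf8');
}

// Hypothetical helper: decode incrementally. StringDecoder holds back
// incomplete multi-byte sequences until the remaining bytes arrive.
function streamingJoin(chunks) {
  const decoder = new StringDecoder('utf8');
  let text = '';
  for (const chunk of chunks) {
    text += decoder.write(chunk);
  }
  return text + decoder.end();
}

// Same bytes as in the REPL session above: "öäü" split inside "ä".
const chunks = [
  Buffer.from([0xc3, 0xb6, 0xc3]),
  Buffer.from([0xa4, 0xc3, 0xbc]),
];

console.log(naiveJoin(chunks) === 'öäü');     // false
console.log(concatJoin(chunks) === 'öäü');    // true
console.log(streamingJoin(chunks) === 'öäü'); // true
```

Decoding once after `Buffer.concat` is the simplest change when the whole body is buffered anyway. The `StringDecoder` variant suits cases where the text has to be consumed as it arrives. Calling `setEncoding('utf8')` on the response stream should have a similar effect, since Node then decodes the chunks through a `StringDecoder` internally.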
