ASCIIPropertyListParser: handle non-7b-ASCII chars by matvore · Pull Request #47 · 3breadt/dd-plist

matvore · 2018-08-17T23:27:32Z

Currently, ASCIIPropertyListParser takes a byte[] and widens each byte
to a 16-bit char, effectively prepending a 00 byte to get UTF-16. If the
byte is >= 0x80, however, it prepends 0xff instead, because Java's byte
is signed and the cast to char sign-extends it. Bytes in the 7-bit ASCII
range are therefore handled correctly, but others are not: 0x80, for
example, becomes 0xff80 (half-width katakana TA), which I don't believe
corresponds to any real encoding system.
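A minimal Java sketch of the cast described above. The class and method names are illustrative, not taken from dd-plist; the point is only that `(char) b` sign-extends any byte >= 0x80.

```java
public class CharCastDemo {
    // Casting a byte straight to char: bytes >= 0x80 are negative in Java,
    // so they are sign-extended to 0xFFxx rather than zero-extended to 0x00xx.
    static char widen(byte b) {
        return (char) b;
    }

    public static void main(String[] args) {
        byte ascii = 'A';          // 0x41
        byte high = (byte) 0x80;   // -128 as a signed byte

        System.out.printf("0x%04X%n", (int) widen(ascii)); // 0x0041
        System.out.printf("0x%04X%n", (int) widen(high));  // 0xFF80

        // Masking with 0xFF zero-extends instead (0x0080), which amounts to a
        // Latin-1 interpretation; it still does not decode multi-byte UTF-8.
        System.out.printf("0x%04X%n", (int) (char) (high & 0xFF)); // 0x0080
    }
}
```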

The options are to:

- convert using the default system encoding
- convert using UTF-8

I think UTF-8 is a better default. The default system encoding is
good for backwards compatibility, but this feature (non-7-bit ASCII)
has never worked at all before, so that's not really necessary. This can
also be made configurable if the need presents itself.
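For comparison, the two options above amount to decoding the whole byte array up front with an explicit charset. A minimal sketch, assuming the parser would then consume a `char[]`; `decodeDefault` and `decodeUtf8` are hypothetical helper names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeOptions {
    // Option 1: the platform default charset. This varies by OS and locale
    // (UTF-8 by default only since JDK 18, per JEP 400).
    static char[] decodeDefault(byte[] bytes) {
        return new String(bytes, Charset.defaultCharset()).toCharArray();
    }

    // Option 2: always UTF-8, the default proposed in this PR.
    static char[] decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8).toCharArray();
    }

    public static void main(String[] args) {
        // "café" in UTF-8: the é is the two bytes 0xC3 0xA9.
        byte[] utf8 = {'c', 'a', 'f', (byte) 0xC3, (byte) 0xA9};

        char[] chars = decodeUtf8(utf8);
        System.out.println(chars.length);              // 4
        System.out.printf("U+%04X%n", (int) chars[3]); // U+00E9

        // 4 when the default is UTF-8; e.g. 5 under windows-1252.
        System.out.println(decodeDefault(utf8).length);
    }
}
```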


3breadt · 2018-08-19T09:47:14Z

I didn't know char casting did that; that was not the intended behavior.

So I redesigned the approach for parsing ASCII property lists. The parser now works on a char array instead of a byte array. An encoding can be specified explicitly; otherwise, the parser attempts to detect it (UTF-8, UTF-16, UTF-32 or ASCII). I created a feature branch for this reworked parser: https://github.com/3breadt/dd-plist/tree/asciipropertylist-configurable-encoding
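One way such detection could work is to look for a byte-order mark (BOM) and fall back to UTF-8, which also covers plain 7-bit ASCII. This is a hypothetical sketch, not the code on the feature branch; `EncodingSniffer`, `detect` and `toChars` are illustrative names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical BOM sniffer; not dd-plist's actual implementation.
public class EncodingSniffer {
    static Charset detect(byte[] b) {
        if (startsWith(b, 0x00, 0x00, 0xFE, 0xFF)) return Charset.forName("UTF-32BE");
        // Must be tested before UTF-16LE: its BOM (FF FE) is a prefix of this one.
        if (startsWith(b, 0xFF, 0xFE, 0x00, 0x00)) return Charset.forName("UTF-32LE");
        if (startsWith(b, 0xEF, 0xBB, 0xBF)) return StandardCharsets.UTF_8;
        if (startsWith(b, 0xFE, 0xFF)) return StandardCharsets.UTF_16BE;
        if (startsWith(b, 0xFF, 0xFE)) return StandardCharsets.UTF_16LE;
        // No BOM: UTF-8 is a superset of 7-bit ASCII.
        return StandardCharsets.UTF_8;
    }

    // Decode once into chars so the parser never touches raw bytes.
    static char[] toChars(byte[] b) {
        String s = new String(b, detect(b));
        // Some decoders keep the BOM as U+FEFF; drop it if present.
        if (!s.isEmpty() && s.charAt(0) == '\uFEFF') {
            s = s.substring(1);
        }
        return s.toCharArray();
    }

    // Compares the leading bytes (as unsigned values) against a prefix.
    private static boolean startsWith(byte[] b, int... prefix) {
        if (b.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((b[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] utf16be = {(byte) 0xFE, (byte) 0xFF, 0, '{', 0, '}'};
        System.out.println(detect(utf16be));              // UTF-16BE
        System.out.println(new String(toChars(utf16be))); // {}
    }
}
```

Falling back to UTF-8 rather than the system encoding mirrors the default proposed in the original PR description.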

What do you think?

matvore · 2018-08-19T17:59:19Z

That's great! That commit would definitely fit my requirements.

matvore closed this Aug 19, 2018

3breadt mentioned this pull request Aug 21, 2018:

Improve handling of text encoding of ASCII property lists that are not actually ASCII #48 (Merged)

matvore wants to merge 1 commit into 3breadt:master from matvore:nonasc