Hi,
Sorry for taking a while to report this. It took me some time to find the JSON (in about 2.5TB of data) and then I realised I should probably trim it down to the relevant section.
I am unable to parse the following JSON with simplejson 3.1.0 under Linux. This JSON is taken from a result set from ElasticSearch and obviously trimmed down to be as small as possible.
{"a": "\ud8e9"}
I did some testing on a fresh Vagrant Ubuntu Precise 64-bit virtual machine, but we had previously been seeing the error with Lucid on EC2.
The error I get is:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/vagrant/.virtualenvs/a6e8a75c0b63d8f2/local/lib/python2.7/site-packages/simplejson/__init__.py", line 398, in load
    use_decimal=use_decimal, **kw)
  File "/home/vagrant/.virtualenvs/a6e8a75c0b63d8f2/local/lib/python2.7/site-packages/simplejson/__init__.py", line 454, in loads
    return _default_decoder.decode(s)
  File "/home/vagrant/.virtualenvs/a6e8a75c0b63d8f2/local/lib/python2.7/site-packages/simplejson/decoder.py", line 374, in decode
    obj, end = self.raw_decode(s)
  File "/home/vagrant/.virtualenvs/a6e8a75c0b63d8f2/local/lib/python2.7/site-packages/simplejson/decoder.py", line 393, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())
simplejson.scanner.JSONDecodeError: Unpaired high surrogate: line 1 column 8 (char 7)
If I disable the speedups with simplejson._toggle_speedups(False), I get a (possibly) slightly more helpful traceback:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/vagrant/.virtualenvs/a6e8a75c0b63d8f2/local/lib/python2.7/site-packages/simplejson/__init__.py", line 398, in load
    use_decimal=use_decimal, **kw)
  File "/home/vagrant/.virtualenvs/a6e8a75c0b63d8f2/local/lib/python2.7/site-packages/simplejson/__init__.py", line 454, in loads
    return _default_decoder.decode(s)
  File "/home/vagrant/.virtualenvs/a6e8a75c0b63d8f2/local/lib/python2.7/site-packages/simplejson/decoder.py", line 374, in decode
    obj, end = self.raw_decode(s)
  File "/home/vagrant/.virtualenvs/a6e8a75c0b63d8f2/local/lib/python2.7/site-packages/simplejson/decoder.py", line 393, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())
  File "/home/vagrant/.virtualenvs/a6e8a75c0b63d8f2/local/lib/python2.7/site-packages/simplejson/scanner.py", line 119, in scan_once
    return _scan_once(string, idx)
  File "/home/vagrant/.virtualenvs/a6e8a75c0b63d8f2/local/lib/python2.7/site-packages/simplejson/scanner.py", line 90, in _scan_once
    _scan_once, object_hook, object_pairs_hook, memo)
  File "/home/vagrant/.virtualenvs/a6e8a75c0b63d8f2/local/lib/python2.7/site-packages/simplejson/decoder.py", line 198, in JSONObject
    value, end = scan_once(s, end)
  File "/home/vagrant/.virtualenvs/a6e8a75c0b63d8f2/local/lib/python2.7/site-packages/simplejson/scanner.py", line 87, in _scan_once
    return parse_string(string, idx + 1, encoding, strict)
  File "/home/vagrant/.virtualenvs/a6e8a75c0b63d8f2/local/lib/python2.7/site-packages/simplejson/decoder.py", line 118, in py_scanstring
    raise JSONDecodeError(msg, s, end)
simplejson.scanner.JSONDecodeError: Unpaired high surrogate: line 1 column 8 (char 7)
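The comparison between the two code paths can be scripted roughly as follows. This is a minimal sketch, assuming simplejson is importable and that the private helper simplejson._toggle_speedups behaves as it did in my interactive session. The function name decode_both_ways is hypothetical, and the outcome will depend on the simplejson version and Python build.

```python
def decode_both_ways(document):
    """Decode `document` with and without the C speedups; report each outcome."""
    try:
        import simplejson
    except ImportError:
        return {}  # simplejson is not installed, so there is nothing to compare

    outcomes = {}
    try:
        for label, enabled in (("speedups", True), ("pure_python", False)):
            # Private API. If the C extension was never built, enabling it
            # silently falls back to the pure-Python code path.
            simplejson._toggle_speedups(enabled)
            try:
                simplejson.loads(document)
                outcomes[label] = "ok"
            except ValueError as exc:  # JSONDecodeError subclasses ValueError
                outcomes[label] = "error: %s" % exc
    finally:
        simplejson._toggle_speedups(True)  # restore the default behaviour
    return outcomes


if __name__ == "__main__":
    print(decode_both_ways('{"a": "\\ud8e9"}'))
```

A document whose two outcomes differ is one of the "works without speedups" cases; one that fails both ways matches the traces above.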
However, I'm not totally sure where the blame lies here. I've done a bit of research and this is what I've found.
- As I mentioned before, I only have this problem under Linux. Locally on my Mac it works fine. I've not been able to test on Windows. I was pointed to this issue by @bigkevmcd: http://bugs.python.org/issue11489, which seems relevant but not conclusive.
- I have tested this file with a few other languages for comparison. It works fine in Java and Ruby. JavaScript seems to mostly work, although I noticed some display issues in the Chrome console.
- I have discovered that some other JSON documents cause the same error but parse correctly when I turn off the speedups. I've not been logging these: when a document fails I turn off the speedups, try again, and only log the ones that still fail. I'll do another run (probably not until Monday now) with logging for these cases so I can provide them.
- In my tweet I was confused about why it worked, then failed, then worked again after a pip re-install. That was with a different document, and it happened because simplejson was originally installed on the server from a package. When I then installed it with pip, the speedups failed to build, so I had "disabled" them without realising.
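Since the behaviour differs between machines, it may help to record the build details alongside each failure. One difference worth ruling out is whether the interpreter is a narrow or wide Unicode build: Python 2 narrow builds report sys.maxunicode as 0xFFFF and store characters outside the Basic Multilingual Plane as surrogate pairs. Whether that explains the Mac/Linux difference here is an assumption I have not verified. The helper name describe_build is hypothetical.

```python
import platform
import sys


def describe_build():
    """Collect interpreter details that could affect surrogate handling."""
    info = {
        "python": platform.python_version(),
        "platform": platform.platform(),
        # Narrow builds cap sys.maxunicode at 0xFFFF; wide builds go to 0x10FFFF.
        "unicode_build": "wide" if sys.maxunicode > 0xFFFF else "narrow",
    }
    try:
        import simplejson
        info["simplejson"] = simplejson.__version__
    except ImportError:
        info["simplejson"] = None  # not installed in this environment
    return info


if __name__ == "__main__":
    for key, value in sorted(describe_build().items()):
        print("%s: %s" % (key, value))
```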
So, yeah, that's what I've found. I'm not sure how helpful it is. The JSON documents that parse without the C speedups (but not with them) do seem to point to a bug, but it also seems to me that unpaired surrogates should either always work or always fail...