Add the option to iteratively encode JSON.#29
Conversation
|
This is related to matrix-org/synapse#6998. |
richvdh
left a comment
There was a problem hiding this comment.
A few thoughts here.
Nowadays, the speed benefit of setting ensure_ascii=True and fixing up afterwards is much reduced on simplejson, and non-existent on stdlib json (see #9 (comment)), and the thing about U+2028 is incorrect as of simplejson 3.14.0.
I think it would be entirely reasonable to drop the _unascii optimisation, with the expectation that anyone looking for optimal performance should use stdlib json.
I also suspect (without evidence) that it is more efficient to do a single str.encode("utf-8") on a larger result than it is to do many of them on smaller results.
In short: can we avoid the for/yield loops? (maybe we should just make _canonical_encoder and _pretty_encoder public?)
Just to make sure I understand -- this would require bumping the minimum version of simplejson to 3.14.0 for correctness? I'll check this as a separate PR.
We could also define the interface for the new functions as returning strings, not bytes, that would just differ from the old interface. 😄 My concern with making |
From some testing locally this seems to be true. See #30 |
4d71cdb to
825a031
Compare
I think this is the remaining open question about this PR. I agree that it is probably more efficient to encode the longer string than a shorter one (although I'll try to do some quick perf checks of this). |
fair point. maybe better to do as you suggest and just define the new functions to return strings. Or possibly: supply two variants: |
I was curious to benchmark this a bit: https://gist.github.com/clokep/20c7cf34006099120bea5bbbb1c76c97 The results with Python 3.7.7 were: I think the tl;dr is that it doesn't matter too much which way we do it? Not sure if this changes your opinion or not! |
interesting. slightly surprising. In that case maybe we just stick with what you've got. Up to you, really. |
I think keeping the return type consistent is nice. We can use |
This expands the APIs available from canonicaljson to include methods that return iterators (essentially calling
iterencodeinstead ofencode).