Remove serializer todos and add comments#37170
Conversation
| { | ||
| JsonClassInfo newClassInfo = options.GetOrAddClass(type); | ||
|
|
||
| // todo: check if type==newtype and skip below? |
There was a problem hiding this comment.
Skipping the logic below would only have a minimal impact on perf, so it is not worth the check \ risk of doing that.
| } while (!isFinalBlock); | ||
| } | ||
|
|
||
| // todo: verify that we do want to call FlushAsync here (or above). It seems like leaving it to the caller would be best. |
There was a problem hiding this comment.
This was added during the original design\coding when we discussed a possible callback model (caller must call flush) instead of auto-flush. I don't believe we received any further feedback to add that capability.
...libraries/System.Text.Json/src/System/Text/Json/Serialization/JsonSerializer.Write.Stream.cs
Show resolved
Hide resolved
| { | ||
| // We flush the Stream when the buffer is >=90% of capacity. | ||
| // With the default buffer size of 16K, if a single JSON value is >=1.6K it is possible for | ||
| // the buffer to max out before it can be flushed, causing the buffer to expand. |
There was a problem hiding this comment.
I'm still trying to understand the magic behind the 90% number. The sentence would make just as much sense if it said "We flush the Stream when the buffer is >= 80% of capacity ... if a single JSON value is >= 3.2K it is possible..."... so how was the 90% chosen? Below it says it's a compromise... was this just based on experimentation and this value found to be ideal for a wide-variety of scenarios?
There was a problem hiding this comment.
I'll look at adding some more explanation.
Note that we do check for flushing as much as reasonable since we check for flushing after every property and element is written to the buffer, not whole objects or whole arrays.
Background on tradeoffs:
- The initial buffer size (smaller takes up less RAM). We are currently optimizing for payloads < 16K (the default buffer size).
- Minimizing buffer doubling. There are two reasons why the buffer is doubled:
- To get the buffer big enough to handle the largest single element\property.
- To deal with the buffer being partially used from smaller prior elements\properties that are not yet flushed. The current algorithm is flush at >= .9.
- Buffer utilization % -- i.e. flushing on every small property write would be wasteful for memory utilization plus the underlying Stream may not appreciate having Flush() called so often.
Consider a scenario of reading in large sets of fixed-size elements, such as bytes arrays. Assuming a single JSON element fits into the current buffer already:
- If threshold is
1-(1\(2^0))== 0 there would be no doubling (flush every time) - If threshold is
1-(1\(2^1))== .5 there would be at most 1 doubling - If threshold is
1-(1\(2^2))== .75 there would be at most 2 doublings - If threshold is
1-(1\(2^3))== .875 there would be at most 3 doublings - If threshold is
1-(1\(2^4))== .9375 there would be at most 4 doublings
So at .9 we sit in the "4 doublings" which also leaves some wiggle room for JSON token overhead and other smaller, "normal" properties\elements. We haven't received any direct reports of this threshold causing perf issues.
Some other options considered at the time the original todo was created:
- Add a flush threshold knob (in addition to the existing buffer size knob) on the options class.
- Add a heuristic and auto-control the % based on that (such as guessing based on largest prior size).
Both seem like overkill because:
- There is already a knob on the options class to specify the default buffer size. It is useful to increase this if you know certain payloads are going to be large to avoid some doubling or to adjust to the underlying Stream's optimal buffer size (if it also has a buffer).
- Since the buffer size is automatically doubled, the buffer will ultimately expand a max of 4 times (for fixed-size elements\properties) which seems like a reasonable number.
Finally we also considered a callback mechanism to have the caller do the flush, instead of "auto-flushing" like we do today. So far we haven't received any requests for that (AFAIK).
Non-functional changes. Remove the "todos" and add appropriate comments.
fixes #32356
fixes #32358