Skip to content

Conversation

@bimoadityar
Copy link
Contributor

What type of PR is this?

Use sync.Pool to get buffer for EscapeHTML and utf8.Validate to decrease allocation. Related to #835 .

Check the PR title.

  • This PR title match the format: <type>(optional scope): <description>
  • The description of this PR title is user-oriented and clear enough for others to understand.
  • Attach the PR updating the user documentation if the current PR requires user awareness at the usage level. User docs repo

Which issue(s) this PR fixes:

#835

@AsterDY
Copy link
Collaborator

AsterDY commented Sep 30, 2025

DON'T change the format otherwise it is hard to review.
And please explain your design in detail and show benchmark improvement

@bimoadityar
Copy link
Contributor Author

Reverted gofmt.

For the design, the issue of #835 was caused by:

  • HTMLEscape call inside encodeFinish (internal/encoder/encoder.go:238) was passed nil.
  • Because we did not pass any preallocated bytes, the implementation always allocated to grow the dst bytes to fit the result.

To make the change to reduce this allocation:

  • We get a preallocated bytes from vars.NewBytes to be passed to HTMLEscape
  • We store the result in buf, and we put back the old content of buf to vars.FreeBytes as it is not used anymore
  • We can remove the free logic in Encode as encodeFinish already put back any unused bytes
  • Similar for utf8.CorrectWith

For the benchmark, check out the following benchmark code based on the payload in the linked issue:

func BenchmarkSonicEscapeHTML(b *testing.B) {
	payload := new(infobipResponse)
	err := json.Unmarshal([]byte(data), payload)
	require.NoError(b, err)

	api := sonic.Config{EscapeHTML: true}.Froze()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		_, _ = api.Marshal(payload)
	}
}

Here is the result from the old code (48ba40b):

BenchmarkSonicEscapeHTML-8       1976474               603.1 ns/op          2354 B/op          3 allocs/op

Here is the result from the new code (fedc295):

BenchmarkSonicEscapeHTML-8       2506180               502.3 ns/op           924 B/op          2 allocs/op

See the decrease in allocation count and allocated bytes per operation.

@AsterDY
Copy link
Collaborator

AsterDY commented Oct 9, 2025

how about the performace of JSON without HTLM chars? Including both CPU and MEM

@bimoadityar
Copy link
Contributor Author

how about the performace of JSON without HTLM chars? Including both CPU and MEM

The previous result payload does not contain any HTML chars. From my understanding, currently the EscapeHTML logic will be executed with the allocation, regardless whether the payload contain any HTML chars or not.

Here is another result I got when running the experiment using sensitive payload:

  • from the old code:
BenchmarkSonicEscapeHTML-8       1877959               639.2 ns/op          2484 B/op          3 allocs/op
  • from the new code:
BenchmarkSonicEscapeHTML-8       2222899               545.5 ns/op          1054 B/op          2 allocs/op

@AsterDY AsterDY merged commit 72bd4ca into bytedance:main Oct 10, 2025
32 checks passed
nmaupu pushed a commit to nmaupu/sonic that referenced this pull request Oct 21, 2025
nmaupu pushed a commit to nmaupu/sonic that referenced this pull request Oct 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants