encoding/json/jsontext: add errors to Token accessors for numbers #77666

@huww98

Proposal Details

This was proposed in go-json-experiment/json#158. An implementation of this proposal is also present there.

Background

I was using jsontext to implement an Unmarshaler for my type, but I found that the current jsontext.Token APIs for number parsing are confusing and easy to misuse.

See also #63397 (comment)

At first glance, it seems very natural to use Token.Float() in the custom unmarshaler. However, it is not.

Specifically, I'm not satisfied with the following behavior, which cannot be disabled:

token, _ := jsontext.NewDecoder(bytes.NewReader([]byte("1e500"))).ReadToken()
token.Float() // returns 1.7976931348623157e+308
token, _ = jsontext.NewDecoder(bytes.NewReader([]byte(`"Infinity"`))).ReadToken()
token.Float() // returns +Inf

Neither behavior is standard JSON, and neither is consistent with the json package itself. A new user of the jsontext package may not realize this and may end up implementing an unmarshaler that misbehaves. For example, consider unmarshaling {"total":1e1000,"used":1e500} into

type Amount struct {
	Total float64 `json:"total"`
	Used  float64 `json:"used"`
}

Total - Used could silently return 0 if the unmarshaler uses Token.Float(), because both out-of-range values would be clamped to the same maximum float64.

So I think Token.Float() and its siblings are not suitable for custom unmarshalers.
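To make the failure mode concrete, here is a minimal sketch that uses only the standard library. `saturatingFloat` is a hypothetical helper that imitates the clamping behavior described above; it is not the jsontext implementation. The sketch also shows that strconv.ParseFloat reports a range error for the same input.

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"strconv"
)

// saturatingFloat is a hypothetical stand-in for the clamping behavior
// described in this issue: an out-of-range number is silently replaced
// by ±math.MaxFloat64 and no error is reported to the caller.
func saturatingFloat(s string) float64 {
	f, err := strconv.ParseFloat(s, 64)
	if errors.Is(err, strconv.ErrRange) && math.IsInf(f, 0) {
		return math.Copysign(math.MaxFloat64, f)
	}
	return f
}

func main() {
	total := saturatingFloat("1e1000")
	used := saturatingFloat("1e500")
	// Both inputs clamp to the same value, so the difference is zero.
	fmt.Println(total - used)

	// strconv, by contrast, surfaces the overflow as an error.
	_, err := strconv.ParseFloat("1e500", 64)
	fmt.Println(errors.Is(err, strconv.ErrRange))
}
```

With an error-returning parser, the unmarshaler can reject the document instead of computing with two clamped values.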

Proposed API Changes

Separate API for decoding and encoding

flowchart LR

Decoder -->|ReadToken| RawToken -->|jsontext.Raw| Token -->|WriteToken| Encoder
RawToken -->|ParseInt| int((int64)) -->|jsontext.Int| Token

Added:

// RawToken is similar to [Token], and is returned by [Decoder.ReadToken].
//
// Use [Raw] to convert it to [Token] for [Encoder.WriteToken].
type RawToken struct {...}

func (RawToken) ParseFloat(bits int) (float64, error)
func (RawToken) ParseInt(bits int) (int64, error)
func (RawToken) ParseUint(bits int) (uint64, error)

// Raw wraps a [RawToken] as [Token], for passing through the freshly decoded token to [Encoder].
func Raw(RawToken) Token

// Raw returns the [RawToken] embedded.
// It panics if the token is not created with [Raw].
func (Token) Raw() RawToken

The ParseXxx methods should work like strconv.ParseXxx. The only difference is that they only need to support the JSON number format, so the implementation may be more efficient. Compared with the current Token.Float() etc., they will not silently exhibit any non-standard behavior:

  • It will not parse "Infinity" as float value Infinity (error should be returned instead)
  • It will not parse 1e500 as math.MaxFloat64 or math.MaxInt64 (error should be returned instead)
  • It will not truncate fractional component when parsing int (error should be returned instead)
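The three rules above can be approximated today with the standard library. The following is a rough sketch, not the proposed implementation: `parseJSONFloat` and `parseJSONInt` are hypothetical helpers that first check the RFC 8259 number grammar with a regular expression and then delegate to strconv. It assumes the raw token text is available as a string.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// jsonNumber matches the number grammar of RFC 8259.
var jsonNumber = regexp.MustCompile(`^-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?$`)

// parseJSONFloat (hypothetical) rejects anything that is not a JSON number,
// such as "Infinity", and passes through the strconv range error for 1e500.
func parseJSONFloat(raw string, bits int) (float64, error) {
	if !jsonNumber.MatchString(raw) {
		return 0, fmt.Errorf("invalid JSON number %q", raw)
	}
	return strconv.ParseFloat(raw, bits)
}

// parseJSONInt (hypothetical) returns an error instead of truncating:
// strconv.ParseInt rejects fractions and exponents as syntax errors.
func parseJSONInt(raw string, bits int) (int64, error) {
	if !jsonNumber.MatchString(raw) {
		return 0, fmt.Errorf("invalid JSON number %q", raw)
	}
	return strconv.ParseInt(raw, 10, bits)
}

func main() {
	for _, raw := range []string{"1.5", "1e500", "Infinity"} {
		f, err := parseJSONFloat(raw, 64)
		fmt.Println(raw, f, err)
	}
	n, err := parseJSONInt("1.5", 64)
	fmt.Println(n, err)
}
```

This sketch is stricter than necessary in one respect: it also rejects integers written with an exponent, such as 1e2. Whether the proposed ParseInt accepts those is not stated here.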

Changed: Decoder.ReadToken should now return RawToken instead of Token

// ReadToken reads the next [RawToken], advancing the read offset.
// The returned token is only valid until the next Peek, Read, or Skip call.
// It returns [io.EOF] if there are no more tokens.
func (d *Decoder) ReadToken() (RawToken, error)

Behavior changes to existing APIs:

When an Infinity, -Infinity, or NaN float token is passed to Encoder.WriteToken, it should now return an error rather than writing the non-standard JSON strings "Infinity", "-Infinity", or "NaN".

The following functions are greatly simplified to act only as accessors for the value passed to the corresponding Token constructor, much like reflect.Value.Float():

// Float returns the floating-point value for a JSON number.
// It panics if the token is not created with [Float].
func (Token) Float() float64
// Int returns the signed integer value for a JSON number.
// It panics if the token is not created with [Int].
func (Token) Int() int64
// Uint returns the unsigned integer value for a JSON number.
// It panics if the token is not created with [Uint].
func (Token) Uint() uint64
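For readers unfamiliar with the reflect analogy, the sketch below shows the existing reflect.Value.Float behavior that the simplified accessors are modeled on: it returns the stored value when the kind matches and panics otherwise. `floatPanics` is a helper written for this illustration only.

```go
package main

import (
	"fmt"
	"reflect"
)

// floatPanics reports whether reflect.ValueOf(x).Float() panics.
func floatPanics(x any) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	_ = reflect.ValueOf(x).Float()
	return false
}

func main() {
	// A float64 value can be read back with Float.
	fmt.Println(floatPanics(123.456))
	// An int64 value cannot: Float panics instead of converting.
	fmt.Println(floatPanics(int64(123)))
}
```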

As a result of this change, only the passthrough use case becomes a little more complex: it needs enc.WriteToken(jsontext.Raw(tok)) rather than enc.WriteToken(tok). I think this use case is very rare.

Other currently supported patterns that are disabled by this proposal, such as jsontext.Float(123.456).Int(), should not have valid use cases, IMO.

The RawToken returned by the decoder does not have a Float() method, so it cannot be misused in this way.

Conversion between the JSON strings "Infinity", "-Infinity", "NaN" and Go floats is now exclusively an encoding/json feature (controlled by the nonfinite format option). The jsontext package will not perform such conversion. Users can easily write their own conversion function.
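As a rough illustration of how small such a user-written conversion can be, here is a sketch with two hypothetical helpers, `nonFiniteFromString` and `nonFiniteToString`. They operate on the already-unquoted string value and are not part of any existing package.

```go
package main

import (
	"fmt"
	"math"
)

// nonFiniteFromString (hypothetical) maps the three special strings to
// their float64 values. ok is false for any other string.
func nonFiniteFromString(s string) (f float64, ok bool) {
	switch s {
	case "Infinity":
		return math.Inf(1), true
	case "-Infinity":
		return math.Inf(-1), true
	case "NaN":
		return math.NaN(), true
	}
	return 0, false
}

// nonFiniteToString (hypothetical) is the inverse mapping.
// ok is false for finite values, which should be written as JSON numbers.
func nonFiniteToString(f float64) (s string, ok bool) {
	switch {
	case math.IsInf(f, 1):
		return "Infinity", true
	case math.IsInf(f, -1):
		return "-Infinity", true
	case math.IsNaN(f):
		return "NaN", true
	}
	return "", false
}

func main() {
	f, ok := nonFiniteFromString("-Infinity")
	fmt.Println(f, ok)
	s, ok := nonFiniteToString(math.NaN())
	fmt.Println(s, ok)
}
```

A custom unmarshaler would call the first helper only when the token is a JSON string, and fall back to number parsing otherwise.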

Alternatives

Use strconv

tok, _ := dec.ReadToken()
f, err := strconv.ParseFloat(tok.String(), 64)

It takes non-trivial effort to realize that the JSON number format is a subset of the Go float format, so this usage is very hard to discover.
It also means we cannot, in the future, take advantage of the fact that parsing a JSON float is simpler than parsing a Go float.

Add ParseFloat method to Token

ParseXxx is only useful for decoding. If it were added to the existing Token type, we would be forced to decide what to do with jsontext.Float(123.456).ParseInt().

cc @dsnet

Metadata


Labels: LibraryProposal, Proposal

Status: Implementing
