
JSON Marshaling of large responses is excessively expensive #3601

@jacksontj

Description


What did you do?
Sent a query looking like metricname[3d] to prometheus.

What did you expect to see?
metricname has ~5k labelsets, and by my math I expected a large data response (~2G).

What did you see instead? Under which circumstances?
Instead, I seemingly never get a response from prometheus. More interestingly, I see a large increase in memory utilization, to the point that prometheus stops scraping and eventually OOMs. From digging in more, I found that the issue is entirely in how prometheus marshals the response on the HTTP API. The test script below generates 3d worth of data (at a 15s interval) for 5k timeseries. Generating that data takes ~500ms and ~1.2G of RAM; JSON-marshaling it takes ~2m and consumes ~11G of RAM.

Issues

  • json.Marshal uses significantly more memory (~11G) than either the input (1.2G) or the output (2G)
  • json.Marshal calls aren't cancelable, so if a request like this ever hits prometheus it will either run to completion or kill the process
  • in addition to the large memory footprint, the marshaling (in the example below) takes ~2m on my desktop -- which is excessive, especially considering that the data generation took ~500ms

Suggestions
Suggestion 1
To alleviate both problems, I suggest making the JSON marshaling stream the data to the wire. There's no need to make a copy of it all in memory first, especially in the API case where we literally just write to the buffer. A terribly hacky example would be something like:

	enc := json.NewEncoder(w)
	w.Write([]byte{'['})

	for i, item := range m {
		if err := enc.Encode(item); err != nil {
			fmt.Println(err)
			return
		}
		if i < len(m)-1 {
			w.Write([]byte{','})
		}
	}
	w.Write([]byte{']'})

In this example we spin over every entry in the response and marshal each one out. Each samplestream (in this example) would still need to be in memory, but once written to the wire it no longer needs to be retained. In addition, this means the request is "cancelable" at each encode step (if the client disconnects, you get a stream-closed error). Of course, the "correct" implementation of this would require a bit of type switching.

Suggestion 2
Change the marshaling of the various structs to be codegen'd. Most of them are partially there, but there are some minor improvements that would give a ~2x boost in performance (mostly by copying less and reflecting less).

For both of these I'd be more than happy to implement them (it's not that bad), but I wanted to get some feedback prior to implementation.

Repro Script

package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"time"

	"github.com/prometheus/common/model"
)

func generateData() model.Matrix {
	NUM_TIMESERIES := 5000
	NUM_DATAPOINTS := 17280

	// Create the top-level matrix
	m := make(model.Matrix, 0)

	for i := 0; i < NUM_TIMESERIES; i++ {
		lset := map[model.LabelName]model.LabelValue{
			model.MetricNameLabel: model.LabelValue("timeseries_" + strconv.Itoa(i)),
		}

		now := model.Now()

		values := make([]model.SamplePair, NUM_DATAPOINTS)

		for x := NUM_DATAPOINTS; x > 0; x-- {
			values[x-1] = model.SamplePair{
				Timestamp: now.Add(time.Second * -15 * time.Duration(x)), // Set the time back assuming a 15s interval
				Value:     model.SampleValue(float64(x)),
			}
		}

		ss := &model.SampleStream{
			Metric: model.Metric(lset),
			Values: values,
		}

		m = append(m, ss)
	}
	return m
}

func main() {
	start := time.Now()
	m := generateData()
	took := time.Since(start)

	fmt.Println("done generating data, took:", took)

	start = time.Now()
	if _, err := json.Marshal(m); err != nil {
		fmt.Println("marshal error:", err)
	}
	took = time.Since(start)
	fmt.Println("done marshaling, took:", took)
}
