Skip to content

Collect HTTP transfer stats#366

Merged
rubyist merged 11 commits intomasterfrom
statistician
Jun 16, 2015
Merged

Collect HTTP transfer stats#366
rubyist merged 11 commits intomasterfrom
statistician

Conversation

@rubyist
Copy link
Contributor

@rubyist rubyist commented Jun 3, 2015

I was doing some measuring today and added some code that might be generally useful with respect to some analytics functionality previously discussed.

What's in this PR is a modification of the HTTP tracing code that keeps track of the sizes and timings of every http request that goes through the default client. Once you have a response you can link it to a key in a way similar to statsd (though far less capable, at this point), e.g.:

LogTransfer("lfs.api.upload", res)

It tracks the size of both the header and body portions of the transfers, as well as the timing. The header vs body sizes can be useful for determining the point at which the HTTP overhead overtakes the file size (e.g. thousands of small files in LFS).

I'd love to be able to break down the timings into the various pieces of the http request, but that's a whole other can of worms best left for another day.

As it stands, when tracing with GIT_CURL_VERBOSE=1, you'll get output like this:

(100 of 100 files) 50.00 MB 1m22s
HTTP Transfer Stats

Concurrent Transfers: 3, Batch: false

lfs.api.upload:
  Transfers: 100
  Time: 120870.59ms
  Wire Data (Bytes): 81700
    Request Size (Bytes): 37300
      Headers: 28500
      Bodies: 8800
    Response Size (Bytes): 44400
      Headers: 12700
      Bodies: 31700
  Mean Wire Size: 817
    Requests: 373.00
    Responses: 444.00
  Mean Transfer Time: 1208.71ms

lfs.data.upload:
  Transfers: 100
  Time: 121015.36ms
  Wire Data (Bytes): 52461500
    Request Size (Bytes): 52459800
      Headers: 31000
      Bodies: 52428800
    Response Size (Bytes): 1700
      Headers: 1700
      Bodies: 0
  Mean Wire Size: 524615
    Requests: 524598.00
    Responses: 17.00
  Mean Transfer Time: 1210.15ms


Total Transfers: 200
Total Time: 241885.94ms

or

(100 of 100 files) 50.00 MB 1m15s
HTTP Transfer Stats

Concurrent Transfers: 10, Batch: false

lfs.api.upload:
  Transfers: 100
  Time: 350577.71ms
  Wire Data (Bytes): 81700
    Request Size (Bytes): 37300
      Headers: 28500
      Bodies: 8800
    Response Size (Bytes): 44400
      Headers: 12700
      Bodies: 31700
  Mean Wire Size: 817
    Requests: 373.00
    Responses: 444.00
  Mean Transfer Time: 3505.78ms

lfs.data.upload:
  Transfers: 100
  Time: 350871.42ms
  Wire Data (Bytes): 52461500
    Request Size (Bytes): 52459800
      Headers: 31000
      Bodies: 52428800
    Response Size (Bytes): 1700
      Headers: 1700
      Bodies: 0
  Mean Wire Size: 524615
    Requests: 524598.00
    Responses: 17.00
  Mean Transfer Time: 3508.71ms


Total Transfers: 200
Total Time: 701449.13ms

or

(100 of 100 files) 50.00 MB 37s
HTTP Transfer Stats

Concurrent Transfers: 10, Batch: true

lfs.api.batch:
  Transfers: 1
  Time: 766.13ms
  Wire Data (Bytes): 41046
    Request Size (Bytes): 9204
      Headers: 291
      Bodies: 8913
    Response Size (Bytes): 31842
      Headers: 128
      Bodies: 31714
  Mean Wire Size: 41046
    Requests: 9204.00
    Responses: 31842.00
  Mean Transfer Time: 766.13ms

lfs.data.upload:
  Transfers: 100
  Time: 186513.05ms
  Wire Data (Bytes): 52461500
    Request Size (Bytes): 52459800
      Headers: 31000
      Bodies: 52428800
    Response Size (Bytes): 1700
      Headers: 1700
      Bodies: 0
  Mean Wire Size: 524615
    Requests: 524598.00
    Responses: 17.00
  Mean Transfer Time: 1865.13ms


Total Transfers: 101
Total Time: 187279.18ms

Some things to consider before this is a real thing:

  • Use a separate tracing var for it, http tracing output is messy with concurrent transfers
  • Probably want some mutexes around the hashes, it's kind of OK as is but a little iffy
  • Make double sure we're not storing extra data when not tracing

Future work could further aggregate these, send them off somewhere, make more statsd-like outputs (percentiles, sums, histograms, etc).

@technoweenie
Copy link
Contributor

Big 👍. Though I think separating this from GIT_CURL_VERBOSE would be a good idea.

Future work could further aggregate these, send them off somewhere, make more statsd-like outputs (percentiles, sums, histograms, etc).

This could be a common log format (I'm partial to grohl of course). Something simple that the Git LFS tool can produce for something else to dump into splunk, statsd, spark, etc.

@rubyist
Copy link
Contributor Author

rubyist commented Jun 3, 2015

Yeah that's what I was getting at in that first checkbox, it gets drowned out in the GIT_CURL_VERBOSE output, a separate flag is def needed.

I like the common log format idea, dumping right to that would probably save a bit of memory while tracing, and some more request metadata could be added in.

@rubyist
Copy link
Contributor Author

rubyist commented Jun 5, 2015

I've done a bunch of clean up on this and I think it does enough for now to continue other benchmarking work. With the latest changes, if you set GIT_LOG_STATS (open to a better name) it will log the stats for all transfers in that command run to a timestamp-based log file in .git/lfs/objects/logs/http. I pulled the display output out of this code. I've got some code up in a gist that will render what was there. This could be made part of the lfs logs command, or left external and made into a proper repo. I'm not sure it's necessary to have all that in the client proper, external tooling might be better (and can be extended without interfering with the client release cycle).

@rubyist rubyist closed this Jun 16, 2015
@rubyist rubyist reopened this Jun 16, 2015
rubyist added a commit that referenced this pull request Jun 16, 2015
Collect HTTP transfer stats
@rubyist rubyist merged commit 257f909 into master Jun 16, 2015
@rubyist rubyist deleted the statistician branch June 16, 2015 14:43
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may not be necessary to dump the url, since you can get it from the response object with res.Request.URL.

Maybe it makes too little difference, but you could build the transfer object outside of the lock.

t := &transfer{requestStats: reqstats, responseStats: resstats, url: req.URL.String()}
transfersLock.Lock()
transfers[res] = t
transfersLock.Unlock()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we should build the object outside of the lock. Not a big deal, but feels better. I'm not sure what you mean by "dump the url", though.

@technoweenie technoweenie mentioned this pull request Jun 16, 2015
38 tasks
@rubyist rubyist mentioned this pull request Jun 16, 2015
2 tasks
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Jan 21, 2026
At present, when the GIT_CURL_VERBOSE environment variable is set to
a value such as "1" or "true", Git LFS client outputs some of the details
of each HTTP request and response it makes to its standard error stream.

We gather the HTTP request headers to be logged in the traceRequest()
method of the Client structure from our "lfshttp" package.  To gather
the headers, the traceRequest() method uses the DumpRequest() function
from the from the "net/http/httputil" package in the Go standard library.

Our use of the DumpRequest() function for this purpose dates back to
commit 6fafade of PR git-lfs#366 in 2015,
when it was first introduced into what was then the traceHttpRequest()
function in our "lfs" package.  This function gradually evolved into the
current traceRequest() method through a series of PRs.  First, portions
of the "lfs" package were migrated into a new "httputil" package in
PR git-lfs#1226.  That package was later removed in PR git-lfs#1846 after a new
"lfsapi" package was added in PR git-lfs#1794 with a version of the
traceRequest() method very similar to the current one.  Finally, in
PR git-lfs#3244 we moved the method into the current "lfshttp" package.

Throughout these changes, however, our use of the DumpRequest()
function has remained the same.  However, since the Git LFS client only
acts as an HTTP client and not an HTTP server, we should instead be
using the DumpRequestOut() function from the standard "httputil" package
to gather the headers of an HTTP request for logging purposes.

Replacing our use of the DumpRequest() function with the DumpRequestOut()
function would allow us to output a number of request headers that we
currently fail to log, as was noted with respect to the Accept-Encoding
header in issue git-lfs#6193.

Since Go version 1.6, the "httputil" package's documentation has stated
that the DumpRequest() function "should only be used by servers to debug
client requests", and that the DumpRequestOut() function "is like
DumpRequest but for outgoing client requests":

  https://pkg.go.dev/net/http/httputil@go1.6#DumpRequest
  https://pkg.go.dev/net/http/httputil@go1.6#DumpRequestOut

However, at the time we first adopted the use of the DumpRequest()
function, the Go documentation did not include these directives, and
merely noted that "DumpRequest returns the as-received wire representation
of req, optionally including the request body, for debugging" and that
"DumpRequestOut is like DumpRequest but includes headers that the
standard http.Transport adds, such as User-Agent":

  https://pkg.go.dev/net/http/httputil@go1.4#DumpRequest
  https://pkg.go.dev/net/http/httputil@go1.4#DumpRequestOut

It is reasonable to assume, therefore, that when PR git-lfs#366 was written
its authors simply assumed that the DumpRequest() function was sufficient,
given the state of the Go documentation at that time.

As we would prefer to log a more complete set of client HTTP request
headers, though, particular the Accept-Encoding header which the
Go "net/http" package automatically adds to outgoing requests, we
now update our traceRequest() method to use the DumpRequestOut()
function.

We then also update our Go language TestVerbose*() test functions
to confirm that the Accept-Encoding request header is now captured and
reported when verbose logging is enabled.  Note that without the change
in this PR to the use of the DumpRequestOut() function, these revised
tests would fail, since the Accept-Encoding header would not appear in
headers returned by the DumpRequest() function.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Jan 21, 2026
At present, when the GIT_CURL_VERBOSE environment variable is set to
a value such as "1" or "true", the Git LFS client outputs some of the
details of each HTTP request and response it makes to its standard
error stream.

We gather the HTTP request headers to be logged in the traceRequest()
method of the Client structure from our "lfshttp" package.  To gather
the headers, the traceRequest() method uses the DumpRequest() function
from the from the "net/http/httputil" package in the Go standard library.

Our use of the DumpRequest() function for this purpose dates back to
commit 6fafade of PR git-lfs#366 in 2015,
when it was first introduced into what was then the traceHttpRequest()
function in our "lfs" package.  This function gradually evolved into the
current traceRequest() method through a series of PRs.  First, portions
of the "lfs" package were migrated into a new "httputil" package in
PR git-lfs#1226.  That package was later removed in PR git-lfs#1846 after a new
"lfsapi" package was added in PR git-lfs#1794 with a version of the
traceRequest() method very similar to the current one.  Finally, in
PR git-lfs#3244 we moved the method into the current "lfshttp" package.

Throughout these changes, however, our use of the DumpRequest()
function has remained the same.  However, since the Git LFS client only
acts as an HTTP client and not an HTTP server, we should instead be
using the DumpRequestOut() function from the standard "httputil" package
to gather the headers of an HTTP request for logging purposes.

Replacing our use of the DumpRequest() function with the DumpRequestOut()
function would allow us to output a number of request headers that we
currently fail to log, as was noted with respect to the Accept-Encoding
header in issue git-lfs#6193.

Since Go version 1.6, the "httputil" package's documentation has stated
that the DumpRequest() function "should only be used by servers to debug
client requests", and that the DumpRequestOut() function "is like
DumpRequest but for outgoing client requests":

  https://pkg.go.dev/net/http/httputil@go1.6#DumpRequest
  https://pkg.go.dev/net/http/httputil@go1.6#DumpRequestOut

However, at the time we first adopted the use of the DumpRequest()
function, the Go documentation did not include these directives, and
merely noted that "DumpRequest returns the as-received wire representation
of req, optionally including the request body, for debugging" and that
"DumpRequestOut is like DumpRequest but includes headers that the
standard http.Transport adds, such as User-Agent":

  https://pkg.go.dev/net/http/httputil@go1.4#DumpRequest
  https://pkg.go.dev/net/http/httputil@go1.4#DumpRequestOut

It is reasonable to assume, therefore, that when PR git-lfs#366 was written
its authors simply assumed that the DumpRequest() function was sufficient,
given the state of the Go documentation at that time.

As we would prefer to log a more complete set of client HTTP request
headers, though, particularly the Accept-Encoding header which the
Go "net/http" package automatically adds to outgoing requests, we
now update our traceRequest() method to use the DumpRequestOut()
function.

We then also update some of our Go language TestVerbose*() test functions
to confirm that the Accept-Encoding request header is now captured and
reported when verbose logging is enabled.  Note that without the change
in this PR to the use of the DumpRequestOut() function, these revised
tests would fail, since the Accept-Encoding header would not appear in
headers returned by the DumpRequest() function.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants