Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

Log external requests#44286

Merged
vdavid merged 74 commits into
mainfrom
dv/log-outgoing-requests
Nov 29, 2022
Merged

Log external requests#44286
vdavid merged 74 commits into
mainfrom
dv/log-outgoing-requests

Conversation

@vdavid

@vdavid vdavid commented Nov 11, 2022

Copy link
Copy Markdown
Contributor

This PR starts logging all outgoing requests from Sourcegraph, and makes them discoverable on the web UI.
The feature will be off by default, except for development. Can be enabled in site config.

Notes

Back end

  • Storage is in Redis.
  • I store everything that made sense around the requests, that is:
    • startedAt
    • method
    • url
    • requestHeaders
    • requestBody
    • statusCode
    • responseHeaders
    • duration
    • errorMessage
    • creationStackFrame
    • callStackFrame
  • By default, the feature is turned off, and there is no logging happening. It can be enabled by setting the outboundRequestLogLimit setting to a non-zero value in the site config. The suggested default is 50.
  • I'm enabling this by default for development in a separate PR to dev-private
  • There is a single GraphQL API. Calls to it look like this (after is optional):
    query OutboundRequests($after: String) {
      outboundRequests(after: $after) {
        nodes {
          id
          method
          url
        }
      }
    }
  • I've added some very light integration tests where it made sense. I added no E2E tests.
  • Security: I made the new endpoint only accessible to site admins.
  • Performance: Redis access is fast. The amount of data per item is very low, our responses are gzipped, so even if the admin sets the limit to 500 (the maximum value that's allowed), it'll be snappy.
  • Scalability: I've only checked the code and didn't do tests, but I see no reason why this wouldn't scale with a high amount of requests, even if a high limit is set.

Front end

  • I've added a new item under "Maintenance" (admin-only section) called "Outbound requests".
  • I've used "Migrations" and "Repositories" as templates and used FilteredComponent for the list.
  • The front end polls the back end every 5 seconds.
  • Only new items are returned, so in most cases, the responses will be empty.
  • I tried several methods for the polling, and I ended up going with useQuery() with pollInterval and a small hack to go around Variable recalculation for each polling request apollographql/apollo-client#3053.
  • As the log items are not that many and we probably want to pull all of them, I went with front-end filtering.
    • Note: This would be quite fast, but it was actually too fast for FilteredComponent, and a weird race condition started occurring inside FilterComponent, so after an hour of debugging, I found the problem and added an ugly delay(200) to the query function. Of course, it still feels quite fast, but... 🤦‍♂️
  • UI screenshot:

CleanShot 2022-11-28 at 16 09 41

Test plan

Tested these aspects:

  • Status filter: went to the new page on my localhost, waited for items to arrive, and toggled the status filter to "Successful" / "Failed" / "All", and it worked as I expected.
  • Query: tested, it nicely filters by URL, body text, status code, and the headers.
  • Network traffic: Made sure that only new items are arriving: responses contain no items, except when there is a real new request, in which case
  • Copy as curl: Clicked the button and tried it in a console. It works fine when there are no REDACTED fields. Handling redacted fields is outside the scope of this PR.
  • Accessibility: Keyboard navigation works fine, and in the "Accessibility" section of Chrome DevTools, everything looks as semantic as for example "migration log".
  • Stress test: Tested with a 5,000 limit. Pretty much what I had expected: The UI was relatively laggy, but it didn't freeze. The back end handled it very easily. The actual maximum for the limit is 500, so we seem to be quite safe.

Also created this 5-min Loom where I went through the above point. It's not great, not terrible, and probably isn't worth a watch 😄

@vdavid vdavid requested a review from mrnugget November 11, 2022 14:15
@cla-bot cla-bot Bot added the cla-signed label Nov 11, 2022
@vdavid vdavid force-pushed the dv/log-outgoing-requests branch from feab632 to 75a16a0 Compare November 15, 2022 10:01
@sg-e2e-regression-test-bob

sg-e2e-regression-test-bob commented Nov 21, 2022

Copy link
Copy Markdown

Bundle size report 📦

Initial size Total size Async size Modules
0.01% (+0.31 kb) 0.11% (+16.33 kb) 🔺 0.13% (+16.02 kb) 🔺 0.14% (+1) 🔺

Look at the Statoscope report for a full comparison between the commits b232e8a and 3010217 or learn more.

Open explanation
  • Initial size is the size of the initial bundle (the one that is loaded when you open the page)
  • Total size is the size of the initial bundle + all the async loaded chunks
  • Async size is the size of all the async loaded chunks
  • Modules is the number of modules in the initial bundle

@vdavid vdavid force-pushed the dv/log-outgoing-requests branch 2 times, most recently from 34ac077 to 36d8aea Compare November 23, 2022 13:07
@vdavid vdavid marked this pull request as ready for review November 24, 2022 00:26
@sourcegraph-bot

sourcegraph-bot commented Nov 24, 2022

Copy link
Copy Markdown
Contributor

Codenotify: Notifying subscribers in CODENOTIFY files for diff 3010217...b232e8a.

Notify File(s)
@keegancsmith internal/httpcli/client.go
internal/httpcli/client_test.go
internal/httpcli/external.go
internal/httpcli/redis_logger_middleware.go
internal/httpcli/redis_logger_middleware_test.go

@mrnugget mrnugget left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a first pass and left a bunch of comments with suggestions/questions. I didn't look at the frontend code yet, because I assume that my comments on GraphQL schema etc. might change frontend.

One comment on the screenshot though: can we make this really dense? 😄 We send a lot of requests and I think we should maybe have a single line per request and then uncollapse on click or something.

I'm very excited for this! :)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just like the comment on lastKey: I think OutBoundRequestConnection would be more idiomatic to return. Especially since you can then return the pageInfo with the next cursor to use.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, at first I thought this was just a rename, but if you mean that I should change the structure to WebhookLogConnection's then my decision not to do that was intentional.

We always want to return all items (optionally filtered by "after"), we have no pagination and the total count is pretty redundant as well because we're always getting the items. So while I realized that was the most common solution, in this case I saw no value added by matching the format of WebhookLogConnection.
On the other hand, I found examples like pendingInvitations and outOfBandMigrations which followed the pattern I used, so this also seemed idiomatic.
WDYT?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough! I assumed lastKey is cursor-based pagination, since it effectively works like endCursor.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We always want to return all items (optionally filtered by "after")

Is this because the backend caps the responses automatically? What if we ever change the backend and decide to keep a longer history of these requests?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't have to be in this PR, but just FYI: we also have this type https://github.com/sourcegraph/sourcegraph/blob/7d234ffe2f73177f6bcc8418499397fc4c71f858/cmd/frontend/graphqlbackend/schema.graphql#L8112-L8125

We could probably replace WebhookLogHeader with HTTPHeader in a follow-up PR.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should use pointers for a context.Context here. It's an identity-less object that can be wrapped/unwrapped and no caller/callee should care about a specific instance.

If you want to make it optional I suggest defining a different method.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's exactly the conclusion I came to last night after going to bed 😄
Yes, the original intention was to make it optional, but the solution didn't feel right.

Also, I'm not big on deciding pointers vs no pointers in Go yet. This is probably the area where I feel a lack of knowledge of best practices the most often in Go.

Comment thread internal/httpcli/client.go Outdated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Short docstring please :)

Also: would IsRiskyHeader be a better name? Key is awfully generic

Comment thread internal/httpcli/redis_logger_middleware.go Outdated
Comment thread internal/httpcli/redis_logger_middleware.go Outdated
Comment thread internal/httpcli/redis_logger_middleware.go Outdated
Comment thread internal/rcache/rcache.go Outdated
Comment on lines 214 to 219

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above, I think you should just pass in context.Background() instead of making ctx optional

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would assume that if logging-to-redis was turned off that this middleware would be a noop. What it does instead is to save the request and then delete it immediately, right?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, true, thank you! I added an if limit==0 check to the getter, but not the setter.
How much fun this function could've had saving and deleting all those items. 😿
I've added the check now.

Comment thread cmd/frontend/graphqlbackend/schema.graphql Outdated
@vdavid

vdavid commented Nov 24, 2022

Copy link
Copy Markdown
Contributor Author

@mrnugget I’ve responded to all your comments so far, and pushed all my commits. I’ll now work on the UI, remembering your ask “can we make this really dense? 😄” which also makes a lot of sense to me.

@vdavid

vdavid commented Nov 24, 2022

Copy link
Copy Markdown
Contributor Author

UI evolution:

Yesterday:
CleanShot 2022-11-24 at 01 25 39@2x

Today:
CleanShot 2022-11-24 at 17 09 28@2x

Tomorrow:
✨ ✨ ✨

@bobheadxi bobheadxi left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the bits pertaining to the redis cache, as per my mention in the PR :)

On 2022-11-16, @bobheadxi added a new Redis function called ListKeys() here. This causes some duplication in functionality with my former solution GetAll(). I've decided to de-duplicate it, trying to keep the good parts of both. I'll reach out to Robert to make sure that my changes don't alter his solution too much. If he disagrees with my decision, it'll be easy to revert.

Comment thread internal/rcache/rcache.go Outdated
Comment on lines 216 to 220

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// ListKeysWithPrefix lists all keys associated with this cache, filtered by a prefix.
// Use with care if you have long TTLs or no TTL configured.
func (r *Cache) ListKeysWithPrefix(ctx context.Context, prefix string) (results []string, err error) {
return r.listKeys(ctx, prefix)
}
// ListKeysWithPrefix lists all keys associated with this cache, filtered by a prefix in addition to the prefix already associated with this Cache instance.
// Use with care if you have long TTLs or no TTL configured.
func (r *Cache) ListKeysWithPrefix(ctx context.Context, prefix string) (results []string, err error) {
return r.listKeys(ctx, prefix)
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also see https://github.com/sourcegraph/sourcegraph/pull/44286#discussion_r1031756025 , I don't think we need these prefix variants at all - a redis cache instance in this case is already designed to represent a prefixed namespace within the cache. The cache you re-use is supposed to be for conditional requests, I would just create a new one for this new type of cache data

Comment thread internal/rcache/rcache.go Outdated
Comment thread internal/httpcli/redis_logger_middleware.go Outdated
Comment thread internal/rcache/rcache.go Outdated

@bobheadxi bobheadxi Nov 24, 2022

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The redisCache instance in package httpcli is used for caching results of APIs that support conditional requests: https://sourcegraph.sourcegraph.com/github.com/sourcegraph/sourcegraph@a49c8c8b7a374deafb6841723f0afa535561c0b6/-/blob/internal/httpcli/client.go?L86-92

If you want to store a different type of data, in this case outbound requests, you should create a new outboundRequestsRedisCache and set a site-configurable TTL for it. Then you don't need to implement the various prefix variants of functions on the redis cache - each cache instance is already designed to represent a prefixed namespace within the cache, and creating new ones is not expensive or anything

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I absolutely agree, did these changes, and the code is of course a lot cleaner this way.

The only thing is that I don't see much value in making the TTL site-configurable. TTL for redisCache is also hard-coded. If customers need to configure it later, it's very easy to add a setting for this, so I'll just hard code this for now.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference, here's how I made a live-configurable TTL: https://sourcegraph.com/github.com/sourcegraph/sourcegraph@81082be686593f286efe4febe54d3ec06df807cb/-/blob/enterprise/internal/authz/syncjobs/records_store.go?L44:1-63:2

We could potentially bundle this type of behaviour into redisCache itself as well. Configuration is useful in case e.g. the load of this is higher than expected, or you want a larger window for debugging data - up to you :)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default, only the last fifty requests are stored, and the intention is that they won't set this to a number higher than 1,000. Considering this, and that we are logging a request for each ping, the 7-day limit seems to be impossible to reach. So I think it's fine to leave it hard-coded.

@vdavid

vdavid commented Nov 24, 2022

Copy link
Copy Markdown
Contributor Author

Wow, awesome tips, @bobheadxi, thank you! I'll fix them tomorrow!

@bobheadxi bobheadxi Nov 24, 2022

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be gql.DateTime? In practice we want users to provide a timestamp since the redis keys are all timestamps, that they are compared as strings and/or that the timestamp is the key is an implementation detail (so we should remove mentions of the key from the API documentation too)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's more an opaque after cursor, as I argued above. I think exposing it as DateTime might be a bit leaky, since it's irrelevant for the client. The client wants to say "give me the requests after this one if there are new ones" and shouldn't have to worry about timestamps etc., IMHO

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong stance on this, but I never intended to provide a "give me the items after date X" feature.
I even considered just an incremental int64 key; I went with dates because I found them better suited for the purpose, but the intention is still only to have "some incremental value", not a date.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, in that case can we change the docstring here to match other instances in the GraphQL API where we have this type of after cursor? e.g.

        """
        An opaque cursor that is used for pagination.
        """
        after: String

@vdavid vdavid force-pushed the dv/log-outgoing-requests branch from 1f7e330 to 84d49af Compare November 25, 2022 10:54
Comment thread client/web/src/site-admin/SiteAdminOutboundRequestsPage.tsx Outdated
Comment thread client/web/src/site-admin/SiteAdminOutboundRequestsPage.tsx Outdated
Comment thread client/web/src/site-admin/SiteAdminOutboundRequestsPage.tsx Outdated
Comment thread client/web/src/site-admin/SiteAdminOutboundRequestsPage.tsx Outdated
Comment thread client/web/src/site-admin/SiteAdminOutboundRequestsPage.tsx Outdated
Comment thread cmd/frontend/graphqlbackend/schema.graphql Outdated
Comment thread cmd/frontend/graphqlbackend/outbound_requests.go Outdated
Comment thread cmd/frontend/graphqlbackend/outbound_requests.go Outdated
Comment thread cmd/frontend/graphqlbackend/outbound_requests.go Outdated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I definitely can, I just wasn't sure which one is more helpful info to show:
a.) the caller of NewExternalClientFactory (which, in some cases, is internal/httpcli/client.go:96)
b.) the caller of the internal/httpcli package.

I went with a.), but I can easily go with b.) – I guess your stance is that b.) is more useful?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmm. I think I want to know where the request comes from, not where the client was instantiated, you know? But now that you mention it, both are useful.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that knowing where the call comes from is more useful. I just thought it's also worth logging which client it uses.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just in case this might be useful, but you could take a look at this code https://sourcegraph.sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/internal/database/dbconn/dynamic_metadata.go?L59 which shows how to exclude certain package from the stacktrace.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @jhchabran! Thorsten pointed me to Eric's PR that added this, so I'm familiar with what this code does.

I think what we have now is already useful. I prefer to release the feature and collect feedback on it before further improving it. Then we'll see what improvements we see as useful.

@philipp-spiess philipp-spiess left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I saw you mentioned the connection/pagination stuff so I got curious 😅 I’m mostly asking out of curiosity because I don't fully understand what's going on yet.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part needs to be inside a useEffect otherwise you risk refetching often when something outside of this component re-renders it for unknown reasons.

Also I’m not quite sure what's going on here:

  • We make a network request and get back some data and some items.
    • In case there are no items, we continue to poll the same request
    • However if there are items, we update the network request with a new after cursor so that we only get newer items.

Is this correct?

What I miss is where do we append items to the existing array? Do they just disappear? 🤔

@vdavid vdavid Nov 25, 2022

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@philipp-spiess Thanks, I've wrapped it into a useEffect!

You are absolutely right that this is weird! @mrnugget also didn't get this at first, so I added this comment to avoid further confusion. Basically, the refetch is only there to apply the change to the variables, that's all. We're throwing away the result for the refetch (which is, by the way, most likely empty because we've just loaded the data milliseconds before it—but it doesn't actually matter whether it's empty or not).
We're fetching the data with useQuery here.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Polling in Apollo is such a mess 🤕

Comment thread client/web/src/site-admin/SiteAdminOutboundRequestsPage.tsx Outdated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The number of items is somehow capped, right? Otherwise this might get really expensive.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m not quite sure why we use a FilteredConnection here if we don't have a pagination of some form (maybe I misread the comments)?

Just to provide the dropdown filter for success/failure, and the free-text search pretty much for free, without doing a custom implementation of a data display component with filters etc.

The number of items is somehow capped, right? Otherwise this might get really expensive.

Yes, it's capped at a limit set by the admin, 50 being the suggested default.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We always want to return all items (optionally filtered by "after")

Is this because the backend caps the responses automatically? What if we ever change the backend and decide to keep a longer history of these requests?

Yes, the backend only stores a number of items (N=50 by default), and returns all of them for the first call (after being null), then only the ones with IDs larger than after. Even if the site admin sets it to something like 500, the data is transmitted only once, and each piece is fairly small.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to provide the dropdown filter for success/failure, and the free-text search pretty much for free, without doing a custom implementation of a data display component with filters etc.

Makes sense! Yeah we really need the new filter components to be usable in a standalone way 😅

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha, I just proved that FilteredConnection can be abused to do client-side filtering 😆
The only annoying part is that ugly delay(200) here.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We always want to return all items (optionally filtered by "after")

Is this because the backend caps the responses automatically? What if we ever change the backend and decide to keep a longer history of these requests?

Comment on lines 1582 to 1585

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""
Filters the requests by key, only returning keys greater than the given one.
"""
after: String
"""
An opaque cursor that is used for pagination.
"""
after: String

Comment thread internal/rcache/rcache.go Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: GetMulti uses varargs, so we might want to keep consistency here with varargs as well:

Suggested change
func (r *Cache) DeleteMulti(keys []string) {
func (r *Cache) DeleteMulti(keys ...string) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log.Error does not create a log entry, it creates a log field, so you need a logger here to actually log this log field - see https://docs.sourcegraph.com/dev/how-to/add_logging#writing-log-messages

@bobheadxi bobheadxi Nov 25, 2022

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logging in httpcli is complicated because clients are often instantiated as a package-level variable, before loggers are available

@bobheadxi bobheadxi Nov 25, 2022

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit complicated, but you could set the error in request context and have it be picked up by the logging middleware:

https://github.com/sourcegraph/sourcegraph/blob/250437c8851968143d071f700ee26f70e8869a48/internal/httpcli/client.go#L324-L332

@bobheadxi bobheadxi Nov 25, 2022

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, log.Error does not create a log entry, only a log field

@vdavid vdavid requested a review from a team November 28, 2022 17:26
vdavid and others added 2 commits November 28, 2022 18:37
Co-authored-by: Robert Lin <robert@bobheadxi.dev>
@vdavid vdavid requested a review from bobheadxi November 29, 2022 10:03

@philipp-spiess philipp-spiess left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Frontend LGTM 🚀

styles[node.method.toLowerCase() as keyof typeof styles]
)}
>

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is only used for visual guidance, it should probably by wrapped with <VisuallyHidden/>

@vdavid vdavid dismissed bobheadxi’s stale review November 29, 2022 11:01

I've incorporated Robert's changes (https://github.com/sourcegraph/sourcegraph/pull/44861) in my final version of this PR.

@vdavid vdavid merged commit cd28a4c into main Nov 29, 2022
@vdavid vdavid deleted the dv/log-outgoing-requests branch November 29, 2022 11:08
burmudar added a commit that referenced this pull request Nov 29, 2022
burmudar added a commit that referenced this pull request Nov 29, 2022

@eseliger eseliger left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for being late to the party! I had a few comments pending that I didn't get around finishing yet - nothing critical at all, just some comments/advice around GraphQL

pageInfo: PageInfo!
}
"""
A single logged webhook delivery.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment is wrong

"""
type OutboundRequestConnection {
"""
A list of webhook logs.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment is wrong

nodes: [OutboundRequest!]!

"""
The total number of webhook logs in the connection.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment is wrong

"""
type OutboundRequest {
"""
The request log item ID. Looks like "2021-01-01T00_00_00.000000" – Same as the Redis keys.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to explain the format here, ids are meant to be opaque. Also, should this type implement Node? Otherwise it can't be queried by ID.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, true. It actually already implements Node, just I missed the implements Node here.

Comment on lines +8126 to +8129
"""
The total time the request took to complete.
"""
duration: Float!

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the unit of this float? ms?
More so, does it have to be float or is ms resolution good enough for us?

@vdavid vdavid Nov 29, 2022

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eseliger Thanks for pointing it out! I also hate when an interface doesn't specify the unit of measurement; it's embarrassing that I feel into this trap now. It's seconds. Current precision is nanoseconds, but a millisecond resolution would probably be totally enough.

I'm seeing these solutions:
a.) Just clarify this in the comment. That is, add "The total time the request took to complete, in seconds."
b.) Change the name in the GraphQL to durationInSeconds
c.) Like b.), but also on the back end (it starts getting problematic here because to do this properly, I'd need to migrate the old format in Redis
d.) Change the graphQL and front-end to milliseconds and int, and change the name to durationMs.
e.) like d.), but back end also (also needs migrating old data)

I'd go with a.). b.) and d.) are also fine with me if you believe it helps much. I probably wouldn't put more effort into it than that.

I'll go with a.) until you respond!

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah d feels good to me as it would also get around potential floating point precision issues, but if it's a ton of work don't bother with it.

Comment on lines +1582 to +1585
"""
An opaque cursor that is used for pagination.
"""
after: ID

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after should be of type String usually as per relay, also would be good to stay consistent and also allow the first: Int property here - that'll help with the pagination effort

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after

Hmm, it was a string at first, but when I implemented the Node interface I had to change it to ID because I needed to unmarshal the base64 encoding. I can look into how to make this a string if you believe this is a mistake.

first

Context: In earlier stages, this endpoint only returned an array, I intentionally didn't want to match the GraphQL Cursor Connection Spec because it seemed like introducing too much dead code we won't use. But reviewers started requesting it to be more like our other endpoints, so by now it is almost like a normal endpoint. At this point, I agree that it'd make sense to add first, too. I'll add it!

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, it was a string at first, but when I implemented the Node interface I had to change it to ID because I needed to unmarshal the base64 encoding. I can look into how to make this a string if you believe this is a mistake.

I think it should be as simple as doing graphql.ID(arg.ID) when passing to the unmarshaller, the reason is simply that it should not be a client concern nor should the client be encouraged to pass an ID to the after field, so we can simply change what the cursor contains later, should we have a need for that. Doing that case is pretty standard in our code base fwiw, so I don't feel bad about it :)

Context: In earlier stages, this endpoint only returned an array, I intentionally didn't want to match the GraphQL Cursor Connection Spec because it seemed like introducing too much dead code we won't use. But reviewers started requesting it to be more like our other endpoints, so by now it is almost like a normal endpoint. At this point, I agree that it'd make sense to add first, too. I'll add it!

Perfect, thanks!

"WebhookLog": func(ctx context.Context, id graphql.ID) (Node, error) {
return webhookLogByID(ctx, db, id)
},
"OutboundRequest": func(ctx context.Context, id graphql.ID) (Node, error) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only works once the type also implements the Node interface in the graphql schema

@vdavid

vdavid commented Nov 29, 2022

Copy link
Copy Markdown
Contributor Author

@eseliger Thanks for the review! I'll fix these in a follow-up PR!

burmudar added a commit that referenced this pull request Nov 29, 2022
burmudar added a commit that referenced this pull request Nov 29, 2022
… Log External Requests (#44893)

Revert "Revert "Log external requests (#44286)" (#44890)"

This reverts commit 8dc8fb0.

Co-authored-by: David Veszelovszki <veszelovszki@gmail.com>
@vdavid

vdavid commented Nov 30, 2022

Copy link
Copy Markdown
Contributor Author

@eseliger addressed everything in https://github.com/sourcegraph/sourcegraph/pull/44937
Thanks for taking the time for doing the review. The copy&paste mistakes were embarrassing, the rest was instructional!

@keegancsmith

Copy link
Copy Markdown
Member

@vdavid looking at the code here, is there a reason you don't use redis list operations? It looks like you are basically simulating that, but relying on ListKeys which has poor performance (we should document this better). ListKeys is essentially O(N), where N is the number of keys in all of redis. The surprising complexity here is you call Scan on every single outgoing request. Redis list operations are much more appropriate for this data structure, and I see we have a FIFOList structure which looks like it will handle this use case? cc @burmudar

@burmudar

Copy link
Copy Markdown
Contributor

@jhchabran ☝🏼

@jhchabran

jhchabran commented Jan 17, 2023

Copy link
Copy Markdown
Contributor

@keegancsmith @vdavid yes, I wrote the FIFOList with that exact case in mind, as I had a pretty similar case with the slow graphql requests. A big convenience is that because the FIFOList is capped in capacity, you don't have to deal with expiring old keys.

@keegancsmith

Copy link
Copy Markdown
Member

@vdavid I might just port this code to FIFOList since the only users of rcache's ListKeys, DeleteMulti, SetMulti is this code. Likely faster to just port it and remove those functions instead of adding another API surface to support in "no redis" world.

@vdavid

vdavid commented Jan 17, 2023

Copy link
Copy Markdown
Contributor Author

@keegancsmith I agree. It was not the optimal choice and I considered switching to a LIST a few times during the feature development, but as the list is always short (usually 50 items, but there's a hard cap at 500) and it's an in-memory store, the gains didn't seem enough to justify the refactoring at the time. Thanks a lot if you're doing this, and let me know if I should jump in and help, happy to clean this up if it's a blocker.

@keegancsmith

Copy link
Copy Markdown
Member

Just to make it clearer, the gains are in fact worth it since ListKeys will look at every key in the entire redis store, not just the keys related to slow requests. These aren't marshalled over the wire luckily, since match runs server side on redis. But if I am not mistaken since redis is just a big hash table, it doesn't have an effecient range query / prefix matching on keys.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Tooling: Log outgoing requests

10 participants