search jobs: switch from CSV to JSON#59619
Conversation
| @@ -72,13 +72,13 @@ func (h *exhaustiveSearchRepoRevHandler) Handle(ctx context.Context, logger log. | |||
| return err | |||
| } | |||
|
|
|||
There was a problem hiding this comment.
This is the "write" part. We replace the CSV writer with the JSON writer.
| } | ||
| } | ||
|
|
||
| func writeSearchJobJSON(ctx context.Context, iter *iterator.Iterator[string], uploadStore uploadstore.Store, w io.Writer) (int64, error) { |
There was a problem hiding this comment.
This is the "read" part. We simply concatenate the result blobs from the various repo-revisions we searched.
|
|
||
| type MatchJSONWriter struct { | ||
| w *http.JSONArrayBuf | ||
| w *bufferedWriter |
There was a problem hiding this comment.
switching to line-separted JSON for the internal format. See discussion
| m.Path("/insights/export/{id}").Methods("GET").Handler(trace.Route(handlers.CodeInsightsDataExportHandler)) | ||
| m.Path("/search/stream").Methods("GET").Handler(trace.Route(frontendsearch.StreamHandler(db))) | ||
| m.Path("/search/export/{id}.csv").Methods("GET").Handler(trace.Route(handlers.SearchJobsDataExportHandler)) | ||
| m.Path("/search/export/{id}.json").Methods("GET").Handler(trace.Route(handlers.SearchJobsDataExportHandler)) |
There was a problem hiding this comment.
I guess technically we could support something which converts the json into csv in the future? I quite liked the csv support, but we should actually get feedback it is useful.
There was a problem hiding this comment.
Agreed. I also like the CSV. However, I think it might get a unwieldy quickly once we want to support multiple result types or return individual result chunks with their position.
| func (j *bufferedWriter) Append(v any) error { | ||
| oldLen := j.buf.Len() | ||
|
|
||
| enc := json.NewEncoder(&j.buf) |
There was a problem hiding this comment.
was wondering if it was worth saving the returned encoder for use between calls. But after reading the implementation, it seems all the real state that can be inferred between calls is stored in a sync.pool. So the only downside of this is a tiny allocation which won't outlive this function call. LGTM.
Relates to https://github.com/sourcegraph/sourcegraph/pull/59619 We have moved from CSV to JSON as the new export format. Note: I also fixed some typos and tweaked the copy a bit. Several customers have asked about the ENVs for the blobstore so I made it more explicit that the blobstore doesn't require setting any ENVs.
Closes #59329
Relates to #59352
We switch the format of the results download from CSV to line-separated JSON.
Each line corresponds to a JSON object containing chunk matches.
The JSON object has the same format as the matches served by the Stream API.
example.json
This is a breaking change motivated by customer feedback.
Search Jobs is still released as EAP so a breaking change is acceptable.
Pros:
Cons:
The commits can be reviewed separatedly.
The first 2 commits contain the core of this change.
In the third commit I delete code and update tests
Next: update documentation
Test plan