This repository was archived by the owner on Sep 30, 2024. It is now read-only.
Cody web: add server-side fetching for URL mentions#64223
Merged
Conversation
Contributor
❌ Cloud Controller GraphQL Compatability Test Resultsourcegraph/controller uses the GraphQL API to perform automation. The compatibility test has failed and this pull request may have introduced breaking changes to the GraphQL schema. Next steps:
|
Comment on lines
+444
to
+474
| // 🚨 SECURITY: This endpoint allows API users to create GET requests against arbitrary URLs. | ||
| // To mitigate risk of SSRF, we use an the ExternalClient, which denies requests to internal targets. | ||
| resp, err := httpcli.UncachedExternalClient.Do(req) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
| defer resp.Body.Close() | ||
|
|
||
| if resp.StatusCode >= http.StatusBadRequest { | ||
| return nil, errors.Errorf("request failed with status %d", resp.StatusCode) | ||
| } | ||
|
|
||
| // 🚨 SECURITY: Limit the amount of data we will read into memory. | ||
| content, err := io.ReadAll(io.LimitReader(resp.Body, urlContextReadLimit)) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
|
|
||
| // Attempt to extract the title | ||
| var title *string | ||
| if match := titleRegexp.FindSubmatch(content); match != nil { | ||
| title = pointers.Ptr(string(match[1])) | ||
| } | ||
|
|
||
| // Trim to main if it exists since that's a decent signal pointing to the important part of the page. | ||
| if idx := bytes.Index(content, []byte("<main")); idx > 0 { | ||
| content = content[idx:] | ||
| } | ||
| if idx := bytes.Index(content, []byte("</main>")); idx > 0 { | ||
| content = content[:idx+len("</main>")] | ||
| } |
Check notice
Code scanning / Semgrep OSS
Semgrep Finding: security-semgrep-rules.semgrep-rules.generic.comment-tagging-rule
Comment on lines
+456
to
+485
| // 🚨 SECURITY: Limit the amount of data we will read into memory. | ||
| content, err := io.ReadAll(io.LimitReader(resp.Body, urlContextReadLimit)) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
|
|
||
| // Attempt to extract the title | ||
| var title *string | ||
| if match := titleRegexp.FindSubmatch(content); match != nil { | ||
| title = pointers.Ptr(string(match[1])) | ||
| } | ||
|
|
||
| // Trim to main if it exists since that's a decent signal pointing to the important part of the page. | ||
| if idx := bytes.Index(content, []byte("<main")); idx > 0 { | ||
| content = content[idx:] | ||
| } | ||
| if idx := bytes.Index(content, []byte("</main>")); idx > 0 { | ||
| content = content[:idx+len("</main>")] | ||
| } | ||
|
|
||
| // Convert the HTML to text to make the ouptut higher density. The output | ||
| // is still pretty crude, but it does enough to capture the description and | ||
| // most comments from a github PR. There is significant room to improve | ||
| // content extraction here. | ||
| textified := html2text.HTML2TextWithOptions(string(content), html2text.WithUnixLineBreaks()) | ||
| textified = textified[:min(len(textified), urlContextOutputLimit)] | ||
| return &graphqlbackend.UrlMentionContextResponse{ | ||
| Title: title, | ||
| Content: textified, | ||
| }, nil |
Check notice
Code scanning / Semgrep OSS
Semgrep Finding: security-semgrep-rules.semgrep-rules.generic.comment-tagging-rule
vovakulikov
approved these changes
Aug 1, 2024
vovakulikov
left a comment
Contributor
There was a problem hiding this comment.
Maybe I'm missing some security problem but it looks fine to me
vovakulikov
referenced
this pull request
in sourcegraph/cody-public-snapshot
Aug 2, 2024
This updates the web context provider to optionally use the new endpoint in the GraphQL API (introduced [here](https://github.com/sourcegraph/sourcegraph/pull/64223)) to fetch web context in cody web to get around CORS issues. ## Test plan Tested dev build against local Sourcegraph instance. Also tested that web mentions still work in VSCode. 
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Due to CORS rules, cody web cannot make requests to arbitrary URLs. Because of this, mentions for URLs do not currently work in Cody Web.
This adds a GraphQL endpoint to the cody context resolvers that resolves the contents of a mentioned URL.
Test plan
Tested end-to-end with dev build of cody web.
