Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

Cody web: add server-side fetching for URL mentions#64223

Merged
camdencheek merged 8 commits into
mainfrom
cc/url-mention-proxy
Aug 1, 2024
Merged

Cody web: add server-side fetching for URL mentions#64223
camdencheek merged 8 commits into
mainfrom
cc/url-mention-proxy

Conversation

@camdencheek

@camdencheek camdencheek commented Aug 1, 2024

Copy link
Copy Markdown
Member

Due to CORS rules, cody web cannot make requests to arbitrary URLs. Because of this, mentions for URLs do not currently work in Cody Web.

This adds a GraphQL endpoint to the cody context resolvers that resolves the contents of a mentioned URL.

Test plan

Tested end-to-end with dev build of cody web.
screenshot-2024-08-01_13-55-23@2x

@cla-bot cla-bot Bot added the cla-signed label Aug 1, 2024
@github-actions

github-actions Bot commented Aug 1, 2024

Copy link
Copy Markdown
Contributor

❌ Cloud Controller GraphQL Compatability Test Result

sourcegraph/controller uses the GraphQL API to perform automation. The compatibility test has failed and this pull request may have introduced breaking changes to the GraphQL schema.

Next steps:

  • Review the GitHub Actions workflow logs for more details.
  • Reach out to the Cloud Ops team to resolve the issue before merging this pull request.

Comment on lines +444 to +474
// 🚨 SECURITY: This endpoint allows API users to create GET requests against arbitrary URLs.
// To mitigate risk of SSRF, we use an the ExternalClient, which denies requests to internal targets.
resp, err := httpcli.UncachedExternalClient.Do(req)
if err != nil {
return nil, err
}
defer resp.Body.Close()

if resp.StatusCode >= http.StatusBadRequest {
return nil, errors.Errorf("request failed with status %d", resp.StatusCode)
}

// 🚨 SECURITY: Limit the amount of data we will read into memory.
content, err := io.ReadAll(io.LimitReader(resp.Body, urlContextReadLimit))
if err != nil {
return nil, err
}

// Attempt to extract the title
var title *string
if match := titleRegexp.FindSubmatch(content); match != nil {
title = pointers.Ptr(string(match[1]))
}

// Trim to main if it exists since that's a decent signal pointing to the important part of the page.
if idx := bytes.Index(content, []byte("<main")); idx > 0 {
content = content[idx:]
}
if idx := bytes.Index(content, []byte("</main>")); idx > 0 {
content = content[:idx+len("</main>")]
}

Check notice

Code scanning / Semgrep OSS

Semgrep Finding: security-semgrep-rules.semgrep-rules.generic.comment-tagging-rule

Code that highlight SECURITY in comment has changed. Please review the code for changes. The changes might be sensitive.
Comment on lines +456 to +485
// 🚨 SECURITY: Limit the amount of data we will read into memory.
content, err := io.ReadAll(io.LimitReader(resp.Body, urlContextReadLimit))
if err != nil {
return nil, err
}

// Attempt to extract the title
var title *string
if match := titleRegexp.FindSubmatch(content); match != nil {
title = pointers.Ptr(string(match[1]))
}

// Trim to main if it exists since that's a decent signal pointing to the important part of the page.
if idx := bytes.Index(content, []byte("<main")); idx > 0 {
content = content[idx:]
}
if idx := bytes.Index(content, []byte("</main>")); idx > 0 {
content = content[:idx+len("</main>")]
}

// Convert the HTML to text to make the ouptut higher density. The output
// is still pretty crude, but it does enough to capture the description and
// most comments from a github PR. There is significant room to improve
// content extraction here.
textified := html2text.HTML2TextWithOptions(string(content), html2text.WithUnixLineBreaks())
textified = textified[:min(len(textified), urlContextOutputLimit)]
return &graphqlbackend.UrlMentionContextResponse{
Title: title,
Content: textified,
}, nil

Check notice

Code scanning / Semgrep OSS

Semgrep Finding: security-semgrep-rules.semgrep-rules.generic.comment-tagging-rule

Code that highlight SECURITY in comment has changed. Please review the code for changes. The changes might be sensitive.
@camdencheek camdencheek marked this pull request as ready for review August 1, 2024 19:48
@camdencheek camdencheek requested a review from a team August 1, 2024 19:48
@camdencheek camdencheek requested a review from a team August 1, 2024 20:17

@vovakulikov vovakulikov left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm missing some security problem but it looks fine to me

@camdencheek camdencheek merged commit aaa464e into main Aug 1, 2024
@camdencheek camdencheek deleted the cc/url-mention-proxy branch August 1, 2024 22:01
vovakulikov referenced this pull request in sourcegraph/cody-public-snapshot Aug 2, 2024
This updates the web context provider to optionally use the new endpoint
in the GraphQL API (introduced
[here](https://github.com/sourcegraph/sourcegraph/pull/64223)) to fetch
web context in cody web to get around CORS issues.

## Test plan

Tested dev build against local Sourcegraph instance. Also tested that
web mentions still work in VSCode.

![screenshot-2024-08-01_14-40-53@2x](https://github.com/user-attachments/assets/d6ba00aa-459a-41c9-a6ce-eb53af55239d)
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants