This repository was archived by the owner on Sep 30, 2024. It is now read-only.

Claude 3 support and /messages API for Enterprise #60953

Merged
philipp-spiess merged 24 commits into main from ps/anthropic-messages-for-enterprise
Mar 22, 2024

@philipp-spiess

@philipp-spiess philipp-spiess commented Mar 8, 2024

Contributor

Closes #61166

This PR adds support for Claude 3 and the /messages API to the existing anthropic provider in the Sourcegraph instance.

To ensure a smooth experience, there are a couple of edge cases that we need to handle.

  • Because the URL is configurable, a customer could hard-set it to /complete. We need to error properly in this case.
  • Clients might not know what is set, so they can send requests in the "old" or "new" format. We handle conversion as best as possible; however, for better instruction, the clients will eventually only send prompts in the /messages format. We introduce Cody API versioning for this case.
  • Support /complete style prompt (with trailing assistant, "holes" of no response, system prompt in messages) when a legacy client connects
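
As a rough illustration of the legacy-client conversion described above, here is a minimal Python sketch (the actual implementation lives in the Go provider and may differ). The function name `to_anthropic_messages` and the merging strategy for "holes" are assumptions made for this example: system messages are lifted out, empty assistant turns are dropped, and neighbouring turns by the same role are merged so that roles still alternate.

```python
from typing import Dict, List, Optional, Tuple

Message = Dict[str, str]


def to_anthropic_messages(messages: List[Message]) -> Tuple[Optional[str], List[Message]]:
    """Convert legacy speaker/text messages into a (system, messages) pair.

    Hypothetical helper for illustration; not the provider's real code.
    """
    system_parts: List[str] = []
    converted: List[Message] = []
    for msg in messages:
        speaker = msg.get("speaker", "")
        text = (msg.get("text") or "").strip()
        if speaker == "system":
            # System prompts move out of the message list.
            if text:
                system_parts.append(text)
            continue
        if speaker not in ("human", "assistant"):
            raise ValueError(f"unsupported speaker: {speaker!r}")
        if not text:
            # A trailing empty assistant turn or a "hole" carries no content.
            continue
        role = "user" if speaker == "human" else "assistant"
        if converted and converted[-1]["role"] == role:
            # Dropping a hole can leave two turns by the same role side by side.
            converted[-1]["content"] += "\n\n" + text
        else:
            converted.append({"role": role, "content": text})
    system = "\n\n".join(system_parts) or None
    return system, converted
```

With the legacy request from the test plan, the trailing empty assistant message disappears and only the human turn remains as a `user` message.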

Test plan

BOYLLM

With legacy request

streaming ✅
curl 'https://sourcegraph.test:3443/.api/completions/stream' \
-X POST \
-H 'authorization: token TOKEN' \
--data-raw '{"messages":[{"speaker":"human","text":"What is your name?"},{"speaker":"assistant"}],"maxTokensToSample":30,"temperature":0,"stopSequences":[],"timeoutMs":5000,"stream":true}'
non-streaming ✅
curl 'https://sourcegraph.test:3443/.api/completions/stream' \
-X POST \
-H 'authorization: token TOKEN' \
--data-raw '{"messages":[{"speaker":"human","text":"What is your name?"},{"speaker":"assistant"}],"maxTokensToSample":30,"temperature":0,"stopSequences":[],"timeoutMs":5000,"stream":false}'
code-completions ✅
curl 'https://sourcegraph.test:3443/.api/completions/code' \
-X POST \
-H 'authorization: token TOKEN' \
--data-raw '{"messages":[{"speaker":"human","text":"What is your name?"},{"speaker":"assistant"}],"maxTokensToSample":30,"temperature":0,"stopSequences":[],"timeoutMs":5000,"stream":true}'

With V1 request

streaming ✅
curl 'https://sourcegraph.test:3443/.api/completions/stream?api-version=1' \
-i \
-X POST \
-H 'authorization: token TOKEN' \
--data-raw '{"messages":[{"speaker":"system","text":"You only answer using emoji."},{"speaker":"human","text":"What is your name?"}],"maxTokensToSample":30,"temperature":0,"stopSequences":[],"timeoutMs":5000,"stream":true}'
non-streaming ✅
curl 'https://sourcegraph.test:3443/.api/completions/stream?api-version=1' \
-X POST \
-H 'authorization: token TOKEN' \
--data-raw '{"messages":[{"speaker":"system","text":"You only answer using emoji."},{"speaker":"human","text":"What is your name?"}],"maxTokensToSample":30,"temperature":0,"stopSequences":[],"timeoutMs":5000,"stream":false}'
{"completion":"🤖","stopReason":"end_turn"}
code-completions ✅
curl 'https://sourcegraph.test:3443/.api/completions/code?api-version=1' \
-X POST \
-H 'authorization: token TOKEN' \
--data-raw '{"messages":[{"speaker":"system","text":"You only answer using emoji."},{"speaker":"human","text":"What is your name?"}],"maxTokensToSample":30,"temperature":0,"stopSequences":[],"timeoutMs":5000,"stream":true}'

With hard-configured /complete API ✅

curl 'https://sourcegraph.test:3443/.api/completions/stream?api-version=1' \
-i \
-X POST \
-H 'authorization: token TOKEN' \
--data-raw '{"messages":[{"speaker":"system","text":"You only answer using emoji."},{"speaker":"human","text":"What is your name?"}],"maxTokensToSample":30,"temperature":0,"stopSequences":[],"timeoutMs":5000,"stream":true}'
HTTP/2 200

event: error
data: {"error":"Anthropic: unexpected status code 400: {\"type\":\"error\",\"error\":{\"type\":\"invalid_request_error\",\"message\":\"prompt: Field required\"}}"}

event: done
data: {}
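
Note that the upstream failure arrives inside a stream that itself returns HTTP 200, so a client has to inspect the event names rather than the status code. The following Python sketch shows one way to do that; `parse_sse` and `first_error` are hypothetical helper names for this example, and the parser only handles the `event:` and `data:` fields seen in the output above.

```python
import json
from typing import Iterator, Optional, Tuple


def parse_sse(body: str) -> Iterator[Tuple[str, dict]]:
    """Yield (event name, decoded JSON data) pairs from a server-sent events body."""
    event, data_lines = "message", []
    # The extra empty string flushes the last event if the body lacks a final blank line.
    for line in body.splitlines() + [""]:
        if line == "":
            if data_lines:
                yield event, json.loads("\n".join(data_lines))
            event, data_lines = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())


def first_error(body: str) -> Optional[str]:
    """Return the message of the first `error` event, or None if there is none."""
    for event, data in parse_sse(body):
        if event == "error":
            return data.get("error")
    return None
```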

Cody Gateway

With legacy request

streaming ✅
curl 'https://sourcegraph.test:3443/.api/completions/stream' \
-i \
-X POST \
-H 'authorization: token TOKEN' \
--data-raw '{"messages":[{"speaker":"human","text":"What is your name?"},{"speaker":"assistant"}],"maxTokensToSample":30,"temperature":0.2,"topP":0.95,"topK":0,"model":"anthropic/claude-3-haiku-20240307","stopSequences":[],"timeoutMs":5000,"stream":true}'
non-streaming ✅
curl 'https://sourcegraph.test:3443/.api/completions/stream' \
-i \
-X POST \
-H 'authorization: token TOKEN' \
--data-raw '{"messages":[{"speaker":"human","text":"What is your name?"},{"speaker":"assistant"}],"maxTokensToSample":30,"temperature":0.2,"topP":0.95,"topK":0,"model":"anthropic/claude-3-haiku-20240307","stopSequences":[],"timeoutMs":5000,"stream":false}'
{"completion":"My name is Claude.","stopReason":"end_turn"}
code-completions ✅
curl 'https://sourcegraph.test:3443/.api/completions/code' \
-i \
-X POST \
-H 'authorization: token TOKEN' \
--data-raw '{"messages":[{"speaker":"human","text":"What is your name?"},{"speaker":"assistant"}],"maxTokensToSample":30,"temperature":0.2,"topP":0.95,"topK":0,"model":"anthropic/claude-3-haiku-20240307","stopSequences":[],"timeoutMs":5000,"stream":true}'

With V1 request

streaming ✅
curl 'https://sourcegraph.test:3443/.api/completions/stream?api-version=1' \
-i \
-X POST \
-H 'authorization: token TOKEN' \
--data-raw '{"messages":[{"speaker":"system","text":"Your name is Arnold the Great. Do not answer by any other name."},{"speaker":"human","text":"What is your name?"}],"maxTokensToSample":30,"temperature":0.2,"topP":0.95,"topK":0,"model":"anthropic/claude-3-haiku-20240307","stopSequences":[],"timeoutMs":5000,"stream":true}'
non-streaming ✅
curl 'https://sourcegraph.test:3443/.api/completions/stream?api-version=1' \
-i \
-X POST \
-H 'authorization: token TOKEN' \
--data-raw '{"messages":[{"speaker":"system","text":"Your name is Arnold the Great. Do not answer by any other name."},{"speaker":"human","text":"What is your name?"}],"maxTokensToSample":30,"temperature":0.2,"topP":0.95,"topK":0,"model":"anthropic/claude-3-haiku-20240307","stopSequences":[],"timeoutMs":5000,"stream":false}'
code-completions ✅
curl 'https://sourcegraph.test:3443/.api/completions/code?api-version=1' \
-i \
-X POST \
-H 'authorization: token TOKEN' \
--data-raw '{"messages":[{"speaker":"system","text":"Your name is Arnold the Great. Do not answer by any other name."},{"speaker":"human","text":"What is your name?"}],"maxTokensToSample":30,"temperature":0.2,"topP":0.95,"topK":0,"model":"anthropic/claude-3-haiku-20240307","stopSequences":[],"timeoutMs":5000,"stream":true}'

I also did an E2E test with Cody VS Code. Chat works and so do code completions. The prompt could use some tweaking (which we will do on the client) but at least it doesn't seem like a regression so far 👍

Screenshot 2024-03-21 at 16 37 21

@cla-bot cla-bot Bot added the cla-signed label Mar 8, 2024
@philipp-spiess philipp-spiess changed the title WIP Claude 3 support and /messages API for Enterprise Mar 8, 2024
@philipp-spiess philipp-spiess requested a review from a team March 8, 2024 15:10
@philipp-spiess

Contributor Author

@sourcegraph/wg-cody-gateway-eng pinging you for some early feedback. There are quite a few branches, so one open question is whether we want to support customers that hard-set the upstream Anthropic API to the old /complete API, or whether we should instead rewrite this to /messages and send a warning (this would remove the need to be able to generate old-style payloads).

The rest of this PR is a version to map the existing completion APIs to the new /messages API and make sure that it works both for BOYLLM and for Cody Gateway.

@philipp-spiess philipp-spiess marked this pull request as ready for review March 11, 2024 13:21
Comment thread internal/completions/client/anthropic/prompt.go Outdated
Comment thread internal/completions/client/anthropic/prompt.go Outdated
Comment thread internal/completions/client/anthropic/prompt.go Outdated
Comment thread internal/completions/client/anthropic/anthropic.go Outdated
@philipp-spiess

Contributor Author

@chwarwick / @thenamankumar The only open question to conclude this is: how do we feature test on the client whether the new API style is supported on the backend? Should we expose this as a flag somewhere or rely on version numbers?

I do want to test this on PLG as well before we release it for enterprises. The idea is that chat messages on PLG all go through the dotcom instance again. This should give us some confidence that it works well. WDYT?

@chwarwick

Contributor

@chwarwick / @thenamankumar The only open question to conclude this is: how do we feature test on the client whether the new API style is supported on the backend? Should we expose this as a flag somewhere or rely on version numbers?

Making sure I'm remembering the issue correctly. The open question is that client side needs to prepare the prompt messages differently because:

  • All previous backend versions of the anthropic provider will error if sent a speaker that isn't Human or Assistant
  • When using the Claude-3 variants a system prompt is necessary
  • When using < Claude-3 with the messages API a system prompt is allowed but not necessary
  • When using the completions API a system prompt is not allowed.
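
To make the client-side consequence of these constraints concrete, here is a hedged Python sketch of how a client might shape the same prompt for either backend. The `api_version` switch mirrors the `api-version=1` query parameter used in the test plan; the function name `prepare_messages` and the acknowledgement text `ASSISTANT_ACK` are assumptions for illustration only.

```python
from typing import Dict, List

Message = Dict[str, str]

# Hypothetical acknowledgement text; real clients may word this differently.
ASSISTANT_ACK = "Ok."


def prepare_messages(preamble: str, question: str, api_version: int) -> List[Message]:
    """Build the message list for a backend with or without system-speaker support."""
    if api_version >= 1:
        # Versioned backends accept a dedicated system speaker.
        return [
            {"speaker": "system", "text": preamble},
            {"speaker": "human", "text": question},
        ]
    # Older backends reject unknown speakers, so the preamble travels in a
    # human turn, followed by an assistant acknowledgement and an empty
    # trailing assistant turn for the model to complete.
    return [
        {"speaker": "human", "text": preamble},
        {"speaker": "assistant", "text": ASSISTANT_ACK},
        {"speaker": "human", "text": question},
        {"speaker": "assistant"},
    ]
```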

@philipp-spiess

philipp-spiess commented Mar 14, 2024

Contributor Author

@chwarwick Exactly. The /complete API from Anthropic only allows \n\nHuman: and \n\nAssistant: prefixes. That's why the client must know whether a system message is allowed, or whether we do what we did before and prompt in a Human message.
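
For reference, the legacy prompt format mentioned here can be sketched as follows. This is an illustrative Python helper (`to_legacy_prompt` is a made-up name, not the provider's function) that joins turns with the two allowed prefixes and rejects any other speaker, which is why a system message cannot be expressed in that format.

```python
from typing import Dict, List

PREFIXES = {"human": "\n\nHuman:", "assistant": "\n\nAssistant:"}


def to_legacy_prompt(messages: List[Dict[str, str]]) -> str:
    """Render speaker/text messages as a single Human/Assistant prompt string."""
    parts = []
    for msg in messages:
        speaker = msg.get("speaker", "")
        if speaker not in PREFIXES:
            # Only the two prefixes exist, so e.g. "system" has no representation.
            raise ValueError(f"speaker {speaker!r} is not allowed in a legacy prompt")
        text = msg.get("text") or ""
        parts.append(PREFIXES[speaker] + (" " + text if text else ""))
    return "".join(parts)
```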

@chwarwick chwarwick linked an issue Mar 14, 2024 that may be closed by this pull request
@sourcegraph-bot

sourcegraph-bot commented Mar 19, 2024

Contributor

📖 Storybook live preview

@philipp-spiess

Contributor Author

@chwarwick

Quick summary of the next steps to land this PR:

  • Remove support for the old /complete anthropic API. I kept this in but it seems reasonable to remove.
  • We keep the ToAnthropicMessages conversion BUT
    • Only apply this if the client has no Cody API version specified
    • Add another case where if claude 3 is set, we additionally introspect the first message. If it starts with "You are Cody, an AI-powered coding assistant created by Sourcegraph." (from Cody Web), we remove the first two messages (user and assistant) and use that as a system message instead.
  • To be 100% on the safe side we could add code that rewrites claude-2 and claude-instant-1 to claude-2.0 and claude-instant-1.2 but I think there is another effort to change this right now
  • Obviously also address the other issues
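
A rough Python sketch of two of the steps listed above: lifting the Cody Web preamble into a system message, and pinning the unversioned model names. The helper names and the `"claude-3" in model` check are assumptions for illustration; the real logic is in the Go provider and may look different.

```python
from typing import Dict, List, Optional, Tuple

Message = Dict[str, str]

CODY_PREAMBLE_PREFIX = "You are Cody, an AI-powered coding assistant created by Sourcegraph."

# Rewrites taken from the list above.
MODEL_ALIASES = {"claude-2": "claude-2.0", "claude-instant-1": "claude-instant-1.2"}


def lift_preamble(model: str, messages: List[Message]) -> Tuple[Optional[str], List[Message]]:
    """Return (system, remaining messages), lifting the preamble for Claude 3 models."""
    if (
        "claude-3" in model  # assumed way of detecting a Claude 3 model
        and len(messages) >= 2
        and messages[0].get("speaker") == "human"
        and (messages[0].get("text") or "").startswith(CODY_PREAMBLE_PREFIX)
        and messages[1].get("speaker") == "assistant"
    ):
        # Drop the preamble turn and its assistant acknowledgement.
        return messages[0]["text"], messages[2:]
    return None, messages


def pin_model(model: str) -> str:
    """Map unversioned model names to pinned versions; leave other names alone."""
    return MODEL_ALIASES.get(model, model)
```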

Does that seem reasonable?

@chwarwick

Contributor

Does that seem reasonable?

I think it's ok to remove as long as we add a reasonable error that will end up displaying in the clients.
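
One possible shape for such an error, sketched in Python under the assumption that the check runs against the configured endpoint URL before any request is sent. The function name `check_endpoint` and the error wording are hypothetical.

```python
from urllib.parse import urlparse


def check_endpoint(url: str) -> None:
    """Raise a readable error if the configured endpoint is the old /complete API."""
    path = urlparse(url).path.rstrip("/")
    if path.endswith("/complete"):
        raise ValueError(
            "The Anthropic /complete API is no longer supported. "
            "Please point the completions endpoint at the /messages API instead."
        )
```

Raising early with a message like this lets the existing `event: error` path carry a readable explanation to the client instead of an upstream 400.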

@philipp-spiess philipp-spiess requested a review from 0xnmn March 21, 2024 15:38
@philipp-spiess philipp-spiess requested review from 0xnmn, arafatkatze and chwarwick and removed request for 0xnmn March 21, 2024 15:38
Comment thread internal/completions/client/anthropic/anthropic.go

@chwarwick chwarwick left a comment

Contributor

Tested the existing models claude-2 and instant-1.2 against Anthropic directly and via the gateway; all looked good. Also tried claude-3 direct to Anthropic, and it was successful for autocomplete, chat, and commands. I did leave 2 minor edits on https://github.com/sourcegraph/sourcegraph/pull/61336

Comment thread internal/completions/client/anthropic/decoder.go

Development

Successfully merging this pull request may close these issues.

Add Anthropic Message API Support to Anthropic provider

5 participants