.Net: Remove GPT tokenizer #2508

@dluc

Scope

  1. Remove GPT Tokenizer from SDK
  2. Remove tokenizers from notebook

The following code works in a typical C# project but fails in a notebook, because the GPT tokenizer resource files (such as vocab.bpe) are in a different path.

Console app:

using Microsoft.SemanticKernel.Connectors.AI.OpenAI.Tokenizers;

Console.WriteLine(GPT3Tokenizer.Encode("hello world").Count + " tokens");

Output:

2 tokens

C# notebook:

#r "nuget: Microsoft.SemanticKernel.Connectors.AI.OpenAI, 0.19.230804.2-preview"

using Microsoft.SemanticKernel.Connectors.AI.OpenAI.Tokenizers;

Console.WriteLine(GPT3Tokenizer.Encode("hello world").Count + " tokens");

Output:

Error: System.IO.FileNotFoundException: vocab.bpe not found, path: '~/.nuget/packages/microsoft.semantickernel.connectors.ai.openai/0.19.230804.2-preview/lib/netstandard2.0/Tokenizers/Settings/vocab.bpe'
at Microsoft.SemanticKernel.Connectors.AI.OpenAI.Tokenizers.Settings.EmbeddedResource.ReadFile(String fileName)
at Microsoft.SemanticKernel.Connectors.AI.OpenAI.Tokenizers.Settings.EmbeddedResource.ReadBytePairEncodingTable()
at Microsoft.SemanticKernel.Connectors.AI.OpenAI.Tokenizers.Settings.GPT3Settings.<>c.<.cctor>b__6_1()
at System.Lazy`1.ViaFactory(LazyThreadSafetyMode mode)
at System.Lazy`1.ExecutionAndPublication(LazyHelper executionAndPublication, Boolean useDefaultConstructor)
at System.Lazy`1.CreateValue()
at Microsoft.SemanticKernel.Connectors.AI.OpenAI.Tokenizers.Settings.GPT3Settings.get_BpeRanks()
at Microsoft.SemanticKernel.Connectors.AI.OpenAI.Tokenizers.GPT3Tokenizer.BytePairEncoding(String token)
at Microsoft.SemanticKernel.Connectors.AI.OpenAI.Tokenizers.GPT3Tokenizer.Encode(String text)
at Submission#2.<<Initialize>>d__0.MoveNext()
--- End of stack trace from previous location ---
at Microsoft.CodeAnalysis.Scripting.ScriptExecutionState.RunSubmissionsAsync[TResult](ImmutableArray`1 precedingExecutors, Func`2 currentExecutor, StrongBox`1 exceptionHolderOpt, Func`2 catchExceptionOpt, CancellationToken cancellationToken)

Note that the code looks for the file here:

~/.nuget/packages/microsoft.semantickernel.connectors.ai.openai/0.19.230804.2-preview/
   lib/netstandard2.0/Tokenizers/Settings/vocab.bpe

but the file is actually here:

~/.nuget/packages/microsoft.semantickernel.connectors.ai.openai/0.19.230804.2-preview/
   contentFiles/any/netstandard2.0/Tokenizers/Settings/vocab.bpe

or here:

~/.nuget/packages/microsoft.semantickernel.connectors.ai.openai/0.19.230804.2-preview/
   content/Tokenizers/Settings/vocab.bpe
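
To make the path mismatch concrete, here is a minimal diagnostic sketch that checks which of the three locations listed above actually contain the file. `VocabLocator` and `FindExisting` are hypothetical names, not part of the Semantic Kernel SDK, and the package root assumes the default global packages folder (`~/.nuget/packages`) shown in the paths above. It only reports where the file is; it does not change where `GPT3Tokenizer` looks.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Assumption: the package lives in the default global packages folder.
var packageRoot = Path.Combine(
    Environment.GetFolderPath(Environment.SpecialFolder.UserProfile),
    ".nuget", "packages",
    "microsoft.semantickernel.connectors.ai.openai",
    "0.19.230804.2-preview");

// Print every candidate location that really contains vocab.bpe.
foreach (var path in VocabLocator.FindExisting(packageRoot))
{
    Console.WriteLine(path);
}

// Hypothetical helper, for illustration only.
public static class VocabLocator
{
    // Sub-folders named in this issue: the one the tokenizer probes (lib/...)
    // and the two where the file was actually found.
    private static readonly string[] CandidateSubPaths =
    {
        Path.Combine("lib", "netstandard2.0", "Tokenizers", "Settings"),
        Path.Combine("contentFiles", "any", "netstandard2.0", "Tokenizers", "Settings"),
        Path.Combine("content", "Tokenizers", "Settings"),
    };

    // Returns the full paths, under packageRoot, where fileName exists.
    public static IEnumerable<string> FindExisting(string packageRoot, string fileName = "vocab.bpe")
    {
        return CandidateSubPaths
            .Select(sub => Path.Combine(packageRoot, sub, fileName))
            .Where(File.Exists);
    }
}
```

If the situation matches the one described in this issue, the `lib/...` path would be absent from the output while one or both of the content paths are printed.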

Labels

.NET: Issue or Pull requests regarding .NET code
bug: Something isn't working
sk team issue: A tag to denote issues that were created by the Semantic Kernel team (i.e., not the community)

Projects

Status

Sprint: Done
