Python: adding external tokenizer to python text_chunker #1388

Merged
shawncal merged 4 commits into microsoft:main from gramhagen:gramhagen/add_text_tokenizer
Jul 8, 2023

@gramhagen
Contributor

Motivation and Context

Addresses issue #1387: chunking text should allow use of an external tokenizer.

Description

Added pass-through of a token-counting function, defaulting to the existing _token_counter() method.
While fixing a type hint bug I got sucked into making a few changes to clean up the code.

As future work, it would be nice to add chunk overlap functionality similar to langchain's TextSplitter.

Contribution Checklist

@github-actions github-actions bot added the python label (Pull requests for the Python Semantic Kernel) Jun 8, 2023
@shawncal shawncal changed the title from "adding external tokenizer to python text_chunker" to "Python: adding external tokenizer to python text_chunker" Jun 29, 2023
@shawncal shawncal requested a review from a team as a code owner July 8, 2023 04:42
@shawncal
Contributor

shawncal commented Jul 8, 2023

@gramhagen Cool change! Thanks for the contribution.

Welcome to Semantic Kernel!

@shawncal shawncal added this pull request to the merge queue Jul 8, 2023
Merged via the queue into microsoft:main with commit 8527c58 Jul 8, 2023

Labels

python Pull requests for the Python Semantic Kernel

Projects

Archived in project
Status: Sprint: Done

4 participants