I'd like to be able to use tiktoken or a HuggingFace tokenizer to count tokens when splitting text with the Python text chunker.
Example usage:
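import tiktoken
from semantic_kernel.text.text_chunker import split_plaintext_lines

# Count tokens with tiktoken's cl100k_base encoding.
encoding = tiktoken.get_encoding('cl100k_base')
token_counter = lambda x: len(encoding.encode(x))

# `text` is the input string to split; `token_counter` is the parameter this issue proposes.
lines = split_plaintext_lines(text=text, max_token_per_line=256, token_counter=token_counter)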
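To make the request concrete, here is a minimal sketch of how a pluggable token counter could drive line splitting. It is not semantic_kernel's implementation: `split_lines_by_tokens` and `whitespace_token_counter` are hypothetical names invented for illustration, and the greedy word-packing strategy is an assumption.

```python
from typing import Callable, List


def whitespace_token_counter(text: str) -> int:
    """Hypothetical stand-in counter: one token per whitespace-separated word."""
    return len(text.split())


def split_lines_by_tokens(
    text: str,
    max_tokens_per_line: int,
    token_counter: Callable[[str], int] = whitespace_token_counter,
) -> List[str]:
    """Greedily pack words into lines whose token count stays within the limit.

    A single word that exceeds the limit on its own is kept as its own line.
    """
    lines: List[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if current and token_counter(candidate) > max_tokens_per_line:
            # Adding this word would exceed the limit: close the current line.
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines
```

Any callable that maps a string to an integer should work as `token_counter`, so a tiktoken-based counter like the one in the example above, or one built on a HuggingFace tokenizer, could be passed in without changing the splitter.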
Related Issues:
#1240
#478