Add FFI bindings for tiktoken-rs #27

Merged: yethee merged 3 commits into master from ffi on Mar 31, 2025

@yethee
Owner

yethee commented Mar 30, 2025

Added an alternative encoder implementation that uses the tiktoken-rs library through FFI to improve performance in some cases.

Fixes: #6, #25

Benchmark

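NOTE: The mem_peak values for LibEncoder are not meaningful, presumably because memory allocated inside the native library is not tracked by PHP's memory counters.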

> phpbench run -l dots --report=agg_by_subject --report=enc_chart --profile=jit
PHPBench (1.4.1) running benchmarks...
with configuration file: /workspace/phpbench.json
with PHP version 8.3.19, xdebug ❌, opcache ✔

................................................

Subjects: 4, Assertions: 0, Failures: 0, Errors: 0
encode
+--------------------+---------------------------------+------+-----+----------+-----------+---------+
| benchmark          | set                             | revs | its | mem_peak | mode      | rstdev  |
+--------------------+---------------------------------+------+-----+----------+-----------+---------+
| LibEncoderBench    | p50k_base,baconipsum            | 5    | 3   | 1.317mb  | 7.176ms   | ±9.33%  |
| LibEncoderBench    | cl100k_base,baconipsum          | 5    | 3   | 1.317mb  | 8.147ms   | ±1.83%  |
| LibEncoderBench    | o200k_base,baconipsum           | 5    | 3   | 1.317mb  | 8.218ms   | ±1.73%  |
| LibEncoderBench    | p50k_base,cyrillic              | 5    | 3   | 1.317mb  | 1.826ms   | ±1.51%  |
| LibEncoderBench    | cl100k_base,cyrillic            | 5    | 3   | 1.317mb  | 2.145ms   | ±0.58%  |
| LibEncoderBench    | o200k_base,cyrillic             | 5    | 3   | 1.317mb  | 1.942ms   | ±1.04%  |
| LibEncoderBench    | p50k_base,latin                 | 5    | 3   | 1.317mb  | 758.602μs | ±0.16%  |
| LibEncoderBench    | cl100k_base,latin               | 5    | 3   | 1.317mb  | 1.110ms   | ±13.83% |
| LibEncoderBench    | o200k_base,latin                | 5    | 3   | 1.317mb  | 1.510ms   | ±16.32% |
| LibEncoderBench    | p50k_base,without-whitespaces   | 5    | 3   | 4.833mb  | 2.053s    | ±1.05%  |
| LibEncoderBench    | cl100k_base,without-whitespaces | 5    | 3   | 4.833mb  | 2.413s    | ±1.77%  |
| LibEncoderBench    | o200k_base,without-whitespaces  | 5    | 3   | 2.736mb  | 31.133ms  | ±0.44%  |
| NativeEncoderBench | p50k_base,baconipsum            | 5    | 3   | 7.271mb  | 9.862ms   | ±1.86%  |
| NativeEncoderBench | cl100k_base,baconipsum          | 5    | 3   | 13.994mb | 7.598ms   | ±3.17%  |
| NativeEncoderBench | o200k_base,baconipsum           | 5    | 3   | 27.583mb | 6.488ms   | ±1.06%  |
| NativeEncoderBench | p50k_base,cyrillic              | 5    | 3   | 7.271mb  | 4.217ms   | ±5.89%  |
| NativeEncoderBench | cl100k_base,cyrillic            | 5    | 3   | 13.994mb | 4.682ms   | ±1.93%  |
| NativeEncoderBench | o200k_base,cyrillic             | 5    | 3   | 27.583mb | 3.561ms   | ±1.75%  |
| NativeEncoderBench | p50k_base,latin                 | 5    | 3   | 7.271mb  | 256.463μs | ±4.12%  |
| NativeEncoderBench | cl100k_base,latin               | 5    | 3   | 13.994mb | 274.299μs | ±1.57%  |
| NativeEncoderBench | o200k_base,latin                | 5    | 3   | 27.583mb | 299.513μs | ±13.34% |
| NativeEncoderBench | p50k_base,without-whitespaces   | 5    | 3   | 34.318mb | 49.407s   | ±0.45%  |
| NativeEncoderBench | cl100k_base,without-whitespaces | 5    | 3   | 39.993mb | 56.818s   | ±0.69%  |
| NativeEncoderBench | o200k_base,without-whitespaces  | 5    | 3   | 27.583mb | 35.300ms  | ±0.33%  |
+--------------------+---------------------------------+------+-----+----------+-----------+---------+

decode
+--------------------+---------------------------------+------+-----+----------+-----------+---------+
| benchmark          | set                             | revs | its | mem_peak | mode      | rstdev  |
+--------------------+---------------------------------+------+-----+----------+-----------+---------+
| LibEncoderBench    | p50k_base,baconipsum            | 5    | 3   | 1.317mb  | 750.609μs | ±0.91%  |
| LibEncoderBench    | cl100k_base,baconipsum          | 5    | 3   | 1.317mb  | 657.150μs | ±2.18%  |
| LibEncoderBench    | o200k_base,baconipsum           | 5    | 3   | 1.317mb  | 668.732μs | ±2.42%  |
| LibEncoderBench    | p50k_base,cyrillic              | 5    | 3   | 1.317mb  | 407.333μs | ±18.12% |
| LibEncoderBench    | cl100k_base,cyrillic            | 5    | 3   | 1.317mb  | 268.550μs | ±42.99% |
| LibEncoderBench    | o200k_base,cyrillic             | 5    | 3   | 1.317mb  | 238.260μs | ±1.02%  |
| LibEncoderBench    | p50k_base,latin                 | 5    | 3   | 1.317mb  | 105.187μs | ±8.90%  |
| LibEncoderBench    | cl100k_base,latin               | 5    | 3   | 1.317mb  | 123.266μs | ±18.81% |
| LibEncoderBench    | o200k_base,latin                | 5    | 3   | 1.317mb  | 114.973μs | ±1.97%  |
| LibEncoderBench    | p50k_base,without-whitespaces   | 5    | 3   | 3.121mb  | 3.781ms   | ±0.98%  |
| LibEncoderBench    | cl100k_base,without-whitespaces | 5    | 3   | 3.100mb  | 3.756ms   | ±2.45%  |
| LibEncoderBench    | o200k_base,without-whitespaces  | 5    | 3   | 2.031mb  | 3.669ms   | ±0.85%  |
| NativeEncoderBench | p50k_base,baconipsum            | 5    | 3   | 7.271mb  | 1.238ms   | ±0.37%  |
| NativeEncoderBench | cl100k_base,baconipsum          | 5    | 3   | 13.994mb | 1.103ms   | ±0.99%  |
| NativeEncoderBench | o200k_base,baconipsum           | 5    | 3   | 27.583mb | 954.397μs | ±2.41%  |
| NativeEncoderBench | p50k_base,cyrillic              | 5    | 3   | 7.271mb  | 700.969μs | ±4.67%  |
| NativeEncoderBench | cl100k_base,cyrillic            | 5    | 3   | 13.994mb | 337.236μs | ±3.64%  |
| NativeEncoderBench | o200k_base,cyrillic             | 5    | 3   | 27.583mb | 230.186μs | ±2.12%  |
| NativeEncoderBench | p50k_base,latin                 | 5    | 3   | 7.271mb  | 139.868μs | ±5.20%  |
| NativeEncoderBench | cl100k_base,latin               | 5    | 3   | 13.994mb | 133.633μs | ±2.12%  |
| NativeEncoderBench | o200k_base,latin                | 5    | 3   | 27.583mb | 137.276μs | ±3.20%  |
| NativeEncoderBench | p50k_base,without-whitespaces   | 5    | 3   | 32.217mb | 6.593ms   | ±1.03%  |
| NativeEncoderBench | cl100k_base,without-whitespaces | 5    | 3   | 37.892mb | 6.274ms   | ±1.01%  |
| NativeEncoderBench | o200k_base,without-whitespaces  | 5    | 3   | 27.583mb | 5.968ms   | ±4.05%  |
+--------------------+---------------------------------+------+-----+----------+-----------+---------+
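
To make the two encoders easier to compare, the ratio of their `mode` timings can be computed per data set. The sketch below hard-codes a handful of encode timings copied from the encode table above (converted to milliseconds). The `speedup` helper and the choice of rows are illustrative only and are not part of this PR.

```rust
/// Ratio of NativeEncoder time to LibEncoder time. Values above 1.0 mean
/// the FFI-based LibEncoder was faster for that data set.
fn speedup(native_ms: f64, lib_ms: f64) -> f64 {
    native_ms / lib_ms
}

fn main() {
    // (set, LibEncoder mode in ms, NativeEncoder mode in ms),
    // copied from the encode table.
    let rows = [
        ("p50k_base,baconipsum", 7.176, 9.862),
        ("o200k_base,baconipsum", 8.218, 6.488),
        ("p50k_base,cyrillic", 1.826, 4.217),
        ("p50k_base,latin", 0.758602, 0.256463),
        ("p50k_base,without-whitespaces", 2053.0, 49407.0),
        ("o200k_base,without-whitespaces", 31.133, 35.300),
    ];

    for &(set, lib_ms, native_ms) in rows.iter() {
        println!("{}: native/lib = {:.2}", set, speedup(native_ms, lib_ms));
    }
}
```

With these rows, the FFI encoder comes out roughly 24 times faster for `p50k_base` on the text without whitespace, while for the short Latin text the native encoder is about 3 times faster.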

ASCII text ~40k characters long:
[chart image]

UTF8 text ~7k characters long:
[chart image]

ASCII text ~6k characters long:
[chart image]

Text 100k characters long without any spaces:
[chart image]

@codacy-production

codacy-production bot commented Mar 30, 2025

Coverage summary from Codacy

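+--------------------------+---------------+
| Coverage variation       | Diff coverage |
+--------------------------+---------------+
| -11.29% (target: -1.00%) | 69.05%        |
+--------------------------+---------------+

Coverage variation details
+----------------------------------+-----------------+---------------+------------------+
|                                  | Coverable lines | Covered lines | Coverage         |
+----------------------------------+-----------------+---------------+------------------+
| Common ancestor commit (5ce64fe) | 186             | 155           | 83.33%           |
| Head commit (4add7b3)            | 279 (+93)       | 201 (+46)     | 72.04% (-11.29%) |
+----------------------------------+-----------------+---------------+------------------+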

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

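Diff coverage details
+--------------------+-----------------+---------------+---------------+
|                    | Coverable lines | Covered lines | Diff coverage |
+--------------------+-----------------+---------------+---------------+
| Pull request (#27) | 210             | 145           | 69.05%        |
+--------------------+-----------------+---------------+---------------+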

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%
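
Both formulas can be checked against the line counts reported above. This is only a small arithmetic sketch: the helper name `coverage_pct` is made up for illustration and is not a Codacy API.

```rust
/// Percentage of covered lines out of coverable lines.
fn coverage_pct(covered: u32, coverable: u32) -> f64 {
    f64::from(covered) / f64::from(coverable) * 100.0
}

fn main() {
    let ancestor = coverage_pct(155, 186); // common ancestor commit (5ce64fe)
    let head = coverage_pct(201, 279); // head commit (4add7b3)
    let diff = coverage_pct(145, 210); // lines added or modified by the pull request

    println!("ancestor coverage:  {:.2}%", ancestor);
    println!("head coverage:      {:.2}%", head);
    println!("coverage variation: {:.2}%", head - ancestor);
    println!("diff coverage:      {:.2}%", diff);
}
```

Rounded to two decimals, the results agree with the reported 83.33%, 72.04%, -11.29% and 69.05%.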

yethee marked this pull request as ready for review March 31, 2025 10:42
yethee merged commit 6866101 into master Mar 31, 2025
19 of 21 checks passed
yethee deleted the ffi branch March 31, 2025 10:50

@flexchar

Just to be clear, lib means using FFI and native means using the existing way?

@yethee
Owner Author

yethee commented Mar 31, 2025

@flexchar Yes, that's right.

@flexchar

Out of curiosity, I'm so surprised that native is faster in certain cases. Any idea how that is possible?

@yethee
Owner Author

yethee commented Mar 31, 2025

With FFI there is a performance overhead (marshalling costs). For example, strings need to be copied when they cross the boundary (from C to Rust, etc.).
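
To illustrate the marshalling cost, here is a minimal sketch of what a Rust function exposed over the C ABI might look like. The function name `count_bytes_ffi` and its signature are hypothetical and are not taken from this PR or from tiktoken-rs. The point is only that the incoming C string has to be scanned, validated and copied into an owned Rust `String` before any real work starts.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical exported function: receives a NUL-terminated C string,
/// copies it into an owned Rust String, and returns its length in bytes.
/// Returns -1 for a null pointer or invalid UTF-8.
/// A real export would also carry a no_mangle attribute so the symbol is
/// visible to the FFI loader; it is omitted to keep the sketch simple.
pub extern "C" fn count_bytes_ffi(text: *const c_char) -> i64 {
    if text.is_null() {
        return -1;
    }
    // SAFETY: the caller must pass a valid, NUL-terminated string.
    let c_str = unsafe { CStr::from_ptr(text) };
    // Marshalling steps: validate UTF-8, then copy into Rust-owned memory.
    match c_str.to_str() {
        Ok(s) => {
            let owned: String = s.to_owned();
            owned.len() as i64
        }
        Err(_) => -1,
    }
}

fn main() {
    // Simulate the caller side: build a C string and pass a raw pointer.
    let input = CString::new("hello world").expect("no interior NUL byte");
    println!("{}", count_bytes_ffi(input.as_ptr()));
}
```

Every call pays for the scan and the copy regardless of how much work follows, which is why short inputs benefit least.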

This approach pays off when there is a lot of CPU-bound computation, mainly when encoding text into tokens. When decoding, both implementations are close in performance, since we only need to traverse the array of tokens once and concatenate the string.
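
The decoding path described here can be sketched as a single pass over the token ids, appending each token's bytes to an output buffer. The toy vocabulary below is invented for the example and does not correspond to any real encoding. It is a simplified illustration, not the code from this PR.

```rust
use std::collections::HashMap;

/// Decode by looking up each token id and concatenating the bytes.
/// Returns None if an id is unknown or the result is not valid UTF-8.
fn decode(vocab: &HashMap<u32, Vec<u8>>, tokens: &[u32]) -> Option<String> {
    let mut bytes = Vec::new();
    for id in tokens {
        bytes.extend_from_slice(vocab.get(id)?);
    }
    String::from_utf8(bytes).ok()
}

fn main() {
    // Toy vocabulary with hypothetical ids.
    let mut vocab = HashMap::new();
    vocab.insert(0u32, b"Hello".to_vec());
    vocab.insert(1u32, b", ".to_vec());
    vocab.insert(2u32, b"world".to_vec());

    println!("{:?}", decode(&vocab, &[0, 1, 2]));
}
```

The work per token is one lookup and one append, so there is little computation for a native library to speed up.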

@flexchar

Thank you for sharing your wisdom, dear person!

Development

Successfully merging this pull request may close these issues.

Is PHP cursed to be much slower?

2 participants