feat: Nemo-rl support by jthomson04 · Pull Request #7699 · ai-dynamo/dynamo

jthomson04 · 2026-03-31T04:35:19Z

Adds /tokenize and /detokenize endpoints, following the semantics of vllm. This is needed for RL.

Summary by CodeRabbit

New Features
- Added tokenization endpoints (POST /tokenize, POST /detokenize) for encoding prompts and chat messages to tokens and decoding tokens back to text
- Added support for custom chat template overrides when rendering prompts
- Added optional token string output in tokenization responses for human-readable token representation

coderabbitai · 2026-03-31T04:43:29Z

Walkthrough

This PR introduces tokenization and detokenization support to the OpenAI HTTP service via new endpoints, adds tokenization protocol types for completion and chat requests, extends tokenizer traits with special-token handling and token-to-string conversion, implements these new capabilities across multiple tokenizer backends, enables chat template overrides during formatter initialization, and adds model card persistence in the engine configuration.

Changes

Cohort / File(s)	Summary
HTTP Service Tokenization Endpoints `lib/llm/src/http/service/openai.rs`, `lib/llm/src/http/service/service_v2.rs`	Added `tokenization_router` with `POST /tokenize` and `POST /detokenize` endpoints for completion and chat requests, including model resolution, prompt rendering, token string optional output, validation helpers, and comprehensive unit test coverage. Registered router in base routes list.
Tokenization Protocol Types `lib/llm/src/protocols/openai.rs`, `lib/llm/src/protocols/openai/tokenization.rs`	Introduced new module with `TokenizeCompletionRequest`, `TokenizeChatRequest`, `TokenizeRequest` enum, and `TokenizeResponse` structures, including validation for conflicting flags and merged template kwargs computation.
Tokenizer Trait Extensions `lib/llm/src/tokenizers.rs`, `lib/llm/src/tokenizers/fastokens.rs`, `lib/llm/src/tokenizers/hf.rs`, `lib/llm/src/tokenizers/tiktoken.rs`	Extended `Encoder` trait with `encode_with_special_tokens()` and `Tokenizer` trait with `convert_ids_to_tokens()`, with concrete implementations across FastTokenizer, HuggingFace, and TikToken backends. TikToken now precomputes token-to-string decoders at construction.
Model & Template Configuration `lib/llm/src/preprocessor/prompt/template.rs`, `lib/llm/src/entrypoint/input/http.rs`	Added `PromptFormatter::from_mdc_with_chat_template()` supporting optional chat template overrides, and integrated model card saving immediately after model checksum computation in engine initialization.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ❌ 3

❌ Failed checks (2 warnings, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 40.63% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check	⚠️ Warning	The PR title 'feat: Nemo-rl support' does not align with the actual changes. The PR implements /tokenize and /detokenize endpoints, not Nemo-rl support.	Update the title to reflect the actual changes, e.g., 'feat: Add /tokenize and /detokenize endpoints' or 'feat: Add tokenization endpoints for RL workflows'.
Description check	❓ Inconclusive	The description is minimal but incomplete. It lacks required sections (Overview, Details, Where to start, Related Issues) specified in the template.	Expand the description to follow the template structure: add an Overview section with context, a detailed Description section explaining the changes, guidance on where to start reviewing, and link related GitHub issues.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (4)

lib/llm/src/protocols/openai/tokenization.rs (1)
87-92: Untagged enum order may affect ambiguous JSON deserialization.

With #[serde(untagged)], serde attempts deserialization in variant order. Since TokenizeCompletionRequest is listed first, a JSON object with both prompt and messages fields would deserialize as Completion (silently ignoring messages) due to serde trying Completion first.

This should work correctly in practice since valid requests typically contain only the fields for one variant, but consider adding a note in documentation or using #[serde(deny_unknown_fields)] at the enum level if strict validation is desired.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/llm/src/protocols/openai/tokenization.rs` around lines 87 - 92, The
untagged enum TokenizeRequest (variants TokenizeCompletionRequest and
TokenizeChatRequest) can deserialize ambiguously because serde tries variants in
order; a JSON with both prompt and messages would match Completion first and
drop messages. Fix by either documenting this behavior on TokenizeRequest so
callers avoid mixed payloads or make deserialization stricter (e.g., add
#[serde(deny_unknown_fields)] to TokenizeCompletionRequest and
TokenizeChatRequest or change to a tagged enum) so mixed-field objects are
rejected; update the TokenizeRequest definition and related docs accordingly.
lib/llm/src/http/service/openai.rs (3)
2054-2068: Consider documenting root-level path choice.

The endpoints are registered at /tokenize and /detokenize (root level), while other OpenAI-compatible endpoints use /v1/ prefix (e.g., /v1/completions). This appears intentional for vllm compatibility, but a brief comment would clarify this design decision.
📝 Suggested documentation
+/// Create an Axum [`Router`] for the tokenization endpoints.
+/// Paths are `/tokenize` and `/detokenize` (root level, matching vllm API convention).
 pub fn tokenization_router(state: Arc<service_v2::State>) -> (Vec<RouteDoc>, Router) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/llm/src/http/service/openai.rs` around lines 2054 - 2068, The
tokenization endpoints are registered at root-level paths (tokenize_path and
detokenize_path) inside tokenization_router which differs from the other
OpenAI-compatible routes that use the "/v1/" prefix; add a concise comment
immediately above the pub fn tokenization_router(...) explaining that the
root-level paths are intentional (for vllm compatibility or any other reason),
and mention the expected behavior/compatibility implications so future readers
know why tokenize and detokenize are not under "/v1/".
368-375: Empty join separator may concatenate text parts incorrectly.

When content is an Array, the text parts are joined with an empty string (""). If the array contains parts like ["Hello", " world"] (with leading space), this works correctly. However, if parts are ["Hello", "world"], they become "Helloworld".

Verify this matches the expected format of ChatCompletionRequestAssistantMessageContentPart::Text from the OpenAI spec. If parts represent discrete text segments, consider joining with a space or preserving original spacing.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/llm/src/http/service/openai.rs` around lines 368 - 375, The current
handling of Content::Array collects Part::Text pieces and joins them with an
empty separator, which can incorrectly concatenate words (e.g., "Hello" +
"world" => "Helloworld"); update the logic in the Content::Array branch (where
Part::Text is mapped) to preserve spacing between segments per the OpenAI
ChatCompletionRequestAssistantMessageContentPart::Text semantics—either join
with a single space or otherwise preserve original whitespace boundaries between
parts so discrete text segments do not run together.
405-414: rfind-based trimming assumes literal text match in rendered output.

The function uses rfind(trimmed_final_message) to locate where to truncate. This works if the chat template renders the assistant content verbatim, but some templates may escape, quote, or transform the content (e.g., XML escaping < to <).

If the template transforms content, rfind will fail and return an internal server error. Consider adding a debug log or comment explaining this assumption.
📝 Suggested documentation
     // Use rfind to locate the last occurrence of the assistant content in the rendered prompt,
     // then truncate everything after it (e.g. EOS tokens, generation prompts). This assumes the
     // final assistant message text appears only once at the end of the rendered output.
+    // NOTE: This approach assumes the chat template renders assistant content verbatim without
+    // escaping or transformation. Templates that modify content (e.g., XML escaping) will fail.
     let Some(final_msg_loc) = rendered_prompt.rfind(trimmed_final_message) else {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/llm/src/http/service/openai.rs` around lines 405 - 414, The rfind-based
trim in the continue-final-message logic (where
rendered_prompt.rfind(trimmed_final_message) is used) assumes the assistant
content appears verbatim; update the failure branch to log debug information
(using the module's logger) including a sanitized preview of rendered_prompt and
trimmed_final_message and then return the ErrorMessage::internal_server_error
with a clearer hint about possible escaping/transformations; also add an inline
comment near the rfind call documenting this assumption so future readers know
why a literal match may fail and to guide implementing an escaping-aware
fallback if needed.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/llm/src/http/service/openai.rs`:
- Around line 310-339: Add a concise in-code comment inside
resolve_tokenizer_model_name explaining the fallback hierarchy: that when no
requested_model is provided the function first prefers
state.manager().model_display_names() (models known to the serving layer), and
only if that yields a single name falls back to
state.manager().get_model_cards() (card metadata which may exist before a model
is fully registered); mention that this is intentional to allow tokenizer
endpoints to use card-only metadata while noting the subtle risk that card names
may not map to serving-capable models and referencing has_model_any,
ErrorMessage::model_not_found, and bad_request so future maintainers understand
why the two checks exist and the order must be preserved.

In `@lib/llm/src/protocols/openai/tokenization.rs`:
- Around line 103-110: The module defines TokenizeResponse but is missing
DetokenizeRequest and DetokenizeResponse, causing imports in openai.rs to fail;
add two struct definitions named DetokenizeRequest and DetokenizeResponse
immediately after TokenizeResponse: DetokenizeRequest should carry a
Vec<TokenIdType> (e.g., tokens) and an optional model info if needed, and
DetokenizeResponse should contain the resulting String (e.g., text) and any
optional metadata (match fields used by the detokenize handler in openai.rs).
Ensure both derive the same traits as TokenizeResponse (Debug, Clone, Serialize,
Deserialize, PartialEq, Eq) and use serde attributes consistent with token_strs
for optional fields so the detokenize handler compiles.

In `@lib/llm/src/tokenizers/tiktoken.rs`:
- Around line 103-105: The TikTokenTokenizer encode() currently calls
self.encode_with_special_tokens(input, true) which includes special tokens by
default and is inconsistent with HuggingFaceTokenizer and FastTokenizer; change
TikTokenTokenizer::encode to call self.encode_with_special_tokens(input, false)
so the default behavior excludes special tokens like the other backends (or
alternatively add explicit documentation on this intentional difference if you
prefer to keep true).

---

Nitpick comments:
In `@lib/llm/src/http/service/openai.rs`:
- Around line 2054-2068: The tokenization endpoints are registered at root-level
paths (tokenize_path and detokenize_path) inside tokenization_router which
differs from the other OpenAI-compatible routes that use the "/v1/" prefix; add
a concise comment immediately above the pub fn tokenization_router(...)
explaining that the root-level paths are intentional (for vllm compatibility or
any other reason), and mention the expected behavior/compatibility implications
so future readers know why tokenize and detokenize are not under "/v1/".
- Around line 368-375: The current handling of Content::Array collects
Part::Text pieces and joins them with an empty separator, which can incorrectly
concatenate words (e.g., "Hello" + "world" => "Helloworld"); update the logic in
the Content::Array branch (where Part::Text is mapped) to preserve spacing
between segments per the OpenAI
ChatCompletionRequestAssistantMessageContentPart::Text semantics—either join
with a single space or otherwise preserve original whitespace boundaries between
parts so discrete text segments do not run together.
- Around line 405-414: The rfind-based trim in the continue-final-message logic
(where rendered_prompt.rfind(trimmed_final_message) is used) assumes the
assistant content appears verbatim; update the failure branch to log debug
information (using the module's logger) including a sanitized preview of
rendered_prompt and trimmed_final_message and then return the
ErrorMessage::internal_server_error with a clearer hint about possible
escaping/transformations; also add an inline comment near the rfind call
documenting this assumption so future readers know why a literal match may fail
and to guide implementing an escaping-aware fallback if needed.

In `@lib/llm/src/protocols/openai/tokenization.rs`:
- Around line 87-92: The untagged enum TokenizeRequest (variants
TokenizeCompletionRequest and TokenizeChatRequest) can deserialize ambiguously
because serde tries variants in order; a JSON with both prompt and messages
would match Completion first and drop messages. Fix by either documenting this
behavior on TokenizeRequest so callers avoid mixed payloads or make
deserialization stricter (e.g., add #[serde(deny_unknown_fields)] to
TokenizeCompletionRequest and TokenizeChatRequest or change to a tagged enum) so
mixed-field objects are rejected; update the TokenizeRequest definition and
related docs accordingly.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b5291155-c3ef-4acf-affa-8df89d991f48

📥 Commits

Reviewing files that changed from the base of the PR and between 4abafff and d7d5857.

📒 Files selected for processing (10)

lib/llm/src/entrypoint/input/http.rs
lib/llm/src/http/service/openai.rs
lib/llm/src/http/service/service_v2.rs
lib/llm/src/preprocessor/prompt/template.rs
lib/llm/src/protocols/openai.rs
lib/llm/src/protocols/openai/tokenization.rs
lib/llm/src/tokenizers.rs
lib/llm/src/tokenizers/fastokens.rs
lib/llm/src/tokenizers/hf.rs
lib/llm/src/tokenizers/tiktoken.rs

grahamking · 2026-04-01T16:37:35Z

+            manager.save_model_card(
+                &format!("__local_model_card_{}", model.display_name()),
+                model.card().clone(),
+            )?;


What are these for?

The original motivation was to support the local/in-process models, but we can probably get rid of that.

grahamking · 2026-04-01T16:51:39Z

@jthomson04 In the spirit of recent Sync, if you have the agent plan file, could you attach it?

jthomson04 · 2026-04-01T21:05:08Z

@jthomson04 In the spirit of recent Sync, if you have the agent plan file, could you attach it?

I didn't make a plan file for this one, but I'll make sure to do that in future MRs

Signed-off-by: jthomson04 <jwillthomson19@gmail.com>

Add try/except fallbacks for import paths that moved between vLLM 0.18 and 0.19 (TokensPrompt, MultiModalUUIDDict) and relax the version pin to accept both releases. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The JailedStream accumulates content while detecting tool-call markers, but previously discarded any logprobs attached to jailed chunks. This meant that when `logprobs` was requested, tool-call responses would come back with `logprobs: null` even though the backend had produced them. Accumulate logprobs alongside content in ChoiceJailState and attach them to the emitted choice when the jail ends (both mid-stream unjail and stream-finish flush). Also propagate logprobs through `create_tool_call_choice` so the final tool-call chunk carries the full token-level log-probability data. Adds three unit tests covering single-chunk, multi-chunk, and mixed text+tool_call scenarios. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Signed-off-by: jthomson04 <jwillthomson19@gmail.com>

…from #7699 Cherry-picks the following from jthomson04/tokenize-endpoint: - POST /v1/tokenize and /v1/detokenize HTTP endpoints - Tokenizer trait: encode_with_special_tokens(), convert_ids_to_tokens() - return_tokens_as_token_ids parameter for chat completions - Multi-instance tokenize fix (discovery watcher) - Jail logprobs preservation through tool-call jailing

…robs Add return_tokens_as_token_ids support to the SGLang decode handler, mirroring what PR #7699 added for vLLM. When enabled, logprob token fields are returned as "token_id:<id>" instead of decoded text. Changes: - decode_handler.py: Read return_tokens_as_token_ids from output_options, pass through _process_token_stream to _extract_logprobs, format token strings accordingly - sglang_processor.py: Forward return_tokens_as_token_ids through _build_dynamo_preproc output_options - vllm/handlers.py: Remove debug print left in cherry-picked code

Signed-off-by: jthomson04 <jwillthomson19@gmail.com>

…7699) Manually cherry-picked the /tokenize and /detokenize endpoint code from ai-dynamo/dynamo PR #7699 (jthomson04/tokenize-endpoint). The full PR had merge conflicts due to divergent chat_completions.rs refactoring, so only the tokenization-specific changes were applied: - New file: protocols/openai/tokenization.rs (request/response types) - Tokenizer trait additions: encode_with_special_tokens, convert_ids_to_tokens - HTTP routes: POST /tokenize, POST /detokenize - Route wiring in service_v2.rs These endpoints follow vLLM's /tokenize and /detokenize semantics, enabling consistent tokenization between the Rust frontend and Python clients for RL training.

All rl_admin functionality has been re-implemented natively in the Dynamo Rust frontend (lib/llm): - Admin routes (/pause, /resume, /update_weights, /health, /ready, /load_lora_adapter, /unload_lora_adapter) → openai.rs /v1/rl/* router - Token injection (prompt_token_ids, choice.token_ids) → handler_chat_completions - Tokenization → /v1/tokenize + /v1/detokenize (cherry from #7699) Prime-RL now points admin_base_url directly at the Dynamo frontend (/v1/rl); no separate Python sidecar is deployed. The module has zero imports across the codebase and is not referenced in any Dockerfile, Helm chart, or K8s manifest.

Cleanup-only — no behavior change. Strips review-tracker noise that accumulated on top of PR-added text during iteration: - "Closes hhzhang16 HH-19/HH-21/HH-22/HH-23/HH-25/HH-26/HH-27" - "CR-8 / CR-9 / CR-10 closure" prefixes on serde-error / doc-attach fixes - Branch-name references: bis/parity-tokenize-tcp, bis/prime-rl-merged - Internal PR numbers: #6094, #7699, #8197, #9141 - Phase numbers from internal design docs (rl-support.md Phase 1/4/5) - "prime-rl" mentions in narrative copy and mermaid diagrams → generic "RL trainer / RL orchestrator / external client" Technical content (semantics, invariants, why-this-exists rationale) preserved everywhere; only the internal-process scaffolding is removed. Scope verification: every removed line is one this branch ADDED (diff main..HEAD shows the removed text on a `+` line). No edits land on pre-existing main-branch comments. Specifically reverted the nvext.rs cleanup attempt — its target lines (GAIE Stage 1/2, SGLang-specific) live on main, not in this PR's diff. Files touched: components/src/dynamo/vllm/handlers.py +9 -10 components/src/dynamo/vllm/worker_factory.py +6 -4 docs/dynamo-RL-api.md +19 -32 lib/llm/src/http/service/openai.rs +32 -34 lib/llm/src/protocols/openai/chat_completions/delta.rs +4 -4 lib/llm/src/protocols/openai/completions/delta.rs +3 -3 lib/llm/src/protocols/openai/validate.rs +20 -20 cargo check -p dynamo-llm: clean (1 pre-existing benign warning).

…e shutdowns, tier the docs Stack-ranked review fixes from the latest Graham-style pass on this branch. Tier 4 (commit message conventions and tokenizer-crate scope) deliberately deferred. Tier 1 — correctness / contract claims - Drop the word "atomic" from the `/v1/rl/update_weights` docblock and spell out the partial-failure semantics: workers `0..N-1` may have switched while worker `N` failed; per-worker status is in the response, true rollback is a follow-up. - Reject `n > 1` on `/v1/chat/completions` when RL token IDs are requested. The streaming aggregator's `completion_token_ids` Vec is shared across all choices, so per-choice promotion downstream cannot recover which tokens belong to which choice with `n > 1`. Hard-reject is the interim guard until the keyed-by-index refactor lands. - `update_weights` empty `weight_dir` (`""`) is now treated the same as null/missing (NCCL-mode no-op) instead of being forwarded to the engine as `path=""`. Tier 2 — hard rules - Hot-path `[RL]` `logger.info` → `logger.debug` for pause / resume / flush_cache / weights-load / LoRA load / LoRA unload (8 sites). RL trainers fire these per training step; info-level was a log flood. - Extract the duplicated 4-line `EngineDeadError` shutdown stanza into `BaseWorkerHandler._shutdown_on_engine_dead(e) -> NoReturn` and collapse the 9 call sites (8 RL handlers + 1 generate path) to a single line each. ~32 lines removed. - Strip the remaining internal-tracker comments missed by an earlier chore-scrub: `service_v2.rs` had `jthomson04 PR #7699`, `bis-dev/design-docs/rl-support.md §1`, and `hhzhang16 HH-22 / HH-26` references in two places. Replaced with neutral prose. - Strip the SGLang-coordination comment in `handlers.py` ("Signatures intentionally line up with the SGLang RL admin routes"). The kind of line that goes stale when SGLang's admin set drifts. - Delete dead-code carcasses for the dropped routes (`/v1/chat/completions/tokens`, `/v1/tokenize`, `/v1/detokenize`): remove `tokenize`, `detokenize`, `tokenization_router`, `chat_completions_tokens_router`, `handler_chat_completions_tokens` (~240 lines), drop `pub mod tokenization;`, delete the `tokenization.rs` module file (124 lines), drop unused tokenize-type imports. All five were behind `#[allow(dead_code)]` and unmounted. Tier 3 — tests - Make `RlState::new(...)` a pub(super) test-friendly constructor so handler-level tests don't need `from_env` / process env vars. - Convert `RlPauseQuery::mode: Option<String>` to `Option<PauseMode>`, a typed enum with `serde(rename_all = "lowercase")`. Axum now returns 400 on `mode=foo` before the handler runs; the runtime string match is gone. - Add four behavior tests in `mod tests`: test_pause_mode_serde_roundtrip test_pause_mode_rejects_unknown_value test_rl_update_weights_body_defaults test_rl_state_new_constructs_without_env - Fix unrelated test struct-init breakage that shipped earlier in this branch: 26 sites across 11 files were missing the `prompt_token_ids`, `return_token_ids`, `tokens`, and `completion_token_ids` fields added to the response/request/nvext structs. `cargo test -p dynamo-llm` now compiles cleanly. Tier 5 — design / nits - `DYN_RL_LIVENESS_TIMEOUT_MS` is read once in `RlState::from_env` and cached as `RlState.probe_timeout: Duration`; `rl_ready` and `rl_liveness` use the cached value instead of re-parsing the env on every request. - `rl_ready` worker-probe body simplified from a four-link `.ok().and_then(Result::ok).map(...).unwrap_or(false)` chain to a `match (timeout, send_result)` that surfaces the timeout/network distinction for a future log line. cargo check -p dynamo-llm: clean (1 pre-existing benign warning). cargo test -p dynamo-llm --lib: 58 passed (4 new RL tests). cargo test -p dynamo-llm --tests integration suites: all green.

jthomson04 requested a review from a team as a code owner March 31, 2026 04:35

pull-request-size Bot added the size/XL label Mar 31, 2026

github-actions Bot added feat frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` labels Mar 31, 2026

coderabbitai Bot reviewed Mar 31, 2026

View reviewed changes

Comment thread lib/llm/src/http/service/openai.rs

Comment thread lib/llm/src/protocols/openai/tokenization.rs

Comment thread lib/llm/src/tokenizers/tiktoken.rs

pull-request-size Bot added size/XXL and removed size/XL labels Mar 31, 2026

copy-pr-bot Bot temporarily deployed to GITLAB March 31, 2026 05:20 Inactive

pull-request-size Bot added size/XL and removed size/XXL labels Mar 31, 2026

copy-pr-bot Bot temporarily deployed to GITLAB March 31, 2026 05:26 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB March 31, 2026 05:39 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB March 31, 2026 16:02 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB March 31, 2026 17:48 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB March 31, 2026 18:28 Inactive

grahamking reviewed Apr 1, 2026

View reviewed changes

pull-request-size Bot added size/XXL and removed size/XL labels Apr 1, 2026

copy-pr-bot Bot temporarily deployed to GITLAB April 1, 2026 20:50 Inactive

jthomson04 force-pushed the jthomson04/tokenize-endpoint branch from 296ce5a to 084dd6f Compare April 1, 2026 21:03

copy-pr-bot Bot temporarily deployed to GITLAB April 1, 2026 21:03 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB April 1, 2026 21:11 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB April 2, 2026 02:39 Inactive

jthomson04 requested a review from a team as a code owner April 5, 2026 02:23

jthomson04 requested a review from a team April 5, 2026 02:23

copy-pr-bot Bot temporarily deployed to GITLAB April 5, 2026 02:23 Inactive

jthomson04 and others added 11 commits April 16, 2026 21:28

comment

57f72cb

Signed-off-by: jthomson04 <jwillthomson19@gmail.com>

fix clippy

c27e576

Signed-off-by: jthomson04 <jwillthomson19@gmail.com>

remove in-process model support

8721cb6

Signed-off-by: jthomson04 <jwillthomson19@gmail.com>

more little fix

742348c

Signed-off-by: jthomson04 <jwillthomson19@gmail.com>

fmt+clippy

2790a51

Signed-off-by: jthomson04 <jwillthomson19@gmail.com>

additional nemo-rl support

4804094

Signed-off-by: jthomson04 <jwillthomson19@gmail.com>

fix tokenize endpoint with multiple instances of the model

5194cd1

Signed-off-by: jthomson04 <jwillthomson19@gmail.com>

more little fixes + remove debug log

e2a0e4c

Signed-off-by: jthomson04 <jwillthomson19@gmail.com>

fix build

d3d39f5

Signed-off-by: jthomson04 <jwillthomson19@gmail.com>

jthomson04 force-pushed the jthomson04/tokenize-endpoint branch from 4cad176 to d3d39f5 Compare April 17, 2026 05:16

copy-pr-bot Bot temporarily deployed to GITLAB April 17, 2026 05:16 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB April 17, 2026 05:17 Inactive

logs

6a7e8fb

Signed-off-by: jthomson04 <jwillthomson19@gmail.com>

copy-pr-bot Bot temporarily deployed to GITLAB April 20, 2026 02:53 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB April 20, 2026 02:58 Inactive

fixed prefix

d9c657c

Signed-off-by: jthomson04 <jwillthomson19@gmail.com>

copy-pr-bot Bot temporarily deployed to GITLAB April 20, 2026 21:39 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB April 20, 2026 22:35 Inactive

prefix substitution in /tokenize

07c4f28

Signed-off-by: jthomson04 <jwillthomson19@gmail.com>

copy-pr-bot Bot temporarily deployed to GITLAB April 21, 2026 02:42 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB April 21, 2026 02:43 Inactive

biswapanda mentioned this pull request May 7, 2026

feat(RL): RL integration layer to support Prime-RL #9131

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Nemo-rl support#7699

feat: Nemo-rl support#7699
jthomson04 wants to merge 20 commits into
mainfrom
jthomson04/tokenize-endpoint

jthomson04 commented Mar 31, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Mar 31, 2026 •

edited

Loading

❌ Failed checks (2 warnings, 1 inconclusive)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

grahamking Apr 1, 2026

Uh oh!

jthomson04 Apr 1, 2026

Uh oh!

grahamking commented Apr 1, 2026

Uh oh!

jthomson04 commented Apr 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

jthomson04 commented Mar 31, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (2 warnings, 1 inconclusive)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

grahamking Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

jthomson04 Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

grahamking commented Apr 1, 2026

Uh oh!

jthomson04 commented Apr 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

jthomson04 commented Mar 31, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Mar 31, 2026 •

edited

Loading