Conversation
…act class that handles logprobs.
BREAKING CHANGE: Since we introduce other types of output parsing, we rename it to be clearer.
We add a new parser (just for OpenAI objects, for now) for the sampled-token probabilities, a new abstract class, and a new scorer that returns the probability of each generated token and of the whole sequence.
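The sequence-probability computation described above can be sketched as follows. This is a minimal illustration, not the PR's actual implementation; the function names are hypothetical.

```python
import numpy as np

def token_probabilities(token_logprobs: list[float]) -> np.ndarray:
    # Per-token probabilities: exponentiate each sampled token's logprob.
    return np.exp(np.asarray(token_logprobs, dtype=float))

def sequence_probability(token_logprobs: list[float]) -> float:
    # Whole-sequence probability: sum the logprobs, then exponentiate once.
    # Summing in log space avoids underflow from multiplying many small probabilities.
    return float(np.exp(np.sum(token_logprobs)))

logprobs = [-0.1, -0.5, -0.2]
print(token_probabilities(logprobs))
print(sequence_probability(logprobs))  # == exp(-0.8)
```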
Code Review
The pull request introduces sequence probability computation and refactors the uncertainty detector base classes. While the overall structure is good, there are significant inconsistencies between the vLLM and OpenAI implementations for sampled token logprobs. Specifically, the OpenAI functions only process the first output/choice and return a 1D array, whereas the vLLM function returns a 2D array (list of sequences). This inconsistency will cause the SentenceProbabilityScorer to fail or produce incorrect results when used with OpenAI outputs. Additionally, there are some type hint mismatches and potential IndexError risks in the OpenAI parsing logic.
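The shape inconsistency flagged here can be illustrated with a sketch: both parsers should return one 1D array of sampled-token logprobs per generated sequence. The object attributes below (`choices`, `logprobs.content`, `logprob`) follow the OpenAI response layout, but the function names and exact structure are assumptions for illustration.

```python
import numpy as np
from numpy.typing import NDArray

def parse_sampled_logprobs_openai(response) -> list[NDArray[np.floating]]:
    # Iterate over *all* choices (not just choices[0]) and guard against
    # missing logprobs, so the return shape matches the vLLM parser:
    # a list with one 1D array per generated sequence.
    results: list[NDArray[np.floating]] = []
    for choice in response.choices:
        content = choice.logprobs.content if choice.logprobs else []
        results.append(np.array([tok.logprob for tok in content], dtype=float))
    return results
```

Returning a list of per-sequence arrays for both backends lets a downstream scorer consume either output without backend-specific branching.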
New parser, along with a slight refactoring of the OpenAI parser
/gemini review
Code Review
This pull request introduces a significant refactoring to better distinguish between parsing log probabilities for entropy-based methods and for sequence probability calculations. It renames several functions for clarity (e.g., parse_model_outputs to parse_top_logprobs) and adds new functionality to compute sequence probabilities from sampled token logprobs. The class hierarchy for uncertainty detectors has been refined, and the test suite has been reorganized and expanded accordingly. Overall, these changes improve the structure and clarity of the codebase. I've provided a few suggestions to improve type hints and docstrings for better accuracy and developer experience.
(instead of returning np.exp(0) = 1 silently)
```python
def compute_token_scores(self, inputs: list[NDArray[np.floating]]) -> list[NDArray[np.floating]]:
    """
    Compute sentence-level probability scores by summing token log probabilities.
```
This docstring is copy-pasted from compute. It says "sentence-level" and "summing" but this method exponentiates each token individually via np.exp(seq). It computes token-level probabilities, not sentence-level.
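The distinction the comment draws can be made concrete with a sketch (hypothetical free-function versions of the two scorers, not the PR's actual methods):

```python
import numpy as np

def compute_token_scores(inputs: list[np.ndarray]) -> list[np.ndarray]:
    # Token-level: exponentiate each sampled logprob individually,
    # yielding one probability per token.
    return [np.exp(seq) for seq in inputs]

def compute_sentence_scores(inputs: list[np.ndarray]) -> list[float]:
    # Sentence-level: sum the logprobs first, then exponentiate once,
    # yielding a single probability for the whole sequence.
    return [float(np.exp(np.sum(seq))) for seq in inputs]
```

The docstring on `compute_token_scores` should describe the first behavior, not the second.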
Test coverage: This PR does not include unit tests for the new parser functions. @CharlesMoslonka can you merge #224 after this one?
Big PR for the sequence probability computation.
Tagged for release.