Conversation
…act class that handles logprobs.
BREAKING CHANGE: Since we introduce other types of output parsing, we rename it to be clearer.
We add a new parser (just for OpenAI objects, for now) for the sampled-token probabilities, a new abstract class, and a new scorer that returns the probability of each generated token and of the whole sequence.
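The sequence-probability computation described above can be sketched as follows. This is a minimal illustration, not the PR's actual implementation; the function names are hypothetical.

```python
import numpy as np

def token_probabilities(token_logprobs: list[float]) -> np.ndarray:
    # Per-token probabilities: exponentiate each sampled token's logprob.
    return np.exp(np.asarray(token_logprobs, dtype=float))

def sequence_probability(token_logprobs: list[float]) -> float:
    # Whole-sequence probability: sum the logprobs, then exponentiate once.
    # Summing in log space avoids underflow from multiplying many small probabilities.
    return float(np.exp(np.sum(token_logprobs)))

logprobs = [-0.1, -0.5, -0.2]
print(token_probabilities(logprobs))
print(sequence_probability(logprobs))  # == exp(-0.8)
```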
Code Review
The pull request introduces sequence probability computation and refactors the uncertainty detector base classes. While the overall structure is good, there are significant inconsistencies between the vLLM and OpenAI implementations for sampled token logprobs. Specifically, the OpenAI functions only process the first output/choice and return a 1D array, whereas the vLLM function returns a 2D array (list of sequences). This inconsistency will cause the SentenceProbabilityScorer to fail or produce incorrect results when used with OpenAI outputs. Additionally, there are some type hint mismatches and potential IndexError risks in the OpenAI parsing logic.
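The shape inconsistency flagged here can be illustrated with a sketch: both parsers should return one 1D array of sampled-token logprobs per generated sequence. The object attributes below (`choices`, `logprobs.content`, `logprob`) follow the OpenAI response layout, but the function names and exact structure are assumptions for illustration.

```python
import numpy as np
from numpy.typing import NDArray

def parse_sampled_logprobs_openai(response) -> list[NDArray[np.floating]]:
    # Iterate over *all* choices (not just choices[0]) and guard against
    # missing logprobs, so the return shape matches the vLLM parser:
    # a list with one 1D array per generated sequence.
    results: list[NDArray[np.floating]] = []
    for choice in response.choices:
        content = choice.logprobs.content if choice.logprobs else []
        results.append(np.array([tok.logprob for tok in content], dtype=float))
    return results
```

Returning a list of per-sequence arrays for both backends lets a downstream scorer consume either output without backend-specific branching.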
New parser, along with a slight refactoring of the OpenAI parser
/gemini review
Code Review
This pull request introduces a significant refactoring to better distinguish between parsing log probabilities for entropy-based methods and for sequence probability calculations. It renames several functions for clarity (e.g., parse_model_outputs to parse_top_logprobs) and adds new functionality to compute sequence probabilities from sampled token logprobs. The class hierarchy for uncertainty detectors has been refined, and the test suite has been reorganized and expanded accordingly. Overall, these changes improve the structure and clarity of the codebase. I've provided a few suggestions to improve type hints and docstrings for better accuracy and developer experience.
(instead of returning np.exp(0) = 1 silently)
```python
def compute_token_scores(self, inputs: list[NDArray[np.floating]]) -> list[NDArray[np.floating]]:
    """
    Compute sentence-level probability scores by summing token log probabilities.
```
This docstring is copy-pasted from compute. It says "sentence-level" and "summing" but this method exponentiates each token individually via np.exp(seq). It computes token-level probabilities, not sentence-level.
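The distinction the comment draws can be made concrete with a sketch (hypothetical free-function versions of the two scorers, not the PR's actual methods):

```python
import numpy as np

def compute_token_scores(inputs: list[np.ndarray]) -> list[np.ndarray]:
    # Token-level: exponentiate each sampled logprob individually,
    # yielding one probability per token.
    return [np.exp(seq) for seq in inputs]

def compute_sentence_scores(inputs: list[np.ndarray]) -> list[float]:
    # Sentence-level: sum the logprobs first, then exponentiate once,
    # yielding a single probability for the whole sequence.
    return [float(np.exp(np.sum(seq))) for seq in inputs]
```

The docstring on `compute_token_scores` should describe the first behavior, not the second.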
Test coverage: This PR does not include unit tests for the new parser functions. @CharlesMoslonka can you merge #224 after this one?
Big PR for the sequence probability computation.
Tagged for release.