Skip to content

fix: do not attempt to deserialize llm text response#233

Merged
nabinchha merged 3 commits into
mainfrom
nmulepati/fix/232-donot-attempt-to-deserialize-llm-text-response
Jan 20, 2026
Merged

fix: do not attempt to deserialize llm text response#233
nabinchha merged 3 commits into
mainfrom
nmulepati/fix/232-donot-attempt-to-deserialize-llm-text-response

Conversation

@nabinchha

Copy link
Copy Markdown
Contributor

closes: #232

@nabinchha nabinchha requested review from a team and eric-tramel January 20, 2026 22:12

@eric-tramel eric-tramel left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, one doesn't need to do this in the recipe itself because the text recipe is a string pass-through anyway. However, you introduce the true JSON deserialization conditionally in the case of structured & judge because we require structured output in these cases.

In the ol' days we were forcibly deserializing columns as we processed, are we sure that serialized string columns stay serialized? E.g. perhaps this is fine for one column, but if the next one attempts to eagerly deserialize everything that can be deserialized, do we have any problems with it getting updated back into the dataset deserialized, or do we now have contract that the only way that entries can get mutated in the dataset is on their generation call? This wasn't the case previously, so I just want to be sure.

@nabinchha

Copy link
Copy Markdown
Contributor Author

IIUC, one doesn't need to do this in the recipe itself because the text recipe is a string pass-through anyway. However, you introduce the true JSON deserialization conditionally in the case of structured & judge because we require structured output in these cases.

In the ol' days we were forcibly deserializing columns as we processed, are we sure that serialized string columns stay serialized? E.g. perhaps this is fine for one column, but if the next one attempts to eagerly deserialize everything that can be deserialized, do we have any problems with it getting updated back into the dataset deserialized, or do we now have contract that the only way that entries can get mutated in the dataset is on their generation call? This wasn't the case previously, so I just want to be sure.

Oh yea good point. If the next column references this column, this line will try to deserialized the nested deserializable string in the dictionary container for the row. At that point the next column will overwrite the value for that initial column with a deserialized json object, right?

@nabinchha

Copy link
Copy Markdown
Contributor Author

IIUC, one doesn't need to do this in the recipe itself because the text recipe is a string pass-through anyway. However, you introduce the true JSON deserialization conditionally in the case of structured & judge because we require structured output in these cases.
In the ol' days we were forcibly deserializing columns as we processed, are we sure that serialized string columns stay serialized? E.g. perhaps this is fine for one column, but if the next one attempts to eagerly deserialize everything that can be deserialized, do we have any problems with it getting updated back into the dataset deserialized, or do we now have contract that the only way that entries can get mutated in the dataset is on their generation call? This wasn't the case previously, so I just want to be sure.

Oh yea good point. If the next column references this column, this line will try to deserialized the nested deserializable string in the dictionary container for the row. At that point the next column will overwrite the value for that initial column with a deserialized json object, right?

Actually, the next column doesn't have to reference this column at all. Just the fact that there's another generation column would make the above true.

@nabinchha

nabinchha commented Jan 20, 2026

Copy link
Copy Markdown
Contributor Author

IIUC, one doesn't need to do this in the recipe itself because the text recipe is a string pass-through anyway. However, you introduce the true JSON deserialization conditionally in the case of structured & judge because we require structured output in these cases.
In the ol' days we were forcibly deserializing columns as we processed, are we sure that serialized string columns stay serialized? E.g. perhaps this is fine for one column, but if the next one attempts to eagerly deserialize everything that can be deserialized, do we have any problems with it getting updated back into the dataset deserialized, or do we now have contract that the only way that entries can get mutated in the dataset is on their generation call? This wasn't the case previously, so I just want to be sure.

Oh yea good point. If the next column references this column, this line will try to deserialized the nested deserializable string in the dictionary container for the row. At that point the next column will overwrite the value for that initial column with a deserialized json object, right?

Actually, the next column doesn't have to reference this column at all. Just the fact that there's another generation column would make the above true.

Tracing though, while we do deserialize the dictionary with values from previous columns here we don't modify the original data container dictionary with the deserialized values. The deserialized values are only used for template rendering... so shouldn't have any side effect.

@nabinchha nabinchha requested a review from eric-tramel January 20, 2026 22:50
@johnnygreco

Copy link
Copy Markdown
Contributor

IIUC, one doesn't need to do this in the recipe itself because the text recipe is a string pass-through anyway. However, you introduce the true JSON deserialization conditionally in the case of structured & judge because we require structured output in these cases.
In the ol' days we were forcibly deserializing columns as we processed, are we sure that serialized string columns stay serialized? E.g. perhaps this is fine for one column, but if the next one attempts to eagerly deserialize everything that can be deserialized, do we have any problems with it getting updated back into the dataset deserialized, or do we now have contract that the only way that entries can get mutated in the dataset is on their generation call? This wasn't the case previously, so I just want to be sure.

Oh yea good point. If the next column references this column, this line will try to deserialized the nested deserializable string in the dictionary container for the row. At that point the next column will overwrite the value for that initial column with a deserialized json object, right?

Actually, the next column doesn't have to reference this column at all. Just the fact that there's another generation column would make the above true.

Tracing though, while we do deserialize the dictionary with values from previous columns here we don't modify the original data container dictionary with the deserialized values. The deserialized values are only used for template rendering... so shouldn't have any side effect.

Nice digging @nabinchha! Might be worth adding some comments (maybe in the generate method and in deserialize_json_values to make sure future versions of us (or Ralph Wiggum lol) understand that this is an important assumption

@nabinchha

Copy link
Copy Markdown
Contributor Author

Tracing though, while we do deserialize the dictionary with values from previous columns here we don't modify the original data container dictionary with the deserialized values. The deserialized values are only used for template rendering... so shouldn't have any side effect.

good call. Updated the docstring a bit + made sure data passed into deserialize_json_values is immutable in 61fe5bf

@nabinchha nabinchha merged commit 671dfd8 into main Jan 20, 2026
21 checks passed
@nabinchha nabinchha deleted the nmulepati/fix/232-donot-attempt-to-deserialize-llm-text-response branch January 20, 2026 23:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

llm-text column generation may lead to pyarrow backend errors

3 participants