
Fix: pin Pydantic #2559

Merged
paulteehan merged 4 commits into main from platl-324-fix-pydantic-issue on Feb 5, 2026

Conversation

@paulteehan
Contributor

@paulteehan paulteehan commented Feb 2, 2026

Description

Pins pydantic to 2.12 or higher. Solves errors when using Soda in the Databricks runtime. No expected downstream impact.


@paulteehan paulteehan marked this pull request as ready for review February 2, 2026 17:58
@mivds
Contributor

mivds commented Feb 3, 2026

Do we know what was the actual problem here? Is there a specific feature and/or bug fix in pydantic 2.12 that we are using? I had a quick look at the release notes, but nothing stood out to me as particularly relevant for us

@paulteehan
Contributor Author

Do we know what was the actual problem here? Is there a specific feature and/or bug fix in pydantic 2.12 that we are using? I had a quick look at the release notes, but nothing stood out to me as particularly relevant for us

That's a good question... honestly I haven't had time to investigate further.

I know that it was broken with pydantic==2.10.6 and pydantic_core==2.27.2 and it works with pydantic==2.12.5 and pydantic_core==2.41.5.

With the old version it was throwing these errors:

ValidationError: 2 validation errors for SparkDataFrameDataSource connection Field required [type=missing, input_value={'name': 'ias_bug_ds', 'c...schema_': 'ias_bug_ds'}}, input_type=dict] For further information visit https://errors.pydantic.dev/2.8/v/missing connection_properties Extra inputs are not permitted [type=extra_forbidden, input_value={'spark_session': <pyspar...'schema_': 'ias_bug_ds'}, input_type=dict] For further information visit https://errors.pydantic.dev/2.8/v/extra_forbidden

I don't really have a hypothesis as to why this was happening and I think I should probably not spend more time on this

@paulteehan paulteehan requested review from Niels-b, m1n0 and mivds February 4, 2026 02:14
Contributor

@Niels-b Niels-b left a comment


Very strange things happening with the pydantic versions on Databricks 🤔 .

Ok for me to pin this version if that fixes everything 👌

@mivds
Contributor

mivds commented Feb 4, 2026

Took a look at what caused this. We're using a feature that was introduced in pydantic 2.11, so naturally it breaks with older versions.

To be more specific, we configure DataSourceBase as follows:

import abc

from pydantic import BaseModel, ConfigDict, Field

# DataSourceConnectionProperties is defined elsewhere in the codebase.
class DataSourceBase(BaseModel, abc.ABC):
    model_config = ConfigDict(
        frozen=True,
        extra="forbid",
        validate_by_name=True,  # Allow both field names and aliases when populating from a dict
    )
    connection_properties: DataSourceConnectionProperties = Field(
        ..., alias="connection", description="Data source connection details"
    )

We define connection_properties and alias it as connection. The validate_by_name flag (new in pydantic 2.11) allows initializing the field by its non-aliased name, i.e. connection_properties. Older versions ignore this flag and don't recognize the field. Add in the fact that we have extra="forbid", and it will explicitly error out on receiving connection_properties as a field.


That is exactly what pydantic reports in the output @paulteehan shared:

ValidationError: 2 validation errors for SparkDataFrameDataSource connection Field required [type=missing, input_value={'name': 'ias_bug_ds', 'c...schema_': 'ias_bug_ds'}}, input_type=dict] For further information visit https://errors.pydantic.dev/2.8/v/missing connection_properties Extra inputs are not permitted [type=extra_forbidden, input_value={'spark_session': <pyspar...'schema_': 'ias_bug_ds'}, input_type=dict] For further information visit https://errors.pydantic.dev/2.8/v/extra_forbidden

Formatted for readability:

ValidationError: 2 validation errors for SparkDataFrameDataSource
* connection
  * Field required [type=missing, input_value={'name': 'ias_bug_ds', 'c...schema_': 'ias_bug_ds'}}, input_type=dict]
  * For further information visit https://errors.pydantic.dev/2.8/v/missing
* connection_properties
  * Extra inputs are not permitted [type=extra_forbidden, input_value={'spark_session': <pyspar...'schema_': 'ias_bug_ds'}, input_type=dict]
  * For further information visit https://errors.pydantic.dev/2.8/v/extra_forbidden

i.e. connection_properties was provided when connection is required.

In summary: this isn't a bug in the databricks runtimes nor pydantic. It's just us messing up our versioning.

@mivds
Contributor

mivds commented Feb 4, 2026

Updated to pydantic>=2.11, as that is the lowest version we actually need.

@paulteehan
Contributor Author

Nicely done @mivds !!! Really appreciate the deep dive, it's great to have an actual explanation for this

@sonarqubecloud

sonarqubecloud bot commented Feb 5, 2026

@paulteehan paulteehan merged commit 13d41f3 into main on Feb 5, 2026
41 checks passed
@paulteehan paulteehan deleted the platl-324-fix-pydantic-issue branch February 5, 2026 17:20