Apache Pinot provider.yaml references missing PinotHook class #33596

@alexbegg

Apache Airflow version

Other Airflow 2 version (please specify below)

What happened

When starting Airflow (the problem seems to be in both 2.6.3 and in 2.7.0, see "How to reproduce" below) I am getting the following warning:

{providers_manager.py:253} WARNING - Exception when importing 'airflow.providers.apache.pinot.hooks.pinot.PinotHook' from 'apache-airflow-providers-apache-pinot' package
Traceback (most recent call last):
  File "/opt/bitnami/airflow/venv/lib/python3.9/site-packages/airflow/utils/module_loading.py", line 39, in import_string
    return getattr(module, class_name)
AttributeError: module 'airflow.providers.apache.pinot.hooks.pinot' has no attribute 'PinotHook'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/bitnami/airflow/venv/lib/python3.9/site-packages/airflow/providers_manager.py", line 285, in _sanity_check
    imported_class = import_string(class_name)
  File "/opt/bitnami/airflow/venv/lib/python3.9/site-packages/airflow/utils/module_loading.py", line 41, in import_string
    raise ImportError(f'Module "{module_path}" does not define a "{class_name}" attribute/class')
ImportError: Module "airflow.providers.apache.pinot.hooks.pinot" does not define a "PinotHook" attribute/class

I looked into the issue and it appears the problem is in the Apache Pinot provider. The airflow/providers/apache/pinot/provider.yaml (which is loaded by the _sanity_check in providers_manager) references a PinotHook class that does not exist:

connection-types:
  - hook-class-name: airflow.providers.apache.pinot.hooks.pinot.PinotHook
    connection-type: pinot

The module airflow.providers.apache.pinot.hooks.pinot contains PinotAdminHook and PinotDbApiHook, but not PinotHook (and the classes have been separate since before the Apache classes were split into the Apache provider).
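To make the failure mode concrete, here is a minimal sketch of the lookup that fails. It mirrors the two lines of import_string visible in the traceback above, but it is a simplified stand-in and not Airflow's actual implementation. The module name fake_pinot_hooks and the helper hook_classes are hypothetical, introduced only so the example runs without Airflow installed.

```python
import importlib
import inspect
import sys
import types


def import_string(dotted_path):
    """Resolve 'package.module.ClassName' to the object it names."""
    module_path, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_path)
    try:
        return getattr(module, class_name)
    except AttributeError:
        # Same message format as the one shown in the traceback.
        raise ImportError(
            f'Module "{module_path}" does not define a "{class_name}" attribute/class'
        )


def hook_classes(module_path):
    """Return the sorted names of classes in a module that end in 'Hook'."""
    module = importlib.import_module(module_path)
    return sorted(
        name
        for name, _obj in inspect.getmembers(module, inspect.isclass)
        if name.endswith("Hook")
    )


# Hypothetical stand-in for airflow.providers.apache.pinot.hooks.pinot:
# it defines the two real hook names, but no PinotHook.
fake = types.ModuleType("fake_pinot_hooks")
fake.PinotAdminHook = type("PinotAdminHook", (), {})
fake.PinotDbApiHook = type("PinotDbApiHook", (), {})
sys.modules["fake_pinot_hooks"] = fake

try:
    import_string("fake_pinot_hooks.PinotHook")
except ImportError as exc:
    error_message = str(exc)

# What the module actually offers instead of PinotHook.
available = hook_classes("fake_pinot_hooks")
```

With Airflow and the Pinot provider installed, pointing hook_classes at the real module path should list the hooks that provider.yaml could reference instead.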

I am willing to fix this, but I am not sure which is a better fix:

  1. I could list both classes under connection-types in provider.yaml and keep connection-type: pinot for both, but then there would be two connection types with the same name (which may not be possible?):
    connection-types:
      - hook-class-name: airflow.providers.apache.pinot.hooks.pinot.PinotAdminHook
        connection-type: pinot
      - hook-class-name: airflow.providers.apache.pinot.hooks.pinot.PinotDbApiHook
        connection-type: pinot
    • Note: create_default_connections in airflow/utils/db.py currently creates both default connections with the same conn_type="pinot":

      airflow/airflow/utils/db.py

      Lines 474 to 492 in 487b174

      merge_conn(
          Connection(
              conn_id="pinot_admin_default",
              conn_type="pinot",
              host="localhost",
              port=9000,
          ),
          session,
      )
      merge_conn(
          Connection(
              conn_id="pinot_broker_default",
              conn_type="pinot",
              host="localhost",
              port=9000,
              extra='{"endpoint": "/query", "schema": "http"}',
          ),
          session,
      )
  2. Or we could change one (or both) of the connection types to a different name. PinotAdminHook already uses a default connection name of pinot_admin_default, and PinotDbApiHook already uses a default connection name of pinot_broker_default, so it might make sense to name these connection types pinot_admin and pinot_broker:
    connection-types:
      - hook-class-name: airflow.providers.apache.pinot.hooks.pinot.PinotAdminHook
        connection-type: pinot_admin
      - hook-class-name: airflow.providers.apache.pinot.hooks.pinot.PinotDbApiHook
        connection-type: pinot_broker
    • I think we will need to change create_default_connections in airflow/utils/db.py (as shown above) if we end up changing the connection types. Possibly other places too, but I have not seen any other references to conn_type="pinot" besides the default connections.
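The difference between the two options can be sketched as follows, assuming a registry that allows only one hook per connection type and lets the first registration win. Whether Airflow's providers manager really behaves this way is the open question in option 1, so treat build_registry as a hypothetical illustration, not as the actual lookup code.

```python
# The two proposals, written as the parsed form of the provider.yaml entries.
PREFIX = "airflow.providers.apache.pinot.hooks.pinot."
OPTION_1 = [
    {"hook-class-name": PREFIX + "PinotAdminHook", "connection-type": "pinot"},
    {"hook-class-name": PREFIX + "PinotDbApiHook", "connection-type": "pinot"},
]
OPTION_2 = [
    {"hook-class-name": PREFIX + "PinotAdminHook", "connection-type": "pinot_admin"},
    {"hook-class-name": PREFIX + "PinotDbApiHook", "connection-type": "pinot_broker"},
]


def build_registry(entries):
    """Map connection type -> hook class name and collect name collisions.

    Assumption (not confirmed against Airflow): one hook per connection
    type, and the first registration wins.
    """
    registry = {}
    collisions = []
    for entry in entries:
        conn_type = entry["connection-type"]
        hook = entry["hook-class-name"]
        if conn_type in registry:
            # A second hook asked for an already-taken connection type.
            collisions.append((conn_type, hook))
            continue
        registry[conn_type] = hook
    return registry, collisions


registry_1, collisions_1 = build_registry(OPTION_1)
registry_2, collisions_2 = build_registry(OPTION_2)
```

Under that assumption, option 1 leaves PinotDbApiHook unreachable by connection type, while option 2 registers both hooks but changes the conn_type values that existing connections would need.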

Thoughts on which approach is better / less disruptive to users?

What you think should happen instead

When starting Airflow the _sanity_check in providers_manager should not trigger a warning. Both connection types should be useable.

How to reproduce

I am seeing this every time I start Airflow with the Docker image bitnami/airflow:2.6.3. I will also test with Breeze and other setups to see whether the same warning appears in 2.7.0, but since the offending line of code is unchanged in the main branch, I am sure this is still an issue.

Operating System

Debian GNU/Linux 11 (bullseye)

Versions of Apache Airflow Providers

apache-airflow-providers-apache-pinot==4.1.1

Deployment

Docker-Compose

Deployment details

Docker image bitnami/airflow:2.6.3 in Docker Compose (this is the latest Airflow version for Bitnami's image; they have not yet pushed a 2.7.0 version)

Anything else

Issue #28790 is related, as it mentions that both hooks of the Apache Pinot provider are missing conn_type.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
