Skip to content

Conversation

@blcksrx
Copy link
Contributor

@blcksrx blcksrx commented Oct 28, 2023

closes: #35012


Enhancement on this PR:

  • Support all the other connection parameters that exists on the pydruid.connect in the get_conn method.
  • Connections documentation
  • Fix return values

Note:
The issue mentions about get_pandas_df method also. Since this is a DBAPIHook and the connection uses PEP8 DB API, there should be no problem to not override it from the DBAPIHook.
In addition, back in 2019 I wrote a test function for this specific method:

def test_get_pandas_df(self):
statement = "SQL"
column = "col"
result_sets = [("row1",), ("row2",)]
self.cur.description = [(column,)]
self.cur.fetchall.return_value = result_sets
df = self.db_hook().get_pandas_df(statement)
assert column == df.columns[0]
for i, item in enumerate(result_sets):
assert item[0] == df.values.tolist()[i][0]
assert self.conn.close.call_count == 1
assert self.cur.close.call_count == 1
self.cur.execute.assert_called_once_with(statement)

^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

Comment on lines +21 to +26
Apache Airflow implemented connection to Apache Druid in different ways that listed here:
* Connection that uses the druid ingestion type that makes it possible to submit index/ingestion jobs.
To use the ingestion job, choose ``Druid Ingest Connection``
* Connection that uses to Query data using SQL.
To Query data use ``Druid Broker Connection``

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks kind of hard to read in additional to repeatable bullet lists

image

If the difference mentioned into the connections, maybe better remove section and keep only toctree? WDYT @eladkal @blcksrx

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the page would be empty, I believe it it good to be for the context and reduce the risk of confusion

default_conn_name = "druid_broker_default"
conn_type = "druid"
hook_name = "Druid"
conn_type = "druid_broker"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume that this might be a breaking changes. Because if you rename conn_type then previously created connection IDs will keep conn_type=druid as result Airflow UI have no idea how to deal with this connection type

image

And fallback to the first available Connection type into the list

image

So we need to write down information into the change log how to resolve this situation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. How about writing a alembic migration for that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not have mechanism to run alembic when upgrading provider version.

I think noting it the changelog and making it a major release with breaking change should be enough.
cc @potiuk WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or adding backwards compatibility to handle both cases - I think it should be possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem that DruidDbApiHook use DbApiHook and this might be the single place outside of the UI where conn_type also matter. And it could be hard, but not impossible to make it by backward compatible. Better just keep conn_type="druid" since it required almost zero effort.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed the easiest way would be switch to druid. how about also adding a warning that the name would be change in the next major release?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can decide on deprecating first or on cutting major release immediately.
We always prefer to deprecate first if we can.

Connection(
conn_id="druid_broker_default",
conn_type="druid",
conn_type="druid_broker",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunetly this not affect already existed connection ID

Connection(
conn_id="druid_ingest_default",
conn_type="druid",
conn_type="druid_ingest",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunetly this not affect already existed connection ID

@eladkal
Copy link
Contributor

eladkal commented Jan 7, 2024

@blcksrx are you planning completing this PR?

@blcksrx
Copy link
Contributor Author

blcksrx commented Jan 7, 2024 via email

@eladkal
Copy link
Contributor

eladkal commented Jan 7, 2024

Actually i asked a question but didn’t got any answers

Which question?

@blcksrx
Copy link
Contributor Author

blcksrx commented Jan 7, 2024 via email

@eladkal
Copy link
Contributor

eladkal commented Jan 7, 2024

About warning or decision about deprecation on the Major release

I answered about it in #35244 (comment)
Is there a followup question?

@blcksrx
Copy link
Contributor Author

blcksrx commented Jan 7, 2024 via email

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Feb 22, 2024
@github-actions github-actions bot closed this Feb 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:providers kind:documentation provider:apache-druid stale Stale PRs per the .github/workflows/stale.yml policy file

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add new functions to DruidDbApiHook

4 participants