Skip to content

Conversation

@jbbqqf
Copy link
Contributor

@jbbqqf jbbqqf commented Apr 15, 2023

Yet another operator.

I added that transfer operator by mimicry with the existing BigQueryToMySqlOperator.

I am aware this operator adds duplication with BigQueryToMySqlOperator and BigQueryToMsSqlOperator. The implementation is basically the same. On the other hand, I feel that factorizing that code would make it unreadable.

The BigQueryToMySqlOperator had a deprecated argument: delegate_to. I removed it from the Postgres adaptation.

I noticed the BigQueryToMySqlOperator is referenced in airflow/contrib/operators/__init__.py (__deprecated_classes) and in tests/always/test_project_structure.py (MISSING_EXAMPLES_FOR_CLASSES). I am not 100% sure of what those references mean but I guess the new BigQueryToPostgresOperator doesn't need to be referenced here.

@boring-cyborg boring-cyborg bot added area:providers kind:documentation provider:google Google (including GCP) related issues labels Apr 15, 2023
@boring-cyborg
Copy link

boring-cyborg bot commented Apr 15, 2023

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

Copy link
Member

@pankajastro pankajastro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

usually, we keep a transfer operator in Google provider if data move from some services to Google services like from Postgres to Bigquery or from s3 to Bigquery/GCS so not sure if Google provider is the right place for this transfer operator.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might not catch something like dataset.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I refactored the implementation. I am now using a BaseBigQueryToSqlOperator. In order to remain as backward compatible as possible, I used the old implementation tied to the MySQL operator.

If the user does not provide a correct input it will fail at runtime anyway (in the execute method).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why fmt off?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially copy/pasted the MySQL implementation without thinking too much about it. After digging in the commit history it looks like those fmt directives are old artifacts that can be removed with no side effect, which I did.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should also add a test to check bigquery_get_data has been called correctly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current assertion already ensures bigquery_get_data has been called as a side effect:

        mock_hook.return_value.list_rows.assert_called_once_with(

list_rows comes from bigquery_get_data.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing docs string for this param

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed those missing docstrings

@eladkal
Copy link
Contributor

eladkal commented Apr 15, 2023

I am aware this operator adds duplication with BigQueryToMySqlOperator and BigQueryToMsSqlOperator. The implementation is basically the same. On the other hand, I feel that factorizing that code would make it unreadable.

The BigQueryToMySqlOperator had a deprecated argument: delegate_to. I removed it from the Postgres adaptation.

We should not duplicate code.

We should have either BaseBigQueryToSqlOperator then each flavour DB can inherit and do the specific things or preferably have a single BigQueryToSqlOperator that knows to handle any db hook that uses DbApiHook.

As for the delegate_to. The next provider release is a breaking one. We can handle it now. It's simple as removing the warnings added in #29088 and remove the parameter all together @shahar1 maybe you are up for it?
but even if the parameter stays moving to generalized solution makes this a no concern.

@shahar1
Copy link
Contributor

shahar1 commented Apr 15, 2023

I am aware this operator adds duplication with BigQueryToMySqlOperator and BigQueryToMsSqlOperator. The implementation is basically the same. On the other hand, I feel that factorizing that code would make it unreadable.

The BigQueryToMySqlOperator had a deprecated argument: delegate_to. I removed it from the Postgres adaptation.

We should not duplicate code.

We should have either BaseBigQueryToSqlOperator then each flavour DB can inherit and do the specific things or preferably have a single BigQueryToSqlOperator that knows to handle any db hook that uses DbApiHook.

As for the delegate_to. The next provider release is a breaking one. We can handle it now. It's simple as removing the warnings added in #29088 and remove the parameter all together @shahar1 maybe you are up for it? but even if the parameter stays moving to generalized solution makes this a no concern.

I'm up to it, will happen during this week

@jbbqqf
Copy link
Contributor Author

jbbqqf commented Apr 16, 2023

usually, we keep a transfer operator in Google provider if data move from some services to Google services like from Postgres to Bigquery or from s3 to Bigquery/GCS so not sure if Google provider is the right place for this transfer operator.

I am new to AirFlow so I don't have an opinion on this. I added the module next to other similar operators (BigQueryToMySqlOperator and BigQueryToMsSqlOperator). Did you notice those operators?

@potiuk
Copy link
Member

potiuk commented Apr 16, 2023

usually, we keep a transfer operator in Google provider if data move from some services to Google services like from Postgres to Bigquery or from s3 to Bigquery/GCS so not sure if Google provider is the right place for this transfer operator.

I am new to AirFlow so I don't have an opinion on this. I added the module next to other similar operators (BigQueryToMySqlOperator and BigQueryToMsSqlOperator). Did you notice those operators?

Summarizing: yes the current location is good.

cc: @pankajastro : The rule is a bit more complex here and not only based on whether the provider is on the "To" or "From" side of the transfer. In most cases, yes, the "To" side is more important, but not for the reason you think.

The reason is that usually the "To" side (means - service) is more interested in receiving the data and is more lilely to be maintained by whoever puts the focus on the "to" provider. But it only makes sense if you have provider like "S3 <=> GCS" - where you have Amazon on one side and Google on the other. In this case Google is more insterested to maintain S3 To GCS and Amazon is more interested in GCS to S3 for obvious reasons. That's why the first one should be in Google provider and the other in the Amazon one - because respectively Google and Amazon will be interested in maintaining those.

For Bigquery to MySQL or Postgres , this is not as easy - because there is no "service provider" who would be interested in getting data to Postgres / MySQL, because those are generic databases, and while they have soime "stakeholders" those stakeholders do not own the service that they would be interested to get data in. And in this case Google is determined to be the "more likely to maintain it".

This is succintly described in our CONTRIBUTING.rst:

  • It is often debatable where to put transfer operators but we agreed to the following criteria:
    • We use "maintainability" of the operators as the main criteria - so the transfer operator
      should be kept at the provider which has highest "interest" in the transfer operator
    • For Cloud Providers or Service providers that usually means that the transfer operators
      should land at the "target" side of the transfer

@jbbqqf jbbqqf force-pushed the add-bigquery-to-postgres-operator branch 2 times, most recently from 9458e98 to 6aeb2d1 Compare April 16, 2023 23:02
@jbbqqf
Copy link
Contributor Author

jbbqqf commented Apr 16, 2023

We should not duplicate code.

We should have either BaseBigQueryToSqlOperator then each flavour DB can inherit and do the specific things or preferably have a single BigQueryToSqlOperator that knows to handle any db hook that uses DbApiHook.

@eladkal ok, deal. I refactored the two twin operators with a BaseBigQueryToSqlOperator and used it to implement the new BigQueryToPostgresOperator. I had to make some tweaks though to remain backward compatible because of subtle differences.

I took the opportunity to factorize docstrings. That said, I'm not 100% sure that documenting only two parameters in BigQueryToPostgresOperator is the way to go, considering some mandatory parameters handled (and documented) in the parent class, BaseBigQueryToSqlOperator, are required. You tell me.

The generated documentation looks fine. Here is an example (I also checked the other impacted pages):
image

The only tests I have not been able to run with Breeze are the example dags in tests/system/providers/google/cloud/bigquery/. The command breeze testing tests tests/system/providers/google/cloud/bigquery/ doesn't seem to do the trick.

@jbbqqf jbbqqf requested a review from pankajastro April 16, 2023 23:19
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jbbqqf - just noticed we are getting the class name split across multiple lines in the documentation, is there anything we could do to fix it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@phanikumv I don't want to fix this issue in this PR. The diff is not the cause of the issue from what I can see on Airflow's latest online documentation (https://airflow.apache.org/docs/apache-airflow-providers-google/stable/_api/airflow/providers/google/cloud/transfers/bigquery_to_mysql/index.html).

That said, this is how I would have done it:

image

@pankajastro
Copy link
Member

usually, we keep a transfer operator in Google provider if data move from some services to Google services like from Postgres to Bigquery or from s3 to Bigquery/GCS so not sure if Google provider is the right place for this transfer operator.

I am new to AirFlow so I don't have an opinion on this. I added the module next to other similar operators (BigQueryToMySqlOperator and BigQueryToMsSqlOperator). Did you notice those operators?

Summarizing: yes the current location is good.

cc: @pankajastro : The rule is a bit more complex here and not only based on whether the provider is on the "To" or "From" side of the transfer. In most cases, yes, the "To" side is more important, but not for the reason you think.

The reason is that usually the "To" side (means - service) is more interested in receiving the data and is more lilely to be maintained by whoever puts the focus on the "to" provider. But it only makes sense if you have provider like "S3 <=> GCS" - where you have Amazon on one side and Google on the other. In this case Google is more insterested to maintain S3 To GCS and Amazon is more interested in GCS to S3 for obvious reasons. That's why the first one should be in Google provider and the other in the Amazon one - because respectively Google and Amazon will be interested in maintaining those.

For Bigquery to MySQL or Postgres , this is not as easy - because there is no "service provider" who would be interested in getting data to Postgres / MySQL, because those are generic databases, and while they have soime "stakeholders" those stakeholders do not own the service that they would be interested to get data in. And in this case Google is determined to be the "more likely to maintain it".

This is succintly described in our CONTRIBUTING.rst:

  • It is often debatable where to put transfer operators but we agreed to the following criteria:

    • We use "maintainability" of the operators as the main criteria - so the transfer operator
      should be kept at the provider which has highest "interest" in the transfer operator
    • For Cloud Providers or Service providers that usually means that the transfer operators
      should land at the "target" side of the transfer

thanks for the detailed explanation makes sense 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it makes sense to combine Postgres and MySQL system test here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what you'd suggest

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting it due to the similarities between the files?
While it may improve running time, I would rather keep the test files specific to their intended purpose. I believe that combining different SQL providers into a single file may cause confusion when trying to identify the source of an operator failure during debugging.
Separating them into distinct files should make the debugging process more manageable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left as is

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. Better to keep them separate. In this case usability of those examples trumps DRY

Copy link
Contributor

@eladkal eladkal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting request changes just to make sure #30748 is merged before this PR so the delegate_to parameter won't be in BaseBigQueryToSqlOperator

@eladkal
Copy link
Contributor

eladkal commented Apr 21, 2023

@jbbqqf you can now rebase and the delegate_to won't be an issue as it was removed in main

@jbbqqf jbbqqf force-pushed the add-bigquery-to-postgres-operator branch from 6aeb2d1 to c744f08 Compare April 23, 2023 09:29
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was removed from main. This parameter doesn't exist any more

@jbbqqf jbbqqf force-pushed the add-bigquery-to-postgres-operator branch 2 times, most recently from 6e711ea to eaecfb1 Compare April 23, 2023 17:44
@jbbqqf
Copy link
Contributor Author

jbbqqf commented Apr 23, 2023

I made everything I could with the feedback I've received on this PR.

The code is factorized so that the new PostgresOperator is not a copy/paste version of the Mysql one. The documentation is also as factorized as it can be.

I rebased and removed the delegate_to parameter. I eventually managed to make static checks + build pass thanks to breeze. All CI checks designed to make sure no doc / tests are forgotten do pass.

I built the documentation locally. It looks fine.

I tested operators with a local airflow / postgresql / mysql + test gcp project. Basic operations work.

By basic operation I mean I tested those tasks:

    bigquery_to_mysql = BigQueryToMySqlOperator(
        task_id="bigquery_to_mysql",
        dataset_table="dataset.table_42",
        mysql_table="{{ var.value.get('my_var', 'test') }}",
        replace=False,
    )

    bigquery_to_postgres = BigQueryToPostgresOperator(
        task_id="bigquery_to_postgres",
        dataset_table="dataset.table_42",
        postgres_table="{{ var.value.get('my_var', 'test') }}",
        replace=False,
    )

During my tests I noticed there was an error with template_fields. It should now be fixed.

I performed my local tests with a postgres and a mysql database. I didn't manually test the mssql operator.

Is there some additional tests I should perform that would cover common pitfalls?

@eladkal you seem to have some perspective on the airflow project. What is the next step? How many people should validate the PR so that it can be merged? 🙂

@potiuk
Copy link
Member

potiuk commented Apr 23, 2023

@eladkal you seem to have some perspective on the airflow project. What is the next step? How many people should validate the PR so that it can be merged? 🙂

To start - maybe get all the tests succeed :D (they are failing currently)?

@potiuk
Copy link
Member

potiuk commented Apr 23, 2023

And the second thing -> wait for a maintainer to merge it. And sometimes be patient with it @jbbqqf. Yes it takes sometimes a number of passes, multiple builds, conflict resolving and sometimes even few positive reviews to pass.

All that so that your change will be good, reviewed, tested, and being ready for anyone else to take over and develop further. Yes it can be annoying but if you consider that you might disappear tomorrow and there are a number of people here who will take over the maintenance of the code with confidence.

I hope you understand all that.

Copy link
Contributor

@eladkal eladkal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks OK to me.
left one comment

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's best to deprecate this parameter and replace it with a unified parameter in the base class (target_table_name)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I changed that.

Comment on lines 35 to 36
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd emphasize here that it's a base operator

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 99 to 100
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a docstring and remove the pass

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion it is unnecessary because concrete implementations are self-explanatory.

I have nonetheless taken into account your remark by adding a docstring.

Copy link
Contributor

@shahar1 shahar1 Apr 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edited to explain myself better

I understand your opinion that it might seem redundant, however:

  1. Sometimes methods that look trivial might do something completely different than what you think they do - especially in an open source where many people with different code styles take part.
  2. Putting opinions aside, it's required by PEP 257, at least for public methods :)

Therefore, I suggest getting used to writing docstrings starting with self-explanatory methods, even as a preparation for more complicated functions. That being said, for simple methods - one line could do.

Copy link
Contributor Author

@jbbqqf jbbqqf May 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Public methods (including the init constructor) should also have docstrings.

The wording of PEP 257 implies it's optional.

In Clean Code, Robert Martin enumerates a collection of smells and heuristics including:

C3: Redundant Comment

A comment is redundant if it describes something that adequately describes itself. For example:
i++; // increment i

Another example is a Javadoc that says nothing more than (or even less than) the function signature.

Comments should say things that the code cannot say for itself.

As suggested, I put my opinion aside and described persist_links with """This function persists the connection to the SQL provider""".

Comment on lines 102 to 103
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a docstring

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As is it trivial to reference all children implementation / calls to this function the best added value I could provide was to inform future readers what the history of this method is.

If you have another opinion, can you propose a docstring you think would fit?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't mean to overcomplicate :)
A one-liner describing what this function does would do better (e.g.,: This function persists the connection to the SQL provider). The history of this method in this context seems irrelevant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting it due to the similarities between the files?
While it may improve running time, I would rather keep the test files specific to their intended purpose. I believe that combining different SQL providers into a single file may cause confusion when trying to identify the source of an operator failure during debugging.
Separating them into distinct files should make the debugging process more manageable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could pass source_project_dataset_table directly to dataset_table

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you compare the old vs new version, it would introduce a slight difference of behavior between the two versions. It can lead to production issues.

I'd keep it this way because, unless I am mistaken, this version achieves 100% backward compatibility.

I agree that it doesn't make sense to have two distinct parameters for this use case. I searched in the google provider for helper functions to parse project.dataset.table vs dataset.table. I didn't find any.

If you insist on deprecating something here I'd vote to deprecate it in another PR because this one is already becoming quite complex to manage / review ✌️

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, now I realize that I missed the project definition in the dataset_table :)
It could wait for another time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you 🙏

@jbbqqf jbbqqf force-pushed the add-bigquery-to-postgres-operator branch from eaecfb1 to 38fc173 Compare April 29, 2023 13:14
@jbbqqf
Copy link
Contributor Author

jbbqqf commented Apr 29, 2023

To start - maybe get all the tests succeed :D (they are failing currently)?

Touché 😉

@potiuk Actually when I posted my comment, CI was all green and I thought for a moment I had solved the issue. As I can see it takes time so that all pipelines finish.

I tried to run those tests with breeze but I can't manage to find the appropriate command that would specifically target the modified operators (running static checks with breeze is OK though). Can you suggest arguments I should pass to this tool to understand locally what's going on ? (CI failures don't seem to output verbose enough logs)

And the second thing -> wait for a maintainer to merge it. And sometimes be patient with it @jbbqqf. Yes it takes sometimes a number of passes, multiple builds, conflict resolving and sometimes even few positive reviews to pass.

All that so that your change will be good, reviewed, tested, and being ready for anyone else to take over and develop further. Yes it can be annoying but if you consider that you might disappear tomorrow and there are a number of people here who will take over the maintenance of the code with confidence.

I hope you understand all that.

Sure 👍

I read in Airflow's contributing guide that being pushy with maintainers was interpreted well, as a sign of interest. This was my intent. I will be patient as long as necessary.

I appreciate your efforts to demonstrate diplomacy.

Reference:

https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst

Ping @ #development slack, comment @people. Be annoying. Be considerate.

It's your responsibility as an author to ping committers to review your PR - be mildly annoying sometimes, it's OK to be slightly annoying with your change - it is also a sign for committers that you care

@jbbqqf jbbqqf requested a review from eladkal April 29, 2023 13:27
@jbbqqf jbbqqf requested a review from shahar1 April 29, 2023 13:27
@potiuk
Copy link
Member

potiuk commented Apr 29, 2023

@potiuk Actually when I posted my comment, CI was all green and I thought for a moment I had solved the issue. As I can see it takes time so that all pipelines finish.

Complain to GitHub - not to us, this is an isue I reported to them ~ 2 years ago and there is no fix in sight.

I tried to run those tests with breeze but I can't manage to find the appropriate command that would specifically target the modified operators (running static checks with breeze is OK though). Can you suggest arguments I should pass to this tool to understand locally what's going on ? (CI failures don't seem to output verbose enough logs)

All the test types of airflow, including the way how to distribute them is decribed in https://github.com/apache/airflow/blob/main/TESTING.rst

Generally speaking - seems that unit tests are failing for you and you should be able to follow "unit test" chapter there. If you follow the instructions there, it should help.

Also Breeze commands are described in https://github.com/apache/airflow/blob/main/BREEZE.rst - including help and screen outputs. You can also run --help and --dry-run with every breeze command. The first will tell you what options you have, the second will tell you exactly what command breeze runs under the hood and it will print copy-pasteable command that you can run and modify to replicate what you do. in CI you can see what is being run by unfolding the steps you see and you will see what is going on and which command is run.

I read in Airflow's contributing guide that being pushy with maintainers was interpreted well, as a sign of interest. This was my intent. I will be patient as long as necessary.

It's your responsibility as an author to ping committers to review your PR - be mildly annoying sometimes, it's OK to be slightly annoying with your change - it is also a sign for committers that you care

There is a word "considerate". Have you considered that people responding here have likley even 100s of PRs to look at, while you have one to contribute? and that they are often doing it in their free time and people are demanding attention becasue they think they have to get response NOW? If not - you can take a look at my talk about empathy:

https://airflowsummit.org/sessions/2022/hey-maintainer-exercise-your-empathy/

Mildly annoying does not mean unempathettic.

@jbbqqf jbbqqf force-pushed the add-bigquery-to-postgres-operator branch from 38fc173 to c459c42 Compare May 1, 2023 09:56
Comment on lines 62 to 70
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets use AirflowProviderDeprecationWarning

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets not do target_table_name or mssql_table
it makes it harder to understand where the "real" value is.
we should handle the value as part of the deprecation check. If user passed mssql_table then we also assign it to self. target_table_name

Copy link
Contributor Author

@jbbqqf jbbqqf May 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure to understand what you have in mind. Something like this?

    selected_target_table_name = target_table_name

    if mssql_table is not None:
            selected_target_table_name = mssql_table
            warnings.warn(
                "The `mssql_table` parameter has been deprecated. Use `target_table_name` instead.",
                DeprecationWarning,
            )

        super().__init__(
            target_table_name=selected_target_table_name,
            dataset_table=f"{dataset_id}.{table_id}",
            **kwargs,
        )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like

if channel:
warnings.warn(
"Argument `channel` is deprecated and will removed in a future releases. "
"Please use `channels` instead.",
DeprecationWarning,
stacklevel=2,
)
if channels:
raise ValueError(f"Cannot set both arguments: channel={channel!r} and channels={channels!r}.")
channels = channel
self.channels = channels

basically once you pass that block

New parameter takes precedence but if older parameter is supplied we will assign it to the new parameter + raise warning.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class is missing from the added docs. is this intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was inspired by the implementation of BaseSQLToGCSOperator. This is also an abstract class which is not documented either.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm asking about the BigQueryToMsSqlOperator not on the base class :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it is intentional. There was a slight difference between the MySQL documentation and the MsSQL documentation. I standardized the location of the example snippet. I used the MsSQL version. It did not require to change the MsSQL version, but this is why there is a change on the MySQL version.

As for the new interface (s/mssql_table/target_table_name/), it has been changed directly in example_bigquery_to_mssql.py.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm asking about the rst files.

docs/apache-airflow-providers-google/operators/transfer/bigquery_to_mssql.rst
Is missing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbbqqf can you add docs/apache-airflow-providers-google/operators/transfer/bigquery_to_mssql.rst
?

@jbbqqf jbbqqf force-pushed the add-bigquery-to-postgres-operator branch 3 times, most recently from b50b591 to d483676 Compare May 4, 2023 19:28
@jbbqqf jbbqqf requested a review from eladkal May 4, 2023 19:30
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Copying data from one BigQuery table to another is performed with the
Copying data from BigQuery table to Postgres table is performed with the

Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small NIT: in the docs, otherwise it looks good to me - @eladkal ?

@jbbqqf jbbqqf force-pushed the add-bigquery-to-postgres-operator branch from d483676 to 4948e8a Compare May 8, 2023 19:23
@jbbqqf jbbqqf force-pushed the add-bigquery-to-postgres-operator branch from 4948e8a to 0bed8a8 Compare May 14, 2023 08:34
@eladkal
Copy link
Contributor

eladkal commented May 15, 2023

@jbbqqf can you add docs/apache-airflow-providers-google/operators/transfer/bigquery_to_mssql.rst

@jbbqqf
Copy link
Contributor Author

jbbqqf commented May 16, 2023

@jbbqqf can you add docs/apache-airflow-providers-google/operators/transfer/bigquery_to_mssql.rst

@eladkal I don't understand what you mean. This file already exist. I let it as is.

@eladkal
Copy link
Contributor

eladkal commented May 16, 2023

@eladkal I don't understand what you mean. This file already exist. I let it as is.

Right. sorry!
Note to self: not review code when it's late!

Copy link
Contributor

@eladkal eladkal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

will merge when CI is green

@eladkal eladkal merged commit cf1e26b into apache:main May 16, 2023
@boring-cyborg
Copy link

boring-cyborg bot commented May 16, 2023

Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.

@jbbqqf
Copy link
Contributor Author

jbbqqf commented May 16, 2023

So glad to officially become an Airflow contributor ❤️

My thank to you guys. Special mention to @eladkal @potiuk @shahar1

@potiuk
Copy link
Member

potiuk commented May 16, 2023

looking for next ones :)

@shahar1
Copy link
Contributor

shahar1 commented May 16, 2023

So glad to officially become an Airflow contributor ❤️

My thank to you guys. Special mention to @eladkal @potiuk @shahar1

Welcome to the club, well done :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:providers kind:documentation provider:google Google (including GCP) related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants