Skip to content

Add default converter for timestamp with timezone#554

Merged
laughingman7743 merged 4 commits intopyathena-dev:masterfrom
gontzalm:master
Aug 6, 2024
Merged

Add default converter for timestamp with timezone#554
laughingman7743 merged 4 commits intopyathena-dev:masterfrom
gontzalm:master

Conversation

@gontzalm
Copy link

Description

Add the

  • timestamp with time zone -> _to_datetime_with_tz

key value pair to the _DEFAULT_CONVERTERS dictionary.

Context

When using the feature #360 from the dbt-athena connector with partitioned columns of timestamp with time zone type, I am getting the error

TYPE_MISMATCH: line 41:31: Cannot apply operator: timestamp(6) with time zone = varchar(30).

This is because pyathena does not convert the timestamp with time zone type (it leaves it as str) and dbt-athena compiles the SQL where clause as

where date_trunc('day',  ts) = '2008-03-01 00:00:00.000000 UTC'
    or date_trunc('day',  ts) = '2008-03-02 00:00:00.000000 UTC'
    or date_trunc('day',  ts) = '2008-03-03 00:00:00.000000 UTC'
    or ...

With this PR, the dbt-athena connector works as expected:

[0m14:52:22.461301 [info ] [MainThread]: Running with dbt=1.8.3
[0m14:52:22.929296 [info ] [MainThread]: Registered adapter: athena=1.8.2
[0m14:52:27.645255 [info ] [MainThread]: Concurrency: 8 threads (target='dev')
[0m14:52:27.659016 [info ] [Thread-1 (]: 1 of 1 START sql incremental model dbt_gontzal_finance.fct_sales ............... [RUN]
[0m14:52:47.157724 [debug] [Thread-1 (]: dbt.adapters.athena.constants adapter: CAUGHT EXCEPTION: ICEBERG_TOO_MANY_OPEN_PARTITIONS: Exceeded limit of 100 open writers for partitions. You may need to manually clean the data at location 's3://tendam-temporary/dbt-dev/tables/a3f72242-44af-46fb-bc03-427ddf1207b4' before retrying. Athena will not delete data in your account.
[0m14:52:47.158524 [debug] [Thread-1 (]: COMPILED CODE RESULT: TOO_MANY_OPEN_PARTITIONS
[0m14:52:51.380641 [debug] [Thread-1 (]: CREATE NON-PARTIONED STAGING TABLE: "awsdatacatalog"."dbt_gontzal_finance"."fct_sales__tmp_not_partitioned"
[0m14:53:20.604253 [debug] [Thread-1 (]: SQL status: OK 66952030 in 29.0 seconds
[0m14:53:20.631700 [debug] [Thread-1 (]: PARTITIONED KEYS: date_trunc('day', ts)
[0m14:53:24.235316 [debug] [Thread-1 (]: SQL status: OK -1 in 4.0 seconds
[0m14:53:26.113469 [debug] [Thread-1 (]: TOTAL PARTITIONS TO PROCESS: 5997
[0m14:53:26.117918 [debug] [Thread-1 (]: BATCHES TO PROCESS: 60
[0m14:53:26.846422 [debug] [Thread-1 (]: BATCH PROCESSING: 1 OF 60
[0m14:53:27.198886 [debug] [Thread-1 (]: On model.tendam.fct_sales: -- /* {"app": "dbt", "dbt_version": "1.8.3", "profile_name": "tendam", "target_name": "dev", "node_id": "model.tendam.fct_sales"} */
      	...
              where date_trunc('day', ts)=TIMESTAMP'2008-03-01 00:00:00+00:00' or date_trunc('day', ts)=TIMESTAMP'2008-03-02 00:00:00+00:00'
      	...
[0m14:53:37.097085 [debug] [Thread-1 (]: SQL status: OK 627918 in 10.0 seconds
[0m14:53:37.099683 [debug] [Thread-1 (]: BATCH PROCESSING: 2 OF 60

@laughingman7743
Copy link
Member

https://github.com/laughingman7743/PyAthena/blob/master/tests/pyathena/test_cursor.py#L410-L506
We will need to add tests for SELECTing timestamp with time zone types.

@laughingman7743
Copy link
Member

Add python-dateutil as a dependency.

[project]
name = "PyAthena"
description = "Python DB API 2.0 (PEP 249) client for Amazon Athena"
authors = [
    {name = "laughingman7743", email = "laughingman7743@gmail.com"},
]
dependencies = [
    "boto3>=1.26.4",
    "botocore>=1.29.4",
    "tenacity>=4.1.0",
    "fsspec",
+    "python-dateutil"
]

Add types-python-date to the development environment dependencies.

[tool.hatch.envs.default]
python = "3.11"
type = "pip-compile"
lock-filename = "requirements/requirements.txt"
pip-compile-verbose = true
pip-compile-hashes = true
pip-compile-install-args = [
    "--no-deps"
]
dependencies = [
    "wheel",
    "twine",
    "sqlalchemy>=1.0.0",
    "pandas>=1.3.0",
    "numpy>=1.26.0;python_version>=\"3.9\"",
    "numpy>=1.24.0,<1.26.0;python_version<\"3.9\"",
    "pyarrow>=7.0.0",
    "fastparquet>=0.4.0",
    "Jinja2>=3.1.0",
    "mypy>=0.900",
    "pytest>=3.5",
    "pytest-cov",
    "pytest-xdist",
    "pytest-dependency",
    "ruff>=0.1.13",
    "hatch-pip-compile",
    "sphinx",
+    "types-python-dateutil"
]

The following command will update the files under the requirements directory.

$ hatch prune
$ hatch create
$ hatch create test

Commit the changes.

@laughingman7743
Copy link
Member

laughingman7743 commented Aug 4, 2024

I think the change looks good. Please update the code format and dependencies.

$ make test                                                                                                                                                                                                                                                                                                                  2.7m  Sun Aug  4 17:26:52 2024
h
hatch run chk
cmd [1] | ruff check .
cmd [2] | ruff format --check .
73 files already formatted
cmd [3] | mypy .
Success: no issues found in 41 source files
hatch run test
================================================================================================================================================================================================ test session starts =================================================================================================================================================================================================
platform darwin -- Python 3.11.9, pytest-8.0.0, pluggy-1.4.0
rootdir: /Users/foobar/github/PyAthena
configfile: pyproject.toml
plugins: cov-4.1.0, dependency-0.6.0, xdist-3.5.0, anyio-4.2.0
8 workers [574 items]   
.............................................................................................................................................................................................................................................................................................................................................................................................................. [ 69%]
................................................................................................................................................s...............................                                                                                                                                                                                                                               [100%]
================================================================================================================================================================================================== warnings summary ==================================================================================================================================================================================================
tests/pyathena/pandas/test_cursor.py::TestPandasCursor::test_complex[pandas_cursor0-None]
tests/pyathena/pandas/test_cursor.py::TestPandasCursor::test_complex[pandas_cursor1-1000]
tests/pyathena/pandas/test_cursor.py::TestPandasCursor::test_complex[pandas_cursor2-1000000]
tests/pyathena/pandas/test_cursor.py::TestPandasCursor::test_complex_as_pandas[pandas_cursor0-None]
tests/pyathena/pandas/test_cursor.py::TestPandasCursor::test_complex_as_pandas[pandas_cursor1-1000]
tests/pyathena/pandas/test_cursor.py::TestPandasCursor::test_complex_as_pandas[pandas_cursor2-1000000]
  /Users/foobar/github/PyAthena/pyathena/pandas/result_set.py:214: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '[datetime.time(0, 0)]' has dtype incompatible with datetime64[ns], please explicitly cast to a compatible dtype first.
    df.loc[:, times] = df.loc[:, times].apply(lambda r: r.dt.time)

tests/pyathena/sqlalchemy/test_base.py: 65 warnings
  /Users/foobar/github/PyAthena/tests/pyathena/conftest.py:96: SADeprecationWarning: The dbapi() classmethod on dialect classes has been renamed to import_dbapi().  Implement an import_dbapi() classmethod directly on class <class 'pyathena.sqlalchemy.rest.AthenaRestDialect'> to remove this warning; the old .dbapi() classmethod may be maintained for backwards compatibility.
    return sqlalchemy.engine.create_engine(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform darwin, python 3.11.9-final-0 ----------
Name                                  Stmts   Miss  Cover
---------------------------------------------------------
pyathena/__init__.py                     43      9    79%
pyathena/arrow/__init__.py                0      0   100%
pyathena/arrow/async_cursor.py           42      0   100%
pyathena/arrow/converter.py              32      2    94%
pyathena/arrow/cursor.py                 90      1    99%
pyathena/arrow/result_set.py            148     14    91%
pyathena/arrow/util.py                   45      3    93%
pyathena/async_cursor.py                 54      1    98%
pyathena/common.py                      291     47    84%
pyathena/connection.py                  127     32    75%
pyathena/converter.py                    87     10    89%
pyathena/cursor.py                       72      1    99%
pyathena/error.py                        21      0   100%
pyathena/fastparquet/__init__.py          0      0   100%
pyathena/fastparquet/util.py             44      3    93%
pyathena/filesystem/__init__.py           0      0   100%
pyathena/filesystem/s3.py               581     84    86%
pyathena/filesystem/s3_object.py        315     10    97%
pyathena/formatter.py                    99      5    95%
pyathena/model.py                       490     13    97%
pyathena/pandas/__init__.py               3      0   100%
pyathena/pandas/async_cursor.py          44      0   100%
pyathena/pandas/converter.py             23      0   100%
pyathena/pandas/cursor.py                97      1    99%
pyathena/pandas/result_set.py           230     26    89%
pyathena/pandas/util.py                 160      6    96%
pyathena/result_set.py                  523     97    81%
pyathena/spark/__init__.py                0      0   100%
pyathena/spark/async_cursor.py           34      5    85%
pyathena/spark/common.py                188     48    74%
pyathena/spark/cursor.py                 32      2    94%
pyathena/sqlalchemy/__init__.py           0      0   100%
pyathena/sqlalchemy/arrow.py             15     15     0%
pyathena/sqlalchemy/base.py             526     71    87%
pyathena/sqlalchemy/pandas.py            19     19     0%
pyathena/sqlalchemy/requirements.py      98     98     0%
pyathena/sqlalchemy/rest.py               4      0   100%
pyathena/sqlalchemy/types.py             31      9    71%
pyathena/sqlalchemy/util.py               3      1    67%
pyathena/util.py                         31      1    97%
---------------------------------------------------------
TOTAL                                  4642    634    86%
Coverage HTML written to dir htmlcov

============================================================================================================================================================================== 573 passed, 1 skipped, 71 warnings in 3771.20s (1:02:51) ==============================================================================================================================================================================

Copy link
Member

@laughingman7743 laughingman7743 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants