[Bug] [hdfs-connector] Timestamp type column can't read #1334

@liumengkai

Description

Search before asking

  • I had searched in the issues and found no similar issues.

What happened

master branch

orc filetype
Timestamp columns written with the hdfs-connector cannot be read back.

error message:

Caused by: java.lang.RuntimeException: ORC split generation failed with exception: org.apache.orc.impl.SchemaEvolution$IllegalEvolutionException: ORC does not support type conversion from file type struct<nanos:int> (1) to reader type timestamp (1)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1851)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1939)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.generateWrappedSplits(FetchOperator.java:425)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:395)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:314)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:540)
	... 16 more
Caused by: java.util.concurrent.ExecutionException: org.apache.orc.impl.SchemaEvolution$IllegalEvolutionException: ORC does not support type conversion from file type struct<nanos:int> (1) to reader type timestamp (1)
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1845)
	... 21 more
Caused by: org.apache.orc.impl.SchemaEvolution$IllegalEvolutionException: ORC does not support type conversion from file type struct<nanos:int> (1) to reader type timestamp (1)
	at org.apache.orc.impl.SchemaEvolution.buildConversion(SchemaEvolution.java:559)
	at org.apache.orc.impl.SchemaEvolution.buildConversion(SchemaEvolution.java:528)
	at org.apache.orc.impl.SchemaEvolution.<init>(SchemaEvolution.java:123)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.populateAndCacheStripeDetails(OrcInputFormat.java:1669)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.callInternal(OrcInputFormat.java:1533)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.access$2700(OrcInputFormat.java:1329)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:1513)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:1510)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:1510)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:1329)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
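
The exception indicates that the written ORC file records the create_time column as struct<nanos:int>, while the reader schema derived from the Hive table declares timestamp, so split generation fails before any rows are read. One way to confirm the file-level schema is to dump the file's metadata (the file name below is a placeholder):

hive --orcfiledump /user/hive/warehouse/1020_test/<orc-file>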

Hive table structure:

CREATE TABLE `1020_test`(
  `create_time` timestamp)
stored as orc
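
Reading the table back then fails during split generation; the FetchOperator frames in the trace suggest even a plain fetch query is enough to trigger it, for example:

select create_time from `1020_test`;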

chunjun json:

{
  "job": {
    "setting": {
      "speed": {
        "channel": 1,
        "bytes": 0
      },
      "errorLimit": {
        "record": 0
      },
      "restore": {
        "isStream": false,
        "isRestore": false,
        "restoreColumnName": "",
        "restoreColumnIndex": 0,
        "maxRowNumForCheckpoint": 0
      },
      "log": {
        "isLogger": false,
        "level": "info",
        "path": "",
        "pattern": ""
      }
    },
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "username": "${mysql_username}",
            "password": "${mysql_password}",
            "connection": [
              {
                "jdbcUrl": [
                  "jdbc:mysql://${mysql_server}:3306/test?useUnicode=true&characterEncoding=utf-8&useSSL=false"
                ],
                "table": [
                  "dict"
                ]
              }
            ],
            "column": [
              {
                "name": "create_time",
                "type": "timestamp"
              }
            ],
            "customSql": "",
            "where": "",
            "queryTimeOut": 1000,
            "requestAccumulatorInterval": 2,
            "startLocation": "0",
            "polling": false,
            "pollingInterval": 3000
          }
        },
        "writer": {
          "name": "hdfswriter",
          "parameter": {
            "path": "/user/hive/warehouse/1020_test",
            "column": [
              {
                "name": "create_time",
                "type": "timestamp"
              }
            ],
            "writeMode": "overwrite",
            "fileType": "orc",
            "encoding": "utf-8",
            "fieldDelimiter": "\u0001",
            "defaultFS": "hdfs://127.0.0.1:9000"
          }
        }
      }
    ]
  }
}

What you expected to happen

org.apache.orc.impl.SchemaEvolution$IllegalEvolutionException: ORC does not support type conversion from file type struct<nanos:int> (1) to reader type timestamp (1)

It looks like the Java timestamp type cannot be written directly as the ORC timestamp type, but I did some research and did not find a more appropriate type.
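
For comparison, here is a minimal sketch of writing the same column through ORC's own writer API (orc-core plus hive-storage-api), with a made-up output path and value. TimestampColumnVector accepts java.sql.Timestamp directly, so in principle the connector can map the Java timestamp onto ORC's native timestamp type rather than a struct:

import java.sql.Timestamp;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.vector.TimestampColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
import org.apache.orc.OrcFile;
import org.apache.orc.TypeDescription;
import org.apache.orc.Writer;

public class OrcTimestampWriteDemo {
    public static void main(String[] args) throws Exception {
        // Declare the column with ORC's native timestamp type, matching the Hive DDL.
        TypeDescription schema = TypeDescription.fromString("struct<create_time:timestamp>");

        Configuration conf = new Configuration();
        Writer writer = OrcFile.createWriter(
                new Path("/tmp/1020_test/demo.orc"),   // placeholder output path
                OrcFile.writerOptions(conf).setSchema(schema));

        VectorizedRowBatch batch = schema.createRowBatch();
        TimestampColumnVector col = (TimestampColumnVector) batch.cols[0];

        // TimestampColumnVector stores millis + nanos internally, so a
        // java.sql.Timestamp can be set on it without manual conversion.
        col.set(batch.size++, Timestamp.valueOf("2022-10-20 12:00:00"));

        writer.addRowBatch(batch);
        writer.close();
    }
}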

How to reproduce

As above.

Anything else

I'm willing to make a PR, but before I try, I'd like to discuss it a little first.

Version

master

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

  • I agree to follow this project's Code of Conduct