Bug in JavaDateFormatter when using DOY #89096

@HayDegha0917

Elasticsearch Version

7.10.2

Installed Plugins

No response

Java Version

1.8

OS Version

CentOS

Problem Description

When a mapping defines a date field that uses a day-of-year (DOY) format, data gets indexed correctly. However, we run into issues when using the rounding parser. See the steps below for a test case.

We believe the issue is related to this section of code:

https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/common/time/JavaDateFormatter.java#L48-L59
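The suspected mechanism can be reproduced outside Elasticsearch with plain `java.time`. The sketch below is not the Elasticsearch code; it assumes the rounding parser adds default values such as month 1 and day-of-month 1 to the formatter, and the class and method names are hypothetical. With those defaults in place, the year, month, and day resolve to January 1st, which then conflicts with the parsed day-of-year.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoField;
import java.util.Locale;

// Hypothetical demo class; it only uses the JDK, not Elasticsearch.
public class DoyRoundUpSketch {
    static final String PATTERN = "yyyy-DDD'T'HH:mm:ss.SSS";

    // Plain formatter: no defaulted fields, so day-of-year alone determines the date.
    static DateTimeFormatter plain() {
        return new DateTimeFormatterBuilder()
            .appendPattern(PATTERN)
            .toFormatter(Locale.ROOT);
    }

    // Assumption: stand-in for a round-up parser that defaults month and
    // day-of-month to 1 when the pattern did not parse them.
    static DateTimeFormatter withCalendarDefaults() {
        return new DateTimeFormatterBuilder()
            .appendPattern(PATTERN)
            .parseDefaulting(ChronoField.MONTH_OF_YEAR, 1)
            .parseDefaulting(ChronoField.DAY_OF_MONTH, 1)
            .toFormatter(Locale.ROOT);
    }

    // Returns the parsed LocalDateTime as text, or the parse error message.
    static String parseOrError(DateTimeFormatter formatter, String text) {
        try {
            return LocalDateTime.from(formatter.parse(text)).toString();
        } catch (DateTimeParseException e) {
            return "ERROR: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        String text = "2022-104T14:08:31.355";
        // Day 104 of 2022 is April 14th, so this parses cleanly.
        System.out.println(parseOrError(plain(), text));
        // The defaults resolve to 2022-01-01, which disagrees with day-of-year 104.
        System.out.println(parseOrError(withCalendarDefaults(), text));
    }
}
```

If this matches what the rounding parser does, a fix would probably need to avoid defaulting month and day-of-month when the pattern already supplies day-of-year.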

Steps to Reproduce

DELETE /test-doy-date

PUT /test-doy-date
{
  "mappings": {
    "properties": {
      "timestamp": {
        "type": "date",
        "format": "yyyy-DDD'T'HH:mm:ss.SSS||yyyy-DDD'T'HH:mm:ss.SSSX||yyyy-DDD'T'HH:mm:ss.n||yyyy-DDD'T'HH:mm:ss.nX||strict_date_optional_time||epoch_millis"
      }
    }
  }
}

POST _bulk
{ "index" : { "_index" : "test-doy-date", "_id" : "1" } }
{ "timestamp" : "2022-104T14:08:30.100" }
{ "index" : { "_index" : "test-doy-date", "_id" : "2" } }
{ "timestamp" : "2022-104T14:08:30.540Z" }
{ "index" : { "_index" : "test-doy-date", "_id" : "3" } }
{ "timestamp" : "2022-104T14:08:31.100111" }
{ "index" : { "_index" : "test-doy-date", "_id" : "4" } }
{ "timestamp" : "2022-104T14:08:31.234567Z" }

GET /test-doy-date/_search

GET /test-doy-date/_search
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "timestamp": {
            "gte": "2022-104T14:08:30.293",
            "lte": "2022-104T14:08:31.355",
            "format": "yyyy-DDD'T'HH:mm:ss.SSS"
          }
        }
      }
    }
  }
}

All steps complete without error except for the range filter, which outputs:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "parse_exception",
        "reason" : "failed to parse date field [2022-104T14:08:31.355] with format [yyyy-DDD'T'HH:mm:ss.SSS]: [failed to parse date field [2022-104T14:08:31.355] with format [yyyy-DDD'T'HH:mm:ss.SSS]]"
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : 0,
        "index" : "test-doy-date",
        "node" : "8QwQl8a5SvWteKZqObjW5g",
        "reason" : {
          "type" : "parse_exception",
          "reason" : "failed to parse date field [2022-104T14:08:31.355] with format [yyyy-DDD'T'HH:mm:ss.SSS]: [failed to parse date field [2022-104T14:08:31.355] with format [yyyy-DDD'T'HH:mm:ss.SSS]]",
          "caused_by" : {
            "type" : "illegal_argument_exception",
            "reason" : "failed to parse date field [2022-104T14:08:31.355] with format [yyyy-DDD'T'HH:mm:ss.SSS]",
            "caused_by" : {
              "type" : "date_time_parse_exception",
              "reason" : "date_time_parse_exception: Text '2022-104T14:08:31.355' could not be parsed: Conflict found: Field DayOfYear 1 differs from DayOfYear 104 derived from 2022-01-01",
              "caused_by" : {
                "type" : "date_time_exception",
                "reason" : "date_time_exception: Conflict found: Field DayOfYear 1 differs from DayOfYear 104 derived from 2022-01-01"
              }
            }
          }
        }
      }
    ]
  },
  "status" : 400
}

If we omit the explicit format in the filter, we instead get:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "parse_exception",
        "reason" : "failed to parse date field [2022-104T14:08:31.355] with format [yyyy-DDD'T'HH:mm:ss.SSS||yyyy-DDD'T'HH:mm:ss.SSSX||yyyy-DDD'T'HH:mm:ss.n||yyyy-DDD'T'HH:mm:ss.nX||strict_date_optional_time||epoch_millis]: [failed to parse date field [2022-104T14:08:31.355] with format [yyyy-DDD'T'HH:mm:ss.SSS||yyyy-DDD'T'HH:mm:ss.SSSX||yyyy-DDD'T'HH:mm:ss.n||yyyy-DDD'T'HH:mm:ss.nX||strict_date_optional_time||epoch_millis]]"
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : 0,
        "index" : "test-doy-date",
        "node" : "MB7SIp_pSrSComH5XlvIPQ",
        "reason" : {
          "type" : "parse_exception",
          "reason" : "failed to parse date field [2022-104T14:08:31.355] with format [yyyy-DDD'T'HH:mm:ss.SSS||yyyy-DDD'T'HH:mm:ss.SSSX||yyyy-DDD'T'HH:mm:ss.n||yyyy-DDD'T'HH:mm:ss.nX||strict_date_optional_time||epoch_millis]: [failed to parse date field [2022-104T14:08:31.355] with format [yyyy-DDD'T'HH:mm:ss.SSS||yyyy-DDD'T'HH:mm:ss.SSSX||yyyy-DDD'T'HH:mm:ss.n||yyyy-DDD'T'HH:mm:ss.nX||strict_date_optional_time||epoch_millis]]",
          "caused_by" : {
            "type" : "illegal_argument_exception",
            "reason" : "failed to parse date field [2022-104T14:08:31.355] with format [yyyy-DDD'T'HH:mm:ss.SSS||yyyy-DDD'T'HH:mm:ss.SSSX||yyyy-DDD'T'HH:mm:ss.n||yyyy-DDD'T'HH:mm:ss.nX||strict_date_optional_time||epoch_millis]",
            "caused_by" : {
              "type" : "date_time_parse_exception",
              "reason" : "Failed to parse with all enclosed parsers"
            }
          }
        }
      }
    ]
  },
  "status" : 400
}

If we change lte to lt in the filter, we get a valid response:

{
  "took" : 9,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.0,
    "hits" : [
      {
        "_index" : "test-doy-date",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.0,
        "_source" : {
          "timestamp" : "2022-104T14:08:31.100111"
        }
      },
      {
        "_index" : "test-doy-date",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 0.0,
        "_source" : {
          "timestamp" : "2022-104T14:08:31.234567Z"
        }
      },
      {
        "_index" : "test-doy-date",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.0,
        "_source" : {
          "timestamp" : "2022-104T14:08:30.540Z"
        }
      }
    ]
  }
}

If we use a custom format in the field mapping that does not use DOY, everything works fine.
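This is consistent with how `parseDefaulting` behaves in `java.time`: a default is applied only when the field was not parsed from the input. The sketch below assumes the same month and day-of-month defaults as a stand-in for the rounding parser (the class and method names are hypothetical) and shows that a month/day pattern parses without a conflict.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.Locale;

// Hypothetical demo class; it only uses the JDK, not Elasticsearch.
public class MonthDayDefaultsSketch {
    // Assumption: month and day-of-month default to 1 when absent from the input.
    static DateTimeFormatter withCalendarDefaults(String pattern) {
        return new DateTimeFormatterBuilder()
            .appendPattern(pattern)
            .parseDefaulting(ChronoField.MONTH_OF_YEAR, 1)
            .parseDefaulting(ChronoField.DAY_OF_MONTH, 1)
            .toFormatter(Locale.ROOT);
    }

    static LocalDateTime parse(String pattern, String text) {
        return LocalDateTime.from(withCalendarDefaults(pattern).parse(text));
    }

    public static void main(String[] args) {
        // Month and day are parsed from the text, so the defaults are ignored.
        System.out.println(parse("yyyy-MM-dd'T'HH:mm:ss.SSS", "2022-04-14T14:08:31.355"));
    }
}
```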

Logs (if relevant)

No response
