Timestamps not matching format are replaced with nulls #662

Hi.

I'm trying to parse a simple XML file:
```xml
<item>
  <created-at>2021-01-01T01:01:01+00:00</created-at>
</item>
```

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, TimestampType

# Pull in spark-xml so that the "xml" data source is available
spark = (
    SparkSession.builder
    .config("spark.jars.packages", "com.databricks:spark-xml_2.12:0.17.0")
    .getOrCreate()
)
schema = StructType([StructField("created-at", TimestampType())])
spark.read.format("xml").options(rowTag="item").schema(schema).load("1.xml").show()
```

Result:
|         created-at|
|-------------------|
|2021-01-01 01:01:01|

But if the timestamp does not match the expected format, e.g. `T` is replaced with a space:
```xml
<item>
  <created-at>2021-01-01 01:01:01+00:00</created-at>
</item>
```

It is read as `null`:
|created-at|
|----------|
|      null|

I see that there is a `mode` option, with `PERMISSIVE` as the default, which is described as: `when it encounters a field of the wrong datatype, it sets the offending field to null`. But the malformed value is not added to the `_corrupt_record` column, because there is nothing wrong with the XML structure.
So there is no way to tell whether the input file contained a tag with an invalid value or a `nullValue`, unless the user sets a different `mode`.
Is that the desired behavior?
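
To make the distinction concrete, here is a rough sketch of the kind of pre-check that currently has to be done outside Spark to tell a missing value from a malformed one. It uses only the Python standard library. The names `EXPECTED_FORMAT`, `classify_timestamp` and `check_items` are hypothetical, and the `strptime` pattern is an assumption that only mirrors the sample above; it is not the set of formats spark-xml actually accepts.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Assumption: this pattern only mirrors the sample in this issue;
# it is not the format list that spark-xml itself uses.
EXPECTED_FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def classify_timestamp(text):
    """Return 'missing', 'valid' or 'malformed' for one tag value."""
    if text is None or not text.strip():
        return "missing"
    try:
        datetime.strptime(text.strip(), EXPECTED_FORMAT)
    except ValueError:
        return "malformed"
    return "valid"


def check_items(xml_text, row_tag="item", field="created-at"):
    """Classify the given field for every row_tag element in the document."""
    root = ET.fromstring(xml_text)
    # Element.iter() also yields the root itself when its tag matches,
    # so a single top-level <item> is handled as well.
    return [classify_timestamp(row.findtext(field)) for row in root.iter(row_tag)]
```

With the two files above, `check_items` should report `valid` for the `T` variant and `malformed` for the space variant, whereas the DataFrame shows `null` for the second one just as it would for a missing tag.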
