Skip to content

Conversation

@HyukjinKwon
Copy link
Member

https://github.com/databricks/spark-csv/issues/218 https://github.com/databricks/spark-csv/issues/219

In this PR, I made the pruned scan try to parse all the values in columns when DROPMALFORMED is enabled and return only required fields.

In addition, I changed the condition for table scan. If required columns are empty, then it just produces empty rows.

@codecov-io
Copy link

Current coverage is 85.82%

Merging #220 into master will increase coverage by +0.11% as of 5b06cea

@@            master    #220   diff @@
======================================
  Files           11      11       
  Stmts          497     501     +4
  Branches       146     148     +2
  Methods          0       0       
======================================
+ Hit            426     430     +4
  Partial          0       0       
  Missed          71      71       

Review entire Coverage Diff as of 5b06cea

Powered by Codecov. Updated on successful CI builds.

@falaki
Copy link
Member

falaki commented Dec 28, 2015

Thanks @HyukjinKwon for investigating this. Do you think we should also include FAILFAST mode?

@HyukjinKwon
Copy link
Member Author

@falaki Ah..if you meant handling some logics to work FAILFAST mode properly, actually it looks fine with FAILFAST mode as it just compares schema to tokens (which are always read all regardless of pruned scan or table scan).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not simply using schemaFields?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of the order of columns. The required columns can be in a different order with the original schema.
If you look through the code below, you will notice I slice the row by take().

@HyukjinKwon
Copy link
Member Author

And.. would you mind if you look through the codes, in particular, the variable names (although it is trivial)?
I am not too sure if the naming looks appropriate.

@falaki
Copy link
Member

falaki commented Dec 28, 2015

Thanks this looks good to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants