[VL] Failures need fix for UTs which are newly added in vanilla spark3.3 #2169

@yma11

Description

  • SPARK-35675: EnsureRequirements remove shuffle should respect PartitioningCollection
    ----- GlutenBroadcastJoinSuite -----
  • replace partial hash aggregate with sort aggregate
  • replace partial and final hash aggregate together with sort aggregate
  • do not replace hash aggregate if child does not have sort order
  • do not replace hash aggregate if there is no group-by column
  • Merge runtime bloom filters
  • GlutenParquetDeltaByteArrayEncodingSuite
  • GlutenParquetDeltaLengthByteArrayEncodingSuite
  • GlutenParquetFieldIdIOSuite
  • Parquet reads infer fields using field ids correctly
  • absence of field ids
  • multiple id matches
  • read parquet file without ids
  • global read/write flag should work correctly
    ----- GlutenParquetVectorizedSuite -----
  • metadata struct (parquet): read partial/all metadata struct fields
  • metadata struct (parquet): read metadata struct fields with random ordering
  • metadata struct (parquet): read metadata struct fields with expressions
  • metadata struct (parquet): select only metadata
  • metadata struct (parquet): select and re-select
  • metadata struct (parquet): alias
  • metadata struct (parquet): upper/lower case when case sensitive is true
  • metadata struct (parquet): read metadata with offheap set to true
  • metadata struct (parquet): read metadata with offheap set to false
  • metadata struct (parquet): read metadata withnestedSchemaPruning set to true
  • metadata struct (parquet): read metadata withnestedSchemaPruning set to false
  • metadata struct (parquet): prune metadata schema in projects
  • metadata struct (parquet): write _metadata in parquet and read back
  • aggregate push down - different data types

----- GlutenParquetV2AggregatePushDownSuite -----

  • nested column: Count(top level column) push down
  • Count(partition column): push down
  • filter alias over aggregate
  • alias over aggregate
  • aggregate over alias push down
  • aggregate with partition filter can be pushed down
  • aggregate with partition group by can be pushed down
  • aggregate with multi partition group by columns can be pushed down
  • aggregate push down - MIN/MAX/COUNT
  • aggregate push down - different data types
  • column name case sensitivity
  • aggregate push down - different data types

----- GlutenOrcV2AggregatePushDownSuite -----

  • nested column: Count(top level column) push down
  • Count(partition column): push down
  • filter alias over aggregate
  • alias over aggregate
  • aggregate over alias push down
  • aggregate with partition filter can be pushed down
  • aggregate with partition group by can be pushed down
  • aggregate with multi partition group by columns can be pushed down
  • aggregate push down - MIN/MAX/COUNT
  • aggregate push down - different data types
  • column name case sensitivity
  • replace partial hash aggregate with sort aggregate
  • replace partial and final hash aggregate together with sort aggregate
  • do not replace hash aggregate if child does not have sort order
  • do not replace hash aggregate if there is no group-by column
  • Merge runtime bloom filters
  • determining the number of reducers: aggregate operator
  • determining the number of reducers: join operator
  • determining the number of reducers: complex query 1
  • determining the number of reducers: complex query 2
  • SPARK-24705 adaptive query execution works correctly when exchange reuse enabled
  • Union two datasets with different pre-shuffle partition number
  • SPARK-34790: enable IO encryption in AQE partition coalescing
    ----- GlutenEnsureRequirementsSuite: reorder should handle PartitioningCollection -----

Metadata

    Labels

    bug (Something isn't working)
