Consider introducing unique expression IDs in Logical/Physical plan

### Is your feature request related to a problem or challenge?

Spark has a concept of `ExprId`, which is used to uniquely identify named expressions:

https://github.com/apache/spark/blob/9bb358b51e30b5041c0cd20e27cf995aca5ed4c7/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala#L41-L57

```scala
import java.util.UUID

/**
 * A globally unique id for a given named expression.
 * Used to identify which attribute output by a relation is being
 * referenced in a subsequent computation.
 *
 * The `id` field is unique within a given JVM, while the `uuid` is used to uniquely identify JVMs.
 */
case class ExprId(id: Long, jvmId: UUID) {

  override def equals(other: Any): Boolean = other match {
    case ExprId(id, jvmId) => this.id == id && this.jvmId == jvmId
    case _ => false
  }

  override def hashCode(): Int = id.hashCode()

}
```
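For context on how these IDs are produced: in Spark, fresh IDs come from `NamedExpression.newExprId`, which pairs a JVM-wide `AtomicLong` counter with a per-JVM random UUID. Below is a minimal standalone sketch of that scheme. `ExprIds` and `next()` are hypothetical names and are not part of Spark's or DataFusion's API.

```scala
import java.util.UUID
import java.util.concurrent.atomic.AtomicLong

// Same shape as Spark's ExprId above. Case class equality already compares
// both fields, so no equals override is needed in this sketch.
final case class ExprId(id: Long, jvmId: UUID)

// Hypothetical allocator: one counter and one random UUID per JVM.
object ExprIds {
  private val counter = new AtomicLong(0L)
  val jvmId: UUID = UUID.randomUUID()

  // getAndIncrement is atomic, so two callers never receive the same id.
  def next(): ExprId = ExprId(counter.getAndIncrement(), jvmId)
}
```

Because the counter is atomic, concurrent planning threads in the same process never share an `id`. The `jvmId` only matters when plans cross process boundaries, which is less obviously relevant to a single-process engine.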

Is it worth attempting to introduce something similar in DataFusion?

Some optimizer rules compare columns directly by name, which leads to bugs when duplicate names appear, such as https://github.com/apache/arrow-datafusion/issues/8374

If we assign unique numeric IDs to columns during plan analysis, we could check column equality using these IDs instead of comparing string names.
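The difference between name-based and ID-based matching can be illustrated with a minimal Scala sketch (Scala only to match the snippet above; DataFusion itself is written in Rust). `Column`, `sameByName` and `sameById` are hypothetical and do not correspond to DataFusion's actual types. The scenario assumes two distinct columns that happen to share the name `a`, for example one from each side of a join once qualifiers are no longer available.

```scala
object NameVsIdDemo {
  // Hypothetical column reference: a display name plus an analyzer-assigned ID.
  final case class Column(name: String, id: Long)

  // Roughly what a rule comparing column names does.
  def sameByName(a: Column, b: Column): Boolean = a.name == b.name

  // The alternative proposed here: compare the unique IDs only.
  def sameById(a: Column, b: Column): Boolean = a.id == b.id

  // Two distinct columns that share the name "a".
  val left = Column("a", 1L)
  val right = Column("a", 2L)

  // A parent operator only needs the right-hand `a`; decide which inputs to keep.
  val required = Seq(right)
  val keptByName = Seq(left, right).filter(c => required.exists(r => sameByName(r, c)))
  val keptById = Seq(left, right).filter(c => required.exists(r => sameById(r, c)))
}
```

Here `keptByName` keeps both columns because the names cannot tell them apart, while `keptById` keeps only `right`. The same ambiguity can make a name-based rule pick the wrong input rather than keep an extra one.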

The obvious downside is that this looks like a large refactoring effort, not to mention the breaking changes it would introduce.

### Describe the solution you'd like

Consider introducing unique IDs for columns/expressions to potentially simplify optimization and planning code.

### Describe alternatives you've considered

Don't do this (large refactoring effort? breaking changes?)

### Additional context

Just a thought that has been bouncing around in my head. I would appreciate hearing more thoughts on this (even if it seems infeasible), or learning whether there has already been prior discussion on a similar topic.

Consider introducing unique expression IDs in Logical/Physical plan #8379