[Enhancement]: call chaining readability rule #286

@sbrugman

Overview

Many popular Python libraries support an API for call chaining, also known as a fluent interface. Examples are pandas, PySpark and PyTorch.

Using the fluent interface increases readability and has more than once helped in spotting bugs early.
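What is missing is a check for the core pattern: repeated reassignment to the same variable where the calls could be chained instead, as in the first function below compared with the second.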

from pyspark.sql import functions as F

# Before: the DataFrame is reassigned to `df` at every step.
def process(spark, file_name: str):
    common_columns = ["col1_renamed", "col2_renamed", "custom_col"]
    df = spark.read.parquet(file_name)
    df = df \
        .withColumnRenamed('col1', 'col1_renamed') \
        .withColumnRenamed('col2', 'col2_renamed')
    df = df \
        .select(common_columns) \
        .withColumn('service_type', F.lit('green'))
    return df

# After: the same steps written as a single call chain.
def process(spark, file_name: str):
    common_columns = ["col1_renamed", "col2_renamed", "custom_col"]
    return (
        spark.read.parquet(file_name)
        .withColumnRenamed('col1', 'col1_renamed')
        .withColumnRenamed('col2', 'col2_renamed')
        .select(common_columns)
        .withColumn('service_type', F.lit('green'))
    )

In existing linters and formatters, I have found only partial functionality to enforce this pattern.

Proposal

Rather than creating a stand-alone linter for this, I would like to propose adding a rule to refurb that detects consecutive assignments to the same variable that could be chained.
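To make the idea concrete, here is a rough sketch of the detection logic using only the standard-library `ast` module. It is not written against refurb's check API (which, as far as I know, works on mypy's syntax tree rather than `ast`), and the helper names `find_chainable_assignments`, `_chain_step_target`, `_assigned_name` and `_root_name` are made up for illustration. It flags a statement of the form `x = x.method(...)` when the statement directly before it also assigns to `x`, and it skips cases where `x` appears more than once on the right-hand side, since those cannot be chained mechanically.

```python
from __future__ import annotations

import ast


def _root_name(node: ast.expr) -> str | None:
    """Return the variable at the base of a call chain such as ``df.a().b()``."""
    while True:
        if isinstance(node, ast.Call):
            node = node.func
        elif isinstance(node, ast.Attribute):
            node = node.value
        else:
            break
    return node.id if isinstance(node, ast.Name) else None


def _assigned_name(stmt: ast.stmt) -> str | None:
    """Return ``x`` for a plain ``x = <expr>`` assignment, else None."""
    if isinstance(stmt, ast.Assign) and len(stmt.targets) == 1:
        target = stmt.targets[0]
        if isinstance(target, ast.Name):
            return target.id
    return None


def _chain_step_target(stmt: ast.stmt) -> str | None:
    """Return ``x`` if ``stmt`` has the form ``x = x.method(...)``, else None."""
    name = _assigned_name(stmt)
    if name is None:
        return None
    value = stmt.value
    if not (isinstance(value, ast.Call) and isinstance(value.func, ast.Attribute)):
        return None
    if _root_name(value) != name:
        return None
    # If ``x`` is also used elsewhere (e.g. as an argument), chaining is unsafe.
    uses = sum(
        isinstance(n, ast.Name) and n.id == name for n in ast.walk(value)
    )
    return name if uses == 1 else None


def find_chainable_assignments(source: str) -> list[tuple[int, str]]:
    """Return ``(line, variable)`` pairs for reassignments that could be chained."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        for field in ("body", "orelse", "finalbody"):
            stmts = getattr(node, field, None)
            if not isinstance(stmts, list):
                continue  # e.g. the expression body of a lambda
            for prev, curr in zip(stmts, stmts[1:]):
                name = _chain_step_target(curr)
                if name is not None and _assigned_name(prev) == name:
                    findings.append((curr.lineno, name))
    return sorted(findings)


# Hypothetical input, modelled on the "before" example in this issue.
SAMPLE = '''
def process(spark, file_name):
    df = spark.read.parquet(file_name)
    df = df.withColumnRenamed("col1", "col1_renamed")
    df = df.select("col1_renamed")
    return df
'''

for lineno, name in find_chainable_assignments(SAMPLE):
    print(f"line {lineno}: assignment to '{name}' could be chained")
```

On the sample, this reports the two reassignments of `df` that follow the initial read. A real rule would likely need more care, for example around type information and statements separated only by comments, so treat this as a starting point for discussion rather than a finished design.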

Are you open to adding this rule to the library?
