Overview
Many popular Python libraries support an API for call chaining, also known as a fluent interface; pandas, PySpark, and PyTorch are examples. Using the fluent interface improves readability and has more than once helped me spot bugs early. However, a lint rule for the core anti-pattern, repeatedly reassigning to the same variable instead of chaining, appears to be missing. Consider this PySpark example:
Before, with repeated reassignment:

def process(spark, file_name: str):
    common_columns = ["col1_renamed", "col2_renamed", "custom_col"]
    df = spark.read.parquet(file_name)
    df = df \
        .withColumnRenamed('col1', 'col1_renamed') \
        .withColumnRenamed('col2', 'col2_renamed')
    df = df \
        .select(common_columns) \
        .withColumn('service_type', F.lit('green'))
    return df
After, with fluent chaining:

def process(spark, file_name: str):
    common_columns = ["col1_renamed", "col2_renamed", "custom_col"]
    return (
        spark.read.parquet(file_name)
        .withColumnRenamed('col1', 'col1_renamed')
        .withColumnRenamed('col2', 'col2_renamed')
        .select(common_columns)
        .withColumn('service_type', F.lit('green'))
    )
In existing linters and formatters, I have found only partial functionality for enforcing this pattern.
Proposal
Rather than creating a stand-alone linter for this, I would like to propose adding a rule to refurb that detects multiple consecutive assignments to the same variable that could be chained.
Are you open to adding this rule to the library?
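To make the proposal concrete, here is a rough sketch of the detection heuristic, not refurb's actual check API (which I have not integrated with here). It walks the AST and flags an assignment like `df = df.select(...)` whose root is the same name assigned by the immediately preceding statement; the function name and the sample source are my own illustration.

```python
import ast
import textwrap

def find_chainable_assignments(source: str) -> list[int]:
    """Return line numbers of assignments of the form `x = x.method(...)`
    that directly follow another assignment to `x`, i.e. candidates
    for fluent chaining."""
    tree = ast.parse(source)
    hits = []
    for node in ast.walk(tree):
        body = getattr(node, "body", None)
        if not isinstance(body, list):
            continue
        prev_target = None
        for stmt in body:
            target = None
            if (
                isinstance(stmt, ast.Assign)
                and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)
            ):
                target = stmt.targets[0].id
                # Walk down the call chain on the right-hand side to find
                # its root, e.g. `df` in `df.select(...).withColumn(...)`.
                root = stmt.value
                while isinstance(root, ast.Call) and isinstance(root.func, ast.Attribute):
                    root = root.func.value
                # Flag only when the chain is rooted at the variable being
                # assigned AND the previous statement assigned that same name.
                if (
                    isinstance(root, ast.Name)
                    and root.id == target
                    and target == prev_target
                ):
                    hits.append(stmt.lineno)
            prev_target = target
    return hits

code = textwrap.dedent("""\
    df = spark.read.parquet(file_name)
    df = df.withColumnRenamed('col1', 'col1_renamed')
    df = df.select(common_columns)
""")
print(find_chainable_assignments(code))  # → [2, 3]
```

A real rule would also need to skip cases where the intermediate name is used elsewhere (e.g. logged or passed to another call between the assignments), since chaining those would change behavior.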