Improving auto correction's API by marcandre · Pull Request #7868 · rubocop/rubocop

marcandre · 2020-04-12T00:25:24Z

I am writing my first cop and I feel the current autocorrect API could be improved.

This PR is just a demo and not meant for merging. A few cops were refactored using the proposed API.

When to specify corrections

My main observation is that the very best moment to do an autocorrection is when the offense is detected. Currently the autocorrection and the offense detection are (officially) decoupled. When an offense is detected, we know exactly what it is (location, type, etc.) yet we are unable to make the correction there and then.

When we are later asked to make the autocorrection, we are passed the offending "node". All the state that we might have created to analyze the offense has to be rebuilt at that point, and if we detect different style of offenses, we even have to deduct again which offense this is and correct accordingly.

Current workarounds

Some cops simply redo the detection work twice

Some cops resort to "cheats" for the current state, like AccessModifierIndentation temporarily storing @column_delta. This works only provided that each autocorrect calls are made after each add_offense. Also, a naive approach wouldn't work reliably:

def auto_correct(node)
  lambda do |corrector|
    corrector.insert_after(node, ' ' * @column_delta) # Would be the last value for all corrections
  end
end

Overly complex de-dupping

The processing currently has two de-duping being done.

Firstly, extra offenses on the same range are ignored. I can see how this could be useful for certain cops.

Secondly, if offenses are on different ranges but the same node, the autocorrector is called only for the first one. The only builtin cop that uses this one is SpaceInsideReferenceBrackets: it will add one or two offenses on the opening '[' and/or closing ']', and autocorrect will correct the node (detecting again which of the opening or closing brackets need to be corrected) and relies on the dedupping not to double-correct if both the opening and closing brackets were wrong. Clearly, this is an example where this de-dupping isn't needed but instead the correction should be specifiable when detected...

That dedupping is the only reason I can see why the API of add_offense officially asks for a Node and optionally a location. There's no good reason why an offenses would be attached to a node rather than a range. Indeed a few pass a range instead (like EmptyLineAfterMagicComment) and cheat by repeating location: <same location>.

I propose that the newer API simply asks for a range (or node/comment and uses its loc.expression) and dedupes only on the range.

Alternative to returning lambdas / false

Finally I personally dislike returning lambdas. Ruby makes it trivial to capture blocks which feels to me a much more natural way to proceed.

It also wasn't clear at first that autocorrect can also return false when no correction can be made. This complicates processing a bit... Before you want to autocorrect, you have to check that you can autocorrect.
Moreover, there is also redundancy and it permits a cop writer to return a lambda that nevertheless wouldn't make any correction.

This trouble should be avoided and deduced automatically. Technically the issue is that we would need a way to build the corrections, check if there are any, and only later merge them.

In this "demo" I use a hack. I simply run the block twice, first on a fake corrector that simply records if it is called and the usual processing will call it later on Parser's TreeRewriter. The plan would be not to use a hack but I would instead add the merging functionality into TreeRewriter (I am its author btw). Before I spend time doing it though, I'd like to see if an agreement can reached on the add_offense API.

Proposed API - simple case

I would propose an improved (and fully compatible) API.

# Before:
def on_something(node)
  # ...
  add_offense(node)
end

def auto_correct(node)
  lambda do |corrector|
    corrector.wrap(node, '(', ')')
  end
end

# After:
include Autocorrector

def on_something(node)
  # ...
  add_offense(node) do |corrector|
    corrector.wrap(node, '(', ')')
  end
end

Post processing

One complication is that add_offense already yields (unless the offense is disabled) for some rare cases where post processing is needed. I noticed that 5% of the builtin cops use this post processing vs 45% that do autocorrections; most with post processing also have autocorrection. There are many solutions to this, but the simple case is simply to use the same block:

# Before:
def on_something(node)
  # ...
  add_offense(node) do
    # ... do only if not disabled
  end
end

def auto_correct(node)
  return false unless can_correct_this_one?(node)
  lambda do |corrector|
    corrector.wrap(node, '(', ')')
    # ...
  end
end

# After:
include Autocorrector

def on_something(node)
   # ...
  add_offense(node) do |corrector|
    # ... do only if not disabled

    if can_correct_this_one?(node)
      corrector.wrap(node, '(', ')')
      # ...
    end
end

Compatiblity

What I'm proposing should be compatible with existing cops.

The new API is opt-in for cops that decide to include Autocorrector. I'm proposing that the location: key isn't supported in the new API (pass the range directly instead of the node in those case).

It's not clear to what extent compatibility is required, in particular with Cop#corrections array. I wrote a CorrectionsProxy so that corrections could still be used. Actually the rest of the code still uses it for now (Team, ...), this seemed like a good way to test compatiblity. I intend on cleaning things up though so that the core doesn't use Cop#corrections...

marcandre · 2020-04-15T15:54:02Z

The new parser has the required changes to the TreeRewriter (couldn't resist implementing them). I'll update this PR but the API proposal remains the same.

PS: I think I have an even cleaner proposal. Will reopen when updated.

bbatsov · 2020-04-16T18:33:15Z

Sounds good to me! Sorry for the radio silence, but I didn't find the time to take a careful look at your proposal. Looking forward to the improved version!

marcandre · 2020-04-20T05:40:56Z

Updated the code and tweaked the API. I removed the syntax sugar part as it conflicts with the strict rules of RuboCop (no multiple block chaining). I believe it's not actually necessary and both the corrections and the code-to-execute-only-if-enabled can be in the same block. The revised API is thus simpler.

I never spelled out the downside of my proposal: it's potentially slower. For example, if RuboCop is run without the --autocorrect flag, the block is still call so the calculations for the autocorrections will still be run. My guess is that the time spent to calculated the corrections should typically be negligible compared to parsing and finding the offenses, so I don't think it will make a noticeable difference.

bbatsov · 2020-05-10T09:54:42Z

I'll try to find some time soon to review your proposal more carefully. In the mean time it'd be great if you ran some benchmarks regarding the performance impact.

marcandre · 2020-05-11T02:43:26Z

benchmarks regarding the performance impact.

Did you have anything in mind?

I probably should not have mentioned the theoretical performance impact, because this PR affects only the correction part, which is typically negligible to the inspection part.

I think the performance impact will be undetectable:
case
when no offenses? then no speed difference
when many offenses for cops that can't autocorrect? then no difference
when many offenses for autocorrection cops and running with auto-correction? then no speed difference
when many offenses for autocorrection cops and running with auto-correction and cops don't spend an inordinate amount of time figuring out what the correction would be? then no speed difference
else
maybe a slowdown
end

I ran the test suite before and after the changes and got similar speeds (1m12 before vs 1m7 after, no I don't think it's noticeably faster either)

marcandre · 2020-05-11T02:45:22Z

Question remaining to be answered:

Is compatiblity with Cop#corrections array required? It's not clear to me why a cop would fiddle with that. Still, I wrote a CorrectionsProxy that should be mostly compatible

marcandre · 2020-05-12T05:38:58Z

Actually 1.0 is coming up real soon, which is great timing. I propose to include my CorrectionsProxy solution with deprecation warnings, in case any code out there uses the array directly.

I will try to finalize this PR (although the proposed API will remain unchanged) and think it would make a great addition to the 1.0 release.

bbatsov · 2020-05-12T05:47:33Z

Is compatiblity with Cop#corrections array required? It's not clear to me why a cop would fiddle with that. Still, I wrote a CorrectionsProxy that should be mostly compatible

I don't think so. I can't even remember why we introduced this. Maybe @koic or @jonas054 do.

I went over your proposal more carefully and I like it quite a lot overall. I guess we'll gradually be shifting the auto-corrections to the new API as times goes on. The problem with big codebases is that after some point in time it gets really hard to keep the uniform - e.g. we never migrated all the code to the NodePattern, after it was introduced, we never adopted everywhere all the node extensions and so on.

marcandre · 2020-05-12T05:51:58Z

Well, the corrections array was needed to hold the corrections because they might not be applied afterall (cops can exclude other cops). In an ideal world though it would have been part of a CopRunner or of the Team class, not of the Cop class that is meant to be extended.

It's going to be fun de refactor the existing cops to use the new API 😄; I'll focus first on cleaning up the Team.

marcandre · 2020-05-17T06:52:59Z

I've made some progress on this. What I've done is modified cop/cop to reflect the proposed API, so anyone looking at that file for documentation would see the "modern" API / implementation. The compatibility with the legacy behavior is secluded in two legacy/..._support.rb files.

I have not yet changed the Team/Commissionner implementation, nor most of the specs, as a check that the behavior for the internal legacy API is correctly handled (e.g. Cop#corrections).

When I'm satisfied with this, the last step would be to have those use the modern API and add warning to the internal legacy API (e.g. Cop#corrections, but not the add_offense with the legacy API).

There's basically one spec that fails right now that has to do with error handling. Currently, if an error occurs when a Cop analyses a file, it is handled differently than if an error occurs when a Cop autocorrects it. In the former case, it logs the error or reraises it depending on the :raise option. In the latter it always raises. Compare the specs for the analysis part and for the autocorrection part. Note that both tests say "records Team#errors" but that's not what the second spec actually tests! BTW, we are are not talking about ClobberringErrors here, but any other StandardError.

Does anyone know the reason for this difference in behavior? Is there a potential issue in having the same behavior (i.e. raise only if the :raise setting is on)?

bbatsov · 2020-05-17T16:49:50Z

Likely that's just some oversight.

There's basically one spec that fails right now that has to do with error handling. Currently, if an error occurs when a Cop analyses a file, it is handled differently than if an error occurs when a Cop autocorrects it. In the former case, it logs the error or reraises it depending on the :raise option. In the latter it always raises.

Probably we can simplify things, as I don't see much reason for having different approaches.

We'll also have to document all of this in the development docs, which are pretty light on such details right now. That's all we have on implementing auto-correction https://docs.rubocop.org/en/latest/development/#auto-correct I guess everyone just learns the ropes from other cops. :-)

marcandre · 2020-05-17T17:23:32Z

Probably we can simplify things, as I don't see much reason for having different approaches.

Great, thanks for the answer.

We'll also have to document all of this in the development docs, which are pretty light on such details right now. That's all we have on implementing auto-correction https://docs.rubocop.org/en/latest/development/#auto-correct

Right. I'll update and improve the docs.

I guess everyone just learns the ropes from other cops. :-)

Or trying things out to see if it passes or not, like this cop passing [[location], location] instead of a Node!

marcandre · 2020-05-20T22:37:09Z

@kittiphophum18 your comment appears to be a quote of my entire message (I removed it for brevity)

marcandre changed the title ~~Autocorrector: Proof of concept~~ Improving auto correction's API Apr 12, 2020

marcandre mentioned this pull request Apr 14, 2020

+ Source::TreeRewriter: Add #merge, #merge! and #empty? whitequark/parser#674

Merged

marcandre closed this Apr 15, 2020

marcandre reopened this Apr 20, 2020

owst mentioned this pull request Apr 25, 2020

Add Style/RedundantRegexpEscape cop #7908

Merged

8 tasks

marcandre mentioned this pull request May 13, 2020

Improve frozen_string_literal_comment autocorrection #7965

Merged

bbatsov added this to the 1.0.0 milestone May 13, 2020

marcandre mentioned this pull request May 13, 2020

Improved shared context. Refactor & simplify some specs. #7970

Merged

marcandre force-pushed the autocorrect branch from 9690878 to 706f540 Compare May 17, 2020 06:32

This was referenced May 21, 2020

Allow cops to share context between files #7968

Closed

[0.76.0] Layout/MultilineHashBraceLayout auto-correct not correcting #7520

Closed

marcandre force-pushed the autocorrect branch from 706f540 to 080db18 Compare May 21, 2020 05:41

This was referenced May 23, 2020

RuboCop exception handling of autocorrecting cop has multiple issues #8016

Closed

Incorrect Style/SafeNavigation transformation #7914

Closed

marcandre force-pushed the autocorrect branch 2 times, most recently from bce0844 to 6be48c1 Compare May 24, 2020 04:14

marcandre marked this pull request as draft May 24, 2020 04:19

koic mentioned this pull request Sep 5, 2020

Use Cop::Base API for Style/ConditionalAssignment #8652

Merged

8 tasks

koic mentioned this pull request Sep 22, 2020

Use Cop::Base API for some Style cops #8769

Merged

8 tasks

koic mentioned this pull request Nov 6, 2020

Use Cop::Base API for some Bundler and Gemspec cops #9000

Merged

8 tasks

This was referenced Nov 20, 2020

Use Cop::Base API for Rails department rubocop/rubocop-rails#387

Merged

Import EnforceSuperclass module rubocop/rubocop-rails#390

Merged

koic mentioned this pull request Nov 30, 2020

Use Cop::Base API for several cops #9120

Merged

8 tasks

dvandersluis mentioned this pull request Dec 8, 2020

Updated 5 cops to inherit from Base instead of Cop #9192

Merged

8 tasks

This was referenced Dec 20, 2020

Use Cop::Base API for Rake department rubocop/rubocop-rake#34

Merged

Use Cop::Base API instead of legacy API standardrb/standard#231

Merged

koic mentioned this pull request Jan 16, 2021

Use Cop::Base API rubocop/rubocop-sequel#18

Merged

koic mentioned this pull request Feb 3, 2021

Use Cop::Base API rubocop/rubocop-rubycw#11

Merged

This was referenced Mar 8, 2021

Use Cop::Base API for Style/HashSyntax #9564

Merged

Use Cop::Base API for Alignment #9576

Merged

Use Cop::Base API for Style/RescueModifier #9585

Merged

koic mentioned this pull request Apr 5, 2021

Use Cop::Base API for Minitest department rubocop/rubocop-minitest#128

Merged

8 tasks

koic mentioned this pull request Apr 29, 2021

Use Cop::Base API github/rubocop-github#87

Merged

koic mentioned this pull request Jun 12, 2021

Use Cop::Base API for Layout/RedundantLineBreak #9875

Merged

8 tasks

This was referenced Sep 24, 2021

Use Cop::Base for Bundler/OrderedGems and Gemspec/OrderedDependencies #10123

Merged

Replace Cop with Base in tests to prepare for deprecating Cop #10127

Merged

koic mentioned this pull request May 18, 2022

Use Cop::Base API for ThreadSafety department rubocop/rubocop-thread_safety#8

Merged

karouf mentioned this pull request Oct 24, 2022

Custom cop instantiated multiple times when support_multiple_source? returns true #11119

Closed

This was referenced Feb 9, 2023

Make custom cop inherit RuboCop::Cop::Base rubocop/rubocop-i18n#59

Merged

Make custom cop inherit RuboCop::Cop::Base ruby/rubygems#6364

Merged

koic mentioned this pull request May 10, 2024

Make custom cop inherit RuboCop::Cop::Base jekyll/jekyll#9597

Merged

Uh oh!

Conversation

marcandre commented Apr 12, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

When to specify corrections

Current workarounds

Overly complex de-dupping

Alternative to returning lambdas / false

Proposed API - simple case

Post processing

Compatiblity

Uh oh!

marcandre commented Apr 15, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bbatsov commented Apr 16, 2020

Uh oh!

marcandre commented Apr 20, 2020

Uh oh!

bbatsov commented May 10, 2020

Uh oh!

marcandre commented May 11, 2020

Uh oh!

marcandre commented May 11, 2020

Uh oh!

marcandre commented May 12, 2020

Uh oh!

bbatsov commented May 12, 2020

Uh oh!

marcandre commented May 12, 2020

Uh oh!

marcandre commented May 17, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bbatsov commented May 17, 2020

Uh oh!

marcandre commented May 17, 2020

Uh oh!

marcandre commented May 20, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

marcandre commented Apr 12, 2020 •

edited

Loading

marcandre commented Apr 15, 2020 •

edited

Loading

marcandre commented May 17, 2020 •

edited

Loading