Skip to content

Improving auto correction's API#7868

Merged
marcandre merged 1 commit intorubocop:masterfrom
marcandre:autocorrect
Jun 22, 2020
Merged

Improving auto correction's API#7868
marcandre merged 1 commit intorubocop:masterfrom
marcandre:autocorrect

Conversation

@marcandre
Copy link
Copy Markdown
Contributor

@marcandre marcandre commented Apr 12, 2020

I am writing my first cop and I feel the current autocorrect API could be improved.

This PR is just a demo and not meant for merging. A few cops were refactored using the proposed API.

When to specify corrections

My main observation is that the very best moment to do an autocorrection is when the offense is detected. Currently the autocorrection and the offense detection are (officially) decoupled. When an offense is detected, we know exactly what it is (location, type, etc.) yet we are unable to make the correction there and then.

When we are later asked to make the autocorrection, we are passed the offending "node". All the state that we might have created to analyze the offense has to be rebuilt at that point, and if we detect different style of offenses, we even have to deduct again which offense this is and correct accordingly.

Current workarounds

Some cops simply redo the detection work twice

Some cops resort to "cheats" for the current state, like AccessModifierIndentation temporarily storing @column_delta. This works only provided that each autocorrect calls are made after each add_offense. Also, a naive approach wouldn't work reliably:

def auto_correct(node)
  lambda do |corrector|
    corrector.insert_after(node, ' ' * @column_delta) # Would be the last value for all corrections
  end
end

Overly complex de-dupping

The processing currently has two de-duping being done.

Firstly, extra offenses on the same range are ignored. I can see how this could be useful for certain cops.

Secondly, if offenses are on different ranges but the same node, the autocorrector is called only for the first one. The only builtin cop that uses this one is SpaceInsideReferenceBrackets: it will add one or two offenses on the opening '[' and/or closing ']', and autocorrect will correct the node (detecting again which of the opening or closing brackets need to be corrected) and relies on the dedupping not to double-correct if both the opening and closing brackets were wrong. Clearly, this is an example where this de-dupping isn't needed but instead the correction should be specifiable when detected...

That dedupping is the only reason I can see why the API of add_offense officially asks for a Node and optionally a location. There's no good reason why an offenses would be attached to a node rather than a range. Indeed a few pass a range instead (like EmptyLineAfterMagicComment) and cheat by repeating location: <same location>.

I propose that the newer API simply asks for a range (or node/comment and uses its loc.expression) and dedupes only on the range.

Alternative to returning lambdas / false

Finally I personally dislike returning lambdas. Ruby makes it trivial to capture blocks which feels to me a much more natural way to proceed.

It also wasn't clear at first that autocorrect can also return false when no correction can be made. This complicates processing a bit... Before you want to autocorrect, you have to check that you can autocorrect.
Moreover, there is also redundancy and it permits a cop writer to return a lambda that nevertheless wouldn't make any correction.

This trouble should be avoided and deduced automatically. Technically the issue is that we would need a way to build the corrections, check if there are any, and only later merge them.

In this "demo" I use a hack. I simply run the block twice, first on a fake corrector that simply records if it is called and the usual processing will call it later on Parser's TreeRewriter. The plan would be not to use a hack but I would instead add the merging functionality into TreeRewriter (I am its author btw). Before I spend time doing it though, I'd like to see if an agreement can reached on the add_offense API.

Proposed API - simple case

I would propose an improved (and fully compatible) API.

# Before:
def on_something(node)
  # ...
  add_offense(node)
end

def auto_correct(node)
  lambda do |corrector|
    corrector.wrap(node, '(', ')')
  end
end

# After:
include Autocorrector

def on_something(node)
  # ...
  add_offense(node) do |corrector|
    corrector.wrap(node, '(', ')')
  end
end

Post processing

One complication is that add_offense already yields (unless the offense is disabled) for some rare cases where post processing is needed. I noticed that 5% of the builtin cops use this post processing vs 45% that do autocorrections; most with post processing also have autocorrection. There are many solutions to this, but the simple case is simply to use the same block:

# Before:
def on_something(node)
  # ...
  add_offense(node) do
    # ... do only if not disabled
  end
end

def auto_correct(node)
  return false unless can_correct_this_one?(node)
  lambda do |corrector|
    corrector.wrap(node, '(', ')')
    # ...
  end
end

# After:
include Autocorrector

def on_something(node)
   # ...
  add_offense(node) do |corrector|
    # ... do only if not disabled

    if can_correct_this_one?(node)
      corrector.wrap(node, '(', ')')
      # ...
    end
end

Compatiblity

What I'm proposing should be compatible with existing cops.

The new API is opt-in for cops that decide to include Autocorrector. I'm proposing that the location: key isn't supported in the new API (pass the range directly instead of the node in those case).

It's not clear to what extent compatibility is required, in particular with Cop#corrections array. I wrote a CorrectionsProxy so that corrections could still be used. Actually the rest of the code still uses it for now (Team, ...), this seemed like a good way to test compatiblity. I intend on cleaning things up though so that the core doesn't use Cop#corrections...

@marcandre marcandre changed the title Autocorrector: Proof of concept Improving auto correction's API Apr 12, 2020
@marcandre
Copy link
Copy Markdown
Contributor Author

marcandre commented Apr 15, 2020

The new parser has the required changes to the TreeRewriter (couldn't resist implementing them). I'll update this PR but the API proposal remains the same.

PS: I think I have an even cleaner proposal. Will reopen when updated.

@marcandre marcandre closed this Apr 15, 2020
@bbatsov
Copy link
Copy Markdown
Collaborator

bbatsov commented Apr 16, 2020

Sounds good to me! Sorry for the radio silence, but I didn't find the time to take a careful look at your proposal. Looking forward to the improved version!

@marcandre marcandre reopened this Apr 20, 2020
@marcandre
Copy link
Copy Markdown
Contributor Author

Updated the code and tweaked the API. I removed the syntax sugar part as it conflicts with the strict rules of RuboCop (no multiple block chaining). I believe it's not actually necessary and both the corrections and the code-to-execute-only-if-enabled can be in the same block. The revised API is thus simpler.

I never spelled out the downside of my proposal: it's potentially slower. For example, if RuboCop is run without the --autocorrect flag, the block is still call so the calculations for the autocorrections will still be run. My guess is that the time spent to calculated the corrections should typically be negligible compared to parsing and finding the offenses, so I don't think it will make a noticeable difference.

@owst owst mentioned this pull request Apr 25, 2020
8 tasks
@bbatsov
Copy link
Copy Markdown
Collaborator

bbatsov commented May 10, 2020

I'll try to find some time soon to review your proposal more carefully. In the mean time it'd be great if you ran some benchmarks regarding the performance impact.

@marcandre
Copy link
Copy Markdown
Contributor Author

benchmarks regarding the performance impact.

Did you have anything in mind?

I probably should not have mentioned the theoretical performance impact, because this PR affects only the correction part, which is typically negligible to the inspection part.

I think the performance impact will be undetectable:
case
when no offenses? then no speed difference
when many offenses for cops that can't autocorrect? then no difference
when many offenses for autocorrection cops and running with auto-correction? then no speed difference
when many offenses for autocorrection cops and running with auto-correction and cops don't spend an inordinate amount of time figuring out what the correction would be? then no speed difference
else
maybe a slowdown
end

I ran the test suite before and after the changes and got similar speeds (1m12 before vs 1m7 after, no I don't think it's noticeably faster either)

@marcandre
Copy link
Copy Markdown
Contributor Author

Question remaining to be answered:

Is compatiblity with Cop#corrections array required? It's not clear to me why a cop would fiddle with that. Still, I wrote a CorrectionsProxy that should be mostly compatible

@marcandre
Copy link
Copy Markdown
Contributor Author

Actually 1.0 is coming up real soon, which is great timing. I propose to include my CorrectionsProxy solution with deprecation warnings, in case any code out there uses the array directly.

I will try to finalize this PR (although the proposed API will remain unchanged) and think it would make a great addition to the 1.0 release.

@bbatsov
Copy link
Copy Markdown
Collaborator

bbatsov commented May 12, 2020

Is compatiblity with Cop#corrections array required? It's not clear to me why a cop would fiddle with that. Still, I wrote a CorrectionsProxy that should be mostly compatible

I don't think so. I can't even remember why we introduced this. Maybe @koic or @jonas054 do.

I went over your proposal more carefully and I like it quite a lot overall. I guess we'll gradually be shifting the auto-corrections to the new API as times goes on. The problem with big codebases is that after some point in time it gets really hard to keep the uniform - e.g. we never migrated all the code to the NodePattern, after it was introduced, we never adopted everywhere all the node extensions and so on.

@marcandre
Copy link
Copy Markdown
Contributor Author

Well, the corrections array was needed to hold the corrections because they might not be applied afterall (cops can exclude other cops). In an ideal world though it would have been part of a CopRunner or of the Team class, not of the Cop class that is meant to be extended.

It's going to be fun de refactor the existing cops to use the new API 😄; I'll focus first on cleaning up the Team.

@marcandre
Copy link
Copy Markdown
Contributor Author

marcandre commented May 17, 2020

I've made some progress on this. What I've done is modified cop/cop to reflect the proposed API, so anyone looking at that file for documentation would see the "modern" API / implementation. The compatibility with the legacy behavior is secluded in two legacy/..._support.rb files.

I have not yet changed the Team/Commissionner implementation, nor most of the specs, as a check that the behavior for the internal legacy API is correctly handled (e.g. Cop#corrections).

When I'm satisfied with this, the last step would be to have those use the modern API and add warning to the internal legacy API (e.g. Cop#corrections, but not the add_offense with the legacy API).

There's basically one spec that fails right now that has to do with error handling. Currently, if an error occurs when a Cop analyses a file, it is handled differently than if an error occurs when a Cop autocorrects it. In the former case, it logs the error or reraises it depending on the :raise option. In the latter it always raises. Compare the specs for the analysis part and for the autocorrection part. Note that both tests say "records Team#errors" but that's not what the second spec actually tests! BTW, we are are not talking about ClobberringErrors here, but any other StandardError.

Does anyone know the reason for this difference in behavior? Is there a potential issue in having the same behavior (i.e. raise only if the :raise setting is on)?

@bbatsov
Copy link
Copy Markdown
Collaborator

bbatsov commented May 17, 2020

Likely that's just some oversight.

There's basically one spec that fails right now that has to do with error handling. Currently, if an error occurs when a Cop analyses a file, it is handled differently than if an error occurs when a Cop autocorrects it. In the former case, it logs the error or reraises it depending on the :raise option. In the latter it always raises.

Probably we can simplify things, as I don't see much reason for having different approaches.

We'll also have to document all of this in the development docs, which are pretty light on such details right now. That's all we have on implementing auto-correction https://docs.rubocop.org/en/latest/development/#auto-correct I guess everyone just learns the ropes from other cops. :-)

@marcandre
Copy link
Copy Markdown
Contributor Author

Probably we can simplify things, as I don't see much reason for having different approaches.

Great, thanks for the answer.

We'll also have to document all of this in the development docs, which are pretty light on such details right now. That's all we have on implementing auto-correction https://docs.rubocop.org/en/latest/development/#auto-correct

Right. I'll update and improve the docs.

I guess everyone just learns the ropes from other cops. :-)

Or trying things out to see if it passes or not, like this cop passing [[location], location] instead of a Node!

@marcandre
Copy link
Copy Markdown
Contributor Author

@kittiphophum18 your comment appears to be a quote of my entire message (I removed it for brevity)

@koic koic mentioned this pull request Sep 22, 2020
8 tasks
@koic koic mentioned this pull request Nov 30, 2020
8 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants