[MRG+1] Fix alerts found with lgtm by jhelie · Pull Request #9278 · scikit-learn/scikit-learn

jhelie · 2017-07-04T23:13:05Z

This PR fixes some alerts flagged by lgtm.com code queries. Most are minor improvements suggestions (mostly fixing some misconstructed child classes) but a few some should fix unintentional code behaviour.

The standard lgtm engineering analytics highlights dependencies, vulnerabilities and tracks introduced and fixed alerts as the code changes (and shows that scikit-learn keeps getting better) but you can investigate further by writing your own custom queries to explore your code base.

I personally find lgtm alerts most useful when using PR integration as it allows to check and deal with potential issues before they find their way into master (see for instance 5147fd0).

Full disclosure: I frequently use scikit-learn but also work on lgtm.com, hence my interest! Hope that helps, any question please ask.

jnothman · 2017-07-05T05:00:59Z

Thanks. The string comparison by reference fix is the only one that appears to actually be a bug fix. In some cases we intentionally avoid super...__init__, but these look good to me.

Is there a way to mark LGTM alerts as "Won't fix" (e.g. the first)?

Your PR introduces flake8 errors.

jnothman · 2017-07-05T05:05:09Z

From a glance at the alert, it looks like a number of them are valid. What's the best way to get a permalink for an alert?

jnothman · 2017-07-05T05:26:42Z

Apart from Flake8 errors (please fix if you can), this LGTM.

GaelVaroquaux · 2017-07-05T06:05:22Z

sklearn/decomposition/sparse_pca.py

+        method=method, random_state=random_state)
        self.n_iter = n_iter
-        self.callback = callback
+        self.callback = callback    


It seems to me that there is trailing whitespace here.

GaelVaroquaux · 2017-07-05T06:07:54Z

All these changes are good. I am 👍 for merge, but the style issues (flake8) should be addressed.

…e/scikit-learn into fix-alerts-found-with-lgtm

jnothman · 2017-07-05T06:56:02Z

Still failing flake8, says Travis:

--------------------------------------------------------------------------------
./sklearn/decomposition/sparse_pca.py:265:33: W291 trailing whitespace
        self.callback = callback
                                ^

jhelie · 2017-07-05T06:59:58Z

@jnothman hopefully that was the last one but will update if not - replying to your other questions now!

jhelie · 2017-07-05T07:11:22Z

@jnothman permalink to alerts are obtained by clicking the (diagonal up) arrow at the end of the line highlighting the alert - while the question mark symbol next to it takes you to the code query.

e.g. for the reference vs value check the link is https://lgtm.com/projects/g/scikit-learn/scikit-learn/snapshot/1a981e87ba6f8d66ecfd6940184836446ded3f4c/files/benchmarks/bench_plot_randomized_svd.py#V185

jnothman · 2017-07-05T07:21:16Z

yes, flake is happy.

jhelie · 2017-07-05T07:22:32Z

But I should add that an alert does not need to have found its way into your code base to be referenced and discussed: if you turn on PR integration any introduced alerts will get a similar link that can be shared (obviously pushing new commits will re-trigger the analysis and that might fix the alert)

see for example: https://lgtm.com/projects/g/matplotlib/matplotlib/rev/pr-b7a573728fb8b7712d6aca3e99ef36d8f308c015 (this is an example of a fixed alert, but the idea is the same)

jnothman · 2017-07-05T07:22:36Z

Thanks for this. I've posted another couple of LGTM-derived issues.

jhelie · 2017-07-05T07:27:14Z

@jnothman you're welcome. And regarding marking alerts as "won't fix": this is something we're actively working on, in the future you will be able to dismiss individual alerts, turn particular queries on/off and decide which parts of your code base should not be analysed (we automatically detect library, generated and test code but we might miss some).

We've recently come out of beta and we keep rolling out features, any feedback is welcome!

jnothman · 2017-07-05T07:33:20Z

Ability to reject false positives would definitely be high on or top of the list. Other usability things I have noted:

the CI integration docs do not describe what circumstances lead to a red x.
when I view full source code for an alert, and then click back, all the alerts are collapsed again. It should be possible to hit back and see where I was up to in the listing.
should be able to filter alerts list by path.

jhelie · 2017-07-05T09:10:46Z

Thanks for this @jnothman! pinging @sjvs our product manager

amueller · 2017-07-05T15:01:37Z

do we want LGTM CI or do we wait for flagging false positives?
What tech are they using? the default queries are proprietary, right? Obviously it would be best if we could do this using open tools.

sj · 2017-07-05T15:04:04Z

@jhelie: thanks for looping me in; @jnothman: thanks for the feedback!

Regarding CI documentation: a red x will only appear if it is not possible to analyse a revision. For Python analysis this shouldn't happen (famous last words...), but for languages like Java it's very difficult to analyse source code that doesn't compile. However, you're right: this needs to be documented more clearly; I will pass that on to our engineering and documentation teams!

I'll pass your report re. the back button on as well. Browsing and filtering alerts by path is very much on our schedule for the next few months — stay tuned!

amueller · 2017-07-05T15:12:17Z

I don't understand how lgtm infers input types. Does it use the docstring? or just fuzz it?

sj · 2017-07-05T15:12:50Z

@amueller: the tech behind it is documented here: https://lgtm.com/docs/lgtm/about-lgtm. The queries are all open-sourced here: https://github.com/lgtmhq/lgtm-queries. Pull requests with improvements are really welcome!

We will soon introduce functionality running your own queries as part of PR integration; you can already get started with your own queries on the query console here: https://lgtm.com/query/project:12060052/lang:python/

We pride ourselves on our extremely low false-positive levels, made possible by the query technology and our vision: code = data. If you find any FPs: do let us know (or file a PR!). And if you'd like to know more about the tech behind it, do have a read of some of papers:

QL: Object-oriented Queries on Relational Data (Avgustinov et al., ECOOP 2016)
Type Inference for Datalog with Complex Type Hierarchies (Schäfer et al., POP 2010)
Type Inference for Datalog and Its Application to Query Optimisation (De Moor et al., PODS 2008)
Algebraic Data Types for Object-oriented Datalog (Schäfer et al.)

Any other questions: shoot me a message!

amueller · 2017-07-05T15:17:57Z

@sjvs awesome, thanks!

sj · 2017-07-05T15:20:13Z

@amueller: your question about input types appeared as I was typing my previous response. The quick answer: lgtm performs points-to analysis which tracks the possible types of a variable. This analysis is being improved on a daily basis (literally as we speak).

To answer your original question:

do we want LGTM CI or do we wait for flagging false positives?

I'd say: yes, you do want that 😄.

On a more serious note: we launched lgtm.com fairly recently and are very keen to receive feedback from the community. So I'd really appreciate your thoughts and suggestions for improvements — including (especially!) on the continuous integration.

jhelie · 2017-07-05T15:29:38Z

@amueller: what @sjvs says, and please feel free to ping me if you suspect an FP (I can see a few dubious cases atm but always keen to hear from the project devs!)

jnothman · 2017-07-05T21:47:13Z

The main reason I'd be tempted towards lgtm.com integration is that if we don't integrate it takes a long time to update and show us that we've successfully eradicated alerts! :P

…

On 6 July 2017 at 01:29, Jean Helie ***@***.***> wrote: @amueller <https://github.com/amueller>: what @sjvs <https://github.com/sjvs> says, and please feel free to ping me if you suspect an FP (I can see a few dubious cases atm but always keen to hear from the devs!) — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#9278 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAEz6wd9RWY2STA0i09y13CQi6HoViAYks5sK6vogaJpZM4ONuv2> .

sj · 2017-07-05T21:57:54Z

Hehe — point taken, @jnothman. I'll make sure to look into increasing the update frequency soon! We're regularly polling > 50000 GitHub projects to keep the free results up-to-date for everybody, so we have to be a little bit careful 😉

jnothman · 2017-07-05T23:10:48Z

well if the people visiting and subscribed to lgtm are core devs of some project, that might affect priority.

…

On 6 Jul 2017 7:57 am, "Bas van Schaik" ***@***.***> wrote: Hehe — point taken, @jnothman <https://github.com/jnothman>. I'll make sure to look into increasing the update frequency soon! We're regularly polling > 50000 GitHub projects to keep the free results up-to-date for everybody, so we have to be a little bit careful 😉 — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#9278 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAEz6yI6stiu3_FmzPiW1IghjsMms0CXks5sLAblgaJpZM4ONuv2> .

jnothman · 2017-07-05T23:12:05Z

How many of those 50k repositories have a commit every week, I'm curious?

jnothman · 2017-07-06T00:20:25Z

Also @sjvs, have you considered supporting Cython? There's a shortage of quality control tools there, though its use is on the rise.

jhelie · 2017-07-07T07:42:26Z

@jnothman we're continuously monitoring language popularity and actively working on supporting more languages on lgtm.com so thanks for letting us know about your interest in Cython. At the moment I don't think it's scheduled just yet and I believe the C languages will come first.

Regarding your question on project activity: we're curious about this kind of questions too and I'm hoping to have time to write up a short analysis of all the projects on lgtm - there's always more data than time!

jhelie · 2017-07-07T07:47:56Z

At the moment it should take < 48 hours for the results of the analysis to appear - and congrats on fixing 2 alert btw :)

jnothman · 2017-07-08T09:11:40Z

@jnothman <https://github.com/jnothman> we're continuously monitoring

language popularity and actively working on supporting more languages on lgtm.com so thanks for letting us know about your interest in Cython. At the moment I don't think it's scheduled just yet and I believe the C languages will come first. The point is that a lot of analysis will be a duplicate of Python. Potentially much lower hanging fruit than C.

…

On 7 July 2017 at 17:47, Jean Helie ***@***.***> wrote: At the moment it should take < 48 hours for the results of the analysis to appear - and congrats on fixing 2 alert btw :) — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#9278 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAEz6ww137c8zzPK56q2coFGZl7HduV0ks5sLeKtgaJpZM4ONuv2> .

* compare values rather than references * add __ne__ to class * add missing self parameter * properly initialise child class using super() * remove useless loop variables and assigments * remove useless return statements * respect PEP8 style * fix another flake8 check

sj · 2017-07-25T11:23:22Z

@jnothman: Cython certainly looks interesting. You're right, it's potentially easier to implement than C. Note that we've had C/C++ analysis support for a long time in our other products, so porting that to lgtm.com is less work than you may think it is.

I'll certainly add Cython to the list!

* compare values rather than references * add __ne__ to class * add missing self parameter * properly initialise child class using super() * remove useless loop variables and assigments * remove useless return statements * respect PEP8 style * fix another flake8 check

jhelie added 6 commits July 4, 2017 18:38

compare values rather than references

10e640b

add __ne__ to class

61c2f2e

add missing self parameter

7bdbc08

properly initialise child class using super()

c118c91

remove useless loop variables and assigments

f0c095c

remove useless return statements

2eaaa3d

Merge branch 'master' into fix-alerts-found-with-lgtm

0ede981

jnothman changed the title ~~Fix alerts found with lgtm~~ [MRG+1] Fix alerts found with lgtm Jul 5, 2017

GaelVaroquaux reviewed Jul 5, 2017

View reviewed changes

jhelie added 2 commits July 5, 2017 07:48

respect PEP8 style

9cee269

Merge branch 'fix-alerts-found-with-lgtm' of https://github.com/jheli…

11108fa

…e/scikit-learn into fix-alerts-found-with-lgtm

fix another flake8 check

80e5ba2

jnothman merged commit ac41281 into scikit-learn:master Jul 5, 2017

This was referenced May 24, 2018

sklearn.tree pydoc help is broken #8108

Closed

[MRG] Fixes sklearn.tree pydoc help #8108 #8111

Closed

Uh oh!

Conversation

jhelie commented Jul 4, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jnothman commented Jul 5, 2017

Uh oh!

jnothman commented Jul 5, 2017

Uh oh!

jnothman commented Jul 5, 2017

Uh oh!

GaelVaroquaux Jul 5, 2017

Choose a reason for hiding this comment

Uh oh!

GaelVaroquaux commented Jul 5, 2017

Uh oh!

jnothman commented Jul 5, 2017

Uh oh!

jhelie commented Jul 5, 2017

Uh oh!

jhelie commented Jul 5, 2017

Uh oh!

jnothman commented Jul 5, 2017

Uh oh!

jhelie commented Jul 5, 2017

Uh oh!

jnothman commented Jul 5, 2017

Uh oh!

jhelie commented Jul 5, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jnothman commented Jul 5, 2017

Uh oh!

jhelie commented Jul 5, 2017

Uh oh!

amueller commented Jul 5, 2017

Uh oh!

sj commented Jul 5, 2017

Uh oh!

amueller commented Jul 5, 2017

Uh oh!

sj commented Jul 5, 2017

Uh oh!

amueller commented Jul 5, 2017

Uh oh!

sj commented Jul 5, 2017

Uh oh!

jhelie commented Jul 5, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jnothman commented Jul 5, 2017 via email

Uh oh!

sj commented Jul 5, 2017

Uh oh!

jnothman commented Jul 5, 2017 via email

Uh oh!

jnothman commented Jul 5, 2017 via email

Uh oh!

jnothman commented Jul 6, 2017

Uh oh!

jhelie commented Jul 7, 2017

Uh oh!

jhelie commented Jul 7, 2017

Uh oh!

jnothman commented Jul 8, 2017 via email

Uh oh!

sj commented Jul 25, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

jhelie commented Jul 4, 2017 •

edited

Loading

jhelie commented Jul 5, 2017 •

edited

Loading

jhelie commented Jul 5, 2017 •

edited

Loading