
Every git scan with --experimental reports skipped files #575

@maciejpirog


Problem

The summary claims that files were skipped even when the scan root is not a git repository. For example:

$ ls -a code
.        ..       hello.py
$ cat code/hello.py
print('hello')
$ opengrep scan --experimental -c rules.yaml code
...
┌──────────────┐
│ Scan Summary │
└──────────────┘
Some files were skipped or only partially analyzed.
  Scan was limited to files tracked by git.

Ran 1 rule on 1 file: 1 finding.

Running with --verbose does not reveal any concrete skipped files:

$ opengrep scan --experimental -c rules.yaml code --verbose
...
========================================
Files skipped:
========================================

  Always skipped by Opengrep:

   • <none>

  Skipped by .gitignore:
  (Disable by passing --no-git-ignore)

   • <all files not listed by `git ls-files` were skipped>

  Skipped by .semgrepignore:
  (See: https://semgrep.dev/docs/ignoring-files-folders-code/#understand-semgrep-defaults)

   • <none>

  Skipped by --include patterns:

   • <none>

  Skipped by --exclude patterns:

   • <none>

  Skipped by limiting to files smaller than 1000000 bytes:
  (Adjust with the --max-target-bytes flag)

   • <none>

  Partially analyzed due to parsing or internal Opengrep errors

   • <none>

The problem disappears when gitignore filtering is disabled:

$ opengrep scan --experimental -c rules.yaml code --no-git-ignore
...
┌──────────────┐
│ Scan Summary │
└──────────────┘

Ran 1 rule on 1 file: 1 finding.

Reason

In src/osemgrep/cli_scan/Scan_subcommand.ml, we call Summary_report.pp_summary and pass it the flag straight from the config:

          Logs.app (fun m ->
              m "%a"
                (Summary_report.pp_summary
                   ~respect_gitignore:conf.targeting_conf.respect_gitignore
                   ~maturity:conf.common.maturity
                   ~max_target_bytes:conf.targeting_conf.max_target_bytes
                   ~skipped_groups)
                ());

while in src/osemgrep/reporting/Summary_report.ml we only check whether that argument is true:

  let out_limited =
    if respect_gitignore then
      Some "Scan was limited to files tracked by git."
    else None

and then print the "Some files were skipped" header whenever it is, regardless of whether any files were actually ignored:

  match (out_skipped, out_partial, out_limited, skipped_groups.ignored) with
  | [], None, None, [] -> ()
  | xs, parts, limited, _ignored -> (
      Fmt.pf ppf "Some files were skipped or only partially analyzed.@\n";
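To make the control flow concrete, here is a simplified model of the logic quoted above. It is a sketch, not the real Summary_report code: `summary_lines` and its labeled arguments are hypothetical, plain strings stand in for the real skipped-file groups, and the order of the detail lines is simplified. It shows that with `respect_gitignore = true` the first branch can never match, so the header is printed even when nothing was skipped.

```ocaml
(* Hypothetical, simplified model of the summary decision in
   Summary_report.ml. Only the shape of the logic is reproduced. *)
let summary_lines ~respect_gitignore ~(out_skipped : string list)
    ~(out_partial : string option) ~(ignored : string list) : string list =
  (* As in the original: the note depends only on the flag. *)
  let out_limited =
    if respect_gitignore then Some "Scan was limited to files tracked by git."
    else None
  in
  match (out_skipped, out_partial, out_limited, ignored) with
  | [], None, None, [] -> []
  | xs, parts, limited, _ignored ->
      ("Some files were skipped or only partially analyzed." :: xs)
      @ Option.to_list parts @ Option.to_list limited

let () =
  (* Nothing was skipped, yet the flag alone triggers the header. *)
  summary_lines ~respect_gitignore:true ~out_skipped:[] ~out_partial:None
    ~ignored:[]
  |> List.iter print_endline
```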

Expected behaviour:

Either:

  1. If no other skipped files are detected, print only the message "Scan was limited to files tracked by git." and not "Some files were skipped or only partially analyzed."
  2. As above, but print the message only if the scan root is a git repository.
  3. Print the message only if some files were actually skipped (which is usually the case in practice, since a scanned repo typically contains files git does not track, such as build artifacts).
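Option 3 could look roughly like the sketch below, using the same simplified model as above (the `ignored` group is omitted for brevity). It assumes that two lists are available: `all_targets`, the files a plain filesystem walk would select, and `git_targets`, the files listed by `git ls-files`. Both names, and the idea of computing the difference this way, are hypothetical and not existing Opengrep code.

```ocaml
(* Hypothetical sketch of option 3: mention the git limitation only when
   git actually excluded at least one file. [all_targets] and [git_targets]
   are assumed inputs, not existing Opengrep values. *)
let git_dropped_files ~all_targets ~git_targets =
  List.filter (fun f -> not (List.mem f git_targets)) all_targets

let summary_lines ~respect_gitignore ~all_targets ~git_targets
    ~(out_skipped : string list) ~(out_partial : string option) =
  let out_limited =
    (* Previously: [if respect_gitignore then Some ... else None]. *)
    if respect_gitignore && git_dropped_files ~all_targets ~git_targets <> []
    then Some "Scan was limited to files tracked by git."
    else None
  in
  match (out_skipped, out_partial, out_limited) with
  | [], None, None -> []
  | xs, parts, limited ->
      ("Some files were skipped or only partially analyzed." :: xs)
      @ Option.to_list parts @ Option.to_list limited
```

With the single `hello.py` from the example, both lists are equal, so no summary is printed; an untracked build artifact would make the note appear.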

EDIT: A scan can have multiple roots, so option 2 needs to take that into account.
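For option 2 with several roots, one possible sketch is to show the note if at least one root lies inside a git working tree. The helpers `inside_git_repo` and `show_git_note` are hypothetical, and detecting a repository by looking for a `.git` entry in the root or one of its ancestors is an assumed heuristic; Opengrep may decide differently (for example by running `git rev-parse`). Whether "any root" or a per-root message is preferable is a design choice left open here.

```ocaml
(* Hypothetical heuristic: a directory is inside a git working tree if it,
   or one of its ancestors, contains a ".git" entry. ".git" may be a file
   (worktrees, submodules), which Sys.file_exists also accepts. *)
let inside_git_repo root =
  let rec go dir =
    if Sys.file_exists (Filename.concat dir ".git") then true
    else
      let parent = Filename.dirname dir in
      (* At the filesystem root, dirname returns its argument unchanged. *)
      if parent = dir then false else go parent
  in
  (* Make relative roots such as "code" absolute so ancestors are checked. *)
  let abs =
    if Filename.is_relative root then Filename.concat (Sys.getcwd ()) root
    else root
  in
  go abs

(* Option 2 for multiple roots: print the note if any root is in a repo. *)
let show_git_note ~respect_gitignore (roots : string list) =
  respect_gitignore && List.exists inside_git_repo roots
```

For the `code` directory in the example (assuming no enclosing repository), `show_git_note ~respect_gitignore:true ["code"]` would be false, so the note would not be printed.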

Labels: bug (Something isn't working)
