Skip to content

Conversation

@OmkarPh
Copy link
Collaborator

@OmkarPh OmkarPh commented Jun 4, 2023

Fixes #561

Signed-off-by: Omkar Phansopkar <omkarphansopkar@gmail.com>
@OmkarPh OmkarPh force-pushed the feature/matched_text branch from 2a70012 to 5c2bca4 Compare June 4, 2023 16:43
Signed-off-by: Omkar Phansopkar <omkarphansopkar@gmail.com>
@OmkarPh
Copy link
Collaborator Author

OmkarPh commented Jun 5, 2023

The matched_text is hidden for scans without license-text

Here's a screenshot of diff
image

OmkarPh added 3 commits June 6, 2023 23:16
Signed-off-by: Omkar Phansopkar <omkarphansopkar@gmail.com>
Signed-off-by: Omkar Phansopkar <omkarphansopkar@gmail.com>
Signed-off-by: Omkar Phansopkar <omkarphansopkar@gmail.com>
@AyanSinhaMahapatra
Copy link
Member

I have a couple of suggestions here:

On the window:

  • In the diff modal, show the Matched Coverage instead of the Score
  • I don't get why we are having the texts displayed two times each

On the diff (just putting points from our discussion in the community call here):

  • We should ignore special characters (Unless it's a + after a word, like gpl3+ or similar)
    diff-special-characters
  • We are not matching correctly texts in different lines
    text_lines

Maybe we do some pre-processing before the diff?

Otherwise this is looking great. Thanks++

@AyanSinhaMahapatra
Copy link
Member

A good example with many variations to test this on.s
aws-java-sdk-core-1.12.262-sources.jar-v32rc4results-todo.json.txt

Notice also the matched text sometimes has [] which are diagnostics elements from scancode-toolkit. This would be moved to a different attribute to fix this: aboutcode-org/scancode-toolkit#3426

@OmkarPh
Copy link
Collaborator Author

OmkarPh commented Jun 8, 2023

I have a couple of suggestions here:

On the window:

  • In the diff modal, show the Matched Coverage instead of the Score
  • I don't get why we are having the texts displayed two times each

Actually, I had opened this PR before the call this week, the diff you see below is what we discussed in the call (it has unnecessary diffs like spaces, newlines, etc)

The diff on top is the final diff we will have.
I kept older diff for now to just compare how the updated diff works.

  • We are not matching correctly texts in different lines
    That was the case in old diff in bottom, in newer one, it must be fixed. Yet, there can be unnoticed edgecases :)

Signed-off-by: Omkar Phansopkar <omkarphansopkar@gmail.com>
@OmkarPh
Copy link
Collaborator Author

OmkarPh commented Jun 11, 2023

hi @AyanSinhaMahapatra
you can test now, i've removed debug setup, and handled some edge cases

Signed-off-by: Omkar Phansopkar <omkarphansopkar@gmail.com>
@AyanSinhaMahapatra
Copy link
Member

@OmkarPh tested out the diffs, this is looking great! Much more improved from before and works great on most cases!

There are a couple more edge case to handle btw:

  • The case where we have a match to a .LICENSE file instead of a .RULE file. In this case we should fetch the rule text from the specific entry in the license_references list instead of the license_rule_references list. For now this is showing the following:
    license-rule-diff-case

  • Sometimes there is a diff shown on words when there are extra special characters preceding them in the match side, see one example here:
    diff-edge-case-symbols

Testing this out more, but once this is working great on most of the cases, we can merge and track more edge cases separately in the issue.

OmkarPh added 2 commits June 12, 2023 21:38
Signed-off-by: Omkar Phansopkar <omkarphansopkar@gmail.com>
@OmkarPh OmkarPh linked an issue Jun 12, 2023 that may be closed by this pull request
@AyanSinhaMahapatra
Copy link
Member

@OmkarPh

The case where we have a match to a .LICENSE file instead of a .RULE file. In this case we should fetch the rule text from the specific entry in the license_references list instead of the license_rule_references list.

Ignore this part ^. This is being handled correctly. The result JSON did not have any license references data so that's why it showed that.

One other point on that btw:

let's say in the matched text/diff modal when we don't have references data, instead of saying: Couldn't find License Rule Reference for specified identifier - mit_14.RULE we should say something like: Please use --license-references CLI option with your scan to see the rule text, and the diff between rule/matched text. Also use this link: https://scancode-toolkit.readthedocs.io/en/stable/cli-reference/scan-options-post.html#license-references-option

Similarly there are cases where there are no rule texts, as we create synthetic rules for showing matches:
https://github.com/nexB/scancode-toolkit/blob/develop/src/licensedcode/models.py#L2310
These have the following identifier-prefixes:

spdx-license-identifier: matches to SPDX license expressions with a prefix like SPDX-License-Identifier: which signifies that there are SPDX license expressions following these prefixes.
license-detection-unknown: these are license detections which are not matched properly and cannot be concluded to a license-expression.
package-manifest-unknown: These are license statements in package manifests which cannot be matched properly to a license expression.

For these three cases of synthetic rules we can have seperate doc strings which can be shown.

OmkarPh added 3 commits June 16, 2023 21:25
Signed-off-by: Omkar Phansopkar <omkarphansopkar@gmail.com>
Signed-off-by: Omkar Phansopkar <omkarphansopkar@gmail.com>
Signed-off-by: Omkar Phansopkar <omkarphansopkar@gmail.com>
@OmkarPh
Copy link
Collaborator Author

OmkarPh commented Jun 17, 2023

I've handled the mentioned cases with relevant messages (let me know if the text needs any change)
Also, fixed the diff errors & some UI changes to handle text-overflow

Signed-off-by: Omkar Phansopkar <omkarphansopkar@gmail.com>
Copy link
Member

@AyanSinhaMahapatra AyanSinhaMahapatra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OmkarPh LGTM! thanks++
This works perfectly thanks! I've tested out the new changes and all the issues in the comments above are resolved. This is ready to merge.

@OmkarPh OmkarPh merged commit 53fa89b into v4.0-react-typescript Jun 19, 2023
@AyanSinhaMahapatra AyanSinhaMahapatra deleted the feature/matched_text branch October 25, 2023 18:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

View the matched text of a license match

2 participants