RAT-321: text based configuration #157
…d UI report configuration into separate class. Moved reporter code into new Reporter class. Removed hard coded license definitions, reworked test for configuration file defined license definitions
Fix not calculations
|
I recognize that this is a massive change and will take some time to process. If you have any questions or want to see extra tests please let me know and I will endeavour to complete them as quickly as possible. It may make sense to create a new branch to put these changes on so that we can get an alpha release out to get some feedback before going all in on the changes. |
|
@Claudenw if I get it correctly there are no functional changes of existing functionality. Thus I'd prefer to merge your changes and prepare a release of RAT in order to collect feedback from existing users. WDYT? |
|
There are differences in the UI for Maven and Ant when not using the
standard definitions.
Preparing a release and collecting feedback seems like a viable path. I
think we need to update documentation, but I think a release candidate is a
good start.
Claude
…On Mon, Oct 16, 2023 at 11:29 AM P. Ottlinger ***@***.***> wrote:
@Claudenw <https://github.com/Claudenw> if I get it correctly there are
no functional changes of existing functionality. Thus I'd prefer to merge
your changes and prepare a release of RAT in order to collect feedback from
existing users. WDYT?
|
|
@ottlinger how do we proceed? I do not have access to merge and I am unsure of the process to create a release so I think I need you to guide and/or execute this process. If you need anything from me please let me know. |
Overview
This is a larger change than I had hoped for. However, the change has been minimized as much as possible. The goal of this change is to switch to a text based configuration and in the process simplify the configuration architecture.
A secondary goal was to attempt to align the configuration options so that the same property name is used across all the user interfaces.
Changes from the user perspective.
For many users there are no changes as the original licenses are maintained in the change. For users that define custom licenses there are changes.
For users that define special or custom licenses the easiest solution is to rewrite the custom licenses into a configuration file and include that when running RAT.
Configuration format
Configuration format is defined in XML only in this change. However, future implementations of other formats are possible and anticipated in the code. The default configuration file is located in /apache-rat-core/src/main/resources/org/apache/rat/default.xml
The configuration file starts with a <rat-config> tag and ends with a closing </rat-config>. Within the configuration there are 3 sections:
<licenses> - Contains the definition of licenses.
<approved> - An optional list of approved licenses. If not specified all licenses in the <licenses> element are assumed to be approved. The licenses in the list may include licenses defined but not approved in other configuration files.
<matchers> - Defines matcher builder implementations. Implementations have names like TextBuilder. The final Builder part of the name is removed and the first part, lowercased, becomes the name of the matcher (e.g. CopyrightBuilder becomes the copyright matcher).
License definition
Each license is enclosed in a <license> tag. The <license> tag has 3 properties:
id - The id of the license. Must be unique across all definitions. If more than 5 characters are specified only the first 5 are used, the rest are discarded. This is equivalent to the old LicenseFamilyCategory property.
name - The name of the license. Used in display. This is equivalent to the old LicenseFamilyName property.
derived_from - (optional) Specifies the id of a license from which the current license is derived. Currently this option is unprocessed but in future may be used to accept licenses from which an accepted license is derived.
Each license has up to two enclosed tags. The possible enclosed tags are:
<note> - Notes about the license. If multiple <note> tags are specified they are merged into a single note.
A matcher tag - One of the matchers registered in the <matchers> section of the configuration. There may be only one matcher. If more than one matcher is specified the last one is selected.
Matcher definition
There are eight (8) matchers defined in the default configuration file. They have varying numbers of parameters and child nodes. ALL matchers have id properties. If the id property is not specified a default one is generated. The id is used to reference the matcher.
Text matcher
The <text> matcher matches text just like the old text matching did. The text to match can be specified either in a text property or simply by enclosing the text in <text> and </text> tags.
Regex matcher
The <regex> matcher uses a regular expression for matching. This is much slower than the text matching above. The expression is specified in an expr property on the tag.
Spdx matcher
The <spdx> matcher matches the SPDX tags of the form "SPDX-License-Identifier: ". The name attribute of the <spdx> tag specifies the name in the license identifier string.
Copyright matcher
The Copyright matcher is a new matcher that matches the tokens "Copyright", "(C)", "(c)", "©". The token must be followed by a date, two dates separated by a dash, a copyright holder's name, or a combination. The <copyright> tag has 3 properties, all of which are optional; they specify the date(s) and the copyright owner.
The copyright matcher accepts either the name first or the date first. If no date or owner is specified it will match the copyright tokens followed by 4 digits. If no dates are specified but the owner is, then it will match the token followed by the owner name. If date(s) and owner are specified then it will match the token followed by either the owner and then the date(s) or the date(s) and then the owner.
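The matching rules above can be approximated with a regular expression. The following is only a sketch of those rules, not the code in this PR: the class name CopyrightSketch, the dates and owner parameters, and the treatment of "dates given but no owner" (which the description does not spell out) are all assumptions made for illustration.

```java
import java.util.regex.Pattern;

class CopyrightSketch {
    // The four tokens named in the text; \u00A9 is the copyright sign.
    static final String TOKEN = "(?:Copyright|\\(C\\)|\\(c\\)|\u00A9)";

    /**
     * Builds a pattern for one header line.
     * dates and owner are hypothetical parameters; either may be null.
     */
    static Pattern build(String dates, String owner) {
        String d = dates == null ? null : Pattern.quote(dates);
        String o = owner == null ? null : Pattern.quote(owner);
        String tail;
        if (d == null && o == null) {
            tail = "\\d{4}";      // token followed by 4 digits
        } else if (d == null) {
            tail = o;             // token followed by the owner name
        } else if (o == null) {
            tail = d;             // assumption: dates alone are treated like owner alone
        } else {
            // owner then date(s), or date(s) then owner
            tail = "(?:" + d + "\\s+" + o + "|" + o + "\\s+" + d + ")";
        }
        return Pattern.compile(TOKEN + "\\s+" + tail);
    }

    static boolean matches(String dates, String owner, String line) {
        return build(dates, owner).matcher(line).find();
    }

    public static void main(String[] args) {
        System.out.println(matches(null, null, "Copyright 2023 Example Org"));
        System.out.println(matches("2001-2010", "Acme", "(c) Acme 2001-2010"));
        System.out.println(matches(null, "Acme", "Copyright 2023"));
    }
}
```

Quoting the date and owner strings with Pattern.quote keeps characters such as "." or "-" in an owner name from being read as regex syntax.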
MatcherRef matcher
This is a matcher that references another defined matcher. It has one property refId which matches the id property of another matcher. The referenced matcher is used in place of the MatcherRef.
Not matcher
This matcher reverses the meaning of the match. It has no properties and must enclose one and only one other matcher.
Any matcher
This matcher encloses a collection of matchers. For this matcher to match, at least one of the enclosed matchers must be matched.
All matcher
This matcher encloses a collection of matchers. For this matcher to match all of the enclosed matchers must be matched.
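To illustrate how the Not, Any and All matchers can be combined when headers are tested line by line, here is a rough sketch that uses a three-valued state like the one described under "Changes to matching logic" (t, f, i). Everything in it is a stand-in written for this example: SketchMatcher, ContainsText, Not, Group and the choice to have finalizeState() return the state are assumptions, not the interfaces or classes in this PR.

```java
import java.util.Arrays;
import java.util.List;

class MatchStateDemo {
    public static void main(String[] args) {
        SketchMatcher m = Group.all(new ContainsText("Apache License"),
                                    new Not(new ContainsText("GPL")));
        // "not" cannot decide until every header line has been seen
        System.out.println(m.matches("Licensed under the Apache License")); // prints i
        System.out.println(m.finalizeState());                              // prints t
    }
}

/** Three-valued match state: t (true), f (false), i (indeterminate). */
enum State { t, f, i }

/** Hypothetical stand-in for a header matcher. */
interface SketchMatcher {
    State matches(String line);   // feed one header line
    State currentState();
    State finalizeState();        // resolve i after the last line
    void reset();
}

/** Simple matcher: t once the text is seen, i -> f on finalize. */
final class ContainsText implements SketchMatcher {
    private final String text;
    private State state = State.i;
    ContainsText(String text) { this.text = text; }
    public State matches(String line) {
        if (state == State.i && line.contains(text)) state = State.t;
        return state;
    }
    public State currentState() { return state; }
    public State finalizeState() { if (state == State.i) state = State.f; return state; }
    public void reset() { state = State.i; }
}

/** Reverses t and f of exactly one enclosed matcher; i stays i. */
final class Not implements SketchMatcher {
    private final SketchMatcher inner;
    Not(SketchMatcher inner) { this.inner = inner; }
    public State matches(String line) { inner.matches(line); return currentState(); }
    public State currentState() {
        switch (inner.currentState()) {
            case t: return State.f;
            case f: return State.t;
            default: return State.i;
        }
    }
    public State finalizeState() { inner.finalizeState(); return currentState(); }
    public void reset() { inner.reset(); }
}

/** Any: one t decides. All: one f decides. Otherwise i while any child is i. */
final class Group implements SketchMatcher {
    private final boolean all;
    private final List<SketchMatcher> children;
    private Group(boolean all, SketchMatcher... c) { this.all = all; this.children = Arrays.asList(c); }
    static Group any(SketchMatcher... c) { return new Group(false, c); }
    static Group all(SketchMatcher... c) { return new Group(true, c); }
    public State matches(String line) {
        for (SketchMatcher c : children) c.matches(line);
        return currentState();
    }
    public State currentState() {
        boolean sawIndeterminate = false;
        for (SketchMatcher c : children) {
            State s = c.currentState();
            if (s == State.i) sawIndeterminate = true;
            else if (all && s == State.f) return State.f;
            else if (!all && s == State.t) return State.t;
        }
        if (sawIndeterminate) return State.i;
        return all ? State.t : State.f;
    }
    public State finalizeState() {
        for (SketchMatcher c : children) c.finalizeState();
        return currentState();
    }
    public void reset() { for (SketchMatcher c : children) c.reset(); }
}
```

In this sketch a boolean result would not be enough: while no "GPL" line has been seen, the Not matcher can report neither true nor false, so the enclosing All stays indeterminate until finalizeState() is called.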
Approved section
The <approved> section specifies which licenses are approved. The <approved> section encompasses one or more <family> tags that have a license_ref that contains the id of the approved license. Licenses defined in other configuration files may be listed for approval. If the approved section is not specified all licenses defined in the file are assumed to be approved.
Matchers section
The <matchers> section of the configuration file registers matcher builders for use in the system. Implementations have names like TextBuilder. The final Builder part of the name is removed and the first part, lowercased, becomes the name of the matcher (e.g. CopyrightBuilder becomes the copyright matcher). The <matchers> section comprises child elements of the form <matcher class="org.apache.rat.configuration.builders.AllBuilder" name='somename'/> where the class attribute specifies the class name of the implementation. If the name attribute is specified it overrides the name of the matcher.
CLI interface
The command line interface adds new options to specify files of configurations to read. Executing the --help option will list all the commands and their options.
ANT interface
The Ant interface has changed to utilize the various builders in the system. An example of an Ant build.xml can be found at /apache-rat-tasks/src/test/resources/antunit/report-junit.xml.
The <rat:report> tag has several properties to configure the system. In most cases there should be no changes for users. For users that specify custom licenses the <rat:license> tag is the same as the standard configuration file <license> tag.
Maven interface
The Maven interface has changed to utilize the various builders in the system. An example of a Maven pom.xml can be found at /apache-rat-plugin/src/it/it1/pom.xml. The main difference between the Maven implementation and the configuration file is that items that were properties in the configuration file are specified as enclosed text tags.
Changes from developer perspective
Separation of configuration from running report.
The ReportConfiguration class now contains all the information necessary to run the report. The code to run the report has been moved from Report to Reporter. Report is now limited to implementing the CLI interface and setting the properties in the ReportConfiguration properly.
All user interfaces (CLI, Ant and Maven) have been modified to set the ReportConfiguration properties and call the Reporter to execute the report.
As part of this separation the MetaData object has been removed from the configuration and user facing code. It remains in the reporting engine where it belongs.
Migration to builders
Multiple builders were created to assist in the creation of ILicense and IHeaderMatcher implementations. The Builders are utilized in the code that configures the ReportConfiguration and provide a standard mechanism to ensure that the objects are properly configured across all client interfaces.
ILicense and ILicenseFamily
The ILicense has been simplified to comprise a LicenseFamily, notes, derivedFrom, and a single matcher. The ILicenseBuilder class implements the builder. The licenseFamilyCategory must be unique across all licenses. Licenses may be sorted by the LicenseFamily. There is a Comparator<ILicense> available from the ILicense class.
The ILicenseFamily now implements Comparable<ILicenseFamily>. Both the ILicense and the ILicenseFamily are ordered by the licenseFamilyCategory property.
IHeaderMatcher (matchers)
The system provided matchers are implemented by Builders. New matcher types can be created by creating builders that extend org.apache.rat.configuration.builders.AbstractBuilder or otherwise implement IHeaderMatcher.Builder and produce an IHeaderMatcher that extends org.apache.rat.analysis.matchers.AbstractMatcher.
Any properly constructed org.apache.rat.configuration.builders.AbstractBuilder that is added to the <matchers> section of the configuration file (see above) is automatically registered and available for use in the license definition in the configuration file. Additional steps must be taken to access the new matchers from Ant and Maven.
Changes to matching logic
With the addition of the <not> and <all> matchers it became necessary to provide deeper inspection of the state of the match. Prior to this, a matcher could simply return true if a match was detected and false otherwise. However, the code tests the headers line by line so the matcher cannot declare that it does not match until the last header line has been processed. To handle this situation, the org.apache.rat.analysis.State enum has been added to the code base. Matchers now return State when the IHeaderMatcher.matches(String) method is called and the reset() method now resets the current state to i (indeterminate).
IHeaderMatcher also has two new methods:
currentState() - returns the current state of the matcher.
finalizeState() - sets the current state to either t (true) or f (false).
To simplify the construction of new IHeaderMatcher implementations there is an org.apache.rat.analysis.AbstractSimpleMatcher that handles the tracking of the State and assumes that finalizeState() will convert an i to f. Classes implementing this simply need to implement boolean doMatch(String) to perform the check.
The change to matching logic necessitated a change to the matching engine to call finalizeState() and switch from handling the boolean matches(String) result to State handling.
Testing
Significant work has been done to improve testing for the various matchers and other classes that were touched by this change.
JavaDoc
Modified files have had their javadocs updated. New files have javadocs as well.