Allow dashes in author name #42#43
Conversation
The new files for test case for 'author name with dashes' caused a change in the overall statistics.
HTML comment end tags contain dashes, and dashes are no longer forbidden characters.
The new files for test case for 'author name in HTML files' caused a change in the overall statistics.
|
Should leading/trailing dashes in names be removed? eg. "--author-name -" to "author-name" |
|
@sebastianquek for your review please. |
|
Thanks @yamgent for your contributions! BackgroundLooking at the various use cases, the author tag can be followed by several variants of author names:
The current implementation allows for spaces in names, and when we allow dashes as well, we will need to handle edge cases such as in point 6 (as @yamgent has found). However, the fix is very specific to Given that we currently restrict author names to only alphanumeric characters and dashes as in the above examples, I propose that we remove the feature that allows for spaces in names. Proposed implementationOne approach is to extract the author's name according to a regex expression. In a very verbose manner:
This ensures that author names that start and/or end with non-alphanumeric characters are illegal (as @riwu has mentioned). Furthermore, alphanumeric characters can be interleaved with Any thoughts/ideas? |
|
I'm ok with not allowing spaces in author name. |
sebastianquek
left a comment
There was a problem hiding this comment.
Remove ability to have whitespace in author names, use regex that accepts and extracts the valid author names.
This streamlines the extraction of the author name from an author tag as special rules (HTML comments) are not required anymore.
This reverts commit de643b6.
Regex now extracts name, instead of looking for forbidden characters.
Now spaces in author's name are forbidden.
src/main/java/backend/Logic.java
Outdated
|
|
||
| private static final String REGEX_NEITHER_ALPHANUMERIC_NOR_SPACE_NOR_DASH = | ||
| "[^ a-zA-Z0-9\\-]+"; | ||
| private static final String REGEX_AUTHOR_NAME_FORMAT = "(^[a-zA-Z0-9][a-zA-Z0-9-]*[a-zA-Z0-9]).*"; |
There was a problem hiding this comment.
The .* at the end will match any number of characters that are not line breaks! Do test it with the new test file you created (testFileHtml.html).
|
One last thing, @yamgent could you add test cases for author names that start and/or end with |
|
@sebastianquek Okay sure, am working on it. |
src/main/java/backend/Logic.java
Outdated
|
|
||
| private static final String REGEX_NEITHER_ALPHANUMERIC_NOR_SPACE = | ||
| "[^ a-zA-Z0-9]+"; | ||
| private static final String REGEX_AUTHOR_NAME_FORMAT = "(^[a-zA-Z0-9][a-zA-Z0-9-]*[a-zA-Z0-9])"; |
There was a problem hiding this comment.
Is it intended to fail if author string doesn't start with alphanumeric?
Removing '^' from the regex would simply strip leading non-alphanumeric instead of failing.
There was a problem hiding this comment.
You're right, if we allow @@author matric-num----, which results with the author name being matric-num, @@author --matric-num---- and @@author !@%$&*^matric-num likewise should not fail.
|
@sebastianquek I added test cases for the dashes, do take a look. @riwu thanks for your input! |
|
Awesome! 👍 |
|
@sebastianquek Few more test cases added |
|
LGTM 👍 |
Fixes #42.
Ready for review.