OPENNLP-1782: Add tagging examples to verify French POS model #863
mawiesne merged 6 commits into apache:main
|
Thx @meriam2303 for the PR. Let's see if it passes the tests.
|
@meriam2303 It seems there is a syntax error:
Could you check, correct it and push a fix to the same branch? Please also add a new static constant to that test:
See the other constants close to the class definition (POLISH, GERMAN, ENGLISH...)
|
Hi, there are still some indentation errors in the French data.
Fixed the checkstyle issues, but the build currently fails, apparently due to a missing model (?):
Error: Tests run: 8, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.277 s <<< FAILURE! -- in opennlp.tools.postag.POSTaggerMEIT
Error: opennlp.tools.postag.POSTaggerMEIT.testPOSTagger(String, int, String, String[])[8] -- Time elapsed: 0.005 s <<< ERROR!
java.lang.NullPointerException: Cannot invoke "opennlp.tools.tokenize.Tokenizer.tokenize(String)" because the return value of "java.util.Map.get(Object)" is null
at opennlp.tools.postag.POSTaggerMEIT.testPOSTagger(POSTaggerMEIT.java:66)
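The stack trace above says that a `Map.get(...)` lookup returned null before `tokenize(String)` was called on the result, which usually means no entry was registered for the requested key. A minimal sketch of a fail-fast lookup follows. `TokenizerRegistry`, the language keys, and the `Function`-based tokenizer are hypothetical stand-ins, not the actual `POSTaggerMEIT` setup:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a per-language tokenizer map in a test class.
// It is NOT the real POSTaggerMEIT code; names and keys are assumptions.
final class TokenizerRegistry {

  private final Map<String, Function<String, String[]>> tokenizers = new HashMap<>();

  // Registers a tokenizer for a language key (e.g. "fr").
  void register(String language, Function<String, String[]> tokenizer) {
    tokenizers.put(language, tokenizer);
  }

  // Fails with a descriptive message instead of a bare NullPointerException
  // when no tokenizer was registered for the requested language.
  String[] tokenize(String language, String text) {
    Function<String, String[]> tokenizer = tokenizers.get(language);
    if (tokenizer == null) {
      throw new IllegalStateException("No tokenizer registered for language: " + language);
    }
    return tokenizer.apply(text);
  }
}
```

With a guard like this, a forgotten registration for a newly added language would show up in the error message directly, rather than as a null return from the map.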
|
Fixed the test setup. Now we have which should be AUX according to the provided reference. I think it is an edge case here: faisait is the imperfect of "faire", and here it functions as a semi-auxiliary in faisait souffrir. Tagging it as AUX is acceptable in Universal Dependencies because "faire" + infinitive is considered an auxiliary construction (I am not a native French speaker, though). Tried some other (online) taggers, which will label
According to the dictionary of the French Academy, an AUX is "a verb that functions as a grammatical tool used to build the compound tenses of other verbs", and faire is considered a semi-auxiliary. Because a semi-auxiliary tag is not available, and because "faire" does not build a compound tense of the verb "souffrir", it is safer and more correct to tag faire as a VERB, not as an auxiliary.
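To make a disagreement like this easy to spot in a test failure, one option is to report each token whose predicted tag differs from the reference. The sketch below is only an illustration: `TagDiff` is a hypothetical helper that is not part of OpenNLP, and the tag arrays in the usage note are assumed values for the two tokens discussed here, not actual model output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenNLP: lists the positions where
// predicted POS tags differ from the reference tags.
final class TagDiff {

  static List<String> mismatches(String[] tokens, String[] expected, String[] predicted) {
    if (tokens.length != expected.length || tokens.length != predicted.length) {
      throw new IllegalArgumentException("tokens and tag arrays must have equal length");
    }
    List<String> result = new ArrayList<>();
    for (int i = 0; i < tokens.length; i++) {
      if (!expected[i].equals(predicted[i])) {
        // Format: index:token expected=TAG predicted=TAG
        result.add(i + ":" + tokens[i] + " expected=" + expected[i] + " predicted=" + predicted[i]);
      }
    }
    return result;
  }
}
```

For example, calling `TagDiff.mismatches` with tokens `{"faisait", "souffrir"}`, assumed reference tags `{"AUX", "VERB"}`, and assumed predictions `{"VERB", "VERB"}` yields a single entry for `faisait`.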
Note: current CI failure is unrelated to this change (403, sourceforge)
|
Thx @meriam2303 for your first open-source contribution. All checks passed, which is why I've merged the PR.
* adds French sample sentence and pos tags, incl. arabic+maghrebi stub examples for existing tests
* adds French constant
* inits French resources for test context

---------

Co-authored-by: Richard Zowalla <rzo1@apache.org>

(cherry picked from commit 237a771)
French Model draft to test + Arabic & Maghrebi commented out