Conversation
|
Hi @wallace11, thanks! I'm afraid that this won't pass. The current algorithm first splits the text into a sequence of words, delimited by "separating characters" such as whitespace and punctuation. But now the dates are surrounded by text: Edit: the CI complains about formatting, this can be fixed by running |
|
@eikek Regarding spaces, that's the thing - in "normal" Japanese there's no such thing. That's exactly why I wanted to create proper Japanese tests, to see if it catches that. I looked at some of my documents, and indeed in some of them the date is part of the first sentence or the title (which is also a sentence). Do you think it'd be possible to fix that? |
|
@wallace11 no worries! (you would only need to install sbt for this) Thanks for your explanation! I just read on Wikipedia that there are no spaces in Japanese :) Well, I guess this means doing it completely differently here. If you have some documents you could share, that would help! That way I could run this against some "real" data. I might be able to remove all characters that are not Arabic numerals or the characters for year/month/day… maybe this gives some results. |
Not very efficient, but should work to find the position of dates in Japanese text.
|
@wallace11 I just pushed a quite crude fix :-). It preprocesses the text and removes all characters that don't take part in a date. Your tests should pass now. You could try this against your documents. I can merge this, and a few minutes later a nightly version will be published. |
|
@eikek |
|
@wallace11 Great 😃 ! Sounds like a weekend 😉 Thank you for your help! |
Hi there,
Here are some more sensible Japanese tests.
I hope that they pass 😆