Conversation
|
I am fine with starting with something that works and fine-tuning it later. To 3: We cannot reconstruct the correct height and width if we only have the information from Google Cloud Vision OCR, so any estimate is probably semantically wrong (although it may work on a technical level). However, I suggest dropping these parameters as hard requirements, which would also simplify the integration here; see dinosauria123/gcv2hocr#2. To 4: This also belongs in the upstream repo. |
README.md
| From ╲ To | hOCR | ALTO | PAGEXML | FineReader | Plain Text | Google Cloud Vision |
| ---: | --- | --- | --- | --- | --- | --- |
| hOCR | - | ✓ | - | - | ✓ | - |
| ALTO | ✓ | ✓ | - | - | ✓ | - |
The check mark for a transformation from ALTO to ALTO seems to be a typo. I suggest using the equals sign = on the diagonal.
README.md
| ALTO | ✓ | ✓ | - | - | ✓ | - |
| PAGE | - | - | - | - | - | - |
| FineReader | - | - | - | - | - | - |
| Google Cloud Vision | ✓ | - | - | ✓ | - | - |
The second check mark looks wrong: there is no transformation from the Google Cloud Vision format to the FineReader format.
OUTFILE="$2"
#TODO
WIDTH=2000
HEIGHT=2000
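One possible way to stop treating the page size as a hard requirement is to keep the placeholder values as defaults and let the caller override them. The following is only a minimal sketch under that assumption: `DEFAULT_WIDTH`, `DEFAULT_HEIGHT`, `resolve_dimension` and the use of the third and fourth positional arguments are hypothetical names and conventions, not something taken from this branch or from gcv2hocr.

```shell
#!/bin/sh
# Sketch only: DEFAULT_WIDTH, DEFAULT_HEIGHT and resolve_dimension are
# hypothetical names, not part of ocr-fileformat or gcv2hocr.
DEFAULT_WIDTH=2000
DEFAULT_HEIGHT=2000

# Print the caller-supplied value ($1) if it is non-empty, else the fallback ($2).
# Reject anything that does not consist of digits only.
resolve_dimension() {
    value="${1:-$2}"
    case "$value" in
        ''|*[!0-9]*)
            echo "error: dimension must be an integer, got '$value'" >&2
            return 1
            ;;
    esac
    printf '%s\n' "$value"
}

# Assumption: optional third and fourth positional arguments override the defaults.
WIDTH=$(resolve_dimension "${3:-}" "$DEFAULT_WIDTH") || exit 1
HEIGHT=$(resolve_dimension "${4:-}" "$DEFAULT_HEIGHT") || exit 1
echo "using page size ${WIDTH}x${HEIGHT}" >&2
```

With this shape, the script still runs when no dimensions are known, and a caller that does know the real image size can pass it in. The defaults remain placeholders and are not a semantically correct page size.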
|
We should look further into the Travis problems here. The Docker image also has some build errors on this branch: https://hub.docker.com/r/ubma/ocr-fileformat/builds/ |
|
I resolved the merge conflict here. This triggered new Travis checks, which are now passing, but the Docker build still fails: |
|
Must add |
|
Okay, I tried adding this package, but there is now a new error in the Docker build: Any hint on how to fix this? |
|
Okay, adding libc-dev as well did the trick... |
|
Okay, this now looks ready to merge from my side. |
|
@stweil Cool! Do you also want to do a new release? |
|
What about the remaining pull requests? Yes, it's time for a new release, but before tagging, I'd like to look into a problem report which I got for https://digi.bib.uni-mannheim.de/ocr-fileformat/ (download button not working as expected). |
|
I don't know the status of the other PRs, which just make some branches by @kba visible. These could therefore take more time to understand before we have a plan for how to proceed. Fixing an error before releasing certainly sounds like a good idea. |
Works, but ideally: