Support for google cloud vision 2 hocr by @dinosauria123 #28

Merged
stweil merged 7 commits into master from gcv
Dec 9, 2017
@kba
Collaborator

kba commented Sep 11, 2016

Works, but ideally:

  • use upstream repo
  • delete temporary files
  • fall back to max x/y if width height unspecified
  • maybe port to more flexible language, e.g. python
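
The "delete temporary files" item can be sketched as follows. This is a minimal illustration, not the script under review: `with_tmpfile` is a hypothetical helper name, and it assumes the conversion step accepts the scratch file path as its last argument.

```sh
#!/bin/sh
# Hypothetical helper: run a command with a scratch file that is always removed.
# The command receives the scratch file path as its last argument.
with_tmpfile() {
    (
        tmpfile=$(mktemp) || exit 1
        # Remove the scratch file when this subshell exits, on success or failure.
        trap 'rm -f "$tmpfile"' EXIT
        "$@" "$tmpfile"
    )
}
```

Running the work in a subshell keeps the `trap` local, so the cleanup does not interfere with any traps set by the calling script.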

@zuphilip
Member

I am fine with starting with something that works and fine-tuning it later.

To 3: We cannot reconstruct the correct height and width if we only have the information from Google Cloud Vision OCR, so any estimate is probably semantically wrong (although it may work on a technical level). However, I suggest dropping these parameters as hard requirements, which would also simplify the integration here; see dinosauria123/gcv2hocr#2.

To 4: This also belongs to the upstream repo.
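
For reference, the fallback from item 3 amounts to taking the largest x and the largest y among the bounding-box vertices. The sketch below assumes the vertex coordinates have already been extracted from the Google Cloud Vision output as one `x y` pair per line (that extraction step is not shown), and `estimate_page_size` is a hypothetical name. As noted above, the result only bounds the extent of the recognized text and is not the true page size.

```sh
#!/bin/sh
# Hypothetical fallback: read "x y" vertex pairs on stdin and
# print the largest x and the largest y seen, separated by a space.
estimate_page_size() {
    awk '
        BEGIN { mx = 0; my = 0 }
        $1 + 0 > mx { mx = $1 + 0 }   # track the widest x coordinate
        $2 + 0 > my { my = $2 + 0 }   # track the tallest y coordinate
        END { print mx, my }
    '
}
```

The two numbers could then stand in for the width and height whenever none are given on the command line.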

kba added a commit to kba/gcv2hocr that referenced this pull request Sep 12, 2016
README.md Outdated
| From ╲ To | hOCR | ALTO | PAGEXML | FineReader | Plain Text | Google Cloud Vision |
| ---: | --- | --- | --- | --- | --- | --- |
| hOCR | - | ✓ | - | - | ✓ | - |
| ALTO | ✓ | ✓ | - | - | ✓ | - |
Member

The check mark for a transformation between ALTO and ALTO seems to be a typo. I suggest using the equals sign = on the diagonal.

README.md Outdated
| ALTO | ✓ | ✓ | - | - | ✓ | - |
| PAGE | - | - | - | - | - | - |
| FineReader | - | - | - | - | - | - |
| Google Cloud Vision | ✓ | - | - | ✓ | - | -
Member

The second check mark looks wrong: there is no transformation from the Google Cloud Vision format to the FineReader format.

OUTFILE="$2"
#TODO
WIDTH=2000
HEIGHT=2000
Member

What is the TODO here?

@zuphilip
Member

We should look further into the Travis problems here. The Docker image also has some build errors on this branch: https://hub.docker.com/r/ubma/ocr-fileformat/builds/

@zuphilip
Member

I resolved the merge conflict here. This triggered new Travis checks, which are now passing, but the Docker build still fails:

make -C gcv2hocr

make[2]: Entering directory '/ocr-fileformat/vendor/gcv2hocr'
gcc -std=c99 -o gcv2hocr main.c
make[2]: Leaving directory '/ocr-fileformat/vendor/gcv2hocr'
make[1]: Leaving directory '/ocr-fileformat/vendor'

make[2]: gcc: Command not found
make[2]: *** [Makefile:9: gcv2hocr] Error 127
make[1]: *** [Makefile:112: gcv2hocr] Error 2
make: *** [Makefile:27: vendor] Error 2
Removing intermediate container 417d39381b83

@kba
Collaborator Author

kba commented Feb 26, 2017

We must add gcc to the apk add line, and probably uninstall it afterwards, since it's a pretty big package.

@zuphilip
Member

zuphilip commented Mar 4, 2017

Okay, I tried adding this package, but there is now a new error in the Docker build:

make -C gcv2hocr
make[2]: Entering directory '/ocr-fileformat/vendor/gcv2hocr'
gcc -std=c99 -o gcv2hocr main.c
main.c:1:19: fatal error: stdio.h: No such file or directory
 #include <stdio.h>
                   ^
compilation terminated.

Any hint on how to fix this?

@zuphilip
Member

zuphilip commented Mar 4, 2017

Okay, adding libc-dev as well did the trick...
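
The two fixes discussed above (installing gcc and libc-dev, then removing the large toolchain again) could be combined roughly as follows on an Alpine-based image. This is a sketch under assumptions, not the change that was committed: `build_gcv2hocr` and the `.build-deps` group name are hypothetical, and `make vendor` is taken from the top-level target shown in the build log.

```sh
#!/bin/sh
# Hypothetical build step for an Alpine-based image.
build_gcv2hocr() {
    # Install the compiler and the C headers as one named virtual package.
    apk add --no-cache --virtual .build-deps gcc libc-dev || return 1
    # Build the vendored tools, including gcv2hocr.
    make vendor || return 1
    # Remove the toolchain again to keep the image small.
    apk del .build-deps
}
```

In a Dockerfile, these commands would need to run within a single `RUN` instruction for the removal to actually reduce the size of the resulting layer.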

@zuphilip
Member

zuphilip commented Mar 5, 2017

Okay, this now looks ready to merge from my side.

@stweil stweil merged commit a514e3d into master Dec 9, 2017
@stweil
Member

stweil commented Dec 9, 2017

I finally merged it. Thank you, @kba and @zuphilip.

@stweil stweil deleted the gcv branch December 9, 2017 11:28
@zuphilip
Member

zuphilip commented Dec 9, 2017

@stweil Cool! Do you also want to do a new release v0.2.2 now?

@stweil
Member

stweil commented Dec 9, 2017

What about the remaining pull requests? Yes, it's time for a new release, but before tagging, I'd like to look into a problem report that I received for https://digi.bib.uni-mannheim.de/ocr-fileformat/ (download button not working as expected).

@zuphilip
Member

zuphilip commented Dec 9, 2017

I don't know the status of the other PRs, which just make some of @kba's branches visible. It could take more time to understand them and to plan how to proceed. Fixing an error before releasing certainly sounds like a good idea.

bertsky pushed a commit to bertsky/ocr-fileformat that referenced this pull request Apr 30, 2025
3 participants