Set dynamic memory limit in pdfalto_server #1038

Merged: kermitt2 merged 2 commits into master from fix-1036 on Aug 8, 2023

@kermitt2 (Collaborator) commented Aug 6, 2023

This is a fix for #1036.

The memory limit for the pdfalto subprocess was only set in batch mode, not in server mode.
This PR passes the memory limit defined in the Grobid configuration file to the pdfalto_server script, which sets the ulimit before creating the pdfalto subprocess.
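The mechanism described above (a wrapper lowers the memory limit, then starts pdfalto as its child) can be sketched in Python with the standard resource and subprocess modules. This is an illustration under assumptions, not the actual pdfalto_server script, which is a shell script calling ulimit; run_with_memory_limit and its parameters are hypothetical names. The limit is applied only in the child, so the calling process (the server) keeps its own limits, and the unit is KiB, as with ulimit -v.

```python
import resource
import subprocess


def run_with_memory_limit(cmd, limit_kib, timeout_s):
    """Run `cmd` with its virtual address space (RLIMIT_AS) capped at `limit_kib` KiB.

    `limit_kib` uses the same 1024-byte unit as the shell's `ulimit -v`.
    Raises subprocess.TimeoutExpired if the child runs longer than `timeout_s` seconds.
    """
    limit_bytes = limit_kib * 1024

    def lower_limit_in_child():
        # Runs in the child between fork() and exec(), so only the child
        # (and anything it starts) is limited; the caller keeps its limits.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        if hard != resource.RLIM_INFINITY:
            limit = min(limit_bytes, hard)  # an unprivileged process cannot exceed the hard limit
        else:
            limit = limit_bytes
        # Unlike `ulimit -v` (which sets soft and hard), only the soft limit is lowered here.
        resource.setrlimit(resource.RLIMIT_AS, (limit, hard))

    return subprocess.run(
        cmd,
        preexec_fn=lower_limit_in_child,
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
```

For example, the values visible in the pdfalto command line logged later in this thread (--timeout 60, --ulimit 6144) would map to timeout_s=60 and limit_kib=6144, assuming the script hands them to a timeout and to ulimit -v respectively.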

Tested successfully with the Java server.

TODO: test with Docker.

@coveralls commented Aug 6, 2023

coverage: 39.98% (+0.003%) from 39.977% when pulling b6675db on fix-1036 into bbf1e85 on master.

@kermitt2 (Collaborator, Author) commented Aug 6, 2023

Tested successfully with Docker too. The memory limit is applied in the Docker service, and the pdfalto subprocess for a large PDF is killed as expected.

@kermitt2 (Collaborator, Author) commented Aug 6, 2023

@lfoppiano I also made the change to the pdfalto_server script for macOS, but I can't test it. Could you check it?

@lfoppiano (Member) commented Aug 8, 2023

I tested it on M1; it seems to work when setting 6M as the maximum memory for pdfalto.

Aug 08 09:00:59 falcon docker[26816]: ERROR [2023-08-08 00:00:59,218] org.grobid.core.process.ProcessPdfToXml: pdfalto process finished with error code: 139. [/opt/grobid/grobid-home/pdfalto/lin-64/pdfalto_server, -fullFontName, -noLineNumbers, -noImage, -annotation, -filesLimit, 2000, /opt/grobid/grobid-home/tmp/origin12466301919498765093.pdf, /opt/grobid/grobid-home/tmp/ahxv9U7ABP.lxml, --timeout, 60, --ulimit, 6144]
Aug 08 09:00:59 falcon docker[26816]: ERROR [2023-08-08 00:00:59,218] org.grobid.core.process.ProcessPdfToXml: pdfalto return message:
Aug 08 09:00:59 falcon docker[26816]: ERROR [2023-08-08 00:00:59,219] org.grobid.service.process.GrobidRestProcessFiles: An unexpected exception occurs.
Aug 08 09:00:59 falcon docker[26816]: ! org.grobid.core.exceptions.GrobidException: [BAD_INPUT_DATA] PDF to XML conversion failed with error code: 139
[...]

Does this correspond to the expected behaviour?

@kermitt2 (Collaborator, Author) commented Aug 8, 2023

Does this correspond to the expected behaviour?

Thank you! Yes, this is the expected failing exception when the subprocess runs out of memory and is killed.
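The "error code: 139" in the log above follows the usual Unix convention that an exit status above 128 means "terminated by signal (status - 128)": 139 - 128 = 11, which is SIGSEGV, a segmentation fault. This differs from 137 (SIGKILL), the status typically seen when the kernel OOM killer steps in. Under a ulimit -v cap, an allocation beyond the limit fails instead, so the 139 is presumably pdfalto crashing after a failed allocation; the thread does not confirm the exact mechanism. Likewise, assuming --ulimit is passed straight to ulimit -v (which counts in KiB), 6144 is the 6M limit mentioned above. The helper below, describe_exit_code, is a hypothetical name and not part of Grobid; it only decodes such statuses.

```python
import signal


def describe_exit_code(code):
    """Interpret an exit status using the shell convention 128 + signal number."""
    if code > 128:
        try:
            # e.g. 139 -> signal 11 -> SIGSEGV
            return f"terminated by {signal.Signals(code - 128).name}"
        except ValueError:
            pass  # not a known signal number on this platform
    return f"exited with status {code}"
```

Decoding the status from the log this way gives "terminated by SIGSEGV".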

@kermitt2 kermitt2 merged commit 4ecce3b into master Aug 8, 2023
@lfoppiano (Member)

I just realised that I did not actually test on the M1 macOS but on Linux 😅
On the M1, the script had not been modified, so no PDF could be processed. I fixed it in 2c720dd.

However, the ulimit call is commented out because it fails to set the limit:

WARN  [2023-08-15 01:47:02,132] org.grobid.core.process.ProcessPdfToXml: pdfalto stderr: /Users/lfoppiano/development/projects/grobid/grobid-home/pdfalto/mac_arm-64/pdfalto_server: line 38: ulimit: virtual memory: cannot modify limit: Invalid argument
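The warning above comes from the ulimit call in pdfalto_server on macOS arm64 ("cannot modify limit: Invalid argument"). "Invalid argument" is the EINVAL error, which setrlimit uses for a rejected value; a missing privilege would normally show up as EPERM ("Operation not permitted") instead. This observation was not tested in the thread. Rather than commenting the call out, one option is to attempt the limit and carry on without it when the OS rejects it. The sketch below shows that pattern with Python's resource module; try_limit_address_space is a hypothetical helper, not Grobid code, and whether silently falling back to "no limit" is acceptable for Grobid is a design choice the thread leaves open.

```python
import resource
import sys


def try_limit_address_space(limit_kib):
    """Try to cap this process's virtual memory (RLIMIT_AS) at `limit_kib` KiB.

    Returns True if the limit was applied, False if the OS rejected it; in the
    latter case a warning is printed and the process keeps running unlimited.
    """
    _, hard = resource.getrlimit(resource.RLIMIT_AS)
    try:
        # Only the soft limit is changed; the existing hard limit is kept.
        resource.setrlimit(resource.RLIMIT_AS, (limit_kib * 1024, hard))
    except (ValueError, OSError) as err:
        # CPython reports EINVAL/EPERM from setrlimit() as ValueError, other errno values as OSError.
        print(f"warning: memory limit not applied ({err}); continuing without it", file=sys.stderr)
        return False
    return True
```

In a wrapper, this would be called in the child right before starting pdfalto, so that a platform that refuses the limit still processes PDFs, only without the memory cap.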

@kermitt2 (Collaborator, Author)

Ha, yes, I only modified the script for the older Macs: grobid-home/pdfalto/mac-64/pdfalto_server.

Maybe changing the ulimit on the latest macOS requires sudo?

@lfoppiano (Member)

I need to investigate further; however, on Intel Macs it works without issue. Just tested.

@lfoppiano lfoppiano added this to the 0.8.0 milestone Jun 9, 2024
@lfoppiano lfoppiano deleted the fix-1036 branch March 21, 2026 20:57