Memory Corruption Error: "Process finished with exit code -1073741819 (0xC0000005)" #799
Description
Dear spacy-team, this might be related to #786
I tried to train an entity recognizer using spaCy's entity recognizer class, following this example script:
https://github.com/explosion/spaCy/blob/master/examples/training/train_ner.py
However, my adapted version, which runs on a bigger set of example data, frequently crashes Python with the exit code -1073741819 (0xC0000005), which some research suggests stands for memory corruption. I think the error occurs in the update function of the entity recognizer: https://github.com/explosion/spaCy/blob/master/examples/training/train_ner.py#L32
This bug is extremely difficult to track down. I spent 4 days trying to reduce my case to a minimal working example, but the crash does not seem to occur deterministically. Rather, I have the impression that certain things make it more likely that the bug occurs. Here are all the things that seem to have an influence:
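For reference, the two numbers in the exit code are the same 32-bit value, once read as a signed integer and once as unsigned hex. A minimal sketch of the conversion follows; the helper name `to_unsigned_hex` is made up for illustration and is not part of spaCy or the example script.

```python
def to_unsigned_hex(exit_code):
    """Reinterpret a signed 32-bit exit code as an unsigned hex string."""
    # Masking with 0xFFFFFFFF maps negative values onto their
    # two's-complement 32-bit representation.
    return "0x{:08X}".format(exit_code & 0xFFFFFFFF)


print(to_unsigned_hex(-1073741819))  # 0xC0000005
```

On Windows, 0xC0000005 is the status code for an access violation, which fits the memory corruption suspicion.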
- The biggest impact seems to come from whether you use `nlp = spacy.load('de', parser=False, entity=False, add_vectors=False)` or `nlp = spacy.load('de')` before creating `EntityRecognizer(nlp.vocab, entity_types=entity_types)`. The keyword arguments seem to be crucial: my code is far more likely to crash if I omit them.
- Having new, unseen vocabulary in the example data seemed to increase the probability of a crash
- Having long sentences as example data seemed to increase the probability of a crash
I cannot be certain about any of these points, because there was no deterministic pattern to be found (at least I have not found one yet), and because I did not have enough time to collect meaningful statistics on the crash counts.
Still, I hope you might be able to find this weird bug.
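For anyone who wants to gather the crash statistics I did not have time for, a rough sketch of a harness is below. It only uses the standard library: it runs a command repeatedly in a child process and tallies the exit codes. The function name `count_exit_codes` and the stand-in command are assumptions for illustration; the real training script would be substituted for the stand-in.

```python
import subprocess
import sys
from collections import Counter


def count_exit_codes(cmd, runs=20):
    """Run `cmd` `runs` times and tally the exit codes it returns."""
    codes = Counter()
    for _ in range(runs):
        # subprocess.call waits for the child and returns its exit code,
        # so a crash in the child does not take down this harness.
        codes[subprocess.call(cmd)] += 1
    return codes


# Stand-in command that simply exits cleanly; replace it with
# [sys.executable, "<your training script>"] to measure real crash rates.
stand_in = [sys.executable, "-c", "import sys; sys.exit(0)"]
print(count_exit_codes(stand_in, runs=3))
```

Running this once per configuration (with and without the keyword arguments, with short and long sentences, and so on) would give comparable crash counts; a crashing run should show up under the code -1073741819 or its unsigned equivalent 3221225477, depending on how the exit code is reported.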
Your Environment
- Operating System: Windows 10
- Python Version Used: 3.5
- spaCy Version Used: 1.6
- Environment Information: 32 GB RAM (practically no memory limits)