Memory Corruption Error: "Process finished with exit code -1073741819 (0xC0000005)" #799

@schlichtanders

Dear spacy-team, this might be related to #786

I tried to train an entity recognizer using spaCy's entity recognizer class, following this example script:
https://github.com/explosion/spaCy/blob/master/examples/training/train_ner.py
However, in my adapted version, which runs on a bigger set of example data, Python frequently crashes with the exit code -1073741819 (0xC0000005), which some research suggests stands for memory corruption. I think the error occurs in the update function of the entity recognizer: https://github.com/explosion/spaCy/blob/master/examples/training/train_ner.py#L32
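
For reference, the two numbers in the exit message are the same 32-bit value: -1073741819 is the signed reading, and reinterpreting it as unsigned gives 0xC0000005, which Windows documents as STATUS_ACCESS_VIOLATION. A minimal sketch of the conversion (the helper name `to_ntstatus` is made up for illustration):

```python
def to_ntstatus(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as an unsigned hex status string."""
    # Masking with 0xFFFFFFFF maps negative values onto their unsigned 32-bit form.
    return "0x{:08X}".format(exit_code & 0xFFFFFFFF)


print(to_ntstatus(-1073741819))  # prints 0xC0000005
```

This only decodes the number. It does not say which component performed the invalid memory access.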

This bug is extremely difficult to track down. I spent 4 days trying to reduce my case to a minimal working example, but the crash does not occur deterministically. Rather, I have the impression that certain things make it more likely. Here is everything that seems to have an influence:

  1. The biggest factor seems to be whether you use nlp = spacy.load('de', parser=False, entity=False, add_vectors=False) or nlp = spacy.load('de') when creating EntityRecognizer(nlp.vocab, entity_types=entity_types). The keyword arguments seem to be crucial: my code is far more likely to crash if I omit them.
  2. Having new, unseen vocabulary in the example data seemed to increase the probability of a crash.
  3. Having long sentences in the example data seemed to increase the probability of a crash.

I cannot be certain about any of these points, because I could not find a deterministic pattern (at least not yet), and because I did not have enough time to collect meaningful statistics on the crash counts.
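
One way to collect such statistics would be to run each variant of the training script many times in a fresh process and tally the exit codes. The following is only a sketch under assumptions: the function name `count_exit_codes` and the script name `train_ner_adapted.py` are hypothetical, and the `-X faulthandler` option only produces a traceback on a fatal error where the platform and Python version support it.

```python
import subprocess
import sys
from collections import Counter


def count_exit_codes(cmd, runs=20):
    """Run `cmd` repeatedly, each time in a fresh process, and tally exit codes."""
    codes = Counter()
    for _ in range(runs):
        # A crash only kills the child process, so the loop keeps going.
        codes[subprocess.call(cmd)] += 1
    return codes


# Hypothetical usage, with "train_ner_adapted.py" standing in for the real script:
#   cmd = [sys.executable, "-X", "faulthandler", "train_ner_adapted.py"]
#   print(count_exit_codes(cmd, runs=50))
```

Comparing the tallies for the two spacy.load variants, or for short versus long sentences, would show whether the impressions listed above hold up. Depending on how the exit code is reported, a crash may appear as -1073741819 or as its unsigned form 3221225477.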

Still, I hope you might be able to find this weird bug.

Your Environment

  • Operating System: Windows 10
  • Python Version Used: 3.5
  • spaCy Version Used: 1.6
  • Environment Information: 32 GB RAM (practically no memory limits)


Labels: bug (Bugs and behaviour differing from documentation)
