Too many labels result in a crash #2800

@jsthivierge

Description

Hi, I'm trying to train a custom model with more than 125 labels, and I get the following errors:

Windows 10

Process finished with exit code -1073740791 (0xC0000409)

Ubuntu 18.04

*** stack smashing detected ***: <unknown> terminated
Aborted (core dumped)

There seems to be a limit: with 125 labels or fewer it works, and with more than 125 it crashes.

How to reproduce the behaviour

import random

import spacy


def __train_model(self, train_data, entity_types):
    # Start from a blank English pipeline containing only an NER component
    nlp = spacy.blank("en")

    ner = nlp.create_pipe("ner")
    nlp.add_pipe(ner)

    # Register every entity type as an NER label
    for entity_type in entity_types:
        ner.add_label(entity_type)

    optimizer = nlp.begin_training()

    # Start training
    for i in range(20):
        losses = {}
        random.shuffle(train_data)

        for statement, entities in train_data:
            nlp.update([statement], [entities], sgd=optimizer, losses=losses, drop=0.5)

    return nlp

Unit Test

    def test_train_with_max_supported_entity_types(self):
        train_data = TrainData()
        train_data.extend([("One sentence", {"entities": []})])
        entity_types = {i for i in range(125)}

        model = self.train_model_processor.train(train_data, entity_types)

        assert_is_not_none(model)

In the unit test, the process crashes whenever entity_types contains more than 125 items.
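Because the failure is a native abort rather than a Python exception, a test like the one above takes the whole test runner down with it. A minimal sketch of one way to isolate it, assuming the training code can be passed as a snippet to a child interpreter (`run_isolated` and `crashed` are hypothetical helper names, not part of spaCy):

```python
import subprocess
import sys


def run_isolated(snippet, timeout=600):
    """Run a Python snippet in a child interpreter and return its exit code."""
    completed = subprocess.run(
        [sys.executable, "-c", snippet],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
    )
    return completed.returncode


def crashed(returncode):
    # On POSIX a negative code means the child was killed by a signal
    # (for example SIGABRT); on Windows a native crash surfaces as a
    # non-zero exit code. Either way, anything but 0 counts as a failure.
    return returncode != 0
```

With something like this, the test can assert on the exit code for a given label count instead of dying together with the child process.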

Your Environment

  • spaCy version: 2.0.12

  • Platform: Windows-10-10.0.16299-SP0

  • Python version: 3.7.0

  • Environment Information:
    16gb RAM, CPU: i7-3630QM

Is there a limit on the number of labels? If so, shouldn't spaCy raise an error message describing the problem instead of crashing?
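Until that is answered, a caller-side guard is one possible stopgap. The sketch below is only an illustration: `check_label_count` is a hypothetical helper, and the threshold of 125 is the value observed in this report, not a documented spaCy limit.

```python
# Threshold observed in this report; an assumption, not a documented limit.
OBSERVED_MAX_LABELS = 125


def check_label_count(labels, max_labels=OBSERVED_MAX_LABELS):
    """Return the unique labels as sorted strings, or raise if there are too many."""
    unique = {str(label) for label in labels}
    if len(unique) > max_labels:
        raise ValueError(
            "Got {} labels, but training is only known to work with up to {}.".format(
                len(unique), max_labels
            )
        )
    return sorted(unique)
```

Calling this before the add_label loop would turn the hard crash into a regular ValueError that a test can catch.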

Labels: bug (Bugs and behaviour differing from documentation), feat / ner (Feature: Named Entity Recognizer), training (Training and updating models)
