Several problems when testing the "Text classification with Keras" example (#2209)

Hello,

I have found three main problems when training and evaluating the text classifier from the Keras example.

1. When saving the 'config.json' file after training (line 222), the file is opened with 'wb' but it should be opened with 'w'. Otherwise the write call on the next line fails, because it cannot handle the 'str' returned by lstm.to_json().
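
To illustrate the first point, here is a minimal sketch that does not need Keras. The `config_json` string is only a stand-in for what lstm.to_json() returns, and the temporary path is an assumption for the demo, not the path used by the example script.

```python
import json
import os
import tempfile

# Stand-in for the JSON string returned by lstm.to_json() (made-up content).
config_json = json.dumps({"class_name": "Sequential", "config": []})

# Temporary location used only for this demo.
config_path = os.path.join(tempfile.mkdtemp(), "config.json")

# Binary mode ('wb') expects bytes, so writing a str raises TypeError in Python 3.
try:
    with open(config_path, "wb") as file_:
        file_.write(config_json)
    binary_write_failed = False
except TypeError:
    binary_write_failed = True

# Text mode ('w') accepts the str directly.
with open(config_path, "w") as file_:
    file_.write(config_json)

# Read the file back to confirm the config survived the round trip.
with open(config_path) as file_:
    restored = json.load(file_)
```

Encoding the string to bytes before writing would also work, but switching the mode to 'w' is the smaller change.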

2. When testing (with the is_runtime flag set to True), the program crashes in line 33 when trying to set the weights of the embeddings and the LSTM:

    ValueError: Dimension 0 in both shapes must be equal, but are 1070971 and 0. Shapes are [1070971,300] and [0,0]. for 'Assign' (op: 'Assign') with input shapes: [1070971,300], [0,0].

The reason is that during training the model loads the embeddings from 'en_vectors_web_lg', but during testing it does not load them because it uses the 'en' model instead (linked to en_core_web_sm on my machine, see the model list below). Besides, the model is saved with pickle.dump(weights[1:], file_) in line 221, which leaves the embeddings out.

My partial solution is to load 'en_vectors_web_lg' inside the load function of the SentimentAnalyser class, get the embeddings from it and set the weights. Saving the embeddings directly with pickle would probably achieve the same thing.
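
The effect of the weights[1:] slice can be sketched without Keras. Everything here is a stand-in: the nested lists play the role of the arrays returned by the model's get_weights(), with index 0 assumed to be the embedding matrix, and the in-memory buffer replaces the model file.

```python
import io
import pickle

# Stand-ins for the model weights: index 0 plays the embedding matrix,
# the remaining entries play the LSTM and output layers. Values are made up.
embeddings = [[0.1, 0.2], [0.3, 0.4]]
weights = [embeddings, [[1.0, 2.0]], [0.5]]

# Same slicing as in the example script: the embeddings are not written.
buffer = io.BytesIO()
pickle.dump(weights[1:], buffer)

# Loading gives back only the non-embedding weights.
buffer.seek(0)
lstm_weights = pickle.load(buffer)

# The embedding matrix must be supplied again at load time, and it must have
# the same shape as the one used in training, otherwise setting the weights fails.
restored = [embeddings] + lstm_weights
```

If the embeddings supplied at load time come from a model without vectors, the first entry has the wrong shape, which matches the [1070971,300] versus [0,0] mismatch in the error above.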

3. After that, the system crashes in line 153 when iterating over the data using parallel batches:

    File "/usr/local/lib/python3.5/dist-packages/spacy/language.py", line 558, in pipe
      for name, proc in self.pipeline:
    TypeError: 'Tagger' object is not iterable

I solved this issue by not using the create_pipeline function and using the following instead:

    nlp = spacy.load('en')
    nlp.add_pipe(SentimentAnalyser.load(model_dir, nlp, max_length=max_length))

I don't add the tagger and parser since they are already included in 'en'.
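
My understanding of the TypeError, as a sketch in plain Python: the loop in the traceback unpacks each pipeline entry into a (name, proc) pair, so a pipeline that holds bare components cannot be unpacked. The `Tagger` class and `run_pipeline` function below are hypothetical stand-ins, not spaCy code, and I am assuming that create_pipeline ends up putting bare components into the pipeline.

```python
class Tagger:
    """Hypothetical stand-in for a pipeline component: callable, not iterable."""

    def __call__(self, doc):
        return doc + " [tagged]"


def run_pipeline(pipeline, doc):
    # Mirrors the loop from the traceback: each entry must be a (name, proc) pair.
    for name, proc in pipeline:
        doc = proc(doc)
    return doc


# A list of bare components cannot be unpacked into (name, proc).
try:
    run_pipeline([Tagger()], "some text")
    bare_components_failed = False
except TypeError:
    bare_components_failed = True

# A list of (name, component) pairs works.
result = run_pipeline([("tagger", Tagger())], "some text")
```

Using nlp.add_pipe avoids the problem because spaCy then stores the component in the form it expects.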

However, after all these changes the system prints an accuracy of 0.5, and it takes a long time to produce this result (more than 10 minutes) even though I have a GPU.

Please let me know what could be wrong, or whether this accuracy is expected for this example.

## Info about spaCy

* **spaCy version:** 2.0.11
* **Platform:** Linux-4.4.0-119-generic-x86_64-with-Ubuntu-16.04-xenial
* **Models:** en, en_vectors_web_lg, en_core_web_lg
* **Python version:** 3.5.3

## Info about models
    Installed models (spaCy v2.0.11)
    /usr/local/lib/python3.5/dist-packages/spacy

    TYPE        NAME                  MODEL                 VERSION                                   
    package     en-core-web-lg        en_core_web_lg        2.0.0    ✔      
    package     en-core-web-sm        en_core_web_sm        2.0.0    ✔      
    package     en-vectors-web-lg     en_vectors_web_lg     2.0.0    ✔      
    link        en                    en_core_web_sm        2.0.0    ✔      
    link        en_core_web_lg        en_core_web_lg        2.0.0    ✔      
    link        en_vectors_web_lg     en_vectors_web_lg     2.0.0    ✔  



