pipe(): ValueError Error parsing doc #429

@blang

I found strange behaviour using the pipe() method (only verified with the German model):

If you parse a document using pipe() you can get a ValueError, whereas nlp(text) on the same text works fine. I boiled it down to single words: German words work, but English words like 'Windows' fail.

Steps to reproduce:

import spacy
nlp = spacy.load('de')
def texts():
    yield "Windows"
for doc in nlp.pipe(texts(), n_threads=16, batch_size=1000):
    print(len(doc))  # doc access -> ValueError

Trace

ValueError                                Traceback (most recent call last)
<ipython-input-2-9a095ec5505b> in <module>()
      8 def texts():
      9     yield "Windows"
---> 10 for doc in nlp.pipe(texts(), n_threads=16, batch_size=1000):
     11     print(len(doc))

.../venv/lib/python3.4/site-packages/spacy/language.py in pipe(self, texts, tag, parse, entity, n_threads, batch_size)
    254             stream = self.entity.pipe(stream,
    255                 n_threads=1, batch_size=batch_size)
--> 256         for doc in stream:
    257             yield doc
    258 
ValueError: Error parsing doc: Windows

Calling nlp("Windows") directly works fine. Also, if you call nlp("Windows") before the same pipe() call, pipe() does not raise an exception (maybe a dictionary is built on the first call?)
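To narrow down which inputs are affected, the comparison above can be automated. The following is only a rough sketch: `find_pipe_failures` is a hypothetical helper name, not part of spaCy, and it assumes nothing beyond an object that is callable on a string and has a pipe() method taking an iterable of strings. It tries pipe() first for each text, since calling nlp(text) beforehand appears to mask the error.

```python
def find_pipe_failures(nlp, texts, **pipe_kwargs):
    """Return (text, message) pairs that nlp(text) handles but nlp.pipe() rejects.

    Hypothetical diagnostic helper. `nlp` is assumed to be callable on a
    string and to expose `.pipe(iterable_of_strings, **kwargs)`.
    """
    failures = []
    for text in texts:
        # Run pipe() first: per the observation above, a prior nlp(text)
        # call on the same text seems to hide the problem.
        try:
            for _ in nlp.pipe(iter([text]), **pipe_kwargs):
                pass
        except ValueError as err:
            pipe_error = err
        else:
            continue  # pipe() handled this text, nothing to report

        # Check that the direct call still succeeds on the same text.
        try:
            nlp(text)
        except ValueError:
            continue  # fails both ways, so not specific to pipe()
        failures.append((text, str(pipe_error)))
    return failures


# Example usage (requires spaCy and the German model, as in the report):
# import spacy
# nlp = spacy.load('de')
# print(find_pipe_failures(nlp, ["Windows", "Fenster"],
#                          n_threads=16, batch_size=1000))
```

Because a direct call may change state shared with later texts, the order of the inputs could influence the result; running it on a freshly loaded model is probably the safest way to get a reliable list.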

Versions:

Python 3.4.3 (Problem not related to ipython)
spacy 0.101.0

Maybe this is related to this region of syntax/parser.pyx:

if not eg.is_valid[guess]:
    # with gil:
    #     move_name = self.moves.move_name(action.move, action.label)
    #     print 'invalid action:', move_name
    return 1

Labels: bug (Bugs and behaviour differing from documentation)
