Merge hangs #214
Closed
Great job on spaCy, it is really impressive!
I'm POS-tagging news articles and merging the tokens of each named entity into a single token. Many thousands of articles work fine, but merge hangs on the text below.
from __future__ import unicode_literals
import spacy.en
from spacy.en import English
nlp = English()
doc = nlp('text', tag=True, parse=True)
text = '"But there is no telling what the hurricane damage will be," Anderson said, in reference to Hurricane Fran which swept through North and South Carolina last week. Some industry sources said the hurricane may have brought beneficial rains while others predicted losses and damage to open bolls. "Fran changes the picture considerably but it is too soon (to determine losses)," said Jarral Neeper, an analyst with Calcot Ltd. He pegged the estimate at 18.717 million bales. Smith Barney analyst David Brandon compared the effect of Hurricane Fran to that of Hurricane Hugo in 1989, which he said buoyed yields in North and South Carolina.'
doc = nlp(text, tag=True)
for ent in doc.ents:
    if len(ent.orth_.split()) > 1:
        start = text.index(ent.orth_)
        end = start+len(ent.orth_)
        print ent.orth_ + ' start: ' + str(start) + ' ' + 'end: ' + str(end) + ' ' + 'entity: ' + ent.label_
        doc.merge(start, end, '', '', ent.label_)
for token in doc:
    print token.orth_

Here's the output I get:
Hurricane Fran start: 92 end: 106 entity: EVENT
South Carolina start: 137 end: 151 entity: GPE
last week start: 152 end: 161 entity: DATE
Jarral Neeper start: 381 end: 394 entity: PERSON
Calcot Ltd. start: 412 end: 423 entity: ORG
18.717 million bales start: 450 end: 470 entity: QUANTITY
Smith Barney start: 472 end: 484 entity: ORG
David Brandon start: 493 end: 506 entity: PERSON
Hurricane Fran start: 92 end: 106 entity: EVENT
Hurricane Hugo start: 556 end: 570 entity: EVENT
North and South Carolina start: 127 end: 151 entity: GPE
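
One thing visible in the output above: `text.index(ent.orth_)` always returns the first occurrence of the string, so the second "Hurricane Fran" is reported at 92-106 again, and the last "North and South Carolina" is reported at 127-151, which overlaps the "South Carolina" span (137-151) that was already merged. Whether that overlap is what makes merge hang is only a guess on my part. Below is a minimal sketch in plain Python (no spaCy required) of computing the offsets with a moving cursor so that repeated mentions get distinct positions. `locate_mentions` is a hypothetical helper name, and the sketch assumes the mentions arrive in document order and do not overlap.

```python
def locate_mentions(text, mentions):
    """Return (start, end, mention) tuples, searching left to right.

    Assumption: `mentions` is in document order and non-overlapping,
    so each search can begin where the previous match ended.
    """
    spans = []
    cursor = 0
    for mention in mentions:
        start = text.find(mention, cursor)  # search only past the last match
        if start == -1:
            continue  # not found after the cursor; skip rather than guess
        end = start + len(mention)
        spans.append((start, end, mention))
        cursor = end
    return spans


sample = "Hurricane Fran hit. Later, Hurricane Fran weakened."
found = locate_mentions(sample, ["Hurricane Fran", "Hurricane Fran"])
```

Collecting every span first and only then calling merge would also avoid changing the document while still looping over `doc.ents`, which may matter here.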