error running csv_example : SystemError: Negative size passed to PyString_FromStringAndSize #54
Closed
Description
Hello!
I'm new to Python and am following the README as well as an article that references dedupeio: http://blog.districtdatalabs.com/basics-of-entity-resolution.
I'm unable to finish running csv_example.py because of the following error:
15/10 positive, 14/10 negative
Do these records refer to the same thing?
(y)es / (n)o / (u)nsure / (f)inished / (p)revious
f
Finished labeling
Traceback (most recent call last):
  File "csv_example.py", line 151, in <module>
    threshold = deduper.threshold(data_d, recall_weight=1)
  File "/Users/mo-manguiat/Projects/dedupetest/env/lib/python3.5/site-packages/dedupe/api.py", line 237, in threshold
    return self.thresholdBlocks(blocked_pairs, recall_weight)
  File "/Users/mo-manguiat/Projects/dedupetest/env/lib/python3.5/site-packages/dedupe/api.py", line 68, in thresholdBlocks
    probability = core.scoreDuplicates(self._blockedPairs(blocks),
  File "/Users/mo-manguiat/Projects/dedupetest/env/lib/python3.5/site-packages/dedupe/api.py", line 248, in _blockedPairs
    block, blocks = core.peek(blocks)
  File "/Users/mo-manguiat/Projects/dedupetest/env/lib/python3.5/site-packages/dedupe/core.py", line 278, in peek
    record = next(records)
  File "/Users/mo-manguiat/Projects/dedupetest/env/lib/python3.5/site-packages/dedupe/api.py", line 281, in _blockData
    for block in viewvalues(blocks):
  File "/Users/mo-manguiat/Projects/dedupetest/env/lib/python3.5/_collections_abc.py", line 693, in __iter__
    for key in self._mapping:
  File "/Users/mo-manguiat/anaconda/lib/python3.5/shelve.py", line 95, in __iter__
    for k in self.dict.keys():
SystemError: Negative size passed to PyBytes_FromStringAndSize
I'm using the following in a virtual environment:
Python 3.5.2
dedupe 1.6.10
future 0.16.0
Unidecode 0.4.16
numpy 1.12.1
Mac OS X 10.11.4
Memory: 16 GB 1867 MHz DDR3
Free storage space: 40 GB (might this be the problem?)
Googling the error led me to a few Stack Overflow posts suggesting storage or memory limits as a possible cause, but no clear solution yet. Also, the CSV file isn't large, so I'm not sure how to proceed.
I also got the same error when running it in a Python 2.7 virtual environment.
Any help would be appreciated :) Thanks!
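The last frame in the traceback is shelve iterating over the keys of its underlying dbm file, so one way to narrow this down might be to try the same operation outside dedupe. Below is a minimal sketch using only the standard library (Python 3.4+). The function name check_shelve_roundtrip and the file name probe_shelf are hypothetical, and the key count and value size are arbitrary assumptions that may well be too small to trigger the error. It reports which dbm backend shelve picked on this machine and whether the keys can be read back.

```python
import dbm
import os
import shelve
import tempfile


def check_shelve_roundtrip(n_keys=1000, value_size=100):
    """Write n_keys entries to a temporary shelf, then iterate over its keys.

    Returns (backend, keys_read): the backend name reported by dbm.whichdb
    and the number of keys that could be read back.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        # Hypothetical file name; the dbm backend may append its own extension.
        path = os.path.join(tmp_dir, "probe_shelf")

        # Fill the shelf with simple string values.
        with shelve.open(path) as shelf:
            for i in range(n_keys):
                shelf[str(i)] = "x" * value_size

        # Ask the dbm package which backend created the file.
        backend = dbm.whichdb(path)

        # Iterating over keys() is the operation that fails in the traceback.
        with shelve.open(path) as shelf:
            keys_read = sum(1 for _ in shelf.keys())

    return backend, keys_read


if __name__ == "__main__":
    backend, keys_read = check_shelve_roundtrip()
    print("dbm backend:", backend)
    print("keys read back:", keys_read)
```

If this fails with the same SystemError once n_keys or value_size is raised, the problem would seem to sit in the dbm backend rather than in dedupe itself. If it passes, that does not rule the backend out.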