error running csv_example : SystemError: Negative size passed to PyString_FromStringAndSize

Hello!

I'm new to Python and am following the README, as well as an article that references dedupeio: http://blog.districtdatalabs.com/basics-of-entity-resolution.

I can't get csv_example.py to finish running because of the following error:

```
15/10 positive, 14/10 negative
Do these records refer to the same thing?
(y)es / (n)o / (u)nsure / (f)inished / (p)revious
f
Finished labeling
Traceback (most recent call last):
  File "csv_example.py", line 151, in <module>
    threshold = deduper.threshold(data_d, recall_weight=1)
  File "/Users/mo-manguiat/Projects/dedupetest/env/lib/python3.5/site-packages/dedupe/api.py", line 237, in threshold
    return self.thresholdBlocks(blocked_pairs, recall_weight)
  File "/Users/mo-manguiat/Projects/dedupetest/env/lib/python3.5/site-packages/dedupe/api.py", line 68, in thresholdBlocks
    probability = core.scoreDuplicates(self._blockedPairs(blocks),
  File "/Users/mo-manguiat/Projects/dedupetest/env/lib/python3.5/site-packages/dedupe/api.py", line 248, in _blockedPairs
    block, blocks = core.peek(blocks)
  File "/Users/mo-manguiat/Projects/dedupetest/env/lib/python3.5/site-packages/dedupe/core.py", line 278, in peek
    record = next(records)
  File "/Users/mo-manguiat/Projects/dedupetest/env/lib/python3.5/site-packages/dedupe/api.py", line 281, in _blockData
    for block in viewvalues(blocks):
  File "/Users/mo-manguiat/Projects/dedupetest/env/lib/python3.5/_collections_abc.py", line 693, in __iter__
    for key in self._mapping:
  File "/Users/mo-manguiat/anaconda/lib/python3.5/shelve.py", line 95, in __iter__
    for k in self.dict.keys(): 
SystemError: Negative size passed to PyBytes_FromStringAndSize
```

I'm using the following in a virtual environment:
Python 3.5.2
dedupe  1.6.10
future 0.16.0
Unidecode 0.4.16
numpy 1.12.1

Mac OS X 10.11.4
Memory: 16 GB 1867 MHz DDR3
Free storage space: 40 GB (might this be the problem?)

Googling the error led me to a few posts on Stack Overflow suggesting it might be related to storage or memory limits, but I found no clear solution. Also, the CSV file isn't large, so I'm not sure how to proceed.
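
Since the traceback ends inside shelve.py while iterating over the shelf's keys, one way to narrow this down might be to check which dbm backend shelve picks on this machine and whether key iteration works at all outside dedupe. This is only a minimal sketch for Python 3: the helper name `shelf_backend_and_keys`, the file name `probe_shelf`, and the sizes are made up for illustration and are not part of dedupe.

```python
import dbm
import os
import shelve
import shutil
import tempfile


def shelf_backend_and_keys(n_items=100, value_size=1000):
    """Write n_items string values to a temporary shelf, then iterate its keys.

    Returns (backend_name, key_count). Hypothetical helper, for diagnosis only.
    """
    tmp_dir = tempfile.mkdtemp()
    path = os.path.join(tmp_dir, "probe_shelf")  # hypothetical file name
    try:
        shelf = shelve.open(path, "n")  # "n" always creates a new, empty shelf
        try:
            for i in range(n_items):
                shelf[str(i)] = "x" * value_size
            # Iterating the shelf walks its keys, the same step that
            # fails in the traceback above.
            key_count = sum(1 for _ in shelf)
        finally:
            shelf.close()
        # whichdb guesses which dbm module created the file(s) at this path.
        backend = dbm.whichdb(path)
    finally:
        shutil.rmtree(tmp_dir, ignore_errors=True)
    return backend, key_count


if __name__ == "__main__":
    backend, key_count = shelf_backend_and_keys()
    print("dbm backend:", backend)
    print("keys iterated:", key_count)
```

If this small probe works but fails once `n_items` or `value_size` is raised, that would at least point at the shelf storage rather than at the CSV data itself.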

I also got the same error when running it in a Python 2.7 virtual environment.

Any help would be appreciated :) Thanks!




