error running csv_example : SystemError: Negative size passed to PyString_FromStringAndSize #54

@motrippy

Hello!

I'm new to Python and am following the README, as well as an article that references dedupeio: http://blog.districtdatalabs.com/basics-of-entity-resolution.

I can't get csv_example.py to run to completion. It fails with the following error:

15/10 positive, 14/10 negative
Do these records refer to the same thing?
(y)es / (n)o / (u)nsure / (f)inished / (p)revious
f
Finished labeling
Traceback (most recent call last):
  File "csv_example.py", line 151, in <module>
    threshold = deduper.threshold(data_d, recall_weight=1)
  File "/Users/mo-manguiat/Projects/dedupetest/env/lib/python3.5/site-packages/dedupe/api.py", line 237, in threshold
    return self.thresholdBlocks(blocked_pairs, recall_weight)
  File "/Users/mo-manguiat/Projects/dedupetest/env/lib/python3.5/site-packages/dedupe/api.py", line 68, in thresholdBlocks
    probability = core.scoreDuplicates(self._blockedPairs(blocks),
  File "/Users/mo-manguiat/Projects/dedupetest/env/lib/python3.5/site-packages/dedupe/api.py", line 248, in _blockedPairs
    block, blocks = core.peek(blocks)
  File "/Users/mo-manguiat/Projects/dedupetest/env/lib/python3.5/site-packages/dedupe/core.py", line 278, in peek
    record = next(records)
  File "/Users/mo-manguiat/Projects/dedupetest/env/lib/python3.5/site-packages/dedupe/api.py", line 281, in _blockData
    for block in viewvalues(blocks):
  File "/Users/mo-manguiat/Projects/dedupetest/env/lib/python3.5/_collections_abc.py", line 693, in __iter__
    for key in self._mapping:
  File "/Users/mo-manguiat/anaconda/lib/python3.5/shelve.py", line 95, in __iter__
    for k in self.dict.keys(): 
SystemError: Negative size passed to PyBytes_FromStringAndSize

I'm using the following in a virtual environment:
Python 3.5.2
dedupe 1.6.10
future 0.16.0
Unidecode 0.4.16
numpy 1.12.1

Mac OS X 10.11.4
Memory: 16 GB 1867 MHz DDR3
Free storage space: 40 GB (might this be the problem?)

Googling the error led me to a few posts on Stack Overflow suggesting storage or memory limits as a possible cause, but no clear solutions yet. Also, the CSV file isn't large, so I'm not sure how to proceed.
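
One way to narrow this down might be to exercise the failing step on its own. The traceback ends in shelve.py while iterating `self.dict.keys()`, so the sketch below leaves dedupe out entirely: it writes a few records to a temporary shelf, reads the keys back, and reports which dbm backend `shelve` picked. The function name `check_shelve_backend`, the file name `blocks_test`, and the record contents are made up for illustration and are not part of dedupe or csv_example.py.

```python
import dbm
import os
import shelve
import tempfile


def check_shelve_backend(n_records=1000):
    """Write n_records to a temporary shelf, iterate its keys, and
    return (backend_name, number_of_keys_read).

    Hypothetical helper for diagnosis only; not part of dedupe.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        # Hypothetical file name; the dbm backend may add its own suffix.
        path = os.path.join(tmp_dir, "blocks_test")

        # Store some small, made-up records.
        with shelve.open(path) as shelf:
            for i in range(n_records):
                shelf[str(i)] = {"id": i, "name": "record %d" % i}

        # Ask the standard library which dbm module created the file.
        backend = dbm.whichdb(path)

        # Iterating the shelf is the step that fails in the traceback.
        with shelve.open(path) as shelf:
            n_keys = sum(1 for _ in shelf)

    return backend, n_keys


if __name__ == "__main__":
    backend, n_keys = check_shelve_backend()
    print("dbm backend:", backend)
    print("keys read back:", n_keys)
```

If this small script also raises the SystemError, the problem is probably in the Python installation's dbm backend rather than in dedupe or the CSV data. If it runs cleanly, the backend name it prints is still worth including in a report.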

I also got the same error when running it in a Python 2.7 virtual environment.

Any help would be appreciated :) Thanks!
