Vectorizing memory issue #6183
Closed
Description
Hi all
I'm working with a pretty large data set and am having an issue with line 758 of text.py (CountVectorizer code):
indptr.append(len(j_indices))
In my case, len(j_indices) is larger than the maximum signed int (2**31 - 1 for a 32-bit int), and indptr is an int array, so the value does not fit.
I tried making indptr a long array, but that leads to other, bigger memory issues.
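For reference, here is a minimal sketch of the overflow outside of scikit-learn. It assumes indptr behaves like a standard-library array.array with typecode "i" (a C signed int); the helper name try_append is made up for illustration and is not part of text.py.

```python
from array import array

# Largest value a C signed int can hold on this platform
# (2**31 - 1 when int is 4 bytes, which is the common case).
INT_MAX = 2 ** (8 * array("i").itemsize - 1) - 1


def try_append(typecode, value):
    """Hypothetical helper: return True if `value` fits in an array of `typecode`."""
    arr = array(typecode)
    try:
        arr.append(value)
    except OverflowError:
        # array.array refuses values outside the range of its C type.
        return False
    return True


# One past the int limit, standing in for a very large len(j_indices).
too_big = INT_MAX + 1

fits_int = try_append("i", too_big)        # signed int array
fits_long_long = try_append("q", too_big)  # signed long long array

# A wider typecode holds the value, but every element costs more bytes.
bytes_per_int = array("i").itemsize
bytes_per_long_long = array("q").itemsize
```

The wider typecode accepts the value, but each element takes more bytes (compare bytes_per_int and bytes_per_long_long), so widening every index array raises memory use accordingly.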
Any thoughts?