Closed
Description
Maybe it's something I missed. I am very new to dask. But here's a reproducible example:
Dataset: dbpedia_csv.tar.gz
https://drive.google.com/folderview?id=0Bz8a_Dbh9Qhbfll6bVpmNUtUcFdjYmF2SEpmZUZUcVNiMUw1TWN6RDV3a0JHT3kxLVhVR2M
import dask.dataframe as dd
data = dd.read_csv('dbpedia_csv/train.csv', header=None)
data.divisions = tuple(range(1, len(data.divisions) + 1))  # assign divisions other than None
data.count().compute()
0 560000
1 560000
2 560000
dtype: int64
which is consistent with
data[1].count().compute()
560000
But when I call data.compute() (column 1 shown here), I get this:
0 E. D. Abbott Ltd
1 Schwan-Stabilo
2 Q-workshop
3 Marvell Software Solutions Israel
4 Bergan Mercy Medical Center
5 The Unsigned Guide
6 Rest of the world
7 Globoforce
8 Rompetrol
9 Wave Accounting
10 Angstrem (company)
11 I-innovate (UK)
12 JVC
13 Toei Bus
14 Tear Drop Records
15 Presses polytechniques et universitaires romandes
16 Websense
17 Adventist Health System
18 CIB Bank
19 Orfanato Music Group
20 SCAN Health Plan
21 Rudolf Kämpf
22 De Wendel family
23 Mansfield Building Society
24 Witcomb Cycles
25 Goldilocks Bakeshop
26 Guardian Assurance Company
27 Shiply
28 Orange Music Electronic Company
29 Rytec Corporation
...
18629 NME
18630 The Londoner
18631 Formal Aspects of Computing
18632 Hayom Yom
18633 Swords Against the Shadowland
18634 Aruvu Rezuru: Kikaijikake no Yōseitachi
18635 Tintin (magazine)
18636 The Tenth Man (Chayefsky play)
18637 Dramatical Murder
18638 A Dictionary of Modern English Usage
18639 Assayad
18640 Loveless (manga)
18641 The American Hebrew
18642 Montanan (magazine)
18643 Black Out (novel)
18644 Gold Digger (comics)
18645 Dealer's Choice (play)
18646 Phuket Gazette
18647 If I Forget Thee Oh Earth
18648 The Lincoln Conspiracy (book)
18649 Endocrine Research
18650 A Thousand Splendid Suns
18651 Western People
18652 Mademoiselle (magazine)
18653 Scoliosis (journal)
18654 Barking in Essex
18655 Science & Spirit
18656 The Blithedale Romance
18657 Razadarit Ayedawbon
18658 The Vinyl Cafe Notebooks
Name: 1, dtype: object
Why does the output end at index 18658 instead of covering all 560000 rows?
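One possible explanation, not confirmed anywhere in this thread: dd.read_csv splits the file into partitions, and by default each partition gets its own index starting at 0. If that is what happened here, the last label shown (18658) would just be the last row of the final partition, not the total row count. The sketch below simulates this with plain pandas. The chunk sizes and row values are hypothetical stand-ins, not the real dataset, and pd.concat is used only as an approximation of how the partitions are combined.

```python
import pandas as pd

# Hypothetical stand-in for three partitions, each with its own 0-based index
# (assumption: this mirrors the default per-partition index from dd.read_csv).
chunk_sizes = [4, 4, 3]
partitions = [
    pd.Series([f"row-{p}-{i}" for i in range(n)], name=1)
    for p, n in enumerate(chunk_sizes)
]

# Concatenating keeps each chunk's labels unchanged, so the labels repeat.
combined = pd.concat(partitions)

total_rows = len(combined)        # number of rows actually present
last_label = combined.index[-1]   # label of the last row in the last chunk
print(total_rows, last_label)

# Renumbering the rows gives a unique 0..N-1 index.
renumbered = combined.reset_index(drop=True)
print(renumbered.index[-1])
```

If this is the cause, len(data.compute()) should still report 560000, and calling reset_index(drop=True) on the computed result would give a continuous index.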