Dimension inconsistent?  #897

@terrytangyuan

Maybe I missed something, since I am very new to dask, but here is a reproducible example:

Dataset: dbpedia_csv.tar.gz
https://drive.google.com/folderview?id=0Bz8a_Dbh9Qhbfll6bVpmNUtUcFdjYmF2SEpmZUZUcVNiMUw1TWN6RDV3a0JHT3kxLVhVR2M

import dask.dataframe as dd

data = dd.read_csv('dbpedia_csv/train.csv', header=None)
data.divisions = tuple(range(1, len(data.divisions) + 1))  # assign divisions other than None
data.count().compute()
0    560000
1    560000
2    560000
dtype: int64

which is consistent with

data[1].count().compute()

560000

But when I call data.compute(), I get this:

0                                         E. D. Abbott Ltd
1                                           Schwan-Stabilo
2                                               Q-workshop
3                        Marvell Software Solutions Israel
4                              Bergan Mercy Medical Center
5                                       The Unsigned Guide
6                                        Rest of the world
7                                               Globoforce
8                                                Rompetrol
9                                          Wave Accounting
10                                      Angstrem (company)
11                                         I-innovate (UK)
12                                                     JVC
13                                                Toei Bus
14                                       Tear Drop Records
15       Presses polytechniques et universitaires romandes
16                                                Websense
17                                 Adventist Health System
18                                                CIB Bank
19                                    Orfanato Music Group
20                                        SCAN Health Plan
21                                            Rudolf Kämpf
22                                        De Wendel family
23                              Mansfield Building Society
24                                          Witcomb Cycles
25                                     Goldilocks Bakeshop
26                              Guardian Assurance Company
27                                                  Shiply
28                         Orange Music Electronic Company
29                                       Rytec Corporation
                               ...
18629                                                  NME
18630                                         The Londoner
18631                          Formal Aspects of Computing
18632                                            Hayom Yom
18633                        Swords Against the Shadowland
18634              Aruvu Rezuru: Kikaijikake no Yōseitachi
18635                                    Tintin (magazine)
18636                       The Tenth Man (Chayefsky play)
18637                                    Dramatical Murder
18638                 A Dictionary of Modern English Usage
18639                                              Assayad
18640                                     Loveless (manga)
18641                                  The American Hebrew
18642                                  Montanan (magazine)
18643                                    Black Out (novel)
18644                                 Gold Digger (comics)
18645                               Dealer's Choice (play)
18646                                       Phuket Gazette
18647                            If I Forget Thee Oh Earth
18648                        The Lincoln Conspiracy (book)
18649                                   Endocrine Research
18650                             A Thousand Splendid Suns
18651                                       Western People
18652                              Mademoiselle (magazine)
18653                                  Scoliosis (journal)
18654                                     Barking in Essex
18655                                     Science & Spirit
18656                               The Blithedale Romance
18657                                  Razadarit Ayedawbon
18658                             The Vinyl Cafe Notebooks
Name: 1, dtype: object

Why does the output end at 18658 rather than 560000?
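
One thing that might be worth checking, offered as a sketch and not a confirmed diagnosis: the number printed on the last row is an index label, not a row count. If each partition keeps its own 0-based index (an assumption I have not verified for this dataset), the combined result would restart its labels in every partition. The pandas-only example below uses two hypothetical frames, part_a and part_b, as stand-ins for partitions to show that the last label and the length can differ.

```python
import pandas as pd

# Hypothetical stand-ins for two partitions of a larger table.
# Each one carries its own 0-based index, as a freshly parsed CSV chunk would.
part_a = pd.DataFrame({1: ["a", "b", "c"]})
part_b = pd.DataFrame({1: ["d", "e"]})

# Concatenating the pieces keeps each piece's original index labels.
combined = pd.concat([part_a, part_b])

row_count = len(combined)        # number of rows actually present
last_label = combined.index[-1]  # label printed next to the last row

print(row_count, last_label)
```

If the same pattern applies here, comparing len(data.compute()) with the last index label would show whether any rows are actually missing.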
