equality comparison with a scalar is slow for category (performance regression) #23814

@colinfang

Are the following two ways of comparing a Series to a scalar equivalent (ignoring missing values)? Right now I have to write the comparison the hard way, against the category codes, to take advantage of the categorical representation.

```python
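import pandas as pd  # run in IPython/Jupyter; %timeit is an IPython magic

x = pd.Series(list('abcd') * 1000000).astype('category')
# Direct comparison against the scalar
%timeit x == 'a'
# 10 loops, best of 3: 25.2 ms per loop
# Comparison of the integer codes against the location of 'a' in the categories
%timeit x.cat.codes == x.cat.categories.get_loc('a')
# 1000 loops, best of 3: 750 µs per loop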
```
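For anyone who wants to check the equivalence on their own data, here is a minimal sketch of the codes-based comparison wrapped in a function. The helper name `categorical_eq` is hypothetical (it is not a pandas API), and the sketch assumes that a scalar absent from the categories should compare unequal everywhere and that missing values carry the code `-1`.

```python
import numpy as np
import pandas as pd


def categorical_eq(series: pd.Series, value) -> pd.Series:
    """Hypothetical helper: compare a categorical Series to a scalar via its codes."""
    categories = series.cat.categories
    if value not in categories:
        # The scalar is not a category, so no element can equal it.
        return pd.Series(np.zeros(len(series), dtype=bool), index=series.index)
    # Missing values have code -1, which never matches a valid location.
    return series.cat.codes == categories.get_loc(value)


# Small categorical Series with one missing value at the end.
x = pd.Series(list("abcd") * 1000 + [None]).astype("category")

fast = categorical_eq(x, "a")  # codes-based comparison
slow = x == "a"                # direct comparison

# Both approaches should agree element by element, including the missing value.
assert fast.equals(slow)
```

The membership check is there because `get_loc` raises `KeyError` when the scalar is not one of the categories, whereas the direct `==` comparison returns a boolean result in that case.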

Labels: Categorical (Categorical Data Type), Performance (Memory or execution speed performance)
