Closed
Description
(Comes from #1978 (comment))
What happened:
$ ipython
Python 3.8.3 (default, May 20 2020, 12:50:54)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.15.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: # coding: utf-8
...: from dask import dataframe as dd
...: import pandas as pd
...: from distributed import Client
...: client = Client()
...: df = dd.read_csv("../data/yellow_tripdata_2019-*.csv", parse_dates=["tpep_pickup_datetime", "tpep_dropoff_datetime"])
   ...: payment_types = {
   ...:     1: "Credit Card",
   ...:     2: "Cash",
   ...:     3: "No Charge",
   ...:     4: "Dispute",
   ...:     5: "Unknown",
   ...:     6: "Voided trip"
   ...: }
   ...: payment_names = pd.Series(
   ...:     payment_types, name="payment_name"
   ...: ).to_frame()
   ...: df2 = df.merge(
   ...:     payment_names, left_on="payment_type", right_index=True
   ...: )
   ...: op = df2.groupby("payment_name")["tip_amount"].mean()
   ...: client.compute(op)
   ...:
Out[1]: <Future: pending, key: finalize-85edcc1f23785545f628c932abd19768>
In [2]: distributed.worker - WARNING - Compute Failed
Function: _apply_chunk
args: ( VendorID tpep_pickup_datetime tpep_dropoff_datetime passenger_count trip_distance RatecodeID store_and_fwd_flag ... mta_tax tip_amount tolls_amount improvement_surcharge total_amount congestion_surcharge payment_name
0 1 2019-01-04 14:08:46 2019-01-04 14:18:10 1 1.70 1 N ... 0.5 0.0 0.00 0.3 9.30 NaN Cash
1 1 2019-01-04 14:20:33 2019-01-04 14:25:10 1 0.90 1 N ... 0.5 0.0 0.00 0.3 6.30 NaN Cash
13 2 2019-01-04 14:14:45 2019-01-04 14:26:00 5 1.63 1 N ... 0.5 0.0 0.00 0.3 9.80 NaN Cash
15 2 2019-01-04 14:49:45 2019-01-04 15:0
kwargs: {'chunk': <methodcaller: sum>, 'columns': 'tip_amount'}
Exception: ValueError('buffer source array is read-only')
In [2]:
In [2]: client
Out[2]: <Client: 'tcp://127.0.0.1:33689' processes=4 threads=4, memory=16.70 GB>
In [3]: _1
Out[3]: <Future: error, key: finalize-85edcc1f23785545f628c932abd19768>
What you expected to happen: The operation finishes without error.
Minimal Complete Verifiable Example:
# coding: utf-8
from dask import dataframe as dd
import pandas as pd
from distributed import Client
client = Client()
df = dd.read_csv("../data/yellow_tripdata_2019-*.csv", parse_dates=["tpep_pickup_datetime", "tpep_dropoff_datetime"])
payment_types = {
    1: "Credit Card",
    2: "Cash",
    3: "No Charge",
    4: "Dispute",
    5: "Unknown",
    6: "Voided trip"
}
payment_names = pd.Series(
    payment_types, name="payment_name"
).to_frame()
df2 = df.merge(
    payment_names, left_on="payment_type", right_index=True
)
op = df2.groupby("payment_name")["tip_amount"].mean()
client.compute(op)
Data:
https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2019-01.csv
https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2019-02.csv
https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2019-03.csv
https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2019-04.csv
Anything else we need to know?: I managed to avoid this error by reducing the number of files, but the error then reappeared at a later point in the workflow. I suspect this behavior depends on the available RAM.
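For context, the exception text matches what NumPy/Cython raise when code attempts a write (or a writable memoryview) on a read-only buffer; distributed can deserialize incoming frames into such read-only memory without copying. A minimal NumPy-only sketch of that buffer state (the `setflags` call simulates zero-copy deserialization; this is an assumption about the root cause, not a confirmed diagnosis):

```python
import numpy as np

arr = np.arange(5.0)
arr.setflags(write=False)  # simulate a buffer received without a copy

try:
    arr[0] = 1.0  # any write to the read-only buffer is rejected
except ValueError as exc:
    print(exc)

writable = arr.copy()  # an explicit copy yields writable memory again
writable[0] = 1.0
```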
Environment:
- Dask version: 2.18.1
- Python version: 3.8.3
- Operating System: Linux Mint 19.3
- Install method (conda, pip, source): pip