I have a dataframe where one column has strings that sometimes contain a word and parentheses around the value I want to keep. How do I remove them? Here’s what I have:
import pandas as pd
df = pd.read_csv("Espacios_@cronista.csv")
del df['Espacio']
df[df['Tamano'].str.contains("Variable")]
Output I have:
Tamano Subastas Imp Fill_rate
0 Variable (300x600) 43 13 5.99
1 Variable (266x600) 43 5 4.44
2 266x600 43 5 4.44
Output I need:
Tamano Subastas Imp Fill_rate
0 300x600 43 13 5.99
1 266x600 43 5 4.44
2 266x600 43 5 4.44
Solution:
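This is a good use case for pd.Series.str.extract.
pipelined
This returns a modified copy: assign creates a copy and leaves df unchanged. Rows that don't match the pattern come back from extract as NaN, so fillna restores their original values.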
pat = r'Variable\s*\((.*)\)'
df.assign(Tamano=df.Tamano.str.extract(pat, expand=False).fillna(df.Tamano))
Tamano Subastas Imp Fill_rate
0 300x600 43 13 5.99
1 266x600 43 5 4.44
2 266x600 43 5 4.44
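If you want to try the pipelined version without the CSV, here is a minimal self-contained sketch. The frame is rebuilt by hand from the rows shown in the question, and the names extracted and cleaned are only illustrative.

```python
import pandas as pd

# Sample rows copied from the question; the real data comes from a CSV
# that is not available here, so this frame is an assumption.
df = pd.DataFrame({
    "Tamano": ["Variable (300x600)", "Variable (266x600)", "266x600"],
    "Subastas": [43, 43, 43],
    "Imp": [13, 5, 5],
    "Fill_rate": [5.99, 4.44, 4.44],
})

# Raw string so that \s and \( reach the regex engine untouched.
pat = r"Variable\s*\((.*)\)"

# Rows that do not match the pattern come back as NaN ...
extracted = df["Tamano"].str.extract(pat, expand=False)

# ... so fillna puts their original values back. assign returns a new
# frame; df itself is left as it was.
cleaned = df.assign(Tamano=extracted.fillna(df["Tamano"]))

print(cleaned)
```

The intermediate extracted Series is worth printing once: it makes it obvious which rows matched and why the fillna step is needed.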
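in place
This alters df itself. update only copies over non-NaN values, so rows that don't match the pattern keep their original values.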
pat = r'Variable\s*\((.*)\)'
df.update(df.Tamano.str.extract(pat, expand=False))
df
Tamano Subastas Imp Fill_rate
0 300x600 43 13 5.99
1 266x600 43 5 4.44
2 266x600 43 5 4.44
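As a variation on the in-place approach, you can also assign the result straight back to the column instead of calling update. This is a sketch on hand-built sample data taken from the question, not a replacement for the method above; it uses the same extract plus fillna idea.

```python
import pandas as pd

# Sample rows copied from the question (assumed stand-in for the CSV).
df = pd.DataFrame({
    "Tamano": ["Variable (300x600)", "Variable (266x600)", "266x600"],
    "Subastas": [43, 43, 43],
    "Imp": [13, 5, 5],
    "Fill_rate": [5.99, 4.44, 4.44],
})

pat = r"Variable\s*\((.*)\)"

# Overwrite the column on df itself. Non-matching rows are NaN after
# extract, so fillna restores the original string for them.
df["Tamano"] = df["Tamano"].str.extract(pat, expand=False).fillna(df["Tamano"])

print(df)
```

The direct assignment makes the NaN handling explicit, whereas update handles it implicitly by ignoring missing values.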
