I’m just getting up to speed on Pandas and cannot resolve one issue. I have a list of Counties in NY State. If the County is one of the 5 boroughs, I want to change the county name to New York, otherwise I leave it alone. The following gives the idea, but is not correct.
EDIT – so if the counties in the County column of the first few rows were Albany, Allegheny, Bronx before the change, they would be Albany, Allegheny, New York after the change
# clean up county names
# 5 boroughs must be combined to New York City
# eliminate the word county
nyCounties = ["Kings", "Queens", "Bronx", "Richmond", "New York"]
nypopdf['County'] = ['New York' for nypopdf['County'] in nyCounties else
nypopdf['County']]
Solution:
A small mockup:
In [44]: c = ['c', 'g']
In [45]: df = pd.DataFrame({'county': list('abccdefggh')})
In [46]: df['county'] = df['county'].where(~df['county'].isin(c), 'N')
In [47]: df
Out[47]: county
0 a
1 b
2 N
3 N
4 d
5 e
6 f
7 N
8 N
9 h
So this is using pd.Series.where ~df['county'].isin(c) selects rows that are not in the list c (the ~ at the start is the ‘not’ operation), the second argument is the value to replace with (when the condition is False).
To fit your example:
nypopdf['County'] = nypopdf['County'].where(~nypopdf['County'].isin(nyCounties), 'New York')
or
nypopdf['County'].where(~nypopdf['County'].isin(nyCounties), 'New York', inplace=True)
Complete example:
nypopdf = pd.DataFrame({'County': ['Albany', 'Allegheny', 'Bronx']})
nyCounties = ["Kings", "Queens", "Bronx", "Richmond", "New York"]
print(nypopdf)
County
0 Albany
1 Allegheny
2 Bronx
nypopdf['County'].where(~nypopdf['County'].isin(nyCounties), 'New York', inplace=True)
print(nypopdf)
County
0 Albany
1 Allegheny
2 New York
