A pandas DataFrame lets you add new columns whose values are computed from the values in existing columns.
Example Scenario
Suppose a dataset contains football match results like the following:
Date | Venue | Opponent | GF | GA |
---|---|---|---|---|
08-27-2017 | H | Arsenal FC | 4 | 0 |
09-16-2017 | H | Burnley FC | 1 | 1 |
Here, GF stands for Goals For (the number of goals the team scored) and GA stands for Goals Against (the number of goals the team conceded).
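To follow along without a scores.csv file, the two sample rows can be typed in directly. This is a minimal sketch: building the frame from a dict is just one convenient option, and the variable name `data` simply mirrors the article's code.

```python
import pandas as pd

# Rebuild the sample match results from the table as an in-memory DataFrame.
# Column names match the table; each list holds the values for the two matches.
data = pd.DataFrame(
    {
        "Date": ["08-27-2017", "09-16-2017"],
        "Venue": ["H", "H"],
        "Opponent": ["Arsenal FC", "Burnley FC"],
        "GF": [4, 1],  # goals scored
        "GA": [0, 1],  # goals conceded
    }
)
print(data)
```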
Now we want to add a new column called Result to this dataframe that will contain the result of the match. The conditions for generating the Result column are:

- If GF > GA, Result should contain W (win).
- If GF < GA, Result should contain L (loss).
- Otherwise, Result should contain D (draw).
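The same three rules can also be expressed without a per-row function. The sketch below uses NumPy's `np.select`, a vectorized alternative that the article itself does not use; the third row (a 0-2 loss) is made-up data added only so that every branch is exercised.

```python
import numpy as np
import pandas as pd

# Illustrative data: the first two rows match the sample table, the third is hypothetical.
data = pd.DataFrame({"GF": [4, 1, 0], "GA": [0, 1, 2]})

# Each condition is a boolean Series evaluated over all rows at once.
conditions = [data["GF"] > data["GA"], data["GF"] < data["GA"]]
choices = ["W", "L"]

# np.select takes the first matching choice for each row;
# rows that match no condition get the default "D".
data["Result"] = np.select(conditions, choices, default="D")
print(data)
```

For large frames this avoids calling a Python function once per row, at the cost of spreading the rules across two parallel lists.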
Solution using the DataFrame apply method
We can use the pandas.DataFrame.apply method to add the new Result column.
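```python
import pandas as pd

def get_result(row):
    # Equal scores mean a draw
    if row['GF'] == row['GA']:
        return 'D'
    # More goals scored than conceded means a win
    elif row['GF'] > row['GA']:
        return 'W'
    # Otherwise the team lost
    return 'L'

data = pd.read_csv("scores.csv")
# axis=1 calls get_result once per row; the returned letters form the new column
data['Result'] = data.apply(lambda row: get_result(row), axis=1)
print(data)
```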
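The script reads its input from scores.csv:

```
Date,Venue,Opponent,GF,GA
08-27-2017,H,Arsenal FC,4,0
09-16-2017,H,Burnley FC,1,1
```

Output: the printed dataframe contains the original five columns plus the new Result column, which holds W for the 4-0 match against Arsenal FC and D for the 1-1 match against Burnley FC.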
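A self-contained variant of the solution can be run without creating scores.csv on disk. This sketch assumes it is acceptable to inline the file contents and read them through `io.StringIO`; it also passes `get_result` to `apply` directly, since the lambda in the original code only forwards its argument.

```python
import io

import pandas as pd

def get_result(row):
    # Same rules as the article: draw, win, otherwise loss
    if row["GF"] == row["GA"]:
        return "D"
    elif row["GF"] > row["GA"]:
        return "W"
    return "L"

# The contents of scores.csv, inlined so the example runs without a file.
csv_text = """Date,Venue,Opponent,GF,GA
08-27-2017,H,Arsenal FC,4,0
09-16-2017,H,Burnley FC,1,1
"""

data = pd.read_csv(io.StringIO(csv_text))
# apply accepts the function itself, so no lambda wrapper is needed.
data["Result"] = data.apply(get_result, axis=1)
print(data)
```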
Explanation
- We check each row of the existing dataframe.
- For each row, get_result returns W, L, or D according to the conditions above, and that value is assigned to the newly created Result column.
- axis=1 in the apply method tells pandas to pass each row (rather than each column) to the function, so the logic runs once for every row of the dataframe.
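To see what the axis argument changes, the sketch below applies two small illustrative lambdas (not from the article) to the GF and GA columns: with axis=1 each call receives one row, while with axis=0 (the default) each call receives one whole column.

```python
import pandas as pd

data = pd.DataFrame({"GF": [4, 1], "GA": [0, 1]})

# axis=1: called once per row, so each call can read that row's GF and GA;
# the result is the total number of goals in each match.
goals_per_match = data.apply(lambda row: row["GF"] + row["GA"], axis=1)

# axis=0 (the default): called once per column, so each call sees a whole column;
# the result is the total of GF and of GA across all matches.
goals_per_column = data.apply(lambda col: col.sum(), axis=0)

print(goals_per_match)
print(goals_per_column)
```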