Pandas lets you add a new column to a dataframe whose values are computed from the existing columns.

Example Scenario

Suppose a dataset contains football match results like the following:

Date        Venue  Opponent    GF  GA
08-27-2017  H      Arsenal FC  4   0
09-16-2017  H      Burnley FC  1   1

Here, GF stands for Goals For and GA stands for Goals Against (i.e., the number of goals conceded by the team). We want to add a new column called Result to this dataframe that contains the result of each match.

The rules for generating the Result column are:

  • If GF > GA, Result should contain W
  • If GF < GA, Result should contain L
  • Otherwise, Result should contain D

Solution using the DataFrame apply method

We can use the pandas.DataFrame.apply method to add the new Result column.

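import pandas as pd


def get_result(row):
    """Return 'W', 'L' or 'D' for one match row based on GF and GA."""
    if row['GF'] == row['GA']:
        return 'D'
    elif row['GF'] > row['GA']:
        return 'W'
    return 'L'


# Load the match results; the contents of scores.csv are listed below.
data = pd.read_csv("scores.csv")
# axis=1 passes each row to get_result, so no lambda wrapper is needed.
data['Result'] = data.apply(get_result, axis=1)
print(data)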

Output:

[Image: output of the pandas apply method on rows]

scores.csv:

Date,Venue,Opponent,GF,GA
08-27-2017,H,Arsenal FC,4,0
09-16-2017,H,Burnley FC,1,1
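
To try the example without creating scores.csv on disk, one option is to feed the same text to read_csv through an in-memory buffer. The sketch below assumes that approach; the CSV_TEXT name is only illustrative and is not part of the original post.

```python
import io

import pandas as pd

# Same two rows as scores.csv, held in memory (CSV_TEXT is an illustrative name).
CSV_TEXT = """Date,Venue,Opponent,GF,GA
08-27-2017,H,Arsenal FC,4,0
09-16-2017,H,Burnley FC,1,1
"""


def get_result(row):
    """Return 'W', 'L' or 'D' for one match row based on GF and GA."""
    if row["GF"] == row["GA"]:
        return "D"
    elif row["GF"] > row["GA"]:
        return "W"
    return "L"


# read_csv accepts any file-like object, so a StringIO buffer stands in for the file.
data = pd.read_csv(io.StringIO(CSV_TEXT))
data["Result"] = data.apply(get_result, axis=1)
print(data[["Opponent", "GF", "GA", "Result"]])
```

With these two rows, the Arsenal FC match (4 goals for, 0 against) is labelled W and the Burnley FC match (1 each) is labelled D.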

Explanation

The apply method visits each row of the existing dataframe and passes it to the get_result function, whose return value becomes that row's entry in the new Result column. Passing axis=1 tells apply to call the function once per row; with the default, axis=0, it would be called once per column instead.
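
The difference between the two axis values is easier to see on a small frame. The following is a minimal sketch using only the GF and GA values from the example; the names frame, per_row and per_col are illustrative.

```python
import pandas as pd

# Only the two numeric columns from the example dataset.
frame = pd.DataFrame({"GF": [4, 1], "GA": [0, 1]})

# axis=1: the function receives one row at a time, indexed by column name.
# Here it computes the goal difference of each match.
per_row = frame.apply(lambda row: row["GF"] - row["GA"], axis=1)

# axis=0 (the default): the function receives one column at a time.
# Here it totals each column across all matches.
per_col = frame.apply(lambda col: col.sum(), axis=0)

print(per_row)
print(per_col)
```

per_row has one value per match, while per_col has one value per column, which is why a row-wise rule such as get_result needs axis=1.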

Citation

Shovon, A. R. (2022, January 5). Adding a new column in Pandas dataframe based on existing columns. Ahmedur Rahman Shovon. Retrieved December 3, 2024, from https://arshovon.com/blog/pandas-dataframe-add-calculated-column/
