
How to Select Column Values to Display in Pandas Groupby

Last Updated : 11 Jul, 2024

Pandas is a powerful Python library used extensively in data analysis and manipulation. One of its most versatile and widely used functions is groupby, which allows users to group data based on specific criteria and perform various operations on these groups. This article will delve into the details of how to select column values to display in pandas groupby, providing practical examples and technical explanations.

Understanding GroupBy in Pandas

The groupby() function in Pandas splits a DataFrame into groups based on some criteria. It is usually followed by an aggregation function that performs an operation on each group.

Types of GroupBy Operations:

  • Single Column Grouping: Grouping data based on a single column.
  • Multiple Column Grouping: Grouping data based on multiple columns.

GroupBy Syntax:

Calling groupby on a DataFrame returns a GroupBy object; an operation chained onto it is then applied to each group.

Syntax:

df.groupby('column_name').operation()

Where,

  • df: The DataFrame to be grouped.
  • 'column_name': The column or columns to group by.
  • operation(): The operation to be applied to each group, such as mean(), sum(), count(), etc.

When you aggregate a grouped DataFrame, the group keys become the index of the result and the aggregation is applied to the remaining columns. In many cases, you might want to display values from specific columns only. This can be achieved by selecting those columns, either on the grouped object before aggregating or on the aggregated result.
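One common way to display values from a specific column is to select that column on the grouped object before aggregating. The sketch below uses a small made-up DataFrame (the data and variable names are assumptions for illustration) and shows that single brackets give a Series while double brackets give a DataFrame.

```python
import pandas as pd

# Hypothetical sample data, made up for this illustration
df = pd.DataFrame({
    'Animal': ['Cheetah', 'Cheetah', 'Lion', 'Lion'],
    'Max Speed': [100, 95, 80, 85],
    'Color': ['Yellow', 'Yellow', 'Tan', 'Tan'],
})

# Single brackets select one column; the result is a Series
# indexed by the group key
max_speed = df.groupby('Animal')['Max Speed'].max()
print(max_speed)

# Double brackets select a list of columns; the result is a DataFrame
max_speed_df = df.groupby('Animal')[['Max Speed']].max()
print(max_speed_df)
```

Selecting the column first also means the 'Color' column never takes part in the aggregation.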

Grouping Data with GroupBy

Grouping can be done on a single column or on several columns at once, so that each subset of the data can be analyzed independently.

Example 1: Grouping by a Single Column

Python
import pandas as pd

# Sample data
data = {
    'Animal': ['Cheetah', 'Cheetah', 'Lion', 'Lion', 'Tiger', 'Tiger'],
    'Max Speed': [100, 95, 80, 85, 65, 70]
}

df = pd.DataFrame(data)

# Group by 'Animal' and calculate mean speed
mean_speed = df.groupby('Animal').mean()
print(mean_speed)

Output:

         Max Speed
Animal            
Cheetah       97.5
Lion          82.5
Tiger         67.5

Example 2: Grouping by Multiple Columns

We can also group by multiple columns for more detailed analysis. Here the data is grouped by both the 'Animal' and 'Color' columns, and the sum of 'Max Speed' is calculated for each group.

Python
import pandas as pd

# Sample data with multiple columns
data = {
    'Animal': ['Cheetah', 'Cheetah', 'Lion', 'Lion', 'Tiger', 'Tiger'],
    'Max Speed': [100, 95, 80, 85, 65, 70],
    'Color': ['Yellow', 'Yellow', 'Tan', 'Tan', 'Orange', 'Orange']
}

df = pd.DataFrame(data)

# Group by 'Animal' and 'Color', and calculate the sum
grouped = df.groupby(['Animal', 'Color']).sum()
print(grouped)

Output:

                Max Speed
Animal  Color            
Cheetah Yellow        195
Lion    Tan           165
Tiger   Orange        135
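When the DataFrame has more columns than you want to see, the column can be selected on the grouped object before calling sum. The following is a minimal sketch of that idea; the 'Weight' column and its values are hypothetical and were added only to show that unselected columns are left out of the result.

```python
import pandas as pd

df = pd.DataFrame({
    'Animal': ['Cheetah', 'Cheetah', 'Lion', 'Lion', 'Tiger', 'Tiger'],
    'Color': ['Yellow', 'Yellow', 'Tan', 'Tan', 'Orange', 'Orange'],
    'Max Speed': [100, 95, 80, 85, 65, 70],
    'Weight': [60, 55, 190, 180, 220, 210],  # hypothetical extra column
})

# Group by two keys, but aggregate and display only 'Max Speed'
speed_only = df.groupby(['Animal', 'Color'])['Max Speed'].sum()
print(speed_only)

# A single group's value can be looked up with a tuple of the two keys
print(speed_only.loc[('Cheetah', 'Yellow')])
```

The result is a Series with a two-level index (Animal, Color), and 'Weight' does not appear in it.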

Selecting Column Values to Display in Pandas GroupBy

After performing a groupby operation and aggregating the data, we can select specific columns to display by indexing the result with double square brackets, which take a list of column names and return a DataFrame.

Example 1: Selecting Columns After GroupBy Using Double Brackets

In this example, we group by Category and display the sum of Value1 and the mean of Value2 for each category.

Python
import pandas as pd

# Sample data
data = {
    'Category': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Value1': [10, 20, 30, 40, 50, 60],
    'Value2': [100, 200, 300, 400, 500, 600]
}

df = pd.DataFrame(data)

# Group by 'Category'
grouped = df.groupby('Category')

# Aggregate the data
aggregated = grouped.agg({'Value1': 'sum', 'Value2': 'mean'})

# Select specific columns to display
selected_columns = aggregated[['Value1', 'Value2']]

print(selected_columns)

Output:

          Value1  Value2
Category                
A             30   150.0
B             70   350.0
C            110   550.0
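An alternative worth knowing is named aggregation, where agg receives keyword arguments of the form new_name=(column, function). It selects the columns, applies the functions, and names the output columns in one step. This is a sketch under assumptions: the output names value1_total and value2_mean and the extra 'Value3' column are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Value1': [10, 20, 30, 40, 50, 60],
    'Value2': [100, 200, 300, 400, 500, 600],
    'Value3': [1, 2, 3, 4, 5, 6],  # hypothetical column we do not want to display
})

# Named aggregation: each keyword is an output column name,
# each tuple is (input column, aggregation function)
summary = df.groupby('Category').agg(
    value1_total=('Value1', 'sum'),
    value2_mean=('Value2', 'mean'),
)
print(summary)
```

Only the columns named in the tuples appear in the result, so 'Value3' is dropped without a separate selection step.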

Example 2: Selecting Columns from a GroupBy Object

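After aggregation, the grouping columns form the index of the result. To turn them back into regular columns, so that they can be displayed and selected like any other column, you can use the reset_index method: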

Python
import pandas as pd
df = pd.DataFrame({
    'a': [1, 1, 3],
    'b': [4.0, 5.5, 6.0],
    'c': [7, 8, 9],
    'name': ['hello', 'hello', 'foo']
})

# Group by columns a and name
gb = df.groupby(['a', 'name'])

# Calculate the median
median_result = gb.median().reset_index()

print(median_result)

Output:

   a   name     b    c
0  1  hello  4.75  7.5
1  3    foo  6.00  9.0
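A similar result can be obtained by passing as_index=False to groupby, which keeps the group keys as ordinary columns. The sketch below combines it with a column selection so that only column b is aggregated; the variable name medians is an assumption for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 1, 3],
    'b': [4.0, 5.5, 6.0],
    'c': [7, 8, 9],
    'name': ['hello', 'hello', 'foo']
})

# as_index=False keeps 'a' and 'name' as regular columns;
# [['b']] restricts the aggregation to column 'b'
medians = df.groupby(['a', 'name'], as_index=False)[['b']].median()
print(medians)
```

Here the result contains the columns a, name, and b, and column c is left out.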

Example 3: Iterating Over Groups

To iterate over the groups and access the corresponding sub-DataFrames, you can use a loop:

Python
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar'] * 3,
    'B': [1, 2, 3, 4, 5, 6],
    'C': [7, 8, 9, 10, 11, 12]
})

# Group by column A
gb = df.groupby('A')

# Iterate over the groups
for name, group in gb:
    print(f"Group: {name}")
    print(group)
    print()

Output:

Group: bar
     A  B   C
1  bar  2   8
3  bar  4  10
5  bar  6  12

Group: foo
     A  B   C
0  foo  1   7
2  foo  3   9
4  foo  5  11
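Because each group is an ordinary DataFrame, column selection works inside the loop as well, and get_group retrieves a single group by its key. A minimal sketch, with the variable names foo_rows and b_values chosen only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar'] * 3,
    'B': [1, 2, 3, 4, 5, 6],
    'C': [7, 8, 9, 10, 11, 12]
})

gb = df.groupby('A')

# Fetch one group by key, then keep only the columns to display
foo_rows = gb.get_group('foo')[['B', 'C']]
print(foo_rows)

# Collect the values of column 'B' for every group while iterating
b_values = {name: group['B'].tolist() for name, group in gb}
print(b_values)
```

The rows in foo_rows keep their original index labels, which makes it easy to trace them back to the source DataFrame.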

Conclusion

The groupby function in Pandas is a versatile tool for data analysis. It allows you to group data based on one or more columns and perform various operations on these groups. By aggregating the groups, selecting the columns you want to display, resetting the index, and iterating over the groups, you can present exactly the values you need from your data.


