
Pandas Groupby Function: Python for Digital Marketing

POST February 18, 2020

By Sarah Burke

If you’ve been following along with our Python for Digital Marketing posts, you’ve imported data from Google Analytics into a Jupyter Notebook and may have even combined it with another dataset. If you’ve never been here in your entire life but are staring at a large dataset, you’re probably still in the right place. So now that we have all this data, how can we break it down to learn more about specific parts of it? Enter the pandas groupby() function!

With groupby(), you can split up your data based on a column or multiple columns. You can then perform aggregate functions on the subsets of data, such as summing or averaging the data, if you choose.

In this article, we’ll cover:

- Grouping your data
  - By one column
  - By multiple columns
- Viewing data from a group
- Applying aggregate functions

For this example, I’m using in-market audience data paired with age groups from Google Analytics.

As always, I’ll start by importing pandas, reading in my CSV, and checking out the head() and info() on the DataFrame.
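Here's a minimal sketch of that setup. The CSV contents below are made up for illustration, and the Users column is an assumption. With a real export, you would pass the file's path to read_csv() instead of the in-memory text.

```python
import io
import pandas as pd

# Made-up stand-in for the Google Analytics export (not real data).
# With a real file: df = pd.read_csv("path/to/your_export.csv")
csv_text = """Age,In-Market Segment,Users,Sessions,Pages / Session
25-34,Travel/Vacation Packages,120,150,3.2
25-34,Travel/Vacation Packages,100,130,3.0
25-34,Employment,80,95,2.1
45-54,Travel/Vacation Packages,60,70,4.0
45-54,Travel/Vacation Packages,50,60,3.5
45-54,Employment,40,45,2.5
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.head())  # first five rows of the DataFrame
df.info()         # column names, non-null counts, and datatypes
```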

Grouping Your Data

This step basically sets you up to perform the aggregate functions on your data. With your end goal in mind, decide if you need to group based on one column or multiple.

Group By One Column

To group your data by a single column, pass the column name into the groupby() function. In this example, I’ll group by the Age column:

Using first(), I can see the first entry in each of my age groups.
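As a rough sketch, grouping by Age and peeking at each group with first() could look like this. The sample rows and the Users column are invented for illustration.

```python
import pandas as pd

TRAVEL = "Travel/Vacation Packages"
JOBS = "Employment"

# Made-up sample rows, not real Google Analytics numbers.
df = pd.DataFrame({
    "Age": ["25-34", "25-34", "25-34", "45-54", "45-54", "45-54"],
    "In-Market Segment": [TRAVEL, TRAVEL, JOBS, TRAVEL, TRAVEL, JOBS],
    "Users": [120, 100, 80, 60, 50, 40],
    "Sessions": [150, 130, 95, 70, 60, 45],
    "Pages / Session": [3.2, 3.0, 2.1, 4.0, 3.5, 2.5],
})

# Group the rows by the values in the Age column.
by_age = df.groupby("Age")

# first() returns the first entry of every column within each group.
first_rows = by_age.first()
print(first_rows)
```

Note that groupby() on its own doesn't display anything useful; it returns a GroupBy object that waits for a follow-up call such as first() or an aggregate function.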

Group By Multiple Columns

Instead of just grouping my data by Age or In-Market Segment, I may want to create subsets based on both. All I need to do is pass both of these column names into the groupby() function:

This creates a nice-looking output that shows the data from each age group within an in-market segment.
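A sketch of grouping by both columns, again on invented sample rows. The order of the two column names is an assumption; I'm listing In-Market Segment first so that age groups nest inside each segment.

```python
import pandas as pd

TRAVEL = "Travel/Vacation Packages"
JOBS = "Employment"

# Made-up sample rows, not real Google Analytics numbers.
df = pd.DataFrame({
    "Age": ["25-34", "25-34", "25-34", "45-54", "45-54", "45-54"],
    "In-Market Segment": [TRAVEL, TRAVEL, JOBS, TRAVEL, TRAVEL, JOBS],
    "Users": [120, 100, 80, 60, 50, 40],
    "Sessions": [150, 130, 95, 70, 60, 45],
    "Pages / Session": [3.2, 3.0, 2.1, 4.0, 3.5, 2.5],
})

# Pass a list of column names to group by more than one column.
by_segment_and_age = df.groupby(["In-Market Segment", "Age"])

# The result is indexed by segment first, then by age group.
first_rows = by_segment_and_age.first()
print(first_rows)
```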

View Data From a Group

If I want to look at the data from one of my Age groups, I can do so with get_group(). Here I’m looking at all of the values from my 25-34 group:
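Something along these lines should work, assuming the same invented sample rows as a stand-in for the real export:

```python
import pandas as pd

TRAVEL = "Travel/Vacation Packages"
JOBS = "Employment"

# Made-up sample rows, not real Google Analytics numbers.
df = pd.DataFrame({
    "Age": ["25-34", "25-34", "25-34", "45-54", "45-54", "45-54"],
    "In-Market Segment": [TRAVEL, TRAVEL, JOBS, TRAVEL, TRAVEL, JOBS],
    "Users": [120, 100, 80, 60, 50, 40],
    "Sessions": [150, 130, 95, 70, 60, 45],
    "Pages / Session": [3.2, 3.0, 2.1, 4.0, 3.5, 2.5],
})

by_age = df.groupby("Age")

# get_group() returns every row that belongs to the named group.
age_25_34 = by_age.get_group("25-34")
print(age_25_34)
```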

To do this with a DataFrame that was grouped based on multiple columns, pass values for each column into the get_group() function as a tuple:
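A sketch of the tuple form, on invented sample rows. The values in the tuple need to follow the same order as the columns passed to groupby(); the order used here (segment, then age) is an assumption.

```python
import pandas as pd

TRAVEL = "Travel/Vacation Packages"
JOBS = "Employment"

# Made-up sample rows, not real Google Analytics numbers.
df = pd.DataFrame({
    "Age": ["25-34", "25-34", "25-34", "45-54", "45-54", "45-54"],
    "In-Market Segment": [TRAVEL, TRAVEL, JOBS, TRAVEL, TRAVEL, JOBS],
    "Users": [120, 100, 80, 60, 50, 40],
    "Sessions": [150, 130, 95, 70, 60, 45],
    "Pages / Session": [3.2, 3.0, 2.1, 4.0, 3.5, 2.5],
})

by_segment_and_age = df.groupby(["In-Market Segment", "Age"])

# One value per grouping column, in the same order as in groupby().
travel_45_54 = by_segment_and_age.get_group((TRAVEL, "45-54"))
print(travel_45_54)
```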

Apply Aggregate Functions

Okay, on to the good stuff! We’re going to apply some aggregate functions to our newly formed groups.

*Note: If your numerical values contain commas or percent signs, pandas will read them in as objects (strings), and you’ll get an error when trying to perform aggregate functions on them. You’ll need to clean up your data before this step so they are datatypes that you can work with, like integers or floats.
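One possible way to do that cleanup is sketched below. The Bounce Rate column and all values are hypothetical; the idea is just to strip the extra characters and then convert the column's datatype.

```python
import pandas as pd

# Made-up rows where the numbers arrived as text.
raw = pd.DataFrame({
    "Age": ["25-34", "45-54"],
    "Sessions": ["1,250", "980"],        # thousands separator
    "Bounce Rate": ["45.5%", "38.0%"],   # hypothetical percentage column
})

print(raw.dtypes)  # Sessions and Bounce Rate are text, not numbers

clean = raw.copy()

# Remove the commas, then convert the column to integers.
clean["Sessions"] = (
    clean["Sessions"].str.replace(",", "", regex=False).astype(int)
)

# Remove the trailing percent sign, then convert to floats.
clean["Bounce Rate"] = clean["Bounce Rate"].str.rstrip("%").astype(float)

print(clean.dtypes)  # both columns are now numeric
```

If the commas are only thousands separators, passing thousands="," to read_csv() when loading the file is another option.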

Apply an Aggregate Function Across All Columns

An aggregate function can be applied across all columns by simply chaining the function onto the grouped DataFrame:
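For instance, here is a sketch that sums every column per group. The sample rows are invented, and using the two-column grouping here is an assumption.

```python
import pandas as pd

TRAVEL = "Travel/Vacation Packages"
JOBS = "Employment"

# Made-up sample rows, not real Google Analytics numbers.
df = pd.DataFrame({
    "Age": ["25-34", "25-34", "25-34", "45-54", "45-54", "45-54"],
    "In-Market Segment": [TRAVEL, TRAVEL, JOBS, TRAVEL, TRAVEL, JOBS],
    "Users": [120, 100, 80, 60, 50, 40],
    "Sessions": [150, 130, 95, 70, 60, 45],
    "Pages / Session": [3.2, 3.0, 2.1, 4.0, 3.5, 2.5],
})

# Sum every remaining column within each segment/age group.
totals = df.groupby(["In-Market Segment", "Age"]).sum()
print(totals)
```

Both text columns are used as grouping keys here, so only numeric columns are left to sum. If a text column is not part of the grouping, recent pandas versions accept sum(numeric_only=True) to skip it.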

This can be cleaned up a little more by rounding the values to a decimal place that you choose. Here I’m rounding to the nearest hundredth.
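A sketch of the rounding step on invented sample rows. I'm using mean() grouped by Age here, as an assumption, because averages tend to produce the long decimals that make rounding worthwhile.

```python
import pandas as pd

TRAVEL = "Travel/Vacation Packages"
JOBS = "Employment"

# Made-up sample rows, not real Google Analytics numbers.
df = pd.DataFrame({
    "Age": ["25-34", "25-34", "25-34", "45-54", "45-54", "45-54"],
    "In-Market Segment": [TRAVEL, TRAVEL, JOBS, TRAVEL, TRAVEL, JOBS],
    "Users": [120, 100, 80, 60, 50, 40],
    "Sessions": [150, 130, 95, 70, 60, 45],
    "Pages / Session": [3.2, 3.0, 2.1, 4.0, 3.5, 2.5],
})

# numeric_only=True skips the text column (In-Market Segment),
# which cannot be averaged.
averages = df.groupby("Age").mean(numeric_only=True)

# round(2) keeps two decimal places, i.e. the nearest hundredth.
rounded = averages.round(2)
print(rounded)
```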

Other functions you may want to use like mean(), max(), min(), std(), or count() can be applied in the same way.

Apply Aggregate Functions Based on Columns

Sometimes it won’t make sense to apply the same function across all of your columns. If I use sum(), for example, the output for the Pages / Session column won’t mean anything, since that metric is already an average. I have two options in scenarios like this.

Option #1: Apply the function to certain columns.

Here, I’m passing in the columns that I want to sum up, which completely leaves the Pages / Session column out.
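A sketch of Option #1 on invented sample rows. Users is a hypothetical column added so that there is more than one column worth summing.

```python
import pandas as pd

TRAVEL = "Travel/Vacation Packages"
JOBS = "Employment"

# Made-up sample rows, not real Google Analytics numbers.
df = pd.DataFrame({
    "Age": ["25-34", "25-34", "25-34", "45-54", "45-54", "45-54"],
    "In-Market Segment": [TRAVEL, TRAVEL, JOBS, TRAVEL, TRAVEL, JOBS],
    "Users": [120, 100, 80, 60, 50, 40],
    "Sessions": [150, 130, 95, 70, 60, 45],
    "Pages / Session": [3.2, 3.0, 2.1, 4.0, 3.5, 2.5],
})

grouped = df.groupby(["In-Market Segment", "Age"])

# Select only the columns to sum; Pages / Session is left out entirely.
selected_totals = grouped[["Users", "Sessions"]].sum()
print(selected_totals)
```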

Option #2: Specify the function that should be used for each column.

Using a dictionary, we can identify which function should be applied to each column.
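A sketch of Option #2 using agg() with a dictionary, on invented sample rows. The keys are column names and the values are the functions to apply; the Users column is hypothetical.

```python
import pandas as pd

TRAVEL = "Travel/Vacation Packages"
JOBS = "Employment"

# Made-up sample rows, not real Google Analytics numbers.
df = pd.DataFrame({
    "Age": ["25-34", "25-34", "25-34", "45-54", "45-54", "45-54"],
    "In-Market Segment": [TRAVEL, TRAVEL, JOBS, TRAVEL, TRAVEL, JOBS],
    "Users": [120, 100, 80, 60, 50, 40],
    "Sessions": [150, 130, 95, 70, 60, 45],
    "Pages / Session": [3.2, 3.0, 2.1, 4.0, 3.5, 2.5],
})

grouped = df.groupby(["In-Market Segment", "Age"])

# Sum the count-style columns, but average the ratio column.
summary = grouped.agg({
    "Users": "sum",
    "Sessions": "sum",
    "Pages / Session": "mean",
})
print(summary)
```

One caveat: averaging Pages / Session row by row gives every row equal weight, regardless of how many sessions it represents.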

Apply an Aggregate Function To a Single Group

I can also drill down to a single group. If I want to look at the average number of sessions from 45-54 year olds in the Travel/Vacation Packages segment, I can do so with the following:

This gives me the average for each metric in this group.
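A sketch of that drill-down, combining get_group() with mean() on invented sample rows. The grouping order (segment, then age) and the Users column are assumptions.

```python
import pandas as pd

TRAVEL = "Travel/Vacation Packages"
JOBS = "Employment"

# Made-up sample rows, not real Google Analytics numbers.
df = pd.DataFrame({
    "Age": ["25-34", "25-34", "25-34", "45-54", "45-54", "45-54"],
    "In-Market Segment": [TRAVEL, TRAVEL, JOBS, TRAVEL, TRAVEL, JOBS],
    "Users": [120, 100, 80, 60, 50, 40],
    "Sessions": [150, 130, 95, 70, 60, 45],
    "Pages / Session": [3.2, 3.0, 2.1, 4.0, 3.5, 2.5],
})

grouped = df.groupby(["In-Market Segment", "Age"])

# Pull out one group, then average its numeric columns.
# numeric_only=True skips the text columns in the group.
travel_45_54 = grouped.get_group((TRAVEL, "45-54"))
group_averages = travel_45_54.mean(numeric_only=True)
print(group_averages)
```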

As you can see, groupby() is a simple but powerful tool in the pandas toolbox. I hope it helps you extract more meaning out of your datasets and saves you from some manual work. Keep calm and group on!

