How to build a cohort analysis in Google Sheets

TL;DR

You can build a fully functional monthly cohort retention table in Google Sheets using COUNTIFS, conditional formatting, and a clean export from your subscription system. The whole setup takes two to three hours if your data is already in reasonable shape. You need a Google account (free), your user signup and activity data in CSV format, and basic spreadsheet skills.

What You Need Before You Start

  • A Google account with access to Google Sheets (free tier works fine)
  • A CSV export of your user data with at least two columns: a unique user ID and a signup date
  • A second export showing user activity or payment events, with user ID and event date columns
  • At least three months of data for the patterns to be meaningful
  • Optional: Stripe, ChargeBee, or your billing system’s built-in export to pull subscription events cleanly
  • Optional: Mixpanel or Amplitude to cross-check your numbers once you have a baseline
  • Basic familiarity with absolute cell references (the $ sign in $A$1)

Step 1: Export Your Raw Data and Create Staging Tabs

Open a new Google Sheets file and name it something like Cohort Analysis - ProductName - May 2026. Create two tabs at the bottom: raw_users and raw_events.

In raw_users, paste or import your signup data. You need at minimum three columns:

user_id signup_date plan
u_1001 2025-01-14 starter
u_1002 2025-01-22 pro

In raw_events, paste your activity or payment events. Each row is one event:

user_id event_date event_type
u_1001 2025-02-03 payment
u_1001 2025-03-01 payment

Keep dates in ISO format (YYYY-MM-DD). Google Sheets handles these cleanly and COUNTIFS treats them as real dates rather than text strings, which matters a lot for the formulas ahead.

You should now see two clean flat tables with no merged cells, no summary rows, and date columns that appear left-aligned in the cell (which confirms they are stored as dates, not text).


Step 2: Create a Cohort Assignment Column

In raw_users, add a column C with the header cohort_month. In C2, enter this formula and drag it down the entire column:

=TEXT(B2,"YYYY-MM")

This converts 2025-01-14 into 2025-01, which becomes your cohort label. Using TEXT keeps it as a plain string so it sorts correctly when used as a row identifier in your matrix.

If your billing cycle matters more than calendar month, you can use EOMONTH to snap to the last day of the signup month:

=EOMONTH(B2,0)

Format that column as YYYY-MM via Format > Number > Custom date and time. For most SaaS retention analyses, the TEXT approach is simpler and produces fewer surprises.

You should now see a new column with clean monthly labels like 2025-01, 2025-02, and so on for every user row.


Step 3: Add an Event Month Column to raw_events

Go to your raw_events tab. Add a column C called event_month using the same TEXT approach:

=TEXT(B2,"YYYY-MM")

Drag it down to cover all event rows. This helper column is what makes your COUNTIFS formulas readable later. Without it, you end up nesting DATE and DATEVALUE calls inside COUNTIFS that are difficult to debug and easy to break.

While you are here, add a quick filter to check your event_type column. If you have a mix of payment, login, and support_ticket events in the same table, decide now which event type signals genuine retention. Payment is the most defensible for SaaS because it ties directly to revenue. Filter to that type and note the value string exactly as it appears.

You should now see a three-column events table with a clean YYYY-MM label on every row.


Step 4: Build the Cohort Matrix Skeleton

Create a third tab called cohort_matrix. This is where your retention table lives.

Set up row 1 as your period offset headers. These represent “months since signup”:

  • A1: Cohort
  • B1: M0
  • C1: M1
  • D1: M2
  • Continue to M11 if you have 12 months of history

In column A starting at A2, list your cohort months from oldest to newest:

2025-01
2025-02
2025-03
...

Stop at the most recent complete month. Leave out the current month if it is still in progress. A cohort that started this month has no M1 data yet and including it distorts the heatmap scale.

You should now see an empty grid with cohort months running down rows and period offsets running across columns.


Step 5: Count New Users Per Cohort (M0)

In B2, enter a COUNTIF to count how many users belong to the 2025-01 cohort:

=COUNTIF(raw_users!$C:$C, $A2)

This counts every row in raw_users where cohort_month matches the label in A2. Drag B2 down column B to fill all cohort rows.

Column B is your M0 column and it is the denominator for every retention percentage you will calculate. Get this right before moving on.

Quick sanity check: sum all M0 values with =SUM(B2:B20) and compare the total to the row count in raw_users minus the header row. They must match exactly. If they do not, you have a data gap, a duplicate header row in your raw data, or a cohort_month formula that returned an error on some rows.

You should now see the signup count per cohort filling column B, with the numbers adding up to your total user count.


Step 6: Count Retained Users for Each Period Offset

In C2 (M1 for the 2025-01 cohort), count users from that cohort who had a payment event in month 2025-02. Because you already have the event_month helper column, the formula is straightforward:

=COUNTIFS(
  raw_users!$C:$C, $A2,
  raw_events!$C:$C, TEXT(DATE(YEAR(DATEVALUE($A2&"-01")), MONTH(DATEVALUE($A2&"-01"))+1, 1), "YYYY-MM"),
  raw_events!$B:$B, raw_users!$A:$A
)

The middle argument calculates the target event month by adding the column offset to the cohort month. For M1 (offset 1), it adds 1. For M2 (offset 2), it adds 2. The offset is driven by the column position.

A more maintainable approach: put the numeric offset in a hidden row 2, then reference it in the formula. For column C, row 2 would hold 1. For column D it holds 2, and so on. Then the formula references C$2 instead of a hardcoded number.

Note that this COUNTIFS counts rows, not unique users. If one user had two payments in M1, they count twice. For a quick MRR-adjacent view that is acceptable. For strict user retention, add a helper column in raw_events that concatenates user_id and event_month, deduplicate it, and COUNTIFS against that column instead.

Drag the formula across all columns and down all rows. The bottom-right portion of the table will return zeros or errors for future periods that have not happened yet.

You should now see a triangular grid of numbers, dense in the top-left and empty in the bottom-right, which is exactly what a cohort matrix should look like.


Step 7: Calculate Retention Percentages

Create a fourth tab called cohort_pct. Copy the same row and column header structure from cohort_matrix.

In B2, reference the M0 count directly:

=cohort_matrix!B2

In C2, calculate the M1 retention rate:

=IFERROR(cohort_matrix!C2/cohort_matrix!$B2, "")

The $B2 locks the column to M0 regardless of how far right you copy the formula. IFERROR replaces division-by-zero and blank cells with an empty string rather than #DIV/0!.

Copy this formula across all columns and down all rows. Then select the entire data range (excluding the cohort labels in column A). Go to Format > Number > Custom number format and enter 0.0%.

You should now see a table of percentages where M0 shows either 100% or the raw count, and every subsequent column shows the fraction of that cohort that came back.


Step 8: Apply Heatmap Conditional Formatting

Select all percentage cells in cohort_pct (the data range, not the headers or M0 column).

Go to Format > Conditional formatting. At the bottom of the panel, click “Color scale.” Set:

  • Minpoint: 0%, color: light red or white
  • Midpoint: 20% to 30% depending on your typical retention, color: yellow
  • Maxpoint: 80% to 100%, color: dark green

Click Done.

Your matrix is now a heatmap. Cohorts where users churn fast will be mostly red by M2. Cohorts from a month when you shipped a meaningful feature improvement will stay green deep into M5 or M6. That visual shift across rows is often the fastest way to pinpoint when a product change actually moved the needle on retention.

You should now see a color-coded grid where healthy cohorts glow green and leaky ones show red or orange by M2 or M3.


Step 9: Add a Retention Curve Line Chart and Refresh Workflow

In cohort_pct, hold Cmd (Mac) or Ctrl (Windows) and select the header row plus three or four representative cohort rows. Go to Insert > Chart.

Google Sheets usually auto-suggests a line chart. If not, change the chart type to Line chart. Set:

  • X-axis: the M0, M1, M2… period labels
  • Series: each cohort row, labeled by cohort month

You now have a retention curve chart where each line is one cohort. Steep early drops followed by a flat tail are normal. A rising or flattening tail across newer cohorts signals genuine improvement.

For ongoing refresh, append new rows to raw_users and raw_events on the first of each month. All COUNTIFS formulas recalculate automatically. If you want to automate the import from a source sheet, go to Extensions > Apps Script and use this pattern:

function importLatestEvents() {
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var src = SpreadsheetApp.openById("YOUR_SOURCE_SHEET_ID");
  var data = src.getSheetByName("export").getDataRange().getValues();
  var dest = ss.getSheetByName("raw_events");
  dest.getRange(1, 1, data.length, data[0].length).setValues(data);
}

Set a monthly time-driven trigger on this function under the Apps Script Triggers menu.

You should now see a chart ready for a board update and a refresh process that takes minutes rather than an hour of manual work.


Common Mistakes To Avoid

  • Mixing event types without filtering. If you include logins and payments in the same raw_events table, COUNTIFS counts logins as retention. Filter to one meaningful event type before building the matrix.
  • Including the current incomplete month in cohort rows. A cohort that started this month has no M1 data yet. It makes M0 look artificially small and throws off the heatmap scale for all other rows.
  • Text-formatted dates that look like dates. If your CSV dates land as left-aligned text, COUNTIFS date comparisons silently return zero. Convert with DATEVALUE or use Data > Split text to columns to force proper date recognition.
  • Forgetting the $B lock in percentage formulas. Without the dollar sign locking the M0 column, the denominator shifts as you copy right. Every retention percentage will be wrong in a way that looks plausible.
  • Counting rows instead of unique users. One user with two payments in M1 counts twice. Add a deduplication step using a helper column that concatenates user_id and event_month if you need strict headcount retention.
  • Skipping plan segmentation. A blended cohort table hides the fact that free trial users churn at 80% while annual plan users retain at 90%. Duplicate the matrix filtered by plan and compare them side by side.

When To Level Up

Google Sheets handles around 10,000 users and 18 months of data without much trouble. Past that point, COUNTIFS across full columns starts lagging noticeably on each recalculation. When the sheet takes more than 30 seconds to update after a paste, it is time to move on.

Purpose-built tools like Chartmogul handle cohort analysis natively with daily refresh, MRR segmentation, and drill-down by plan, geography, or pricing tier. Mixpanel and Amplitude let you define “retention” as any event, not just payment, which opens up behavioral cohort analysis that a spreadsheet cannot replicate cleanly.

If you want to stay close to SQL and spreadsheets but need more speed, push your raw data into BigQuery and connect Google Sheets via Connected Sheets (available on Workspace Business Standard and above). The cohort logic stays identical, the query engine handles the heavy lifting, and your formulas still look familiar.

Browse the full list of options at /category/data-analysis/ or read the dedicated comparison at /cohort-analysis-tools-comparison/ for a side-by-side view of what each platform does better than a spreadsheet.


Frequently Asked Questions

What is a cohort in SaaS retention?
A cohort is a group of users who share a common starting point, usually the month they signed up or converted to paid. Tracking cohorts separately shows you whether retention is improving over time, rather than blending all users into a single average that hides the trend direction.

How many months of data do I need before cohort analysis is useful?
Three months is the bare minimum to see M0, M1, and M2 patterns. Six to twelve months gives you enough to identify whether you have a leaky-bucket problem (high early churn) or a slow bleed over time. Cohorts with fewer than 20 users per month produce percentages that swing too wildly to be actionable.

Should I use monthly or weekly cohorts?
Use monthly cohorts when your billing cycle is monthly and your product is used several times per week. Use weekly cohorts for high-frequency products (daily active use, like a productivity or communication tool) or when you are in early-stage growth and need faster feedback. Weekly cohorts with monthly billing create a mismatch between your retention signal and your revenue signal.

Can I do this in Excel instead of Google Sheets?
Yes. COUNTIFS works identically in Excel. The main difference is the conditional formatting dialog, but Excel supports three-color scale rules under Home > Conditional Formatting > Color Scales that achieve the same heatmap effect. For a full breakdown of the tradeoffs, see /google-sheets-vs-excel-data-analysis/.

How do I handle users who signed up but never activated?
Include them in M0 because they are part of the real cohort and the real churn problem. If you want to separate activation from retention, build a second matrix using only activated users as the denominator. Comparing the two matrices tells you whether churn starts at activation (never used the product) or post-activation (used it and left). That split is one of the most actionable signals you can surface from a cohort analysis.


Bottom Line

Building a cohort analysis in Google Sheets takes an afternoon and a clean data export. You set up two raw data tabs, assign cohort labels with a TEXT formula, count users per cohort with COUNTIFS, convert those counts to percentages, and apply a color scale heatmap to make patterns visible at a glance. The line chart turns the grid into something you can paste into a board update without translation.

This setup works well for early-stage SaaS founders who need directional retention insight without paying for a dedicated analytics platform. The concepts transfer directly to SQL-based tools when your data outgrows the sheet. The cohort logic is the same whether you are writing COUNTIFS or a GROUP BY query.

For the next step in building out your analytics stack, browse /category/data-analysis/ where we compare the platforms that take over once spreadsheets hit their ceiling.