How to dedupe a customer list without fancy tools - Data Research Analysis Collection

TL;DR

You can find and remove duplicate customer records using only a spreadsheet and a handful of formulas. The whole process takes 30 to 90 minutes depending on list size. All you need is Google Sheets or Microsoft Excel and a copy of your customer data.

What You Need Before You Start

Your customer list exported as a CSV or XLSX file (from your CRM, email platform, e-commerce backend, etc.)
Google Sheets (free, browser-based) or Microsoft Excel 2016 or later
A column you can use as a unique identifier, ideally email address, but phone number works too
At least one backup copy of the original file before you touch anything
Optional: a plain text editor like Notepad++ or VS Code if you want to inspect the raw CSV first
Optional: Python 3.10+ with pandas if your list has more than 50,000 rows (covered in the “When To Level Up” section)

If your list came from multiple sources, email address is the most reliable deduplication key because it is unique per person and does not change as often as a name or address. Phone numbers work as a fallback if emails are missing.

Step 1: Back Up Your Raw File

Before you change a single cell, make a copy of the original file and store it somewhere you will not accidentally overwrite.

If you are working in Google Sheets, duplicate the sheet tab and rename it RAW_BACKUP. If you are working in Excel, save a copy with the filename suffix _original before opening the working version.

customer_list_2026_original.xlsx   ← never touch this
customer_list_2026_working.xlsx    ← work here

This takes 30 seconds and has saved countless people from a painful re-export. You should now see two files (or two tabs) and you will only ever edit the working copy from this point forward.

Step 2: Load Your Data Into a Spreadsheet

Open the working file in Google Sheets or Excel. If it is a CSV, go to File > Import (Sheets) or Data > From Text/CSV (Excel) so the columns parse correctly. Do not just double-click the CSV in Excel because it will silently mangle any leading zeros in phone numbers or zip codes.

Check that your data has a proper header row in row 1. Common headers you want to see: email, first_name, last_name, phone, company. If the headers are missing, add them manually before moving on.

Freeze row 1 so it stays visible as you scroll. In Sheets: View > Freeze > 1 row. In Excel: View > Freeze Panes > Freeze Top Row.

You should now see a clean table with column headers locked at the top and your records starting at row 2.

Step 3: Add a Cleaned Key Column

Raw data is messy. The same person might appear as Sarah@Example.com in one row and sarah@example.com in another. A formula that normalizes the key field catches these near-misses.

Insert a new column to the right of your data. Label it clean_key. In the first data row (assuming email is column C), enter this formula:

=LOWER(TRIM(C2))

This strips leading and trailing spaces and converts the email to lowercase. Drag or copy the formula down to every row.

If your deduplication key is a name instead of an email (not ideal but sometimes all you have), use a combined key:

=LOWER(TRIM(A2))&"|"&LOWER(TRIM(B2))

That concatenates first name and last name with a pipe separator so John Smith and JOHN SMITH both become john|smith.

You should now see a new column of normalized values. Scan a few rows to confirm spaces disappeared and the casing is consistent.

Step 4: Sort the Entire Sheet by Your Clean Key

Sorting groups identical values together, which makes duplicates visually obvious and makes the COUNTIF formula in the next step more reliable.

Select all your data including headers. In Sheets: Data > Sort range > check “Data has header row” > sort by clean_key A to Z. In Excel: Home > Sort & Filter > Custom Sort > sort by clean_key ascending.

Do not sort only one column. Always select all data first so the row relationships stay intact.

You should now see your rows reordered so any rows with the same clean_key are physically next to each other. If you spot a cluster of three or four identical emails side by side, that is a sign your data has a significant dupe problem.

Step 5: Add a COUNTIF Column to Flag Duplicates

Insert another new column. Label it dupe_flag. In the first data row (assuming clean_key is column F), enter:

=COUNTIF($F$2:$F$10000,F2)

Replace $F$10000 with a row number that goes past your last row. This formula counts how many times each clean_key value appears in the entire column. A value of 1 means that record is unique. A value of 2 or higher means it appears multiple times.

Copy the formula down to all rows.

For an even faster visual scan, add a second helper column labeled is_dupe with this formula:

=IF(COUNTIF($F$2:$F$10000,F2)>1,"DUPE","OK")

This stamps each row with either DUPE or OK so you can filter visually.

You should now see a column of numbers (or labels) indicating which rows have duplicates. Rows marked DUPE need a decision made about them.

Step 6: Filter to Show Only Duplicate Rows

Turn on filters. In Sheets: Data > Create a filter. In Excel: Home > Sort & Filter > Filter.

Click the dropdown on the is_dupe column and select only DUPE. Your sheet now shows only the rows that have at least one duplicate elsewhere in the list.

Scan through what you see. Look for patterns:

Same email, slightly different names (typo or nickname)
Same person imported twice from two different sources
Test records that never should have been in the list

This review step is important because not every “duplicate” is a simple delete. Sometimes two records have different data you want to merge, like one has a phone number and the other has a company name.

You should now see a filtered view showing only the flagged rows, and you have a clear picture of how big the dupe problem actually is.

Step 7: Decide Keep or Merge for Each Duplicate Group

For each group of duplicates, you need one record to survive. The decision rules are:

If both records are identical: keep one, delete the other.
If they differ in minor fields: keep the more complete record.
If they differ in important fields (purchase history, tags, segment): manually merge the data into one row before deleting the other.

Color-code the rows you plan to delete using a fill color. Do not delete anything yet. This visual pass lets you catch mistakes before you commit.

A quick way to identify which record in a duplicate group is the “most complete” one is to add a completeness score column:

=COUNTA(A2:H2)

This counts how many cells in that row are non-empty. Higher score means more data. Keep the record with the highest score.

You should now have a color-coded sheet where rows marked for deletion are visually distinct from the rows you plan to keep.

Step 8: Delete the Duplicate Rows

With your filter still active showing only DUPE rows, select the rows you marked for deletion. Right-click > Delete rows (Sheets) or Home > Delete > Delete Sheet Rows (Excel).

Then clear the filter to see your full list again.

Do a final COUNTIF check. If your deduplication worked, no value in the dupe_flag column should be higher than 1. Sort the column descending to bring any remaining duplicates to the top.

=MAX(F2:F10000)

Paste this formula in an empty cell. If it returns 1, you are done. If it returns 2 or more, there are still duplicates to resolve.

You should now see a clean list where every clean_key value appears exactly once.

Step 9: Remove Your Helper Columns and Export

Delete the clean_key, dupe_flag, and is_dupe columns before you import this list back into your CRM or email tool. Those helper columns will confuse the import wizard and bulk up your file unnecessarily.

Then export: File > Download > CSV (Sheets) or File > Save As > CSV (Excel).

Name the output file something unambiguous:

customer_list_2026_deduped_2026-05-16.csv

Including the date in the filename prevents the classic overwrite-the-wrong-version mistake.

You should now have a clean CSV ready to upload to your email platform, CRM, or wherever this list lives.

Common Mistakes To Avoid

Sorting only one column. If you sort just the email column without selecting all data, you rearrange that column while every other column stays in place. Your data will be scrambled and unrecoverable without your backup.
Using name as the sole deduplication key. Names are not unique. There are 47,000 people named Michael Smith in the US alone. Always use email as the primary key.
Forgetting to TRIM. A trailing space after sarah@example.com makes it look different from the version without the space. COUNTIF will miss it if you skip the TRIM step.
Deleting without reviewing. Automated deletion of “all duplicates” in bulk will delete records with valuable unique data attached to them. Always review the duplicate pairs before removing anything.
Deduplicating across formats without standardizing. Phone numbers stored as +1 555-123-4567 and 5551234567 will not match. Normalize phone numbers by stripping all non-numeric characters first.
Working on the original file. You will make a mistake at some point. That is fine as long as you have the backup.

When To Level Up

This spreadsheet approach works well up to about 10,000 to 20,000 rows. Beyond that, COUNTIF formulas slow to a crawl, the manual review becomes impractical, and you start running into situations where duplicates span three or four records with conflicting data across all of them.

That is when you want a proper tool. Python with the pandas library handles 500,000 rows in seconds using df.drop_duplicates(subset=['email']). For fuzzy matching, where you need to catch John Smith and Jon Smith as the same person, the recordlinkage library or a dedicated tool like OpenRefine (free, desktop app) does the heavy lifting.

If you are managing ongoing deduplication as records flow in continuously, you are moving into CRM territory. Tools built for that problem are covered at /category/data-skills/.

For a deeper look at handling messy real-world data at scale, check out the guide to cleaning a CRM export before analysis and the comparison of free data cleaning tools for small businesses on this site.

Frequently Asked Questions

Do I need a paid tool to deduplicate a customer list?
No. Google Sheets is free and handles lists up to about 10 million cells. For most solopreneurs with lists under 20,000 contacts, a spreadsheet with COUNTIF formulas is all you need.

What if my duplicates have slightly different email addresses, like a typo?
COUNTIF with LOWER and TRIM catches case and spacing differences, but it will not catch sarah@gmial.com vs sarah@gmail.com. For fuzzy matching you need a tool like OpenRefine or Python’s fuzzywuzzy library, which is when this spreadsheet approach reaches its limit.

How do I decide which duplicate record to keep?
Keep the more complete record. Use a COUNTA formula to score each row by how many fields have data, then keep the highest-scoring version. If both records have unique data you need, merge the fields manually into one row before deleting the other.

Can I automate this process so it runs every time I get new contacts?
Yes, but that requires a script or a workflow tool. A Python script using pandas or a Zapier/Make automation that deduplicates before writing to your CRM handles this. The manual spreadsheet method described here is a one-time or periodic cleanup, not a real-time pipeline.

How often should I deduplicate my customer list?
Before any major campaign send, before importing into a new tool, and any time you merge two lists from different sources. For actively growing lists, a monthly pass is a reasonable habit.

Bottom Line

Deduplicating a customer list without any paid software comes down to four moves: normalize your key field with TRIM and LOWER, count how many times each value appears with COUNTIF, filter to see only the duplicates, and delete or merge with intention. The whole workflow takes under an hour for most lists and costs nothing beyond the time you spend on it. Get the backup habit right first, use email as your key whenever possible, and do not skip the manual review step before you delete anything. When your list grows past the point where spreadsheets start lagging, that is the right time to explore the tools covered in /category/data-skills/.