How to Clean Data in Excel: Dedupe, Split, Standardize

Step-by-step guide to AI data cleaning in Excel—remove duplicates, split columns, and standardize text using worksheet tools, Power Query, and Copilot. Practical, reproducible for SMBs.

How to Clean Data in Excel: Dedupe, Split, Standardize

Messy spreadsheets slow down decisions: duplicate customer rows inflate counts, “Full Name” fields block analysis, and stray spaces or inconsistent case make lookups fail. Here’s a practical, repeatable playbook for AI data cleaning in Excel—covering the fastest worksheet tools, reproducible Power Query recipes, and where Microsoft 365 Copilot helps you clean up faster without sacrificing review and control.

Key takeaways

  • Use worksheet tools for quick, one‑off fixes; prefer Power Query for anything you’ll refresh or audit later; bring in Copilot for guided, multi‑step cleanup with human review.

  • Lock data types early and protect codes with leading zeros; always validate row counts, distinct keys, and column profiles after cleaning.

  • Standardize case and whitespace before matching, joining, or charting; keep an original copy or a backup sheet for easy rollback.


What to use when: a quick comparison

Method

Best for

Repeatable

Speed

Risks/notes

Remove Duplicates (worksheet)

Fast exact dedupe on a table/range

No (destructive)

Very fast

Keeps first occurrence only; limited options

Text to Columns

Simple splits by comma/space/fixed width

No (one‑time)

Fast

Can overwrite cells; watch leading zeros

Flash Fill

Pattern‑based splits/standardization

No

Fast on small sets

May guess wrong; not ideal for messy data

Power Query

Production‑grade, refreshable pipelines

Yes (M code)

Setup once, refresh later

Slight learning curve; best for scale

Copilot (Clean Data/Agent)

Quick AI pass + natural‑language steps

Semi (requires review)

Fast with review

Always preview changes; document results


Remove duplicates the safe way — AI data cleaning in Excel with built‑ins and Power Query

Worksheet quick path. Place the active cell anywhere in your data (ideally a formatted Table). On the Data tab, choose Remove Duplicates, then tick the columns that together define a duplicate key (for example, Email or FirstName + LastName + ZIP). Excel keeps the first occurrence and removes later duplicates. Microsoft documents the flow in its support article “Filter for unique values or remove duplicate values,” which details options and caveats: see Microsoft Support (accessed 2026) — remove duplicate values in Excel.

Power Query for reproducibility. If you’ll refresh this data again (weekly CRM exports, ecommerce orders), import it with Data > Get Data and open Power Query. Select the key columns and choose Home > Remove Rows > Remove Duplicates. This records a step you can refresh on new files. Microsoft’s Learn guidance covers strategies for duplicates in Power Query (2026): working with duplicates.

Need to keep the first or last by date? Group by your key, sort inside each group, and expand the first/last row. Here’s a representative M snippet you can adapt:

let
  Source = Excel.CurrentWorkbook(){[Name="Table_Customers"]}[Content],
  Typed = Table.TransformColumnTypes(Source, {{"Email", type text}, {"SignedUp", type date}}),
  Sorted = Table.Sort(Typed, {{"Email", Order.Ascending}, {"SignedUp", Order.Descending}}),
  Grouped = Table.Group(Sorted, {"Email"}, {{"All", each _, type table [Email=text, SignedUp=date, ...]}}),
  KeptFirst = Table.TransformColumns(Grouped, {{"All", each Table.FirstN(_, 1)}}),
  Expanded = Table.ExpandTableColumn(KeptFirst, "All", Table.ColumnNames(Typed))
in
  Expanded

AI assist (Copilot). In Copilot chat (Excel for web/eligible desktop builds), try: “Remove duplicate customers using the Email column, keeping the most recent SignedUp date. Show me a preview before applying.” Review the plan and results before you accept.

Validation. After dedupe, compare row counts before/after and check distinct counts of your key to confirm uniqueness. If case‑sensitive dedupe matters, do it in Power Query with an explicit Text.Upper or Text.Lower on the key first, or branch your pipeline for strict comparisons.


Split columns by delimiter or width (and keep your zeros)

Worksheet quick path: Text to Columns. Select the source column, then Data > Text to Columns. Choose Delimited (Comma, Space, Tab, or Other) or Fixed width, set a destination to avoid overwriting, and in Step 3 assign column formats—pick Text for any codes that must preserve leading zeros. Microsoft’s documentation (accessed 2026) walks through the wizard: split text into different columns. If your data has quoted commas inside a field, set the Text qualifier to keep quoted text together.

Power Query split with quotes respected. In Power Query, select the column and choose Split Column > By Delimiter. Pick comma/space or a custom delimiter; set “Split at each occurrence” or “left‑most/right‑most.” To honor quoted text, the engine uses CSV‑aware splitters under the hood. Example M snippet:

let
  Source = Csv.Document(File.Contents("C:Datacustomers.csv"), [Delimiter=",", QuoteStyle=QuoteStyle.Csv]),
  ToTable = Table.FromColumns(Source),
  Promoted = Table.PromoteHeaders(ToTable, [PromoteAllScalars=true]),
  SplitName = Table.SplitColumn(Promoted, "Full Name",
    Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv),
    {"First Name", "Last Name"})
in
  SplitName

Flash Fill for simple patterns. If you’ve got a small, consistent list, type the desired First Name in the adjacent cell and press Ctrl+E (Flash Fill). It’s fast, but verify results—ambiguous names or inconsistent spacing can throw it off. For larger or messier data, prefer Power Query.

Copilot prompt to accelerate. “Split the Full Name column into First Name and Last Name using the last space as the separator. Preserve leading zeros in any ID fields. Show a preview.” Apply only after scanning the preview.


Standardize text case and fix irregular whitespace

Why this matters. Inconsistent case and stray spaces break joins and inflate distinct counts. Clean these before matching emails, product names, or SKUs.

Worksheet functions. Use PROPER, UPPER, or LOWER to standardize case. To normalize spaces, TRIM reduces extra internal spaces to singles and removes leading/trailing spaces; CLEAN strips non‑printable characters. A reliable pattern for non‑breaking spaces (NBSP) is:

=TRIM(SUBSTITUTE(A2,CHAR(160)," "))

Power Query format steps (reproducible). Select text columns, then Transform > Format > Trim and Transform > Format > Clean. Optionally add case formatting via Transform > Format > Capitalize Each Word. Representative M snippet:

let
  Source = Excel.CurrentWorkbook(){[Name="Table_Products"]}[Content],
  Texted = Table.TransformColumnTypes(Source, {{"ProductName", type text}}),
  NBSP = Table.TransformColumns(Texted, {{"ProductName", each Text.Replace(_, "u00A0", " "), type text}}),
  Trimmed = Table.TransformColumns(NBSP, {{"ProductName", Text.Trim, type text}}),
  Cleaned = Table.TransformColumns(Trimmed, {{"ProductName", Text.Clean, type text}}),
  Titled = Table.TransformColumns(Cleaned, {{"ProductName", Text.Proper, type text}})
in
  Titled

AI quick pass with review. Microsoft announced Clean Data with Copilot for Excel on TechCommunity (originally 2024, updated through 2026). It detects common issues—text inconsistencies, numbers stored as text, and extra spaces—and offers suggested fixes that you can preview and apply selectively; see Microsoft TechCommunity (2024–2026) — Announcing Clean Data with Copilot in Excel. Use it as a first sweep, then finish in Power Query for repeatability.

Locale and data types. Set types early in Power Query (for example, mark codes as Text to keep leading zeros). If you handle dates or decimals from different regions, use “Change Type using locale” so parsing matches the source’s culture.


Validate, troubleshoot, and keep an audit trail

Treat cleaning as part of quality assurance. Keep an original sheet or import source files into Power Query and let all changes happen in the query steps. After transformations, check row counts before/after dedupe, confirm distinct keys, and spot‑check edge cases (punctuation, accents, multi‑word names). In Power Query, turn on Column quality/distribution/profile to visualize blanks, uniqueness, and value ranges; Microsoft’s Learn docs explain these profiling tools (accessed 2026): data profiling in Power Query.

When you compare pre‑ vs. post‑clean metrics, percentage math helps surface anomalies. If you need a refresher on exactly which calculation to use, see our short explainer on percent change vs. percent difference in Excel.

Risk controls. For sensitive data, prefer local Power Query steps over AI suggestions, and remove PII columns before any AI‑assisted pass to comply with company policy. Document your query steps (the M script) and add short notes on assumptions (for example, “email lower‑cased before match,” “NBSP normalized to space”).


Practical tips and alternatives

  • If an AI‑assisted step works well, rebuild it as a Power Query step so you can refresh it every week without re‑prompting.

  • Save common transformations (Trim/Clean/Title case) as reusable query functions and apply them to new files.

  • For cross‑file standardization or quick natural‑language prep before building slides, a dedicated agent such as hiData can be used to normalize text, merge spreadsheets, and generate draft charts and decks—then you can still load the results back into Excel and formalize them in Power Query.


FAQ

Q: When should I choose Copilot Agent Mode over Power Query? A: Use Agent Mode when you want a natural‑language, multi‑step plan in the grid with quick previews, and you’re okay documenting the result afterward. Choose Power Query when you need a durable pipeline you’ll refresh and audit regularly.

Q: Is worksheet Remove Duplicates case‑sensitive? A: Microsoft’s support docs don’t call out case sensitivity. If case matters, add an explicit case normalization step (like converting to all lower case) or perform dedupe in Power Query where you can control comparisons.

Q: How do I avoid losing leading zeros when splitting columns? A: In Text to Columns, set the destination and assign Text format to the target column in Step 3. In Power Query, ensure the column’s data type is Text and remove any auto‑applied numeric type changes.

Q: Is Flash Fill reliable for names and addresses? A: It’s fine for small, consistent patterns. For messy or large datasets, it can mis‑infer patterns. Prefer Power Query for scale and repeatability.


References cited inline

  • Microsoft Support (accessed 2026): remove duplicate values in Excel — Remove Duplicates dialog and options.

  • Microsoft Learn (2026): Power Query working with duplicates — grouping/distinct approaches.

  • Microsoft Support (accessed 2026): Convert Text to Columns Wizard — delimited and fixed‑width splits.

  • Microsoft TechCommunity (2024 announcement, updated through 2026): Announcing Clean Data with Copilot in Excel — detection and guided fixes.

  • Microsoft Learn (2026): Power Query data profiling tools — quality/distribution/profile views.

Like (0)

Related Posts