Remove duplicates

The Remove Duplicates step identifies and removes repeated records from your dataset. You choose one or more key columns, and records that have identical values in all of the selected columns are treated as duplicates.

This helps ensure that your data is clean, accurate, and free of redundancy before moving to the next step in your workflow.
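The way the choice of key columns changes what counts as a duplicate can be sketched in plain Python. This is a minimal illustration, not the tool's implementation. It assumes that rows are dictionaries, that a row is a duplicate when its values in every key column match an earlier row, and that the first occurrence is kept (this page does not say which copy the step retains). The function name `remove_duplicates` and the sample data are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Sequence, Set, Tuple

Row = Dict[str, Any]


def remove_duplicates(rows: Iterable[Row], key_columns: Sequence[str]) -> List[Row]:
    """Hypothetical sketch: drop rows whose key-column values were already seen.

    Keeps the first row for each distinct combination of key-column values
    and preserves the original row order.
    """
    seen: Set[Tuple[Any, ...]] = set()
    result: List[Row] = []
    for row in rows:
        # Build a hashable key from the selected columns only;
        # all other columns are ignored when comparing rows.
        key = tuple(row[col] for col in key_columns)
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


rows = [
    {"Customer_id": 1, "City": "London"},
    {"Customer_id": 1, "City": "Paris"},
    {"Customer_id": 2, "City": "London"},
]

# One key column: the two rows with Customer_id 1 count as duplicates.
by_customer = remove_duplicates(rows, ["Customer_id"])
# Two key columns: every (Customer_id, City) pair is distinct, so nothing is removed.
by_customer_and_city = remove_duplicates(rows, ["Customer_id", "City"])

print(len(by_customer))           # 2
print(len(by_customer_and_city))  # 3
```

Selecting fewer key columns makes the check broader, so more rows are treated as duplicates; adding columns makes it stricter.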

How to Remove Duplicates

Step 1: Import the “Remove Duplicates” step from the transformation panel and connect it to the source whose duplicate records you want to remove.

Step 2: In the “Key Columns” drop-down menu, select the column or columns used to identify duplicate records.

Step 3: Click “Save.”

For example:

  • In the “Key Columns” field, select “Order_id,” “Customer_id,” “Address_id,” and “Product_id.” Records that repeat the same combination of values in these four columns are removed as duplicates.
  • Click “Save.”
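The four-column example above can be mimicked on a few made-up order rows. As a sketch under stated assumptions: rows are dictionaries, the first row for each (Order_id, Customer_id, Address_id, Product_id) combination is kept, and the `Quantity` column, the helper `dedupe_on_keys`, and all values are hypothetical, added only for illustration.

```python
from typing import Any, Dict, List, Sequence, Set, Tuple

# Key columns taken from the example above.
KEY_COLUMNS = ["Order_id", "Customer_id", "Address_id", "Product_id"]

# Hypothetical sample rows; "Quantity" is an illustrative non-key column.
orders: List[Dict[str, Any]] = [
    {"Order_id": 101, "Customer_id": 7, "Address_id": 3, "Product_id": 55, "Quantity": 2},
    # Same values in all four key columns as the first row, so it is treated
    # as a duplicate even though Quantity differs.
    {"Order_id": 101, "Customer_id": 7, "Address_id": 3, "Product_id": 55, "Quantity": 5},
    # Different Product_id, so this row is a distinct record and is kept.
    {"Order_id": 101, "Customer_id": 7, "Address_id": 3, "Product_id": 56, "Quantity": 1},
]


def dedupe_on_keys(rows: List[Dict[str, Any]], key_columns: Sequence[str]) -> List[Dict[str, Any]]:
    """Hypothetical helper: keep the first row for each key-column combination."""
    seen: Set[Tuple[Any, ...]] = set()
    kept: List[Dict[str, Any]] = []
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        if key in seen:
            continue  # a row with this key combination was already kept
        seen.add(key)
        kept.append(row)
    return kept


cleaned = dedupe_on_keys(orders, KEY_COLUMNS)
for row in cleaned:
    print(row["Product_id"], row["Quantity"])
# 55 2
# 56 1
```

Note that the second row is dropped even though its `Quantity` differs: only the key columns are compared. Which copy the actual step keeps when non-key columns differ is not stated on this page, so it is worth checking the step's output before relying on that behavior.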
