Remove Duplicates
The Remove Duplicates function identifies and removes repeated records from your dataset. You can choose one or more columns to check for duplicates.
This helps ensure that your data is clean, accurate, and free of redundancy before moving to the next step in your workflow.
How to Remove Duplicates?
Step 1: Add the “Remove Duplicates” step from the transformation panel and connect it to the source that contains the duplicate records.
Step 2: From the drop-down menu, select the columns that should be used to identify duplicate records.
Step 3: Click “Save.”
For example:
- In the “Key Columns” field, select “Order_id,” “Customer_id,” “Address_id,” and “Product_id.” Records that share the same values in all four columns are treated as duplicates.
- Click “Save.”
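
The logic behind the example above can be sketched in a few lines of Python. This is only an illustration of key-column deduplication, not the tool's actual implementation. The function name remove_duplicates, the sample rows, and the “Quantity” column are hypothetical, and the sketch assumes the first record seen for each key combination is the one that is kept.

```python
def remove_duplicates(rows, key_columns):
    """Return rows with duplicates removed, comparing only the key columns.

    Assumption: the first row seen for each key combination is kept.
    """
    seen = set()
    unique_rows = []
    for row in rows:
        # Build a hashable key from the values in the selected columns.
        key = tuple(row[column] for column in key_columns)
        if key not in seen:
            seen.add(key)
            unique_rows.append(row)
    return unique_rows


# Hypothetical sample data: the first two rows match on all four key columns.
orders = [
    {"Order_id": 1, "Customer_id": 10, "Address_id": 100, "Product_id": 7, "Quantity": 2},
    {"Order_id": 1, "Customer_id": 10, "Address_id": 100, "Product_id": 7, "Quantity": 5},
    {"Order_id": 2, "Customer_id": 10, "Address_id": 100, "Product_id": 7, "Quantity": 1},
]

key_columns = ["Order_id", "Customer_id", "Address_id", "Product_id"]
deduplicated = remove_duplicates(orders, key_columns)
print(len(deduplicated))  # prints 2
```

In this sketch the second row is dropped even though its “Quantity” differs, because only the selected key columns are compared. Selecting fewer key columns makes the check broader, so more records count as duplicates.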

Preview
