Imagine, you've received 10000 records of Name and Email address in a spreadsheet from online survey of your new product. There is likelihood of getting duplicate records. One should remove such redundant records to get accurate results. Lets see the possible ways to remove duplicates either from Google Docs' Spreadsheet or from Microsoft Excel's Spreadsheet.
To be on safe side I highly recommend to make a copy of your dataset before you experiment any of the following features.
1) Use "Remove Duplicates" feature - New in Excel 2007
This is the simplest way to remove the duplicate records if you are using Excel 2007 or later. Following are the steps.
- Select the range of cells from which you want to find the duplicates.
- Click Remove Duplicates button under the Data ribbon. It will open up the Remove Dialog box.
- The Remove Duplicates dialog will give you a list of columns. Choose the columns which should be considered for uniqueness. For example, here i've selected the Email column only so that I can keep all the names with unique email ids only. Keep the My data has headers checkbox selected if first row of spreadsheet used as headers.
- Click OK. In this case, Excel will delete all the duplicate records with the duplicate email ids.
2) Find duplicate rows in Excel/Google Docs spreadsheet
This approach will work on both Microsoft Excel (in any version) as well as Google Docs spreadsheet. In this approach, we will find the duplicate rows using IF condition. Follow the below steps to find the duplicate rows.
- Select and Sort the column in Ascending order from which you want to remove duplicate records. In our case, we will select and sort the Email column in ascending order.
Upon clicking the Sort button you will be prompted to expand the selection. Simply select Expand the selection and click Sort. It will sort the Email column as well as all the relative columns in ascending order.Here is the result of sorted Email column in ascending fashion.
- Now we will create a function that will compare two adjacent cells of Email IDs. If the email ids are same, one of them is a duplicate record. Enter the following function in Column C3:
=IF(B3=B2, "Duplicate", "")Drag this formula until the last non-empty row.
- That's it. All the duplicate email ids will have "Duplicate" value in the adjacent cell in column C. Here is the resulted output.
- Now, if you want to remove those rows having value Duplicate in column C, then select the column C and apply the filter from the Data ribbon as below.
- It will only display the rows with the value Duplicate in column C. Select them, delete them. Simple.
Do you know any better or other way to find and remove duplicates from spreadsheet. If so, then please share the knowledge in Comment section. Best answers would be added to this post with credits.
Remember
Remember
"In today's environment, hoarding knowledge ultimately erodes your power. If you know something very important, the way to get power is by actually sharing it."
- Joseph Badaracco