8 Ways to Find & Remove Duplicate Lines in Text Files
A plain text file is one of the most common ways to store typed data on a computer. It’s not only very simple to create a text file but also incredibly easy to edit one because it can be done using something as simple as Windows Notepad. A text file is also a universal format meaning it’s readable on multiple platforms including Windows, Mac, Linux, phones, tablets, and pretty much everything in between.
If you have text files that you need to read through or have merged several text files into one to make things easier, it makes sense to remove any lines of text that are duplicates of one another. You could manually remove all the duplicates yourself but it’s much easier to let the process be handled automatically.
Here we show you a number of different ways in which you can get duplicate lines removed from your text files.
Use A Third Party Utility To Delete Duplicate Lines
A simple and common way to remove duplicate lines from a text file is to use a dedicated third party utility.
1. TextCrawler Free
TextCrawler is a very powerful freeware program that is built mainly for the task of searching and replacing data in text files. Despite its power, TextCrawler is relatively easy to use and the remove duplicate lines option is actually found in a separate window, called the Scratchpad. Install (or extract the installer with 7-Zip to make it portable) and launch TextCrawler. Go to the Tools menu > Scratchpad or press F2.
Paste the text into the window, check that Remove Duplicate Lines is selected in the drop down (it should be by default), and press the Do button. Press Save to create a new text file, or copy and paste the text back into a text editor. The drop down also has options to remove white space and sort the text in ascending or descending order. Matching is case sensitive, so lines must be identical in both case and content to be removed.
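For readers who would rather script the job, here is a minimal Python sketch of the same idea: an exact, case sensitive comparison that keeps the first occurrence of each line and preserves the original order. It is not how TextCrawler works internally, and the function name `remove_duplicate_lines` is our own choice for illustration.

```python
def remove_duplicate_lines(text: str) -> str:
    """Drop repeated lines, keeping the first occurrence of each."""
    seen = set()
    kept = []
    for line in text.splitlines():
        if line not in seen:  # exact match: both case and content
            seen.add(line)
            kept.append(line)
    return "\n".join(kept)


sample = "apple\nbanana\napple\nApple\nbanana"
print(remove_duplicate_lines(sample))
# apple
# banana
# Apple
```

Because the comparison is exact, "Apple" survives alongside "apple", which matches the case sensitive behavior described above.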
2. Duplicate Lines Remover
Duplicate Lines Remover is from security company NoVirusThanks and has some useful features. For some reason, information about the program has been removed from their website but thankfully the official download link is still available. Only a setup installer is available but you can make the program portable with the help of Universal Extractor.
To remove duplicate lines, browse for or drop a file onto the window, click Check and if duplicate lines are found, click Fix and choose a save name. You can also batch process a number of files at once, including in subfolders, using the File Scan tab.
Be aware though that batch processing automatically creates .bak backups and replaces the original files with the processed versions. Add a file extension to the mask box to limit which files are processed, otherwise every file will be included. A dedicated command line version is also available in the package for scripts and more advanced users.
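The batch behavior described above (back up, replace the original, limit by file mask) can be approximated in a few lines of Python. This is only a sketch under our own assumptions: the function names, the UTF-8 encoding, and the `name.ext.bak` backup naming are hypothetical and are not taken from Duplicate Lines Remover itself.

```python
import shutil
from pathlib import Path


def dedupe_file(path: Path) -> int:
    """Rewrite `path` without duplicate lines; return how many were removed."""
    text = path.read_text(encoding="utf-8")  # assumption: files are UTF-8
    lines = text.splitlines()
    unique = list(dict.fromkeys(lines))  # first occurrences, original order
    removed = len(lines) - len(unique)
    if removed:
        # Assumed backup naming: "notes.txt" -> "notes.txt.bak"
        shutil.copy2(path, path.with_name(path.name + ".bak"))
        ending = "\n" if text.endswith("\n") else ""
        path.write_text("\n".join(unique) + ending, encoding="utf-8")
    return removed


def dedupe_folder(folder: str, mask: str = "*.txt") -> dict:
    """Process every file under `folder`, subfolders included, matching `mask`."""
    return {str(p): dedupe_file(p) for p in Path(folder).rglob(mask) if p.is_file()}
```

Calling `dedupe_folder("C:/logs", "*.txt")` would only touch .txt files, in the same way the mask box narrows what gets processed. Files with no duplicates are left alone and get no backup.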
Download Duplicate Lines Remover
3. TextMechanic Offline
The Offline portion of the name is to differentiate this tool from the online TextMechanic service as the two are not related. This tool is a portable executable hosted at SourceForge. In addition to removing duplicate and empty lines, TextMechanic Offline can also delete lines containing specific text or find and replace text. Windows 10 users will need to have .NET Framework 3.5 installed.
Paste the text to be processed into the TextMechanic window before pressing Remove Duplicate Lines. Then also press Remove Empty Lines, otherwise the text will contain empty lines where duplicates have been removed. Finally, press Save to Clipboard so the corrected text can be pasted back into your text editor. This tool is case sensitive so lines need to be identical in both case and content if they are to be removed.
4. RemoveDup (modified by Raymond.cc)
This dedicated utility is open source and portable. All you have to do is run it and browse for a text file, then press Proceed. A stripped file will be output to the specified location with “_NoDuplicates” appended to the filename. The process is case sensitive so a line of “raymondcc” would not be a duplicate of “Raymondcc”. As RemoveDup is open source, we have made a few updates and improvements to the original program.
It now uses .NET Framework 4 which means Windows 10 users won’t need .NET 3.5 to be separately installed. The input box now has drag and drop support so you can drop a text file onto it without manually browsing. Lastly, we have added a checkbox to make the process optionally ignore case, so “raymondcc” would be a duplicate of “Raymondcc”.
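To show what an optional ignore case checkbox amounts to, here is a short Python sketch. It is not the RemoveDup source (that is included in the archive); the function name and the UTF-8 encoding are our assumptions, and only the “_NoDuplicates” suffix is borrowed from the description above.

```python
from pathlib import Path


def strip_duplicates(path: str, ignore_case: bool = False) -> Path:
    """Write a copy of `path` without duplicate lines and return its location."""
    source = Path(path)
    seen = set()
    kept = []
    for line in source.read_text(encoding="utf-8").splitlines():
        # With ignore_case, lines are compared by their case-folded form
        key = line.casefold() if ignore_case else line
        if key not in seen:
            seen.add(key)
            kept.append(line)  # the first spelling encountered is the one kept
    target = source.with_name(f"{source.stem}_NoDuplicates{source.suffix}")
    target.write_text("".join(item + "\n" for item in kept), encoding="utf-8")
    return target
```

With `ignore_case=False`, “raymondcc” and “Raymondcc” are both kept. With `ignore_case=True`, only whichever appears first in the file survives.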
Please direct any issues/feedback about the modified version to the comments section in this article. The modified source is included in the archive.
Download RemoveDup Modified Version | Download RemoveDup Original
5. Notepad++
Notepad++ is a feature rich and popular text editor that did not originally have an integrated feature to remove duplicate lines; an extra plugin called TextFX was required. In more recent versions, however, the function to remove duplicate lines from a text file is built directly into the program.
All you have to do is click on the Edit menu and go to Line Operations. There are two options: Remove Duplicate Lines and Remove Consecutive Duplicate Lines. The first is self explanatory and simply removes all exact duplicate lines from the text. The second only removes a line if the line directly after it is identical; duplicates that are separated by other lines are left alone.
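The difference between the two menu options is easiest to see on a small example. The Python sketch below imitates both behaviors using the standard library; the function names are ours and this is not how Notepad++ implements them.

```python
from itertools import groupby


def remove_all_duplicates(lines):
    """Keep the first occurrence of every line, wherever the repeats are."""
    return list(dict.fromkeys(lines))


def remove_consecutive_duplicates(lines):
    """Collapse only runs of identical lines that sit next to each other."""
    return [line for line, _run in groupby(lines)]


lines = ["a", "a", "b", "a", "b", "b"]
print(remove_all_duplicates(lines))          # ['a', 'b']
print(remove_consecutive_duplicates(lines))  # ['a', 'b', 'a', 'b']
```

In the second result, the later "a" and "b" stay because a different line sits between them and their earlier copies.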
The Notepad++ duplicate lines menu functions are case sensitive. You can also run a simple delete operation that ignores case using search and replace (Ctrl+H). Paste “(?si)^(.+?\R)(?=(?:.+\R)?\1)” without quotes into the “Find what:” box, make sure “Replace with:” is empty, choose the Regular expression search mode and press “Replace All”. The last occurrence of each duplicate line in the file is retained.
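If you want the same outcome outside Notepad++ (ignore case, keep the last occurrence of each line), the following Python sketch approximates it. It does not use the regular expression above, because Python’s `re` module does not support `\R`; instead it walks the lines in reverse. The function name is hypothetical.

```python
def keep_last_occurrence(text: str, ignore_case: bool = True) -> str:
    """Remove duplicate lines, retaining the last occurrence of each."""
    seen = set()
    kept = []
    # Walking backwards means the first copy we meet is the last in the file
    for line in reversed(text.splitlines()):
        key = line.casefold() if ignore_case else line
        if key not in seen:
            seen.add(key)
            kept.append(line)
    kept.reverse()  # restore the original top-to-bottom order
    return "\n".join(kept)


sample = "Alpha\nbeta\nalpha\nBETA\ngamma"
print(keep_last_occurrence(sample))
# alpha
# BETA
# gamma
```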
PSPad is another text/code editor that has a remove duplicate lines function built in. It has two options to work as either case sensitive or insensitive.
Delete Duplicate Lines Using Built-in Windows Commands
Using a mixture of built in Windows commands, it is possible to remove duplicate lines from within text files.
6. Using a Batch Script
Putting those commands together into a single batch script allows for quick and easy processing. This could also be useful if your computer has any restrictions running third party software. The script we are using was found at StackOverflow and modified to make it easier to use.
Download the DeDupe Zip file and extract it. One BAT file is case sensitive and only removes lines that match in case as well as content; the other script is case insensitive. To delete duplicate lines, drop the text file onto the .BAT file and it will automatically process and output the file in the same folder with a _deduped suffix. The lines are not reordered but removed in place, which is ideal if you want to keep the same line order as the original.
Delete Duplicate Lines Using An Online Service
There are many online services that do the job of removing duplicate lines from text. We’ve chosen a couple which should cover your needs.
7. PineTools Remove Duplicate Lines Online
PineTools is a website we keep in our favorites because it boasts a huge number of online tools for various tasks. They range from color pickers, date/time tools, and (random) number generators to image editors, programming syntax highlighters, and mathematical calculators. There are almost 20 text and list tools, of which remove duplicate lines is one.
Simply paste the text into the original text box, press REMOVE and the filtered text will appear in the without duplicate lines box. Press “Select all” and copy it back into your text editor. There are options to ignore case, convert to upper/lower case, ignore/remove empty lines, and sort the output into alphabetical order. Press REMOVE again after changing an option to see the result.
Visit PineTools Remove Duplicate Lines Online
8. TextMechanic
This website is not to be confused with the TextMechanic Offline tool because the two are separate products by different people. TextMechanic has a host of different online text manipulation tools. If you want more than just duplicate line removal and would like all the tools available at once, use the All-in-One Text Manipulation Notepad.
There are two ways to get your text into the window; either use the Load File button to browse for a text file or press the blue C button to clear the current text in the window and paste your own text in. Then click the Remove Duplicate Lines button. The “Case Sensitive” and “Remove empty lines” checkboxes are available if your text requires them. The “Display removed” option shows which lines have been deleted in a separate box.
Once done, click the Save As button or press the S button to select all text, then right click and copy it (Ctrl+C) ready for pasting into a text editor.
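Seeing which lines were removed is a handy check before you overwrite anything. As a rough illustration of that idea, the Python sketch below returns both the kept lines and the removed duplicates. The function and parameter names are our own, loosely modeled on the checkboxes described above, and it is not TextMechanic’s code.

```python
from typing import List, Tuple


def dedupe_with_report(
    text: str, case_sensitive: bool = True, remove_empty: bool = False
) -> Tuple[List[str], List[str]]:
    """Return (kept_lines, removed_duplicates) for the given text."""
    seen = set()
    kept: List[str] = []
    removed: List[str] = []
    for line in text.splitlines():
        if remove_empty and not line.strip():
            continue  # blank lines are dropped silently, not reported
        key = line if case_sensitive else line.casefold()
        if key in seen:
            removed.append(line)
        else:
            seen.add(key)
            kept.append(line)
    return kept, removed


kept, removed = dedupe_with_report("a\n\nA\na\n\nb", case_sensitive=False, remove_empty=True)
print(kept)     # ['a', 'b']
print(removed)  # ['A', 'a']
```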
Visit TextMechanic Remove Duplicate Lines
A few other online duplicate line removal tools we looked at that are also worth checking out include DeDupeList.com, ToolSlick, and OnlineTextTools.
Excel has nice options:
Data tab > Sort & Filter > Advanced > select “Copy to another location” and tick “Unique records only”, or
Data tab > Data Tools > Remove Duplicates
Great solutions! I used the DeDupe one. However, I want the last occurrence of each duplicate line to be retained and all the others deleted, instead of retaining the first occurrence. I am having a hard time editing this code. Any help is highly appreciated!
Thanks in advance!
Thanks a bunch. Used the DeDupe one. Was searching for something like this for a long time. Great help.
Very, very useful.
Thank you for the online text filter tools.
Great article! TY
I personally have used Notepad++ for ages for creating scripts.
It really is handy for a lot of things.