A plain text file is one of the most common ways to store typed data on a computer. It’s not only very simple to create a text file but also easy to edit one, because something as simple as the built-in Windows Notepad will do the job. A text file is also a universal format, meaning it’s readable on Windows PCs, Macs, Linux, phones, tablets and everything in between.
If you have a number of large text-based files to read through, or have merged several text files into one to make things easier, it makes sense to remove any lines that duplicate one another. You could go through the file manually and remove the duplicates yourself, but it’s much easier to let the process be handled automatically. Here we show you a number of different ways to remove duplicate lines from your text files.
Use A Third Party Utility To Delete Duplicate Lines
A simple and common way to remove duplicate lines from a text file is to use a dedicated third party utility.
1. TextCrawler
TextCrawler’s Scratchpad tool can remove duplicate lines from any text you paste into it. To open the Scratchpad, install and launch TextCrawler, then go to Tools > Scratchpad or press F2. Paste the text into the window and press the Do button. The remove duplicate lines option should be selected in the drop-down by default; if not, select it beforehand. Then press Save to create a new text file, or copy and paste the text back into your favorite text editor. The drop-down also has options to remove white space and to sort the text in ascending or descending order.
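The Scratchpad’s drop-down options are easy to reason about in code. The Python sketch below is not TextCrawler’s implementation; it is a minimal illustration, assuming that “remove duplicate lines” keeps the first occurrence of each line and that “remove white spaces” means trimming leading and trailing spaces. The function name `scratchpad_process` and its parameters are hypothetical.

```python
from typing import List


def scratchpad_process(text: str, trim: bool = False, order: str = "none") -> str:
    """Remove duplicate lines, optionally trimming white space and sorting.

    order is "none" (keep original order), "asc" or "desc".
    """
    lines: List[str] = text.splitlines()
    if trim:
        # Trim leading/trailing white space so " cherry" and "cherry" match
        lines = [line.strip() for line in lines]
    # dict.fromkeys keeps the first occurrence of each line, in original order
    unique = list(dict.fromkeys(lines))
    if order == "asc":
        unique.sort()
    elif order == "desc":
        unique.sort(reverse=True)
    elif order != "none":
        raise ValueError(f"unknown order: {order!r}")
    return "\n".join(unique)
```

For example, `scratchpad_process("banana\napple\nbanana\n cherry\ncherry", trim=True, order="asc")` returns `"apple\nbanana\ncherry"`. Without `trim`, `" cherry"` and `"cherry"` would both survive because they are not identical.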
2. Notepad++ TextFX Plugin
Notepad++ is a feature-rich text editor, but it has no built-in feature to remove duplicate lines. However, the option can be added through Notepad++’s plugin system. Although some plugins come included, TextFX is not installed by default and needs to be added manually.
On the Notepad++ menu bar, click Plugins > Plugin Manager > Show Plugin Manager and find the TextFX Characters plugin in the list. Check the box, click Install and restart the program when prompted; the TextFX menu will then be available. To use the duplicate line removal function, load a text file into Notepad++, select all the text (Ctrl+A), go to TextFX > TextFX Tools and select the option “+Sort outputs only UNIQUE (at column) lines”.
Go back to the same TextFX Tools menu and click either “Sort lines case sensitive (at column)” or “Sort lines case insensitive (at column)”. This sorts the lines and removes the duplicates: the case sensitive option keeps lines that differ only in letter case, while the case insensitive option removes those as well. Then save the file.
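TextFX’s approach of sorting first and then outputting only unique lines can be sketched in Python as follows. This illustrates the general sort-then-unique technique, not the plugin’s code; the function `sort_unique` is hypothetical, and which spelling survives in case-insensitive mode is our assumption (here, the one that appears first in the original text).

```python
from typing import Callable, List


def sort_unique(lines: List[str], case_sensitive: bool = True) -> List[str]:
    """Sort lines, then keep only one line from each run of equal lines."""
    key: Callable[[str], str] = (lambda s: s) if case_sensitive else str.casefold
    # sorted() is stable, so among equal keys the earliest original line comes first
    result: List[str] = []
    for line in sorted(lines, key=key):
        # Once sorted, duplicates are adjacent: compare with the last kept line only
        if not result or key(line) != key(result[-1]):
            result.append(line)
    return result
```

With `["pear", "Apple", "apple", "pear", "banana"]`, the case-sensitive call returns `["Apple", "apple", "banana", "pear"]`, while `case_sensitive=False` returns `["Apple", "banana", "pear"]`. Note that, as with TextFX, the original line order is lost.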
3. Duplicate Lines Remover
Duplicate Lines Remover is from security company NoVirusThanks.org and has some useful features. For some reason, the program’s webpage has been removed from their website, but thankfully the official download link is still available. Only a setup installer is provided, but you can make the program portable with the help of Universal Extractor.
To remove duplicate lines, browse for or drop a file onto the window, click Check and, if duplicate lines are found, click Fix and choose a save name. You can also batch process a number of files at once, including in subfolders, using the File Scan tab. Be aware, though, that it automatically creates .bak backups and replaces the original files with the processed ones. Enter a file extension in the mask box so that only matching files are processed rather than everything. A dedicated command line version is also included in the package for more advanced users.
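For batch jobs like the File Scan tab, a rough Python equivalent might walk a folder tree, filter files with a glob mask, back each file up and rewrite it. This is a hypothetical sketch, not NoVirusThanks’ tool: the functions `dedupe_file` and `dedupe_folder`, the UTF-8 encoding and the `name.ext.bak` backup naming are all assumptions.

```python
import shutil
from pathlib import Path
from typing import Dict


def dedupe_file(path: Path) -> int:
    """Remove repeated lines from one file in place and return how many were removed."""
    lines = path.read_text(encoding="utf-8").splitlines()
    unique = list(dict.fromkeys(lines))  # keeps the first occurrence, in order
    removed = len(lines) - len(unique)
    if removed:
        # Back up the original as <name>.bak before overwriting it
        shutil.copy2(path, path.with_name(path.name + ".bak"))
        path.write_text("\n".join(unique) + "\n", encoding="utf-8")
    return removed


def dedupe_folder(root: str, mask: str = "*.txt") -> Dict[str, int]:
    """Process every file matching mask in root and all its subfolders."""
    results: Dict[str, int] = {}
    # sorted() collects the matches up front, so new .bak files are never revisited
    for path in sorted(Path(root).rglob(mask)):
        if path.is_file():
            results[str(path)] = dedupe_file(path)
    return results
```

As with the mask box, narrowing `mask` (for example to `"*.csv"`) keeps unrelated files from being touched. Files without duplicates are left alone and get no backup.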
4. TextMechanic Offline
“Offline” in the name distinguishes this tool from the online TextMechanic service, as the two are not related. It is a portable executable hosted at SourceForge. In addition to removing duplicate and empty lines, TextMechanic Offline can also delete lines containing specific text or find and replace text.
Paste the text to be processed into the TextMechanic window, then press the “Remove Duplicate Lines” button followed by the “Remove Empty Lines” button. If you don’t press both buttons, the text will contain empty lines where the duplicates have been removed. Finally, press “Save to Clipboard” so the corrected text can be pasted back into your text editor. This tool is case sensitive, so lines must be identical in both case and content to be removed.
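The need to press both buttons suggests the first pass leaves a blank line in place of each removed duplicate. The Python sketch below models that two-pass behaviour; the helper names are hypothetical, and the blanking step is our reading of the description above, not TextMechanic Offline’s code. Like the tool, the comparison is case sensitive.

```python
from typing import List, Set


def blank_duplicates(lines: List[str]) -> List[str]:
    """Pass 1: replace each repeated line with an empty line (assumed behaviour)."""
    seen: Set[str] = set()
    out: List[str] = []
    for line in lines:
        if line in seen:
            out.append("")  # the duplicate's position is left as a blank line
        else:
            seen.add(line)  # exact, case-sensitive match
            out.append(line)
    return out


def remove_empty_lines(lines: List[str]) -> List[str]:
    """Pass 2: drop lines that are empty or contain only white space."""
    return [line for line in lines if line.strip()]
```

Running only the first pass on `["red", "Red", "red", "blue", "red"]` gives `["red", "Red", "", "blue", ""]`; adding the second pass gives `["red", "Red", "blue"]`. “Red” survives because it differs in case.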
Delete Duplicate Lines Using Built in Windows Commands
Using a mixture of built-in Windows commands, it is possible to remove duplicate lines from text files.
5. Using a Batch Script
Putting those commands together into a single batch script allows for quick and easy processing. This could also be useful if your computer has restrictions on running third-party software. The script we are using was found on Stack Overflow and modified to make it easier to use.
Simply download the DeDupe Zip file and extract it. There are two BAT files inside: one is case sensitive and only removes lines whose case is also identical, while the other is case insensitive. To delete duplicate lines from a file, drop the text file onto the .BAT file and it will automatically process it and save the output in the same folder with a _deduped suffix. The lines are not reordered, only removed in place, which is ideal if you want to keep the original line order.
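If you would rather use Python than a batch file, the order-preserving approach the DeDupe scripts take can be sketched as below. This is not the Stack Overflow script; `dedupe_to_new_file` is a hypothetical helper, and it assumes UTF-8 input. In case-insensitive mode it compares case-folded copies of each line but writes out the first spelling it saw.

```python
from pathlib import Path


def dedupe_to_new_file(src: str, case_sensitive: bool = True) -> Path:
    """Write <name>_deduped<ext> next to src, keeping the first copy of each line."""
    path = Path(src)
    seen = set()
    kept = []
    for line in path.read_text(encoding="utf-8").splitlines():
        # Case-insensitive mode compares a case-folded key but keeps the original text
        key = line if case_sensitive else line.casefold()
        if key not in seen:
            seen.add(key)
            kept.append(line)
    out = path.with_name(f"{path.stem}_deduped{path.suffix}")
    out.write_text("\n".join(kept) + ("\n" if kept else ""), encoding="utf-8")
    return out
```

For example, `dedupe_to_new_file("notes.txt", case_sensitive=False)` would write `notes_deduped.txt` alongside the original and leave `notes.txt` untouched.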
Delete Duplicate Lines Using An Online Service
There are many online services that remove duplicate lines from text; we’ve chosen a couple that should cover your needs.
6. TextMechanic.com
Not to be confused with the unrelated TextMechanic Offline tool, this website hosts a range of online text manipulation tools. If you want more than duplicate line removal, use the All-in-One Text Manipulation Notepad, which makes all the other tools available at once.
There are two ways to get your text into the window: use the Load File button to browse for a file, or press the blue C button to clear the current text and paste your own in. Then click the Remove Duplicate Lines button; the case sensitivity and remove empty lines boxes can be checked if your text requires them. Once done, click the Save As button, or press the S button to select all the text, then right-click and copy it (or press Ctrl+C) ready for pasting into a text editor.
7. Remove Duplicate Lines
Like TextMechanic.com above, the TextFixer website has a host of HTML, number and text manipulation tools, of which the Duplicate Line Removal Tool page is only one.
Paste the text to be processed into the top window and press the Remove Duplicate Lines button; the result appears in the lower window, ready to be selected and copied. The tool also has a couple of useful sorting options, such as sorting lines alphabetically or reversing the sort order. Duplicate removal is case sensitive, so two lines that differ by even a single upper or lower case letter will both be kept.