How to remove duplicate lines in a large text file?
2018-09-15 13:22
How would you remove duplicate lines from a file that is much too large to fit in memory? The duplicate lines are not necessarily adjacent, and say the file is 10 times bigger than RAM.
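One approach is to use a HashSet to store each line of input.txt. Because a set ignores duplicate values, you can check whether each line is already present as you store it, and write the line to output.txt only if it was not already in the set. This keeps the first occurrence of each line, in its original order. Note that the set holds every distinct line at once, so this only works when the distinct lines fit in memory; if the file is 10 times larger than RAM and most of its lines are unique, the set will not fit.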
Java:
```java
// Removes duplicate lines from input.txt and writes the first
// occurrence of each line, in original order, to output.txt.
// Every distinct line is kept in memory.
import java.io.*;
import java.util.HashSet;
import java.util.Set;

public class FileOperation {
    public static void main(String[] args) throws IOException {
        // Lines seen so far
        Set<String> seen = new HashSet<>();
        // try-with-resources closes (and flushes) both streams, even on error
        try (BufferedReader br = new BufferedReader(new FileReader("input.txt"));
             PrintWriter pw = new PrintWriter("output.txt")) {
            String line;
            while ((line = br.readLine()) != null) {
                // add() returns false if the line was already in the set,
                // so only the first occurrence is written
                if (seen.add(line)) {
                    pw.println(line);
                }
            }
        }
        System.out.println("File operation performed successfully");
    }
}
```
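The HashSet approach keeps every distinct line in memory at once. For the case in the question, where the file is about 10 times larger than RAM and may be mostly unique lines, one common alternative (a different technique from the answer above) is hash partitioning. A first pass writes each line to one of several temporary bucket files chosen by the line's hash, so all copies of the same line land in the same bucket. A second pass dedupes each bucket on its own with a HashSet. This is a minimal sketch, not a tested production tool: the class `PartitionedDedupe`, the method `dedupe`, the bucket count and the `bucket-N.txt` file names are all hypothetical, and it assumes UTF-8 input, enough free disk for a second copy of the file, that the distinct lines of any single bucket fit in memory, and that the OS allows that many files to be open at once.

```java
// Hypothetical sketch: dedupe a file larger than memory by hash partitioning.
// Assumptions: UTF-8 input; the distinct lines of each bucket fit in memory;
// the OS allows `buckets` open files. Output order is grouped by bucket,
// NOT the original line order.
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

public class PartitionedDedupe {

    public static void dedupe(Path input, Path output, int buckets, Path tmpDir)
            throws IOException {
        BufferedWriter[] writers = new BufferedWriter[buckets];
        Path[] parts = new Path[buckets];

        // Pass 1: route every line to a bucket chosen by its hash, so all
        // copies of a given line end up in the same bucket file.
        try {
            for (int i = 0; i < buckets; i++) {
                parts[i] = tmpDir.resolve("bucket-" + i + ".txt");
                writers[i] = Files.newBufferedWriter(parts[i], StandardCharsets.UTF_8);
            }
            try (BufferedReader br = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
                String line;
                while ((line = br.readLine()) != null) {
                    int b = Math.floorMod(line.hashCode(), buckets);
                    writers[b].write(line);
                    writers[b].newLine();
                }
            }
        } finally {
            for (BufferedWriter w : writers) {
                if (w != null) {
                    w.close();
                }
            }
        }

        // Pass 2: each bucket is small enough to dedupe with its own HashSet.
        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            for (Path part : parts) {
                Set<String> seen = new HashSet<>();
                try (BufferedReader br = Files.newBufferedReader(part, StandardCharsets.UTF_8)) {
                    String line;
                    while ((line = br.readLine()) != null) {
                        if (seen.add(line)) {
                            out.write(line);
                            out.newLine();
                        }
                    }
                }
                Files.delete(part); // remove the temporary bucket file
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: 64 buckets in a fresh temporary directory.
        Path tmp = Files.createTempDirectory("dedupe");
        dedupe(Paths.get("input.txt"), Paths.get("output.txt"), 64, tmp);
        Files.delete(tmp);
    }
}
```

With a single bucket this reduces to the in-memory approach and keeps the original order; with more buckets each HashSet only has to hold roughly 1/`buckets` of the distinct lines. Picking the bucket count from the file size divided by the memory you can spare, with some margin, is a reasonable starting point.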