
Extracting the IP That Visited Baidu Most Often on a Given Day (Java Implementation)

2013-09-28 17:15

The approach follows July's blog: /article/1360513.html

1. Given a massive amount of log data, extract the IP that visited Baidu the most times on a given day.

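Memory is limited, so the large file has to be split into smaller ones. The split must send every occurrence of a given IP to the same file, which a modulo operation achieves.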

Note: identical IPs must be stored in the same file.

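Each IP is a string and therefore has a hash code, and identical IPs always have identical hash codes. Taking the hash code modulo some number, say 100, splits the original file into 100 files.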
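As a minimal sketch of the bucket computation described above: the class `BucketIndex` and the method `bucketOf` are hypothetical names used only for illustration. The sketch assumes Java 8 or later and uses `Math.floorMod`, which yields a non-negative result for a positive divisor even when `hashCode()` is negative.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of the original program.
class BucketIndex {
    // Map an IP string to a file index in [0, buckets).
    // Math.floorMod never returns a negative value for a positive divisor,
    // so no manual correction for negative hash codes is needed.
    static int bucketOf(String ip, int buckets) {
        return Math.floorMod(ip.hashCode(), buckets);
    }

    public static void main(String[] args) {
        // Sample IPs chosen only for illustration; the first two are identical.
        List<String> ips = Arrays.asList("159.227.0.1", "159.227.0.1", "159.227.200.17");
        for (String ip : ips) {
            System.out.println(ip + " -> file " + bucketOf(ip, 100));
        }
    }
}
```

The two identical IPs print the same file index, which is the property the split relies on.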

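Each record is written to the file selected by the modulo result, so identical IPs end up in the same file. When taking the hash code modulo n, each split file is roughly 1/n the size of the original, so about 1/100 for modulo 100. If some of the split files are still too large to load into memory, split those files again until they fit, for example two ways with another modulo 2. The second split should be computed on a different hash or on the quotient hashCode/100 rather than on the hash code itself: every hash code in one split file already has the same remainder modulo 100 and hence the same parity, so taking the same hash code modulo 2 would send all records to a single file.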
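One point to watch when re-splitting an oversized file: all hash codes in a file produced by the 100-way split share the same remainder modulo 100, so they also share the same parity, and the same hash code taken modulo 2 would not separate them. The sketch below is one possible workaround, not the original author's code. It derives the second-level index from the quotient `hashCode / 100`; the class `SecondLevelSplit` and its method names are hypothetical, and Java 8 or later is assumed.

```java
// Hypothetical helper class, not part of the original program.
class SecondLevelSplit {
    // First-level file index, as in the 100-way split.
    static int firstLevel(String ip) {
        return Math.floorMod(ip.hashCode(), 100);
    }

    // Second-level index in [0, parts), derived from the quotient so that it
    // does not depend on the first-level remainder.
    static int secondLevel(String ip, int parts) {
        return Math.floorMod(Math.floorDiv(ip.hashCode(), 100), parts);
    }

    public static void main(String[] args) {
        // Sample IPs chosen only for illustration.
        String[] ips = {"159.227.0.1", "159.227.13.200", "159.227.255.7"};
        for (String ip : ips) {
            System.out.println(ip + " -> file " + firstLevel(ip)
                    + ", sub-file " + secondLevel(ip, 2));
        }
    }
}
```

Identical IPs still receive the same pair of indices, so the "same IP, same file" guarantee is preserved at both levels.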

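For each split file, find the IP that occurs most often in it. A HashMap can hold the occurrence counts: the key is the IP string and the value is the number of times that string has appeared. For each record (IP) read from the file, if the IP is already in the HashMap, increment its value by 1; otherwise put a new entry into the HashMap with a count of 1.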
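The counting step can be sketched compactly with `Map.merge`, which is available from Java 8 and combines the "already present" and "not yet present" cases into one call. The class `IpCounter` and the method `count` are hypothetical names, and the input list stands in for the lines of one split file.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class, not part of the original program.
class IpCounter {
    // Count how often each IP appears in the given records.
    static Map<String, Integer> count(Iterable<String> ips) {
        Map<String, Integer> counts = new HashMap<>();
        for (String ip : ips) {
            // Insert 1 for a new key, otherwise add 1 to the existing count.
            counts.merge(ip, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample records chosen only for illustration.
        Map<String, Integer> counts =
                count(Arrays.asList("159.227.0.1", "159.227.9.9", "159.227.0.1"));
        System.out.println(counts.get("159.227.0.1")); // prints 2
    }
}
```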

Then find the entry with the largest value in that HashMap.
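As an alternative to a hand-written loop, the largest entry can be picked with `Collections.max` and `Map.Entry.comparingByValue()` (Java 8 or later). This is only a sketch: the class `MaxEntry` and the method `maxOf` are hypothetical, and returning `null` for an empty map is an assumption made here because `Collections.max` throws on an empty collection.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class, not part of the original program.
class MaxEntry {
    // Return the entry with the largest count, or null if the map is empty.
    static Map.Entry<String, Integer> maxOf(Map<String, Integer> counts) {
        if (counts.isEmpty()) {
            return null;
        }
        return Collections.max(counts.entrySet(), Map.Entry.comparingByValue());
    }

    public static void main(String[] args) {
        // Sample counts chosen only for illustration.
        Map<String, Integer> counts = new HashMap<>();
        counts.put("159.227.0.1", 3);
        counts.put("159.227.9.9", 7);
        Map.Entry<String, Integer> max = maxOf(counts);
        System.out.println(max.getKey() + " " + max.getValue()); // prints 159.227.9.9 7
    }
}
```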

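Finally, compare the most frequent IP of every file to obtain the IP with the most visits overall. Because every occurrence of a given IP lies in a single file, its count in that file is its total count.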
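The final comparison can be sketched as follows. The class `GlobalMax` and the method `best` are hypothetical names, the counts are made-up sample values, and the `null` element models a split file that turned out to be empty, which is an assumption about how the per-file step reports "no data".

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, not part of the original program.
class GlobalMax {
    // Pick the entry with the largest count among the per-file winners,
    // skipping null entries; returns null if there is no winner at all.
    static Map.Entry<String, Integer> best(List<Map.Entry<String, Integer>> perFile) {
        Map.Entry<String, Integer> best = null;
        for (Map.Entry<String, Integer> e : perFile) {
            if (e == null) {
                continue; // this split file contributed nothing
            }
            if (best == null || e.getValue() > best.getValue()) {
                best = e;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up per-file winners, for illustration only.
        List<Map.Entry<String, Integer>> winners = new ArrayList<>();
        winners.add(new SimpleEntry<String, Integer>("159.227.3.4", 1520));
        winners.add(null);
        winners.add(new SimpleEntry<String, Integer>("159.227.9.9", 1617));
        Map.Entry<String, Integer> top = best(winners);
        System.out.println(top.getKey() + " " + top.getValue()); // prints 159.227.9.9 1617
    }
}
```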

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.HashMap;
public class MassiveIP {
    // Generate a large test file of random IPs in the 159.227.x.x range
    public void generateIP(String fileName){
        PrintWriter out = null;
        try {
            out = new PrintWriter(fileName);
            Random r = new Random();
            for (int i = 0; i < 100000000; i++) {
                String s = "159.227." + r.nextInt(256) + "." + r.nextInt(256);
                out.println(s);
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (out != null)
                out.close();
        }
    }
    // Split the file into 100 smaller files so that each fits into memory;
    // identical IPs always land in the same split file
    public void FileSplit(String fileName){
        BufferedReader reader = null;
        PrintWriter[] out = new PrintWriter[100];
        try {
            reader = new BufferedReader(new FileReader(fileName));
            for (int i = 0; i < 100; i++)
                // split file name: original name plus the file index
                out[i] = new PrintWriter(fileName + i);
            String IP;
            // readLine() is called only in the loop condition; calling it
            // again in the body would skip every other line
            while ((IP = reader.readLine()) != null) {
                int fileNum = IP.hashCode() % 100;
                // hashCode() may be negative, so shift the remainder into 0..99
                fileNum = (fileNum >= 0 ? fileNum : fileNum + 100);
                out[fileNum].println(IP);
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // close every writer that was opened, then the reader
            for (int i = 0; i < 100; i++)
                if (out[i] != null)
                    out[i].close();
            try {
                if (reader != null)
                    reader.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    // Find the IP with the largest number of occurrences in one file;
    // returns null if the file is empty or cannot be read
    public Map.Entry<String,Integer>  statitics(String fileName){
        HashMap<String,Integer> map = new HashMap<String,Integer>();
        BufferedReader reader = null;
        try {
            reader = new BufferedReader(new FileReader(fileName));
            String IP;
            while ((IP = reader.readLine()) != null) {
                // increment the count if the IP is already in the HashMap,
                // otherwise insert it with a count of 1
                if (map.containsKey(IP)) {
                    map.put(IP, map.get(IP) + 1);
                } else {
                    map.put(IP, 1);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                if (reader != null)
                    reader.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        // the entry with the maximum value is the most frequent IP
        Map.Entry<String,Integer> maxEntry = null;
        for (Map.Entry<String,Integer> entry : map.entrySet()) {
            if (maxEntry == null || entry.getValue() > maxEntry.getValue()) {
                maxEntry = entry;
            }
        }
        return maxEntry;
    }
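    public static void main(String[] args){
        MassiveIP m = new MassiveIP();
        String FileName = "D://Data//test.txt";
        m.generateIP(FileName);
        m.FileSplit(FileName);
        // collect the most frequent IP of each split file
        List<Map.Entry<String,Integer>> l
                = new ArrayList<Map.Entry<String,Integer>>();
        for (int i = 0; i < 100; i++)
            l.add(m.statitics(FileName + i));
        // the overall winner is the largest of the per-file winners
        Map.Entry<String,Integer> maxEntry = null;
        for (Map.Entry<String,Integer> e : l) {
            // statitics returns null for an empty split file, so skip those
            if (e != null && (maxEntry == null || e.getValue() > maxEntry.getValue()))
                maxEntry = e;
        }
        if (maxEntry != null) {
            System.out.println(maxEntry.getKey());
            System.out.println(maxEntry.getValue());
        }
    }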

}
