
2. Traffic Clustering: Hierarchical Clustering (AGNES) in Java

2015-11-29 21:58

1. Project background

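While analyzing traffic routes, the customer asked us to find regular patterns in how vehicles are driven. We treat the route a vehicle takes on one day as one data sample, which gives 365 samples or more, and we cluster these samples to derive statistics about the route patterns.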

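I first thought of K-means. Its idea is to pick K center points arbitrarily and then iterate, updating the center points on every iteration. That does not work for this project, because we measure the similarity of two routes with edit distance. See the article "1. Traffic Clustering: Edit Distance (Levenshtein Distance) in Java" for background on edit distance. After picking k centers in the first step and computing the pairwise edit distances, the problem appears when we try to re-select the centers: averaging gives us a mean of edit distances, which is just a number and not a route. An edit distance cannot be computed between a route and that number, so the next iteration cannot be carried out. I therefore dropped this approach.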
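To make the point above concrete, here is a minimal sketch of an edit distance between two routes. It assumes each route is a list of road-segment IDs; that representation, the class name `RouteEditDistance`, and the sample routes are hypothetical and not taken from the project. The distance is only defined between two routes, and its result is a plain integer, which is why a K-means style mean has nothing to be compared against.

```java
import java.util.Arrays;
import java.util.List;

public class RouteEditDistance {

    // Levenshtein distance between two routes. Each route is assumed to be a
    // list of road-segment IDs (a hypothetical representation).
    public static int editDistance(List<String> a, List<String> b) {
        int[] prev = new int[b.size() + 1];
        int[] curr = new int[b.size() + 1];
        for (int j = 0; j <= b.size(); j++) {
            prev[j] = j; // empty prefix of a -> first j items of b: j insertions
        }
        for (int i = 1; i <= a.size(); i++) {
            curr[0] = i; // first i items of a -> empty prefix of b: i deletions
            for (int j = 1; j <= b.size(); j++) {
                int cost = a.get(i - 1).equals(b.get(j - 1)) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.size()];
    }

    public static void main(String[] args) {
        List<String> monday = Arrays.asList("A", "B", "C", "D");
        List<String> tuesday = Arrays.asList("A", "C", "D", "E");
        // Drop "B" and append "E": two edits.
        System.out.println(editDistance(monday, tuesday)); // prints 2
        System.out.println(editDistance(monday, monday));  // prints 0
    }
}
```

A method that only ever asks "how far apart are these two samples?" fits this kind of distance much better than one that needs an averaged center.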

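2. Overview of hierarchical clustering (Hierarchical Clustering)

Hierarchical clustering methods (also called system clustering methods) decompose a given data set level by level until some stopping condition is met.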

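1. Agglomerative hierarchical clustering: bottom-up. Every object starts as its own cluster, and each step merges the two nearest clusters until the required number of clusters is reached. AGNES is an example. In short: each item forms its own class; iterate, merging the two nearest classes into one.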

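2. Divisive hierarchical clustering: top-down. All objects start in one cluster, and each step splits one cluster until the required number of clusters is reached. DIANA is an example. In short: treat all items as one class, then split off the most dissimilar items so that one class becomes two.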
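"The two nearest clusters" only makes sense once a distance between clusters is defined. The sketch below shows three common choices (single, complete, and average linkage) on Euclidean points. The class name `LinkageSketch` and the sample points are made up for illustration. The ClusterAnalysis listing in section 3 takes the minimum over all point pairs, which corresponds to single linkage.

```java
import java.util.Arrays;
import java.util.List;

public class LinkageSketch {

    // Euclidean distance between two points of equal dimension.
    public static double distance(double[] p, double[] q) {
        double sum = 0;
        for (int i = 0; i < p.length; i++) {
            double d = p[i] - q[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Single linkage: distance of the closest pair of points.
    public static double singleLinkage(List<double[]> a, List<double[]> b) {
        double best = Double.MAX_VALUE;
        for (double[] p : a) {
            for (double[] q : b) {
                best = Math.min(best, distance(p, q));
            }
        }
        return best;
    }

    // Complete linkage: distance of the farthest pair of points.
    public static double completeLinkage(List<double[]> a, List<double[]> b) {
        double worst = 0;
        for (double[] p : a) {
            for (double[] q : b) {
                worst = Math.max(worst, distance(p, q));
            }
        }
        return worst;
    }

    // Average linkage: mean distance over all pairs of points.
    public static double averageLinkage(List<double[]> a, List<double[]> b) {
        double total = 0;
        for (double[] p : a) {
            for (double[] q : b) {
                total += distance(p, q);
            }
        }
        return total / (a.size() * b.size());
    }

    public static void main(String[] args) {
        List<double[]> left = Arrays.asList(new double[]{0, 0}, new double[]{1, 0});
        List<double[]> right = Arrays.asList(new double[]{3, 0}, new double[]{5, 0});
        System.out.println(singleLinkage(left, right));   // prints 2.0
        System.out.println(completeLinkage(left, right)); // prints 5.0
        System.out.println(averageLinkage(left, right));  // prints 3.5
    }
}
```

Single linkage tends to follow chains of nearby points, while complete linkage favors compact clusters, so the choice can change which pair gets merged at each step.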

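The description above should give a clear picture of how the method works. The code follows, for anyone who needs it later for study or reference.

3. Code: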

package agenes;

/**
 * Created by zzy on 15/11/15.
 */

import java.util.ArrayList;
import java.util.List;

public class Cluster {
    private List<DataPoint> dataPoints = new ArrayList<DataPoint>(); // sample points in this cluster
    private String clusterName;

    public List<DataPoint> getDataPoints() {
        return dataPoints;
    }

    public void setDataPoints(List<DataPoint> dataPoints) {
        this.dataPoints = dataPoints;
    }

    public String getClusterName() {
        return clusterName;
    }

    public void setClusterName(String clusterName) {
        this.clusterName = clusterName;
    }

}

package agenes;

/**
 * Created by zzy on 15/11/15.
 */

import java.util.ArrayList;
import java.util.List;

public class ClusterAnalysis {

    // Repeatedly merges the two closest clusters until clusterNum clusters remain.
    // The distance between two clusters is the smallest distance between any
    // pair of their points.
    public List<Cluster> startAnalysis(List<DataPoint> dataPoints, int clusterNum) {
        List<Cluster> finalClusters = initialCluster(dataPoints);
        // The size() > 1 check stops the loop when nothing is left to merge.
        while (finalClusters.size() > clusterNum && finalClusters.size() > 1) {
            double min = Double.MAX_VALUE;
            int mergeIndexA = 0;
            int mergeIndexB = 0;
            for (int i = 0; i < finalClusters.size(); i++) {
                // Start at i + 1: each pair of clusters only needs to be visited once.
                for (int j = i + 1; j < finalClusters.size(); j++) {
                    List<DataPoint> dataPointsA = finalClusters.get(i).getDataPoints();
                    List<DataPoint> dataPointsB = finalClusters.get(j).getDataPoints();

                    for (int m = 0; m < dataPointsA.size(); m++) {
                        for (int n = 0; n < dataPointsB.size(); n++) {
                            double tempDis = getDistance(dataPointsA.get(m), dataPointsB.get(n));
                            if (tempDis < min) {
                                min = tempDis;
                                mergeIndexA = i;
                                mergeIndexB = j;
                            }
                        }
                    }
                } // end for j
            } // end for i
            // Merge cluster[mergeIndexB] into cluster[mergeIndexA].
            finalClusters = mergeCluster(finalClusters, mergeIndexA, mergeIndexB);
        } // end while

        return finalClusters;
    }

    private List<Cluster> mergeCluster(List<Cluster> clusters, int mergeIndexA, int mergeIndexB) {
        if (mergeIndexA != mergeIndexB) {
            // Move every DataPoint of cluster[mergeIndexB] into cluster[mergeIndexA].
            Cluster clusterA = clusters.get(mergeIndexA);
            Cluster clusterB = clusters.get(mergeIndexB);

            List<DataPoint> dpA = clusterA.getDataPoints();
            for (DataPoint dp : clusterB.getDataPoints()) {
                dp.setCluster(clusterA);
                dpA.add(dp);
            }

            // Remove cluster[mergeIndexB] from the list (removal by index).
            clusters.remove(mergeIndexB);
        }

        return clusters;
    }

    // Initialize the clusters: every sample point starts as its own cluster.
    private List<Cluster> initialCluster(List<DataPoint> dataPoints) {
        List<Cluster> originalClusters = new ArrayList<Cluster>();
        for (int i = 0; i < dataPoints.size(); i++) {
            DataPoint tempDataPoint = dataPoints.get(i);
            List<DataPoint> tempDataPoints = new ArrayList<DataPoint>();
            tempDataPoints.add(tempDataPoint);

            Cluster tempCluster = new Cluster();
            tempCluster.setClusterName("Cluster" + String.valueOf(i));
            tempCluster.setDataPoints(tempDataPoints);

            tempDataPoint.setCluster(tempCluster);
            originalClusters.add(tempCluster);
        }

        return originalClusters;
    }

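    // Compute the Euclidean distance between two sample points.
    private double getDistance(DataPoint dpA, DataPoint dpB) {
        double[] dimA = dpA.getDimensioin();
        double[] dimB = dpB.getDimensioin();

        // Returning 0 for mismatched dimensions would make the pair look
        // identical and force a merge, so fail loudly instead.
        if (dimA.length != dimB.length) {
            throw new IllegalArgumentException("Data points must have the same number of dimensions");
        }

        double distance = 0;
        for (int i = 0; i < dimA.length; i++) {
            double diff = dimA[i] - dimB[i];
            distance += diff * diff;
        }

        return Math.sqrt(distance);
    }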

    public static void main(String[] args) {
        ArrayList<DataPoint> dpoints = new ArrayList<DataPoint>();

        double[] a = {2, 3};
        double[] b = {2, 4};
        double[] c = {1, 4};
        double[] d = {1, 3};
        double[] e = {2, 2};
        double[] f = {3, 2};

        double[] g = {8, 7};
        double[] h = {8, 6};
        double[] i = {7, 7};
        double[] j = {7, 6};
        double[] k = {8, 5};

        // double[] l = {100, 2}; // outlier

        double[] m = {8, 20};
        double[] n = {8, 19};
        double[] o = {7, 18};
        double[] p = {7, 17};
        double[] q = {8, 20};

        dpoints.add(new DataPoint(a, "a"));
        dpoints.add(new DataPoint(b, "b"));
        dpoints.add(new DataPoint(c, "c"));
        dpoints.add(new DataPoint(d, "d"));
        dpoints.add(new DataPoint(e, "e"));
        dpoints.add(new DataPoint(f, "f"));

        dpoints.add(new DataPoint(g, "g"));
        dpoints.add(new DataPoint(h, "h"));
        dpoints.add(new DataPoint(i, "i"));
        dpoints.add(new DataPoint(j, "j"));
        dpoints.add(new DataPoint(k, "k"));

        // dpoints.add(new DataPoint(l, "l"));
        dpoints.add(new DataPoint(m, "m"));
        dpoints.add(new DataPoint(n, "n"));
        dpoints.add(new DataPoint(o, "o"));
        dpoints.add(new DataPoint(p, "p"));
        dpoints.add(new DataPoint(q, "q"));

        int clusterNum = 3; // number of clusters

        ClusterAnalysis ca = new ClusterAnalysis();
        List<Cluster> clusters = ca.startAnalysis(dpoints, clusterNum);

        // Print each cluster name followed by the names of its sample points.
        for (Cluster cl : clusters) {
            System.out.println("------" + cl.getClusterName() + "------");
            List<DataPoint> tempDps = cl.getDataPoints();
            for (DataPoint tempdp : tempDps) {
                System.out.println(tempdp.getDataPointName());
            }
        }

    }
}




package agenes;

/**
 * Created by zzy on 15/11/15.
 */

public class DataPoint {
    private String dataPointName; // name of the sample point
    private Cluster cluster; // cluster the sample point belongs to
    private double[] dimensioin; // coordinates of the sample point, one value per dimension

    public DataPoint() {

    }

    public DataPoint(double[] dimensioin, String dataPointName) {
        this.dataPointName = dataPointName;
        this.dimensioin = dimensioin;
    }

    public double[] getDimensioin() {
        return dimensioin;
    }

    public void setDimensioin(double[] dimensioin) {
        this.dimensioin = dimensioin;
    }

    public Cluster getCluster() {
        return cluster;
    }

    public void setCluster(Cluster cluster) {
        this.cluster = cluster;
    }

    public String getDataPointName() {
        return dataPointName;
    }

    public void setDataPointName(String dataPointName) {
        this.dataPointName = dataPointName;
    }
}

Source code on GitHub: https://github.com/chaoren399/dkdemo/tree/master/agenes/src/agenes
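The demo measures Euclidean distance between coordinates, while the project described in section 1 compares routes by edit distance. One possible way to bridge the two, sketched here as an assumption rather than as the project's actual code, is to run the same merge loop over a precomputed distance matrix, so that any pairwise distance can be plugged in. The class name `MatrixAgnes` and the matrix values are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class MatrixAgnes {

    // Agglomerative clustering over a precomputed symmetric distance matrix.
    // The cluster distance is the smallest pairwise distance, as in the demo.
    // Returns the clusters as lists of sample indices.
    public static List<List<Integer>> cluster(double[][] dist, int clusterNum) {
        List<List<Integer>> clusters = new ArrayList<>();
        for (int i = 0; i < dist.length; i++) {
            List<Integer> single = new ArrayList<>();
            single.add(i); // every sample starts as its own cluster
            clusters.add(single);
        }
        while (clusters.size() > clusterNum) {
            double min = Double.MAX_VALUE;
            int a = -1;
            int b = -1;
            for (int i = 0; i < clusters.size(); i++) {
                for (int j = i + 1; j < clusters.size(); j++) {
                    for (int p : clusters.get(i)) {
                        for (int q : clusters.get(j)) {
                            if (dist[p][q] < min) {
                                min = dist[p][q];
                                a = i;
                                b = j;
                            }
                        }
                    }
                }
            }
            if (b < 0) {
                break; // fewer than two clusters left, nothing to merge
            }
            // a < b always holds, so removing index b does not shift index a.
            clusters.get(a).addAll(clusters.remove(b));
        }
        return clusters;
    }

    public static void main(String[] args) {
        // Hypothetical edit distances between five daily routes.
        double[][] dist = {
            {0, 1, 2, 7, 8},
            {1, 0, 1, 6, 7},
            {2, 1, 0, 6, 7},
            {7, 6, 6, 0, 1},
            {8, 7, 7, 1, 0}
        };
        System.out.println(cluster(dist, 2)); // prints [[0, 1, 2], [3, 4]]
    }
}
```

Because the loop only reads `dist[p][q]`, the matrix could be filled once with Levenshtein distances between routes, and no averaged center point is ever needed.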

