
Evaluation Methods for Cluster Analysis (Applied to Bioinformatics)

2014-03-16 22:38


-----------------------【Basic information】--------------------------------

【Density】

Density describes how closely the nodes in a cluster interact with each other. For a cluster consisting of n nodes and m edges, the density is 2m/(n(n-1)), i.e. the number of edges present divided by the maximum possible number of edges, n(n-1)/2.
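
As a minimal sketch of the density formula above (the function name `cluster_density` is hypothetical, and returning 0 for clusters with fewer than two nodes is an assumption, since the formula is undefined there):

```python
def cluster_density(n_nodes: int, n_edges: int) -> float:
    """Density 2m / (n(n-1)) of an undirected cluster with n nodes and m edges."""
    if n_nodes < 2:
        # Assumption: the formula divides by zero for n < 2, so report 0.
        return 0.0
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


# A complete graph on 4 nodes has 6 edges, so every possible edge is present.
print(cluster_density(4, 6))
# The same 4 nodes with only 3 edges use half of the possible edges.
print(cluster_density(4, 3))
```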

【size distribution】

The size distribution summarizes the clustering result with a chart of cluster size (number of nodes) against the number of clusters of that size.
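
The size distribution can be tabulated before charting it. The sketch below assumes each cluster is given as a set of node names; `size_distribution` and the sample clusters are hypothetical, and the text bars only stand in for a real chart.

```python
from collections import Counter


def size_distribution(clusters):
    """Map each cluster size to the number of clusters having that size."""
    counts = Counter(len(cluster) for cluster in clusters)
    return dict(sorted(counts.items()))


# Hypothetical clustering result: four clusters of sizes 3, 2, 2 and 4.
clusters = [{"A", "B", "C"}, {"D", "E"}, {"F", "G"}, {"H", "I", "J", "K"}]

for size, count in size_distribution(clusters).items():
    print(f"size {size}: {'#' * count} ({count})")
```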



-----------------------------【c-score】-------------------------------------

【Gene Annotation】

1. GO (Gene Ontology): Biological Process (BP) / Molecular Function (MF) / Cellular Component (CC)

2. MIPS: Munich Information Center for Protein Sequences, a genomics research center in Germany

3. Other



【p-value】

To characterize the function of the predicted clusters, we compare them with a known functional classification. A P-value based on the hypergeometric distribution is often used to estimate whether a given set of proteins is accumulated by chance. It has been used as a criterion for assigning each predicted cluster a main function. Here, we also calculate the P-value of each predicted cluster for every function category and assign the cluster the category at which the minimum P-value occurs.
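
The text does not spell out the P-value formula. The sketch below assumes the commonly used upper-tail hypergeometric probability: the chance of seeing at least k annotated proteins in a cluster of size n, drawn from a background of N proteins of which K carry the annotation. The function names, the annotation dictionary and the protein identifiers are all hypothetical.

```python
from math import comb


def hypergeom_pvalue(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k): at least k annotated proteins among n drawn from N, K annotated."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def assign_main_function(cluster, annotations, background_size):
    """Return (category, p_value) for the category with the minimum P-value."""
    best = None
    for category, members in annotations.items():
        k = len(cluster & members)  # annotated proteins inside the cluster
        p = hypergeom_pvalue(background_size, len(members), len(cluster), k)
        if best is None or p < best[1]:
            best = (category, p)
    return best


# Hypothetical background of 10 proteins (p1..p10) with two function categories.
annotations = {
    "transport": {"p1", "p2", "p3", "p4"},
    "repair": {"p3", "p8", "p9", "p10"},
}
cluster = {"p1", "p2", "p3"}
print(assign_main_function(cluster, annotations, background_size=10))
```

Exact integer binomials from `math.comb` keep the sketch dependency-free; for large backgrounds a statistics library would be the more usual choice.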



【precision】

The precision of a cluster is the number of true positives divided by the total number of elements labeled as belonging to the positive class.

precision = tp/(tp+fp) where tp is the size of the overlap and tp+fp is the number of nodes in the cluster

【recall】

Recall is the number of true positives divided by the total number of elements that actually belong to the positive class.

recall = tp/(tp+fn) where tp is the size of the overlap and tp+fn is the number of nodes in the background that belong to the positive class

【f-measure】

The F-measure (also called the balanced F-score) combines the two as the harmonic mean of precision and recall.

f-measure = 2*precision*recall/(precision+recall)
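
The three formulas above can be sketched together, assuming the cluster and the positive class (the proteins annotated with the function in question) are both given as sets. The function name and the sample identifiers are hypothetical, and returning 0 when a denominator is zero is an assumption.

```python
def precision_recall_f(cluster, positives):
    """Return (precision, recall, f_measure) of a cluster against a positive set."""
    tp = len(cluster & positives)  # overlap between the cluster and the positives
    precision = tp / len(cluster) if cluster else 0.0      # tp / (tp + fp)
    recall = tp / len(positives) if positives else 0.0     # tp / (tp + fn)
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical example: 2 of the 4 cluster members are among the 8 positives.
cluster = {"p1", "p2", "p3", "p4"}
positives = {"p3", "p4", "p5", "p6", "p7", "p8", "p9", "p10"}
print(precision_recall_f(cluster, positives))
```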



-------------【Comparison with known complexes】-----------------------------

【Compare with known complexes】 OS/Kc/Pc

To evaluate the effectiveness of an algorithm for detecting protein complexes, we compare the predicted clusters produced by the algorithm with known protein complexes. The overlapping score OS(Pc,Kc) between a predicted cluster Pc and a known complex Kc is calculated by the following formula:

OS(Pc,Kc) = i*i/(a*b)

where i is the size of the intersection of the predicted cluster and the known complex, a is the size of the predicted cluster, and b is the size of the known complex.
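
A minimal sketch of the overlapping score, assuming both the predicted cluster and the known complex are sets of protein names (the function name and the sample sets are hypothetical, and returning 0 for an empty set is an assumption):

```python
def overlap_score(predicted, known):
    """OS(Pc, Kc) = i^2 / (a * b), with i the intersection size, a = |Pc|, b = |Kc|."""
    if not predicted or not known:
        return 0.0  # assumption: the score is undefined for empty sets, report 0
    i = len(predicted & known)
    return i * i / (len(predicted) * len(known))


# Two shared proteins, cluster of size 4, complex of size 3: 2*2 / (4*3).
print(overlap_score({"A", "B", "C", "D"}, {"C", "D", "E"}))
# Identical sets give the maximum score of 1.
print(overlap_score({"A", "B"}, {"A", "B"}))
```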

【Sn & Sp】

Sensitivity and specificity are two important measures of the performance of algorithms for detecting protein complexes. A predicted cluster Pc and a known complex Kc are considered matched if OS(Pc,Kc) ≥ os, where os is the overlapping threshold. The default os value is 0.2, but here you can set any value you like.

Sensitivity is the fraction of true positives out of the true positives and false negatives together:

Sn = TP/(TP+FN)

where TP (true positive) is the number of predicted clusters matched by known complexes with OS(Pc,Kc) ≥ os, and FN (false negative) is the number of known complexes that are not matched by any predicted cluster.

Specificity is the fraction of true-positive predictions out of all positive predictions:

Sp = TP/(TP+FP)

where FP (false positive) equals the total number of predicted clusters minus TP.
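
The definitions of Sn and Sp can be sketched as follows. This assumes that a predicted cluster counts as a true positive when it matches at least one known complex, and that a known complex counts as a false negative when no predicted cluster matches it. The function names and the sample data are hypothetical.

```python
def overlap_score(predicted, known):
    """OS(Pc, Kc) = i^2 / (a * b)."""
    if not predicted or not known:
        return 0.0
    i = len(predicted & known)
    return i * i / (len(predicted) * len(known))


def sensitivity_specificity(predicted, known, os_threshold=0.2):
    """Return (Sn, Sp) for predicted clusters against known complexes."""
    def matched(pc, kc):
        return overlap_score(pc, kc) >= os_threshold

    # TP: predicted clusters that match at least one known complex.
    tp = sum(1 for pc in predicted if any(matched(pc, kc) for kc in known))
    # FN: known complexes that no predicted cluster matches.
    fn = sum(1 for kc in known if not any(matched(pc, kc) for pc in predicted))
    # FP: the remaining predicted clusters.
    fp = len(predicted) - tp
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tp / (tp + fp) if tp + fp else 0.0
    return sn, sp


# Hypothetical data: 2 of 4 predictions match, 1 of 3 known complexes is missed.
predicted = [{"A", "B", "C"}, {"D", "E"}, {"X", "Y"}, {"M", "N"}]
known = [{"A", "B", "C", "D"}, {"E", "F", "G", "H", "I"}, {"M", "N"}]
print(sensitivity_specificity(predicted, known))
```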