Evaluation Methods for Cluster Analysis (Applied to Bioinformatics)
2014-03-16 22:38
-----------------------【Basic information】--------------------------------
【Density】
Density describes how closely the nodes in a cluster interact with each other. For a cluster consisting of n nodes and m edges, the density is 2m/(n(n-1)), that is, the fraction of all possible node pairs that are actually connected by an edge.
【Size distribution】
The size distribution summarizes the clustering result as a chart that shows, for each cluster size (number of nodes), how many clusters have that size.
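As a minimal sketch of the two basic measures above: the function names, and the convention that a cluster with fewer than two nodes has density 0, are assumptions made for illustration and do not come from any particular tool.

```python
from collections import Counter


def cluster_density(n_nodes, n_edges):
    """Density 2m / (n(n-1)) of a cluster with n nodes and m edges."""
    if n_nodes < 2:
        # Assumption: density is undefined for n < 2, so report 0.0.
        return 0.0
    return 2.0 * n_edges / (n_nodes * (n_nodes - 1))


def size_distribution(clusters):
    """Map each cluster size to the number of clusters of that size."""
    return dict(Counter(len(cluster) for cluster in clusters))


# A 4-node cluster with all 6 possible edges is fully connected.
print(cluster_density(4, 6))
# Two clusters of size 3 and one of size 2.
print(size_distribution([{"a", "b", "c"}, {"d", "e", "f"}, {"g", "h"}]))
```

The dictionary returned by `size_distribution` can be passed directly to any plotting library to draw the chart.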
-----------------------------【c-score】-------------------------------------
【Gene Annotation】
1. GO (Gene Ontology): Biological Process (BP) / Molecular Function (MF) / Cellular Component (CC)
2. MIPS: Munich Information Center for Protein Sequences, a genomics research center in Germany
3. Other
【p-value】
To detect the functional characteristics of the predicted clusters, we compare them with a known functional classification. A P-value based on the hypergeometric distribution is often used to estimate whether a given set of proteins has accumulated by chance. It has been used as a criterion for assigning a main function to each predicted cluster. Here, we also calculate the P-value of each predicted cluster for every function category and assign the cluster the category at which the minimum P-value occurs.
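The text does not spell out the formula, so the sketch below uses one common form of the hypergeometric enrichment test: the probability of seeing at least k annotated proteins in a cluster of size n, drawn from a background of N proteins of which K carry the annotation. The function names, the `annotations` dictionary, and the toy data are hypothetical.

```python
from math import comb


def hypergeom_pvalue(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: background size, K: proteins annotated with the function,
    n: cluster size, k: annotated proteins inside the cluster.
    """
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(n, K) + 1))
    return tail / total


def assign_main_function(cluster, annotations, background_size):
    """Return (function, p_value) for the category with minimum P-value."""
    best = None
    for function, members in annotations.items():
        k = len(cluster & members)
        p = hypergeom_pvalue(background_size, len(members), len(cluster), k)
        if best is None or p < best[1]:
            best = (function, p)
    return best


# Toy example: 10 background proteins, two hypothetical categories.
annotations = {"A": {1, 2, 3}, "B": {7, 8, 9}}
print(assign_main_function({1, 2, 3}, annotations, background_size=10))
```

Exact integer arithmetic with `math.comb` avoids an extra dependency. For large backgrounds a statistics library would be the more usual choice, and a multiple-testing correction is often applied on top; neither is stated in the text.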
【precision】
The precision of a cluster is the number of true positives divided by the total number of elements labeled as belonging to the positive class: precision = tp/(tp+fp), where tp is the number of overlapping nodes and tp+fp is the number of nodes in the cluster.
【recall】
Recall is the number of true positives divided by the total number of elements that actually belong to the positive class: recall = tp/(tp+fn), where tp is the number of overlapping nodes and tp+fn is the number of nodes in the background set.
【f-measure】
The traditional F-measure, or balanced F-score, combines precision and recall as their harmonic mean: f-measure = 2*precision*recall/(precision+recall).
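The three definitions above can be sketched together as follows. It is assumed here that the "background" is the reference set of nodes that truly belong to the positive class (for example, all proteins annotated with the assigned function); the function name and the zero-division convention are likewise illustrative.

```python
def precision_recall_f(cluster, reference):
    """Return (precision, recall, f_measure) for a cluster vs. a reference set."""
    tp = len(cluster & reference)  # size of the overlap
    precision = tp / len(cluster) if cluster else 0.0      # tp / (tp + fp)
    recall = tp / len(reference) if reference else 0.0     # tp / (tp + fn)
    if precision + recall == 0:
        # Assumption: report 0.0 when there is no overlap at all.
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


cluster = {"a", "b", "c", "d"}
reference = {"c", "d", "e", "f", "g", "h"}
print(precision_recall_f(cluster, reference))
```

In this example the overlap is 2 nodes, so precision is 2/4 and recall is 2/6.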
-------------【Comparison with known complexes】-----------------------------
【Compare with known complexes】 OS/Kc/Pc
To evaluate the effectiveness of an algorithm for detecting protein complexes, we compare the predicted clusters it produces with known protein complexes. The overlapping score OS(Pc,Kc) between a predicted cluster Pc and a known complex Kc is calculated by the following formula: OS(Pc,Kc) = i*i/(a*b), where i is the size of the intersection of the predicted cluster and the known complex, a is the size of the predicted cluster, and b is the size of the known complex.
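A minimal sketch of the overlapping score, with clusters and complexes represented as Python sets of protein identifiers. The function name and the handling of empty sets are assumptions.

```python
def overlap_score(predicted, known):
    """OS(Pc, Kc) = i*i / (a*b) for a predicted cluster and a known complex."""
    if not predicted or not known:
        # Assumption: an empty set cannot match anything.
        return 0.0
    i = len(predicted & known)  # size of the intersection
    return i * i / (len(predicted) * len(known))


# i = 3, a = 4, b = 4, so OS = 9/16.
print(overlap_score({1, 2, 3, 4}, {2, 3, 4, 5}))
```

The score is 1 only when the two sets are identical and 0 when they share no proteins.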
【Sn & Sp】
Sensitivity and specificity are two important aspects of the performance of algorithms for detecting protein complexes. A predicted cluster and a known complex are considered matched if OS(Pc,Kc) ≥ os, where os is the matching threshold; 0.2 is generally used and is the default, but you can set any value you like. Sensitivity is the fraction of the true-positive predictions out of all the true predictions, defined by the following formula: Sn = TP/(TP+FN), where TP (true positive) is the number of predicted clusters matched by the known complexes and FN (false negative) is the number of known complexes that are not matched by the predicted clusters. Specificity is the fraction of the true-positive predictions out of all the positive predictions, defined by the following formula: Sp = TP/(TP+FP), where FP (false positive) equals the total number of predicted clusters minus TP.
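The counting rules above can be sketched as follows, reading "matched" as "has at least one partner with OS at or above the threshold". The function names and the toy data are hypothetical, and the overlapping score is repeated here so that the block runs on its own.

```python
def overlap_score(predicted, known):
    """OS(Pc, Kc) = i*i / (a*b)."""
    if not predicted or not known:
        return 0.0
    i = len(predicted & known)
    return i * i / (len(predicted) * len(known))


def sensitivity_specificity(predicted_clusters, known_complexes, os_threshold=0.2):
    """Return (Sn, Sp) following the TP/FN/FP definitions in the text."""
    # TP: predicted clusters matched by at least one known complex.
    tp = sum(
        1 for p in predicted_clusters
        if any(overlap_score(p, k) >= os_threshold for k in known_complexes)
    )
    # FN: known complexes not matched by any predicted cluster.
    fn = sum(
        1 for k in known_complexes
        if not any(overlap_score(p, k) >= os_threshold for p in predicted_clusters)
    )
    # FP: all remaining predicted clusters.
    fp = len(predicted_clusters) - tp
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tp / (tp + fp) if tp + fp else 0.0
    return sn, sp


predicted = [{1, 2, 3}, {4, 5}, {9, 10}]
known = [{1, 2, 3, 4}, {20, 21}]
print(sensitivity_specificity(predicted, known))
```

In this toy example only the first predicted cluster matches a known complex (OS = 9/12), so TP = 1, FN = 1 and FP = 2.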