Performance Measures and Evaluation on IR System
2014-02-28 13:30
All common measures generally assume a ground-truth notion of relevance: every document is known to be either relevant or non-relevant to a particular query.
1. Precision and Recall
Precision is the fraction of the retrieved documents that are relevant to the user’s information need.
Recall is the fraction of the documents relevant to the query that are successfully retrieved.
Let A be the set of retrieved documents and B the set of relevant documents. Then:
Precision = |A ∩ B| / |A|
Recall = |A ∩ B| / |B|
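As a minimal sketch, the two set-based definitions can be computed directly; the document IDs and relevance judgments below are hypothetical:

```python
def precision(retrieved, relevant):
    # Fraction of retrieved documents that are relevant.
    return len(retrieved & relevant) / len(retrieved)

def recall(retrieved, relevant):
    # Fraction of relevant documents that were retrieved.
    return len(retrieved & relevant) / len(relevant)

retrieved = {"d1", "d2", "d3", "d4"}   # hypothetical search results
relevant = {"d1", "d3", "d5"}          # hypothetical ground-truth judgments

print(precision(retrieved, relevant))  # 2/4 = 0.5
print(recall(retrieved, relevant))     # 2/3 ≈ 0.667
```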
2. Fall-out
Fall-out is the proportion of non-relevant documents that are retrieved, out of all non-relevant documents available:
Fall-out = |non-relevant ∩ retrieved| / |non-relevant|
It can be looked at as the probability that a non-relevant document is retrieved by a query.
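Continuing the set-based sketch (the corpus and judgments are again hypothetical), fall-out only needs the set of non-relevant documents in the collection:

```python
def fallout(retrieved, relevant, collection):
    # Fraction of non-relevant documents that were (wrongly) retrieved.
    non_relevant = collection - relevant
    return len(retrieved & non_relevant) / len(non_relevant)

collection = {"d1", "d2", "d3", "d4", "d5", "d6"}  # hypothetical corpus
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d1", "d3", "d5"}

# 3 non-relevant docs (d2, d4, d6), of which 2 were retrieved.
print(fallout(retrieved, relevant, collection))  # 2/3 ≈ 0.667
```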
3. F-measure
F-measure or F-score is the weighted harmonic mean of precision and recall.
The traditional F-measure or balanced F-score is:
F1 = 2 · P · R / (P + R)
The general formula for non-negative real β is:
Fβ = (1 + β²) · P · R / (β² · P + R)
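A small sketch of the general formula; the precision and recall values are made up for illustration:

```python
def f_beta(p, r, beta=1.0):
    # Weighted harmonic mean: F_beta = (1 + beta^2)·P·R / (beta^2·P + R).
    # beta > 1 weights recall more heavily; beta < 1 weights precision.
    return (1 + beta**2) * p * r / (beta**2 * p + r)

p, r = 0.5, 0.8
print(f_beta(p, r))          # balanced F1 ≈ 0.615
print(f_beta(p, r, beta=2))  # F2 ≈ 0.714, favouring recall
```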
4. Average Precision
By computing precision and recall at every position in the ranked sequence of documents, one can plot a precision-recall curve, plotting precision p(r) as a function of recall r.
Average precision computes the average value of p(r) over the interval from r = 0 to r = 1:
AveP = ∫₀¹ p(r) dr
In practice this integral is replaced with a finite sum over every position in the ranked sequence of documents:
AveP = Σₖ₌₁ⁿ P(k) · Δr(k)
where k is the rank in the sequence of retrieved documents, n is the number of retrieved documents, P(k) is the precision at cut-off k in the list, and Δr(k) is the change in recall from item k−1 to k.
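Since Δr(k) is 1/R at each rank that retrieves a relevant document (R being the total number of relevant documents) and 0 elsewhere, the sum reduces to averaging P(k) over the relevant positions. A sketch on a hypothetical ranking:

```python
def average_precision(ranking, relevant):
    # ranking: retrieved documents in rank order; relevant: ground-truth set.
    hits, total = 0, 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k   # P(k) at each rank where recall increases
    return total / len(relevant)

ranking = ["d1", "d6", "d3", "d2", "d5"]
relevant = {"d1", "d3", "d5"}
print(average_precision(ranking, relevant))  # (1/1 + 2/3 + 3/5) / 3 ≈ 0.756
```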
5. R-Precision
Precision at the R-th position in the ranking of results for a query that has R relevant documents.
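R-precision is a one-liner on the same hypothetical ranking used above:

```python
def r_precision(ranking, relevant):
    # Precision at rank R, where R = number of relevant documents.
    R = len(relevant)
    return len(set(ranking[:R]) & relevant) / R

ranking = ["d1", "d6", "d3", "d2", "d5"]
relevant = {"d1", "d3", "d5"}
print(r_precision(ranking, relevant))  # top 3 contains d1 and d3: 2/3
```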
6. Mean average precision
Mean average precision for a set of queries is the mean of the average precision scores for each query:
MAP = (1/Q) · Σ_{q=1}^{Q} AveP(q)
where Q is the number of queries.
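The aggregation itself is just an arithmetic mean; the per-query AP scores below are made up:

```python
# Hypothetical average-precision score for each of Q = 3 queries.
ap_scores = {"q1": 0.76, "q2": 0.50, "q3": 0.90}

map_score = sum(ap_scores.values()) / len(ap_scores)
print(map_score)  # 0.72
```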
7. Discounted cumulative gain
DCG uses a graded relevance scale of documents from the result set to evaluate the usefulness, or gain, of a document based on its position in the result list.
The DCG accumulated at a particular rank position p is defined as:
DCG_p = rel₁ + Σᵢ₌₂ᵖ relᵢ / log₂(i)
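A sketch of the definition above, using hypothetical graded relevance labels (0–3) for the top five ranks:

```python
import math

def dcg(gains, p):
    # DCG_p = rel_1 + sum_{i=2}^{p} rel_i / log2(i)
    return gains[0] + sum(g / math.log2(i)
                          for i, g in enumerate(gains[1:p], start=2))

gains = [3, 2, 3, 0, 1]  # hypothetical relevance grades at ranks 1..5
print(dcg(gains, 5))     # 3 + 2/log2(2) + 3/log2(3) + 0 + 1/log2(5) ≈ 7.323
```

The log₂(i) discount means a highly relevant document contributes less gain the further down the list it appears.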
Precision and Recall
1. Information Retrieval
Precision is defined as the number of relevant documents retrieved by a search divided by the total number of documents retrieved by that search.
Recall is defined as the number of relevant documents retrieved by a search divided by the total number of existing relevant documents.
2. Classification task
Precision is defined as the number of true positives divided by the total number of elements labeled as belonging to the positive class
(i.e. the sum of true positives and false positives). Precision is also called positive predictive value (PPV).
Recall is defined as the number of true positives divided by the total number of elements that actually belong to the positive class (i.e. the
sum of true positives and false negatives). Recall is also called sensitivity or true positive rate.
3. Relationship
Often, there is an inverse relationship between precision and recall. Usually, precision and recall scores are not discussed in isolation. Instead, either values for one measure are compared at a fixed level of the other measure, or both are combined into a single measure (such as the F-measure).
Confusion Matrix (contingency table)
Each column of the matrix represents the instances in a predicted class, while each row represents the instances in an actual class.
A confusion matrix allows more detailed analysis than accuracy alone. Accuracy is not a reliable metric for the real performance of a classifier, because it yields misleading results if the data set is unbalanced (that is, when the numbers of samples in different classes vary greatly).
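The imbalance problem is easy to see from the four confusion-matrix counts; the numbers below are hypothetical (990 negatives, 10 positives, and a degenerate classifier that predicts negative for everything):

```python
# Confusion-matrix cells for a binary classifier that never predicts positive.
tp, fn = 0, 10    # all 10 actual positives are missed
fp, tn = 0, 990   # all 990 actual negatives are labeled correctly

accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)
print(accuracy)  # 0.99 -- looks excellent
print(recall)    # 0.0  -- yet not a single positive is found
```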
Reference:
[1] http://en.wikipedia.org/wiki/Information_retrieval
[2] http://en.wikipedia.org/wiki/Precision_and_recall
[3] http://en.wikipedia.org/wiki/Confusion_matrix