IIR Chapter 8: Evaluation in information retrieval
1, Test Collection
(1) A document collection.
(2) A test suite of information needs, expressible as queries.
(3) A set of relevance judgments, standardly a binary assessment of either relevant or non-relevant for each query-document pair.
Relevance is assessed relative to an information need, not to the query itself.
2, Standard test collections
(1) Cranfield collection: the pioneering test collection, allowing precise quantitative measures of information retrieval effectiveness.
(2) Text Retrieval Conference (TREC)
(3) NII Test Collections for IR Systems (NTCIR): has built various test collections of similar sizes to the TREC collections, focusing on East Asian language and cross-language information retrieval.
(4) Cross Language Evaluation Forum (CLEF): concentrating on European languages and cross-language information retrieval.
(5) Reuters-21578 and Reuters-RCV1
(6) 20 Newsgroups
3, Evaluation of unranked retrieval sets
(1) Precision: the fraction of retrieved documents that are relevant.

(2) Recall: the fraction of relevant documents that are retrieved.

P = tp / (tp + fp)
R = tp / (tp + fn)
accuracy = (tp + tn) / (tp + fp + fn + tn)
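These definitions translate directly into code. A minimal sketch in Python; the counts used in the example (40 true positives, 10 false positives, 20 false negatives, 930 true negatives) are hypothetical:

```python
def precision(tp, fp):
    """Fraction of retrieved documents that are relevant: P = tp / (tp + fp)."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of relevant documents that are retrieved: R = tp / (tp + fn)."""
    return tp / (tp + fn)

def accuracy(tp, fp, fn, tn):
    """Fraction of all classifications that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)

# Hypothetical counts over a 1000-document collection:
print(precision(40, 10))          # 0.8
print(recall(40, 20))             # ~0.667
print(accuracy(40, 10, 20, 930))  # 0.97
```

Note that accuracy can look high on a skewed collection (here most documents are non-relevant), which is why precision and recall are preferred in IR.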
4, ROC Curve
An ROC curve plots the true positive rate (or sensitivity) against the false positive rate (or 1 − specificity). Sensitivity is just another term for recall.
false positive rate = fp / (fp + tn)
specificity = tn / (fp + tn)
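The two rates above can be sketched as small helpers; the example counts (10 non-relevant documents retrieved, 90 correctly rejected) are hypothetical, and the identity FPR = 1 − specificity falls out directly:

```python
def true_positive_rate(tp, fn):
    # Sensitivity, i.e. recall: fraction of relevant documents retrieved.
    return tp / (tp + fn)

def false_positive_rate(fp, tn):
    # Fraction of non-relevant documents incorrectly retrieved.
    return fp / (fp + tn)

def specificity(fp, tn):
    # Fraction of non-relevant documents correctly rejected.
    return tn / (fp + tn)

# Hypothetical counts:
fpr = false_positive_rate(10, 90)  # 0.1
spec = specificity(10, 90)         # 0.9, so fpr == 1 - spec
```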
5, kappa statistic
A common measure of agreement between judges, designed for categorical judgments; it corrects a simple agreement rate for the rate of chance agreement.
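As a sketch, one common formulation estimates chance agreement from the two judges' pooled marginals (an assumption on my part; other variants use per-judge marginals). The judgment lists in the example are hypothetical:

```python
def kappa(judge1, judge2):
    """Kappa agreement between two judges' binary relevance judgments.

    kappa = (P(A) - P(E)) / (1 - P(E)), where P(A) is the observed
    agreement rate and P(E) the agreement rate expected by chance,
    here estimated from the pooled marginals of both judges.
    """
    n = len(judge1)
    p_agree = sum(a == b for a, b in zip(judge1, judge2)) / n
    # Pooled probability that a judge marks a document relevant.
    p_rel = (sum(judge1) + sum(judge2)) / (2 * n)
    p_chance = p_rel ** 2 + (1 - p_rel) ** 2
    return (p_agree - p_chance) / (1 - p_chance)

# Hypothetical judgments (1 = relevant, 0 = non-relevant):
print(kappa([1, 1, 0, 0], [1, 0, 0, 0]))  # ~0.467
print(kappa([1, 0, 1, 0], [1, 0, 1, 0]))  # 1.0 (perfect agreement)
```

Kappa is 1 for perfect agreement, 0 for agreement no better than chance, and negative when judges agree less often than chance would predict.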