Thursday, February 13, 2014

IS2140_Reading notes_Unit 6


IIR chapter 8.

Chapter 8 Evaluation in information retrieval

 

1, Test Collection

  (1). A document collection

  (2). A test suite of information needs, expressible as queries

  (3). A set of relevance judgments, standardly a binary assessment of either relevant or non-relevant for each query-document pair.

         Relevance is assessed relative to an information need.

 

2, Standard test collections

(1)   Cranfield collection: the pioneering test collection, allowing precise quantitative measures of information retrieval effectiveness.

(2)   Text Retrieval Conference (TREC)

(3)   NII Test Collections for IR Systems (NTCIR): has built various test collections of similar sizes to the TREC collections, focusing on East Asian languages and cross-language information retrieval.

(4)   Cross Language Evaluation Forum (CLEF): concentrating on European languages and cross-language information retrieval.

(5)   Reuters-21578 and Reuters-RCV1

(6)   20 Newsgroups

 

3, Evaluation of unranked retrieval sets

(1)   Precision: the fraction of retrieved documents that are relevant

(2)   Recall: the fraction of relevant documents that are retrieved

P = tp / (tp + fp)

R = tp / (tp + fn)

 

 

accuracy = (tp + tn) / (tp + fp + fn + tn)
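The three formulas above can be sketched directly from the contingency-table counts; the counts below are made-up illustration values, not from the book:

```python
def precision(tp, fp):
    """Fraction of retrieved documents that are relevant."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of relevant documents that are retrieved."""
    return tp / (tp + fn)

def accuracy(tp, fp, fn, tn):
    """Fraction of all classification decisions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)

# Hypothetical counts: 40 relevant retrieved, 10 nonrelevant retrieved,
# 20 relevant missed, 930 nonrelevant correctly not retrieved.
tp, fp, fn, tn = 40, 10, 20, 930
print(precision(tp, fp))          # 0.8
print(recall(tp, fn))             # ≈ 0.667
print(accuracy(tp, fp, fn, tn))   # 0.97
```

Note how the large tn count makes accuracy look high even though a third of the relevant documents were missed, which is why precision and recall are preferred for IR evaluation.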

 

4, ROC Curve

An ROC curve plots the true positive rate, or sensitivity, against the false positive rate, or (1 − specificity). Sensitivity is just another term for recall.

The false positive rate = fp / (fp + tn).

specificity = tn / (fp + tn)
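A minimal sketch of the ROC-related rates defined above, using the same kind of hypothetical contingency-table counts as before:

```python
def sensitivity(tp, fn):
    """True positive rate; another name for recall."""
    return tp / (tp + fn)

def false_positive_rate(fp, tn):
    """Fraction of nonrelevant documents that were retrieved."""
    return fp / (fp + tn)

def specificity(fp, tn):
    """Fraction of nonrelevant documents correctly not retrieved."""
    return tn / (fp + tn)

tp, fp, fn, tn = 40, 10, 20, 930  # hypothetical counts
print(sensitivity(tp, fn))           # ≈ 0.667
print(false_positive_rate(fp, tn))   # ≈ 0.0106
print(specificity(fp, tn))           # ≈ 0.9894
```

Since fp / (fp + tn) + tn / (fp + tn) = 1, the x-axis of the ROC curve can equivalently be written as (1 − specificity).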

 

5, kappa statistic

 A common measure of agreement between judges, designed for categorical judgments; it corrects a simple agreement rate for the rate of chance agreement.
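The kappa statistic is kappa = (P(A) − P(E)) / (1 − P(E)), where P(A) is the observed proportion of agreement and P(E) is the agreement expected by chance. A hedged sketch for two judges making binary relevance judgments, using pooled marginals to estimate chance agreement (the counts are illustrative, not prescribed by the chapter):

```python
def kappa(a, b, c, d):
    """Kappa for two judges on binary relevance judgments.

    a: both judges say relevant
    b: judge 1 relevant, judge 2 nonrelevant
    c: judge 1 nonrelevant, judge 2 relevant
    d: both judges say nonrelevant
    """
    n = a + b + c + d
    p_agree = (a + d) / n                    # observed agreement, P(A)
    # Pooled marginal probability of a "relevant" judgment across both judges:
    p_rel = (2 * a + b + c) / (2 * n)
    p_nonrel = 1 - p_rel
    p_chance = p_rel ** 2 + p_nonrel ** 2    # chance agreement, P(E)
    return (p_agree - p_chance) / (1 - p_chance)

# Illustrative counts: 300 agreed relevant, 70 agreed nonrelevant,
# 30 disagreements out of 400 judged pairs.
print(round(kappa(300, 20, 10, 70), 3))  # 0.776
```

A kappa of 1 means perfect agreement, 0 means agreement no better than chance; values above roughly 0.8 are conventionally read as good agreement.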

 

 
