Thursday, February 6, 2014

IS2140_Reading notes_Unit 5


IIR chapters 11 and 12

Chapter 11

1, Basic probability theory

chain rule: P(A, B) = P(A ∩ B) = P(A|B)P(B) = P(B|A)P(A)

partition rule: P(B) = P(A, B) + P(Ā, B), where Ā is the complement of A
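A quick numeric check of both rules (the joint probabilities below are made-up values for illustration):

```python
# Hypothetical joint distribution over two binary events A and B.
p_joint = {
    (1, 1): 0.2,  # P(A=1, B=1)
    (1, 0): 0.3,  # P(A=1, B=0)
    (0, 1): 0.1,  # P(A=0, B=1)
    (0, 0): 0.4,  # P(A=0, B=0)
}

# Partition rule: P(B=1) = P(A=1, B=1) + P(A=0, B=1)
p_b = p_joint[(1, 1)] + p_joint[(0, 1)]

# Chain rule: P(A=1, B=1) = P(A=1 | B=1) * P(B=1)
p_a_given_b = p_joint[(1, 1)] / p_b
assert abs(p_a_given_b * p_b - p_joint[(1, 1)]) < 1e-12
```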

 

2, The Probability Ranking Principle

(1)    The 1/0 loss case

Probability Ranking Principle (PRP):

 

For a query q and a document d in the collection, let Rd,q be an indicator random variable that says whether d is relevant with respect to a given query q. That is, it takes on a value of 1 when the document is relevant and 0 otherwise. Using a probabilistic model, the obvious order in which to present documents to the user is to rank documents by their estimated probability of relevance with respect to the information need: P(R = 1|d, q).

 

         1/0 loss: a binary situation where you are evaluated on your accuracy; you lose a point for either returning a nonrelevant document or failing to return a relevant one

d is relevant iff P(R = 1|d, q) > P(R = 0|d, q)   (11.6)
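A minimal sketch of PRP ranking under 1/0 loss (the relevance probabilities below are made-up, not estimated from data):

```python
# Hypothetical relevance probabilities P(R = 1 | d, q) for a few documents.
p_relevant = {"d1": 0.8, "d2": 0.3, "d3": 0.55}

# PRP ranking: present documents in decreasing order of P(R = 1 | d, q).
ranking = sorted(p_relevant, key=p_relevant.get, reverse=True)

# 1/0 loss decision: retrieve d iff P(R=1|d,q) > P(R=0|d,q) = 1 - P(R=1|d,q),
# i.e. iff P(R=1|d,q) > 0.5.
retrieved = [d for d in ranking if p_relevant[d] > 0.5]

print(ranking)    # ['d1', 'd3', 'd2']
print(retrieved)  # ['d1', 'd3']
```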

(2)    The PRP with retrieval cost

        C1: the cost of not retrieving a relevant document

        C0: the cost of retrieving a nonrelevant document

Then the Probability Ranking Principle says that if, for a specific document d and for all documents d′ not yet retrieved,

        C0 · P(R = 0|d) − C1 · P(R = 1|d) ≤ C0 · P(R = 0|d′) − C1 · P(R = 1|d′)

then d is the next document to be retrieved.
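As a sketch of the cost-based version (costs and probabilities below are made-up): the next document to retrieve is the one minimizing C0 · P(R = 0|d) − C1 · P(R = 1|d) over all documents not yet retrieved.

```python
# Hypothetical costs and relevance probabilities.
C1 = 2.0   # cost of not retrieving a relevant document
C0 = 1.0   # cost of retrieving a nonrelevant document
p_rel = {"d1": 0.2, "d2": 0.7, "d3": 0.5}

def expected_cost(d):
    # C0 * P(R=0|d) - C1 * P(R=1|d): smaller is better under the cost-based PRP.
    return C0 * (1 - p_rel[d]) - C1 * p_rel[d]

# The next document to retrieve minimizes this expression over unretrieved docs.
next_doc = min(p_rel, key=expected_cost)
print(next_doc)  # 'd2'
```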

 

3, The Binary Independence Model

The Binary Independence Model (BIM) we present in this section is the model that has traditionally been used with the PRP. It introduces some simple assumptions, which make estimating the probability function P(R|d, q) practical.

A document d is represented by the vector x = (x1, . . . , xM), where xt = 1 if term t is present in document d and xt = 0 if t is not present in d.
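Building that binary term-incidence vector is straightforward; here is a toy sketch (the vocabulary and document text are made-up):

```python
# BIM representation: binary vector over the vocabulary, ignoring term counts.
vocabulary = ["information", "retrieval", "model", "query"]  # made-up vocab
doc = "the probability ranking principle is a retrieval model"

terms_in_doc = set(doc.split())
x = [1 if t in terms_in_doc else 0 for t in vocabulary]
print(x)  # [0, 1, 1, 0]
```

Note that only presence/absence is recorded; term frequency is discarded, which is exactly the "binary" in Binary Independence Model.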

 

 

Chapter 12 

1, Finite automata and language models

A finite automaton can generate strings, including the example strings it was built from. The full set of strings that it can generate is called the language of the automaton.
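A minimal sketch of a one-state generative automaton: at each step it either stops or emits a word drawn from a distribution (the vocabulary and probabilities below are made-up):

```python
import random

# One-state automaton: emission probabilities over a tiny made-up vocabulary,
# plus a stop probability at each step.
words = {"frog": 0.4, "said": 0.3, "likes": 0.3}
p_stop = 0.2

def generate(rng):
    out = []
    while rng.random() > p_stop:      # continue with probability 1 - p_stop
        r = rng.random()
        cum = 0.0
        for w, p in words.items():    # sample a word from the distribution
            cum += p
            if r < cum:
                out.append(w)
                break
    return out

print(generate(random.Random(0)))
```

Every string this automaton can emit belongs to its language: all finite sequences over the vocabulary.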

 

2, The query likelihood model

In the query likelihood model, we construct from each document d in the collection a language model Md, and then rank documents by P(d|q), where the probability of a document is interpreted as the likelihood that it is relevant to the query.

P(d|q) = P(q|d)P(d)/P(q)

The Language Modeling approach thus attempts to model the query generation process: documents are ranked by the probability that the query would be observed as a random sample from the respective document model. The most common choice is the multinomial unigram language model, which is equivalent to a multinomial Naive Bayes model.
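A toy sketch of query-likelihood scoring with a multinomial unigram model, using raw maximum-likelihood estimates (the documents are made-up; a real system would smooth these estimates so that unseen query terms do not zero out the score):

```python
from collections import Counter

# Score documents by P(q|d) under a unigram model Md estimated from each d.
docs = {
    "d1": "click go the shears boys click click click",
    "d2": "click click metal click",
}
query = "click shears".split()

def query_likelihood(doc_text):
    tokens = doc_text.split()
    counts = Counter(tokens)
    n = len(tokens)
    p = 1.0
    for t in query:
        p *= counts[t] / n   # MLE of P(t | Md); zero if t is absent from d
    return p

scores = {d: query_likelihood(text) for d, text in docs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['d1', 'd2'] -- only d1 contains both query terms
```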

 

 
