IIR chapters 11 and 12
Chapter 11
1, Probability basics
chain rule: P(A, B) = P(A ∩ B) = P(A|B)P(B) = P(B|A)P(A)
partition rule: P(B) = P(A, B) + P(Ā, B), where Ā denotes the complement of event A
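These two rules can be checked numerically on a toy joint distribution. The numbers below are made-up values for illustration, not from the book:

```python
# Toy joint distribution over two binary events A and B,
# used to verify the chain rule and the partition rule.
P = {  # P[(a, b)] = P(A = a, B = b); illustrative values
    (1, 1): 0.20, (1, 0): 0.30,
    (0, 1): 0.10, (0, 0): 0.40,
}

def marginal_B(b):
    # Partition rule: P(B) = P(A, B) + P(not-A, B)
    return P[(1, b)] + P[(0, b)]

def conditional(a, b):
    # P(A = a | B = b) = P(A = a, B = b) / P(B = b)
    return P[(a, b)] / marginal_B(b)

# Chain rule: P(A, B) = P(A | B) * P(B)
assert abs(conditional(1, 1) * marginal_B(1) - P[(1, 1)]) < 1e-12
print(marginal_B(1))  # 0.2 + 0.1 = 0.3
```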
2, The Probability Ranking Principle
(1) The 1/0 loss case
Probability Ranking Principle (PRP):
For a query q and a document d in the collection, let Rd,q be an indicator random variable that says whether d is relevant with respect to a given query q. That is, it takes on a value of 1 when the document is relevant and 0 otherwise. Using a probabilistic model, the obvious order in which to present documents to the user is to rank documents by their estimated probability of relevance with respect to the information need: P(R = 1|d, q).
1/0 loss: a binary situation where you are evaluated on your accuracy.
d is relevant iff P(R = 1|d, q) > P(R = 0|d, q)    (11.6)
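The PRP under 1/0 loss amounts to sorting by the estimated relevance probability and, for a retrieve/don't-retrieve decision, comparing it to 1/2. A minimal sketch, where the per-document probabilities are hypothetical estimates rather than the output of any real model:

```python
# Hypothetical estimates of P(R = 1 | d, q) for three documents.
p_relevant = {
    "d1": 0.15,
    "d2": 0.80,
    "d3": 0.40,
}

# PRP: present documents in order of decreasing probability of relevance.
ranking = sorted(p_relevant, key=p_relevant.get, reverse=True)
print(ranking)  # ['d2', 'd3', 'd1']

# 1/0 loss decision (eq. 11.6): retrieve d iff
# P(R=1|d,q) > P(R=0|d,q) = 1 - P(R=1|d,q), i.e. iff P(R=1|d,q) > 0.5.
retrieved = [d for d in ranking if p_relevant[d] > 0.5]
print(retrieved)  # ['d2']
```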
(2) The PRP with retrieval cost
Let C1 be the cost of not retrieving a relevant document, and C0 the cost of retrieving a non-relevant document. Then the Probability Ranking Principle says that if, for a specific document d and for all documents d′ not yet retrieved,
C0 · P(R = 0|d) − C1 · P(R = 1|d) ≤ C0 · P(R = 0|d′) − C1 · P(R = 1|d′)
then d is the next document to be retrieved.
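In code, the cost-sensitive rule picks as the next document the one minimizing C0 · P(R = 0|d) − C1 · P(R = 1|d). The costs and probabilities below are illustrative assumptions:

```python
# Illustrative costs (assumed values, not from the book):
C0 = 1.0   # cost of retrieving a non-relevant document
C1 = 3.0   # cost of not retrieving a relevant document

# Hypothetical estimates of P(R = 1 | d) for unretrieved documents.
p_rel = {"d1": 0.2, "d2": 0.7, "d3": 0.5}

def expected_cost(d):
    # C0 * P(R = 0 | d) - C1 * P(R = 1 | d)
    p = p_rel[d]
    return C0 * (1 - p) - C1 * p

# The next document to retrieve is the one with minimal expected cost.
next_doc = min(p_rel, key=expected_cost)
print(next_doc)  # 'd2'
```

Note that with C0 = C1 this ordering coincides with ranking by P(R = 1|d), recovering the plain PRP.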
3, The Binary Independence Model
The Binary Independence Model (BIM) we present in this section is the model that has traditionally been used with the PRP. It introduces some simple assumptions, which make estimating the probability function P(R|d, q) practical.
A document d is represented by the vector x = (x1, . . . , xM), where xt = 1 if term t is present in document d and xt = 0 if t is not present in d.
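The binary representation above is easy to sketch: each document maps to a 0/1 vector over a fixed vocabulary. The vocabulary and document text here are toy examples:

```python
# BIM document representation: a binary vector over the vocabulary,
# recording term presence/absence (term frequency is ignored).
vocab = ["information", "retrieval", "model", "query"]  # toy vocabulary

def to_binary_vector(doc_text):
    # x_t = 1 if term t occurs in the document, else 0.
    terms = set(doc_text.lower().split())
    return [1 if t in terms else 0 for t in vocab]

print(to_binary_vector("a probabilistic retrieval model"))  # [0, 1, 1, 0]
```

Because only presence matters, "model model model" and "model" yield the same vector, which is exactly the information the BIM discards.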
Chapter 12
1, Finite automata and language models
The finite automaton can generate strings that include the examples. The full set of strings that can be generated is called the language of the automaton.
2, The query likelihood model
In the query likelihood model, we construct from each document d in the collection a language model Md, and then rank documents by P(d|q), where the probability of a document is interpreted as the likelihood that it is relevant to the query.
P(d|q) = P(q|d)P(d)/P(q)
The language modeling approach thus attempts to model the query generation process: documents are ranked by the probability that a query would be observed as a random sample from the respective document model. The most common choice is the multinomial unigram language model, which is equivalent to a multinomial Naive Bayes model.
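A minimal sketch of query likelihood scoring with a multinomial unigram model and maximum-likelihood estimates. The documents and query are toy data, and there is no smoothing, so any query term absent from a document drives P(q|Md) to zero (the book addresses this with smoothing):

```python
from collections import Counter

# Toy document collection (assumed example texts).
docs = {
    "d1": "click run select select",
    "d2": "frog toad frog said",
}

def score(query, doc_text):
    # P(q | Md) under a multinomial unigram LM with MLE:
    # P(t | Md) = tf(t, d) / |d|, multiplied over query terms.
    tokens = doc_text.split()
    counts = Counter(tokens)
    n = len(tokens)
    p = 1.0
    for term in query.split():
        p *= counts[term] / n  # Counter returns 0 for unseen terms
    return p

print(score("select run", docs["d1"]))  # (2/4) * (1/4) = 0.125
```

Ranking by P(q|d) equals ranking by P(d|q) when the document prior P(d) is uniform, since P(q) is constant across documents.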