Thursday, April 10, 2014

IS2140_Muddiest Points_Class12


Class 12

Integrating users’ clickthrough data is often the most effective approach. Does "clickthrough data" mean the total number of clicks on one specific link that is relevant to the query, or the clicks on all relevant links?

Thursday, April 3, 2014

IS2140_Reading notes_Unit 12

1,  Personal email sorting

A user may have folders like talk announcements, electronic bills, email from family and friends, and so on, and may want a classifier to classify each incoming email and automatically move it to the appropriate folder.
 
2,  Topic-specific or vertical search

Vertical search engines restrict searches to a particular topic. For example, the query computer science on a vertical search engine for the topic China will return a list of Chinese computer science departments with higher precision and recall than the query computer science China on a general purpose search engine.

3,  Labeling

Labeling refers to the process of annotating each document with its class. It still requires manual effort, but labeling is arguably an easier task than writing rules.

 
4,  Mutual information

Mutual information (MI) measures how much information, in the information-theoretic sense, a term contains about the class. Here U indicates whether a document contains the term and C indicates whether the document is in the class. If a term’s distribution is the same in the class as it is in the collection as a whole, then I(U; C) = 0. MI reaches its maximum value if the term is a perfect indicator for class membership, that is, if the term is present in a document if and only if the document is in the class.
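To make the two boundary cases concrete, here is a minimal sketch that computes I(U; C) from document counts. It assumes the usual maximum-likelihood estimates from a 2x2 table of counts. The function name mutual_information and the argument names n11, n10, n01, n00 are my own labels for illustration, not something fixed by the reading.

```python
import math


def mutual_information(n11, n10, n01, n00):
    """Estimate I(U; C) in bits from a 2x2 table of document counts.

    Hypothetical argument names (assumptions for this sketch):
      n11: documents that contain the term and are in the class
      n10: documents that contain the term and are not in the class
      n01: documents that lack the term and are in the class
      n00: documents that lack the term and are not in the class
    """
    n = n11 + n10 + n01 + n00
    # Each entry: (joint count, term marginal count, class marginal count).
    cells = [
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    ]
    mi = 0.0
    for joint, term_marginal, class_marginal in cells:
        # An empty cell contributes nothing (0 * log 0 is treated as 0).
        if joint > 0:
            mi += (joint / n) * math.log2(n * joint / (term_marginal * class_marginal))
    return mi


# Term distributed the same inside and outside the class: MI is 0.
independent = mutual_information(10, 10, 10, 10)

# Term present if and only if the document is in the class: MI is maximal.
perfect = mutual_information(50, 0, 0, 50)
```

With these counts the first call gives 0 and the second gives 1 bit, which equals the entropy of the class variable when half of the documents are in the class.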

IS2140_Muddiest Points_Class11

In query translation, the searcher's query is translated into the document language. I wonder whether the highly ranked web pages are then translated into the searcher's language or just stay in the document language.