Information retrieval (IR) is the process through which a computer system responds to a user's query for text-based information on a specific topic. IR was one of the first and remains one of the most important problems in the domain of natural language processing (NLP). Web search applies information retrieval techniques to the web, the largest corpus of text anywhere, and it is the area in which most people interact with IR systems most frequently. This course introduces the area of information retrieval and examines the theoretical and practical issues involved in designing, implementing, and evaluating information retrieval systems.
This course covers fundamental problems in information retrieval, including the building blocks of search engine systems, and surveys a range of IR applications such as personalized recommendation and online advertising. Students will get hands-on experience by developing practical systems and components. The course prepares students for cutting-edge research in information retrieval and related fields, which can open the door to job opportunities in the IT industry.
The following books are for your reference. The first book is the required textbook for this course.
- Introduction to Information Retrieval. Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schuetze, Cambridge University Press, 2008.
- Modern Information Retrieval (2nd Edition). Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Addison-Wesley, 2011.
- Search Engines: Information Retrieval in Practice. Bruce Croft, Donald Metzler, and Trevor Strohman, Pearson Education, 2009.
- Statistical Language Models for Information Retrieval. ChengXiang Zhai, Morgan & Claypool Publishers, 2008.
- Information Retrieval: Implementing and Evaluating Search Engines. Stefan Büttcher, Charles L. A. Clarke, and Gordon V. Cormack, MIT Press, 2010.
- Information Retrieval: Algorithms and Heuristics (2nd Edition). David A. Grossman and Ophir Frieder, Springer, 2004.
1. Search engine architecture
o Basic search engine architecture
o Web crawler and basic text processing techniques
o Inverted Index and Query processing
o Search result interface
2. Retrieval models
o Boolean and vector space model
o Latent Semantic Indexing
o Probabilistic ranking principle
o Language models
3. Retrieval evaluation
o Classic IR evaluations
o Modern IR evaluations
4. Relevance feedback
o Modeling feedback
o Modeling implicit feedback & Click modeling
5. Link analysis
o PageRank and HITS
o Social network analysis
6. IR applications
o Learning to Rank
o Recommendation of results
o Online advertising
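To give a flavor of the first topic, the sketch below builds a tiny inverted index and answers a Boolean AND query against it. It is only an illustration under simplifying assumptions: the tokenizer, the function names (`tokenize`, `build_inverted_index`, `boolean_and`), and the three sample documents are made up for this example and are not course material.

```python
from collections import defaultdict

PUNCTUATION = ".,;:!?()\"'"

def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (tok.strip(PUNCTUATION).lower() for tok in text.split())
    return [tok for tok in tokens if tok]

def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

def boolean_and(index, query):
    """Return the IDs of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)

# Hypothetical toy collection, used only for illustration.
docs = {
    1: "Web search engines crawl the web.",
    2: "An inverted index maps terms to documents.",
    3: "Search engines use an inverted index.",
}
index = build_inverted_index(docs)
print(boolean_and(index, "inverted index"))  # [2, 3]
print(boolean_and(index, "search engines"))  # [1, 3]
```

A production index would typically also store term frequencies and positions and intersect sorted posting lists directly rather than through sets; those details are left out here to keep the sketch short.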
Upon successful completion of this course, the students will have acquired the following knowledge and skills:
- Understand the basic building blocks of a modern search engine system, including the web crawler, inverted index, query processing, and search result interface.
- Understand classical retrieval models, including Boolean, vector space, probabilistic and language models.
- Understand recent developments in learning-based ranking algorithms, i.e., learning to rank.
- Assess the quality of deployed retrieval systems using a range of evaluation measures.
- Understand how user feedback can be used to evaluate retrieval systems and improve their effectiveness.
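As a small illustration of the evaluation measures mentioned above, the following sketch computes precision, recall, and average precision for a single ranked result list. The function names, document IDs, and relevance judgments are hypothetical and chosen only for this example. Average precision here is normalized by the total number of relevant documents, which is one common convention.

```python
def precision_recall(ranked, relevant):
    """Set-based precision and recall of a retrieved list against relevance judgments."""
    retrieved_relevant = sum(1 for doc in ranked if doc in relevant)
    precision = retrieved_relevant / len(ranked) if ranked else 0.0
    recall = retrieved_relevant / len(relevant) if relevant else 0.0
    return precision, recall

def average_precision(ranked, relevant):
    """Mean of precision@k over the ranks k at which a relevant document appears."""
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    # Assumption: normalize by all relevant documents, retrieved or not.
    return total / len(relevant) if relevant else 0.0

# Hypothetical ranking and judgments, used only for illustration.
ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2", "d5"}

precision, recall = precision_recall(ranked, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
print(f"average precision={average_precision(ranked, relevant):.2f}")
```

In this toy case two of the four retrieved documents are relevant, so precision is 0.50 and recall is 2/3; the relevant documents appear at ranks 2 and 4, giving an average precision of (1/2 + 2/4) / 3, or about 0.33.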
Tests (2) 30%