Information Retrieval
Lecture 1 |
Information Retrieval: Natural Language Processing - Introduction |
||
---|---|---|---|
Date |
26th Aug, 2015 |
||
Lab/Assignment |
Lab Regular Expression for tokenization and sentence boundary identification. Assignment Find out data set for your respective languages on Twitter/Facebook Run tokenization and sentence boundary identification on the data |
Lecture 2 |
Spell Correction - Minimum Edit Distance |
||
---|---|---|---|
Date |
28th Aug, 2015 |
||
Lab/Assignment |
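As a possible starting point for the Lecture 1 lab, the following is a minimal sketch of regular-expression tokenization and sentence boundary identification. The patterns and the function names `tokenize` and `split_sentences` are illustrative assumptions, not the lab's reference solution. The sentence splitter breaks after `.`, `!` or `?` followed by whitespace, so it will wrongly split abbreviations such as "Dr. Smith", and the word pattern may need adjusting for scripts that use combining marks.

```python
import re

# Hypothetical token pattern for social-media text (an assumption for this sketch).
# Alternatives are tried left to right, so URLs win over plain words.
TOKEN_RE = re.compile(
    r"""
    https?://\S+            # URLs, up to the next whitespace
  | [@\#]\w+                # @mentions and #hashtags
  | \w+(?:['’-]\w+)*        # words, allowing internal apostrophes and hyphens
  | [^\w\s]                 # any other single non-space symbol (punctuation, emoji)
    """,
    re.VERBOSE,
)

# Naive sentence boundary: whitespace that follows ., ! or ?
SENTENCE_END_RE = re.compile(r"(?<=[.!?])\s+")


def tokenize(text):
    """Return the list of tokens found in text."""
    return TOKEN_RE.findall(text)


def split_sentences(text):
    """Split text into sentences at the naive boundaries defined above."""
    return [s.strip() for s in SENTENCE_END_RE.split(text) if s.strip()]


if __name__ == "__main__":
    print(tokenize("Loving #IR with @prof!"))
    # ['Loving', '#IR', 'with', '@prof', '!']
    print(split_sentences("It works. Does it? Yes!"))
    # ['It works.', 'Does it?', 'Yes!']
```

Running the same two functions over the collected Twitter/Facebook data and inspecting the failures is a reasonable way to decide which extra patterns the chosen language needs.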
Topics to be covered
- Introduction to Information Retrieval, NLP basics, Regular Expressions, Tokenization, Stemming, Sentence Boundary Detection
- TF-IDF and Compression
- PageRank and Link Analysis
- Spell Correction, Minimum Edit Distance (MED)
- Language Modelling, Language Identification
- Text Classification: Naive Bayes, Spam-Ham, WEKA
- Evaluation of IR
- Clustering, Document Similarity, Cosine Scores
- Vector Space Models
- Relevance feedback and Query Expansion
- ===== Mid Sem: Oct 15-17 =====
- Probabilistic Information Retrieval
- Question Answering
- Introduction to Big Data: Hadoop, Map-Reduce
- ===== End Sem: Dec 11-17 =====
References
Textbooks
- Introduction to Information Retrieval, by C. Manning, P. Raghavan, and H. Schütze. Cambridge University Press, 2008.
- Readings in Information Retrieval, by Karen Sparck Jones and Peter Willett. San Francisco: Morgan Kaufmann, 1997.
Additional References
- Search Engines: Information Retrieval in Practice, by Bruce Croft, Donald Metzler and Trevor Strohman.
- Information Retrieval: Algorithms and Heuristics, by David A. Grossman and Ophir Frieder. Dordrecht, The Netherlands: Springer, 2004.
- Modern Information Retrieval, by R. Baeza-Yates and B. Ribeiro-Neto.
Evaluation and Grading
- Mid sems: 30%
- End sems: 30%
- Group Project: 20%
- Class projects: 20%
Attendance Policy
Attendance will be taken every day, and missing classes can be expected to significantly reduce your chances of success. Lectures will not be repeated.