DS 402: Big Data Concepts: Introduction (Even Semester)
Lecture | Topic | Date | Lab/Assignment |
---|---|---|---|
1 | Introduction | 20th Jan, 2015 | Lab: Hadoop deployment; running the first example (NCDC weather data). Assignment: Calculate the average highest temperature per year for morning, afternoon, and night; word count on a large Twitter data set. |
2 | HDFS, Inside Map-Reduce, Introduction to Machine Learning | 27th Jan, 2015 | Lab: Twitter word count. Assignment: Naive Bayes spam filtering. |
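As a warm-up for the Twitter word count lab listed above, the following is a minimal sketch of the map, shuffle, and reduce steps in plain Python. It does not use Hadoop; the function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) and the sample tweets are illustrative assumptions, and tokenization is simplified to whitespace splitting.

```python
from collections import defaultdict


def map_words(tweet):
    """Map step: emit a (word, 1) pair for every whitespace-separated token."""
    for word in tweet.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word, counts):
    """Reduce step: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(tweets):
    """Chain map, shuffle, and reduce over an in-memory list of tweets."""
    pairs = (pair for tweet in tweets for pair in map_words(tweet))
    return dict(reduce_counts(w, c) for w, c in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample input, not real Twitter data.
    sample = ["big data is big", "hadoop handles big data"]
    print(word_count(sample))
```

On a real cluster the shuffle is performed by the framework between the map and reduce phases; here it is written out only to make the data flow visible.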
Topics to be covered
- An introduction to Hadoop: Key features, deployment, and uses in the Cloud.
- Big Data Reference Architecture, Design Patterns.
- Map Reduce Framework: Design patterns, Map Reduce in various environments, and integration with traditional data warehouses.
- Batch Processing: Review of Pig, a scripting language for controlling MapReduce processes on Hadoop, and Hive, a data-warehouse-like system with a fairly complete SQL-like syntax.
- HBase and Hive: Fault-tolerant ways of storing, moving, and querying large quantities of sparse data.
- Impala and Flume: Overview and integration of Impala with Flume and Solr.
- Stream Computing: Examine Apache Flume NG as a collection technique, along with associated tools for complex event processing (CEP) applications.
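To make the Map Reduce Framework topic above more concrete, here is a small sketch of the general pattern: a driver that applies a mapper to every record, groups the emitted values by key, and applies a reducer per key. It is applied to a maximum-temperature-per-year problem in the spirit of the NCDC weather example. The helper `run_mapreduce`, the `"year,temperature"` record layout, and the sample readings are assumptions made for illustration; real NCDC records use a different fixed-width format, and Hadoop distributes these steps across machines rather than running them in one process.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Apply mapper to each record, group values by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Keys are sorted to mimic the ordered input a reducer sees.
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


def temperature_mapper(record):
    """Emit (year, temperature) from a hypothetical 'year,temperature' line."""
    year, temp = record.split(",")
    yield year, float(temp)


def max_reducer(year, temps):
    """Keep the highest temperature observed for the given year."""
    return max(temps)


# Hypothetical readings, not taken from the NCDC data set.
readings = ["1949,11.1", "1950,0.0", "1949,7.8", "1950,2.2", "1950,-1.1"]
result = run_mapreduce(readings, temperature_mapper, max_reducer)
```

Swapping in a different mapper and reducer (for example, emitting a `(year, period)` key and averaging the values) is enough to adapt the same driver to other aggregation problems.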
References
Textbooks
- Hadoop: The Definitive Guide; Author: Tom White; Publisher: O'Reilly Media.
- Hadoop in Practice; Author: Alex Holmes; Publisher: Manning Publications.
References
- Hadoop Operations; Author: Eric Sammer; Publisher: O'Reilly Media.
- Professional Hadoop Solutions; Authors: Boris Lublinsky, Kevin T. Smith, and Alexey Yakubovich; Publisher: Wrox.
- MapReduce Design Patterns: Building Effective Algorithms and Analytics for Hadoop and Other Systems; Authors: Donald Miner and Adam Shook; Publisher: O'Reilly Media.
- HBase: The Definitive Guide; Author: Lars George; Publisher: O'Reilly Media.