
Big Data Analytics

Instructor: Assistant Professor Hanghang Tong

  • Office: GC 4420, x8196

  • Office hours: Monday 12:45-1:45pm, room 4420
     

Background

We are in the age of big data: data that is both large and complex. Big data analytics is the science of analyzing such data, generating insights, and making decisions by addressing all three dimensions of the big data challenge: variety, velocity, and volume. It underpins many high-impact real-world applications, such as social network analysis, finance and business intelligence, climate modeling, health care, and political science.
 

Course Description

This class aims to provide a comprehensive overview of recent advances in machine learning and data mining for analyzing big data. Selected topics include big data clustering and classification, anomaly and fraud detection, time-series analysis, big graph mining, and massive-scale data analytics, as well as case studies in social network analysis, healthcare, and business intelligence.
 

Schedule

  • Lecture 1 - Labor Day, no class

  • Lecture 2 - Introduction + Basic Concepts

  • Lecture 3 - Classification 1 (Bayes Classifier, kNN) + Candidate projects

  • Lecture 4 - Classification 2 – Indexing, Logistic Regression

  • Lecture 5 - Classification 3 – SVM

  • Lecture 6 - Clustering 1 – K-means and PCA

  • Lecture 7 - Columbus Day, no class

  • Lecture 8 - Web Fraud Detection (Tim Pan)

  • Lecture 9 - Clustering 2 – LSH, GMM and spectral clustering

  • Lecture 10 - Midterm exam

  • Lecture 11 - Time Series

  • Lecture 12 - No class, Prof. Terzi’s seminar moved to Sep. 20th

  • Lecture 13 - Big Graph Mining 1 – Co-clustering

  • Lecture 14 - Big Graph Mining 2 – Low-rank approximation

  • Lecture 15 - Big Graph Mining 3 – Pattern, Dissemination and Proximity / Mining Rare Events from Big Data (Jingrui He)

  • Lecture 16 - Project presentations at the CS lab

  • Lecture 17 - Project final report due
     

Big Data Seminar

(Starting 10/1, biweekly afterwards): Science Center, room 4102

  • Evimaria Terzi (Boston): Entity Selection and Ranking in Data Mining Applications

  • Fei Wang (IBM): Feature Engineering for Predictive Modeling with Large Scale Electronic Medical Records: Augmentation, Densification and Selection

  • Tim Pan (Google): Click Fraud - Challenges and Remedies

  • Han Liu (Princeton): From High Dimensional Data to Big Data

  • Ruoming Jin (Kent): Finally, Simple, Fast and Scalable Reachability Oracle!

  • Tao Li (FIU): Learning to Understand Documents
     

Late Policy

  • Each person has 2 slip days in total for the whole semester. After that, a 20% deduction applies per day of delay.

  • No penalty for a medical emergency (a doctor’s note is required).