Stanford: Mining Massive Data Sets
CS246
Mining Massive Data Sets
Winter 2012
http://www.stanford.edu/class/cs246/
The course will discuss data mining and machine learning algorithms for analyzing very large amounts of data. The emphasis will be on MapReduce as a tool for creating parallel algorithms that can process data at this scale.
Topics include: Frequent Itemsets and Association Rules, Near Neighbor Search in High-Dimensional Data, Locality-Sensitive Hashing (LSH), Dimensionality Reduction, Recommendation Systems, Clustering, Link Analysis, Large-Scale Supervised Machine Learning, Data Streams, Mining the Web for Structured Data, Relation Extraction, and Web Advertising.
CS246 is the first part of the two-part sequence CS246/CS341. CS246 will discuss methods and algorithms for mining massive data sets, while CS341: Project in Mining Massive Data Sets will be a project-focused advanced class with unlimited access to a large MapReduce cluster.
CS341 (Project in Mining Massive Data Sets) is a project-focused advanced class with access to a large MapReduce cluster. This course is the second part of the two-part sequence CS246/CS341, which replaces CS345A: Data Mining. CS246 discusses methods and algorithms for mining massive data sets.
In this class, we will develop large-scale data mining techniques and research projects. Students will have access to the Amazon EC2 computing cluster, which means we will be able to run massive MapReduce jobs. Because work on algorithms for large-scale data mining is challenging, we can take only a small number of students, and enrollment will be limited.
This is a purely project-based course. We expect students to already be somewhat familiar with data mining methods. There will be lectures on some advanced data mining algorithms at the beginning of the quarter. We also expect to have a good number of industry guest lecturers discussing big data case studies.
CS345A, Winter 2009: Data Mining