This course is aimed at non-CS undergraduate and graduate students who want to learn the basics of big data tools and techniques and apply that knowledge in their own areas of study. Many of the world's biggest discoveries and decisions in science, technology, business, medicine, politics, and society as a whole are now being made on the basis of analyzing massive data sets. This course provides a broad and practical introduction to big data: data analysis techniques, including databases, data mining, machine learning, and data visualization; data analysis tools, including spreadsheets, Tableau, relational databases and SQL, Python, and R; and an introduction to network analysis and unstructured data. Tools and techniques are covered hands-on but at a cursory level, providing a basis for future exploration and application. Prerequisites: comfort with basic logic and mathematical concepts, along with high school AP Computer Science, CS106A, or equivalent programming experience.
Time: Tuesdays & Thursdays 1:30-2:50 PM
Location: Building 320 room 105 (Geology corner)
The CAs hold 20 hours of office hours a week, Monday-Friday, in reserved areas in the Engineering Quad. Times and places are given in the course calendar.
Professor Widom holds office hours on Wednesdays 4:00-5:00 PM in the Dean's Office #227 on the 2nd floor of the Huang building. Updates to her office hours will be posted on the course calendar.
There will be 5 homework assignments, 2 projects, a midterm exam, and a final exam. The final grade is weighted equally across three composite scores: homework assignments, projects, and exams. That is, the 5 homework assignments together carry the same weight as the 2 projects together, and the same weight as the 2 exams together. See the syllabus below for dates and times. There will be no alternate exams, so please make sure you will be available for the midterm on Feb 14 and the final exam on March 18.
Please use Piazza for all questions related to the course. We will be using Piazza as our primary portal for course-related announcements, so make sure to sign up! For all Piazza posts, we guarantee that we will respond within 24 hours. DO NOT post assignment code on Piazza for debugging; we will not respond to posts containing assignment code. Also check out the list of frequently asked questions.
| Date | Topic and Assignments | Readings/References | Notes |
|---|---|---|---|
| Tue Jan 8 | Introductions, course logistics, Big Data Overview (start) | Introductory Readings | Big Data Overview |
| Thu Jan 10 | Big Data Overview (finish); Data Analysis & Visualization Using Spreadsheets (Part 1) | Google Spreadsheets References | Data Analysis Using Spreadsheets Slides |
| Mon Jan 14 | Assignment 1 released: Spreadsheets; Project 1 released: Personal Data Analysis | | |
| Tue Jan 15 | Data Analysis & Visualization Using Spreadsheets (Part 2) | | |
| Thu Jan 17 | Advanced Data Visualization Using Tableau | | |
| Mon Jan 21 | Assignment 1 due; Assignment 2 released: Tableau, SQL | | |
| Tue Jan 22 | Relational Databases and Basic SQL | | |
| Thu Jan 24 | Advanced SQL | | |
| Mon Jan 28 | Project 1 proposal due | | |
| Tue Jan 29 | Introduction to Python (optional if you are already familiar with Python, including lists and dictionaries) | | |
| Thu Jan 31 | Python for Data Analysis & Visualization (Part 1) | | |
| Thu Jan 31 | Assignment 2 due; Assignment 3 released: Python | | |
| Tue Feb 5 | Python for Data Analysis & Visualization (Part 2) | | |
| Thu Feb 7 | Machine Learning: Regression | | |
| Mon Feb 11 | Assignment 3 due | | |
| Tue Feb 12 | Machine Learning: Classification and Clustering | | |
| Thu Feb 14 | Midterm Exam (in class) | | |
| Mon Feb 18 | Project 1 due; Assignment 4 released: Machine Learning, R; Project 2 released: Movie-Rating Predictions | | |
| Tue Feb 19 | Using Python for Machine Learning | | |
| Thu Feb 21 | The R Language: Data Analysis, Visualization, and Machine Learning | | |
| Tue Feb 26 | Data Mining Algorithms | | |
| Thu Feb 28 | Data Mining Using SQL and Python | | |
| Thu Feb 28 | Assignment 4 due; Assignment 5 released: Data Mining, Network Analysis, Unstructured Data | | |
| Tue Mar 5 | Network Analysis | | |
| Thu Mar 7 | Unstructured Data | | |
| Thu Mar 7 | Project 2 due | | |
| Tue Mar 12 | Guest lecture: Big Data Platforms and Services | | |
| Thu Mar 14 | Project 2 results and discussion; Correlation and causation; Follow-on courses and pathways | | |
| Thu Mar 14 | Assignment 5 due | | |
| Mon Mar 18 | Final Exam, 12:15-3:15 PM | | |
Students with Documented Disabilities

Students who may need an academic accommodation based on the impact of a disability must initiate the request with the Office of Accessible Education (OAE). Professional staff will evaluate the request with required documentation, recommend reasonable accommodations, and prepare an Accommodation Letter for faculty dated in the current quarter in which the request is being made. Students should contact the OAE as soon as possible, since timely notice is needed to coordinate accommodations. For CS102, we require accommodation letters to be filed with the instructor at least two weeks before the requested accommodation. This policy is strictly enforced. The OAE is located at 563 Salvatierra Walk (phone: 723-1066, URL: http://oae.stanford.edu).