Supervised Learning (DSCI 425) - Spring Semester 2018

Instructor:  Dr. Brant Deppa
Office:  124B Gildemeister Hall
Phone:  457 - 5457

Office Hours:

MWF 1:00-3:00
TTh 9:00-12:00
or by appointment

e-mail:   bdeppa@winona.edu

Textbook:

An Introduction to Statistical Learning - James, Witten, Hastie, and Tibshirani (Springer-Verlag 2014). The book is available for free in PDF format at the textbook website: www.StatLearning.com.

More Advanced Text - The Elements of Statistical Learning - Hastie, Tibshirani, and Friedman (Springer-Verlag). This book is also available for free in PDF format at the textbook website: https://web.stanford.edu/~hastie/ElemStatLearn/

Another Optional Text - Applied Predictive Modeling by Max Kuhn and Kjell Johnson (Springer-Verlag 2013). This book is available through Amazon.com, and the website for the text is appliedpredictivemodeling.com.

PDF of the Applied Predictive Modeling text: linked on this webpage.

Grading:
Your course grade will be based entirely on your performance on course assignments/projects. There may be both a midterm and a final course project, each of which will require you to present your analyses to the class in a seminar format; as a result, this course satisfies the oral flag requirement. The oral component will be given no additional weight, except that these projects will be worth a large number of points relative to the other course assignments.

Course Projects:
All assignments will be posted on this webpage along with R commands and additional help to get you started on them. You may work in groups of two on all assignments and make one assignment submission for your group. Make sure both group members' names are at the top of the assignment. Assignments will be submitted through dropboxes on D2L. Group assignments are NOT a division of labor, i.e., both group members must work collaboratively on all parts of the assignments/projects. Academic dishonesty will result in dire consequences and I will cry. (WSU policy)

Computing:
We will primarily use R in this course, along with a large number of R packages that you will have to install. We will also use JMP on occasion.

Tentative Course Outline:

I. Introduction - Supervised (regression & classification) and Unsupervised Learning

II. Review of Regression Modeling - Review of the important concepts from STAT 360.  Particular emphasis on residual plots, case diagnostics, transformations, and model selection.  I will use R and content from STAT 360 in reviewing this material.

III. Prediction Methods for a Numeric Response

a. shrinkage methods (Ridge, LASSO, LARS)
b. dimension reduction methods (PCR & PLS)
c. automated transformations (ACE & AVAS)
d. smoothing, additive models and local regression
e. multivariate adaptive regression splines (MARS)
f. tree-based regression
g. neural networks
h. nearest neighbor regression


IV. Prediction Methods for a Categorical Response (Classification Methods)

a. logistic regression
b. penalized logistic regression (ridge, LASSO, Elastic Net)
c. discriminant analysis (LDA, QDA, flexible)
d. support vector machines (SVM)
e. Naive Bayes classifiers
f. neural networks
g. tree-based classification models
h. nearest neighbors


V. Resampling Methods - (covered throughout the course)

a. bootstrap and jackknife
b. estimating prediction error/classification error
c. cross-validation and model/tuning parameter selection
d. random subsets