Learning Outcomes
- Given a problem description, determine features that may be relevant, including those directly present in the raw data and those that must be constructed.
- Given a problem description, determine an appropriate response variable.
- Design a model evaluation scheme for a specific problem and model, keeping in mind the dangers of overfitting and the bias-variance trade-off.
- Given a covariate-response pair, use visualization and summary statistics to identify preprocessing or transformation methods that may improve downstream model performance.
- Discuss the relative merits of linear, sparse, and tree-based methods in a particular problem setting, and prepare code that implements them appropriately.
- Use model summaries and data visualization to summarize the important features in a model and note areas for potential improvement.
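The outcome on evaluation schemes warns about overfitting. The following is a minimal sketch of that danger, not the session's own code. It uses k-fold cross-validation in pure Python on hypothetical synthetic data (a linear trend plus Gaussian noise) to compare an ordinary least-squares line with a 1-nearest-neighbour model that memorises its training set. The helper names (`fit_linear`, `fit_1nn`, `mse`, `k_fold_mse`) are illustrative assumptions. In practice a library such as scikit-learn would usually supply these pieces.

```python
import random


def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b*x with a single covariate."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def fit_1nn(xs, ys):
    """1-nearest-neighbour regressor: predicts the y of the closest training x."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def k_fold_mse(fit, xs, ys, k=5, seed=0):
    """Average held-out MSE over k folds; each point is held out exactly once."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(mse(model, [xs[i] for i in fold], [ys[i] for i in fold]))
    return sum(scores) / k


# Hypothetical synthetic data: a linear trend plus unit-variance Gaussian noise.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]

for name, fit in [("linear", fit_linear), ("1-NN", fit_1nn)]:
    train_err = mse(fit(xs, ys), xs, ys)
    cv_err = k_fold_mse(fit, xs, ys)
    print(f"{name:6s} train MSE = {train_err:.2f}  5-fold CV MSE = {cv_err:.2f}")
```

The 1-nearest-neighbour model reaches zero training error because every training point is its own nearest neighbour. Its cross-validated error should nonetheless exceed that of the linear fit, which is why the evaluation scheme scores models on held-out data rather than on the data they were trained on.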
Exercises
Exercises from this session are available here.