Given a problem description, determine features that may be relevant, including those directly present in the raw data and those that must be constructed.
Given a problem description, determine an appropriate response variable.
Design a model evaluation scheme for a specific problem and model context, keeping in mind the dangers of overfitting and the bias-variance trade-off.
Discuss the relative merits of linear, sparse, and tree-based methods in a particular problem setting, and prepare code that implements them appropriately.
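
As a rough illustration of the last two objectives, the sketch below scores a linear model and a depth-1 regression tree (a "stump") with k-fold cross-validation on held-out data. Everything in it is an assumption made for illustration: the synthetic data sets, the helper names (kfold_indices, fit_linear, fit_stump, cv_mse), the single-feature setting, and the choice of mean squared error. Sparse methods such as the lasso are not covered here; in practice a library implementation would normally replace these hand-written fitters.

```python
import random
from statistics import mean


def kfold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_linear(xs, ys):
    """One-feature ordinary least squares; returns a prediction function."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_stump(xs, ys):
    """Depth-1 regression tree: one threshold, a constant on each side."""
    best = None
    for t in sorted(set(xs))[1:]:  # skipping the minimum keeps both sides non-empty
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # every x identical: fall back to the overall mean
        m = mean(ys)
        return lambda x: m
    _, t, ml, mr = best
    return lambda x: ml if x < t else mr


def cv_mse(fit, xs, ys, k=5, seed=0):
    """Average held-out mean squared error over k folds."""
    fold_errors = []
    for held_out in kfold_indices(len(xs), k, seed):
        test = set(held_out)
        train = [i for i in range(len(xs)) if i not in test]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        fold_errors.append(mean([(model(xs[i]) - ys[i]) ** 2 for i in held_out]))
    return mean(fold_errors)


# Synthetic data (hypothetical): one linear signal and one step-shaped signal.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(200)]
line_ys = [2 * x + 1 + rng.gauss(0, 0.3) for x in xs]
step_ys = [(3.0 if x >= 5 else 0.0) + rng.gauss(0, 0.3) for x in xs]

results = {
    ("line", "linear"): cv_mse(fit_linear, xs, line_ys),
    ("line", "stump"): cv_mse(fit_stump, xs, line_ys),
    ("step", "linear"): cv_mse(fit_linear, xs, step_ys),
    ("step", "stump"): cv_mse(fit_stump, xs, step_ys),
}
for (data, model), mse in results.items():
    print(f"{data:>4} data, {model:<6} model: held-out MSE = {mse:.3f}")
```

Because each fold is scored only on points the model never saw during fitting, the comparison reflects generalization rather than training fit. The two data sets are chosen so that neither model family wins everywhere: the linear model suits the linear signal, while the stump suits the step.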