In this practical we will keep everything else fixed and focus on a single aspect of the data mining pipeline: feature engineering, or more specifically feature extraction and feature selection. (Yes, I'm aware of the inconsistency between "single" and the use of "and". :-/)
Your starting point is the notebooks
Your job this week is to create new features and see whether they explain the target better and, hopefully, result in a better score (as defined in the notebook).
To keep this exercise as realistic as possible, I'm not going to tell you which features will be of benefit, nor by how much (if at all) the chosen metric can be improved. My role is to answer questions about constructing specific features (even if I know that the resulting feature will be of no use).
Suggestions:
The best strategy is to develop multiple features using multiple lines of attack: features that seem plausible might not be of benefit when applied to the data.
Adding features related to State appears to be a good idea but will take more effort — always think of effort vs reward.
I'm not saying that such features will improve the score. This is for you to determine.
So don't get tunnel vision and unnecessarily limit your options.
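To make "multiple lines of attack" concrete, here is a minimal sketch in pandas. It is not the expected solution: the toy data and every column name (`Day_Mins`, `Day_Calls`, `Eve_Mins`, `Churn`) and the helper `add_candidate_features` are hypothetical stand-ins, and only `State` is taken from the suggestions above. It shows three kinds of candidate feature: a sum of related columns, a ratio, and state-level aggregates.

```python
import pandas as pd

# Toy stand-in for the churn data; all column names are assumptions,
# not necessarily those used in the course notebooks.
df = pd.DataFrame({
    "State": ["OH", "OH", "NJ", "NJ", "NJ", "TX"],
    "Day_Mins": [265.1, 161.6, 243.4, 299.4, 166.7, 223.4],
    "Day_Calls": [110, 123, 114, 71, 113, 98],
    "Eve_Mins": [197.4, 195.5, 121.2, 61.9, 148.3, 220.6],
    "Churn": [0, 0, 0, 1, 0, 1],
})


def add_candidate_features(data: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `data` with a few candidate features added."""
    out = data.copy()

    # Line of attack 1: combine related columns into a total.
    out["Total_Mins"] = out["Day_Mins"] + out["Eve_Mins"]

    # Line of attack 2: a ratio; rows with zero calls become NaN
    # instead of producing a division by zero.
    calls = out["Day_Calls"].where(out["Day_Calls"] > 0)
    out["Mins_Per_Day_Call"] = out["Day_Mins"] / calls

    # Line of attack 3: state-level aggregates broadcast back to each row.
    # These use no target information, so they do not leak the label.
    by_state = out.groupby("State")
    out["State_Count"] = by_state["State"].transform("count")
    out["State_Mean_Day_Mins"] = by_state["Day_Mins"].transform("mean")

    return out


features = add_candidate_features(df)
print(features.head())
```

Each candidate would then be judged with the metric defined in the notebook; whether any of them helps is exactly what the exercise asks you to find out.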
I do not want to influence your feature engineering by giving too much detail about the grading, but the grade will be based on:
Upload to Moodle the zip archive created by the code at the end of US_Churn-04-Feature_Engineering.ipynb.