Practical 03 — Churn — Feature Engineering

In this practical we will try to keep everything else fixed and focus on one single aspect of the data mining pipeline: feature engineering, or more specifically feature extraction and feature selection. (Yes, I'm aware of the inconsistency between "single" and the use of "and". :-/)

Your starting point is the notebooks

Your job this week is to create new features that better explain the target and, hopefully, result in a better score (as defined in the notebook).
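As one illustration of feature extraction, here is a minimal pandas sketch that derives aggregate usage features from existing columns. The column names ("Day Mins", "Day Charge", "CustServ Calls", ...) are assumptions modelled on a typical telecom churn table and may not match the notebook's DataFrame, the rows are made-up toy values, and the helper `add_usage_features` and its threshold are hypothetical. Whether any of these features helps is exactly what you have to find out.

```python
import pandas as pd

# Made-up toy rows; the column names are assumptions modelled on a typical
# telecom churn table and may not match the notebook's DataFrame.
df = pd.DataFrame({
    "Day Mins": [100.0, 200.0, 0.0],
    "Eve Mins": [50.0, 100.0, 0.0],
    "Night Mins": [50.0, 100.0, 0.0],
    "Intl Mins": [0.0, 10.0, 0.0],
    "Day Charge": [17.0, 34.0, 0.0],
    "Eve Charge": [4.25, 8.5, 0.0],
    "Night Charge": [2.25, 4.5, 0.0],
    "Intl Charge": [0.0, 2.7, 0.0],
    "CustServ Calls": [1, 5, 0],
})

MINS_COLS = ["Day Mins", "Eve Mins", "Night Mins", "Intl Mins"]
CHARGE_COLS = ["Day Charge", "Eve Charge", "Night Charge", "Intl Charge"]


def add_usage_features(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of `frame` with a few derived candidate features."""
    out = frame.copy()
    out["Total Mins"] = out[MINS_COLS].sum(axis=1)
    out["Total Charge"] = out[CHARGE_COLS].sum(axis=1)
    # Average price per minute; rows with zero usage get NaN instead of dividing by zero
    out["Charge per Min"] = out["Total Charge"] / out["Total Mins"].replace(0, float("nan"))
    # Flag for frequent customer-service contact; the threshold of 4 is an arbitrary guess
    out["Many CustServ Calls"] = (out["CustServ Calls"] >= 4).astype(int)
    return out


features = add_usage_features(df)
print(features[["Total Mins", "Total Charge", "Charge per Min", "Many CustServ Calls"]])
```

Note that "Charge per Min" can be NaN for zero-usage rows; models such as LR and SVC will reject NaN, so fill or drop those values before fitting.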

To keep this exercise as realistic as possible, I'm not going to tell you which features will be of benefit, nor how much (if any) the chosen metric can be improved. My role is to answer questions about how to construct specific features (even if I know that the resulting feature will be of no use).

Suggestions:

The best strategy is to develop multiple features using multiple lines of attack: features that seem plausible might not be of benefit when applied to the data.
Adding features related to State appears to be a good idea, but will take more effort; always weigh effort against reward.
I'm not saying that such features will improve the score. This is for you to determine.
So don't get tunnel vision and unnecessarily limit your options.
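Two common ways of turning a categorical column such as State into numeric features are one-hot encoding and target (mean) encoding. The sketch below shows both on made-up toy rows; the column names "State" and "Churn", the helper `fit_state_churn_rate`, and the smoothing strength `alpha` are assumptions for illustration, not the notebook's code. The target encoding is fitted on training rows only, because computing churn rates on data that is later used for evaluation leaks the target into the features.

```python
import pandas as pd

# Made-up toy rows; "State" and "Churn" are assumed column names.
train = pd.DataFrame({
    "State": ["KS", "KS", "OH", "OH", "NJ", "NJ"],
    "Churn": [0, 1, 0, 0, 1, 1],
})
test = pd.DataFrame({"State": ["KS", "NJ", "TX"]})

# Option 1: one-hot encoding, one 0/1 column per state seen in training
state_dummies = pd.get_dummies(train["State"], prefix="State", dtype=int)


# Option 2: smoothed target (mean) encoding, fitted on training rows only
def fit_state_churn_rate(frame, col="State", target="Churn", alpha=2.0):
    """Hypothetical helper: return (per-state smoothed churn rate, overall churn rate)."""
    prior = frame[target].mean()
    stats = frame.groupby(col)[target].agg(["sum", "count"])
    # Shrink each state's rate toward the overall rate; alpha (hypothetical) sets the strength
    rates = (stats["sum"] + alpha * prior) / (stats["count"] + alpha)
    return rates, prior


rates, prior = fit_state_churn_rate(train)
# States not seen in training (here "TX") fall back to the overall churn rate
test["State Churn Rate"] = test["State"].map(rates).fillna(prior)
print(state_dummies.columns.tolist())
print(test)
```

One-hot encoding adds one column per state, which inflates the feature count considerably; target encoding adds a single column but must be refitted inside each training fold when used with cross-validation. For one-hot encoding with unseen categories, scikit-learn's `OneHotEncoder(handle_unknown="ignore")` is an alternative to `pd.get_dummies`.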

Grading

I do not want to influence your feature engineering by giving too much detail about the grading, but the grade will be based on:

The number of features that pass the automatic feature selection algorithm RFECV (recursive feature elimination with cross-validation).
Improvement on the baseline models (relative to the rest of the class).
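To get a rough idea of which of your features survive RFECV, you can run scikit-learn's `RFECV` yourself. This is a minimal sketch on synthetic data from `make_classification`; the estimator (logistic regression), the 10-fold stratified splitter, and the accuracy scoring are assumptions, since the exact RFECV configuration used for grading is not stated here.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for the churn feature matrix: 12 features, 4 informative
X, y = make_classification(n_samples=500, n_features=12, n_informative=4,
                           n_redundant=2, random_state=0)
feature_names = [f"f{i}" for i in range(X.shape[1])]

# Assumed configuration: drop one feature per step, 10-fold CV, accuracy
selector = RFECV(
    estimator=LogisticRegression(max_iter=1000),
    step=1,
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),
    scoring="accuracy",
)
selector.fit(X, y)

# support_ is a boolean mask of the features RFECV kept
kept = [name for name, keep in zip(feature_names, selector.support_) if keep]
print(f"{selector.n_features_} of {X.shape[1]} features kept:", kept)
```

The estimator passed to `RFECV` must expose `coef_` or `feature_importances_`, so a linear model or a tree works here, while a plain `KNeighborsClassifier` or a non-linear `SVC` does not.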

The baseline scores (accuracy, 10-fold cross-validation) are:

Model LR
    CV scores: 0.86, 0.85, 0.86, 0.85, 0.85, 0.87, 0.88, 0.85, 0.84, 0.86
    mean=85.84% std=0.0095

Model DT
    CV scores: 0.94, 0.92, 0.90, 0.89, 0.90, 0.91, 0.91, 0.91, 0.91, 0.89
    mean=90.64% std=0.0119

Model DT(max_depth=3)
    CV scores: 0.90, 0.92, 0.90, 0.88, 0.91, 0.92, 0.89, 0.92, 0.91, 0.86
    mean=89.79% std=0.0175

Model KNN
    CV scores: 0.90, 0.90, 0.89, 0.93, 0.90, 0.88, 0.88, 0.90, 0.90, 0.88
    mean=89.39% std=0.0134

Model SVC
    CV scores: 0.93, 0.91, 0.92, 0.94, 0.91, 0.91, 0.91, 0.89, 0.92, 0.90
    mean=91.09% std=0.0136

Best Performing Model SVC with (mean CV of) accuracy = 91.09%
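A before/after comparison in the same format as the baseline output can be sketched with `cross_val_score`. The model hyperparameters, the use of `StandardScaler` for LR, KNN and SVC, the shuffled stratified 10-fold splitter, and the extra product feature are all assumptions on synthetic data; the notebook's own evaluation code may differ, so compare against its numbers rather than these.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the baseline feature matrix
X, y = make_classification(n_samples=600, n_features=8, n_informative=4, random_state=1)

# Assumed model set mirroring the baseline names; settings are guesses
models = {
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "DT": DecisionTreeClassifier(random_state=1),
    "DT(max_depth=3)": DecisionTreeClassifier(max_depth=3, random_state=1),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "SVC": make_pipeline(StandardScaler(), SVC()),
}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)


def evaluate(X, y):
    """Print 10-fold CV accuracy per model and return the raw fold scores."""
    results = {}
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
        results[name] = scores
        score_str = ", ".join(f"{s:.2f}" for s in scores)
        print(f"Model {name}")
        print(f"    CV scores: {score_str}")
        print(f"    mean={scores.mean():.2%} std={scores.std():.4f}")
    return results


baseline = evaluate(X, y)
# Hypothetical engineered feature: product of the first two columns
X_new = np.column_stack([X, X[:, 0] * X[:, 1]])
extended = evaluate(X_new, y)

for name in models:
    print(f"{name}: change in mean accuracy {extended[name].mean() - baseline[name].mean():+.2%}")
```

Using the same fixed splitter for both runs means every model is scored on identical folds, so the difference in means reflects the features rather than a different random split.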

Submission

Upload to Moodle the zip archive created by the code at the end of US_Churn-04-Feature_Engineering.ipynb.