Practical 03 — Churn — Feature Engineering

In this practical we will try to keep everything else fixed and focus on a single aspect of the data mining pipeline: feature engineering, or more specifically feature extraction and feature selection. (Yes, I'm aware of the inconsistency between "single" and the use of "and". :-/)

Your starting point is the notebook US_Churn-04-Feature_Engineering.ipynb.

Your job this week is to create new features that better explain the target and, hopefully, improve the score (as defined in the notebook).

To keep this exercise as realistic as possible I'm not going to tell you which features will be of benefit, nor by how much (if at all) the chosen metric can be improved. My role is to answer questions about constructing specific features (even if I know that the resulting feature will be of no use).
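For illustration only, here is the kind of derived feature you might try. The column names ("Day Mins", "CustServ Calls", etc.) and the file name are guesses at the churn dataset's schema, so check them against the dataframe actually loaded in the notebook; none of these features is guaranteed to help.

```python
import numpy as np
import pandas as pd

df = pd.read_csv("churn.csv")  # hypothetical file name

# 1. Aggregate usage across the day / evening / night periods.
df["Total Mins"] = df["Day Mins"] + df["Eve Mins"] + df["Night Mins"]
df["Total Charge"] = df["Day Charge"] + df["Eve Charge"] + df["Night Charge"]

# 2. A ratio feature, guarding against division by zero.
df["Charge per Min"] = df["Total Charge"] / df["Total Mins"].replace(0, np.nan)

# 3. A binary flag built from a threshold on an existing column.
df["Many Service Calls"] = (df["CustServ Calls"] >= 3).astype(int)
```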

Suggestions:
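Once you have a pool of candidate features, one direction worth exploring is automated feature selection. The sketch below uses scikit-learn's SelectKBest with a univariate F-test; X (the numeric feature matrix, including your new candidates) and y (the churn target) are assumed to be prepared as in the notebook, and k=10 is an arbitrary choice.

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif

# X and y are assumed to exist already, prepared as in the notebook.
selector = SelectKBest(score_func=f_classif, k=10)
selector.fit(X, y)

# Rank the candidate features by their univariate F-score.
scores = pd.Series(selector.scores_, index=X.columns).sort_values(ascending=False)
print(scores)

# Keep only the k selected columns.
X_selected = X.loc[:, selector.get_support()]
```

Whether the selected subset actually improves the cross-validated score is exactly the question you are being asked to investigate.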

Grading

I do not want to influence your feature engineering by giving too much detail about the grading, but the grade will be based on:

Model LR
    CV scores: 0.86, 0.85, 0.86, 0.85, 0.85, 0.87, 0.88, 0.85, 0.84, 0.86
    mean=85.84% std=0.0095

Model DT
    CV scores: 0.94, 0.92, 0.90, 0.89, 0.90, 0.91, 0.91, 0.91, 0.91, 0.89
    mean=90.64% std=0.0119

Model DT(max_depth=3)
    CV scores: 0.90, 0.92, 0.90, 0.88, 0.91, 0.92, 0.89, 0.92, 0.91, 0.86
    mean=89.79% std=0.0175

Model KNN
    CV scores: 0.90, 0.90, 0.89, 0.93, 0.90, 0.88, 0.88, 0.90, 0.90, 0.88
    mean=89.39% std=0.0134

Model SVC
    CV scores: 0.93, 0.91, 0.92, 0.94, 0.91, 0.91, 0.91, 0.89, 0.92, 0.90
    mean=91.09% std=0.0136

Best Performing Model SVC with (mean CV of) accuracy = 91.09%
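The ten scores per model above come from 10-fold cross-validation. A rough sketch of how such a comparison might be produced is below; the actual preprocessing, model settings and scoring are defined in the notebook, so treat this only as an outline (X and y are again the prepared feature matrix and target).

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# X and y are assumed to be the prepared feature matrix and target.
models = {
    "LR": LogisticRegression(max_iter=1000),
    "DT": DecisionTreeClassifier(),
    "DT(max_depth=3)": DecisionTreeClassifier(max_depth=3),
    "KNN": KNeighborsClassifier(),
    "SVC": SVC(),
}

results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
    results[name] = scores
    print(f"Model {name}")
    print("    CV scores:", ", ".join(f"{s:.2f}" for s in scores))
    print(f"    mean={scores.mean():.2%} std={scores.std():.4f}\n")

best = max(results, key=lambda name: results[name].mean())
print(f"Best Performing Model {best} with (mean CV of) accuracy = {results[best].mean():.2%}")
```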

Submission

Upload to Moodle the zip archive created by the code at the end of US_Churn-04-Feature_Engineering.ipynb.
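The archive is built for you by the code at the end of the notebook, so you should not normally need to write this yourself. For completeness, a zip archive of this sort can be created along the following lines (the output name and file list here are placeholders, not the notebook's actual code):

```python
import zipfile

# Placeholder file list -- the notebook decides what actually goes in the archive.
files_to_submit = ["US_Churn-04-Feature_Engineering.ipynb"]

with zipfile.ZipFile("submission.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for path in files_to_submit:
        zf.write(path)
```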