Baseline Model Churm_-_03_-_BaseLine_Model.ipynb
Now that you have a general understanding of the dataset you are working with, we can start to think about modelling.
Train/Test split
Since we want an unbiased measurement of our model quality, we perform the usual train/test split, with two slight modifications.
- Use a stratified split (on the target) to ensure that the train and test subsets have approximately the same percentage of samples of each target class as the complete set.
- Use a fixed seed for the random number generator to allow comparisons between students (i.e., for grading in later practicals).
| from sklearn.model_selection import train_test_split
df_train, df_test = train_test_split(df, stratify=df.Churn, test_size=.30, random_state=SEED)
|
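To see what the stratified split does, here is a minimal sketch on made-up data. The toy DataFrame, its `tenure` column, the 25% churn rate, and `SEED = 42` are all assumptions for illustration only; the practical's actual `df` and `SEED` are defined elsewhere.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

SEED = 42  # assumption: placeholder value, not the course's actual seed

# Hypothetical toy data: 1000 rows, roughly 25% churners
rng = np.random.default_rng(SEED)
toy = pd.DataFrame({
    "tenure": rng.integers(1, 72, size=1000),
    "Churn": rng.choice(["No", "Yes"], size=1000, p=[0.75, 0.25]),
})

# Same call as in the practical, applied to the toy data
toy_train, toy_test = train_test_split(
    toy, stratify=toy["Churn"], test_size=0.30, random_state=SEED
)

# Class proportions in the full set and in both parts should be nearly identical
for name, part in [("all", toy), ("train", toy_train), ("test", toy_test)]:
    print(name, part["Churn"].value_counts(normalize=True).round(3).to_dict())
```

Because `random_state` is fixed, rerunning the cell yields the same rows in each part, which is what makes results comparable between students.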
Baseline Model
- What features should be used?
  - If only I did not skip the EDA ...
- What data preparation steps are needed?
  - We have not covered this yet, but at a minimum, scale the data.
- Which classifiers to use?
  - Try 6-10 classifiers and compare the results.
- What is the evaluation metric?
  - Today we will use accuracy, but this has issues (for example, it can be misleading when the target classes are imbalanced).
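The checklist above can be sketched roughly as follows. This is an illustration under stated assumptions, not the practical's solution: the data is synthetic (`make_classification`), the three real classifiers are an arbitrary sample rather than the 6-10 suggested, and `SEED = 42` is a placeholder. A `DummyClassifier` that always predicts the majority class is added as a reference point for accuracy.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

SEED = 42  # assumption: placeholder value, not the course's actual seed

# Synthetic, imbalanced stand-in for the churn data (about 75% / 25%)
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.75, 0.25], random_state=SEED
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.30, random_state=SEED
)

# Arbitrary sample of classifiers; extend this dict to try more
classifiers = {
    "majority-class dummy": DummyClassifier(strategy="most_frequent"),
    "logistic regression": LogisticRegression(max_iter=1000),
    "k-nearest neighbours": KNeighborsClassifier(),
    "decision tree": DecisionTreeClassifier(random_state=SEED),
}

results = {}
for name, clf in classifiers.items():
    # The scaler is fit on the training data only, inside the pipeline
    model = make_pipeline(StandardScaler(), clf)
    model.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, model.predict(X_test))

# Report test accuracy, best first
for name, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:22s} accuracy = {acc:.3f}")
```

The dummy scores roughly the share of the majority class without learning anything, which is one reason accuracy alone can flatter a model on an imbalanced target.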