Baseline Model `Churm_-_03_-_BaseLine_Model.ipynb`

Now that you have a general understanding of the dataset you are working with, we can start to think about modelling.

Train/Test split

Since we want an unbiased measurement of our model quality, we perform the usual train/test split, with two slight modifications:

Use a stratified split (on the target) so that the train and test subsets have approximately the same percentage of samples of each target class as the complete set.
Use a fixed seed for the random number generator so that results can be compared between students (this is needed for grading in later practicals).

from sklearn.model_selection import train_test_split

df_train, df_test = train_test_split(df, stratify=df.Churn, test_size=.30, random_state=SEED)
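
To see what stratification buys us, the following minimal sketch compares class proportions before and after the split. It does not use the real churn data: the toy `df`, its `feature` column, the 80/20 class ratio and the value `SEED = 42` are all assumptions made for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

SEED = 42  # assumed value; use the seed fixed for the practical

# Toy stand-in for the churn data: 1000 rows, 20% churners (made-up numbers).
rng = np.random.default_rng(SEED)
df = pd.DataFrame({
    "feature": rng.normal(size=1000),
    "Churn": np.repeat([0, 1], [800, 200]),
})

df_train, df_test = train_test_split(
    df, stratify=df["Churn"], test_size=0.30, random_state=SEED
)

# Share of each target class in the full, train and test sets.
proportions = pd.DataFrame({
    "full": df["Churn"].value_counts(normalize=True),
    "train": df_train["Churn"].value_counts(normalize=True),
    "test": df_test["Churn"].value_counts(normalize=True),
})
print(proportions)
```

The three columns should be (nearly) identical. Dropping the `stratify` argument and re-running would typically let the train and test proportions drift apart, more so on small or heavily imbalanced datasets.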

Baseline Model

What features should be used?
- If only I did not skip the EDA ...
What data preparation steps are needed?
- We have not covered this yet, but at a minimum, scale the data.
Which classifiers to use?
- Try 6-10 classifiers and compare results.
What is the evaluation metric?
- Today we will use accuracy, but it can be misleading when the target classes are imbalanced.
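
The answers above can be sketched as a single loop: scale the features, fit several classifiers, and compare test accuracy. This is only an illustrative sketch, not the notebook's actual code. The data is synthetic (`make_classification` with an assumed 80/20 class ratio), `SEED = 42` is an assumed value, and the particular classifiers are just one possible selection. A majority-class `DummyClassifier` is included as a reference point.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

SEED = 42  # assumed value; use the seed fixed for the practical

# Synthetic, imbalanced stand-in for the churn data (about 80% / 20%).
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.8, 0.2], random_state=SEED
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.30, random_state=SEED
)

# One possible selection of classifiers; the dummy model always
# predicts the most frequent class and serves as a sanity check.
classifiers = {
    "majority class": DummyClassifier(strategy="most_frequent"),
    "logistic regression": LogisticRegression(max_iter=1000),
    "k-nearest neighbours": KNeighborsClassifier(),
    "decision tree": DecisionTreeClassifier(random_state=SEED),
    "random forest": RandomForestClassifier(random_state=SEED),
    "naive Bayes": GaussianNB(),
    "SVM (RBF kernel)": SVC(random_state=SEED),
}

scores = {}
for name, clf in classifiers.items():
    # The scaler is fitted on the training data only, inside the pipeline.
    model = make_pipeline(StandardScaler(), clf)
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

# Print the results from best to worst accuracy.
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:22s} {acc:.3f}")
```

Putting the scaler inside the pipeline keeps test-set statistics out of the preprocessing step. The majority-class row is worth reading first: on imbalanced data it should score close to the share of the majority class without learning anything, which is one reason accuracy alone can flatter a model.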