Base Learners/Estimators

First I load a selection of heterogeneous learners. Base learners with different inductive biases tend to make different kinds of errors, and that diversity is exactly what a stacked ensemble can exploit.

I imported the following:

from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from xgboost import XGBClassifier

but I ended up not using SVC because I am impatient and it took too long to train. Then I created a dictionary of the models:

models = {
    "LR": LogisticRegression(max_iter=1000),
    "DT": DecisionTreeClassifier(random_state=SEED),
    "KNN": KNeighborsClassifier(),
    "RF": RandomForestClassifier(random_state=SEED),
    "ET": ExtraTreesClassifier(random_state=SEED),
    "XGB": XGBClassifier(use_label_encoder=False, eval_metric='logloss', random_state=SEED)
}

and imported the usual functions to score and report on performance

from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
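These metric functions are not used until a model is scored on held-out data, so as a quick illustration of how they fit together, here is a minimal sketch on a synthetic dataset (the dataset, split, and classifier here are placeholders, not the setup used in this post):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

# Placeholder data standing in for the post's actual train/test split
X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
pred = clf.predict(X_te)

print(accuracy_score(y_te, pred))        # single overall number
print(confusion_matrix(y_te, pred))      # per-class error breakdown
print(classification_report(y_te, pred)) # precision/recall/F1 per class
```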

Now I loop over each of the individual learners to see how they perform:

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=SEED)
for name, model in models.items():
    # default scoring for classifiers is accuracy
    scores = cross_val_score(model, df_train, y_train, cv=cv)
    print(name, scores.mean())
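Since `df_train`, `y_train`, and `SEED` are defined earlier in the post, here is a self-contained version of the same loop on a synthetic dataset (with `SEED = 42` as a placeholder), which also reports the fold standard deviation so differences between learners can be judged against the CV noise:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score, StratifiedKFold

SEED = 42  # placeholder; the post defines its own SEED
# Synthetic stand-in for df_train / y_train
X, y = make_classification(n_samples=300, random_state=SEED)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "DT": DecisionTreeClassifier(random_state=SEED),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=SEED)
results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv)
    results[name] = scores.mean()
    # mean accuracy plus spread across the 10 folds
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```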

Next I will build a stacked model and see if it improves on these baseline scores.
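As a rough preview of the idea, scikit-learn ships a `StackingClassifier` that trains a meta-learner on out-of-fold predictions from the base learners. This is only a minimal sketch on synthetic data with two base learners; the actual construction used later may differ:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, StratifiedKFold

SEED = 42  # placeholder seed
X, y = make_classification(n_samples=300, random_state=SEED)

estimators = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=SEED)),
    ("knn", KNeighborsClassifier()),
]
stack = StackingClassifier(
    estimators=estimators,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,  # internal CV used to generate the meta-features
)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)
score = cross_val_score(stack, X, y, cv=cv).mean()
print(score)
```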