Stacked Learners
Aim
- Build a stacked model for the okcupid-stem dataset and compare it with basic classifiers (spoiler: no improvement AFAIK, but you might do a better job than I did).
- Use dataframes with sklearn functions (rather than numpy arrays).
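Both aims can be sketched with scikit-learn's `StackingClassifier`, fitted directly on a pandas DataFrame. This is a minimal sketch, not the notebook's final model: it uses synthetic data from `make_classification` as a stand-in for okcupid-stem (which is loaded later), and the base learners (random forest, k-nearest neighbours) and the logistic-regression meta-learner are illustrative choices.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for okcupid-stem (assumption: the real data is loaded later)
X_arr, y_arr = make_classification(
    n_samples=500, n_features=8, n_informative=5, random_state=0
)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(X_arr.shape[1])])
y = pd.Series(y_arr, name="target")

# DataFrames go straight into sklearn; the split keeps them as DataFrame/Series
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Level-0 (base) learners: an arbitrary, illustrative choice
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
]

# Level-1 (meta) learner, trained on 5-fold out-of-fold predictions of the base learners
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),
    cv=5,
)

# Compare each basic classifier with the stack on the same held-out split
scores = {}
for name, model in base_learners + [("stack", stack)]:
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

for name, acc in scores.items():
    print(f"{name:>6}: {acc:.3f}")
```

With `cv=5`, the meta-learner sees only out-of-fold predictions, so it cannot simply learn to trust whichever base model overfits the training set.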
Setup
Create a notebook Stacked_Learners.ipynb with the following structure.
Create the subsection ## Imports and Setup.
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import yaml, time, sys, os
from IPython.display import display, Markdown

# matplotlib 3.6 renamed the "seaborn-*" styles to "seaborn-v0_8-*";
# fall back to the old name on older versions
if "seaborn-v0_8-darkgrid" in plt.style.available:
    plt.style.use("seaborn-v0_8-darkgrid")
else:
    plt.style.use("seaborn-darkgrid")
pd.set_option('display.max_columns', None)
sns.set_style("darkgrid")

DATASET = "okcupid-stem"
# True when the notebook is running inside Google Colab
COLAB = 'google.colab' in sys.modules
if COLAB:
    ROOT = f"/content/gdrive/MyDrive/datasets/{DATASET.replace(' ','_')}/"
else:
    ROOT = "./"
DEBUG = True
```
Then create the project directory tree:
```python
if COLAB:
    from google.colab import drive
    # Mount Google Drive once per session
    if not os.path.isdir("/content/gdrive"):
        drive.mount("/content/gdrive")
    os.makedirs("/content/gdrive/MyDrive/datasets", exist_ok=True)
os.makedirs(ROOT, exist_ok=True)

def makedirs(d):
    # Create ROOT/d if it is missing; exist_ok makes this a no-op otherwise
    os.makedirs(ROOT + d, exist_ok=True)

for d in ['orig', 'data', 'output']: makedirs(d)
```
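The directory logic can be checked outside Colab. The sketch below is a small variant of the notebook's `makedirs` that takes the root as an argument (the `root` parameter is an assumption made for testability; the notebook reads the global `ROOT`) and runs it against a temporary directory from `tempfile.mkdtemp`.

```python
import os
import tempfile

def makedirs(root, d):
    """Create root/d (and any missing parents); do nothing if it already exists."""
    os.makedirs(os.path.join(root, d), exist_ok=True)

# Throwaway directory standing in for ROOT (assumption: run outside Colab)
root = tempfile.mkdtemp()
for d in ["orig", "data", "output"]:
    makedirs(root, d)
    makedirs(root, d)  # second call is a no-op thanks to exist_ok=True

created = sorted(os.listdir(root))
print(created)  # ['data', 'orig', 'output']
```

Calling it twice shows why `exist_ok=True` matters: re-running the setup cell does not raise `FileExistsError`.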
To hide the warnings that appear when using xgboost:
```python
import warnings
# Silence FutureWarning messages (e.g. from xgboost) for the rest of the session
warnings.simplefilter(action='ignore', category=FutureWarning)
```
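The filter only silences `FutureWarning`; other warning categories still reach the user. A minimal sketch that checks this with `warnings.catch_warnings(record=True)`, using made-up warning messages and a hypothetical helper `emit_both`:

```python
import warnings

def emit_both():
    """Raise one FutureWarning and one UserWarning; return the categories that got through."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")                  # record everything by default...
        warnings.simplefilter("ignore", FutureWarning)   # ...except FutureWarning, as in the notebook
        warnings.warn("deprecated parameter", FutureWarning)
        warnings.warn("something else", UserWarning)
    return [w.category.__name__ for w in caught]

print(emit_both())  # ['UserWarning']
```

The same `warnings.catch_warnings()` context manager can also scope the filter to a single block (for example around an xgboost fit) instead of the whole session, which keeps unrelated deprecation notices visible.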
Next we import the required dataset using the openml API ...