Datasets

Create a subsection ## Datasets.

In the Datasets section, read the metadata for the okcupid-stem dataset as follows:

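from openml.datasets import get_dataset

# OpenML dataset ID used for okcupid-stem in this lab
did = 41440

# Download the dataset description (metadata) from OpenML
dataset = get_dataset(did)

print(
    f"This is dataset '{dataset.name}', the target feature is "
    f"'{dataset.default_target_attribute}'"
)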

Then get the data as a dataframe (last week we chose a NumPy array instead).

target = dataset.default_target_attribute
df, _, categorical_indicator, attribute_names = dataset.get_data(dataset_format="dataframe")
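
Because no target is passed to get_data, the target column stays inside df, and categorical_indicator holds one boolean per column in the same order as attribute_names. The sketch below shows one way these pieces could be combined afterwards. It runs on a small stand-in dataframe so that it works offline; the column names and values are made up for illustration and are not taken from okcupid-stem.

```python
import pandas as pd

# Stand-in for the `df` returned by dataset.get_data(); the columns and
# values are hypothetical, chosen only to make the example self-contained.
df = pd.DataFrame({
    "age": [25, 31, 42],
    "body_type": pd.Categorical(["fit", "average", "fit"]),
    "job": pd.Categorical(["stem", "non_stem", "stem"]),
})
target = "job"  # in the lab this comes from dataset.default_target_attribute
attribute_names = list(df.columns)
categorical_indicator = [False, True, True]  # one flag per column, same order

# Names of the categorical columns, excluding the target itself
categorical_features = [
    name
    for name, is_cat in zip(attribute_names, categorical_indicator)
    if is_cat and name != target
]

# Separate the feature matrix from the target column
X = df.drop(columns=[target])
y = df[target]

print(categorical_features)
print(X.shape, y.shape)
```

With the real dataset, the same lines can be applied directly to the df, target, categorical_indicator and attribute_names obtained above.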

Check the import with the usual shape and head calls:

print(df.shape)
df.head(10)