Loading Data Churn_-_01_-_Import.ipynb

The Import step for this data set is simple:

Step 0: Create project tree

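The snippets in this notebook assume that two names were set in an earlier cell: COLAB (is the notebook running on Google Colab?) and ROOT (the project root folder). A minimal sketch of such a cell; the exact ROOT path is an assumption, adjust it to your own layout:

import os

# Detect whether this notebook is running on Google Colab
try:
  import google.colab  # noqa: F401
  COLAB = True
except ImportError:
  COLAB = False

# Project root: on Google Drive when on Colab, a local folder otherwise
# (assumed layout, change to suit your setup)
ROOT = "/content/gdrive/MyDrive/datasets/churn" if COLAB else "churn"

With COLAB and ROOT in place, Step 0 creates the project tree: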
import os

if COLAB:
  from google.colab import drive
  # Mount Google Drive (only needed once per session)
  if not os.path.isdir("/content/gdrive"):
    drive.mount("/content/gdrive")
  # Ensure the shared datasets folder and the project root exist
  d = "/content/gdrive/MyDrive/datasets"
  if not os.path.isdir(d): os.makedirs(d)
  if not os.path.isdir(ROOT): os.makedirs(ROOT)

def makedirs(d):
  # Create subfolder d of the project root if it does not already exist
  if COLAB:
    os.makedirs(f"{ROOT}/{d}", exist_ok=True)
  else:
    os.makedirs(f"{ROOT}/{d}", mode=0o777, exist_ok=True)

for d in ['orig', 'data', 'output']: makedirs(d)
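After this cell runs, the project tree ROOT/orig, ROOT/data and ROOT/output exists whether the notebook is running on Colab or locally.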

Step 1: Get data

Put the original data into the folder orig. Either download the files from the links provided (for this project we have a data file (CSV) and a datasheet file (YAML) specifying the label encoding), or run something like:

import urllib.request

BASE_URL = "https://SETU-DataMining2.github.io/live/resources/churn"

for filename in ['data.csv', 'datasheet.yaml']:
  source = f"{BASE_URL}/{filename}"
  target = f"{ROOT}/orig/{filename}"

  # Download only if we do not already have a local copy
  if not os.path.isfile(target):
    print(f"Downloading remote file {filename}")
    urllib.request.urlretrieve(source, target)
  else:
    print(f"Using local copy of {filename}")

You need to: