EDA - Attributes
For each attribute find the following information.
- The attribute type, e.g. nominal, ordinal, numeric.
- Percentage of missing values in the data.
- For numeric attributes: maximum, minimum, mean and standard deviation.
- The type of distribution that the numeric attribute seems to follow (e.g. normal).
- Are there any values of the attribute that occur in only one record (i.e. unique values)?
- Study the histogram of the attribute and note how the attribute seems to influence the risk of churning.
- Are there any outliers for the attribute under consideration?
If you suspect that an attribute has outliers, consider using box plots to detect them.
- Which attributes seem to be linked to the risk of churning?
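
The checklist above can be partly automated. The following is a minimal sketch using pandas, not a prescribed solution. The helper names `summarize_attributes` and `churn_rate_by_value`, the toy data, and the column names `tenure`, `plan` and `churn` are all hypothetical. Outliers are flagged with the 1.5 * IQR rule, which is the same rule a standard box plot uses for its whiskers, and skewness is reported only as a rough hint about the shape of the distribution.

```python
import pandas as pd


def summarize_attributes(df: pd.DataFrame) -> pd.DataFrame:
    """Build one summary row per attribute (column) of df."""
    rows = []
    for name in df.columns:
        col = df[name]
        # Booleans are treated as categorical here, not numeric.
        is_numeric = (pd.api.types.is_numeric_dtype(col)
                      and not pd.api.types.is_bool_dtype(col))
        counts = col.value_counts()  # missing values are excluded
        row = {
            "attribute": name,
            "dtype": str(col.dtype),
            "numeric": is_numeric,
            "missing_pct": 100.0 * col.isna().mean(),
            # Number of distinct values that occur in exactly one record.
            "singleton_values": int((counts == 1).sum()),
        }
        if is_numeric:
            q1, q3 = col.quantile(0.25), col.quantile(0.75)
            iqr = q3 - q1
            # Box-plot rule: points beyond 1.5 * IQR from the quartiles.
            outside = (col < q1 - 1.5 * iqr) | (col > q3 + 1.5 * iqr)
            row.update({
                "min": col.min(), "max": col.max(),
                "mean": col.mean(), "std": col.std(),
                "skew": col.skew(),
                "iqr_outliers": int(outside.sum()),
            })
        rows.append(row)
    return pd.DataFrame(rows).set_index("attribute")


def churn_rate_by_value(df: pd.DataFrame, attribute: str,
                        target: str = "churn") -> pd.Series:
    """Mean of a 0/1 target column for each value of an attribute."""
    return df.groupby(attribute)[target].mean()


# Hypothetical toy data; replace with the real churn data set.
toy = pd.DataFrame({
    "tenure": [1, 2, 2, 3, 3, 4, 4, 50],
    "plan": ["a", "b", "a", None, "a", "b", "a", "c"],
    "churn": [1, 0, 0, 0, 1, 0, 0, 1],
})
summary = summarize_attributes(toy)
rates = churn_rate_by_value(toy, "plan")
print(summary)
print(rates)
```

The attribute type (nominal, ordinal, numeric) still has to be judged by hand, since a dtype cannot distinguish, for example, an ordinal code from a true count. For the visual checks, `toy["tenure"].hist()` and `toy.boxplot(column="tenure")` draw a histogram and a box plot, provided matplotlib is installed.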