Mistakes were Made, but not by Me

Aim

This is a small (<30 lines of code) text (or filename) cleaning task that requires regex and/or fuzzy pattern matching.

In last year's Data Mining 2 class I asked students to submit two files as part of a particular assignment. The files were to be named 01-Clean.ipynb and 02-Model.ipynb.

Like all students (except you, of course) they ignored the instructions and did their own thing: some used different file names, and some even decided to "help" by compressing their files :-).

So I had to write code to redress this situation, and I want you to go through the same pain.

Details

nasa
├── Student-00
│   ├── 01-Clean.ipynb
│   └── 02-Model.ipynb
├── Student-01
│   ├── 01-Clean.ipynb
│   └── 02-Model.ipynb
├── Student-02
│   └── NASA-Software-Defect-Assignment.zip
├── Student-07
│   ├── clean.ipynb
│   └── model.ipynb
...
├── Student-14
│   └── NASA_ASSIGNMENT.zip
...
├── Student-18
│   ├── 01-Clean.ipynb
│   └── 02-ModelVersion2.ipynb
├── Student-19
│   ├── 01-Clean.ipynb
│   └── 02-Model.ipynb
├── Student-20
│   └── 02-Model.ipynb
├── Student-21
│   ├── 01-Clean.ipynb
│   └── 02-model.ipynb
├── Student-22
│   └── NasaSoftwareDefection.zip
...
└── Student-35
    ├── 01\ -\ Cleaning.ipynb
    └── 02\ -\ Pipeline.ipynb

Parse this tree to produce a dataframe similar to the one shown below.
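As a starting point, here is a minimal sketch of the name-matching step only, using Python's standard re and difflib modules. It is not the reference solution: the helper names (normalise, canonical_name), the 0.6 cutoff, and the rule that a leading 01 or 02 decides the file's role (so that 02 - Pipeline.ipynb counts as the model notebook) are all assumptions made for illustration. Zip archives are simply skipped here and would need separate handling.

```python
import re
from difflib import get_close_matches

# Canonical names from the assignment brief.
EXPECTED = ["01-Clean.ipynb", "02-Model.ipynb"]

# Assumption: a leading 1/01 or 2/02 identifies the notebook's role.
PREFIX = re.compile(r"^\s*0*([12])(?!\d)")
BY_NUMBER = {"1": "01-Clean.ipynb", "2": "02-Model.ipynb"}


def normalise(name: str) -> str:
    """Drop the .ipynb extension, lower-case, and keep only letters and digits."""
    stem = re.sub(r"\.ipynb$", "", name, flags=re.IGNORECASE)
    return re.sub(r"[^a-z0-9]", "", stem.lower())


def canonical_name(name: str, cutoff: float = 0.6):
    """Return the expected file name that `name` most likely stands for, or None."""
    if not name.lower().endswith(".ipynb"):
        return None  # e.g. zip archives: not handled in this sketch

    # Step 1 (regex): trust a leading 01/02 if there is one.
    match = PREFIX.match(name)
    if match:
        return BY_NUMBER[match.group(1)]

    # Step 2 (fuzzy): compare normalised names and keep the closest one.
    keys = {normalise(expected): expected for expected in EXPECTED}
    hits = get_close_matches(normalise(name), list(keys), n=1, cutoff=cutoff)
    return keys[hits[0]] if hits else None


if __name__ == "__main__":
    for submitted in ["clean.ipynb", "02-ModelVersion2.ipynb", "NASA_ASSIGNMENT.zip"]:
        print(submitted, "->", canonical_name(submitted))
```

The regex step runs first because a numeric prefix is a stronger signal than string similarity; fuzzy matching is the fallback for names such as clean.ipynb that carry no number at all. Collecting the results into a dataframe, and deciding what to do with missing or compressed submissions, is left to you.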

Result of cleaning NASA assignment submission files.

Where: