Building a Dashboard

Aim

The purpose of this assignment is to build a dashboard that allows non-technical users to explore a dataset.

The dataset was selected for this assignment, in part, because it is too large to build a useful dashboard (that is, one that is well organised, with appropriate visualisations and statistics to allow effective EDA) covering the entire dataset. So you need to focus on particular aspects of the dataset.

Before we discuss the dataset, it is important to differentiate between Exploratory Data Analysis (EDA) and building a dashboard. Yes, in both you are trying to build a summary that represents a dataset, but in EDA you do more of the analysis and interpretation of results yourself, whereas when building a dashboard you are developing an interface that helps others perform their own EDA. So when building a dashboard your coverage of the dataset will be greater (but it still does not have to be 100% coverage).

Background to Dataset (TIMSS 2023)

TIMSS 2023 is the eighth assessment cycle of TIMSS, the Trends in International Mathematics and Science Study. TIMSS 2023 was conducted with students in 64 countries and 8 benchmarking systems. The study is conducted every four years and given to students in grade 4 (4th class in Irish primary schools) and grade 8 (2nd year in Irish secondary schools).

The results of TIMSS 2023 were published in December 2024, and the collated data was made publicly available on February 6, 2025. We want to explore this data and see what insights we can gain from it.

The dataset is huge (2.9+ GB including documentation). For each of the 64 countries in the study, and for each of the two grades, there are eight files (T23 User Guide, Exhibit 2.4, p. 52).
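Multiplying these counts out gives the number of country-level data files. The "+" in the 1024+ figure quoted later presumably covers the 8 benchmarking systems, which this count leaves out:

$$
64 \text{ countries} \times 2 \text{ grades} \times 8 \text{ files} = 1024 \text{ files}
$$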

So to keep this assignment manageable we will:

As a result, the 1024+ data files in the TIMSS study reduce to the following eight files:

Filename              Description
SPSS/bcgirlm8.sav     School context data
SPSS/bsairlm8.sav     Student achievement data
SPSS/bspirlm8.sav     Student process data
SPSS/bsrirlm8.sav     Within-country scoring reliability data
SPSS/bsgirlm8.sav     Student context data
SPSS/bstirlm8.sav     Student-teacher linkage data
SPSS/btmirlm8.sav     Mathematics teacher context data
SPSS/btsirlm8.sav     Science teacher context data (we will ignore this)
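If you prefer to load these files in one go, here is a minimal sketch. It assumes the .sav files sit in orig/SPSS (as in the folder tree in Step 1) and that pyreadstat is installed. `FILES`, `IGNORED`, `files_to_load` and `load_all` are hypothetical names, not part of the assignment notebooks.

```python
from pathlib import Path

import pandas as pd

# The eight grade 8 files listed in the table above, in table order.
FILES = [
    "bcgirlm8.sav",
    "bsairlm8.sav",
    "bspirlm8.sav",
    "bsrirlm8.sav",
    "bsgirlm8.sav",
    "bstirlm8.sav",
    "btmirlm8.sav",
    "btsirlm8.sav",
]
# The science teacher context file is not used in this assignment.
IGNORED = {"btsirlm8.sav"}


def files_to_load() -> list:
    """Return the filenames we actually use, keeping the table order."""
    return [name for name in FILES if name not in IGNORED]


def load_all(spss_dir: Path) -> dict:
    """Read each file we use into a dict keyed by its stem, e.g. 'bcgirlm8'.

    pd.read_spss needs pyreadstat installed. This reads every column of every
    file, so expect it to take a while and use a lot of memory.
    """
    return {Path(name).stem: pd.read_spss(spss_dir / name) for name in files_to_load()}
```

For example, `frames = load_all(Path("orig/SPSS"))` run from the root folder. Because of the size of these files, in practice you may prefer to read only the columns you need.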

Notes:

To understand this dataset you will need the following:

You might also find the following report, produced by the Irish team, of use or interest: TIMSS_2023_National_Report_Ireland.pdf, which examines the performance of students in Ireland.

Reading the dataset

The dataset is published in two formats: SPSS and SAS. We will use the SPSS format, which contains both the data and meta information about each column (type, categories, etc.).
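One way to see that meta information without loading any rows is to call pyreadstat directly with `metadataonly=True`. This returns an empty DataFrame plus a metadata object. Its `column_names` and `column_labels` attributes hold the variable names and their descriptions, and `variable_value_labels` holds the value labels. The sketch assumes pyreadstat is installed (installation is covered in the next step). `labels_table` and `read_metadata` are hypothetical helper names.

```python
import pandas as pd


def labels_table(column_names, column_labels) -> pd.DataFrame:
    """Pair each variable name with its descriptive label (missing labels become '')."""
    return pd.DataFrame(
        {
            "variable": list(column_names),
            "label": [label or "" for label in column_labels],
        }
    )


def read_metadata(path: str) -> pd.DataFrame:
    """Read only the SPSS metadata (no data rows) and return a name/label table."""
    import pyreadstat  # imported here so labels_table can be used without it

    _, meta = pyreadstat.read_sav(path, metadataonly=True)
    return labels_table(meta.column_names, meta.column_labels)
```

A searchable name/label table like this is often quicker to browse than the raw file when deciding which variables to keep.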

Pandas can read SPSS .sav files once the Python module pyreadstat is installed, so first install pyreadstat using one of the following commands.

From a Jupyter notebook cell:

!conda install conda-forge::pyreadstat

From a terminal, with conda:

conda install conda-forge::pyreadstat

or with pip:

pip install pyreadstat
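After installing, a quick check like the following confirms that the module can be found. Without it, pd.read_spss raises an ImportError. If you installed from inside a running notebook, you may need to restart the kernel first. `has_module` is a hypothetical helper.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if `name` can be imported in the current environment."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("pyreadstat installed:", has_module("pyreadstat"))
```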

Then, to read in a given data file, we use the pd.read_spss function.
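Here is a minimal sketch, assuming pyreadstat is installed. `usecols` limits the read to the variables you name. `convert_categoricals` controls whether coded answers are replaced by their SPSS value labels (the default, True) or left as numeric codes. `read_subset` and `memory_mb` are hypothetical helpers; the second is just a convenient way to see how much memory a column selection saves.

```python
import pandas as pd


def read_subset(path, columns, labelled=True):
    """Read only `columns` from an SPSS .sav file (pyreadstat must be installed).

    labelled=True converts coded answers to pandas categoricals using the value
    labels stored in the file; labelled=False keeps the raw numeric codes.
    """
    return pd.read_spss(path, usecols=columns, convert_categoricals=labelled)


def memory_mb(df: pd.DataFrame) -> float:
    """Approximate in-memory size of a DataFrame in megabytes."""
    return df.memory_usage(deep=True).sum() / 1e6
```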

We also want to read the codebook to understand the data. Since this is an Excel workbook, we can use the pd.read_excel function to read it.
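The sketch below reads every sheet with `sheet_name=None`, which returns a dict of DataFrames keyed by sheet name (reading .xlsx files also needs the openpyxl package). It then searches a sheet for a keyword. It deliberately makes no assumption about the codebook's sheet or column names, which you should check yourself. `read_codebook` and `search_codebook` are hypothetical names.

```python
import pandas as pd


def read_codebook(path: str) -> dict:
    """Read every sheet of the codebook workbook into a dict keyed by sheet name."""
    return pd.read_excel(path, sheet_name=None)


def search_codebook(sheet: pd.DataFrame, keyword: str) -> pd.DataFrame:
    """Return the rows where any cell contains `keyword` (case-insensitive)."""
    text = sheet.astype(str)
    # One boolean column per original column, then keep rows with any match.
    mask = text.apply(
        lambda col: col.str.contains(keyword, case=False, regex=False)
    ).any(axis=1)
    return sheet[mask]
```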

Building a Dashboard

We want to build the dashboard using the Python library Streamlit.
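Here is a minimal skeleton of app/streamlit_app.py, offered only as a sketch. The choice of file (bsgirlm8.sav), its relative path and the widgets are illustrative assumptions, and `filter_frame` and `load` are hypothetical names. `st.cache_data` keeps the loaded DataFrame in memory across reruns. This matters because Streamlit re-executes the whole script every time a widget changes.

```python
import sys

import pandas as pd


def filter_frame(df: pd.DataFrame, column: str, selected: list) -> pd.DataFrame:
    """Keep rows whose `column` value is in `selected`; keep all rows if nothing is selected."""
    if not selected:
        return df
    return df[df[column].isin(selected)]


def main() -> None:
    import streamlit as st  # imported here so filter_frame can be tested without Streamlit

    @st.cache_data
    def load(path: str) -> pd.DataFrame:
        # Cached: the file is read once rather than on every widget interaction.
        return pd.read_spss(path)

    st.title("TIMSS 2023 Grade 8 Explorer")
    df = load("orig/SPSS/bsgirlm8.sav")  # illustrative file, relative to the root folder

    column = st.sidebar.selectbox("Variable", list(df.columns))
    options = df[column].dropna().unique().tolist()
    selected = st.sidebar.multiselect("Show only", options)
    view = filter_frame(df, column, selected)

    st.write(f"{len(view):,} of {len(df):,} students shown")
    st.bar_chart(view[column].value_counts())


# `streamlit run` imports streamlit before executing this script, so the UI only
# starts there; running or importing the file with plain Python skips it.
if "streamlit" in sys.modules:
    main()
```

Run it from the root folder with `streamlit run app/streamlit_app.py`. Keeping the filtering logic in a plain function such as `filter_frame` makes it easy to test outside the app.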

Even taking the limited selection of files that we have, there is still too much data to cover (approximately 3542 variables). In fact, one of the reasons for picking this dataset was that it has too many variables to deal with in the limited time you should allocate to a single assignment.
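Assuming the files are loaded into a dict of DataFrames keyed by file name, a small summary like this shows how the variables are spread across the files. Variables that appear in more than one file are usually identifiers, which are the natural keys for merging. Because they are counted once per file, the column totals slightly overstate the number of distinct variables. `variable_summary` is a hypothetical helper.

```python
import pandas as pd


def variable_summary(frames: dict) -> pd.DataFrame:
    """One row per file: number of variables, and how many also appear in another file."""
    # Map each column name to the set of files it appears in.
    seen_in = {}
    for name, df in frames.items():
        for col in df.columns:
            seen_in.setdefault(col, set()).add(name)

    rows = []
    for name, df in frames.items():
        shared = sum(len(seen_in[col]) > 1 for col in df.columns)
        rows.append({"file": name, "variables": df.shape[1], "shared": shared})
    return pd.DataFrame(rows)
```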

So you will need to pick aspects that are of interest and focus on these. For example (and don't be limited to these suggestions):

Steps

Step 1: Assignment Setup

Create a root folder for this assignment and download the notebook T23_G8_-_00_-_Assignment.ipynb. Run all cells to:

You should end up with the following tree structure.

.
├── T23_G8_-_00_-_Assignment.ipynb
├── T23_G8_-_01_-_Import.ipynb
├── TIMSS_Dashboard.zip
├── app
│   └── streamlit_app.py
├── data
└── orig
    ├── SPSS
    │   ├── bcgirlm8.sav
    │   ├── bsairlm8.sav
    │   ├── bsgirlm8.sav
    │   ├── bspirlm8.sav
    │   ├── bsrirlm8.sav
    │   ├── bstirlm8.sav
    │   ├── btmirlm8.sav
    │   └── btsirlm8.sav
    ├── docs
    │   ├── T23_G8_Codebook.xlsx
    │   └── T23_User_Guide_International_Database.pdf
    └── extra
        └── TIMSS_2023_National_Report_Ireland.pdf
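Running the setup notebook should produce this structure. If you need to recreate the empty folders by hand, here is a sketch with pathlib. It creates folders only, not the data files, and `make_tree` is a hypothetical name.

```python
from pathlib import Path


def make_tree(root: Path) -> list:
    """Create the folders shown in the tree above (no files) and return their paths."""
    folders = [
        root / "app",
        root / "data",
        root / "orig" / "SPSS",
        root / "orig" / "docs",
        root / "orig" / "extra",
    ]
    for folder in folders:
        # parents=True creates orig/ as needed; exist_ok=True makes reruns harmless.
        folder.mkdir(parents=True, exist_ok=True)
    return folders
```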

Step 2: Import the Dataset

The notebook T23_G8_-_01_-_Import.ipynb shows some code that I used to initially explore the dataset. You should review this code and then complete the cleaning process. Some ideas for initial cleaning steps are given to help you get started.
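As an example of one possible cleaning step (not taken from the import notebook), the sketch below blanks out missing-value labels. When pd.read_spss converts value labels to categoricals, non-responses appear as ordinary categories. Removing them turns those cells into NaN, so they do not show up as bars in charts. The entries in `MISSING_LABELS` are placeholders: look up the actual missing-value labels in the codebook. `blank_missing` is a hypothetical helper.

```python
import pandas as pd

# Placeholder labels (an assumption): check the codebook for the exact wording.
MISSING_LABELS = ["Omitted or invalid", "Not administered"]


def blank_missing(df: pd.DataFrame, labels=MISSING_LABELS) -> pd.DataFrame:
    """Return a copy where cells equal to any missing-value label become NaN.

    For categorical columns the labels are also removed from the categories,
    so they no longer appear as (empty) levels in charts and tables.
    """
    out = df.copy()
    for col in out.columns:
        s = out[col]
        if isinstance(s.dtype, pd.CategoricalDtype):
            drop = [c for c in s.cat.categories if c in labels]
            # remove_categories sets values in the removed categories to NaN.
            out[col] = s.cat.remove_categories(drop)
        elif not pd.api.types.is_numeric_dtype(s):
            out[col] = s.where(~s.isin(labels))
    return out
```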

Step 3: Explore the Dataset

You should perform a cursory exploratory data analysis to decide which aspects of the dataset you wish to focus on, and how your dashboard should be structured as a result.
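A cross-tabulation is a quick way to compare answers across groups during this exploration. The sketch below reports row percentages using `pd.crosstab(..., normalize="index")`. You choose the column names in each call, and `share_by_group` is a hypothetical helper. Note that it is unweighted, so it describes the sampled students rather than the population. Check the User Guide for how the survey weights should be used.

```python
import pandas as pd


def share_by_group(df: pd.DataFrame, group: str, answer: str) -> pd.DataFrame:
    """Percentage of each `answer` category within each `group` (each row sums to 100)."""
    return (pd.crosstab(df[group], df[answer], normalize="index") * 100).round(1)
```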

Step 4: Build the Dashboard

Here you ... well, build the dashboard.

Step 5: Create Archive to Upload to Moodle

Run all cells in the notebook T23_G8_-_00_-_Assignment.ipynb to generate the ZIP archive, then upload it to Moodle using the following link.
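The notebook builds the archive for you. To illustrate what such a step involves, here is a sketch using the standard zipfile module. The notebook decides which folders the real archive includes, so zipping only app/ by default is an assumption. `zip_folders` is a hypothetical name.

```python
import zipfile
from pathlib import Path


def zip_folders(root: Path, zip_path: Path, include=("app",)) -> list:
    """Zip the given sub-folders of `root`, storing paths relative to `root`."""
    names = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for folder in include:
            for path in sorted((root / folder).rglob("*")):
                if path.is_file():
                    arcname = path.relative_to(root).as_posix()
                    zf.write(path, arcname)
                    names.append(arcname)
    return names
```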

Moodle Assignment