Use a mixture of techniques to study user behaviour on subreddits known to be popular with QAnon users.
This assignment is prompted by the paper "Characterizing Reddit Participation of Users Who Engage in the QAnon Conspiracy Theories", whose authors collated the dataset that we will use.
The dataset is large (2.3 GB as uncompressed CSV), so I will post it on Slack. It consists of five tables:
- authors: List of 13,182 Reddit users who commented/submitted to QAnon-identified subreddits. No missing values or other issues.
- comments: List of 10,831,922 comments with full text.
- submissions: List of 2,099,875 posts with full text.
- subreddits: List of 12,987 subreddits where at least two QAnon-enthusiastic users have made a submission.
- paper: List of 19 subreddits, identified in the paper [Appendix A, 1], where QAnon users were more active.

The paper looks at interaction (submissions and comments) over three periods, each 319 days long. The three periods are:

- 4chan up to the Reddit ban of QAnon-focused subreddits.

We won't repeat the analysis covered in the paper; instead, we will try to see what other information we can extract.
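
Because the tables are large and the paper works with fixed 319-day windows, it may help to stream a table row by row and bucket each row by period instead of loading everything into memory. The following is a minimal sketch of that idea using only the standard library. The column name `created_utc`, the assumption that it holds Unix timestamps, the assumption that the three periods are consecutive, and the `START` date are all placeholders chosen for illustration; they are not taken from the dataset or the paper and should be replaced with the real values.

```python
import csv
import io
from collections import Counter
from datetime import datetime, timedelta, timezone

PERIOD_DAYS = 319  # period length stated in the assignment text
N_PERIODS = 3      # number of periods stated in the assignment text


def period_index(ts, start):
    """Return 0, 1 or 2 for the period containing ts, or None if outside.

    Assumes the periods are consecutive and begin at `start`.
    """
    if ts < start:
        return None
    idx = (ts - start) // timedelta(days=PERIOD_DAYS)
    return idx if idx < N_PERIODS else None


def count_by_period(rows, start, time_col="created_utc"):
    """Stream dict-like rows and count how many fall in each period.

    `time_col` is a hypothetical column name holding Unix timestamps.
    """
    counts = Counter()
    for row in rows:
        ts = datetime.fromtimestamp(int(row[time_col]), tz=timezone.utc)
        idx = period_index(ts, start)
        if idx is not None:
            counts[idx] += 1
    return counts


# Placeholder start date for the first period; NOT the date used in the paper.
START = datetime(2000, 1, 1, tzinfo=timezone.utc)

# Tiny in-memory stand-in for a real CSV file, for illustration only.
demo_csv = io.StringIO(
    "author,created_utc\n"
    "a,946684799\n"    # one second before START: outside all periods
    "a,946684800\n"    # exactly START: first period
    "b,974246399\n"    # last second of the first period
    "b,974246400\n"    # first second of the second period
    "c,1001808000\n"   # first second of the third period
    "c,1029369600\n"   # first second after the third period: outside
)

# csv.DictReader yields one row at a time, so memory use stays small.
counts = count_by_period(csv.DictReader(demo_csv), START)
print(dict(counts))
```

For a real table, the `io.StringIO` stand-in would be replaced by a file opened with `open(path, newline="", encoding="utf-8")`, where `path` points at wherever you saved the CSV.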
Below I have tried to break the analysis into separate tasks (with approximate weighting) which, except for the first task, are relatively independent. Your notebook should have a level 2 section covering each task.
Upload your notebook with each section/task implemented as described above.