Task 5: Sample Dataset (~5%)

Create a section in your notebook called Sample Dataset.


The dataset is too large to analyse fully, so we will perform the topic modelling analysis for a single subreddit in the df_submissions dataframe. Pick a subreddit from the top 20 subreddits ordered by number of submissions. I picked TruthLeaks and suggest you use it too, but feel free to try another subreddit.
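
One way to list the candidate subreddits is to count submissions per subreddit and keep the 20 largest. The sketch below is only illustrative: it assumes the subreddit name is stored in a column called "subreddit" (check your own column names), and it builds a tiny stand-in for df_submissions so that it runs on its own. The name top_subreddits is also just a suggestion.

```python
import pandas as pd

# Toy stand-in for df_submissions so this cell runs on its own.
# In your notebook, use the real df_submissions loaded in an earlier task.
# ASSUMPTION: the subreddit name lives in a column called "subreddit".
df_submissions = pd.DataFrame({
    "subreddit": ["TruthLeaks", "news", "TruthLeaks", "politics", "news", "TruthLeaks"],
    "title": ["a", "b", "c", "d", "e", "f"],
})

# value_counts() counts rows per subreddit and sorts from largest to smallest;
# head(20) keeps the 20 subreddits with the most submissions.
top_subreddits = df_submissions["subreddit"].value_counts().head(20)
print(top_subreddits)
```

Any subreddit that appears in the index of top_subreddits is a valid choice for the rest of this task.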

Once you have picked your subreddit, create a new dataframe called df from the df_submissions dataframe by selecting all rows where: