Our curriculum for text analysis covered a broad range of techniques for analyzing unstructured text. Yesterday, the group exercise was designed to help you get to know each other better and explore possible topics for a group research project that you might launch during the second week by designing a study that would employ digital trace data. Today, we are going to have a more structured group exercise designed to compare the strengths and weaknesses of different text analysis techniques when they are applied to a single dataset.

  1. Everyone will be randomized into groups of 5 people.

  2. Load the dataframe of tweets by President Trump that we analyzed during the discussion of Dictionary-Based text methods:

load(url("https://cbail.github.io/Trump_Tweets.Rdata"))

Note that this data frame also includes counts of the number of times each of his tweets were retweeted or “liked.” NOTE: IF YOU PREFER TO USE A DATASET YOUR GROUP COLLECTED DURING YESTERDAY’S EXERCISE, IT IS OK TO USE THIS AS WELL.

  1. Use at least two of the techniques discussed in the videos to pull out features from the text of Trump’s tweets (e.g. substantive themes, topics, sentiment), or the text-based dataset you collected yesterday.

  2. Work together to identify whether any features of Trump’s Twitter language predict the number of retweets or likes his messages receive. If you are working with other Twitter data, feel free to predict likes by other users as well.

  3. Load the dataframe of daily approval ratings for President Trump from the five-thirty-eight website’s github:

trump_approval<-read.csv("https://projects.fivethirtyeight.com/trump-approval-data/approval_topline.csv")
  1. Work together to determine whether there are any features of Trump’s Twitter language that have an association with his overall approval ratings.

  2. Produce a visualization that describes the findings of your analysis and share it in our slack channel with a brief description of what you did.