Today we covered a broad range of techniques for analyzing unstructured text. Yesterday, the group exercise was designed to help you get to know each other better and explore possible topics for a group research project that you might launch during the second week by designing a study that would employ digital trace data. Today, we are going to have a more structured group exercise designed to compare the strengths and weaknesses of different text analysis techniques when they are applied to the same dataset.

  1. Divide yourselves into groups of four by counting off in order around the room.

  2. Load the dataframe of tweets by President Trump that we analyzed during the discussion of Dictionary-Based text methods:


Note that this data frame also includes counts of the number of times each of his tweets were retweeted or “liked.”

  1. Use at least two of the techniques we discussed this morning to pull out features from the text of Trump’s tweets (e.g. substantive themes, topics, sentiment).

  2. Work together to identify whether any features of Trump’s Twitter language predict the number of retweets or likes his messages receive.

  3. Load the dataframe of daily approval ratings for President Trump from the five-thirty-eight website’s github:

  1. Work together to determine whether there are any features of Trump’s Twitter language that have an association with his overall approval ratings.

  2. Produce a visualization that describes the findings of your analysis. Unlike yesterday, your group will be asked to share this visualization with the rest of your Institute in the 15 minutes before the visiting speaker’s lecture. Please email a copy of your presentation to the organizer of your institute so that it can be displayed on a large screen.