The past decade has witnessed a rapidly growing volume of digital data produced on the internet that describes human behavior and other objects of scholarly inquiry. As the figure below shows, recent decades have seen not only an increase in the amount of text-based data, but also an increase in the computing power needed to analyze it. Together, these two shifts hold the potential to significantly expand the scope of research in many different fields.
I will begin by discussing some of the positive aspects of digital trace data, and then move on to some of the challenges. In so doing, I draw upon Matt Salganik’s book Bit by Bit, which I highly recommend, not only for its more detailed discussion of digital trace data but also for its treatment of the nascent field of computational social science more broadly.
One of the most attractive features of digital trace data is that it is continuously collected, unlike surveys, which usually provide only a brief snapshot of the social world. As the image below indicates, social media can occasionally provide a glimpse of major events such as protests, revolutions, or stock market surges as they unfold.