NLP and CSS 201: Beyond the Basics
Most people starting out with NLP think of text in terms of single-word units called “unigrams.” But many concepts in documents can’t be represented by single words. For instance, neither “New” nor “York” on its own represents the concept “New York.” In this tutorial, you’ll get hands-on practice using the phrasemachine package and the Phrase-BERT model to (1) extract multi-word expressions from a corpus of U.S. Supreme Court arguments and (2) use those phrases in downstream analysis tasks, such as comparing phrase use across different groups or describing latent topics in a corpus.
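To make the “New York” point concrete, here is a minimal sketch using only the Python standard library. It is a naive stand-in, not the phrasemachine or Phrase-BERT approach covered in the tutorial: it simply counts adjacent word pairs. The sample sentence and the helper names `tokenize` and `ngram_counts` are made up for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters as tokens (hypothetical helper)."""
    return re.findall(r"[a-z]+", text.lower())


def ngram_counts(tokens, n):
    """Count every run of n adjacent tokens, joined by spaces (hypothetical helper)."""
    return Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


# Invented example text, not taken from the Supreme Court corpus.
text = (
    "The court in New York ruled today. "
    "New York law applies, said the New York court."
)

tokens = tokenize(text)
unigrams = ngram_counts(tokens, 1)  # single-word units
bigrams = ngram_counts(tokens, 2)   # adjacent two-word units

# The unigram view splits the place name into two unrelated entries.
print(unigrams["new"], unigrams["york"])
# The bigram view keeps the two words together as one unit.
print(bigrams["new york"])
```

Counting every adjacent pair also picks up pairs that are not meaningful phrases, such as “york ruled”, which is one reason to use a dedicated phrase extractor instead.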
Authors: Abe Handler and Shufan Wang
Duration: 57:46