Chris Bail
Duke University
Website: https://www.chrisbail.net
Twitter: https://www.twitter.com/chris_bail
Github: https://github.com/cbail
install.packages("rvest")
library(rvest)
We begin by scraping a very simple web page from Wikipedia: the World Health Organization's ranking of health systems in 2000.
wikipedia_page <- read_html("https://en.wikipedia.org/wiki/World_Health_Organization_ranking_of_health_systems_in_2000")
wikipedia_page
{html_document}
<html class="client-nojs" lang="en" dir="ltr">
[1] <head>\n<meta http-equiv="Content-Type" content="text/html; charset= ...
[2] <body class="mediawiki ltr sitedir-ltr mw-hide-empty-elt ns-0 ns-sub ...
section_of_wikipedia <- html_node(wikipedia_page,
                                  xpath = '//*[@id="mw-content-text"]/div/table')
head(section_of_wikipedia)
$node
<pointer: 0x7fac43e9c750>
$doc
<pointer: 0x7fac43e65030>
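The XPath selection step can also be tried without a network connection. The following is a minimal sketch, assuming a small HTML string that is invented for illustration (the name `toy_page` and its contents are hypothetical, not taken from Wikipedia). It uses the same `html_node()` call and the same XPath pattern as above to pick out the table nested under the element whose id is `mw-content-text`.

```r
library(rvest)

# Hypothetical stand-in for a downloaded page; the markup is invented.
toy_page <- read_html('<html><body>
  <div id="mw-content-text">
    <div>
      <p>Some introductory text.</p>
      <table>
        <tr><th>Country</th><th>Rank</th></tr>
        <tr><td>Alpha</td><td>1</td></tr>
        <tr><td>Beta</td><td>2</td></tr>
      </table>
    </div>
  </div>
</body></html>')

# Same XPath pattern as above: any element with this id, then a child div,
# then a child table. html_node() returns the first match only.
toy_table_node <- html_node(toy_page,
                            xpath = '//*[@id="mw-content-text"]/div/table')

# html_name() reports the tag of the selected node, which is a quick way
# to confirm that the XPath landed on a table rather than something else.
html_name(toy_table_node)

# Count the rows (header row included) found inside the selected table.
length(html_nodes(toy_table_node, "tr"))
```

Checking the tag name is often more informative than calling `head()` on the node, which only shows internal pointers.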
health_rankings <- html_table(section_of_wikipedia)
head(health_rankings[, 1:2])
              Country Attainment of goals / Health / Level (DALE)
1         Afghanistan                                         164
2             Albania                                         102
3             Algeria                                          44
4             Andorra                                          10
5              Angola                                         165
6 Antigua and Barbuda                                          48
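To see what `html_table()` does in isolation, here is a small offline sketch. It assumes a made-up three-column table (the countries, ranks, and the `Notes` column are invented for illustration) and mirrors the steps above: select the table node, convert it to a data frame, and keep the first two columns. The final sort by rank is an extra step added here, not part of the original example.

```r
library(rvest)

# Hypothetical miniature page; countries and ranks are invented.
toy_page <- read_html('<html><body>
  <table>
    <tr><th>Country</th><th>Rank</th><th>Notes</th></tr>
    <tr><td>Alpha</td><td>2</td><td>x</td></tr>
    <tr><td>Beta</td><td>1</td><td>y</td></tr>
    <tr><td>Gamma</td><td>3</td><td>z</td></tr>
  </table>
</body></html>')

# Select the first table on the page and convert it to a data frame.
# The <th> cells in the first row become the column names.
toy_node <- html_node(toy_page, xpath = "//table")
toy_rankings <- html_table(toy_node)

# Keep the first two columns, as in the health rankings example.
toy_subset <- toy_rankings[, 1:2]

# Extra step (not in the original): order the rows by the Rank column.
toy_sorted <- toy_subset[order(toy_subset$Rank), ]
toy_sorted
```

Depending on the installed version of rvest, the result may print as a tibble rather than a base data frame, but the column subsetting and ordering shown here work the same way on both.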