9 Retrieving Data from the Web (2): the World Bank API
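We used e-Stat to look at Japanese statistics. To view or compare country-level statistics and indicators worldwide, the World Bank's data is more convenient. The R package wbstats retrieves World Bank data through the World Bank API.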
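wbstats sends HTTP requests to the World Bank's public REST API for you. The sketch below gives a rough idea of what such a request looks like. It only assembles a query URL in the API's v2 format and does not download anything. The helper name `wb_api_url()` is hypothetical and is not part of wbstats. Joining several country codes with semicolons is an assumption about the API's multi-country syntax.

```r
# Hypothetical helper (not part of wbstats): build a World Bank API v2 query URL.
# It only constructs the string; no request is sent.
wb_api_url <- function(indicator, country = "all", format = "json", per_page = NULL) {
  stopifnot(length(indicator) == 1)
  # Assumption: several countries are joined with ";" (e.g. "JPN;CHN")
  countries <- paste(country, collapse = ";")
  url <- sprintf("https://api.worldbank.org/v2/country/%s/indicator/%s?format=%s",
                 countries, indicator, format)
  # Optional page size for the paginated response
  if (!is.null(per_page)) {
    url <- paste0(url, "&per_page=", per_page)
  }
  url
}

wb_api_url("AG.LND.FRST.ZS", country = "JPN")
# "https://api.worldbank.org/v2/country/JPN/indicator/AG.LND.FRST.ZS?format=json"
```

Opening such a URL in a browser shows the raw, paginated JSON. Parsing that JSON and following the pages is the work that wbstats handles on your behalf.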
If this is the first time you are using the package on your PC, install it with
install.packages("remotes")
remotes::install_github("nset-ornl/wbstats")
After starting a session (that is, after opening the project), run
library(wbstats)
to make the wbstats functions available. The object wb_cachelist holds lists of the available indicators, countries, topics, and so on.
str(wb_cachelist, max.level = 1)
## List of 8
## $ countries : tibble [304 x 18] (S3: tbl_df/tbl/data.frame)
## $ indicators : tibble [16,649 x 8] (S3: tbl_df/tbl/data.frame)
## $ sources : tibble [63 x 9] (S3: tbl_df/tbl/data.frame)
## $ topics : tibble [21 x 3] (S3: tbl_df/tbl/data.frame)
## $ regions : tibble [48 x 4] (S3: tbl_df/tbl/data.frame)
## $ income_levels: tibble [7 x 3] (S3: tbl_df/tbl/data.frame)
## $ lending_types: tibble [4 x 3] (S3: tbl_df/tbl/data.frame)
## $ languages : tibble [23 x 3] (S3: tbl_df/tbl/data.frame)
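wb_cachelist is a named list of tibbles, so ordinary base R list tools work on it. The sketch below runs without wbstats by using a small hypothetical stand-in with placeholder values, not real World Bank data. The helper `table_sizes()` is also hypothetical.

```r
# Hypothetical helper: number of rows in each data frame of a named list
table_sizes <- function(lst) vapply(lst, nrow, integer(1))

# Toy stand-in shaped like two elements of wb_cachelist (placeholder values)
toy_cache <- list(
  topics = data.frame(topic_id = 1:3, topic = c("topic_a", "topic_b", "topic_c")),
  income_levels = data.frame(iso3c = c("HIC", "LIC"))
)

table_sizes(toy_cache)
# topics = 3, income_levels = 2
```

With wbstats loaded, `table_sizes(wb_cachelist)` should report the same row counts as the str() output above. The exact numbers may differ between package versions.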
The full list of available indicators can be browsed with the following command. The first few rows are shown below.
View(wb_cachelist$indicators)
indicator_id | indicator | unit | indicator_desc | source_org | topics | source_id | source |
---|---|---|---|---|---|---|---|
1.0.HCount.1.90usd | Poverty Headcount ($1.90 a day) | NA | The poverty headcount index measures the proportion of the population with daily per capita income (in 2011 PPP) below the poverty line. | LAC Equity Lab tabulations of SEDLAC (CEDLAS and the World Bank). | 11 , Poverty | 37 | LAC Equity Lab |
1.0.HCount.2.5usd | Poverty Headcount ($2.50 a day) | NA | The poverty headcount index measures the proportion of the population with daily per capita income (in 2005 PPP) below the poverty line. | LAC Equity Lab tabulations of SEDLAC (CEDLAS and the World Bank). | 11 , Poverty | 37 | LAC Equity Lab |
1.0.HCount.Mid10to50 | Middle Class ($10-50 a day) Headcount | NA | The poverty headcount index measures the proportion of the population with daily per capita income (in 2005 PPP) below the poverty line. | LAC Equity Lab tabulations of SEDLAC (CEDLAS and the World Bank). | 11 , Poverty | 37 | LAC Equity Lab |
1.0.HCount.Ofcl | Official Moderate Poverty Rate-National | NA | The poverty headcount index measures the proportion of the population with daily per capita income below the official poverty line developed by each country. | LAC Equity Lab tabulations of data from National Statistical Offices. | 11 , Poverty | 37 | LAC Equity Lab |
1.0.HCount.Poor4uds | Poverty Headcount ($4 a day) | NA | The poverty headcount index measures the proportion of the population with daily per capita income (in 2005 PPP) below the poverty line. | LAC Equity Lab tabulations of SEDLAC (CEDLAS and the World Bank). | 11 , Poverty | 37 | LAC Equity Lab |
1.0.HCount.Vul4to10 | Vulnerable ($4-10 a day) Headcount | NA | The poverty headcount index measures the proportion of the population with daily per capita income (in 2005 PPP) below the poverty line. | LAC Equity Lab tabulations of SEDLAC (CEDLAS and the World Bank). | 11 , Poverty | 37 | LAC Equity Lab |
To search for data, use the wb_search() function.
wb_search("forest area")
## # A tibble: 4 x 3
## indicator_id indicator indicator_desc
## <chr> <chr> <chr>
## 1 AG.LND.FRST.HA Forest area (hectares) Forest area is land under n~
## 2 AG.LND.FRST.K2 Forest area (sq. km) Forest area is land under n~
## 3 AG.LND.FRST.ZS Forest area (% of land area) Forest area is land under n~
## 4 ER.FST.DFST.ZG Annual deforestation (% of change) Average annual deforestatio~
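The kind of keyword lookup that wb_search() performs can be approximated in base R with a case-insensitive regular expression. The sketch below copies the four indicators printed above into a small data frame. The helper `search_indicators()` is hypothetical and matches only the indicator name. wb_search() evidently looks at more than the name, because it also returned ER.FST.DFST.ZG.

```r
# Copy of the four indicators returned by wb_search("forest area") above
indicators <- data.frame(
  indicator_id = c("AG.LND.FRST.HA", "AG.LND.FRST.K2",
                   "AG.LND.FRST.ZS", "ER.FST.DFST.ZG"),
  indicator = c("Forest area (hectares)", "Forest area (sq. km)",
                "Forest area (% of land area)", "Annual deforestation (% of change)"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: case-insensitive regex match on the indicator name only
search_indicators <- function(df, pattern) {
  df[grepl(pattern, df$indicator, ignore.case = TRUE), , drop = FALSE]
}

search_indicators(indicators, "forest area")$indicator_id
# "AG.LND.FRST.HA" "AG.LND.FRST.K2" "AG.LND.FRST.ZS"
```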
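To download the data itself, use the wb_data() function and pass the ID of the desired data to the indicator argument. Setting country = "all" returns aggregates such as "World" as well as individual countries.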
fa_data <- wb_data(indicator = "AG.LND.FRST.ZS", country = "all")
Let's plot forest area as a share of land area over time for a few countries and regions as a line chart.
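library(tidyverse)    # dplyr verbs, ggplot2, and %>%
library(gghighlight)  # highlighted series with direct labels

fa_data %>%
  filter(country %in% c("Japan", "World", "China",
                        "Latin America & Caribbean")) %>%
  mutate(date = lubridate::ymd(date, truncated = 2)) %>%  # year -> Date on January 1
  rename(value = 5) %>%  # column 5 holds the indicator values (named AG.LND.FRST.ZS)
  filter(!is.na(value)) %>%
  ggplot(aes(date, value, group = country, color = country)) +
  geom_line() +
  theme_minimal(base_size = 16) +
  theme(legend.position = "bottom") +
  labs(x = "Year", y = "Forest area [% of land area]",
       title = "Forest area as a share of land area") +
  # label each line directly; direction = "y" lets labels move only vertically
  gghighlight(label_params = list(direction = "y"))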
## label_key: country
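Next, let's look at Japan's CO2 emissions over time. First search for the indicator ID. Here we use EN.ATM.CO2E.KT, which gives emissions in kilotons.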
View(wb_search("CO2 emissions"))
co2_data <- wb_data(indicator = "EN.ATM.CO2E.KT")
head(co2_data)
## # A tibble: 6 x 9
## iso2c iso3c country date EN.ATM.CO2E.KT unit obs_status footnote
## <chr> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr>
## 1 AW ABW Aruba 1960 11093. <NA> <NA> <NA>
## 2 AW ABW Aruba 1961 11577. <NA> <NA> <NA>
## 3 AW ABW Aruba 1962 12713. <NA> <NA> <NA>
## 4 AW ABW Aruba 1963 12178. <NA> <NA> <NA>
## 5 AW ABW Aruba 1964 11841. <NA> <NA> <NA>
## 6 AW ABW Aruba 1965 10623. <NA> <NA> <NA>
## # ... with 1 more variable: last_updated <date>
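The country column holds English names. These are easy to misspell and sometimes contain punctuation, as in the regional name "Latin America & Caribbean" used earlier. Filtering on the iso3c code is a slightly more robust alternative. The sketch below rebuilds the six rows printed above, with values as displayed and therefore rounded. It uses a hypothetical helper, `by_iso3()`.

```r
# The six rows printed by head(co2_data) above (values rounded as displayed)
co2_head <- data.frame(
  iso2c = "AW", iso3c = "ABW", country = "Aruba",
  date = 1960:1965,
  EN.ATM.CO2E.KT = c(11093, 11577, 12713, 12178, 11841, 10623)
)

# Hypothetical helper: keep only the rows for one ISO 3166-1 alpha-3 code
by_iso3 <- function(df, code) df[df$iso3c == code, , drop = FALSE]

nrow(by_iso3(co2_head, "ABW"))  # 6
nrow(by_iso3(co2_head, "JPN"))  # 0
```

With dplyr, the equivalent for Japan is `co2_data %>% filter(iso3c == "JPN")`.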
co2_data %>%
  filter(country == "Japan") %>%
  mutate(date = lubridate::ymd(date, truncated = 2)) %>%  # year -> Date on January 1
  rename(value = 5) %>%  # column 5 holds the indicator values (named EN.ATM.CO2E.KT)
  filter(!is.na(value)) %>%
  ggplot(aes(date, value)) +
  geom_area(alpha = 0.7) +
  theme_minimal(base_size = 16) +
  labs(x = "Year", y = "CO2 emissions [kt]",
       title = "Japan's CO2 emissions over time")
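Both plots share the same preprocessing steps: rename the indicator column to value, turn the year into a Date, and drop missing values. These steps can also be written in base R. The sketch below defines a hypothetical helper, `prep_series()`, that renames the column by name rather than by position, so it keeps working if the column order changes. It swaps in as.Date() for lubridate::ymd(date, truncated = 2); both should give dates on January 1 of each year.

```r
# Hypothetical helper: base-R version of rename + ymd + !is.na() from the pipelines
prep_series <- function(df, indicator_id) {
  stopifnot(indicator_id %in% names(df))
  # Rename the column named after the indicator ID (e.g. "EN.ATM.CO2E.KT")
  names(df)[names(df) == indicator_id] <- "value"
  # Year (e.g. 1960) -> Date on January 1 of that year
  df$date <- as.Date(paste0(df$date, "-01-01"))
  # Drop years without data
  df[!is.na(df$value), , drop = FALSE]
}

# Rows printed by head(co2_data) earlier (values rounded as displayed)
co2_head <- data.frame(
  iso3c = "ABW", country = "Aruba", date = 1960:1965,
  EN.ATM.CO2E.KT = c(11093, 11577, 12713, 12178, 11841, 10623)
)

out <- prep_series(co2_head, "EN.ATM.CO2E.KT")
out$date[1]  # "1960-01-01"
```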