Download Voteview data in parallel

library(filibustr)

The Voteview functions can download large amounts of data spanning many Congresses. The downside is that downloading many large datasets over the internet can be slow.

One way to speed up your data downloads is to download data in parallel. When you call a Voteview function to download data from multiple Congresses (i.e., when length(congress) > 1), {filibustr} will download data in parallel if you have set up that capability.

Everything described below is a purely optional way to accelerate your data imports. If you don’t set up parallel computing processes, the Voteview functions will simply download data sequentially.

Setting up for parallel downloads

Make sure the {mirai} and {carrier} packages are installed

Under the hood, the Voteview functions use purrr::in_parallel() for parallel downloads. purrr::in_parallel() depends on two packages ({mirai} and {carrier}) that are not otherwise used in {filibustr}, so you may not have them installed.

To check if you have installed the required versions of these packages, run this code. It will prompt you to install any packages you’re missing.

rlang::check_installed(c("carrier", "mirai"), version = c("0.3.0", "2.5.1"))
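In a non-interactive script, where an install prompt is not helpful, a similar check can be sketched with base R tools. The helper name `has_min_version()` is hypothetical (it is not part of {filibustr} or {rlang}); the version numbers are the ones from the call above.

```r
# Hypothetical helper: TRUE if `pkg` is installed at `min_version` or newer
has_min_version <- function(pkg, min_version) {
  requireNamespace(pkg, quietly = TRUE) &&
    utils::packageVersion(pkg) >= min_version
}

# Minimum versions taken from the check_installed() call above
required <- c(carrier = "0.3.0", mirai = "2.5.1")

# Named logical vector: one TRUE/FALSE per required package
ready <- vapply(
  names(required),
  function(pkg) has_min_version(pkg, required[[pkg]]),
  logical(1)
)
ready
```

If any element of `ready` is `FALSE`, the Voteview functions should still work, just sequentially.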

Set parallel processes

To download Voteview data in parallel, use mirai::daemons() to create parallel processes ({mirai} calls these “daemons”).

# detect the number of cores available on your machine
parallel::detectCores()

# launch a specific number of processes, or
mirai::daemons(4)
# launch one process per core, leaving one core free
mirai::daemons(parallel::detectCores() - 1)
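To confirm that the daemons are running, one option is `mirai::status()`, which summarises the current daemon connections. A minimal sketch, guarded so that it does nothing if {mirai} is not installed (the variable name `mirai_available` is only illustrative):

```r
# Check that {mirai} is available before querying it
mirai_available <- requireNamespace("mirai", quietly = TRUE)

if (mirai_available) {
  # Print a summary of the current daemon connections
  print(mirai::status())
}
```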

How many processes should I create?

In general, if you split the work up across more processes, the download will finish faster. Theoretically, N processes can finish the download up to N times faster.

At the same time, there can be diminishing returns to creating a large number of processes.
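As a rough illustration of these diminishing returns, suppose that each Congress is one task of equal duration and that there is no overhead. This is a simplifying assumption, not a description of how {filibustr} schedules its work. The number of sequential "rounds" needed is then the task count divided by the process count, rounded up:

```r
# 24 Congresses, the range used in this article's examples
n_tasks <- length(95:118)

# Candidate numbers of parallel processes
n_processes <- c(1, 2, 4, 8, 12, 16, 24)

# Rounds of downloads needed under the equal-duration assumption
rounds <- ceiling(n_tasks / n_processes)

data.frame(n_processes, rounds)
```

Under these assumptions, going from 1 to 4 processes saves 18 rounds, while going from 12 to 16 saves none.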

Also, there is less benefit when you set more processes than the number of cores available on your machine (which you can see using parallel::detectCores()). A good rule of thumb (per the purrr documentation) is to use (at most) one less than the number of cores on your machine, leaving one core open for the main R process.
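The rule of thumb above can be sketched in base R as follows. The variable names are illustrative, and the fallback to a single process when the core count is unknown is an assumption made for this example.

```r
# Number of cores reported for this machine (may be NA if unknown)
n_cores <- parallel::detectCores()

# Leave one core for the main R process, but always use at least one worker
n_workers <- if (is.na(n_cores)) 1L else max(1L, n_cores - 1L)
n_workers
```

The result can then be passed to `mirai::daemons(n_workers)`.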

Downloading data in parallel

Once you’ve set up your parallel processes, just call the Voteview functions as usual, and they will automatically download data in parallel!

Reminder: parallel processing only affects downloads where length(congress) > 1.

# download Voteview data from multiple Congresses
get_voteview_members(congress = 95:118)

get_voteview_rollcall_votes(congress = 95:118)

When you’re done with all your parallel processing, you can close the daemon connections with mirai::daemons(0). Otherwise, the connections will close automatically when your session ends.

mirai::daemons(0)
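If you prefer to scope the daemons to a single download, one possible pattern is to wrap the setup, download, and cleanup in a function. This is only a sketch: `with_parallel_download()` is a hypothetical helper, not part of {filibustr}, and it assumes no daemons are already set when it is called.

```r
# Hypothetical helper: launch daemons, download, then always clean up
with_parallel_download <- function(congress, n_workers = 2L) {
  # Assumes no daemons are currently set in this session
  mirai::daemons(n_workers)
  # Close the daemon connections when the function exits, even on error
  on.exit(mirai::daemons(0), add = TRUE)

  filibustr::get_voteview_members(congress = congress)
}

# Usage (requires an internet connection, so it is not run here):
# members <- with_parallel_download(95:118, n_workers = 4L)
```

Using `on.exit()` means the daemons are closed even if the download fails partway through.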

More details

See the documentation for purrr::in_parallel() and {mirai} (especially mirai::daemons()) for additional details on parallel processing.
