Collect data from database to local environment piecemeal

collector(.df, ..., verbose = TRUE)

Arguments

.df

a remote dplyr::tbl() table that needs to be collected

...

columns used to break .df into pieces to be downloaded

verbose

whether or not to print progress messages. Default is TRUE

Value

a tibble

Details

IDEA has 60,000+ students, which means data that is collected daily or more frequently can grow large very quickly as the year passes. In our experience, using collect() to pull data from the remote DB to a local environment will eventually fail for collections larger than about 60-100K rows. collector() lets you break the download into smaller pieces: passing column names from the database table results in multiple calls to collect(), each subsetted to a distinct combination of the selected columns.

Note that there is a performance hit. If you can pull data down with a single collect() you should, since that is faster than calling collect() multiple times (as collector() does). However, if collect() keeps failing, collector() will likely solve that problem by pulling the data set down in pieces.
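The chunking idea described above can be sketched in plain dplyr/dbplyr. The helper name collect_in_pieces, the table remote_tbl, and the use of a copied semi join are assumptions for illustration only, not collector()'s actual implementation:

library(dplyr)
library(purrr)

collect_in_pieces <- function(remote_tbl, ...) {
  # Distinct combinations of the chunking columns, computed on the DB side
  keys <- remote_tbl %>%
    distinct(...) %>%
    collect()

  # Pull each combination separately and bind the pieces back together
  keys %>%
    split(seq_len(nrow(keys))) %>%
    map_dfr(function(key) {
      remote_tbl %>%
        semi_join(key, by = names(key), copy = TRUE) %>%
        collect()
    })
}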

Examples

if (FALSE) {
library(dplyr)

schools_remote <- get_schools()
schools <- schools_remote %>% collector(SchoolShortName, RegionID)
}