Comprehensive Usage Guide

Version: 4.0.0 Purpose: Cross-platform R package for accessing IDEA Public Schools’ SQL Server data warehouse

Prerequisites

Before using ideadata, ensure you have:

System Requirements

R >= 2.10
Operating System: Windows, macOS, or Linux
VPN Connection: Must be connected to IDEA network
ODBC Driver: Microsoft ODBC Driver 17+ for SQL Server
- Windows: Usually pre-installed
- macOS: Install via Microsoft’s download page
- Linux: Follow distribution-specific instructions

Kerberos (macOS/Linux only)

macOS: Kerberos utilities (kinit, klist) are pre-installed
Linux: Install with sudo apt-get install krb5-user (Ubuntu/Debian)

Network Access

Active IDEA employee account
VPN credentials and connection to IDEA network
Database access permissions granted by IT

Installation

# Install from GitHub using remotes
install.packages("remotes")
remotes::install_github("idea-analytics/ideadata")

# Or using renv (recommended for project reproducibility)
renv::install("idea-analytics/ideadata")

Verifying Installation

library(ideadata)
# You should see startup message showing:
# - Kerberos ticket status
# - Number of tables/databases/servers available

Initial Setup

Step 1: Configure Credentials

Run the interactive credential setup:

setup_creds()

You’ll be prompted for:

User ID: Your IDEA email prefix (e.g., firstname.lastname)
Password: Your IDEA network password
ODBC Driver: Usually ODBC Driver 17 for SQL Server

This creates/updates your .Renviron file with:

IDEA_RNA_DB_UID=firstname.lastname
IDEA_RNA_DB_PWD=your_password
IDEA_RNA_ODBC_DRIVER=ODBC Driver 17 for SQL Server

Step 2: Restart R

After setup, restart your R session to load the new credentials:

# In RStudio: Session > Restart R
# Or simply reload the package
library(ideadata)

Step 3: Verify Connection

Test your connection with a simple query:

schools <- get_schools() %>% collect()

If successful, you’re ready to use the package!

Understanding Lazy Evaluation

CRITICAL CONCEPT: All get_*() functions return lazy query objects, not actual data.

What This Means

# This does NOT pull data into R
students <- get_students()
class(students)
# [1] "tbl_Microsoft SQL Server" "tbl_dbi" "tbl_sql" "tbl_lazy" "tbl"

# The query is only executed when you call collect()
students_data <- get_students() %>% collect()

Benefits of Lazy Evaluation

Fast filtering: Filters execute on the SQL Server, not in R
Memory efficient: Only pull data you actually need
Preview capability: Examine query structure before pulling

The Pattern

# 1. Build your query (lazy, instant)
query <- get_students() %>%
  filter(grade_level >= 9) %>%
  select(student_number, first_name, last_name, schoolid)

# 2. Preview the SQL (optional)
show_query(query)

# 3. Pull the data (this executes on SQL Server)
data <- collect(query)

Basic Usage Patterns

Pattern 1: Simple Table Retrieval

library(ideadata)
library(dplyr)

# Get current schools
schools <- get_schools() %>%
  collect()

# Get current students (filters applied server-side)
current_students <- get_currently_enrolled_students() %>%
  filter(grade_level %in% c(9, 10, 11, 12)) %>%
  collect()

Pattern 2: Using `get_table()` for Any Table

# If you know the exact table name
my_table <- get_table(.table_name = "Students")

# If table name is ambiguous, you'll get helpful error with options:
# Error: Found 3 tables named 'Students':
# [Copy-paste code for each option shown]

# Specify database to disambiguate
students <- get_table(
  .table_name = "Students",
  .database_name = "PowerSchool"
)

Pattern 3: Exploring Available Tables

# Browse all available tables in a viewer
view_warehouse_metadata()

# Search for tables containing a keyword
warehouse_meta_data %>%
  filter(str_detect(table_name, "attendance"))

Pattern 4: Joining Data from Multiple Sources

# Students with attendance
student_attendance <- get_students() %>%
  left_join(
    get_student_daily_attendance(),
    by = c("id" = "studentid")
  ) %>%
  filter(att_date >= "2024-01-01") %>%
  collect()

Core Functions

Connection Management

`create_connection(.database_name, .server_name)`

Creates a new database connection.

# Usually not needed - connections are created automatically
# But useful for custom database access
con <- create_connection(
  .database_name = "PowerSchool",
  .server_name = "RGVPDSD-DWPRD1"
)

`check_get_connection(.database_name, .server_name)`

Checks for existing valid connection, creates if needed.

# This is called internally by all get_* functions
# Returns connection object
con <- check_get_connection(.database_name = "PowerSchool")

PowerSchool Data Functions

Student Data

# All students
students <- get_students() %>% collect()

# Only currently enrolled
current <- get_currently_enrolled_students() %>% collect()

# Academic summary
academic <- get_students_academic_summary() %>% collect()

# Continuous enrollment tracking
enrollment <- get_students_continuous_enrollment() %>% collect()

School Data

# All schools
schools <- get_schools() %>% collect()

# Regions
regions <- get_regions() %>% collect()

Attendance Data

# Daily attendance
attendance <- get_student_daily_attendance() %>%
  filter(att_date >= "2024-01-01") %>%
  collect()

# Engagement attendance (remote vs in-person)
engagement <- get_student_engagement_attendance() %>%
  collect()

Assessment Data Functions

Interim Assessments

# IAB/WA assessments
iabwa <- get_iabwa() %>%
  filter(test_year == 2024) %>%
  collect()

Renaissance STAR

# Math assessments
star_math <- get_renstar_math() %>%
  filter(school_year == 2024) %>%
  collect()

# Reading assessments
star_reading <- get_renstar_reading() %>%
  filter(school_year == 2024) %>%
  collect()

Survey Data Functions

Employee Survey (AEES)

# Survey questions
questions <- get_aees_questions() %>% collect()

# Survey results
results <- get_aees_results() %>% collect()

# Open-ended responses
oe_responses <- get_aees_oe_responses() %>% collect()

# All AEES data (convenience wrapper)
all_aees <- get_aees() %>% collect()

Great Place to Work (GPTW)

# Survey questions
questions <- get_gptw_questions() %>% collect()

# Survey results
results <- get_gptw_results() %>% collect()

# Open-ended responses
oe_responses <- get_gptw_oe_responses() %>% collect()

Student Services Functions

Persistence Data

# Historical persistence data
persistence <- get_persistence_historical() %>%
  filter(school_year >= 2020) %>%
  collect()

# Persistence codes
codes <- get_persistence_code() %>% collect()

# Persistence reason codes
reasons <- get_persistence_reasons_codes() %>% collect()

Learning Location

# Daily learning location (remote vs in-person)
location <- get_learning_location() %>%
  filter(date >= "2024-01-01") %>%
  collect()

Working with Large Datasets

Problem: Connection Failures on Large Pulls

When pulling large datasets, SQL Server connections may timeout or fail.

Solution: Use `collector()`

The collector() function breaks large queries into chunks and combines them.

Basic Usage

# Instead of this (may fail):
# big_data <- get_student_daily_attendance() %>% collect()

# Use collector to pull in chunks by schoolid
big_data <- get_student_daily_attendance() %>%
  collector(schoolid, verbose = TRUE)

Multiple Grouping Variables

# Chunk by both schoolid and yearid
very_big_data <- get_student_daily_attendance() %>%
  collector(schoolid, yearid, verbose = TRUE)

What It Does

Identifies unique values of grouping columns
Filters data for each unique combination
Pulls data in manageable chunks
Combines all chunks into single tibble
Shows progress with encouraging messages (if verbose = TRUE)

When to Use `collector()`

Pulling multiple years of daily attendance
Large student-level datasets spanning all schools
Historical data spanning many years
Any query that previously caused connection failures

Best Practices

1. Always Filter on the Server

# GOOD: Filter before collect()
students_2024 <- get_students() %>%
  filter(schoolid == 12345, enroll_status == 0) %>%
  collect()

# BAD: Pulls ALL students first, then filters in R
students_2024 <- get_students() %>%
  collect() %>%
  filter(schoolid == 12345, enroll_status == 0)

2. Select Only Needed Columns

# GOOD: Select columns on server
students_subset <- get_students() %>%
  select(student_number, first_name, last_name, grade_level) %>%
  collect()

# AVOID: Pulling all 50+ columns when you only need 4
students_all <- get_students() %>% collect()

3. Preview Before Pulling

# Build your query
query <- get_students() %>%
  filter(grade_level >= 9) %>%
  select(student_number, first_name, last_name)

# Check the SQL that will be generated
show_query(query)

# Preview first few rows (fast)
head(query)

# Count rows (fast)
tally(query)

# Pull when satisfied
data <- collect(query)

4. Use Joins on the Server

# GOOD: Join lazy objects before collect()
combined <- get_students() %>%
  inner_join(get_schools(), by = "schoolid") %>%
  collect()

# LESS EFFICIENT: Pull both tables, then join in R
students <- get_students() %>% collect()
schools <- get_schools() %>% collect()
combined <- inner_join(students, schools, by = "schoolid")

5. Reuse Connections

# Connections are automatically reused
# No need to manually manage connections in normal usage

# If you need to close a connection:
disconnect(conn_PowerSchool)

6. Handle Dates Properly

# SQL Server dates work with standard R date filtering
recent_attendance <- get_student_daily_attendance() %>%
  filter(att_date >= "2024-01-01", att_date < "2024-07-01") %>%
  collect()

# Can also use lubridate functions (may vary in SQL translation)
recent_attendance <- get_student_daily_attendance() %>%
  filter(att_date >= lubridate::ymd("2024-01-01")) %>%
  collect()

Troubleshooting

“Kerberos ticket expired” Error

Problem: Kerberos tickets expire after ~10 hours.

Solution:

# Reload package to renew ticket
detach("package:ideadata", unload = TRUE)
library(ideadata)

“Could not find table” Error

Problem: Table name is ambiguous or doesn’t exist.