# Load spatial data
pa_counties <- st_read("data/Pennsylvania_County_Boundaries.shp")
pa_hospitals <- st_read("data/hospitals.geojson")
pa_tracts <- tracts(state = "PA", cb = TRUE, class = "sf")
# Check that all data loaded correctly
st_crs(pa_hospitals)
st_crs(pa_tracts)
st_crs(pa_counties)Lab 2: Spatial Analysis and Visualization
Healthcare Access and Equity in Pennsylvania
Assignment Overview
Learning Objectives: - Apply spatial operations to answer policy-relevant research questions - Integrate census demographic data with spatial analysis - Create publication-quality visualizations and maps - Work with spatial data from multiple sources - Communicate findings effectively for policy audiences
Part 1: Healthcare Access for Vulnerable Populations
Research Question
Which Pennsylvania counties have the highest proportion of vulnerable populations (elderly + low-income) living far from hospitals?
Your analysis should identify counties that should be priorities for healthcare investment and policy intervention.
Required Analysis Steps
Complete the following analysis, documenting each step with code and brief explanations:
Step 1: Data Collection (5 points)
Load the required spatial data: - Pennsylvania county boundaries - Pennsylvania hospitals (from lecture data) - Pennsylvania census tracts
Your Task:
data_summary <- tibble(
Dataset = c("Hospitals", "Census Tracts", "Counties"),
`Number of Features` = c(
nrow(pa_hospitals),
nrow(pa_tracts),
nrow(pa_counties)
)
)
data_summary %>%
mutate(`Number of Features` = scales::comma(`Number of Features`)) %>%
kable(
caption = "Summary of Loaded Spatial Datasets",
align = "c"
) %>%
kable_styling(position = "center", full_width = FALSE)| Dataset | Number of Features |
|---|---|
| Hospitals | 223 |
| Census Tracts | 3,445 |
| Counties | 67 |
Questions to answer:
How many hospitals are in your dataset?
223How many census tracts?
3,445What coordinate reference system is each dataset in?
Hospital: WGS84 (EPSG: 4326)
PA census tracts: NAD83 (EPSG: 4269)
PA counties: WGS 84 / Pseudo-Mercator (EPSG: 3857)
Step 2: Get Demographic Data
Use tidycensus to download tract-level demographic data for Pennsylvania.
Required variables: - Total population - Median household income - Population 65 years and over (you may need to sum multiple age categories)
Your Task:
# Get demographic data from ACS
pa_acs <- get_acs(
geography = "tract",
state = "PA",
year = 2022,
survey = "acs5",
output = "wide",
variables = c(
total_pop = "B01003_001",
med_income = "B19013_001",
m_65_66 = "B01001_020",
m_67_69 = "B01001_021",
m_70_74 = "B01001_022",
m_75_79 = "B01001_023",
m_80_84 = "B01001_024",
m_85up = "B01001_025",
f_65_66 = "B01001_044",
f_67_69 = "B01001_045",
f_70_74 = "B01001_046",
f_75_79 = "B01001_047",
f_80_84 = "B01001_048",
f_85up = "B01001_049"
)
)
pa_acs <- pa_acs %>%
mutate(
pop_65plus =
m_65_66E + m_67_69E + m_70_74E +
m_75_79E + m_80_84E + m_85upE +
f_65_66E + f_67_69E + f_70_74E +
f_75_79E + f_80_84E + f_85upE
)
pa_acs <- pa_acs %>%
select(
GEOID,
total_pop = total_popE,
med_income = med_incomeE,
pop_65plus
)
# Join to tract boundaries
pa_tracts_demo <- pa_tracts %>%
left_join(pa_acs, by = "GEOID")
# How many tracts have missing income data?
missing_income_n <- pa_tracts_demo %>%
st_drop_geometry() %>%
summarise(n_missing = sum(is.na(med_income)))
missing_income_n %>%
kable(
caption = "Number of tracts with missing median income values",
col.names = "Number of Missing Tracts",
align = "c"
) %>%
kable_styling(position = "center", full_width = FALSE)| Number of Missing Tracts |
|---|
| 62 |
# What is the median income across all PA census tracts?
median_income_pa <- pa_tracts_demo %>%
st_drop_geometry() %>%
summarise(median_income = median(med_income, na.rm = TRUE))
median_income_pa %>%
mutate(median_income = scales::comma(median_income)) %>%
kable(
caption = "Median household income across Pennsylvania census tracts",
col.names = "Median Household Income (USD)",
align = "c"
) %>%
kable_styling(
position = "center", full_width = FALSE)| Median Household Income (USD) |
|---|
| 70,188 |
Questions to answer:
What year of ACS data are you using?
2022How many tracts have missing income data?
62What is the median income across all PA census tracts?
$70,188
Step 3: Define Vulnerable Populations
Identify census tracts with vulnerable populations based on TWO criteria: 1. Low median household income (choose an appropriate threshold) 2. Significant elderly population (choose an appropriate threshold)
Your Task:
# Filter for vulnerable tracts based on your criteria
pa_tracts_demo <- pa_tracts_demo %>%
mutate(
elderly_share = pop_65plus / total_pop
)
income_threshold <- quantile(pa_tracts_demo$med_income, 0.1, na.rm = TRUE)
elderly_threshold <- 0.25
vulnerable_tracts <- pa_tracts_demo %>%
filter(
!is.na(med_income),
!is.na(elderly_share),
med_income <= income_threshold |
elderly_share >= elderly_threshold
)
summary_vulnerability <- tibble(
Number_of_Vulnerable_Tracts = nrow(vulnerable_tracts),
Percent_of_All_Tracts = percent(Number_of_Vulnerable_Tracts / nrow(pa_tracts_demo), accuracy = 0.01)
)
summary_vulnerability %>%
kable(
caption = "Vulnerable Census Tracts in Pennsylvania",
col.names = c("Number of Vulnerable Tracts",
"Percent of Vulnerable Tracts"),
align = "c"
) %>%
kable_styling(position = "center", full_width = FALSE)| Number of Vulnerable Tracts | Percent of Vulnerable Tracts |
|---|---|
| 900 | 26.12% |
Questions to answer:
What income threshold did you choose and why?
I define low income as tracts in the bottom 10% of median household income statewide (≤ about $40,841). This focuses on the most economically disadvantaged communities and avoids labeling too many tracts as “vulnerable”.What elderly population threshold did you choose and why?
I define a high elderly population as tracts where the 65+ share is ≥ 25%. In Pennsylvania, census tracts have an average 65+ share of 19.0%, and the 75th percentile is 23.1%. Based on this distribution, I use 25% to flag heavily aging communities, since it exceeds the upper-quartile benchmark and indicates unusually high healthcare need.How many tracts meet your vulnerability criteria?
900What percentage of PA census tracts are considered vulnerable by your definition?
26.12%
Step 4: Calculate Distance to Hospitals
For each vulnerable tract, calculate the distance to the nearest hospital.
Your Task:
# Transform to appropriate projected CRS
vulnerable_tracts <- st_transform(vulnerable_tracts, 32129)
pa_hospitals <- st_transform(pa_hospitals, 32129)
pa_counties <- st_transform(pa_counties, 32129)
# Calculate distance from each tract centroid to nearest hospital
centroids <- st_point_on_surface(vulnerable_tracts)
dist <- st_distance(centroids, pa_hospitals)
min_dist_m <- apply(dist, 1, min)
vulnerable_tracts$dist_to_hosp_miles <- as.numeric(min_dist_m) / 1609.344 # Convert meters to miles
avg_dist <- mean(vulnerable_tracts$dist_to_hosp_miles, na.rm = TRUE)
max_dist <- max(vulnerable_tracts$dist_to_hosp_miles, na.rm = TRUE)
n_over15 <- sum(vulnerable_tracts$dist_to_hosp_miles > 15, na.rm = TRUE)
distance_summary <- tibble(
Average_Distance_Miles = round(avg_dist, 2),
Maximum_Distance_Miles = round(max_dist, 2),
Tracts_Over_15_Miles = comma(n_over15)
)
distance_summary %>%
kable(
caption = "Hospital Access Distance Summary for Vulnerable Tracts",
col.names = c(
"Average Distance (miles)",
"Maximum Distance (miles)",
"Number of Tracts > 15 Miles"
),
align = "c"
) %>%
kable_styling(position = "center", full_width = FALSE)| Average Distance (miles) | Maximum Distance (miles) | Number of Tracts > 15 Miles |
|---|---|---|
| 3.98 | 28.59 | 26 |
Requirements: - Use an appropriate projected coordinate system for Pennsylvania - Calculate distances in miles - Explain why you chose your projection
I use NAD83 / Pennsylvania State Plane South (EPSG: 32129). This is a projected coordinate system designed specifically for Pennsylvania, which minimizes distortion within the state and allows for accurate distance calculations. Using a geographic CRS would produce incorrect distance measurements. However, this projection uses meters as its unit, so I converted distances from meters to miles.
Questions to answer:
What is the average distance to the nearest hospital for vulnerable tracts?
3.97 milesWhat is the maximum distance?
29.12 milesHow many vulnerable tracts are more than 15 miles from the nearest hospital?
25
Step 5: Identify Underserved Areas
Define “underserved” as vulnerable tracts that are more than 15 miles from the nearest hospital.
Your Task:
# Create underserved variable
vulnerable_tracts <- vulnerable_tracts %>%
mutate(underserved = dist_to_hosp_miles > 15)
n_underserved <- sum(vulnerable_tracts$underserved, na.rm = TRUE)
pct_underserved <- mean(vulnerable_tracts$underserved, na.rm = TRUE)
underserved_summary <- tibble(
Number_of_Underserved_Tracts = comma(n_underserved),
Percent_of_Vulnerable_Tracts = percent(pct_underserved, accuracy = 0.1)
)
underserved_summary %>%
kable(
caption = "Underserved Vulnerable Tracts",
col.names = c(
"Number of Underserved Tracts",
"Percent of Vulnerable Tracts"
),
align = "c"
) %>%
kable_styling(position = "center", full_width = FALSE)| Number of Underserved Tracts | Percent of Vulnerable Tracts |
|---|---|
| 26 | 2.9% |
Questions to answer:
How many tracts are underserved?
25What percentage of vulnerable tracts are underserved?
2.78%Does this surprise you? Why or why not?
A total of 25 vulnerable tracts are classified as underserved, representing 2.78% of all vulnerable tracts. This relatively small proportion is not entirely surprising, given that Pennsylvania has a dense network of hospitals, particularly around metropolitan areas such as Philadelphia. It suggests that overall hospital coverage in Pennsylvania is strong, but it still highlights localized access gaps in rural areas.
Step 6: Aggregate to County Level
Use spatial joins and aggregation to calculate county-level statistics about vulnerable populations and hospital access.
Your Task:
# Spatial join tracts to counties
tract_pts <- st_point_on_surface(vulnerable_tracts)
vuln_with_county <- st_join(
tract_pts,
pa_counties,
join = st_within
)
# Aggregate statistics by county
county_stats <- vuln_with_county %>%
st_drop_geometry() %>%
group_by(COUNTY_NAM) %>%
summarise(
n_vulnerable_tracts = n(),
n_underserved_tracts = sum(underserved, na.rm = TRUE),
pct_underserved_vulnerable = n_underserved_tracts / n_vulnerable_tracts,
avg_dist_miles = mean(dist_to_hosp_miles, na.rm = TRUE),
total_vulnerable_pop = sum(total_pop, na.rm = TRUE)
) %>%
ungroup()
all_counties <- pa_counties %>%
st_drop_geometry() %>%
distinct(COUNTY_NAM)
county_stats_full <- all_counties %>%
left_join(county_stats, by = "COUNTY_NAM") %>%
mutate(
n_vulnerable_tracts = replace_na(n_vulnerable_tracts, 0),
n_underserved_tracts = replace_na(n_underserved_tracts, 0),
pct_underserved_vulnerable = replace_na(pct_underserved_vulnerable, 0),
total_vulnerable_pop = replace_na(total_vulnerable_pop, 0)
)
top5_pct_underserved <- county_stats_full %>%
arrange(desc(pct_underserved_vulnerable)) %>%
slice(1:5)
top5_pct_underserved %>%
mutate(
pct_underserved_vulnerable = scales::percent(pct_underserved_vulnerable, accuracy = 0.1),
n_vulnerable_tracts = scales::comma(n_vulnerable_tracts),
n_underserved_tracts = scales::comma(n_underserved_tracts),
total_vulnerable_pop = scales::comma(total_vulnerable_pop),
avg_dist_miles = round(avg_dist_miles, 2)
) %>%
select(
County = COUNTY_NAM,
`Vulnerable Tracts` = n_vulnerable_tracts,
`Underserved Tracts` = n_underserved_tracts,
`Percent Underserved` = pct_underserved_vulnerable,
`Total Vulnerable Population` = total_vulnerable_pop,
`Avg Distance (miles)` = avg_dist_miles
) %>%
kable(
caption = "Top 5 Counties by Share of Underserved Vulnerable Tracts",
align = "c"
) %>%
kable_styling(position = "center", full_width = FALSE)| County | Vulnerable Tracts | Underserved Tracts | Percent Underserved | Total Vulnerable Population | Avg Distance (miles) |
|---|---|---|---|---|---|
| BRADFORD | 1 | 1 | 100.0% | 5,466 | 17.59 |
| SULLIVAN | 3 | 3 | 100.0% | 3,860 | 20.87 |
| CAMERON | 2 | 2 | 100.0% | 4,536 | 19.13 |
| JUNIATA | 1 | 1 | 100.0% | 1,782 | 16.20 |
| FOREST | 1 | 1 | 100.0% | 2,701 | 18.91 |
far_people_by_county <- vuln_with_county %>%
filter(underserved) %>%
st_drop_geometry() %>%
group_by(COUNTY_NAM) %>%
summarise(
vulnerable_people_far = sum(total_pop, na.rm = TRUE),
.groups = "drop"
) %>%
arrange(desc(vulnerable_people_far))
far_people_by_county %>%
slice(1:5) %>%
mutate(
vulnerable_people_far = scales::comma(vulnerable_people_far)
) %>%
select(
County = COUNTY_NAM,
`Vulnerable Population >15 Miles from Hospital` = vulnerable_people_far
) %>%
kable(
caption = "Top 5 Counties by Number of Vulnerable Residents Living More Than 15 Miles from a Hospital",
align = "c"
) %>%
kable_styling(position = "center", full_width = FALSE)| County | Vulnerable Population >15 Miles from Hospital |
|---|---|
| PIKE | 7,666 |
| CHESTER | 5,626 |
| BRADFORD | 5,466 |
| CLEARFIELD | 5,189 |
| CAMERON | 4,536 |
Required county-level statistics: - Number of vulnerable tracts - Number of underserved tracts - Percentage of vulnerable tracts that are underserved - Average distance to nearest hospital for vulnerable tracts - Total vulnerable population
Questions to answer:
Which 5 counties have the highest percentage of underserved vulnerable tracts?
Bradford, Sullivan, Cameron, Juniata, ForestWhich counties have the most vulnerable people living far from hospitals?
PikeAre there any patterns in where underserved counties are located?
The counties with the highest percentage of underserved vulnerable tracts are small, largely rural counties, where hospitals are sparse and travel distances are longer. Pike may not have the highest underserved share, but it has a large enough vulnerable population in underserved areas that its absolute count is the highest. This suggests that the greatest number of vulnerable people living far from hospitals can occur in outer counties where population is substantial but hospital supply remains limited.
Step 7: Create Summary Table
Create a professional table showing the top 10 priority counties for healthcare investment.
Your Task:
# Create and format priority counties table
priority_data <- county_stats_full %>%
left_join(far_people_by_county, by = "COUNTY_NAM") %>%
mutate(
vulnerable_people_far = replace_na(vulnerable_people_far, 0)
)
priority_table <- priority_data %>%
filter(n_vulnerable_tracts > 0) %>%
arrange(desc(pct_underserved_vulnerable),
desc(vulnerable_people_far)) %>%
slice(1:10) %>%
mutate(
County = COUNTY_NAM,
`Vulnerable Tracts` = n_vulnerable_tracts,
`Underserved Tracts` = n_underserved_tracts,
`% Underserved` = percent(pct_underserved_vulnerable, accuracy = 0.1),
`Avg Distance (mi)` = round(avg_dist_miles, 2),
`Vulnerable Pop` = comma(total_vulnerable_pop),
`Underserved Pop` = comma(vulnerable_people_far)
) %>%
select(
County,
`Vulnerable Tracts`,
`Underserved Tracts`,
`% Underserved`,
`Avg Distance (mi)`,
`Vulnerable Pop`,
`Underserved Pop`
)
priority_table %>%
kable(
caption = "Top 10 Priority Counties for Healthcare Investment in Pennsylvania",
align = "c"
) %>%
kable_styling(position = "center", full_width = FALSE)| County | Vulnerable Tracts | Underserved Tracts | % Underserved | Avg Distance (mi) | Vulnerable Pop | Underserved Pop |
|---|---|---|---|---|---|---|
| BRADFORD | 1 | 1 | 100.0% | 17.59 | 5,466 | 5,466 |
| CAMERON | 2 | 2 | 100.0% | 19.13 | 4,536 | 4,536 |
| SULLIVAN | 3 | 3 | 100.0% | 20.87 | 3,860 | 3,860 |
| FOREST | 1 | 1 | 100.0% | 18.91 | 2,701 | 2,701 |
| JUNIATA | 1 | 1 | 100.0% | 16.20 | 1,782 | 1,782 |
| CLEARFIELD | 3 | 2 | 66.7% | 11.34 | 10,474 | 5,189 |
| PIKE | 10 | 5 | 50.0% | 16.78 | 15,324 | 7,666 |
| POTTER | 3 | 1 | 33.3% | 10.11 | 8,343 | 1,948 |
| HUNTINGDON | 4 | 1 | 25.0% | 13.88 | 8,755 | 1,966 |
| MONROE | 4 | 1 | 25.0% | 7.81 | 13,140 | 1,850 |
Requirements: - Use knitr::kable() or similar for formatting - Include descriptive column names - Format numbers appropriately (commas for population, percentages, etc.) - Add an informative caption - Sort by priority (you decide the metric)
Part 2: Comprehensive Visualization
Using the skills from Week 3 (Data Visualization), create publication-quality maps and charts.
Map 1: County-Level Choropleth
Create a choropleth map showing healthcare access challenges at the county level.
Your Task:
# Create county-level access map
pa_counties_map <- pa_counties %>%
left_join(county_stats_full, by = "COUNTY_NAM")
bb <- st_bbox(pa_counties_map)
x_pad <- (bb["xmax"] - bb["xmin"]) * 0.35
ggplot() +
geom_sf(
data = pa_counties_map,
aes(fill = pct_underserved_vulnerable),
color = "white",
size = 0.2
) +
geom_sf(
data = pa_hospitals,
aes(color = "Hospital"),
size = 1,
alpha = 0.8
) +
scale_fill_viridis_c(
option = "mako",
labels = scales::percent_format(accuracy = 1),
na.value = "grey90",
name = "% Vulnerable\nTracts Underserved"
) +
scale_color_manual(
values = c("Hospital" = "red"),
name = ""
) +
labs(
title = "Healthcare Access Challenges in Pennsylvania",
subtitle = "Counties shaded by percent of vulnerable tracts located >15 miles from a hospital",
caption = "Data Source: ACS 5-year estimates"
) +
annotation_north_arrow(
location = "tr",
which_north = "true",
style = north_arrow_fancy_orienteering,
height = unit(0.9, "cm"),
width = unit(0.9, "cm")
) +
coord_sf(
xlim = c(bb["xmin"], bb["xmax"] + x_pad),
ylim = c(bb["ymin"], bb["ymax"]),
datum = NA,
expand = FALSE
) +
theme_void() +
theme(
legend.position = c(0.8, 0.52),
legend.justification = c("left", "center"),
plot.title = element_text(face = "bold"),
plot.subtitle = element_text(size = 10)
) +
annotation_scale(
location = "br",
width_hint = 0.25,
unit_category = "imperial"
) Requirements: - Fill counties by percentage of vulnerable tracts that are underserved - Include hospital locations as points - Use an appropriate color scheme - Include clear title, subtitle, and caption - Use theme_void() or similar clean theme - Add a legend with formatted labels
Map 2: Detailed Vulnerability Map
Create a map highlighting underserved vulnerable tracts.
Your Task:
# Create detailed tract-level map
ggplot() +
geom_sf(
data = pa_counties,
fill = NA,
aes(color = "County boundary"),
linewidth = 0.4
) +
geom_sf(
data = pa_tracts,
aes(color = "Census tract"),
alpha = 0.25
) +
geom_sf(
data = vulnerable_tracts %>% filter(underserved),
aes(fill = "Underserved vulnerable tract"),
color = NA,
alpha = 0.9
) +
geom_sf(
data = pa_hospitals,
aes(color = "Hospital"),
size = 0.9,
alpha = 0.8
) +
scale_color_manual(
values = c(
"Hospital" = "red",
"Census tract" = "grey",
"County boundary" = "black"
),
name = ""
) +
scale_fill_manual(
values = c(
"Underserved vulnerable tract" = "#2c7fb8"
),
name = ""
) +
labs(
title = "Underserved Vulnerable Census Tracts in Pennsylvania",
subtitle = "Highlighted tracts are vulnerable and located >15 miles from the nearest hospital",
caption = "Data Source: ACS 5-year estimates"
) +
annotation_north_arrow(
location = "tr",
which_north = "true",
style = north_arrow_fancy_orienteering,
height = unit(0.9, "cm"),
width = unit(0.9, "cm")
) +
coord_sf(
xlim = c(bb["xmin"], bb["xmax"] + x_pad),
ylim = c(bb["ymin"], bb["ymax"]),
datum = NA,
expand = FALSE
) +
theme_void() +
theme(
legend.position = c(0.8, 0.52),
legend.justification = c("left", "center"),
plot.title = element_text(face = "bold"),
plot.subtitle = element_text(size = 10)
) +
annotation_scale(
location = "br",
width_hint = 0.25,
unit_category = "imperial"
) Requirements: - Show underserved vulnerable tracts in a contrasting color - Include county boundaries for context - Show hospital locations - Use appropriate visual hierarchy (what should stand out?) - Include informative title and subtitle
Chart: Distribution Analysis
Create a visualization showing the distribution of distances to hospitals for vulnerable populations.
Your Task:
# Create distribution visualization
ggplot(vulnerable_tracts, aes(x = dist_to_hosp_miles)) +
geom_histogram(
bins = 30,
fill = "#2c7fb8",
color = "white",
alpha = 0.8
) +
geom_vline(
xintercept = 15,
linetype = "dashed",
color = "red",
linewidth = 0.8
) +
labs(
title = "Distribution of Distance to Nearest Hospital",
x = "Distance to Nearest Hospital (miles)",
y = "Number of Vulnerable Tracts",
caption = "Red dashed line indicates 15-mile underserved threshold.\nData Source: ACS 5-year estimates"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold")
)Brief Interpretation:
Most vulnerable tracts are located within 10 miles of a hospital. However, a small tail extends beyond the 15-mile threshold (red dashed line), indicating geographically isolated communities facing potential healthcare access challenges.
Suggested chart types: - Histogram or density plot of distances - Box plot comparing distances across regions - Bar chart of underserved tracts by county - Scatter plot of distance vs. vulnerable population size
Requirements: - Clear axes labels with units - Appropriate title - Professional formatting - Brief interpretation (1-2 sentences as a caption or in text)
Part 3: Bring Your Own Data Analysis
Choose your own additional spatial dataset and conduct a supplementary analysis.
Note these are just loose suggestions to spark ideas - follow or make your own as the data permits and as your ideas evolve. This analysis should include bringing in your own dataset, ensuring the projection/CRS of your layers align and are appropriate for the analysis (not lat/long or geodetic coordinate systems). The analysis portion should include some combination of spatial and attribute operations to answer a relatively straightforward question
Your Analysis
Your Task:
- Find and load additional data
- Document your data source
- Check and standardize the CRS
- Provide basic summary statistics
# Load your additional dataset
wifi <- st_read("data/free_city_wifi_locations/free_city_wifi_locations.shp")
internet <- get_acs(
geography = "tract",
state = "PA",
county = "Philadelphia",
output = "wide",
variables = c(
hh_total = "B28002_001",
hh_no_internet = "B28002_013"
),
year = 2022,
survey = "acs5",
geometry = FALSE
)
phl_tracts <- tracts(
state = "PA",
county = "Philadelphia",
year = 2022,
cb = TRUE
)
st_crs(wifi)
st_crs(phl_tracts)wifi_summary <- tibble(
Dataset = c("Free Public Wi-Fi Locations",
"ACS Internet Data",
"Philadelphia Census Tracts"),
`Number of Features` = c(
nrow(wifi),
nrow(internet),
nrow(phl_tracts)
)
)
wifi_summary %>%
mutate(`Number of Features` = scales::comma(`Number of Features`)) %>%
kable(
caption = "Summary of Digital Equity Spatial Datasets (Philadelphia)",
align = "c"
) %>%
kable_styling(position = "center", full_width = FALSE)| Dataset | Number of Features |
|---|---|
| Free Public Wi-Fi Locations | 254 |
| ACS Internet Data | 408 |
| Philadelphia Census Tracts | 408 |
Questions to answer:
What dataset did you choose and why?
I use three datasets: (1)Free Wi-Fi locations, (2) ACS tract-level data on households without internet access, (3) Philadelphia census tract boundaries.
I selected these datasets to evaluate digital equity by comparing areas with high levels of internet disconnection to the spatial distribution of free Wi-Fi locations. Using Philly census tracts data allows me to analyze tract-level patterns and identify digital access gaps.What is the data source and date?
The free Wi-Fi dataset is obtained from OpenDataPhilly. The data were last updated on February 12, 2026.
Demographic data are obtained from the U.S. Census Bureau’s American Community Survey (ACS) 5-year estimates, 2022 release.
Philadelphia census tract boundaries are downloaded using the tigris package.How many features does it contain?
The wifi dataset has 254 features. The internet_acs dataset and phl_tract both have 408 features.What CRS is it in? Did you need to transform it?
Wifi: WGS 84 / Pseudo-Mercator (EPSG: 3857)
Phl tracts: NAD83 (EPSG: 4269)
I need to transform all spatial layers into a common projected coordinate system, NAD83 / Pennsylvania State Plane South (EPSG: 32129).
- Pose a research question
Do neighborhoods with high levels of household internet disconnection have adequate access to free public Wi-Fi locations?
- Conduct spatial analysis
Use at least TWO spatial operations to answer your research question.
Required operations (choose 2+): - Buffers - Spatial joins - Spatial filtering with predicates - Distance calculations - Intersections or unions - Point-in-polygon aggregation
Your Task:
# Your spatial analysis
# Transform to a projected CRS
phl_tracts <- st_transform(phl_tracts, 32129)
wifi <- st_transform(wifi, 32129)
# Calculate % of hh without internet
internet <- internet %>%
mutate(
pct_no_internet = hh_no_internetE / hh_totalE,
pct_no_internet = replace_na(pct_no_internet, 0)
)
# Join acs data to tracts
internet_tracts <- phl_tracts %>%
left_join(internet, by = "GEOID")
# Define threshold for digital vulnerability
internet_tracts <- internet_tracts %>%
mutate(
high_digital_vuln = pct_no_internet > 0.20
)
# Set 0.5 mile buffer for each wifi location as its service area
wifi_buf <- st_buffer(wifi, dist = 1609.34/2)%>%
st_union()
wifi_buf <- st_sf(geometry = wifi_buf)
# Intersect with tract area
tract_intersection <- st_intersection(
internet_tracts %>% select(GEOID),
wifi_buf
)
# Calculate: wifi service area / tract area
internet_tracts <- internet_tracts %>%
mutate(
tract_area = st_area(geometry) # Calculate tract area
)
cov_share <- tract_intersection %>%
mutate(int_area = st_area(geometry)) %>% # Calculate the intersected area
st_drop_geometry() %>%
group_by(GEOID) %>%
summarise(int_area = sum(int_area), .groups = "drop")
# A tract is classified as a "gap" if less than 50% of its area is covered by the service area of free wifi
internet_tracts <- internet_tracts %>%
left_join(cov_share, by = "GEOID") %>%
mutate(
int_area = as.numeric(int_area),
tract_area = as.numeric(tract_area),
int_area = replace_na(int_area, 0),
share_covered_05mi = int_area / tract_area,
gap_05mi = share_covered_05mi < 0.5
)
# Wi-Fi coverage gap summary (all tracts)
overall_summary <- internet_tracts %>%
st_drop_geometry() %>%
summarise(
total_tracts = n(),
total_gap = sum(gap_05mi, na.rm = TRUE),
Share_Gap = paste0(round(mean(gap_05mi, na.rm = TRUE) * 100, 2), "%"),
Mean_Coverage = paste0(round(mean(share_covered_05mi, na.rm = TRUE) * 100, 2), "%")
)
kable(overall_summary, caption = "Overall Wi-Fi Coverage Summary")| total_tracts | total_gap | Share_Gap | Mean_Coverage |
|---|---|---|---|
| 408 | 53 | 12.99% | 84.83% |
# Wi-Fi coverage gap summary (digitally vulnarable tracts)
high_vuln_summary <- internet_tracts %>%
st_drop_geometry() %>%
summarise(
High_Vuln_Tracts = sum(high_digital_vuln, na.rm = TRUE),
High_Vuln_Gap = sum(high_digital_vuln & gap_05mi, na.rm = TRUE),
Share_High_Vuln_Gap = paste0(
round((High_Vuln_Gap / High_Vuln_Tracts) * 100, 2),"%"),
Mean_Coverage = paste0(round(mean(share_covered_05mi, na.rm = TRUE) * 100, 2), "%")
)
kable(high_vuln_summary, caption = "Wi-Fi Coverage Among High Digital Vulnerability Tracts")| High_Vuln_Tracts | High_Vuln_Gap | Share_High_Vuln_Gap | Mean_Coverage |
|---|---|---|---|
| 62 | 5 | 8.06% | 84.83% |
# Map 1. Digital Vulnerability by Census Tract
ggplot() +
geom_sf(data = internet_tracts, aes(fill = pct_no_internet),
color = "grey70", linewidth = 0.1) +
geom_sf(data = internet_tracts %>% filter(high_digital_vuln),
fill = NA, color = "white", linewidth = 0.5) +
scale_fill_continuous(
name = "% households\nwithout internet",
labels = percent_format(accuracy = 1)
) +
labs(
title = "Map 1. Digital Vulnerability by Census Tract, Philadelphia",
subtitle = "High digital vulnerability: % households without internet > 20% (outlined)",
caption = "Data: ACS 2022 5-year estimates."
) +
annotation_scale(
location = "br",
width_hint = 0.3,
unit_category = "imperial"
) +
annotation_north_arrow(
location = "tl",
which_north = "true",
style = north_arrow_fancy_orienteering,
height = unit(0.9, "cm"),
width = unit(0.9, "cm")
) +
coord_sf(datum = NA) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold"),
panel.grid.major = element_blank(),
axis.text = element_blank(),
axis.title = element_blank(),
axis.ticks = element_blank()
)# Map 2. Free Public Wi-Fi Service Area and Access Gap, Philadelphia
high_vuln_gap <- internet_tracts %>%
filter(high_digital_vuln & gap_05mi)
high_vuln <- internet_tracts %>%
filter(high_digital_vuln)
ggplot() +
geom_sf(
data = internet_tracts,
aes(fill = "Census tract"),
color = "grey40",
linewidth = 0.15
) +
geom_sf(
data = high_vuln,
aes(fill = "High vulnerability tract"),
color = "grey40",
linewidth = 0.15
) +
geom_sf(
data = wifi_buf,
aes(color = "Wi-Fi service area (0.5 mile)"),
fill = "deepskyblue2",
alpha = 0.20,
linewidth = 0.5
) +
geom_sf(
data = wifi,
aes(shape = "Free public Wi-Fi location"),
color = "white",
fill = "black",
stroke = 0.5,
size = 1.5,
alpha = 0.95
) +
geom_sf(
data = high_vuln_gap,
aes(color = "High-vulnerability gap tract"),
fill = NA,
linewidth = 0.7
) +
scale_fill_manual(
name = "",
values = c(
"Census tract" = "white",
"High vulnerability tract" = "grey60"
)
) +
scale_color_manual(
name = "",
values = c(
"Wi-Fi service area (0.5 mile)" = "deepskyblue4",
"High-vulnerability gap tract" = "red"
)
) +
scale_shape_manual(
name = "",
values = c("Free public Wi-Fi location" = 21)
) +
guides(
shape = guide_legend(order = 1),
fill = guide_legend(order = 2),
color = guide_legend(order = 3)
) +
labs(
title = "Map 2. Free Public Wi-Fi Service Area and Access Gap, Philadelphia",
subtitle = "High vulnerability: % households without internet > 20%.\nGap tracts: <50% of tract area covered within 0.5 mile service area of free Wi-Fi.",
caption = "Data: OpenDataPhilly, ACS 2022 5-year estimates."
) +
annotation_scale(
location = "br",
width_hint = 0.3,
unit_category = "imperial"
) +
annotation_north_arrow(
location = "tl",
which_north = "true",
style = north_arrow_fancy_orienteering,
height = unit(0.9, "cm"),
width = unit(0.9, "cm")
) +
coord_sf(datum = NA) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold"),
panel.grid.major = element_blank(),
axis.text = element_blank(),
axis.title = element_blank(),
axis.ticks = element_blank(),
legend.position = "right"
)Analysis requirements: - Clear code comments explaining each step - Appropriate CRS transformations - Summary statistics or counts - At least one map showing your findings - Brief interpretation of results (3-5 sentences)
Your interpretation:
Across Philadelphia, digital vulnerability is spatially clustered, with high-vulnerability tracts (% households without internet > 20%) concentrated mainly in West and North Philadelphia. (See Map 1.) Overlaying free public Wi-Fi locations and their 0.5-mile service areas suggests that coverage is broadly extensive and reaches most high-need areas. (See Map 2.) However, a small set of five high-vulnerability tracts remain “access gaps,” where more than 50% of tract area falls out of the Wi-Fi service coverage. Taken together, while the overall spatial distribution of Wi-Fi appears extensive, the presence of these uncovered vulnerable tracts indicates that coverage is not fully aligned with areas of highest need. Targeted placement of additional Wi-Fi sites in or near these specific gap tracts can improve digital equity outcomes.
Finally - A few comments about your incorporation of feedback!
Take a few moments to clean up your markdown document and then write a line or two or three about how you may have incorporated feedback that you recieved after your first assignment.
For some improvement on feedback:
First, instead of directly printing outputs, I used kable() to present summary tables in a more formal format. Second, while I chose to retain some guideline structure in the template for future review, I removed unnecessary guideline content such as examples to make it clearer. Third, I leaned how to hide setup code and package loading using echo = FALSE. Finally, I omitted the Census API installation step, as it is already configured locally and should not be shared publicly.
Submission Requirements
What to submit:
- Rendered HTML document posted to your course portfolio with all code, outputs, maps, and text
- Use
embed-resources: truein YAML so it’s a single file - All code should run without errors
- All maps and charts should display correctly
- Use
- Submit the correct and working links of your assignment on Canvas