This document is available here: http://alexknorre.com/files/geodata-with-R-30jun2022.html
ph <- get_acs(state = "PA",
county = "Philadelphia",
geography = "tract",
variables = "B19013_001",
geometry = TRUE,cache_table = T,
output = "wide") %>%
mutate(income = B19013_001E)
mapview(ph, zcol = "income")
ggplot(ph) +
geom_sf(aes(fill = income)) +
theme_minimal()
Various ways of projecting an ellipsoid (or its part) onto a plane
Source: http://practicalgeoskills.blogspot.com/2020/04/map-projections-meaning-and-examples.html
Mercator projection is the most popular one (Google Maps). It’s conformal: a direction (for example, compass arrow) on such map is a straight line. But it distorts areas near poles.
Cartesian (X-Y on a plane) vs spherical/ellipsoidal (latitude and longitude)
Source: Edzer Pebesma, Roger Bivand (CC BY-NC-ND 4.0)
Datum: basis (form of Earth). WGS 84, NAD 83.
Let’s try to replicate this and play with various CRS using shapefile of states from the US Census website:
# Source: https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_state_20m.zip
us_map <- read_sf("~/Downloads/cb_2018_us_state_20m/")
us_map
## Simple feature collection with 52 features and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -179.1743 ymin: 17.91377 xmax: 179.7739 ymax: 71.35256
## Geodetic CRS: NAD83
## # A tibble: 52 × 10
## STATEFP STATENS AFFGEOID GEOID STUSPS NAME LSAD ALAND AWATER
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 24 01714934 0400000US24 24 MD Maryland 00 2.52e10 6.98e 9
## 2 19 01779785 0400000US19 19 IA Iowa 00 1.45e11 1.08e 9
## 3 10 01779781 0400000US10 10 DE Delaware 00 5.05e 9 1.40e 9
## 4 39 01085497 0400000US39 39 OH Ohio 00 1.06e11 1.03e10
## 5 42 01779798 0400000US42 42 PA Pennsylvania 00 1.16e11 3.39e 9
## 6 31 01779792 0400000US31 31 NE Nebraska 00 1.99e11 1.37e 9
## 7 53 01779804 0400000US53 53 WA Washington 00 1.72e11 1.26e10
## 8 72 01779808 0400000US72 72 PR Puerto Rico 00 8.87e 9 4.92e 9
## 9 01 01779775 0400000US01 01 AL Alabama 00 1.31e11 4.59e 9
## 10 05 00068085 0400000US05 05 AR Arkansas 00 1.35e11 2.96e 9
## # … with 42 more rows, and 1 more variable: geometry <MULTIPOLYGON [°]>
us_map %>%
ggplot() +
geom_sf() +
theme_minimal()
It does not look good: Alaska, Hawaii, and Puerto Rico are too far away from contiguous states. Let’s try removing them:
us_smaller <- us_map %>% filter(!(NAME %in% c("Alaska","Hawaii", "Puerto Rico")))
us_smaller %>%
ggplot() +
geom_sf() +
theme_minimal()
We can now try projecting states into different CRS:
p1<-us_smaller %>%
st_transform(crs = 3857) %>%
ggplot() +
geom_sf() +
labs(title = "EPSG: 3857 (Web Mercator,\nprojection (X-Y),\nused in Google Maps)")+
theme_minimal()
p2<-us_smaller %>%
st_transform(crs = 4326) %>%
ggplot() +
geom_sf() +
labs(title = "EPSG: 4326 (WGS 84,\nelliptical (lot-lan),\nused in Google Earth)")+
theme_minimal()
p3<-us_smaller %>%
st_transform(crs = 2955) %>%
ggplot() +
geom_sf() +
labs(title = "EPSG: 2955 (UTM projection, Zone 11N)")+
theme_minimal()
p4<-us_smaller %>%
st_transform(crs = 2163) %>%
ggplot() +
geom_sf() +
labs(title = "EPSG: 2163 (U.S. National Atlas)")+
theme_minimal()
ggarrange(p1,p2,p3,p4)
Points, lines, polygons
shootings <- read.csv("https://phl.carto.com/api/v2/sql?q=SELECT+*,+ST_Y(the_geom)+AS+lat,+ST_X(the_geom)+AS+lng+FROM+shootings&filename=shootings&format=csv&skipfields=cartodb_id")
shootings_sf <- shootings %>%
filter(!is.na(lat) & !is.na(lng)) %>%
filter(lat > 30) %>%
st_as_sf(coords = c("lng","lat"), crs = 4326)
shootings_sf %>% select(location, geometry)
## Simple feature collection with 11996 features and 1 field
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -75.27362 ymin: 39.88576 xmax: -74.95936 ymax: 40.1173
## Geodetic CRS: WGS 84
## First 10 features:
## location geometry
## 1 2700 BLOCK REED ST POINT (-75.18942 39.9357)
## 2 3000 BLOCK MIFFLIN ST POINT (-75.19553 39.93023)
## 3 2100 BLOCK MOUNTAIN ST POINT (-75.18002 39.93117)
## 4 2100 BLOCK MOUNTAIN ST POINT (-75.18002 39.93117)
## 5 1500 BLOCK S 23RD ST POINT (-75.18287 39.93339)
## 6 1600 BLOCK Annin St POINT (-75.1715 39.93688)
## 7 1200 BLOCK S 22ND ST POINT (-75.18051 39.93611)
## 8 1600 BLOCK S 18TH ST POINT (-75.17497 39.93031)
## 9 600 BLOCK S 27TH ST POINT (-75.18427 39.94566)
## 10 2400 BLOCK REED ST POINT (-75.18425 39.93504)
Let’s map it:
shootings_sf %>%
ggplot() +
geom_sf()
Let’s make it more aesthetically pleasing:
shootings_sf %>%
ggplot() +
geom_sf(lwd = 0.3,alpha = 0.3) +
annotation_scale(location = "bl", width_hint = 0.3) +
annotation_north_arrow(location = "br", which_north = "true",
pad_x = unit(0.1, "in"),
pad_y = unit(0, "in"),
style = north_arrow_fancy_orienteering,height = unit(.3, "in"),
width = unit(0.3, "in")) +
labs(title = "Shootings in Philadelphia") +
theme_minimal()
Instead of ggplotting, we can also use interactive visualizations:
shootings_sf_selected <- shootings_sf %>%
select(date_,location, age,sex,race,fatal)
mapview(shootings_sf_selected,zcol = "fatal")
tidycensus is a package to programmatically download data from US Census bureau (decennials censuses and American Community Survey data).
Before using it on your computer, you need to set up an API key: (for this, run ?tidycensus::census_api_key()
)
Let’s look at the variables ACS 2019 (5 yr) can give us:
acs_vars <- load_variables(2019, "acs5", cache = TRUE)
head(acs_vars)
## # A tibble: 6 × 3
## name label concept
## <chr> <chr> <chr>
## 1 B01001_001 Estimate!!Total: SEX BY AGE
## 2 B01001_002 Estimate!!Total:!!Male: SEX BY AGE
## 3 B01001_003 Estimate!!Total:!!Male:!!Under 5 years SEX BY AGE
## 4 B01001_004 Estimate!!Total:!!Male:!!5 to 9 years SEX BY AGE
## 5 B01001_005 Estimate!!Total:!!Male:!!10 to 14 years SEX BY AGE
## 6 B01001_006 Estimate!!Total:!!Male:!!15 to 17 years SEX BY AGE
ph <- get_acs(state = "PA",
county = "Philadelphia",
geography = "tract",
variables = "B19013_001",
geometry = TRUE,cache_table = T,
output = "wide") %>%
mutate(income = B19013_001E)
## Getting data from the 2015-2019 5-year ACS
## Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
ph
## Simple feature collection with 384 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -75.28027 ymin: 39.86705 xmax: -74.95578 ymax: 40.13799
## Geodetic CRS: NAD83
## First 10 features:
## GEOID NAME
## 1 42101014500 Census Tract 145, Philadelphia County, Pennsylvania
## 2 42101004202 Census Tract 42.02, Philadelphia County, Pennsylvania
## 3 42101004102 Census Tract 41.02, Philadelphia County, Pennsylvania
## 4 42101000804 Census Tract 8.04, Philadelphia County, Pennsylvania
## 5 42101025300 Census Tract 253, Philadelphia County, Pennsylvania
## 6 42101028100 Census Tract 281, Philadelphia County, Pennsylvania
## 7 42101008602 Census Tract 86.02, Philadelphia County, Pennsylvania
## 8 42101030700 Census Tract 307, Philadelphia County, Pennsylvania
## 9 42101031101 Census Tract 311.01, Philadelphia County, Pennsylvania
## 10 42101025900 Census Tract 259, Philadelphia County, Pennsylvania
## B19013_001E B19013_001M geometry income
## 1 17991 6274 MULTIPOLYGON (((-75.15131 3... 17991
## 2 47935 14670 MULTIPOLYGON (((-75.15614 3... 47935
## 3 40958 12742 MULTIPOLYGON (((-75.16396 3... 40958
## 4 108293 11643 MULTIPOLYGON (((-75.17067 3... 108293
## 5 41545 3413 MULTIPOLYGON (((-75.18653 4... 41545
## 6 35591 10534 MULTIPOLYGON (((-75.15237 4... 35591
## 7 39394 7829 MULTIPOLYGON (((-75.22156 3... 39394
## 8 49906 11268 MULTIPOLYGON (((-75.0935 40... 49906
## 9 38292 9285 MULTIPOLYGON (((-75.08267 4... 38292
## 10 46548 8716 MULTIPOLYGON (((-75.1809 40... 46548
Let’s try intersecting shootings with census tracts. In other words, let’s project points onto polygons:
shootings_on_tracts <- st_join(ph, shootings_sf_selected)
“Error in st_geos_binop(”intersects”, x, y, sparse = sparse, prepared = prepared, : st_crs(x) == st_crs(y) is not TRUE”
Two datasets have different CRS systems. We need to reproject one into another’s CRS.
st_crs(ph)
## Coordinate Reference System:
## User input: NAD83
## wkt:
## GEOGCRS["NAD83",
## DATUM["North American Datum 1983",
## ELLIPSOID["GRS 1980",6378137,298.257222101,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["latitude",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["longitude",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4269]]
st_crs(shootings_sf_selected)
## Coordinate Reference System:
## User input: EPSG:4326
## wkt:
## GEOGCRS["WGS 84",
## ENSEMBLE["World Geodetic System 1984 ensemble",
## MEMBER["World Geodetic System 1984 (Transit)"],
## MEMBER["World Geodetic System 1984 (G730)"],
## MEMBER["World Geodetic System 1984 (G873)"],
## MEMBER["World Geodetic System 1984 (G1150)"],
## MEMBER["World Geodetic System 1984 (G1674)"],
## MEMBER["World Geodetic System 1984 (G1762)"],
## MEMBER["World Geodetic System 1984 (G2139)"],
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]],
## ENSEMBLEACCURACY[2.0]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["geodetic latitude (Lat)",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["geodetic longitude (Lon)",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## USAGE[
## SCOPE["Horizontal component of 3D system."],
## AREA["World."],
## BBOX[-90,-180,90,180]],
## ID["EPSG",4326]]
ph_wgs84 <- ph %>%
st_transform(crs = 4326)
st_crs(ph_wgs84)
## Coordinate Reference System:
## User input: EPSG:4326
## wkt:
## GEOGCRS["WGS 84",
## ENSEMBLE["World Geodetic System 1984 ensemble",
## MEMBER["World Geodetic System 1984 (Transit)"],
## MEMBER["World Geodetic System 1984 (G730)"],
## MEMBER["World Geodetic System 1984 (G873)"],
## MEMBER["World Geodetic System 1984 (G1150)"],
## MEMBER["World Geodetic System 1984 (G1674)"],
## MEMBER["World Geodetic System 1984 (G1762)"],
## MEMBER["World Geodetic System 1984 (G2139)"],
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]],
## ENSEMBLEACCURACY[2.0]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["geodetic latitude (Lat)",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["geodetic longitude (Lon)",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## USAGE[
## SCOPE["Horizontal component of 3D system."],
## AREA["World."],
## BBOX[-90,-180,90,180]],
## ID["EPSG",4326]]
Let’s first attach shootings to each of the tracts:
shootings_on_tracts <- st_join(shootings_sf_selected, ph_wgs84, join = st_within, left = T)
Then, let’s count how many shootings occured in each of the tracts:
shootings_by_tracts <- shootings_on_tracts %>% count(GEOID)
shootings_by_tracts
## Simple feature collection with 347 features and 2 fields
## Geometry type: GEOMETRY
## Dimension: XY
## Bounding box: xmin: -75.27362 ymin: 39.88576 xmax: -74.95936 ymax: 40.1173
## Geodetic CRS: WGS 84
## First 10 features:
## GEOID n geometry
## 1 42101000100 2 MULTIPOINT ((-75.14644 39.9...
## 2 42101000200 6 MULTIPOINT ((-75.15631 39.9...
## 3 42101000300 1 POINT (-75.16274 39.95698)
## 4 42101000402 11 MULTIPOINT ((-75.16585 39.9...
## 5 42101000500 5 MULTIPOINT ((-75.15973 39.9...
## 6 42101000600 7 MULTIPOINT ((-75.15734 39.9...
## 7 42101000700 7 MULTIPOINT ((-75.16614 39.9...
## 8 42101000804 2 MULTIPOINT ((-75.16616 39.9...
## 9 42101000901 6 MULTIPOINT ((-75.16319 39.9...
## 10 42101001002 6 MULTIPOINT ((-75.14439 39.9...
Now we have table with 347 tract labels and each has a count of shootings. Let’s merge it with tracts:
shootings_by_tracts_no_geometry <- shootings_by_tracts %>% st_drop_geometry()
tracts_with_shooting_counts <- merge(ph_wgs84,shootings_by_tracts_no_geometry)
tracts_with_shooting_counts
## Simple feature collection with 347 features and 6 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -75.28027 ymin: 39.86705 xmax: -74.95578 ymax: 40.1236
## Geodetic CRS: WGS 84
## First 10 features:
## GEOID NAME
## 1 42101000100 Census Tract 1, Philadelphia County, Pennsylvania
## 2 42101000200 Census Tract 2, Philadelphia County, Pennsylvania
## 3 42101000300 Census Tract 3, Philadelphia County, Pennsylvania
## 4 42101000402 Census Tract 4.02, Philadelphia County, Pennsylvania
## 5 42101000500 Census Tract 5, Philadelphia County, Pennsylvania
## 6 42101000600 Census Tract 6, Philadelphia County, Pennsylvania
## 7 42101000700 Census Tract 7, Philadelphia County, Pennsylvania
## 8 42101000804 Census Tract 8.04, Philadelphia County, Pennsylvania
## 9 42101000901 Census Tract 9.01, Philadelphia County, Pennsylvania
## 10 42101001002 Census Tract 10.02, Philadelphia County, Pennsylvania
## B19013_001E B19013_001M income n geometry
## 1 103585 17012 103585 2 MULTIPOLYGON (((-75.15154 3...
## 2 49871 17900 49871 6 MULTIPOLYGON (((-75.16234 3...
## 3 86296 15622 86296 1 MULTIPOLYGON (((-75.17994 3...
## 4 78947 8808 78947 11 MULTIPOLYGON (((-75.17329 3...
## 5 45438 9588 45438 5 MULTIPOLYGON (((-75.16506 3...
## 6 84632 12428 84632 7 MULTIPOLYGON (((-75.16401 3...
## 7 66923 16742 66923 7 MULTIPOLYGON (((-75.18091 3...
## 8 108293 11643 108293 2 MULTIPOLYGON (((-75.17067 3...
## 9 48929 10484 48929 6 MULTIPOLYGON (((-75.16475 3...
## 10 113431 16717 113431 6 MULTIPOLYGON (((-75.15034 3...
tracts_with_shooting_counts %>%
ggplot() +
geom_sf(aes(fill = n))
mapview(tracts_with_shooting_counts, zcol = "n")
Nice thing about mapview is that you can stack several geometries:
mapview(tracts_with_shooting_counts, zcol = "n") +
mapview(shootings_sf_selected, zcol = "fatal")
Is there correlation between median household income and shootings in areas?
plot(tracts_with_shooting_counts$income, tracts_with_shooting_counts$n)