DANNY TABACH

Mapping Hospital Data

Intro

2/7/2020


If you ask a random person on the street whether they would consider New York City "urban", they will most likely answer "Yes". But how urban? Can we consider New York 100% urban? I think the assumption is fair: New York has a population of roughly 9 million living in a dense area. But what if you asked about a city like Orlando? Or Kansas City? What about Rochester? As a percentage, how urban would these cities be? If you asked a person on the street, you would probably receive a slew of answers all over the range. In reality, only human judgment can say whether something is "urban" or not; there is no practical way to measure just how "urban" an area is. However, there are several good indicators that help with this problem.

For starters, urban centers such as New York, Chicago, and LA have a lot in common. They all have a large population and a high population density (as expected of urban areas).

library(plyr)
library(lubridate)
library(ggplot2)
library(dplyr)
library(data.table)
library(ggrepel)
library(tidyverse)
library(ggmap)
#Installation for ggmap package here. Make sure you register your google api key
#with register_google(key = " ")
library(sp)
library(rgdal)
library(geosphere)
library(readxl)
library(matrixStats)
library(magick)
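
Note that library() itself does not install anything; it stops with an error if a package is not there. If you would rather check up front, here is a minimal sketch. The helper name missing_packages is made up for this example, and the install line is left commented out so nothing is downloaded unless you want it to be:

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
# `installed` defaults to the names of everything in the current library paths.
missing_packages <- function(pkgs, installed = rownames(installed.packages())) {
  setdiff(pkgs, installed)
}

# A few of the packages loaded above
needed <- c("ggplot2", "dplyr", "ggrepel", "ggmap")
to_install <- missing_packages(needed)

# Uncomment to install whatever is missing from your default CRAN mirror
# if (length(to_install) > 0) install.packages(to_install)
```

The second argument is only there so the check can be tried against a known list of names.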

#RStudio will generally offer to install any missing packages when you open the script.
#Here is a quick plot I created with the data provided up there
uscitiesold <- read_csv("C://Users//Daniel//Desktop//ExcelStuff//uscities.csv")
#view(uscitiesold)

options(scipen=999)
#windows() opens a new plot window on Windows only; use dev.new() on other systems
windows()
ggplot(data = uscitiesold, aes(population, state_id)) +
  geom_point() +
  geom_text_repel(data = subset(uscitiesold, population > 1000000), aes(label = city)) + 
  theme(axis.text.x = element_text(face="bold", color="#993333", 
                                   size=10, angle=0),
        axis.text.y = element_text(face="bold", color="#993333", 
                                   size=10, angle=0)) +
  scale_x_continuous("Population", breaks =c(100000,1000000,2000000,4000000,8000000,16000000))+
  scale_y_discrete("State") +
  ggtitle("Population of each city by state")

#Feel free to play around with the numbers. To break down this code for beginners:
#read_csv reads a CSV file on my computer, given the path to that file. I keep it in a folder. 
#options(scipen=999) is taken from here. All I am doing is removing scientific notation from my plot.
#geom_text_repel comes from the ggrepel package. It keeps the labels from overlapping. More here. Notice how 
#I subsetted the data so that only cities with a population over 1 million are labeled. You can change this to see how it looks. 
#To make my numbers more visible I added the theme() line. More here on how to customize labels and tick marks.
#scale_x_continuous sets up the x-axis scale (which is 'population'). I labeled it "Population" and 
#added the breaks that are visible on the x-axis of the plot below. More on this here.
#ggtitle just adds a title to the plot.
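
To see what options(scipen=999) changes, here is a small sketch using base R's format(), outside of any plot. With the default setting, R switches large round numbers to scientific notation; with a large scipen it keeps them in fixed notation, which is the effect visible on the axis labels:

```r
# Remember the current setting so it can be restored afterwards
old <- options(scipen = 0)          # R's default penalty
default_label <- format(16000000)   # "1.6e+07"

options(scipen = 999)               # strongly prefer fixed notation
fixed_label <- format(16000000)     # "16000000"

options(old)                        # restore the previous setting
print(c(default_label, fixed_label))
```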
[Picture: scatter plot "Population of each city by state"]
Looks interesting! However, it doesn't tell us much about the density of each city. We can look at that with this:
#windows() opens a new plot window on Windows only; use dev.new() on other systems
windows()
ggplot(data = uscitiesold, aes(population, density)) +
  geom_point() +
  geom_text_repel(data = subset(uscitiesold, population > 2000000 | density > 10000), aes(label = city)) + 
  theme(axis.text.x = element_text(face="bold", color="#993333", 
                                   size=10, angle=0),
        axis.text.y = element_text(face="bold", color="#993333", 
                                   size=10, angle=0)) +
  scale_x_continuous("Population", breaks =c(100000,1000000,2000000,4000000,8000000,16000000))+
  scale_y_continuous("Density", limits = c(0,30000)) +
  ggtitle("Population vs Density")
[Picture: scatter plot "Population vs Density"]
Note: the New York point on the far right is actually the entire metropolitan area around NYC. As far as I could tell, this is the only point in the dataset that does this.
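
A side note on which points get labeled: the density plot labels a city if it passes either threshold (the | in the subset), while the Population vs Area plot further down uses & so a city has to pass both. A quick sketch of the difference, using made-up cities rather than rows from uscities.csv:

```r
# Made-up cities for illustration only
cities <- data.frame(
  city       = c("Bigville", "Denseton", "Smallburg", "Megapolis"),
  population = c(2500000, 300000, 50000, 3000000),
  density    = c(1200, 12000, 800, 11000),
  stringsAsFactors = FALSE
)

# OR: keep a city if it is very large or very dense
either <- subset(cities, population > 2000000 | density > 10000)

# AND: keep a city only if it is both large and dense
both <- subset(cities, population > 2000000 & density > 10000)

either$city  # "Bigville" "Denseton" "Megapolis"
both$city    # "Megapolis"
```

Switching between the two is an easy way to control how crowded the labels get.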
So now we have a pretty decent graphic for looking at a city's density and population. However, it can be misleading when it comes to deciding which area is more urban than another. Based on this graphic, one might conclude that Brooklyn is more urban than San Francisco because Brooklyn is denser, which might not be the case. It can also suggest that any place below a certain population or density is automatically not 'urban'.

Here is a chart comparing population and area, which are the two components of population density.
options(scipen=999)
uscitiesarea <- uscitiesold %>% mutate(area_km2 =  population/density) 
uscitiesarea <- uscitiesarea %>% filter(area_km2 < 100000)
view(uscitiesarea)
#windows() opens a new plot window on Windows only; use dev.new() on other systems
windows()
ggplot(data = uscitiesarea, aes(area_km2,population)) +
  geom_point() +
  geom_text_repel(data = subset(uscitiesarea, population > 1000000 & density > 1500), aes(label = city)) + 
  theme(axis.text.x = element_text(face="bold", color="#993333", 
                                   size=10, angle=0),
        axis.text.y = element_text(face="bold", color="#993333", 
                                   size=10, angle=0)) +
  scale_x_continuous("Area in Km Squared", limits = c(0,8000))+
  scale_y_continuous("Population", breaks =c(2000000,4000000,8000000,16000000)) +
  ggtitle("Population vs Area")

[Picture: scatter plot "Population vs Area"]
Unfortunately, this graphic still gives us only a vague picture of urban areas. Although it lets us see which cities and towns have a large area or population, it doesn't tell us how "urban" anything might be. Density, population, and land area statistics are more useful when compared with something else. The densities given for each city are estimates and averages over a large area encompassing many neighborhoods. It would be unfair to use this data alone to make any predictions about which areas are urban.
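
For anyone wondering where the area numbers came from: density is population divided by area, so area is population divided by density. Here is a small sketch with made-up numbers (not rows from the dataset), assuming density is in people per square kilometre as the area_km2 name suggests. It also shows one thing an upper bound on area can guard against, namely a density of zero:

```r
# Made-up rows for illustration only
toy <- data.frame(
  city       = c("A", "B", "C"),
  population = c(1000000, 50000, 200),
  density    = c(4000, 500, 0),   # people per km^2 (assumed unit)
  stringsAsFactors = FALSE
)

# area = population / density
toy$area_km2 <- toy$population / toy$density
toy$area_km2        # 250 100 Inf

# Dividing by a zero density gives Inf, which an upper bound on area removes
toy_clean <- toy[toy$area_km2 < 100000, ]
toy_clean$city      # "A" "B"
```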
So what do urban centers have in common (disregarding population)?
Urban centers generally have many stores, chain restaurants, a higher standard of living, and a lot of infrastructure. They also have parks, easy access to healthcare, and apartments for living space. The problem is that data on many of these variables is limited. One of the easier pieces of data for me to access was hospitals.

Hospitals are incredibly expensive to construct (read more here), so it makes sense for them to be located where the general population can benefit from them. Given this, it seems reasonable to assume that areas with greater density and population generally have more hospitals than areas with low density and population. This can be a great indicator of how urban an area might be. To test this, we are going to plot hospital locations across the United States and start looking at some cities with hospitals!
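
Besides plotting, one rough way to check an assumption like this would be to count hospitals per city and compare the counts with population. The sketch below uses made-up data and hypothetical column names (name, city, n_hospitals); a real hospital dataset would look different, and city names alone would not be a safe join key across states:

```r
# Made-up data for illustration; not real hospital or city figures
hospitals <- data.frame(
  name = paste("Hospital", 1:6),
  city = c("A", "A", "A", "B", "B", "C"),
  stringsAsFactors = FALSE
)
cities <- data.frame(
  city       = c("A", "B", "C", "D"),
  population = c(900000, 400000, 80000, 5000),
  stringsAsFactors = FALSE
)

# Count hospitals per city
counts <- as.data.frame(table(city = hospitals$city),
                        responseName = "n_hospitals",
                        stringsAsFactors = FALSE)

# Keep every city, including those with no hospitals
merged <- merge(cities, counts, by = "city", all.x = TRUE)
merged$n_hospitals[is.na(merged$n_hospitals)] <- 0

# Rank correlation between population and hospital count
rho <- cor(merged$population, merged$n_hospitals, method = "spearman")
print(rho)
```

A rank correlation is used here only because population is so skewed; it is one option among several, not a claim about how the rest of this project does it.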