Hypothesis

In the last post, I showed how density and population statistics alone may not be enough to determine how urban an area is. I also started to talk about how hospitals may be a better indicator for this problem. Hospitals can be measured by their minimum distances from each other. If the minimum distance from one hospital to another is large, we can assume that the area between them has a small population. Likewise, if the distance between two hospitals is small, that area probably has a higher population. As I mentioned before, hospitals are very expensive to construct and maintain, so it makes sense for them to be built only where the general populace can benefit from them. If the minimum distance from one hospital to another is 30 miles, it is probably safe to assume that there isn't enough of a population between these two hospitals to warrant building another. We are going to run with this theory and create a map that clusters hospitals relative to the minimum distances they have from each other. Let's get to work!

library(plyr)

Next we are going to get some data. You can find data for hospitals here.

Note: You can still do this project in the same way if you prefer to use another dataset. Just make sure your dataset is comprehensive enough and has latitude and longitude points.

Going the distance

I think downloading this as a spreadsheet is the fastest way to get this into RStudio. Let's save the file somewhere easy to locate on our computer.

Hospitals <- read_csv("C:/Users/Daniel/Desktop/Hospitals.csv")
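If you are following along with your own data, here is a minimal sketch in base R of how a matrix of pairwise distances in meters could be built from latitude and longitude columns. Everything in it is an assumption for illustration: the toy coordinates, the helper name haversine_m, and the choice of the haversine formula with a mean Earth radius of 6,371,000 meters. It is not necessarily how mdist is produced in this post.

```r
# Hypothetical helper: great-circle distance in meters (haversine formula).
# The radius is an assumed mean Earth radius, not a value from the post.
haversine_m <- function(lat1, lon1, lat2, lon2, radius = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Toy stand-in for the hospital table (made-up points, not real hospitals).
toy <- data.frame(
  name = c("A", "B", "C"),
  lat  = c(40.7128, 40.7306, 39.0997),
  lon  = c(-74.0060, -73.9866, -94.5786)
)

# Pairwise distance matrix: entry [i, j] is the distance from row i to row j.
idx <- seq_len(nrow(toy))
toy_dist <- outer(idx, idx, function(i, j) {
  haversine_m(toy$lat[i], toy$lon[i], toy$lat[j], toy$lon[j])
})
dimnames(toy_dist) <- list(toy$name, toy$name)

# Every point is 0 meters from itself, so turn the zeros into NA
# before looking for the nearest neighbor in each row.
toy_dist[toy_dist == 0] <- NA
nearest <- apply(toy_dist, 1, min, na.rm = TRUE)
print(round(nearest))
```

With real data, the toy data frame would be replaced by the latitude and longitude columns of the hospital table. A full matrix grows with the square of the number of rows, so a few thousand hospitals already means tens of millions of entries.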
Now our new dataframe mdist2 has all 0 values as NA. You can check by clicking on the Environment tab and clicking on mdist2.

mdist2 <- mdist

Now we can properly find the minimum distance in each row. The numbers here are in meters.

mdist3 <- rowMins(mdist2, na.rm = TRUE)

Mapping

The next step is to start plotting these hospitals on a map! I am a sucker for maps and for visualizing data using maps. We are going to use the ggmap package, so make sure you have the API key registered. If you have not yet installed ggmap, I suggest you do that by scrolling up and reading the instructions for ggmap in the libraries section. If you have the package, let's get started!

A good place to familiarize yourself with ggmap is this excellent cheat sheet you can download here. You can also check out this blog for a great tutorial on ggmap.

Let's start off by plotting every hospital based on its latitude and longitude points.

p <- ggmap(get_googlemap(center = c("United States"),

Next we want to give these latitude and longitude points some aesthetics. Right now the points only indicate where the hospitals are located. So what is our goal here? If we want to sort the hospitals in relation to their distances, we need to set up some parameters for what these distances should be. I based my parameters on measuring the distances between various hospitals across the country, in both rural and urban areas. Remember, this is only a hypothesis and is not meant to be exact.

Hospitals3 <- Hospitals2 %>% mutate(Classification = case_when(Distance >= 0 & Distance <= 300 ~ "Hospital Center",

This is a map of the hospitals around Kansas. I figured this method of clustering would come with drawbacks (it is pretty messy to start with). Wichita on the top left, for example, has points that make it seem very urban. This is because the method of clustering only takes a single minimum distance into account. There might be a case where the points are very close to each other, but there is nothing around them.
This might make small towns with one or two hospitals seem like urban centers. However, since we have a distance matrix in mdist, we can take the lowest "n" values in each row and average out the distances. That way, we can rely on the "n" closest hospitals instead of just one.

mdistnew <- t(apply(mdist2, 1, sort))[, 1:5]

This creates a new table called mdistnew. We use t(apply()) to do this for every row. The sort function inside the apply function sorts the values in each row from smallest to largest, and [, 1:5] then keeps the first five columns, which are the smallest through the 5th smallest distance. If I wanted the largest distances instead, I would take the columns at the tail end of each sorted row (in this case the indexing would look like [, n:7581]). I only used the 5 smallest distances in this example, but you can try as many as you'd like by changing 5 in the above code to a higher number. Now all I do is create a new data frame using mdistnew.

colnames(mdistnew)[1:5] <- c("distance1","distance2","distance3","distance4","distance5")

If we just used the minimum distance from one hospital to another (like we did with Hospitals2), the lowest value would be just half a meter. This would label the area as a "Hospital Center". Also, you might have noticed that the distances in some rows are duplicated. This is because hospitals that are only meters apart from each other share the same minimum distance.

In this case, we now have 5 distances. If we check row 5154, the first 2 distances show other hospitals only 0.4 and 90 meters away. Using the previous method, this hospital would be labeled as "Very Urban" or a "Hospital Center". However, now we can use 5 distances to gauge just how close a hospital actually is to other hospitals in the area. After looking at the 5 distances in row 5154, we can say that the actual average distance to another hospital is around 1600 meters. This number can obviously change depending on how many distances we use.
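The averaging step can be sketched on a tiny made-up matrix. This is only an illustration under my own assumptions: toy_dist, mean_k_nearest, and the distances in it are hypothetical, and I use k = 2 instead of 5 to keep the example small.

```r
# Hypothetical 4-hospital distance matrix in meters, NA on the diagonal.
# A and B sit almost on top of each other, but nothing else is nearby.
toy_dist <- matrix(
  c(   NA,     1, 40000, 52000,
        1,    NA, 40000, 52000,
    40000, 40000,    NA, 30000,
    52000, 52000, 30000,    NA),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A", "B", "C", "D"), c("A", "B", "C", "D"))
)

# Hypothetical helper: mean of the k smallest distances in each row.
mean_k_nearest <- function(d, k) {
  stopifnot(k >= 1, k <= ncol(d) - 1)
  apply(d, 1, function(row) {
    nearest <- sort(row)[seq_len(k)]  # sort() drops NA values by default
    mean(nearest)
  })
}

# Single nearest neighbor versus the average of the two nearest.
single_nearest <- apply(toy_dist, 1, min, na.rm = TRUE)
mean_two <- mean_k_nearest(toy_dist, k = 2)

print(data.frame(single_nearest, mean_two))
```

Judged by the single nearest distance, A and B look like a hospital center at 1 meter. Averaged over their two nearest neighbors, they come out at about 20,000 meters, which is much closer to the truth for two isolated hospitals.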
Hospitals4 <- Hospitals4 %>% mutate(Classification = case_when(MeanDistance >= 0 & MeanDistance <= 3000 ~ "Very Urban",

[Maps: Updated Version and Old Version]

We can clearly see the difference. Manhattan has all purple dots because all the hospitals are pretty close to each other, and the parameters managed to make Brooklyn and Queens appear less urban than Manhattan (as they should be). You can see how getting a "Very Urban" status is difficult, because it requires the 5 nearest hospitals to average under 3000 meters.

z <- ggmap(get_googlemap(center = c("North West America"),

Many points that are close to each other are no longer labeled as urban. There are always exceptions, and I have only tried 5 distances so far. However, we can see how points very close together can still be considered rural or suburban rather than urban.

Next up I will be finding new ways to improve this, and I will look at how various "n" values for distances affect the output of the plots. Cheers!
If you ask a random person on the street whether they would consider New York City to be "urban", they'll most likely answer "Yes". But how urban? Can we consider New York 100% urban? I think the assumption is fair. New York has a population of more than 8 million living in a dense area. But what if you asked about a city like Orlando? Or Kansas City? What about Rochester? As a percentage, how urban would these cities be? If you asked a person on the street, you'll probably receive a slew of different numbers. In reality, only humans can really say whether something is "urban" or not. There is no practical way to measure just how "urban" an area is. However, there are several good indicators that help with this problem.
For starters, urban centers such as New York, Chicago, and LA have many things in common. They all have very concentrated population densities (as expected of urban areas) and a large population.