DANNY TABACH

Combinations


3/19/2020


 
Hi! Welcome to the blog!

I came across an interesting problem at work. The company I work for has a dataset full of items, and they needed to find the optimal combinations of these items. This sounds like a simple problem, but with large spreadsheets the task can be a little daunting. No fear! There are multiple ways to tackle combination problems in R, and today I will showcase a few of them. Let's first make a sample data set.
data <- data.frame(
  category = c("Tshirt", "Tshirt", "Tshirt", "Tshirt", "Cup", "Cup", "Cup", "Cup", "Bag", "Bag", "Bag", "Bag"),
  item = c("RedShirt",
           "GreenShirt",
           "BlueShirt",
           "YellowShirt",
           "CeramicCup",
           "ClayCup",
           "RubberCup",
           "CoffeeCup",
           "BigBag",
           "SmallBag",
           "MediumBag",
           "JumboBag"),
  price = c(10, 12, 8, 15, 7, 5, 8, 10, 25, 15, 20, 30))
View(data)  # opens the data viewer in RStudio

This is obviously not a great set of data, but it is sufficient to make combinations! We are first going to use the expand.grid() function. You can check out its documentation by running ?expand.grid in the console. It is a very simple function: you give it vectors, and it returns a data frame with one row for every combination of their values. Let's use it!
# To combine one item from each category, we need a separate vector per category to pass to expand.grid(). So let's make some filters.
library(dplyr)  # provides filter()

Tshirt <- filter(data, category == "Tshirt")
Cup <- filter(data, category == "Cup")
Bag <- filter(data, category == "Bag")

# Now we can finally use the expand.grid() function.
# 4 bags x 4 cups x 4 shirts = 64 rows.
combinations <- expand.grid(Bag$item, Cup$item, Tshirt$item)
View(combinations)
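Since the original goal was to find "optimal" combinations, here is a minimal sketch of one way to score the rows that expand.grid() produces. It assumes that "optimal" simply means the lowest total price, which is my own stand-in rule and not necessarily what you need at work. The names items_by_category, price_of, combos and total are made up for this example, and split() is used in place of the three filter() calls so the sketch only needs base R.

```r
# Rebuild the sample data so this sketch runs on its own.
data <- data.frame(
  category = rep(c("Tshirt", "Cup", "Bag"), each = 4),
  item = c("RedShirt", "GreenShirt", "BlueShirt", "YellowShirt",
           "CeramicCup", "ClayCup", "RubberCup", "CoffeeCup",
           "BigBag", "SmallBag", "MediumBag", "JumboBag"),
  price = c(10, 12, 8, 15, 7, 5, 8, 10, 25, 15, 20, 30),
  stringsAsFactors = FALSE
)

# One character vector of items per category (names: Bag, Cup, Tshirt).
items_by_category <- split(data$item, data$category)

# expand.grid() also accepts a named list of vectors.
combos <- expand.grid(items_by_category, stringsAsFactors = FALSE)

# Named lookup vector: item name -> price.
price_of <- setNames(data$price, data$item)

# Assumed scoring rule: total price of the three items in each row.
combos$total <- unname(
  price_of[combos$Bag] + price_of[combos$Cup] + price_of[combos$Tshirt]
)

# Cheapest combinations first.
combos <- combos[order(combos$total), ]
head(combos, 3)
```

With this rule the first row is SmallBag, ClayCup and BlueShirt for a total of 28. Swapping in a different rule (a budget cap, for instance) only means changing how total is computed or filtered.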

The benefit of using expand.grid() is that it's really easy to add as many vectors (or columns) as you want to the combinations. As far as I know, something like outer() works on just 2 vectors at a time. Another option is combn(), which builds combinations of a fixed length from a single vector. For example:
# Let's make another example vector.

classes <- c("A+", "A-", "A", "B+", "B-", "B", "C+", "C-", "C", "D+", "D-", "D", "F")
# These are all the possible letter grades for a class... I am not sure if D+ or D- is a thing, but let's see what's possible with these letters.
# Here we are going to use combn().
ncomb <- combn(classes, 5)
# 5 is the length of each combination, which is the useful part of combn(): you pick the length yourself. With expand.grid(), each combination is as long as the number of vectors you pass in, and the output has n^z rows, where "n" is the length of each vector and "z" is the number of vectors (columns).
View(ncomb)


The drawback with combn() is that it never repeats an element within a combination, so a combination will never be something like A A A B B. It also ignores order, so each set of five grades shows up only once.
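To check that claim for yourself, here is a small sketch. It rebuilds the grades vector so it runs on its own; has_repeat is just a name I picked for the check.

```r
classes <- c("A+", "A-", "A", "B+", "B-", "B", "C+", "C-", "C", "D+", "D-", "D", "F")

# combn() returns a matrix with one combination per column.
ncomb <- combn(classes, 5)
dim(ncomb)  # 5 rows and 1287 columns

# The column count matches "13 choose 5".
choose(length(classes), 5)

# anyDuplicated() returns 0 when a vector has no repeated values,
# so this is FALSE if no combination repeats a grade.
has_repeat <- any(apply(ncomb, 2, anyDuplicated) > 0)
has_repeat
```

Because each combination is a column, View(t(ncomb)) is sometimes easier to read than View(ncomb).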

Here is how it works with expand.grid():

data2 <- classes  # assumption: data2 is the 13-grade vector from above
n2comb <- expand.grid(data2, data2, data2)

# data2 has 13 elements. Since we are using data2 three times, the output will have 13^3 = 2197 rows (and 3 columns).

View(n2comb)
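Here is a quick sketch of how the row count behaves, and of the fact that expand.grid() does keep repeated values. The column names first, second and third are made up for the example, and I pass stringsAsFactors = FALSE so the columns stay plain character vectors.

```r
classes <- c("A+", "A-", "A", "B+", "B-", "B", "C+", "C-", "C", "D+", "D-", "D", "F")

n2comb <- expand.grid(first = classes, second = classes, third = classes,
                      stringsAsFactors = FALSE)

# One row per ordered triple: 13^3 rows.
nrow(n2comb)  # 2197

# Unlike combn(), repeats are allowed, so the row "A A A" exists.
triple_a <- subset(n2comb, first == "A" & second == "A" & third == "A")
nrow(triple_a)  # 1

# Row counts for 1 to 6 columns of the same 13 grades.
rows_by_columns <- length(classes)^(1:6)
rows_by_columns
```

The last line is the reason to be careful: every extra column multiplies the number of rows by 13.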

expand.grid() does this in a really simple way, but the output grows quickly: every extra column multiplies the number of rows. If you only have 2 vectors, you can also use outer():

data3 <- c("A", "B", "C", "D", "E")
data4 <- c("a", "b", "c", "d", "e")
# Pastes every element of data3 with every element of data4; the result is a 5 x 5 character matrix.
n3comb <- outer(data3, data4, FUN = "paste", sep = "")
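For comparison, here is a small sketch showing that outer() and expand.grid() give the same pairs, just in different shapes. The names pairs, grid, upper, lower and same are made up for the example.

```r
data3 <- c("A", "B", "C", "D", "E")
data4 <- c("a", "b", "c", "d", "e")

# outer() returns a matrix: row i comes from data3, column j from data4.
n3comb <- outer(data3, data4, FUN = "paste", sep = "")
dim(n3comb)   # 5 5
n3comb[2, 3]  # "Bc"

# Flatten the matrix (column by column) into a plain vector of 25 pairs.
pairs <- as.vector(n3comb)

# The same pairs via expand.grid(), where the first vector varies fastest.
grid <- expand.grid(upper = data3, lower = data4, stringsAsFactors = FALSE)
same <- identical(pairs, paste0(grid$upper, grid$lower))
same  # TRUE
```

So outer() is handy when you want the result laid out as a table, and expand.grid() is the one to reach for when you want one row per combination or more than 2 vectors.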