Lecture15: Logistic Regression

Example: Michelin and Zagat guides to New York City restaurants

In November 2005, Michelin published its first ever guide to hotels and restaurants in New York City (Anonymous, 2005). According to the guide, inclusion in the guide is based on Michelin’s “meticulous and highly confidential evaluation process (in which) Michelin inspectors – American and European – conducted anonymous visits to New York City restaurants and hotels. … Inside the premier edition of the Michelin Guide New York City you’ll find a selection of restaurants by level of comfort; those with the best cuisine have been awarded our renowned Michelin stars. … From the best casual, neighborhood eateries to the city’s most impressive gourmet restaurants, the Michelin Guide New York City provides trusted advice for an unbeatable experience, every time.”

On the other hand, the Zagat Survey 2006: New York City Restaurants (Gathje and Diuguid, 2005) is purely based on views submitted by customers using mail-in or online surveys.

We shall restrict our comparison of the two restaurant guides to the 164 French restaurants that are included in the Zagat Survey 2006: New York City Restaurants. We want to be able to model \(\theta\), the probability that a French restaurant is included in the 2006 Michelin Guide New York City, based on customer views from the Zagat Survey 2006: New York City Restaurants. We begin looking at the effect of \(x\), customer ratings of food on \(\theta\). As such we define the following binary response variable:

\(y_i = 1\) if the restuarant is included in the Michelin guide
\(y_i = 0\) if the restuarant is NOT included in the Michelin guide

We shall consider the following potential predictor variables:

\(x\) = Food = customer rating of the food (out of 30)

MichelinNY <- read.csv("MichelinNY.csv", header=TRUE)
attach(MichelinNY)

y <- InMichelin

# check the data using 'head' and 'tail' 
head(cbind(y, Food))

##      y Food
## [1,] 0   19
## [2,] 0   17
## [3,] 0   23
## [4,] 1   19
## [5,] 0   23
## [6,] 0   18

tail(cbind(y, Food))

##        y Food
## [159,] 0   22
## [160,] 0   23
## [161,] 1   25
## [162,] 1   25
## [163,] 0   20
## [164,] 1   23

# Plot of y vs. food rating 
par(mfrow=c(1,1))
plot(jitter(Food,amount=.15),jitter(y,amount=0.03),xlab="Food Rating",
ylab="In Michelin Guide? (0=No, 1=Yes)")

# Box plots of Food Ratings
boxplot(Food~y, ylab="Food Rating",xlab="In Michelin Guide? (0=No, 1=Yes)")

# aggregate y sliced by Food: sample proportion of "successes" against food rating 
agg_y = aggregate(y~Food, FUN=mean)
plot(agg_y$Food, agg_y$y)

#Logistic regression output
m1 <- glm(y~Food,family=binomial(),data=MichelinNY)
summary(m1)

## 
## Call:
## glm(formula = y ~ Food, family = binomial(), data = MichelinNY)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.3484  -0.8555  -0.4329   0.9028   1.9847  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -10.84154    1.86234  -5.821 5.83e-09 ***
## Food          0.50124    0.08767   5.717 1.08e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 225.79  on 163  degrees of freedom
## Residual deviance: 175.73  on 162  degrees of freedom
## AIC: 179.73
## 
## Number of Fisher Scoring iterations: 4

The fitted model is \[\widehat{\theta}(x) = \frac{1}{1 + \exp(\widehat \beta_0 + \widehat \beta_1 x)} = \frac{1}{1 + \exp(-\{-10.842 + 0.501 x\})}.\]

Rearranging the fitted model equation gives the log(odds) or logit \[ \log \left(\frac{\widehat \theta(x)}{1 - \widehat \theta(x)} \right) = \widehat \beta_0 + \widehat \beta_1 x = -10.842 + 0.501 x. \] The estimated odds for being included in the Michelin guide are given by \[ \left(\frac{\widehat \theta(x)}{1 - \widehat \theta(x)} \right) = \exp(\widehat \beta_0 + \widehat \beta_1 x) = \exp(-10.842 + 0.501 x). \]

x <- seq(15,28,0.05)
y <- 1/(1+exp(-1*(m1$coeff[1] + m1$coeff[2]*x)))
plot(agg_y$Food,agg_y$y,ylab="Probability of inclusion in the Michelin Guide",xlab="Zagat Food Rating")
lines(x,y)

detach(MichelinNY)

For example, if \(x\), Zagat food rating is increased by

One unit then the odds for being included in the Michelin guide increases by exp(0.501) = 1.7
Five units then the odds for being included in the Michelin guide increases by exp(5 × 0.501) = 12.2

Lecture15: Logistic Regression

Meng Li

11/9/2017

Example: Michelin and Zagat guides to New York City restaurants