---
title: "Logistic Regression Code"
output: html_document
---
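# Load and Format Data
Here, we read the CSV file and convert each column to the type the models need: `PriceCategory` becomes a two-level factor (`Cheap` = 0, `Expensive` = 1), `TotalVolume` becomes numeric, and `Type`, `Region`, `Month`, and `Year` become factors.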
```{r}
library("readr")
avocado_df <- read_csv('avocado_df_categories.csv')
# Response: Cheap -> level "0", Expensive -> level "1"; glm() models P(level "1")
avocado_df$PriceCategory <- factor(avocado_df$PriceCategory,
                                   levels = c("Cheap", "Expensive"), labels = c(0, 1))
# Numeric predictor
avocado_df$TotalVolume <- as.numeric(avocado_df$TotalVolume)
# Categorical predictors
avocado_df$Type <- as.factor(avocado_df$Type)
avocado_df$Region <- as.factor(avocado_df$Region)
avocado_df$Month <- as.factor(avocado_df$Month)
avocado_df$Year <- as.factor(avocado_df$Year)
```
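The `labels = c(0, 1)` argument keeps the two levels but renames them to `"0"` and `"1"`. With `family = 'binomial'`, `glm()` treats the first level of a factor response as failure and the other as success, so the models below estimate the probability of `"1"` (Expensive). A small sketch on a made-up vector (`sim_price` is hypothetical, not taken from the CSV) shows the recoding, and that any value not listed in `levels` silently becomes `NA`:

```{r}
# Hypothetical toy vector, not taken from avocado_df_categories.csv
sim_price <- c("Cheap", "Expensive", "Cheap", "expensive")

# Same recoding as above: keep two levels, relabel them "0" and "1"
sim_factor <- factor(sim_price, levels = c("Cheap", "Expensive"), labels = c(0, 1))

print(levels(sim_factor))   # the labels become the level names
print(sim_factor)           # lower-case "expensive" is not a listed level, so it is NA
```

If `summary(avocado_df)` reports `NA`s in `PriceCategory`, a spelling mismatch like this is one possible cause worth checking.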
# Summary of Data
```{r}
head(avocado_df)
```
```{r}
summary(avocado_df)
```
# Logistic Regression in R
## Total Volume + Constant Term
We will now build a logistic model with a constant term and a TotalVolume term.
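In symbols, the model assumes that the log-odds of an observation being Expensive are linear in `TotalVolume`:

$$
\log\frac{p}{1-p} = \beta_0 + \beta_1 \cdot \text{TotalVolume}, \qquad p = P(\text{PriceCategory} = 1),
$$

where $\beta_0$ is the constant term and $\beta_1$ is the TotalVolume coefficient estimated by `glm()`.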
```{r}
model_volume <- glm(PriceCategory ~ 1 + TotalVolume, data = avocado_df, family = 'binomial')
```
### Model Summary
```{r}
summary(model_volume)
```
You can also extract the estimated coefficients (intercept and slope, both on the log-odds scale) with `coef()`.
```{r}
print(coef(model_volume))
```
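Because the coefficients are on the log-odds scale, exponentiating the slope gives an odds ratio: the factor by which the odds of Expensive are multiplied for each one-unit increase in `TotalVolume`. The self-contained sketch below checks this identity on simulated data; the `sim_` objects and the true coefficients (0.5 and -0.8) are made up for illustration.

```{r}
set.seed(1)
# Simulated predictor and binary response (made-up true intercept 0.5, slope -0.8)
sim_x <- rnorm(200)
sim_y <- rbinom(200, size = 1, prob = plogis(0.5 - 0.8 * sim_x))
sim_fit <- glm(sim_y ~ 1 + sim_x, family = "binomial")

# Odds ratio for a one-unit increase in the predictor
sim_or <- exp(coef(sim_fit)["sim_x"])
print(sim_or)

# The same quantity from the predicted odds at x = 1 and x = 0
sim_p <- predict(sim_fit, data.frame(sim_x = c(0, 1)), type = "response")
sim_odds <- sim_p / (1 - sim_p)
print(sim_odds[2] / sim_odds[1])
```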
### Predicting Class Probabilities
```{r}
# predict() returns log-odds by default (type = "link"); here for TotalVolume = 15.0
log_odds <- predict(model_volume, data.frame(TotalVolume = 15.0))
print(log_odds)
```
```{r}
# Inverse logit: convert log-odds to P(PriceCategory = 1)
probs <- exp(log_odds) / (1 + exp(log_odds))
print(probs)
```
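The expression `exp(log_odds)/(1 + exp(log_odds))` is the logistic (inverse-logit) function. Base R also provides it as `plogis()`, and `predict()` can apply it directly with `type = "response"`. A sketch on simulated data (the `sim_` objects are hypothetical) confirms that the three routes agree:

```{r}
set.seed(2)
sim_x <- runif(100, 0, 10)
sim_y <- rbinom(100, size = 1, prob = plogis(-2 + 0.4 * sim_x))
sim_fit <- glm(sim_y ~ 1 + sim_x, family = "binomial")

sim_new <- data.frame(sim_x = 5)
sim_log_odds <- predict(sim_fit, sim_new)                  # default type = "link": log-odds
sim_manual   <- exp(sim_log_odds) / (1 + exp(sim_log_odds))
sim_plogis   <- plogis(sim_log_odds)                       # built-in inverse logit
sim_response <- predict(sim_fit, sim_new, type = "response")

print(c(manual = sim_manual, plogis = sim_plogis, response = sim_response))
```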
### Class Probabilities of the Training Data
The fitted probability of Expensive for each observation used to fit the model is stored in `fitted.values`.
```{r}
probabilities <- model_volume$fitted.values
print(head(probabilities))
```
```{r}
print(head(avocado_df$PriceCategory))
```
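`fitted.values` should match `predict(..., type = "response")` called without `newdata`, and also `plogis()` applied to the stored log-odds in `linear.predictors`. Note that rows dropped for missing values (the default `na.action = na.omit`) have no fitted value. A quick sanity check on simulated data (hypothetical `sim_` objects):

```{r}
set.seed(3)
sim_x <- rnorm(50)
sim_y <- rbinom(50, size = 1, prob = plogis(0.3 + sim_x))
sim_fit <- glm(sim_y ~ 1 + sim_x, family = "binomial")

sim_fitted   <- sim_fit$fitted.values
sim_response <- predict(sim_fit, type = "response")   # no newdata: training rows
sim_inverse  <- plogis(sim_fit$linear.predictors)     # inverse logit of stored log-odds

print(head(cbind(sim_fitted, sim_response, sim_inverse)))
```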
### Predicting Class from Class Probabilities
Using a threshold of 0.5, we assign class 1 (Expensive) to each observation whose fitted probability exceeds the threshold, and class 0 (Cheap) otherwise.
```{r}
predicted_classes <- as.numeric(probabilities > 0.5)   # 1 if P(Expensive) > 0.5, else 0
print(predicted_classes)
```
Let's see our actual classes.
```{r}
print(avocado_df$PriceCategory)
```
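The 0.5 cut-off is a choice rather than a requirement; raising it labels fewer observations as Expensive. A small sketch with a hypothetical helper `classify_at()` and made-up probabilities shows the effect. Because `>` is strict, a probability of exactly 0.5 is assigned class 0.

```{r}
# Hypothetical helper: assign class 1 when the probability exceeds the threshold
classify_at <- function(p, threshold = 0.5) {
  as.numeric(p > threshold)
}

sim_probs <- c(0.10, 0.45, 0.50, 0.62, 0.90)   # made-up probabilities
print(classify_at(sim_probs))          # default threshold 0.5
print(classify_at(sim_probs, 0.7))     # stricter threshold
```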
### Confusion Matrix
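A confusion matrix cross-tabulates the predicted classes against the actual ones; `caret::confusionMatrix()` also reports accuracy (one minus the error rate) and related statistics.
```{r}
library(caret)
# Fix the levels so both classes are present even if only one class is predicted;
# positive = "1" makes Expensive the positive class for sensitivity and specificity
confusionMatrix(data = factor(predicted_classes, levels = c(0, 1)),
                reference = avocado_df$PriceCategory, positive = "1")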
```
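The core of what `confusionMatrix()` computes can be sketched in base R with `table()`. The vectors below are made up; fixing `levels = c(0, 1)` keeps both rows and columns even when one class is never predicted.

```{r}
sim_pred   <- c(0, 1, 1, 0, 1, 0)   # made-up predicted classes
sim_actual <- c(0, 1, 0, 0, 1, 1)   # made-up true classes

# Rows = predicted, columns = actual (the same layout confusionMatrix() prints)
sim_cm <- table(Prediction = factor(sim_pred, levels = c(0, 1)),
                Reference  = factor(sim_actual, levels = c(0, 1)))
print(sim_cm)

sim_accuracy    <- sum(diag(sim_cm)) / sum(sim_cm)          # share of correct predictions
sim_sensitivity <- sim_cm["1", "1"] / sum(sim_cm[, "1"])    # recall for class 1
print(c(accuracy = sim_accuracy, sensitivity = sim_sensitivity))
```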
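## Model with Constant Term Only
We will now build the null model: a logistic model with only a constant term, which predicts the same probability of Expensive for every observation. It serves as a baseline for the model comparisons below.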
```{r}
model_constant <- glm(PriceCategory ~ 1, data = avocado_df, family = 'binomial')
```
### Model Summary
```{r}
summary(model_constant)
```
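For an intercept-only logistic model, the single predicted probability equals the sample proportion of class 1, so the intercept is the log-odds of that proportion. A sketch on simulated labels (hypothetical `sim_` objects):

```{r}
set.seed(4)
sim_y <- rbinom(300, size = 1, prob = 0.35)   # made-up binary labels
sim_null <- glm(sim_y ~ 1, family = "binomial")

# plogis() maps the intercept (log-odds) back to a probability
print(c(model = unname(plogis(coef(sim_null))), proportion = mean(sim_y)))
```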
## Model Comparison
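Because the constant-only model is nested in the TotalVolume model, we can compare them with `anova()`. With `test = 'Chisq'` it performs a likelihood ratio (analysis of deviance) test; a small p-value indicates that adding TotalVolume significantly reduces the residual deviance.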
```{r}
anova(model_constant, model_volume, test='Chisq')
```
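The likelihood ratio test behind `anova(..., test = 'Chisq')` compares the drop in residual deviance against a chi-squared distribution whose degrees of freedom equal the number of added parameters. A sketch on simulated data (hypothetical `sim_` objects) reproduces the reported statistic and p-value by hand:

```{r}
set.seed(5)
sim_x <- rnorm(150)
sim_y <- rbinom(150, size = 1, prob = plogis(-0.2 + 0.7 * sim_x))
sim_null <- glm(sim_y ~ 1, family = "binomial")
sim_full <- glm(sim_y ~ 1 + sim_x, family = "binomial")

sim_table <- anova(sim_null, sim_full, test = "Chisq")

# Likelihood ratio statistic and its chi-squared p-value, computed manually
sim_lr_stat <- deviance(sim_null) - deviance(sim_full)
sim_df_diff <- df.residual(sim_null) - df.residual(sim_full)
sim_p_value <- pchisq(sim_lr_stat, df = sim_df_diff, lower.tail = FALSE)

print(sim_table)
print(c(statistic = sim_lr_stat, df = sim_df_diff, p = sim_p_value))
```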
## Model with Volume and Type
### Create Model
```{r}
model_volume_type <- glm(PriceCategory ~ 1 + TotalVolume + Type, data=avocado_df, family='binomial')
```
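Because `Type` is a factor, `glm()` encodes it with treatment (dummy) contrasts: the first level is the baseline and every other level gets its own 0/1 column. Assuming the levels in the data are `conventional` and `organic`, the model gains a single `Typeorganic` coefficient. A sketch with `model.matrix()` on a made-up data frame shows the design matrix:

```{r}
# Made-up rows; only the column types mirror avocado_df
sim_df <- data.frame(
  TotalVolume = c(15, 5, 8, 12),
  Type = factor(c("conventional", "organic", "organic", "conventional"))
)

# Design matrix as glm() would build it: intercept, TotalVolume, 0/1 column for organic
sim_mm <- model.matrix(~ 1 + TotalVolume + Type, data = sim_df)
print(sim_mm)
```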
### Model Summary
```{r}
summary(model_volume_type)
```
### Predicting Class Probabilities
```{r}
# Log-odds for two new observations: (15.0, conventional) and (5.0, organic)
log_odds <- predict(model_volume_type,
                    data.frame(TotalVolume = c(15.0, 5.0), Type = c("conventional", "organic")))
print(log_odds)
```
```{r}
# Inverse logit: convert log-odds to P(PriceCategory = 1)
probs <- exp(log_odds) / (1 + exp(log_odds))
print(probs)
```
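Holding `TotalVolume` fixed, switching `Type` from the baseline level to the other level shifts the log-odds by exactly that level's coefficient, so its exponential is an odds ratio. A sketch on simulated data (hypothetical `sim_` objects with made-up effect sizes) checks this:

```{r}
set.seed(6)
sim_df <- data.frame(
  TotalVolume = runif(400, 0, 20),
  Type = factor(sample(c("conventional", "organic"), 400, replace = TRUE))
)
# Made-up true model: organic raises the log-odds by 1.5
sim_eta <- -1 - 0.1 * sim_df$TotalVolume + 1.5 * (sim_df$Type == "organic")
sim_df$PriceCategory <- rbinom(400, size = 1, prob = plogis(sim_eta))
sim_fit <- glm(PriceCategory ~ 1 + TotalVolume + Type, data = sim_df, family = "binomial")

# Log-odds at the same volume for each type
sim_new <- data.frame(TotalVolume = c(10, 10), Type = c("conventional", "organic"))
sim_log_odds <- predict(sim_fit, sim_new)
print(sim_log_odds[2] - sim_log_odds[1])
print(coef(sim_fit)["Typeorganic"])
```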
### Class Probabilities
The fitted probability of Expensive for each observation used to fit the model is stored in `fitted.values`.
```{r}
probabilities <- model_volume_type$fitted.values
print(head(probabilities))
```
```{r}
print(head(avocado_df$PriceCategory))
```
### Predicting Class from Class Probabilities
Using a threshold of 0.5, we assign class 1 (Expensive) to each observation whose fitted probability exceeds the threshold, and class 0 (Cheap) otherwise.
```{r}
predicted_classes <- as.numeric(probabilities > 0.5)   # 1 if P(Expensive) > 0.5, else 0
print(predicted_classes)
```
Let's see our actual classes.
```{r}
print(avocado_df$PriceCategory)
```
### Confusion Matrix
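A confusion matrix cross-tabulates the predicted classes against the actual ones; `caret::confusionMatrix()` also reports accuracy (one minus the error rate) and related statistics.
```{r}
# Fix the levels so both classes are present even if only one class is predicted;
# positive = "1" makes Expensive the positive class for sensitivity and specificity
confusionMatrix(data = factor(predicted_classes, levels = c(0, 1)),
                reference = avocado_df$PriceCategory, positive = "1")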
```
### ANOVA Against the Null Model
```{r}
anova(model_constant, model_volume_type, test='Chisq')
```
### ANOVA Against the Volume-Only Model
```{r}
anova(model_volume, model_volume_type, test = 'Chisq')
```
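Both comparisons can also be produced in one call: given several nested models in increasing order of complexity, `anova()` tests each model against the one before it. A sketch on simulated data (hypothetical `sim_` objects with made-up effects) with a null, a volume-only, and a volume-plus-type model:

```{r}
set.seed(7)
sim_df <- data.frame(
  TotalVolume = runif(300, 0, 20),
  Type = factor(sample(c("conventional", "organic"), 300, replace = TRUE))
)
sim_df$PriceCategory <- rbinom(300, size = 1,
  prob = plogis(-0.5 - 0.05 * sim_df$TotalVolume + 1.2 * (sim_df$Type == "organic")))

sim_m0 <- glm(PriceCategory ~ 1, data = sim_df, family = "binomial")
sim_m1 <- glm(PriceCategory ~ 1 + TotalVolume, data = sim_df, family = "binomial")
sim_m2 <- glm(PriceCategory ~ 1 + TotalVolume + Type, data = sim_df, family = "binomial")

# Row 2 tests m1 against m0; row 3 tests m2 against m1
sim_seq <- anova(sim_m0, sim_m1, sim_m2, test = "Chisq")
print(sim_seq)
```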