---
title: "Logistic Regression Code"
output: html_document
---
# Load and Format Data
Here, we read the avocado data and convert each column to the type used in the models below.
```{r}
library("readr")
avocado_df <- read_csv("avocado_df_categories.csv")
# Recode the response as a factor with levels "0" (Cheap) and "1" (Expensive)
avocado_df$PriceCategory <- factor(avocado_df$PriceCategory, levels = c("Cheap", "Expensive"), labels = c(0, 1))
avocado_df$TotalVolume <- as.numeric(avocado_df$TotalVolume)
avocado_df$Type <- as.factor(avocado_df$Type)
avocado_df$Region <- as.factor(avocado_df$Region)
avocado_df$Month <- as.factor(avocado_df$Month)
avocado_df$Year <- as.factor(avocado_df$Year)
```
# Summary of Data
```{r}
head(avocado_df)
```
```{r}
summary(avocado_df)
```
# Logistic Regression in R
## Total Volume + Constant Term
We will now build a logistic model with a constant term and a TotalVolume term.
```{r}
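# Fit PriceCategory on TotalVolume plus an intercept; family = "binomial" makes this a logistic regression
model_volume <- glm(PriceCategory ~ TotalVolume, data = avocado_df, family = "binomial")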
```
### Model Summary
```{r}
# Print the model summary (coefficients, standard errors, deviances)
summary(model_volume)
```
You can also extract the coefficients of the model, which are on the log-odds scale.
```{r}
# Print the model coefficients
coef(model_volume)
```
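Because the coefficients are on the log-odds scale, exponentiating them gives odds ratios, which are often easier to read. The following is a minimal sketch, not part of the avocado analysis: it assumes the built-in `mtcars` data as a stand-in, and the `demo_` names are hypothetical.

```{r}
# Illustrative only: mtcars ships with R and stands in for the avocado data
demo_coef_model <- glm(am ~ wt, data = mtcars, family = "binomial")

# Coefficients are log odds; exp() converts them to odds ratios
demo_coefs <- coef(demo_coef_model)
demo_odds_ratios <- exp(demo_coefs)
demo_odds_ratios
```

An odds ratio below 1 means the odds of the second factor level (or of a 1 in a 0/1 response) fall as the predictor increases; above 1 means they rise.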
### Predicting Class Probabilities
```{r}
unknown_df <- data.frame(TotalVolume=15.0)
# Predict for unknown_df; the default type = "link" returns log odds
log_odds <- predict(model_volume, newdata = unknown_df)
```
```{r}
# Convert log odds to a probability with the logistic function
plogis(log_odds)
```
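The conversion from log odds to probability is the logistic function, `p = 1 / (1 + exp(-x))`. A small sketch with made-up log-odds values (the `demo_` names are hypothetical) checks that the formula agrees with base R's `plogis`, and that `qlogis` maps the probabilities back to log odds:

```{r}
# Made-up log odds, for illustration only
demo_log_odds <- c(-2, 0, 2)

# Logistic function written out by hand
demo_manual <- 1 / (1 + exp(-demo_log_odds))

# Base R's built-in equivalent
demo_builtin <- plogis(demo_log_odds)
all.equal(demo_manual, demo_builtin)

# qlogis() is the inverse: probabilities back to log odds
qlogis(demo_builtin)
```

Log odds of 0 correspond to a probability of 0.5, which is why a 0.5 probability threshold is the same as asking whether the log odds are positive.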
### Getting Class Probabilities of Data
The fitted values of the model are its estimated probabilities of the "Expensive" class (level 1) for each observation in the data.
```{r}
# Store the fitted values (estimated probabilities) and show the first few
probabilities <- fitted(model_volume)
head(probabilities)
```
```{r}
# Show the first few actual PriceCategory values for comparison
head(avocado_df$PriceCategory)
```
### Predicting Class from Class Probabilities
Comparing each probability with a threshold (here 0.5) assigns a class to each observation: 1 (Expensive) if the probability exceeds the threshold, 0 (Cheap) otherwise.
```{r}
# TRUE where the estimated probability exceeds the 0.5 threshold;
# as.numeric() turns TRUE/FALSE into 1/0
predicted_classes <- as.numeric(probabilities > 0.5)
head(predicted_classes)
```
Let's see our actual classes.
```{r}
# Show the first few actual classes
head(avocado_df$PriceCategory)
```
### Confusion Matrix
A confusion matrix cross-tabulates the predicted classes against the actual classes, which lets us quantify the error in our predictions.
```{r}
library(caret)
# Fixing the levels keeps both factors aligned even if only one class is predicted
confusionMatrix(data = factor(predicted_classes, levels = c(0, 1)), reference = avocado_df$PriceCategory)
```
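The counts behind a confusion matrix, and the accuracy derived from them, can be reproduced with base R's `table`. The sketch below uses made-up class labels purely for illustration; the `demo_` names and values are hypothetical and unrelated to the avocado data.

```{r}
# Made-up actual and predicted classes, for illustration only
demo_actual    <- factor(c(0, 0, 1, 1, 1, 0, 1, 0), levels = c(0, 1))
demo_predicted <- factor(c(0, 1, 1, 1, 0, 0, 1, 0), levels = c(0, 1))

# Rows are predictions, columns are the true classes
demo_table <- table(Predicted = demo_predicted, Actual = demo_actual)
demo_table

# Accuracy = correct predictions (the diagonal) / all predictions: 6 / 8 = 0.75
demo_accuracy <- sum(diag(demo_table)) / sum(demo_table)
demo_accuracy
```

The off-diagonal cells are the two kinds of error: observations predicted as 1 that are actually 0, and observations predicted as 0 that are actually 1.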
## Model with Constant Term
We will now build a logistic model with only a constant term (an intercept-only model), which serves as a baseline for comparison.
```{r}
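# Intercept-only model: PriceCategory ~ 1 has a constant term and no predictors
model_constant <- glm(PriceCategory ~ 1, data = avocado_df, family = "binomial")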
```
### Model Summary
```{r}
# Print the model summary
summary(model_constant)
```
## Model Comparison
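We can compare nested models using the `anova` function. With `test = "Chisq"`, it runs a likelihood-ratio (deviance) test of whether the extra terms improve the fit.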
```{r}
# Compare the intercept-only model with the TotalVolume model
anova(model_constant, model_volume, test = "Chisq")
```
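The chi-square test reported by `anova` for two nested logistic models is based on the drop in residual deviance. As a sketch of what happens under the hood, the following reproduces the test by hand. It assumes the built-in `mtcars` data as a stand-in, and the `demo_` names are hypothetical.

```{r}
# Illustrative only: two nested models on mtcars
demo_null <- glm(am ~ 1, data = mtcars, family = "binomial")
demo_full <- glm(am ~ wt, data = mtcars, family = "binomial")

# Likelihood-ratio test via anova()
demo_anova <- anova(demo_null, demo_full, test = "Chisq")
demo_anova

# Same test by hand: drop in deviance, referred to a chi-square distribution
demo_stat <- deviance(demo_null) - deviance(demo_full)
demo_df_diff <- df.residual(demo_null) - df.residual(demo_full)
demo_p <- pchisq(demo_stat, df = demo_df_diff, lower.tail = FALSE)
all.equal(demo_p, demo_anova[2, "Pr(>Chi)"])
```

A small p-value suggests that the larger model fits significantly better than the smaller one.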
## Model with Volume and Type
### Create Model
```{r}
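# Fit PriceCategory on TotalVolume and Type plus an intercept
model_volume_type <- glm(PriceCategory ~ TotalVolume + Type, data = avocado_df, family = "binomial")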
```
### Model Summary
```{r}
summary(model_volume_type)
```
### Predicting Class Probabilities
```{r}
unknown_df <- data.frame(TotalVolume=c(15.0,5.0), Type=c("conventional","organic"))
# Predict for both rows of unknown_df; the default type = "link" returns log odds
log_odds <- predict(model_volume_type, newdata = unknown_df)
```
```{r}
# Convert log odds to probabilities with the logistic function
plogis(log_odds)
```
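As an alternative to converting log odds by hand, `predict` can return probabilities directly when called with `type = "response"`. The sketch below checks that both routes agree. It assumes the built-in `mtcars` data as a stand-in, and the `demo_` names are hypothetical.

```{r}
# Illustrative only: mtcars stands in for the avocado data
demo_pred_model <- glm(am ~ wt, data = mtcars, family = "binomial")
demo_new <- data.frame(wt = c(2.5, 3.5))

# Default type = "link" returns log odds
demo_link <- predict(demo_pred_model, newdata = demo_new)

# type = "response" returns probabilities
demo_resp <- predict(demo_pred_model, newdata = demo_new, type = "response")

# Both routes give the same probabilities
all.equal(plogis(demo_link), demo_resp)
```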
### Getting Class Probabilities of Data
As before, the fitted values are the model's estimated probabilities of the "Expensive" class (level 1) for each observation.
```{r}
# Store the fitted values (estimated probabilities) and show the first few
probabilities <- fitted(model_volume_type)
head(probabilities)
```
```{r}
# Show the first few actual PriceCategory values for comparison
head(avocado_df$PriceCategory)
```
### Predicting Class from Class Probabilities
We again compare each probability with a threshold of 0.5 to assign a class to each observation.
```{r}
# TRUE where the estimated probability exceeds the 0.5 threshold;
# as.numeric() turns TRUE/FALSE into 1/0
predicted_classes <- as.numeric(probabilities > 0.5)
head(predicted_classes)
```
Let's see our actual classes.
```{r}
# Show the first few actual classes
head(avocado_df$PriceCategory)
```
### Confusion Matrix
We can again use a confusion matrix to quantify the error in our predictions.
```{r}
# Fixing the levels keeps both factors aligned even if only one class is predicted
confusionMatrix(data = factor(predicted_classes, levels = c(0, 1)), reference = avocado_df$PriceCategory)
```
## Model Comparison
We can again compare nested models using the `anova` function, first against the constant-only model and then against the TotalVolume model.
```{r}
# Compare the intercept-only model with the TotalVolume + Type model
anova(model_constant, model_volume_type, test = "Chisq")
```
```{r}
# Compare the TotalVolume model with the TotalVolume + Type model
anova(model_volume, model_volume_type, test = "Chisq")
```