---
title: "Logistic Regression Code"
output: html_document
---

# Load and Format data

Here, we read and format data.

```{r}
library("readr")
avocado_df <- read_csv('avocado_df_categories.csv')
avocado_df$PriceCategory <- factor(avocado_df$PriceCategory, levels = c("Cheap", "Expensive"), labels = c(0, 1))
avocado_df$TotalVolume <- as.numeric(avocado_df$TotalVolume)
avocado_df$Type <- as.factor(avocado_df$Type)
avocado_df$Region <- as.factor(avocado_df$Region)
avocado_df$Month <- as.factor(avocado_df$Month)
avocado_df$Year <- as.factor(avocado_df$Year)
```

# Summary of Data

```{r}
head(avocado_df)
```

```{r}
summary(avocado_df)
```

# Logistic Regression in R

## Total Volume + Constant Term

We will now build a logistic model with a constant term and a TotalVolume term.

```{r}
# create model with PriceCategory vs TotalVolume, family='binomial'
model_volume <- glm(PriceCategory ~ TotalVolume, data = avocado_df, family = 'binomial')
```

### Model Summary

```{r}
# print model summary
summary(model_volume)
```

You can also get the coefficients of the model.

```{r}
# print model coefs
coef(model_volume)
```

### Predicting Class Probabilities

```{r}
unknown_df <- data.frame(TotalVolume = 15.0)
# store prediction of unknown_df as log_odds (predict() returns the link scale by default)
log_odds <- predict(model_volume, newdata = unknown_df)
```

```{r}
# convert log odds to probability
1 / (1 + exp(-log_odds))
```

### Getting Class Probabilities of Data

We can get the model's probability estimates for each observation in the data.

```{r}
# get fitted.values and store in probabilities
probabilities <- model_volume$fitted.values
# print head of probabilities
head(probabilities)
```

```{r}
# print PriceCategory from avocado_df
avocado_df$PriceCategory
```

### Predicting Class from Class Probabilities

Using a threshold value and the probabilities, we can assign a class to each observation.

### Predicting Classes

```{r}
# create boolean probabilities > 0.5
# coerce to numeric and store as predicted_classes
predicted_classes <- as.numeric(probabilities > 0.5)
# print predicted_classes
predicted_classes
```

Let's see our actual classes.

```{r}
# print the actual classes
avocado_df$PriceCategory
```

### Confusion Matrix

We can calculate the error in our prediction.
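To make the idea of prediction error concrete, here is a minimal sketch in base R on made-up labels. The vectors `actual` and `predicted` and the names `toy_confusion` and `toy_accuracy` are hypothetical and are not taken from the avocado data. The sketch cross-tabulates predictions against true classes with `table()` and reads the accuracy off the diagonal.

```{r}
# Hypothetical toy labels for illustration only (not from avocado_df)
actual    <- factor(c(0, 0, 1, 1, 1, 0, 1, 0), levels = c(0, 1))
predicted <- factor(c(0, 1, 1, 1, 0, 0, 1, 0), levels = c(0, 1))

# Rows are predicted classes, columns are the actual classes
toy_confusion <- table(Predicted = predicted, Actual = actual)
toy_confusion

# Accuracy is the share of observations on the diagonal;
# the error rate is one minus the accuracy
toy_accuracy <- sum(diag(toy_confusion)) / sum(toy_confusion)
toy_accuracy
1 - toy_accuracy
```

The off-diagonal cells count the two kinds of mistakes: observations predicted as 1 that are actually 0, and observations predicted as 0 that are actually 1.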
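```{r}
library(caret)
# confusionMatrix with data=factor(predicted_classes) and reference=avocado_df$PriceCategory
confusionMatrix(data = factor(predicted_classes, levels = c(0, 1)), reference = avocado_df$PriceCategory)
```

## Model with Constant Term

We will now build a logistic model with only a constant term.

```{r}
# model with a constant term, family='binomial'
model_constant <- glm(PriceCategory ~ 1, data = avocado_df, family = 'binomial')
```

### Model Summary

```{r}
# print summary
summary(model_constant)
```

## Model Comparison

We can compare nested models using the anova function.

```{r}
# compare model_volume to model_constant, test='Chisq'
anova(model_constant, model_volume, test = 'Chisq')
```

## Model with Volume and Type

### Create Model

```{r}
# create model with TotalVolume and Type, family='binomial'
model_volume_type <- glm(PriceCategory ~ TotalVolume + Type, data = avocado_df, family = 'binomial')
```

### Model Summary

```{r}
summary(model_volume_type)
```

### Predicting Class Probabilities

```{r}
unknown_df <- data.frame(TotalVolume = c(15.0, 5.0), Type = c("conventional", "organic"))
# store prediction of unknown_df as log_odds (predict() returns the link scale by default)
log_odds <- predict(model_volume_type, newdata = unknown_df)
```

```{r}
# convert log odds to probability
1 / (1 + exp(-log_odds))
```

### Getting Class Probabilities of Data

We can get the model's probability estimates for each observation in the data.

```{r}
# get fitted.values and store in probabilities
probabilities <- model_volume_type$fitted.values
# print head of probabilities
head(probabilities)
```

```{r}
# print PriceCategory from avocado_df
avocado_df$PriceCategory
```

### Predicting Class from Class Probabilities

Using a threshold value and the probabilities, we can assign a class to each observation.

### Predicting Classes

```{r}
# create boolean probabilities > 0.5
# coerce to numeric and store as predicted_classes
predicted_classes <- as.numeric(probabilities > 0.5)
# print predicted_classes
predicted_classes
```

Let's see our actual classes.

```{r}
# print the actual classes
avocado_df$PriceCategory
```

### Confusion Matrix

We can calculate the error in our prediction.

```{r}
# confusionMatrix with data=factor(predicted_classes) and reference=avocado_df$PriceCategory
confusionMatrix(data = factor(predicted_classes, levels = c(0, 1)), reference = avocado_df$PriceCategory)
```

## Model Comparison

We can compare nested models using the anova function.

```{r}
# compare model_constant to model_volume_type, test='Chisq'
anova(model_constant, model_volume_type, test = 'Chisq')
```

```{r}
# compare model_volume to model_volume_type, test='Chisq'
anova(model_volume, model_volume_type, test = 'Chisq')
```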
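
For nested binomial models, the `test='Chisq'` comparison is a likelihood-ratio test: the drop in residual deviance is compared to a chi-squared distribution whose degrees of freedom equal the number of added parameters. The following is a minimal sketch on simulated data, not on the avocado data; `toy_df`, `toy_constant`, `toy_x`, `lr_stat`, `df_diff`, and `p_value` are hypothetical names chosen for illustration.

```{r}
# Hypothetical simulated data for illustration only (not the avocado data)
set.seed(1)
toy_df <- data.frame(x = rnorm(200))
toy_df$y <- rbinom(200, size = 1, prob = plogis(0.5 + 1.2 * toy_df$x))

# Two nested models: constant only, and constant plus x
toy_constant <- glm(y ~ 1, data = toy_df, family = 'binomial')
toy_x        <- glm(y ~ x, data = toy_df, family = 'binomial')

# Likelihood-ratio statistic: the drop in residual deviance
lr_stat <- deviance(toy_constant) - deviance(toy_x)
# Degrees of freedom: the number of extra parameters in the larger model
df_diff <- df.residual(toy_constant) - df.residual(toy_x)
# Upper-tail chi-squared probability of a drop at least this large
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)
p_value

# The anova table should report the same deviance drop and p-value
anova(toy_constant, toy_x, test = 'Chisq')
```

A small p-value suggests the added term improves the fit beyond what chance alone would explain.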