Which model is better at predicting whether a mushroom is poisonous or edible (with a twist)?

Noli Angeles

Introduction

The goal of this project is to build several machine learning models on one dataset and see which performs best on out-of-sample data. We will use the Mushroom dataset (covering mushrooms from the Agaricus and Lepiota families only) from the UC Irvine Machine Learning Repository, available here: https://archive.ics.uci.edu/dataset/73/mushroom. It has 8124 observations and 23 total columns (22 features), with poisonous as the target variable. All of the columns are either categorical or binary. The models we will build are Naive Bayes, Logistic Regression with Ridge Regularization, and a Classification Tree. To compare them we will use accuracy scores, false negative rates, confusion matrices, and cross-validation scores.

The "twist"

To make it harder for the models to predict correctly, I want to limit which features they see. I decided to keep ONLY the predictor variables that describe colors and can be seen immediately. Six features describe the color of some part of the mushroom: cap-color, gill-color, stalk-color-above-ring, stalk-color-below-ring, veil-color, and spore-print-color. Since spore-print-color isn't something we can see at first glance, we will drop it as well, leaving 5 color features to predict the target. Why only colors? Maybe in the future we could build a camera that evaluates whether a mushroom in the Agaricus and Lepiota families is poisonous by looking only at the different colors on its surface.

Steps

In order to use multiple models on a single dataset, we will follow these steps.

1) Import Libraries and Load Data

2) Data Cleaning and Preprocessing

3) Exploratory Data Analysis

4) Split Dataset into Train/Test Sets

5) Create Models

6) Compare Models

Import Libraries and Load Data

First, we need to import all the libraries necessary for preprocessing, building the models, and comparing their performance.

In [2]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score #splitting dataset, cross-validation
from sklearn.preprocessing import OrdinalEncoder, OneHotEncoder, LabelEncoder #encoding variables
from sklearn.naive_bayes import CategoricalNB #naive bayes model
from sklearn.linear_model import LogisticRegression #logistic regression model
from sklearn.tree import DecisionTreeClassifier #classification tree model
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, ConfusionMatrixDisplay #metrics to compare models
import matplotlib.pyplot as plt #plotting graphs
import seaborn as sns #creating plots/graphs
In [3]:
### unzip file
import zipfile

with zipfile.ZipFile('mushroom.zip', 'r') as zip_ref:
    zip_ref.extractall() #unzip into same directory
In [4]:
### write the column names for each column
column_names = ['poisonous','cap-shape','cap-surface','cap-color','bruises','odor','gill-attatchment',
               'gill-spacing','gill-size','gill-color','stalk-shape','stalk-root','stalk-surface-above-ring',
                'stalk-surface-below-ring', 'stalk-color-above-ring','stalk-color-below-ring','veil-type',
               'veil-color','ring-number','ring-type','spore-print-color','population','habitat']

### load data file with specified column names
df = pd.read_csv("agaricus-lepiota.data", sep=",", header=None, names=column_names)
print(df.head(5)) # print to check if everything looks correct
  poisonous cap-shape cap-surface cap-color bruises odor gill-attatchment  \
0         p         x           s         n       t    p                f   
1         e         x           s         y       t    a                f   
2         e         b           s         w       t    l                f   
3         p         x           y         w       t    p                f   
4         e         x           s         g       f    n                f   

  gill-spacing gill-size gill-color  ... stalk-surface-below-ring  \
0            c         n          k  ...                        s   
1            c         b          k  ...                        s   
2            c         b          n  ...                        s   
3            c         n          n  ...                        s   
4            w         b          k  ...                        s   

  stalk-color-above-ring stalk-color-below-ring veil-type veil-color  \
0                      w                      w         p          w   
1                      w                      w         p          w   
2                      w                      w         p          w   
3                      w                      w         p          w   
4                      w                      w         p          w   

  ring-number ring-type spore-print-color population habitat  
0           o         p                 k          s       u  
1           o         p                 n          n       g  
2           o         p                 n          n       m  
3           o         p                 k          s       u  
4           o         e                 n          a       g  

[5 rows x 23 columns]

Data Cleaning

First we clean the data by checking for missing values and duplicate rows; if there are any, we will delete them. Spoiler... there aren't any! One caveat: isnull() only catches true NaN values, and as the raw values below will show, stalk-root encodes missing entries as the string '?'. Since that feature gets dropped shortly, we can leave it alone.

In [5]:
### check for missing data
print(df.isnull().sum())
print()

### check for duplicated rows
dupes = df.duplicated().sum()
print(f'There are {dupes} duplicated rows in this dataset.')
poisonous                   0
cap-shape                   0
cap-surface                 0
cap-color                   0
bruises                     0
odor                        0
gill-attatchment            0
gill-spacing                0
gill-size                   0
gill-color                  0
stalk-shape                 0
stalk-root                  0
stalk-surface-above-ring    0
stalk-surface-below-ring    0
stalk-color-above-ring      0
stalk-color-below-ring      0
veil-type                   0
veil-color                  0
ring-number                 0
ring-type                   0
spore-print-color           0
population                  0
habitat                     0
dtype: int64

There are 0 duplicated rows in this dataset.
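Had either check come back positive, here is a short sketch of how we would drop the offending rows:

# if any missing or duplicated rows had turned up:
# df = df.dropna().drop_duplicates()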
In [6]:
### check the raw values so we can encode later
for col in df.columns:
    print(f"{col}: {df[col].unique()}")
poisonous: ['p' 'e']
cap-shape: ['x' 'b' 's' 'f' 'k' 'c']
cap-surface: ['s' 'y' 'f' 'g']
cap-color: ['n' 'y' 'w' 'g' 'e' 'p' 'b' 'u' 'c' 'r']
bruises: ['t' 'f']
odor: ['p' 'a' 'l' 'n' 'f' 'c' 'y' 's' 'm']
gill-attatchment: ['f' 'a']
gill-spacing: ['c' 'w']
gill-size: ['n' 'b']
gill-color: ['k' 'n' 'g' 'p' 'w' 'h' 'u' 'e' 'b' 'r' 'y' 'o']
stalk-shape: ['e' 't']
stalk-root: ['e' 'c' 'b' 'r' '?']
stalk-surface-above-ring: ['s' 'f' 'k' 'y']
stalk-surface-below-ring: ['s' 'f' 'y' 'k']
stalk-color-above-ring: ['w' 'g' 'p' 'n' 'b' 'e' 'o' 'c' 'y']
stalk-color-below-ring: ['w' 'p' 'g' 'b' 'n' 'e' 'y' 'o' 'c']
veil-type: ['p']
veil-color: ['w' 'n' 'o' 'y']
ring-number: ['o' 't' 'n']
ring-type: ['p' 'e' 'l' 'f' 'n']
spore-print-color: ['k' 'n' 'u' 'h' 'w' 'r' 'o' 'y' 'b']
population: ['s' 'n' 'a' 'v' 'y' 'c']
habitat: ['u' 'g' 'm' 'd' 'p' 'w' 'l']
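To quantify the caveat from the cleaning step, we can count the '?' placeholders per column; a quick sketch:

### count '?' placeholders per column; only stalk-root uses this missing-value code
question_marks = (df == '?').sum()
print(question_marks[question_marks > 0])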
In [7]:
### drop every feature that doesn't describe a color
non_color_cols = ['cap-shape', 'cap-surface', 'bruises', 'odor', 'gill-attatchment',
                  'gill-spacing', 'gill-size', 'stalk-shape', 'stalk-root',
                  'stalk-surface-above-ring', 'stalk-surface-below-ring', 'veil-type',
                  'ring-number', 'ring-type', 'population', 'habitat']
df.drop(columns=non_color_cols, inplace=True)

### spore-print-color is a color, but not one visible at first glance, so drop it too
df.drop(columns=['spore-print-color'], inplace=True)

### to drop all color features instead, uncomment:
# df.drop(columns=['cap-color', 'gill-color', 'stalk-color-above-ring',
#                  'stalk-color-below-ring', 'veil-color'], inplace=True)
In [8]:
for col in df.columns:
    print(f"{col}: {df[col].unique()}")
poisonous: ['p' 'e']
cap-color: ['n' 'y' 'w' 'g' 'e' 'p' 'b' 'u' 'c' 'r']
gill-color: ['k' 'n' 'g' 'p' 'w' 'h' 'u' 'e' 'b' 'r' 'y' 'o']
stalk-color-above-ring: ['w' 'g' 'p' 'n' 'b' 'e' 'o' 'c' 'y']
stalk-color-below-ring: ['w' 'p' 'g' 'b' 'n' 'e' 'y' 'o' 'c']
veil-color: ['w' 'n' 'o' 'y']

Exploratory Data Analysis

Before we go and create our models, I want to get a better understanding of the dataset. We can explore the data a little bit by creating graphs to see if we can uncover any stories or insights.

Countplots

To make it easier to produce multiple countplots, I created a helper function, along with manual label mappings so the plots show readable color names (instead of renaming anything in the original dataset).

In [9]:
### define a function to easily create count plots for desired variables
def plot_countplot(data, feature, target_count, label_mapping=None):
    plt.figure(figsize=(10, 6))
    sns.countplot(data=data, x=feature, hue=target_count) # use the data argument, not the global df
    plt.title(f"Distribution of Poisonous vs. Edible by {feature.capitalize()}")
    plt.xlabel(f"{feature.capitalize()}")
    plt.ylabel("Count")
    plt.legend(title="Poisonous")
    
    if label_mapping:
        # relabel the ticks in the same order seaborn draws the categories
        plt.xticks(ticks=range(len(data[feature].unique())), 
                   labels=[label_mapping[label] for label in data[feature].unique()])
        
    plt.show()
In [10]:
# manual mapping for color features
cap_color_mapping = {
    'n': 'Brown', 'b': 'Buff',  'c': 'Cinnamon',  'g': 'Gray', 'r': 'Green', 
    'p': 'Pink',  'u': 'Purple',  'e': 'Red',  'w': 'White',  'y': 'Yellow'
}

gill_color_mapping = {
    'k': 'Black', 'n': 'Brown', 'b': 'Buff', 'h': 'Chocolate', 
    'g': 'Gray',  'r': 'Green', 'o': 'Orange', 'p': 'Pink', 
    'u': 'Purple','e': 'Red', 'w': 'White', 'y': 'Yellow'
}

stalk_color_above_ring_mapping = {
    'n': 'Brown', 'b': 'Buff', 'c': 'Cinnamon', 
    'g': 'Gray', 'o': 'Orange', 'p': 'Pink', 
    'e': 'Red', 'w': 'White', 'y': 'Yellow'
}

stalk_color_below_ring_mapping = {
    'n': 'Brown', 'b': 'Buff', 'c': 'Cinnamon', 
    'g': 'Gray', 'o': 'Orange', 'p': 'Pink', 
    'e': 'Red', 'w': 'White', 'y': 'Yellow'
}

veil_color_mapping = {
    'n': 'Brown', 'o': 'Orange', 'w': 'White', 'y': 'Yellow'
}

spore_print_color_mapping = {
    'k': 'Black', 'n': 'Brown', 'b': 'Buff', 
    'h': 'Chocolate', 'r': 'Green', 'o': 'Orange', 
    'u': 'Purple', 'w': 'White', 'y': 'Yellow'
}
In [11]:
### countplot by cap-color
plot_countplot(df,'cap-color','poisonous', cap_color_mapping)
In [12]:
### countplot by gill-color
plot_countplot(df,'gill-color','poisonous', gill_color_mapping)
In [13]:
### countplot by stalk-color-above-ring
plot_countplot(df,'stalk-color-above-ring','poisonous', stalk_color_above_ring_mapping)
In [14]:
### count plot by stalk-color-below-ring
plot_countplot(df,'stalk-color-below-ring','poisonous', stalk_color_below_ring_mapping)
In [15]:
### countplot by veil-color
plot_countplot(df,'veil-color','poisonous', veil_color_mapping)

Split Dataset into Train/Test Set

In order to train our models and test their performance, we have to split the dataset accordingly. Since the Naive Bayes classifier needs its predictors encoded with OrdinalEncoder while the other models use OneHotEncoder, we will split the dataset once with an 80/20 split, then encode the same train/test sets two different ways so each model gets the input format it expects.

Splitting the data

In [16]:
X = df.drop(['poisonous'], axis=1)  #drop poisonous
y = df['poisonous']  #poisonous as the target

# 80/20 train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=456)

# check the dimension of the train and test sets
print(X_train.shape)
print(X_test.shape)

print(y_train.shape)
print(y_test.shape)
(6499, 5)
(1625, 5)
(6499,)
(1625,)

Sanity Check

Let's double check to see if X (train and test) only contains the predictor variables and y (train and test) only has the poisonous column. Then we can also check to see if the proportions of poisonous to edible mushrooms are the same in the train/test split and also in the original data set.

Checking training data

In [17]:
X_train.columns
Out[17]:
Index(['cap-color', 'gill-color', 'stalk-color-above-ring',
       'stalk-color-below-ring', 'veil-color'],
      dtype='object')
In [18]:
# check X training data
X_train.head(5)
Out[18]:
cap-color gill-color stalk-color-above-ring stalk-color-below-ring veil-color
2312 n p p p w
6071 e b p w w
7957 n n o o n
8029 g g w w w
3179 g g n p w
In [19]:
# check y training data
y_train.head(5)
Out[19]:
2312    e
6071    p
7957    e
8029    e
3179    p
Name: poisonous, dtype: object

Checking test data

In [20]:
# check x test data
X_test.head(5)
Out[20]:
cap-color gill-color stalk-color-above-ring stalk-color-below-ring veil-color
970 n n w w w
4969 y h b b w
4758 g p p b w
3052 w p w w w
6009 n b p p w
In [21]:
# check y test data
y_test.head(5)
Out[21]:
970     e
4969    p
4758    p
3052    p
6009    p
Name: poisonous, dtype: object

Double Checking Proportions

In [22]:
# proportion of edible:poisonous labels in training set
y_train.value_counts(normalize=True)
Out[22]:
e    0.517156
p    0.482844
Name: poisonous, dtype: float64
In [23]:
# proportion of edible:poisonous labels in test set
y_test.value_counts(normalize=True)
Out[23]:
e    0.521231
p    0.478769
Name: poisonous, dtype: float64
In [24]:
# proportion of original data set before train/test split
df['poisonous'].value_counts(normalize=True)
Out[24]:
e    0.517971
p    0.482029
Name: poisonous, dtype: float64

Since the proportions for the training set and test set are very similar to the original dataset, we can now move on to creating our models!
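As a side note, if we wanted the proportions to match exactly rather than approximately, train_test_split accepts a stratify argument. A minimal sketch of that alternative:

# stratified alternative: preserves the e/p ratio exactly in both splits
# X_train, X_test, y_train, y_test = train_test_split(
#     X, y, test_size=0.2, random_state=456, stratify=y)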

Creating our models

Before we create our models, we have to make sure that our target variable poisonous is encoded using LabelEncoder.

In [25]:
le = LabelEncoder() # instantiate the encoder

y_train_encoded = le.fit_transform(y_train) # encode target variable for train set

y_test_encoded = le.transform(y_test) # encode target variable for test set

# check to see if target variables for both train/test set are encoded
print(y_train_encoded[:5])
print(y_test_encoded[:5])
[0 1 0 0 1]
[0 1 1 1 1]
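As a quick check on the encoding itself, LabelEncoder assigns integers to labels in sorted order, so 'e' becomes 0 and 'p' becomes 1; the fitted classes_ attribute confirms this:

# classes_[i] is the original label that was encoded as integer i
print(le.classes_)  # ['e' 'p'], i.e., e=0 (edible), p=1 (poisonous)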

Naive Bayes Classifier Model

Now that we've encoded the target variable, we use OrdinalEncoder to encode the predictor variables.

In [26]:
enc = OrdinalEncoder() # instantiate the encoder

X_train_nb = enc.fit_transform(X_train) # encoding predictor variables in train set for Naive Bayes model

X_test_nb = enc.transform(X_test) # encoding predictor variables in test set for Naive Bayes model
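If we want to see exactly which integer code each color letter received, the fitted encoder exposes a categories_ attribute, one array per feature in column order; a quick sketch:

# each list is sorted; a value's position is the integer OrdinalEncoder assigns it
for name, cats in zip(X_train.columns, enc.categories_):
    print(f"{name}: {list(cats)}")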

Train and Evaluate Naive Bayes Model

We can now train our Naive Bayes model by fitting it to the training data. We'll check the cross-validation scores on the training data, then evaluate the model's performance on out-of-sample data using accuracy and a confusion matrix.

In [27]:
nb_model = CategoricalNB() # create model object
nb_model.fit(X_train_nb, y_train_encoded) # fit model on training data
Out[27]:
CategoricalNB()

Cross-Validation Score (Naive Bayes)

In [28]:
nb_cv_score = cross_val_score(nb_model, X_train_nb, y_train_encoded, cv=10, scoring='accuracy')
print("Cross-Validation Accuracy Scores:", nb_cv_score)
print("Mean Cross-Validation Accuracy:", nb_cv_score.mean())
Cross-Validation Accuracy Scores: [0.87076923 0.86923077 0.86153846 0.86461538 0.85230769 0.88
 0.85538462 0.87384615 0.83692308 0.85362096]
Mean Cross-Validation Accuracy: 0.8618236339931254
In [29]:
# generate predictions
y_pred_nb = nb_model.predict(X_test_nb)
y_pred_nb[:9]
Out[29]:
array([0, 1, 1, 0, 1, 1, 1, 0, 1])

Accuracy Score on out-of-sample data (Naive Bayes)

In [30]:
nb_score = nb_model.score(X_test_nb, y_test_encoded) # get accuracy score
print(f'Naive Bayes Model Accuracy on out-of-sample data: {nb_score}')
Naive Bayes Model Accuracy on out-of-sample data: 0.8701538461538462

Confusion Matrix (Naive Bayes)

In [31]:
nb_cm_test = confusion_matrix(y_test_encoded, y_pred_nb) # conventional (y_true, y_pred) order: rows = true, cols = predicted
disp = ConfusionMatrixDisplay(confusion_matrix=nb_cm_test)
disp.plot(cmap="Blues")
plt.title("Naive Bayes Performance")
plt.show()

False Negative Rate (Naive Bayes)

In [32]:
# FN and TP counts read off the confusion matrix above
nb_fnr = 86 / (86+653)
nb_fnr
Out[32]:
0.11637347767253045
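Rather than reading the cell counts off the plot by hand, the same rate can be computed directly from the matrix. A minimal sketch, relying on the conventional (y_true, y_pred) ordering above so that ravel() returns tn, fp, fn, tp:

# with labels e=0 (negative) and p=1 (positive):
tn, fp, fn, tp = nb_cm_test.ravel()
print(f"Naive Bayes FNR: {fn / (fn + tp):.4f}")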

Logistic Regression with Ridge Regularization Model

Since we already split the dataset and also encoded our target variable, we only need to encode the predictor variables using OneHotEncoder for the rest of our models.

In [33]:
# one hot encoding for logistic regression and classification tree
oh = OneHotEncoder(sparse_output=False) # 'sparse' was renamed to 'sparse_output' in scikit-learn 1.2
X_train_oh = oh.fit_transform(X_train)
X_test_oh = oh.transform(X_test)
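One caveat worth knowing: if the test split ever contained a color the training split hadn't seen, oh.transform would raise an error. Passing handle_unknown='ignore' would instead encode unseen categories as all zeros; a hedged alternative:

# defensive alternative if unseen categories were possible:
# oh = OneHotEncoder(sparse_output=False, handle_unknown='ignore')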

Train and Evaluate Model

In [34]:
lr_model = LogisticRegression(penalty='l2', max_iter=1000)  # Ridge is L2 regularization; raise max_iter so lbfgs converges
lr_model.fit(X_train_oh, y_train_encoded) # fit the logistic regression model on training data
Out[34]:
LogisticRegression(max_iter=1000)
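The strength of the ridge penalty is controlled by C, the inverse of the regularization strength (default 1.0; smaller C means stronger shrinkage). We keep the default here, but as a hedged sketch, one might tune it with a grid search:

# optional: tune the ridge penalty strength via cross-validated grid search
from sklearn.model_selection import GridSearchCV

grid = GridSearchCV(LogisticRegression(penalty='l2', max_iter=1000),
                    param_grid={'C': [0.01, 0.1, 1, 10]},
                    cv=5, scoring='accuracy')
# grid.fit(X_train_oh, y_train_encoded); print(grid.best_params_)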
In [35]:
# generate predictions
y_pred_lr = lr_model.predict(X_test_oh)
y_pred_lr[:9]
Out[35]:
array([0, 1, 1, 0, 1, 1, 1, 0, 1])

Cross-Validation Score (Ridge Regression)

In [36]:
lr_cv_score = cross_val_score(lr_model, X_train_oh, y_train_encoded, cv=10, scoring='accuracy')
print(f"Cross-Validation Accuracy Scores: {lr_cv_score}")
print(f"Mean Cross-Validation Accuracy: {lr_cv_score.mean()}")
Cross-Validation Accuracy Scores: [0.91538462 0.90461538 0.90923077 0.91538462 0.90153846 0.91384615
 0.90461538 0.92769231 0.90153846 0.9183359 ]
Mean Cross-Validation Accuracy: 0.9112182055232901

Accuracy Score on out-of-sample data (Ridge Regression)

In [37]:
lr_score = lr_model.score(X_test_oh, y_test_encoded)
lr_score
Out[37]:
0.9267692307692308
In [38]:
# evaluate the training accuracy
y_train_pred = lr_model.predict(X_train_oh)
train_accuracy = accuracy_score(y_train_encoded, y_train_pred)

# evaluate the test accuracy
y_test_pred = lr_model.predict(X_test_oh)
test_accuracy = accuracy_score(y_test_encoded, y_test_pred)

print(f"Training Accuracy: {train_accuracy}")
print(f"Test Accuracy: {test_accuracy}")
Training Accuracy: 0.9130635482381905
Test Accuracy: 0.9267692307692308

Confusion Matrix (Ridge Regression)

In [39]:
lr_cm_test = confusion_matrix(y_test_encoded, y_pred_lr) # rows = true, cols = predicted
disp = ConfusionMatrixDisplay(confusion_matrix=lr_cm_test)
disp.plot(cmap="Blues")
plt.title("Ridge Regression Performance")
plt.show()

False Negative Rate (Ridge Regression)

In [40]:
# FN and TP counts read off the confusion matrix above
lr_fnr = 6 / (6+665)
lr_fnr
Out[40]:
0.00894187779433681

Classification Tree Model

Train and Evaluate Model

In [41]:
tree_model = DecisionTreeClassifier()
tree_model.fit(X_train_oh, y_train_encoded)
Out[41]:
DecisionTreeClassifier()
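One hedged note: DecisionTreeClassifier permutes features randomly when searching for splits, so ties can break differently across runs. Fixing random_state makes the tree reproducible, and capping max_depth is a common guard against overfitting (the value below is illustrative, not tuned):

# reproducible alternative (not the model fit above):
# tree_model = DecisionTreeClassifier(random_state=456, max_depth=5)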
In [42]:
# generate predictions
y_pred_tree = tree_model.predict(X_test_oh)
y_pred_tree[:9]
Out[42]:
array([0, 1, 1, 0, 1, 1, 1, 0, 1])

Cross-Validation Score (Classification Tree)

In [43]:
tree_cv_score = cross_val_score(tree_model, X_train_oh, y_train_encoded, cv=10, scoring='accuracy')
print(f"Cross-Validation Accuracy Scores: {tree_cv_score}")
print(f"Mean Cross-Validation Accuracy: {tree_cv_score.mean()}")
Cross-Validation Accuracy Scores: [0.91692308 0.91692308 0.91846154 0.93538462 0.91384615 0.92615385
 0.91076923 0.93076923 0.91230769 0.92141757]
Mean Cross-Validation Accuracy: 0.9202956027023823
In [44]:
tree_train_score = tree_model.score(X_train_oh, y_train_encoded)
tree_train_score
Out[44]:
0.9212186490229266

Accuracy Score on out-of-sample data (Classification Tree)

In [45]:
tree_score = tree_model.score(X_test_oh, y_test_encoded)
tree_score
Out[45]:
0.9298461538461539

Confusion Matrix (Classification Tree)

In [46]:
tree_cm_test = confusion_matrix(y_test_encoded, y_pred_tree) # rows = true, cols = predicted
disp = ConfusionMatrixDisplay(confusion_matrix=tree_cm_test)
disp.plot(cmap="Blues")
plt.title("Classification Tree Performance")
plt.show()

False Negative Rate (Classification Tree)

In [47]:
# FN and TP counts read off the confusion matrix above
tree_fnr = 10 / (10+674)
tree_fnr
Out[47]:
0.014619883040935672

Comparing the Three Models

Now that we've created our models and collected their metrics, let's see how they compare head-to-head! We'll compare each model's mean CV score with its accuracy on the test data; if a model's CV score is vastly different from its test accuracy, it may be overfitting. We can also compare the out-of-sample accuracies and false negative rates, and the latter arguably matter most here: a false negative means labeling a poisonous mushroom as edible.

In [48]:
results = {
    'Model': ['Naive Bayes', 'Ridge Regression', 'Classification Tree'],
    'CV Score': [nb_cv_score.mean(), lr_cv_score.mean(), tree_cv_score.mean()],
    'Test Accuracy': [nb_score, lr_score, tree_score],
    'False Negative Rate': [nb_fnr, lr_fnr, tree_fnr]
}

results_table = pd.DataFrame(results)
print(results_table)
                 Model  CV Score  Test Accuracy  False Negative Rate
0          Naive Bayes  0.861824       0.870154             0.116373
1     Ridge Regression  0.911218       0.926769             0.008942
2  Classification Tree  0.920296       0.929846             0.014620
In [49]:
# Create subplots for each confusion matrix
fig, axes = plt.subplots(1, 3, figsize=(18, 6))

disp_nb = ConfusionMatrixDisplay(confusion_matrix=nb_cm_test)
disp_nb.plot(cmap='Blues', ax=axes[0])
axes[0].set_title('Confusion Matrix - Naive Bayes')

disp_lr = ConfusionMatrixDisplay(confusion_matrix=lr_cm_test)
disp_lr.plot(cmap='Blues', ax=axes[1])
axes[1].set_title('Confusion Matrix - Logistic Regression')

disp_tree = ConfusionMatrixDisplay(confusion_matrix=tree_cm_test)
disp_tree.plot(cmap='Blues', ax=axes[2])
axes[2].set_title('Confusion Matrix - Classification Tree')

# Display the plot
plt.tight_layout()
plt.show()