EquiHER: AI for Gender-Equal Healthcare

Overview

EquiHER is an AI-powered diagnostic risk flagging system designed to support fairer medical decision-making.

Women are often diagnosed later or incorrectly, in part because their symptoms can present differently from men's and in part because many medical datasets have historically been dominated by male patient data.

EquiHER is designed as a clinical support tool, highlighting cases where additional diagnostic attention may be beneficial.


Inspiration

Women frequently experience delayed or incorrect diagnoses for conditions such as:

  • cardiovascular disease
  • autoimmune disorders
  • neurological disorders

One contributing factor is that many medical datasets historically focused on male populations.

This project explores how machine learning can help identify cases where diagnostic oversight risk may be higher.


What It Does

EquiHER analyzes 15 clinical variables and predicts whether a patient may be at higher risk of diagnostic oversight.

If the predicted probability exceeds a decision threshold \(\tau\) (0.7 in the example workflow):

$$ P(\text{risk}) > \tau $$

the system flags the case for additional clinical review.
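
To see how the choice of \(\tau\) changes which cases get flagged, here is a minimal Python sketch. The probabilities are made-up illustrative values, not EquiHER outputs, and `flagged_indices` is a hypothetical helper name.

```python
def flagged_indices(probabilities, tau):
    """Return the indices of cases whose predicted risk strictly exceeds tau."""
    return [i for i, p in enumerate(probabilities) if p > tau]


# Hypothetical model outputs for five cases (illustrative values only).
example_probs = [0.12, 0.55, 0.71, 0.90, 0.70]

# A lower threshold flags more cases; a higher one flags fewer.
for tau in (0.5, 0.7, 0.9):
    print(f"tau={tau}: flagged cases {flagged_indices(example_probs, tau)}")
```

Because the comparison is strict, a score exactly equal to \(\tau\) is not flagged.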

The current model achieves approximately 84–87% validation accuracy on the synthetic dataset described under Data Preparation.


Model Architecture

The neural network follows a fully connected architecture:

$$ 15 \rightarrow 128 \rightarrow 64 \rightarrow 32 \rightarrow 1 $$
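
As a quick sanity check on the size of this network, the sketch below counts trainable parameters. It assumes every layer has a bias term, which matches the layer equation in this section; `count_parameters` is a hypothetical helper name.

```python
LAYER_SIZES = [15, 128, 64, 32, 1]  # input, three hidden layers, output


def count_parameters(layer_sizes):
    """Count weights and biases of a fully connected network."""
    total = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        total += fan_in * fan_out  # weight matrix entries
        total += fan_out           # one bias per output unit
    return total


print(count_parameters(LAYER_SIZES))  # 2048 + 8256 + 2080 + 33 = 12417
```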

Each hidden layer applies the transformation:

$$ h^{(l)} = \sigma(W^{(l)}h^{(l-1)} + b^{(l)}) $$

Where:

  • \(h^{(l)}\) represents the activation of hidden layer \(l\), with \(h^{(0)} = x\) the input vector
  • \(W^{(l)}\) represents the weight matrix
  • \(b^{(l)}\) represents the bias vector
  • \(\sigma\) represents the hidden-layer activation function (ReLU)
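
The hidden-layer computation can be sketched in plain Python as follows. This is an illustration only: the weights are random and untrained, the He-style initialization scale is an assumption, and the function names are hypothetical rather than taken from the EquiHER code.

```python
import random


def relu(values):
    """Element-wise ReLU: negative entries become 0."""
    return [max(0.0, v) for v in values]


def dense(weights, bias, inputs):
    """Compute W h + b, where weights holds one row per output unit."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, bias)
    ]


def init_layer(fan_in, fan_out, rng):
    """Random untrained layer (He-style scale, an assumption for this sketch)."""
    scale = (2.0 / fan_in) ** 0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(fan_in)] for _ in range(fan_out)]
    return weights, [0.0] * fan_out


def hidden_forward(layers, x):
    """Apply h = ReLU(W h_prev + b) for each hidden layer in turn."""
    h = x
    for weights, bias in layers:
        h = relu(dense(weights, bias, h))
    return h


rng = random.Random(0)
sizes = [15, 128, 64, 32]  # input followed by the three hidden layers
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
h3 = hidden_forward(layers, [rng.gauss(0.0, 1.0) for _ in range(15)])
print(len(h3))  # size of the last hidden activation
```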

Prediction Function

The final output is converted into a probability using the sigmoid function:

$$ \hat{y} = \frac{1}{1 + e^{-z}} $$

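where \(z\) is the pre-activation of the output layer, computed from the last hidden activation \(h^{(3)}\):

$$ z = W^{(4)}h^{(3)} + b^{(4)} $$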

This produces a probability between 0 and 1.
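
A direct translation of the sigmoid formula can overflow for large negative \(z\), so the sketch below branches on the sign of \(z\). This is a common numerical precaution and not necessarily what the EquiHER code does.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable sigmoid: never exponentiates a large positive number."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)  # z < 0, so exp_z lies in (0, 1) or underflows to 0
    return exp_z / (1.0 + exp_z)


print(sigmoid(0.0))  # 0.5
```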


Training Objective

The model is trained using binary cross-entropy loss:

$$ L = -\frac{1}{n} \sum_{i=1}^{n} \left[y_i \log(\hat{y}_i) + (1-y_i)\log(1-\hat{y}_i)\right] $$

Where:

  • \(y_i\) is the true label
  • \(\hat{y}_i\) is the predicted probability
  • \(n\) is the number of samples
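
The loss above can be written out as a short Python function. The clipping constant `eps` is an assumption added here to avoid taking \(\log(0)\); it is not part of the formula itself.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy over n samples."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


# An uninformative prediction of 0.5 for every sample costs ln(2) per sample.
print(binary_cross_entropy([1, 0], [0.5, 0.5]))
```

Confident correct predictions give a loss near 0, while confident wrong predictions are penalized heavily.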

Optimization

Model parameters are updated using gradient descent:

$$ \theta_{t+1} = \theta_t - \eta \nabla L(\theta_t) $$

Where:

  • \(\theta_t\) represents the model parameters at step \(t\)
  • \(\eta\) represents the learning rate
  • \(\nabla L(\theta_t)\) represents the gradient of the loss function evaluated at \(\theta_t\)
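
To make the update rule concrete, the sketch below performs one full-batch gradient descent step for a single sigmoid unit rather than the full network. It relies on the standard result that, for a sigmoid output trained with binary cross-entropy, the gradient with respect to \(z\) is \(\hat{y} - y\). The function names and toy data are hypothetical, and the actual training may well use mini-batches or a different optimizer.

```python
import math


def sigmoid(z):
    """Sigmoid in tanh form, algebraically equal to 1 / (1 + e^{-z})."""
    return 0.5 * (1.0 + math.tanh(0.5 * z))


def gradient_step(weights, bias, xs, ys, eta):
    """One update theta <- theta - eta * grad L(theta) for a sigmoid unit."""
    n = len(xs)
    grad_w = [0.0] * len(weights)
    grad_b = 0.0
    for x, y in zip(xs, ys):
        y_hat = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
        error = y_hat - y  # dL/dz for sigmoid + binary cross-entropy
        for j, xi in enumerate(x):
            grad_w[j] += error * xi / n
        grad_b += error / n
    new_weights = [w - eta * g for w, g in zip(weights, grad_w)]
    return new_weights, bias - eta * grad_b


# Toy one-feature data: larger x tends to mean label 1.
xs = [[0.0], [1.0], [2.0], [3.0]]
ys = [0, 0, 1, 1]
w, b = gradient_step([0.0], 0.0, xs, ys, eta=0.1)
print(w, b)
```

Starting from zero weights, the step moves the weight in the positive direction, which is what the toy labels call for.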

Data Preparation

Because real medical datasets are difficult to access due to privacy restrictions, a synthetic dataset was generated with realistic clinical ranges and correlations.
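
One simple way to build ranges and correlations into synthetic records is to derive one feature partly from another and then add noise. The sketch below is only an illustration of that idea: the two feature names, the ranges, and the coefficients are hypothetical and are not the ones used to build the EquiHER dataset.

```python
import random


def make_synthetic_patient(rng):
    """Generate one synthetic record with two loosely correlated features."""
    age = rng.uniform(18, 90)  # hypothetical age range in years
    # Hypothetical relationship: blood pressure rises with age, plus noise.
    systolic_bp = 100 + 0.5 * (age - 18) + rng.gauss(0, 10)
    systolic_bp = min(max(systolic_bp, 80), 200)  # clamp to an assumed range
    return {"age": age, "systolic_bp": systolic_bp}


rng = random.Random(0)  # fixed seed so the cohort is reproducible
cohort = [make_synthetic_patient(rng) for _ in range(5)]
for patient in cohort:
    print(patient)
```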

Feature normalization was applied using:

$$ x' = \frac{x - \mu}{\sigma} $$

Where:

  • \(\mu\) represents the feature mean
  • \(\sigma\) represents the feature standard deviation (here \(\sigma\) is not the activation function from Model Architecture)
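
A minimal sketch of this normalization in Python follows. It assumes the population standard deviation (dividing by \(n\)) and replaces a zero standard deviation with 1 so that constant features do not cause a division by zero; both choices are assumptions, and the helper names are hypothetical.

```python
import math


def fit_standardizer(rows):
    """Compute the per-feature mean and standard deviation from a list of rows."""
    n = len(rows)
    n_features = len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(n_features)]
    stds = []
    for j in range(n_features):
        variance = sum((row[j] - means[j]) ** 2 for row in rows) / n
        stds.append(math.sqrt(variance) or 1.0)  # guard against constant features
    return means, stds


def standardize(row, means, stds):
    """Apply x' = (x - mu) / sigma to each feature of one row."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


train_rows = [[1.0, 10.0], [3.0, 10.0]]
means, stds = fit_standardizer(train_rows)
print(standardize([3.0, 10.0], means, stds))
```

In practice the mean and standard deviation would be fitted on the training split only and then reused unchanged for validation data and new patients.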

Challenges

Data Availability

Medical datasets are highly restricted due to patient privacy regulations, making synthetic data generation necessary.

Overfitting

Preventing overfitting required tuning hyperparameters such as:

  • learning rate
  • batch size
  • model depth

Clinical Usability

The goal was to design a system that supports clinicians rather than replacing them.


Impact

EquiHER contributes toward the following United Nations Sustainable Development Goals.

SDG 3.8 — Universal Health Coverage

Improving healthcare quality through AI-assisted clinical decision support.

SDG 5 — Gender Equality

Reducing gender bias in medical decision-making.


Example Prediction Workflow

  1. Patient clinical values are entered
  2. Data is normalized
  3. The neural network computes:

$$ \hat{y} = f(x_1, x_2, ..., x_{15}) $$

  4. A diagnostic risk score is generated

If

$$ \hat{y} > 0.7 $$

the case is flagged for additional clinical review.
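
The workflow steps can be tied together in a short Python sketch. The trained network is represented by a stand-in function here, so the scores are meaningless beyond demonstrating the flow; `assess_case` and `placeholder_model` are hypothetical names, and the means and standard deviations are dummy values.

```python
import math

THRESHOLD = 0.7   # flagging threshold from the workflow description
N_FEATURES = 15   # number of clinical input variables


def placeholder_model(features):
    """Stand-in for the trained network: sigmoid of the mean feature value."""
    z = sum(features) / len(features)
    return 0.5 * (1.0 + math.tanh(0.5 * z))  # equals 1 / (1 + e^{-z})


def assess_case(raw_values, means, stds, model, tau=THRESHOLD):
    """Normalize the inputs, score them, and apply the review threshold."""
    if len(raw_values) != N_FEATURES:
        raise ValueError(f"expected {N_FEATURES} clinical values")
    normalized = [(x - m) / s for x, m, s in zip(raw_values, means, stds)]
    score = model(normalized)
    return {"risk_score": score, "flagged": score > tau}


# Dummy normalization statistics: zero mean, unit standard deviation.
means = [0.0] * N_FEATURES
stds = [1.0] * N_FEATURES
print(assess_case([2.0] * N_FEATURES, means, stds, placeholder_model))
```

Passing the model in as an argument keeps the thresholding logic independent of how the network is implemented.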
