HackForge
Transfer learning that cuts AI waste: lower carbon, less compute, safer deployment.
HackForge is a from-scratch PyTorch framework for evaluating how transfer learning affects performance, carbon emissions, parameter efficiency, and deployment feasibility across both classical ML and deep CNNs.
🚀 Inspiration
AI is powerful, but it is also expensive to train, energy-intensive, and often inaccessible to teams without large compute budgets.
In many real-world settings, practitioners do not have unlimited GPU access, massive labeled datasets, or time to retrain models from scratch. At the same time, transfer learning is often treated as automatically beneficial, even though it can help, hurt, or simply save compute without improving performance.
We built HackForge to answer a practical question:
Can transfer learning make AI not just better, but greener?
Instead of making generic efficiency claims, we wanted to measure exactly:
| What we care about | Why it matters |
|---|---|
| Performance | Transfer learning should improve or preserve quality |
| Carbon | Lower emissions make AI more sustainable |
| Parameters | Fewer trainable parameters mean lower compute cost |
| Safety | Harmful transfer should be detected before wasting compute |
✨ What it does
HackForge is a transfer-learning sustainability benchmarking framework.
Core capabilities
| Area | What HackForge supports |
|---|---|
| Classical ML | Scratch baselines, regularized transfer, Bayesian transfer, domain-shift analysis, negative transfer safety gate |
| Deep Learning | ResNet50, EfficientNetB0, MobileNetV2; scratch, frozen backbone, fine-tuning, progressive unfreezing |
| Benchmarking | Low-data sweeps at 100%, 50%, 25%, and 10% of the training data |
| Reporting | CO2, runtime, trainable vs frozen parameters, official model size, edge feasibility |
| Metrics | Sensitivity, specificity, F1, ROC-AUC, confusion matrix |
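The low-data sweeps above can be sketched as deterministic, seeded subsampling of the training indices, so every strategy sees the same subset at each fraction. This is a minimal illustration, not HackForge's actual API; the name `low_data_split` is hypothetical.

```python
import random

def low_data_split(indices, fraction, seed=0):
    """Return a reproducible subset of training indices for a low-data sweep."""
    rng = random.Random(seed)                 # seeded so every run sees the same subset
    k = max(1, int(len(indices) * fraction))  # keep at least one sample
    return sorted(rng.sample(indices, k))     # sorted for a stable, comparable order

# One subset per regime: 100%, 50%, 25%, 10%
train_idx = list(range(1000))
sweeps = {f: low_data_split(train_idx, f, seed=42) for f in (1.0, 0.5, 0.25, 0.10)}
```

Because the seed is fixed, scratch and transfer runs at the same fraction train on identical data, which keeps the comparison fair.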
🧠 Project overview
| Scenario | Category | What we tested | Key takeaway |
|---|---|---|---|
| Housing Affordability | Classical ML | Regression under geographic shift | Transfer reduced compute and emissions while matching or improving performance |
| Health Screening | Classical ML | Classification under tumor-size shift | Bayesian transfer reduced compute while staying competitive |
| Negative Transfer Safety | Classical ML | Harmful transfer detection | Prevented wasted compute on a severely degraded transfer setup |
| Synthetic Histopathology | Deep Learning | CNN transfer, low-data behavior, carbon, deployment | Validated the benchmarking and carbon-tracking pipeline |
🌱 Sustainability impact
HackForge is built to make efficiency measurable, not anecdotal.
Classical ML results
| Task | Baseline | Transfer result | Carbon impact |
|---|---|---|---|
| Housing Affordability | Scratch: R² = 0.56 | Bayesian transfer: R² = 0.59 | CO2 dropped from 7.35e-06 kg to 4.78e-09 kg (~99.9% reduction) |
| Health Screening | Scratch: 93.52% accuracy | Bayesian transfer: 91.55% accuracy | CO2 dropped from 2.32e-06 kg to 1.22e-06 kg (~47% reduction) |
| Negative Transfer Safety | Naive transfer failed badly | Safety gate detected it and safe transfer recovered performance | Avoided wasteful compute on harmful transfer |
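The negative-transfer safety gate in the table above can be reduced to a simple decision rule: run a cheap validation pass, and only commit full training compute to transfer if it does not degrade past the scratch baseline by more than a tolerance. This is a hedged sketch of the idea; the function name and tolerance value are illustrative, not HackForge's exact implementation.

```python
def transfer_is_safe(scratch_score, transfer_score, tolerance=0.02):
    """Gate harmful transfer: accept transfer only if its validation score is
    no more than `tolerance` below the scratch baseline.
    Scores are assumed higher-is-better (e.g. accuracy or R^2)."""
    return transfer_score >= scratch_score - tolerance

# A severely degraded transfer setup is rejected before wasting full training compute:
small_dip_ok = transfer_is_safe(0.93, 0.92)       # within tolerance -> proceed
negative_transfer = transfer_is_safe(0.93, 0.60)  # badly degraded -> gated out
```

The key design point is that the gate runs before the expensive training budget is spent, so harmful transfer costs only a cheap probe rather than a full run.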
Deep learning results
The deep learning portion is currently a synthetic proof-of-concept inspired by breast cancer histopathology classification.
It is used to validate:
| Pipeline element | Why it matters |
|---|---|
| CNN benchmarking | Compare transfer strategies fairly |
| NVML carbon tracking | Measure real GPU energy on CUDA |
| Parameter accounting | Show what is actually being trained |
| Low-data evaluation | Test transfer behavior when labels are scarce |
| Deployment analysis | Check whether models are edge-feasible |
Current results show:
| Finding | Current outcome |
|---|---|
| CO2 reduction | Frozen backbones reduced CO2 by 52–76% |
| Trainable parameter reduction | Frozen backbones reduced trainable parameters by 87–98% |
| Low-data result | On the synthetic task, scratch outperformed frozen transfer in all tested low-data regimes |
| Interpretation | The synthetic signal appears too simple to benefit from pretrained texture features |
| Next step | Apply the same pipeline to BreaKHis and PatchCamelyon |
📐 Carbon measurement
For CUDA/NVIDIA experiments, HackForge uses:
NVML Energy API on Tesla T4
This gives hardware-level GPU energy measurement, rather than relying only on rough timing estimates.
For portable settings, the framework also supports time-based estimation:
$$ CO_2 = P \times t \times PUE \times CI $$
| Symbol | Meaning |
|---|---|
| \(P\) | Average power draw (kW) |
| \(t\) | Training time (hours) |
| \(PUE\) | Power usage effectiveness of the data center (dimensionless) |
| \(CI\) | Grid carbon intensity (kg CO2 per kWh) |
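The time-based estimate above is a direct product of the four terms. A minimal sketch, assuming power in kW, time in hours, and carbon intensity in kg CO2 per kWh; the default PUE and grid-intensity values here are illustrative placeholders, not HackForge's configured values.

```python
def estimate_co2_kg(power_kw, hours, pue=1.12, carbon_intensity=0.475):
    """Time-based carbon estimate: CO2 = P * t * PUE * CI.

    power_kw:         average power draw in kW
    hours:            training wall-clock time in hours
    pue:              data-center power usage effectiveness (dimensionless)
    carbon_intensity: grid intensity in kg CO2 per kWh
    Returns estimated emissions in kg CO2.
    """
    return power_kw * hours * pue * carbon_intensity

# e.g. a ~70 W GPU training for 30 minutes:
co2 = estimate_co2_kg(power_kw=0.070, hours=0.5)
```

With consistent units the product comes out directly in kg CO2, which is why the symbol table fixes kW, hours, and kg/kWh.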
🏗️ How we built it
HackForge was built as a from-scratch PyTorch framework focused on transparency, control, and reproducibility.
Engineering overview
| Component | Implementation |
|---|---|
| Training | Custom PyTorch loops for scratch, frozen transfer, fine-tuning, and progressive unfreezing |
| Evaluation | Multi-seed experiments, low-data sweeps, metric aggregation |
| Sustainability | NVML-based energy measurement, parameter accounting, runtime tracking |
| Analysis | Shift metrics, transfer safety checks, deployment feasibility |
| Reliability | 98 unit tests, 8 demo scripts, seeded experiments, JSON export |
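The parameter accounting mentioned above comes down to splitting a model's parameters by their `requires_grad` flag. A minimal sketch, duck-typed so it works with any object exposing a PyTorch-style `parameters()` iterator (such as a `torch.nn.Module`); the function name is illustrative, not HackForge's actual API.

```python
def count_params(model):
    """Split a model's parameter count into trainable vs frozen.

    Works with any object whose .parameters() yields tensors exposing
    .numel() and .requires_grad (e.g. a torch.nn.Module)."""
    trainable = frozen = 0
    for p in model.parameters():
        n = p.numel()
        if p.requires_grad:
            trainable += n   # updated by the optimizer
        else:
            frozen += n      # backbone weights excluded from training
    return {"trainable": trainable, "frozen": frozen, "total": trainable + frozen}
```

Reporting trainable and frozen counts separately is what makes the 87–98% trainable-parameter reductions for frozen backbones verifiable rather than anecdotal.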
Model support
| Deep learning | Classical ML |
|---|---|
| ResNet50 | Scratch baselines |
| EfficientNetB0 | Bayesian transfer |
| MobileNetV2 | Regularized transfer |
| TorchVision pretrained backbones | Domain-shift metrics |
| Transfer strategy benchmarking | Negative transfer safety gate |
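Of the transfer strategies listed, progressive unfreezing is the least standard, so here is a minimal sketch of the schedule idea: start with the whole backbone frozen, then enable gradients stage by stage from the output end as training advances. Stage granularity, the epoch cadence, and the function name are all illustrative assumptions, not HackForge's exact implementation.

```python
def progressive_unfreeze(stages, epoch, epochs_per_stage=2):
    """Unfreeze backbone stages from the top (output end) downward.

    `stages` is an ordered list of stage objects, each exposing a
    PyTorch-style .parameters() iterator. By epoch e, the last
    (e // epochs_per_stage) stages are trainable; the rest stay frozen."""
    n_open = min(len(stages), epoch // epochs_per_stage)
    for i, stage in enumerate(stages):
        trainable = i >= len(stages) - n_open   # later stages unfreeze first
        for p in stage.parameters():
            p.requires_grad = trainable
```

Calling this at the start of each epoch gradually widens the trainable set, so early epochs keep the low-compute profile of a frozen backbone while later epochs recover fine-tuning capacity.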
🧪 Deep learning proof-of-concept disclaimer
Important: The CNN section is a synthetic proof-of-concept, not a clinical benchmark.
| What it is | What it is not |
|---|---|
| Synthetic images mimicking histopathology-style structure | Not a medical claim |
| Controlled source vs target domain shift | Not a diagnostic tool |
| A benchmarking pipeline for carbon, transfer, and low-data behavior | Not a patient-level clinical evaluation |
Real next step
We plan to run the exact same pipeline on:
- BreaKHis
- PatchCamelyon
with patient-aware splits and real deployment-oriented evaluation.
😓 Challenges we ran into
| Challenge | What we learned |
|---|---|
| Transfer learning is not always better | On our synthetic CNN task, scratch outperformed frozen transfer, which forced us to make the project more rigorous and more honest |
| Avoiding overclaiming | We intentionally framed the CNN section as a proof-of-concept instead of presenting it as clinical AI |
| Parameter accounting | Official model size, experimental params, trainable params, and frozen params are all different and needed to be reported clearly |
| Carbon tracking across hardware | Supporting both NVML and time-based estimation added complexity but made the framework more portable |
| Reporting sustainability | Carbon, time, and parameter counts had to be treated as first-class outputs, not side notes |
🏆 Accomplishments that we're proud of
HackForge is more than a model demo — it is a measurement and decision framework.
| Highlight | Why we’re proud of it |
|---|---|
| Unified sustainability benchmarking | One framework across classical ML and deep learning |
| ~99.9% CO2 reduction in one classical transfer setting | Shows how powerful efficient transfer can be |
| 60.6% aggregate CO2 reduction in the current benchmark run | Demonstrates measurable sustainability impact |
| Negative transfer safety gate | Prevents wasteful compute before it happens |
| NVML integration | Adds hardware-level GPU energy tracking |
| 3 CNNs × 4 strategies × 4 regimes | Broad benchmarking instead of cherry-picked results |
| 98 tests + 8 demo scripts | Stronger reproducibility and engineering quality |
What matters most to us
HackForge is intentionally honest:
- real classical ML results are presented as real
- deep learning is clearly marked as a synthetic proof-of-concept
- unsupported claims are intentionally avoided
📚 What we learned
We learned that sustainable AI is not just about smaller models.
It is about:
| Principle | Meaning |
|---|---|
| Measure energy | Efficiency should be observable, not assumed |
| Reuse useful representations | Transfer can reduce waste when it genuinely helps |
| Avoid harmful transfer | Some transfer setups cost compute without improving performance |
| Benchmark low-data behavior | Label scarcity is one of the most practical real-world constraints |
| Design for deployment | A model that cannot be deployed efficiently is harder to justify |
Trust matters more than hype.
The strongest projects are the ones where the evidence is clear.
🔭 What's next for HackForge
The next major step is turning the CNN proof-of-concept into a real benchmark.
Roadmap
| Next milestone | Goal |
|---|---|
| Run on BreaKHis | Evaluate transfer on real histopathology structure |
| Run on PatchCamelyon | Test the pipeline on a larger real benchmark |
| Use patient-aware splits | Make the evaluation clinically credible |
| Compare real pretrained transfer vs scratch | Validate whether real texture/shape structure produces the expected transfer gains |
| Improve reporting and visualizations | Make results easier to understand and present |
| Expand edge deployment analysis | Test feasibility on practical hospital hardware |
Long-term vision
We want HackForge to help teams answer a practical question:
Is this transfer-learning choice actually saving compute, carbon, and cost — and is it worth deploying?
💡 Why this matters
Transfer learning should not only improve model performance.
It should also help reduce:
| Waste | Benefit |
|---|---|
| Carbon emissions | Greener AI systems |
| Compute waste | Lower training cost |
| Retraining overhead | Faster iteration |
| Infrastructure demands | More realistic deployment in low-resource settings |
HackForge brings those tradeoffs into the open by making them measurable.

