Published online Jun 30, 2025.
https://doi.org/10.61499/dhr.2025.3.e2
Wet Age-Related Macular Degeneration Recurrence Prediction Using CNN Models With Multimodal Retinal Imaging
Abstract
Background
Wet age-related macular degeneration (wet AMD) is a vision-threatening condition that typically develops with age. The standard treatment involves intravitreal injections, but suboptimal timing of these injections can result in severe outcomes, including irreversible vision loss. While numerous studies have focused on the detection of wet AMD, the prediction of its recurrence remains relatively underexplored. Existing approaches to recurrence prediction primarily rely on a single modality—optical coherence tomography (OCT)—and have achieved only limited prognostic accuracy. Therefore, there is a pressing need to comprehensively investigate recurrence prediction in wet AMD to improve prognostic performance.
Methods
Unlike previous studies, we collected 3 types of images from patients with wet AMD: OCT vertical, OCT horizontal, and fundus. We first trained and evaluated several convolutional neural network (CNN)-based models on single-modality data, namely the pre- and post-treatment versions of each image type. We then evaluated a dual-modality scenario using the 2 best-performing modalities, fundus post-treatment and OCT horizontal pre-treatment. Finally, we examined a multi-modality case that used all 6 modalities for wet AMD recurrence prediction. Performance was assessed using the area under the receiver operating characteristic curve (AUC) together with several classification performance metrics.
Results
Among single-modality approaches, the OCT horizontal pre-treatment model achieved an AUC of 0.617 ± 0.045, while the fundus post-treatment model reached an accuracy of 0.612 ± 0.008. The dual-modality model combining fundus post-treatment and OCT horizontal pre-treatment images attained an AUC of 0.622 ± 0.037, whereas the multi-modality model incorporating all imaging sources yielded an AUC of 0.564 ± 0.026.
Conclusion
Based on our experiments with 3 imaging modalities, the fundus-only model achieved the highest accuracy and the dual-modality combination of OCT and fundus images yielded the highest AUC, indicating that considering multimodal imaging data could be important in wet AMD recurrence prediction.
INTRODUCTION
Wet age-related macular degeneration (AMD) is a leading cause of vision loss in older adults, characterized by the abnormal growth of blood vessels in the subretinal or intraretinal layers, which can ultimately lead to severe vision impairment or blindness.1 The conventional treatment involves intravitreal injections of anti-vascular endothelial growth factor agents, which suppress the growth of new blood vessels. While periodic injections can effectively control the disease, their effects are often temporary. Recurrence occurs in a significant proportion of patients within a few months after treatment cessation, and repeated recurrences can accelerate retinal structural damage and reduce treatment responsiveness. Therefore, regular monitoring and timely administration of injections are essential for preventing recurrence and preserving vision.
Managing patients with wet AMD poses significant challenges, as it is neither feasible for patients to visit the hospital for every assessment nor for ophthalmologists to be continuously involved in monitoring disease progression. Previous studies have compared different treatment regimens, including fixed monthly dosing and pro re nata (as-needed) injections, for AMD management.2 These studies have shown that fixed monthly dosing leads to better treatment outcomes. However, this approach is associated with substantial financial and time burdens due to the high cost of intravitreal injections.
To address these challenges, alternative treatment strategies, such as the treat-and-extend (T&E) protocol, have been proposed.3 The T&E approach involves administering injections at regular intervals initially and gradually extending the treatment intervals based on individual disease progression and treatment response. While promising, this method still presents notable challenges, including variability in individual patient responses and the risk of vision deterioration if intervals are extended excessively. Furthermore, because treatment intervals are adjusted based on individual needs, there is no standardized protocol or universally accepted guideline, leading to variations in treatment decisions depending on the ophthalmologist or medical institution. Importantly, assessing the likelihood of wet AMD recurrence remains a significant challenge, even for experienced ophthalmologists. Therefore, there is a pressing need for an objective and reliable evaluation tool that can accurately identify patients at risk of recurrence, reducing the reliance on subjective clinical judgment.
With recent advancements in artificial intelligence (AI), its applications have expanded across various medical fields, including ophthalmology.4, 5 AI has been increasingly utilized in the assessment of fundus diseases, including wet AMD.6 While substantial research has been conducted on wet AMD, there is a more pressing need for studies focusing on regular monitoring and recurrence prediction rather than merely disease detection.
However, most existing studies have primarily concentrated on diagnosing wet AMD and evaluating disease severity,7, 8 with relatively few addressing the prediction of recurrence. The scarcity of AI-driven studies targeting wet AMD relapse highlights the need for foundational investigations and diverse methodological approaches. Recent advancements in AI and deep learning have demonstrated potential in providing objective assessments of disease progression.8 Nevertheless, existing algorithms rely solely on optical coherence tomography (OCT) imaging and their predictive performance remains insufficient for reliable clinical use.9, 10 Motivated by these shortcomings, it is essential to explore a variety of predictive approaches using additional imaging modalities. Accordingly, we develop a multimodal framework that integrates OCT pre-/post-treatment with fundus images for recurrence prediction.
METHODS
The Institutional Review Board (IRB) of Kim’s Eye Hospital (Seoul, Korea) approved this retrospective study (IRB approval No. 2023-02-004), which was conducted in accordance with the tenets of the Declaration of Helsinki. Because of its retrospective design, the requirement for informed consent was exempted by the Kim’s Eye Hospital IRB. Clinical data were collected at Kim’s Eye Hospital, and the AI models were developed at CHA University School of Medicine (Seongnam, Korea).
Data collection
We enrolled treatment-naïve patients diagnosed with neovascular AMD between January 2013 and June 2021. Each patient received 3 consecutive loading injections of either ranibizumab (0.5 mg/0.05 mL, Lucentis®; Genentech Inc., San Francisco, CA, USA) or aflibercept (2.0 mg/0.05 mL, Eylea®; Regeneron, Tarrytown, NY, USA). Patients were excluded if they met any of the following criteria: (1) residual intraretinal fluid (IRF) or subretinal fluid (SRF) after initial treatment, (2) follow-up duration of less than 12 months after initial treatment, (3) history of vitreoretinal or glaucoma surgery, or (4) poor OCT image quality that could interfere with AI learning. A total of 399 patients (238 males and 161 females; mean age 70.21 ± 8.38 years) remained eligible after these exclusions. After the loading injections, follow-up examinations were scheduled every 1–2 months. Lesion reactivation was defined as the new appearance of IRF, SRF, or macular hemorrhage on OCT, fundus photography, or clinical fundus examination (Table 1).
Table 1
Participant characteristics
Preprocessing
For each subject, medical imaging was acquired immediately before the first and immediately after the third intravitreal injection, corresponding to the pre- and post-treatment stages of a 3-session wet AMD regimen. This protocol yielded 6 images per participant: one fundus photograph and horizontal and vertical OCT scans at each time point. Fundus images measured 940 × 840 pixels, and OCT images measured 1,000 × 650 pixels. All images were uniformly resized to 448 × 448 pixels and standardized by Z-score normalization using the ImageNet mean and standard deviation.11 Recurrence status within 12 months post-treatment served as the binary classification target: 242 of the 399 patients exhibited recurrence, whereas 157 did not. To augment the training set, we applied horizontal flips, adjusted brightness within a range of 0.9 to 1.1, and performed random crops retaining 95–100% of the original area, followed by resizing back to 448 × 448 pixels. Model robustness and generalizability were assessed via 5-fold cross-validation.
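The normalization step and the class balance described above can be written out as a small worked example. This is a minimal sketch rather than the study's code: the channel means and standard deviations are the values conventionally used with ImageNet-pretrained backbones (the paper does not list them), and the helper names `normalize_pixel` and `class_prevalence` are hypothetical.

```python
# Channel statistics conventionally used with ImageNet-pretrained CNNs.
# Assumption: the paper does not list the exact values it used.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    """Z-score normalize one RGB pixel given on the 0-255 scale.

    Each channel is scaled to [0, 1], then the channel mean is
    subtracted and the result is divided by the channel std.
    """
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )


def class_prevalence(n_positive, n_total):
    """Fraction of patients in the positive (recurrence) class."""
    return n_positive / n_total


# Cohort counts taken from the text: 242 of 399 patients had recurrence.
recurrence_rate = class_prevalence(242, 399)
black_pixel = normalize_pixel((0, 0, 0))
```

The recurrence rate works out to roughly 0.607, which is also the accuracy a classifier would obtain by always predicting recurrence, so it is a useful reference point when reading accuracy figures for this cohort.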
Convolutional neural network (CNN)-based architecture for wet AMD recurrence prediction
Fig. 1A presents the single-modality approach, which employs one of the 6 image types to predict wet AMD recurrence. In the dual-modality scheme (Fig. 1B), the 2 highest-performing modalities (fundus post-treatment and OCT horizontal pre-treatment) are combined by concatenating their feature maps to enhance predictive performance.12, 13 The multi-modality scheme (Fig. 1C) combines all 6 imaging inputs to exploit the comprehensive visual data. For each configuration, Inception-v3,14 EfficientNet-b0,15 and EfficientNet-v216 were used as the backbone CNN architectures.
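Fusion by concatenation can be illustrated without any deep learning framework. The sketch below is a toy under stated assumptions, not the authors' implementation: the 3-element feature vectors stand in for CNN backbone outputs, the linear head with a sigmoid is an assumed classifier, and `concat_features` and `predict_recurrence` are hypothetical names.

```python
import math


def concat_features(*feature_vectors):
    """Late fusion: join per-modality feature vectors into one vector."""
    fused = []
    for vec in feature_vectors:
        fused.extend(vec)
    return fused


def predict_recurrence(fused, weights, bias):
    """Linear layer followed by a sigmoid, giving a probability in (0, 1)."""
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused feature length")
    logit = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical feature vectors standing in for backbone outputs.
fundus_post = [0.2, -0.1, 0.4]
oct_horizontal_pre = [0.0, 0.3, -0.2]

fused = concat_features(fundus_post, oct_horizontal_pre)  # length 3 + 3 = 6
# Untrained (all-zero) weights give a logit of 0, hence a probability of 0.5.
prob = predict_recurrence(fused, weights=[0.0] * len(fused), bias=0.0)
```

The same `concat_features` call accepts any number of inputs, so the 6-input multi-modality case differs only in the length of the fused vector and therefore in the size of the classifier that follows it.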
Fig. 1
Overview of the model architectures. (A) Single-modality architecture employing a single image for prediction of recurrence. (B) Dual-modality architecture integrating feature maps from the 2 highest-performing modalities. (C) Multi-modality architecture concatenating feature maps from all 6 images (fundus pre-/post-treatment and optical coherence tomography horizontal/vertical pre-/post-treatment) for enhanced classification performance.
CNN = convolutional neural network.
Experimental setup
The training procedure was configured to maximize AUC, and supplementary thresholds for sensitivity and specificity were determined via the Youden Index.17 We employed 5-fold cross-validation by partitioning the dataset into 5 equally sized folds—using 4 folds for training and one for validation in each iteration—and evaluated model performance using AUC. All models were implemented in PyTorch (Meta AI, New York, NY, USA) on 64-bit systems. Experiments were performed on 2 server configurations: the first featured an NVIDIA Quadro RTX 8000 GPU (NVIDIA, Santa Clara, CA, USA) paired with an Intel® Xeon® Gold 6226R CPU (2.90 GHz, 16 cores; Intel, Santa Clara, CA, USA), and the second utilized an NVIDIA GeForce RTX 4090 GPU (NVIDIA) alongside an Intel® Xeon® Silver 4309Y CPU (2.80 GHz, 8 cores; Intel).
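For readers less familiar with the two evaluation quantities named above, the following minimal sketch computes AUC and a Youden-index threshold in plain Python. The labels and scores are invented toy values, not study data, and the function names are hypothetical; in practice a library routine would normally be used instead.

```python
def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random
    negative, with ties counted as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def roc_points(labels, scores):
    """(threshold, sensitivity, 1 - specificity) for each distinct score,
    predicting recurrence when score >= threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= thr)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= thr)
        points.append((thr, tp / n_pos, fp / n_neg))
    return points


def youden_threshold(labels, scores):
    """Threshold maximizing J = sensitivity + specificity - 1."""
    best = max(roc_points(labels, scores), key=lambda p: p[1] - p[2])
    return best[0]


# Toy validation fold (invented values): 1 = recurrence, 0 = no recurrence.
labels = [1, 1, 1, 0, 0, 1, 0, 0]
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.3, 0.1]

fold_auc = auc(labels, scores)
threshold = youden_threshold(labels, scores)
```

AUC is threshold-free, whereas sensitivity and specificity depend on the chosen cut-off, which is why a rule such as the Youden Index is needed to report them.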
RESULTS
Several CNN architectures were implemented in PyTorch: Inception-v3, EfficientNet-b0, and EfficientNet-v2. For each input, the backbone was selected according to 5-fold cross-validated AUC. EfficientNet-v2 was adopted for both the fundus pre-/post-treatment and OCT vertical pre-/post-treatment datasets, Inception-v3 for the OCT horizontal pre-treatment dataset, and EfficientNet-b0 for the OCT horizontal post-treatment dataset. As shown in Table 2, all models achieved AUCs exceeding 0.56. Among single-modality inputs, the OCT horizontal pre-treatment configuration attained the highest AUC (0.617 ± 0.045). The dual-modality model, combining fundus post-treatment and OCT horizontal pre-treatment images, yielded an AUC of 0.622 ± 0.037, while the multi-modality model integrating all 6 image types achieved an AUC of 0.564 ± 0.026, indicating no appreciable advantage over the single-modality approaches.
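Selecting a backbone by cross-validated AUC and reporting it as mean ± standard deviation can be sketched as follows. The fold-level numbers are invented purely for illustration and are not the study's results; whether the reported spread is a sample or population standard deviation is not stated in the text, so the sample version is assumed here.

```python
from statistics import mean, stdev


def summarize(fold_aucs):
    """Mean and sample standard deviation across cross-validation folds."""
    return mean(fold_aucs), stdev(fold_aucs)


def select_backbone(cv_results):
    """Name of the backbone with the highest mean cross-validated AUC."""
    return max(cv_results, key=lambda name: mean(cv_results[name]))


# Hypothetical per-fold AUCs for one input modality (NOT study data).
cv_results = {
    "inception_v3": [0.60, 0.62, 0.58, 0.64, 0.61],
    "efficientnet_b0": [0.57, 0.59, 0.60, 0.56, 0.58],
    "efficientnet_v2": [0.59, 0.58, 0.61, 0.60, 0.57],
}

best = select_backbone(cv_results)
best_mean, best_sd = summarize(cv_results[best])
report = f"{best}: AUC {best_mean:.3f} ± {best_sd:.3f}"
```

Because the same folds are used both to choose the backbone and to report its score, the selected model's mean AUC tends to be slightly optimistic, which matters when differences between configurations are small.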
Table 2
Performance comparison of wet age-related macular degeneration recurrence prediction across single-, dual-, and multi-modality inputs
DISCUSSION
This study systematically evaluates the performance of various CNN models in predicting the recurrence of wet AMD using multiple ophthalmic imaging modalities.
The findings suggest that single-image modalities generally achieve superior predictive accuracy compared with the multi-modality approach. Among the single-image datasets, OCT horizontal pre-treatment exhibited the highest AUC at 0.617. This observation raises the possibility that OCT horizontal pre-treatment images may capture structural changes relevant to recurrence prediction. In contrast, the multi-modality approach, which integrates all 6 imaging modalities, failed to demonstrate a substantial improvement, achieving an AUC of 0.564. These findings suggest that integrating multiple imaging modalities does not necessarily improve predictive performance, likely due to the increased number of learnable parameters. With the same amount of training data, the larger parameter space in multi-modality models may hinder effective learning and increase the risk of overfitting. Single-image modality models, with fewer parameters, may better capture relevant structural changes while maintaining generalizability. Therefore, wet AMD recurrence prediction may be optimized more effectively by selecting the most informative single-image modality than by indiscriminately combining multiple modalities.
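The parameter-count argument can be made concrete with back-of-the-envelope arithmetic. The sketch below assumes one separate backbone per modality and a single linear output layer on the concatenated features; the backbone size (5,300,000) and feature width (1,280) are illustrative round figures loosely modeled on EfficientNet-b0, not values reported in this study, and `fusion_param_count` is a hypothetical helper.

```python
def fusion_param_count(n_modalities, backbone_params, feature_dim, n_outputs=1):
    """Learnable parameters for a concatenation-based fusion model.

    Assumes a separate backbone per modality and one linear layer
    (weights plus bias) on top of the concatenated feature vector.
    """
    head = (n_modalities * feature_dim + 1) * n_outputs
    return n_modalities * backbone_params + head


# Illustrative figures only (assumptions, not reported in the paper).
BACKBONE_PARAMS = 5_300_000
FEATURE_DIM = 1_280

single = fusion_param_count(1, BACKBONE_PARAMS, FEATURE_DIM)
dual = fusion_param_count(2, BACKBONE_PARAMS, FEATURE_DIM)
multi = fusion_param_count(6, BACKBONE_PARAMS, FEATURE_DIM)

# With 5-fold cross-validation on 399 patients, each training split
# holds about four fifths of the cohort.
train_patients = round(399 * 4 / 5)
params_per_patient = {n: fusion_param_count(n, BACKBONE_PARAMS, FEATURE_DIM) / train_patients
                      for n in (1, 2, 6)}
```

Under these assumptions the 6-input model carries about 6 times as many parameters as a single-input model while the number of training patients per fold stays near 319, which is consistent with the overfitting concern raised above.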
Research on predicting wet AMD recurrence is still limited, and both of the 2 prior studies relied solely on OCT imaging. The first study9 achieved an accuracy of 0.602 by using 4 OCT scans—one before treatment and 3 immediately after intravitreal injections. By contrast, our work combines multiple imaging modalities (vertical and horizontal OCT scans, plus fundus photographs) to forecast recurrence. We observed that fundus post-treatment images alone yielded an accuracy of 0.612, surpassing the 0.602 benchmark. Furthermore, our dual-modality model—integrating fundus and OCT data—attained the highest AUC of 0.622, suggesting that fundus imaging could be particularly informative for prediction. The second study10 employed a segmentation approach to define regions of interest and reported an AUC of 0.725, and incorporating such segmentation techniques into our multimodal framework represents a promising direction for future research.
Recent advances in multimodal medical research have increasingly moved beyond image-only models to incorporate clinical tabular data, reflecting a broader trend toward richer and more personalized patient representations. For example, one study employed a hypernetwork that dynamically modulated the weights of a CNN processing brain magnetic resonance imaging (MRI) scans, conditioned on clinical, demographic, and genetic tabular inputs, thereby enabling individualized feature learning for each patient.18 Another study generated separate embeddings for tabular features and integrated them with MRI images in a contrastive learning framework, optimizing the model to align representations from the same patient while distinguishing between different individuals.19 Inspired by these approaches, future extensions of our wet AMD recurrence prediction model could explore the integration of clinical tabular data alongside imaging modalities. Such multimodal fusion may provide complementary insights and ultimately enhance the predictive accuracy and clinical utility of our model.
While our current model shows promising results, its predictive performance is not yet sufficient for clinical deployment. To address this, future work should explore advanced modeling techniques—such as ensemble learning,20 vision transformer architectures,21 and multimodal large language models22—to enhance both accuracy and robustness. In parallel, integrating uncertainty-aware training approaches,23 which defer classification in ambiguous cases and provide predictions only when confidence is high, may help reduce the risk of harmful false positives and negatives. These directions represent critical next steps toward building clinically viable systems for wet AMD recurrence prediction and ensuring greater safety in real-world applications.
Funding: This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF2023R1A2C2003577).
Disclosure: All authors have no potential conflicts of interest.
Data Availability Statement: The datasets generated and/or analyzed during the current study are available from the corresponding author upon reasonable request.
Author Contributions:
Conceptualization: Kim JH, Cho BH.
Investigation: Yoon WT.
Methodology: Kim HG, Ngo D.
Project administration: Kim JH, Cho BH.
Software: Kim HG, Ngo D.
Supervision: Moon J, Kim JH, Cho BH.
Writing - original draft: Yoon WT, Kim HG, Moon J.
Writing - review & editing: Moon J, Kim JH, Cho BH.
References
El-Den NN, Elsharkawy M, Saleh I, Ghazal M, Khalil A, Haq MZ, et al. AI-based methods for detecting and classifying age-related macular degeneration: a comprehensive review. Artif Intell Rev 2024;57:237.
Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. Commun ACM 2017;60(6):84–90.
Kim HG, Song S, Cho BH, Jang DP. Deep learning-based stress detection for daily life use using single-channel EEG and GSR in a virtual reality interview paradigm. PLoS One 2024;19(7):38959272.
Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z. Rethinking the inception architecture for computer vision; 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2016 June 27–30; Las Vegas, NV, USA. Piscataway, NJ, USA: Institute of Electrical and Electronics Engineers (IEEE); 2016. pp. 2818-2826.
Tan M, Le QV. Efficientnet: rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning. arXiv. 2020 Sep 11; [doi: 10.48550/arXiv.1905.11946].
Tan M, Le QV. Efficientnetv2: smaller models and faster training. In: International Conference on Machine Learning. arXiv. 2021 Jun 23; [doi: 10.48550/arXiv.2104.00298].
Hager P, Menten MJ, Rueckert D. Best of both worlds: Multimodal contrastive learning with tabular and imaging data; 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2023 June 17–24; Vancouver, Canada. Piscataway, NJ, USA: Institute of Electrical and Electronics Engineers (IEEE); 2023. pp. 23924-23935.
Moradi M, Huan T, Chen Y, Du X, Seddon J. Ensemble learning for AMD prediction using retina OCT scans. Invest Ophthalmol Vis Sci 2022;63(7):732–F0460.
Ding Y, Liu J, Xu X, Huang M, Zhuang J, Xiong J, et al. Uncertainty-aware training of neural networks for selective medical image segmentation; 2020 International Conference on Medical Imaging With Deep Learning; 2020 July 6–9; Montreal, Canada. Cambridge: Proceedings of Machine Learning Research (PMLR); 2020. pp. 1-17.


