Dynamic joint resource allocation in maritime wireless communication networks: a meta-reinforcement learning approach based on knowledge embedding

Mao, Zhongyang; Zhang, Zhilin; Lu, Faping; Liu, Xiguo; Xu, Zhichao; Pan, Yaozong; Kang, Jiafang; You, Yang

doi:10.1631/FITEE.2500007

Research Article
Published: 23 December 2025

Volume 26, pages 2672–2687 (2025)

Frontiers of Information Technology & Electronic Engineering

Zhongyang Mao^1,2 (ORCID: orcid.org/0000-0001-6279-1627; equal contribution),
Zhilin Zhang^1,2 (ORCID: orcid.org/0009-0006-1442-3735; equal contribution),
Faping Lu^1,2,
Xiguo Liu^1,2,
Zhichao Xu^1,2,
Yaozong Pan^1,2,
Jiafang Kang^1,2 &
Yang You^3

Abstract

As human exploration of the ocean expands, the demand for continuous, high-quality, and ubiquitous maritime communication is steadily increasing. However, the dynamic marine environment and limited network resources pose significant challenges for traditional heuristic resource allocation methods, which struggle to balance high-quality communication against resource constraints. As a result, these methods yield suboptimal system throughput and rely heavily on specific problem structures. To address these issues, we introduce a joint resource allocation method based on knowledge embedding. The proposed approach includes an action distribution alignment module designed to improve resource utilization by preventing unreasonable action-output combinations. Furthermore, by integrating knowledge embedding with meta-reinforcement learning, a physical guidance loss function is formulated that reduces the sample size required for model training and thereby enhances the algorithm’s generalization capability. Simulation results show that, across various channel environments, the proposed method increases average system throughput by 31.19% compared with the model-agnostic meta-learning proximal policy optimization (MAML-PPO) algorithm and by 80.91% compared with the RL² algorithm.

Data availability

Data are not available due to legal restrictions. Owing to the nature of this research, participants in this study did not agree to have their data shared publicly, so supporting data are not available.

Author information

Zhongyang Mao and Zhilin Zhang contributed equally to this work.

Authors and Affiliations

Naval Aviation University, Yantai, 264001, China
Zhongyang Mao, Zhilin Zhang, Faping Lu, Xiguo Liu, Zhichao Xu, Yaozong Pan & Jiafang Kang
Shandong Key Laboratory of Sea and Air Information Perception and Processing Technology, Yantai, 264001, China
Zhongyang Mao, Zhilin Zhang, Faping Lu, Xiguo Liu, Zhichao Xu, Yaozong Pan & Jiafang Kang
PLA 91001 Unit, Beijing, 100000, China
Yang You

Contributions

Zhongyang MAO designed the research. Zhilin ZHANG and Yang YOU processed the data. Jiafang KANG and Yaozong PAN drafted the paper. Xiguo LIU and Zhichao XU helped organize the paper. Zhilin ZHANG and Faping LU revised and finalized the paper.

Corresponding author

Correspondence to Zhilin Zhang.

Ethics declarations

All the authors declare that they have no conflict of interest.

About this article

Cite this article

Mao, Z., Zhang, Z., Lu, F. et al. Dynamic joint resource allocation in maritime wireless communication networks: a meta-reinforcement learning approach based on knowledge embedding. Front Inform Technol Electron Eng 26, 2672–2687 (2025). https://doi.org/10.1631/FITEE.2500007

Received: 03 January 2025
Accepted: 28 May 2025
Published: 23 December 2025
Version of record: 23 December 2025
Issue date: December 2025
DOI: https://doi.org/10.1631/FITEE.2500007
