{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T01:42:26Z","timestamp":1760060546855,"version":"build-2065373602"},"reference-count":51,"publisher":"MDPI AG","issue":"9","license":[{"start":{"date-parts":[[2025,9,2]],"date-time":"2025-09-02T00:00:00Z","timestamp":1756771200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Symmetry"],"abstract":"<jats:p>In off-road scenes, segmentation targets exhibit significant scale progression due to perspective depth effects from oblique viewing angles, meaning that the size of the same target undergoes continuous, boundary-less progressive changes along a specific direction. This asymmetric variation disrupts the geometric symmetry of targets, causing traditional segmentation networks to face three key challenges: (1) inefficient capture of continuous-scale features, where pyramid structures and multi-scale kernels struggle to balance computational efficiency with sufficient coverage of progressive scales; (2) degraded intra-class feature consistency, where local scale differences within targets induce semantic ambiguity; and (3) loss of high-frequency boundary information, where feature sampling operations exacerbate the blurring of progressive boundaries. To address these issues, this paper proposes the ProCo-NET framework for systematic optimization. Firstly, a Progressive Strip Convolution Group (PSCG) is designed to construct multi-level receptive field expansion through orthogonally oriented strip convolution cascading (employing symmetric processing in horizontal\/vertical directions) integrated with self-attention mechanisms, enhancing perception capability for asymmetric continuous-scale variations. 
Secondly, an Offset-Frequency Cooperative Module (OFCM) is developed wherein a learnable offset generator dynamically adjusts sampling point distributions to enhance intra-class consistency, while a dual-channel frequency domain filter performs adaptive high-pass filtering to sharpen target boundaries. These components synergistically solve feature consistency degradation and boundary ambiguity under asymmetric changes. Experiments show that this framework significantly improves the segmentation accuracy and boundary clarity of multi-scale targets in off-road scene segmentation tasks: it achieves 71.22% MIoU on the standard RUGD dataset (0.84% higher than the existing optimal method) and 83.05% MIoU on the Freiburg_Forest dataset. Among them, the segmentation accuracy of key obstacle categories is significantly improved to 52.04% (2.7% higher than the sub-optimal model). This framework effectively compensates for the impact of asymmetric deformation through a symmetric computing mechanism.<\/jats:p>","DOI":"10.3390\/sym17091428","type":"journal-article","created":{"date-parts":[[2025,9,2]],"date-time":"2025-09-02T08:23:38Z","timestamp":1756801418000},"page":"1428","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["ProCo-NET: Progressive Strip Convolution and Frequency-Optimized Framework for Scale-Gradient-Aware Semantic Segmentation in Off-Road Scenes"],"prefix":"10.3390","volume":"17","author":[{"ORCID":"https:\/\/orcid.org\/0009-0000-9876-5244","authenticated-orcid":false,"given":"Zihang","family":"Liu","sequence":"first","affiliation":[{"name":"School of Automation, Wuhan University of Technology, Wuhan 430070, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3021-5371","authenticated-orcid":false,"given":"Donglin","family":"Jing","sequence":"additional","affiliation":[{"name":"School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-2786-4857","authenticated-orcid":false,"given":"Chenxiang","family":"Ji","sequence":"additional","affiliation":[{"name":"School of Automation, Wuhan University of Technology, Wuhan 430070, China"}]}],"member":"1968","published-online":{"date-parts":[[2025,9,2]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., and Schiele, B. (2016, January 27\u201330). The cityscapes dataset for semantic urban scene understanding. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.350"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"453","DOI":"10.1007\/s10846-017-0760-x","article-title":"Traversable region estimation for mobile robots in an outdoor image","volume":"92","author":"Matsuzaki","year":"2018","journal-title":"J. Intell. Robot. Syst."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1023\/B:AURO.0000047286.62481.1d","article-title":"Obstacle detection and terrain classification for autonomous off-road navigation","volume":"18","author":"Manduchi","year":"2005","journal-title":"Auton. Robot."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"145","DOI":"10.1002\/rob.20279","article-title":"Learning terrain segmentation with classifier ensembles for autonomous robot navigation in unstructured environments","volume":"26","author":"Procopio","year":"2009","journal-title":"J. Field Robot."},{"key":"ref_5","unstructured":"Yuan, Y., Chen, X., and Wang, J. (2020, January 23\u201328). Object-contextual representations for semantic segmentation. Proceedings of the Computer Vision\u2013ECCV 2020: 16th European Conference, Glasgow, UK. 
Proceedings, Part VI 16."},{"key":"ref_6","first-page":"12077","article-title":"SegFormer: Simple and efficient design for semantic segmentation with transformers","volume":"34","author":"Xie","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Strudel, R., Garcia, R., Laptev, I., and Schmid, C. (2021, January 10\u201317). Segmenter: Transformer for semantic segmentation. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00717"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Zhu, H., and Jing, D. (2024). Optimizing slender target detection in remote sensing with adaptive boundary perception. Remote Sens., 16.","DOI":"10.3390\/rs16142643"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"6505805","DOI":"10.1109\/LGRS.2022.3144513","article-title":"FAR-Net: Fast anchor refining for arbitrary-oriented object detection","volume":"19","author":"Deng","year":"2022","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"162977","DOI":"10.1109\/ACCESS.2024.3491135","article-title":"Deep Semantic Segmentation for Identifying Traversable Terrain in Off-Road Autonomous Driving","volume":"12","author":"Rahi","year":"2024","journal-title":"IEEE Access"},{"key":"ref_11","unstructured":"Larsson, M., Stenborg, E., Toft, C., Hammarstrand, L., Sattler, T., and Kahl, F. (November, January 27). Fine-grained segmentation networks: Self-supervised segmentation for improved long-term visual localization. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Ahn, J., and Kwak, S. (2018, January 18\u201323). Learning pixel-level semantic affinity with image-level supervision for weakly supervised semantic segmentation. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00523"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"564","DOI":"10.1109\/TITS.2022.3218403","article-title":"An active and contrastive learning framework for fine-grained off-road semantic segmentation","volume":"24","author":"Gao","year":"2022","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Srinivas, A., Lin, T.Y., Parmar, N., Shlens, J., Abbeel, P., and Vaswani, A. (2021, January 20\u201325). Bottleneck transformers for visual recognition. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01625"},{"key":"ref_15","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Viswanath, K., Singh, K., Jiang, P., Sujit, P., and Saripalli, S. (2021, January 23\u201327). Offseg: A semantic segmentation framework for off-road driving. Proceedings of the 2021 IEEE 17th International Conference on Automation Science and Engineering (CASE), Lyon, France.","DOI":"10.1109\/CASE49439.2021.9551643"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Sharma, S., Ball, J.E., Tang, B., Carruth, D.W., Doude, M., and Islam, M.A. (2019). Semantic segmentation with transfer learning for off-road autonomous driving. 
Sensors, 19.","DOI":"10.3390\/s19112577"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"8138","DOI":"10.1109\/LRA.2022.3187278","article-title":"Ga-nav: Efficient terrain segmentation for robot navigation in unstructured outdoor environments","volume":"7","author":"Guan","year":"2022","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_19","first-page":"1140","article-title":"Segnext: Rethinking convolutional attention design for semantic segmentation","volume":"35","author":"Guo","year":"2022","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Chen, L.C., Yang, Y., Wang, J., Xu, W., and Yuille, A.L. (2016, January 27\u201330). Attention to scale: Scale-aware semantic image segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.396"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"5615515","DOI":"10.1109\/TGRS.2023.3294520","article-title":"Toward hierarchical adaptive alignment for aerial object detection in remote sensing images","volume":"61","author":"Deng","year":"2023","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_22","unstructured":"Novosel, J., Viswanath, P., and Arsenali, B. (2019, January 8\u201314). Boosting semantic segmentation with multi-task self-supervised learning for autonomous driving applications. Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS-Workshops), Vancouver, BC, Canada."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"1239","DOI":"10.1007\/s11263-019-01188-y","article-title":"Self-supervised model adaptation for multimodal semantic segmentation","volume":"128","author":"Valada","year":"2020","journal-title":"Int. J. Comput. Vis."},{"key":"ref_24","unstructured":"Poudel, R.P., Liwicki, S., and Cipolla, R. (2019). Fast-scnn: Fast semantic segmentation network. 
arXiv."},{"key":"ref_25","unstructured":"Wu, H., Zhang, J., Huang, K., Liang, K., and Yu, Y. (2019). Fastfcn: Rethinking dilated convolution in the backbone for semantic segmentation. arXiv."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Long, J., Shelhamer, E., and Darrell, T. (2015, January 7\u201312). Fully convolutional networks for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"ref_27","unstructured":"Yu, F., and Koltun, V. (2015). Multi-scale context aggregation by dilated convolutions. arXiv."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Fu, J., Liu, J., Tian, H., Li, Y., Bao, Y., Fang, Z., and Lu, H. (2019, January 15\u201320). Dual attention network for scene segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00326"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Yu, C., Wang, J., Peng, C., Gao, C., Yu, G., and Sang, N. (2018, January 8\u201314). Bisenet: Bilateral segmentation network for real-time semantic segmentation. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01261-8_20"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Nirkin, Y., Wolf, L., and Hassner, T. (2021, January 20\u201325). Hyperseg: Patch-wise hypernetwork for real-time semantic segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00405"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (2017, January 21\u201326). Pyramid scene parsing network. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.660"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"834","DOI":"10.1109\/TPAMI.2017.2699184","article-title":"Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs","volume":"40","author":"Chen","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"2375","DOI":"10.1007\/s11263-021-01465-9","article-title":"OCNet: Object context for semantic segmentation","volume":"129","author":"Yuan","year":"2021","journal-title":"Int. J. Comput. Vis."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"196","DOI":"10.1016\/j.isprsjprs.2022.06.008","article-title":"UNetFormer: A UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery","volume":"190","author":"Wang","year":"2022","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_35","first-page":"5601313","article-title":"Cross fusion net: A fast semantic segmentation network for small-scale semantic information capturing in aerial scenes","volume":"60","author":"Peng","year":"2021","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_36","unstructured":"Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., and Yuille, A.L. (2014). Semantic image segmentation with deep convolutional nets and fully connected crfs. arXiv."},{"key":"ref_37","unstructured":"Chen, L.C., Papandreou, G., Schroff, F., and Adam, H. (2017). Rethinking atrous convolution for semantic image segmentation. arXiv."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., and Adam, H. (2018, January 8\u201314). Encoder-decoder with atrous separable convolution for semantic image segmentation. 
Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_49"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021, January 10\u201317). Swin transformer: Hierarchical vision transformer using shifted windows. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"ref_40","unstructured":"Yang, J., Li, C., Zhang, P., Dai, X., Xiao, B., Yuan, L., and Gao, J. (2021). Focal self-attention for local-global interactions in vision transformers. arXiv."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Wang, W., Xie, E., Li, X., Fan, D.P., Song, K., Liang, D., Lu, T., Luo, P., and Shao, L. (2021, January 10\u201317). Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00061"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"415","DOI":"10.1007\/s41095-022-0274-8","article-title":"Pvt v2: Improved baselines with pyramid vision transformer","volume":"8","author":"Wang","year":"2022","journal-title":"Comput. Vis. Media"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Liu, R., Deng, H., Huang, Y., Shi, X., Lu, L., Sun, W., Wang, X., Dai, J., and Li, H. (2021, January 10\u201317). Fuseformer: Fusing fine-grained information in transformers for video inpainting. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.01378"},{"key":"ref_44","first-page":"7281","article-title":"Hrformer: High-resolution vision transformer for dense predict","volume":"34","author":"Yuan","year":"2021","journal-title":"Adv. Neural Inf. Process. 
Syst."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Zhao, H., Zhang, Y., Liu, S., Shi, J., Loy, C.C., Lin, D., and Jia, J. (2018, January 8\u201314). Psanet: Point-wise spatial attention network for scene parsing. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01240-3_17"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"1169","DOI":"10.1109\/TIP.2020.3042065","article-title":"Cgnet: A light-weight context guided network for semantic segmentation","volume":"30","author":"Wu","year":"2020","journal-title":"IEEE Trans. Image Process."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Wang, P., Da, C., and Yao, C. (2022, January 23\u201327). Multi-granularity prediction for scene text recognition. Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel.","DOI":"10.1007\/978-3-031-19815-1_20"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Ranftl, R., Bochkovskiy, A., and Koltun, V. (2021, January 10\u201317). Vision transformers for dense prediction. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.01196"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Feng, X., Du, H., Fan, H., Duan, Y., and Liu, Y. (2023, January 7\u201314). Seformer: Structure embedding transformer for 3d object detection. Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA.","DOI":"10.1609\/aaai.v37i1.25139"},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"241","DOI":"10.1109\/JRFID.2024.3389088","article-title":"MSSINet: Real-Time Segmentation Based on Multi-Scale Strip Integration","volume":"8","author":"Wang","year":"2024","journal-title":"IEEE J. Radio Freq. Identif."},{"key":"ref_51","unstructured":"Xu, G., Chen, J., Huang, W., Jia, W., Gao, G., and Qi, G.J. (2024). 
SCASeg: Strip Cross-Attention for Efficient Semantic Segmentation. arXiv."}],"container-title":["Symmetry"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-8994\/17\/9\/1428\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T18:37:41Z","timestamp":1760035061000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-8994\/17\/9\/1428"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,2]]},"references-count":51,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2025,9]]}},"alternative-id":["sym17091428"],"URL":"https:\/\/doi.org\/10.3390\/sym17091428","relation":{},"ISSN":["2073-8994"],"issn-type":[{"type":"electronic","value":"2073-8994"}],"subject":[],"published":{"date-parts":[[2025,9,2]]}}}