

Standard Dataset

Diurnal Pulse-Aware Image Feature Extraction

Citation Author(s):
anonymous anonymous
Submitted by:
Zhaoyang Ma
Date Created:
Last updated:
DOI:
10.21227/bas5-2r66

Abstract

This dataset contains the data and code for the article "Diurnal Pulse-Aware Learning with Cross-Attention and Masked Convolution for Tropical Cyclone Rapid Intensification Forecasting". The study uses data from January 2019 to September 2024, comprising 13,336 samples in total, of which 3,641 correspond to rapidly intensifying tropical cyclones (TCs). Each sample contains TC data from four historical time steps at 6-hour intervals and intensity labels for four future time steps, i.e., a 24-hour forecast based on 24 hours of historical data. The historical data include satellite imagery, geopotential height (GPH), sea surface temperature (SST), sea surface salinity (SSS), TC location (latitude and longitude), intensity, and temporal information. The data cover global TC events across six basins: Western Pacific Ocean (WP), Eastern Pacific Ocean (EP), Southern Pacific Ocean (SP), North Indian Ocean (NI), South Indian Ocean (SI), and North Atlantic (NA). The geographical distribution of the TCs used in the study is illustrated in Fig. 2 of the article.
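The study targets rapid intensification (RI), but the criterion used to label the 3,641 RI samples is not stated on this page. The sketch below assumes the widely used threshold of an MSW increase of at least 30 kt over 24 hours, with MSW in knots; `rapid_intensification_flags` and `RI_THRESHOLD_KT` are hypothetical names, not part of the released code.

```python
import numpy as np

# Hypothetical threshold: the common 30 kt / 24 h RI criterion, not confirmed by this dataset.
RI_THRESHOLD_KT = 30.0


def rapid_intensification_flags(current_msw, future_msw, threshold=RI_THRESHOLD_KT):
    """Flag samples whose MSW rises by at least `threshold` over the 24-hour horizon.

    current_msw: shape (N,), MSW at the latest historical time step.
    future_msw:  shape (N, 4), MSW at the four future 6-hourly time steps,
                 assumed ordered so that the last column is t + 24 h.
    """
    current_msw = np.asarray(current_msw, dtype=float)
    future_msw = np.asarray(future_msw, dtype=float)
    delta_24h = future_msw[:, -1] - current_msw  # 24-hour intensity change
    return delta_24h >= threshold


# Toy example with made-up values (not taken from the dataset).
current = np.array([50.0, 60.0, 40.0])
future = np.array([[55.0, 65.0, 75.0, 85.0],
                   [62.0, 63.0, 64.0, 65.0],
                   [45.0, 55.0, 60.0, 70.0]])
print(rapid_intensification_flags(current, future))  # [ True False  True]
```

Applied to this dataset, `current_msw` would plausibly be `x_2d[:, -1, 0]` and `future_msw` would be `wind_y`, assuming MSW is the first column of x_2d.npy and the last historical step is the most recent one.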
TC track and intensity data are sourced from the International Best Track Archive for Climate Stewardship (IBTrACS), which provides global observational records of TCs, including maximum sustained wind (MSW), center coordinates, TC classification, and timestamps, among other fields.
Satellite imagery is obtained from the Gridded Satellite (GridSat-B1) dataset, which provides global geostationary satellite images on a 2000×5143 grid. The images include three channels: infrared (IR, 11 μm), water vapor (WV, 6.7 μm), and visible (VIS, 0.6 μm). This study uses the IR channel because its quality meets Climate Data Record (CDR) standards. Based on the TC center locations provided by IBTrACS, 256×256-pixel image patches are cropped and used as model inputs.
To enable real-time intensity prediction, GPH data are obtained from the High Resolution Forecast (HRES) dataset of the European Centre for Medium-Range Weather Forecasts (ECMWF). ECMWF HRES provides hourly atmospheric forecasts from four daily runs (initialized at 00:00, 06:00, 12:00, and 18:00 UTC). To minimize forecast error, the model uses 6-hour short-range GPH forecasts from HRES as input. GPH fields at six pressure levels (200, 300, 500, 700, 850, and 1000 hPa) are selected. For each TC, a 20°×20° GPH patch centered on the TC is cropped and resized to 256×256×6.
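The GridSat-B1 patch extraction can be sketched as a nearest-grid-point lookup followed by a fixed-size pixel crop. This is an illustration, not the study's preprocessing code: `crop_centered_patch` is a hypothetical helper that takes the grid's latitude and longitude vectors as inputs instead of assuming a grid origin, wraps longitude across the dateline (assuming the grid covers all 360°), and fills rows beyond the grid's latitude range with NaN, since how the study handled that edge case is not stated. At GridSat-B1's nominal 0.07° spacing, a 256-pixel patch spans roughly 17.9° on each side.

```python
import numpy as np


def crop_centered_patch(image, lats, lons, center_lat, center_lon, size=256):
    """Crop a size x size pixel patch centered on the grid point nearest (center_lat, center_lon).

    image: 2-D array indexed as [lat, lon].
    lats, lons: 1-D coordinate vectors of the grid in degrees; lons are assumed
                to cover the full globe so that columns can wrap around.
    """
    lats = np.asarray(lats, dtype=float)
    lons = np.asarray(lons, dtype=float)
    row = int(np.argmin(np.abs(lats - center_lat)))
    # Signed longitude difference in [-180, 180), so 179.9 and -179.9 are neighbors.
    dlon = (lons - center_lon + 180.0) % 360.0 - 180.0
    col = int(np.argmin(np.abs(dlon)))

    half = size // 2
    rows = np.arange(row - half, row - half + size)
    cols = np.arange(col - half, col - half + size) % image.shape[1]  # wrap in longitude

    patch = np.full((size, size), np.nan, dtype=float)
    inside = (rows >= 0) & (rows < image.shape[0])  # rows that exist in the grid
    patch[inside] = image[rows[inside]][:, cols]
    return patch
```

Taking the coordinate vectors as arguments keeps the sketch independent of whether the file stores latitude in ascending or descending order.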
SST and SSS data are sourced from the Global Ocean Forecasting System (GOFS) 3.1 GLBy0.08 dataset provided by the Hybrid Coordinate Ocean Model (HYCOM) Consortium, covering December 2018 to September 2024. As with GPH, 20°×20° SST and SSS patches are cropped around the TC center and resized to 256×256. The code used in this study is also included.
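The 20°×20° crop-and-resize applied to GPH, SST, and SSS can be sketched as a degree-based box selection followed by bilinear resampling. Assumptions: `crop_degree_box` and `resize_bilinear` are hypothetical helpers, the box includes its edges, longitudes may use either the -180..180 or 0..360 convention, and bilinear interpolation stands in for whatever resizing method the study actually used. NaN values (for example, land points in SST and SSS) propagate into neighboring interpolated pixels.

```python
import numpy as np


def crop_degree_box(field, lats, lons, center_lat, center_lon, half_width=10.0):
    """Select the lat/lon box within +/- half_width degrees of the TC center.

    field: 2-D array indexed as [lat, lon]; lats, lons: 1-D coordinate vectors in degrees.
    """
    lats = np.asarray(lats, dtype=float)
    lons = np.asarray(lons, dtype=float)
    lat_idx = np.where(np.abs(lats - center_lat) <= half_width)[0]
    # Signed longitude offset in [-180, 180) works for both -180..180 and 0..360 grids.
    dlon = (lons - center_lon + 180.0) % 360.0 - 180.0
    lon_idx = np.where(np.abs(dlon) <= half_width)[0]
    lon_idx = lon_idx[np.argsort(dlon[lon_idx])]  # keep west-to-east order across the dateline
    return field[np.ix_(lat_idx, lon_idx)]


def resize_bilinear(patch, out_h=256, out_w=256):
    """Corner-aligned bilinear resize of a 2-D array via separable linear interpolation."""
    h, w = patch.shape
    ys = np.linspace(0.0, h - 1, out_h)
    xs = np.linspace(0.0, w - 1, out_w)
    # Interpolate each row along longitude, then each resulting column along latitude.
    rows = np.stack([np.interp(xs, np.arange(w), r) for r in patch])           # (h, out_w)
    return np.stack([np.interp(ys, np.arange(h), c) for c in rows.T], axis=1)  # (out_h, out_w)
```

For GPH, the same two steps would be applied to each of the six pressure levels and the results stacked with `np.stack(levels, axis=-1)` to obtain a 256×256×6 patch.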

Instructions:

This dataset contains four data files:
x.npy has shape (13336, 4, 256, 256, 9): 13,336 samples, each with four time steps of 256×256 images in nine channels. Channel 1 contains the satellite image, channels 2-7 contain GPH, channel 8 contains SST, and channel 9 contains SSS.
x_2d.npy has shape (13336, 4, 6) and contains the scalar (non-image) features of each time step: MSW, lat, lon, time_year, time_month, and time_day.
dp.npy has shape (13336, 4, 256, 256) and contains the diurnal pulse (DP) data, one 256×256 map per time step.
wind_y.npy has shape (13336, 4) and contains the ground-truth labels, i.e., the MSW at the four future time steps.
In addition, intensity.py contains the code used in the study, including the model architecture and the prediction results.
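A minimal loading sketch, assuming the four .npy files sit in one directory under the names given above and that the channel and column orders follow the description (0-based in code, so channel 1 becomes index 0). The constants and helper names are hypothetical, and the GPH level order is an assumption. Memory mapping is used because x.npy holds 13,336 × 4 × 256 × 256 × 9 ≈ 3.1 × 10^10 values, about 126 GB even if stored as float32 (the actual dtype is not stated).

```python
from pathlib import Path

import numpy as np

# 0-based channel indices along the last axis of x.npy.
IR = 0              # channel 1: satellite (IR) image
GPH = slice(1, 7)   # channels 2-7: GPH (assumed ordered 200, 300, 500, 700, 850, 1000 hPa)
SST = 7             # channel 8
SSS = 8             # channel 9

# Column order of x_2d.npy, as listed in the instructions.
X2D_COLUMNS = ("msw", "lat", "lon", "time_year", "time_month", "time_day")


def load_dataset(directory="."):
    """Open the four arrays; the large ones are memory-mapped instead of read into RAM."""
    d = Path(directory)
    x = np.load(d / "x.npy", mmap_mode="r")        # (13336, 4, 256, 256, 9)
    x_2d = np.load(d / "x_2d.npy", mmap_mode="r")  # (13336, 4, 6)
    dp = np.load(d / "dp.npy", mmap_mode="r")      # (13336, 4, 256, 256)
    wind_y = np.load(d / "wind_y.npy")             # (13336, 4) future MSW labels
    return x, x_2d, dp, wind_y


def split_channels(x):
    """Return named views of the nine input channels (works on the full array or a slice)."""
    return {"ir": x[..., IR], "gph": x[..., GPH], "sst": x[..., SST], "sss": x[..., SSS]}


def x2d_column(x_2d, name):
    """Pick one scalar feature by name, e.g. x2d_column(x_2d, "msw") has shape (N, 4)."""
    return x_2d[..., X2D_COLUMNS.index(name)]
```

For example, `split_channels(x[:32])["gph"]` would give a (32, 4, 256, 256, 6) view of GPH fields whose data are read from disk only when accessed.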