A new dataset on dengue hospitalizations in Brazil between 1999 and 2021 has been published. Available on Zenodo, it converts data originally reported monthly to a finer temporal granularity, making it more suitable for training AI models for epidemiological forecasting.

Dataset Details

The dataset harmonizes municipal-level time series related to dengue hospitalizations throughout Brazil and disaggregates them to weekly resolution (epidemiological weeks) using an interpolation protocol. This protocol includes a correction phase to preserve monthly totals.
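
The idea of interpolating and then correcting to preserve monthly totals can be illustrated with a minimal sketch. It is not the authors' protocol. It assumes a daily intermediate series, uses linear interpolation for brevity, and groups days into consecutive 7-day bins as a stand-in for epidemiological weeks. The function name and the example values are hypothetical.

```python
import numpy as np

def disaggregate_monthly_to_weekly(monthly_totals, days_in_month):
    """Split monthly totals into 7-day bins while keeping each month's total."""
    monthly_totals = np.asarray(monthly_totals, dtype=float)
    days_in_month = np.asarray(days_in_month, dtype=int)

    # Place each month's mean daily rate at the middle of that month.
    ends = np.cumsum(days_in_month)
    starts = ends - days_in_month
    midpoints = (starts + ends) / 2.0
    daily_rate = monthly_totals / days_in_month

    # Interpolation step (assumption: linear; another interpolator could be swapped in).
    days = np.arange(ends[-1]) + 0.5
    daily = np.clip(np.interp(days, midpoints, daily_rate), 0.0, None)

    # Correction step: rescale each month's days so they sum to the monthly total.
    for start, end, total in zip(starts, ends, monthly_totals):
        segment_sum = daily[start:end].sum()
        if segment_sum > 0:
            daily[start:end] *= total / segment_sum
        else:
            daily[start:end] = total / (end - start)

    # Aggregate into consecutive 7-day bins (the last bin may be shorter).
    week_starts = np.arange(0, len(daily), 7)
    weekly = np.add.reduceat(daily, week_starts)
    return daily, weekly

# Hypothetical example values, not taken from the dataset.
daily, weekly = disaggregate_monthly_to_weekly([31, 56, 93], [31, 28, 31])
```

In this sketch the correction is applied at the daily level, so each month's days sum exactly to its reported total. A 7-day bin that straddles two months mixes contributions from both, which is one reason a real protocol has to define how such weeks are handled.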

The statistical and temporal validity of this disaggregation was assessed using a high-resolution reference dataset from the state of São Paulo (2024), which provides both monthly and epidemiological-week counts. Three strategies were compared: linear interpolation, jittering, and cubic spline. Cubic spline interpolation showed the closest agreement with the reference data and was therefore adopted to generate the weekly series for the 1999-2021 period.
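
A comparison of this kind can be sketched as scoring each candidate weekly series against the reference and ranking by error. The snippet below is only an illustration under assumptions: the helper names are hypothetical, the ranking criterion (RMSE) is chosen for simplicity, and the numbers are synthetic placeholders rather than values from the São Paulo reference.

```python
import numpy as np

def pointwise_scores(reference, candidate):
    """Return MAE, RMSE, and R² of a candidate series against a reference."""
    reference = np.asarray(reference, dtype=float)
    candidate = np.asarray(candidate, dtype=float)
    error = candidate - reference
    ss_res = float(np.sum(error ** 2))
    ss_tot = float(np.sum((reference - reference.mean()) ** 2))
    return {
        "mae": float(np.mean(np.abs(error))),
        "rmse": float(np.sqrt(np.mean(error ** 2))),
        "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
    }

def rank_strategies(reference, candidates):
    """Score every candidate and sort strategy names from lowest to highest RMSE."""
    scores = {name: pointwise_scores(reference, series)
              for name, series in candidates.items()}
    ranking = sorted(scores, key=lambda name: scores[name]["rmse"])
    return ranking, scores

# Synthetic placeholder series (assumption), standing in for weekly counts.
reference = [10, 14, 20, 26, 22, 15]
candidates = {
    "strategy_a": [11, 13, 21, 25, 23, 14],
    "strategy_b": [14, 14, 18, 18, 20, 20],
}
ranking, scores = rank_strategies(reference, candidates)
```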

Explanatory Variables

In addition to the hospitalization time series, the dataset includes a comprehensive set of explanatory variables commonly used in epidemiological and environmental modeling: population density; CH4, CO2, and NO2 emissions; poverty and urbanization indices; maximum temperature; mean monthly precipitation; minimum relative humidity; and municipal latitude and longitude. These variables were temporally disaggregated following the same scheme to ensure multivariate compatibility.

Documentation and Quality

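The paper documents the dataset's provenance, structure, formats, licenses, and limitations. It also reports quality metrics: mean absolute error (MAE), root mean squared error (RMSE), the coefficient of determination (R²), Kullback-Leibler divergence (KL), Jensen-Shannon divergence (JSD), dynamic time warping (DTW), and the Kolmogorov-Smirnov (KS) test. Finally, it provides usage recommendations for multivariate time-series analysis, environmental health studies, and the development of machine learning and deep learning models for outbreak forecasting.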
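
For readers less familiar with the distributional and shape-based metrics in that list, the following sketch shows one common way to compute KL, JSD, and DTW with NumPy. It is an illustration, not the paper's evaluation code. It assumes that each series is normalized to sum to one before computing divergences and that DTW uses absolute difference as the local cost. The function names are hypothetical.

```python
import numpy as np

def to_distribution(series, eps=1e-12):
    """Normalize a non-negative series so it sums to one (eps avoids log of zero)."""
    values = np.asarray(series, dtype=float) + eps
    return values / values.sum()

def kl_divergence(p, q):
    """Kullback-Leibler divergence between two discrete distributions, in nats."""
    return float(np.sum(p * np.log(p / q)))

def js_divergence(series_a, series_b):
    """Jensen-Shannon divergence: symmetrized KL against the mixture distribution."""
    p = to_distribution(series_a)
    q = to_distribution(series_b)
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def dtw_distance(series_a, series_b):
    """Classic dynamic-programming DTW with absolute difference as local cost."""
    a = np.asarray(series_a, dtype=float)
    b = np.asarray(series_b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i, j] = step + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])
```

JSD is symmetric and, with natural logarithms, bounded above by ln 2, which makes it easier to compare across series than raw KL. DTW tolerates small shifts in timing, so it complements pointwise errors such as MAE and RMSE.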