Google has launched a flash flood forecasting model built on historical flood data that a large language model extracted from news articles, addressing a significant gap in weather forecasting.
Flash floods kill more than 5,000 people annually but are difficult to predict because they are short-lived and localized. Traditional monitoring methods often miss these events, which leaves deep learning models with little historical data to learn from.
To develop the model, researchers used Gemini to process 5 million news articles and isolate reports of 2.6 million distinct floods. This process created a geo-tagged time series dataset called “Groundsource.” Gila Loike, a Google Research product manager, stated this is the first time the company has used language models for this type of work. The dataset was shared publicly on Thursday morning.
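As a rough illustration of how article-level reports might be collapsed into a geo-tagged time series, the sketch below groups extracted records by a coarse location cell and date. The record fields, the 0.25-degree cell size, and the function name are all hypothetical; the article does not describe Groundsource's actual schema or how Google decided which reports refer to the same flood.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
import math


@dataclass(frozen=True)
class FloodReport:
    """One flood mention extracted from a news article (hypothetical schema)."""
    article_id: str
    lat: float
    lon: float
    day: date


def group_reports(reports, cell_deg=0.25):
    """Collapse reports that share a grid cell and date into one event.

    Returns a dict mapping (cell_row, cell_col, day) to the sorted list of
    article ids that mention that event. The cell size is an assumed value.
    """
    events = defaultdict(set)
    for r in reports:
        key = (math.floor(r.lat / cell_deg), math.floor(r.lon / cell_deg), r.day)
        events[key].add(r.article_id)
    return {key: sorted(ids) for key, ids in events.items()}
```

Under these assumptions, two articles describing flooding in the same cell on the same day count as one event, while a report from the following day becomes a separate entry in the time series.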
Using Groundsource as its baseline record of past floods, Google trained a Long Short-Term Memory (LSTM) neural network that ingests global weather forecasts and outputs flash flood probabilities. The forecasting system now highlights risks for urban areas in 150 countries on the Flood Hub platform and shares data with emergency response agencies.
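For readers unfamiliar with LSTMs, the sketch below shows the general shape of such a model: a recurrent cell reads a sequence of forecast features step by step, and a sigmoid readout turns the final hidden state into a probability. It is a minimal, untrained toy in plain Python, not Google's architecture; the feature layout, layer sizes, and function names are assumptions for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def random_params(input_size, hidden_size, seed=0):
    """Build small random weights; a real model would learn these from data."""
    rng = random.Random(seed)
    width = input_size + hidden_size
    params = {}
    for gate in "ifog":  # input, forget, output, candidate
        params["W_" + gate] = [[rng.uniform(-0.5, 0.5) for _ in range(width)]
                               for _ in range(hidden_size)]
        params["b_" + gate] = [0.0] * hidden_size
    params["w_out"] = [rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
    params["b_out"] = 0.0
    return params


def gate_values(params, gate, z, activation):
    """Apply one gate's affine map and activation to the concatenated vector z."""
    return [activation(sum(w * v for w, v in zip(row, z)) + b)
            for row, b in zip(params["W_" + gate], params["b_" + gate])]


def lstm_flood_probability(sequence, params):
    """Run a single-layer LSTM over forecast steps and return a value in (0, 1)."""
    hidden_size = len(params["b_i"])
    h = [0.0] * hidden_size  # hidden state
    c = [0.0] * hidden_size  # cell state
    for x in sequence:
        z = list(x) + h  # concatenate current input with previous hidden state
        i = gate_values(params, "i", z, sigmoid)
        f = gate_values(params, "f", z, sigmoid)
        o = gate_values(params, "o", z, sigmoid)
        g = gate_values(params, "g", z, math.tanh)
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    logit = sum(w * v for w, v in zip(params["w_out"], h)) + params["b_out"]
    return sigmoid(logit)
```

With random weights the output carries no meaning. In a trained system, the weights would be fitted so that the probability is high for forecast sequences that preceded recorded floods.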
The model is already in use. António José Beleza, an emergency response official at the Southern African Development Community, said it helped his organization respond to floods more quickly. The system has limits, however: it operates at a low resolution, identifying risk across 20-square-kilometer areas, and it is less precise than the U.S. National Weather Service’s system because it does not incorporate local radar data.
The project specifically targets regions without expensive weather-sensing infrastructure or extensive meteorological records. Juliet Rothenberg, a program manager on Google’s Resilience team, said the dataset helps rebalance the map by enabling extrapolation to areas with less information. The team hopes to apply this approach to other phenomena, such as heat waves and mudslides.
Outside experts are taking note. Marshall Moutenot, CEO of Upstream Tech and a co-founder of dynamical.org, a group curating machine learning-ready weather data, said Google’s work contributes to a growing effort to address data scarcity in geophysics. He stated that while there is too much Earth data overall, there is not enough ground-truth data for evaluation, which makes Google’s method a creative way to source it.




