On March 12, 2026, Google Research announced Groundsource, a large-scale methodology that converts global news coverage into structured disaster data. The team also introduced an open urban flash flood dataset with 2.6 million records from locations worldwide. The announcement appeared on the Google Research blog.
Groundsource: turning news reports into data with Gemini
Groundsource was developed by software engineers Oleg Zlydenko and Rotem Mayo and research scientist Deborah Cohen at Google. Google describes Groundsource as a framework that extracts verified ground truth from unstructured text about natural disasters. Its first application produces a global urban flash flood dataset of 2.6 million events from more than 150 countries.
Groundsource analyzes news reports where flooding is the primary subject, using the Google Read Aloud user-agent to isolate article text. News coverage in 80 languages is translated into English using the Cloud Translation API. The Gemini large language model then classifies, dates, and locates flood events reported from 2000 onward under a structured schema.
Gemini filters out articles focused on forecasts, policy discussions, or general risk, keeping only reports of actual or ongoing floods. It resolves relative time phrases against publication dates to assign specific event dates. Locations are mapped to standardized geographic polygons using Google Maps Platform grounding tools.
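The date-resolution step can be illustrated with a small rule-based sketch. This is not Google's implementation: Groundsource uses Gemini for this task, and the function name `resolve_relative_date` and the handful of phrases it recognizes are assumptions chosen only to show the idea of anchoring a relative phrase to an article's publication date.

```python
import re
from datetime import date, timedelta
from typing import Optional

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def resolve_relative_date(phrase: str, published: date) -> Optional[date]:
    """Map a relative time phrase to a calendar date (hypothetical helper).

    Returns None when the phrase is not one of the few patterns handled here.
    """
    text = phrase.strip().lower()

    # Phrases that refer to the publication day itself.
    if text in ("today", "this morning", "tonight"):
        return published

    # Phrases that refer to the day before publication.
    if text in ("yesterday", "last night"):
        return published - timedelta(days=1)

    # "3 days ago", "1 day ago"
    match = re.fullmatch(r"(\d+) days? ago", text)
    if match:
        return published - timedelta(days=int(match.group(1)))

    # "last monday": the most recent such weekday strictly before publication.
    if text.startswith("last ") and text[5:] in WEEKDAYS:
        target = WEEKDAYS.index(text[5:])
        days_back = (published.weekday() - target - 1) % 7 + 1
        return published - timedelta(days=days_back)

    # Unknown phrasing: leave unresolved rather than guess.
    return None


if __name__ == "__main__":
    published = date(2026, 3, 12)
    for phrase in ("yesterday", "last Monday", "3 days ago", "sometime soon"):
        print(phrase, "->", resolve_relative_date(phrase, published))
```

Returning `None` for unrecognized phrases keeps unresolved dates visible instead of silently assigning the publication date, which matters when event timing is later checked for accuracy.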
Key details and dataset metrics
Google has released the Groundsource urban flash flood data as an open-access dataset on the Zenodo repository. The methodology and validation results are documented in a research paper on the Groundsource framework. Key quantitative details from the announcement include:
- 2.6 million historical urban flash flood events extracted worldwide.
- Coverage spanning more than 150 countries between 2000 and early 2026.
- Manual review found 60 percent of extracted events accurate in both location and timing.
- Manual review found 82 percent accurate enough for practical analysis, such as correct district or event day.
- Groundsource captured between 85 and 100 percent of severe flood events recorded by the Global Disaster Alert and Coordination System (GDACS) from 2020 to 2026.
- The GDACS database holds about 10,000 entries across hazard types.
The full dataset is available for researchers, emergency managers, and other practitioners who need large-sample flash flood records.
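A coverage figure such as the share of GDACS events captured is a recall measurement: the fraction of reference events that have at least one matching extracted event. The sketch below shows one way such a number could be computed. The announcement does not describe the matching procedure, so the `Event` type, the point-based locations (Groundsource itself maps events to polygons), and the 50 km and one-day tolerances are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from math import asin, cos, radians, sin, sqrt
from typing import Sequence


@dataclass(frozen=True)
class Event:
    """Hypothetical minimal event record: a day and a point location."""
    day: date
    lat: float
    lon: float


def haversine_km(a: Event, b: Event) -> float:
    """Great-circle distance between two events in kilometres."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = (sin(dlat / 2) ** 2
         + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(h))


def recall(reference: Sequence[Event], extracted: Sequence[Event],
           max_km: float = 50.0, max_days: int = 1) -> float:
    """Fraction of reference events matched by at least one extracted event.

    The distance and day tolerances are illustrative assumptions.
    """
    if not reference:
        raise ValueError("reference catalogue is empty")
    hits = sum(
        1 for ref in reference
        if any(abs((ref.day - ev.day).days) <= max_days
               and haversine_km(ref, ev) <= max_km
               for ev in extracted)
    )
    return hits / len(reference)


if __name__ == "__main__":
    # Toy records for demonstration only, not real catalogue entries.
    reference = [Event(date(2021, 7, 18), 19.07, 72.87),
                 Event(date(2020, 1, 1), -6.20, 106.80)]
    extracted = [Event(date(2021, 7, 19), 19.10, 72.90)]
    print(recall(reference, extracted))
```

Looser tolerances raise the measured recall and tighter ones lower it, so any reported percentage is only meaningful alongside the matching criteria used.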
Background context and sources
Existing disaster records include the satellite-based Global Flood Database and the archives of the Dartmouth Flood Observatory. These resources highlight inundation footprints but face constraints such as cloud cover, satellite revisit intervals, and event duration. The Global Disaster Alert and Coordination System (GDACS) focuses on high-impact events and, according to Google, contains approximately 10,000 entries.
Google positions Groundsource as a way to address data scarcity, especially for short-lived, localized flash floods. At 2.6 million events, the new dataset records far more events than prior flood monitoring systems. A map shared by Google shows dense clusters of Groundsource events in many regions, with GDACS floods highlighted for comparison.
Groundsource feeds a system designed to provide near-global urban flash flood forecasts, described in a separate paper. According to Google, this system can predict flash floods up to 24 hours before they occur and is being deployed in Flood Hub. A related Google Research post on modeling and prediction of flash floods in urban areas details how Groundsource data underpins these AI-driven forecasts.
Google states that these forecasts align with its Flood Forecasting Initiative. The team reports ongoing work to refine the model, extend coverage to rural regions, and incorporate additional data sources. Google also plans to apply the Groundsource methodology to other hazard types, including droughts, landslides, and avalanches. Groundsource is part of the Google Earth AI portfolio of geospatial models and datasets.






