Predicting flood insurance claims with hydrologic and socioeconomic demographics via machine learning: Exploring the roles of topography, minority populations, and political dissimilarity


Current research on flooding risk often focuses on understanding hazards, de-emphasizing the complex pathways of exposure and vulnerability. We investigated the use of both hydrologic and social demographic data for flood exposure mapping with Random Forest (RF) regression and classification algorithms trained to predict both parcel- and tract-level flood insurance claims within New York State, US. Topographic characteristics best described flood claim frequency, but RF prediction skill improved at both spatial scales when socioeconomic data were incorporated. Substantial improvements occurred at the tract level when the percentage of minority residents, housing stock value and age, and the political dissimilarity index of voting precincts were used to predict insurance claims. Census tracts with higher numbers of claims and greater densities of low-lying tax parcels tended to have low proportions of minority residents, newer houses, and less political similarity to state-level government. We compared this data-driven approach with a physically based pluvial flood routing model for predicting the spatial extents of flooding claims in two nearby catchments of differing land use. The floodplain we defined with physically based modeling agreed well with existing federal flood insurance rate maps, but underestimated the spatial extents of historical claim-generating areas. In contrast, RF classification incorporating hydrologic and socioeconomic demographic data likely overestimated the flood-exposed areas. Our research indicates that quantitative incorporation of social data can improve flooding exposure estimates.
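The under- and overestimation of flood extents described above can be made concrete with simple contingency scores on gridded flood masks. The sketch below is illustrative only: the abstract does not state which agreement metrics the study used, so the function name `extent_scores`, the choice of hit rate, frequency bias, and false alarm ratio, and the toy masks are all assumptions, not the authors' method or results.

```python
from typing import Dict, Sequence


def extent_scores(predicted: Sequence[int], observed: Sequence[int]) -> Dict[str, float]:
    """Compare a predicted flood mask with observed claim-generating cells.

    Both inputs are flat sequences of 0/1 values, one per grid cell.
    (Hypothetical helper; the metrics are standard contingency-table scores.)
    """
    if len(predicted) != len(observed):
        raise ValueError("masks must cover the same cells")

    pairs = list(zip(predicted, observed))
    hits = sum(1 for p, o in pairs if p and o)              # flooded in both
    misses = sum(1 for p, o in pairs if not p and o)        # observed only
    false_alarms = sum(1 for p, o in pairs if p and not o)  # predicted only

    n_observed = hits + misses
    n_predicted = hits + false_alarms
    if n_observed == 0:
        raise ValueError("observed mask contains no flooded cells")

    return {
        # Share of observed claim cells captured by the prediction.
        "hit_rate": hits / n_observed,
        # Below 1 the predicted extent is too small, above 1 it is too large.
        "bias": n_predicted / n_observed,
        # Share of predicted cells with no observed claim.
        "false_alarm_ratio": false_alarms / n_predicted if n_predicted else 0.0,
    }


# Toy masks (made-up values, not study data).
observed_claims = [1, 1, 1, 1, 0, 0, 0, 0]
physical_model = [1, 1, 0, 0, 0, 0, 0, 0]  # extent too small
rf_classifier = [1, 1, 1, 1, 1, 1, 0, 0]   # extent too large

physical_scores = extent_scores(physical_model, observed_claims)
rf_scores = extent_scores(rf_classifier, observed_claims)
print(physical_scores)
print(rf_scores)
```

In this toy case the first mask has a bias below 1 (underestimation) and the second a bias above 1 (overestimation), mirroring the qualitative contrast the abstract draws between the two approaches.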

Publication Type
Journal Article
Brian Buchanan
Christian Guzman
Rebecca Elliott
Eric White, Oregon State University (US Forest Service)
Brian Rahm
Journal of Environmental Management