This study analyzes cost overrun occurrence (COO) in transportation infrastructure improvement projects in the context of socioeconomic (SE) conditions, using machine learning techniques and geographic information systems. The work is motivated by the scarcity of information about the relationship between SE factors and cost overruns in such projects. We extract socio-geospatial features from multiple data sources and fit a random forest model to discover their associations with COO. The fitted model reveals highly significant features affecting COO, including original amount, original duration, management district, number of lanes, population over 16 years old, commuting behavior, industrial topography, and average temperature, indicating that socioeconomic conditions play an important role in actual project expenses. Our findings will help practitioners and decision-makers better forecast and account for the likely impacts of the socioeconomic conditions surrounding a project in their planning, budgeting, and operation and maintenance. The software for the statistical analysis is available at github.com/jonghyun-yun/dico.
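As a rough illustration of the modeling step described above, the following sketch fits a random forest classifier to synthetic data and ranks features by impurity-based importance. It is not the authors' software from the repository above: the feature names, the synthetic distributions, the rule that generates the overrun labels, and the choice of scikit-learn are all assumptions made for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 500

# Hypothetical feature names and synthetic values (not the study's data).
feature_names = ["original_amount", "original_duration", "num_lanes", "avg_temperature"]
X = np.column_stack([
    rng.lognormal(mean=13.0, sigma=1.0, size=n),  # original amount
    rng.integers(30, 720, size=n),                # original duration in days
    rng.integers(2, 9, size=n),                   # number of lanes
    rng.normal(15.0, 5.0, size=n),                # average temperature
])

# Assumed toy rule: longer projects are more likely to overrun.
# This only gives the model a signal to find; it is not a finding of the study.
y = (X[:, 1] + rng.normal(0.0, 60.0, size=n) > 375).astype(int)

# Fit the forest; random_state fixes the bootstrap and split randomness.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X, y)

# Impurity-based importances, one per feature, normalized to sum to 1.
importances = dict(zip(feature_names, model.feature_importances_))
ranking = sorted(importances, key=importances.get, reverse=True)
for name in ranking:
    print(f"{name}: {importances[name]:.3f}")
```

Because the labels here depend only on the synthetic duration column, that feature should dominate the ranking. With real project data, impurity-based importances can favor high-cardinality features, so a permutation-based check may be a useful complement.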