Binary logistic regression model applied to data on accidents occurred on federal highways in Brazil

DOI:

https://doi.org/10.33448/rsd-v11i15.36833

Keywords:

Supervised analysis; Machine learning; Odds ratio; Lethality of accidents; Highway accidents.

Abstract

Accidents on federal highways in Brazil have social and economic impacts on the country. Data from the Federal Highway Police show that thousands of people die in these accidents every year. This paper examines the factors that influence the probability of death given that an accident has occurred. A binary logistic regression model was estimated on 2021 data, with death in an accident as the event of interest. After variable selection, the final model was obtained and then validated with 2022 data. The accuracy of the model was around 70% for both the 2021 and the 2022 data. Odds ratios were then calculated for selected categories to quantify how much each one increases accident lethality relative to its reference category. For example, in a crash, the odds of death for a pedestrian are 15.6 times those of the driver, and for a cyclist 5.3 times. Although most accidents have a human cause, some results point to the need for public policies that can help reduce these tragedies. To explain the model, a dashboard was created in which the user obtains the probability of death by selecting specific characteristics of the accident and of those involved.
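As an illustration of how the odds ratios in the abstract relate to a predicted probability of death, the following is a minimal sketch in Python, not the authors' implementation. The coefficients are simply the logarithms of the two odds ratios quoted above (15.6 and 5.3). The intercept, the variable names and the function names are hypothetical, chosen only to make the example run.

```python
import math


def death_probability(intercept, coefficients, features):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum_j b_j * x_j)))."""
    linear = intercept + sum(
        coefficients[name] * value for name, value in features.items()
    )
    return 1.0 / (1.0 + math.exp(-linear))


def odds(p):
    """Convert a probability into odds p / (1 - p)."""
    return p / (1.0 - p)


# In a logistic model, the odds ratio of a category against the reference
# category equals exp(coefficient), so coefficient = log(odds ratio).
coefficients = {
    "pedestrian": math.log(15.6),  # odds ratio quoted in the abstract
    "cyclist": math.log(5.3),      # odds ratio quoted in the abstract
}

# ASSUMPTION: hypothetical baseline log-odds for the reference category
# (driver); the abstract does not report the fitted intercept.
intercept = -4.0

p_driver = death_probability(intercept, coefficients, {"pedestrian": 0, "cyclist": 0})
p_pedestrian = death_probability(intercept, coefficients, {"pedestrian": 1, "cyclist": 0})

# Recover the odds ratio from the two predicted probabilities.
odds_ratio = odds(p_pedestrian) / odds(p_driver)
# The ratio of probabilities (relative risk) is a different, smaller quantity.
risk_ratio = p_pedestrian / p_driver

print(f"P(death | driver)     = {p_driver:.4f}")
print(f"P(death | pedestrian) = {p_pedestrian:.4f}")
print(f"odds ratio = {odds_ratio:.1f}, risk ratio = {risk_ratio:.1f}")
```

Under this hypothetical intercept the odds ratio is recovered exactly, while the ratio of the two probabilities is smaller. This is why an odds ratio is best read as a statement about odds and not directly about probabilities.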

Author Biography

Yuri Machado de Souza, Universidade de São Paulo

Master in Applied Economics

Published

12/11/2022

How to Cite

SANTOS, D. F. dos; SOUZA, Y. M. de. Binary logistic regression model applied to data on accidents occurred on federal highways in Brazil. Research, Society and Development, [S. l.], v. 11, n. 15, p. e120111536833, 2022. DOI: 10.33448/rsd-v11i15.36833. Available at: https://www.rsdjournal.org/index.php/rsd/article/view/36833. Accessed on: 25 Apr. 2024.

Section

Exact and Earth Sciences