Boosted Poisson regression trees: a guide to the BT package in R

Published online by Cambridge University Press:  15 January 2024

Gireg Willame*
Affiliation:
Actuarial Expert Consultant, Detralytics, Brussels, Belgium
Julien Trufin
Affiliation:
Department of Mathematics, Université Libre de Bruxelles (ULB), Brussels, Belgium
Michel Denuit
Affiliation:
Institute of Statistics, Biostatistics and Actuarial Science, UCLouvain, Louvain-la-Neuve, Belgium
* Corresponding author: Gireg Willame; Email: g.willame@detralytics.eu

Abstract

Thanks to its outstanding performance, boosting has rapidly gained wide acceptance among actuaries. Wüthrich and Buser (Data Analytics for Non-Life Insurance Pricing. Lecture notes available at SSRN. http://dx.doi.org/10.2139/ssrn.2870308, 2019) established that boosting can be conducted directly on the response under the Poisson deviance loss function and log-link, by adapting the weights at each step. This is particularly useful for analyzing low counts (typically, numbers of reported claims at policy level in personal lines). Huyghe et al. (Boosting cost-complexity pruned trees on Tweedie responses: The ABT machine for insurance ratemaking. Scandinavian Actuarial Journal. https://doi.org/10.1080/03461238.2023.2258135, 2022) adopted this approach to propose a new boosting machine with cost-complexity pruned trees. In this machine, the trees included in the score progressively reduce to the root-node tree, in an adaptive way. This paper reviews these results and presents the new BT package in R contributed by Willame (Boosting Trees Algorithm. https://cran.r-project.org/package=BT; https://github.com/GiregWillame/BT, 2022), which is designed to implement this approach for insurance studies. A numerical illustration demonstrates the relevance of the new tool for insurance pricing.
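The weight-adaptation idea mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the BT package's code: the function names `poisson_deviance`, `fit_stump` and `boost`, the toy data, the zero initial score, and the restriction to one-split trees on a single feature are all assumptions made for illustration. Under the log-link, the mean after adding a tree T is e_i * exp(score(x_i) + T(x_i)), so the current score can be absorbed into a working weight e_i * exp(score(x_i)) and each new tree can be fitted to the observed counts themselves.

```python
import math

def poisson_deviance(y, mu):
    """Poisson deviance 2 * sum(y * log(y / mu) - (y - mu)), with 0 * log(0) = 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        total += mi - yi
        if yi > 0:
            total += yi * math.log(yi / mi)
    return 2.0 * total

def fit_stump(x, y, w):
    """Fit a one-split Poisson tree to the response y with working weights w.

    Returns (threshold, log_rate_left, log_rate_right)."""
    best = None
    for t in sorted(set(x))[:-1]:
        sums = {True: [0.0, 0.0], False: [0.0, 0.0]}  # side -> [sum y, sum w]
        for xi, yi, wi in zip(x, y, w):
            sums[xi <= t][0] += yi
            sums[xi <= t][1] += wi
        # Leaf estimate = observed counts / working weight (floored to keep log finite).
        rate = {side: max(sy, 1e-8) / sw for side, (sy, sw) in sums.items()}
        dev = poisson_deviance(y, [wi * rate[xi <= t] for xi, wi in zip(x, w)])
        if best is None or dev < best[0]:
            best = (dev, t, math.log(rate[True]), math.log(rate[False]))
    return best[1:]

def boost(x, y, exposure, n_trees=5, shrinkage=0.5):
    """Boost stumps on the response; the score is carried by the working weights."""
    w = list(exposure)  # working weight = exposure * exp(score), score starts at 0
    history = [poisson_deviance(y, w)]
    for _ in range(n_trees):
        t, left, right = fit_stump(x, y, w)
        # Adapt the weights: multiply by exp(shrunken tree prediction).
        w = [wi * math.exp(shrinkage * (left if xi <= t else right))
             for xi, wi in zip(x, w)]
        history.append(poisson_deviance(y, w))
    return w, history

# Hypothetical toy portfolio: one feature, claim counts, exposures.
x = [1, 2, 3, 4, 5, 6]
y = [1, 0, 0, 1, 2, 3]
exposure = [1.0, 0.5, 1.0, 1.0, 1.0, 0.8]
weights, history = boost(x, y, exposure)
print([round(d, 4) for d in history])
```

After the last iteration the working weights are the fitted expected counts, and the printed deviance sequence never increases, because each leaf update moves part of the way toward the minimizer of a convex function.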

Type
Actuarial Software
Copyright
© The Author(s), 2024. Published by Cambridge University Press on behalf of Institute and Faculty of Actuaries

References

Ciatto, N., Denuit, M., Trufin, J. & Verelst, H. (2023). Does autocalibration improve goodness of lift? European Actuarial Journal, 13, 479–486. https://doi.org/10.1007/s13385-022-00330-4
Denuit, M., Hainaut, D. & Trufin, J. (2020). Effective statistical learning methods for actuaries II: Tree-based methods and extensions. Springer actuarial lecture notes series. Springer. https://doi.org/10.1007/978-3-030-57556-4
Denuit, M., Sznajder, D. & Trufin, J. (2019). Model selection based on Lorenz and concentration curves, Gini indices and convex order. Insurance: Mathematics and Economics, 89, 128–139.
Dutang, C. & Charpentier, A. (2020). CASDatasets: Insurance datasets. https://github.com/dutangc/CASdatasets
Friedman, J. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29, 1189–1232. https://doi.org/10.1214/aos/1013203451
Guelman, L. (2012). Gradient boosting trees for auto insurance loss cost modeling and prediction. Expert Systems with Applications, 39, 3659–3667. https://doi.org/10.1016/j.eswa.2011.09.058
Hainaut, D., Trufin, J. & Denuit, M. (2022). Response versus gradient boosting trees, GLMs and neural networks under Tweedie loss and log-link. Scandinavian Actuarial Journal, 2022, 841–866.
Hastie, T., Tibshirani, R. & Friedman, J. (2008). The elements of statistical learning (2nd ed.). Springer.
Henckaerts, R., Cote, M.-P., Antonio, K. & Verbelen, R. (2021). Boosting insights in insurance tariff plans with tree-based machine learning methods. North American Actuarial Journal, 25, 255–285. https://doi.org/10.1080/10920277.2020.1745656
Huyghe, J., Trufin, J. & Denuit, M. (2022). Boosting cost-complexity pruned trees on Tweedie responses: The ABT machine for insurance ratemaking. Scandinavian Actuarial Journal. https://doi.org/10.1080/03461238.2023.2258135
Lee, S. C. & Lin, S. (2018). Delta boosting machine with application to general insurance. North American Actuarial Journal, 22, 405–425. https://doi.org/10.1080/10920277.2018.1431131
Liu, Y., Wang, B. & Lv, S. (2014). Using multi-class AdaBoost tree for prediction frequency of auto insurance. Journal of Applied Finance and Banking, 4, 45–53.
Pesantez-Narvaez, J., Guillen, M. & Alcaniz, M. (2019). Predicting motor insurance claims using telematics data - XGBoost versus logistic regression. Risks, 7, 1–16. https://doi.org/10.3390/risks7020070
Therneau, T. M. & Atkinson, B. (2018). rpart: Recursive partitioning and regression trees. https://cran.r-project.org/package=rpart
Tevert, D. (2013). Exploring model lift: Is your model worth implementing. Actuarial Review, 40, 10–13.
Willame, G. (2022). BT: (Adaptive) boosting trees algorithm. https://cran.r-project.org/package=BT and https://github.com/GiregWillame/BT
Wüthrich, M. V. (2023). Model selection with Gini indices under auto-calibration. European Actuarial Journal, 13, 469–477. https://doi.org/10.1007/s13385-022-00339-9
Wüthrich, M. V. & Buser, C. (2019). Data analytics for non-life insurance pricing. Lecture notes available at SSRN. http://dx.doi.org/10.2139/ssrn.2870308
Yang, Y., Qian, W. & Zou, H. (2018). Insurance premium prediction via gradient tree-boosted Tweedie compound Poisson models. Journal of Business & Economic Statistics, 36, 456–470. https://doi.org/10.1080/07350015.2016.1200981