MACHINE LEARNING CLASSIFICATION OF THE EPIDEMIOLOGIC STAGES OF INFLAMMATORY BOWEL DISEASE ACROSS GEOGRAPHY AND TIME

Author:

Hracs Lindsay, Windsor Joseph, Gorospe Julia, Buie Michael, Quan Joshua, Caplan Lea, Markovinovic Ante, Cummings Michael, Goddard Quinn, Williamson Tyler, Abbey Yvonne, Abreu Maria, Ali Raja, Abdullah Murdani, Altuwaijri Mansour, Ahuja Vineet, Balderramo Domingo, Banerjee Rupa, Benchimol Eric, Bernstein Charles, Brunet-Mas Eduard, Burisch Johan, Chong Vui Heng, Dotan Iris, Dutta Usha, El Ouali Sara, Forbes Angela, Forss Anders, Gearry Richard, Dao Viet Hang, Hartono Juanda, Hilmi Ida, Julian-Banos Fabian, Kaibullayeva Jamilya, Kelly Paul, Kotze Paulo, Lakatos Peter, Lees Charlie, Limsrivilai Julajak, Loftus Edward, Ludvigsson Jonas, Mak Joyce, Ng Ka Kei, Olen Ola, Panaccione Remo, Paudel Mukesh, Quaresma Abel, Rubin David, Simadibrata Marcellus, Sun Yang, Suzuki Hidekazu, Toro Martin, Turner Dan, Vergara Beatriz Iade, Wei Shu-Chen, Yamamoto-Furusho Jesus, Yang Kuk-Kyun, Ng Siew, Coward Stephanie, Kaplan Gilaad

Abstract

BACKGROUND Epidemiologic stages of inflammatory bowel disease (IBD) have been proposed: 1. Emergence (low incidence and prevalence); 2. Acceleration in Incidence (rapidly rising incidence, low prevalence); and 3. Compounding Prevalence (stabilizing incidence, rapidly rising prevalence). To date, these stages have been theoretical, without quantified definitions of incidence and prevalence.

AIM To use machine learning to determine the incidence and prevalence ranges corresponding to the epidemiologic stages and to provide stage classifications across time for global regions.

METHODS We built a supervised random forest classifier in R to determine epidemiologic stages of IBD from population-based studies (n=340), a subset derived from a systematic review on the incidence and prevalence of IBD. The classifier was trained on a labelled data set comprising rates of incidence and prevalence of Crohn’s disease (CD) and ulcerative colitis (UC) extracted from the systematic review, and predicted a classification of stage 1, stage 2, or stage 3 for each region, stratified by decade (1960–2019). Model accuracy was measured using a blind validation data set. The validated model was then used to predict stage classifications for regions in the data set. Interquartile ranges for incidence and prevalence of CD and UC were calculated from the random forest output, and the distributions were compared using negative binomial regression.

RESULTS The random forest’s classification accuracy on the blinded validation data was 93.7% (95% CI: 90.6, 96.1), indicating appropriate model fit and performance. The incidence and prevalence of CD and UC differed significantly between all stages (p<0.001). The clear distinction across stages defines the incidence and prevalence ranges (25th–75th percentile, per 100,000) for IBD as: CD incidence 0.0–0.3, UC incidence 0.2–0.7, CD prevalence 0.3–2.2, and UC prevalence 1.7–8.1 for stage 1; CD incidence 1.0–4.4, UC incidence 2.3–6.3, CD prevalence 9.0–33.9, and UC prevalence 22.8–73.3 for stage 2; and CD incidence 6.6–14.0, UC incidence 10.1–18.1, CD prevalence 163.2–274.7, and UC prevalence 189.1–323.2 for stage 3 (Figure 1). A decade-by-decade analysis shows global regions transitioning across the epidemiologic stages (Figure 2). By the 2010s, North America, Scandinavia, Western Europe, Australia, and New Zealand were in stage 3. Most regions in Asia and Latin America were in stage 1 in the last half of the 20th century, with many transitioning to stage 2 in the 2010s.

DISCUSSION Temporal incidence and prevalence data show that regions transition across epidemiologic stages. Numerical definitions of the epidemiologic stages can be used to anticipate growth in the burden of IBD by providing estimates of the incidence and prevalence rates a region can expect as it transitions between epidemiologic stages in the future.

Figure 1 Coalescing ranges for incidence (panel A) and prevalence (panel B) by Crohn’s disease and ulcerative colitis at epidemiologic stage 1, stage 2, and stage 3. Data were categorized by data type (incidence or prevalence), disease type (Crohn’s disease or ulcerative colitis), and epidemiologic stage, as per results from the random forest classifier. The 25th and 75th percentiles were calculated using the rates across all regions included in the analysis for all available time points for each box group.

Figure 2 Global maps depicting epidemiologic stages of IBD evolution from 1960 to 2019 broken down by decade, as predicted by the random forest model. Panel A contains stage classifications from 1960 to 1969; panel B contains stage classifications from 1970 to 1979; panel C contains stage classifications from 1980 to 1989; panel D contains stage classifications from 1990 to 1999; panel E contains stage classifications from 2000 to 2009; and panel F contains stage classifications from 2010 to 2019.
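
The interquartile ranges reported in the Results can be written down as a small lookup table, which may help when checking a single rate against the stage definitions. The following is a minimal Python sketch, not the authors' random forest model (which was built in R): the names `STAGE_RANGES` and `matching_stages` are hypothetical, and the function only reports which reported 25th–75th percentile range, if any, contains a given rate. Rates that fall between the reported ranges match no stage, because the ranges are interquartile summaries rather than classification thresholds.

```python
from typing import Dict, List, Tuple

# Interquartile ranges (25th-75th percentile, per 100,000) copied from the
# abstract's Results. The dictionary layout and key names are illustrative.
STAGE_RANGES: Dict[int, Dict[str, Tuple[float, float]]] = {
    1: {
        "cd_incidence": (0.0, 0.3),
        "uc_incidence": (0.2, 0.7),
        "cd_prevalence": (0.3, 2.2),
        "uc_prevalence": (1.7, 8.1),
    },
    2: {
        "cd_incidence": (1.0, 4.4),
        "uc_incidence": (2.3, 6.3),
        "cd_prevalence": (9.0, 33.9),
        "uc_prevalence": (22.8, 73.3),
    },
    3: {
        "cd_incidence": (6.6, 14.0),
        "uc_incidence": (10.1, 18.1),
        "cd_prevalence": (163.2, 274.7),
        "uc_prevalence": (189.1, 323.2),
    },
}


def matching_stages(measure: str, rate: float) -> List[int]:
    """Return the stages whose reported interquartile range contains `rate`.

    This is a plain range lookup (an assumption made for illustration),
    not a substitute for the trained classifier. The result is empty when
    the rate lies outside every reported range.
    """
    return [
        stage
        for stage, ranges in STAGE_RANGES.items()
        if ranges[measure][0] <= rate <= ranges[measure][1]
    ]


if __name__ == "__main__":
    # A CD incidence of 2.0 per 100,000 lies inside the stage 2 range only.
    print(matching_stages("cd_incidence", 2.0))
    # A CD incidence of 0.5 per 100,000 lies between the stage 1 and 2 ranges.
    print(matching_stages("cd_incidence", 0.5))
```

Because the reported ranges for each measure do not overlap across stages, a rate matches at most one stage; the gaps between ranges are where a model that combines all four measures, such as the classifier described in the Methods, would be needed.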

Publisher

Oxford University Press (OUP)

Subject

Gastroenterology, Immunology and Allergy
