Transforming Thyroid Cancer Diagnosis and Staging Information from Unstructured Reports to the Observational Medical Outcome Partnership Common Data Model

Author:

Yoo Sooyoung1, Yoon Eunsil1, Boo Dachung1, Kim Borham1, Kim Seok1, Paeng Jin Chul2, Yoo Ie Ryung3, Choi In Young45, Kim Kwangsoo6, Ryoo Hyun Gee78, Lee Sun Jung45, Song Eunhye9, Joo Young-Hwan10, Kim Junmo11, Lee Ho-Young12

Affiliation:

1. Office of eHealth Research and Business, Healthcare Innovation Park, Seoul National University Bundang Hospital, Seongnam, South Korea

2. Department of Nuclear Medicine, Seoul National University, College of Medicine, Seoul, South Korea

3. Division of Nuclear Medicine, Department of Radiology, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, South Korea

4. Department of Medical Informatics, The Catholic University of Korea, College of Medicine, Seoul, South Korea

5. Department of Biomedicine and Health Sciences, The Catholic University of Korea, College of Medicine, Seoul, South Korea

6. Transdisciplinary Department of Medicine and Advanced Technology, Seoul National University Hospital, Seoul, South Korea

7. Department of Nuclear Medicine, Seoul National University Hospital, Seoul, South Korea

8. Department of Nuclear Medicine, Seoul National University Bundang Hospital, Seongnam, South Korea

9. Department of Data Science Research, Innovative Medical Technology Research Institute, Seoul National University Hospital, Seoul, South Korea

10. Biomedical Research Institute, Seoul National University Hospital, Seoul, South Korea

11. Interdisciplinary Program in Bioengineering, Seoul National University, Seoul, South Korea

Abstract

Background: Cancer staging information is an essential component of cancer research. However, it is primarily stored in free-text or semistructured clinical documents, which limits its use. Transforming cancer-specific data into the Observational Medical Outcome Partnership Common Data Model (OMOP CDM) allows the information to contribute to multicenter observational cancer studies. To the best of our knowledge, there have been no studies on OMOP CDM transformation and natural language processing (NLP) for thyroid cancer to date.

Objective: We aimed to demonstrate the applicability of the OMOP CDM oncology extension module to thyroid cancer diagnosis and cancer stage information by processing free-text medical reports.

Methods: Thyroid cancer diagnosis and stage-related modifiers were extracted with rule-based NLP from 63,795 thyroid cancer pathology reports and 56,239 iodine whole-body scan reports from three medical institutions in the Observational Health Data Sciences and Informatics data network. The data were converted into the OMOP CDM v6.0 according to the OMOP CDM oncology extension module. The cancer staging group was derived and populated using the transformed CDM data.

Results: The extracted thyroid cancer data were completely converted into the OMOP CDM. By histopathological type, papillary carcinoma accounted for approximately 95.3 to 98.8% of thyroid cancers, follicular carcinoma for 0.9 to 3.7%, adenocarcinoma for 0.04 to 0.54%, medullary carcinoma for 0.17 to 0.81%, and anaplastic carcinoma for 0 to 0.3%. Regarding cancer staging, stage I thyroid cancer accounted for 55 to 64% of the cases, while stage III accounted for 24 to 26%. Stage II and stage IV thyroid cancers were detected at low rates of 2 to 6%.

Conclusion: As the first study on OMOP CDM transformation and NLP for thyroid cancer, this work will help other institutions standardize thyroid cancer-specific data for retrospective observational research and participate in multicenter studies.
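To illustrate what rule-based extraction of diagnosis and stage-related modifiers can look like, the following is a minimal sketch in Python. It is not the authors' pipeline: the regular expression, the keyword table, and the function name `extract_modifiers` are all hypothetical, and the sketch assumes English-language report text that contains a compact TNM string such as "pT1aN0M0".

```python
import re
from typing import Dict, Optional

# Hypothetical keyword table covering the five histological types named in the abstract.
# Order matters: "papillary" is checked first so that phrases such as
# "follicular variant of papillary carcinoma" are labeled as papillary.
HISTOLOGY_KEYWORDS = {
    "papillary": "papillary carcinoma",
    "follicular": "follicular carcinoma",
    "medullary": "medullary carcinoma",
    "anaplastic": "anaplastic carcinoma",
    "adenocarcinoma": "adenocarcinoma",
}

# Hypothetical pattern for compact TNM strings, e.g. "pT1aN0M0" or "pT3 N1b".
# The M category is optional because pathology reports often omit it.
TNM_PATTERN = re.compile(
    r"\b[pcy]?T(?P<t>[0-4][ab]?|x|is)\s*N(?P<n>[0-1][ab]?|x)\s*(?:M(?P<m>[01]|x))?",
    re.IGNORECASE,
)


def _normalize(prefix: str, value: Optional[str]) -> Optional[str]:
    """Return e.g. 'T1a' or 'NX'; None if the category was not found."""
    if value is None:
        return None
    value = value.lower()
    return prefix + ("X" if value == "x" else value)


def extract_modifiers(report: str) -> Dict[str, Optional[str]]:
    """Extract a histology label and T/N/M categories from free text."""
    lowered = report.lower()
    histology = next(
        (label for key, label in HISTOLOGY_KEYWORDS.items() if key in lowered),
        None,
    )
    match = TNM_PATTERN.search(report)
    return {
        "histology": histology,
        "T": _normalize("T", match.group("t")) if match else None,
        "N": _normalize("N", match.group("n")) if match else None,
        "M": _normalize("M", match.group("m")) if match else None,
    }
```

In a real conversion, each extracted value would still need to be mapped to a standard concept before being written to the CDM tables; that mapping is outside the scope of this sketch.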

Funder

Korea Health Technology R&D Project through the Korea Health Industry Development Institute

Ministry of Health & Welfare, Republic of Korea

Technology Innovation Program

Ministry of Trade, Industry & Energy

Publisher

Georg Thieme Verlag KG

Subject

Health Information Management, Computer Science Applications, Health Informatics
