Normal Workflow and Key Strategies for Data Cleaning Toward Real-World Data: Viewpoint

Published:2023-09-21 Issue: Volume:12 Page:e44310
ISSN:1929-073X
Container-title:Interactive Journal of Medical Research
language:en
Short-container-title:Interact J Med Res

Author:

Guo Manping, Wang Yiming, Yang Qiaoning, Li Rui, Zhao Yang, Li Chenfei, Zhu Mingbo, Cui Yao, Jiang Xin, Sheng Song, Li Qingna, Gao Rui

Abstract

With the rapid development of science, technology, and engineering, large amounts of data have been generated in many fields in the past 20 years. In the process of medical research, data are constantly generated, and large amounts of real-world data form a “data disaster.” Effective data analysis and mining are based on data availability and high data quality. The premise of high data quality is the need to clean the data. Data cleaning is the process of detecting and correcting “dirty data,” which is the basis of data analysis and management. Moreover, data cleaning is a common technology for improving data quality. However, the current literature on real-world research provides little guidance on how to efficiently and ethically set up and perform data cleaning. To address this issue, we proposed a data cleaning framework for real-world research, focusing on the 3 most common types of dirty data (duplicate, missing, and outlier data), and a normal workflow for data cleaning to serve as a reference for the application of such technologies in future studies. We also provided relevant suggestions for common problems in data cleaning.

Publisher

JMIR Publications Inc.

Subject

General Medicine

Reference: 29 articles.

1. A simplified guide to randomized controlled trials

2. The Emergence of the Randomized, Controlled Trial

3. Correspondence

4. Real-World Evidence and Real-World Data for Evaluating Drug Safety and Effectiveness

Cited by 3 articles.

1. Guidance for the Reporting of Bibliometric Analyses: A Scoping Review;2024-08-28

2. The Future of Intelligent Healthcare: A Systematic Analysis and Discussion on the Integration and Impact of Robots Using Large Language Models for Healthcare;Robotics;2024-07-23

3. Characteristics analysis of Internet pharmacy consultation services for children in southwest China during the post-epidemic era: A cross-sectional study;International Journal of Medical Informatics;2024-06