RepairNatrix - a Snakemake workflow for processing DNA sequencing data for DNA storage
Author:
Schwarz Peter Michael1,
Welzel Marius1,
Heider Dominik1,
Freisleben Bernd1
Affiliation:
1. University of Marburg, Department of Mathematics and Computer Science, Marburg, 35032, Germany
Abstract
Motivation
Error-correcting and constrained codes for DNA storage systems have progressed rapidly in recent years. The processing of raw sequencing data for DNA storage, however, still has considerable untapped potential. In particular, constraints can be used as prior information to improve the processing of DNA sequencing data. Furthermore, a workflow tailored to DNA storage codes enables fair comparisons between different approaches while leading to reproducible results.
Results
We present RepairNatrix, a read-processing workflow for DNA storage. RepairNatrix supports preprocessing of raw sequencing data for DNA storage applications and can be used to flag and heuristically repair constraint-violating sequences to further increase the recoverability of encoded data in the presence of errors. Compared to a preprocessing strategy without repair functionality, RepairNatrix reduced the number of raw reads required for the successful, error-free decoding of the input files by a factor of 25 to 35 across different datasets.
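To make the idea of flagging and heuristically repairing constraint-violating reads concrete, the following is a minimal Python sketch. It is not RepairNatrix's actual code: the function names, the two constraints chosen (maximum homopolymer length and a GC-content window), the default thresholds, and the single-deletion repair heuristic are all assumptions made for illustration.

```python
from itertools import groupby


def gc_content(seq: str) -> float:
    """Fraction of G/C bases in seq (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of identical bases in seq."""
    return max((len(list(run)) for _, run in groupby(seq)), default=0)


def violations(seq, max_run=3, gc_min=0.4, gc_max=0.6):
    """Return the names of the (assumed) constraints that seq violates."""
    found = []
    if longest_homopolymer(seq) > max_run:
        found.append("homopolymer")
    if not gc_min <= gc_content(seq) <= gc_max:
        found.append("gc_content")
    return found


def repair_candidates(seq, expected_len, max_run=3):
    """Hypothetical repair heuristic for a suspected single insertion.

    If the read is exactly one base longer than expected, delete one base
    from each overlong homopolymer run in turn and keep every candidate
    that then satisfies all constraints.
    """
    if len(seq) != expected_len + 1:
        return []
    candidates = []
    pos = 0
    for _, run in groupby(seq):
        run_len = len(list(run))
        if run_len > max_run:
            candidate = seq[:pos] + seq[pos + 1:]
            if not violations(candidate, max_run):
                candidates.append(candidate)
        pos += run_len
    return candidates


read = "ACGTAAAACGTC"  # one base longer than the assumed design length of 11
print(violations(read))
print(repair_candidates(read, expected_len=11))
```

The point of the sketch is that the code's constraints act as prior information: a read that violates them cannot be a valid encoded sequence, so it can be flagged, and a repair that restores validity is a plausible guess at the original.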
Availability
RepairNatrix is available on GitHub: https://github.com/umr-ds/repairnatrix
Publisher
Oxford University Press (OUP)
Subject
Computer Science Applications, Genetics, Molecular Biology, Structural Biology
Cited by
1 article.
1. Data recovery methods for DNA storage based on fountain codes;Computational and Structural Biotechnology Journal;2024-12