Abstract
Previous genomic studies of non-small cell lung cancer (NSCLC) were mostly based on next-generation sequencing of short reads, which is an efficient approach for identifying single nucleotide variants and small indels but ineffective for identifying structural variants, especially large-scale insertions. Here, we studied 151 lung adenocarcinoma (LUAD) and 106 lung squamous cell carcinoma (LUSC) samples, together with paired blood samples, using nanopore sequencing technology. We developed a rigorous computational pipeline and characterized the landscape of large-scale somatic insertions in NSCLC. By integrating other omics data, we report three findings: (1) we identified an LUSC-enriched somatic simple repeat expansion, shared by approximately 40% of LUSC patients, that regulates PTPRZ1 gene expression through distal enhancers; (2) somatic insertions of transposable elements (TEs) in NSCLC were mostly ‘complex TEs’ consisting of multiple TE elements; and (3) the insertion of short interspersed nuclear elements, especially those from young lineages of the Alu family, is a frequent somatic mutation type that shapes the transcriptome of NSCLC through the expression of these elements.
Publisher
Cold Spring Harbor Laboratory