1 College of Horticulture and Landscape Architecture, Northeast Agricultural University, Harbin 150030, China
2 State Key Laboratory of Forage Breeding-by-Design and Utilization, Key Laboratory of Plant Molecular Physiology, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
3 National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
4 Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Shenzhen Key Laboratory of Agricultural Synthetic Biology, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
5 Wuhan Jianbing Technology Co., Ltd., Wuhan, China
6 College of Life Sciences, Northeast Agricultural University, Harbin 150030, China
7 Shenzhen CEM Biomedical Technology Ltd., Shenzhen, China
8 Academician Workstation of Agricultural High-tech Industrial Area of the Yellow River Delta, National Center of Technology Innovation for Comprehensive Utilization of Saline-Alkali Land, Dongying 257300, China
9 Lead contact
*Corresponding author. E-mail: Wg0003@neau.edu.cn, zhouyao@ibcas.ac.cn, axwang@neau.edu.cn
†Xue Cui, Yuxin Liu, Miao Sun and Qiyue Zhao contributed equally to the study.
Received: 31 Jan 2025 Accepted: 06 Apr 2025 Published online: 16 Apr 2025
Abstract
Structural variations (SVs) in repetitive sequences can often be localized only to a broad region because their breakpoints are imprecise, which leads to SV classification errors and inaccurate trait analysis. By manually inspecting 4532 variant regions between two tomato genomes, identified by integrating 14 detection pipelines, we generated an SV benchmark at base-pair resolution. When evaluated against this benchmark, all pipelines yielded F1-scores below 53.77%, underscoring the urgent need for more advanced detection algorithms in plant genomics. By analyzing the alignment features of the repetitive sequences in each region, we summarized four patterns of SV breakpoints and showed that deviations in breakpoint identification arose primarily from misalignment among repeat copies. Based on the sequence similarity among copies, we identified 1635 bona fide SVs with precise breakpoints: 780 insertions, 619 deletions, 13 inversions, and 223 substitutions, the last of which should be regarded as a fundamental SV type. All four types preferentially occur within AT-repeat regions of regulatory loci. This precise resolution of complex SVs will facilitate genome analysis and crop improvement.
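The F1-scores mentioned above come from comparing each pipeline's SV calls with the base-pair-resolution benchmark. The Python sketch below shows one way such a score can be computed. The `SV` record, the matching rule (same chromosome, same SV type, both breakpoints within `tol` bp) and the greedy one-to-one matching are illustrative assumptions, not the evaluation procedure used in this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """Hypothetical minimal SV record used only for this illustration."""
    chrom: str
    start: int   # left breakpoint
    end: int     # right breakpoint
    svtype: str  # e.g. "INS", "DEL", "INV", "SUB"


def matches(call: SV, truth: SV, tol: int) -> bool:
    """Assumed criterion: same chromosome and type, both breakpoints within tol bp."""
    return (call.chrom == truth.chrom
            and call.svtype == truth.svtype
            and abs(call.start - truth.start) <= tol
            and abs(call.end - truth.end) <= tol)


def f1_against_benchmark(calls, benchmark, tol=0):
    """Greedily match each call to at most one unused benchmark SV.

    Returns (precision, recall, f1).
    """
    unmatched = list(benchmark)
    tp = 0
    for call in calls:
        for i, truth in enumerate(unmatched):
            if matches(call, truth, tol):
                tp += 1
                del unmatched[i]  # each benchmark SV can be matched only once
                break             # stop scanning right after the deletion
    precision = tp / len(calls) if calls else 0.0
    recall = tp / len(benchmark) if benchmark else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


# Toy data (invented for illustration, not from the tomato benchmark).
benchmark = [SV("chr1", 100, 100, "INS"),
             SV("chr1", 500, 650, "DEL"),
             SV("chr2", 1000, 1200, "SUB")]
calls = [SV("chr1", 100, 100, "INS"),
         SV("chr1", 503, 650, "DEL"),    # left breakpoint shifted by 3 bp
         SV("chr2", 1000, 1200, "DEL")]  # substitution mis-typed as deletion

print(f1_against_benchmark(calls, benchmark, tol=0))  # precision, recall, F1 each 1/3
print(f1_against_benchmark(calls, benchmark, tol=5))  # precision, recall, F1 each 2/3
```

With `tol=0`, both the shifted deletion and the substitution reported as a deletion count as errors. Relaxing the tolerance recovers the shifted breakpoint but not the mis-typed substitution, which is why imprecise breakpoints and type misclassification both lower the score.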