1 State Key Laboratory of Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518000, China
2 Xinjiang Key Laboratory of Biological Resources and Genetic Engineering, College of Life Science and Technology, Xinjiang University, Urumqi, Xinjiang 830046, China
3 Department of Horticulture, Hainan Institute of Northwest A&F University, Sanya 572024, China
4 College of Forestry, Beijing Forestry University, Beijing 100083, China
5 State Key Laboratory of Tropical Crop Breeding, Tropical Crops Genetic Resources Institute, Chinese Academy of Tropical Agricultural Sciences, Haikou 571100, China
6 Key Laboratory of Ecology of Rare and Endangered Species and Environmental Protection (Ministry of Education) & Guangxi Key Laboratory of Landscape Resources Conservation and Sustainable Utilization in Lijiang River Basin, Guangxi University Engineering Research Center of Bioinformation and Genetic Improvement of Specialty Crops, Guangxi 541006, China
7 Guangxi Subtropical Research Institute, Guangxi Academy of Agricultural Sciences, Nanning 530001, China
8 Sanya Research Institute of Chinese Academy of Tropical Agricultural Sciences, Sanya 572025, China
*Corresponding authors. E-mail: xiaohua01@caas.cn, Huangjian1984xy@163.com, zhouyongfeng@caas.cn, tianxm333333@foxmail.com
†Bilal Ahmad, Ying Su, and Yani Hao contributed equally to the study.
Received: 01 Mar 2025 Accepted: 16 Jun 2025 Published online: 01 Jul 2025
Abstract
Most genomic studies start by mapping sequencing data to a reference genome. The quality of the reference assembly, its genetic relatedness to the studied population, and the mapping method employed directly affect variant-calling accuracy and downstream genomic analyses; together they can introduce reference bias and lead to erroneous conclusions. However, the impact of reference bias has received limited attention. This study compared population genomic analyses performed against four reference genomes of mango (Mangifera indica): the two haplotype assemblies of a haplotype-resolved telomere-to-telomere (T2T) genome, a pangenome, and an older reference genome available on NCBI. The choice of reference genome strongly affected mapping efficiency and led to notable differences in the genetic variants called, particularly structural variations (SVs). Phylogenetic analysis was more sensitive to the choice of reference genome than analyses of genetic differentiation. Signals of artificial selection during domestication, as well as SV hotspot regions, also varied across reference genomes. Notably, gene enrichment analyses yielded significantly different top enriched biological processes depending on the reference genome used. Overall, the mango pangenome outperformed the other reference genomes across a range of metrics, followed by the T2T assemblies, because these references captured greater genetic diversity and effectively reduced reference bias. Our findings highlight the value of the mango pangenome in reducing reference bias and underscore the critical role of reference genome selection, suggesting that it is one of the most important factors in population genomic studies.