TEsorter: An accurate and fast method to classify LTR-retrotransposons in plant genomes
Ren-Gang Zhang1,2,*,†, Guang-Yuan Li2,†, Xiao-Ling Wang3, Jacques Dainat4, Zhao-Xuan Wang5, Shujun Ou6,* and Yongpeng Ma1,*
1Yunnan Key Laboratory for Integrative Conservation of Plant Species with Extremely Small Populations, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, China
2Department of Bioinformatics, Ori (Shandong) Gene Science and Technology Co., Ltd., Weifang, Shandong 261322, China
3BGI-Shenzhen, Shenzhen 518083, China
4Department of Medical Biochemistry and Microbiology, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
5Shijiazhuang People’s Medical College, Shijiazhuang, Hebei 050091, China
6Department of Ecology, Evolution, and Organismal Biology (EEOB), Iowa State University, Ames, IA 50010, USA
*Corresponding authors. E-mail: zhangrengang@ori-gene.cn, oushujun@iastate.edu, mayongpeng@mail.kib.ac.cn
†Both authors contributed equally to the study.
Received: 07 Oct 2021 Accepted: 23 Dec 2021 Published online: 19 Feb 2022
Abstract
Dear Editor,
Transposable elements (TEs) constitute the largest portion of repetitive sequences in many eukaryotic genomes, with long terminal repeat retrotransposons (LTR-RTs) being predominant in plant genomes. Various tools have been developed for the identification and classification of TEs, including RepeatModeler [1], REPET [2], LTR_retriever (https://github.com/oushujun/LTR_retriever), and TERL (https://github.com/muriloHoracio/TERL). To our knowledge, most existing software can classify TEs only to the superfamily level, in particular the LTR-RT Copia and Gypsy superfamilies in plants, leaving a significant knowledge gap. Moreover, although approaches for automated classification of LTR-RT lineages using amino acid hidden Markov models (HMMs) do exist, they typically consist of collections of scripts that are neither curated nor specifically designed to be user-friendly.