TY - JOUR
T1 - Resolving unknown nucleotides in the IPD-IMGT/HLA database by extended and full-length sequencing of HLA class I and II alleles
AU - Voorter, Christina E.M.
AU - Groeneweg, Mathijs
AU - Olieslagers, Timo I.
AU - Fae, Ingrid
AU - Fischer, Gottfried F.
AU - Andreani, Marco
AU - Troiano, Maria
AU - Vidan-Jeras, Blanka
AU - Montanic, Sendi
AU - Hepkema, Bouke G.
AU - Bungener, Laura B.
AU - Tilanus, Marcel G.J.
AU - Wieten, Lotte
N1 - Publisher Copyright: © The Author(s) 2024.
PY - 2024/4
Y1 - 2024/4
AB - In the past, identification of HLA alleles was limited to sequencing the region of the gene coding for the peptide binding groove, resulting in a lack of sequence information in the HLA database, challenging HLA allele assignment software programs. We investigated full-length sequences of 19 HLA class I and 7 HLA class II alleles, and we extended another 47 HLA class I alleles with sequences of 5′ and 3′ UTR regions that were all not yet available in the IPD-IMGT/HLA database. We resolved 8638 unknown nucleotides in the coding sequence of HLA class I and 2139 of HLA class II. Furthermore, with full-length sequencing of the 26 alleles, more than 90 kb of sequence information was added to the non-coding sequences, whereas extension of the 47 alleles resulted in the addition of 5.5 kb unknown nucleotides to the 5′ UTR and > 31.7 kb to the 3′ UTR region. With this information, some interesting features were observed, like possible recombination events and lineage evolutionary origins. The continuing increase in the availability of full-length sequences in the HLA database will enable the identification of the evolutionary origin and will help the community to improve the alignment and assignment accuracy of HLA alleles.
KW - Extended sequences
KW - Full-length sequencing
KW - Group-specific Sanger sequencing
KW - Human leucocyte antigen
KW - NGS
UR - http://www.scopus.com/inward/record.url?scp=85187245193&partnerID=8YFLogxK
U2 - 10.1007/s00251-024-01333-z
DO - 10.1007/s00251-024-01333-z
M3 - Article
C2 - 38400869
AN - SCOPUS:85187245193
SN - 0093-7711
VL - 76
SP - 109
EP - 121
JO - Immunogenetics
JF - Immunogenetics
ER -