United States Environmental Protection Agency
Health & Environmental Research Online (HERO)
Citation
HERO ID
7214345
Reference Type
Journal Article
Title
Benchmarking immunoinformatic tools for the analysis of antibody repertoire sequences
Author(s)
Smakaj, E; Olson, B; Reddy, Sai; Greiff, V; Truck, J; Marquez, S; Corretto, E; Antonielli, L; Sessitsch, M; Hoefer, ChristophC; Briney, M; Tosoni, S; Galli, K; Grobelsek, G; D'Angelo, I
Year
2020
Is Peer Reviewed?
Yes
Journal
Bioinformatics
ISSN
1367-4803
EISSN
1367-4811
Publisher
OXFORD UNIV PRESS
Location
OXFORD
Page Numbers
1731-1739
PMID
31873728
DOI
10.1093/bioinformatics/btz845
Web of Science Id
WOS:000538696800012
Abstract
Summary: Antibody repertoires reveal insights into the biology of the adaptive immune system and empower diagnostics and therapeutics. Multiple tools are currently available for the annotation of antibody sequences. All downstream analyses, such as choosing lead drug candidates, depend on the correct annotation of these sequences; however, the performance of these tools has not been thoroughly compared. Here, we benchmark the performance of commonly used immunoinformatic tools, i.e. IMGT/HighV-QUEST, IgBLAST and MiXCR, in terms of reproducibility of annotation output, accuracy and speed, using simulated and experimental high-throughput sequencing datasets. To assess the reproducibility of the annotation output, we analyzed changes in the IMGT reference germline database over the last 10 years. We found that only 73/183 (40%) human V, D and J genes were shared between the reference germline sets used by the tools, and that the annotation results differed between tools. In terms of alignment accuracy, MiXCR had the highest average gene mishit frequency (0.02) and IgBLAST the lowest (0.004). Reproducibility in the output of complementarity-determining region 3 (CDR3) amino acid sequences ranged from 4.3% to 77.6% with preprocessed data. In addition, the run time of the tools was assessed: MiXCR was the fastest tool in terms of sequences processed per unit of time. These results indicate that immunoinformatic analyses depend greatly on the choice of bioinformatics tool. Our results support informed decision-making by immunoinformaticians based on repertoire composition and sequencing platforms.