T cell class I

The T cell class I tool combines MHC class I binding, TAP processing, and immunogenicity tools into one application. This allows users to easily run the individual predictors on the same input dataset and compile the results into one table. Below, we describe all of the available methods.

MHC-binding predictions

The tools in this group take amino acid sequences and MHC alleles as input to predict the strength of the peptide:MHC interaction. Depending upon the tool selected, the main output will be either an IC50 value (lower indicates stronger predicted binding) or a binding score. To make comparisons between alleles and methods more standardized, a percentile rank is also returned, where a lower value indicates stronger binding. The percentile rank is the percentage of peptides drawn randomly from UniProt that would bind as well as or better than the current peptide.

Methods

  • NetMHCpan EL 4.1 (IEDB recommended epitope predictor 2023.09)

    • Predicts elution of peptides from MHC molecules using artificial neural networks (ANNs). The EL data covers 177 MHC molecules from human (HLA-A, B, C, E), mouse (H-2), cattle (BoLA), primates (Patr, Mamu, Gogo), swine (SLA), equine (Eqca) and dog (DLA).
      [PMID: 32406916]

  • NetMHCpan BA 4.1 (IEDB recommended binding predictor 2023.09)

    • Predicts binding of peptides to MHC molecules using artificial neural networks (ANNs). The BA data covers 170 MHC molecules from human (HLA-A, B, C, E), mouse (H-2), cattle (BoLA), primates (Patr, Mamu, Gogo), swine (SLA) and equine (Eqca).
      [PMID: 32406916]

  • Consensus (legacy)

    • Selecting this method will cause up to 3 different algorithms to be run (ANN, SMM, and/or Comblib_Sidney2008), depending upon which methods are available for the selected allele. [PMID: 16767078]

  • ANN 4.0

    • Artificial Neural Networks (ANN) are capable of performing sensitive, quantitative predictions of peptide binding to the MHC class I molecule.
      [PMID: 26515819]

  • SMM 1.0

    • Predicts peptide binding to MHC molecules, peptide transport by the transporter associated with antigen presentation (TAP) and proteasomal cleavage of protein sequences.
      [PMID: 15927070]

  • SMMPMBEC 1.0

    • MHC class I binding prediction method that uses an amino acid similarity matrix (PMBEC) as a Bayesian prior.
      [PMID: 19948066]

  • Comblib_Sidney2008 1.0

    • Uses the positional scanning combinatorial library approach to describe MHC class I binding specificity and identify high-affinity binding peptides.
      [PMID: 18221540]

  • MHCflurry 2.0

    • Predicts MHC class I presentation by combining models for MHC class I binding and antigen processing.
      [PMID: 32711842]

  • MHC-NP

    • Predicts peptides naturally processed by the MHC Class I pathway (“eluted peptides”) for each target MHC molecule.
      [PMID: 24144414]

Parameter selection

By default, a peptide length of 9 and the HLA-A*02:01 allele are selected. However, by using the peptide length slider and MHC Allele(s) autocomplete, many combinations of peptide lengths and alleles may be submitted simultaneously. Note that additional lengths and alleles will result in predictions taking longer.

  • Peptide Length

    • Peptide length can be selected using the slider, which ranges from 8 to 15. Upon submission, protein sequences in the Input Sequence box will be broken up into overlapping peptides of the selected lengths.

    • as-is - The ‘as-is’ checkbox disables the peptide length slider, so the sequences from the Input Sequence box are sent directly to the predictors without first being broken up into smaller peptides. This option is recommended when peptides, rather than full-length proteins, are submitted for prediction.

  • MHC Allele(s)

    • Typing into the MHC Allele(s) text box will autocomplete based on official allele names and synonyms from the MHC Restriction Ontology (MRO). As alleles are selected, they will appear as ‘chips’ in the box immediately below. Alleles can be deselected by clicking on the ‘x’ inside the chips.

    • The Allele Finder provides some additional controls, including selecting the HLA allele reference panel of 27 alleles as shown below.

    Sequence input
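The way the Peptide Length setting and the ‘as-is’ checkbox treat an input sequence can be sketched in a few lines of Python. This is only an illustration, not the tool's implementation: the function name `overlapping_peptides`, the 1-based inclusive start/end coordinates, and the example sequences are assumptions made for the sketch.

```python
from typing import Iterator, Sequence, Tuple


def overlapping_peptides(
    sequence: str, lengths: Sequence[int], as_is: bool = False
) -> Iterator[Tuple[int, int, str]]:
    """Yield (start, end, peptide) tuples with 1-based inclusive coordinates.

    With as_is=True the sequence is passed through unchanged, mirroring the
    'as-is' checkbox; otherwise every window of each requested length is
    produced, sliding one residue at a time.
    """
    if as_is:
        yield 1, len(sequence), sequence
        return
    for length in lengths:
        for offset in range(len(sequence) - length + 1):
            yield offset + 1, offset + length, sequence[offset:offset + length]


# Hypothetical 18-residue input split into 9-mers and 10-mers.
for start, end, peptide in overlapping_peptides("MGQIVTMFEALPHIIDEV", [9, 10]):
    print(start, end, peptide)
```

A sequence of N residues yields N - k + 1 peptides for each length k, which is why adding lengths (and alleles) increases the prediction time.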

Thresholds and interpreting scores

The IEDB currently recommends using the percentile rank as the metric for ranking binding predictions. A percentile rank of <= 1% has been demonstrated to cover 80% of the immune response for many alleles. For more information on selecting thresholds, please consult these guidelines.

Once the prediction is completed, an output table similar to the one shown below will be displayed.

Prediction Output Table

Each row in this table corresponds to one peptide and allele combination. Several columns will always be present, including the seq #, peptide, peptide length, start, end, and allele. Any additional columns returned will depend upon the binding methods that were selected. A description of each field can be found by clicking on the ‘Display Columns’ button.

When the output is returned as IC50 values, a lower number indicates higher affinity. As a rough guideline, peptides with IC50 values <50 nM are considered high affinity, <500 nM intermediate affinity, and <5000 nM low affinity. Most known epitopes have a high or intermediate affinity. Some epitopes have a low affinity, but no known T-cell epitope has an IC50 value greater than 5000 nM.
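The rough IC50 guideline above can be written as a small helper. This is a sketch only: the function name `affinity_class` and the label "non-binder" for values of 5000 nM and above are assumptions, not terms used by the tool.

```python
def affinity_class(ic50_nm: float) -> str:
    """Bin a predicted IC50 (in nM) using the rough guideline cut-offs."""
    if ic50_nm < 50:
        return "high"
    if ic50_nm < 500:
        return "intermediate"
    if ic50_nm < 5000:
        return "low"
    # Label chosen for this sketch; the guideline gives no name for this bin.
    return "non-binder"


for value in (12.5, 320.0, 4100.0, 25000.0):
    print(value, affinity_class(value))
```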

While the output of the predictions is quantitative, there are systematic deviations from experimental IC50 values. For example, the makeup of the training data and the prediction methods used have a non-trivial impact on the range of predicted IC50 values.

In addition to the predicted IC50 values or scores, a percentile rank is generated by comparing the peptide’s IC50 against those of a set of random peptides from the UniProt/Swiss-Prot database. The percentile rank is the percentage of peptides drawn randomly from UniProt that would bind as well as or better than the current peptide. Therefore, a low percentile rank indicates high affinity.
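As a minimal sketch of this definition, the percentile rank of an IC50 value can be computed by counting how many background peptides have an equal or lower IC50. The function name `percentile_rank` and the toy background values are assumptions for illustration; the real background set is far larger and is specific to each allele and peptide length.

```python
from bisect import bisect_right
from typing import Sequence


def percentile_rank(ic50_nm: float, background_ic50s: Sequence[float]) -> float:
    """Percentage of background peptides binding as well as or better.

    Lower IC50 means stronger binding, so 'as well as or better' is the
    count of background values less than or equal to ic50_nm.
    """
    ordered = sorted(background_ic50s)
    n_as_good = bisect_right(ordered, ic50_nm)
    return 100.0 * n_as_good / len(ordered)


# Toy background of ten predicted IC50 values (nM), invented for the example.
background = [10, 50, 200, 1000, 5000, 20000, 30000, 40000, 45000, 50000]
print(percentile_rank(200, background))
```

For methods that return a score where higher is better, the comparison would be reversed (counting background scores greater than or equal to the peptide's score).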

When more than one binding method is selected, the median percentile rank of the methods used is also reported.
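A minimal sketch of that summary, assuming the per-method percentile ranks for one peptide and allele are already available. The dictionary keys are hypothetical labels, not the tool's column names.

```python
from statistics import median
from typing import Mapping


def median_percentile_rank(ranks_by_method: Mapping[str, float]) -> float:
    """Median of the percentile ranks returned by the selected methods."""
    return median(ranks_by_method.values())


# Invented percentile ranks for a single peptide/allele combination.
ranks = {"netmhcpan_ba": 0.4, "ann": 0.9, "smm": 2.1}
print(median_percentile_rank(ranks))
```

With an even number of methods, `statistics.median` averages the two middle values.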

Percentile rank dataset

To establish percentile ranks, the complete “Reviewed (Swiss-Prot)” dataset was downloaded on October 29, 2018. The file contained 558,712 sequences. After filtering for length (minimum 50 aa) and predictability (valid amino acids), 544,147 sequences remained. Of those, 10,000 were selected at random as a source of peptides. One peptide of each length (from 8 to 15) was drawn from a random location in each of these protein sequences to serve as the final background dataset, which is made available here.
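The sampling procedure can be sketched as follows. This is an illustration under stated assumptions rather than the script that was actually used: "valid amino acids" is taken to mean the 20 standard residues, and the function name `build_background`, its parameters, and the fixed seed are hypothetical.

```python
import random
from typing import Dict, List, Sequence

STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # assumed meaning of "valid"


def build_background(
    proteins: Sequence[str],
    n_proteins: int = 10_000,
    lengths: Sequence[int] = tuple(range(8, 16)),
    min_protein_length: int = 50,
    seed: int = 0,
) -> Dict[int, List[str]]:
    """Draw one random peptide of each length from randomly chosen proteins."""
    rng = random.Random(seed)
    # Keep proteins that are long enough and contain only standard residues.
    eligible = [
        p for p in proteins
        if len(p) >= min_protein_length and set(p) <= STANDARD_RESIDUES
    ]
    chosen = rng.sample(eligible, min(n_proteins, len(eligible)))
    background: Dict[int, List[str]] = {length: [] for length in lengths}
    for protein in chosen:
        for length in lengths:
            start = rng.randrange(len(protein) - length + 1)
            background[length].append(protein[start:start + length])
    return background
```

Applied to the filtered Swiss-Prot set described above, this would give 10,000 peptides for each of the eight lengths.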


Immunogenicity predictions

This tool uses amino acid properties as well as their position within the peptide to predict the immunogenicity of a peptide:MHC (pMHC) complex.

Parameter selection

  • Sequences

    • Ideally, peptides should be the same length, presented on the same HLA class I molecule, and 9-mers.

      • When longer peptides are provided, the extra amino acids will be evaluated as if they are inserted after position 5, and as if they have the same properties as an amino acid at position 5 in terms of weighting and masking.

  • Masking Position

    • This will mask designated positions from the immunogenicity score. The masked positions are dependent on the HLA molecule on which the peptide is presented. As a default, the first, second, and C-terminal positions are masked. For most common HLA class I molecules the default masking scheme can be selected. Additional anchor positions are used for some HLA molecules, which can be used by selecting ‘Allele Specific’. In some cases, a custom mask may be desired and can be entered by selecting ‘Custom’ and inputting comma-separated numbers.

NOTE: The custom masking option will only work when peptides of the same length are provided.

NOTE: This method has only been validated on 9-mers, though predictions may be made for any length.
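The handling of longer peptides and of the default mask can be sketched as a position mapping. This is a hedged illustration of the rules described above, not the tool's code: the function names are hypothetical, the mask is expressed in 9-mer positions, and peptides shorter than 9 residues are rejected because the text does not describe how they are treated.

```python
from typing import List, Sequence


def equivalent_9mer_positions(length: int) -> List[int]:
    """Map each 1-based peptide position to the 9-mer position it borrows.

    Extra residues are treated as inserted after position 5 and share the
    weighting and masking of position 5.
    """
    if length < 9:
        raise ValueError("sketch only covers peptides of 9 or more residues")
    extra = length - 9
    return [1, 2, 3, 4, 5] + [5] * extra + [6, 7, 8, 9]


def unmasked_positions(length: int, mask: Sequence[int] = (1, 2, 9)) -> List[int]:
    """Return the 1-based positions that contribute to the score.

    The default mask covers the first, second, and C-terminal positions.
    """
    mapping = equivalent_9mer_positions(length)
    return [pos for pos, eq in enumerate(mapping, start=1) if eq not in mask]


print(unmasked_positions(9))
print(unmasked_positions(11))
```

A custom mask such as "1,2,9" would be parsed into the `mask` tuple before calling `unmasked_positions`.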

Thresholds and interpreting scores

Immuno Prediction Output Table

Running an Immunogenicity prediction will add a column called immunogenicity score to the output. Scores greater than 0 indicate the peptide/allele combination is more likely than not to elicit an immune response, while a score less than 0 indicates the inverse.


MHC-I processing pathway predictions

Methods

  • Basic

    • A stabilized matrix method (SMM) algorithm for predicting TAP transport efficiency from the amino acid sequence.
      [PMID: 12902473]

  • NetChop

    • Produces neural network predictions for cleavage sites of the human proteasome.
      [PMID: 15744535]

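  • NetCTL

    • Predicts CTL epitopes in protein sequences by integrating predictions of MHC class I binding, proteasomal C-terminal cleavage, and TAP transport efficiency.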

  • NetCTLPan

    • An update to the original NetCTL that allows prediction of CTL epitopes restricted to any MHC molecule with a known protein sequence.
      [PMID: 20379710]

Parameter selection

Basic

TAP Processing Parameter Basic

  • MHC-I Binding Methods

    • Select the MHC-I binding method that will be used to calculate an IC50 value for incorporation into the combined processing score.

  • Proteasome Cleavage

    • There are two types of proteasomes, the constitutively expressed ‘house-keeping’ type, and immunoproteasomes that are induced by IFN-γ secretion. The latter is thought to increase the efficiency of antigen presentation. If you are unsure, select the immunoproteasome type to make a prediction. The predictions are based on in vitro proteasomal digests of the enolase and casein proteins as described here.

  • Max Precursor Extension

    • The maximum number of additional amino acids to consider at the N terminus when calculating the TAP score for a given peptide.

  • Alpha Factor

    • The factor by which to down-weight the N terminus score with respect to the C terminus. A default of 0.2 worked best in the original publication.
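One plausible reading of how Max Precursor Extension and Alpha Factor interact is sketched below. The exact formula is an assumption made for illustration and is not confirmed by the text above: the N-terminal score is averaged over the peptide and its N-terminally extended precursors, then down-weighted by alpha and added to the C-terminal score. The function name `tap_score` and the input values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def tap_score(
    c_term_score: float,
    n_term_scores_by_extension: Sequence[float],
    max_precursor_extension: int,
    alpha: float = 0.2,
) -> float:
    """Combine C- and N-terminal contributions into one TAP score (assumed form).

    n_term_scores_by_extension[k] is the N-terminal score of the precursor
    extended by k residues (k = 0 is the peptide itself).
    """
    considered = n_term_scores_by_extension[: max_precursor_extension + 1]
    return c_term_score + alpha * mean(considered)


# Invented matrix scores for a peptide and its 1- and 2-residue precursors.
print(tap_score(1.0, [0.5, 1.5, 3.0], max_precursor_extension=1))
```

Setting `max_precursor_extension=0` scores only the peptide itself, and a smaller `alpha` reduces the influence of the N terminus.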

NOTE: The proteasome and TAP predictions were developed using experimental data for the human versions of these molecules. At least for TAP, there are known to be some species-dependent differences in specificity. Therefore, results from these predictions for epitope processing in non-human cells should be interpreted with extra caution.

NetChop

TAP Processing Parameter NetChop

  • Network Method

    • NetChop was trained on two different datasets:

      • C Term 3.0: A dataset of MHC Class I ligands from 188 human proteins.

      • 20S 3.0: In vitro degradation data by the human 20S constitutive proteasome for two proteins, enolase and β-casein.

  • Threshold

    • Threshold for plotting prediction scores. This threshold can be adjusted after predictions are complete.

NetCTL

TAP Processing Parameter NetCTL

  • Weight on C terminal cleavage

    • Relative weight on proteasomal cleavage.

  • Weight on TAP transport efficiency

    • Relative weight on TAP transport efficiency predicted using the weight matrix-based method described by Peters et al., 2003.

  • Threshold

    • Threshold for plotting prediction scores. This threshold can be adjusted after predictions are complete.
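The role of the two weights can be sketched as a weighted sum. This form is an assumption based on the "relative weight" wording above, not a statement of the exact NetCTL implementation; the function name `combined_score` and the example values are hypothetical.

```python
def combined_score(
    mhc_score: float,
    cleavage_score: float,
    tap_score: float,
    w_cleavage: float,
    w_tap: float,
) -> float:
    """Weighted sum of MHC binding, C-terminal cleavage, and TAP scores.

    The MHC score has an implicit weight of 1; the other two terms are
    scaled relative to it (assumed form).
    """
    return mhc_score + w_cleavage * cleavage_score + w_tap * tap_score


# Invented component scores and weights for a single peptide.
print(combined_score(0.8, 0.9, 2.0, w_cleavage=0.15, w_tap=0.05))
```

Setting either weight to 0 removes that component from the combined score.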

NetCTLPan

TAP Processing Parameter NetCTLPan

  • Weight on C terminal cleavage

    • Relative weight on proteasomal cleavage.

  • Weight on TAP transport efficiency

    • Relative weight on TAP transport efficiency predicted using the weight matrix-based method described by Peters et al., 2003.

  • Percentile Rank Threshold

    • Percentile threshold for plotting prediction scores. This threshold can be adjusted after predictions are complete.


Results table and plots

NetChop

A NetChop prediction result will include both tabular and graphical output. The table will include the NetChop Prediction Score and look similar to:

NetChop Result Table

The plot will be displayed on the ‘Processing Plots’ tab and will show the NetChop score vs. residue position. Scores that are greater than the selected threshold will be displayed in green, while those lower than the threshold will be in red.

NetCHOP Result Graph
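The colouring rule used in the plot can be sketched as a simple comparison against the threshold. The function name `colour_by_threshold` and the example scores are hypothetical, and treating a score exactly equal to the threshold as red is an assumption, since the text only covers scores above and below it.

```python
from typing import List, Sequence


def colour_by_threshold(scores: Sequence[float], threshold: float) -> List[str]:
    """Return 'green' for scores above the threshold and 'red' otherwise."""
    return ["green" if score > threshold else "red" for score in scores]


# Invented per-residue scores with a threshold of 0.5.
print(colour_by_threshold([0.12, 0.97, 0.50, 0.63], threshold=0.5))
```

Because the threshold only affects how scores are displayed, it can be changed after the prediction without rerunning it.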

NetCTL

NetCTL will add several columns to the output table, which will look similar to:

NetCTL Result Table

The main metric is the NetCTL Predictions score, where a higher value indicates a higher probability of being a naturally processed epitope. Descriptions of the additional columns can be found by clicking on ‘Display Columns’.

The plot will be displayed on the ‘Processing Plots’ tab and will show the NetCTL score vs. residue position. Scores that are greater than the selected threshold will be displayed in green, while those lower than the threshold will be in red.

NetCTL Result Graph

NetCTLPan

NetCTLpan will add several columns to the output table, which will look similar to:

NetCTLPan Result Table

The main outputs of concern are the NetCTLpan combined score and NetCTLpan percentile rank. Descriptions of all columns may be found by clicking on ‘Display Columns’.

The plot will be displayed on the ‘Processing Plots’ tab and will show the NetCTLpan score vs. residue position. Scores that are greater than the selected threshold will be displayed in green, while those lower than the threshold will be in red.

NetCTLPan Result Graph