Mutated Peptide Generator

The Mutated Peptide Generator tool takes a SnpEff-annotated VCF as input and generates predicted neo-peptides together with the reference (wild-type) peptides they pair with.

VCF Input Files

This tool accepts VCF files as input. Before running the tool, we recommend checking input VCFs as described below.

The last line of the VCF header should look similar to the following (columns are tab-separated):

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  {SAMPLE1}   {SAMPLE2}  ... {SAMPLEN}

The ‘SAMPLE’ columns are not required.
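
A quick pre-flight check can catch header problems before upload. The sketch below uses only the Python standard library and assumes a tab-delimited VCF, as the VCF specification requires. The function name `check_vcf_header` is hypothetical and not part of this tool. It accepts headers with or without FORMAT and sample columns.

```python
from typing import Iterable, List

# The eight fixed columns defined by the VCF specification.
REQUIRED_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def check_vcf_header(lines: Iterable[str]) -> List[str]:
    """Validate the '#CHROM' header line and return any sample names.

    `lines` may be an open file handle or any iterable of VCF lines.
    Columns after INFO (FORMAT and samples) are optional.
    """
    for line in lines:
        if line.startswith("##"):
            continue  # skip meta-information lines
        if not line.startswith("#CHROM"):
            raise ValueError("expected a '#CHROM' header line before data lines")
        columns = line.rstrip("\r\n").split("\t")
        if columns[:8] != REQUIRED_COLUMNS:
            raise ValueError(f"unexpected header columns: {columns[:8]}")
        # Index 8 is FORMAT; sample names, if any, follow it.
        return columns[9:]
    raise ValueError("no '#CHROM' header line found")
```

A space-separated header, such as one copied from a rendered page, fails the column check. This is deliberate, because tools that read VCFs split on tabs.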

Parameter Selection

MPG Parameters

  • Peptide Length

    • The number of amino acids in the peptide

  • Peptide 1 and 2 Mutation Position

    • The 1-based position in the peptide at which the mutation should be located. By default, the tool creates only one peptide per variant. To create an additional peptide for a variant with the mutation at a different position, fill in the ‘Peptide 2 Mutation Position’ field.

  • Frameshift Overlap

    • Frameshift mutations often produce a relatively long stretch of amino acids that differs from the reference. The tool breaks this stretch into overlapping peptides, and this parameter sets the number of amino acids by which consecutive peptides overlap.

  • Maximum Peptide Length

    • The longest peptide the tool may generate. In several situations, a peptide longer than the ‘Peptide Length’ parameter may be desirable. For instance, an in-frame insertion of a few amino acids may require extending the peptide to keep the C-termini of the reference and mutant peptides aligned. Similarly, if a frameshift produces a mutated sequence only a few amino acids longer than the ‘Peptide Length’, a single longer peptide may be preferable to several highly overlapping ones. Finally, when a variant lies near either terminus of the protein and multiple mutation positions are selected, a single longer peptide may be preferable to multiple short, highly overlapping peptides.

  • Reference Genome

    • Options include GRCh38 (default), GRCh37, and GRCm38/mm10. The selected genome should match the reference genome against which the variants in the VCF were called.

  • Run SnpEff Annotation

    • If checked, SnpEff is run on the VCF file before peptide generation. Because annotation can be time-consuming, we recommend that users annotate the VCF with SnpEff themselves before uploading it.
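
The following simplified sketch shows how ‘Peptide Length’, the mutation position, ‘Frameshift Overlap’, and ‘Maximum Peptide Length’ interact. It is not this tool's implementation. The function names are hypothetical, and positions are assumed to be 1-based, which is consistent with the examples under Column Definitions. The sketch handles only single-residue substitutions and a frameshift stretch that has already been extracted.

```python
from typing import List, Tuple


def peptide_window(protein_len: int, protein_pos: int, length: int,
                   desired_pos: int) -> Tuple[int, int]:
    """Return (start, peptide_pos) for a window of `length` residues.

    protein_pos and desired_pos are 1-based; start is a 0-based slice index.
    If the mutation is too close to either terminus, the window is shifted
    to stay inside the protein, so peptide_pos may differ from desired_pos.
    """
    if not 1 <= desired_pos <= length:
        raise ValueError("desired position must lie within the peptide")
    if length > protein_len:
        raise ValueError("protein is shorter than the requested peptide length")
    start = protein_pos - desired_pos                 # ideal 0-based start
    start = max(0, min(start, protein_len - length))  # clamp to the protein
    return start, protein_pos - start


def missense_pair(protein: str, protein_pos: int, alt_aa: str, length: int,
                  desired_pos: int) -> Tuple[str, str, int]:
    """Return (reference peptide, mutant peptide, 1-based peptide position)."""
    mutant = protein[:protein_pos - 1] + alt_aa + protein[protein_pos:]
    start, pos = peptide_window(len(protein), protein_pos, length, desired_pos)
    return protein[start:start + length], mutant[start:start + length], pos


def tile_frameshift(seq: str, length: int, overlap: int,
                    max_length: int) -> List[str]:
    """Split a mutated frameshift stretch into overlapping peptides.

    If the whole stretch fits within max_length, it is returned as one
    (possibly longer) peptide instead of several highly overlapping ones.
    """
    if not 0 <= overlap < length <= max_length:
        raise ValueError("need 0 <= overlap < length <= max_length")
    if len(seq) <= max_length:
        return [seq]
    step = length - overlap
    peptides = []
    start = 0
    while start + length < len(seq):
        peptides.append(seq[start:start + length])
        start += step
    # The final window is aligned to the end so no mutated residue is dropped;
    # its overlap with the previous peptide can exceed `overlap`.
    peptides.append(seq[-length:])
    return peptides
```

The sketch shifts the window instead of skipping a variant near a terminus. This appears to correspond to the ‘mutation position in peptide not desired because the mutation is near the start codon’ warning listed under Column Definitions.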

Results

Three tables are included in the output:

  • SNPs - One row per SNP.

  • Peptide - One row per SNP and affected transcript.

  • Unique Peptide - One row per SNP and unique peptide. Peptides that can be produced by multiple transcripts are collapsed into a single row, and a representative transcript is selected.
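
The collapsing step behind the Unique Peptide table could look roughly like the sketch below. This document does not state the rule for choosing the representative transcript, so the sketch assumes a hypothetical rule: pick the smallest transcript accession. The dictionary keys are assumed column names, and the extra `all transcripts` field is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def collapse_unique_peptides(rows: List[dict]) -> List[dict]:
    """Collapse per-transcript rows into one row per (variant, peptide pair).

    Keys such as 'variant id' are assumed column names. The representative
    transcript rule (smallest accession) is a hypothetical choice.
    """
    groups: Dict[Tuple, List[dict]] = defaultdict(list)
    for row in rows:
        key = (row["variant id"], row["reference peptide"], row["mutated peptide"])
        groups[key].append(row)

    collapsed = []
    for members in groups.values():  # dicts preserve first-seen order
        rep = min(members, key=lambda r: r["Ensembl transcript accession"])
        out = dict(rep)
        # Hypothetical extra field listing every transcript yielding this pair.
        out["all transcripts"] = sorted(
            r["Ensembl transcript accession"] for r in members
        )
        collapsed.append(out)
    return collapsed
```

Keeping the full transcript list alongside the representative means no transcript information is lost when rows are collapsed.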

Each output contains many columns, which are described in detail below. Columns whose output table is listed as ‘all’ appear in every output table.

Column Definitions

| Output table | Column | Definition | Example 1 | Example 2 |
|---|---|---|---|---|
| all | chr | chromosome | chr1 | chr19 |
| all | position | chromosomal position of the mutation | 49045703 | 6477239 |
| all | reference nucleotide | reference nucleotide(s) | C | A |
| all | mutated nucleotide | mutant nucleotide(s) | T | AG |
| all | mutation effect | predicted mutation effect (e.g., missense_variant, frameshift_variant, inframe_insertion, inframe_deletion) | missense_variant | frameshift_variant |
| all | gene name | HGNC gene symbol | AGBL4 | DENND1C |
| all | Ensembl gene accession | Ensembl gene identifier | ENSG00000186094 | ENSG00000205744 |
| all | Ensembl transcript accession | Ensembl transcript identifier | ENST00000416121 | ENST00000381480 |
| all | reference aa | reference amino acid | Asp | Thr |
| all | mutated aa | mutated amino acid | Asn | fs |
| all | protein position | mutation position in the protein (position/protein length) | 4/298 | 164/801 |
| all | variant id | internal unique identifier assigned to each variant | 11089 | 116919 |
| all | mutation impact | SnpEff-predicted variant impact (LOW, MODERATE, or HIGH) | MODERATE | HIGH |
| all | transcript biotype | classification of the transcript type, including protein_coding, the various IG_* and TR_* types, nonsense_mediated_decay, non_stop_decay, pseudogene, etc. | protein_coding | protein_coding |
| all | transcript mutation code | mutation in HGVS format (nucleotide level), with coordinates based on the transcript | c.10G>A | c.491dupC |
| all | protein mutation code | mutation in HGVS format (amino acid level), with coordinates based on the protein | p.Asp4Asn | p.Thr165fs |
| all | cdna position | mutation position in the cDNA (position/cDNA length) | 12/3938 | 604/2816 |
| all | cds position | mutation position in the CDS (position/CDS length) | 10/897 | 491/2406 |
| SNP | peptide pairs | list of reference-mutant peptide pairs derived from this SNP, along with the peptide mutation position. The corresponding peptides appear in the peptide output tables. | [('REEDIYQFAYCYPYTYTRFQ', 'REENIYQFAYCYPYTYTRFQ', 4, []), ('REEDIYQFAYCYPYTYTRFQ', 'REENIYQFAYCYPYTYTRFQ', 4, [])] | [('LGSGVTVSSGQGIPPPTRGN', 'LGSGVTVSSGQGIPPPYPGE', 17, []), ('LGSGVTVSSGQGIPPPTRGN', 'LGSGVTVSSGQGIPPPYPGE', 17, [])] |
| SNP | peptide warnings | warnings from peptide generation for each variant, including all warnings for successfully generated peptides as well as warnings for variants for which no peptides could be generated | protein sequence start with X, mutation position in peptide not desired because the mutation is near the start codon | Reached end of frameshift mutation position may vary |
| peptide | peptide pair id | serial number for peptide pairs in the peptide output table | | |
| peptide | transcript reference allele | reference allele (nucleotide) decoded from hgvs_dna | C | C |
| peptide | transcript mutant allele | tumor allele (nucleotide) decoded from hgvs_dna | T | CC |
| peptide | reference peptide | reference peptide of the requested ‘Peptide Length’ | REEDIYQFAYCYPYTYTRFQ | LGSGVTVSSGQGIPPPTRGN |
| peptide | mutated peptide | mutant peptide of the requested ‘Peptide Length’ | REENIYQFAYCYPYTYTRFQ | LGSGVTVSSGQGIPPPYPGE |
| peptide | peptide mutation position | 1-based position of the mutation within the peptide | 4 | 17 |
| peptide | strand | transcript strand: 1 for sense, -1 for antisense | -1 | -1 |
| peptide | warnings | any warnings associated with generating each reference-mutant peptide pair | protein sequence start with X, mutation position in peptide not desired because the mutation is near the start codon | Reached end of frameshift mutation position may vary |
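
The ‘peptide pairs’ column appears to be written as a Python-literal list of tuples, as in the examples above. Assuming that format, the sketch below parses it with `ast.literal_eval`. It then checks a missense pair against the three-letter ‘reference aa’ and ‘mutated aa’ values at the 1-based peptide mutation position. The function names are hypothetical, and frameshifts (‘fs’) are not handled.

```python
import ast
from typing import List, Tuple

# Standard three-letter to one-letter codes for the 20 common amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def parse_peptide_pairs(cell: str) -> List[Tuple[str, str, int]]:
    """Parse a 'peptide pairs' cell (assumed Python-literal list of tuples).

    Each tuple is reduced to (reference peptide, mutant peptide, 1-based
    mutation position); any further elements are ignored.
    """
    return [(ref, mut, pos) for ref, mut, pos, *_ in ast.literal_eval(cell)]


def matches_missense(ref: str, mut: str, pos: int, ref_aa: str, mut_aa: str) -> bool:
    """True if the residues at the 1-based peptide position match the call."""
    return (ref[pos - 1] == THREE_TO_ONE[ref_aa]
            and mut[pos - 1] == THREE_TO_ONE[mut_aa])


cell = ("[('REEDIYQFAYCYPYTYTRFQ', 'REENIYQFAYCYPYTYTRFQ', 4, []), "
        "('REEDIYQFAYCYPYTYTRFQ', 'REENIYQFAYCYPYTYTRFQ', 4, [])]")
for ref, mut, pos in parse_peptide_pairs(cell):
    print(matches_missense(ref, mut, pos, "Asp", "Asn"))  # True
```

`ast.literal_eval` only evaluates literals, so it is a safer choice than `eval` for reading values back from an output file.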