Mutant Peptide Generator

The Mutant Peptide Generator (MPG) tool takes a two-sample, SnpEff-annotated VCF as input and generates predicted neo-peptides together with the reference (wild-type, WT) peptides with which they pair.

VCF input files

This tool accepts VCF files as input. The VCF files must meet the requirements described below, and we also recommend a few preparation steps for input VCFs before running this tool. The requirements and recommendations appear in the order in which they should be met or applied.

Requirement: Two-sample VCF

The VCF file must be a two-sample VCF with the normal/healthy sample in the first sample column and the tumor sample in the second. The header line should look similar to:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  {NORMAL_SAMPLE}       {TUMOR_SAMPLE}

Here, NORMAL_SAMPLE and TUMOR_SAMPLE should be replaced with the corresponding sample names.
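
One way to check this layout before running the tool is to read the `#CHROM` header line and confirm that exactly two sample columns follow FORMAT. The sketch below is not part of the tool, and the function names `sample_names_from_header` and `read_sample_names` are hypothetical. It assumes a tab-delimited header, as the VCF format specifies, and it can only report the column order; it cannot tell which sample is actually the normal one.

```python
import gzip

# The nine fixed VCF columns that precede the per-sample columns.
FIXED_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT",
                 "QUAL", "FILTER", "INFO", "FORMAT"]


def sample_names_from_header(header_line):
    """Return (first_sample, second_sample) from a '#CHROM' header line."""
    fields = header_line.rstrip("\n").split("\t")
    if fields[:9] != FIXED_COLUMNS:
        raise ValueError("not a VCF column header with a FORMAT column")
    samples = fields[9:]
    if len(samples) != 2:
        raise ValueError(f"expected 2 sample columns, found {len(samples)}")
    # By the convention described above: normal first, tumor second.
    return samples[0], samples[1]


def read_sample_names(path):
    """Find the '#CHROM' line in a plain or gzipped VCF and parse it."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                return sample_names_from_header(line)
    raise ValueError("no #CHROM header line found")
```

The two returned names are the ones that need to match the sample names used elsewhere in the workflow.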

Requirement: SnpEff annotation

Currently, only VCFs that have been annotated with SnpEff are supported. In the future, SnpEff annotation may be integrated into this tool.

To run SnpEff, you will need to prepare a small tab-separated file containing the names of the ‘normal’ and ‘tumor’ samples, in that order, as they appear in the VCF file. In the command below, this is our ‘in.samples’ file. Here is the recommended command for annotation with the GRCh38.86 reference genome:

echo -e "normal_sample\ttumor_sample" > in.samples
zcat < infile.stripped.decomposed.normalized.vcf.gz \
| java -Xmx16G -jar snpEff.jar -cancer -cancerSamples in.samples GRCh38.86 \
> infile.ann.vcf
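
After annotation, it can be worth confirming that the output file really carries SnpEff annotations before passing it to this tool. The following is a minimal sketch, assuming SnpEff wrote its default `ANN` INFO field and declared it in the VCF header. The function name `has_snpeff_annotation` is hypothetical and not part of either tool.

```python
def has_snpeff_annotation(lines):
    """Return True if the VCF header declares the ANN INFO field.

    `lines` is any iterable of VCF lines, for example an open file handle.
    """
    for line in lines:
        if not line.startswith("#"):
            break  # header is over; stop before scanning variant records
        if line.startswith("##INFO=<ID=ANN,"):
            return True
    return False


# Example usage against the file produced by the command above:
# with open("infile.ann.vcf") as handle:
#     print(has_snpeff_annotation(handle))
```

An empty or unannotated file yields False, which usually means the annotation step needs to be rerun.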

Parameter selection

MPG Parameters

  • Peptide Length

    • The number of amino acids in the peptide

  • Peptide 1 and 2 Mutation Position

    • Position in the peptide where the mutation should be located. By default, only one peptide is created per variant. To create an additional peptide for a given variant with the mutation at a different position, fill in the ‘Peptide 2 Mutation Position’ field.

  • Frameshift Overlap

    • Frameshift mutations often result in a relatively long stretch of amino acids that are different from the reference. This tool will break that long stretch into overlapping peptides with an overlap length specified by this parameter.

  • Maximum Peptide Length

    • In several situations a peptide longer than the ‘Peptide Length’ parameter may be desirable, and this parameter sets the upper limit. For instance, an in-frame insertion of a few amino acids might require extending the peptide to keep the C termini of the reference and mutant peptides aligned. Additionally, if a frameshift results in a mutated sequence that is only several amino acids longer than the ‘Peptide Length’, one longer peptide may be preferable to several highly overlapping peptides. Finally, when a variant is near the termini and multiple positions are selected, a single longer peptide may be preferable to multiple short, highly overlapping peptides.
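
The interplay of these parameters can be illustrated with a short sketch. It is not the tool's implementation. The function names `peptide_window` and `tile_frameshift` are hypothetical, positions are assumed to be 1-based, and both the clamping near the protein ends and the handling of the final tile are assumptions made for illustration.

```python
def peptide_window(protein, mut_pos, length, position):
    """Cut a peptide of `length` so that protein residue `mut_pos` (1-based)
    lands at peptide position `position` (1-based) where possible.

    Returns (peptide, actual_position). Near either end of the protein the
    window is shifted to stay inside the sequence, so the actual position
    can differ from the requested one.
    """
    start = mut_pos - position                  # 0-based start of the window
    start = min(start, len(protein) - length)   # do not run past the C terminus
    start = max(start, 0)                       # do not start before the N terminus
    return protein[start:start + length], mut_pos - start


def tile_frameshift(seq, length, overlap, max_length):
    """Split a long altered stretch into peptides of `length` whose
    neighbours share `overlap` residues.

    If the whole stretch fits within `max_length`, it is returned as one
    peptide instead of being split.
    """
    if not 0 <= overlap < length:
        raise ValueError("overlap must be >= 0 and smaller than length")
    if len(seq) <= max_length:
        return [seq]
    step = length - overlap
    tiles = []
    start = 0
    while start + length < len(seq):
        tiles.append(seq[start:start + length])
        start += step
    # Final tile is anchored at the end, so it may overlap its
    # neighbour by more than `overlap` residues.
    tiles.append(seq[-length:])
    return tiles
```

For example, with a 26-residue stretch, a length of 10, and an overlap of 5, `tile_frameshift` returns five peptides, while a 12-residue stretch with a maximum length of 12 is returned as a single peptide.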

Results

Three tables are included in the output:

  • SNPs - One row per SNP.

  • Peptide - One row per SNP and affected transcript.

  • Unique Peptide - One row per SNP and unique peptide. Peptides that can be produced by multiple transcripts are collapsed in this output, and a representative transcript is selected.
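
The relationship between the Peptide and Unique Peptide tables can be sketched as a grouping step. This is only an illustration. The function name `collapse_unique` is hypothetical, the rows are assumed to be dictionaries keyed by the column names documented below, and keeping the first row seen is an assumption, since how the tool chooses its representative transcript is not described here.

```python
def collapse_unique(rows):
    """Collapse Peptide-table rows into one row per variant and peptide pair.

    Rows that share the same variant id, reference peptide, and mutated
    peptide are merged; the first row seen is kept as the representative.
    """
    unique = {}
    for row in rows:
        key = (row["variant id"], row["reference peptide"], row["mutated peptide"])
        unique.setdefault(key, row)  # later duplicates are ignored
    return list(unique.values())
```

Because dictionaries preserve insertion order, the collapsed rows come back in the order in which each unique peptide pair first appeared.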

Each of the outputs contains many columns, which are described in detail below.

Column definitions

output table | column | definition | example1 | example2
all | chr | chromosome | chr1 | chr19
all | position | chromosomal position of mutation | 49045703 | 6477239
all | reference nucleotide | reference nucleotide | C | A
all | mutated nucleotide | mutant nucleotide | T | AG
all | mutation effect | predicted mutation effect (e.g., missense_variant, frameshift_variant, inframe_insertion, inframe_deletion, etc.) | missense_variant | frameshift_variant
all | gene name | HGNC gene symbol | AGBL4 | DENND1C
all | Ensembl gene accession | Ensembl gene identifier | ENSG00000186094 | ENSG00000205744
all | Ensembl transcript accession | Ensembl transcript identifier | ENST00000416121 | ENST00000381480
all | reference aa | reference amino acid | Asp | Thr
all | mutated aa | mutated amino acid | Asn | fs
all | protein position | mutation position in protein | 4/298 | 164/801
all | variant id | an internal unique identifier assigned to each variant | 11089 | 116919
all | mutation impact | SnpEff-predicted variant impact (LOW, MODERATE, or HIGH) | MODERATE | HIGH
all | transcript biotype | a classification of the transcript type. These will include protein_coding, the different IG_ and TR_ types, as well as nonsense_mediated_decay, non_stop_decay, pseudogene, etc. | protein_coding | protein_coding
all | transcript mutation code | mutation in HGVS format (nucleotide level) with coordinates based on the transcript | c.10G>A | c.491dupC
all | protein mutation code | mutation in HGVS format (amino acid level) with coordinates based on the protein | p.Asp4Asn | p.Thr165fs
all | cdna position | mutation position in cDNA | 12/3938 | 604/2816
all | cds position | mutation position in CDS | 10/897 | 491/2406
SNP | peptide pairs | list of reference-mutant peptide pairs derived from this SNP, along with peptide mutation position. Corresponding peptides will be found in the peptide output tables. | [('REEDIYQFAYCYPYTYTRFQ', 'REENIYQFAYCYPYTYTRFQ', 4, []), ('REEDIYQFAYCYPYTYTRFQ', 'REENIYQFAYCYPYTYTRFQ', 4, [])] | [('LGSGVTVSSGQGIPPPTRGN', 'LGSGVTVSSGQGIPPPYPGE', 17, []), ('LGSGVTVSSGQGIPPPTRGN', 'LGSGVTVSSGQGIPPPYPGE', 17, [])]
SNP | peptide warnings | warnings from peptide generation for each variant. This will include all warnings for successfully generated peptides, as well as warnings for variants where peptides could not be generated | protein sequence start with X, mutation position in peptide not desired because the mutation is near the start codon | Reached end of frameshift mutation position may vary
peptide | peptide pair id | a serial number for peptide pairs in the peptide output table | |
peptide | transcript reference allele | reference allele (nucleotide) decoded from hgvs_dna | C | C
peptide | transcript mutant allele | tumor allele (nucleotide) decoded from hgvs_dna | T | CC
peptide | reference peptide | reference peptide with requested PEPTIDELENGTH | REEDIYQFAYCYPYTYTRFQ | LGSGVTVSSGQGIPPPTRGN
peptide | mutated peptide | mutant peptide with requested PEPTIDELENGTH | REENIYQFAYCYPYTYTRFQ | LGSGVTVSSGQGIPPPYPGE
peptide | peptide mutation position | peptide mutation position | 4 | 17
peptide | strand | transcript strand: 1 for sense and -1 for anti-sense strand | -1 | -1
peptide | warnings | any warnings associated with peptide generation for each reference-mutant peptide pair | protein sequence start with X, mutation position in peptide not desired because the mutation is near the start codon | Reached end of frameshift mutation position may vary
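
The example values in the ‘peptide pairs’ column look like Python literals, so downstream scripts can likely read them with the standard library. The sketch below assumes that every cell is a list of 4-item tuples, as in the examples above. The function name `parse_peptide_pairs` is hypothetical, and because the meaning of the fourth item (an empty list in both examples) is not documented here, it is passed through unchanged under the placeholder key `extra`.

```python
import ast


def parse_peptide_pairs(cell):
    """Parse one 'peptide pairs' cell into a list of dictionaries."""
    pairs = ast.literal_eval(cell)  # safe evaluation of literals only
    return [
        {"reference": ref, "mutant": mut, "position": pos, "extra": extra}
        for ref, mut, pos, extra in pairs
    ]


cell = ("[('REEDIYQFAYCYPYTYTRFQ', 'REENIYQFAYCYPYTYTRFQ', 4, []), "
        "('REEDIYQFAYCYPYTYTRFQ', 'REENIYQFAYCYPYTYTRFQ', 4, [])]")
pairs = parse_peptide_pairs(cell)
first = pairs[0]
# The position is 1-based: residue 4 is where the two peptides differ.
index = first["position"] - 1
print(first["reference"][index], first["mutant"][index])
```

For the first example row this prints the reference and mutant residues at position 4, which correspond to the Asp to Asn change reported in the ‘protein mutation code’ column.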