Our newest annotation analysis pipeline allows you to submit peptide sequences for annotation, analysis and clustering. The sequences you submit can be nucleotide or protein sequences of any kind, including non-antibody sequences. This article outlines all of the main options available in the Peptide Annotator.
- What is the Peptide Annotator?
- How do I run Peptide Annotator?
- Saving different settings as Profiles
What is the Peptide Annotator?
The Peptide Annotator annotates and clusters your input query sequences. You can choose to annotate the query sequences without a reference, or you can use a known template sequence as your reference - see Understanding Reference Databases.
The Peptide Annotator accepts both protein and nucleotide sequences of any nature - it is not dependent on a specific molecule or protein type. Some examples of sequences may include:
- Panning rounds of peptides 5-40+ amino acids long
- Domains/regions of proteins
- Lists of HCDR3 sequences
In addition to Clustering your input sequences, the Peptide Annotator also offers deeper analysis capabilities:
- Visualisation and recording of variants present compared to the reference database (if a reference database is used)
- Identification of sequence-based liabilities and a liability and sequence quality scoring system
- Custom clustering options
- A rich visualisation suite including a sequence explorer, alignment views, and a broad range of graphs to help you gain further observations from your data
To view a tutorial demonstrating the use of the Peptide Annotator, see Peptide Tutorial 1. Phage Display Libraries.
How do I run Peptide Annotator?
To run the Peptide Annotator, select a file in your folder and go to Annotation > Peptide Annotator from the dropdown:
To start the Peptide Annotator operation, adjust your options in the pop-up as desired and then click Run. This operation will output a Biologics Annotator Result file.
The following sections outline the steps to successfully carry out an analysis using the Peptide Annotator and how each section and option works.
The Peptide Annotator supports using no reference database, or using a General Template Reference Database. See General Template Databases to easily make your own.
- Multiple reference databases can be selected.
- The Name Scheme option will be automatically filled in if you used Batch Assemble Sanger Sequences with a Name Scheme. Name schemes are highly customisable and allow you to use information contained within the sequence names (e.g. well or donor/sample) to classify sequences and groups of sequences, or to pull this information out into columns in your result.
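As a rough illustration of how information held in sequence names can be pulled out into separate fields, here is a minimal Python sketch. The name format (donor, well and read direction separated by underscores), the pattern and the field names are all hypothetical and are not taken from Geneious Biologics.

```python
import re

# Hypothetical name format: <donor>_<well>_<read>, e.g. "DonorA_B07_fwd".
# The pattern and field names are illustrative assumptions only.
NAME_PATTERN = re.compile(r"^(?P<donor>[^_]+)_(?P<well>[A-H]\d{2})_(?P<read>.+)$")


def parse_name(name):
    """Return the fields encoded in a sequence name, or an empty dict if it does not fit."""
    match = NAME_PATTERN.match(name)
    return match.groupdict() if match else {}


print(parse_name("DonorA_B07_fwd"))
```

Names that do not fit the pattern return an empty dictionary, so they can be flagged rather than silently misclassified.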
Annotate variants from reference database
- To annotate and see the differences between your input sequences and reference sequences, select this option. The nucleotide and amino acid differences relative to your reference database(s) will be annotated on your input sequences and recorded in the All Sequences Table. For more information about viewing these differences and what they mean, see this article.
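To make the idea of recording differences relative to a reference concrete, the sketch below lists substitutions between a reference and a query that have already been aligned to the same length. The function name and the notation (reference residue, 1-based position, query residue) are assumptions made for this example; the article does not describe how the pipeline aligns sequences or formats variants.

```python
def list_variants(reference, query):
    """List substitutions between two pre-aligned, equal-length sequences.

    Each variant is written as <reference residue><1-based position><query residue>.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{ref}{pos}{qry}"
        for pos, (ref, qry) in enumerate(zip(reference, query), start=1)
        if ref != qry
    ]


print(list_variants("ACDEF", "ACNEY"))
```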
Calculate protein statistics
- This will calculate the Molecular Weight (kDa), the Isoelectric point, the charge at pH 7 and the Extinction Coefficient across the Full Sequence (no reference database) or the Template Region (when using a reference database).
- If a full Template Region cannot be found, these values will not be calculated.
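For readers who want a feel for two of these statistics, the following Python sketch estimates molecular weight from average residue masses and the extinction coefficient at 280 nm from tryptophan, tyrosine and cystine counts. The function names are hypothetical and the constants are commonly used textbook values; the article does not state which values or method the Peptide Annotator uses. Charge and isoelectric point are left out because they depend on the pKa set chosen.

```python
# Average residue masses in daltons for the 20 standard amino acids
# (commonly tabulated values; assumed here, not taken from the pipeline).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water is added for the free N- and C-termini


def molecular_weight_kda(protein):
    """Approximate molecular weight in kDa of an unmodified linear peptide."""
    return (sum(RESIDUE_MASS[aa] for aa in protein) + WATER_MASS) / 1000.0


def extinction_coefficient(protein, cystines=0):
    """Approximate molar extinction coefficient at 280 nm (M^-1 cm^-1).

    Uses 5500 per Trp, 1490 per Tyr and 125 per disulfide bond (cystine).
    """
    return 5500 * protein.count("W") + 1490 * protein.count("Y") + 125 * cystines


print(round(molecular_weight_kda("GAW"), 4), extinction_coefficient("GAW"))
```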
Find liabilities and assets:
- Select this option to search for and score amino acid or nucleotide motifs associated with deleterious post-translational modifications or any other type of reduced antibody function (liabilities), as well as desirable motifs (assets). The Peptide Annotator pipeline has a default set of sequence liability checks. These can also be customized; see how to Customize Sequence Liabilities and Assets.
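The sketch below shows one way a motif-based liability scan and score could work in principle. The three motifs are widely cited examples of sequence liabilities, but they are not the pipeline's default checks, and the penalty values and function names are hypothetical.

```python
import re

# Illustrative motifs only; these are NOT the pipeline's default checks.
# The scores are hypothetical penalties chosen for the example.
LIABILITIES = {
    "N-linked glycosylation": (r"N[^P][ST]", -100),
    "Deamidation": (r"NG", -50),
    "Isomerisation": (r"DG", -50),
}


def find_liabilities(protein):
    """Return (name, 1-based position, matched motif, score) for every hit."""
    hits = []
    for name, (pattern, score) in LIABILITIES.items():
        # The lookahead lets overlapping occurrences of a motif be reported.
        for match in re.finditer(f"(?=({pattern}))", protein):
            hits.append((name, match.start() + 1, match.group(1), score))
    return hits


def liability_score(protein):
    """Sum the scores of all hits; more negative means more liabilities."""
    return sum(score for *_, score in find_liabilities(protein))


print(find_liabilities("ANGSDG"), liability_score("ANGSDG"))
```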
Clustering provides a way to group your sequences based on shared identity/similarity across your sequences. To learn more about clustering and how it can help with interpreting your dataset, see Understanding "Clusters".
Several default clusters will already be listed, and further clusters can be added using the blue "Plus" sign as seen above.
It is possible to cluster up to six regions together based on shared identity across sequences in the regions selected. It is also possible to allow mismatches across a region and to cluster based on amino acid similarity. To learn more about configuring this option, please refer to Clustering Options.
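To illustrate what clustering on identity with allowed mismatches means, here is a minimal Python sketch. It uses a simple greedy approach in which each sequence joins the first cluster whose representative it matches within the mismatch limit. That approach, and the function names, are assumptions for illustration; the article does not describe the clustering algorithm the pipeline uses.

```python
def mismatches(a, b):
    """Count differing positions; sequences of different length never match."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))


def cluster(regions, max_mismatches=0):
    """Greedily group region sequences around the first member of each cluster."""
    clusters = []  # each cluster is a list; its first member is the representative
    for seq in regions:
        for members in clusters:
            distance = mismatches(seq, members[0])
            if distance is not None and distance <= max_mismatches:
                members.append(seq)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append([seq])
    return clusters


regions = ["ARDY", "ARDY", "ARDF", "GGGG"]
print(cluster(regions))                    # exact identity
print(cluster(regions, max_mismatches=1))  # one mismatch allowed
```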
Sequencing data can contain low quality sequences and noise. To make the clusters in your results more meaningful, select one or more of the following options:
- Only cluster results with asset and liability score of at least - This restricts clustering to sequences that meet the score specified. For example, if you specify a score of -1000, only sequences that have a liability and asset score of -1000 or more will be included in the clusters.
- Only cluster results which are - This restricts clustering to sequences that are either: Fully annotated; Fully annotated and In Frame; or Fully annotated, In Frame, and Without Stop Codons. For example, if you choose to cluster Fully annotated and In Frame sequences, only sequences that are both fully annotated and in frame will be clustered; sequences that are not fully annotated or that have frameshifts will be excluded from the clustering operation.
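The two filters above can be pictured as a simple pre-clustering step. The record fields and function name in this Python sketch are hypothetical; it only illustrates the "score of at least" and "only cluster results which are" logic described in this section.

```python
from dataclasses import dataclass


@dataclass
class Result:
    # Hypothetical per-sequence record; field names are assumptions for this sketch.
    name: str
    score: float
    fully_annotated: bool
    in_frame: bool
    has_stop_codon: bool


def eligible_for_clustering(results, min_score=None, require_fully_annotated=False,
                            require_in_frame=False, require_no_stop=False):
    """Return the names of the results that pass every selected filter."""
    kept = []
    for r in results:
        if min_score is not None and r.score < min_score:
            continue  # score is below the "at least" threshold
        if require_fully_annotated and not r.fully_annotated:
            continue
        if require_in_frame and not r.in_frame:
            continue
        if require_no_stop and r.has_stop_codon:
            continue
        kept.append(r.name)
    return kept


results = [
    Result("seq1", -200, True, True, False),
    Result("seq2", -1500, True, True, False),
    Result("seq3", 0, True, False, False),
]
print(eligible_for_clustering(results, min_score=-1000,
                              require_fully_annotated=True, require_in_frame=True))
```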
- The Genetic code dropdown allows you to select the genetic code to use for translating nucleotide sequences. The codes are obtained directly from NCBI. One additional Genetic code, "Amber readthrough" allows certain stop codons not to be treated as stop codons during translation.
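The effect of a readthrough genetic code can be sketched in a few lines of Python. The table below is the NCBI standard genetic code (translation table 1). The readthrough behaviour shown, reading the amber codon TAG as glutamine (Q), is an assumption based on common suppressor strains; the article does not say which stop codons the "Amber readthrough" option affects or which residue it inserts.

```python
from itertools import product

# NCBI standard genetic code (translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {"".join(codon): aa
                 for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna, amber_readthrough=False):
    """Translate frame 1 of a DNA sequence; '*' marks a stop, 'X' an unknown codon."""
    code = dict(STANDARD_CODE)
    if amber_readthrough:
        code["TAG"] = "Q"  # assumption: amber codon is read as glutamine
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3))
    return "".join(code.get(codon, "X") for codon in codons)


print(translate("ATGTAGGGT"))
print(translate("ATGTAGGGT", amber_readthrough=True))
```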
Record equal reference sequence match as:
- Each sequence with partial frequency - This will assign the query sequence to all matching references with partial frequency. For example, if a query sequence matches two reference sequences equally well (Reference-1 and Reference-2), it will add 0.5 to the total count for each of Reference-1 and Reference-2.
- Groups of sequences - This will create a separate entry in the list of reference matches that represents this combination of reference sequences. In the same example, the query sequence will contribute 0 towards the total for each of Reference-1 and Reference-2, and will instead add 1 to the total for a reference called "Reference-1/Reference-2".
- Unknown - This will treat the query sequence as an unknown match. In the same example, the query sequence will add nothing to the totals of Reference-1 and Reference-2.
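The three ways of recording an equal match can be compared side by side with a small Python sketch. The function name, the mode strings and the "Unknown" bucket label are assumptions made for this example; only the counting behaviour follows the descriptions above.

```python
from collections import defaultdict


def tally_reference_matches(matches_per_query, mode):
    """Count reference matches; each query contributes a total of 1.

    matches_per_query: one list of equally good reference names per query.
    mode: "partial", "group" or "unknown" (hypothetical labels for the three options).
    """
    if mode not in ("partial", "group", "unknown"):
        raise ValueError(f"unsupported mode: {mode}")
    totals = defaultdict(float)
    for refs in matches_per_query:
        if not refs:
            totals["Unknown"] += 1
        elif len(refs) == 1:
            totals[refs[0]] += 1
        elif mode == "partial":
            for ref in refs:
                totals[ref] += 1 / len(refs)  # split the count evenly
        elif mode == "group":
            totals["/".join(refs)] += 1       # combined entry, e.g. "Reference-1/Reference-2"
        else:
            totals["Unknown"] += 1            # tied references receive nothing
    return dict(totals)


queries = [["Reference-1"], ["Reference-1", "Reference-2"]]
for mode in ("partial", "group", "unknown"):
    print(mode, tally_reference_matches(queries, mode))
```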
Trim each side of fully annotated region if over:
- This setting trims off extra bases on either side of the sequence region of interest. The default is 10, which means that at most 10 bases will be left flanking the 5' and 3' ends of the annotated region.
Note that trimming only applies to fully annotated sequences; sequences that the Peptide Annotator operation classifies as not fully annotated are left untrimmed.
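The trimming rule can be sketched as follows, assuming the annotated region is known as a pair of 0-based coordinates (start inclusive, end exclusive). The function and parameter names are hypothetical.

```python
def trim_flanks(sequence, region_start, region_end, max_flank=10, fully_annotated=True):
    """Keep at most max_flank bases on each side of the annotated region.

    region_start is 0-based and inclusive; region_end is exclusive.
    Sequences that are not fully annotated are returned unchanged.
    """
    if not fully_annotated:
        return sequence
    start = max(0, region_start - max_flank)
    end = min(len(sequence), region_end + max_flank)
    return sequence[start:end]


# 15 flanking bases on the 5' side, 3 on the 3' side, around a 6-base region.
seq = "A" * 15 + "CCCGGG" + "T" * 3
print(trim_flanks(seq, 15, 21))
```

In this example only the 5' flank exceeds the limit, so it is cut back to 10 bases while the shorter 3' flank is kept as it is.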
Saving different settings as Profiles
Geneious Biologics allows you to save Profiles which can be used to record and re-run alternative settings depending on the dataset. This means that you can specify custom sequence liabilities, custom clusters and other settings depending on what dataset you are working with.
Profiles can be saved and applied at the bottom of all our Annotation analysis pipelines:
As with any other Biologics Annotator Result document, you can also:
- Filter your Sequences
- Go to the Graphs Tab to view visualisations of your results like Sequence Logos
- Perform Sequence Alignments
- View the "Clusters" in your dataset
- Add new Clusters to your Results
- Subset your sequences and re-calculate clusters
- Add Assay Data to your Analysis Results
- Compare Results across Multiple Experiments to monitor enrichment across panning rounds or to identify sequences present across multiple datasets.
- Edit your Sequences