Peptide Tutorial 1. Phage Display Libraries

Updated: January 14, 2026 00:25

In this tutorial, you will learn how to annotate and cluster short peptide sequences and compare panning rounds over an experiment. See our article on the Peptide Annotator to learn more about what kinds of datasets you can use this analysis pipeline for.

The Peptide Annotator is a general-purpose tool that is not specific to antibodies; if you are interested in antibody analysis, see our Getting Started page for antibody-specific tutorials.

This tutorial will cover the following exercises:

Sequence annotation
Viewing the results
   1. Adding Clusters
   2. Graphs
Comparing panning rounds
   1. Viewing the comparison results

Get Started: To start this tutorial, you will need the input data. If you have recently started using Geneious Biologics, your organization may already have the tutorial folders set up as described in the tutorial below. If not, you can still follow this tutorial by first downloading the input sequences here and then uploading them into Geneious Biologics. Note that the data and images used in this tutorial are obtained from this research article.

Sequence Annotation

In this exercise, you will learn how to analyze sequences from multiple Illumina MiSeq biopanning libraries. These sequence libraries have already been trimmed to full regions that are 36 nucleotides long (12 amino acids). To annotate these short peptide sequences, select the pan1_SRR2050319 document in the Input data folder and click Annotation > Peptide Annotator.
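As a quick check of the 36-nucleotide to 12-amino-acid relationship, the minimal Python sketch below translates one read with the standard genetic code. The function name `translate` is illustrative, and the example read is a hypothetical back-translation of the peptide SGVYKVAYDWQH; neither is taken from Geneious Biologics or from the tutorial dataset.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotides: str) -> str:
    """Translate an in-frame DNA sequence; unrecognized codons become 'X'."""
    seq = nucleotides.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = (seq[i:i + 3] for i in range(0, usable, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# Hypothetical 36-nucleotide read (an assumption, not real tutorial data).
read = "TCTGGTGTTTATAAAGTTGCTTATGATTGGCAGCAT"
peptide = translate(read)
print(len(read), len(peptide), peptide)
```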

Select the following options from the Peptide Annotator dialog box and click Run to start the analysis (see sections and image below).

Main Options

Select the following options:

Reference database(s): No database
Name Scheme: None
Handle Input Sequences: Collapse duplicates (e.g. NGS)

peptide anno mainoptions.png

Collapsing and Filtering Options

Leave defaults - Collapse sequences at least: 100% identical

Only keep reads longer than: 30 bp
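To make the two settings above concrete, here is a minimal Python sketch of length filtering followed by collapsing of 100% identical reads. It only illustrates the idea; the function name `collapse_reads`, the order of the two steps, and the example reads are assumptions, not the Peptide Annotator's actual implementation.

```python
from collections import Counter


def collapse_reads(reads, min_length=30):
    """Keep reads strictly longer than min_length and count exact duplicates.

    Returns (sequence, count) pairs, most abundant first.
    """
    kept = (read.upper() for read in reads if len(read) > min_length)
    return Counter(kept).most_common()


# Hypothetical reads: two identical 36-mers, one unique 36-mer, one short read.
example_reads = ["A" * 36, "A" * 36, "C" * 36, "G" * 30]
for sequence, count in collapse_reads(example_reads):
    print(count, sequence)
```

In this sketch the 30 bp read is dropped because the setting keeps only reads longer than the threshold.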

Analysis Options

Leave all defaults

Clustering Options

Leave all defaults

Advanced Options

Leave all defaults

Click Run to start the analysis. This operation will produce a pan1_SRR2050319 Annotated & Clustered Biologics Annotator Result document (also available in the Sequence Annotation folder).

Repeat this step with the other libraries (pan2-4). Ultimately, this will result in 4 individual Biologics Annotator Result documents (one per library).

Note: You will have to run each panning round individually. Running them all at once analyzes all the libraries as one document, so the library categorization is lost.

Viewing the results

To view a Peptide Annotation result, select one of the Annotated & Clustered Biologics Annotator Result documents (for example, the pan1_SRR2050319 result). This will bring up the info tab, which documents the analysis and also allows you to open the document:

peptide anno tut1.png

Clicking Open Full Document opens the result, which consists of a tabular view with your sequences organised into rows: the Sequences Table. Clicking on a sequence displays it in the Sequence Viewer below.

peptide anno tut 1-1.png

The Sequences Table contains details for each individual sequence, such as the amino acid sequence and (if selected under Analysis Options) the reference database mismatches, score and protein statistics. The Sequence Viewer allows you to view the annotated sequences and search for motifs and annotations.

Other options for sequence analysis within a result include:

Filtering your results to pull out sequences that meet certain metrics you specify
Extract and Re-cluster to take a subset of sequences out of an existing Biologics Annotator Result Document and make a new document with re-calculated clusters
Adding Assay Data (ELISA values etc.) to further inform your results: Adding Assay Data to your Analysis Results
Aligning sequences to compare the amino acid diversity across a region or multiple regions: Sequence Alignment
Editing your Sequences to perform point mutations that might increase developability

Adding Clusters

Clustering is used to group sequences together based on shared identity, and allows you to view the counts of unique or related sequences. If you are unsure what a cluster is, see Understanding "Clusters".

To group together sequences that have a single amino acid mismatch, go to Post-processing > Add Clusters:

Screenshot 2025-02-25 at 4.34.56 PM.png

This will bring up the Add Clusters dialog box. Click the blue "+" sign to add a new cluster. Then, switch to the Advanced tab and make sure the region is set to "Full Sequence" and the Cluster Method is set to "Identity (by count)".

Screenshot 2025-02-25 at 4.37.14 PM.png

After adding the cluster, select Run.

See our Add Cluster page for more on these options.

Viewing the new Cluster

To view the new cluster, change the Cluster Table: dropdown to the new cluster:

Screenshot 2025-02-25 at 4.48.47 PM.png

When viewing this cluster table, we can see that the most common sequence (SGVYKVAYDWQH) also had 5 related sequences that had a single amino acid difference.

The Cluster Contents columns will list the top 100 related amino acid sequences and their percent abundance or count in the cluster, while the # Exact Clusters column will list the number of unique sequences in the cluster.

Screenshot 2025-02-25 at 4.55.29 PM.png
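The idea of grouping a common sequence with variants that differ by a single amino acid can be sketched in Python as follows. This is a simple greedy scheme under stated assumptions, not the algorithm Geneious Biologics uses for "Identity (by count)": the most abundant sequences seed clusters first, and each remaining sequence joins the first seed within one mismatch. The function names, the variant sequences, and all counts are hypothetical.

```python
def mismatches(a: str, b: str) -> int:
    """Number of differing positions between two equal-length peptides."""
    return sum(x != y for x, y in zip(a, b))


def cluster_by_count(counts: dict, max_mismatches: int = 1) -> dict:
    """Greedy clustering in which the most abundant sequences seed clusters."""
    clusters: dict = {}
    for seq in sorted(counts, key=counts.get, reverse=True):
        for seed in clusters:
            same_length = len(seed) == len(seq)
            if same_length and mismatches(seed, seq) <= max_mismatches:
                clusters[seed].append(seq)
                break
        else:
            # No existing seed is close enough, so start a new cluster.
            clusters[seq] = [seq]
    return clusters


# Hypothetical counts; the two variants are invented for illustration.
example_counts = {
    "SGVYKVAYDWQH": 500,
    "SGVYKVAYDWQN": 12,
    "AGVYKVAYDWQH": 7,
    "WPTDHQMLRIPM": 3,
}
for seed, members in cluster_by_count(example_counts).items():
    print(seed, len(members), members)
```

Here `len(members)` plays the role of the number of unique sequences in a cluster, which corresponds loosely to the # Exact Clusters column described above.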

Graphs

When viewing any of the tables, you can also switch to the Graphs tab in the Sequence Viewer panel, as shown below. Of particular interest is the Cluster Similarity Network plot, which lets you investigate the relationships between clusters of varying size or abundance. To view this, navigate to the Full Sequence cluster in the Cluster Table: dropdown and select the appropriate graph:

peptide anno cluster similarity network.png

Each node represents a cluster, and clusters that are more similar in terms of their sequence will be connected together on the network. The relative size of the nodes represents the # of Sequences.

You can learn more about this plot here: Network and Tree plots: Identifying clonotype and sequence relationships or learn more about our graphs here: Using Graphs to interpret Clusters and Clonotypes.
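For intuition about how such a network is put together, the Python sketch below builds an edge list between cluster representative sequences that differ at only a few positions. The similarity measure, the `max_mismatches` threshold, the function name, and the example data are all assumptions for illustration; the tutorial does not state how the Cluster Similarity Network decides which clusters to connect.

```python
from itertools import combinations


def similarity_edges(representatives, max_mismatches=2):
    """Return (seq_a, seq_b, distance) for pairs of similar representatives."""
    edges = []
    for a, b in combinations(representatives, 2):
        if len(a) != len(b):
            continue  # this sketch compares equal-length peptides only
        distance = sum(x != y for x, y in zip(a, b))
        if distance <= max_mismatches:
            edges.append((a, b, distance))
    return edges


# Hypothetical clusters: representative sequence -> number of sequences,
# which would set the node size in the plot.
node_sizes = {"SGVYKVAYDWQH": 519, "SGVYRVAYDWQN": 4, "WPTDHQMLRIPM": 3}
for a, b, distance in similarity_edges(list(node_sizes)):
    print(a, "<->", b, "mismatches:", distance)
```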

Comparing panning rounds

This dataset consists of four rounds of sequencing on a phage display library. To find enriched peptides by comparing the relative frequencies of peptide sequences within each panning round, we can use Compare Results. Exit the result document, select all four results in the main folder, then go to Post-processing and click Compare Results:

Screenshot 2024-02-12 at 5.12.54 PM.png

Select the following settings and click Run:

Filtering
- Filter out sequences where the sum of counts for all samples is lower than: 5
Normalization
- Method: Total count
Additional Clustering
- Group similar sequences across all samples: ON
- Method: Identity-based clustering
- Threshold: 100%
- Region: Full Sequence
Experiment
- Reference sample: pan1_SRR2050319 Annotated & Clustered
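The Filtering and Normalization settings above can be illustrated with a minimal Python sketch. It assumes that "Total count" normalization divides each raw count by the sample's total read count, and that the total is taken before filtering; both are assumptions rather than documented Compare Results behavior. The function name, sample names, and counts are hypothetical.

```python
def filter_and_normalize(counts, min_total=5):
    """Filter low-count sequences, then scale counts by each sample's total.

    counts maps sample name -> {sequence: raw count}.
    """
    sequences = set().union(*(c.keys() for c in counts.values()))
    # Keep sequences whose summed count across all samples reaches min_total.
    kept = {
        seq for seq in sequences
        if sum(c.get(seq, 0) for c in counts.values()) >= min_total
    }
    normalized = {}
    for sample, sample_counts in counts.items():
        total = sum(sample_counts.values()) or 1  # avoid division by zero
        normalized[sample] = {
            seq: sample_counts.get(seq, 0) / total for seq in kept
        }
    return normalized


# Hypothetical raw counts for two panning rounds.
raw = {
    "pan1": {"PEPTIDEA": 1, "PEPTIDEB": 9},
    "pan4": {"PEPTIDEA": 2, "PEPTIDEB": 40, "PEPTIDEC": 8},
}
for sample, freqs in filter_and_normalize(raw).items():
    print(sample, sorted(freqs.items()))
```

In this example PEPTIDEA is removed because its summed count across both samples (3) is lower than 5.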

To learn more about these settings and what they mean, see our main article Comparing Results across Multiple Experiments.

Viewing the comparison results

After the comparison result has finished running, opening the document will bring up the Summary Table. Navigate to the Full Sequence (100% Identity) table as shown below:

peptide tut comparison result1.png

A frequency plot is generated automatically, showing the top 10 peptide sequences by score and the rates at which they were found within each of the four samples. These graphs are interactive: hovering over the columns reveals that the sequence WPTDHQMLRIPM made up around 44% of pan four, while only making up 0.024% of pan one.

You can also make more complex scatterplot graphs. The image below shows a plot of the Normalized count versus the log2 fold change in sequences, after selecting Graph Type: Scatterplot from the left-hand drop-down.

peptide tut comparison result2.png

This graph is useful for determining which sequences were enriched relative to the first panning round and were also found at high counts in the last panning round.
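As a rough worked example of what a log2 fold change expresses, the Python sketch below compares the approximate frequencies quoted earlier for WPTDHQMLRIPM (around 44% in pan four versus 0.024% in pan one). The formula, the optional pseudocount, and the function name are assumptions for illustration; the exact calculation used by Compare Results may differ.

```python
import math


def log2_fold_change(sample_freq, reference_freq, pseudocount=0.0):
    """log2 of the sample frequency relative to the reference frequency.

    A small pseudocount can be supplied to avoid dividing by zero when a
    sequence is absent from the reference sample.
    """
    return math.log2((sample_freq + pseudocount) / (reference_freq + pseudocount))


# Approximate frequencies from the text: 44% in pan four, 0.024% in pan one.
lfc = log2_fold_change(0.44, 0.00024)
print(round(lfc, 2))  # about 10.8, i.e. roughly an 1800-fold enrichment
```

A positive value means the sequence is more frequent than in the reference round, so sequences in the upper right of such a scatterplot are both strongly enriched and abundant.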