In this tutorial, you will learn how to annotate and compare single-chain variable fragment (scFv) libraries generated from multiple rounds of biopanning followed by next-generation sequencing with the Ion Torrent Personal Genome Machine (PGM).
Note: A previous version of this tutorial using Antibody Annotator instead of NGS Antibody Annotator can be found here.
This tutorial will cover the following exercises: Sequence Annotation, Comparing Results, and Gene Filtering.
Get Started: To start this tutorial, you will need the input data. If you have recently started Geneious Biologics, your organization may already have the tutorial folders set up as described in the tutorial below. If not, you can still follow this tutorial by first downloading the input sequences here and then uploading them into Geneious Biologics. Note that the data and images used in this tutorial are obtained from this research article.
The videos in our Getting Started series may also be helpful, linked here. Below is our video on Using the Annotation Tools.
Sequence Annotation
The NGS Antibody Annotator identifies immunoglobulin framework regions, complementarity-determining regions and V(D)JC genes, and annotates input sequences against a selected reference database.
In this exercise, you will learn how to annotate scFv sequences from multiple biopanning libraries. The scFv libraries with an expected length of 800-900 bp were sequenced on the Ion Torrent system. To annotate scFv sequences, select the focused_library_trimer_pan0 document in the Input data folder and click Annotation > NGS Antibody Annotator (see image below).
Select the following options from the NGS Antibody Annotator dialog box and click Run to start the analysis (see sections and image below).
Main Options
Select the following options:
- Reference database: Human Ig
- Selected sequences are: Both chains in a single sequence with linker (scFv)
- Sequence region of interest is between: CDR2 and FR4 inclusive. This was chosen because the FR1 to FR2 region of the Heavy chain (the first chain in the scFv) was of poor quality in this run.
- Collapse sequences at least: 98% identical
Analysis Options
Leave all defaults (none selected)
Clustering Options
Leave all defaults
Advanced Options
Leave all defaults
Click Run to start the analysis. This operation will produce a focused_library_trimer_pan0 Annotated & Clustered Biologics Annotator Result document in the Sequence annotation folder.
Repeat this step with the other libraries (pan1-4). Ultimately, this will result in 5 individual Biologics Annotator Result documents (one per library).
Note that you will have to run each library individually. Running them all in one go analyzes all the libraries as a single document, so the library categorization is lost.
Comparing Results
Results comparison allows you to efficiently compare results from multiple experiments to identify differences between experiments and monitor clone enrichment. It can also be used to identify sequences that are shared across samples; see Comparing Results across Multiple Experiments.
In this exercise, you will compare the focused scFv libraries across 4 rounds of biopanning and visualize the results using graphs. To compare the libraries, select all of the previously annotated and clustered libraries in the Sequence annotation folder, and click Post-processing > Compare Results.
Filtering
Select the following option:
- Filter out sequences where the sum of counts for all samples is lower than: 5
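The effect of this filter can be sketched in a few lines of Python. This is only an illustration of the rule as worded above, not the tool's implementation; the function name, sample names and counts are all hypothetical.

```python
def filter_by_total_count(counts_by_sequence, min_total=5):
    """Keep sequences whose counts, summed over all samples, reach min_total.

    Sequences with a total lower than min_total are filtered out.
    """
    return {
        seq: counts
        for seq, counts in counts_by_sequence.items()
        if sum(counts.values()) >= min_total
    }


# Hypothetical per-sample counts for three sequences.
counts = {
    "SEQ_A": {"pan0": 1, "pan1": 1, "pan4": 1},   # total 3: filtered out
    "SEQ_B": {"pan0": 0, "pan1": 2, "pan4": 40},  # total 42: kept
    "SEQ_C": {"pan0": 5, "pan1": 0, "pan4": 0},   # total 5: kept
}

kept = filter_by_total_count(counts)
print(sorted(kept))  # ['SEQ_B', 'SEQ_C']
```

Note that a sequence seen only in one late panning round is kept as long as its total reaches the threshold, which is what makes this filter compatible with enrichment analysis.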
Additional Clustering
Select the following option:
- Group similar sequences across all samples
- Method: Similarity-based Clustering
- Threshold: 90%
- Region: Heavy CDR3
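To give an intuition for what a 90% similarity threshold on the Heavy CDR3 means, here is a minimal greedy clustering sketch in Python. The clustering algorithm used by Geneious Biologics is not described in this tutorial, so everything here is an assumption: similarity is simplified to position-by-position identity between equal-length sequences, and the second sequence in the example is a hypothetical one-residue variant.

```python
def identity(a, b):
    """Fraction of matching positions; different-length sequences score 0."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(sequences, threshold=0.90):
    """Assign each sequence to the first cluster whose representative is at
    least `threshold` identical; otherwise start a new cluster."""
    clusters = {}  # representative sequence -> list of member sequences
    for seq in sequences:
        for rep in clusters:
            if identity(seq, rep) >= threshold:
                clusters[rep].append(seq)
                break
        else:
            clusters[seq] = [seq]
    return clusters


hcdr3s = [
    "ATARRGQRIYGVVSFGEFFYYYYMDV",
    "ATARRGQRIYGVVSFGEFFYYYYMDI",  # hypothetical variant, 25/26 identical
    "ATARRGQRIYGVVSFGEFFVLLLHGR",  # 19/26 identical to the first sequence
    "CDSAPRTEDLWSGFIWRVLLLLLQDV",
]

clusters = greedy_cluster(hcdr3s)
for rep, members in clusters.items():
    print(rep, len(members))
```

With these inputs the first two sequences fall into one cluster and the other two each form their own, because 19/26 (about 73%) is below the 90% threshold.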
Experiment
Select the following option:
- Reference sample: focused_library_trimer_pan0 Annotated & Clustered
Click Run. This analysis will produce a single Biologics Comparison Result document in the parent folder, unless otherwise specified.
Viewing the results
Tracking changes in CDR3 lengths
To view the frequency of Heavy CDR3 lengths across all 5 libraries, open the comparison result and select Heavy CDR3 Length in the Cluster Table dropdown. Automatically, the top 10 clustered region lengths by score will be displayed in the Graphs tab below.
The frequency distributions of Heavy and Light CDR3 lengths showed an enrichment of Heavy CDR3s 26 amino acids long and Light CDR3s 12 amino acids long after multiple rounds of biopanning (Figure 1):
Figure 1 | Frequency of Heavy and Light CDR3 loop length across the libraries. The graphs on the left were generated in Geneious Biologics and the graphs on the right were taken from the research article.
Note: The loop length differences between the Geneious Biologics results and the research article may be due to the different annotation methods used. For the Geneious Biologics results, the sequences were annotated according to the IMGT annotation method.
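If you want to reproduce a length distribution outside the application (for example from exported sequences), a sketch along the following lines could be used. The function name and the example sequences are hypothetical; only the idea of counting CDR3 lengths per library comes from the text above.

```python
from collections import Counter


def length_frequencies(cdr3_sequences):
    """Return {length: fraction of sequences with that length}."""
    lengths = Counter(len(seq) for seq in cdr3_sequences)
    total = sum(lengths.values())
    return {length: n / total for length, n in sorted(lengths.items())}


# Hypothetical Heavy CDR3 amino acid sequences from one library.
library = ["ARDY", "ARGGY", "ATARRGQRIYGVVSFGEFFYYYYMDV", "ARDYW"]

print(length_frequencies(library))  # {4: 0.25, 5: 0.5, 26: 0.25}
```

Running this once per library and comparing the fractions for a given length gives the same kind of view as the frequency graph.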
Tracking sequence/clonotype enrichment via scatterplots
You can also determine specific HCDR3 sequences (or any other clustered region, e.g. VDJ) that were selected for over multiple panning rounds by generating scatterplots. In this case, we are defining our clonotypes by grouping sequences that were 90% similar across the HCDR3. To learn more about clonotypes see our main article Understanding Clonotypes or watch the video below:
In this example, we will generate both a volcano plot and a generic clone enrichment scatter plot.
First, go to the Cluster Table dropdown and select Heavy CDR3 (90% Similarity). Automatically, the Graphs tab below will be populated with a frequency graph of the top 10 Heavy CDR3 clusters:
Go to the Graph Type: dropdown highlighted in the image above and select "Scatterplot". Then choose the following for the X and Y axis:
- X-axis:
- Normalized count focused_library_trimer_pan4/focused_library_trimer_pan0
- Y-axis:
- Log2 Fold Change (FC) Norm. Count focused_library_trimer_pan4/focused_library_trimer_pan0
This produces the scatterplot below. Hover over any point to display the corresponding sequence. In this dataset, the HCDR3 (90% Similarity) clustered sequence ATARRGQRIYGVVSFGEFFYYYYMDV was found to be enriched in panning round 4 relative to panning round 0.
To generate a Volcano plot, select the following X and Y axis data in the dropdowns on the right:
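As a rough guide to reading the fold-change axis, the Python sketch below computes a log2 fold change from raw cluster counts. How Geneious Biologics normalizes counts is not stated in this tutorial, so the normalization used here (fraction of the sample total) and the small pseudocount are assumptions, and the counts are hypothetical.

```python
import math


def normalize(counts):
    """Scale raw cluster counts to fractions of the sample total."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cluster: n / total for cluster, n in counts.items()}


def log2_fold_change(test_counts, ref_counts, pseudo=1e-6):
    """log2(normalized test / normalized reference) for every cluster.

    `pseudo` is an assumed pseudocount that avoids division by zero when a
    cluster is absent from one of the samples.
    """
    test, ref = normalize(test_counts), normalize(ref_counts)
    return {
        cluster: math.log2(
            (test.get(cluster, 0.0) + pseudo) / (ref.get(cluster, 0.0) + pseudo)
        )
        for cluster in set(test) | set(ref)
    }


# Hypothetical raw counts in panning rounds 0 and 4.
pan0 = {"ATARRGQRIYGVVSFGEFFYYYYMDV": 10, "OTHER": 90}
pan4 = {"ATARRGQRIYGVVSFGEFFYYYYMDV": 80, "OTHER": 20}

fc = log2_fold_change(pan4, pan0)
print(round(fc["ATARRGQRIYGVVSFGEFFYYYYMDV"], 2))  # 3.0
```

A value of 3 means the cluster's share of the library grew roughly eightfold between the two rounds; negative values indicate depletion.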
- X-axis:
- Log2 Fold Change (FC) Norm. Count focused_library_trimer_pan4/focused_library_trimer_pan0
- Y-axis:
- -Log10 P-Value
As can be seen below, the same HCDR3 (90% Similarity) sequence cluster ATARRGQRIYGVVSFGEFFYYYYMDV was once again identified as being enriched between pans 0 and 4. The other potentially interesting HCDR3 sequence clusters are ATARRGQRIYGVVSFGEFFVLLLHGR, CDSAPRTEDLWSGFIWRVLLLLLQDV, and ATARRGQRIYGVVSFGEVLLLLLQDV.
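The two volcano plot axes can be illustrated with a short Python sketch that turns a fold change and a p-value into plot coordinates and flags candidate clusters. The statistical test behind the P-Value column is not described here, so the p-values below are hypothetical inputs, and the cut-offs (`min_log2_fc`, `max_p`) are illustrative assumptions rather than settings from the application.

```python
import math


def volcano_points(stats, min_log2_fc=1.0, max_p=0.05):
    """Convert {cluster: (log2_fc, p_value)} into volcano plot points.

    Returns a list of (cluster, x, y, enriched) where x is the log2 fold
    change and y is -log10 of the p-value.
    """
    points = []
    for cluster, (log2_fc, p_value) in stats.items():
        # Clamp to avoid a math domain error if a p-value is reported as 0.
        y = -math.log10(max(p_value, 1e-300))
        enriched = log2_fc >= min_log2_fc and p_value <= max_p
        points.append((cluster, log2_fc, y, enriched))
    return points


# Hypothetical statistics for two clusters.
stats = {
    "cluster_1": (3.0, 0.001),  # strongly enriched, small p-value
    "cluster_2": (0.2, 0.5),    # little change, not significant
}

points = volcano_points(stats)
for cluster, x, y, enriched in points:
    print(cluster, x, round(y, 2), enriched)
```

Clusters of interest sit in the upper right of the plot: a large positive X value (enrichment) combined with a large Y value (small p-value).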
Exporting Graphs
Any of the graphs above can be exported by clicking the Export button next to the Graph Type: dropdown. Graphs can be exported either as an image (.png) or as a table (.csv) from which the data points can be recreated.
Viewing and Exporting Sequences
To view the sequences within any of these clusters, select the relevant rows in the Comparison Table above and switch to the Sequence Viewer tab next to the Graphs tab. For example, the filter below finds the enriched cluster "ATARRGQRIYGVVSFGEFFYYYYMDV":
['Heavy CDR3 (90% Similarity)'] = 'ATARRGQRIYGVVSFGEFFYYYYMDV'
Like any other Biologics Result Document, selected sequences or clusters can be exported by going to Export/Extract > Export Sequences.
To learn more about filtering and filter syntax, see Filtering your Sequences.
As most NGS data comprises a large number of reads, comparing the reads from one library to another can be tedious. Cluster filtering coupled with visual aids such as graphs can help you rapidly identify trends across multiple experiments.
Gene Filtering
In this exercise, you will learn how to filter on clusters from multiple experiments. First, select the Biologics Comparison Result document and select Heavy V Gene from the Cluster Table: dropdown. Use the filter expressions below for the Heavy and Light chains respectively to restrict the results to the V genes used in the research article:
['Heavy V Gene'] IN ('IGHV4-28', 'IGHV4-30-2', 'IGHV4-31', 'IGHV4-34', 'IGHV4-39', 'IGHV4-4', 'IGHV4-59', 'IGHV4-61', 'IGHV4-38-2')
['Light V Gene'] IN ('IGLV3-1', 'IGLV3-10', 'IGLV3-12', 'IGLV3-16', 'IGLV3-19', 'IGLV3-21', 'IGLV3-22', 'IGLV3-25', 'IGLV3-27', 'IGLV3-32', 'IGLV3-9')
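When the gene list is long, it can be convenient to generate the filter expression rather than type it by hand. The Python helper below is a hypothetical convenience, not part of Geneious Biologics; it only reproduces the `['Column'] IN ('a', 'b')` pattern shown above and does not attempt to escape quote characters inside values.

```python
def build_in_filter(column, values):
    """Build a filter expression of the form ['Column'] IN ('a', 'b', ...)."""
    quoted = ", ".join(f"'{value}'" for value in values)
    return f"['{column}'] IN ({quoted})"


heavy_v_genes = [
    "IGHV4-28", "IGHV4-30-2", "IGHV4-31", "IGHV4-34", "IGHV4-39",
    "IGHV4-4", "IGHV4-59", "IGHV4-61", "IGHV4-38-2",
]

# Paste the printed expression into the filter box.
print(build_in_filter("Heavy V Gene", heavy_v_genes))
```

The printed string for `heavy_v_genes` is the same as the Heavy chain filter given above.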
Upon filtering, a frequency graph of the cluster distribution across the 5 libraries will automatically populate in the below viewer.
The heavy and light V-gene cluster frequency distribution graphs generated in Geneious Biologics showed an enrichment in the usage of the IGHV4-59 and IGLV3-21 genes, which matches the results shown in the research article (Figure 2).
Figure 2 | Frequency of Heavy and Light V germline gene usage across the libraries. The graphs on the left were generated in Geneious Biologics and the graphs on the right were taken from the research article.
*IGHV4-38-2 in Geneious Biologics’ bundled human immunoglobulin reference database is equivalent to IGHV4-b.
**Note that you can use advanced scripts for rapid cluster comparisons. See Filtering your Sequences to learn more.
Reference Publication
Hidden Lineage Complexity of Glycan-Dependent HIV-1 Broadly Neutralizing Antibodies Uncovered by Digital Panning and Native-Like gp140 Trimer. Frontiers in Immunology (2017). https://www.frontiersin.org/journals/immunology/articles/10.3389/fimmu.2017.01025/full