NGS Tutorial 2. Sample Comparisons (Version 1)

Updated October 31, 2023 01:28

In this tutorial, you will learn how to annotate and compare single-chain variable fragment (scFv) libraries generated from multiple rounds of biopanning followed by next-generation sequencing with the Ion Torrent Personal Genome Machine (PGM).

This tutorial will cover the following exercises:

Sequence Annotation
Comparing Results
Cluster/Gene Filtering

Get Started: To start this tutorial, you will need the input data. If you have recently started using Geneious Biologics, your organization may already have the tutorial folders set up as described below. If not, you can still follow this tutorial by first downloading the input sequences here and then uploading them into Geneious Biologics. The data and images used in this tutorial were obtained from this research article.

Note: There is a new version of this tutorial here: NGS Tutorial 2. Monitoring Clonal Expansion

The videos in our Getting Started series may also be helpful, linked here. Below is our video on Using the Antibody Annotator Tool.

Sequence Annotation

The Antibody Annotator identifies immunoglobulin framework regions, complementarity-determining regions (CDRs), and V(D)JC genes, and annotates input sequences against a selected reference database.

In this exercise, you will learn how to annotate scFv sequences from multiple biopanning libraries. The scFv libraries with an expected length of 800-900 bp were sequenced on the Ion Torrent system. To annotate scFv sequences, select the focused_library_trimer_pan0 document in the Input data folder and click Annotation > Antibody Annotator (see image below).

Select the following options from the Antibody Annotator dialog box and click Run to start the analysis (see sections and image below).

Input Options

Select the following options:

Reference database: Human Ig
Selected sequences are: Both chains in a single sequence with linker (scFv)

Analysis Options

Select the following option:

Include pseudo genes from database

This operation will produce a focused_library_trimer_pan0 Annotated & Clustered document. Repeat this step with the other libraries (pan1-4). Ultimately, this will result in 5 individual Biologics Annotator Result documents (one per library).

**Note that you must run each library individually. Running them all at once analyzes the libraries as a single document, so the per-library categorization is lost. Read more about the Antibody Annotator here.

Comparing Results

Results comparison allows you to efficiently compare results from multiple experiments to identify differences between experiments and detect trends across experiments.

In this exercise, you will compare the focused scFv libraries across 4 rounds of biopanning and visualize the results using graphs. To compare the libraries, select all of the previously annotated and clustered libraries in the Sequence annotation folder, and click Post-processing > Compare Results.

Select the following options from the Comparison dialog box (see sections and image below) and click Run to start the analysis.

Filtering

Select the following option:

Filter out sequences where the sum of counts for all samples is lower than: 5

Additional Clustering

Select the following option:

Group similar sequences across all samples

Experiment

Select the following option:

Reference sample: focused_library_trimer_pan0 Annotated & Clustered

This analysis will produce a single Biologics Comparison Result document. Read more about results comparison and data normalization here.
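To make the count threshold in the Filtering option concrete, here is a minimal Python sketch of the rule as worded in the dialog. It assumes that a row is dropped when its counts summed over all samples fall below the threshold. The cluster names, counts, and function name are made up for illustration; this is not Geneious Biologics code.

```python
# Hypothetical cluster counts per library (illustrative numbers only).
counts = {
    "cluster_A": {"pan0": 1, "pan1": 0, "pan2": 2, "pan3": 0, "pan4": 1},
    "cluster_B": {"pan0": 3, "pan1": 5, "pan2": 9, "pan3": 20, "pan4": 41},
    "cluster_C": {"pan0": 5, "pan1": 0, "pan2": 0, "pan3": 0, "pan4": 0},
}


def filter_by_total_count(table, threshold=5):
    """Keep rows whose counts summed over all samples are at least `threshold`."""
    return {
        name: row
        for name, row in table.items()
        if sum(row.values()) >= threshold
    }


kept = filter_by_total_count(counts, threshold=5)
print(sorted(kept))
```

In this sketch, cluster_A (total 4) is removed, while cluster_C (total exactly 5) is kept because only totals lower than 5 are filtered out.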

To view the frequency of Heavy CDR3 length across all 5 libraries, first select Heavy CDR3 length in the Cluster Table dropdown, then select all of the clusters in the Comparison Table by clicking the first checkbox, and finally click the Frequency graph tab.

The frequency distributions of Heavy and Light CDR3 length show an enrichment of Heavy CDR3s that are 26 amino acids long and Light CDR3s that are 12 amino acids long after multiple rounds of biopanning (Figure 1).

Figure 1 | Frequency of Heavy and Light CDR3 loop length across the libraries. The graphs on the left were generated in Geneious Biologics and the graphs on the right were taken from the research article.

*The loop length differences between the Geneious Biologics results and the research article may be attributable to the different annotation methods. For the Geneious Biologics results, the sequences were annotated according to the IMGT annotation method.
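As a rough illustration of what a per-library frequency of CDR3 length represents, the Python sketch below converts counts into fractions within each library. It assumes the frequency is simply a length's count divided by the library's total count, which may differ from the normalization Geneious Biologics applies. The input rows and function name are hypothetical.

```python
from collections import defaultdict


def length_frequencies(rows):
    """rows: iterable of (library, cdr3_length, count) tuples.

    Returns {library: {cdr3_length: fraction of that library's total count}}.
    """
    totals = defaultdict(int)
    by_length = defaultdict(lambda: defaultdict(int))
    for library, length, count in rows:
        totals[library] += count
        by_length[library][length] += count
    return {
        library: {length: n / totals[library] for length, n in lengths.items()}
        for library, lengths in by_length.items()
    }


# Hypothetical counts, not taken from the tutorial data.
rows = [
    ("pan0", 12, 50),
    ("pan0", 26, 50),
    ("pan4", 12, 10),
    ("pan4", 26, 90),
]
freqs = length_frequencies(rows)
print(freqs["pan0"][26], freqs["pan4"][26])
```

Comparing the fraction for one length between the first and last library is one simple way to express enrichment numerically.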

Cluster/Gene Filtering

Because most NGS data comprise a large number of reads, comparing the reads from one library to another can be tedious. Cluster filtering, coupled with visual aids such as graphs, can help you rapidly identify trends across multiple experiments.

In this exercise, you will learn how to filter clusters from multiple experiments. First, select the Biologics Comparison Result document and select Heavy V Gene from the Cluster Table: dropdown. Then use the filter expressions below for the Heavy and Light chains respectively to restrict the results to the V genes used in the research article.

['Heavy V Gene'] IN ('IGHV4-28', 'IGHV4-30-2', 'IGHV4-31', 'IGHV4-34', 'IGHV4-39', 'IGHV4-4', 'IGHV4-59', 'IGHV4-61', 'IGHV4-38-2')

['Light V Gene'] IN ('IGLV3-1', 'IGLV3-10', 'IGLV3-12', 'IGLV3-16', 'IGLV3-19', 'IGLV3-21', 'IGLV3-22', 'IGLV3-25', 'IGLV3-27', 'IGLV3-32', 'IGLV3-9')

Upon filtering, select all of the clusters and click Frequency graph to view the graph of the cluster distribution across the 5 libraries.
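For readers who want to reproduce the same selection outside the application, the IN filter above can be thought of as a set-membership test. The Python sketch below applies the Heavy V Gene list from this tutorial to a few hypothetical rows. It assumes exact string matching on the gene name, and the function name is made up for illustration.

```python
# V genes copied from the Heavy V Gene filter expression in this tutorial.
HEAVY_V_GENES = {
    "IGHV4-28", "IGHV4-30-2", "IGHV4-31", "IGHV4-34", "IGHV4-39",
    "IGHV4-4", "IGHV4-59", "IGHV4-61", "IGHV4-38-2",
}


def in_filter(rows, column, allowed):
    """Return the rows whose value in `column` is one of `allowed` (exact match)."""
    return [row for row in rows if row.get(column) in allowed]


# Hypothetical rows standing in for a cluster table.
rows = [
    {"Heavy V Gene": "IGHV4-59"},
    {"Heavy V Gene": "IGHV3-23"},
    {"Heavy V Gene": "IGHV4-38-2"},
]
selected = in_filter(rows, "Heavy V Gene", HEAVY_V_GENES)
print([row["Heavy V Gene"] for row in selected])
```

The same function works for the Light V Gene list by passing a different column name and set.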

The heavy and light V-gene cluster frequency distribution graphs generated in Geneious Biologics show an enrichment in the usage of the IGHV4-59 and IGLV3-21 genes, which matches the results shown in the research article (Figure 2).

Figure 2 | Frequency of Heavy and Light V germline gene usage across the libraries. The graphs on the left were generated in Geneious Biologics and the graphs on the right were taken from the research article.

*IGHV4-38-2 in Geneious Biologics’ bundled human immunoglobulin reference database is equivalent to IGHV4-b.

**Note that you can use advanced scripts for rapid cluster comparisons. Learn more about filtering and cluster comparison here.

The comparison result tables can be exported for further downstream analyses. To export the Heavy V Gene cluster table, select one or more clusters/rows and click Export. You can export the table as an Excel or CSV file. To learn more about exporting your results, see Exporting Annotated Sequences and Sequence Tables.
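As one possible starting point for downstream analysis, the Python sketch below reads a CSV table into a dictionary using only the standard library. The layout of a real export may differ: the count column names (pan0, pan4) and the values here are hypothetical, and an in-memory string stands in for the exported file.

```python
import csv
import io

# Stand-in for an exported CSV file; count column names and values are hypothetical.
exported = io.StringIO(
    "Heavy V Gene,pan0,pan4\n"
    "IGHV4-59,12,340\n"
    "IGHV4-34,30,25\n"
)


def read_cluster_table(handle, key_column):
    """Read a CSV table into {key: {sample: count}} using the given key column."""
    table = {}
    for row in csv.DictReader(handle):
        key = row.pop(key_column)
        table[key] = {sample: int(value) for sample, value in row.items()}
    return table


table = read_cluster_table(exported, "Heavy V Gene")
print(table["IGHV4-59"])
```

For a file on disk, the handle would come from `open(path, newline="")` instead of `io.StringIO`, and any non-numeric columns would need to be handled before the `int` conversion.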