In this tutorial, you will learn how to use Single Clone Analysis to annotate NGS sequences that have already been partially processed and demultiplexed. You may have been provided with sequence lists that are already grouped by barcode, and want sequences that share a barcode to be associated only within each sequence list, not across lists.
This tutorial can also be used to generate a single annotation document from multiple sequence lists that have already had their UMIs and Barcodes processed according to NGS Tutorial 3. Using Barcodes and UMIs.
Get started: To start this tutorial, you will need the input data. If you have recently started Geneious Biologics, your organisation may already have the tutorial folders set up as described in the tutorial below. If not, you can still follow this tutorial by first downloading the input sequences here and then uploading them into Geneious Biologics. See here for more information about where the data was sourced.
Note: The Single Clone Analysis functionality is available as an add-on. If your organisation does not have Barcode Separation, UMI Collapse, or Single Clone Analysis pipelines, please contact us to try them out.
This tutorial will cover the following exercises:
- Setting up the Single Clone Antibody Analysis run on multiple sequence lists
- Understanding the results from Single Clone Antibody Analysis
Setting up the Single Clone Antibody Analysis run
The input data for this tutorial is two sequence lists, each containing sequences assigned to one of four barcodes. The two sequencing runs used the same 4 barcodes, but we do not want to associate sequences with the same barcode from both lists together. Instead, we want each barcode to be associated within each list separately, before carrying out Single Clone Analysis. When Single Clone Analysis is run on multiple lists, barcodes are not compared between lists, only within lists.
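The per-list barcode behaviour described above can be illustrated with a small sketch. This is not how Geneious Biologics is implemented; it only shows the idea of keying groups on both the list name and the barcode. The record layout, the read IDs, and the function name `group_within_lists` are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (sequence list name, barcode, read ID).
# The same barcode label appears in both lists.
records = [
    ("SRR1056423", "barcode1", "read_a"),
    ("SRR1056423", "barcode1", "read_b"),
    ("SRR1056424", "barcode1", "read_c"),
    ("SRR1056424", "barcode2", "read_d"),
]


def group_within_lists(records):
    """Group reads by (list name, barcode) so that identical barcodes
    from different lists are never merged into one group."""
    groups = defaultdict(list)
    for list_name, barcode, read_id in records:
        groups[(list_name, barcode)].append(read_id)
    return dict(groups)


groups = group_within_lists(records)
for key, reads in sorted(groups.items()):
    print(key, reads)
```

Because the key is the pair (list name, barcode), "barcode1" from SRR1056423 and "barcode1" from SRR1056424 end up in separate groups.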
Select the two files in the Input folder and click Annotation > Single Clone Antibody Analysis (see image below).
This brings up the dialogue box below. Its settings need to be adjusted depending on the technology used to generate the Single Clone sequences, the expected sequence count per clone, and the desired outcome. To find out more about what these options mean, click here. For this tutorial, select the following options in the Single Clone Antibody Analysis dialogue box and click Run to start the analysis.
- De novo assembly required
- Keep unmerged reads
- Antibody annotator database: Human Ig 2022
- Annotate liabilities
- Associate significant dominant heavy and light pair
- Annotate germline differences
- Combine regions at least: 97% identical
- Significant regions have at least: 1% of the read count of the cell
- Significant regions have at least: 20 reads
- Significant regions have at least: 20% of the read count of the dominant same chain region
- Only keep regions with at least: 5% of the read count of the dominant same chain region
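To get a feel for how the read-count thresholds above interact, here is a minimal sketch. It assumes that a region must meet all three "Significant regions have at least" criteria at once to count as significant, and that the "Only keep regions" threshold is applied first; the tutorial does not state how the pipeline combines these settings, so treat this as an illustration only. The function name `classify_region` and its parameters are hypothetical.

```python
def classify_region(region_reads, cell_reads, dominant_reads,
                    min_cell_pct=1, min_reads=20,
                    min_dominant_pct=20, keep_dominant_pct=5):
    """Classify one region by its read count.

    region_reads   -- reads supporting this region
    cell_reads     -- total reads for the cell (barcode)
    dominant_reads -- reads of the dominant region of the same chain

    ASSUMPTION: all three significance criteria must hold together.
    """
    # Integer arithmetic avoids floating-point surprises at the boundaries.
    if region_reads * 100 < keep_dominant_pct * dominant_reads:
        return "discarded"
    significant = (
        region_reads * 100 >= min_cell_pct * cell_reads
        and region_reads >= min_reads
        and region_reads * 100 >= min_dominant_pct * dominant_reads
    )
    return "significant" if significant else "kept"


# Hypothetical cell with 1000 reads and a dominant region of 100 reads.
print(classify_region(100, 1000, 100))
print(classify_region(10, 1000, 100))
print(classify_region(3, 1000, 100))
```

With these hypothetical counts, the 100-read region passes every threshold, the 10-read region is kept but falls short of the 20-read minimum, and the 3-read region falls below 5% of the dominant region.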
Once the operation is completed (~1 hour), a new document will be generated containing all Single Clone results for the two sequence lists. The top panel shows the Name and Description of this file. In the Description, you can quickly see how many chains have been found in total and their type (heavy or light). In this case, a total of 237 Heavy chains were found (including 27 that were significant) and 236 Light chains were found (with 20 being significant).
Understanding the results from Single Clone Antibody Analysis
A useful first pass to explore your data is to go to the Cluster Table dropdown and select Chain Combinations (1). This brings up the combinations of each heavy and light chain assigned to the same barcode (remember that the barcodes from different lists are not grouped together).
As seen in the image above with the combinations ordered by heavy chain ranking, the two samples (SRR1056423 and SRR1056424) and their sets of barcodes are kept separate (2). Within barcode 4 of Sample SRR1056423, the top ranked heavy chain (Heavy-1 indicates the most prevalent heavy chain) contains a frame shift in the CDR2 region (3).
We can also see the diversity of light chains that pair with the top ranked heavy chain assigned to barcode 2 in the SRR1056424 data set (4, also highlighted in blue). The dominant Heavy-1 chain pairs with 4 different light chains (ranked 1-4) because they are all grouped under the same barcode. The column Light% of Dominant Same Chain shows the number of sequences for each of these light chains as a percentage of the number for the top ranked light chain. Therefore, Light-1 is shown as 100%, while the number of Light-2 sequences found in barcode 2 is 43.12% of the number of Light-1 sequences.
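The arithmetic behind the Light% of Dominant Same Chain column can be sketched as follows. The read counts are hypothetical and were chosen only so that the result matches the 43.12% shown in the table; the real counts are not given in this tutorial. The function name `percent_of_dominant` is also hypothetical.

```python
def percent_of_dominant(read_counts):
    """Express each chain's read count as a percentage of the
    most prevalent chain of the same type within one barcode."""
    dominant = max(read_counts.values())
    return {
        name: round(100 * count / dominant, 2)
        for name, count in read_counts.items()
    }


# Hypothetical light chain counts for a single barcode.
light_counts = {"Light-1": 109, "Light-2": 47}
percentages = percent_of_dominant(light_counts)
print(percentages)  # {'Light-1': 100.0, 'Light-2': 43.12}
```

The top ranked chain is always 100% by construction, and every other chain of the same type is scaled against it.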
If any of these sequences in the Chain Combinations dropdown table is selected, you can view the two chains (Heavy and Light) in the Sequence Viewer:
The view above displays the sequence of Heavy-1 (the top ranked heavy chain for barcode 3 in the SRR1056423 dataset) paired with Light-2 (the second ranked light chain for the same barcode and dataset). The number of Light-2 sequences is 72.42% of the number of sequences for the top ranked light chain in this barcode and dataset.
The other tools in Biologics provide a starting point for further analysis:
- Single Clone Analysis produces clusters, just like Antibody Annotator. To learn more about clusters, see this article: Understanding "clusters"
- You can add your Assay Data (ELISA values etc.) to further inform your results: Adding Assay Data to your analysis results
- Filtering is a very powerful tool that allows you to pull out sequences that meet certain metrics you specify: Filtering your sequences
- You can align sequences to compare the amino acid diversity across a region or multiple regions: Sequence alignment
The example included here is a reduced and modified dataset from the following paper with publicly available data: