In this tutorial you will learn how to use a Name Scheme to both pair chains and assemble forward and reverse reads concurrently. Note: this was previously Sanger Tutorial 3.
This tutorial will cover the following exercises:
- Making a Name Scheme
- Batch Assemble Sanger Sequences
- Sequence Annotation
- Further Analysis
Get started: To start this tutorial, you will need the input data. If you have recently started Geneious Biologics, your organization may already have the tutorial folders set up as described in the tutorial below. If not, you can still follow this tutorial by first downloading the input sequences here and then uploading them into Geneious Biologics.
The first few videos in our Getting Started series may also be helpful, linked here. Below is our video on making the most out of your sequence names:
Making a Name Scheme
All the data used in this tutorial has the following naming convention:
Chain_Sample_SequencingRun.ab1
For example, the sequence VH_Donor1_run1.ab1 has the following parts, separated by underscores:
- Chain = VH
- Sample = Donor1
- SequencingRun = run1
These "parts" can be used to tell Biologics how to designate and handle your sequences in downstream analysis once assigned. To do this, we will need to make a Name Scheme for this dataset.
Select Name Schemes under Organization Settings in the left-hand Navigation bar:
Create a new Name Scheme by clicking New in the top left-hand corner and entering the following:
Next, we specify what is used to break up the different "parts". This is called a Delimiter; in this dataset, the delimiters are an underscore (_) and a period (.).
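To make the role of delimiters concrete, here is a minimal Python sketch of how a sequence name breaks into parts when split on an underscore and a period. It is only an illustration of the idea, not how Biologics parses names internally, and the function name `split_name` is hypothetical.

```python
import re

def split_name(name, delimiters=("_", ".")):
    """Split a sequence name into parts on any of the given delimiters.

    Hypothetical helper for illustration only; not part of Biologics.
    """
    # Escape each delimiter so that "." is treated literally, not as a regex wildcard.
    pattern = "|".join(re.escape(d) for d in delimiters)
    return re.split(pattern, name)

parts = split_name("VH_Donor1_run1.ab1")
print(parts)  # ['VH', 'Donor1', 'run1', 'ab1']
```

The first three parts correspond to Chain, Sample and SequencingRun; the last part is the file type.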
Following this, we tell Biologics what kind of information is contained in each "part". In Biologics, these parts are referred to as Fields. Select New under Define Fields to define a field.
A pop-up will appear in which you designate the field as being of a specific type. The following image shows you how to designate the field of the sequence name that contains Chain information.
Click Create to define the first field of your Name Scheme. Repeat the above step for the two remaining fields as follows:
1. Donor1
- Name = Sample
- Field Type = Common Identifier
- Select Name Part = Donor1
2. run1
- Name = Sequencing run
- Field Type = Direction
- This often refers to forward and reverse reads
- Select Name Part = run1
Note: It is not necessary to specify a field type for the .ab1 field (the file type). Biologics will ignore any undesignated name fields.
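The field definitions above can be pictured as a mapping from each position in the split name to a field name and field type. The sketch below is a hypothetical Python representation of that idea, not the format Biologics uses to store a Name Scheme. The type label given to the Chain field is an assumption, and `FIELDS` and `apply_name_scheme` are illustrative names.

```python
import re

# Hypothetical model of the Name Scheme: position in the split name
# mapped to (field name, field type). The "Chain" type label is assumed.
FIELDS = {
    0: ("Chain", "Chain"),
    1: ("Sample", "Common Identifier"),
    2: ("Sequencing run", "Direction"),
}

def apply_name_scheme(name):
    """Label each designated part of a sequence name with its field."""
    parts = re.split(r"[_.]", name)  # delimiters: underscore and period
    return {
        field_name: {"type": field_type, "value": parts[index]}
        for index, (field_name, field_type) in FIELDS.items()
        if index < len(parts)
    }

fields = apply_name_scheme("VH_Donor1_run1.ab1")
for field_name, info in fields.items():
    print(field_name, info["type"], info["value"])
```

The fourth part (ab1) has no entry in the mapping, so it is simply left out, mirroring how undesignated name fields are ignored.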
Check that the Name Scheme looks correct as below and click Create to finish making the Name Scheme. If you'd like to learn more about Name Schemes, see this article.
Batch Assemble Sanger Sequences
The Sanger sequences in this tutorial have already been trimmed, and so we can proceed straight to assembly.
Note: We strongly recommend using Pre-Processing > Trim Ends on raw Sanger Sequences.
Select all the sequences and go to Pre-Processing > Batch Assemble Sanger Sequences and use the following settings:
- Batch by Name Scheme
  - Choose the Name Scheme you just made from the drop-down menu (VH_Donor1_run1.ab1)
- Consensus: call Sanger heterozygotes > 50%
- Save list of unused reads
- Output consensus sequences as list
After running, Batch Assembly will generate a single document of 6 Consensus Sequences: one each for the heavy and light chain of each of the three "donors". There were no unused reads in this case (unused reads are reads that could not be assembled due to insufficient sequence overlap).
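To see why six consensus sequences result, it can help to sketch the batching step in Python. The example below groups reads that share the same Chain and Sample, so that the reads differing only in the Direction field end up in one batch. The file names are hypothetical (in particular the light chain label VL and the run names run1 and run2 are assumptions), and this shows only the grouping, not the assembly itself.

```python
from collections import defaultdict
import re

# Hypothetical file names following Chain_Sample_SequencingRun.ab1.
# "VL", "run2", "Donor2" and "Donor3" are assumed labels for illustration.
reads = [
    f"{chain}_{donor}_{run}.ab1"
    for chain in ("VH", "VL")
    for donor in ("Donor1", "Donor2", "Donor3")
    for run in ("run1", "run2")
]

def batch_by_name_scheme(names):
    """Group reads that share Chain and Sample; the runs are assembled together."""
    groups = defaultdict(list)
    for name in names:
        chain, sample, _run, _ext = re.split(r"[_.]", name)
        groups[(chain, sample)].append(name)
    return dict(groups)

batches = batch_by_name_scheme(reads)
print(len(reads), "reads ->", len(batches), "batches")  # 12 reads -> 6 batches
```

Each batch would yield one consensus sequence: two chains for each of three donors gives six.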
Sequence Annotation
Because we have batch-assembled with a Name Scheme containing a Chain field, the information needed to pair Heavy and Light chains is already attached to these sequences. This Assembly Consensus Sequences document can therefore be annotated in Antibody Annotator without the need to run the Pair Heavy/Light Chains pipeline under Pre-Processing, as is done in Sanger Tutorial 2.
Run Antibody Annotator with the following settings:
Main Options
- Reference database: Human Ig
- Selected sequences are: Both chains in associated sequences
- The Name Scheme is already automatically selected
Hint: If you are expecting some heavy chains to pair with secondary light chains, you can toggle the option for "If there are three or more sequences in a pair:" to "Show all possible Heavy/Light combinations".
Analysis Options
- Annotate germline gene differences
- Find liabilities and assets
Click Run to start the analysis.
This will produce a Biologics Annotator Result document in which the Light and Heavy chains are automatically paired for each of the three donors. This is indicated by the Chain column, where the paired sequences are labelled Both.
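The pairing logic can be sketched as follows: sequences that share the same Common Identifier (here, the Sample) are collected together, and each collection holds one entry per Chain. This is a simplified Python illustration under assumed names (the consensus names, the VL label and the `pair_chains` helper are all hypothetical), not a description of how Antibody Annotator is implemented.

```python
from collections import defaultdict

# Hypothetical consensus names of the form Chain_Sample, for illustration only.
consensus_names = [
    f"{chain}_{donor}"
    for donor in ("Donor1", "Donor2", "Donor3")
    for chain in ("VH", "VL")
]

def pair_chains(names):
    """Group sequences by their Common Identifier, keyed by Chain."""
    pairs = defaultdict(dict)
    for name in names:
        chain, sample = name.split("_", 1)
        pairs[sample][chain] = name
    return dict(pairs)

paired = pair_chains(consensus_names)
for sample, chains in paired.items():
    print(sample, sorted(chains))
```

With six consensus sequences from three donors, this grouping gives three pairs, each containing one heavy and one light chain.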
Further Analysis
The other tools in Biologics provide a starting point for further analysis.
- Antibody Annotator produces clusters by default. To learn more about clusters, see this article: What is a "cluster"?
- Filtering is a very powerful tool that allows you to pull out sequences that meet certain metrics you specify: Filtering your sequences
- Extract and Re-cluster to take a subset of sequences out of an existing Biologics Annotator Result Document and make a new document with re-calculated clusters
- Compare two or more Annotation Result Documents from separate experiments to monitor clonal expansion etc.
- You can add your Assay Data (ELISA values etc.) to further inform your results: Adding Assay Data to your analysis results
- You can align sequences to compare the amino acid diversity across a region or multiple regions: Sequence alignment
- Edit your Sequences to perform point mutations that might increase developability