In the previous two tutorials, chain pairing and Sanger sequence assembly were performed in separate steps. In this tutorial you will learn how to use a Name Scheme to do both concurrently.
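This tutorial covers making a Name Scheme, using it to batch assemble Sanger sequences, and annotating the resulting paired consensus sequences with Antibody Annotator.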
Get started: To start this tutorial, you will need the input data. If your organization has recently started using Geneious Biologics, it may already have the tutorial folders set up as described in the tutorial below. If not, you can still follow this tutorial by first downloading the input sequences here and then uploading them into Geneious Biologics.
Making a Name Scheme
All the data used in this tutorial follows the same naming convention, for example VH_Donor1_run1.ab1.
This name can be split into the following parts, separated by underscores:
- Chain = VH
- Sample = Donor1
- SequencingRun = run1
Once each "part" is assigned a meaning, Biologics can use it to designate and handle your sequences in downstream analysis. To assign these meanings, we need to make a Name Scheme for this dataset.
Select Name Schemes under Organization Settings in the left-hand navigation bar.
Create a new Name Scheme by clicking New in the top left-hand corner and entering the following:
Next, specify the characters that separate the different "parts". These are called delimiters; in this dataset, the delimiters are the underscore (_) and the period (.).
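The effect of the two delimiters can be illustrated with a short Python sketch that splits an example name into its parts. This only illustrates the idea and is not how Geneious Biologics implements Name Schemes; the function `split_name` is hypothetical.

```python
import re

# Delimiters used in this dataset: underscore and period.
DELIMITERS = "_."

def split_name(name: str, delimiters: str = DELIMITERS) -> list[str]:
    """Split a sequence name into its parts at any of the delimiter characters."""
    # re.escape makes each delimiter a literal inside the character class.
    pattern = "[" + re.escape(delimiters) + "]"
    return re.split(pattern, name)

print(split_name("VH_Donor1_run1.ab1"))  # ['VH', 'Donor1', 'run1', 'ab1']
```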
Next, tell Biologics what kind of information each "part" contains. In Biologics, these parts are called Fields. Click New under Define Fields to open a pop-up where you can assign a field a specific type. The following image shows how to define the field of the sequence name that contains the Chain information.
Click Create to define the first field of your Name Scheme. Repeat this step for the two remaining fields:
- Sample field:
  - Name = Sample
  - Field Type = Common Identifier
  - Select Name Part = Donor1
- Sequencing run field:
  - Name = Sequencing run
  - Field Type = Direction (this often refers to forward and reverse reads)
  - Select Name Part = run1
Note: It is not necessary to specify a field type for the .ab1 part (the file extension). Biologics ignores any name parts that are not assigned to a field.
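Once the fields are defined, each part of a sequence name is read as the value of a field. The Python sketch below illustrates this mapping, assuming (for illustration only) that each field is tied to the position of a name part, as selected from the example name. `FIELDS` and `parse_name` are hypothetical names, not part of Biologics; note how the unassigned `ab1` part is simply dropped.

```python
import re

# Hypothetical field definitions mirroring this tutorial's Name Scheme.
# Each field is tied to the 0-based position of a name part:
# position -> (field name, field type).
FIELDS = {
    0: ("Chain", "Chain"),
    1: ("Sample", "Common Identifier"),
    2: ("Sequencing run", "Direction"),
    # Position 3 (the "ab1" file extension) is deliberately left undefined.
}

def parse_name(name: str) -> dict[str, str]:
    """Map the parts of a sequence name to field names, skipping unassigned parts."""
    parts = re.split(r"[_.]", name)
    return {
        field_name: parts[pos]
        for pos, (field_name, _field_type) in FIELDS.items()
        if pos < len(parts)
    }

print(parse_name("VH_Donor1_run1.ab1"))
# {'Chain': 'VH', 'Sample': 'Donor1', 'Sequencing run': 'run1'}
```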
Check that the Name Scheme looks correct as below and click Create to finish making the Name Scheme. If you'd like to learn more about Name Schemes, see this article.
Batch Assemble Sanger Sequences
The Sanger sequences in this tutorial have already been trimmed, so we can proceed straight to assembly. Select all the sequences, go to Pre-Processing > Batch Assemble Sanger Sequences, and use the following settings:
- Batch by Name Scheme
  - Choose the Name Scheme you just made from the drop-down menu (VH_Donor1_run1.ab1)
- Consensus: call Sanger heterozygotes > 50%
- Save list of unused reads
- Output consensus sequences as list
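The > 50% heterozygote threshold can be pictured with a simplified, per-position Python sketch. It assumes the percentage compares the height of the second-highest chromatogram peak with the highest peak at that position and, when the threshold is exceeded, emits the IUPAC ambiguity code for the two bases. This is an assumption about the setting's behaviour made for illustration only; check the Geneious documentation for the exact rule. `call_base` is a hypothetical helper.

```python
# IUPAC ambiguity codes for two-base mixtures.
IUPAC_PAIRS = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}

def call_base(peaks: dict[str, float], threshold: float = 0.5) -> str:
    """Return the base call for one chromatogram position.

    Assumption: the heterozygote threshold is the ratio of the
    second-highest peak to the highest peak at this position.
    """
    ranked = sorted(peaks.items(), key=lambda kv: kv[1], reverse=True)
    (top_base, top_height), (second_base, second_height) = ranked[0], ranked[1]
    if top_height > 0 and second_height / top_height > threshold:
        # Secondary peak is strong enough: report both bases as an ambiguity code.
        return IUPAC_PAIRS[frozenset(top_base + second_base)]
    return top_base

print(call_base({"A": 900, "G": 600, "C": 20, "T": 10}))  # R (600/900 > 0.5)
print(call_base({"A": 900, "G": 300, "C": 20, "T": 10}))  # A (300/900 < 0.5)
```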
After running, Batch Assembly generates a single document of 6 consensus sequences: one each for the heavy and light chains of each of the three "donors". In this case there are no unused reads (reads that could not be assembled because they lacked sufficient overlap with other reads).
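The grouping behind the six consensus sequences can be sketched in Python, assuming that reads which share the same Chain and Sample values and differ only in the Direction field belong to the same assembly. The file names below, the light-chain label `VL`, the second run `run2`, and the helpers `batch_key` and `batch_reads` are all hypothetical; the actual tutorial files may be named differently.

```python
import re
from collections import defaultdict

def batch_key(name: str) -> tuple[str, str]:
    """Return the (Chain, Sample) key; the Direction part (run1, run2, ...) is ignored."""
    chain, sample, _direction, _ext = re.split(r"[_.]", name)
    return chain, sample

def batch_reads(names: list[str]) -> dict[tuple[str, str], list[str]]:
    """Group read names so that each group would be assembled into one consensus."""
    batches: dict[tuple[str, str], list[str]] = defaultdict(list)
    for name in names:
        batches[batch_key(name)].append(name)
    return dict(batches)

# Hypothetical read names: 2 chains x 3 donors x 2 runs = 12 reads.
reads = [f"{chain}_Donor{d}_run{r}.ab1"
         for chain in ("VH", "VL") for d in (1, 2, 3) for r in (1, 2)]
batches = batch_reads(reads)
print(len(batches))  # 6
```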
Because we batch-assembled with a Name Scheme containing a Chain field, the information needed to pair heavy and light chains is already attached to these sequences. This Assembly Consensus Sequences document can therefore be annotated in Antibody Annotator without running the Pair Heavy/Light Chains pipeline under Pre-processing, as was done in Sanger Tutorial 2.
Run Antibody Annotator with the following settings:
- Reference database: Human Ig
- Selected sequences are: Both chains in associated sequences
  - The Name Scheme is selected automatically
- Include pseudogenes from the database
- Include ORF genes from the database
- Annotate germline gene differences
- Find liabilities and assets
This produces a Biologics Annotator Result document in which the heavy and light chains of each of the three donors are paired automatically. Paired sequences are labelled Both in the Chain column.
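The pairing relies on the Common Identifier field: heavy and light chains that share the same Sample value belong together. The Python sketch below illustrates that idea on consensus records. It is not the Antibody Annotator implementation, and the record layout, the `VL` light-chain label, the placeholder sequences, and `pair_chains` are hypothetical.

```python
from collections import defaultdict

def pair_chains(consensus: list[dict[str, str]]) -> dict[str, dict[str, str]]:
    """Pair heavy and light consensus sequences that share a Sample (Common Identifier)."""
    by_sample: dict[str, dict[str, str]] = defaultdict(dict)
    for record in consensus:
        by_sample[record["Sample"]][record["Chain"]] = record["sequence"]
    # Keep only samples where both chains are present (shown as "Both" in Biologics).
    return {
        sample: chains
        for sample, chains in by_sample.items()
        if {"VH", "VL"}.issubset(chains)
    }

# Hypothetical consensus records: one heavy and one light chain per donor.
records = [
    {"Sample": f"Donor{d}", "Chain": chain, "sequence": f"{chain}-consensus-{d}"}
    for d in (1, 2, 3) for chain in ("VH", "VL")
]
pairs = pair_chains(records)
print(sorted(pairs))  # ['Donor1', 'Donor2', 'Donor3']
```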
From here, you can use the other tools in Biologics to continue your analysis:
- Antibody Annotator produces clusters by default. To learn more about clusters, see this article: What is a "cluster"?
- You can add your Assay Data (ELISA values etc.) to further inform your results: Adding Assay Data to your analysis results
- Filtering is a very powerful tool that allows you to pull out sequences that meet certain metrics you specify: Filtering your sequences
- You can align sequences to compare the amino acid diversity across a region or multiple regions: Sequence alignment