Sanger Tutorial 1. Assembling and Chain-Pairing Sanger Sequences

Updated: August 14, 2025 04:44

In this tutorial you will learn how to use a Name Scheme to both pair chains and assemble forward and reverse reads concurrently. Note: this was previously Sanger Tutorial 3.

This tutorial will cover the following exercises:

Making a Name Scheme
Batch Assembly of Sanger sequences
Sequence Annotation
Further Analysis

Get started: To follow this tutorial, you will need the input data. If you have recently started using Geneious Biologics, your organization may already have the tutorial folders set up as described below. If not, you can still follow this tutorial by first downloading the input sequences here and then uploading them into Geneious Biologics.

The first few videos in our Getting Started series may also be helpful, linked here. Below is our video on making the most out of your sequence names:

Making a Name Scheme

All the data used in this tutorial has the following naming convention:

Chain_Sample_SequencingRun.ab1

Screen_Shot_2022-05-11_at_10.42.44_PM.png

The sequence name highlighted above has the following parts, separated by underscores:

Chain = VH
Sample = Donor1
SequencingRun = run1

Once assigned, these "parts" tell Biologics how to designate and handle your sequences in downstream analysis. To assign them, we need to make a Name Scheme for this dataset.

Select Name Schemes under Organization Settings in the left-hand Navigation bar:

Screen_Shot_2022-05-11_at_10.51.55_PM.png

Create a new Name Scheme by clicking New in the top left-hand corner and entering the following:

screencapture-docs-google-document-d-1JOmCcZKSIJRkaPCaoGqpBopj4Aj75TuAO8nK-QueTag-edit-2022-05-11-23_13_52_copy.png

Next, specify the characters that separate the different "parts". These are called Delimiters. In this dataset, the delimiters are an underscore (_) and a period (.).

screencapture-docs-google-document-d-1JOmCcZKSIJRkaPCaoGqpBopj4Aj75TuAO8nK-QueTag-edit-2022-05-11-23_13_52_copy_2.png
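To make the role of the delimiters concrete, here is a minimal Python sketch of splitting a sequence name at the two delimiters used in this dataset. It only illustrates the idea and is not how Biologics itself parses names; the function name split_name is hypothetical.

```python
import re

# Delimiters used in this tutorial's dataset: underscore and period.
DELIMITERS = "_."

def split_name(name, delimiters=DELIMITERS):
    """Split a sequence name into parts at any of the delimiter characters."""
    pattern = "[" + re.escape(delimiters) + "]"
    # Drop empty strings that would result from adjacent delimiters.
    return [part for part in re.split(pattern, name) if part]

parts = split_name("VH_Donor1_run1.ab1")
print(parts)  # ['VH', 'Donor1', 'run1', 'ab1']
```

Note that the file extension (ab1) becomes a part of its own because the period is also a delimiter.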

Next, tell Biologics what kind of information each "part" contains. In Biologics, these parts are called Fields. Select New under Define Fields to open a pop-up in which to define a field.

Screen_Shot_2022-05-11_at_11.26.36_PM.png

In the pop-up, designate the field as being of a specific type. The following image shows how to designate the field of the sequence name that contains Chain information.

Screen_Shot_2022-05-11_at_11.25.07_PM.png

Click Create to define the first field of your Name Scheme. Repeat the step above for the two remaining fields as follows:

1. Donor1

Name = Sample
Field Type = Common Identifier
Select Name Part = Donor1

2. run1

Name = Sequencing run
Field Type = Direction
- This often refers to forward and reverse reads
Select Name Part = run1

Note: It is not necessary to specify a field type for the .ab1 field (the file type). Biologics will ignore any undesignated name fields.
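The field definitions above can be pictured as a small mapping from name-part position to field type. The following Python sketch is an illustration under stated assumptions, not the Biologics data model: the Field class, the position index, and the function apply_name_scheme are hypothetical, and the name and type labels of the first field are assumed to be "Chain" because they appear only in the screenshot.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Field:
    name: str        # label entered in the Define Fields pop-up
    field_type: str  # "Chain", "Common Identifier" or "Direction"
    position: int    # zero-based index of the name part (hypothetical representation)

# Fields as defined in this tutorial. Part 3 ("ab1") is left undesignated.
FIELDS = [
    Field("Chain", "Chain", 0),                # name and type label assumed
    Field("Sample", "Common Identifier", 1),
    Field("Sequencing run", "Direction", 2),
]

def apply_name_scheme(name, fields=FIELDS, delimiters="_."):
    """Map each designated field type to the matching part of the name."""
    parts = re.split("[" + re.escape(delimiters) + "]", name)
    # Undesignated parts (here the file extension) are simply never looked up.
    return {f.field_type: parts[f.position] for f in fields if f.position < len(parts)}

print(apply_name_scheme("VH_Donor1_run1.ab1"))
# {'Chain': 'VH', 'Common Identifier': 'Donor1', 'Direction': 'run1'}
```

The extension part is ignored without any special handling because no field points at it, which mirrors the behaviour described in the note above.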

Check that the Name Scheme looks correct as below and click Create to finish making the Name Scheme. If you'd like to learn more about Name Schemes, see this article.

Batch Assemble Sanger Sequences

The Sanger sequences in this tutorial have already been trimmed, and so we can proceed straight to assembly.

Note: We strongly recommend using Pre-Processing > Trim Ends on raw Sanger Sequences.

Select all the sequences and go to Pre-Processing > Batch Assemble Sanger Sequences and use the following settings:

Batch by Name Scheme
- Choose the Name Scheme you just made from the drop-down menu (VH_Donor1_run1.ab1)
Consensus: call Sanger heterozygotes
- > 50%
Save list of unused reads
Output consensus sequences as list

Screen_Shot_2022-05-13_at_10.13.53_AM.png

After running, Batch Assembly will generate a single document of 6 Consensus Sequences: one for the heavy chain and one for the light chain of each of the three "donors". There were no unused reads in this case. Unused reads are reads that could not be assembled due to insufficient sequence overlap.
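The count of 6 consensus sequences follows from grouping reads that share the same Sample and Chain, with the Direction field distinguishing the reads inside each group. The Python sketch below illustrates that grouping logic only. The file names are hypothetical: the labels VL, Donor2, Donor3 and run2, and the assumption of two reads per group, are made up for illustration and are not taken from the tutorial dataset.

```python
from collections import defaultdict

# Hypothetical file names following Chain_Sample_SequencingRun.ab1.
# "VL", "Donor2", "Donor3" and "run2" are assumed for illustration.
reads = [
    f"{chain}_{sample}_{run}.ab1"
    for chain in ("VH", "VL")
    for sample in ("Donor1", "Donor2", "Donor3")
    for run in ("run1", "run2")
]

def group_reads(names):
    """Group read names by (Sample, Chain); the run/direction varies within a group."""
    groups = defaultdict(list)
    for name in names:
        stem = name.rsplit(".", 1)[0]          # drop the file extension
        chain, sample, run = stem.split("_")
        groups[(sample, chain)].append(run)
    return dict(groups)

groups = group_reads(reads)
print(len(reads), "reads ->", len(groups), "groups")  # 12 reads -> 6 groups
for (sample, chain), runs in sorted(groups.items()):
    print(sample, chain, runs)
```

Each group corresponds to one assembly, and therefore to one consensus sequence in the output document.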

Sequence Annotation

Because we batch-assembled with a Name Scheme containing a Chain field, the information needed to pair Heavy and Light chains is already attached to these sequences. This Assembly Consensus Sequences document can therefore be annotated in Antibody Annotator without first running the Pair Heavy/Light Chains pipeline under Pre-Processing, as is done in Sanger Tutorial 2.

Screen_Shot_2022-05-13_at_3.31.59_PM.png

Run Antibody Annotator with the following settings:

Main Options

Reference database: Human Ig
Selected sequences are: Both chains in associated sequences
The Name Scheme is already automatically selected

>> Hint: if you are expecting some heavy chains to pair with secondary light chains, you can toggle the option for "If there are three or more sequences in a pair:" to "Show all possible Heavy/Light combinations"

Analysis Options

Annotate germline gene differences
Find liabilities and assets

Click Run to start the analysis.

This will produce a Biologics Annotator Result document that automatically pairs the Light and Heavy chain for each of the three donors. This is indicated by the Chain column containing sequences labelled Both.

Screen_Shot_2022-05-17_at_9.54.55_AM.png
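As a rough picture of what the pairing step achieves, the Python sketch below groups consensus sequences by sample and labels a sample "Both" when a heavy and a light chain are present. This is an illustration only, not the Antibody Annotator algorithm. The records are hypothetical, and "VL" as the light-chain label is an assumption.

```python
from collections import defaultdict

# Hypothetical consensus records as (sample, chain) tuples.
# "VL" as the light-chain label is assumed for illustration.
consensus = [(s, c) for s in ("Donor1", "Donor2", "Donor3") for c in ("VH", "VL")]

def pair_chains(records):
    """Return one row per sample, labelled "Both" when heavy and light are present."""
    chains_by_sample = defaultdict(set)
    for sample, chain in records:
        chains_by_sample[sample].add(chain)
    rows = []
    for sample in sorted(chains_by_sample):
        chains = chains_by_sample[sample]
        label = "Both" if {"VH", "VL"} <= chains else "/".join(sorted(chains))
        rows.append((sample, label))
    return rows

print(pair_chains(consensus))
# [('Donor1', 'Both'), ('Donor2', 'Both'), ('Donor3', 'Both')]
```

A sample with only one chain would keep that chain's label instead of "Both" in this sketch.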

Further Analysis

The other tools in Biologics provide a starting point for further analysis:

Antibody Annotator produces clusters by default. To learn more about clusters, see this article: What is a "cluster"?
Filtering is a very powerful tool that allows you to pull out sequences that meet certain metrics you specify: Filtering your sequences
Use Extract and Re-cluster to take a subset of sequences out of an existing Biologics Annotator Result document and make a new document with re-calculated clusters
Compare two or more Biologics Annotator Result documents from separate experiments to monitor, for example, clonal expansion
You can add your Assay Data (ELISA values etc.) to further inform your results: Adding Assay Data to your analysis results
You can align sequences to compare the amino acid diversity across a region or multiple regions: Sequence alignment
Edit your Sequences to perform point mutations that might increase developability