Workflows for Sanger Antibody Analysis

Updated October 24, 2024 03:14

This article outlines how to process Sanger sequences in Geneious Biologics. The flow chart below displays the general workflow for working with raw or processed sequences:

[Figure: Sanger pre-processing workflow flowchart]

If you have Single Cell data (such as 10X) or NGS sequences, please see our other articles: Single Cell Analysis Workflows or Workflows for NGS Antibody Analysis.

Pre-processing

Grouping Sequences

Sanger sequences are often output as individual sequences. To make them easier to work with, you can group them into a single list under Pre-processing > Group Sequences.

[Screenshot: Pre-processing > Group Sequences]

To learn more, see our main article: Grouping Sequences

Trimming Ends (optional)

Trimming ends can tidy up your sequences by removing the lower-quality stretches of bases at either end. This step is optional: Antibody Annotator can still be run on untrimmed sequences. You can also trim:

Primers from a pre-set list
A set number of bases (bp) from either end
Ambiguous bases (Trim by ambiguities)
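This article does not describe the algorithm Trim Ends uses internally. As a rough illustration of what quality trimming does, the sketch below uses a Mott-style maximum-scoring-segment approach: each base scores `limit - P(error)`, where P(error) = 10^(-Q/10) is derived from its Phred quality Q, and the highest-scoring contiguous stretch is kept. The function name `mott_trim` and the default `limit=0.05` are illustrative assumptions, not Geneious Biologics settings.

```python
def mott_trim(seq: str, quals: list[int], limit: float = 0.05) -> str:
    """Return the highest-scoring contiguous stretch of seq (Mott-style trim).

    Illustrative sketch only; not the Geneious Biologics implementation.
    Each base scores (limit - error probability), so runs of low-quality
    bases drive the running score below zero and are excluded.
    """
    if len(seq) != len(quals):
        raise ValueError("seq and quals must be the same length")
    best_sum = 0.0
    best_start = best_end = 0
    running = 0.0
    run_start = 0
    for i, q in enumerate(quals):
        running += limit - 10 ** (-q / 10)  # Phred Q -> error probability
        if running <= 0:
            # The segment so far is not worth keeping; restart after this base.
            running = 0.0
            run_start = i + 1
        elif running > best_sum:
            best_sum = running
            best_start, best_end = run_start, i + 1
    return seq[best_start:best_end]


# Low-quality bases (Q5) at both ends are trimmed off.
print(mott_trim("ACGTACGTAC", [5, 5, 40, 40, 40, 40, 40, 40, 5, 5]))  # GTACGT
```

A higher `limit` tolerates more errors and keeps longer reads; a lower one trims more aggressively.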

To run Trim Ends, go to Pre-processing > Trim Ends...

[Screenshot: Pre-processing > Trim Ends]

Batch Assembling Sanger Sequences

Single Chains

If you only have single chains (either heavy or light) and are not expecting to pair heavy and light sequences, the following options are available:

Full reads that do not need to be assembled:
Proceed straight to the Antibody Annotation section below.
Reads that do need to be assembled into a full sequence:
Please refer to our main article Batch Assemble Sanger Sequences. You can then proceed from Batch Assembly straight to the Antibody Annotation section below.

The following sections explain how to pair chains via Batch Assembly, or manually via Pair Heavy/Light Chains.

Pairing Chains within Batch Assembly

If you have a mixed heavy/light chain dataset, you can use Batch Assemble Sanger Sequences with a Name Scheme both to assemble reads for the same sequence (e.g. heavy) and to associate that chain with its partner chain (e.g. light).

To do this, you must first create a Name Scheme (see our article How to Create a Name Scheme). A Name Scheme specifies which parts of the sequence names denote the Chain and the Common Identifier (e.g. donor, sample or well).

This can be illustrated with an example. Suppose a sequencing run produces sequences named in the following format:

Chain_Sample_SequencingDirection_well.ab1

An example sequence might be:

VL_Donor1_rev_A7.ab1

Once a Name Scheme has been set for this format, the information contained within the name can be used to:

Assemble fwd and rev reads whose Common Identifier and Chain parts are the same.
Associate chains that share the same Common Identifier (in the example above, the second part, Donor1).
Pull out a part such as the well (A7 in the example above) into a separate column in your results. This column can be used to bring in Assay Data from an Excel or .csv file: Adding Assay Data to your Analysis Results.
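To make the three uses above concrete, here is a minimal sketch that parses names in the example format Chain_Sample_SequencingDirection_well.ab1 and groups them the way a Name Scheme would. The part labels, the helper names `parse_name` and `group_reads`, and the example read names are hypothetical; Biologics Name Schemes are configured in the interface, not in code.

```python
from collections import defaultdict
from pathlib import PurePath

# Hypothetical part labels for Chain_Sample_SequencingDirection_well.ab1
PARTS = ("chain", "sample", "direction", "well")


def parse_name(filename: str) -> dict[str, str]:
    """Split a read name such as 'VL_Donor1_rev_A7.ab1' into named parts."""
    values = PurePath(filename).stem.split("_")  # drop '.ab1', split on '_'
    if len(values) != len(PARTS):
        raise ValueError(f"unexpected name format: {filename}")
    return dict(zip(PARTS, values))


def group_reads(filenames: list[str]):
    assemblies = defaultdict(list)    # (sample, chain) -> reads to assemble
    chains_by_sample = defaultdict(set)  # common identifier -> chains to associate
    well_column = {}                  # read name -> well, as an extra column
    for name in filenames:
        p = parse_name(name)
        assemblies[(p["sample"], p["chain"])].append(p["direction"])
        chains_by_sample[p["sample"]].add(p["chain"])
        well_column[name] = p["well"]
    return (
        {key: sorted(dirs) for key, dirs in assemblies.items()},
        {sample: sorted(chains) for sample, chains in chains_by_sample.items()},
        well_column,
    )


reads = ["VH_Donor1_fwd_A7.ab1", "VH_Donor1_rev_A7.ab1",
         "VL_Donor1_fwd_A7.ab1", "VL_Donor1_rev_A7.ab1"]
assemblies, chains, wells = group_reads(reads)
print(assemblies)  # {('Donor1', 'VH'): ['fwd', 'rev'], ('Donor1', 'VL'): ['fwd', 'rev']}
print(chains)      # {'Donor1': ['VH', 'VL']}
```

Keying assemblies on both the Common Identifier and the Chain keeps heavy and light reads from the same donor in separate contigs, while keying on the Common Identifier alone links them as partners.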

Please see our article on How to Create a Name Scheme. Our Sanger Tutorial 1 may also be helpful.

Running Batch Assemble Sanger Sequences

To run Batch Assemble Sanger Sequences, go to Pre-processing > Batch Assemble Sanger Sequences...

[Screenshot: Pre-processing > Batch Assemble Sanger Sequences]

Make sure to select the Name Scheme you have set for your Sanger sequence name format in the drop-down menu:

[Screenshot: Name Scheme drop-down in Batch Assemble Sanger Sequences]

When Batch Assembling, you can also choose to:

Call heterozygote bases
- This can also be done under a separate Pre-processing option, see more here: Finding and Calling Heterozygotes
Save the reads that could not be assembled
Generate a contig for each assembly
Output the Consensus Sequences as a list
- This is recommended, as this is the sequence list you will take through to Antibody Annotation.

To learn more about the options, see our main article: Batch Assemble Sanger Sequences.

Manually Pairing Heavy and Light chains

Pairing Heavy and Light chains can also be done separately from Batch Assembly, by using the Pre-processing > Pair Heavy/Light Chains operation:

[Screenshot: Pre-processing > Pair Heavy/Light Chains]

To learn more, see our article Pairing Heavy and Light Chains

Antibody Annotation

For Sanger and other small sequencing datasets, we recommend using Antibody Annotator. To run it, select a sequence document and go to Annotation > Antibody Annotator:

[Screenshot: Annotation > Antibody Annotator]

The Main Options are listed below:

[Screenshot: Antibody Annotator Main Options]

Reference database:

The Reference Database can be a database of annotated germline sequences or of template (variable region) sequences. To make your own, see How to make a Custom Reference Database.
The reference database is used to help identify the correct framework (FR) and complementarity-determining (CDR) regions in the new sequences being analysed. See Understanding Reference Databases.

Selected Sequences are:

This specifies which chains you expect on each read or pair of reads. For more information, see the main Antibody Annotator article.

Sequence region of interest is between:

To define what counts as a "fully annotated" sequence, select the start and end regions from the drop-down menus. With the default range of FR1 to FR4, a sequence is considered fully annotated only if it contains all of the regions FR1, CDR1, FR2, CDR2, FR3, CDR3 and FR4. In addition to affecting the "Fully Annotated" column in your "All sequences" result table, this setting may also determine which sequences are used to create the cluster tables. For more on clustering, see the main Antibody Annotator article.
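The "fully annotated" rule described above amounts to checking that every region between the chosen start and end is present. The sketch below illustrates that rule under the assumption that a sequence's annotated regions are available as a set of names; `is_fully_annotated` is a hypothetical helper, not part of Geneious Biologics.

```python
# Variable-domain regions in order, N- to C-terminus.
REGIONS = ["FR1", "CDR1", "FR2", "CDR2", "FR3", "CDR3", "FR4"]


def is_fully_annotated(found: set[str], start: str = "FR1", end: str = "FR4") -> bool:
    """True if every region from start to end (inclusive) was annotated."""
    i, j = REGIONS.index(start), REGIONS.index(end)
    if i > j:
        raise ValueError("start region must come before end region")
    return all(region in found for region in REGIONS[i:j + 1])


print(is_fully_annotated(set(REGIONS)))                                  # True
print(is_fully_annotated({"FR2", "CDR2", "FR3", "CDR3", "FR4"}))          # False
print(is_fully_annotated({"FR2", "CDR2", "FR3", "CDR3", "FR4"}, "FR2"))   # True
```

Narrowing the range (for example, starting at FR2 for reads that miss the start of FR1) lets more sequences count as fully annotated.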

Name Scheme:

This option is filled in automatically if you used Batch Assemble Sanger Sequences with a Name Scheme. Name Schemes are highly customizable and let you use information contained within the sequence names (e.g. well or donor/sample) to classify sequences and groups of sequences.
- A Name Scheme is crucial for the option below, which can enumerate pairs (for example: Heavy-Kappa and Heavy-Lambda) within the same sample/well.

If there are three or more sequences in a pair:

This option, in conjunction with a Name Scheme (above), allows you to enumerate the possible heavy/light pairs within each common identifier/sample. It is only available if Selected Sequences are: Both chains in associated sequences is chosen. Options:
- Leave sequences unpaired
  This will pair any doublets (exactly one heavy and one light chain with the same common identifier), but leave any triplets or singlets unpaired. Unpaired sequences will be classed as Not Fully Annotated.
- Show all possible Heavy/Light combinations
  This will enumerate all the possible heavy/light pairings within the same common identifier. For example, if a kappa and lambda chain can be found within the same common identifier as a heavy chain, two pairings will be made:
  Heavy-lambda
  Heavy-kappa
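The two options above differ only in how they treat groups that are not exact doublets. The sketch below illustrates that logic for the sequences sharing one common identifier; the function `pair_chains`, the mode strings and the chain-type labels are hypothetical stand-ins for the Antibody Annotator options, not its actual implementation.

```python
from itertools import product

LIGHT = {"kappa", "lambda"}


def pair_chains(seqs: list[tuple[str, str]], mode: str) -> list[tuple[str, str]]:
    """Pair heavy and light chains that share one common identifier.

    seqs: (sequence name, chain type) pairs, chain type "heavy", "kappa" or "lambda".
    mode: "leave_unpaired" pairs only an exact doublet (one heavy + one light);
          "all_combinations" enumerates every heavy x light pairing.
    """
    heavy = [name for name, chain in seqs if chain == "heavy"]
    light = [name for name, chain in seqs if chain in LIGHT]
    if mode == "leave_unpaired":
        if len(seqs) == 2 and len(heavy) == 1 and len(light) == 1:
            return [(heavy[0], light[0])]
        return []  # singlets and triplets stay unpaired
    if mode == "all_combinations":
        return list(product(heavy, light))
    raise ValueError(f"unknown mode: {mode}")


triplet = [("H1", "heavy"), ("K1", "kappa"), ("L1", "lambda")]
print(pair_chains(triplet, "leave_unpaired"))    # []
print(pair_chains(triplet, "all_combinations"))  # [('H1', 'K1'), ('H1', 'L1')]
```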

For more options, see the main Antibody Annotator article.

Inputting multiple sequence lists

Multiple sequence lists can be selected and run through Antibody Annotator. This will result in one output file (an Annotation Result Document) for each input document, with the same settings used to analyze all files.

Viewing your results

After clicking Run, a Biologics Annotator Result Document will be generated that will look similar to the one below:

[Screenshot: example Antibody Annotator result document]

In the above example output, three clinical antibody sequences have been selected, consisting of paired heavy-light chains. You can view the sequences in the Sequence Viewer below, where the regions and genes are annotated on the sequences as well as the germline mismatches (purple). See Exploring the Columns of the All Sequences Table for a description of all the columns produced.

Like any other Biologics Annotator Result document, you can also:

Filter your Sequences
View the Graphs for Quality Assurance and Graphs to interpret Clusters and Clonotypes
Perform Sequence Alignments
View the Clusters in your dataset
Add Assay Data to your Analysis Results
Compare Results across Multiple Experiments
Export Annotated Sequences and Sequence Tables