This article outlines how to process Sanger sequences in Geneious Biologics. The flow chart below displays the general workflow for working with raw or processed sequences:
If you have Single Cell data (such as 10X) or NGS sequences, please see our other articles: Single Cell Analysis Workflows or Workflows for NGS Antibody Analysis.
Pre-processing
Grouping Sequences
Sanger sequences are often output as individual sequences. To make these easier to work with, you can group them all into a single list using Pre-processing > Group Sequences.
To learn more, see our main article: Grouping Sequences
Trimming Ends (optional)
Trimming ends tidies up your sequences by removing the lower-quality stretches of bases at either end. This step is optional: Antibody Annotator can still be run on untrimmed sequences. The Trim Ends options also let you:
- Trim primers from a pre-set list
- Trim a set number of bp from either end
- Trim by ambiguities
To run Trim Ends, go to Pre-processing > Trim Ends...
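To make two of the trimming options above more concrete, here is a minimal Python sketch of what "a set number of bp at either end" and a simple form of "trim by ambiguities" could look like. This is an illustration only and not the Trim Ends implementation: the function names are hypothetical, and the ambiguity rule shown (strip runs of N from both ends) is an assumed simplification.

```python
def trim_fixed(seq, n_start, n_end):
    """Remove a set number of bases from each end of a sequence."""
    end = len(seq) - n_end
    # If the trims overlap, nothing is left of the read.
    return seq[n_start:end] if end > n_start else ""


def trim_ambiguous_ends(seq, ambiguous="N"):
    """Strip runs of ambiguous bases from both ends (assumed, simplified rule).

    Ambiguous bases inside the read are left untouched.
    """
    return seq.strip(ambiguous)


print(trim_fixed("ACGTACGT", 2, 3))      # GTA
print(trim_ambiguous_ends("NNACGTNAN"))  # ACGTNA
```

Note that only the ends are affected in both cases, which is why an internal N survives in the second example.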
Batch Assembling Sanger Sequences
Single Chains
If you just have single chains (either heavy or light) and are not expecting to pair Heavy+Light sequences, the following options are available:
- Full reads that do not need to be assembled: Proceed straight to the Antibody Annotation section below.
- Reads that do need to be assembled into a full sequence: Please refer to our main article Batch Assemble Sanger Sequences. You can then proceed from Batch Assembly straight to the Antibody Annotation section below.
The following sections explain how to pair chains via Batch Assembly, or manually via Pair Heavy/Light Chains.
Pairing Chains within Batch Assembly
If you have a mixed heavy and light chain dataset, you can use Batch Assemble Sanger Sequences along with a Name Scheme both to assemble reads for the same sequence (e.g. Heavy) and to associate that chain with its partner chain (e.g. Light).
To do this, a Name Scheme must first be created. Please see our article on How to Create a Name Scheme. A Name Scheme specifies which parts of the sequence names denote the Chain and the Common Identifier (e.g. donor, sample or well).
This can be illustrated with an example. Suppose a sequencing run produces sequences named in the following format:
Chain_Sample_SequencingDirection_well.ab1
An example sequence might be:
VL_Donor1_rev_A7.ab1
Once a Name Scheme has been set for this format, the information contained within the name can be used to:
- Assemble fwd and rev reads where the Common Identifier and Chain parts are the same.
- Associate Chains that can be grouped under the same Common Identifier (in the example above, this is the second "part" or Donor1).
- Pull out a "part" like the well (A7 in the example above) into a separate column in your results. This column can then be used to bring in Assay Data from an Excel or .csv file; see Adding Assay Data to your Analysis Results.
Our Sanger Tutorial 1 may also be helpful.
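The way name parts drive assembly and chain association can be sketched in a few lines of Python. This is a hedged illustration of the idea only, not how Geneious Biologics implements Name Schemes: the field order is taken from the example format Chain_Sample_SequencingDirection_well.ab1, the function names are hypothetical, and every read name except VL_Donor1_rev_A7.ab1 is made up for the example.

```python
from collections import defaultdict
from pathlib import PurePath

# Assumed field order, taken from the example format in this article:
# Chain_Sample_SequencingDirection_well.ab1
FIELDS = ("chain", "sample", "direction", "well")


def parse_name(filename):
    """Split a read name into its named parts."""
    stem = PurePath(filename).stem  # drop the .ab1 extension
    parts = stem.split("_")
    if len(parts) != len(FIELDS):
        raise ValueError(f"{filename!r} does not match the expected format")
    return dict(zip(FIELDS, parts))


def group_reads(filenames):
    """Group reads by Common Identifier (sample), then by chain."""
    groups = defaultdict(lambda: defaultdict(list))
    for name in filenames:
        parsed = parse_name(name)
        groups[parsed["sample"]][parsed["chain"]].append(name)
    return groups


# Hypothetical read names following the example format.
reads = [
    "VL_Donor1_fwd_A7.ab1",
    "VL_Donor1_rev_A7.ab1",
    "VH_Donor1_fwd_A7.ab1",
    "VH_Donor1_rev_A7.ab1",
]

groups = group_reads(reads)
for sample, chains in groups.items():
    for chain, members in chains.items():
        # Reads sharing sample and chain are candidates for one assembly;
        # chains sharing a sample are candidates for association.
        print(sample, chain, members)
```

In this sketch, the fwd and rev reads that share both chain and sample end up in the same inner list, and the VH and VL groups sit under the same Donor1 key, mirroring the first two points in the list above. The well value returned by parse_name corresponds to the "part" that can be pulled into its own column.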
Running Batch Assemble Sanger Sequences
To run Batch Assemble Sanger Sequences, go to Pre-processing > Batch Assemble Sanger Sequences...
Make sure to select the Name Scheme you have set for your Sanger sequence name format in the drop down below:
When Batch Assembling, you can also choose to:
- Call heterozygote bases
  - This can also be done under a separate Pre-processing option, see more here: Finding and Calling Heterozygotes
- Save the reads that could not be assembled
- Generate a contig for each assembly
- Output the Consensus Sequences as a list
  - This is recommended, as this is the sequence list you will take through to Antibody Annotation.
To learn more about the options, see our main article: Batch Assemble Sanger Sequences.
Manually Pairing Heavy and Light chains
Pairing Heavy and Light chains can also be done separately from Batch Assembly, by using the Pre-processing > Pair Heavy/Light Chains operation:
To learn more, see our article Pairing Heavy and Light Chains
Antibody Annotation
We recommend using Antibody Annotator for Sanger and other smaller sequencing datasets. To run the Antibody Annotator, select a sequence document and go to the Annotation > Antibody Annotator dropdown:
The Main Options are listed below:
Reference database:
- The Reference Database can be a database of annotated germline sequences or of template (variable region) sequences. Please see How to make a Custom Reference Database to learn how to make a reference database.
- The reference database is used to help identify the correct FR and CDR regions in the new sequences being analysed. See Understanding Reference Databases.
Selected Sequences are:
- This asks what chains you are expecting on each read, or pairs of reads. For more information, see the main Antibody Annotator article.
Sequence region of interest is between:
Name Scheme:
- This option will be filled in automatically if you used Batch Assemble Sanger Sequences with a Name Scheme. Name Schemes are highly customizable and allow you to use information contained within the sequence names (e.g. well or donor/sample) to classify sequences and groups of sequences.
- A Name Scheme is crucial for the option below, which can enumerate pairs (for example: Heavy-Kappa and Heavy-Lambda) within the same sample/well.
If there are three or more sequences in a pair:
- This option, in conjunction with a Name Scheme (above), allows you to enumerate the possible heavy-light pairs within the same common identifier/sample. It is only available if the option Selected Sequences are: Both chains in associated sequences is chosen. Options:
  - Leave sequences unpaired: This will pair any doublets (a single heavy and light chain with the same common identifier), but leave any triplets or singlets unpaired. Unpaired sequences will be classed as Not Fully Annotated.
  - Show all possible Heavy/Light combinations: This will enumerate all the possible heavy/light pairings within the same common identifier. For example, if a kappa and a lambda chain can be found within the same common identifier as a heavy chain, two pairings will be made: Heavy-lambda and Heavy-kappa.
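The difference between the two pairing options can be sketched in Python as follows. This is a simplified reading of the descriptions above, not the Antibody Annotator's actual logic: the function name, the chain-type labels and the sequence names are all hypothetical, and edge cases (such as several heavy chains under one identifier) may be handled differently by the software.

```python
from itertools import product


def pair_chains(chains_by_id, show_all_combinations):
    """Pair heavy and light chains that share a common identifier.

    chains_by_id maps a common identifier to a list of (name, chain_type).
    Returns (paired, unpaired): paired holds (identifier, heavy, light)
    tuples, unpaired holds the names of sequences left without a partner.
    """
    paired, unpaired = [], []
    for common_id, chains in chains_by_id.items():
        heavy = [n for n, t in chains if t == "Heavy"]
        light = [n for n, t in chains if t in ("Kappa", "Lambda")]
        if len(heavy) == 1 and len(light) == 1:
            # A doublet is paired under both options.
            paired.append((common_id, heavy[0], light[0]))
        elif show_all_combinations and heavy and light:
            # Enumerate every heavy/light combination for this identifier.
            paired.extend((common_id, h, l) for h, l in product(heavy, light))
        else:
            # Singlets always, and triplets when not enumerating.
            unpaired.extend(n for n, _ in chains)
    return paired, unpaired


# Hypothetical input: a triplet (A7), a doublet (B2) and a singlet (C3).
example = {
    "A7": [("A7_H", "Heavy"), ("A7_K", "Kappa"), ("A7_L", "Lambda")],
    "B2": [("B2_H", "Heavy"), ("B2_K", "Kappa")],
    "C3": [("C3_H", "Heavy")],
}

leave_paired, leave_unpaired = pair_chains(example, show_all_combinations=False)
all_paired, all_unpaired = pair_chains(example, show_all_combinations=True)
print(leave_paired)  # [('B2', 'B2_H', 'B2_K')]
print(all_paired)    # A7 yields Heavy-kappa and Heavy-lambda, plus the B2 doublet
```

With "Leave sequences unpaired", only the B2 doublet is paired and the A7 triplet stays unpaired. With "Show all possible Heavy/Light combinations", A7 contributes two pairings, and only the C3 singlet remains unpaired.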
For more options, see the main Antibody Annotator article.
Inputting multiple sequence lists
Multiple sequence lists can be selected and run through Antibody Annotator together. This produces one output file (a Biologics Annotator Result Document) for each input document, with the same settings used to analyze all files.
Viewing your results
After clicking Run, a Biologics Annotator Result Document will be generated that will look similar to the one below:
In the above example output, three clinical antibody sequences have been selected, consisting of paired heavy-light chains. You can view the sequences in the Sequence Viewer below, where the regions and genes are annotated on the sequences as well as the germline mismatches (purple). See Exploring the Columns of the All Sequences Table for a description of all the columns produced.
As with any other Biologics Annotator Result Document, you can also:
- Filter your Sequences
- View the Graphs for Quality Assurance and Graphs to interpret Clusters and Clonotypes
- Perform Sequence Alignments
- View the Clusters in your dataset
- Add Assay Data to your Analysis Results
- Compare Results across Multiple Experiments
- Export Annotated Sequences and Sequence Tables