This article outlines what a Name Scheme is and how it can be used to both pair chains and assemble sequence reads in one step. For information on how to create a Name Scheme, see the How to Create a Name Scheme article. See also Sanger Tutorial 3: Using Name Schemes to pair and assemble sequences.
The video below describes how to make use of the information contained in your Sanger sequence names to pair chains and assemble forward and reverse reads.
Introduction
A Name Scheme allows users to define common sequence name structures. Once created, Name Schemes can be used to automatically extract important information such as the sample name, chain, sequencing direction and sample well from each sequence. This information can then be used automatically in analyses and to annotate results. Name Schemes are usually only useful for Sanger sequences and similar data, as NGS sequences (such as those from MiSeq) rarely have biologically meaningful names.
Common uses for Name Schemes include:
- Creating custom columns in analysis results from sequence names, so that those columns can be used for matching when adding Assay Data.
- Defining a Common Identifier from sequence names to use for pairing Heavy and Light chain sequences.
- Defining a Common Identifier and optional chain for use when assembling Sanger reads.
Let’s look at an example to illustrate. Suppose an organization's sequences are named in the following way:
Chain_Sample_SequencingDirection_Well.ab1
An example sequence name might be:
VH_Donor1_rev_B12.ab1
This name has the following parts, separated by underscores:
- Chain = VH
- Sample = Donor1
- SequencingDirection = rev
- Well = B12
Once these "parts" are assigned, Biologics can use them to identify and handle your sequences in downstream analyses. The underscore separator is referred to as a Delimiter. Other characters, such as periods (.), can also be specified as delimiters.
Based on this, we can create standard rules for reading sequence names of this format. To learn how to make a Name Scheme, see the How to Create a Name Scheme article.
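To make the idea of a "rule" concrete, here is a minimal Python sketch of splitting a name in this format into its four named parts. It is only an illustration of the concept, not how Geneious Biologics implements Name Schemes: the `parse_name` function and the `FIELDS` list are hypothetical, and the sketch assumes the file extension (.ab1) is removed before the name is split on the underscore delimiter.

```python
import os

# Hypothetical field order for the naming convention
# Chain_Sample_SequencingDirection_Well.ab1
FIELDS = ["Chain", "Sample", "SequencingDirection", "Well"]


def parse_name(name: str, delimiter: str = "_") -> dict[str, str]:
    """Split a sequence name into named parts (illustrative only)."""
    # Assumption: the file extension is dropped before splitting
    stem, _ext = os.path.splitext(name)
    parts = stem.split(delimiter)
    if len(parts) != len(FIELDS):
        raise ValueError(f"{name!r} does not match the expected format")
    # Pair each split piece with the field name at the same position
    return dict(zip(FIELDS, parts))


print(parse_name("VH_Donor1_rev_B12.ab1"))
# {'Chain': 'VH', 'Sample': 'Donor1', 'SequencingDirection': 'rev', 'Well': 'B12'}
```

Names that do not split into exactly four pieces raise an error in this sketch, which is why consistent naming matters.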
Note: To utilise the power of Name Schemes in Geneious Biologics, sequence names need to follow one consistent format, or a small number of consistent formats. The easiest approach is to establish consistent sequence name formats across your organisation. If this is not possible, one solution is to Batch Rename sequences before applying a Name Scheme.
Batch Assemble Sanger Sequences
To assemble Sanger sequences using a Name Scheme, first select the Batch by Name Scheme option, then select the name scheme you would like to use from the Name scheme dropdown. Only Name Schemes that contain a field with a Common Identifier type will be available for selection.
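The dropdown rule can be pictured with a small data model. The Python sketch below is hypothetical: the `SchemeField` and `NameScheme` classes, their attributes and the type labels ("Common Identifier", "Chain", "Display") are assumptions made for illustration, not the Geneious Biologics data model. It only shows the filtering logic: a scheme is offered when at least one of its fields has the Common Identifier type.

```python
from dataclasses import dataclass, field


# Hypothetical model: each field reads one split piece and has a type label
@dataclass
class SchemeField:
    label: str
    position: int    # which split piece (0-based) this field reads
    field_type: str  # illustrative labels: "Common Identifier", "Chain", "Display"


@dataclass
class NameScheme:
    name: str
    delimiters: str
    fields: list[SchemeField] = field(default_factory=list)

    def has_common_identifier(self) -> bool:
        return any(f.field_type == "Common Identifier" for f in self.fields)


def schemes_for_batch_assembly(schemes: list[NameScheme]) -> list[NameScheme]:
    """Keep only schemes that could be offered in the Name scheme dropdown."""
    return [s for s in schemes if s.has_common_identifier()]


schemes = [
    NameScheme("Chain_ID;Well", "_;", [
        SchemeField("Chain", 0, "Chain"),
        SchemeField("Sample", 1, "Common Identifier"),
    ]),
    NameScheme("Well only", "_", [SchemeField("Well", 3, "Display")]),
]
print([s.name for s in schemes_for_batch_assembly(schemes)])  # ['Chain_ID;Well']
```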
The assembly process will apply the Name Scheme to your input sequences by splitting their names by the Name Scheme delimiters, then extracting the Common Identifier and Chain fields (if the latter is present). It will then use the values it encounters for these fields in each sequence as the unique key for assembly. Here is an example with the following sequences:
VH_31023_5;B5
VH_31023_3;B6
VK_31023_5;H11
VK_31023_3;H12
We have defined a Name Scheme with the delimiters _ and ; and two fields: Common Identifier (the second split piece) and Chain (the first split piece). Batch assembly applies the Name Scheme to our four sequences, creating the assembly key 31023_VH for the first two sequences and 31023_VK for the last two. Because each key is shared by two sequences, the two Heavy (VH) reads are assembled together and the two Kappa (VK) reads are assembled together, as desired.
Output sequences will be named according to the assembly key for readability, but all fields from the Name Scheme will be saved to each sequence for display in downstream results.
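The key-building step described above can be sketched in Python. This is a minimal illustration, not the Geneious Biologics implementation: the function names are hypothetical, and the sketch assumes each field is located purely by its position after the name has been split on every delimiter character.

```python
import re
from collections import defaultdict


def split_name(name: str, delimiters: str = "_;") -> list[str]:
    # Build a character class such as [_;] so that any delimiter splits the name
    pattern = "[" + re.escape(delimiters) + "]"
    return re.split(pattern, name)


def assembly_key(name: str, common_id_index: int = 1, chain_index=0) -> str:
    """Combine Common Identifier and (optional) Chain into one key."""
    parts = split_name(name)
    common_id = parts[common_id_index]
    if chain_index is None:  # the Chain field is optional in the Name Scheme
        return common_id
    return f"{common_id}_{parts[chain_index]}"


def group_reads(names: list[str]) -> dict[str, list[str]]:
    # Reads that share a key would be assembled together
    groups = defaultdict(list)
    for name in names:
        groups[assembly_key(name)].append(name)
    return dict(groups)


reads = ["VH_31023_5;B5", "VH_31023_3;B6", "VK_31023_5;H11", "VK_31023_3;H12"]
print(group_reads(reads))
# {'31023_VH': ['VH_31023_5;B5', 'VH_31023_3;B6'], '31023_VK': ['VK_31023_5;H11', 'VK_31023_3;H12']}
```

Without a Chain field (`chain_index=None`), all four reads in this example would share the key 31023 and fall into a single group, which is why chain information matters when heavy and light reads share a sample identifier.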
Pair Heavy/Light Chains
Note that if you have run Batch Assemble Sanger Sequences with a Name Scheme containing chain information, you do not need to run Pair Heavy/Light Chains; you can proceed straight to Antibody Annotator.
The Pair Heavy/Light Chains analysis operation lets you select a Name Scheme with a Common Identifier field and pairs sequences using their value for this field. The output of this operation contains the input sequences, with sequences that share an identical Common Identifier value paired together.
Let's look at an example. Suppose we have the following sequences:
VH_31023;B5
VK_31023;H11
and we have defined a Name Scheme with the delimiters _ and ; and a Common Identifier field for the second split piece. When we run the Pair Heavy/Light Chains operation with this Name Scheme, it splits each sequence name by the delimiters and determines that both sequences share the Common Identifier value 31023. The two sequences are output as a pair and shown together as a single row in the Antibody Analysis results.
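The pairing logic can be sketched in a few lines of Python. The sketch is hypothetical (the function names are not part of Geneious Biologics) and simply groups sequence names by the second split piece. As a choice of the sketch, a sequence whose identifier matches no other sequence is left out of the returned pairs.

```python
import re
from collections import defaultdict


def common_identifier(name: str, delimiters: str = "_;", index: int = 1) -> str:
    # Split on any delimiter character and take the Common Identifier piece
    pattern = "[" + re.escape(delimiters) + "]"
    return re.split(pattern, name)[index]


def pair_by_common_identifier(names: list[str]) -> dict[str, list[str]]:
    groups = defaultdict(list)
    for name in names:
        groups[common_identifier(name)].append(name)
    # Only identifiers shared by more than one sequence form a pair
    return {cid: members for cid, members in groups.items() if len(members) > 1}


print(pair_by_common_identifier(["VH_31023;B5", "VK_31023;H11", "VH_40000;C1"]))
# {'31023': ['VH_31023;B5', 'VK_31023;H11']}
```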
Antibody Annotator
Antibody Annotator can make use of Name Schemes to simplify your workflow. Once you have Batch Assembled your sequences using a Name Scheme (if this is required), you can proceed directly to Antibody Annotator, which will recognise that your sequences are already associated with a Name Scheme and use the Name Scheme fields automatically. It will pair Heavy and Light chains using the Common Identifier field in your Name Scheme, and save all Name Scheme display fields so that they are visible as columns in the results.
If your data does not require assembly, you can proceed straight to Antibody Annotator with your raw sequences and specify a Name Scheme with a Common Identifier field. If you set the Selected sequences are: option to Both chains in associated sequences, this field will be used to pair the sequences by chain. All fields from the selected Name Scheme will be visible as columns in the results.