How to Create a Name Scheme

Updated: December 12, 2023 04:02

This article describes how to set up a Name Scheme for your organization. Both administrators and regular users can create Name Schemes. For more information on what a Name Scheme is and why Name Schemes are useful for processing Sanger sequences, see the Using Name Schemes article.

Creating a Name Scheme

To view your organization's existing Name Schemes or create a new one, click Name Schemes in the Administration section of the left menu. The Name Schemes page lists your organization's existing Name Schemes in a table. Click New at the top left to open the Name Scheme creation form, which consists of several steps you must complete before the new Name Scheme is available for use.

Step 1. Specify a Name Scheme Name

First, give your new Name Scheme a name. The name is displayed to users, so it should be clear and easy to identify. You might name it after an experiment, after the lab group conducting the sequencing, or after anything else that groups sequences sharing a common naming structure.

Step 2. Provide Example Sequence Name

Next, provide an example sequence name to test the Name Scheme on. You can either copy and paste the name into the input field or select a file on your computer whose filename matches the scheme you are creating.

Step 3. Specify Delimiters

Delimiters are the characters that break a sequence name into pieces. In this step, specify every delimiter used in your sequence names by entering them together as a single string. For example, for a sequence name like Sample_Chain-Direction;Machine you would enter _-; because each of these three characters marks a boundary between pieces of information. As you enter delimiters, the example sequence name from the previous step is split into pieces, so you can confirm visually that you have specified the correct delimiters.
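
The splitting step can be pictured as treating every delimiter character as a boundary. The Python sketch below illustrates that idea under this assumption; it is not the product's own implementation, and split_name is a hypothetical helper name.

```python
import re

def split_name(name: str, delimiters: str) -> list[str]:
    """Split `name` on every character listed in `delimiters`."""
    if not delimiters:
        # No delimiters: the whole name is a single piece.
        return [name]
    # Build a character class such as [_\-;]. re.escape stops characters
    # like '-' or ']' from being read as regex syntax inside the class.
    pattern = "[" + re.escape(delimiters) + "]"
    return re.split(pattern, name)

print(split_name("Sample_Chain-Direction;Machine", "_-;"))
# ['Sample', 'Chain', 'Direction', 'Machine']
```

Using a single character class means the order in which you type the delimiters does not matter, which matches entering them as one string.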

Note: A delimiter character must be used consistently. It should not appear inside a piece of information in some names but not in others. For example, the following sequence names cannot be described by the same Name Scheme:

sample_472_heavy_reverse
sample123_light_reverse

The underscore cannot be used as a delimiter for both of these names, because the first includes an underscore within the sample name (sample_472) while the second does not (sample123). Either name would work on its own, but not when the two formats are mixed. One solution is to use Batch Rename to bring the sequence names into a consistent format before applying a Name Scheme.
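
Splitting both names on the underscore shows why. In this hypothetical Python sketch, piece 1 is '472' for the first name but 'light' for the second, so a field bound to that piece would capture different kinds of information. The same_piece_count helper is an illustrative heuristic, not a product feature: it is necessary but not sufficient, since names can split into the same number of pieces and still be inconsistent.

```python
import re

def split_name(name: str, delimiters: str) -> list[str]:
    # Every character in `delimiters` marks a boundary between pieces.
    return re.split("[" + re.escape(delimiters) + "]", name)

def same_piece_count(names: list[str], delimiters: str) -> bool:
    """Quick check: do all names split into the same number of pieces?"""
    return len({len(split_name(n, delimiters)) for n in names}) <= 1

names = ["sample_472_heavy_reverse", "sample123_light_reverse"]
for name in names:
    print(split_name(name, "_"))
# ['sample', '472', 'heavy', 'reverse']
# ['sample123', 'light', 'reverse']
print(same_piece_count(names, "_"))  # False
```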

Step 4. Configure Fields

Once the sequence name has been broken into pieces, you can use the pieces to define fields that capture the information of interest. A field can combine any number of pieces under a single field name.

To begin, click the New button above the fields table to open the new field dialog, then give the field a name and specify its type. The Display Column type simply marks the field to be shown to users as a results column, whereas the other types define special fields that are used during analyses. For Display Column fields, the field name you specify becomes the column header.

If your sequencing process involves sequencing heavy and light chains in both the forward and reverse directions, you can assign separate fields the Common Identifier and (optionally) Chain types. When you batch assemble your sequences, the assembly process uses these fields to automatically assemble the forward and reverse sequences for each chain of each sample. You can also use the Common Identifier field to automatically associate paired Heavy and Light chain sequences.

There are two ways to specify which pieces of the example sequence name the new field uses. The first is to select a name part from a dropdown, which is the easiest approach if you only need a single piece. Alternatively, you can construct a custom template that refers to each piece by number, counting from 0, and combines the pieces as needed. For example, the custom template ${0}-${2} takes the first piece, joins it to the third piece, and places a dash between them.
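
A custom template can be thought of as a string in which each ${n} is replaced by piece n. The Python sketch below assumes exactly that substitution rule; render_template is a hypothetical name, and the product may handle edge cases (such as an out-of-range number) differently. A regular expression is used because Python's string.Template only accepts placeholder names that start with a letter or underscore, so ${0} is not a valid placeholder there.

```python
import re

# Matches a reference such as ${0} or ${12} and captures the number.
PLACEHOLDER = re.compile(r"\$\{(\d+)\}")

def render_template(template: str, pieces: list[str]) -> str:
    """Replace each ${n} with the n-th piece, counting from 0."""
    # An out-of-range n raises IndexError in this sketch.
    return PLACEHOLDER.sub(lambda m: pieces[int(m.group(1))], template)

pieces = ["Sample", "Chain", "Direction", "Machine"]
print(render_template("${0}-${2}", pieces))  # Sample-Direction
```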

Examples

This section walks through a few typical Biologics sequencing use cases to illustrate how Name Schemes fit into different workflows.

Bidirectional Sanger VHH (heavy chain only) Sequences

Because only a single chain is sequenced, a Chain field is not required for this analysis. You only need to specify a Common Identifier field in the Name Scheme to allow Batch Assembly of the sequences. For example, you may have the following sequence names describing two separate antibodies:

31023_5_B5
31023_3_B6

33743_5_H11
33743_3_H12

In this scenario, you would specify the underscore (_) as the delimiter and designate the first piece as the Common Identifier. Batch Assembly then automatically assembles both directions for each sample using this identifier.

You can also extract the well ID (third piece) as a Display Column field so that it appears as a column in your annotated results.
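
The grouping that Batch Assembly performs here can be sketched as collecting reads that share the same Common Identifier. This Python sketch assumes the piece positions described above (Common Identifier at piece 0, well ID at piece 2); the helper names and constants are hypothetical, and the actual assembly step is not shown.

```python
import re
from collections import defaultdict

def split_name(name: str, delimiters: str = "_") -> list[str]:
    # Every character in `delimiters` marks a boundary between pieces.
    return re.split("[" + re.escape(delimiters) + "]", name)

COMMON_ID, WELL = 0, 2  # hypothetical zero-based piece positions

def assembly_groups(names: list[str]) -> dict[str, list[str]]:
    """Group reads that share a Common Identifier (one group per sample)."""
    groups = defaultdict(list)
    for name in names:
        groups[split_name(name)[COMMON_ID]].append(name)
    return dict(groups)

names = ["31023_5_B5", "31023_3_B6", "33743_5_H11", "33743_3_H12"]
print(assembly_groups(names))
# {'31023': ['31023_5_B5', '31023_3_B6'], '33743': ['33743_5_H11', '33743_3_H12']}

# Display Column field: the well ID becomes a column value for each read.
print({name: split_name(name)[WELL] for name in names})
# {'31023_5_B5': 'B5', '31023_3_B6': 'B6', '33743_5_H11': 'H11', '33743_3_H12': 'H12'}
```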

Assembled Sanger Heavy and Light Chain Sequences

In this case, the sequences have already been assembled or were only sequenced in one direction, so assembly is not required. However, there is a sequence for each chain, and these need to be paired. You may have the following sequence names describing two separate antibodies:

VH_31023_B5
VK_31023_B6

VH_33743_H11
VK_33743_H12

In this scenario, you would specify the underscore (_) as the delimiter and designate the second piece as the Common Identifier. You can then proceed directly to Antibody Annotator, which pairs the reads automatically using the Common Identifier field.

You can also extract the well ID (third piece) as a Display Column field so that it appears as a column in your annotated results.

Bidirectional Sanger Heavy and Light Chain Sequences

In this case there are four sequences for every antibody: two directions for each of the two chains. Both assembly and pairing of Heavy and Light chains are therefore required. You may have the following sequence names describing a single antibody:

VH_31023_5_B5
VH_31023_3_B6

VK_31023_5_H11
VK_31023_3_H12

In this scenario, you would specify the underscore (_) as the delimiter, designate the first piece as the Chain field, and designate the second piece as the Common Identifier field. Batch Assembly automatically assembles both directions for each chain of each sample using these two fields in combination. You can then proceed directly to Antibody Annotator with the assembled consensus sequences, and it pairs them automatically using the Common Identifier field.

You can also extract the well ID (fourth piece) as a Display Column field so that it appears as a column in your annotated results.

With this configuration, your sequences assemble together correctly, and Heavy and Light chains are also associated where possible and shown together as a single row in the Antibody Analysis results.
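
The two stages in this example can be sketched as follows, assuming that assembly groups reads sharing both the Chain and Common Identifier values, and that pairing then matches the assembled chains by Common Identifier alone. The piece positions, helper names, and data shapes are hypothetical illustrations, not the product's implementation, and the assembly and annotation themselves are not shown.

```python
import re
from collections import defaultdict

def split_name(name: str, delimiters: str = "_") -> list[str]:
    # Every character in `delimiters` marks a boundary between pieces.
    return re.split("[" + re.escape(delimiters) + "]", name)

CHAIN, COMMON_ID = 0, 1  # hypothetical zero-based piece positions

def assembly_groups(names: list[str]) -> dict[tuple[str, str], list[str]]:
    """Stage 1: group reads that share both Chain and Common Identifier."""
    groups = defaultdict(list)
    for name in names:
        pieces = split_name(name)
        groups[(pieces[CHAIN], pieces[COMMON_ID])].append(name)
    return dict(groups)

def pair_chains(groups: dict[tuple[str, str], list[str]]) -> dict[str, list[str]]:
    """Stage 2: collect the assembled chains that share a Common Identifier."""
    pairs = defaultdict(list)
    for chain, common_id in groups:
        pairs[common_id].append(chain)
    return dict(pairs)

names = ["VH_31023_5_B5", "VH_31023_3_B6", "VK_31023_5_H11", "VK_31023_3_H12"]
groups = assembly_groups(names)
# groups == {('VH', '31023'): ['VH_31023_5_B5', 'VH_31023_3_B6'],
#            ('VK', '31023'): ['VK_31023_5_H11', 'VK_31023_3_H12']}
print(pair_chains(groups))  # {'31023': ['VH', 'VK']}
```

Keying the first stage on the (Chain, Common Identifier) pair keeps the heavy and light reads of the same antibody in separate assemblies, while the second stage deliberately drops the Chain value so those assemblies end up on one row.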