A reference sequence database is the basis on which query sequences are annotated, so it must contain annotated reference sequences. The Biologics Annotation pipelines use these reference sequences to annotate your input sequences automatically. There are two broad types of database: Reference databases (including Germline Gene and Antibody Template databases) and Feature databases.
Jump to:
- Differences between Feature, Template and Germline Reference Databases
- Annotated Germline Databases in depth
- How to create custom Reference Databases
Differences between Feature, Template and Germline Reference Databases
In short, Germline Gene and Antibody Template databases are necessary for the annotation of IgG-like sequences, while Feature Databases are not.
- Germline and Template reference databases contain sequences with annotated FR/CDRs, either as germline genes (germline reference databases) or as entire variable regions (template reference databases). These are necessary to annotate the FR/CDR boundaries of your input sequences.
- Feature databases can be used in addition to the reference database and are less strictly formatted. This allows you to augment your standard antibody annotations with other custom annotations such as fusion proteins, signal peptides or other sequence features. To read more about the benefits of using a feature database, see our main article: Using Feature Databases to identify Constant Regions and Fusion Proteins
Annotated Germline Databases in depth
Antibody Annotator, NGS Antibody Annotator and Single Clone Antibody Analysis use reference sequences to determine two key aspects of sequence annotation:
- The closest matching reference gene(s) to each of your target sequences, and what the silent and non-silent variations in the target sequence are relative to its closest match.
- The most appropriate FR/CDR region boundaries, calculated from the whole dataset rather than from the closest match alone.
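To illustrate the distinction between silent and non-silent variations, here is a minimal Python sketch. It is not the algorithm used by Biologics; it assumes the target and reference are already aligned, in frame, gap-free, of equal length, and free of ambiguity codes. The function name `classify_variations` is hypothetical.

```python
# Illustrative sketch only: not the Biologics implementation.
# Assumes both sequences are aligned, in frame, gap-free and unambiguous.

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG codon ordering.
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)


def classify_variations(target, reference):
    """Return (codon_number, reference_codon, target_codon, kind) per differing codon."""
    if len(target) != len(reference) or len(target) % 3 != 0:
        raise ValueError("sequences must be equal length and a multiple of 3")
    variations = []
    for i in range(0, len(target), 3):
        t_codon = target[i:i + 3].upper()
        r_codon = reference[i:i + 3].upper()
        if t_codon == r_codon:
            continue
        # Silent: the codon changed but the encoded amino acid did not.
        same_residue = CODON_TABLE[t_codon] == CODON_TABLE[r_codon]
        kind = "silent" if same_residue else "non-silent"
        variations.append((i // 3 + 1, r_codon, t_codon, kind))
    return variations


if __name__ == "__main__":
    # GCT -> GCC keeps alanine (silent); GAA -> GAT changes Glu to Asp (non-silent).
    for variation in classify_variations("GCCGATATG", "GCTGAAATG"):
        print(variation)
```

In this example the first codon differs from the reference without changing the amino acid, while the second changes the encoded residue.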
Custom reference sequences can represent the Ig/TCR variable region of a different species (or of multiple species). Some ambiguities are tolerated, particularly in the centre of CDR regions.
Biologics comes with three fully supported germline gene reference databases: Human, Mouse (Mus musculus) and Alpaca (Vicugna pacos). Other species are available on an as-is basis; please contact us if you would like access to any of these. You can also make your own reference database, as outlined in the section below.
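Before adding a sequence to a custom database, it can be useful to see how many ambiguous positions it contains. The following Python sketch assumes that "ambiguities" means IUPAC nucleotide ambiguity codes; the helper name `ambiguity_fraction` is hypothetical, and no acceptance threshold is implied because the article does not state one.

```python
# Illustrative sketch only: assumes "ambiguities" are IUPAC nucleotide codes.
IUPAC_AMBIGUITY_CODES = set("RYSWKMBDHVN")


def ambiguity_fraction(sequence):
    """Return the fraction of positions that are IUPAC ambiguity codes."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    ambiguous = sum(1 for base in sequence.upper() if base in IUPAC_AMBIGUITY_CODES)
    return ambiguous / len(sequence)


if __name__ == "__main__":
    # Four of the eight positions (N, N, R, Y) are ambiguous.
    print(ambiguity_fraction("ACGTNNRY"))
```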
How to create custom Reference Databases
Our main articles outline how to create Reference Databases:
- Please see How to make a Custom Reference Database for instructions on how to make a custom reference database using Biologics. This covers both Germline Gene and Template variable region databases.
- If you are working with TCR sequences, please see this article on how to make a custom reference database containing TCR gene sequences: Analyzing TCR sequences in Geneious Biologics
- If you would like to manually annotate your germline or template variable sequences, please see this article: How to Manually Annotate Reference Sequences
- For more information on how to create Feature Databases, please see this article: Using Feature Databases to identify Constant Regions and Fusion Proteins
The Reference Database section can be found on the navigation panel under Organization Databases. To create a new reference database, click the three vertical dots to bring up the New database option.
See the above guides for more information.