A reference sequence database is used to identify and annotate your input or query sequences, and is crucial for antibody/TCR annotation and analysis. Geneious Biologics comes with a set of provided Germline Gene reference databases: Human Ig, Human TCR, Mouse Ig and Alpaca Ig. Others are available on request.
Jump to:
- Types of Reference Databases
- Reference Databases for Antibody Analysis in depth
- How to create Custom Reference Databases
Types of Reference Databases
Biologics comes with, and lets you create, databases of the following types:
- Germline Gene (Antibody/TCR Specific)
- Antibody Template (Antibody/TCR Specific)
- General Template (for use with the Peptide Annotator, see General Template Databases)
- Linker Databases: used in addition to either a Germline Gene or Antibody Template database, for dual chain/scFv-like datasets. See Linker Databases
- Feature Databases: can be used in addition to any of the above reference databases, and are less strictly formatted. This allows you to augment your standard antibody annotations with other custom annotations such as fusion proteins, signal peptides or other sequence features. To read more about the benefits of using a feature database, see our main article: Using Feature Databases to identify Constant Regions and Fusion Proteins
In short, Germline Gene and Antibody Template databases are necessary for the annotation of IgG-like sequences, while General Template databases are used for other biological molecules, like peptides or proteins.
- The Germline Gene and Antibody Template reference databases contain sequences with FR/CDRs either as germline genes or as entire variable regions (template reference databases). These are necessary to annotate the FR/CDR boundaries of your input sequences.
- General Template databases are less strictly formatted, and can consist of plain (FASTA) input sequences for a protein or peptide, as either nucleotides or amino acids. Learn more here: General Template Databases.
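As a rough illustration of what "plain (FASTA) input sequences" means, the sketch below formats a couple of sequences as FASTA text. The sequence names and residues are hypothetical placeholders, and the `to_fasta` helper is not part of Biologics; it only shows the shape of the file you would prepare before import.

```python
def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text, wrapping lines at `width`."""
    lines = []
    for name, seq in records:
        lines.append(f">{name}")  # header line: '>' followed by the sequence name
        # Split the sequence into fixed-width chunks, one per line.
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Hypothetical amino acid sequences, for illustration only.
records = [
    ("example_peptide_1", "MKTAYIAKQRQISFVKSHFSRQ"),
    ("example_peptide_2", "GSHMLEDPVDAFK"),
]
fasta_text = to_fasta(records)
print(fasta_text)
```

The same layout applies to nucleotide sequences; only the residues on the sequence lines change.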
Reference Databases for Antibody Analysis in depth
Antibody Annotator, NGS Antibody Annotator and Single Clone Antibody Analysis use reference sequences to determine two key aspects of sequence annotation:
- The closest matching reference gene(s) for each of your target sequences (Germline Gene databases only), and the silent and non-silent variations in the target sequence relative to its closest match in the reference database.
- The most appropriate FR/CDR region boundaries, calculated from the whole reference dataset, not just the closest match.
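To make the silent versus non-silent distinction concrete, here is a minimal sketch that compares a target sequence to a reference codon by codon. It is not how Biologics computes its annotations. It assumes both sequences are already aligned, in frame, gap-free and of equal length, and it uses the standard genetic code. The function name `classify_variations` and the example sequences are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_variations(reference, query):
    """Return (codon_index, ref_codon, query_codon, kind) for each differing codon.

    Assumes pre-aligned, in-frame, gap-free sequences of equal length that
    contain only A, C, G and T (ambiguous bases would raise a KeyError).
    """
    if len(reference) != len(query) or len(reference) % 3 != 0:
        raise ValueError("sequences must be equal length and a multiple of 3")
    variations = []
    for start in range(0, len(reference), 3):
        ref_codon = reference[start:start + 3].upper()
        qry_codon = query[start:start + 3].upper()
        if ref_codon == qry_codon:
            continue
        # Silent: the codon changed but the encoded amino acid did not.
        same_residue = CODON_TABLE[ref_codon] == CODON_TABLE[qry_codon]
        kind = "silent" if same_residue else "non-silent"
        variations.append((start // 3, ref_codon, qry_codon, kind))
    return variations


# Hypothetical example: GCT->GCC keeps alanine, GAA->GAT changes Glu to Asp.
result = classify_variations("GCTGAAAAA", "GCCGATAAA")
print(result)
```

In this example the first codon differs without changing the amino acid (silent), the second changes the residue (non-silent), and the third is identical and so is not reported.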
If you are using a custom reference database, some ambiguity is tolerated. For example, you might leave ambiguous residues (X) at positions across the CDR3 region.
Biologics comes with fully supported germline gene reference databases for three species: Human, Mouse (Mus musculus) and Alpaca (Vicugna pacos). Other species are available on an as-is basis; please contact us if you would like access to any of these. You can also make your own reference database, as outlined in the section below.
How to create Custom Reference Databases
The Reference Database section can be found on the navigation panel under Organization Databases. To create a new reference database, click on the three vertical dots to bring up the New database option.
Our main articles outline how to create Reference Databases:
- Please see How to make an Antibody Custom Reference Database for instructions on how to make a custom reference database using Biologics. This covers both Germline Gene and Antibody Template variable region databases.
- If you want to create a reference database for analyzing non-antibody data, see General Template Databases
- If you are working with TCR sequences, please see this article on how to make a custom reference database containing TCR gene sequences: Analyzing TCR sequences in Geneious Biologics
- If you would like to manually annotate your germline or template variable sequences for antibody data, please see this article: How to Manually Annotate Reference Sequences
For more information on how to create Feature Databases, please see this article: Using Feature Databases to identify Constant Regions and Fusion Proteins