Using Feature Databases to identify Fusion Proteins

Updated: October 25, 2024 03:01

The Antibody Annotator can add annotations for features that lie outside the variable domain of IgG-like proteins. This article explains how to create a Feature Database for annotating fusion proteins and other large sequence features, optionally allowing a degree of sequence mismatch.

To identify short, exact motifs such as His-tags and signal peptides, see this article to learn how to specify your own Liabilities and Assets.

Creating a feature database

To use this function, first create a Feature Database containing annotated sequences with the features you would like to add.

Make sure any sequences you would like to add to the feature database are annotated with a unique Annotation Type. Adding annotations is demonstrated below in Geneious Prime:

Screenshot_2023-03-14_at_4.40.51_PM.png

Screenshot_2023-03-14_at_4.38.36_PM.png

A feature database can contain individual sequences or a sequence list. Each sequence fragment should have a single annotation on it. The annotation needs to have two properties:

'Name', which should be a readable, unique name describing the annotated region.
'Type', which determines how the region is treated during analysis. Your database may contain a mix of feature types, such as both Primer and Signal_Peptide types.
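The annotation rules above can be checked before upload with a small script. The following is a minimal sketch, not part of Biologics: the Annotation and ReferenceSequence classes and the check_feature_database function are hypothetical names introduced for illustration, and the sketch assumes your sequences and annotations have already been read into plain Python objects.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    name: str  # readable, unique label for the annotated region
    type: str  # annotation type, e.g. "Signal_Peptide"


@dataclass
class ReferenceSequence:
    sequence: str
    annotations: list = field(default_factory=list)


def check_feature_database(records):
    """Return a list of problems found; an empty list means every rule passed."""
    problems = []
    seen_names = set()
    for i, rec in enumerate(records, start=1):
        # Rule: each sequence carries exactly one annotation.
        if len(rec.annotations) != 1:
            problems.append(
                f"record {i}: expected 1 annotation, found {len(rec.annotations)}"
            )
            continue
        ann = rec.annotations[0]
        # Rule: the annotation has a non-empty, unique Name.
        if not ann.name.strip():
            problems.append(f"record {i}: annotation has no Name")
        elif ann.name in seen_names:
            problems.append(f"record {i}: duplicate Name {ann.name!r}")
        else:
            seen_names.add(ann.name)
        # Rule: the annotation has a Type.
        if not ann.type.strip():
            problems.append(f"record {i}: annotation has no Type")
    return problems
```

For example, a database holding one PelB leader record typed Signal_Peptide and one GFP record with its own type returns an empty list, while a record with no annotation or a repeated Name is reported.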

Once your sequences are properly annotated, go to the Organization Databases section in the left navigation bar in Biologics and hover over Reference Sequences. Three dots will appear to the left; click them and select New database.

Screenshot_2023-05-15_at_10.46.45_PM.png

Name your database and make sure to create a "Feature" type database.

Screenshot_2023-05-15_at_11.36.13_PM.png

Note: Annotated Germline databases, by contrast, are used for annotating your antibody sequences. Please refer to the following article to learn more about how to create your own reference database.

Following this, upload your reference sequences into this database. You can add as many sequences as required, but note that:

Only nucleotide sequences are currently supported.
Reference sequences may contain ambiguous bases, but each sequence must include an unambiguous stretch of at least 10 nucleotides.
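The two constraints above can be pre-checked locally before uploading. This is a minimal sketch under stated assumptions, not how Biologics validates sequences: it treats A, C, G and T as the unambiguous bases and accepts the standard IUPAC nucleotide codes as ambiguous ones. All function and constant names are hypothetical.

```python
import re

UNAMBIGUOUS = "ACGT"                    # assumption: these count as unambiguous
IUPAC_NUCLEOTIDES = "ACGTURYSWKMBDHVN"  # standard IUPAC nucleotide codes
MIN_UNAMBIGUOUS_RUN = 10                # minimum stretch stated in this article


def is_nucleotide(seq):
    """True if seq is non-empty and uses only IUPAC nucleotide codes."""
    seq = seq.upper()
    return bool(seq) and all(base in IUPAC_NUCLEOTIDES for base in seq)


def longest_unambiguous_run(seq):
    """Length of the longest contiguous run of unambiguous bases."""
    runs = re.findall(f"[{UNAMBIGUOUS}]+", seq.upper())
    return max((len(run) for run in runs), default=0)


def is_valid_reference(seq):
    """True if seq satisfies both constraints listed above."""
    return (
        is_nucleotide(seq)
        and longest_unambiguous_run(seq) >= MIN_UNAMBIGUOUS_RUN
    )
```

Under these assumptions, a sequence such as ATGNNATGNNATG fails because its longest unambiguous run is only three bases, and a protein sequence fails the alphabet check as soon as it contains a letter that is not a nucleotide code.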

Annotating your sequences with a Feature Database

To annotate these additional features, select the Additional Features option in the Antibody Annotator window and choose your Feature Database from the dropdown.

Screen_Shot_2022-07-08_at_5.45.46_PM.png

You can specify the percentage of mismatches allowed between your query (input sequence) and the reference sequence by entering a number in the Mismatches % field. You can specify the Gap Size in the same way.
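To get a feel for what a given Mismatches % value means, the arithmetic can be sketched as follows. This is only an illustration: the article does not state how Biologics computes the percentage, so the sketch assumes an ungapped, equal-length comparison with mismatches counted relative to the reference length. Both function names are hypothetical.

```python
def mismatch_percent(query, reference):
    """Percentage of positions at which query and reference differ.

    Assumes the two sequences are already aligned without gaps,
    so they must have the same length.
    """
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to equal length")
    if not reference:
        raise ValueError("sequences must not be empty")
    mismatches = sum(q != r for q, r in zip(query.upper(), reference.upper()))
    return 100.0 * mismatches / len(reference)


def within_threshold(query, reference, max_mismatch_percent):
    """True if the query differs from the reference by no more than the limit."""
    return mismatch_percent(query, reference) <= max_mismatch_percent
```

Under these assumptions, one differing base in a 10-nucleotide feature is a 10% mismatch, so it would pass a limit of 10 but not a limit of 5. A large feature such as GFP tolerates proportionally more differing bases at the same percentage.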

Examples

Phage Display

Below is an example of a database containing a pIII sequence and a PelB leader sequence, both used in phage display.

Antibody-GFP fusion protein

Below is an example of a variable region fused to GFP, annotated using the Antibody Annotator and a custom Feature Database containing the GFP sequence.

Screen_Shot_2022-07-08_at_5.56.24_PM.png