This article describes how to turn on the option for annotating germline gene differences in Antibody Annotator and Single Clone Analysis, as well as how to then view these mismatches relative to the germline in your sequences. Note that this function annotates differences relative to the Reference Database used. To learn how to make your own reference databases, view this article.
How to turn on Annotate germline gene differences
To annotate the differences between the input sequence and reference sequence, select the Annotate germline gene differences option in the Antibody Annotator or Single Clone Analysis popup window.
It can also be helpful to turn on Always annotate entire regions (except CDR3) in the Advanced option tab, to ensure that you get variants identified across the full variable region.
Run the pipeline as usual. To learn more about the Antibody Annotator and the available functions, please refer to the following article. The corresponding article for Single Clone Analysis can be found here.
Viewing germline mismatches in the Sequence Viewer
To view the germline gene differences, select the Antibody Annotator output file, then select the sequence that you are interested in.
- In general, for every sequence with germline differences, there will be a DNA (nucleotide) and an AA (amino acid) variant annotation track.
- When equally good matches for the same gene would have exactly the same nucleotide or amino acid differences, these annotations are reduced to just a single track named after both (or more) genes.
- Amino acid differences use the frame of the closest FR/CDR that starts at or before the start of the gene. For example, a frame shift in FR1 has no effect on the J Gene amino acid differences, which use the frame that CDR3 starts in.
- To turn off the annotation tracks, simply uncheck the DNA Variant and AA Variant annotation tracks in the Sequence Viewer Annotation panel.
Viewing germline mismatches in the Sequences Table
When Annotate germline gene differences is selected, a number of additional columns are present in the All Sequences table, containing information about variants across different regions of interest.
Any mismatches between the best-matched gene in the germline reference database and your annotated sequences are output to the Sequences Table in the standard format, e.g. Q17E for a substitution of Gln to Glu at position 17. Numbering is based on the IMGT system unless otherwise specified under Analysis Options when running Antibody Annotator.
The following general naming schema is used in Biologics for mutations other than substitutions:
- Deletions: e.g. L4del (Leu at position 4 deleted) or P9_G11del (Pro at position 9 through to Gly at position 11 deleted)
- Insertions: e.g. T18_L19insK (Lys has been inserted between Thr18 and Leu19)
- Deletion-Insertions: e.g. S88_L91delinsT (residues Ser88 through to Leu91 have been deleted and replaced with a Thr)
Note: IMGT numbering adds letter suffixes to some positions, e.g. position 98a, 98b, 98c and so on. A mutation that uses this notation may look like P98a_D98bdel. This means that the two amino acids (Pro and Asp) found at positions 98a and 98b in the germline have been deleted.
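If you export these columns and want to process the mutation strings yourself, the notation above can be split into its parts with a regular expression. The following is a minimal Python sketch, not part of Biologics: the `Mutation` type, the `parse_mutation` function, and the field names are hypothetical, and the pattern only covers the four forms described in this article. The del and ins keywords are matched case-insensitively as a precaution.

```python
import re
from typing import NamedTuple, Optional

# Residue letter + position (with optional IMGT letter suffix such as 98a),
# an optional "_<residue><position>" range end, then the kind of change.
_MUTATION_RE = re.compile(
    r"^(?P<first>[A-Z])(?P<first_pos>\d+[a-z]?)"
    r"(?:_(?P<last>[A-Z])(?P<last_pos>\d+[a-z]?))?"
    r"(?:(?i:delins)(?P<new_delins>[A-Z]+)"
    r"|(?P<is_del>(?i:del))"
    r"|(?i:ins)(?P<new_ins>[A-Z]+)"
    r"|(?P<new_sub>[A-Z]))$"
)


class Mutation(NamedTuple):
    kind: str                 # "substitution", "deletion", "insertion" or "delins"
    first: str                # germline residue at the first position, e.g. "Q"
    first_pos: str            # kept as text so suffixes like "98a" survive
    last: Optional[str]       # germline residue at the end of a range, if any
    last_pos: Optional[str]   # end position of a range, if any
    new: str                  # residues found in the sequence ("" for deletions)


def parse_mutation(text: str) -> Mutation:
    """Parse one mutation string such as 'Q17E' or 'S88_L91delinsT'."""
    m = _MUTATION_RE.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognised mutation: {text!r}")
    is_range = m["last"] is not None
    if m["new_sub"] is not None:
        if is_range:
            raise ValueError("a substitution refers to a single position")
        kind, new = "substitution", m["new_sub"]
    elif m["new_delins"] is not None:
        kind, new = "delins", m["new_delins"]
    elif m["new_ins"] is not None:
        if not is_range:
            raise ValueError("an insertion needs two flanking positions")
        kind, new = "insertion", m["new_ins"]
    else:
        kind, new = "deletion", ""
    return Mutation(kind, m["first"], m["first_pos"], m["last"], m["last_pos"], new)


for example in ["Q17E", "L4del", "P9_G11del", "T18_L19insK", "S88_L91delinsT"]:
    print(example, "->", parse_mutation(example))
```

Positions are stored as strings rather than integers so that suffixed positions such as 98a and 98b are preserved exactly as written.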
In the above image, we can see that in the Heavy FR4 region of this sequence there are a total of two AA Germline Mismatches that correspond to the substitutions T122I and L123V. These Germline Mismatches indicate the differences between the sequenced FR4 region and the best-fit candidate gene found in the germline: IGHJ4-02.
Where a region (e.g. FR1) extends beyond the gene annotation (e.g. Heavy V Gene), or lies within the highly variable CDR3, any nucleotides that are not covered by a gene annotation are counted as mismatches in all mismatch statistics.
- The number of mismatches in the DNA with respect to the best-fit germline gene is also recorded in the Sequences Table.
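The effect of this rule on a mismatch count can be illustrated with a small Python sketch. It assumes the germline gene has already been aligned to the region and that region positions without gene coverage are marked with a placeholder character. The function name, the "." placeholder, and the example sequences are hypothetical and do not reflect how Biologics stores its alignments.

```python
def count_region_mismatches(region: str, germline: str, uncovered: str = ".") -> int:
    """Count mismatches in a region, treating uncovered positions as mismatches.

    `region` and `germline` are equal-length, pre-aligned strings (an assumed
    input format); `uncovered` marks positions that no gene annotation covers.
    """
    if len(region) != len(germline):
        raise ValueError("region and germline must be aligned to the same length")
    return sum(1 for r, g in zip(region, germline) if g == uncovered or r != g)


# Hypothetical FR1 that starts three nucleotides before the V gene annotation,
# with one true nucleotide difference inside the covered part.
region = "GAGCAGGTGCAGCTG"
germline = "...CAGGTGCAACTG"
print(count_region_mismatches(region, germline))  # 3 uncovered + 1 difference = 4
```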
- All or any parts of the Sequences Table can be exported as an .xlsx file by selecting the columns under Table Preferences and then selecting Export in the Table Viewer.
For more information on HGVS nomenclature, see Varnomen.
Germline Gene Statistics
In addition to the variant annotations, you can also view the closest V, D, and J gene matches for each of your sequences. Biologics also calculates the percentage identity of your sequence relative to the closest gene (Heavy V Gene Identity below), and the percentage of the complete gene that the target sequence matches (Heavy V Gene Coverage).
As shown in the image above, filtering and sorting on these gene match statistics can aid in investigating the characteristics of your sequence data set. There is more detailed information about filtering here.
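To make the two statistics concrete, here is a minimal Python sketch of one plausible way to compute and filter on them. The exact formulas used by Biologics are not given in this article, so the definitions below are assumptions: identity is taken as identical positions divided by aligned positions, and coverage as aligned gene positions divided by the full gene length. The `GeneMatch` type, its fields, and the example values are hypothetical.

```python
from typing import NamedTuple


class GeneMatch(NamedTuple):
    name: str          # sequence name
    matches: int       # identical positions within the aligned stretch
    aligned: int       # number of gene positions aligned to the sequence
    gene_length: int   # full length of the germline gene


def identity_pct(m: GeneMatch) -> float:
    """Assumed definition: identical positions as a percentage of aligned positions."""
    return 100.0 * m.matches / m.aligned if m.aligned else 0.0


def coverage_pct(m: GeneMatch) -> float:
    """Assumed definition: aligned positions as a percentage of the full gene."""
    return 100.0 * m.aligned / m.gene_length if m.gene_length else 0.0


rows = [
    GeneMatch("seq1", matches=285, aligned=296, gene_length=296),
    GeneMatch("seq2", matches=290, aligned=290, gene_length=296),
    GeneMatch("seq3", matches=150, aligned=200, gene_length=296),
]

# Keep sequences that cover at least 90% of the gene, most identical first.
kept = sorted(
    (m for m in rows if coverage_pct(m) >= 90.0),
    key=identity_pct,
    reverse=True,
)
for m in kept:
    print(f"{m.name}: identity {identity_pct(m):.1f}%, coverage {coverage_pct(m):.1f}%")
```

Under these assumed definitions, a sequence can have high identity but low coverage when only a short stretch of the gene is matched, which is why filtering on both columns together is useful.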