Geneious Biologics automatically generates a number of graphs to help you to interpret your sequences and find broader trends or discrepancies in your dataset. This article focuses on the graphs that represent the individual regions or clusters in your dataset. For more information on what a cluster is, please see this article. Biologics also produces a number of graphs that can help you to interrogate your dataset as a whole: Using Graphs for Quality Assurance.
The videos in our Getting Started series may also be helpful, linked here. Below is our video on Understanding your Results in Graphs and Clusters.
How do I access the graphs tab?
The graphs tab can be found for any Biologics Annotator Result document, whether you have used Antibody Annotator or Single Clone Antibody Analysis. It is found next to the Sequences Table:
Viewing different cluster graphs
All the graphs in this section have a drop-down menu to the right of the graph that allows you to select the cluster (region and/or gene) you would like to view:
Graphs covered in this article:
- Cluster Diversity
- Cluster Lengths
- Cluster Sizes
- Amino Acid Distribution Chart
- Gene Combinations Heat-maps
- Codon Distribution Chart
Cluster Diversity
This graph displays how many clusters were found of each "size" (the number of sequences that were assigned to an individual cluster). In the example below using the dataset from NGS Tutorial 1, the HCDR3 (85% similarity) clusters are shown grouped according to how many sequences each cluster contains.
We can see that in this case, the vast majority of the clusters found (~1250) contained only one HCDR3 sequence. About 340 clusters contained two HCDR3 sequences that could be grouped together according to an 85% similarity threshold. If you'd like to learn more about how to make threshold-based or combination clusters, see this article: Clustering: Advanced Options.
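To make the relationship between clusters and cluster "sizes" concrete, here is a minimal Python sketch of the counting this graph represents. It is an illustration only, not how Geneious Biologics computes the graph; the cluster labels and the `cluster_diversity` helper are hypothetical.

```python
from collections import Counter

# Hypothetical data: one cluster label per sequence in the dataset.
cluster_of_each_sequence = ["c1", "c2", "c2", "c3", "c4", "c4", "c4", "c5"]

def cluster_diversity(assignments):
    """Return {cluster size: number of clusters of that size}."""
    sizes = Counter(assignments)          # cluster label -> number of sequences
    return dict(Counter(sizes.values()))  # cluster size -> number of clusters

diversity = cluster_diversity(cluster_of_each_sequence)
for size in sorted(diversity):
    print(f"{diversity[size]} cluster(s) contain {size} sequence(s)")
```

Each bar in the Cluster Diversity graph corresponds to one entry of `diversity`: the x-axis is the cluster size and the height is the number of clusters of that size.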
Cluster Lengths
This graph plots the varying sequence lengths (in amino acids) of a selected region. In the example below using the sequences from Sanger Tutorial 2, the various lengths of the HCDR3 region are plotted.
As we can see, the HCDR3 lengths follow a roughly normal distribution centered around a length of 12 amino acids.
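The tally behind a length plot like this can be sketched in a few lines of Python. This assumes you have the region's amino acid sequences available as plain strings; the sequences and the `length_distribution` helper below are made up for illustration and are not taken from the tutorial data.

```python
from collections import Counter

# Hypothetical HCDR3 amino acid sequences.
hcdr3_sequences = [
    "ARDYYGSSYFDY",
    "ASYYYGSSSFAY",
    "ARGGY",
    "ARDLLRYFDWLLYYFDY",
    "TRWDYFDYWGQG",
]

def length_distribution(sequences):
    """Return {length in amino acids: number of sequences}, sorted by length."""
    return dict(sorted(Counter(len(s) for s in sequences).items()))

lengths = length_distribution(hcdr3_sequences)
for length, count in lengths.items():
    print(f"length {length}: {count} sequence(s)")
```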
Cluster Sizes
This graph shows the number of sequences assigned to each different cluster found in your data for a given region(s) and/or gene(s). In the example below from NGS Tutorial 1, we have chosen the cluster of Heavy CDR3 V Gene J Gene (85% Similarity on Heavy CDR3) - this cluster is therefore identifying the most populated clonotypes in the data.
As we can see above, the most populated clonotype consists of the sequences with an HCDR3 that is at least 85% similar to the sequence ASYYYGSSSFAY and whose best-matching germline genes are IGHV14-3 and IGHJ3. You can mouse over any of the columns to find the exact number of sequences and the name of the cluster if it is not showing on the graph.
Amino Acid Distribution Chart
This graph plots the frequency at which each different amino acid is found at any given position for a region. It combines ALL the sequences in your dataset found for a region of specific length. In the below example using the data from NGS Tutorial 5, the amino acid variation at each position is shown for all the HCDR3 sequences that are 14 residues long.
To the right of the graph you have the following options to select from:
- The region (e.g. Heavy FR4, Light CDR3)
- The amino acid length (if the region varies in length)
- The colours used for the amino acids (Rasmol, Hydrophobicity, Polarity, Clustal, Structural AAs and Cysteines). To learn more about what these different colour schemes are, see this article.
Note: This chart is very similar to the Sequence Logo tab, which plots the amino acid variation within any selected cluster(s) of an Annotator Result document. You can access the Sequence Logo tab by selecting the cluster(s) you would like to see the variation in and switching from the Sequence Viewer tab to the Sequence Logo tab. The Sequence Logo tab is also available for any region alignment; see How do I align sequences?
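As a rough illustration of what the Amino Acid Distribution Chart summarises, the Python sketch below computes per-position amino acid frequencies for all sequences of one chosen length. It is a simplified stand-in and not the Geneious Biologics implementation; the sequences and the `aa_distribution` helper are hypothetical.

```python
from collections import Counter

# Hypothetical region sequences of mixed lengths.
region_sequences = ["ARDY", "ARGY", "TRDY", "ARDYY"]

def aa_distribution(sequences, length):
    """Per-position amino acid frequencies for sequences of the given length.

    Returns a list with one dict per position: {amino acid: fraction}.
    """
    same_length = [s for s in sequences if len(s) == length]
    if not same_length:
        return []
    total = len(same_length)
    # zip(*...) walks the sequences column by column (position by position).
    return [
        {aa: n / total for aa, n in Counter(column).items()}
        for column in zip(*same_length)
    ]

frequencies = aa_distribution(region_sequences, length=4)
for position, freqs in enumerate(frequencies, start=1):
    print(position, freqs)
```

Filtering by length first mirrors the chart's length selector: positions are only comparable between sequences that have the same number of residues.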
Gene Combinations Heat-maps
The gene combinations heat-map shows the combinations of V and J genes used for both the Light and Heavy chains in your dataset. You can switch between the Light and Heavy heat-maps via the drop-down menu to the right of the graph.
In the above example using the data from NGS Tutorial 5, we can see that the most common combination of V and J genes in the Heavy chain is IGHV4-34 and IGHJ4.
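The counts behind a V/J heat-map can be sketched as a simple cross-tabulation. The Python example below assumes each sequence has already been assigned one V gene and one J gene; the gene pairs listed are invented for illustration and do not come from NGS Tutorial 5.

```python
from collections import Counter

# Hypothetical (V gene, J gene) assignments, one pair per heavy chain sequence.
vj_pairs = [
    ("IGHV4-34", "IGHJ4"),
    ("IGHV4-34", "IGHJ4"),
    ("IGHV3-23", "IGHJ4"),
    ("IGHV3-23", "IGHJ6"),
    ("IGHV4-34", "IGHJ6"),
    ("IGHV4-34", "IGHJ4"),
]

counts = Counter(vj_pairs)
v_genes = sorted({v for v, _ in vj_pairs})  # heat-map rows
j_genes = sorted({j for _, j in vj_pairs})  # heat-map columns

# One row per V gene, one column per J gene; each cell is a sequence count.
matrix = [[counts[(v, j)] for j in j_genes] for v in v_genes]

top_pair, top_count = counts.most_common(1)[0]
print("Most common combination:", top_pair, "with", top_count, "sequences")
```

Each cell of `matrix` corresponds to one square of the heat-map, with a higher count corresponding to a more intense colour.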
Codon Distribution Chart
This graph is similar to the gene combinations heat-map in that it displays the relative frequencies with which different codons are used for each amino acid in a given region. In the example below using the data from NGS Tutorial 4, we are viewing the different codons used at each amino acid position in all the HCDR3 regions of length 14. It is also possible to look at the codons used across the HCDR3 regions of all lengths, which will reveal which codons are favoured for each individual amino acid, regardless of position.
We can see that of the HCDR3 regions that are 14 amino acids long, the most common amino acid at position 1 is Ala, encoded most of the time with the codon GCG.
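The kind of tally this chart represents can be sketched in Python as follows. The example assumes in-frame nucleotide sequences for regions of equal length and uses the standard genetic code; the input sequences and the `codon_usage_by_position` helper are hypothetical, and this is not the Geneious Biologics implementation.

```python
from collections import Counter, defaultdict

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def codon_usage_by_position(nucleotide_sequences):
    """Return {position: {amino acid: Counter of codons}} (positions start at 1)."""
    usage = defaultdict(lambda: defaultdict(Counter))
    for seq in nucleotide_sequences:
        seq = seq.upper()
        # Ignore any trailing bases that do not form a complete codon.
        for start in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[start:start + 3]
            amino_acid = CODON_TABLE.get(codon)
            if amino_acid is None:
                continue  # skip codons containing ambiguous bases such as N
            usage[start // 3 + 1][amino_acid][codon] += 1
    return usage

# Hypothetical in-frame sequences, two codons each.
usage = codon_usage_by_position(["GCGAGA", "GCGCGT", "GCCAGA"])
for position in sorted(usage):
    for amino_acid, codons in usage[position].items():
        print(position, amino_acid, dict(codons))
```

Summing the codon counters for one amino acid over all positions gives the position-independent view described above.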