Using Graphs to interpret Clusters and Clonotypes

Updated: September 23, 2025 02:24

Geneious Biologics automatically generates a number of graphs to help you to interpret your sequences and find broader trends or discrepancies in your dataset. This article focuses on the graphs that represent the individual regions or clusters in your dataset. For more information on what a cluster is, see Understanding "Clusters".

Biologics also produces a number of graphs that can help you to interrogate your dataset as a whole; see Using Graphs for Quality Assurance.
If you are comparing different panning rounds to find enriched sequences, scatterplots are also available. Please see Comparing Results across Multiple Experiments to learn more about this.

The videos in our Getting Started series may also be helpful, linked here. Below is our video on Understanding your Results in Graphs and Clusters.

How do I access the graphs?

Graphs will automatically be populated under the Sequences Table. Many of the graphs described in this article will be available when accessing different cluster tables, for example Heavy CDR3. To view a cluster table, select it in the Cluster Table dropdown:

cluster table graph example.png

To bring up the graphs in a larger window, you can also click on the Graphs tab at the top of a result:

Screenshot 2023-12-07 at 11.42.48 AM.png

Viewing different graphs

All the graphs in this section have a drop-down menu to the right of the graph that allows you to select either the type of graph or the region you would like to view:

graph navigation.png

Graphs covered in this article:

Cluster Similarity Tree and Network Plots
Cluster Relationships
Cluster Diversity
Cluster Lengths
Cluster Sizes
Amino Acid Distribution Chart
Gene Combinations Heat-maps
Codon Distribution Chart

Cluster Similarity Tree and Network Plots

When navigating to any single-region Cluster Table - for example the VDJ Region or the Heavy CDR3 Region - you can toggle between a Network (left) and Tree (right) plot of the clusters for that region. Up to a maximum of 1000 unique clusters will be plotted.

focused_library_trimer_pan1 Annotated & Clustered.png

The above two graphs were generated using our NGS Tutorial 2 pan_1 dataset.

By default, the Network plot of the clusters for a given region will be displayed, with the Tree plot accessible from the graph-chooser dropdown.
The Tree plot can be displayed with three different branch transforms:

No transform
Equal transform
Cladogram
- This option allows you to add Heatmap layers, plotting data from the table (e.g. # of Sequences) around the circular tree. See our Alignment article for how to use our heatmap functions.

These graphs are both interactive, allowing you to zoom and hover over clusters of interest to bring up relevant information. For more information on their underlying algorithm, see Network and Tree plots: Identifying clonotype and sequence relationships.

Cluster Relationships

The Cluster Relationships graph loads automatically below the All Sequences Table upon opening an Annotator Result. This graph shows the proportion of each identified region/gene that is associated with another region/gene.

Sankey example.png

The default clusters shown are Heavy V Gene > HCDR3 > VDJ Region. If these clusters do not exist for the result, the first three clustered regions/genes will be displayed.

The right-hand panel can be used to Add Regions, Re-order regions (by clicking and dragging the icon to the left of the regions/genes) and to increase the maximum number of clusters represented by a region/vertical "column".
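As a rough illustration of what this graph summarizes, the sketch below counts how often each value in one column co-occurs with each value in the next column, then expresses that as a proportion of the left-hand value. The field names (`v_gene`, `hcdr3`), the sample records and the helper `link_proportions` are hypothetical, and this is not a description of how Geneious Biologics computes the graph internally.

```python
from collections import Counter

# Hypothetical annotated sequences: each record lists its Heavy V gene and HCDR3.
records = [
    {"v_gene": "IGHV1-2", "hcdr3": "ARDYW"},
    {"v_gene": "IGHV1-2", "hcdr3": "ARDYW"},
    {"v_gene": "IGHV1-2", "hcdr3": "ARGFDY"},
    {"v_gene": "IGHV3-23", "hcdr3": "ARGFDY"},
]

def link_proportions(rows, left, right):
    """Map (left_value, right_value) to the fraction of the left value's sequences."""
    pair_counts = Counter((r[left], r[right]) for r in rows)
    left_totals = Counter(r[left] for r in rows)
    return {pair: n / left_totals[pair[0]] for pair, n in pair_counts.items()}

links = link_proportions(records, "v_gene", "hcdr3")
for (src, dst), frac in sorted(links.items()):
    print(f"{src} -> {dst}: {frac:.2f}")
```

Each printed line corresponds to one band between two adjacent columns; a wider band in the graph means a larger proportion.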

Cluster Diversity

This graph displays how many clusters were found of each "size" (the number of sequences assigned to an individual cluster). In the example below, which uses the dataset from NGS Tutorial 1, the HCDR3 (85% similarity) clusters are grouped according to how many sequences each cluster contains.

Screen_Shot_2022-06-02_at_3.51.05_PM.png

We can see that in this case, the vast majority of the clusters found (~1250) had only one unique HCDR3 sequence. About 340 clusters contained two HCDR3 sequences that could be grouped together according to an 85% similarity threshold. If you'd like to learn more about how to make threshold-based or combination clusters, see this article: Clustering: Advanced Options.
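The two-step counting behind a graph like this can be sketched as follows: first count the sequences in each cluster, then count how many clusters share each size. The cluster IDs and variable names are made up for illustration and do not come from Geneious Biologics.

```python
from collections import Counter

# Hypothetical cluster assignments: one cluster ID per sequence.
cluster_of_sequence = ["c1", "c2", "c2", "c3", "c4", "c4", "c4", "c5"]

# Step 1: size of each cluster (number of sequences assigned to it).
cluster_sizes = Counter(cluster_of_sequence)

# Step 2: number of clusters that share each size.
clusters_per_size = Counter(cluster_sizes.values())

for size in sorted(clusters_per_size):
    print(f"size {size}: {clusters_per_size[size]} cluster(s)")
```

A dataset dominated by size-1 clusters, as in the example above, gives a tall first bar and a long, low tail.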

Cluster Lengths

This graph plots the varying sequence lengths (in amino acids) of a selected region. In the example below, which uses the sequences from Sanger Tutorial 2, the various lengths of the HCDR3 region are plotted.

Screen_Shot_2022-06-02_at_4.46.26_PM.png

As we can see, the HCDR3 lengths follow a roughly normal distribution centered around a length of 12 amino acids.
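A minimal sketch of the tally behind a length distribution like this one is shown below. It assumes one count per sequence, which may differ from how Geneious Biologics counts; the sample HCDR3 sequences are hypothetical.

```python
from collections import Counter

# Hypothetical HCDR3 amino acid sequences.
hcdr3_sequences = ["ARDYW", "ARGFDY", "ARGYDY", "ASYYYGSSSFAY", "ARDGW"]

# Count how many sequences have each length (in amino acids).
length_counts = Counter(len(seq) for seq in hcdr3_sequences)

for length in sorted(length_counts):
    print(f"length {length}: {length_counts[length]} sequence(s)")
```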

Cluster Sizes

This graph shows the number of sequences assigned to each different cluster found in your data for a given region(s) and/or gene(s). In the example below from NGS Tutorial 1, we have chosen the cluster of Heavy CDR3 V Gene J Gene (85% Similarity on Heavy CDR3) - this cluster is therefore identifying the most abundant clonotypes in the data.

Screen_Shot_2022-06-07_at_9.34.50_AM.png

As we can see above, the most abundant clonotype consists of sequences whose HCDR3 is at least 85% similar to the sequence ASYYYGSSSFAY and whose best-matching germline genes are IGHV14-3 and IGHJ3. You can mouse over any of the columns to find the exact number of sequences and the name of the cluster if it is not showing on the graph.
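To make the idea of a combined cluster concrete, the sketch below groups sequences that share V and J genes and whose HCDR3 is at least 85% identical to the first sequence seen in a group. Both the identity measure (fraction of matching positions between equal-length sequences) and the greedy grouping are simplifying assumptions, not the clustering algorithm used by Geneious Biologics; the records and function names are hypothetical.

```python
from collections import Counter

def identity(a, b):
    """Fraction of matching positions; sequences of different length score 0."""
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Hypothetical records: (V gene, J gene, HCDR3).
records = [
    ("IGHV14-3", "IGHJ3", "ASYYYGSSSFAY"),
    ("IGHV14-3", "IGHJ3", "ASYYYGSSSFAY"),
    ("IGHV14-3", "IGHJ3", "ASYYYGSTSFAY"),  # 11 of 12 positions match the first
    ("IGHV14-3", "IGHJ3", "ARDYYGSSWFAY"),  # 9 of 12 match: below the threshold
    ("IGHV1-2", "IGHJ3", "ASYYYGSSSFAY"),   # same HCDR3 but a different V gene
]

def clonotype_sizes(rows, threshold=0.85):
    """Greedily assign each record to the first matching representative."""
    representatives = []  # first (v, j, cdr3) seen for each clonotype
    sizes = Counter()
    for v, j, cdr3 in rows:
        for rep in representatives:
            if rep[0] == v and rep[1] == j and identity(rep[2], cdr3) >= threshold:
                sizes[rep] += 1
                break
        else:
            # No existing clonotype matched, so this record starts a new one.
            representatives.append((v, j, cdr3))
            sizes[(v, j, cdr3)] += 1
    return sizes

sizes = clonotype_sizes(records)
for clonotype, n in sizes.most_common():
    print(clonotype, n)
```

Sorting the resulting counts from largest to smallest gives the same ordering as the columns in a cluster size graph.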

If you would like to learn more about clonotypes, see our main article Understanding Clonotypes or watch the video below.

Amino Acid Distribution Chart

This graph plots the frequency at which each different amino acid is found at any given position for a region. It combines ALL the sequences in your dataset found for a region of specific length. In the below example using the data from NGS Tutorial 5, the amino acid variation at each position is shown for all the HCDR3 sequences that are 14 residues long.

Screen_Shot_2022-06-07_at_11.14.08_AM.png

To the right of the graph you have the following options to select from:

- The region (eg. Heavy FR4, Light CDR3)

- The amino acid length (if the region varies in length)

- The colors used for the amino acids (Rasmol, Hydrophobicity, Polarity, Clustal, Structural AAs and Cysteines). To learn more about what these different color schemes are, see this article.

Note: This chart is very similar to the Sequence Logo tab, which plots the amino acid variation within any selected cluster(s) of an Annotator Result document. You can access the Sequence Logo tab by selecting the cluster(s) you would like to see the variation in and switching from the Sequence Viewer tab to the Sequence Logo tab. The Sequence Logo tab can also be accessed for any region alignment; see Alignment: How do I align sequences?
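The per-position frequencies that an amino acid distribution chart displays can be sketched as below: keep only the sequences of one chosen length, then tally the residues in each column. The sample sequences and the helper `position_frequencies` are hypothetical and are only meant to show the idea.

```python
from collections import Counter

# Hypothetical region sequences of mixed length.
sequences = ["ARDYW", "ARGYW", "AKDYW", "ARGFDY"]

def position_frequencies(seqs, length):
    """Return one {amino_acid: frequency} dict per position for the given length."""
    same_length = [s for s in seqs if len(s) == length]
    freqs = []
    for column in zip(*same_length):  # one tuple of residues per position
        counts = Counter(column)
        freqs.append({aa: n / len(same_length) for aa, n in counts.items()})
    return freqs

freqs = position_frequencies(sequences, 5)
for position, dist in enumerate(freqs, start=1):
    print(position, dist)
```

Restricting the tally to a single length is what keeps the positions comparable, which is why the chart asks for an amino acid length when the region varies in length.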

Gene Combinations Heat-maps

The gene combinations heat map shows the combinations of V and J genes used for both the Light and Heavy chains in your dataset. The Light or Heavy heat-maps can be accessed via the drop-down menu to the right of the graph.

Screen_Shot_2022-06-01_at_11.39.02_AM.png

In the above example using the data from NGS Tutorial 5, we can see that the most common combination of V and J genes in the Heavy chain is IGHV4-34 and IGHJ4.
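The counts behind a V/J heat map can be sketched as a small matrix with one row per V gene and one column per J gene. The gene pairs below are invented for illustration and are not taken from the tutorial dataset.

```python
from collections import Counter

# Hypothetical (V gene, J gene) assignments, one pair per heavy chain sequence.
pairs = [
    ("IGHV4-34", "IGHJ4"),
    ("IGHV4-34", "IGHJ4"),
    ("IGHV3-23", "IGHJ4"),
    ("IGHV3-23", "IGHJ6"),
    ("IGHV4-34", "IGHJ6"),
]

combo_counts = Counter(pairs)
v_genes = sorted({v for v, _ in pairs})
j_genes = sorted({j for _, j in pairs})

# Rows are V genes, columns are J genes; each cell is the number of sequences.
matrix = [[combo_counts[(v, j)] for j in j_genes] for v in v_genes]

for v, row in zip(v_genes, matrix):
    print(v, row)

top_combo, top_count = combo_counts.most_common(1)[0]
print("most common:", top_combo, top_count)
```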

Codon Distribution Chart

This graph is similar to the heat map chart for gene combinations in that it displays the relative frequencies with which different codons are used for each amino acid in a given region. In the example below, which uses the data from NGS Tutorial 4, we are viewing the different codons used at each amino acid position in all the HCDR3 regions of length 14. It is also possible to look at the codons used across the HCDR3 regions of all lengths, which will reveal which codons are favored for each individual amino acid, regardless of position.

Screen_Shot_2022-06-07_at_2.28.44_PM.png

We can see that of the HCDR3 regions that are 14 amino acids long, the most common amino acid at position 1 is Ala, encoded most of the time with the codon GCG.
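For the position-independent view, codon usage per amino acid can be sketched as below. The code builds the standard genetic code table, splits each in-frame nucleotide sequence into codons, and counts codons under the amino acid they encode. The sample sequences are hypothetical, and the sketch assumes uppercase, in-frame sequences containing only A, C, G and T.

```python
from collections import Counter, defaultdict

# Standard genetic code, with codons ordered by base T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Hypothetical in-frame nucleotide sequences for one region.
nucleotide_regions = ["GCGCGTGAT", "GCGAGAGAT", "GCCCGTGAC"]

def codon_usage(seqs):
    """Return {amino_acid: Counter of the codons that encoded it}."""
    usage = defaultdict(Counter)
    for seq in seqs:
        for start in range(0, len(seq) - 2, 3):
            codon = seq[start:start + 3]
            usage[CODON_TABLE[codon]][codon] += 1
    return usage

usage = codon_usage(nucleotide_regions)
for amino_acid in sorted(usage):
    print(amino_acid, dict(usage[amino_acid]))
```

Dividing each count by the total for its amino acid gives relative frequencies comparable to those shown in the chart.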