Using Graphs for Quality Assurance and Data Exploration

Updated: June 17, 2025 02:14

Geneious Biologics automatically generates a number of graphs to help you interpret your sequences and find broader trends or discrepancies in your dataset. This article focuses on the graphs that represent your entire dataset. Biologics also produces graphs that help you find trends within individual clusters or regions (e.g. HCDR3 length); to find out more, see the article Using graphs to interpret clusters and clonotypes.

How do I access the graphs?

Graphs automatically populate under the Sequences Table in a result. The graphs described here are all viewable under the main All Sequences Table. To view individual sequences, select one or more rows and switch from the Graphs tab to the Sequence Viewer tab.

nice graphs shot.png

To bring up the graphs in a larger window, you can also click on the Graphs tab at the top of a result:

Screenshot_2023-05-23_at_12.34.52_AM.png

Custom Plots

The Custom Plots option allows you to make scatter plots using values in the All Sequences Table. Geneious Biologics supports the following kinds of plots:

Numerical series vs Numerical series
For example, Isoelectric Point vs BVP ELISA, as shown below.

scatterplot BVP ELISA vs V(D)J pI.png

Numerical series vs Categorical data
For example, HCDR3 length vs non-neutralizing binding targets, as shown below on a collection of antibodies derived from COVID-19 patient sera:

scatterplot neutralising vs HCDR3 length.png

The options also include coloring by numerical values such as score. Assay data and other metadata that you have added can also be plotted in these scatter plots; see the article Adding Assay Data to your Analysis Results.
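For readers who export the All Sequences Table and want to reproduce the data behind a numerical vs categorical plot, the grouping can be sketched as follows. This is only an illustration: the row layout, the column names (`hcdr3_length`, `binding_target`) and the values are hypothetical, and it does not describe how Geneious Biologics builds its plots internally.

```python
from collections import defaultdict
from statistics import median

# Hypothetical rows standing in for an exported All Sequences Table.
# Column names and values are illustrative assumptions, not real data.
rows = [
    {"name": "seq1", "hcdr3_length": 12, "binding_target": "RBD"},
    {"name": "seq2", "hcdr3_length": 15, "binding_target": "RBD"},
    {"name": "seq3", "hcdr3_length": 18, "binding_target": "NTD"},
    {"name": "seq4", "hcdr3_length": 21, "binding_target": "NTD"},
    {"name": "seq5", "hcdr3_length": 14, "binding_target": "RBD"},
]


def group_numeric_by_category(rows, numeric_key, category_key):
    """Collect the numerical values that fall under each category."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[category_key]].append(row[numeric_key])
    return dict(groups)


groups = group_numeric_by_category(rows, "hcdr3_length", "binding_target")

# One summary value per category, e.g. the median HCDR3 length.
medians = {category: median(values) for category, values in groups.items()}
print(medians)
```

Each category becomes one position on the categorical axis, and the values in its list are the points plotted against it.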

Annotation Rates

The Annotation Rates graphs can tell you a lot about what regions and chains were found in your data, as well as the rate at which certain regions were identified relative to others and to the dataset as a whole.

Screen_Shot_2022-05-26_at_2.25.26_PM.png

In the example above, scFv data from NGS Tutorial 2 has been annotated. We can see that a Light chain was found in almost all the sequences, whereas a Heavy chain was found in only around 75% of them. This is reflected in the annotation rates of all the Heavy regions (Heavy FR1, CDR1, etc.). This may indicate a sequencing issue or improper design/cloning of the scFv library.
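As a rough sketch of what an annotation rate means, the percentage for a region can be thought of as the share of sequences in which that region was identified. The data layout below (one dictionary per sequence, with `None` for a region that was not found) is an assumption made for illustration and is not the Geneious Biologics data model.

```python
# Hypothetical annotation results: one dict per sequence, holding the
# annotated region sequence, or None when the region was not found.
sequences = [
    {"Light": "DIQMT", "Heavy": "EVQLV", "Heavy CDR3": "ARDY"},
    {"Light": "DIVMT", "Heavy": "QVQLQ", "Heavy CDR3": "ARGG"},
    {"Light": "EIVLT", "Heavy": "EVQLL", "Heavy CDR3": "AKDL"},
    {"Light": "DIQLT", "Heavy": None, "Heavy CDR3": None},
]


def annotation_rates(sequences, regions):
    """Return the percentage of sequences in which each region was found."""
    total = len(sequences)
    if total == 0:
        return {region: 0.0 for region in regions}
    return {
        region: 100.0 * sum(1 for seq in sequences if seq.get(region)) / total
        for region in regions
    }


rates = annotation_rates(sequences, ["Light", "Heavy", "Heavy CDR3"])
print(rates)  # {'Light': 100.0, 'Heavy': 75.0, 'Heavy CDR3': 75.0}
```

In this toy dataset the missing Heavy chain in one of four sequences lowers every Heavy region to 75%, which is the same pattern described for the scFv example.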

There is an additional drop-down menu on the right that allows you to select more stringent checks, such as In Frame and Fully Annotated:

Screen_Shot_2022-05-26_at_3.02.35_PM.png

Number of Clusters

This graph displays the number of clusters found for each region. For more information on what a cluster is, see the article Understanding "clusters". The drop-down menu on the right allows you to instead display, for example, only the clusters found within the sequences that were Without Stop Codons & In Frame & Fully Annotated.

Screen_Shot_2022-05-26_at_3.45.19_PM.png

By comparing the above two graphs for the Heavy chain data found in NGS Tutorial 1, we can see that the majority of the clusters found came from high-quality sequences (Without Stop Codons & In Frame & Fully Annotated).
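The comparison can be sketched on a small table. The example assumes that a cluster is the set of sequences sharing an identical amino acid sequence for a region, and it uses hypothetical flag columns (`in_frame`, `no_stop_codons`, `fully_annotated`) to stand in for the quality filters; neither is taken from the Geneious Biologics implementation.

```python
# Hypothetical rows: an HCDR3 amino acid sequence plus quality flags.
rows = [
    {"HCDR3": "ARDYW", "in_frame": True, "no_stop_codons": True, "fully_annotated": True},
    {"HCDR3": "ARDYW", "in_frame": True, "no_stop_codons": True, "fully_annotated": True},
    {"HCDR3": "ARGGW", "in_frame": True, "no_stop_codons": True, "fully_annotated": True},
    {"HCDR3": "AKDLW", "in_frame": True, "no_stop_codons": True, "fully_annotated": True},
    {"HCDR3": "AR*YW", "in_frame": True, "no_stop_codons": False, "fully_annotated": True},
]


def count_clusters(rows, region, high_quality_only=False):
    """Count distinct region sequences, optionally among high-quality rows only."""
    clusters = set()
    for row in rows:
        is_high_quality = (
            row["in_frame"] and row["no_stop_codons"] and row["fully_annotated"]
        )
        if high_quality_only and not is_high_quality:
            continue
        if row.get(region):
            clusters.add(row[region])
    return len(clusters)


all_clusters = count_clusters(rows, "HCDR3")
high_quality_clusters = count_clusters(rows, "HCDR3", high_quality_only=True)
print(all_clusters, high_quality_clusters)  # 4 3
```

Here three of the four clusters survive the filter, so most clusters come from high-quality sequences.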

Number of Clusters (Nucleotide)

This graph is similar to the Number of Clusters graph above, except the clusters are determined by the nucleotide sequence rather than the protein sequence.
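Because several codons can encode the same amino acid, clustering by nucleotide sequence can yield more clusters than clustering by protein sequence. The sketch below illustrates this with made-up region sequences and a deliberately partial codon table that covers only the codons used here; it assumes a cluster is a set of identical sequences and is not the Geneious Biologics clustering code.

```python
# Partial codon table: only the codons that appear in this example.
CODON_TABLE = {
    "GCT": "A", "GCC": "A",
    "CGT": "R", "AGA": "R",
    "GAT": "D", "GAC": "D",
    "TAT": "Y",
    "TGG": "W",
}


def translate(nucleotides):
    """Translate an in-frame nucleotide string using the partial table above."""
    return "".join(
        CODON_TABLE[nucleotides[i:i + 3]] for i in range(0, len(nucleotides), 3)
    )


# Hypothetical region sequences. The first two differ only by synonymous codons.
region_nucleotides = [
    "GCTCGTGATTATTGG",
    "GCCAGAGACTATTGG",
    "GCTCGTGATTATTGG",
    "GCTCGTTATTATTGG",
]

nucleotide_clusters = set(region_nucleotides)
protein_clusters = {translate(seq) for seq in region_nucleotides}

print(len(nucleotide_clusters), len(protein_clusters))  # 3 2
```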

Number of Genes

This graph can be used to perform a rough check of the diversity of genes found within your dataset. For example, in the graph below, which displays the data from NGS Tutorial 4, we can see that a greater number of different V genes were identified in the Light chain sequences than in the Heavy chain sequences:

Screen_Shot_2022-06-01_at_12.07.34_AM.png
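The count behind this kind of graph can be approximated as the number of distinct genes seen per chain. In the sketch below the gene names follow standard germline gene naming, but the list of calls is invented for illustration and the `(chain, gene)` tuple layout is an assumption.

```python
from collections import defaultdict

# Hypothetical V gene calls as (chain, gene) pairs; not taken from NGS Tutorial 4.
v_gene_calls = [
    ("Heavy", "IGHV1-69"),
    ("Heavy", "IGHV1-69"),
    ("Heavy", "IGHV3-23"),
    ("Light", "IGKV1-39"),
    ("Light", "IGLV2-8"),
    ("Light", "IGLV2-14"),
    ("Light", "IGKV3-20"),
]


def distinct_genes_per_chain(calls):
    """Count how many different genes were identified for each chain."""
    genes_by_chain = defaultdict(set)
    for chain, gene in calls:
        genes_by_chain[chain].add(gene)
    return {chain: len(genes) for chain, genes in genes_by_chain.items()}


gene_counts = distinct_genes_per_chain(v_gene_calls)
print(gene_counts)  # {'Heavy': 2, 'Light': 4}
```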

The next graph type (Gene Usage) further illustrates the increased genetic diversity found within the Light chains of the above dataset.

Gene Usage

This graph allows you to interrogate the genetic landscape of your data and identify the most prevalent gene families.

Screen_Shot_2022-06-01_at_11.30.12_AM.png

The above graphs display the germline genes found within the dataset from NGS Tutorial 4. From this we can identify which major gene families are represented, and we can hover over any of the bars to see the actual frequency of each gene subfamily within the wider gene family. In the above example, IGLV2-8 (selected in green) makes up 0.992% of the Light V genes identified. The wider gene family IGLV2 makes up 6.78% of all the Light V genes identified.
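The two percentages are both shares of all Light V genes identified, counted at different levels of the gene name. A minimal sketch is shown below. It assumes the family is the part of the gene name before the first hyphen, which is a simplification of germline gene naming, and the gene calls are invented, so the numbers do not match the tutorial dataset.

```python
from collections import Counter

# Hypothetical Light V gene calls, one entry per sequence.
light_v_genes = (
    ["IGLV2-8"] * 1 + ["IGLV2-14"] * 3 + ["IGKV1-39"] * 10 + ["IGKV3-20"] * 6
)


def gene_usage(genes):
    """Return percentage usage per gene and per gene family."""
    total = len(genes)
    gene_counts = Counter(genes)
    # Simplifying assumption: the family is the text before the first hyphen.
    family_counts = Counter(gene.split("-")[0] for gene in genes)
    gene_pct = {gene: 100 * n / total for gene, n in gene_counts.items()}
    family_pct = {family: 100 * n / total for family, n in family_counts.items()}
    return gene_pct, family_pct


gene_pct, family_pct = gene_usage(light_v_genes)
print(gene_pct["IGLV2-8"])   # 5.0
print(family_pct["IGLV2"])   # 20.0
```

The family percentage is always at least as large as that of any single gene within it, as with the IGLV2-8 and IGLV2 figures quoted above.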