Geneious Biologics automatically generates a number of graphs to help you interpret your sequences and find broader trends or discrepancies in your dataset. This article focuses on the graphs that represent your entire dataset. Biologics also produces graphs that help you visualize and find trends within individual clusters/regions (e.g. HCDR3 length); to find out more, see this article: Using graphs to interpret clusters and clonotypes.
How do I access the graphs?
Graphs automatically populate under the Sequences Table in a result. All of the graphs described here can be viewed under the main All Sequences Table. To view individual sequences, select one or more rows and switch from the Graphs tab to the Sequence Viewer tab.
To view the graphs in a larger window, you can also click the Graphs tab at the top of a result.
Graphs covered in this article:
Annotation Rates
The Annotation Rates graphs show which regions and chains were found in your data, and how often each region was identified relative to other regions and to the dataset as a whole.
In the example above, scFv data from NGS Tutorial 2 has been annotated. A Light chain was found in almost all of the sequences; however, a Heavy chain was found in only around 75% of them. This is reflected in the annotation rates of all the Heavy regions (Heavy FR1, CDR1, etc.). This may indicate a sequencing issue or improper design/cloning of the scFv library.
An additional drop-down menu on the right lets you select more stringent checks, such as In Frame and Fully Annotated.
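To make the idea of an annotation rate concrete, here is a minimal Python sketch. It assumes that a region's rate is simply the percentage of sequences in which that region was annotated; the input data and the `annotation_rates` helper are hypothetical and are not how Biologics computes or exports these values. In the made-up data, one of four sequences lacks a Heavy chain, so every Heavy region drops to 75%, mirroring the pattern described above.

```python
from collections import Counter

# Hypothetical input: for each sequence, the set of regions that were annotated.
# Region names mimic the graph labels; the data are made up for illustration.
annotated_regions = [
    {"Light FR1", "Light CDR1", "Heavy FR1", "Heavy CDR1"},
    {"Light FR1", "Light CDR1", "Heavy FR1", "Heavy CDR1"},
    {"Light FR1", "Light CDR1", "Heavy FR1", "Heavy CDR1"},
    {"Light FR1", "Light CDR1"},  # no Heavy chain found in this sequence
]

def annotation_rates(sequences):
    """Return {region: percentage of sequences in which the region was annotated}."""
    counts = Counter(region for regions in sequences for region in regions)
    total = len(sequences)
    return {region: 100.0 * n / total for region, n in counts.items()}

rates = annotation_rates(annotated_regions)
for region in sorted(rates):
    print(f"{region}: {rates[region]:.1f}%")
```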
Number of Clusters
This graph displays the number of clusters found for each region. For more information on what a cluster is, see this article: Understanding "clusters". The drop-down menu on the right lets you instead display, for example, all the clusters found within the sequences that were Without Stop Codons & In Frame & Fully Annotated.
By comparing the two graphs above for the Heavy chain data from NGS Tutorial 1, we can see that most of the clusters came from high-quality sequences (Without Stop Codons & In Frame & Fully Annotated).
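As a rough illustration of what this graph counts, the Python sketch below assumes that a cluster is a group of sequences sharing an identical amino acid sequence for a region; Biologics' own clustering options are described in the Understanding "clusters" article and may differ. The record fields and the `passes_quality` and `count_clusters` helpers are hypothetical. Restricting to high-quality sequences can only keep or reduce the number of clusters, which is why the filtered graph is useful for judging where the clusters come from.

```python
# Hypothetical records: one dict per sequence with a translated region and quality flags.
# Field names are illustrative only, not a Geneious Biologics export format.
records = [
    {"CDR3": "ARDYW", "in_frame": True, "stop_codon": False, "fully_annotated": True},
    {"CDR3": "ARDYW", "in_frame": True, "stop_codon": False, "fully_annotated": True},
    {"CDR3": "ARGGW", "in_frame": True, "stop_codon": False, "fully_annotated": True},
    {"CDR3": "AR*YW", "in_frame": True, "stop_codon": True, "fully_annotated": True},
    {"CDR3": "ARSSW", "in_frame": False, "stop_codon": False, "fully_annotated": True},
]

def passes_quality(rec):
    """Mimics the 'Without Stop Codons & In Frame & Fully Annotated' option."""
    return rec["in_frame"] and not rec["stop_codon"] and rec["fully_annotated"]

def count_clusters(recs, region, high_quality_only=False):
    """Count distinct region sequences, i.e. clusters of exact matches."""
    if high_quality_only:
        recs = [r for r in recs if passes_quality(r)]
    return len({r[region] for r in recs if r.get(region)})

print(count_clusters(records, "CDR3"))                          # all sequences
print(count_clusters(records, "CDR3", high_quality_only=True))  # high-quality only
```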
Number of Clusters (Nucleotide)
This graph is similar to the Number of Clusters graph above, except the clusters are determined by the nucleotide sequence rather than the protein sequence.
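Because several codons encode the same amino acid, two sequences can differ at the nucleotide level yet translate to the same protein. Under the exact-match assumption used here, a region therefore always has at least as many nucleotide clusters as protein clusters. The minimal Python sketch below shows this with made-up CDR3 sequences and a hypothetical `translate` helper built on the standard genetic code; it is an illustration, not Biologics' implementation.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(nt):
    """Translate a nucleotide string codon by codon, ignoring any trailing partial codon."""
    return "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, len(nt) - 2, 3))

# Made-up CDR3 sequences: the first two differ only by synonymous codons (GAT/GAC, TAC/TAT).
cdr3_nt = ["GCGAGAGATTAC", "GCCAGAGACTAT", "GCGAGAGGCTAC"]

nt_clusters = set(cdr3_nt)
aa_clusters = {translate(s) for s in cdr3_nt}
print(len(nt_clusters), len(aa_clusters))
```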
Number of Genes
This graph can be used as a rough check of the diversity of genes found within your dataset. For example, the graph below, which displays the data from NGS Tutorial 4, shows that more distinct V genes were identified in the Light chain sequences than in the Heavy chain sequences.
The next graph type (Gene Usage) further illustrates the greater genetic diversity found within the Light chains of the above dataset.
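As a sketch of the kind of count behind this graph, assuming it reports the number of distinct V genes assigned per chain, the following Python snippet groups hypothetical gene calls by chain. The data and the `genes_per_chain` helper are made up; whether Biologics counts alleles separately is not assumed here, so the example uses gene names without allele suffixes.

```python
from collections import defaultdict

# Hypothetical V-gene assignments as (chain, gene) pairs; one pair per sequence.
v_calls = [
    ("Heavy", "IGHV1-69"), ("Heavy", "IGHV1-69"), ("Heavy", "IGHV3-23"),
    ("Light", "IGKV1-39"), ("Light", "IGLV2-8"), ("Light", "IGLV2-14"),
    ("Light", "IGKV3-20"), ("Light", "IGKV1-39"),
]

def genes_per_chain(calls):
    """Return {chain: number of distinct V genes assigned to that chain}."""
    genes = defaultdict(set)
    for chain, gene in calls:
        genes[chain].add(gene)
    return {chain: len(g) for chain, g in genes.items()}

print(genes_per_chain(v_calls))
```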
Gene Usage
This graph lets you explore the germline gene landscape of your data and identify the most prevalent gene families.
The graphs above display the germline genes found within the dataset from NGS Tutorial 4. From these we can identify which major gene families are represented, and we can mouse over any of the bars to see the actual frequency of each individual gene as well as of the wider gene family it belongs to. In the example above, IGLV2-8 (selected in green) makes up 0.992% of the Light V genes identified, and the wider gene family IGLV2 makes up 6.78% of all the Light V genes identified.
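The two percentages in this example are both fractions of all Light V genes: one for the individual gene, one for its whole family. The Python sketch below reproduces that arithmetic on made-up data. The `family_of` rule (take the text before the first "-", so IGLV2-8 becomes IGLV2) is a simplifying assumption; real IMGT nomenclature has exceptions that this rule ignores.

```python
from collections import Counter

def family_of(gene):
    """Hypothetical simplification: the family is the name before the first '-'."""
    return gene.split("-")[0]

def usage(genes):
    """Return (gene %, family %), each as a percentage of all genes supplied."""
    total = len(genes)
    gene_pct = {g: 100.0 * n / total for g, n in Counter(genes).items()}
    family_pct = {f: 100.0 * n / total
                  for f, n in Counter(family_of(g) for g in genes).items()}
    return gene_pct, family_pct

# Made-up Light V gene assignments, one per sequence.
light_v = ["IGLV2-8", "IGLV2-14", "IGLV2-14", "IGKV1-39",
           "IGKV1-39", "IGKV1-39", "IGKV3-20", "IGLV1-40"]

gene_pct, family_pct = usage(light_v)
print(f"IGLV2-8: {gene_pct['IGLV2-8']}% of Light V genes")
print(f"IGLV2 family: {family_pct['IGLV2']}% of Light V genes")
```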