This article outlines how to explore and plot amino acid variation by position within regions, such as the FRs, CDRs and even the entire V(D)J region. There are multiple ways to access these plots in Geneious Biologics.
Jump to:
- What variation are you interested in?
- Sequence Logo
- Amino Acid Distribution Chart
- Colouring and plotting options
What variation are you interested in?
- If you would like to view the variation across a single region (e.g. the HCDR3) but include CDR3s of different lengths, you first need to perform an Alignment, which also produces a Sequence Logo.
- If you are interested in the variation found within a single inexact cluster (e.g. HCDR3 clustered at 85% identity), view the Sequence Logo from the cluster table.
- If you would like to view the variation across the whole dataset for all regions of a fixed length (say, 11-residue HCDR3s), you can view the Amino Acid Distribution Chart.
Sequence Logo
The sequence logo represents the proportion of each amino acid found at a given position in a region across a set of sequences.
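As a rough illustration of what the logo summarises, the per-position proportions can be sketched in a few lines of Python. This is not how Geneious Biologics computes its plots; the function name `position_frequencies` and the example sequences are hypothetical, and the sketch assumes all sequences have the same length (or have already been aligned).

```python
from collections import Counter

def position_frequencies(sequences):
    """For each position, return a dict mapping residue -> fraction of sequences.

    Assumes equal-length (or pre-aligned) sequences; this is an illustrative
    sketch, not the Geneious Biologics implementation.
    """
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length (align them first)")
    total = len(sequences)
    freqs = []
    for column in zip(*sequences):  # iterate over positions, not sequences
        counts = Counter(column)
        freqs.append({aa: n / total for aa, n in counts.items()})
    return freqs

# Hypothetical Heavy CDR3 sequences used only for illustration
example = ["ASYYYGSSSFAY", "ASYYYGSSSFDY", "ARYYYGSSSFAY", "ASYYYGTSSFAY"]
for pos, fractions in enumerate(position_frequencies(example), start=1):
    print(pos, fractions)
```

Positions where a single residue has a fraction of 1.0 correspond to the fully conserved columns of the logo.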
The sequence logo can be accessed in two ways: for an individual inexact cluster (below) or for an alignment document (following section).
Sequence logos for Inexact Clusters
Inexact clusters include both identity and similarity based clusters. If you are unsure what a cluster refers to, see Understanding "Clusters". You can also Add Clusters to your Results if you do not have an inexact cluster available.
To access the Sequence Logo for the region of a similarity or identity based cluster:
- Navigate to an inexact cluster table of your choice in a Biologics Annotator Result, such as Heavy CDR3 (85% Similarity) in the example below.
- Select a single cluster. This will display all the sequences contained in that cluster in the Sequence Viewer below.
- Switch from the Sequence Viewer tab to the Sequence Logo tab.
In the example above, the largest cluster for the Heavy CDR3 region (with an 85% similarity threshold) has been selected. This cluster contains all the sequences in the dataset that had a Heavy CDR3 of ASYYYGSSSFAY or a sequence at least 85% similar. Using the sequence logo we can see that certain residues in this Heavy CDR3 cluster are highly conserved, while other residues show greater variation.
All plots can be exported as images or .csv files using the Export dropdown in the top left.
Note: you can only view the contents of one cluster at a time with the sequence logo. If you would like to see the Sequence Logo for multiple clusters with regions of varying length, you can perform an alignment. This is outlined below.
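To get a feel for what an 85% threshold means for a 12-residue CDR3, the arithmetic can be sketched as follows. This article does not describe how Geneious Biologics scores similarity or builds clusters, so the sketch uses plain percent identity between equal-length sequences as a stand-in; `percent_identity` and the two variant sequences are hypothetical.

```python
def percent_identity(a, b):
    """Percentage of positions at which two equal-length sequences match.

    Stand-in metric for illustration only; not the Geneious Biologics
    similarity calculation.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)

seed = "ASYYYGSSSFAY"         # Heavy CDR3 from the example above
one_change = "ASYYYGSSSFDY"   # hypothetical variant with 1 mismatch
two_changes = "ARYYYGSSSFDY"  # hypothetical variant with 2 mismatches

print(percent_identity(seed, one_change))   # 11 of 12 positions match
print(percent_identity(seed, two_changes))  # 10 of 12 positions match
```

Under this simplified metric, a 12-residue sequence with one mismatch stays above 85% (11/12), while two mismatches fall below it (10/12).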
Sequence logos for Alignments
Performing an alignment before viewing the Sequence Logo allows you to compare the variation in amino acids across regions of varying lengths. To perform an alignment on the sequenced region of each cluster, first navigate to your cluster table of choice and select the clusters you would like to include in your alignment.
In the below example, the Heavy CDR3 (85% Similarity) cluster table has been used.
A filter has been applied to find only those clusters that contain more than 15 sequences in total. You can learn more about filtering here: Filtering your Sequences. The second-largest cluster, which contains frameshifted sequences, has been deselected.
To align the sequence(s) in these clusters:
- Go to Post-processing (highlighted in orange above)
- Select Align... from the dropdown
- Choose the relevant options (see our main Sequence Alignment article) and click Run
After the alignment job has completed, open the alignment document. To view the sequence logo for the alignment, click on the Sequence Logo tab, as shown below:
All plots can be exported as images or .csv files using the Export dropdown in the top left.
Amino Acid Distribution Charts
Graphs are populated for all Annotator Result Documents, and will display under the Sequences Table automatically. To learn more about the graphs produced by Geneious Biologics see:
The plots are also accessible in a larger view via the Graphs tab of any Biologics Annotator Result document:
It might first be useful to look at the Cluster Lengths graph to determine the most common length for the region you are interested in. In the example below, we can see that the most common VDJ region length is 119 residues.
We can then navigate to the Amino Acid Distribution Chart via the Graph Type: dropdown and specify a VDJ region length of 119. The chart below is colored according to Polarity.
All plots can be exported as images or .csv files using the Export dropdown to the right of the graph dropdown.
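The two-step workflow above (find the most common length, then look at residue usage at that length) can be sketched outside the application as well. The function names and the short example sequences below are hypothetical, and the sketch is not the Geneious Biologics implementation.

```python
from collections import Counter

def most_common_length(sequences):
    """Return the region length shared by the largest number of sequences."""
    lengths = Counter(len(s) for s in sequences)
    return lengths.most_common(1)[0][0]

def residue_counts_at_length(sequences, length):
    """Count residues at each position, using only sequences of the given length."""
    selected = [s for s in sequences if len(s) == length]
    return [Counter(column) for column in zip(*selected)]

# Hypothetical short regions used only for illustration
regions = ["ARDY", "ARDW", "ARGDY", "AKDY"]
length = most_common_length(regions)
for pos, counts in enumerate(residue_counts_at_length(regions, length), start=1):
    print(pos, dict(counts))
```

Restricting to a single length means no alignment is needed: position 5 in one sequence is directly comparable to position 5 in every other.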
Coloring and plotting options
Plot by (Sequence Logo only)
- Frequency: Each position is given the same weighting, and the fraction that each residue makes up at each position is represented.
- Entropy: Entropy is a metric that quantifies uncertainty, with more variation at a given position contributing to higher entropy. The entropy calculation used is Shannon's entropy. Positions that are highly conserved (low entropy) will appear larger, while positions with more variation (high entropy) will appear smaller.
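A minimal sketch of Shannon's entropy for a single position is given below. A fully conserved position has an entropy of 0 bits, and entropy rises as residues become more evenly mixed. The `information_content` helper uses the conventional sequence logo scaling (maximum entropy for a 20-letter alphabet minus the observed entropy); whether Geneious Biologics uses exactly this scaling is an assumption, and both function names are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(column):
    """Shannon entropy (in bits) of the residues observed at one position."""
    counts = Counter(column)
    total = len(column)
    return sum((n / total) * math.log2(total / n) for n in counts.values())

def information_content(column, alphabet_size=20):
    """Conventional logo height: maximum possible entropy minus observed entropy.

    Assumption: standard scaling for a 20-residue alphabet, without any
    small-sample correction.
    """
    return math.log2(alphabet_size) - shannon_entropy(column)

conserved = "AAAA"  # one residue only
mixed = "AASS"      # two residues in equal proportion
print(shannon_entropy(conserved), information_content(conserved))
print(shannon_entropy(mixed), information_content(mixed))
```

The conserved column has lower entropy and therefore higher information content, which is why it is drawn taller in an entropy-scaled logo.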
Colour by (Sequence Logo and Amino Acid Distribution Chart)
- Default: Geneious Biologics color scheme for proteins
- Geneious: Default amino acid colors used in Geneious Prime
- Rasmol: The Rasmol scheme colors amino acids according to traditional amino acid properties. Amino acids associated with the outer surface of a protein are given bright colors and non-polar residues are darker. Most colors are hallowed by tradition.
Bright red: D, E
Blue: K, R
Mid blue: F, Y
Light grey: G
Dark grey: A
Pale blue: H
Yellow: C, M
Orange: S, T
Cyan: N, Q
Green: L, V, I
Pink: W
Flesh: P
- Hydrophobicity: This colors amino acids from red through to blue according to their hydrophobicity value, where red is the most hydrophobic and blue is the most hydrophilic. The values of the color scale are given in the figure below. These values are taken from Expasy:
- Polarity: This colors amino acids according to their polarity as follows:
Yellow: Non-polar (G, A, V, L, I, F, W, M, P)
Green: Polar, uncharged (S, T, C, Y, N, Q)
Red: Polar, acidic (D, E)
Blue: Polar, basic (K, R, H)
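If you export plot data and want to reproduce the Polarity grouping in your own figures, the mapping listed above can be written as a simple lookup. The names `POLARITY_GROUPS` and `polarity_color` are hypothetical, and the color names are plain labels rather than the exact shades used by Geneious Biologics.

```python
# Residue groups copied from the Polarity scheme listed above
POLARITY_GROUPS = {
    "yellow": "GAVLIFWMP",  # non-polar
    "green": "STCYNQ",      # polar, uncharged
    "red": "DE",            # polar, acidic
    "blue": "KRH",          # polar, basic
}

# Invert the groups into a residue -> color lookup
RESIDUE_COLOR = {
    aa: color for color, residues in POLARITY_GROUPS.items() for aa in residues
}

def polarity_color(residue):
    """Return the polarity color label for a one-letter residue, or None if unknown."""
    return RESIDUE_COLOR.get(residue.upper())

print([polarity_color(aa) for aa in "ASYYYGSSSFAY"])
```

Returning `None` for unrecognised characters (for example gaps or ambiguity codes) leaves it to the caller to decide how those should be drawn.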
- Clustal: This colors amino acids according to their properties and is adapted from Clustal to include acidic residues, as follows:
Orange: G, P, S, T
Red: H, K, R
Blue: F, W, Y
Green: I, L, M, V
Purple: D, E
- Structural AAs: This colors amino acids F, Y, W, P and G light green.
- Cysteines: This colors cysteines yellow.