This article outlines how to explore and plot the amino acid variation within regions, such as the FRs, CDRs and even the entire V(D)J region. There are multiple ways to access these plots in Geneious Biologics. The Sequence Logo can be calculated for alignments of selected region(s) or for inexact clusters, while the Amino Acid Distribution Chart found in the Graphs tab can be used to find the variation in amino acids across regions of a fixed length.
Sequence Logo
The Sequence Logo represents the proportion of each amino acid found at a given position in a region. The above Sequence Logo shows the variation in amino acids at each position of the HCDR3. To the right of the Sequence Logo, you can choose to plot the amino acids by Frequency or Entropy, and color them using one of the schemes listed under Coloring Options.
Above is the Sequence Logo for the same region, but plotted according to Entropy and colored according to the hydrophobicity of the individual amino acids.
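To make the Frequency and Entropy views more concrete, here is a minimal Python sketch of the underlying idea: count the residues in each column, convert the counts to proportions, and compute the Shannon entropy of each column. The function names and the example sequences are hypothetical, and the exact calculation Geneious Biologics uses for its Entropy view is not described in this article, so treat this as an illustration of the standard definitions only.

```python
import math
from collections import Counter


def column_frequencies(sequences):
    """Return one {residue: proportion} dict per position.

    Assumes every sequence has the same length, as in an alignment
    or a fixed-length region.
    """
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length")
    freqs = []
    for i in range(length):
        counts = Counter(s[i] for s in sequences)
        total = sum(counts.values())
        freqs.append({aa: n / total for aa, n in counts.items()})
    return freqs


def shannon_entropy(freq):
    """Shannon entropy (in bits) of one position's frequency dict."""
    return -sum(p * math.log2(p) for p in freq.values() if p > 0)


# Hypothetical Heavy CDR3 sequences, for illustration only.
cdr3s = ["ASYYYGSSSFAY", "ASYYYGSSSFDY", "ASYYYGGSSFAY", "ARYYYGSSSFAY"]
freqs = column_frequencies(cdr3s)
entropies = [shannon_entropy(f) for f in freqs]
```

A fully conserved position has an entropy of 0 bits, while positions with more evenly spread residues have higher entropy.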
Coloring Options
- Default
- Geneious Biologics color scheme for proteins
- Geneious
- Default amino acid colors used in Geneious Prime
- Rasmol
- The Rasmol scheme colors amino acids according to traditional amino acid properties. Amino acids associated with the outer surface of a protein are given bright colours and non-polar residues are darker. Most colours are hallowed by tradition.
Bright red: D, E
Blue: K, R
Mid blue: F, Y
Light grey: G
Dark grey: A
Pale blue: H
Yellow: C, M
Orange: S, T
Cyan: N, Q
Green: L, V, I
Pink: W
Flesh: P
- Hydrophobicity
- This colors amino acids from red through to blue according to their hydrophobicity value,
where red is the most hydrophobic and blue is the most hydrophilic. The values of the color scale are given in the figure below. These values are taken from Expasy:
- Polarity
- This colors amino acids according to their polarity as follows:
Yellow: Non-polar (G, A, V, L, I, F, W, M, P)
Green: Polar, uncharged (S, T, C, Y, N, Q)
Red: Polar, acidic (D, E)
Blue: Polar, basic (K, R, H)
- Clustal
- This colors amino acids according to their properties and is adapted from Clustal to include acidic residues as follows:
Orange: G, P, S, T
Red: H, K, R
Blue: F, W, Y
Green: I, L, M, V
Purple: D, E
- Structural AAs
- This colors amino acids F, Y, W, P and G light green.
- Cysteines
- This colors cysteines yellow.
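If you export the data and want to reproduce one of these schemes in your own plots, a color scheme is simply a lookup from residue to color. The sketch below encodes the Polarity groups listed above. The names `POLARITY_GROUPS` and `color_by_polarity`, and the grey fallback for unrecognized characters, are assumptions made for illustration and are not part of Geneious Biologics.

```python
# Polarity groups as listed in this article: color -> residues.
POLARITY_GROUPS = {
    "yellow": "GAVLIFWMP",  # non-polar
    "green": "STCYNQ",      # polar, uncharged
    "red": "DE",            # polar, acidic
    "blue": "KRH",          # polar, basic
}

# Invert the groups into a residue -> color lookup table.
POLARITY_COLOR = {
    aa: color
    for color, residues in POLARITY_GROUPS.items()
    for aa in residues
}


def color_by_polarity(sequence, default="grey"):
    """Return one color name per residue.

    The grey fallback for gaps or ambiguous characters is an
    assumption, not documented behavior.
    """
    return [POLARITY_COLOR.get(aa, default) for aa in sequence.upper()]


colors = color_by_polarity("CDRX")
```

The other schemes with explicit residue lists (Rasmol, Clustal, Structural AAs, Cysteines) could be expressed the same way by swapping in a different group table.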
Inexact Clusters
Inexact clusters include both identity and similarity based clusters. If you are unsure what a cluster refers to, see Understanding "Clusters".
To access the Sequence Logo for the region of a similarity or identity based cluster, navigate to your cluster table of choice in Antibody Annotator and select a single cluster. This will display all the sequences contained in that cluster in the Sequence Viewer below. Switch from the Sequence Viewer tab to the Sequence Logo tab.
In the example above, the largest cluster for the Heavy CDR3 region (with an 85% similarity threshold) has been selected. This cluster contains all the sequences in the dataset that had a Heavy CDR3 of ASYYYGSSSFAY or a sequence at least 85% similar. Using the sequence logo we can see that certain residues in the Heavy CDR3 are highly conserved, while other residues show greater variation.
All plots can be exported as images or .csv files using the Export dropdown in the top left.
Note that you can only view the contents of one cluster at a time with the sequence logo. If you would like to see the Sequence Logo for multiple clusters with regions of varying length, you can perform an alignment. This is outlined below.
How to view the Sequence Logo for Alignments
Performing an alignment before viewing the Sequence Logo allows you to compare the variation in amino acids across regions of varying lengths. To perform an alignment on the sequenced region of each cluster, first navigate to your cluster table of choice and select the clusters you would like to include in your alignment. In the example below, the Heavy CDR3 (85% Similarity) cluster table has been used.
A filter has been applied to find only those clusters that had a total number of sequences greater than 15. You can learn more about filtering here: Filtering your Sequences. The 2nd largest cluster containing frameshifted sequences has been de-selected.
To align the sequence(s) in these clusters, go to Post-processing (highlighted in orange above) and select Align... from the dropdown. When aligning multiple clusters, you have the option to choose which sequences to align from each cluster. Remember that only the Clustered Region will be aligned - in this case the Heavy CDR3.
Options include:
- Majority sequence only
- Only the most common sequence from each cluster will be aligned
- Threshold by count
- Any sequence that is found more than X times in a cluster will be included in the alignment, where X is the count
- Threshold by frequency
- Any sequence that represents more than X% of the cluster will be included in the alignment, where X is the percentage
- All sequences
- Aligns every sequence from each cluster
In the above example, only those Heavy CDR3 regions that made up at least 10% of each individual cluster will be aligned. The resulting alignment contains 31 Heavy CDR3 sequences of varying lengths. To view the sequence logo for the alignment, click on the Sequence Logo tab.
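The four selection options described above can be sketched in Python as follows. This is only an illustration of the rules as they are worded in this article, not the Geneious Biologics implementation: `select_sequences`, its `mode` names and the example cluster are hypothetical, the thresholds are applied as strictly "more than X", and each mode returns the distinct sequences of the cluster (an assumption).

```python
from collections import Counter


def select_sequences(cluster, mode="majority", threshold=None):
    """Pick which distinct sequences of one cluster go into the alignment.

    cluster   -- list of region sequences (duplicates allowed)
    mode      -- "majority", "count", "frequency" or "all" (hypothetical names)
    threshold -- X for the "count" mode, or X% for the "frequency" mode
    """
    counts = Counter(cluster)
    if not counts:
        return []
    total = sum(counts.values())
    if mode == "majority":
        # Only the most common sequence in the cluster.
        return [counts.most_common(1)[0][0]]
    if mode == "all":
        # Every distinct sequence (assumption: duplicates are collapsed).
        return list(counts)
    if threshold is None:
        raise ValueError("a threshold is required for this mode")
    if mode == "count":
        # Sequences found more than `threshold` times.
        return [s for s, n in counts.items() if n > threshold]
    if mode == "frequency":
        # Sequences making up more than `threshold` percent of the cluster.
        return [s for s, n in counts.items() if 100 * n / total > threshold]
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical cluster of 20 Heavy CDR3 sequences.
cluster = ["ASYYYGSSSFAY"] * 12 + ["ASYYYGSSSFDY"] * 6 + ["ARYYYGSSSFAY"] * 2
by_frequency = select_sequences(cluster, mode="frequency", threshold=10)
```

In this example the third sequence makes up exactly 10% of the cluster, so a strict "more than 10%" rule leaves it out, while an "at least 10%" rule would keep it.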
All plots can be exported as images or .csv files using the Export dropdown in the top left.
Amino Acid Distribution Charts
These plots can be accessed via the Graphs tab of any Biologics Annotator Result Document. It can be useful to first look at the Cluster Lengths graph to determine the most common length for the region you are interested in. In the above example, we can see that the most common VDJ region length is 119 residues.
We can then navigate to the Amino Acid Distribution Chart and specify a VDJ region length of 119. The chart below is colored according to Polarity.
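The same two-step workflow (find the most common length, then look at the residue distribution for sequences of exactly that length) can be reproduced on exported sequences with a short Python sketch. The function names and the toy sequences below are hypothetical and are kept short for readability. This is a rough approximation of what the charts display, not the application's own code.

```python
from collections import Counter


def most_common_length(sequences):
    """Return the most frequent sequence length."""
    return Counter(len(s) for s in sequences).most_common(1)[0][0]


def distribution_for_length(sequences, length):
    """Count residues per position, using only sequences of `length`."""
    selected = [s for s in sequences if len(s) == length]
    return [Counter(s[i] for s in selected) for i in range(length)]


# Toy region sequences, for illustration only.
regions = ["CARDY", "CARDY", "CTRDY", "CARGDY", "CAR"]
length = most_common_length(regions)
distribution = distribution_for_length(regions, length)
```

Sequences of any other length are ignored, which is why the chart only needs a single fixed length and no alignment.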
All plots can be exported as images or .csv files using the Export dropdown to the right of the graph drop-down.