This article lists the columns for a generic Cluster Table in an Antibody Annotator result and what each column represents. If you are unsure what a cluster is, please see our article on Understanding "Clusters". If you are interested in the All Sequences Table columns of Antibody Annotator, please see our other article: Antibody Annotator: All Sequences Table Columns.
The Cluster Tables
Each row of a Cluster Table represents a number of sequences that have been grouped together due to sharing the same sequence (or close to the same sequence) across a given region(s) or gene.
For example, the cluster table below shows sequences grouped together due to having at least 85% sequence identity across the Heavy CDR3 region. There were 39 sequences that had a HCDR3 of "ARWEYYAMDY" or at least 85% similar. This corresponded to 0.99% of the total dataset.
For more information on what a cluster is, please see our article on Understanding "Clusters".
How to search for columns
To search for any column, open Table preferences (1), start typing in the search bar (2), hover over the column you would like to navigate to, and click the Focus Column button that appears (3), as shown below:
For more column management options, see How to Customize the Sequences Table.
Filtering
In addition, every cell of the table can be filtered on: right-click a cell and select "Filter..." to pull out sequences of interest, as shown below:
Once you select a cell to filter on, the filter is added to the filter bar above, where it can be edited. Filters can also be layered: right-clicking another cell lets you add another filter with an AND operator. Our filtering uses SQL syntax; please see our main article on Filtering your Sequences for more detail and examples.
General cluster columns
-
ID
- Each cluster is given a numerical ID. The ID ranks the cluster by size, i.e. by how many sequences have been grouped together, either because they share a conserved amino acid sequence across the region(s) or because they derive from the same gene (gene clusters). The largest cluster is given an ID of 1.
-
Labels
- This column contains any custom labels you have added to tag your cluster. See Using Custom Labels to learn more.
-
Notes
- Here you can type in notes for any cluster by double-clicking on the cell.
-
[Cluster-Name]
- This column takes the name of the cluster (e.g. Heavy CDR3) and contains the amino acid sequence of the clustered region. For example, a Heavy CDR3 cluster might contain ARWEYYAMDY as the dominant sequence.
- For an inexact cluster, the most common sequence is listed.
- For multiple region clusters, the sequence is given in the form: Region1Sequence-Region2Sequence
- For gene clusters, the closest germline gene is listed.
-
Length
- The length of the clustered region in amino acids. For example, a HCDR3 cluster might have a length of 13 as all the sequences in the cluster have an HCDR3 length of 13 residues.
- This column is not present for multi-region or gene clusters
-
Total
- The number of individual sequences that make up the cluster
-
Frequency %
- The proportion that the cluster makes up of the total dataset
-
Database Name
- Gene clusters only. This lists the reference database match for the gene, which can be useful when working with hybridized sequences (e.g. humanized mice).
-
% Fully Annotated
- This indicates the proportion of sequences within the cluster that could be fully annotated. This does not necessarily mean that the sequence(s) are in frame, or without stop codons.
-
% In Frame & Fully Annotated
- This indicates the proportion of sequences within the cluster that were fully annotated and in frame. This does not necessarily mean that the sequence(s) are without stop codons.
-
% Without Stop Codons & In Frame & Fully Annotated
- This indicates the proportion of sequences within the cluster that were without stop codons, in frame and could be fully annotated.
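As a rough illustration of how the ID, Total and Frequency % columns relate, the sketch below ranks clusters by size and computes each cluster's share of the dataset. The function name, the tie-breaking rule, two of the three cluster sequences and the dataset size of 3939 sequences (chosen so that 39 sequences come to 0.99%, as in the Heavy CDR3 example earlier) are assumptions for illustration only, not the Antibody Annotator implementation.

```python
def summarise_clusters(cluster_sizes, dataset_size):
    """Rank clusters by size (largest gets ID 1) and compute Frequency %.

    cluster_sizes: dict mapping a representative sequence to its Total.
    dataset_size: total number of sequences in the dataset.
    """
    # Sort largest first; ties are broken alphabetically (an assumption).
    ranked = sorted(cluster_sizes.items(), key=lambda kv: (-kv[1], kv[0]))
    rows = []
    for cluster_id, (sequence, total) in enumerate(ranked, start=1):
        rows.append({
            "ID": cluster_id,
            "Sequence": sequence,
            "Total": total,
            "Frequency %": round(100 * total / dataset_size, 2),
        })
    return rows


# Hypothetical clusters; only ARWEYYAMDY with 39 sequences comes from the article.
example = {"ARWEYYAMDY": 39, "ARDYYGSSYFDY": 52, "ARGGYDV": 7}
table = summarise_clusters(example, dataset_size=3939)
for row in table:
    print(row)
```

In this made-up dataset the 52-sequence cluster receives ID 1 and the ARWEYYAMDY cluster receives ID 2 with a Frequency % of 0.99.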
Secondary cluster columns
These refer to the sequences of other regions that your primary cluster is most commonly associated with. For example, a cluster of the Heavy CDR3 region lists the most common FR sequences, the most common CDR sequences and the most common V(D)J. These are the "Secondary Clusters", and they are generated for any available region, including the associated heavy or light chain. All these columns have names in the format below:
-
[Secondary cluster region] (ClusterID-Sequence)
- This will list the first 5 associated region sequences for the given primary cluster. This will be in the format ID-Sequence (%). This gives the secondary cluster a numeric ID (the ID is explained above under General cluster columns) followed by the amino acid sequence and the percentage of sequences in the primary cluster that had the secondary associated cluster.
For example, the secondary Heavy CDR1 cluster for a Heavy CDR3 primary cluster might read: 81-RYTMH (53.85%); 12-DYYMH (15.38%); 1-DTYMH (7.69%); 8-NYLIE (7.69%); 35-SGYYWN (7.69%) This means that of the sequences in the given Heavy CDR3 cluster, 53.85% of those had a CDR1 of RYTMH.
-
[Secondary cluster region] (Top 5) %
- This is the sum of the percentages of the top 5 secondary clusters. If it is less than 100, some of the sequences in the primary cluster had a different amino acid sequence from those listed in the above column.
Carrying on the example from above, if the secondary Heavy CDR1 (ClusterID-Sequence) read 81-RYTMH (53.85%); 12-DYYMH (15.38%); 1-DTYMH (7.69%); 8-NYLIE (7.69%); 35-SGYYWN (7.69%), then the (Top 5)% column would read 92.31. This means that 7.69% of the Heavy CDR3 sequences in this cluster had a HCDR1 that was not RYTMH, DYYMH, DTYMH, NYLIE or SGYYWN.
-
[Secondary cluster region] Nucleotides (ClusterID-Sequence)
- Similar to the [Secondary cluster region] (ClusterID-Sequence) column, but for the nucleotide sequence
-
[Secondary cluster region] Nucleotides (Top 5) %
- Similar to the [Secondary cluster region] (Top 5) % column, but for the nucleotide sequence
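If you work with these columns outside the application, a cell in the ID-Sequence (%) format can be split apart with a short script. The sketch below parses the Heavy CDR1 example from this section; the function name and the regular expression are assumptions based only on the format shown here. Note that adding up the displayed (rounded) percentages gives 92.30, while the (Top 5) % example reads 92.31. One reading consistent with both figures, and it is only an assumption, is a cluster of 13 sequences split 7, 2, 1, 1, 1 with one further sequence outside the top 5, with the total taken from the counts rather than from the rounded values.

```python
import re


def parse_secondary_clusters(cell):
    """Split an 'ID-Sequence (%)' cell into (id, sequence, percent) tuples."""
    pattern = re.compile(r"(\d+)-([A-Za-z*]+)\s*\(([\d.]+)%\)")
    return [(int(cid), seq, float(pct)) for cid, seq, pct in pattern.findall(cell)]


cell = ("81-RYTMH (53.85%); 12-DYYMH (15.38%); 1-DTYMH (7.69%); "
        "8-NYLIE (7.69%); 35-SGYYWN (7.69%)")
entries = parse_secondary_clusters(cell)

# Summing the rounded percentages as displayed in the cell.
top5_from_cell = round(sum(pct for _, _, pct in entries), 2)

# Assumed underlying counts (7, 2, 1, 1, 1 out of 13 sequences); not from the article.
assumed_counts = [7, 2, 1, 1, 1]
top5_from_counts = round(100 * sum(assumed_counts) / 13, 2)

print(entries[0])
print(top5_from_cell, top5_from_counts)
```

The same parser should also work on the Nucleotides variants of these columns, assuming they follow the same ID-Sequence (%) layout.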
Additional columns for inexact (percentage) clusters
-
Primary Exact Cluster %
- Since an inexact cluster contains sequences that are not all identical across the clustered region, this column gives the percentage of the inexact cluster that is made up by its most common exact sequence.
-
# Exact Clusters
- This column gives the number of exact clusters that are included in the inexact cluster.
-
Eveness
- This describes how "even" the cluster is. The value is calculated using Pielou's evenness index, which goes from 0 (uneven) to 1 (even). A cluster heavily skewed towards a dominant representative sequence will have a score closer to 0, while a cluster where each sequence within the cluster is found at similar abundances would have a value closer to 1.
-
Cluster Contents % (Top 100)
- This column lists the different amino acid sequences that make up an inexact cluster and the percentage each unique sequence makes up of the cluster.
-
-
- This column lists the different amino acid sequences that make up an inexact cluster and the count for each of these sequences.
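To make the inexact-cluster columns concrete, the sketch below derives the primary exact cluster percentage, the number of exact clusters and Pielou's evenness from the counts of each exact sequence. It uses the standard definition of Pielou's index, J = H / ln(S), where H is the Shannon entropy and S is the number of distinct sequences; whether Antibody Annotator computes it in exactly this way is not stated in this article. The function name, the example counts and the variant sequences are hypothetical.

```python
import math


def inexact_cluster_stats(exact_counts):
    """Summarise an inexact cluster from the counts of its exact sequences.

    exact_counts: dict mapping each exact sequence to its count (all > 0).
    """
    total = sum(exact_counts.values())
    n_exact = len(exact_counts)
    primary_pct = 100 * max(exact_counts.values()) / total

    # Shannon entropy H = -sum(p * ln p) over the exact sequences.
    shannon = -sum((c / total) * math.log(c / total)
                   for c in exact_counts.values())

    # Pielou's evenness J = H / ln(S). It is undefined when S == 1,
    # so None is returned in that case (a choice made for this sketch).
    evenness = shannon / math.log(n_exact) if n_exact > 1 else None

    return {
        "primary_exact_pct": round(primary_pct, 2),
        "n_exact_clusters": n_exact,
        "evenness": evenness,
    }


# Hypothetical cluster dominated by one sequence.
skewed = inexact_cluster_stats(
    {"ARWEYYAMDY": 36, "ARWEYYALDY": 2, "ARWDYYAMDY": 1})

# Hypothetical cluster where every sequence is equally abundant.
balanced = inexact_cluster_stats(
    {"ARWEYYAMDY": 13, "ARWEYYALDY": 13, "ARWDYYAMDY": 13})

print(skewed)
print(balanced)
```

The skewed cluster scores well below 0.5, while the balanced cluster scores 1 (up to floating-point error), matching the description of the evenness column above.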
How to export an Excel file of selected columns
Before exporting any sequence or cluster table, you may find it useful to both filter your results and select the columns you want using Table Column Preferences. Below, I have filtered for clusters found at a frequency above 0.1% of the total dataset, and I have chosen to display only the following columns:
- ID
- Total
- Frequency %
- Cluster Contents % (Top 100)
*** Note that you can save these column table preferences as Profiles. See this article: How to Customize the Sequences Table.
Click Export Table once you have selected your sequences. This will open a pop-up allowing you to select the output format (Excel or .csv) and whether to export only the visible columns or all columns, including hidden ones.
Make sure to select Only Visible if you would just like to export your selected columns.
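Once exported, a .csv file can be processed further with a short script. The sketch below re-applies the "frequency above 0.1%" filter from the example above. It assumes the exported file has a header row whose names match the visible column names (ID, Total, Frequency %); the sample rows, the in-memory stand-in for the file and the function name are made up for illustration.

```python
import csv
import io

# Stand-in for an exported .csv file; the rows are hypothetical.
exported = io.StringIO(
    "ID,Total,Frequency %\n"
    "1,52,1.32\n"
    "2,39,0.99\n"
    "3,3,0.08\n"
)


def clusters_above(handle, min_frequency):
    """Return rows whose 'Frequency %' value is strictly above min_frequency."""
    reader = csv.DictReader(handle)
    return [row for row in reader if float(row["Frequency %"]) > min_frequency]


kept = clusters_above(exported, 0.1)
kept_ids = [row["ID"] for row in kept]
print(kept_ids)
```

For a real export, the `io.StringIO` object would be replaced by a file opened with `open(path, newline="")`, where `path` points to your exported file.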