This article lists the columns of a generic Cluster Table in a Single Cell Antibody Annotator result and explains what each column represents. If you are unsure what a cluster is, please see our article on Understanding "Clusters". If you are interested in the All Sequences Table columns of NGS Antibody Annotator, see NGS Antibody Analysis: All Sequences Table Columns.
The Cluster Tables
Each row of the Cluster Table represents one cluster: a set of sequences that have been grouped together because they share the same identity (or close to the same identity) across a given region or regions, or because they derive from the same gene.
For example, the inexact cluster table below shows sequences grouped together due to having at least 90% sequence identity across the Heavy CDR3 region and the same V and J genes:
There were 950 sequences that had a Heavy CDR3 of ATARRGQRIYGVVSFGEFFYYYYMDV or a sequence at least 90% identical to it. Together these made up 27.65% of the total sequences in the result. The primary sequence (or exact cluster), ATARRGQRIYGVVSFGEFFYYYYMDV, made up 89% of this inexact cluster; the rest consisted of other HCDR3 sequences that were at least 90% similar. In total, 25 unique HCDR3 sequences (or exact clusters) made up this inexact cluster.
For more information on what a cluster is, please see our article on Understanding "Clusters".
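The figures in the example above can be related to each other with some simple arithmetic. The following is only a rough sketch: the variable names are made up for illustration, and because the percentages shown in the table are rounded, the derived values are approximate.

```python
# Figures quoted in the example above.
cluster_sequences = 950    # sequences in the inexact cluster
pct_of_sequences = 27.65   # share of all sequences in the result, in percent
primary_exact_pct = 89.0   # share of the cluster held by the primary exact cluster

# Back out the approximate size of the whole dataset from the rounded percentage.
approx_total_sequences = cluster_sequences / (pct_of_sequences / 100)

# Approximate number of sequences carrying the primary HCDR3 exactly.
approx_primary_sequences = cluster_sequences * primary_exact_pct / 100

print(f"Approximate total sequences: {approx_total_sequences:.0f}")
print(f"Approximate primary exact cluster size: {approx_primary_sequences:.1f}")
```

In other words, the result contained roughly 3,400 sequences in total, and around 845 of the 950 clustered sequences carried the primary HCDR3 exactly.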
How to search for columns
To search for any column, open Table preferences (1), start typing into the search bar (2), hover over the column you would like to navigate to, and click the Focus Column button that appears (3), as shown below:
For more column management options, see How to Customize the Sequences Table.
Filtering
In addition, every cell of the table can be filtered on, allowing you to pull out sequences of interest. Right-click on the cell and select "Filter..." as shown below:
After you select a cell to filter on, the filter is added to the filter bar above, where it can be edited. Filters can also be layered: right-clicking on another cell allows you to add another filter with an AND operator. Our filtering uses SQL syntax; please see our main article on Filtering your Sequences for more detail and examples.
General cluster columns
- ID
  Each cluster is given a numerical ID. The ID ranks the cluster by size, i.e. by how many sequences have been grouped together because they share a conserved amino acid sequence across the region(s) or derive from the same gene (gene clusters). The largest cluster is given an ID of 1.
- Labels
  This column contains any custom labels you have added to tag your cluster. See Using Custom Labels to learn more.
- Notes
  Here you can type in notes for any cluster by double-clicking on the cell.
- [Cluster-Name]
  This column is named after the cluster (e.g. Heavy CDR3) and contains the amino acid sequence of the clustered region. For example, a Heavy CDR3 cluster might contain ARWEYYAMDY as the dominant sequence.
  - For an inexact cluster, the most common sequence is listed.
  - For multiple region clusters, the sequence is given in the form: Region1Sequence-Region2Sequence
  - For gene clusters, the closest germline gene is listed.
- Length
  The length of the clustered region in amino acids. For example, an HCDR3 cluster might have a length of 13 because all the sequences in the cluster have an HCDR3 length of 13 residues.
  - This column is not present for multi-region or gene clusters.
- # of Clones
  The number of individual unique clones that make up the cluster. Each row of the All Sequences Table is considered a unique clone, regardless of how many sequences were collapsed to make the clone.
- % of Clones
  The proportion that the cluster makes up of the total number of clones across the whole dataset.
- # Sequences
  The number of individual sequences, rather than unique clones, that make up the cluster. This corresponds to the # Sequences column in the All Sequences Table.
- % of Sequences
  The proportion that the cluster makes up of the total number of sequences across the whole dataset.
- Database Name
  Gene clusters only. This lists the reference database match for the gene, which can be useful when working with hybridized sequences (e.g. humanized mice).
- % Fully Annotated
  This indicates the proportion of sequences within the cluster that could be fully annotated. This does not necessarily mean that the sequence(s) are in frame, or without stop codons.
- % In Frame & Fully Annotated
  This indicates the proportion of sequences within the cluster that were fully annotated and in frame. This does not necessarily mean that the sequence(s) are without stop codons.
- % Without Stop Codons & In Frame & Fully Annotated
  This indicates the proportion of sequences within the cluster that were without stop codons, in frame and could be fully annotated.
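To make the relationship between clones, sequences and the count and percentage columns concrete, here is a minimal sketch that derives them from a toy All Sequences table for an exact Heavy CDR3 cluster. Everything in it is an assumption for illustration: the field names `hcdr3` and `n_sequences`, the sample data, and the choice to rank IDs by # Sequences with ties broken alphabetically. It is not the annotator's actual implementation.

```python
from collections import defaultdict

# Hypothetical All Sequences rows: one row per unique clone.
# The field names "hcdr3" and "n_sequences" are illustrative only.
clones = [
    {"hcdr3": "ARWEYYAMDY", "n_sequences": 40},
    {"hcdr3": "ARWEYYAMDY", "n_sequences": 10},
    {"hcdr3": "AREGGSSYCTDY", "n_sequences": 30},
    {"hcdr3": "ARDYYGSSYFDY", "n_sequences": 20},
]


def cluster_table(rows):
    """Group clones by exact HCDR3 and derive the general cluster columns."""
    n_clones = defaultdict(int)
    n_seqs = defaultdict(int)
    for row in rows:
        n_clones[row["hcdr3"]] += 1                  # one row = one clone
        n_seqs[row["hcdr3"]] += row["n_sequences"]   # collapsed sequences

    total_clones = len(rows)
    total_seqs = sum(n_seqs.values())

    # Assumption: rank by # Sequences (largest first), ties broken alphabetically.
    ranked = sorted(n_seqs, key=lambda seq: (-n_seqs[seq], seq))

    return [
        {
            "ID": rank,
            "Heavy CDR3": seq,
            "Length": len(seq),
            "# of Clones": n_clones[seq],
            "% of Clones": round(100 * n_clones[seq] / total_clones, 2),
            "# Sequences": n_seqs[seq],
            "% of Sequences": round(100 * n_seqs[seq] / total_seqs, 2),
        }
        for rank, seq in enumerate(ranked, start=1)
    ]


for row in cluster_table(clones):
    print(row)
```

In this toy data the ARWEYYAMDY cluster contains 2 clones but 50 sequences, which shows why # of Clones and # Sequences can differ for the same cluster.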
Secondary cluster columns
These refer to the sequences of other regions that your primary cluster is most commonly associated with. For example, for a cluster of the Heavy CDR3 region, the most common FR sequences, CDR sequences and V(D)J are listed. These are the "Secondary Clusters", and they are generated for any available region, including the associated heavy or light chain. All these columns have names in the format below:
- [Secondary cluster region] (ClusterID-Sequence)
  This lists the top 5 associated region sequences for the given primary cluster, in the format ID-Sequence (%). Each entry gives the numeric ID of the secondary cluster (the ID is explained above under General cluster columns), followed by the amino acid sequence and the percentage of sequences in the primary cluster that had the secondary associated cluster.
  - For example, the secondary Heavy CDR1 cluster for a Heavy CDR3 primary cluster might read: 81-RYTMH (53.85%); 12-DYYMH (15.38%); 1-DTYMH (7.69%); 8-NYLIE (7.69%); 35-SGYYWN (7.69%). This means that of the sequences in the given Heavy CDR3 cluster, 53.85% had a CDR1 of RYTMH.
- [Secondary cluster region] (Top 5) %
  This is the sum of the percentages of the top 5 secondary clusters. If this is less than 100, some of the sequences in the primary cluster had a different amino acid sequence from those listed in the column above.
  - Carrying on the example from above, if the secondary Heavy CDR1 (ClusterID-Sequence) read 81-RYTMH (53.85%); 12-DYYMH (15.38%); 1-DTYMH (7.69%); 8-NYLIE (7.69%); 35-SGYYWN (7.69%), then the (Top 5) % column would read 92.31. This means that 7.69% of the sequences in this Heavy CDR3 cluster had an HCDR1 that was not RYTMH, DYYMH, DTYMH, NYLIE or SGYYWN.
- [Secondary cluster region] Nucleotides (ClusterID-Sequence)
  Similar to the [Secondary cluster region] (ClusterID-Sequence) column, but for the nucleotide sequence.
- [Secondary cluster region] Nucleotides (Top 5) %
  Similar to the [Secondary cluster region] (Top 5) % column, but for the nucleotide sequence.
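If you export the table and want to work with the secondary cluster cells programmatically, the ID-Sequence (%) format described above can be split apart with a short script. This is a minimal sketch: the function name `parse_secondary` and the regular expression are assumptions based only on the example string in this article, so other cells (for example gene names) may need a different pattern.

```python
import re

# One entry looks like "81-RYTMH (53.85%)": ID, hyphen, sequence, percentage.
# Assumed pattern, derived from the example in this article.
ENTRY = re.compile(r"^(\d+)-(\S+) \((\d+(?:\.\d+)?)%\)$")


def parse_secondary(cell):
    """Split a secondary cluster cell into (id, sequence, percent) tuples."""
    entries = []
    for part in cell.split(";"):
        match = ENTRY.match(part.strip())
        if match is None:
            raise ValueError(f"Unrecognised entry: {part!r}")
        entries.append((int(match.group(1)), match.group(2), float(match.group(3))))
    return entries


cell = (
    "81-RYTMH (53.85%); 12-DYYMH (15.38%); 1-DTYMH (7.69%); "
    "8-NYLIE (7.69%); 35-SGYYWN (7.69%)"
)
entries = parse_secondary(cell)

# Sum of the displayed (rounded) percentages of the top 5 secondary clusters.
top5_total = sum(percent for _, _, percent in entries)
print(f"{top5_total:.2f}")
```

Note that summing the rounded percentages as displayed gives 92.30 and not the 92.31 quoted in the example. The small difference is presumably because the (Top 5) % value is calculated from unrounded figures (the numbers are consistent with, for instance, 12 of 13 sequences), but that is an inference from the example and not something stated here.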
Additional columns for inexact clusters
- Primary Exact Cluster %
  Since an inexact cluster contains sequences that are not all identical across the clustered region, the primary exact cluster is the most common exact sequence of that region within the inexact cluster. This number is its share of the inexact cluster, given as a percentage.
  - For example, the primary HCDR3 sequence AREGGSSYCTDY might make up 56% of the inexact cluster, meaning that other sequences similar to AREGGSSYCTDY make up the remaining 44% of the inexact cluster.
- # Exact Clusters
  This column gives the number of exact clustered regions that are included in the inexact cluster, e.g. how many different unique HCDR3 sequences are included in the inexact cluster.
- Evenness
  This describes how "even" the cluster is. The value is calculated using Pielou's evenness index, which goes from 0 (uneven) to 1 (even). A cluster heavily skewed towards a dominant representative sequence will have a score closer to 0, while a cluster where each sequence within the cluster is found at similar abundances will have a value closer to 1.
- Cluster Contents % (Top 100)
  This column lists the different exact clustered regions that make up an inexact cluster and the percentage (by # Sequences) that each exact cluster makes up of the inexact cluster.
- Cluster Contents (Top 100)
  This column lists the different exact clustered regions that make up an inexact cluster and the count (by # Sequences) of each exact cluster within the inexact cluster.
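As an illustration of how the inexact cluster columns relate to each other, the sketch below calculates them from the per-sequence counts of one inexact cluster. Pielou's evenness index is J = H / ln(S), where H = -sum(p_i * ln(p_i)) is the Shannon entropy of the exact cluster proportions and S is the number of exact clusters. The sample counts and sequences other than AREGGSSYCTDY are made up, and the handling of a cluster with a single exact sequence (returned as None here, since ln(1) = 0) is an assumption; the article does not say what the software reports in that case.

```python
import math

# Hypothetical contents of one inexact cluster: exact HCDR3 -> # Sequences.
contents = {
    "AREGGSSYCTDY": 56,
    "AREGGSSYCTDF": 30,
    "ARDGGSSYCTDY": 14,
}


def pielou_evenness(counts):
    """Pielou's evenness J = H / ln(S) for a list of abundance counts."""
    counts = [c for c in counts if c > 0]
    if len(counts) < 2:
        return None  # undefined for a single exact cluster (assumed handling)
    total = sum(counts)
    shannon = -sum((c / total) * math.log(c / total) for c in counts)
    return shannon / math.log(len(counts))


total = sum(contents.values())
primary_exact_pct = 100 * max(contents.values()) / total   # Primary Exact Cluster %
n_exact_clusters = len(contents)                           # # Exact Clusters
evenness = pielou_evenness(list(contents.values()))        # Evenness

# Cluster Contents %: share of each exact cluster, largest first.
contents_pct = {
    seq: round(100 * n / total, 2)
    for seq, n in sorted(contents.items(), key=lambda kv: -kv[1])
}

print(primary_exact_pct, n_exact_clusters, round(evenness, 3), contents_pct)
```

With these definitions, a cluster with equal counts such as [10, 10, 10] scores 1, while a heavily skewed cluster such as [98, 1, 1] scores close to 0.1.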
How to export an Excel file of selected columns
Before exporting any Cluster Table, you may find it useful to both filter your sequence results and select the columns you want using the Table Column Preferences panel on the right. Below, I have filtered for clusters that made up more than 0.2% of the total dataset, and I have selected to display only the following columns:
- Length
- # Sequences
- % of Sequences
- Primary Exact Cluster %
- Unique
Note: you can save these table column preferences as Profiles. See this article: How to Customize the Sequences Table.
Click Export Table once you have selected your sequences. This opens a pop-up where you can select the output format (Excel or .csv) and choose whether to export only the visible columns or all columns, including hidden ones.
Make sure to select Only Visible if you would just like to export your selected columns.