This article lists the columns for a generic Cluster Table in a Single Cell Antibody Annotator result and what each column represents. If you are unsure what a cluster is, please see our article on Understanding "Clusters". If you are interested in the All Sequences Table columns of Single Cell Antibody Annotator, see Single Cell Antibody Analysis: All Sequences Table Columns.
Each row of the Cluster Table represents one cluster: a group of unique clones that share the same sequence (or close to the same sequence) across a given region, set of regions, or gene.
For example, the cluster table below shows sequences grouped together due to having at least 85% sequence similarity across the Heavy CDR3 region. There were 23 unique clones (as found in the All Sequences Table) that had an HCDR3 of "AREGGSSYCTDY" or one at least 85% similar to it. This corresponded to 0.24% of the total clones in the result.
Because this is an example of an inexact cluster table (each cluster can contain different but similar HCDR3 sequences), the Primary Exact Cluster % (by Clone) column shows that the dominant sequence of AREGGSSYCTDY made up 47.83% of that cluster. The other HCDR3 sequences make up the other 52.17% and will be similar to AREGGSSYCTDY.
To find a column, open Table preferences (1), start typing in the search bar (2), hover over the column you would like to navigate to, and click the Focus Column button that appears (3), as shown below:
In addition, any cell in the table can be used as a filter, allowing you to pull out sequences of interest. Right-click the cell and select "Filter...", as shown below:
After you select a cell to filter on, the filter is added to the filter bar above, where it can be edited. Filters can also be layered: right-clicking another cell lets you add a further filter with an AND operator. Our filtering uses SQL syntax. Please see our main article on Filtering your Sequences for more detail and examples.
How to add a cluster
You can add clusters by selecting Post-processing > Add Clusters (Recluster). For more information see our main article Adding Clusters to your Results.
Note: at present, combination clusters across paired Heavy-Light chains (paired under the same barcode) like Heavy-Light CDR3 can only be generated when running the initial annotation. If you are interested in specific clusters that combine regions from the heavy and light chains, please run Single Cell Antibody Annotator again with those clusters added.
General cluster columns
ID Each cluster is given a numerical ID. The ID ranks the cluster by size, i.e. by how many sequences have been grouped together due to a conserved amino acid sequence across the region(s) or by being derived from the same gene (gene clusters). The largest cluster is given an ID of 1.
Labels This column contains any custom labels you have added to tag your cluster. See Using Custom Labels to learn more.
Notes Here you can type in notes for any cluster by double-clicking on the cell.
[Cluster-Name] This column will be the name of the cluster (eg. Heavy CDR3) and contains the amino acid sequence for that clustered region. For example, a Heavy CDR3 cluster might contain AREGGSSYCTDY as the dominant sequence.
For an inexact cluster, the most common sequence is listed.
For multiple region clusters, the sequence is given in the form: Region1Sequence-Region2Sequence
For gene clusters, the closest germline gene is listed
Length The length of the clustered region in amino acids. For example, a HCDR3 cluster might have a length of 13 as all the sequences in the cluster have an HCDR3 length of 13 residues.
This column is not present for multi-region or gene clusters
# of Clones The number of individual unique clones that make up the cluster. Each row of the All Sequences Table is considered a unique clone, regardless of how many sequences were collapsed to make the clone.
% of Clones The proportion that the cluster makes up of the total number of clones across the whole dataset.
# of Sequences The number of individual sequences, rather than unique clones, that make up the cluster. This corresponds to the # Sequences column in the All Sequences Table.
% of Sequences The proportion that the cluster makes up of the total number of sequences across the whole dataset.
Database Name Gene clusters only. This lists the reference database match for the gene, which can be useful when working with hybridized sequences (e.g. humanized mice).
% Fully Annotated This indicates the proportion of sequences within the cluster that could be fully annotated. This does not necessarily mean that the sequence(s) are in frame, or without stop codons.
% In Frame & Fully Annotated This indicates the proportion of sequences within the cluster that were fully annotated and in frame. This does not necessarily mean that the sequence(s) are without stop codons.
% Without Stop Codons & In Frame & Fully Annotated This indicates the proportion of sequences within the cluster that were without stop codons, in frame and could be fully annotated.
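The count and percentage columns above can be illustrated with a minimal sketch. It assumes an exact (not inexact) Heavy CDR3 cluster, a made-up list of clones, and that IDs are assigned by clone count with sequence count as a tie-breaker. The article does not state the exact ranking rule, so that ordering and all names in the code are hypothetical.

```python
from collections import Counter

# Hypothetical All Sequences Table rows: one dict per unique clone.
# "region" is the clustered region sequence (here Heavy CDR3) and
# "n_sequences" mirrors the # Sequences column for that clone.
clones = [
    {"region": "AREGGSSYCTDY", "n_sequences": 40},
    {"region": "AREGGSSYCTDY", "n_sequences": 10},
    {"region": "ARDLGYFDV", "n_sequences": 30},
    {"region": "ARWGGDGFYAMDY", "n_sequences": 20},
]


def summarize_exact_clusters(clones):
    """Group clones by identical region sequence and compute the count columns."""
    clone_counts = Counter()
    sequence_counts = Counter()
    for clone in clones:
        clone_counts[clone["region"]] += 1
        sequence_counts[clone["region"]] += clone["n_sequences"]

    total_clones = sum(clone_counts.values())
    total_sequences = sum(sequence_counts.values())

    # Assumption: larger clusters get smaller IDs, ranked by clone count
    # and then by sequence count. The real ranking rule may differ.
    ranked = sorted(
        clone_counts,
        key=lambda region: (-clone_counts[region], -sequence_counts[region]),
    )

    rows = []
    for cluster_id, region in enumerate(ranked, start=1):
        rows.append({
            "ID": cluster_id,
            "Heavy CDR3": region,
            "Length": len(region),
            "# of Clones": clone_counts[region],
            "% of Clones": round(100 * clone_counts[region] / total_clones, 2),
            "# of Sequences": sequence_counts[region],
            "% of Sequences": round(
                100 * sequence_counts[region] / total_sequences, 2
            ),
        })
    return rows


rows = summarize_exact_clusters(clones)
for row in rows:
    print(row)
```

In this made-up dataset the two AREGGSSYCTDY clones form the largest cluster, so it receives ID 1 and accounts for half of both the clones and the sequences.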
Secondary cluster columns
These columns describe the sequences of other regions that your primary cluster is most commonly associated with. For example, a cluster of the Heavy CDR3 region will list its most common FR sequences, CDR sequences and V(D)J. These are the "Secondary Clusters", and they are generated for any available region, including those of the associated heavy or light chain. All of these columns have names in the format below:
[Secondary cluster region] (ClusterID-Sequence) This will list the first 5 associated region sequences for the given primary cluster. This will be in the format ID-Sequence (%). This gives the secondary cluster a numeric ID (the ID is explained above under General cluster columns) followed by the amino acid sequence and the percentage of sequences in the primary cluster that had the secondary associated cluster.
For example, the secondary Heavy CDR1 cluster for a Heavy CDR3 primary cluster might read: 81-RYTMH (53.85%); 12-DYYMH (15.38%); 1-DTYMH (7.69%); 8-NYLIE (7.69%); 35-SGYYWN (7.69%) This means that of the sequences in the given Heavy CDR3 cluster, 53.85% of those had a CDR1 of RYTMH.
[Secondary cluster region] (Top 5) % This is the sum of the percentages of the top 5 secondary clusters. If this is less than 100, some of the sequences in the primary cluster had a different amino acid sequence from those listed in the above column.
Carrying on the example from above, if the secondary Heavy CDR1 (ClusterID-Sequence) read 81-RYTMH (53.85%); 12-DYYMH (15.38%); 1-DTYMH (7.69%); 8-NYLIE (7.69%); 35-SGYYWN (7.69%), then the (Top 5)% column would read 92.31. This means that 7.69% of the Heavy CDR3 sequences in this cluster had a HCDR1 that was not RYTMH, DYYMH, DTYMH, NYLIE or SGYYWN.
[Secondary cluster region] Nucleotides (ClusterID-Sequence) Similar to the [Secondary cluster region] (ClusterID-Sequence) column, but for the nucleotide sequence
[Secondary cluster region] Nucleotides (Top 5) % Similar to the [Secondary cluster region] (Top 5) % column, but for the nucleotide sequence
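If you export these columns, a (ClusterID-Sequence) cell can be split back into its parts. The sketch below assumes the cell text follows the "ID-Sequence (%)" format exactly as in the Heavy CDR1 example above, with entries separated by semicolons. The function name and regular expression are hypothetical, and the pattern only covers amino acid or nucleotide sequences, not gene names.

```python
import re

# One entry looks like "81-RYTMH (53.85%)": cluster ID, sequence, percentage.
ENTRY_PATTERN = re.compile(r"^\s*(\d+)-([A-Z]+)\s*\(([\d.]+)%\)\s*$")


def parse_secondary_cluster_cell(cell):
    """Return a list of (cluster_id, sequence, percent) tuples from one cell."""
    entries = []
    for part in cell.split(";"):
        match = ENTRY_PATTERN.match(part)
        if match is None:
            raise ValueError(f"Unrecognised entry: {part!r}")
        cluster_id, sequence, percent = match.groups()
        entries.append((int(cluster_id), sequence, float(percent)))
    return entries


cell = (
    "81-RYTMH (53.85%); 12-DYYMH (15.38%); 1-DTYMH (7.69%); "
    "8-NYLIE (7.69%); 35-SGYYWN (7.69%)"
)
entries = parse_secondary_cluster_cell(cell)

# Sum of the displayed (already rounded) percentages.
top5_from_display = sum(percent for _, _, percent in entries)
print(entries[0])
print(round(top5_from_display, 2))
```

Summing the displayed values by hand gives 92.30 rather than the 92.31 shown in the (Top 5) % column. The small difference is most likely because the table sums unrounded percentages, so treat a hand-computed total as approximate.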
Additional columns for inexact clusters
All calculations are done by the number of unique clones, not the number of sequences that made up each clone. If you would like these numbers to be calculated in reference to the number of total sequences that make up each clone, you can run NGS Antibody Annotator.
Primary Exact Cluster % (by Clone) Since an inexact cluster contains sequences that are not all identical across the clustered region, the primary exact cluster within the inexact cluster is the most common exact sequence of that region. This number is given as a percentage of the inexact cluster.
For example, the primary HCDR3 sequence AREGGSSYCTDY might make up 56% of the inexact cluster, meaning that other sequences similar to AREGGSSYCTDY make up the remaining 44% of the inexact cluster.
# Exact Clusters (by Clone) This column gives the number of exact clustered regions that are included in the inexact cluster, for example how many different unique HCDR3 sequences are included in the inexact cluster.
Evenness (by Clone) This describes how "even" the cluster is. The value is calculated using Pielou's evenness index, which goes from 0 (uneven) to 1 (even). A cluster heavily skewed towards a dominant representative sequence will have a score closer to 0, while a cluster where each sequence within the cluster is found at similar abundances will have a value closer to 1.
Cluster Contents % (Top 100) (by Clone) This column lists the different exact clustered regions that make up an inexact cluster and the percentage (by unique clones) that each exact cluster makes up of the inexact cluster.
Cluster Contents (Top 100) (by Clone) This column lists the different exact clustered regions that make up an inexact cluster and the count (by unique clones) that each exact cluster makes up of the inexact cluster.
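As a rough illustration of the by-clone columns in this section, the sketch below computes the primary exact cluster percentage, the number of exact clusters, and Pielou's evenness index (Shannon entropy divided by the natural log of the number of exact clusters) from a set of clone counts. The 23 clones and the 11 copies of AREGGSSYCTDY follow the example at the top of this article. The other three sequences and their counts are invented, and the function is a hypothetical helper, not the tool's own implementation.

```python
import math


def inexact_cluster_stats(exact_counts):
    """Compute by-clone summary values for one inexact cluster.

    exact_counts maps each exact region sequence to its number of unique
    clones. All counts are assumed to be positive.
    """
    total_clones = sum(exact_counts.values())
    primary_sequence, primary_clones = max(
        exact_counts.items(), key=lambda item: item[1]
    )
    n_exact = len(exact_counts)

    # Shannon entropy H = -sum(p * ln p) over the exact clusters.
    proportions = [count / total_clones for count in exact_counts.values()]
    shannon = -sum(p * math.log(p) for p in proportions)

    # Pielou's J = H / ln(S). It is undefined for a single exact cluster,
    # so None is returned in that case (an assumption of this sketch).
    evenness = shannon / math.log(n_exact) if n_exact > 1 else None

    return {
        "primary_sequence": primary_sequence,
        "primary_exact_cluster_pct": round(
            100 * primary_clones / total_clones, 2
        ),
        "n_exact_clusters": n_exact,
        "evenness": evenness,
    }


# 23 clones in total; only the AREGGSSYCTDY count comes from the article.
hcdr3_counts = {
    "AREGGSSYCTDY": 11,
    "AREGGSSYCTDF": 6,  # hypothetical similar sequence
    "AREGGSTYCTDY": 4,  # hypothetical similar sequence
    "ARDGGSSYCTDY": 2,  # hypothetical similar sequence
}
stats = inexact_cluster_stats(hcdr3_counts)
print(stats)
```

Here 11 of 23 clones carry the dominant sequence, which reproduces the 47.83% from the earlier example. A cluster whose exact sequences all had equal clone counts would score an evenness of 1.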
How to export an Excel file of selected columns
Before exporting any Cluster Table, you may find it useful to both filter your sequence results and select the columns you want using the Table Column Preferences panel on the right. Below, I have filtered for clusters that had more than 5 unique clones, and I have selected to display only the following columns:
Click Export Table once you have selected your sequences. This will open a pop-up where you can choose the output format (Excel or .csv) and whether to export only the visible columns or all columns, including hidden ones.
Make sure to select Only Visible if you would just like to export your selected columns.
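Once exported as .csv, the table can also be filtered outside the application. The following is a minimal sketch that keeps clusters with more than 5 unique clones. It assumes the exported header row uses the same column names as the table (for example "# of Clones"), which you should confirm against your own file. The sample data and the function name are hypothetical.

```python
import csv
import io

# Hypothetical exported content standing in for a real .csv file.
# With a real export, use open("your_export.csv", newline="") instead.
exported = io.StringIO(
    "ID,Heavy CDR3,# of Clones,% of Clones\n"
    "1,AREGGSSYCTDY,23,0.24\n"
    "2,ARDLGYFDV,6,0.06\n"
    "3,ARWGGDGFYAMDY,5,0.05\n"
)


def clusters_with_more_clones_than(csv_file, min_clones):
    """Return the rows whose '# of Clones' value is greater than min_clones."""
    reader = csv.DictReader(csv_file)
    return [row for row in reader if int(row["# of Clones"]) > min_clones]


large_clusters = clusters_with_more_clones_than(exported, 5)
for row in large_clusters:
    print(row["ID"], row["Heavy CDR3"], row["# of Clones"])
```

The comparison is strict, so a cluster with exactly 5 clones is excluded, matching "more than 5 unique clones".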