This article lists each column in the All Sequences Table and Chain Combinations Table produced by a typical Single Cell Antibody Analysis run, and what each column represents. If you are looking for the cluster table columns, see Single Cell Antibody Analysis: Cluster Table Columns.
Jump to:
- The All Sequences Table
- Standard columns
- The Chain Combinations Table
- How to export an Excel file of selected columns
The All Sequences Table
Single Cell Analysis produces a reduced dataset where similar sequences are collapsed to a representative sequence or "clone" according to a user-specified threshold. Therefore, each row on the All Sequences table of a Single Cell Antibody Analysis result document can represent multiple sequences. To learn more about what single cell sequencing entails, see Understanding Single Cell technologies: Barcodes and UMIs.
Note: Single Cell Analysis can be run on normal NGS data with no UMIs or Barcodes; however, some columns will not be present.
Single Cell Antibody Annotator will collapse sequences and generate counts only within each individual sequence list.
- This means that if you submit multiple sequence lists as the input, each sequence list will be treated as a different dataset, and collapsing will only occur within datasets, not across datasets.
The above Single Cell Analysis result shows a light chain that was found expressed at high levels in the individual B cell barcoded with the sequence GATCGCGAGAATGTGT. It was the most common/dominant light chain produced in the cell (% of Dominant Same Chain column = 100%) and made up 32.47% of the sequences in the cell.
A brief introduction to UMIs and Barcodes
Depending on whether Collapse UMI Duplicates and Separate Barcodes was run prior to Single Cell Antibody Analysis, there may be additional columns/information in the resulting tables. Please see our main article Understanding Single Cell technologies: Barcodes and UMIs for more in-depth information.
- If UMIs were included in your sequencing process, then sequences with the same UMI will be grouped together and collapsed. Each unique UMI represents an individual mRNA, so collapsing identical UMIs lets you determine the starting mRNA content, or expression levels, of different antibody chains. Grouping identical chains from these representative sequences on one row allows you to determine the dominant heavy and light chains as expressed in your dataset.
- Barcodes may be used to tag individual cells, or to represent different wells/samples. If Barcodes were included in your sequencing process, then sequences with the same barcode will be tagged, allowing you to see what chains were produced under each barcode. Grouping the different heavy and light chains produced in each cell or "barcode" on one row allows you to explore the heavy-light pairs in each cell. See the Chain Combinations Table section.
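To make the roles of UMIs and Barcodes described above more concrete, here is a minimal Python sketch. It is only an illustration and not the algorithm the software uses: the read tuples, the chain labels and the `count_molecules` helper are all hypothetical. Reads sharing a barcode, chain and UMI are treated as copies of one mRNA molecule, so each chain's count within a cell is the number of distinct UMIs.

```python
from collections import defaultdict

# Hypothetical reads: (barcode, UMI, chain label). Real reads carry full sequences.
reads = [
    ("GATCGCGAGAATGTGT", "AAAC", "Light-A"),
    ("GATCGCGAGAATGTGT", "AAAC", "Light-A"),  # duplicate read of the same mRNA
    ("GATCGCGAGAATGTGT", "CCGT", "Light-A"),
    ("GATCGCGAGAATGTGT", "TTAG", "Heavy-B"),
    ("TTGACCATCGGAAGTC", "GGCA", "Heavy-C"),
]


def count_molecules(reads):
    """Count distinct UMIs per chain within each barcode (illustrative only)."""
    umis = defaultdict(set)
    for barcode, umi, chain in reads:
        # Adding to a set collapses repeated UMIs into a single entry.
        umis[(barcode, chain)].add(umi)
    return {key: len(values) for key, values in umis.items()}


molecule_counts = count_molecules(reads)
for (barcode, chain), n in sorted(molecule_counts.items()):
    print(barcode, chain, n)
```

In this toy input, the two reads with UMI AAAC count once, so Light-A is represented by two molecules in the first cell.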
Searching and filtering your Sequences in the Tables
To search for any column, go to Table preferences (1) and start typing into the search bar (2). Then hover over the column you would like to navigate to and click the Focus Column button that appears (3), as shown below:
You can explore and navigate your tables using this Table Preferences panel, which you can use to search for columns, toggle hiding/showing columns and jump to columns of interest. You can also save preset column views as profiles; see How to Customize the Sequences Table.
In addition, any cell in the table can be filtered on, allowing you to pull out sequences of interest by right-clicking the cell and selecting "Filter...", as shown below:
After selecting a cell to filter on, it will be added to the filter bar above, where it can be edited. Filters can also be layered; right-clicking another cell will allow you to add another filter with an AND operator. Our filtering uses SQL syntax; please see our main article on Filtering your Sequences for more detail and examples.
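As a rough illustration of what two layered filters joined with AND mean in SQL terms, the sketch below uses Python's built-in `sqlite3` module on a tiny made-up table. It is not the application's filter bar, whose exact syntax is covered in the Filtering your Sequences article. The table name, the rows and the name values are assumptions for the example only.

```python
import sqlite3

# In-memory stand-in for a few rows of the All Sequences Table (made-up values).
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE sequences ("Name" TEXT, "Chain" TEXT, "% of Sequences in Cell" REAL)'
)
conn.executemany(
    "INSERT INTO sequences VALUES (?, ?, ?)",
    [
        ("doc-GATCGCGAGAATGTGT-Light-1", "Light", 32.47),
        ("doc-GATCGCGAGAATGTGT-Heavy-1", "Heavy", 41.10),
        ("doc-GATCGCGAGAATGTGT-Light-2", "Light", 3.20),
    ],
)

# Two conditions layered with AND; column names containing spaces or symbols are quoted.
query = """
    SELECT "Name" FROM sequences
    WHERE "Chain" = 'Light' AND "% of Sequences in Cell" >= 10
"""
filtered_names = [row[0] for row in conn.execute(query)]
print(filtered_names)
conn.close()
```

Only rows satisfying both conditions remain, which mirrors how adding a second filter narrows the first.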
Standard columns
- ID
  The ID column consists of automatically assigned numbers for each representative sequence or clone. These ID numbers are not ranked in any meaningful way.
- Name
  The name of the sequence or clone is generated using the following format:
  OriginalSequenceDocumentName-Barcode-ChainRanking
  - Barcode will not be present if the barcodes option was turned off when running Collapse UMI Duplicates and Separate Barcodes, or if a non-barcoded dataset is used.
  - ChainRanking is explained under the column Chain-Ranking below.
- Labels
  This column contains any custom labels you have added to tag your sequences. See Using Custom Labels to learn more.
- Notes
  Here you can type in notes for any sequence by double-clicking on the cell.
- Chain
  This column indicates what chain(s) were identified. This can be Light or Heavy.
- Chain-Ranking
  This ranks the heavy/light chains found under each barcode/within a single cell according to proportion of reads. For example, Heavy-2 would be given to the heavy chain found in second-highest abundance within a single cell. If non-barcoded data is used, the ranking will be relative to the whole dataset.
- V & J Gene Summary
  Lists the closest-matching germline V & J genes, with their Source-Matches % in brackets. For example, IGHV1-69 (89.5%) IGHJ6 (77.4%) indicates that the closest matching V Gene was IGHV1-69, with an 89.5% identity match found between the sequence and the entire length of IGHV1-69.
- Document Name
  The original name of the sequence list document used.
- Barcode
  The Barcode sequence for the cell/well, e.g. GATCGCGAGAATGTGT.
- % of Dominant Same Chain
  The percent this chain makes up of the count of the Dominant Chain within the cell, or the entire dataset if Barcodes were not used. Without UMI analysis to determine the original read count of the cell, this number does not necessarily indicate expression levels. See Understanding Single Cell technologies: Barcodes and UMIs to learn more.
  The Dominant chain will always read 100 (%) for this, while the second most abundant chain will be represented as a percentage relative to the abundance of the dominant chain.
- % of Sequences in Cell
  This refers to the percentage of all sequences in the cell that this chain makes up, including any other chain types. Note that without UMI analysis to determine the original read count of the cell, this number does not necessarily indicate expression levels. If Barcodes were not present and accounted for with Collapse UMI Duplicates and Separate Barcodes, this number will be relative to the entire dataset.
- # Sequences
  This refers to the total count of original sequences that were "collapsed" to form a clone. This number comes from collapsing highly similar sequences according to the setting for Combine Regions at least:__% identical when running Single Cell Antibody Analysis. This is not the number of sequences collapsed by running Collapse UMI Duplicates.
- Significant (above threshold)
  This refers to whether this chain/region was found to be significant, according to the metrics specified under Significant regions have at least: when running Single Cell Antibody Analysis.
- Minimum Coverage
  Minimum coverage is only calculated when the de novo assembly option is used. Biologics finds the point within the consensus sequence (a single nucleotide) which has the least agreement with the other reads used for assembling the sequence. For example, if the consensus was assembled from 50 overlapping reads and the nucleotide which differed the most was found in 30 of the sequences, the minimum coverage would be 30.
- Maximum Coverage
  Maximum coverage is only calculated when the de novo assembly option is used. Biologics finds the point within the consensus sequence (a single nucleotide) which has the most agreement with the other reads used for assembling the sequence. For example, if the consensus was assembled from 50 overlapping reads and the nucleotide that was most consistent was found in 48 of the sequences, the maximum coverage would be 48.
- Fully Annotated
  This indicates whether the consensus sequence for the chain could be fully annotated. This does not necessarily mean that the sequence is in frame, or without stop codons.
- In Frame & Fully Annotated
  This indicates whether the consensus sequence for the chain is in frame and could be fully annotated. This does not necessarily mean that the sequence is without stop codons.
- Without Stop Codons & In Frame & Fully Annotated
  This indicates whether the consensus sequence is without stop codons, in frame and could be fully annotated.
- Sequence Length
  The consensus sequence length in nucleotides. If protein sequences were used as the input, the length will be in amino acids.
- Score
  This indicates the score for the consensus sequence, based on your chosen Liabilities and Assets. Liabilities and assets need to be turned on for this column to be present.
  - See Antibody Sequence Liabilities for our list of default liabilities.
  - See How to Customize Sequence Liabilities and Assets to learn how to specify your own custom liabilities.
  - See Positional Liabilities based on Antibody Numbering to view our default positional liabilities and learn how to specify your own.
- Error
  This column lists any errors and the region the error(s) were found in for the sequence(s). This could be "Frame Shift (Heavy CDR1)" for example. Liabilities and assets need to be turned on for this column to be present. See the above "Score" column for more info.
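The relationship between Chain-Ranking, % of Dominant Same Chain and % of Sequences in Cell described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the software's implementation: the `chain_stats` helper, the clone labels and the counts are hypothetical, and one dictionary stands in for a single barcode/cell.

```python
from collections import defaultdict


def chain_stats(counts):
    """counts: {(chain_type, clone_label): sequence_count} for ONE barcode/cell."""
    total = sum(counts.values())  # all sequences in the cell, any chain type
    by_type = defaultdict(list)
    for (chain_type, label), n in counts.items():
        by_type[chain_type].append((label, n))

    rows = []
    for chain_type, clones in by_type.items():
        clones.sort(key=lambda item: item[1], reverse=True)  # most abundant first
        dominant = clones[0][1]  # count of the dominant chain of this type
        for rank, (label, n) in enumerate(clones, start=1):
            rows.append({
                "clone": label,
                "Chain-Ranking": f"{chain_type}-{rank}",
                "% of Dominant Same Chain": round(100 * n / dominant, 2),
                "% of Sequences in Cell": round(100 * n / total, 2),
            })
    return rows


# Made-up counts for one cell: two heavy clones and one light clone.
cell = {("Heavy", "H_a"): 120, ("Heavy", "H_b"): 30, ("Light", "L_a"): 50}
stats = {row["Chain-Ranking"]: row for row in chain_stats(cell)}
for ranking, row in stats.items():
    print(ranking, row["% of Dominant Same Chain"], row["% of Sequences in Cell"])
```

With these counts, the dominant heavy chain reads 100 for % of Dominant Same Chain, while the second heavy chain is expressed relative to it (30 of 120, or 25).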
Region-dependent columns
All these columns will be generated for the various regions of your sequences. The full list includes:
- Light regions:
  - The FR1, CDR1, FR2, CDR2, FR3, CDR3, FR4
  - The VJ Region, VJC Region
- Heavy regions:
  - The FR1, CDR1, FR2, CDR2, FR3, CDR3, FR4
  - The VDJ Region, VDJC Region
For each region, these columns will be generated:
- Region
  This column contains the amino acid sequence for that region.
- ID
  Each unique region sequence is given a numerical ID number. Regions with the same amino acid sequence will have the same ID number. The number corresponds to a ranking of how common the sequence is, with 1 being the most common sequence for the given region. This is effectively the Cluster ID; see Single Cell Antibody Analysis: Cluster Table Columns for more information.
- Length
  The region length in amino acids.
- Nucleotides
  The nucleotide sequence of the region.
- DNA Germline/Template Mismatches
  This column is only generated if Annotate variants is turned on when running Single Cell Antibody Analysis. The number of DNA mismatches relative to the reference sequences used (either germline or target sequence) is listed here.
- AA Germline/Template Mismatches
  This column is only generated if Annotate variants is turned on when running Single Cell Antibody Analysis. The number of amino acid mismatches relative to the reference sequences used (either germline or target sequence) is listed here.
- AA HGVS
  This column is only generated if Annotate variants is turned on when running Single Cell Antibody Analysis. The amino acid mismatches relative to the reference sequences used (either germline or target sequence) are listed here. We use standard HGVS nomenclature.
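The region ID scheme described above (identical amino acid sequences share an ID, and 1 is the most common sequence) can be sketched as follows. This is a hedged illustration rather than the software's code: the `assign_region_ids` helper and the CDR3 strings are hypothetical, and breaking ties alphabetically is an assumption made only so the example is deterministic.

```python
from collections import Counter


def assign_region_ids(region_sequences):
    """Map each distinct amino acid sequence to a rank-based ID (1 = most common)."""
    counts = Counter(region_sequences)
    # Sort by descending count; ties are broken alphabetically (an assumption).
    ranked = sorted(counts, key=lambda seq: (-counts[seq], seq))
    return {seq: rank for rank, seq in enumerate(ranked, start=1)}


# Made-up Heavy CDR3 sequences, one per row of a table.
heavy_cdr3 = ["ARDYW", "ARGGW", "ARDYW", "AKSLW", "ARDYW", "ARGGW"]
region_ids = assign_region_ids(heavy_cdr3)
print(region_ids)
```

Every row carrying ARDYW would show the same ID in this scheme, which is why the ID behaves like a cluster identifier.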
Gene columns
For each gene (and some gene combinations like Heavy VJ gene), these columns will be generated:
- Gene
  This column lists the closest matching germline gene (e.g. IGHV1-5). If there are two evenly matching genes, both are listed.
- ID
  Each gene (e.g. IGHV1-5) is assigned a numerical ID. The number corresponds to a ranking of how common the gene is, with 1 being the most common gene for the given gene family. This is effectively the Gene Cluster ID; see Exploring the Cluster Table Columns for more.
- DNA Germline Mismatches
  This column is only generated if Annotate variants is turned on in Single Cell Antibody Annotator. The number of DNA mismatches relative to the identified germline gene is listed here.
- AA Germline Mismatches
  This column is only generated if Annotate variants is turned on in Single Cell Antibody Annotator. The number of amino acid mismatches relative to the identified germline gene is listed here.
- AA HGVS
  This column is only generated if Annotate variants is turned on in Single Cell Antibody Annotator. The amino acid mismatches relative to the identified germline gene are listed here. We use standard HGVS nomenclature.
- Identity %
  This is the percent identity match found between the sequence and the found length of the closest-match germline gene.
- Coverage %
  This is what percentage of the closest-match germline gene can be found, not including any mismatches within the "covered area" of the gene.
- Matches %
  This is the percent identity match found between the sequence and the entire length of the closest-match germline gene.
Note that gene combination columns (like Heavy VJ Gene) will only list the Gene and ID columns above.
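One way to read the Identity %, Coverage % and Matches % descriptions above is as three ratios over the same alignment. The sketch below follows that reading; it is an interpretation of the column descriptions, not a confirmed formula from the software, and the function name, its parameters and the numbers are all hypothetical.

```python
def gene_match_percentages(matching_positions, found_length, gene_length):
    """Return (identity, coverage, matches) as percentages, rounded to 2 places.

    matching_positions: positions that agree with the germline gene
    found_length:       length of the germline gene actually found in the sequence
    gene_length:        full length of the closest-match germline gene
    """
    identity = 100 * matching_positions / found_length  # agreement within the found part
    coverage = 100 * found_length / gene_length          # how much of the gene was found
    matches = 100 * matching_positions / gene_length     # agreement over the entire gene
    return round(identity, 2), round(coverage, 2), round(matches, 2)


# Made-up example: 270 of 300 gene positions found, 256 of them matching.
identity, coverage, matches = gene_match_percentages(256, 270, 300)
print(identity, coverage, matches)
```

Under this reading, Matches % can never exceed Identity %, because it is measured against the entire gene rather than only the part that was found.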
Additional columns
- Liability columns
  Various columns for your specified liabilities, with a count for the number of times the liability is found in the sequence and the region(s) the liability is found in. For example, the liability column for Deamidation (SN) might have cell values like 2 (Heavy CDR3, Light CDR1).
  Liabilities and assets need to be turned on under Analysis Options when running Single Cell Antibody Annotator for these columns to be present:
  - See Antibody Sequence Liabilities for our list of default liabilities.
  - See How to Customize Sequence Liabilities and Assets to learn how to specify your own custom liabilities.
  - See Positional Liabilities based on Antibody Numbering to view our default positional liabilities and learn how to specify your own.
- Assay Data columns
  These will only be present if you have added Assay Data.
- Protein Statistics columns calculated for the VJ and VDJ Regions
  These will only be present if Calculate protein statistics is turned on under Analysis Options. The values are calculated for full length VDJ or VJ regions.
  - Charge at pH 7
  - Extinction Coefficient
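If you work with the liability columns outside the application, a cell value such as 2 (Heavy CDR3, Light CDR1) can be split into its count and regions. The following Python sketch assumes every cell follows exactly the "count (region, region)" shape shown in the liability example above; the `parse_liability_cell` helper is hypothetical and real exported values may differ.

```python
import re

# Matches "<count> (<comma-separated regions>)", e.g. "2 (Heavy CDR3, Light CDR1)".
CELL_PATTERN = re.compile(r"^\s*(\d+)\s*\((.*)\)\s*$")


def parse_liability_cell(cell):
    """Return (count, [regions]) or None if the cell does not match the pattern."""
    match = CELL_PATTERN.match(cell)
    if match is None:
        return None
    count = int(match.group(1))
    regions = [part.strip() for part in match.group(2).split(",") if part.strip()]
    return count, regions


parsed = parse_liability_cell("2 (Heavy CDR3, Light CDR1)")
print(parsed)
```

Cells that are empty or use another layout return None, so they can be inspected separately.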
Chain Combinations Table
The chain combinations table pairs the heavy and light chains found under the same barcode (within the same cell/well). This table is only available if you have run Single Cell Antibody Analysis with barcoded data.
The leading chains will be paired, up to the top four most populous heavy/light chains for a maximum of 16 combinations in a barcode/cell. You can therefore view the second-ranked Heavy chain for a barcode/cell along with the first, second, third and fourth ranked light chain.
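The pairing rule above (top four heavy chains combined with top four light chains, for at most 16 rows per barcode/cell) amounts to a capped Cartesian product. A minimal sketch, assuming the chains are already sorted by ranking; the `pair_chains` helper and the chain labels are hypothetical:

```python
from itertools import product

MAX_RANKED_CHAINS = 4  # top four chains of each type, as described above


def pair_chains(heavy_ranked, light_ranked):
    """Pair the leading heavy and light chains found under one barcode."""
    top_heavy = heavy_ranked[:MAX_RANKED_CHAINS]
    top_light = light_ranked[:MAX_RANKED_CHAINS]
    return list(product(top_heavy, top_light))


heavy = [f"Heavy-{i}" for i in range(1, 6)]  # five heavy chains found in the cell
light = [f"Light-{i}" for i in range(1, 4)]  # three light chains found in the cell
pairs = pair_chains(heavy, light)
print(len(pairs))
```

Here the fifth heavy chain is dropped, and the four remaining heavy chains pair with the three light chains for 12 combinations.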
Most of the columns are the same as for the Standard Columns above; however, there are some changes and additions:
- Name
  The name is generated using this format:
  OriginalSequenceDocumentName-Barcode-HeavyChain-Ranking-LightChain-Ranking
- Chain
  This column will now read Both (for both heavy and light chain on one row).
- Score
  This will now be the score for both chains added together.
For many of the columns listed under Standard Columns, there will now be a Heavy or Light qualifier before the column name. For example, there are two columns for % of Sequences in Cell, one for the Light chain of the pair and one for the Heavy chain.
How to export an Excel file of selected columns
Before exporting a table, you may find it useful to both filter your sequence results and select the columns you want using Table Column Preferences. Below, in the Chain Combinations Table, I have filtered for sequence pairs where the light chain within the cell was at least 60% of the abundance of the dominant light chain within the cell. I have also selected to display only the following columns:
- ID
- Heavy Chain-Ranking
- Light Chain-Ranking
- Heavy % of Sequences in Cell
- Light % of Sequences in Cell
- Heavy V & J Gene Summary
- Light V & J Gene Summary
- Score
- VDJ Region
- VJ Region
Note: you can save these column table preferences as Profiles. See this article: How to Customize the Sequences Table.
Click Export Table once you have selected your sequences. This will open a pop-up allowing you to select the output format (Excel or .csv) and to choose whether to export only the visible columns or all columns, including hidden ones.
Make sure to select Only Visible if you would just like to export your selected columns.
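Once a table has been exported as .csv, the same kind of threshold can be reapplied outside the application. The sketch below uses Python's standard `csv` module on a small inline sample. The header "Light % of Dominant Same Chain" follows the Heavy/Light qualifier naming described earlier but is an assumption, as are the sample values and the `rows_at_least` helper; check the headers in your own export before relying on it.

```python
import csv
import io

# Inline stand-in for an exported .csv file (made-up rows and assumed headers).
exported = io.StringIO(
    "ID,Light % of Dominant Same Chain,Score\n"
    "1,100,12\n"
    "2,64.5,7\n"
    "3,12.0,9\n"
)


def rows_at_least(handle, column, threshold):
    """Keep rows whose numeric value in `column` is at least `threshold`."""
    reader = csv.DictReader(handle)
    return [row for row in reader if float(row[column]) >= threshold]


kept = rows_at_least(exported, "Light % of Dominant Same Chain", 60)
kept_ids = [row["ID"] for row in kept]
print(kept_ids)
```

To use a real export, replace the `io.StringIO` sample with `open("your_export.csv", newline="")`, where the file name is a placeholder.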