This article lists each column in the All Sequences Table produced by a typical NGS Antibody Annotator run, and what each column represents. If you want to learn more about the columns of a cluster table, see NGS Antibody Analysis: Cluster Table Columns.
The All Sequences Table
NGS Antibody Annotator produces a reduced dataset in which similar sequences are collapsed to a representative sequence or "clone" according to a user-specified threshold. Therefore, each row in the All Sequences table of an NGS Analysis result document can represent multiple sequences.
NGS Antibody Annotator will collapse sequences and generate counts only within each individual sequence list.
- This means that if you submit multiple sequence lists as the input, each sequence list will be treated as a different dataset, and collapsing will only occur within datasets, not across datasets.
- You can use the Group Sequences pre-processing option if you would like all your sequence lists to be analyzed together.
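To make the per-list behaviour concrete, here is a minimal Python sketch that counts sequences separately within each input list, and then again after pooling all lists (analogous to Group Sequences). It is an illustration only: it assumes exact-match collapsing, whereas the real tool collapses by a user-set % identity threshold, and the list names and sequences are hypothetical.

```python
from collections import Counter

def collapse_per_list(sequence_lists: dict[str, list[str]]) -> dict[str, Counter]:
    """Count identical sequences separately within each input sequence list.

    Simplification: sequences are collapsed only when exactly identical.
    """
    return {name: Counter(seqs) for name, seqs in sequence_lists.items()}

def collapse_grouped(sequence_lists: dict[str, list[str]]) -> Counter:
    """Pool every list first (analogous to Group Sequences), then count."""
    pooled = Counter()
    for seqs in sequence_lists.values():
        pooled.update(seqs)
    return pooled

# Hypothetical input: the same sequence appears in two different lists.
lists = {
    "sample_A": ["EVQLV", "EVQLV", "QVQLQ"],
    "sample_B": ["EVQLV", "DIQMT"],
}
per_list = collapse_per_list(lists)
print(per_list["sample_A"]["EVQLV"])     # 2: counted within sample_A only
print(collapse_grouped(lists)["EVQLV"])  # 3: counted across both lists
```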
The above NGS Analysis result shows the dominant Heavy chain, assigned to ID: 1. It was the most common Heavy chain, and made up 1.54% of the sequences in the dataset.
A brief introduction to UMIs and Barcodes
Depending on whether Collapse UMI Duplicates and Separate Barcodes was run prior to NGS Antibody Analysis, there may be additional columns/information in the resulting tables. Please see our main article Understanding Barcodes and UMIs for more in-depth information.
- If UMIs were included in your sequencing process, sequences with the same UMI will be grouped together and collapsed. Each unique UMI represents an individual mRNA molecule, so collapsing identical UMIs lets you determine the starting mRNA content, or expression levels, of different antibody chains. Grouping identical chains from these representative sequences on one row then allows you to determine the dominant heavy and light chains as expressed in your dataset.
- Barcodes may be used to tag individual cells, or to represent different wells/samples. If barcodes were included in your sequencing process, sequences with the same barcode will be tagged, allowing you to see which chains were produced under each barcode. Sequences within the same barcode will be "collapsed" as explained in the introduction.
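The sketch below illustrates how UMIs and barcodes combine: reads sharing a (barcode, UMI) pair are treated as one mRNA molecule, and molecules are then counted per chain within each barcode. It is a simplified model, not the Biologics implementation; in particular, choosing the most common read as each molecule's representative is an assumption (a real pipeline may build a consensus instead), and all barcodes, UMIs and chain labels are hypothetical.

```python
from collections import Counter, defaultdict

# Each read is (barcode, umi, chain_sequence); all values here are hypothetical.
Read = tuple[str, str, str]

def collapse_umis(reads: list[Read]) -> dict[tuple[str, str], str]:
    """Collapse reads sharing a (barcode, UMI) pair into one molecule.

    Each molecule is represented by the most common sequence among its reads.
    """
    groups: dict[tuple[str, str], Counter] = defaultdict(Counter)
    for barcode, umi, seq in reads:
        groups[(barcode, umi)][seq] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in groups.items()}

def molecules_per_barcode(reads: list[Read]) -> dict[str, Counter]:
    """Count UMI-collapsed molecules for each chain sequence, per barcode."""
    per_barcode: dict[str, Counter] = defaultdict(Counter)
    for (barcode, _umi), seq in collapse_umis(reads).items():
        per_barcode[barcode][seq] += 1
    return dict(per_barcode)

reads = [
    ("GATCGCGAGAATGTGT", "UMI1", "HEAVY_A"),
    ("GATCGCGAGAATGTGT", "UMI1", "HEAVY_A"),  # PCR duplicate of the read above
    ("GATCGCGAGAATGTGT", "UMI2", "HEAVY_A"),
    ("GATCGCGAGAATGTGT", "UMI3", "HEAVY_B"),
]
result = molecules_per_barcode(reads)
# HEAVY_A: 3 reads but only 2 distinct UMIs, so 2 molecules; HEAVY_B: 1 molecule.
print(result)
```

Counting molecules rather than reads is what lets the per-barcode counts approximate expression rather than PCR amplification.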
Searching and filtering your Sequences in the Tables
To search for any column, go to Table Preferences (1), start typing into the search bar (2), then hover over the column you would like to navigate to and click the Focus Column button that appears (3), as shown below:
You can explore and navigate your tables using this Table Preferences panel: search for columns, toggle hiding/showing columns, and jump to columns of interest. You can also save preset column views as profiles; see How to Customize the Sequences Table.
In addition, you can filter on any cell of the table to pull out sequences of interest: right-click the cell and select "Filter..." as shown below:
After you select a cell to filter on, the filter is added to the filter bar above, where it can be edited. Filters can also be layered: right-clicking another cell lets you add another filter, combined with an AND operator. Our filtering uses SQL syntax; see our main article on Filtering your Sequences for more detail and examples.
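Layering filters behaves like joining conditions with AND in an SQL WHERE clause. The sketch below uses Python's built-in sqlite3 module on a hypothetical miniature table to show the idea; the table name, column names and values are illustrative assumptions, and this is not the exact Biologics filter syntax.

```python
import sqlite3

# Hypothetical miniature version of an All Sequences table: (id, chain, n_sequences).
rows = [
    (1, "Heavy", 1500),
    (2, "Heavy", 800),
    (3, "Light", 1200),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sequences (id INTEGER, chain TEXT, n_sequences INTEGER)")
conn.executemany("INSERT INTO sequences VALUES (?, ?, ?)", rows)

# Two layered filters: the second condition narrows the first with AND.
query = """
    SELECT id FROM sequences
    WHERE chain = 'Heavy'
      AND n_sequences >= 1000
"""
matching_ids = [row[0] for row in conn.execute(query)]
print(matching_ids)  # [1]
conn.close()
```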
Standard columns
ID
The ID column contains an automatically assigned number for each representative sequence or clone.
- These are ordered by decreasing abundance if a single dataset was used as the input.
Name
The name of the sequence or clone is generated using the following format:
OriginalSequenceDocumentName-Barcode-ChainRanking
- Barcode will not be present if the barcodes option was turned off when running Collapse UMI Duplicates and Separate Barcodes, or if a non-barcoded dataset is used.
- ChainRanking is explained under the Chain-Ranking column below.
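A small sketch of the naming format above, assuming the parts are simply joined with hyphens and the Barcode part is dropped when there is no barcode. The function name and the document name "MyRun" are hypothetical.

```python
from typing import Optional

def clone_name(document_name: str, chain_ranking: str, barcode: Optional[str] = None) -> str:
    """Build a name following OriginalSequenceDocumentName-Barcode-ChainRanking.

    The Barcode part is omitted when no barcode is available.
    """
    parts = [document_name]
    if barcode:
        parts.append(barcode)
    parts.append(chain_ranking)
    return "-".join(parts)

print(clone_name("MyRun", "Heavy-1", "GATCGCGAGAATGTGT"))  # MyRun-GATCGCGAGAATGTGT-Heavy-1
print(clone_name("MyRun", "Light-2"))                      # MyRun-Light-2
```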
Labels
This column contains any custom labels you have added to tag your sequences. See Using Custom Labels to learn more.
Notes
You can type notes for any sequence here by double-clicking the cell.
Chain
This column indicates which chain(s) were identified: Light or Heavy.
Chain-Ranking
This ranks the heavy and light chains found under each barcode/dataset according to their proportion of reads. For example, Heavy-2 would be given to the heavy chain found in the second-highest abundance within a well. If non-barcoded data is used, the ranking will be relative to the input dataset(s).
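The ranking can be sketched as: group clones by (barcode or dataset, chain type), sort each group by read count, and label them Heavy-1, Heavy-2 and so on. This is a minimal sketch under stated assumptions; the input tuples are hypothetical, and breaking ties by clone ID is an arbitrary choice, not documented Biologics behaviour.

```python
from collections import defaultdict

# Hypothetical clones: (barcode_or_dataset, chain_type, clone_id, read_count)
clones = [
    ("well_A1", "Heavy", "c1", 120),
    ("well_A1", "Heavy", "c2", 40),
    ("well_A1", "Light", "c3", 90),
    ("well_B2", "Heavy", "c4", 15),
]

def chain_rankings(clones):
    """Return {clone_id: 'Heavy-1', ...}, ranking each chain type per group by count."""
    groups = defaultdict(list)
    for group, chain, clone_id, count in clones:
        groups[(group, chain)].append((count, clone_id))
    ranking = {}
    for (_group, chain), members in groups.items():
        # Highest count first; ties broken by clone_id (an arbitrary choice here).
        members.sort(key=lambda m: (-m[0], m[1]))
        for rank, (_count, clone_id) in enumerate(members, start=1):
            ranking[clone_id] = f"{chain}-{rank}"
    return ranking

print(chain_rankings(clones))
# {'c1': 'Heavy-1', 'c2': 'Heavy-2', 'c3': 'Light-1', 'c4': 'Heavy-1'}
```

Note that c4 is Heavy-1 because ranking restarts within each well.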
V & J Gene Summary
Lists the closest-matching germline V & J genes, with their Source-Matches % in brackets. For example: IGHV1-69 (89.5%) IGHJ6 (77.4%) indicates that the closest matching V Gene was IGHV1-69 with 89.5% identity match found between the sequence and the entire length of IGHV1-69.
Document Name
The name of the original sequence list document used as input.
Barcode
The barcode sequence for the well, e.g. GATCGCGAGAATGTGT.
% of Dominant Same Chain
This chain's count as a percentage of the count of the dominant chain of the same type within the well, or within the dataset if barcodes were not used. Without UMI analysis to determine the original mRNA count, this number does not necessarily indicate expression levels. See Understanding Barcodes and UMIs to learn more.
The dominant chain always reads 100 (%) here, and every other chain is shown as a percentage of the dominant chain's abundance; for example, a chain with half as many sequences as the dominant chain would read 50.
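As a worked example of this percentage, the sketch below divides each chain's count by the dominant chain's count. It assumes the counts passed in are already restricted to one chain type within one well (or one dataset); the counts themselves are hypothetical.

```python
def percent_of_dominant(counts: dict[str, int]) -> dict[str, float]:
    """Express each chain's count as a percentage of the dominant chain's count.

    `counts` should hold chains of one type (e.g. all heavy chains) from one
    well, or from the whole dataset if barcodes were not used.
    """
    dominant = max(counts.values())
    return {chain: 100 * n / dominant for chain, n in counts.items()}

heavy_counts = {"Heavy-1": 200, "Heavy-2": 50, "Heavy-3": 10}
print(percent_of_dominant(heavy_counts))
# {'Heavy-1': 100.0, 'Heavy-2': 25.0, 'Heavy-3': 5.0}
```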
% of Sequences
The percentage of all sequences in the dataset, including all other chain types, that this chain makes up. Note that without UMI analysis to determine the original mRNA count, this number does not necessarily indicate expression levels. If barcodes were not present and accounted for with Collapse UMI Duplicates and Separate Barcodes, this number will be relative to the original dataset.
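The arithmetic is a simple proportion. In this sketch the counts are hypothetical, and which total to use as the denominator (the whole dataset, or a single barcode's sequences) is an assumption you should match to your own data.

```python
def percent_of_sequences(chain_count: int, total_sequences: int) -> float:
    """Percentage of all sequences (every chain type) that one chain makes up."""
    return 100 * chain_count / total_sequences

# Hypothetical numbers: 154 of 10,000 sequences gives 1.54%.
print(round(percent_of_sequences(154, 10_000), 2))  # 1.54
```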
# Sequences
The total count of original sequences that were "collapsed" to form this clone. Highly similar sequences are collapsed according to the Combine Regions at least: __% identical setting used when running NGS Antibody Annotator. This is not the number of sequences collapsed by running Collapse UMI Duplicates.
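To illustrate how a % identity threshold turns many sequences into fewer clones with a count each, here is a greedy clustering sketch. It is not the Biologics algorithm: it assumes only equal-length sequences can be compared (via position-by-position identity), visits sequences from most to least abundant, and assigns each to the first representative that meets the threshold. The CDR3-like sequences are hypothetical.

```python
from collections import Counter

def identity(a: str, b: str) -> float:
    """Percent identity for equal-length sequences; unequal lengths score 0 here."""
    if len(a) != len(b) or not a:
        return 0.0
    same = sum(x == y for x, y in zip(a, b))
    return 100 * same / len(a)

def collapse(sequences: list[str], threshold: float) -> dict[str, int]:
    """Greedily collapse sequences into representatives.

    Sequences are visited from most to least abundant; each joins the first
    existing representative that is at least `threshold` % identical,
    otherwise it becomes a new representative. Returns {representative: count}.
    """
    clones: dict[str, int] = {}
    for seq, n in Counter(sequences).most_common():
        for rep in clones:
            if identity(seq, rep) >= threshold:
                clones[rep] += n
                break
        else:
            clones[seq] = n
    return clones

reads = ["CARDYW", "CARDYW", "CARDYW", "CARDFW", "CTRGGW"]
print(collapse(reads, threshold=80))  # {'CARDYW': 4, 'CTRGGW': 1}
```

Here CARDFW is 5/6 (about 83%) identical to CARDYW and joins it, while CTRGGW (50% identical) forms its own clone.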
Minimum Coverage
Minimum coverage is only calculated when the advanced de novo assembly option is used. Biologics finds the position within the consensus sequence (a single nucleotide) that has the least agreement with the other reads used to assemble the sequence. For example, if the consensus was assembled from 50 overlapping reads and, at the least consistent position, the consensus nucleotide was found in only 30 of the reads, the minimum coverage would be 30.
Maximum Coverage
Maximum coverage is only calculated when the advanced de novo assembly option is used. Biologics finds the position within the consensus sequence (a single nucleotide) that has the most agreement with the other reads used to assemble the sequence. For example, if the consensus was assembled from 50 overlapping reads and, at the most consistent position, the consensus nucleotide was found in 48 of the reads, the maximum coverage would be 48.
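Both coverage values can be sketched from a set of already-aligned reads: build a majority consensus, count at each position how many reads agree with the consensus base, and take the minimum and maximum of those counts. This assumes pre-aligned, equal-length reads with '-' for uncovered positions and at least one read covering every position; a real assembler's consensus rules may differ. The reads are hypothetical.

```python
from collections import Counter

def consensus_and_coverage(aligned_reads: list[str]) -> tuple[str, int, int]:
    """Build a majority consensus from aligned reads and report agreement depth.

    `aligned_reads` are equal-length strings; '-' marks a position a read does
    not cover. Every position must be covered by at least one read. For each
    position, coverage here is the number of reads whose base matches the
    consensus base. Returns (consensus, min_coverage, max_coverage).
    """
    consensus = []
    agreement = []
    for column in zip(*aligned_reads):
        bases = Counter(b for b in column if b != "-")
        base, count = bases.most_common(1)[0]
        consensus.append(base)
        agreement.append(count)
    return "".join(consensus), min(agreement), max(agreement)

reads = [
    "ACGT",
    "ACGT",
    "ACGA",
    "ACCA",
    "-CGT",
]
print(consensus_and_coverage(reads))  # ('ACGT', 3, 5)
```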
Fully Annotated
This indicates whether the consensus sequence for the chain could be fully annotated. This does not necessarily mean that the sequence is in frame, or without stop codons.
In Frame & Fully Annotated
This indicates whether the consensus sequence for the chain is in frame and could be fully annotated. This does not necessarily mean that the sequence is without stop codons.
Without Stop Codons & In Frame & Fully Annotated
This indicates whether the consensus sequence is without stop codons, in frame, and could be fully annotated.
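The two extra checks in these columns can be approximated on a nucleotide sequence as below. This is a deliberately simplified sketch: it treats "in frame" as the length being a whole number of codons and reads codons from the first base, whereas Biologics determines frame from its annotation. The example sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def is_in_frame(nucleotides: str) -> bool:
    """Simplified check: the length is a whole number of codons."""
    return len(nucleotides) % 3 == 0

def has_stop_codon(nucleotides: str) -> bool:
    """True if any complete codon, read from position 0, is a stop codon."""
    seq = nucleotides.upper()
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return any(codon in STOP_CODONS for codon in codons)

vh_fragment = "GAGGTGCAGCTGGTG"  # hypothetical: GAG GTG CAG CTG GTG (E V Q L V)
print(is_in_frame(vh_fragment), has_stop_codon(vh_fragment))  # True False
print(has_stop_codon("GAGTAGCAG"))  # True (TAG is the second codon)
```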
Sequence Length
The consensus sequence length in nucleotides. If protein sequences were used as the input, the length will be in amino acids.
Score
This indicates the score for the consensus sequence, based on your chosen Liabilities and Assets. Liabilities and assets need to be turned on for this column to be present.
- See Antibody Sequence Liabilities for our list of default liabilities.
- See How to Customize Sequence Liabilities and Assets to learn how to specify your own custom liabilities.
- See Positional Liabilities based on Antibody Numbering to view our default positional liabilities and learn how to specify your own.
Error
This column lists any errors and the region the error(s) were found in for the sequence(s). This could be "Frame Shift (Heavy CDR1)" for example. Liabilities and assets need to be turned on for this column to be present. See the above "Score" column for more info.
Region-dependent columns
All these columns will be generated for the various regions of your sequences. The full list includes:
Light regions:
- FR1, CDR1, FR2, CDR2, FR3, CDR3, FR4
- VJ Region, VJC Region
Heavy regions:
- FR1, CDR1, FR2, CDR2, FR3, CDR3, FR4
- VDJ Region, VDJC Region
For each region, these columns will be generated:
Region
This column contains the amino acid sequence for that region.
ID
Each unique region sequence is given a numerical ID. Regions with the same amino acid sequence will have the same ID number. The number corresponds to a ranking of how common the sequence is, with 1 being the most common sequence for the given region. This is effectively the Cluster ID; see NGS Antibody Analysis: Cluster Table Columns for more information.
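The ID assignment described here can be sketched as a frequency ranking: count each distinct region sequence and number them from most to least common. This sketch counts each listed occurrence once and breaks ties by first appearance (which is how Python's Counter.most_common orders equal counts); whether Biologics weights by # Sequences or breaks ties the same way is not stated, so treat both as assumptions. The CDR3 sequences are hypothetical.

```python
from collections import Counter

def assign_region_ids(region_sequences: list[str]) -> dict[str, int]:
    """Give each distinct region sequence an ID, 1 being the most common.

    Equal counts keep first-seen order (Counter.most_common behaviour).
    """
    ranked = Counter(region_sequences).most_common()
    return {seq: rank for rank, (seq, _count) in enumerate(ranked, start=1)}

cdr3s = ["ARDYW", "ARGGW", "ARDYW", "AKTFW", "ARDYW", "ARGGW"]
ids = assign_region_ids(cdr3s)
print([ids[s] for s in cdr3s])  # [1, 2, 1, 3, 1, 2]
```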
Length
The region length in amino acids.
Nucleotides
The nucleotide sequence of the region.
DNA Germline/Template Mismatches
This column is only generated if Annotate variants is turned on when running NGS Antibody Analysis. The number of DNA mismatches relative to the reference sequences used (either germline or target sequence) is listed here.
AA Germline/Template Mismatches
This column is only generated if Annotate variants is turned on when running NGS Antibody Analysis. The number of amino acid mismatches relative to the reference sequences used (either germline or target sequence) is listed here.
AA HGVS
This column is only generated if Annotate variants is turned on when running NGS Antibody Analysis. The amino acid mismatches relative to the reference sequences used (either germline or target sequence) are listed here. We use standard HGVS nomenclature.
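The sketch below compares an observed region with its reference and lists each amino acid substitution in an HGVS-style "p." notation; the number of entries is the AA mismatch count. It assumes the two sequences are already aligned to equal length and numbers positions sequentially from 1, and it uses one-letter codes; the exact position numbering and formatting that Biologics uses may differ. The sequences are hypothetical.

```python
def aa_substitutions(reference: str, query: str, start: int = 1) -> list[str]:
    """List amino acid substitutions in a simplified HGVS-style form (e.g. 'p.S5N').

    Assumes `reference` and `query` are already aligned and equal in length;
    positions are numbered sequentially from `start`.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"p.{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(reference, query), start=start)
        if ref != alt
    ]

germline = "GFTFSSYA"  # hypothetical reference CDR1
observed = "GFTFNSYG"
changes = aa_substitutions(germline, observed)
print(len(changes), changes)  # 2 ['p.S5N', 'p.A8G']
```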
Gene columns
For each gene (and some gene combinations like Heavy VJ gene), these columns will be generated:
Gene
This column lists the closest-matching germline gene (e.g. IGHV1-5). If two genes match equally well, both are listed.
ID
Each gene (e.g. IGHV1-5) is assigned a numerical ID. The number is a ranking of how common the gene is, with 1 being the most common gene of the given type (for example, the most common Heavy V gene). This is effectively the Gene Cluster ID; see Exploring the Cluster Table Columns for more.
DNA Germline Mismatches
This column is only generated if Annotate variants is turned on in NGS Antibody Annotator. The number of DNA mismatches relative to the identified germline gene is listed here.
AA Germline Mismatches
This column is only generated if Annotate variants is turned on in NGS Antibody Annotator. The number of amino acid mismatches relative to the identified germline gene is listed here.
AA HGVS
This column is only generated if Annotate variants is turned on in NGS Antibody Annotator. The amino acid mismatches relative to the identified germline gene are listed here. We use standard HGVS nomenclature.
Identity %
The percent identity between the sequence and the found (aligned) length of the closest-matching germline gene.
Coverage %
The percentage of the closest-matching germline gene that is found in the sequence. Mismatches within the "covered area" of the gene are not taken into account.
Matches %
The percent identity between the sequence and the entire length of the closest-matching germline gene.
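Under one straightforward reading of these three definitions, Identity % divides matching positions by the covered length, Coverage % divides the covered length by the full gene length, and Matches % divides matching positions by the full gene length, so Matches % equals Identity % × Coverage % / 100. The sketch below encodes that reading; the alignment numbers are hypothetical and this is not taken from Biologics source code.

```python
def gene_match_stats(matches: int, aligned_length: int, germline_length: int) -> dict[str, float]:
    """Relate Identity %, Coverage % and Matches % under a simple reading of the definitions.

    matches:         aligned positions identical to the germline gene
    aligned_length:  length of the germline gene covered by the sequence
    germline_length: full length of the closest-matching germline gene
    """
    identity = 100 * matches / aligned_length
    coverage = 100 * aligned_length / germline_length
    matches_pct = 100 * matches / germline_length
    return {"Identity %": identity, "Coverage %": coverage, "Matches %": matches_pct}

# Hypothetical V gene alignment: 270 of 296 germline nucleotides covered, 265 identical.
stats = gene_match_stats(matches=265, aligned_length=270, germline_length=296)
print({k: round(v, 1) for k, v in stats.items()})
# {'Identity %': 98.1, 'Coverage %': 91.2, 'Matches %': 89.5}
```

This is why a gene can show high Identity % but a lower Matches %: the sequence may agree closely with the part of the gene it covers while not covering the whole gene.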
Note that gene combination columns (like Heavy VJ Gene) will only list the Gene and ID columns above.
Additional columns
Liability columns
Various columns for your specified liabilities, with a count for the number of times the liability is found in the sequence and the region(s) the liability is found in. For example, the liability column for Deamidation (SN) might have cell values like 2 (Heavy CDR3, Light CDR1).
Liabilities and assets need to be turned on under Analysis Options when running NGS Antibody Annotator for these columns to be present:
- See Antibody Sequence Liabilities for our list of default liabilities.
- See How to Customize Sequence Liabilities and Assets to learn how to specify your own custom liabilities
- See Positional Liabilities based on Antibody Numbering to view our default positional liabilities and learn how to specify your own.
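A liability cell such as 2 (Heavy CDR3, Light CDR1) can be sketched as a motif count across named regions. This minimal sketch assumes the liability is a literal motif (here "SN", taken from the column name in the example above), counts overlapping occurrences, and returns "0" when nothing is found; the real liability definitions and empty-cell display may differ, and the region sequences are hypothetical.

```python
import re

def liability_cell(regions: dict[str, str], motif: str) -> str:
    """Count a motif across named regions, formatted like '2 (Heavy CDR3, Light CDR1)'.

    Overlapping matches are counted by using a zero-width lookahead pattern.
    """
    pattern = re.compile(f"(?={re.escape(motif)})")
    total = 0
    hit_regions = []
    for name, seq in regions.items():
        n = len(pattern.findall(seq))
        if n:
            total += n
            hit_regions.append(name)
    return f"{total} ({', '.join(hit_regions)})" if total else "0"

regions = {
    "Heavy CDR3": "ARSNYFDY",  # hypothetical region sequences
    "Light CDR1": "QSVSSN",
    "Light CDR3": "QQYGSSPLT",
}
print(liability_cell(regions, "SN"))  # 2 (Heavy CDR3, Light CDR1)
```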
Assay Data columns
These will only be present if you have added Assay Data.
Linker Columns
These will only be present if you have scFv-like data or VHH-VHH data and selected the "with linker" options when running Antibody Annotator.
- Linker: The amino acid sequence of the linker
- Linker ID: Assigns each unique linker an ID, ranked by how prevalent it is in the dataset
- Linker Length: The length of the linker in amino acids
- Linker Nucleotides: The nucleotide sequence of the linker
- Linker Match: This will only be generated if you have created and selected a Linker Database when running Antibody Annotator. It lists the matching linker from that database.
Protein Statistics columns calculated for the VJ and VDJ Regions
These will only be present if Calculate protein statistics is turned on under Analysis Options. The values are calculated for full-length VDJ or VJ regions.
- Charge at pH 7
- Extinction Coefficient
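For intuition about these two statistics, the sketch below uses two widely used textbook approaches: net charge from the Henderson-Hasselbalch equation, and the molar extinction coefficient at 280 nm from the Pace et al. (1995) coefficients (5500 per Trp, 1490 per Tyr, 125 per cystine). The pKa values are one illustrative set chosen for this example; published scales differ, and Biologics may use other values or methods, so expect its numbers to differ. The sequence is hypothetical.

```python
# Illustrative pKa values only; published pKa scales differ, and these are not
# necessarily the values Biologics uses.
PKA_POSITIVE = {"K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}
PKA_N_TERMINUS = 9.0
PKA_C_TERMINUS = 2.0

def net_charge(seq: str, ph: float = 7.0) -> float:
    """Approximate net charge from the Henderson-Hasselbalch equation."""
    # Free amino terminus (positive) and carboxyl terminus (negative).
    charge = 1 / (1 + 10 ** (ph - PKA_N_TERMINUS))
    charge -= 1 / (1 + 10 ** (PKA_C_TERMINUS - ph))
    for aa in seq.upper():
        if aa in PKA_POSITIVE:
            charge += 1 / (1 + 10 ** (ph - PKA_POSITIVE[aa]))
        elif aa in PKA_NEGATIVE:
            charge -= 1 / (1 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return charge

def extinction_coefficient(seq: str, cystines: int = 0) -> int:
    """Molar extinction coefficient at 280 nm (M^-1 cm^-1), Pace et al. (1995) values."""
    s = seq.upper()
    return 5500 * s.count("W") + 1490 * s.count("Y") + 125 * cystines

vh_fragment = "EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYAMSWVRQAPGKGLEWVS"  # hypothetical partial VH
print(round(net_charge(vh_fragment), 2))
# Reduced form vs. assuming every pair of cysteines forms a cystine (only one C here).
print(extinction_coefficient(vh_fragment),
      extinction_coefficient(vh_fragment, cystines=vh_fragment.count("C") // 2))  # 12490 12490
```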
How to export an Excel file of selected columns
Before exporting a table, you may find it useful to filter your sequence results and select the columns you want using Table Column Preferences. Below, in the All Sequences Table, I have filtered for clones that had at least 1000 sequences and chosen to display only the following columns:
- ID
- Chain
- # Sequences
- % Sequences
- Heavy V & J Gene Summary
- Score
- Heavy CDR3
- VDJ Region
Note that you can save these column preferences as Profiles; see How to Customize the Sequences Table.
Click Export Table once you have selected your sequences. This opens a pop-up where you can select the output format (Excel or .csv) and choose whether to export only the visible columns or all columns, including hidden ones.
Make sure to select Only Visible if you would just like to export your selected columns.
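Once exported as .csv, the table can also be filtered outside Biologics. This sketch uses Python's standard csv module to keep clones with at least 1000 sequences, mirroring the filter above. The header names ("ID", "Chain", "# Sequences") and the sample content are assumptions; check the headers in your own export before relying on them.

```python
import csv
import io

def clones_with_min_sequences(csv_text: str, minimum: int = 1000) -> list[dict[str, str]]:
    """Return rows whose '# Sequences' value is at least `minimum`.

    Assumes a header row with a '# Sequences' column; thousands separators
    (commas) in the value are stripped before converting to int.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if int(row["# Sequences"].replace(",", "")) >= minimum]

# Hypothetical exported content (only visible columns were exported).
exported = """ID,Chain,# Sequences,% of Sequences
1,Heavy,1540,1.54
2,Heavy,980,0.98
3,Light,1200,1.20
"""
for row in clones_with_min_sequences(exported):
    print(row["ID"], row["Chain"], row["# Sequences"])
# 1 Heavy 1540
# 3 Light 1200
```

For a real file, read it with open(path, newline="") and pass the file's contents (or adapt the function to take the file object directly).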