The Single Clone Antibody Analysis operation is an alternate way to annotate and analyze the variable regions of standard IgG-like molecules. It combines the sequences analyzed into fewer representative sequences and can pull out the dominant chain(s) within your dataset.
The Single Clone Analysis operation is particularly suited for analyzing barcoded sequences from both "single clones in wells" and "single cells in droplets" experiments. It is also suitable for analyzing datasets from NGS technologies that incorporate UMIs. To learn more about Barcodes and UMIs, see this article: Understanding Single Cell technologies: Barcodes and UMIs.
For Barcode and/or UMI analysis, you will first need to run the Collapse UMI Duplicates and Separate Barcodes tool. For more discussion about possible workflows, see the Single Cell Analysis workflows article.
Note: The Barcode and UMI functionality is currently available as an add-on. If your organisation does not have Barcode Separation, UMI Collapse, or Single Clone Analysis pipelines enabled, please contact us to try them out.
How do I run Single Clone Antibody Analysis?
To specify Single Clone Antibody Analysis options, select one or more nucleotide sequences or lists in your folder and select Single Clone Antibody Analysis... in the Annotation dropdown.

To run the Single Clone Antibody Analysis operation, select options relevant to your sequence data in the dialog that appears and click Run.
Each selected input sequence list will be analysed independently. The pipeline works by:
- Trimming sequences and then applying filtering steps to reduce total data size (configurable)
- De-novo assembling reads together (optional)
- Identifying and annotating Ig-like Antibody regions (FR/CDR)
- Collapsing sequences with similar V(D)J regions into a single dominant sequence. Heavy and light chains are compared separately.
Alternatively, when run on barcoded sequences in combination with the Collapse UMI Duplicates and Separate Barcodes operation, it will identify one or more dominant chains for each barcode instead of for each input list. For an example of a workflow for analysing barcoded data, please refer to the UMI/Barcode and Single Clone Analysis tutorials.
The operation will output a Biologics Annotator Result of dominant annotated chains present in your input sequences. The Single Clone Analysis tutorial also contains further information on how to understand Single Clone Analysis Chain Combination tables.
Single Clone Antibody Analysis options

Input options

Discard short sequences
- All sequences that are shorter than the threshold defined in Discard sequences shorter than will be discarded. This is useful to remove likely low quality sequences. You may wish to set this parameter lower if your sequences have adapters, UMIs and/or barcodes which have been trimmed in the Collapse UMI Duplicates and Separate Barcodes operation.
Use longest reads only
- The Only use longest option lets you specify the number of reads that will be used from each list or barcode (after sorting by length), with any additional reads discarded. This improves performance on large datasets where excessive data significantly slows down analysis. 500 reads is generally sufficient.
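To make the combined effect of the two input filters concrete, here is a minimal Python sketch. It is an illustration only, not the pipeline's actual code: the function name `filter_reads` and the `min_length` default of 200 are hypothetical, and only the value of 500 reads comes from the guidance above.

```python
def filter_reads(reads, min_length=200, keep_longest=500):
    """Drop reads shorter than min_length, then keep the keep_longest longest reads.

    min_length stands in for "Discard sequences shorter than" (200 is a
    made-up value); keep_longest stands in for "Only use longest".
    """
    long_enough = [r for r in reads if len(r) >= min_length]
    # Sort by length, longest first. The sort is stable, so reads of
    # equal length keep their original relative order.
    long_enough.sort(key=len, reverse=True)
    return long_enough[:keep_longest]


# Four toy reads of 150, 320, 410 and 250 nucleotides.
reads = ["A" * 150, "C" * 320, "G" * 410, "T" * 250]
kept = filter_reads(reads, min_length=200, keep_longest=2)
print([len(r) for r in kept])  # [410, 320]
```

The 150 nt read is removed by the length threshold, and the 250 nt read is dropped because only the two longest reads are kept.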
De novo assembly
- Select the De novo assembly required option to perform de novo assembly if your reads are fragments of the fully annotated region. Otherwise, reads are grouped by their fully annotated region annotation.
Unmerged reads
- The Keep unmerged reads option specifies that paired reads which failed to merge should be used in the next step of the pipeline. In use cases where pairs are expected to overlap, discarding unmerged pairs is recommended in order to improve assembly accuracy.
Annotation Options
Annotation database
- The Antibody annotator database dropdown lets you select the reference database that should be used for annotating your sequences in order to determine the V(D)J region.
Heavy/light association
- Selecting the Associate significant dominant heavy and light pair option means that if the dominant heavy and light chain sequences are both significant (above the thresholds defined in the Advanced Options section) and have the same barcode, they will be associated so that they appear in a single row in the results table.
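The association rule can be pictured with a short Python sketch. This is an assumption-laden illustration rather than the product's implementation: the function `associate_pairs` and the record fields (`barcode`, `chain`, `name`, `reads`, `significant`) are hypothetical names chosen for the example.

```python
def associate_pairs(regions):
    """Pair the dominant heavy and light chain of each barcode.

    A pair is only reported when both dominant chains are significant.
    """
    # Find the dominant (most reads) region per barcode and chain type.
    dominant = {}
    for r in regions:
        key = (r["barcode"], r["chain"])
        if key not in dominant or r["reads"] > dominant[key]["reads"]:
            dominant[key] = r

    rows = []
    for barcode in sorted({b for b, _ in dominant}):
        heavy = dominant.get((barcode, "Heavy"))
        light = dominant.get((barcode, "Light"))
        if heavy and light and heavy["significant"] and light["significant"]:
            rows.append({"barcode": barcode, "heavy": heavy["name"], "light": light["name"]})
    return rows


regions = [
    {"barcode": "BC1", "chain": "Heavy", "name": "H1", "reads": 120, "significant": True},
    {"barcode": "BC1", "chain": "Heavy", "name": "H2", "reads": 10, "significant": False},
    {"barcode": "BC1", "chain": "Light", "name": "L1", "reads": 200, "significant": True},
    {"barcode": "BC2", "chain": "Heavy", "name": "H3", "reads": 50, "significant": True},
    {"barcode": "BC2", "chain": "Light", "name": "L2", "reads": 3, "significant": False},
]
pairs = associate_pairs(regions)
print(pairs)  # [{'barcode': 'BC1', 'heavy': 'H1', 'light': 'L1'}]
```

In this toy data, BC2 produces no paired row because its dominant light chain is not significant.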
Annotate germline differences
- To annotate and see the differences between your input sequences and reference sequences, select the Annotate germline differences option. When this option is selected, nucleotide and amino acid differences are annotated on your input sequences.
Name scheme
- The Name scheme dropdown lets you optionally select a Name Scheme that can read the collapsed single clone names to extract information of interest. Single Clone Antibody Analysis will use this information to pair Heavy/Light chains (if the Name Scheme has chain values) and output all Name Scheme fields as columns in the results table. For more information about Name Schemes, see Using Name Schemes.
Trim primers
- The Trim primers option allows you to select a primer database from the dropdown for trimming any of the primers in the database from your sequences, if they are present.
Liabilities
- To search for and score motifs liable to post-translational or other modifications, as well as beneficial motifs, select the Annotate liabilities option. The Antibody Annotator pipeline has a default set of sequence liability checks, which includes cleavage, deamidation, glycosylation, hydrolysis, isomerization and oxidation. To learn how to specify your own liabilities, see this article: How to customize antibody sequence liabilities and assets.
Sequence Region of Interest Options
Fully annotated region
- The Fully annotated region between dropdowns allow you to define the required region range for a sequence to be classed as fully annotated. The fully annotated classification is used for the other options in this section.
Retain additional bases
- The Retain upstream of fully annotated region option retains the specified number of nucleotides upstream of the fully annotated region when trimming the ends of contigs in order to identify duplicates when ignoring incorrect contig ends. If a contig is not long enough to cover the specified range, it will be excluded from the next step in the pipeline.
- The Retain downstream of fully annotated region option retains the specified number of nucleotides downstream of the fully annotated region when trimming the ends of contigs in order to identify duplicates when ignoring incorrect contig ends. If a contig is not long enough to cover the specified range, it will be excluded from the next step in the pipeline.
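The trimming and exclusion behaviour of the two retain options can be sketched in Python as follows. This is a simplified illustration under stated assumptions: `trim_to_region` and its parameters are hypothetical names, and region coordinates are treated as 0-based, end-exclusive positions on the contig.

```python
def trim_to_region(contig, region_start, region_end, upstream=0, downstream=0):
    """Trim a contig to its fully annotated region plus retained flanks.

    region_start/region_end are 0-based, end-exclusive coordinates of the
    fully annotated region. Returns None when the contig is too short to
    cover the requested range, mirroring the exclusion rule.
    """
    start = region_start - upstream
    end = region_end + downstream
    if start < 0 or end > len(contig):
        return None  # contig does not cover the specified range
    return contig[start:end]


# 5 flanking bases, an 8-base "fully annotated region", 3 flanking bases.
contig = "GG" + "TTT" + "ACGTACGT" + "CCC"
print(trim_to_region(contig, 5, 13, upstream=3, downstream=3))  # TTTACGTACGTCCC
print(trim_to_region(contig, 5, 13, upstream=6, downstream=3))  # None
```

The second call returns `None` because only 5 bases exist upstream of the region, so 6 cannot be retained.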
Annotate entire regions
- If a CDR or FR annotation would be truncated due to mismatches, the Always annotate entire regions option instead forces it to end at the boundary of the respective CDR/FR region. That region will be complete and not truncated.
Advanced Options
Combining regions
- The Combine regions at least option, if selected, means that after assembly or grouping, regions are further clustered to merge sequences which differ due to sequencing errors. It is recommended to use an identity percentage low enough to capture sequencing errors but high enough to preserve true variation. 97% is a reasonable default.
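As a rough picture of what an identity threshold does, the Python sketch below greedily merges regions into the most abundant region they match at or above the threshold. It is not the clustering algorithm used by the pipeline: the helper names are hypothetical, and identity is computed position by position on equal-length sequences, whereas a real implementation would typically align sequences first.

```python
def percent_identity(a, b):
    """Percentage of matching positions; 0.0 for empty or unequal-length inputs."""
    if len(a) != len(b) or not a:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def combine_regions(counts, min_identity=97.0):
    """Merge each region's reads into the first more abundant region it matches."""
    merged = {}
    # Visit regions from most to least reads so abundant regions become representatives.
    for seq, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        for rep in merged:
            if percent_identity(seq, rep) >= min_identity:
                merged[rep] += n
                break
        else:
            merged[seq] = n  # no close match: start a new cluster
    return merged


base = "ACGT" * 25            # 100 nt region
variant = "T" + base[1:]      # one mismatch, 99% identical to base
other = "TGCA" * 25           # unrelated region
result = combine_regions({base: 90, variant: 3, other: 40})
print(result[base], result[other], len(result))  # 93 40 2
```

Here the three reads of the single-mismatch variant are absorbed into the dominant region, while the unrelated region stays separate.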
Significant regions
Significant regions thresholds allow you to filter out infrequent regions to retain statistically significant regions for downstream analysis. Regions deemed insignificant are retained but annotated as not significant.
- The Significant regions have at least (percentage read count of the cell) input field lets you flag regions with low numbers of reads relative to the total reads in the cell, specified as a percentage.
- The Significant regions have at least (reads) input field lets you flag regions with low numbers of reads as not significant. This is useful for filtering out regions that were only defined due to reads of very low frequency or due to sequencing errors.
- The Significant regions have at least (percentage read count of the dominant same chain region) input field lets you flag regions whose read count is equal to or less than a specified percentage of the read count of the dominant same chain region (the one with the most reads). This setting allows you to filter out regions that do not have enough supporting data for further analysis.
Discard regions
- The Only keep regions with at least input field lets you discard regions whose read count is equal to or less than a specified percentage of the read count of the dominant same chain region. This setting allows you to permanently discard regions that do not have enough supporting data for further analysis.
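The interplay between flagging and discarding can be illustrated with a small Python sketch. All names and default values here are hypothetical, and the exact handling of values that sit exactly on a threshold is an assumption: the two checks relative to the dominant region follow the "equal to or less than" wording above, while the other two follow the "at least" field labels.

```python
def classify_regions(counts, cell_total, min_reads=2, min_pct_of_cell=1.0,
                     min_pct_of_dominant=5.0, keep_pct_of_dominant=1.0):
    """Return {region: is_significant} for same-chain regions in one cell.

    counts maps region name to read count; cell_total is the total number
    of reads in the cell. Discarded regions are absent from the result,
    while insignificant regions are kept and flagged False.
    """
    if not counts:
        return {}
    dominant = max(counts.values())
    result = {}
    for name, n in counts.items():
        # "Only keep regions with at least": permanently discard.
        if n <= dominant * keep_pct_of_dominant / 100:
            continue
        # "Significant regions have at least": keep, but flag.
        result[name] = (
            n >= min_reads
            and n >= cell_total * min_pct_of_cell / 100
            and n > dominant * min_pct_of_dominant / 100
        )
    return result


heavy_counts = {"H1": 800, "H2": 150, "H3": 30, "H4": 4}
flags = classify_regions(heavy_counts, cell_total=2000)
print(flags)  # {'H1': True, 'H2': True, 'H3': False}
```

With these example values, H4 falls below 1% of the dominant region and is discarded, whereas H3 is retained but flagged as not significant because it has fewer than 5% of the dominant region's reads.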
Clustering Options

The above clustering options are the defaults. However, you can specify custom clusters, including percentage thresholds and multi-region/gene clusters. Clicking the blue Plus icon (+) lets you add a custom cluster. Please see our main article Clustering Options for a guide on how to specify more complex clusters.