Discard short sequences
All sequences that are shorter than the threshold defined in Discard sequences shorter than will be discarded. This is useful for removing likely low-quality sequences. You may wish to set this parameter lower if your sequences have adapters, UMIs and/or barcodes which have been trimmed in the Collapse UMI Duplicates and Separate Barcodes operation.
Use longest reads only
The Only use longest option lets you specify the number of reads that will be used from each list or barcode (after sorting by length), with any additional reads discarded. This helps improve performance on large data sets where excessive data significantly slows down analysis. 500 reads is generally sufficient.
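The combined effect of the two read filters above can be pictured with a short Python sketch. It is an illustration only, not the pipeline's implementation: the function name select_reads and its parameters are hypothetical, and reads are represented as plain strings.

```python
def select_reads(reads, min_length, max_reads):
    """Drop reads shorter than min_length, then keep the max_reads longest.

    Hypothetical helper for illustration; not part of the pipeline.
    """
    # Keep only reads that meet the length threshold.
    long_enough = [r for r in reads if len(r) >= min_length]
    # Sort longest first; sorted() is stable, so ties keep their input order.
    ranked = sorted(long_enough, key=len, reverse=True)
    # Any reads beyond the first max_reads are discarded.
    return ranked[:max_reads]


reads = ["ACGT", "ACGTACGTAC", "AC", "ACGTACG", "ACGTACGT"]
kept = select_reads(reads, min_length=4, max_reads=3)
print(kept)
```

Here the two-base read is removed by the length threshold, and the shortest of the four remaining reads is dropped because only the three longest are kept.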
De novo assembly
If your reads are fragments of the fully annotated region, select the De novo assembly required option to assemble them de novo. Otherwise, reads are grouped by their fully annotated region annotation.
The Keep unmerged reads option specifies that paired reads which failed to merge should be used in the next step of the pipeline. In use cases where pairs are expected to overlap, discarding unmerged pairs is recommended in order to improve assembly accuracy.
The Antibody annotator database dropdown lets you select the reference database that should be used for annotating your sequences in order to determine the V(D)J region.
Selecting the Associate significant dominant heavy and light pair option means that if the dominant heavy and light chain sequences are both significant (above the thresholds defined in the Advanced Options section) and have the same barcode, they will be associated so that they appear in a single row in the results table.
Annotate germline differences
To annotate and see the differences between your input sequences and reference sequences, select the Annotate germline differences option. With the selection of this option, the nucleotide and amino acid differences will be annotated on your input sequences.
The Name scheme dropdown lets you optionally select a Name Scheme that can read the collapsed single clone names to extract information of interest. Single Clone Antibody Analysis will use this information to pair Heavy/Light chains (if the Name Scheme has chain values) and output all Name Scheme fields as columns in the results table. For more information about Name Schemes, see Using Name Schemes.
The Trim primers option allows you to select a primer database from the dropdown for trimming any of the primers in the database from your sequences, if they are present.
To search for and score motifs liable to post-translational modifications, other types of modifications, or beneficial motifs, select the Annotate liabilities option. The Antibody Annotator pipeline has a default set of sequence liability checks, which include cleavage, deamidation, glycosylation, hydrolysis, isomerization and oxidation. To learn how to specify your own liabilities, see this article: How to customize antibody sequence liabilities and assets.
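As a rough illustration of what a motif-based liability search involves, the Python sketch below scans an amino acid sequence with regular expressions. The patterns are commonly cited textbook examples chosen for this sketch; they are an assumption and are not the pipeline's default liability set or scoring. The names LIABILITY_PATTERNS and find_liabilities are hypothetical.

```python
import re

# Illustrative motif patterns only; NOT the pipeline's actual default set.
LIABILITY_PATTERNS = {
    "glycosylation": r"N[^P][ST]",  # N-linked sequon: N, any residue but P, then S or T
    "deamidation": r"N[GS]",
    "isomerization": r"D[GS]",
    "oxidation": r"M",
}


def find_liabilities(protein):
    """Return (liability name, 0-based start, matched motif) tuples."""
    hits = []
    for name, pattern in LIABILITY_PATTERNS.items():
        # A lookahead is used so that overlapping motifs are all reported.
        for match in re.finditer(f"(?=({pattern}))", protein):
            hits.append((name, match.start(), match.group(1)))
    # Order hits by position along the sequence (stable for equal positions).
    return sorted(hits, key=lambda hit: hit[1])


for hit in find_liabilities("QVNGSAMDGK"):
    print(hit)
```

In this example the residues NGS are reported twice, once as a glycosylation sequon and once (NG) as a deamidation site, which is why a single position can carry more than one liability annotation.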
Sequence Region of Interest Options
Fully annotated region
The Fully annotated region between dropdowns allow you to define the required region range for a sequence to be classed as fully annotated. The fully annotated classification is used for the other options in this section.
Retain additional bases
The Retain upstream of fully annotated region option retains the specified number of nucleotides upstream of the fully annotated region when trimming the ends of contigs in order to identify duplicates when ignoring incorrect contig ends. If a contig is not long enough to cover the specified range, it will be excluded from the next step in the pipeline.
The Retain downstream of fully annotated region option retains the specified number of nucleotides downstream of the fully annotated region when trimming the ends of contigs in order to identify duplicates when ignoring incorrect contig ends. If a contig is not long enough to cover the specified range, it will be excluded from the next step in the pipeline.
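The trimming behaviour described by the two Retain options can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function trim_contig is hypothetical, region coordinates are 0-based with an exclusive end, and returning None stands in for excluding the contig from the next step.

```python
def trim_contig(contig, region_start, region_end, upstream, downstream):
    """Trim a contig to the fully annotated region plus retained flanks.

    region_start is inclusive and region_end exclusive (0-based); this
    coordinate convention is an assumption made for the sketch.
    Returns None if the contig is too short to cover the requested range.
    """
    start = region_start - upstream
    end = region_end + downstream
    # A contig that cannot supply the requested flanking bases is excluded.
    if start < 0 or end > len(contig):
        return None
    return contig[start:end]


contig = "AAACCCGGGTTT"  # fully annotated region is CCCGGG at positions 3..9
print(trim_contig(contig, 3, 9, upstream=2, downstream=1))
print(trim_contig(contig, 3, 9, upstream=5, downstream=1))
```

The first call keeps two bases upstream and one downstream of the region. The second asks for more upstream bases than the contig contains, so the contig is excluded.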
Annotate entire regions
If a CDR or FR annotation would be truncated due to mismatches, the Always annotate entire regions option instead forces it to end at the boundary of the respective CDR/FR region. That region will be complete and not truncated.
The Combine regions at least option, if selected, means that after assembly or grouping, regions are further clustered to merge sequences which differ due to sequencing errors. It is recommended to use an identity percentage low enough to capture sequencing errors but high enough to preserve true variation. 97% is a reasonable default.
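To show how an identity threshold merges near-identical regions, here is a simplified Python sketch of greedy clustering. It is not the clustering method used by the pipeline: it assumes sequences of equal length and compares them position by position rather than by alignment, and the names identity and combine_regions are hypothetical.

```python
def identity(a, b):
    """Percent identity of two equal-length sequences (0.0 if lengths differ)."""
    if len(a) != len(b) or not a:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def combine_regions(counts, min_identity):
    """Greedily merge sequences into clusters by percent identity.

    counts maps each sequence to its read count. Returns a dict mapping
    each cluster representative to the summed read count of its members.
    """
    clusters = {}
    # Visit the most abundant sequences first so they become representatives.
    for seq, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        for rep in clusters:
            if identity(seq, rep) >= min_identity:
                clusters[rep] += n  # absorb a likely sequencing-error variant
                break
        else:
            clusters[seq] = n  # no close representative: start a new cluster
    return clusters


counts = {"ACGTACGTAC": 90, "ACGTACGTAT": 3, "TTTTACGTAC": 40}
merged = combine_regions(counts, min_identity=90)
print(merged)
```

The low-count sequence differs from the most abundant one at a single position (90% identity) and is merged into it, while the third sequence is only 70% identical and remains a separate region.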
Significant regions thresholds allow you to filter out infrequent regions to retain statistically significant regions for downstream analysis. Regions deemed insignificant are retained but annotated as not significant.
The Significant regions have at least (percentage read count of the cell) input field lets you flag regions with low numbers of reads relative to the total reads in the cell, specified as a percentage.
The Significant regions have at least (reads) input field lets you flag regions with low numbers of reads as not significant. This is useful for filtering out regions that were only defined due to reads of very low frequency or due to sequencing errors.
The Significant regions have at least (percentage read count of the dominant same chain region) input field lets you flag regions whose number of reads is equal to or less than the specified percentage of the read count of the dominant same chain region (the one with the most reads). This setting allows you to filter out regions that do not have enough supporting data for further analysis.
The Only keep regions with at least input field lets you discard regions whose number of reads is equal to or less than the specified percentage of the read count of the dominant same chain region. Unlike the significance thresholds, which only flag regions, this setting permanently discards regions that do not have enough supporting data for further analysis.
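The interaction between the significance thresholds and the Only keep regions with at least setting can be sketched in Python. This is an illustration under several assumptions, not the pipeline's logic: the function classify_regions and its parameter names are hypothetical, a region is treated as significant only if it passes all three significance thresholds, and the handling of values exactly on a threshold follows the descriptions above.

```python
def classify_regions(read_counts, cell_total, min_reads, min_pct_of_cell,
                     min_pct_of_dominant, keep_pct_of_dominant):
    """Classify the regions of one chain in one cell.

    read_counts maps region name -> read count. Returns a dict mapping each
    retained region to True (significant) or False (not significant).
    Discarded regions are absent from the result.
    """
    if not read_counts:
        return {}
    dominant = max(read_counts.values())  # reads in the dominant region
    result = {}
    for region, n in read_counts.items():
        pct_of_dominant = 100.0 * n / dominant
        # Regions at or below the keep threshold are discarded outright.
        if pct_of_dominant <= keep_pct_of_dominant:
            continue
        # Assumption: all three significance thresholds must be passed.
        result[region] = (
            n >= min_reads
            and 100.0 * n / cell_total >= min_pct_of_cell
            and pct_of_dominant > min_pct_of_dominant
        )
    return result


counts = {"A": 800, "B": 150, "C": 40, "D": 5}
flags = classify_regions(counts, cell_total=1000, min_reads=10,
                         min_pct_of_cell=5, min_pct_of_dominant=10,
                         keep_pct_of_dominant=1)
print(flags)
```

With these example values, region D is discarded because it has less than 1% of the dominant region's reads, while region C is retained but flagged as not significant because it accounts for only 4% of the reads in the cell.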