Single Cell Analysis Workflows

Updated: October 26, 2023 04:12

Single Cell Antibody Annotator is useful for anyone who expects to find one (or more) dominant heavy and light chains in each of their samples. This could be barcoded data where each barcode corresponds to a cell or a well, or FASTA/FASTQ files where each file is expected to contain a dominant chain or pair of chains.

See Understanding Single Cell technologies: Barcodes and UMIs if you are unsure what single cell analysis is.

There are three main ways to handle 10X and other barcoded/UMI sequence data in Geneious Biologics. The best path depends on how much preprocessing has already been done prior to upload, which will determine your entry point.

Method 1 - Unprocessed - Pair, Collapse UMIs, Single Cell Antibody Annotator
Method 2 - Partially pre-processed - Single Cell Antibody Analysis from Separate Lists
Method 3 - Fully pre-processed - Antibody Annotator + Add Assay Data
FAQ: Why do my results for Antibody Annotator and Single Cell Antibody Annotator look different?

Method 1 - Pair, Collapse UMIs, Single Cell Antibody Annotator

If you have a FASTQ/FASTA (or other format) file of raw sequences that contain UMIs and/or barcodes and have not been demultiplexed, this is the method to use. NGS Tutorials 3 and 4 on our website follow a similar workflow, so they may be a good place to start: NGS Tutorials 3: Using Barcodes and UMIs and Tutorial 4: Single Cell Antibody Analysis. The tutorials use a small sampled dataset; the full dataset can be provided on request.

A typical workflow could look like:

Set + Merge Paired Reads: Feed raw FASTQ files into 'Set and Merge Paired Reads'. You may wish to choose the Setting Paired Reads option if you are working with 10X data. The UMI/Barcode tool can handle paired unmerged reads.
UMI Collapse and Separate Barcodes: Feed paired (or merged) reads into Collapse UMI Duplicates and Separate Barcodes. This merges all similar sequences with the same UMI and groups sequences from the same barcode together. It is useful to leave the 'Trim UMIs/Barcodes' option off while experimenting, but you will want the UMIs and barcodes trimmed off before the next step. The barcode is saved on each sequence as metadata. See Collapse UMI Duplicates and Separate Barcodes.
Annotation: Feed the (trimmed) consensus sequences into Single Cell Antibody Annotator. Single Cell Antibody Annotator attempts to combine related sequences within a barcode so that you can see the dominant clones. You could also use Antibody Annotator, which analyses each sequence individually without merging. Both tools offer clonotype analysis (our cluster tables, see Understanding "Clusters").
Discovery: Sequence Alignment can be performed directly from the results. You can choose to align translated regions of interest (e.g. HCDR3) and build trees from the alignment.
Export: Extract selected annotated candidate sequences, or export tables, graphs, and images for reporting or record keeping purposes. See Exporting Annotated Sequences and Sequence Tables.
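
To make the UMI collapse and barcode separation step more concrete, here is a minimal Python sketch of the idea. It is not how Geneious Biologics implements the tool. The read layout (a 16 nt barcode followed by a 10 nt UMI at the start of each read), the constants BARCODE_LEN and UMI_LEN, and the functions split_read and collapse_umis are all assumptions made for illustration. The sketch also picks the most frequent sequence per UMI rather than building a true consensus.

```python
from collections import Counter, defaultdict

# Assumed read layout (hypothetical, adjust to your library prep):
# [16 nt cell barcode][10 nt UMI][insert sequence]
BARCODE_LEN = 16
UMI_LEN = 10


def split_read(read, trim=True):
    """Return (barcode, umi, sequence) for one raw read string."""
    barcode = read[:BARCODE_LEN]
    umi = read[BARCODE_LEN:BARCODE_LEN + UMI_LEN]
    # With trim=False the barcode and UMI stay on the sequence,
    # which can help while experimenting with settings.
    sequence = read[BARCODE_LEN + UMI_LEN:] if trim else read
    return barcode, umi, sequence


def collapse_umis(reads, trim=True):
    """Group reads by barcode and keep one representative per UMI.

    The representative is simply the most frequent sequence seen for a
    (barcode, UMI) pair. A real tool would build a consensus and
    tolerate sequencing errors in the UMI itself.
    """
    by_tag = defaultdict(Counter)
    for read in reads:
        barcode, umi, sequence = split_read(read, trim)
        by_tag[(barcode, umi)][sequence] += 1

    by_barcode = defaultdict(list)
    for (barcode, _umi), counts in by_tag.items():
        representative, _count = counts.most_common(1)[0]
        by_barcode[barcode].append(representative)
    return dict(by_barcode)
```

Calling collapse_umis on a list of raw read strings returns a dictionary keyed by barcode, where each value holds one collapsed sequence per UMI. That is the kind of per-barcode input the annotation step works from.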

Method 2 - Single Cell Antibody Analysis from Separate Lists

If you or your sequence provider have already done some demultiplexing in another tool, you can feed the preprocessed sequences into Geneious Biologics for annotation and analysis. If your sequences are sorted into sequence lists (such as one FASTQ file per clone/barcode/well/cell), you can use them directly with Single Cell Antibody Annotator. When you run Single Cell Antibody Annotator on multiple files or sequence lists, it attempts to find a dominant heavy and light chain for each list. NGS Tutorial 5: Single Cell Analysis from Separate Lists covers this method.
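
The idea of one dominant heavy and light chain per list can be sketched in a few lines of Python. This is an illustration only, not the algorithm Geneious Biologics uses. It assumes each read has already been labelled 'heavy' or 'light', and the names dominant_chains and analyse_lists are hypothetical.

```python
from collections import Counter


def dominant_chains(annotated_reads):
    """Pick the most frequent heavy and light chain in one sequence list.

    annotated_reads: iterable of (chain_type, sequence) tuples, where
    chain_type is 'heavy' or 'light' (an assumed, simplified labelling).
    Returns a dict mapping chain type to (sequence, read_count).
    """
    counts = {"heavy": Counter(), "light": Counter()}
    for chain_type, sequence in annotated_reads:
        if chain_type in counts:  # ignore reads of unknown chain type
            counts[chain_type][sequence] += 1

    result = {}
    for chain_type, counter in counts.items():
        if counter:  # a list may lack one of the chains entirely
            result[chain_type] = counter.most_common(1)[0]
    return result


def analyse_lists(sequence_lists):
    """Apply dominant_chains to every named list (one per well/cell/file)."""
    return {name: dominant_chains(reads) for name, reads in sequence_lists.items()}
```

Each list is treated independently, so a contaminating minor chain in one well does not affect the call in another.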

Method 3 - Antibody Annotator + Add Assay Data

If you have already done some demultiplexing in another tool such as Cell Ranger, but the resulting sequences are not sorted into sequence lists, you can feed the consensus sequences directly into Antibody Annotator. You can import them as sequences (such as FASTA, FASTQ or GenBank format) or as a CSV file using our CSV sequence import tool.
Antibody Annotator will annotate FR and CDR regions and identify liabilities, variants relative to germline, clonotypes and more. The Add Assay Data feature can also be useful for adding coverage or depth information from CSV or Excel files produced at your demultiplexing stage.

Note: For barcoded data in particular, when trialling it can be useful to have actual samples of your expected data format. This allows you to determine the optimal pre-processing and analysis workflow and make tweaks where necessary. As always, please contact us if you have any questions or feedback.

FAQ: Why do my results for Antibody Annotator and Single Cell Antibody Annotator look different?

Single Cell Antibody Annotator and Antibody Annotator differ in their underlying algorithms, so it is not surprising if their results are similar but not identical. The inherent differences in the analysis are explained below so that you know how to compare the two sets of results.

Antibody Annotator

Antibody Annotator annotates sequences individually or in pairs (for Illumina paired reads). That means each individual input sequence gets its own row in the All Sequences table, even if it is only a fragment of a variable region or is not identifiable. The cluster tables show the number of sequences that have the exact same protein sequence over the chosen region. The VJ cluster table only includes sequences (or chain pairs!) that were identified as having a full VJ region, so it shows the number of sequences with the exact same protein sequence over a complete VJ region. Read the Understanding "clusters" article for more details.
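
As a rough illustration of exact-match clustering, the Python sketch below counts sequences by their VJ protein sequence and skips any row without a complete VJ region. It only illustrates the counting rule described above. The row format (dicts with 'name' and 'vj_protein' keys) and the function name vj_cluster_table are assumptions, not part of Geneious Biologics.

```python
from collections import Counter


def vj_cluster_table(rows):
    """Count sequences that share the exact same VJ protein sequence.

    rows: iterable of dicts with a 'vj_protein' key holding the amino
    acid sequence of the full VJ region, or None/'' when no complete
    VJ region was identified (such rows are excluded from the table).
    Returns (vj_protein, count) pairs, largest cluster first.
    """
    counts = Counter(row["vj_protein"] for row in rows if row["vj_protein"])
    return counts.most_common()
```

Because the match is exact, two reads that differ by a single residue end up in separate clusters.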

Single Cell Antibody Annotator

Single Cell Antibody Annotator is designed to find dominant sequences within a sample or samples. It is typically used when the scientist expects to find only one or two dominant heavy/light chains per sample/cell/clone/sequence list/barcode, either in combination with barcoded sequences or for batch analysis where each uploaded file represents a different but related sample. Single Cell Antibody Annotator uses some special rules:

A single row in the "All Sequences" table may represent many individual reads.
Sequences are combined together based on a similarity threshold (you can set this in the options). If you have the 'De Novo Assembly' option turned on then sequences are assembled together. This means that one row in the All Sequences table might include sequences that originally had slightly different VJ protein sequences. The VJ Region amino acid sequence shown is the consensus across all similar reads.
Only "Fully Annotated" sequences (or assembled contigs) are kept to show in the analysis results. You can define what regions should be used to determine "Fully Annotated".
Sequences shorter than a certain length may also be filtered out, depending on your options.
A more detailed explanation of the filtering and options can be found in the main Single Cell Antibody Annotator article.
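
The rules above can be sketched in Python as a length filter, a similarity-based grouping step and a per-column consensus. This is a deliberately simplified illustration, not the Single Cell Antibody Annotator algorithm. It assumes equal-length protein sequences, uses greedy grouping against the first member of each group, and does not model De Novo Assembly or the "Fully Annotated" filter. The function names and the default threshold of 0.9 are hypothetical.

```python
from collections import Counter


def identity(a, b):
    """Fraction of matching positions; different lengths score 0."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def group_similar(sequences, threshold=0.9):
    """Greedy grouping: join the first group whose seed is similar enough."""
    groups = []
    for seq in sequences:
        for group in groups:
            if identity(seq, group[0]) >= threshold:
                group.append(seq)
                break
        else:
            groups.append([seq])  # no similar group found, start a new one
    return groups


def consensus(group):
    """Most common residue in each column of an equal-length group."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*group))


def summarise(sequences, threshold=0.9, min_length=0):
    """Return one (consensus, read_count) row per group of similar reads."""
    kept = [s for s in sequences if len(s) >= min_length]
    return [(consensus(g), len(g)) for g in group_similar(kept, threshold)]
```

In this sketch a single output row can stand for several reads whose sequences differed slightly, which is why the row counts will not match an exact-match cluster table built from the same input.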

Summary: There are a number of differences between the two analysis pipelines, but the main one is that in Antibody Annotator the VJ cluster table counts sequences with the exact same VJ protein sequence, whereas Single Cell Antibody Annotator shows a consensus of similar VJ regions, which gives a reduced dataset. In this sense, it is more similar to our NGS Antibody Annotator pipeline.