Workflows for NGS Antibody Analysis

Updated: August 15, 2023 04:51

This article outlines how to process NGS data, such as Illumina (HiSeq, MiSeq, etc.) and PacBio reads, in Geneious Biologics.

The flow chart below displays the general workflow for working with NGS data:

New flowcharts for Antibody Analysis (4).jpg

If you have Single Cell data (such as 10X) or Sanger sequences, please see our other articles: Single Cell Analysis Workflows or Sanger Analysis Workflows.

Pre-processing

Processing NGS sequences with UMIs

These include sequences from NGS kits such as the SMART-Seq Human TCR (with UMIs) kit from Takara Bio. If you are unsure what UMIs are, please see Understanding Single Cell technologies: Barcodes and UMIs.

Set & Merge Paired Reads

Raw reads from NGS technologies will often need to be either merged or paired. The Set & Merge Paired Reads operation can be found under Pre-processing:

Screenshot 2023-03-30 at 4.07.27 PM.png

Choose the appropriate option for your paired reads. Often this will be either:

Interlaced sequences within each list
Pairs of lists (as with R1 and R2 files)

To decide whether to merge your reads or only set them as pairs, consider the following:

If not all of your read pairs are expected to overlap but you would like to retain this sequence information, you can choose to Set Paired Reads without merging them.
If you choose to Merge Paired Reads, the operation will produce two output files. One file will contain all your successfully merged read pairs, while the other will contain the reads that could not be merged due to a lack of overlapping sequence.

Note: Set & Merge Paired Reads can be skipped if you are working with long-read technologies, such as PacBio.
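
To illustrate why merging produces a merged file and an unmerged file, here is a minimal Python sketch of overlap-based merging. It is not how Geneious Biologics implements the operation: it assumes exact-match overlaps only, ignores quality scores, and the function names and the `min_overlap` parameter are hypothetical.

```python
# Conceptual sketch only: exact-match overlap merging of read pairs.
# Function names and min_overlap are illustrative assumptions.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(r1, r2, min_overlap=10):
    """Merge R1 with the reverse complement of R2.

    Returns the merged sequence, or None when no exact overlap of at
    least min_overlap bases is found between the end of R1 and the
    start of reverse-complemented R2.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    r2_rc = reverse_complement(r2)
    # Try the longest candidate overlap first.
    for size in range(min(len(r1), len(r2_rc)), min_overlap - 1, -1):
        if r1[-size:] == r2_rc[:size]:
            return r1 + r2_rc[size:]
    return None


def merge_paired_reads(pairs, min_overlap=10):
    """Split read pairs into merged sequences and unmerged pairs."""
    merged, unmerged = [], []
    for r1, r2 in pairs:
        result = merge_pair(r1, r2, min_overlap)
        if result is None:
            unmerged.append((r1, r2))  # kept as a pair, not merged
        else:
            merged.append(result)
    return merged, unmerged


fragment = "ACGTTGCAAGGCTTACGGATCCTAGGCATG"
pairs = [
    (fragment[:20], reverse_complement(fragment[10:])),  # 10 nt overlap
    ("ACGTACGTACGTACGT", "TTTTTTTTTTTTTTTT"),            # no overlap
]
merged, unmerged = merge_paired_reads(pairs)
print(len(merged), "merged;", len(unmerged), "unmerged")
```

The first pair is reassembled into the original fragment, while the second pair has no overlapping sequence and ends up in the unmerged output.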

Collapse UMIs

Because these sequences have UMIs, next go to Pre-processing > Collapse UMI Duplicates & Separate Barcodes:

Collapse UMI duplicates & separate barcodes.png

The following options are applicable to most datasets. Note that only the relevant sections need to be ticked: if your data only has UMIs, you can leave the Adapter and Barcode section unticked. For full details, please see our main article Collapse UMI Duplicates and Separate Barcodes.

UMI Section

Length:
- Input the length of the UMI in bp on your sequences.

Settings Section

Adapter/Barcode/UMI are present at:
- Specify which end of the sequence(s) your UMIs are present on. For example, if your UMIs are on the R1 read of Illumina sequences, select the 5' end. The other options are: 3' end or both ends.
Allow single mismatch in UMI, barcode, and TSO
- Sequencing errors can introduce mismatches in adapter sequences. To loosen the criteria for similarity to allow for a single mismatch, select this option.
Trim adapter, barcode, UMI, and TSO from results
- Select this option to remove these adapter, barcode, UMI, and TSO sequences from the collapsed output sequences. This can be turned off for QA purposes, but should be on for sequences taken through to annotation and analysis.
Discard sequences shorter than:
- You may discard all sequences shorter than a specified length by ticking this checkbox and entering a number of bases in the corresponding input field.
Discard sequences with a chance of error over:
- You may discard sequences that have a chance of error over a specified percentage by ticking this checkbox and entering a percentage in the corresponding input field.
Collapse sequences with the same UMI if they are more than:
- This input field lets you specify the minimum sequence similarity (identity) required for collapsing sequences within each UMI. The goal is to specify a low enough value that accounts for sequencing errors while specifying a high enough value that doesn't collapse real variation.
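
As a rough illustration of how the Length, trimming, minimum length, and similarity settings interact, the following Python sketch groups reads by UMI and collapses similar sequences within each group. It is a simplified model under stated assumptions, not the Geneious Biologics algorithm: the UMI is assumed to sit at the 5' end, identity is computed position by position without alignment, clustering is greedy, and all function and parameter names are hypothetical.

```python
# Conceptual sketch only: UMI extraction and within-UMI collapsing.
# All names and the identity measure are illustrative assumptions.
from collections import Counter, defaultdict


def identity(a, b):
    """Percent of matching positions over the shorter length (no alignment)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / n


def collapse_umis(reads, umi_length, min_identity=90.0, min_length=0):
    """Return (umi, representative, count) tuples.

    The UMI is read from the first umi_length bases and trimmed off.
    Within each UMI, a sequence joins an existing cluster when it is
    more than min_identity percent identical to the representative.
    """
    groups = defaultdict(list)
    for read in reads:
        umi, insert = read[:umi_length], read[umi_length:]
        if len(insert) >= min_length:  # discard sequences that are too short
            groups[umi].append(insert)

    collapsed = []
    for umi, inserts in groups.items():
        clusters = []  # each entry is [representative, count]
        # Visit the most abundant sequences first so they become representatives.
        for seq, n in Counter(inserts).most_common():
            for cluster in clusters:
                if identity(seq, cluster[0]) > min_identity:
                    cluster[1] += n
                    break
            else:
                clusters.append([seq, n])
        collapsed.extend((umi, rep, count) for rep, count in clusters)
    return collapsed


reads = [
    "AAAA" + "ACGTACGTAC",
    "AAAA" + "ACGTACGTAC",
    "AAAA" + "ACGTACGTAT",  # one mismatch, likely a sequencing error
    "AAAA" + "TTTTTTTTTT",  # same UMI but a clearly different sequence
    "CCCC" + "ACGTACGTAC",  # different UMI, so a separate molecule
]
for row in collapse_umis(reads, umi_length=4, min_identity=85.0):
    print(row)
```

In this toy input, the read with a single mismatch is absorbed into the most abundant sequence for UMI `AAAA`, while the unrelated sequence sharing that UMI and the identical sequence under UMI `CCCC` are kept as separate entries.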

You can now proceed to NGS Antibody Annotation below.

Processing NGS sequences without UMIs

Set & Merge Paired Reads

Raw reads from NGS technologies will often need to be either merged or paired. The Set & Merge Paired Reads operation can be found under Pre-processing:

Screenshot 2023-03-30 at 4.07.27 PM.png

Choose the appropriate option for your paired reads. Often this will be either:

Interlaced sequences within each list
Pairs of lists (as with R1 and R2 files)

To decide whether to merge your reads or only set them as pairs, consider the following:

If not all of your read pairs are expected to overlap but you would like to retain this sequence information, you can choose to Set Paired Reads without merging them.
If you choose to Merge Paired Reads, the operation will produce two output files. One file will contain all your successfully merged read pairs, while the other will contain the reads that could not be merged due to a lack of overlapping sequence.

Note: Set & Merge Paired Reads can be skipped if you are working with long-read technologies, such as PacBio.

You can now proceed to NGS Antibody Annotation below.

NGS Antibody Annotation

For NGS data, we recommend using the NGS Antibody Annotator, because it collapses the input down to representative sequences according to a specified threshold while retaining the original counts.

To run the NGS Antibody Annotator, select one or more sequence documents and go to Annotation > NGS Antibody Annotator:

Annotation > NGS AA.png

The Main Options are listed below:

NGS AA.png

Reference database:

The Reference Database can be a database of antibody germline sequences or of template (variable region) sequences. Your Biologics account will include a Human and a Mouse Ig germline database. If you would like access to other germlines, please either contact us or see How to make a Custom Reference Database.
The reference database is used to help identify the correct FR and CDR regions in the new sequences being analysed. See Understanding Reference Databases.

Selected sequences are:

This option specifies which chains you expect on each read, or on each pair of reads. For more information, see the main NGS Antibody Annotator article.

Sequence region of interest:

This allows you to set the boundaries of your sequence. For example, if your forward sequencing primer was located in FR2 while your reverse sequencing primer was located in the Constant region, you might set this to CDR2 -> FR4.

Collapse Sequences at least:

This is the threshold at which sequences will be assumed to be identical, and collapsed down (while retaining the count). The default is 100%.

Retain upstream and downstream of fully annotated region:

These options retain the specified number of nucleotides upstream/downstream of the Sequence region of interest (explained above) when trimming the ends of contigs. This allows duplicates to be identified while ignoring incorrect contig ends. If a contig is not long enough to cover the specified range, it will be excluded from the next step in the pipeline.
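
The effect of the retain options can be pictured with a small Python sketch. It assumes the region of interest has already been located as 0-based, end-exclusive coordinates on the contig; the function name, its parameters, and the coordinate convention are hypothetical and are not taken from Geneious Biologics.

```python
# Conceptual sketch only: trimming a contig to the region of interest
# plus retained flanks. Names and coordinates are illustrative assumptions.
from typing import Optional


def trim_to_region(seq: str, region_start: int, region_end: int,
                   retain_upstream: int = 0,
                   retain_downstream: int = 0) -> Optional[str]:
    """Keep the region of interest plus the requested flanking bases.

    Returns None when the contig is too short to cover the requested
    range, mirroring the exclusion described above.
    """
    start = region_start - retain_upstream
    end = region_end + retain_downstream
    if start < 0 or end > len(seq):
        return None
    return seq[start:end]


# Two contigs that differ only at their outermost ends.
contig_a = "TTTTTACGTACGTAAAAA"
contig_b = "GGTTTACGTACGTAAACC"

trimmed_a = trim_to_region(contig_a, 5, 13, retain_upstream=2, retain_downstream=3)
trimmed_b = trim_to_region(contig_b, 5, 13, retain_upstream=2, retain_downstream=3)
print(trimmed_a == trimmed_b)

# Asking for more upstream bases than the contig has excludes it.
print(trim_to_region(contig_a, 5, 13, retain_upstream=6))
```

After trimming, the two contigs become identical and could be collapsed as duplicates, whereas a contig that cannot supply the requested flank is dropped.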

For more options, see the main NGS Antibody Annotator article.

Inputting multiple sequence lists

Multiple sequence lists can be selected and run through the NGS Antibody Annotator together. This produces a single output file (an Annotation Result Document) in which sequences are collapsed within each list, but not across lists.

All graphs and clustering will be across the whole dataset (multiple lists).
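
The distinction between collapsing within lists and analysing across lists can be sketched in Python as follows. This is only a model of the behaviour described above, assuming the default 100% threshold (exact matches); the function name and the sample list names are hypothetical.

```python
# Conceptual sketch only: collapse identical sequences within each
# input list, never across lists. Names are illustrative assumptions.
from collections import Counter


def collapse_per_list(lists):
    """Return (list_name, sequence, count) rows, collapsed per list."""
    rows = []
    for name, sequences in lists.items():
        for seq, count in Counter(sequences).most_common():
            rows.append((name, seq, count))
    return rows


lists = {
    "sample_A": ["ACGT", "ACGT", "GGCC"],
    "sample_B": ["ACGT"],
}
for row in collapse_per_list(lists):
    print(row)
```

Here `ACGT` appears as one row for `sample_A` (count 2) and as a separate row for `sample_B` (count 1), because identical sequences from different lists are not merged into one entry.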

Viewing your results

The results will look similar to an Antibody Annotator result, but each row of the All Sequences Table will represent multiple sequences that have been "collapsed" together due to having 100% identity (or whatever percentage you set in the Main Options section). It is therefore more similar to a Cluster Table; see Understanding "Clusters".

NGS AA result overview.png

In the example output above, the most abundant heavy chain in the dataset (Heavy-1) is selected, and its sequence is shown in the Sequence Viewer below. % of Sequences is the percentage of the entire dataset made up by this sequence, and # Sequences is the exact number of sequences that were collapsed into it (497 in this example). See Exploring the tables produced by NGS and Single Clone Antibody Analysis for a description of all the columns produced.
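
The relationship between # Sequences and % of Sequences can be shown with a short Python sketch. Only the count of 497 for Heavy-1 comes from the example above; the other names and counts are invented so that the arithmetic is easy to follow, and the function name is hypothetical.

```python
# Conceptual sketch only: deriving a percentage column from collapsed counts.
# Heavy-2 and Heavy-3 and their counts are made-up illustration values.
def abundance_table(collapsed_counts):
    """Return (name, n_sequences, percent_of_sequences) rows, most abundant first."""
    total = sum(collapsed_counts.values())
    rows = [(name, n, 100.0 * n / total) for name, n in collapsed_counts.items()]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


counts = {"Heavy-2": 300, "Heavy-1": 497, "Heavy-3": 203}
for name, n, pct in abundance_table(counts):
    print(f"{name}\t{n}\t{pct:.1f}%")
```

With these illustrative totals, the percentage for each row is simply its collapsed count divided by the number of sequences in the whole dataset.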

Like any other Biologics Annotator Result document, you can also:

Filter your Sequences
View the Graphs for Quality Assurance and Graphs to interpret Clusters and Clonotypes
Perform Sequence Alignments
View the Clusters in your dataset
Add Assay Data to your Analysis Results
Compare Results across Multiple Experiments
Export Annotated Sequences and Sequence Tables