In this tutorial, you will learn how to group standalone sequences into a sequence list and pair heavy and light chains prior to sequence annotation.
This tutorial will cover the following exercises:
- Sequence Grouping
- Heavy and Light Chain Pairing
- Sequence Annotation
- Adding Assay Data
- Filtering and Sequence Alignment
Get started: To start this tutorial, you will need the input data. If you have recently started using Geneious Biologics, your organisation may already have the tutorial folders set up as described in the tutorial below. If not, you can still follow this tutorial by first downloading the input sequences here and then uploading them into Geneious Biologics.
The first few videos in our Getting Started series may also be helpful, linked here. Below is our video on Pre-processing Sanger Sequences.
Sequence Grouping
Sequence lists make it easier to manage large numbers of sequences by grouping related sequences into a single document.
In this exercise, you will learn how to group multiple nucleotide sequences into a Nucleotide Sequence List. To group the standalone clinical-stage antibody sequences into a sequence list, select all the sequences in the Input data folder and click Pre-processing > Group Sequences.
In the example above, the 274 standalone sequences in the Input data folder were grouped into a single Nucleotide Sequence List named Clinical-stage antibodies.
**Note that you can also group multiple Sequence Lists into a single Sequence List.
Heavy and Light Chain Pairing
Previous studies have shown that the interaction between the heavy and light chain variable regions may alter the positions of the hypervariable loops and, in turn, affect the conformation of the antigen binding site. The Pair Heavy/Light Chains operation pairs heavy and light chain sequences, allowing each pair to be analysed as a single molecule.
In this exercise, you will learn how to pair heavy and light chain sequences with similar names. Select the Clinical-stage antibodies Sequence List and click Pre-processing > Pair Heavy/Light Chains. Then, in the Pair Heavy/Light dialog box, select the Match sequence names (for standalone sequences, or within each list) (slow) option to pair sequences with similar names.
This operation will produce a Clinical-stage antibodies (paired) document. To view the paired sequences, select the Clinical-stage antibodies (paired) document in the Chain association folder.
In the example above, the sequences are paired by name (indicated by the paired-reads icon and the red outline encompassing the paired reads). Learn more about heavy and light chain pairing here.
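Conceptually, name-based pairing groups chains that share a base name. The sketch below only illustrates that idea. It assumes a hypothetical naming convention in which heavy and light chain names end in `_H` and `_L` (or use `-` or a space as the separator). It does not reproduce how the Match sequence names option in Geneious Biologics decides that two names are similar.

```python
import re
from collections import defaultdict

# Hypothetical convention: "<base>_H" / "<base>_L" ("-" or a space also accepted).
CHAIN_NAME = re.compile(r"^(?P<base>.+?)[_\-\s](?P<chain>[HL])$", re.IGNORECASE)


def pair_by_name(names):
    """Pair heavy and light chain names that share a base name.

    Returns (pairs, unpaired): pairs maps base name -> (heavy, light);
    unpaired lists names without a recognised suffix or without a partner.
    """
    chains_by_base = defaultdict(dict)
    unpaired = []
    for name in names:
        match = CHAIN_NAME.match(name)
        if match is None:
            unpaired.append(name)  # no _H/_L style suffix
            continue
        chain = match.group("chain").upper()
        # If a base name repeats a chain type, the later name replaces the earlier one
        chains_by_base[match.group("base")][chain] = name

    pairs = {}
    for base, chains in chains_by_base.items():
        if "H" in chains and "L" in chains:
            pairs[base] = (chains["H"], chains["L"])
        else:
            unpaired.extend(chains.values())  # heavy or light chain with no partner
    return pairs, unpaired


names = ["mAb1_H", "mAb1_L", "mAb2_L", "mAb2_H", "Orphan_H"]  # made-up names
pairs, unpaired = pair_by_name(names)
print(pairs)     # {'mAb1': ('mAb1_H', 'mAb1_L'), 'mAb2': ('mAb2_H', 'mAb2_L')}
print(unpaired)  # ['Orphan_H']
```

Chains without a partner are reported rather than silently dropped, which makes naming problems easy to spot before annotation.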
Sequence Annotation
The Antibody Annotator is a versatile pipeline that identifies and annotates both NGS-type and Sanger-type sequences against an immunoglobulin reference database.
In this exercise, you will learn how to annotate paired heavy and light chain sequences. Select the Clinical-stage antibodies (paired) document in the Chain association folder and click Annotation > Antibody Annotator.
Select the following options in the Antibody Annotator dialog box (see sections and image below).
Set the reference database and input type:
- Reference database: Human Ig 2022
- Selected sequences are: Both chains in associated sequences
Then enable the following annotation options:
- Annotate Numbers (IMGT)
- Annotate germline gene differences
- Find liabilities and assets
Leave all other settings as default, and click Run.
This operation will produce a Clinical-stage antibodies (paired) Annotated & Clustered Biologics Annotator Result document.
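The Find liabilities and assets option flags sequence motifs that can cause developability problems. This tutorial does not list the motifs that Geneious Biologics checks. The sketch below scans a protein sequence for three commonly cited motifs: N-linked glycosylation sequons, NG deamidation sites and DG isomerization sites. This motif set is a hypothetical, illustrative subset and not the annotator's actual rule set.

```python
import re

# Hypothetical, illustrative subset of commonly cited liability motifs;
# not the motif list used by Geneious Biologics.
LIABILITY_MOTIFS = {
    "N-linked glycosylation": r"N[^P][ST]",  # N-X-S/T sequon, X is not proline
    "Deamidation": r"NG",
    "Isomerization": r"DG",
}


def find_liabilities(protein_seq, motifs=LIABILITY_MOTIFS):
    """Return (motif_label, 1-based position, matched text) for every hit."""
    hits = []
    for label, pattern in motifs.items():
        # A zero-width lookahead also reports overlapping matches
        for match in re.finditer(f"(?=({pattern}))", protein_seq.upper()):
            hits.append((label, match.start() + 1, match.group(1)))
    # Sort by position; hits at the same position keep the motif order above
    return sorted(hits, key=lambda hit: hit[1])


print(find_liabilities("ANGSDGK"))
# [('N-linked glycosylation', 2, 'NGS'), ('Deamidation', 2, 'NG'), ('Isomerization', 5, 'DG')]
```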
Add Assay Data
Below is our introductory video on adding Assay Data to your annotated sequences:
Data from antibody assays can be appended to Biologics Annotator Result documents. Combined with the sequence annotations, liability scores and germline differences produced by the Antibody Annotator, assay data can aid antibody candidate selection. The resulting document can also serve as an antibody registry for future analysis.
In this exercise, you will learn how to add assay data to a Biologics Annotator Result document. First, you will need an Excel or CSV file containing your functional data. You can download one for the tutorial here. Then, open the Clinical-stage monoclonal antibodies (paired) Annotated & Clustered document and click Add Metadata > Add Assay Data.
Follow the prompts to upload the assay data file. To match rows on the Name column, select Name from the Matching column in the document dropdown in the Add Assay Data dialog box, then click Add Assay Data to append the assay data columns (see image below).
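Matching on Name works like a left join. Each sequence row gains the assay columns from the spreadsheet row that has the same Name, and rows with no match are left unchanged. The sketch below assumes the table rows are plain Python dictionaries. It also prefixes the added columns with a hypothetical `assay:` label; Geneious Biologics applies its own column naming.

```python
import csv
import io


def add_assay_data(sequence_rows, assay_csv_text, match_column="Name"):
    """Left-join CSV assay rows onto sequence rows using a shared column.

    sequence_rows: list of dicts, one per sequence.
    assay_csv_text: CSV text whose header includes match_column.
    """
    reader = csv.DictReader(io.StringIO(assay_csv_text))
    if match_column not in (reader.fieldnames or []):
        raise ValueError(f"assay file has no {match_column!r} column")

    assay_by_key = {}
    for row in reader:
        key = row.pop(match_column)
        assay_by_key[key] = row  # a duplicated key keeps the last assay row

    merged = []
    for seq in sequence_rows:
        new_row = dict(seq)  # copy so the input rows are not modified
        assay = assay_by_key.get(seq.get(match_column))
        if assay is not None:
            for column, value in assay.items():
                new_row[f"assay:{column}"] = value  # hypothetical prefix
        merged.append(new_row)
    return merged


assay_csv = "Name,ELISA,Tm\nmAb1,1.8,70.5\nmAb2,1.2,65.0\n"  # made-up values
sequences = [{"Name": "mAb1", "Score": -150}, {"Name": "mAb3", "Score": -300}]
for row in add_assay_data(sequences, assay_csv):
    print(row)
```

In this example, mAb1 gains `assay:ELISA` and `assay:Tm`. mAb3 has no assay row, so it keeps only its original columns. Values read from a CSV file are strings, so convert them before applying numeric filters.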
To view the added assay data, select the Clinical-stage monoclonal antibodies (paired) Annotated & Clustered (with assay data) document in the Add assay data folder. Then scroll to the right end of the Sequences Table, or use the Table Preferences panel to find the column of interest. You can then hover over the column and click the Focus column button that appears to jump straight to that column.
**Note that adding assay data does not create a new document, because the assay data columns are appended to the existing Biologics Annotator Result document. For this tutorial, an identical Biologics Annotator Result was generated, and the assay data was appended to the Clinical-stage monoclonal antibodies (paired) Annotated & Clustered (with assay data) document in the Add assay data folder. See our article on adding external assay data to learn more.
Filtering and Sequence Alignment
Below is our introductory video on aligning your annotated sequences:
Multiple sequence alignment is a comparison of multiple related DNA or amino acid sequences to identify regions of similarity and dissimilarity that may be a consequence of evolutionary relationships between the sequences. Alignments can be used for many purposes, including inferring the relatedness between sequences.
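MAFFT, the aligner used in this exercise, is a much more sophisticated multiple aligner. The sketch below swaps in a simpler technique to show the core idea of lining up similar residues and inserting gaps: a textbook Needleman-Wunsch global alignment of two sequences. Its match, mismatch and gap scores are arbitrary choices for the example. Real protein aligners use substitution matrices such as BLOSUM and affine gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align sequences a and b; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair_score(x, y):
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


print(needleman_wunsch("ARDGY", "ARDY"))  # ('ARDGY', 'ARD-Y', 2)
```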
In this exercise, you will learn how to filter and align sequences that meet a set of conditions. To align only the sequences with a Score of ≥ -200 and an ELISA affinity value of ≥ 1.5, first filter the sequences using the following filter syntax:
['Score'] >= -200 AND ['ASSAY_DATA_Biophysical_Assays:ELISA'] >= 1.5
After entering the filter syntax, click Filter or press Enter. This should return 20 paired heavy and light chain sequences. Next, select all of the paired sequences that meet the filter criteria and click Post-processing > Align (see image below).
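The filter expression keeps a row only if both conditions hold. The minimal sketch below makes two assumptions. First, each table row is a dictionary keyed by the same field names used in the filter. Second, rows missing either value are excluded; Geneious Biologics may treat blank cells differently.

```python
SCORE_FIELD = "Score"
ELISA_FIELD = "ASSAY_DATA_Biophysical_Assays:ELISA"


def passes_filter(row, min_score=-200.0, min_elisa=1.5):
    """Mirror of: ['Score'] >= -200 AND ['ASSAY_DATA_Biophysical_Assays:ELISA'] >= 1.5"""
    score, elisa = row.get(SCORE_FIELD), row.get(ELISA_FIELD)
    if score in (None, "") or elisa in (None, ""):
        return False  # assumption: rows with a missing value are excluded
    return float(score) >= min_score and float(elisa) >= min_elisa


rows = [  # made-up rows, not tutorial data
    {"Name": "mAb1", SCORE_FIELD: -150, ELISA_FIELD: 1.5},  # both thresholds met
    {"Name": "mAb2", SCORE_FIELD: -250, ELISA_FIELD: 2.0},  # score too low
    {"Name": "mAb3", SCORE_FIELD: -100, ELISA_FIELD: 1.2},  # ELISA too low
    {"Name": "mAb4", SCORE_FIELD: -100},                    # no assay value
]
selected = [row["Name"] for row in rows if passes_filter(row)]
print(selected)  # ['mAb1']
```

Both comparisons use ≥, so a row sitting exactly on a threshold (such as mAb1's ELISA value of 1.5) is kept.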
Select the following options in the Alignment dialog box to align the Heavy CDR3 amino acid sequences and click Run to start the analysis (see section and image below).
Regions to Align
- Align regions: Heavy CDR3 - 20 sequences
Translate Nucleotide Sequence(s) Prior to Alignment
- Alignment algorithm: MAFFT
- Build tree from alignment with: Geneious Tree Builder Algorithm
This alignment operation will produce a new document with an alignment tree of the Heavy CDR3 protein sequences generated with the Neighbour-Joining method.
Next, you will learn how to view the alignment together with the assay data and sort the sequences by assay values. First, to view the alignment, select the alignment tree document of the 20 protein sequences, located in the parent folder. Then, in the Sequence Viewer panel, select the following options to sort the sequences by descending liability score and ELISA affinity value (see image below).
The alignment and sequence logos show relatively high conservation of amino acid residues A1, R2, F18, D21 and Y22 in the Heavy CDR3 region (Figure 2.1). Additionally, the pairwise identity and Wu-Kabat variability graphs in the alignment show high conservation at both ends of the Heavy CDR3 (Figure 2.1A).
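Both conservation measures can be computed one alignment column at a time. The sketch below uses the standard Wu-Kabat definition: the number of distinct residues divided by the frequency of the most common residue. A fully conserved column therefore scores 1. Pairwise identity is taken here as the fraction of sequence pairs that share a residue in the column. Ignoring gap characters is an assumption, since Geneious may handle gaps differently. The four-sequence alignment is made up for illustration.

```python
from collections import Counter


def alignment_columns(aligned_seqs):
    """Return each column of an alignment as a string (rows must be equal length)."""
    if len({len(s) for s in aligned_seqs}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    return ["".join(col) for col in zip(*aligned_seqs)]


def wu_kabat_variability(column, gap="-"):
    """Distinct residues / frequency of the most common residue (1 = fully conserved)."""
    residues = [aa for aa in column if aa != gap]  # assumption: gaps are ignored
    if not residues:
        return None
    counts = Counter(residues)
    most_common_count = counts.most_common(1)[0][1]
    return len(counts) * len(residues) / most_common_count


def pairwise_identity(column, gap="-"):
    """Fraction of residue pairs in the column that are identical."""
    residues = [aa for aa in column if aa != gap]
    n = len(residues)
    if n < 2:
        return None
    identical_pairs = sum(c * (c - 1) // 2 for c in Counter(residues).values())
    return identical_pairs / (n * (n - 1) // 2)


toy_alignment = ["ARDY", "ARGY", "ARDY", "AKSY"]  # made-up, not tutorial data
for position, col in enumerate(alignment_columns(toy_alignment), start=1):
    print(position, col, wu_kabat_variability(col), pairwise_identity(col))
```

The conserved first and last columns score a variability of 1.0 and an identity of 1.0. This mirrors the pattern described above for the ends of the Heavy CDR3.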
Figure 2.1 | Conservation of amino acid residues at both ends of the Heavy CDR3 region. A) Heavy CDR3 amino acid sequence alignment sorted by descending liability score and ELISA affinity. B) Sequence logo showing the frequency of each amino acid per position. C) Sequence logo showing the entropy of amino acids per position.
**Note that the Neighbour-Joining (NJ) tree will dissolve upon sorting; to view the NJ tree again, remove the sorting field. Read more about sequence alignment and sequence-associated metadata here.
To access the Sequence Logo graphs shown above (Figure 2.1B, 2.1C), click on the Sequence Logo tab next to the Sequence Viewer tab:
To the right of the Sequence Logo, you can choose to plot amino acids by Frequency or Entropy, and colour them using a variety of schemes, including Hydrophobicity and Polarity.
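The Frequency and Entropy views summarise each alignment column differently. The sketch below computes per-column residue frequencies and Shannon entropy in bits. It also computes information content, log2(20) minus the entropy, which is the stack height in a classic protein sequence logo without small-sample correction. Whether the Entropy view in Geneious uses exactly this formula is an assumption. Gaps are ignored here as a further assumption.

```python
import math
from collections import Counter


def column_frequencies(column, gap="-"):
    """Relative frequency of each residue in one alignment column (gaps ignored)."""
    residues = [aa for aa in column if aa != gap]
    if not residues:
        return {}
    return {aa: count / len(residues) for aa, count in Counter(residues).items()}


def shannon_entropy(freqs):
    """Shannon entropy in bits: 0 for a fully conserved column."""
    return -sum(p * math.log2(p) for p in freqs.values() if p > 0)


def information_content(freqs, alphabet_size=20):
    """log2(alphabet size) minus entropy: the stack height in a classic logo."""
    return math.log2(alphabet_size) - shannon_entropy(freqs)


for column in ["AAAA", "ADDA", "DGDS"]:  # made-up alignment columns
    freqs = column_frequencies(column)
    print(column, freqs, round(shannon_entropy(freqs), 3), round(information_content(freqs), 3))
```

A fully conserved column has zero entropy and the maximum information content of about 4.32 bits. This is why conserved positions appear as tall, single-residue stacks in the logo.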
*The coloured amino acids in the alignment agree with the consensus sequence, while the grey amino acids differ from it.
***Tutorial Reference - Biophysical properties of the clinical-stage antibody landscape, PNAS January 31, 2017 114 (5) 944-949; first published January 17, 2017. https://doi.org/10.1073/pnas.1616408114