What is a clonotype?
Generally, clonotypic antibodies are defined as clusters of antibody sequences that derive from the same V and J genes and share 80-100% sequence identity. They are assumed to originate from clonally related B cells, although similar sequences can also arise by convergence in distinct clonal lineages. Clonotyping can be used to monitor clonal expansion following antigen exposure and to compare antibody repertoires within and between subjects. "Public sequences" are clonotypes that occur in more than two individuals as a result of convergent evolution of antibody sequences.
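As an illustration of the "public sequence" definition above, the following minimal Python sketch counts how many distinct subjects each clonotype appears in. It is not part of Biologics: the function name `public_clonotypes`, the `(subject_id, clonotype_id)` input format and the `min_subjects` parameter are all hypothetical, and the default of 3 simply encodes "more than two individuals".

```python
from collections import defaultdict


def public_clonotypes(observations, min_subjects=3):
    """Return clonotypes seen in at least `min_subjects` distinct subjects.

    observations: iterable of (subject_id, clonotype_id) pairs
    (a hypothetical input format chosen for this sketch).
    """
    subjects_per_clonotype = defaultdict(set)
    for subject_id, clonotype_id in observations:
        # A set ignores repeat observations from the same subject.
        subjects_per_clonotype[clonotype_id].add(subject_id)
    return {
        clonotype: subjects
        for clonotype, subjects in subjects_per_clonotype.items()
        if len(subjects) >= min_subjects
    }
```

Lowering `min_subjects` gives a looser definition of "public" if your study uses one.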
How do I find clonotypes in my dataset?
In Biologics, the "clustering" function is used to find clonotypes. Clustering groups your sequences by their shared identity across the selected regions or, in the case of genes, by whether they have the same "best-fit" germline gene. For more on what clustering is and what other insights you can glean from clusters in your data, see this article.
Clustering can be specified when using Antibody Annotator, under the Clustering Options box. Shown below are the default clusters that Antibody Annotator will find in your dataset.
Different clusters can be specified by clicking the blue "Plus" icon in the Clustering Options box and then selecting the Advanced tab, as shown below.
To select multiple regions and/or genes, hold down the Command or Control key on your keyboard while selecting. To specify a cluster that will identify the clonotypes in your dataset, select the Heavy V gene, the Heavy J gene and the Heavy CDR3 region. Choose the following settings:
Cluster Method: Identity
Similarity Threshold: 85%
Allow Mismatches In: Heavy CDR3
This will allow for some sequence dissimilarity (up to 15%) in the Heavy CDR3 region to account for somatic hypermutation during clonal expansion. You can lower the threshold to capture more divergent clonotypes, or raise it to define clonotypes more strictly, as needed.
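The logic behind these settings can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the algorithm Biologics uses: the record keys (`v_gene`, `j_gene`, `cdr3`), the function names and the greedy "compare to the first member" strategy are all hypothetical, and CDR3 sequences of different lengths are simply treated as non-matching.

```python
from collections import defaultdict


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b) or not a:
        return 0.0  # assumption: different lengths never match
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_clonotypes(records, threshold=0.85):
    """Group records into clonotype-like clusters.

    records: iterable of dicts with keys "id", "v_gene", "j_gene", "cdr3".
    V and J genes must match exactly; mismatches are tolerated only in
    the CDR3, as long as identity stays at or above `threshold`.
    """
    # Step 1: exact match on the V gene and J gene.
    by_genes = defaultdict(list)
    for rec in records:
        by_genes[(rec["v_gene"], rec["j_gene"])].append(rec)

    clusters = []
    for group in by_genes.values():
        # Step 2: greedy clustering on CDR3 identity within each V/J group.
        # Each cluster is represented by its first member.
        group_clusters = []
        for rec in group:
            for cluster in group_clusters:
                if identity(rec["cdr3"], cluster[0]["cdr3"]) >= threshold:
                    cluster.append(rec)
                    break
            else:
                group_clusters.append([rec])  # no match: start a new cluster
        clusters.extend(group_clusters)
    return clusters
```

With a 16-residue CDR3, a threshold of 0.85 tolerates up to two mismatches (14/16 = 0.875) but not three (13/16 = 0.8125).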
If you would like to learn more about how to specify different clusters, how to cluster using more advanced techniques (like amino acid similarity rather than strict identity) and how these clustering algorithms work, see the Advanced Clustering Options article.
Why would you cluster your sequences?
Reviewing how sequences cluster early in your sequence processing pipeline provides information on the diversity of sequences within your dataset, indicates what sequences may be dominating your dataset, and reduces redundancy of identical sequences within your dataset.
Clustering patterns can also be used to assess the general quality of the dataset and make judgements on whether to proceed with further analyses.
How can I further analyse my clonotypes?
The clustering function in Antibody Annotator produces many additional graphs for interpreting your data. See this article for how to find and view these graphs in your Antibody Annotator result. Examples of graphs that may be useful include:
- The relative proportion of sequences in your dataset that represent each clonotype (which clonotypes are most represented in your data)
- Heatmaps showing the frequency of gene combinations between the various Heavy V genes and Heavy J genes
- Heavy CDR3 length distribution
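If you export your annotated sequences, the numbers behind these three graphs can be approximated with a short Python sketch. The record keys (`clonotype`, `v_gene`, `j_gene`, `cdr3`) and the function name `summarize_repertoire` are hypothetical and do not correspond to a Biologics export format; the graphs produced by Antibody Annotator may be computed differently.

```python
from collections import Counter


def summarize_repertoire(records):
    """Compute simple repertoire summaries from annotated records.

    records: list of dicts with keys "clonotype", "v_gene", "j_gene", "cdr3".
    Returns (proportions, vj_counts, cdr3_lengths).
    """
    total = len(records)

    # Relative proportion of sequences per clonotype, largest first.
    clonotype_counts = Counter(r["clonotype"] for r in records)
    proportions = {c: n / total for c, n in clonotype_counts.most_common()}

    # Counts of each Heavy V / Heavy J gene pairing (the heatmap cells).
    vj_counts = Counter((r["v_gene"], r["j_gene"]) for r in records)

    # Number of sequences at each Heavy CDR3 length.
    cdr3_lengths = Counter(len(r["cdr3"]) for r in records)

    return proportions, vj_counts, cdr3_lengths
```

The returned counters can be passed to any plotting library to draw a bar chart, heatmap or histogram.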
Alignment across the Heavy CDR3 region
To explore the variation across the Heavy CDR3 region you can perform an Alignment. This will produce an alignment tree like the one below. To learn more about alignments and how to perform them in Biologics, see this article.
Marks & Deane, "How repertoire data are changing antibody science", Journal of Biological Chemistry, V.295, Issue 29, 2020, p9823-9837, https://doi.org/10.1074/jbc.REV120.010181.
Hershberg & Luning Prak, "The analysis of clonal expansions in normal and autoimmune B cell repertoires", Philosophical Transactions of the Royal Society B, V.370, 2015, 20140239.