Understanding Clonotypes and how to find them in your data

Updated: January 24, 2025 03:20

This article describes what a clonotype is, and how to find clonotypes in your dataset using the clustering function in Antibody Annotator. To find out more about clustering, see this article.

Jump to:

What is a Clonotype?
How do I find Clonotypes in my Dataset?
Why would you cluster your sequences?
How can I further analyze my Clonotypes?
Graphical Analyses
Alignment across the Heavy CDR3 region

What is a Clonotype?

Generally, clonotypic antibodies are defined as antibody sequences that derive from the same V and J genes and share 80-100% sequence identity. These antibodies are assumed to originate from clonally related B cells, although similar sequences can also arise through convergence in distinct clonal lineages. Clonotyping can be used to monitor clonal expansion following antigen exposure, and to compare antibody repertoires within and between subjects.
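As a rough illustration of this definition, the following Python sketch checks whether two sequences would count as the same clonotype. It is not how Antibody Annotator works internally. The `Antibody` record, its field names, the example sequences and the 0.8 default are all hypothetical, and the sketch assumes identity is measured position by position over equal-length Heavy CDR3 sequences (the region used in the settings later in this article).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Antibody:
    # Hypothetical record; field names are illustrative only.
    v_gene: str
    j_gene: str
    cdr3: str  # Heavy CDR3 amino acid sequence


def identity(a: str, b: str) -> float:
    """Fraction of matching positions; 0.0 if the lengths differ."""
    if len(a) != len(b) or not a:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def same_clonotype(a: Antibody, b: Antibody, threshold: float = 0.8) -> bool:
    """True if both share V and J genes and CDR3 identity >= threshold."""
    return (
        a.v_gene == b.v_gene
        and a.j_gene == b.j_gene
        and identity(a.cdr3, b.cdr3) >= threshold
    )


ref = Antibody("IGHV3-23", "IGHJ4", "ARDYYGSSYFDY")
variant = Antibody("IGHV3-23", "IGHJ4", "ARDYYGSGYFDY")  # one mismatch in 12
other_v = Antibody("IGHV1-69", "IGHJ4", "ARDYYGSSYFDY")  # different V gene

print(same_clonotype(ref, variant))  # True
print(same_clonotype(ref, other_v))  # False
```

The second comparison fails even though the CDR3 sequences are identical, because the V genes differ.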

“Public sequences” refer to clonotypes that occur in multiple individuals (commonly, two or more), a result of convergent evolution of antibody sequences to recognize a common antigen.
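Once each sequence carries a clonotype label, finding public clonotypes comes down to counting distinct subjects per clonotype. The sketch below assumes you have exported (subject, clonotype) pairs. The data, the function name and the `min_subjects` parameter are hypothetical; set `min_subjects` to match the definition of "public" you are working with.

```python
from collections import defaultdict

# Hypothetical (subject, clonotype label) pairs, e.g. from cluster assignments.
observations = [
    ("subject_A", "clonotype_1"),
    ("subject_A", "clonotype_2"),
    ("subject_B", "clonotype_1"),
    ("subject_C", "clonotype_1"),
    ("subject_C", "clonotype_3"),
]


def public_clonotypes(pairs, min_subjects=2):
    """Return clonotypes seen in at least `min_subjects` distinct subjects."""
    subjects_by_clonotype = defaultdict(set)
    for subject, clonotype in pairs:
        subjects_by_clonotype[clonotype].add(subject)
    return {
        clonotype: sorted(subjects)
        for clonotype, subjects in subjects_by_clonotype.items()
        if len(subjects) >= min_subjects
    }


print(public_clonotypes(observations))
# {'clonotype_1': ['subject_A', 'subject_B', 'subject_C']}
```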

How do I find Clonotypes in my dataset?

In Biologics, the "clustering" function is used to find clonotypes. Clustering groups your sequences by sequence identity or amino acid similarity across the selected regions. You can also cluster on genes, which groups sequences that have the same "best-fit" germline gene. For more on what clustering is and what other insights you can glean from clusters in your data, see this article.

Clustering can be specified when running Antibody Annotator, in the Clustering Options box. Shown below are the default clusters that Antibody Annotator will find in your dataset.

[Screenshot: default clusters in the Clustering Options box]

Different clusters can be specified by clicking the blue "Plus" icon in the Clustering Options box and then selecting the Advanced tab, as shown below.

[Screenshot: Advanced tab of the Clustering Options box]

To select multiple regions and/or genes, hold down the Command or Control key on your keyboard while selecting. To specify a cluster that will identify the clonotypes in your dataset, select the Heavy V Gene, the Heavy J Gene and the Heavy CDR3 region. Choose the following settings:

Cluster Method: Identity

Similarity Threshold: 85%

Allow Mismatches In: Heavy CDR3

[Screenshot: clonotype cluster settings in the Advanced tab]

This allows up to 15% sequence dissimilarity in the Heavy CDR3 region, to account for somatic hypermutation during clonal expansion. You can lower the threshold to capture more divergent clonotypes, or raise it for a stricter definition.
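To get a feel for what an 85% threshold means in practice, the Python sketch below converts the threshold into a number of permitted mismatches for a given CDR3 length, then applies it in a simple greedy grouping. This is an illustrative approximation only and not the algorithm used by Antibody Annotator. All function names and example sequences are hypothetical, and it assumes that only equal-length CDR3 sequences are compared, position by position, without alignment.

```python
def max_mismatches(cdr3_length: int, threshold_percent: int = 85) -> int:
    """Largest mismatch count that keeps identity at or above the threshold."""
    return cdr3_length * (100 - threshold_percent) // 100


def within_threshold(a: str, b: str, threshold_percent: int = 85) -> bool:
    """Compare two equal-length CDR3s position by position (no alignment)."""
    if len(a) != len(b):
        return False  # assumption: different lengths never cluster together
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches <= max_mismatches(len(a), threshold_percent)


def cluster_clonotypes(records, threshold_percent: int = 85):
    """Greedy sketch: each record joins the first cluster with the same V and J
    genes whose first member (the representative) is within the threshold."""
    clusters = []  # each entry: (v_gene, j_gene, [cdr3, ...])
    for v_gene, j_gene, cdr3 in records:
        for c_v, c_j, members in clusters:
            if (c_v, c_j) == (v_gene, j_gene) and within_threshold(
                members[0], cdr3, threshold_percent
            ):
                members.append(cdr3)
                break
        else:
            clusters.append((v_gene, j_gene, [cdr3]))
    return [members for _, _, members in clusters]


# Hypothetical (Heavy V gene, Heavy J gene, Heavy CDR3) records.
records = [
    ("IGHV3-23", "IGHJ4", "ARDYYGSSYFDY"),
    ("IGHV3-23", "IGHJ4", "ARDYYGSGYFDY"),  # 1 mismatch vs. the first record
    ("IGHV3-23", "IGHJ4", "ARDWYGSGYFDY"),  # 2 mismatches vs. the first record
    ("IGHV1-69", "IGHJ4", "ARDYYGSSYFDY"),  # different V gene
]

print(max_mismatches(12))  # 1
print(max_mismatches(20))  # 3
print(cluster_clonotypes(records))
# [['ARDYYGSSYFDY', 'ARDYYGSGYFDY'], ['ARDWYGSGYFDY'], ['ARDYYGSSYFDY']]
```

For a 12-residue CDR3, 85% identity permits a single mismatch, so the third record starts its own cluster. A different grouping strategy (for example, linking each sequence to any existing member rather than to one representative) could merge it with the first cluster, which is why results can vary between clustering methods.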

If you would like to learn more about how to specify different clusters, how to cluster using more advanced techniques (like amino acid similarity rather than strict identity) and how these clustering algorithms work, see the Advanced Clustering Options article.

Why would you cluster your sequences?

Reviewing how sequences cluster early in your sequence processing pipeline tells you how diverse the sequences in your dataset are. It also shows which sequences (if any) dominate your dataset, and reduces redundancy by grouping identical sequences together.
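If you export cluster assignments, a quick summary of cluster sizes can show whether a few clusters dominate. The following Python sketch assumes a plain list with one cluster label per sequence; the labels and the function name are hypothetical.

```python
from collections import Counter

# Hypothetical cluster label for each sequence in a dataset.
cluster_labels = ["c1"] * 5 + ["c2"] * 3 + ["c3", "c4"]


def summarize_clusters(labels):
    """Return cluster count, the largest cluster's share, and singleton count."""
    sizes = Counter(labels)
    total = sum(sizes.values())
    dominant_label, dominant_size = sizes.most_common(1)[0]
    return {
        "n_sequences": total,
        "n_clusters": len(sizes),
        "dominant_cluster": dominant_label,
        "dominant_fraction": dominant_size / total,
        "n_singletons": sum(1 for size in sizes.values() if size == 1),
    }


summary = summarize_clusters(cluster_labels)
print(summary)
```

Here ten sequences collapse into four clusters, the largest of which holds half of the dataset. Many singleton clusters suggest a diverse dataset, while one very large cluster suggests a dominant sequence family.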

Clustering patterns can also be used to assess the general quality of the dataset and make judgements on whether to proceed with further analyses.

How can I further analyze my Clonotypes?

Graphical analyses

The clustering function in Antibody Annotator produces many additional graphs for interpreting your data. See this article for how to find and view these graphs in your Antibody Annotator result. Examples of graphs that may be useful include:

Network and Tree plots, which show relationships and similarity between sequences. See Network and Tree plots: Identifying clonotype and sequence relationships for more information
The relative proportion of sequences in your dataset that represent each clonotype (i.e. which clonotypes are most represented in your data)
Heatmaps showing the frequency of gene combinations between the various Heavy V genes and Heavy J genes
Heavy CDR3 length distribution
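The numbers behind the gene-combination heatmap and the CDR3 length distribution are simple tallies. As a minimal sketch of what those graphs summarize (not how Antibody Annotator computes them), the Python below counts Heavy V/J gene pairs and Heavy CDR3 lengths from hypothetical records.

```python
from collections import Counter

# Hypothetical (Heavy V gene, Heavy J gene, Heavy CDR3) records.
records = [
    ("IGHV3-23", "IGHJ4", "ARDYYGSSYFDY"),
    ("IGHV3-23", "IGHJ4", "ARDYYGSGYFDY"),
    ("IGHV1-69", "IGHJ6", "ARGGYYYYGMDV"),
    ("IGHV3-23", "IGHJ6", "AKDRGYSYGYYYYGMDV"),
]

# Counts per (V gene, J gene) pair: the values a V/J heatmap would display.
vj_counts = Counter((v_gene, j_gene) for v_gene, j_gene, _ in records)

# Counts per CDR3 length: the values a length distribution would display.
cdr3_length_counts = Counter(len(cdr3) for _, _, cdr3 in records)

for (v_gene, j_gene), count in sorted(vj_counts.items()):
    print(f"{v_gene} / {j_gene}: {count}")

for length, count in sorted(cdr3_length_counts.items()):
    print(f"CDR3 length {length}: {count}")
```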

Alignment across the Heavy CDR3 region

To explore the variation across the Heavy CDR3 region you can perform an Alignment. This will produce an alignment tree like the one below. To learn more about alignments and how to perform them in Biologics, see this article.

References

Marks & Deane, "How repertoire data are changing antibody science", Journal of Biological Chemistry, V.295, Issue 29, 2020, p9823-9837, https://doi.org/10.1074/jbc.REV120.010181.

Hershberg & Luning Prak, "The analysis of clonal expansions in normal and autoimmune B cell repertoires", Philosophical Transactions of the Royal Society B, V.370, 2015, 20140239, http://doi.org/10.1098/rstb.2014.0239.