In this article:
- How to Filter
- Filter Syntax
- Notes on SQL syntax for Advanced users
- Examples of simple searches
- Examples of advanced searches
- Examples of multiple search queries
- Filtering using Scripts
How to Filter
To filter your results, you can either right click on a cell with the values you want to filter on. In the example below, right clicking and selecting the suggested Filter (['Fully Annotated'] = 'Yes') will append AND ['Fully Annotated'] = 'Yes' to the filter resulting in ['Chain'] = 'Heavy' AND ['Fully Annotated'] = 'Yes' query syntax.
Note that this right click to build a filter syntax function automatically adds the AND operator but this operator can be easily changed to other operators.
You can also manually specify a query with conditions in the form:
comparative operator
You can also combine multiple conditions using logical operators. Once these values are entered in the Filter bar in the NGS results table, hit Enter or click Filter.
Note that:
- SQL filter query is case sensitive and single quotes are only used for column type STRING.
- To combine multiple queries for a single search, you can use 'AND' and 'OR'.
- It is not possible to test for NULL values with comparison operators, such as =, <,><></></,>
Filter Syntax
The Biologics Annotator Result table and Comparison table can be filtered using SQL search queries. The following outlines the filtering language, and some examples of how to use it to build your search query.
Filtering can be performed by specifying a valid SQL condition using column headers as the fields to filter on. Comparative operators define how query conditions will be compared and are dependent on the column type. While most columns in the NGS results table are of type STRING, columns may also be of type NUMBER.
The SQL filter syntax supports the following STRING comparative operators:
- =
- !=
- LIKE (search for a pattern - regex)
- NOT LIKE (search for a pattern - regex)
Note that '=' and '!=' will only work for exact character queries while 'LIKE' and 'NOT LIKE' will work with ambiguous character queries.
There are two wildcards used in conjunction with the LIKE operator:
- % (zero, one, or multiple unspecified characters)
- _ (a single unspecified character)
The SQL filter syntax supports the following NUMBER comparative operators:
- (more than)
- <(less></(less>
- = (equal to)
- <=></=>
- = (more than or equal to)
- != (does not equal)
The SQL filter syntax supports the following logical operators:
- AND
- OR
- BETWEEN
- IN
- IS NULL
- IS NOT NULL
Mathematical operations are supported using a script. See the Advanced searches section for an example.
Note that NULL is a special value that signifies 'no value'. Both the 'IS NULL' and 'IS NOT NULL' operators are not followed by a value.
Notes on SQL syntax for Advanced users
This section explains the SQL syntax used in Geneious Biologics. If you are unfamiliar with SQL, you can skip this section and proceed to the next sections to learn more about filtering followed some examples of SQL searches.
Below is an example of a SQL syntax used in Geneious Biologics for filtering:
SELECT column1, column2,
FROM my_table
WHERE
To filter your results, you will only need to input the WHERE clause which is indicated by . In SQL, the WHERE clause is used to filter results and the WHERE clause is used to extract only those results that meet a specified condition, therefore, to use the SQL filter, you will need to specify both ‘WHERE’ and ‘condition’ in the presence of a comparative operator.
Examples of simple searches
To search for sequences that meet the condition of having a heavy CDR3 sequence which contains the motif AAQTRPRPYGTSRAEGDYGY, use an SQL filter query in the form:
['Heavy CDR3'] = 'AAQTRPRPYGTSRAEGDYGY'
In this example, Heavy CDR3 is the field we are filtering on, = is the operator, and AAQTRPRPYGTSRAEGDYGY is the value.
To search for sequences that meet the condition of having a score of more than -1000, use an SQL filter query in the form:
['Score'] = -1000
In this example, Score is the field we are filtering on, = is the operator, and -1000 is the value.
To search for sequences that meet the condition of having a heavy CDR3 length between 10 and 12 amino acids long, use an SQL filter query in the form:
['Heavy CDR3 Length'] BETWEEN 10 AND 12
In this example, Heavy CDR3 Length is the field we are filtering on, BETWEEN and AND are the operators, and 10 and 12 are the values.
To search for sequences that meet the condition of having a heavy CDR3 sequence, use an SQL filter query in the form:
['Heavy CDR3'] IS NOT NULL
In this example, Heavy CDR3 is the field we are filtering on, IS NOT NULL is the operator.
Examples of advanced searches
To search for sequences that meet the condition of having a heavy FR4 sequence which contains an amino acid motif that starts with “W”, use an SQL filter query in the form:
['Heavy FR4'] LIKE 'W%'
In this example, Heavy FR4 is the field we are filtering on, LIKE is the operator, and W% is the value.
To search for sequences that meet the condition of having a heavy CDR3 sequence which contains an amino acid motif that has a "Y{any amino acid residue}Y” motif, use an SQL filter query in the form:
['Heavy CDR3'] LIKE '%Y_Y%'
In this example, Heavy CDR3 is the field we are filtering on, LIKE is the operator, and %Y_Y% is the value.
To search for sequences that meet the condition of having a heavy FR4 sequence which contains an amino acid motif that starts with “W” and ends with “S”, use an SQL filter query in the form:
['Heavy FR4'] LIKE 'W%S'
In this example, Heavy FR4 is the field we are filtering on, LIKE is the operator, and W%S is the value.
To search for sequences that meet the condition of having a heavy FR4 sequence which contains an amino acid motif that has a “TLVTV” motif in any position of the heavy FR4 sequence, use an SQL filter query in the form:
['Heavy FR4'] LIKE '%TLVTV%'
In this example, Heavy FR4 is the field we are filtering on, LIKE is the operator, and %TLVTV% is the value.
To search for sequences that meet the condition of not having a stop codon (*) in the heavy CDR3 sequence, use an SQL filter query in the form:
['Heavy CDR3'] NOT LIKE '%\\*%'
In this example, Heavy CDR3 is the field we are filtering on, NOT LIKE is the operator, and %\\*% is the value.
Examples of multiple search queries
To search for sequences that meet the conditions of having both a heavy chain and is fully annotated, use an SQL filter query in the form:
['Chain'] = 'Heavy' AND ['Fully Annotated'] = 'Yes'
In this example, Chain and Fully Annotated are the fields we are filtering on, = and AND are the operators, and Heavy and Yes are the values.
To search for sequences that meet the conditions of having either both chains or a heavy chain, use an SQL filter query in the form:
['Chain'] = 'Heavy' OR ['Chain'] = 'Both'
In this example, chain is the field we are filtering on, = and OR are the operators, and Both and Heavy are the values.
To search for sequences that meet the conditions of being fully annotated, with a heavy CDR3 length between 8 and 10 amino acids long, use an SQL filter query in the form:
['Fully Annotated'] = 'Yes' AND ['Heavy CDR3 Length'] BETWEEN 8 AND 10
In this example, Fully Annotated and Heavy CDR3 Length are the fields we are filtering on, =, AND and BETWEEN are the operators, and Yes, 8, and 10 are the values.
To search for sequences that meet the conditions of having a heavy chain, is fully annotated and have a CDR3 length of 10 amino acids, use an SQL filter query in the form:
['Chain'] = 'Heavy' AND ['Fully Annotated'] = 'Yes' AND ['Heavy CDR3 Length'] = 10
In this example, Chain, Fully Annotated and Heavy CDR3 Length are the fields we are filtering on, = and AND are the operators, and Heavy, Yes, and 10 are the values.
The operator IN can also be useful for performing multiple searches within the same column. If you would like to search for a list of particular CDR3 sequences you could use the following SQL filter query:
['Light CDR3'] IN ('QQYGSSQGT', 'QQYYTTPIT', 'QSYDGSLSGSV')
Filtering using Scripts
To create a filter based on a condition between two fields, advanced queries using script can be used. This is especially useful when analyzing comparison of annotated results. To search for sequences that have more counts in Sample A over Sample B, use an SQL filter query in the form:
script( '(doc["Count Sample A"].value doc["Count Sample B"].value)' )
In this example, Count Sample A and Count Sample B are the two fields we are filtering on, is the operator.
Scripting also allows for more custom expression using Arithmetic Operators (+, -, *, /, %). To search for sequences that have twice as many counts in Sample A over Sample B, use the SQL filter query in the form:
script( '(doc["Count Sample A"].value * 2 doc["Count Sample B"].value)' )
In this example, Count Sample A and Count Sample B are the two fields we are filtering on, and * are the operators.
Invalid Characters
Apostrophes need to be escaped from the column name if they are used for filtering. To do this, you would add a second apostrophe beside the existing one.
e.g. ['1' Screen'] will instead need to be written as ['1'' Screen'].
The filter builder context menu will handle this for you.
Comments
0 comments
Please sign in to leave a comment.