The Pathway Tools Advanced Query Pages
The Structured Advanced Query Page
and the Free Form Advanced Query Page
Note: available on this Web site are two webinars on Using the Structured Advanced Query Page.
The Advanced Query Pages allow you to write queries to extract data from Pathway/Genome Databases (PGDBs) hosted on a Pathway Tools server. A complex database query is a database expression that selects a subset of data from a PGDB by specifying constraints on the values of data fields, by combining information from different regions of the DB, and by operating on data fields. Example: "Find Reactions that have a Reactant that is a Small-Molecule and the Common-Name of the Reactant is ATP."
There are two different interfaces for formulating queries.
- The Structured Advanced Query Page: This is the initial page provided when you first click the Advanced Query button (see Section 2).
- The Free Form Advanced Query Page: This is accessible from the Structured Advanced Query Page by clicking the similarly labeled wide button at the top of the web page (see Section 3).
Both interfaces use an underlying query language called BioVelo. The Structured Advanced Query Page does not require you to know this language, because the page translates your input into BioVelo. The Free Form Advanced Query Page gives full and direct access to the BioVelo language.
You can switch back and forth between the Structured page and the Free Form page, using the button near the top of the pages. However, this switching will not modify the contents of the pages. In particular, you can enter a query on one page, submit it, and then switch to the other page and submit a different query. But, the output format selection is shared between the two pages.
The databases queried by BioVelo contain objects belonging to various classes: metabolic pathways, reactions, proteins, genes, and so on. Each class has a set of attributes associated with it. For example, the class Proteins has attributes that include pI (its isoelectric point), and Gene (the gene encoding the protein). That means that each protein object (instance) of this class has the attribute Gene, although in some objects, the attribute may have no value. The Pathway Tools schema (ontology) is described in several documents. The most comprehensive is the Pathway Tools User's Guide, which is available as part of the Pathway Tools software download package. See also several publications listed on the BioCyc publications page. The better you know the Pathway Tools schema, the more adept you will be at writing BioVelo queries, because you will need to know what classes to base your queries on, and which attributes to filter in your queries.
The Structured Advanced Query Page has been designed to facilitate writing simple as well as complex queries. This page is formed dynamically and its content expands depending on your selections. This interface lets you formulate a query without knowing the underlying query language (BioVelo). When you submit your query, it is translated into BioVelo before being sent to the server. As mentioned before, this page does not provide complete access to the BioVelo language, which is richer in capability than this interface provides. But the page allows a powerful range of queries to be formulated. The Free Form Advanced Query Page (see Section 3) provides full access to the BioVelo language.
The Structured Advanced Query Page contains two main sections. Your first step should be to specify the query in the section labeled Specify your query below. Only after specifying the query should you specify the output contents of the query in the section labeled Specify the contents of the output of your query below. The selection of the query output contents depends on the specified query. The default output format is HTML format for viewing in your Web browser. The format can be changed by selecting the radio button labeled Text Tabulated instead, which may be preferable in some cases. The wide button at the bottom of the page labeled Submit Query should be clicked only after specifying your query, the desired output format, and the contents of the output.
To get started quickly here are some query examples, and descriptions of how to build those queries using the Structured Advanced Query Page. More details on using the Structured Advanced Query Page are given in the next subsections.
Example Query 1: Find all the proteins of E. coli K-12.
Starting from the initial page, select the database E. coli K-12 and set the selector next to search for to Proteins. Column 1 in the contents of the output section is preselected to NAME, which is a good selection for this query. When you click the Submit Query button, the query is sent and a new browser window opens displaying the result (this may take a while depending on the server): a one-column table of all known proteins defined in the PGDB for E. coli K-12. Clicking on a protein name will bring you to the PGDB information page about the selected protein.
Example Query 2: Find all the proteins of E. coli K-12 for which the DNA-FOOTPRINT-SIZE is smaller than 10.
As in example 1, select database E. coli K-12 and Proteins for the first two selectors. Since we want to add a condition to this search, click the add a condition button; a Where clause will appear. Select DNA-FOOTPRINT-SIZE next to the Where; then select is smaller than from the selector next to it (the operator "<"). Enter the value 10 in the last free input text box. Since you would probably like to see the value of DNA-FOOTPRINT-SIZE for each protein, add an output column by clicking add column in the bottom part of the web page and selecting DNA-FOOTPRINT-SIZE from the pull-down menu of this new column. Submit the query by clicking the Submit Query button.
This query is intended to select transcription factors, and although it scans all proteins in the PGDB, only transcription factors will have the DNA-FOOTPRINT-SIZE attribute set.
Example Query 3: Find all E. coli K-12 proteins that have No information and 2006 in the comment attribute, meaning that curators found no information about this protein during literature searches performed in 2006.
As in example 1, select database E. coli K-12 and Proteins for the first two selectors, then click the add a condition button; a Where clause will appear. Select COMMENT from the selector next to Where (COMMENT is an attribute of protein objects) and contains the substring from the next selector. In the last box of this line, which is a free text input box, not a selector, enter No information. The repeat operator should already be set to at least one element of. On the next line there is a selector box labeled add a condition; select and from it. A new term appears on its right; perform the same operations as on the first line, but enter 2006 in the free text input box on the right. Finally, click the Submit Query button; the proteins that satisfy this query will be displayed in a new browser window.
Example Query 4: Search for all pathways in MetaCyc that are in the taxonomic range of metazoa.
Select database MetaCyc and class Pathways, then click the add a condition button; a Where clause will appear. Select Taxonomic-Range from the selector next to Where. The repeat operator for some object ... will automatically appear on the left of the Taxonomic-Range attribute, and a we have subcondition will be created underneath them. A repeat operator is needed because the attribute Taxonomic-Range is a list of objects, not a single value. The subcondition applies to the objects of the Taxonomic-Range attribute. Enter metazoa in the green box located to the right of the attribute NAME, which was automatically selected when the we have subcondition was created. Finally, click the Submit Query button.
Example Query 5: Search for all reactions in MetaCyc that have D-glucose on the left (reactant) and D-glucose-6-phosphate on the right (product).
The next subsections explain the Structured Advanced Query Page in more detail.
2.2 The Initial Page
In the initial state of the Structured Advanced Query Page, only one simple search component with two selectors (also called pull-down menus) is shown: a database selector and a class selector. You can select the desired database and class by using these pull-down menus. You submit such a query by clicking the Submit Query button. This is a global search for all objects of the given class in the given database, and it will typically return many results.
Note that the class selector shows the class names in a hierarchical manner to present the subclass relation between them. That is, if class S is a subclass of class T, S would be shown underneath T and indented to the right with at least one dash. For example, the class of reactions is divided into the smaller subclasses of binding-reactions and small-molecule-reactions. Subclasses can themselves have subclasses, which are shown indented several times with several dashes.
Each class name is followed by a number in parentheses. This is the number of instances that exist in the class, for the database selected, available on the server. For example, --Genes (4819) says that there are 4819 genes in the database selected (e.g., MetaCyc). If the server is busy, this number may be absent for a few seconds when you first access the web page as the server needs to calculate it once a database is selected.
Often there is a need to add one or more conditions to the search to select a subset of all instances of a class. You can add conditions to a search by clicking the button add a condition, causing a where clause to open up below the two main selectors. (The clicked button will also change to a different state with the label remove condition.) The where clause will show one term. A term is composed of a left operand, that is a pull-down menu (typically the attribute NAME is preselected), a relational operation, that is a second pull-down menu, and a right operand that is initially a free input text box. The right operand is initially a pull-down menu if the left operand attribute is of type enumerated or Boolean. You can formulate a condition on the selected attribute by selecting the appropriate operation and a value for the right operand. For example, to have the condition that the attribute NAME contains the substring tr you select the relational operator contains the substring and enter tr in the green text box. (Do not enter the surrounding double quotes for a string; this is automatically inserted by the user interface when the query is sent to the server.)
When selecting an attribute in a condition, or the query output, the list shown is in increasing alphabetical order. Moreover, any attribute that refers to another object, or list of objects, is shown with a light blue background. These attributes allow you, among other things, to go from one class of objects to another class of objects. For example, the attribute Product of class Genes refers to a list of Polypeptides or RNA. The background color when trying to select it, from an attribute selector, is light blue. (note: the color is actually modifiable from the style sheet of the server. It may vary from one server to another.) When selecting such an attribute, a subcondition is created to specify a condition based on the attribute(s) of the object or objects of this selected attribute.
There are several string relational operators available when an attribute is a string. Typical relational operators are "is equal to" and "contains the substring". It is also possible to use the more complex "is similar to (regular expression)" or "is not similar to (regular expression)". In this case, the right operand entered in the green box should be a regular expression. The regular expression syntax follows the Perl language syntax rules (see Perl regexp at Wikipedia). For example, the regular expression t[a-z]*b matches any string that contains the letter 't' followed by any number of lower-case letters and then a 'b'. A string is similar to this regular expression if it contains a substring that matches the regular expression. If you want to search for strings that match the regular expression in their entirety, and not merely in one of their proper substrings, you must use the beginning and end Perl regular expression operators, '^' and '$' respectively. For example, ^t[a-z]*b$ matches all strings that start with a 't', continue with any number of lower-case letters, and end with a 'b', but not strings that only contain such a string as a proper substring: trpb matches this regular expression, but trpba does not, as it does not end with a 'b'. Note that all the letters in a regular expression are case-sensitive.
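The difference between substring matching and whole-string matching can be tried out locally. The following is a minimal sketch in Python, whose re module follows Perl-style syntax for the constructs used here; it only illustrates the matching rule and is not how the server evaluates queries. The helper name is_similar_to and the sample names are made up for illustration.

```python
import re

# The two patterns discussed in the text.
UNANCHORED = re.compile(r"t[a-z]*b")    # may match anywhere inside the string
ANCHORED = re.compile(r"^t[a-z]*b$")    # must match the whole string


def is_similar_to(pattern, value):
    """Hypothetical stand-in for 'is similar to': true if some substring matches."""
    return pattern.search(value) is not None


# Made-up sample values; matching is case-sensitive, so "TrpB" never matches.
names = ["trpb", "trpba", "TrpB", "xtrpb"]

unanchored_hits = [n for n in names if is_similar_to(UNANCHORED, n)]
anchored_hits = [n for n in names if is_similar_to(ANCHORED, n)]

print(unanchored_hits)  # names containing a matching substring
print(anchored_hits)    # names that match in their entirety
```

With these sample values, the unanchored pattern accepts trpb, trpba, and xtrpb, while the anchored pattern accepts only trpb.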
A variable selector is provided for the left operand if more than one variable is active at the location of the term. The right operand will have a button on its right with the label switch to variable entry under the same condition. If clicked, it will modify the right operand into a variable/attribute pair of pull-down menus (selectors). See Subsection 2.8 for more information about this button. See Subsection 2.10 for more information about variables.
In the case of a right operand as a variable/attribute pair of selectors, the list of attributes that can be selected depends on the type of the left operand: the list of attributes shown depends on the type of the right variable selected, and some attributes may be grayed out since their type is such that no valid operation can be done with the left attribute.
Several conditions can be added by selecting a logical operator from the pull-down menu labeled add a condition. There are four provided logical operators: and, or, and not, or not. When selecting an operator, an initial term is created to its right. The add a condition button will always be at the bottom of the list of conditions. To remove one specific condition (i.e., a term), you can use the pull-down menu of the term and select remove condition.
The grouping of the terms is as follows: the first two terms are combined together; then this combined term is combined with the third, and so on. That is, if the terms that appear from top to bottom are written down from left to right, the operations are done from left to right. No other grouping is available.
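Because the grouping is strictly left to right, the result can differ from what the usual precedence rules of a programming language would give. The sketch below, in Python, shows the folding rule under the assumption that each term has already been evaluated to true or false; the function names are hypothetical and not part of the interface.

```python
def apply_operator(left, operator, right):
    """Combine two truth values with one of the four operators named in the text."""
    if operator == "and":
        return left and right
    if operator == "or":
        return left or right
    if operator == "and not":
        return left and not right
    if operator == "or not":
        return left or not right
    raise ValueError(f"unknown operator: {operator}")


def evaluate_terms(first_term, later_terms):
    """Fold the terms strictly left to right.

    later_terms is a list of (operator, truth_value) pairs, top to bottom.
    """
    result = first_term
    for operator, value in later_terms:
        result = apply_operator(result, operator, value)
    return result


# Terms from top to bottom: A or B and C, with A=True, B=False, C=False.
left_to_right = evaluate_terms(True, [("or", False), ("and", False)])
python_precedence = True or False and False  # Python groups this as A or (B and C)

print(left_to_right)      # value of (A or B) and C
print(python_precedence)  # value of A or (B and C)
```

Here the left-to-right grouping yields false, whereas Python's own precedence yields true, so the order in which you list the terms matters.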
2.4 Repeat Operators
Some database attributes can contain a list of possible values. For example, the attribute APPEARS-IN-BINDING-REACTIONS of class GENES has type list of Binding-Reactions. That is, this attribute has a value that is a list of objects belonging to the class Binding-Reactions. (This list may be empty depending on the database selected.)
For attributes that can have lists of values, a repeat operator (e.g., for some object ...) is provided on its left. An appropriate repeat operator should be selected before adding any condition for this attribute.
For attributes of type list of some class, the repeat operators are at least one object of, every object of, exactly one object of, for no object of, the number of objects of, for some object ..., for all objects ..., for exactly one object ..., for no objects .... These are explained in more detail in the next section. Note that a repeat operator name ending with ellipsis (...) means that, once selected, a sub-condition will open up where a specific attribute can be selected.
For attributes of type list which are not objects (e.g., string) the repeat operators are similarly named by replacing object for element except for operator with ellipsis (e.g., for some object ...) which exists only for objects, not elements. For example, the attribute SYNONYMS has the type list of string -- in this case the first repeat operator is at least one element of.
In this section we describe the repeat operators associated with objects, but the descriptions also apply to non-objects (e.g., numbers, strings). All the repeat operators other than for each object ... can be applied to any attribute of type list.
The four repeat operators at least one object of, every object of, exactly one object of, and for no object of are similar. Once selected, the term on its left is used to specify a condition to be met by a certain number of elements. For example, in the case of operator at least one object of, the number of elements satisfying the condition must be greater than 0. For every object of, this number must be equal to the number of elements in the list, that is, all of them. For exactly one object of, this number is 1. Finally, for for no object of, this number is 0.
The operator the number of objects of counts the number of objects in the list of the attribute and compares it to a value provided as the right operand. The desired relation (e.g., is equal to) should be selected. The condition will be true if the number of objects in the list of the attribute satisfies the relation.
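The counting rule behind these operators can be written down compactly. The following Python sketch restates the description above; it is an illustration of the semantics only, the function names are hypothetical, and the sample list of synonyms is made up.

```python
def count_matching(values, condition):
    """Number of list elements that satisfy the condition."""
    return sum(1 for value in values if condition(value))


# Each operator compares the number of matching elements (n) with a target.
REPEAT_OPERATORS = {
    "at least one object of": lambda n, total: n > 0,
    "every object of": lambda n, total: n == total,
    "exactly one object of": lambda n, total: n == 1,
    "for no object of": lambda n, total: n == 0,
}


def repeat_holds(operator, values, condition):
    """Evaluate one of the four count-based repeat operators."""
    n = count_matching(values, condition)
    return REPEAT_OPERATORS[operator](n, len(values))


def number_of_objects(values, relation, right_operand):
    """'the number of objects of': compare the list length to a value."""
    return relation(len(values), right_operand)


# Made-up list value, standing in for an attribute such as SYNONYMS.
synonyms = ["trpB", "tryB", "b1261"]
contains_tr = lambda s: "tr" in s

results = {op: repeat_holds(op, synonyms, contains_tr) for op in REPEAT_OPERATORS}
print(results)
print(number_of_objects(synonyms, lambda n, k: n == k, 3))
```

Two of the three sample values contain tr, so only at least one object of holds; the other three operators are false for this list.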
In some cases an attribute of an object is a list of objects. For example, the attribute product of an object of the class Genes is a list of Polypeptides or RNA; if you want to search for every gene that does not have RNA as a product, you need to apply a condition to all objects in the list of products. This is a nested search through a list of objects inside another search (e.g., over genes). This is what the following repeat operators allow you to do.
The repeat operator for all objects ... is provided for attributes that are a list of some class, not for a list of other types like strings or numbers. For example, the attribute REACTION-LIST of Pathways has type list of generalized-reactions, so the operator for all objects ... is provided for it. On the other hand, the attribute NAMES of class GENES has type list of string, so the operator for all objects ... is not provided for it. Note that this is not a limitation of the BioVelo language but rather a design decision to introduce some simplicity to the graphical user interface.
When for all objects ... is selected, it creates an initial conditional expression that starts with the text we have and introduces a new variable. It is essentially a where clause, where the list of objects of the attribute are iteratively bound to the new introduced variable. The conditional expression can refer to this new variable and any previous ones already active. The condition will be true if for every object of the list the condition is satisfied.
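The nested search with a newly bound variable can be pictured as follows. This is a Python sketch of the idea only: the records are made up, and the attribute KIND is a hypothetical stand-in for whatever attribute distinguishes a polypeptide from an RNA in a real PGDB.

```python
# Made-up gene records; each PRODUCT value is a list of objects.
genes = [
    {"NAME": "geneA", "PRODUCT": [{"NAME": "protA", "KIND": "Polypeptide"}]},
    {"NAME": "geneB", "PRODUCT": [{"NAME": "rnaB", "KIND": "RNA"}]},
    {"NAME": "geneC", "PRODUCT": [{"NAME": "protC", "KIND": "Polypeptide"},
                                  {"NAME": "rnaC", "KIND": "RNA"}]},
]


def for_all_objects(objects, condition):
    """True if the condition holds for every object bound to the new variable."""
    return all(condition(x2) for x2 in objects)


# x1 ranges over genes; x2 is the variable introduced for each product of x1.
genes_without_rna = [
    x1["NAME"]
    for x1 in genes
    if for_all_objects(x1["PRODUCT"], lambda x2: x2["KIND"] != "RNA")
]

print(genes_without_rna)
```

Only geneA is kept, since every one of its products satisfies the condition; geneC is rejected because one of its two products is an RNA.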
Some attributes (e.g., enzyme) have a single value which is an object. When is an object ... is selected, a conditional expression similar to a where clause opens under the attribute. It also introduces a new variable. This variable can be used in the conditional expression to refer to the object bound to the attribute (e.g., enzyme).
Most of the time, the right operand to an operation can be freely entered. For example, a number (e.g. 10) can be entered by simply typing it in the box provided as a right operand to is greater than. But there are situations where you want to compare to the attribute of an object. In this case, the button labeled switch to variable entry is provided. If clicked, two selectors will replace the free entry box. One selector allows you to select a variable, the other an attribute. The button that you just clicked should now be labeled
Essentially, each search component allows you to search different classes in the same or different databases. Each search component does an iterative search of the objects of a class. By combining several search components, a multidimensional Cartesian search is performed. For example, if the first search component is done over proteins of E. coli K-12, and the second search component is done over genes of the same organism, the search is potentially over all combinations of proteins and genes. More precisely, all the conditions of the first search component must be true before the second search component starts.
By clicking the button labeled insert a new search component here a new search component is introduced at the location of the button. A search component is visually delineated by a rectangular box around it. The order of the search components is important. You can remove a search component, but not the first one, by clicking the x icon on its right.
For efficiency, order your search components so that the first one is the most restrictive. The example above is potentially a time-consuming query: E. coli K-12 has more than 4500 genes and more than 5100 proteins, so the search space is potentially 4500 x 5100 (over 22 million) pairs of proteins and genes. Nevertheless, a multidimensional search is feasible if the number of satisfied combinations is reasonable (say, fewer than 10000). If your search is too time consuming, the server may stop processing it with an error message to that effect.
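The effect of a restrictive first search component can be seen in a small model of the nested search. The Python sketch below is an illustration under stated assumptions, not the server's algorithm: the function name is hypothetical, the records are made up, and the variables x1 and x2 mirror the variable names used by the interface.

```python
def multi_component_search(proteins, genes, protein_condition, pair_condition):
    """Nested search: x1 ranges over proteins, x2 over genes.

    Returns the matching (protein, gene) name pairs and the number of
    pairs that were actually examined.
    """
    matches = []
    pairs_examined = 0
    for x1 in proteins:
        # Conditions of the first component must hold before the second starts.
        if not protein_condition(x1):
            continue
        for x2 in genes:
            pairs_examined += 1
            if pair_condition(x1, x2):
                matches.append((x1["NAME"], x2["NAME"]))
    return matches, pairs_examined


# Made-up toy records; a real PGDB has thousands of each.
proteins = [
    {"NAME": "p1", "GENE": "g1"},
    {"NAME": "p2", "GENE": "g2"},
    {"NAME": "p3", "GENE": "g3"},
]
genes = [{"NAME": "g1"}, {"NAME": "g2"}, {"NAME": "g3"}]

encoded_by = lambda x1, x2: x1["GENE"] == x2["NAME"]

restrictive, examined_restrictive = multi_component_search(
    proteins, genes, lambda x1: x1["NAME"] == "p1", encoded_by)
unrestricted, examined_unrestricted = multi_component_search(
    proteins, genes, lambda x1: True, encoded_by)

print(restrictive, examined_restrictive)
print(len(unrestricted), examined_unrestricted)
print(4500 * 5100)  # size of the search space quoted in the text
```

With the restrictive first condition only 3 pairs are examined instead of all 9, and the last line confirms that 4500 x 5100 is 22,950,000 pairs.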
When more than one search component is specified, variables are introduced. The first search component is always associated with the variable x1. The other search components use variables with a higher index (e.g., x3). The variables allow cross-referencing between the search components and in the output contents specification.
Variables are introduced to reference different objects in one query. For example, if two search components are specified, the first one has its main objects (the objects from the class specified in the head of the search component) associated with x1. We also say that the objects are bound to x1. The second search component will have a different variable name, say x2. If a where clause is added to it, the variable x1 can be used in it to refer to the current object from the first search component.
Some operations introduce variables in a query, such as adding search components, or using one of the operators for each object ... or is an object .... The variable names are prefixed by the letter x as in x1 or x2. These variables are automatically introduced by the interface when they are needed; you cannot change their names and there is no need to do so.
The interface takes care of adding a pull-down menu to select a variable next to an attribute pull-down menu when such a variable selection makes sense. The list of selectable variables is always complete and non-redundant. That is, you can select any variable in such pull-down menus without worrying about a syntactical error in the resulting query; when no such pull-down menu is available, it is not possible to reference a variable at that location.
The output contents will also provide pull-down menus to select a variable if more than one variable exists in your query.
When using some of the repeat operators (e.g., for every element of) internal variables will be automatically created. These are not directly visible in the interface although they can be seen in the translated BioVelo query when the submit button is clicked. You do not have to know their names but it can be instructive to see how they are used if you want to better understand BioVelo.
The result of your query will be a table. The number of rows of this table is the number of objects that satisfied your query. The number of columns is user specified in the contents of the output section of the web page. Initially, this section has only one column specified with the attribute NAME preselected, which means the unique PGDB identifier of each object. The button add a column will add a column to this section. It specifies an additional column in the result. In the resulting table, this column will contain the value of the selected attribute. You can select an attribute of your choice other than NAME.
If two or more search components exist in your query specification, a variable selector is present in each column. You should select the desired variable and its attribute.
You can remove a column by clicking its x icon. All columns to its right are moved to the left and a renumbering of the columns is done. That is, if you have four columns and you delete the second column, the third column becomes the second one and the fourth becomes the third one with their headers renumbered correspondingly.
This subsection applies to both the Structured Advanced Query Page and the Free Form Advanced Query Page.
There are two possible output formats: HTML and Tab Delimited Text. The desired output format of your query can be selected by clicking the radio button next to HTML or Tab Delimited Text. When you load the initial Advanced Query web page the default is the HTML format.
The HTML format provides links to the Pathway Tools display page for each object found. This is the format preferred by most users.
The Tab Delimited Text format creates a text formatted table whose columns are separated by the tab character. The web page returned has a MIME type of text/plain and can be saved as a parsable text file that can be imported into a spreadsheet for offline processing.
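A saved result in this format can be read with any tab-aware parser. The following Python sketch uses the standard csv module; the sample text is made up, and it assumes one row per result with no header line, which you should check against your own saved output.

```python
import csv
import io

# Made-up stand-in for a saved two-column result (assumed: no header row).
sample_output = "protA\t8\nprotB\t6\n"


def parse_tab_delimited(text):
    """Split tab-delimited text into a list of rows, each a list of strings."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [row for row in reader]


rows = parse_tab_delimited(sample_output)
for row in rows:
    print(row)
```

All values come back as strings, so numeric columns need an explicit conversion (for example with int or float) before further processing.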
Query submission is always performed by clicking the Submit Query button at the bottom of the page. If some simple error is detected (e.g., an input box is empty) an error box will be displayed. You should correct this error and retry. The result of the query will be displayed in a new Web page. When this page opens up, it is blank, and depending on the complexity of your query, it may take some time before the results are shown. If the server detects an error in your query (e.g., a syntactical error, type errors) it will send back an error message.
This form allows you to enter more advanced queries than the Structured Advanced Query Page because the full BioVelo query language is accessible from the Free Form page. The Free Form page is more complex to use since it requires knowledge of the syntax of the BioVelo language. Please consult BioVelo Documentation for the syntax and semantics of the BioVelo language.
The Free Form page can be reached by clicking the button labeled Switch to the Free Form Advanced Query Page near the top of the initial page. (If this button says Switch to the Structured Advanced Query Page, you are already on the Free Form Advanced Query Page.)
In this form, the query must be entered in the text box area on the left -- this is the query text box area. The text box on the right contains a list of query examples.
A query can be any BioVelo expression. Such expressions have more power than the Structured Advanced Query Page, where only tables can be returned. In most cases you will formulate a query that starts with [ and ends with ] -- such queries return results in a table. But you can also write a query in the form of a tuple (i.e., of the form (..., ...)), or even a query that will return a single numerical value as in #dbs.
For a table resulting from a query of the form [...], the head of the query, that is what comes before the colon :, is either an expression that is not a tuple or a tuple of two or more expressions. In the former case, this will return a table of one column; in the latter case this will return a table of as many columns as there are expressions in the tuple. Since the head of the query determines the number of columns, the Free Form Advanced Query Page does not provide an output content section as in the Structured Advanced Query Page.
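The relation between the head of a query and the number of columns resembles a list comprehension in a general-purpose language. The sketch below is a Python analogy only and is not BioVelo syntax; consult the BioVelo Documentation for the actual notation. The records and the key PI (standing in for the pI attribute) are made up.

```python
# Made-up protein records.
proteins = [
    {"NAME": "protA", "PI": 5.2},
    {"NAME": "protB", "PI": 9.1},
]

# Head is a single expression: the result is a one-column table.
one_column = [x["NAME"] for x in proteins if x["PI"] > 7]

# Head is a tuple of two expressions: the result is a two-column table.
two_columns = [(x["NAME"], x["PI"]) for x in proteins if x["PI"] > 7]

print(one_column)
print(two_columns)
```

In both cases the same objects are selected; only the head changes what is reported for each of them.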
A query that is a tuple will return as many results as there are expressions in the tuple. It can thus return several tables.
Under these two text areas is a row of selectable options. These are not used for formulating the query, but are provided as reference documentation. Selectors are provided for the available database names, class names, attribute names, and operators of the BioVelo language. They serve primarily as a reference for the correct spelling of database names, class names, attribute names, and operators (note: database names appear in parentheses and contain no spaces; they are shown next to the species names, which are usually longer and may contain spaces). For some browsers, a small yellow box (i.e., a tooltip) appears when you hover the mouse pointer over the attribute and operator options of these selectors. The tooltips work for Mozilla/Firefox 1.5 and the Safari 2.0.3 browser -- but do not work for IE 6 and 7.
Selecting a class name will change the list of attributes.
Under these selectors you can specify the output format to either HTML (the default) or Tab Delimited Text. Consult the Subsection 2.12 of the Structured Advanced Query Page for more information on these output formats.
Click the Submit button at the bottom of the page to submit your query. A new browser window will open containing the query result, or an error report if an error was found in the query. You can edit the query, or enter a completely different one, in the query form and submit again. A new web page result will appear, allowing you to compare the results of different queries.
You should be able to cut and paste a query, or any parts of it, into the query text box area. You could store your queries in a separate document on your computer and copy them back in the query text box area.
For the complete BioVelo query language syntax and semantics please consult the BioVelo Documentation.