Using GoFish v1.0

GoFish allows the user to select a subset of Gene Ontology (GO) attributes, and ranks genes according to the probability of having all those attributes.

GoFish should be largely self-explanatory, but below is a brief description of how to use it.

Note: Leaving the GoFish page in your browser causes the GoFish applet to quit, discarding all current results. Therefore, if you intend to browse the Web while using GoFish it is a good idea to open GoFish in its own browser window.

  1. To run GoFish, first choose an organism from the Organism menu. (Currently GoFish v1.0 supports three organisms: S. cerevisiae, D. melanogaster, and M. musculus; GoFish v1.11alpha supports C. elegans as well.)

  2. Next, select a set of GO attributes of interest. To select an attribute, either click on it in the Gene Ontology Browser panel (upper left corner of the GoFish window), or click on it in the Search results area of the Search GO attributes panel (bottom center of the GoFish window). Your selection is added to the list in the Selected GO attributes area (top center of the GoFish window).

    The GO attribute viewer (bottom left corner of the GoFish window) shows a detailed view of all ancestors and descendants of the most recently selected GO attribute.

  3. Once you have chosen a subset of attributes of interest, click on the GoFish! button. A ranked list of genes will appear in the top right panel. The value in the column labeled Score lies between 0 and 1, and corresponds to the probability that the gene in that row holds all the selected GO attributes. The integer in the column labeled E is the number of selected GO attributes for which the scientific literature provides evidence that the gene in that row has the attribute. The list is initially sorted by the E column, in descending order. To sort by the contents of a different column, click on that column's header. An up-arrow or down-arrow appears in the header depending on whether the sorting order is ascending or descending. Clicking repeatedly on the same header toggles the sort order.

  4. Clicking on a gene in this list displays more detailed information about it in the Gene viewer area (bottom right corner of the GoFish window), including which of the selected attributes that gene holds.

  5. To save the list of ranked genes, select Save from the File menu. The dialog that comes up gives you the option of saving the entire ranked list, or only a range of genes in that list. For example, choosing the range 1 to 100 saves the first 100 genes in the list, as currently ranked. The saved list consists of records of tab-separated fields, so it can be viewed using a spreadsheet program such as Excel.

  6. The Selected GO attributes list can be modified by adding or removing elements. An attribute can be removed from the Selected GO attributes list by clicking on it and then clicking on the Remove button. It can also be removed by clicking on the corresponding node in the Gene Ontology Browser or on the corresponding entry in the Search results area of the Search GO attributes panel. The entire list can be cleared by selecting Reset from the File menu.

    In fact, choosing Reset returns GoFish to its original state (except that the originally chosen organism remains in effect).

  7. To quit GoFish simply point your browser to a different URL.
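
Because the list saved in step 5 consists of tab-separated records, it can also be read by a script instead of a spreadsheet. The following is a minimal Python sketch. The function name read_ranked_list and the sample rows are hypothetical, and the actual column layout of a saved GoFish list may differ, so the code makes no assumption about headers or column meaning.

```python
import csv
import io

def read_ranked_list(text):
    """Parse tab-separated text into a list of rows (each a list of strings)."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    # Skip completely empty lines, keep every field as a string.
    return [row for row in reader if row]

# Hypothetical sample data; a real saved list may have other columns.
sample = "GENE1\t0.91\t2\nGENE2\t0.47\t1\n"
rows = read_ranked_list(sample)
for row in rows:
    print(row)
```

To read an actual saved file, pass its contents to the function, for example read_ranked_list(open(path).read()), where path is wherever the list was saved.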


Using GoFish v1.11alpha

Version 1.11alpha of GoFish differs from version 1.0 mainly in that it supports arbitrary Boolean queries, instead of only "AND" queries. These are described more fully below.

Other minor differences to note:

  1. Selected attributes get assigned letters "A," "B," "C," etc. as they are added to the Selected Attributes List. This facilitates writing down arbitrary Boolean queries involving those attributes. (See Arbitrary boolean queries below.) These letters are also used to label new fields in the table of ranked gene products, each corresponding to a different attribute in the boolean query. These fields have values 1 or 0 depending on whether there is at least one annotation in support of the gene product having the corresponding attribute. (It may be necessary to scroll horizontally to see all of these query-dependent columns.)
  2. The field "Score" of the ranked list of gene products has been relabeled "P-score" to indicate that it is a probability estimate.
  3. The field "E" of the ranked list of gene products in v. 1.0 has been replaced by a field labeled "QS". This field lists the Query Satisfaction score, which is described below.
  4. The "Description" field of the ranked list of gene products has been eliminated, but this information can still be viewed in the Gene Viewer, by selecting the row corresponding to the gene product of interest.
  5. The currently selected organism is indicated in the Organism menu.

Arbitrary boolean queries

The main difference between v1.11alpha and v1.0 is that the newer version allows users to query the likelihood of gene products satisfying an arbitrary boolean expression. (In contrast, v1.0 effectively allowed users to enter boolean queries containing, implicitly, only the & [AND] operator.)

In v1.11alpha, when one selects an attribute, either from the Gene Ontology Browser panel or from the Search results panel, the attribute is added to the list in the Selected GO attributes area as before, but now the added attribute is preceded by an upper case letter label. In addition, a new clause, represented by the label for the newly added attribute, is added to a boolean query in the field immediately below the Selected GO attributes list. If the attribute is the first attribute chosen, the letter symbol is added by itself; otherwise it is preceded by an ampersand (&). For example, if one first selects the attribute "kinase" followed by the attribute "membrane", the Selected GO attributes list will show:

A. kinase
B. membrane
and the query immediately below will read
A & B
This query string is the default query string for the two selected attributes. If at this point one clicks on the GoFish! button, the current organism's gene products will be ranked by the probability of having both attributes A (kinase) and B (membrane), and shown in the table in the Ranked gene products area on the right. This is exactly the same outcome that one would obtain with GoFish v1.0 if one selected the same organism and the same two attributes. But with v1.11alpha, one can edit the query line to produce a query that combines the selected attributes differently. For example, if one wanted to rank gene products by the probability of having either attribute A or attribute B, one would replace the & in the query string with a |, i.e.
A | B
To represent negation, one uses an exclamation mark (!). For example, if one wanted to rank gene products by the probability of having attribute A and not having attribute B, one would replace the query string by
A & ! B

Now, suppose one selects an additional attribute, "intracellular signaling cascade", which gets assigned the letter C, and suppose we edit the query string so that it reads:

A | B & C
Then this query results in ranking the gene products according to the probability of either having attribute A or having both attributes B and C. In other words, in this case gene products are ranked according to the chances of their being either a kinase, or a membrane protein involved in an intracellular signaling cascade.

But suppose that what we really wanted was to rank gene products according to their chances of coding for a protein that is involved in an intracellular signaling cascade and that is either a kinase or a membrane protein (or both). For this we would need to edit the query to read:

(A | B) & C
In words:
(kinase OR membrane) AND intracellular signaling cascade
In other words, parentheses are required to specify which of the two boolean operators to apply first. When parentheses are left out, by default, the & operator is applied before the | operator, as in the previous example. In general, it is easiest to use parentheses to specify one's intentions, instead of relying on the default precedence of the boolean operators.
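
The precedence rule above can be illustrated with a small Python sketch that evaluates a query string against true/false values for each label. This is not GoFish's own code: the function names tokenize and evaluate are hypothetical, and the choice to let ! bind more tightly than & is an assumption (the text above only states that & is applied before |).

```python
import re

def tokenize(query):
    """Split a query such as 'A | B & C' into labels and operator symbols."""
    return re.findall(r"[A-Z]|[&|!()]", query)

def evaluate(query, values):
    """Evaluate a query given a dict mapping labels to True/False."""
    tokens = tokenize(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():            # lowest precedence: |
        result = parse_and()
        while peek() == "|":
            take()
            right = parse_and()
            result = result or right
        return result

    def parse_and():           # & is applied before |
        result = parse_not()
        while peek() == "&":
            take()
            right = parse_not()
            result = result and right
        return result

    def parse_not():           # assumption: ! binds tightest
        if peek() == "!":
            take()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        tok = take()
        if tok == "(":
            result = parse_or()
            if take() != ")":
                raise ValueError("missing closing parenthesis")
            return result
        return bool(values[tok])

    return parse_or()

# A gene product that is a kinase (A) but has neither B nor C.
evidence = {"A": True, "B": False, "C": False}
print(evaluate("A | B & C", evidence))    # True: read as A | (B & C)
print(evaluate("(A | B) & C", evidence))  # False: C is required
```

The two queries differ only in their parentheses, yet they give different answers for the same gene product, which is why writing the parentheses explicitly is the safer habit.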

Since arbitrary boolean queries are now possible, the table showing the ranked gene products now has a column labeled "QS" (for "Query Satisfaction"). (This column replaces the "E" column of GoFish v1.0.) The score in this column ranges from 0 to 1, with 1 being, in a sense, best (the query is fully satisfied). The meaning of this score is best explained with an example. Consider the same three attributes as in the previous example, and suppose that the query is

(A | B) & ! C
Or, in words,
(kinase OR membrane) AND NOT intracellular signaling cascade
Suppose that upon ranking the gene products according to this query, one gene product gets a QS score of 0.67 = 2/3. This says (roughly) that for this gene product 2 out of the 3 attributes satisfy the query, and 1 doesn't. More precisely, it says that if the available evidence (now regarded as a binary classification: the gene product either has or doesn't have the attribute) for 1 out of the 3 attributes were the opposite of what it is, this gene product would satisfy the query. If instead a gene product has a QS score of 1 for the query above, it means that there is evidence in the literature for that gene product being either a kinase or a membrane protein, and that, in addition, there is no evidence that it is involved in an intracellular signaling cascade. In contrast, a QS of 0 means that the evidence contradicts the query for all attributes; the evidence would have to be exactly the opposite of what one finds in the literature for the gene product in question to satisfy the query.
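
One way to read the description above is that QS equals 1 minus the smallest fraction of attributes whose binary evidence would have to be reversed for the query to be satisfied. The Python sketch below implements that reading by brute force. It is an interpretation of the text, not GoFish's actual algorithm; the function names are hypothetical, and returning 0 for a query that can never be satisfied is an assumption.

```python
from itertools import product

def satisfies(query, values):
    """Check a query by translating &, |, ! into Python's and, or, not."""
    expr = query.replace("&", " and ").replace("|", " or ").replace("!", " not ")
    # eval is used only for brevity; do not use it on untrusted input.
    return bool(eval(expr, {"__builtins__": {}}, dict(values)))

def query_satisfaction(query, evidence):
    """1 - (fewest evidence values to flip so the query holds) / (number of attributes)."""
    labels = sorted(evidence)
    best = None
    for combo in product([False, True], repeat=len(labels)):
        candidate = dict(zip(labels, combo))
        if satisfies(query, candidate):
            flips = sum(candidate[k] != evidence[k] for k in labels)
            best = flips if best is None else min(best, flips)
    if best is None:
        return 0.0  # assumption: an unsatisfiable query scores 0
    return 1 - best / len(labels)

query = "(A | B) & ! C"
# Evidence for kinase (A) and signaling cascade (C), none for membrane (B).
print(round(query_satisfaction(query, {"A": True, "B": False, "C": True}), 2))  # 0.67
# Evidence for kinase only: the query is fully satisfied.
print(query_satisfaction(query, {"A": True, "B": False, "C": False}))           # 1.0
```

In the first case reversing the evidence for C alone would satisfy the query, so 1 of 3 attributes must change and the score is 2/3.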

Removing attributes from the Selected Attributes list

As with GoFish v1.0, we can remove individual elements from the Selected Attributes list by first clicking on the attribute we want to remove, and then clicking on the Remove button. In GoFish v1.11alpha, however, this action has two additional effects. First, if necessary, the remaining attributes in the list get relabeled so that there are no gaps in the sequence of labels (A, B, C, etc.). Second, the query string gets reset to the default query string for the current number of selected attributes. For example, suppose 3 attributes are selected, labeled A, B, and C, and that the query string is

A & ! (B & C)
then if we remove the attribute labeled B, the label for the attribute originally labeled C will change to B; the label for the A-attribute remains unchanged; and the query string becomes
A & B
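
The relabeling and query reset described above can be sketched in a few lines of Python. The function names default_query and remove_attribute are hypothetical and only mirror the behavior described in this section; the sketch also assumes at most 26 selected attributes, one per upper case letter.

```python
import string

def default_query(n):
    """Default query for n selected attributes: all labels joined by &."""
    return " & ".join(string.ascii_uppercase[:n])

def remove_attribute(attributes, label):
    """Remove the attribute with the given letter label, then relabel and reset.

    attributes is the ordered list of selected attribute names.
    Returns (list of (label, name) pairs, default query string).
    """
    index = string.ascii_uppercase.index(label)
    remaining = attributes[:index] + attributes[index + 1:]
    relabeled = list(zip(string.ascii_uppercase, remaining))
    return relabeled, default_query(len(remaining))

selected = ["kinase", "membrane", "intracellular signaling cascade"]
relabeled, query = remove_attribute(selected, "B")
for letter, name in relabeled:
    print(f"{letter}. {name}")
print(query)  # A & B
```

Note that any edited query, such as A & ! (B & C), is discarded by the reset, so it has to be re-entered after an attribute is removed.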