
Bayesian networks

Bayesian networks (Friedman et al., 2000; Pearl, 1988) are a well-studied statistical tool for representing the dependence structure between multiple interacting quantities (e.g., the expression levels of different genes), and they are a promising tool for analyzing gene expression patterns. First, they are particularly useful for describing processes composed of locally interacting components; that is, the value of each component directly depends on the values of a relatively small number of other components. Second, the statistical foundations for learning Bayesian networks from observations, and the computational algorithms to do so, are well understood and have been used successfully in many applications. Finally, Bayesian networks provide models of causal influence: although Bayesian networks are mathematically defined strictly in terms of probabilities and conditional independence statements, a connection can be made between this characterization and the notion of direct causal influence (Heckerman et al., 1999; Pearl and Verma, 1991; Spirtes et al., 1993). Although this connection depends on several assumptions that do not necessarily hold for gene expression data, the conclusions of a Bayesian network analysis may still be indicative of some causal connections in the data.

A Bayesian network (also known as a causal probabilistic network) is an annotated directed acyclic graph that encodes a joint probability distribution over a set of random variables X. Formally, a Bayesian network for X is a pair B = (G, Θ). The first component, G, is a directed acyclic graph (DAG) whose vertices correspond to the random variables X1, . . . , Xn and whose edges represent direct dependencies between the variables. The second component, Θ, is the set of parameters that quantifies the network: it describes a conditional distribution for each variable given its parents in G. Together, these two components specify a unique joint distribution over X1, . . . , Xn. The graph G encodes the Markov assumption: each variable Xi is independent of its nondescendants given its parents in G. These conditional independence assumptions allow the joint distribution to be decomposed into a product of local conditional distributions, economizing on the number of parameters. Given a Bayesian network, we may want to answer many types of questions that involve the joint probability (e.g., what is the probability of X = x given observations of some of the other variables?) or the independencies in the domain (e.g., are X and Y independent once we observe Z?). The literature contains a suite of algorithms that answer such queries efficiently by exploiting the explicit representation of structure (Jensen, 1996; Pearl, 1988).
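The decomposition implied by the Markov assumption can be sketched concretely. The following is a minimal illustration, not part of the original analysis: a hypothetical three-variable network with structure A → B, A → C, whose joint distribution is the product of each variable's conditional distribution given its parents. All names and probabilities are made up for illustration.

```python
# Structure G: A -> B, A -> C (B and C are conditionally
# independent given their common parent A).
parents = {"A": [], "B": ["A"], "C": ["A"]}

# Parameters Theta: one conditional probability table per variable,
# indexed by the tuple of parent values (0 = low, 1 = high).
cpt = {
    "A": {(): {0: 0.6, 1: 0.4}},
    "B": {(0,): {0: 0.9, 1: 0.1}, (1,): {0: 0.2, 1: 0.8}},
    "C": {(0,): {0: 0.7, 1: 0.3}, (1,): {0: 0.1, 1: 0.9}},
}

def joint(assignment):
    """P(x1, ..., xn) = product over i of P(xi | parents(xi) in G),
    i.e., the factorization implied by the Markov assumption."""
    p = 1.0
    for var, pa in parents.items():
        pa_vals = tuple(assignment[x] for x in pa)
        p *= cpt[var][pa_vals][assignment[var]]
    return p

print(joint({"A": 1, "B": 1, "C": 1}))  # 0.4 * 0.8 * 0.9 = 0.288
```

Because the structure records that B and C depend only on A, the network needs just 5 free parameters instead of the 7 required for an unconstrained joint distribution over three binary variables.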

Biological example

Let us apply the approach to the data of Spellman et al. (1998). This data set contains 76 gene expression measurements of the mRNA levels of 6177 S. cerevisiae ORFs. The experiments measure six time series under different cell cycle synchronization methods. Spellman et al. (1998) identified 800 genes whose expression varied over the different cell-cycle stages. In learning from these data, one treats each measurement as an independent sample from a distribution and does not take into account the temporal aspect of the measurements. Since the cell cycle is clearly a temporal process, this is compensated for by introducing an additional variable denoting the cell cycle phase. This variable is forced to be a root in all the networks learned; its presence allows the model to capture the dependency of expression levels on the current cell cycle phase. Two experiments were performed, one with a discrete multinomial distribution and the other with a linear Gaussian distribution. The learned features show that intricate structure can be recovered even from such small data sets. It is important to note that the learning algorithm uses no prior biological knowledge or constraints; all learned networks and relations are based solely on the information conveyed in the measurements themselves. These results are available at the following web page: (External Link). Figure 2 illustrates the graphical display of some results from this analysis.
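The two modeling choices above, treating each measurement as an independent sample augmented with a phase variable, and forcing that variable to be a root, can be sketched as follows. The records and gene names are illustrative placeholders, not the actual Spellman measurements, and the constraint check is a simplified stand-in for what a structure-search procedure would enforce.

```python
# Each of the 76 measurements becomes one sample; a "phase" variable
# recording the cell cycle phase is appended to every record.
samples = [
    {"phase": "G1", "CLN2": 1, "SVS1": 1},
    {"phase": "S",  "CLN2": 0, "SVS1": 0},
    # ... one record per measurement
]

def violates_root_constraint(edges):
    """Return True if a candidate structure has an edge into 'phase'.
    Rejecting such structures forces the phase variable to be a root,
    so expression levels can depend on phase but not vice versa."""
    return any(child == "phase" for (_parent, child) in edges)

print(violates_root_constraint([("CLN2", "phase")]))                    # True
print(violates_root_constraint([("phase", "CLN2"), ("CLN2", "SVS1")]))  # False
```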

Svs1 gene interaction network

The graph shows a local Bayesian network for the gene SVS1. The width (and color) of each edge corresponds to the computed confidence level. An edge is directed if there is a sufficiently high confidence in the order between the genes it connects. This local map shows that CLN2 separates SVS1 from several other genes. Although there is a strong connection between CLN2 and all these genes, there are no other edges connecting them. This indicates that, with high confidence, these genes are conditionally independent given the expression level of CLN2.
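The conditional-independence claim above can be checked empirically on discretized expression data by computing conditional mutual information, which is zero exactly when two variables are independent given a third. This is a generic sketch, not the confidence-estimation procedure used in the analysis, and the data below are illustrative discretized levels rather than actual measurements.

```python
import math
from collections import Counter

# Toy records of discretized levels: (CLN2, SVS1, other_gene).
# In this uniform example SVS1 and other_gene are independent
# given CLN2, so the conditional mutual information is zero.
data = [
    (1, 1, 1), (1, 1, 0), (1, 0, 1), (1, 0, 0),
    (0, 1, 1), (0, 1, 0), (0, 0, 1), (0, 0, 0),
]

def conditional_mi(records):
    """Empirical I(X; Y | Z) for triples (z, x, y); zero iff X and Y
    are conditionally independent given Z in the empirical distribution."""
    n = len(records)
    cz = Counter(z for (z, _, _) in records)
    cxz = Counter((z, x) for (z, x, _) in records)
    cyz = Counter((z, y) for (z, _, y) in records)
    cxyz = Counter(records)
    mi = 0.0
    for (z, x, y), c in cxyz.items():
        p_xyz = c / n
        mi += p_xyz * math.log(
            (p_xyz * (cz[z] / n)) / ((cxz[(z, x)] / n) * (cyz[(z, y)] / n))
        )
    return mi

print(conditional_mi(data))  # 0.0: conditionally independent given CLN2
```

A separating variable like CLN2 in the local map would drive this quantity toward zero for the genes it separates, while directly connected genes would retain a positive value.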

Boolean Networks





Source:  OpenStax, Introduction to bioinformatics. OpenStax CNX. Oct 09, 2007 Download for free at http://cnx.org/content/col10240/1.3
