
In bioinformatics, many of the major service providers, including the NCBI, EBI, and DDBJ, now offer Web Service interfaces to their resources, and more providers adopt this technology each year. This widespread adoption of Web Services has made workflows more common in scientific research. Data held at the NCBI can now be analysed with tools available at the EBI, within a single analysis pipeline.

In silico workflows

One possible solution to the problem of integrating heterogeneous resources is the use of in silico workflows. The use of workflows in science has only emerged over the last few years and addresses different concerns to workflows used within the business sector. Rather than co-ordinating the management and transactions between corporate resources, scientific workflows are used to automate the analysis of data through multiple, distributed data resources in order to execute complex in silico experiments.

Workflows provide a mechanism for accessing remote third-party services and components. This in turn reduces the overheads of downloading, installing, and maintaining resources locally whilst ensuring access to the latest versions of data and tools. Additionally, much of the computation happens remotely, on dedicated servers. This allows complex and computationally intensive workflows to be executed from basic desktop or laptop computers. As a result, researchers are not held back by a lack of computational resources or access to data.

A workflow provides an abstracted view over the experiment being performed. It describes what analyses will be executed, not the low-level details of how they will be executed; the user does not need to understand the underlying code, only the scientific protocol. This protocol can be easily understood by others, so it can be reused, or even altered and repurposed. Workflows are a suitable technology wherever scientists need to automate data processing through a series of analysis steps. Such mechanisms have the potential to increase the rate of data analysis, from a cottage-scale to an industrial-scale operation.
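The idea of a workflow as a protocol, separate from the code that carries out each step, can be illustrated with a minimal Python sketch. Everything here is hypothetical and for illustration only: the step names, the toy accession `TOY001`, and the hard-coded sequence stand in for remote services, and this is not how any particular workflow system is implemented.

```python
from typing import Any, Callable, List, Tuple

# A step is a (name, function) pair; every name here is hypothetical.
Step = Tuple[str, Callable[[Any], Any]]


def fetch_sequence(accession: str) -> str:
    """Stand-in for a remote data service: returns a hard-coded toy sequence."""
    toy_database = {"TOY001": "atgcgcaa"}
    return toy_database[accession]


def normalise(sequence: str) -> str:
    """Upper-case the sequence so later steps see a consistent alphabet."""
    return sequence.upper()


def gc_fraction(sequence: str) -> float:
    """Fraction of bases in the sequence that are G or C."""
    return sum(base in "GC" for base in sequence) / len(sequence)


# The protocol: which analyses run, and in what order.
PROTOCOL: List[Step] = [
    ("fetch", fetch_sequence),
    ("normalise", normalise),
    ("gc_fraction", gc_fraction),
]


def run_workflow(steps: List[Step], data: Any) -> Any:
    """Pass the data through each step in turn and return the final result."""
    for _name, step in steps:
        data = step(data)
    return data


if __name__ == "__main__":
    print(run_workflow(PROTOCOL, "TOY001"))
```

Because the protocol is just an ordered list of named steps, another researcher could reuse it unchanged, or repurpose it by swapping one step for another, without reading the body of any step function.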

There are many workflow management systems available in the scientific domain, including Taverna (Hull et al. 2006), Kepler (Altintas et al. 2004) and Triana (Taylor et al. 2003). Taverna, developed by the myGrid consortium (http://www.mygrid.org.uk/), is a workflow system that was built with the Life Sciences in mind, but it has since been used in other fields as well, including Physics, Astronomy and Chemistry. Like many others, the Taverna Workbench provides:

  • an environment for designing workflows;
  • an enactment engine to execute workflows locally or remotely;
  • support for workflow design in the form of service and workflow discovery;
  • and provenance services to manage the results and events of workflow invocations.
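
The roles of an enactment engine and a provenance service in the list above can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not Taverna's API: the `enact` function and the `ProvenanceRecord` class are hypothetical names, and the steps are ordinary local functions rather than remote services.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

# A step is a (name, function) pair; all names are hypothetical.
Step = Tuple[str, Callable[[Any], Any]]


@dataclass
class ProvenanceRecord:
    """What one step invocation consumed and produced."""
    step: str
    input: Any
    output: Any


def enact(steps: List[Step], data: Any) -> Tuple[Any, List[ProvenanceRecord]]:
    """Run the steps in order, recording one provenance entry per invocation."""
    log: List[ProvenanceRecord] = []
    for name, func in steps:
        result = func(data)
        log.append(ProvenanceRecord(step=name, input=data, output=result))
        data = result
    return data, log


if __name__ == "__main__":
    # A toy three-step workflow built from standard Python callables.
    steps: List[Step] = [("strip", str.strip), ("upper", str.upper), ("length", len)]
    final, log = enact(steps, "  acgt ")
    print(final)
    for record in log:
        print(record)
```

Keeping the provenance log separate from the final result means that an unexpected output can be traced back to the step that produced it, which is the kind of question a provenance service is meant to answer.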

Understanding disease resistance in model organisms

Taverna workflows are used in many areas of Life Science research, notably for research into genotype-phenotype correlations, proteomics, genome annotation, and Systems Biology. The following case study demonstrates the use of Taverna workflows in the Life Sciences domain for genotype-phenotype studies (Stevens et al. 2008).

Source:  OpenStax, Research in a connected world. OpenStax CNX. Nov 22, 2009 Download for free at http://cnx.org/content/col10677/1.12