Workshop on Provenance in Scientific Simulations
Fall 2010
Organiser: Matthias Troyer
Record keeping has always been an essential component of science and engineering, but it has become even more so recently. As computers get faster, we perform increasingly complex computations, and as storage gets cheaper, we accumulate larger volumes of data. The complete process, from data acquisition through analysis, is inherently exploratory: users experiment with different simulation models, parameters, and data mining and visualization techniques. But when they find an interesting result, it can be hard, without a detailed record, to remember which of the many trial-and-error paths produced it. For complex computations that manipulate a lot of data, the traditional laboratory notebook and other manual approaches to maintaining this information just aren’t feasible.
Ad-hoc approaches to the construction of computational tasks have been widely used, but have serious limitations. In particular, scientists and engineers need to expend substantial effort managing data (e.g., scripts that encode the tasks, raw data, data products, and notes) and recording provenance information so that basic questions can be answered, such as: Who created this data product and when? When was it modified and by whom? What was the process used to create the data product? Were two data products derived from the same raw data? This process is not only time-consuming but also error-prone.
Systematic mechanisms for capturing this information are at the heart of a new field of research called computational provenance. Most dictionaries define provenance as an object’s source or origin: a record of an item’s ultimate derivation and passage through various owners. Ultimately, provenance helps determine an object’s value, accuracy, and authorship. But in addition to enabling reproducibility of results, provenance for computational tasks and the data they manipulate and derive has other benefits as well. In particular, it helps users interpret and understand results; in some cases, the provenance can be more important than the results themselves.
In this workshop we want to bring together computer scientists in the field of provenance research, developers of provenance-enabled systems, and application scientists performing large-scale computer simulations, to facilitate an exchange of ideas. The goal is to make application scientists aware of current results in provenance research, and to make computer scientists developing provenance systems aware of the needs of large-scale applications. The workshop will feature talks, open discussion sessions, and presentations of provenance-enabled systems.