Modern computers are largely opaque: they process data but provide little visibility into the activities that lead to a particular result. This model of computing has grown increasingly untenable with regard to cyber security. Opaque computing limits insight into the nature of cyber threats and makes attacks harder to detect and defend against. The same problems arise in cloud computing environments, where opacity impairs our ability to assess the trustworthiness of migrated data and outsourced computing tasks.
Data provenance stands in opposition to data opacity and enables transparent computing. Provenance describes the actions taken on a data object from its creation up to the present. It can be used to answer a variety of historical questions about the data it describes, such as "What processes and datasets were used to generate this data?" and "In what environment was this data produced?" Provenance-aware systems automatically gather and report this metadata, providing insight into the history of each object on the system.
The security needs of provenance-aware systems are not yet fully understood. This site catalogs several efforts to address critical cyber security challenges by designing systems that support the reliable capture and management of data provenance. By introducing mechanisms that record high-integrity provenance, we aim to mitigate the insider threat, detect system intrusions, and assure the integrity and confidentiality of data.
This work was made possible through collaboration with security researchers at the following institutions. Please see the project pages for authorship details.
To encourage further experimentation with data provenance, we provide source code for each of our projects. Links to the source code can be found below. Additional project descriptions and source code releases will be posted on this site as we continue to publish our work.