Submitted by: Darren Bell
---------------------------------
Bio

I work for the UK Data Archive, which specialises in the curation and long-term preservation of data relevant to social scientists, particularly survey and administrative population data. I’m the repository architect at the UKDA, with a background in data theory, data modelling and full-stack development. We model schema for statistical data (e.g. using the Data Documentation Initiative – DDI4) and ontologies (e.g. SKOS vocabularies) as part of defining the full data and repository lifecycle. That lifecycle is implemented on a Hadoop infrastructure with a linked data graph as the canonical data store (a labelled property graph rather than pure RDF), which enables cross-disciplinary analysis and linkage at scale.

I am currently responsible for the design and delivery of the UK Smart Meter Research Portal infrastructure, a five-year national project to ingest and disseminate smart meter (IoT) energy data for analysis by approved researchers. This data will be enriched with contextual survey information from the Social Sciences and from other domains, such as climate data.

Organisationally, we are effecting a digital transformation from a “traditional” survey-based repository, where the totemic artefact is the SPSS file, to a fully standards-based repository where all data is flattened into a large-scale linked data graph for cross-disciplinary analysis. We work within Trusted Digital Repository and InfoSec standards like ISO27001 and CoreTrustSeal, and are adopting a number of standard vocabularies: ODRL for management of licences and IPR (extended to handle machine-actionable mediation of access to data), PREMIS for preservation metadata, and PROV for publishing provenance chains borne of QA and ingest operations.
However, in the privacy/disclosure and data linkage arenas, most of our processes and frameworks are human-mediated, executed as clerical processes that do not readily scale and are inimical to automation or even semi-automation. We are keen to adapt standards-based approaches to these areas as well.

LinkedIn: https://www.linkedin.com/in/dsaap/
---------------------------------
Your goals

As a digital repository, and in the context of this working group, these are our priority issues:

(1) There are no standards (to our knowledge) currently available that formally model and describe user-given consent, such that:
    (a) usage of data by administrators, researchers and other parties can be brokered deterministically, rather than by human-mediated committees;
    (b) the model could intersect with a typology of data linkage operations, to determine which linkage operations are actually permissible.
(2) There are no machine-actionable mechanisms to handle disclosure risk as an emergent property of data linkage operations. That is to say, existing classifications of disclosure risk are made in situ for single datasets, but once datasets are joined, combined or linked, there do not appear to be any empirical approaches to interpreting the disclosure risk of the resultant outputs, informed by a typology of linkage operations. Techniques like differential privacy use a different paradigm, in which the data is partially obfuscated while analytical consistency is maintained between the original and the synthetic data. However, this approach is not acceptable to many Social Science researchers, and it introduces issues around reproducibility and repeatability.

In this workshop, we hope to:

(1) gain a better understanding of the current landscape around W3C privacy modelling initiatives by networking with people with similar domain interests;
(2) assess the current common understanding of often fuzzily differentiated terms like Access, Privacy, Rights and Linkage, and reach an agreed approach to clarifying them more formally;
(3) assess how the UK Data Service can make a constructive and substantive contribution to the development of vocabularies specifically for consent handling and for data linkage operations (whether performed locally or executed remotely), informed by a real national-scale project that we are currently delivering.
---------------------------------
Workshop Goals

(1) A number of problem statements identifying where existing vocabularies do not adequately describe privacy and consent, and what the benefits of developing those vocabularies will be, both at an operational level (e.g. providing machine-actionable and interoperable descriptions) and at a macro policy level (e.g. addressing public concerns over how privacy can be guaranteed in an age of petabyte-scale data linkage).
(2) Agreement on baseline definitions for terms such as Privacy, Linkage, Consent, Access and Disclosure, among others, to inform further work on specific vocabularies.
(3) A first-cut roadmap for prioritising the delivery of relevant vocabularies, and how they might align or intersect with existing vocabularies like ODRL.
---------------------------------
Your interests

Please select the rank-order (1 to 10) for the options you think are acceptable (i.e. you can live with it), where 1 is the most preferred, 2 the next best and so on...
* Vocabularies to model privacy policies, regulations, and involved (business) processes: [ Ranked 2 ]
* Identity management vocabularies: [ Ranked 4 ]
* Modeling personal data usage, processing, sharing, and tracking: [ Don’t mind ]
* Interlinking aspects of privacy and provenance: [ Ranked 3 ]
* Modeling consent and making it transportable: [ Ranked 1 ]
* New ways to put the user in control benefiting from semantic interoperability of policy information: [ Ranked 5 ]
* Modeling permissions, obligations, and their scope: [ Don’t mind ]
* Reasoning about formally declared privacy policies: [ Ranked 6 ]
* Exploring links and synergies using Linked Data vocabularies in the context of related efforts: [ Don’t mind ]
* Visualizations of data and policy information to help data self determination: [ Don’t mind ]

We are interested in all of these topics, but the killer issue for us is modelling consents (and the interoperability of that consent information).
---------------------------------
Other Thoughts