Benjamin Heitmann is a post-doctoral researcher at the Fraunhofer Institute for Applied Information Technology FIT and at the Information Systems chair (Informatik 5) at RWTH Aachen University. His research interests include knowledge modelling with graph data, as well as balancing the requirements of data science and big data with the need to respect the confidentiality and privacy of end-users. To achieve this, he is working on transferring cryptographic research to state-of-the-art algorithms and industry problems in the domain of data science. He is a member of the program committees of the SEMANTiCS conference and the ESWC Semantic Web Conference. He holds a Ph.D. from the National University of Ireland, and an M.Sc. in computer science from the Karlsruhe Institute of Technology.

One of the main impediments to the uptake of data markets on which anonymised data is bought and sold is the lack of standardised meta-data for describing the details of the anonymisation and the remaining utility of the data.

For future data market places, protecting the privacy of the individuals contained in the data will be a necessary requirement, for reasons of market forces, legal compliance and ethics. Such data market places will allow individuals to monetise the data which they have collected about themselves, in addition to enabling ecosystems of services to aggregate, analyse and refine such data. This in turn has the potential to turn the “free by default” business model of the current Web on its head, by paying end-users for their data.

However, in order to enable such data markets, both sellers and buyers need to be incentivised. On the one hand, end-users need to be incentivised to sell their data, by giving them the means to protect their privacy, e.g. through anonymisation.
On the other hand, buyers need to be incentivised to buy data directly from users or indirectly from value-adding services, by giving them the means to assess the value and utility of the data before buying it. Most anonymisation approaches remove details from the data or add noise to it. So if the data has been anonymised, the buyer needs to be able to assess how much utility remains in the data.

As an example, consider a table about the electricity consumption of city dwellers. First, all personally identifiable information (PII), such as names, has to be removed. However, all other aspects of the anonymisation depend on the approach and its parameterisation. One way to continue the anonymisation process is to generalise all attributes related to location, but to keep age, gender and the electricity consumption metrics almost unmodified. This would maintain the utility of the table for social scientists who are interested in correlations between age, gender and electricity consumption. However, the table would be mostly useless for city planners, who want to know in which parts of the city future electricity demand is most likely to grow and who therefore require precise locations in the data. The city planners would require a different anonymisation of the attributes.

This example shows that the way in which the risk of each attribute is classified, and the resulting modification of the data, are important to include in the meta-data, as the buyer needs them to assess the value of the data. In addition, a buyer would also need to know the remaining utility after the anonymisation, which can be quantified e.g. by specifying the total loss of information in comparison with the original data.

The seller can facilitate this by attaching meta-data about the following aspects to the anonymised data:
(1) data model or format, and attributes/properties of the data.
(2) anonymisation approach applied to the data.
(3) parameterisation of the algorithm.
(4) risk classification of attributes in the context of the anonymisation approach.
(5) a measure of utility loss.

While it is feasible to standardise a vocabulary to describe the listed details for tabular data, the same would need to be done for other data formats and for other anonymisation approaches. To illustrate this, consider movement data which has been anonymised with regard to the geospatial resolution of the coordinates. This could be e.g. data captured by self-driving cars or by athletes engaging in sports activities like running or cycling. As a third example, consider the anonymisation of homogeneous graph data, such as social networking data, or heterogeneous graph data, such as Linked Data. Again, the details of the anonymisation are different, and would require new properties to be standardised.

While some aspects of the anonymisation approach stay similar across data models and formats and are common to all anonymisation approaches, many unique details remain. Therefore, the challenge of standardising a vocabulary to describe the aspects of anonymised data requires an active exchange between data anonymisation experts and data buyers in prospective future data markets.

As an added benefit, standardising the meta-data for such data markets also opens up the possibility of buying and selling encrypted data within future data market places. The data payload would be unavailable to a buyer until certain conditions are met, such as payment via a blockchain-based cryptocurrency, or fulfilment of a smart contract on a computation-enabled blockchain. In addition, new cryptographic approaches are emerging which could allow some forms of processing of encrypted data without decrypting it. Such approaches include homomorphic encryption and functional encryption. However, in order to enable such approaches on future data market places, meta-data is required which describes how to gain access to the data payload, i.e.
how to gain access to required keys and how to parameterise the respective cryptographic approach.

Related standardisation activities:
ISO/TC 307 (Blockchain and distributed ledger technologies): https://www.iso.org/committee/6266604.html
An open consortium of industry, government and academia to standardize homomorphic encryption: http://homomorphicencryption.org/

--
Benjamin Heitmann (Ph.D.), Post-Doctoral Researcher
Fraunhofer Institute for Applied Information Technology FIT
RWTH Aachen University
Ahornstr. 55, 52062 Aachen, Germany
Room 6012 (Building E2), Tel.: +49 241 8021528
http://dbis.rwth-aachen.de/cms/staff/heitmann
https://www.researchgate.net/profile/Benjamin_Heitmann
https://de.linkedin.com/in/benjaminheitmann
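
Illustrative sketch: the five meta-data aspects (1) to (5) listed in the statement above could be captured in a small machine-readable record, which a buyer inspects before purchasing. The Python sketch below applies this to the electricity consumption example. It is only an assumption of what such a record might look like: the class name, field names, risk class labels ("identifying", "quasi-identifying", "insensitive"), the choice of k-anonymity with k = 5 and the utility loss value are all hypothetical and do not come from any existing standard or vocabulary.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class AnonymisationMetadata:
    """Hypothetical meta-data record covering aspects (1) to (5)."""
    data_format: str               # (1) data model or format
    attributes: List[str]          # (1) attributes/properties of the data
    approach: str                  # (2) anonymisation approach
    parameters: Dict[str, float]   # (3) parameterisation of the algorithm
    risk_classes: Dict[str, str]   # (4) attribute -> assumed risk class label
    utility_loss: float            # (5) assumed scale: 0.0 = none, 1.0 = total

    def __post_init__(self) -> None:
        # Reject risk classes for attributes that are not part of the data.
        unknown = set(self.risk_classes) - set(self.attributes)
        if unknown:
            raise ValueError(f"unknown attributes: {sorted(unknown)}")
        if not 0.0 <= self.utility_loss <= 1.0:
            raise ValueError("utility_loss must lie in [0, 1]")


def suits_buyer(meta: AnonymisationMetadata,
                needed: List[str], max_loss: float) -> bool:
    """Check that all needed attributes were kept and the loss is acceptable."""
    kept = all(meta.risk_classes.get(a) == "insensitive" for a in needed)
    return kept and meta.utility_loss <= max_loss


# Hypothetical record for the electricity consumption table.
electricity = AnonymisationMetadata(
    data_format="table",
    attributes=["name", "district", "age", "gender", "kwh_per_year"],
    approach="k-anonymity",
    parameters={"k": 5},
    risk_classes={
        "name": "identifying",            # removed from the table
        "district": "quasi-identifying",  # location was generalised
        "age": "insensitive",             # kept almost unmodified
        "gender": "insensitive",
        "kwh_per_year": "insensitive",
    },
    utility_loss=0.3,
)

print(json.dumps(asdict(electricity), indent=2))
# Social scientists need age, gender and consumption.
print(suits_buyer(electricity, ["age", "gender", "kwh_per_year"], 0.5))
# City planners need precise locations.
print(suits_buyer(electricity, ["district"], 0.5))
```

With this record, the check succeeds for the social scientists and fails for the city planners, which mirrors the example in the statement: the same anonymised table has different value for different buyers, and the meta-data is what lets each buyer find this out before buying.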