Benjamin Heitmann is a post-doctoral researcher at the Fraunhofer
Institute for Applied Information Technology FIT and at the
Information Systems chair (Informatik 5) at RWTH Aachen University.

His research interests include knowledge modelling with graph data, as
well as balancing the requirements of data science and big data with
the need to respect confidentiality and privacy of the end-users. In
order to achieve this, he is working on transferring cryptographic
research to state-of-the-art algorithms and industry problems in the
domain of data science.

He is a member of the program committee for the SEMANTiCS conference
and the ESWC Semantic Web Conference. He holds a Ph.D. from the
National University of Ireland, and an M.Sc. in computer science from
the Karlsruhe Institute of Technology.


One of the main impediments to the uptake of data markets on which
anonymised data is bought and sold is the lack of standardised
meta-data to describe the details of the anonymisation and the
remaining utility of the data.


In order to enable future data market places, protecting the privacy
of individuals contained in the data will be a necessary requirement,
for reasons of market forces, legal compliance and ethics.

Such data market places will allow individuals to monetise the data
which they have collected about themselves, in addition to enabling
ecosystems of services to aggregate, analyse and refine such data.
This in turn has the potential to flip the “free by default” business
model of the current Web on its head, by paying end-users for their
data.


However, in order to enable such data markets, both sellers and buyers
need to be incentivised. On the one hand, end-users need to be
incentivised to sell their data, by providing them with the means to
protect their privacy, e.g. through anonymisation. On the other hand,
buyers need to be incentivised to buy data directly from users or
indirectly from value-adding services, by providing them with the
means to assess the value and utility of the data before buying it.


Most anonymisation approaches remove details of the data or add noise
to it. So if the data has been anonymised, the buyer needs to be able
to assess how much utility is remaining in the data.


As an example, consider a table about the electricity consumption of
city dwellers.

First, all personally identifiable information (PII), such as the
name, has to be removed.

However, all other aspects of the anonymisation depend on the approach
and the parameterisation.

One way to continue the anonymisation process is to generalise all
attributes related to the location, but to keep age, gender and
electricity consumption metrics almost unmodified. This would maintain
the utility of the table for social scientists who are interested in
correlations between age, gender and electricity consumption. However,
the table would be mostly useless for city planners who want to know
in which parts of the city future electricity demand is most likely to
grow, and who require precise locations in the data. The city planners
would require a different anonymisation of the attributes.
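
The first variant of this example can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the
attribute names, the sample rows and the choice of masking a postcode
are hypothetical and not part of any existing vocabulary or tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    # Hypothetical attributes of the electricity consumption table.
    name: str
    postcode: str
    age: int
    gender: str
    kwh_per_year: float


def anonymise_for_social_science(record: Record, keep_digits: int = 2) -> dict:
    """Drop the name, generalise the postcode, keep the other attributes."""
    hidden = len(record.postcode) - keep_digits
    return {
        "postcode": record.postcode[:keep_digits] + "*" * hidden,
        "age": record.age,
        "gender": record.gender,
        "kwh_per_year": record.kwh_per_year,
    }


# Made-up sample rows, for illustration only.
rows = [
    Record("A. Example", "52062", 34, "f", 2100.0),
    Record("B. Example", "52074", 61, "m", 3450.0),
]
anonymised = [anonymise_for_social_science(r) for r in rows]
```

Both rows end up with the same generalised postcode, so correlations
between age, gender and consumption survive, while the location detail
needed by the city planners is lost.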


This example shows that the way in which the risk of attributes is
classified, and the resulting modification of the data, are important
to include in the meta-data, as the buyer requires them to assess the
value of the data.


In addition, a buyer would also need to know the remaining utility
after the anonymisation, which can be quantified e.g. by specifying
the total loss of information in comparison with the original data.
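
One simple way to quantify such a loss, given here only as a sketch
and not as a proposed standard measure, is to compare the width of
each generalised value with the width of the whole attribute domain.
The function names, the age bands and the assumed domain of 0 to 100
years are hypothetical.

```python
def interval_loss(low: float, high: float,
                  domain_low: float, domain_high: float) -> float:
    """Loss of one generalised value: 0.0 if exact, 1.0 if it spans the domain."""
    return (high - low) / (domain_high - domain_low)


def total_loss(intervals, domain_low: float, domain_high: float) -> float:
    """Average loss over all generalised values of one attribute."""
    losses = [interval_loss(lo, hi, domain_low, domain_high)
              for lo, hi in intervals]
    return sum(losses) / len(losses)


# Two ages generalised to ten-year bands, one age left exact.
ages = [(30, 40), (60, 70), (25, 25)]
loss = total_loss(ages, domain_low=0, domain_high=100)
```

A single number of this kind could be published with the data, so that
a buyer can compare differently anonymised versions of the same table.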


The seller can facilitate this by attaching meta-data about the
following aspects to the anonymised data:

(1) data model or format and attributes/properties of the data.

(2) anonymisation approach applied to data.

(3) parameterisation of the algorithm.

(4) risk classification of attributes in the context of the
anonymisation approach.

(5) a measure of utility loss.
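
As a rough sketch of what such a meta-data record could look like,
the five aspects can be captured in a small Python structure and
serialised to JSON. All field names and values are assumptions made
for this illustration; k-anonymity and its attribute classes are used
merely as one well-known example of an anonymisation approach.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class AnonymisationMetadata:
    data_format: str                     # (1) data model or format
    attributes: List[str]                # (1) attributes of the data
    approach: str                        # (2) anonymisation approach
    parameters: Dict[str, float]         # (3) parameterisation
    risk_classification: Dict[str, str]  # (4) risk class per attribute
    utility_loss: float                  # (5) measure of utility loss


# Hypothetical values for the electricity consumption table.
meta = AnonymisationMetadata(
    data_format="text/csv",
    attributes=["postcode", "age", "gender", "kwh_per_year"],
    approach="k-anonymity",
    parameters={"k": 5},
    risk_classification={
        "postcode": "quasi-identifying",
        "age": "quasi-identifying",
        "gender": "quasi-identifying",
        "kwh_per_year": "sensitive",
    },
    utility_loss=0.07,
)

serialised = json.dumps(asdict(meta), indent=2)
```

A standardised vocabulary would fix the names and permitted values of
such fields, so that buyers can compare offers from different sellers.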


While it is feasible to standardise a vocabulary to describe the
listed details for tabular data, the same would need to be done for
other data formats and for other anonymisation approaches. To
illustrate this, consider movement data which has been anonymised with
regard to the geospatial resolution of the coordinates. This could be
e.g. data captured by self-driving cars or by athletes
engaging in sports activities like running or cycling. As a third
example, consider anonymisation of homogeneous graph data, such as
social networking data, or heterogeneous graph data, such as Linked
Data. Again, the details of the anonymisation are different, and would
require new properties to be standardised.
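
For the movement data example, a reduction of geospatial resolution
could be sketched as follows. This is a deliberately simple
illustration: the function name, the sample track and the choice of
rounding to two decimal places are assumptions, and rounding alone
should not be read as a sufficient privacy guarantee.

```python
def coarsen(lat: float, lon: float, decimals: int = 2) -> tuple:
    """Round a coordinate pair to the given number of decimal places.

    Two decimal places correspond to roughly 1.1 km in latitude.
    """
    return (round(lat, decimals), round(lon, decimals))


# Made-up track of three points, for illustration only.
track = [(50.77912, 6.06013), (50.77934, 6.06051), (50.78102, 6.06544)]
coarse = [coarsen(lat, lon) for lat, lon in track]
```

Here the number of retained decimal places is the parameter that a
buyer would need to find in the meta-data, and it has no counterpart
in the tabular example.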


While there are some aspects related to the anonymisation approach
which stay similar between data models and formats, and which are
common to all anonymisation approaches, many unique details remain.

Therefore, the challenge of standardising a vocabulary to describe the
aspects of anonymised data requires an active exchange between data
anonymisation experts and data buyers in prospective future data
markets.


However, as an added benefit, standardising the meta-data for such
data markets also opens up the possibility of buying and selling
encrypted data within future data market places. The data payload
would be unavailable to a buyer until certain conditions are met, such
as payment via blockchain-based cryptocurrency, or fulfilment of a
smart contract in a computation-enabled blockchain.

In addition, new cryptographic approaches are emerging which could
allow some forms of processing of encrypted data without decrypting
it. Such approaches include homomorphic encryption and functional
encryption. However, in order to enable such approaches on future data
market places, meta-data is required which describes how to gain
access to the data payload, i.e. how to gain access to the required
keys and how to parameterise the respective cryptographic approach.


Related standardisation activities:

ISO/TC 307 — Blockchain and distributed ledger technologies:

https://www.iso.org/committee/6266604.html

An open consortium of industry, government and academia to standardize
homomorphic encryption:

http://homomorphicencryption.org/


--

Benjamin Heitmann (Ph.D.), Post-Doctoral Researcher
Fraunhofer Institute for Applied Information Technology FIT
RWTH Aachen University
Ahornstr. 55, 52062 Aachen, Germany
Room 6012 (Building E2), Tel.: +49 241 8021528

http://dbis.rwth-aachen.de/cms/staff/heitmann
https://www.researchgate.net/profile/Benjamin_Heitmann
https://de.linkedin.com/in/benjaminheitmann