Meeting minutes
Open Issues/Actions on W3C tracker
see issue/tracker https://
Presentation by Rana
On: Building a Corpus of Physical Health Data Disclosure on Twitter during the Covid-19 Pandemic
rana: users share personal data on online social networks (OSNs), which exposes them to privacy threats
rana: NLP can help detect such threats, specifically: discrimination against job seekers, harassment, bullying, identity theft, and misuse of health information
rana: detecting personal data or PII is a current and topical challenge
rana: PHDD: A Corpus of Physical Health Data Disclosure on Twitter during COVID-19
rana: tweets were collected using keywords, hashtags, and regular expressions, and were tagged according to criteria regarding the health information or its subject
rana: we published the corpus in RDF and JSON; for the RDF version we created a lightweight ontology, "privacy tags for health information"
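The keyword/hashtag/regex collection step described above could look roughly like the sketch below. It is only an illustration: the keyword list, hashtag set, and disclosure pattern are hypothetical placeholders, since the minutes do not give the actual PHDD collection terms, and the later tagging step is not modelled.

```python
import re

# Hypothetical collection terms: the minutes do not list the actual PHDD keywords,
# hashtags, or regular expressions.
KEYWORDS = {"tested positive", "fever", "symptoms"}
HASHTAGS = {"#covid19", "#covidpositive"}
# Hypothetical pattern for first-person health disclosures, e.g. "my cough is worse".
DISCLOSURE_RE = re.compile(
    r"\b(i|i'm|my|me)\b.{0,30}\b(covid|fever|cough|positive)\b", re.IGNORECASE
)


def matches(text: str) -> bool:
    """Return True if a tweet matches any keyword, hashtag, or regex rule."""
    lowered = text.lower()
    if any(keyword in lowered for keyword in KEYWORDS):
        return True
    hashtags = {tag.lower() for tag in re.findall(r"#\w+", text)}
    if hashtags & HASHTAGS:
        return True
    return DISCLOSURE_RE.search(text) is not None


def collect(tweets: list[str]) -> list[str]:
    """Keep only candidate tweets; tagging would happen in a separate later step."""
    return [tweet for tweet in tweets if matches(tweet)]


sample = [
    "I tested positive today",
    "Nice weather this morning",
    "Staying home #COVID19",
    "my cough is worse",
]
print(collect(sample))
# ['I tested positive today', 'Staying home #COVID19', 'my cough is worse']
```

Combining cheap keyword and hashtag checks with a broader regex keeps recall high; precision would then come from the manual tagging against the health-information criteria.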
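One way to picture publishing the same tagged tweet in both JSON and RDF is the sketch below, using only the Python standard library and N-Triples output. The ontology namespace, the property names (`text`, `hasHealthCategory`), and the tweet IRI pattern are hypothetical; the DPV personal-data namespace is an assumption and should be checked against the current DPV release.

```python
import json

# Hypothetical namespace standing in for the "privacy tags for health information"
# ontology; the real IRI is not given in the minutes.
PTHI = "https://example.org/pthi#"
# Assumed DPV personal-data namespace; check the current DPV release for exact terms.
DPV_PD = "https://w3id.org/dpv/dpv-pd#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(text: str) -> str:
    """Escape a string for use as an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_json(tweet_id: str, text: str, category: str) -> str:
    """JSON record for one tagged tweet."""
    return json.dumps({"id": tweet_id, "text": text, "healthCategory": category})


def to_ntriples(tweet_id: str, text: str, category: str) -> str:
    """The same record as N-Triples, linking the tweet to a DPV health category."""
    subject = f"<https://example.org/tweet/{tweet_id}>"
    return "\n".join([
        f"{subject} <{RDF_TYPE}> <{PTHI}Tweet> .",
        f'{subject} <{PTHI}text> "{escape_literal(text)}" .',
        f"{subject} <{PTHI}hasHealthCategory> <{DPV_PD}{category}> .",
    ])


print(to_json("1", "I have a fever", "PhysicalHealth"))
print(to_ntriples("1", "I have a fever", "PhysicalHealth"))
```

Pointing the category at a DPV term rather than a local string is what lets the RDF version reuse DPV's broad health categories.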
rana: DPV provides broad categories for personal data, including health categories (physical health, mental health)
rana: we used HL7 concepts regarding confidentiality and sensitivity
Health Level Seven International (HL7) develops standards for exchanging health data https://
rana: a descriptive blog post with an example is at https://
rana: future work includes using supervised ML techniques to detect health-sensitive information, notifying users when content they share is sensitive, and implementing a fine-grained access control mechanism
Q&A / discussion
beatriz: why did you not use the HL7 concepts to tag the tweets?
rana: for this work, we focused on the data subject, whereas the HL7 concepts regarding sensitivity or confidentiality are relevant for information providers
beatriz: how do you measure confidentiality?
rana: based on the content, or on related concepts in sensitive data categories, e.g. the special categories of personal data in GDPR
paul: why did you break up the text into nouns, pronouns, etc.?
rana: this information is relevant and useful for NLP tasks
harsh: regarding HL7, is this the same as FHIR, or are they two separate things?
rana: we used HL7 v4 privacy and security ontology
harsh: do you think we should take up some concepts from HL7 related to privacy/sensitivity?
rana: HL7 is more focused on healthcare, whereas DPV is more generic
terms proposed by Beatriz and Rana
See https://
beatriz: Rana and I have been analysing the policies of fitness apps, and we extracted terms missing from DPV
Many of these are fine-grained concepts which might not be suitable for DPV
Date of birth is relevant to age
Others are not immediately relevant, e.g. water intake, as they can be broken down into more specific concepts, e.g. number of glasses
There seems to be a gap between the existing concepts regarding Health, Medical Health, Physical Health, etc. and the proposed concepts
We need to work out the proper hierarchy for these concepts
next meeting
The next meeting will be held next week, 24-FEB 13:00 WET / 14:00 CET
Tentative topic for the agenda is continued discussion on proposed topics
RDF Vocab tool by Nishad
nishad: https://
nishad: I have added DPV to the tool, so all terms from DPV show up within the term/vocab finder
nishad: so when we update the vocabulary, the term URLs must stay the same across updates
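The requirement that term URLs stay stable across vocabulary updates can be checked mechanically before a release. Below is a minimal sketch, assuming both releases are available as N-Triples dumps and that every term is declared with an `rdf:type` triple; the `example.org` IRIs are hypothetical, and this is not a feature of Nishad's tool.

```python
import re

# Matches N-Triples lines of the form "<subject> rdf:type <object> ." and captures the subject IRI.
TYPE_TRIPLE = re.compile(
    r"^<([^>]+)>\s+<http://www\.w3\.org/1999/02/22-rdf-syntax-ns#type>\s+<[^>]+>\s*\.$"
)


def term_iris(ntriples: str) -> set[str]:
    """Collect the IRIs of all typed terms in an N-Triples dump."""
    iris = set()
    for line in ntriples.splitlines():
        match = TYPE_TRIPLE.match(line.strip())
        if match:
            iris.add(match.group(1))
    return iris


def removed_iris(old_release: str, new_release: str) -> list[str]:
    """Term IRIs present in the old release but missing from the new one."""
    return sorted(term_iris(old_release) - term_iris(new_release))


TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
CLASS = "<http://www.w3.org/2000/01/rdf-schema#Class>"
old = f"<https://example.org/vocab#Age> {TYPE} {CLASS} .\n<https://example.org/vocab#Health> {TYPE} {CLASS} ."
new = f"<https://example.org/vocab#Age> {TYPE} {CLASS} .\n<https://example.org/vocab#HealthData> {TYPE} {CLASS} ."
print(removed_iris(old, new))  # ['https://example.org/vocab#Health']
```

A non-empty result flags a renamed or dropped term whose URL would break for tools that already index the vocabulary.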