W3C

– DRAFT –
DPVCG Meeting Call

17 February 2021

Attendees

Present
marklizar
Regrets
-
Chair
harsh
Scribe
harsh

Meeting minutes

Open Issues/Actions on W3C tracker

see issue/tracker https://www.w3.org/community/dpvcg/track/issues/open

Presentation by Rana

On: Building a Corpus of Physical Health Data Disclosure on Twitter during the Covid-19 Pandemic

rana: users share personal data on online social networks (OSN), which exposes them to privacy threats

rana: NLP can help with detecting such threats, more specifically: discrimination in job searches, harassment, bullying, identity theft, misuse of health information

rana: detection of personal data or PII is a (current and topical) challenge

rana: PHDD: A Corpus of Physical Health Data Disclosure on Twitter during COVID-19

rana: tweets were collected using keywords, hashtags, and regex, and were tagged based on criteria regarding health information or subject
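The collection and tagging step described above might look roughly like the following. This is a minimal sketch only: the keyword list, hashtag list, regex, and tag names are hypothetical placeholders, since the minutes do not record the actual criteria used for PHDD.

```python
import re

# Hypothetical keyword and hashtag lists; the real PHDD lists are not in the minutes.
KEYWORDS = {"fever", "cough", "tested positive"}
HASHTAGS = {"#covid19", "#coronavirus"}

# Hypothetical pattern for a first-person statement about one's own health.
SELF_REPORT = re.compile(r"\bi\b\s+(have|had|got|tested)\b", re.IGNORECASE)


def is_candidate(tweet: str) -> bool:
    """Keep a tweet if it contains a health keyword or a tracked hashtag."""
    text = tweet.lower()
    has_keyword = any(keyword in text for keyword in KEYWORDS)
    has_hashtag = bool(set(re.findall(r"#\w+", text)) & HASHTAGS)
    return has_keyword or has_hashtag


def tag(tweet: str) -> dict:
    """Attach illustrative tags: is health information disclosed, and about whom."""
    text = tweet.lower()
    return {
        "text": tweet,
        "discloses_health_info": any(keyword in text for keyword in KEYWORDS),
        "subject": "self" if SELF_REPORT.search(text) else "other_or_unknown",
    }


if __name__ == "__main__":
    for tweet in ["I have a cough #covid19", "My mother has a fever", "Lovely weather"]:
        if is_candidate(tweet):
            print(tag(tweet))
```

A rule-based filter like this only produces candidates; the tagging criteria mentioned in the presentation would still need to be applied to each candidate.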

rana: we published the corpus in RDF and JSON; for RDF we created a lightweight ontology, "privacy tags for health information"
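To illustrate publishing the same record in both JSON and RDF, here is a small sketch using only the Python standard library. The namespace `https://example.org/phdd#`, the property names `text` and `hasPrivacyTag`, and the tag `PhysicalHealth` are assumptions made for the example; they are not the IRIs of the published ontology.

```python
import json

# Hypothetical namespace; the real ontology IRIs are not given in the minutes.
EX = "https://example.org/phdd#"

# Hypothetical corpus record.
record = {"id": "tweet-1", "text": "I have a cough", "tag": "PhysicalHealth"}


def to_json(rec: dict) -> str:
    """Serialise the record as a JSON object."""
    return json.dumps(rec, sort_keys=True)


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes, and line breaks for an N-Triples string literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples(rec: dict) -> str:
    """Serialise the record as two N-Triples statements."""
    subject = "<" + EX + rec["id"] + ">"
    literal = '"' + escape_literal(rec["text"]) + '"'
    tag_iri = "<" + EX + rec["tag"] + ">"
    lines = [
        subject + " <" + EX + "text> " + literal + " .",
        subject + " <" + EX + "hasPrivacyTag> " + tag_iri + " .",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_json(record))
    print(to_ntriples(record))
```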

rana: DPV provides broad categories for personal data and health categories (physical, mental health)

rana: used HL7 concepts regarding confidentiality and sensitivity

Health Level Seven International (HL7) is an organisation that develops standards for health data https://www.hl7.org/

rana: descriptive blog with example at https://ranasaniei.now.sh/posts/corpus

rana: future work includes using supervised ML techniques to detect sensitive health information, notifying users if shared content is sensitive, and implementing a fine-grained access control mechanism

Q&A / discussion

beatriz: why did you not use the HL7 concepts to tag the tweets?

rana: for this work, we focused on the data subject, whereas the concepts from HL7 regarding sensitivity or confidentiality are relevant for information providers

beatriz: how do you measure confidentiality?

rana: based on the contents or related concepts in sensitive data categories, e.g. special categories in GDPR

paul: why did you break up the text into nouns, pronouns, etc.?

rana: this information is relevant and useful for NLP tasks

harsh: regarding HL7, is this the same as FHIR, or are they two separate things?

rana: we used HL7 v4 privacy and security ontology

harsh: do you think we should take up some concepts from HL7 related to privacy/sensitivity?

rana: HL7 is more focused on healthcare, whereas DPV is more generic

terms proposed by Beatriz and Rana

See https://lists.w3.org/Archives/Public/public-dpvcg/2021Feb/0005.html

beatriz: Rana and I have been analysing fitness app policies, and we extracted terms missing in DPV

There are a lot of fine-grained concepts which might not be suitable for DPV

Date of birth is relevant to age

Some others, e.g. water intake, are not immediately relevant, as they can be broken down into more specific related concepts, e.g. number of glasses

We seem to have a gap between the existing concepts regarding Health, Medical Health, Physical Health, etc. and the concepts proposed

We need to figure out the proper structure for this in terms of hierarchy
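One way to reason about where proposed terms would sit is to write the candidate hierarchy down as child-to-parent links and inspect the resulting chains. The sketch below is purely illustrative: the concept names and their placement (for example, `DateOfBirth` under `Age`, or `WaterIntake` under `PhysicalHealth`) are assumptions for discussion, not decisions of the group or existing DPV structure.

```python
# Hypothetical child -> parent links; names and placement are for illustration only.
PARENT = {
    "Health": "PersonalData",
    "PhysicalHealth": "Health",
    "WaterIntake": "PhysicalHealth",
    "Age": "PersonalData",
    "DateOfBirth": "Age",
}


def ancestors(concept: str) -> list:
    """Return the chain of broader concepts, from nearest parent up to the root."""
    chain = []
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def depth(concept: str) -> int:
    """Number of steps from the concept up to the root of the hierarchy."""
    return len(ancestors(concept))


if __name__ == "__main__":
    for term in ["WaterIntake", "DateOfBirth"]:
        print(term, "->", " -> ".join(ancestors(term)))
```

Listing the depth of each proposed term can help spot concepts that are much finer-grained than their neighbours.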

next meeting

We will move to having a meeting next week 24-FEB 13:00 WET / 14:00 CET

Tentative topic for the agenda is continued discussion on proposed topics

RDF Vocab tool by Nishad

nishad: https://github.com/zazuko/rdf-vocabularies tool for working with ontologies as datasets / prefixes

nishad: I have added DPV to the tool, so all terms from DPV show up within the term/vocab finder

nishad: so when we update the vocabulary, the URLs must stay the same across updates
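The stability requirement can be checked mechanically by comparing the set of term IRIs before and after an update. The following is a minimal sketch; the IRIs use a placeholder `example.org` namespace and are not real DPV terms, and how the two term lists are obtained is left open.

```python
def removed_iris(old_terms, new_terms):
    """Return IRIs present in the old release but missing from the new one."""
    return sorted(set(old_terms) - set(new_terms))


if __name__ == "__main__":
    # Placeholder IRIs for illustration only.
    old = ["https://example.org/vocab#Age", "https://example.org/vocab#Health"]
    new = ["https://example.org/vocab#Health", "https://example.org/vocab#WaterIntake"]

    missing = removed_iris(old, new)
    if missing:
        print("IRIs that disappeared in the update:", missing)
```

An empty result means every previously published IRI still exists, so tools that indexed the earlier release keep resolving the same terms.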

Minutes manually created (not a transcript), formatted by scribe.perl version 127 (Wed Dec 30 17:39:58 2020 UTC).

Diagnostics

Maybe present: beatriz, harsh, nishad, paul, rana