Re: Monolithic vs Modular Taxonomy/Ontology

Hi everyone,

as a theoretician, I would prefer the "modular" approach, that is, 
different namespaces for each taxonomy (which is cleaner).

However, if a single taxonomy is preferred, then Harsh's proposal sounds 
good, as it resembles the solution with different namespaces/prefixes. 
Option 1) is my favorite.
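
A minimal sketch of why the two approaches look alike at the IRI level. The namespace IRIs and the `iri` helper below are hypothetical and for illustration only; they are not the actual DPV IRIs. In both schemes the generic concept and the personal data category end up with distinct IRIs.

```python
# Hypothetical namespace IRIs, for illustration only (not the real DPV IRIs).
DPV = "https://example.org/dpv#"
DPV_PD = "https://example.org/dpv-pd#"


def iri(namespace: str, local_name: str) -> str:
    """Build a full IRI from a namespace and a local name (hypothetical helper)."""
    return namespace + local_name


# Modular approach: personal data categories live in their own namespace.
modular = {
    "generic": iri(DPV, "Location"),
    "personal_data": iri(DPV_PD, "Location"),
}

# Single-vocabulary approach with the pd_Location naming style:
# one namespace, with a "pd_" prefix on the local name.
monolithic = {
    "generic": iri(DPV, "Location"),
    "personal_data": iri(DPV, "pd_Location"),
}

for name, scheme in (("modular", modular), ("monolithic", monolithic)):
    # In both schemes the two concepts are distinguishable by IRI.
    assert scheme["generic"] != scheme["personal_data"]
    print(name, scheme["generic"], scheme["personal_data"])
```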

Best regards
Piero

On 02/11/21 20:16, Harshvardhan J. Pandit wrote:
> Hello. We have a pending issue which is causing confusion to adopters 
> where terms defined as PersonalDataCategory (e.g. Location or Country) 
> are used in ways that are not compatible with stating they are personal 
> data.
> 
> For example, specifying `StorageData hasLocation Location` which makes 
> the location personal data instead of a generic location. This can be 
> fixed by using StorageLocation, but the issue stands for several other 
> concepts. We've renamed these as and when they arose, but as more 
> concepts are added, the clashes and confusion are increasing as well.
> 
> This is an important issue for the application of DPV, and a cause of 
> common confusion and mistakes. I appreciate any thoughts/suggestions as 
> you may have, but we need to make a decision by December if DPV is to 
> progress towards showing applications.
> 
> In recent meeting calls [1] we decided to go with option #3 to resolve 
> personal data categories clashing with other concepts (see July email 
> [2]). This solution consists of prefixing/suffixing the category IRIs 
> with something akin to "pd_", so that "Location" becomes "pd_Location", which 
> differentiates it from storage / company location.
> 
> The discussion consisted of going over the alternatives, and choosing 
> this option since present members preferred having the data categories 
> within DPV rather than as a separate taxonomy (preferring single 
> monolithic vocabulary provision). The argument for this was that DPV can 
> provide commonly used/needed categories, and thus the 'single 
> vocabulary' packaging is attractive for adopters.
> 
> Some stylistic options discussed:
> 1) pd_Location
> 2) Location_pd
> 3) LocationPD
> 4) PDLocation
> 5) PD_Location
> 6) LocationData (from GitHub Issue #27 [3])
> 7) LocationPersonalData
> 
> Personally, I still prefer providing personal data categories as a 
> separate taxonomy that can grow on its own, but if the group consensus 
> is to choose from this list, I'd pick #1 and #4 (in that order).
> 
> Again, suggestions for how to resolve this are needed, are welcome, and 
> appreciated.
> 
> [1] https://www.w3.org/2021/10/13-dpvcg-minutes.html
> [2] https://lists.w3.org/Archives/Public/public-dpvcg/2021Jul/0006.html
> [3] https://github.com/w3c/dpv/issues/27
> 
> Regards,
> Harsh
> 
> 
> On 29/07/2021 11:18, Harshvardhan J. Pandit wrote:
>> Hello.
>> As DPV continues to grow, we're reaching a stage where there is a 
>> noticeable impact in terms of personal data categories and other 
>> concepts. The approach of renaming personal data categories or other 
>> non-data concepts can only work so many times, and will eventually 
>> cause confusion among adopters. What are potential solutions for this?
>>
>> For example, (i) Certification as Personal Data and (ii) Certification 
>> as Organisational Measure. We resolved this by renaming (i) to 
>> ProfessionalCertification. This measure cannot always be used.
>>
>> Another example, (i) Location as personal data category, and (ii) 
>> Location for indicating personal data storage. We resolved this by 
>> avoiding (i): we did not give the hasStorage property any range, and 
>> defined StorageLocation instead.
>>
>> The problem if we define both concepts using the same label or IRI: 
>> any time someone wants to specify a Certification or Location for data 
>> storage or transfer, it is defined as personal data as well, or the 
>> label causes confusion and they use the wrong concept. Not a good 
>> design IMHO.
>>
>> Solutions:
>> 1) Keep only the 'top-tier' personal data taxonomy in DPV and move 
>> others outside into a dpv-personal-data extension. This is my 
>> preferred approach because it keeps concepts in other modules (E.g. 
>> technical measures) with the commonly used words without overlap with 
>> personal data. AFAIK the issue only exists with overlap between 
>> personal data categories and other concepts.
>>
>> 2) Keep only the 'top-tier' concepts for all modules and move other 
>> concepts outside into specific taxonomies. Not my preferred option 
>> because it means adopters need to import a lot of vocabularies to get 
>> commonly used concepts e.g. technical measures.
>>
>> 3) Keep concepts as they are, with same label for multiple concepts in 
>> different modules, but different IRI. E.g. pd_Location for personal 
>> data categories and Location for the generic concept.
> 
> 

Received on Wednesday, 3 November 2021 06:56:37 UTC