[dxwg] Question: how to catalog relational database data in DCAT? (#1240) from ds-merck via GitHub on 2020-06-14 (public-dxwg-wg@w3.org from June 2020)

From: ds-merck via GitHub <sysbot+gh@w3.org>
Date: Sun, 14 Jun 2020 12:54:31 +0000
To: public-dxwg-wg@w3.org
Message-ID: <issues.opened-638346959-1592139270-sysbot+gh@w3.org>

ds-merck has just created a new issue for https://github.com/w3c/dxwg:

== Question: how to catalog relational database data in DCAT? ==
Dear DCAT team,
I have a question regarding the correct use of DCAT to catalog data sitting in relational databases such as Oracle/MySQL/Postgres and data lake engines such as Apache Hive.
I aim to use the Magda open source data catalog (https://magda.io/) to catalog datasets across our organization.
My idea is to model either a single database table or an entire database as a distribution of a dataset. This could apply to both openly accessible and private databases.

To refer to an **entire database**, I could probably just use the JDBC connection string as the "accessURL", e.g. _jdbc:oracle:thin:@hostname:1521:my-database_ or _jdbc:hive2://hostname:8443/my-database_ (or would that even be allowed as a "URL"?).

However, if I want to refer to a **single table**, things get more complicated. A JDBC string cannot refer to an individual table, so one option could be to combine the "accessURL" for the database with the table name in the "title". However, from my understanding of the distribution attributes, "title" is meant to hold a human-readable name for the distribution, which might differ from the technical table name. Of course, I could also artificially append the table name to the "accessURL" or put it in the description, but neither of these would be machine-readable.
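
To make the two table-level options concrete, here is a rough sketch in Python that builds JSON-LD-style distribution records. The table name `my_table`, the `#` fragment convention in the second option, and the function names are my own assumptions for illustration; nothing here is prescribed by DCAT or Magda.

```python
import json

# Example JDBC string from this message; the table name is hypothetical.
DATABASE_URL = "jdbc:hive2://hostname:8443/my-database"
TABLE_NAME = "my_table"


def distribution_with_title(database_url, table_name):
    """Option 1: database in accessURL, technical table name in the title."""
    return {
        "@type": "dcat:Distribution",
        "dcat:accessURL": database_url,
        "dcterms:title": table_name,
    }


def distribution_with_fragment(database_url, table_name):
    """Option 2: table name appended to accessURL (assumed '#' convention)."""
    return {
        "@type": "dcat:Distribution",
        "dcat:accessURL": f"{database_url}#{table_name}",
        "dcterms:title": "Human-readable name of the table",
    }


if __name__ == "__main__":
    for build in (distribution_with_title, distribution_with_fragment):
        print(json.dumps(build(DATABASE_URL, TABLE_NAME), indent=2))
```

In the first record the title is no longer free to carry a descriptive name, and in the second a consumer has to know my ad hoc convention to split the table name off again, which is exactly the problem described above.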

In order to apply DCAT correctly, what would be your proposal?

Thanks and kind regards,
Dominik

Please view or discuss this issue at https://github.com/w3c/dxwg/issues/1240 using your GitHub account

Received on Sunday, 14 June 2020 12:54:33 UTC