12 Jun 2018


Present: EricP, David_Booth, Pawel, Kasia, Harold_Solbrig
David Booth


IPSM and ShEx example

<scribe> ACTION: David to find a shex translation example.

Previous discussion: https://www.w3.org/2018/05/29-hcls-minutes.html

eric: We end up having to do lots of ETL tricks. It would be good to share them.

david: Yes, that's the idea behind crowdsourcing translations in the Yosemite Project vision.

eric: Terminology mapping can be treated as a simple case of translation.

pawel: The IPSM approach assumes that the input is just a dataset: it tries to match the translation against the dataset and ignores anything that doesn't match.
... But ShEx starts with a focus node and reports an error if it doesn't match.
... Would you consider an approach that does not require manually specifying the focus node?

eric: ShEx has a process that takes a focus node and a shape, and tests the node against the shape. There's also a ShapeMap, which gives you an ordered set of nodes and shapes to test.
... You could generate a shape map that tests all the nodes you think might match, but you could also generate that list some other way.
... The mechanism that lets multiple nodes be selected can use triple patterns, e.g., every node that is the object of an :occursDuring statement.
... In IPSM, how clever does it get when trying to apply the mappings?
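A minimal Python sketch of the selection Eric describes: instead of naming focus nodes by hand, take every node that is the object of an :occursDuring triple and pair it with a shape, giving an ordered list of (node, shape) pairs like a fixed shape map. Triples are plain tuples here, and the shape label `<EncounterShape>` and the sample data are hypothetical. The ShapeMap specification also lets such a selection be written directly as a query pattern, roughly `{_ :occursDuring FOCUS}@<EncounterShape>`.

```python
from typing import Iterable

# A triple is (subject, predicate, object); terms are plain strings here.
Triple = tuple[str, str, str]


def shape_map_for_objects(
    triples: Iterable[Triple], predicate: str, shape: str
) -> list[tuple[str, str]]:
    """Pair each distinct object of `predicate` with `shape`, in first-seen order."""
    nodes: dict[str, None] = {}  # dict keeps insertion order and drops duplicates
    for _subject, pred, obj in triples:
        if pred == predicate:
            nodes.setdefault(obj, None)
    return [(node, shape) for node in nodes]


# Hypothetical sample data.
data = [
    (":obs1", ":occursDuring", ":visit1"),
    (":obs2", ":occursDuring", ":visit2"),
    (":obs3", ":occursDuring", ":visit1"),
    (":visit1", ":date", '"2018-06-12"'),
]
print(shape_map_for_objects(data, ":occursDuring", "<EncounterShape>"))
# [(':visit1', '<EncounterShape>'), (':visit2', '<EncounterShape>')]
```

Generating the list separately from validation keeps the choice of candidate nodes independent of the shapes themselves, which is the flexibility Eric points out.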

pawel: It just looks at an ordered list of nodes. We plan to have it choose the most specific match, but that is not implemented yet.
... Currently a lot of intelligence goes into writing the translation files: if you put the less specific one first, it will mess things up.
... Translation removes the part of the data that was translated.
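The ordered, first-match-wins behavior Pawel describes, where translated data is consumed and unmatched data passes through untouched, can be sketched roughly as follows. This is not IPSM's implementation: real IPSM alignments are executed as SPARQL Updates over patterns of several triples, whereas this hypothetical version matches one triple at a time, and the rule names and `iiot:` output predicates are made up for illustration. Running the same rules in the opposite order shows how a less specific rule listed first shadows the more specific one.

```python
from dataclasses import dataclass
from typing import Callable

Triple = tuple[str, str, str]


@dataclass
class Rule:
    """Hypothetical translation rule: a match test plus a rewrite."""
    name: str
    matches: Callable[[Triple], bool]
    translate: Callable[[Triple], list[Triple]]


def apply_rules(triples: list[Triple], rules: list[Rule]) -> list[Triple]:
    """Try rules in list order; the first rule that matches a triple replaces it
    with its translation (the source triple is consumed). Triples that no rule
    matches are passed through unchanged."""
    out: list[Triple] = []
    for t in triples:
        for rule in rules:
            if rule.matches(t):
                out.extend(rule.translate(t))
                break
        else:  # no rule matched
            out.append(t)
    return out


specific = Rule(
    "temp-in-celsius",
    lambda t: t[1] == ":temp" and t[2].endswith("C"),
    lambda t: [(t[0], "iiot:tempCelsius", t[2][:-1])],
)
general = Rule(
    "any-temp",
    lambda t: t[1] == ":temp",
    lambda t: [(t[0], "iiot:reading", t[2])],
)

data = [(":s1", ":temp", "21C"), (":s1", ":label", '"kitchen"')]
print(apply_rules(data, [specific, general]))  # specific rule wins
print(apply_rules(data, [general, specific]))  # general rule shadows it
```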

david: We tried that approach two years ago, but didn't reach a conclusion about whether it was a good idea. What has your experience been?

pawel: For us it is the way to go; it fits our processing model.
... But we have considered making it configurable.
... The alternative would be leaving the old data in place (duplicated).
... But that leads to problems, such as translating the same cell multiple times.
... The resulting structure might not follow your desired taxonomy.

eric: Your use case is assuming that the data will be understandable by default.

pawel: Not exactly. We assume that we don't want to lose any data, but enable processors to understand it.

david: Does that assume that translation involves no information loss?

pawel: yes.

david: To avoid information loss, we could carry the original data along with the translated result.
... My other thought: supply a bidirectional test harness and verify against test data that the translation is reversible.
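A minimal sketch of the round-trip check David suggests: run each test value through the forward translation and then the backward one, and report any value that does not come back unchanged. The code tables are hypothetical; they reuse the case raised in this discussion where both "f" and "F" map to "Female", so "F" fails the check.

```python
from typing import Callable, Iterable, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def round_trip_failures(
    samples: Iterable[A],
    forward: Callable[[A], B],
    backward: Callable[[B], A],
) -> list[A]:
    """Return the samples for which backward(forward(x)) != x."""
    return [x for x in samples if backward(forward(x)) != x]


# Hypothetical code tables: two source codes collapse onto one target value.
TO_TARGET = {"f": "Female", "F": "Female", "m": "Male"}
TO_SOURCE = {"Female": "f", "Male": "m"}

failures = round_trip_failures(
    ["f", "F", "m"],
    lambda code: TO_TARGET[code],
    lambda value: TO_SOURCE[value],
)
print(failures)  # ['F']
```

Any many-to-one mapping in the forward direction will surface here, which is exactly the information loss the harness is meant to catch.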

eric: How did bidirectional translation work in the CHCS-to-FHIR translation?

david: It was a pain, but ShEx helped, because it takes both a source and a target schema, and you can just swap them to translate in the other direction.
... Glenna and Eric added bidirectional regex support, for parsing "LASTNAME, FIRSTNAME" apart into separate last-name and first-name fields.
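The "LASTNAME, FIRSTNAME" case can be sketched as a pair of functions, one parsing with a regular expression and one formatting the parts back. This is plain Python, not the regex extension Glenna and Eric added to ShEx, and the pattern is an assumption that accepts one comma with optional whitespace after it. Input such as "DOE,JANE" parses but comes back as "DOE, JANE", so even a bidirectional regex round-trips only up to whitespace normalization.

```python
import re

# Hypothetical pattern: last name (no commas), a comma, optional spaces, first name.
NAME_RE = re.compile(r"(?P<last>[^,]+),\s*(?P<first>.+)")


def parse_name(text: str) -> tuple[str, str]:
    """Split 'LASTNAME, FIRSTNAME' into (last, first)."""
    m = NAME_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"not in 'LASTNAME, FIRSTNAME' form: {text!r}")
    return m["last"], m["first"]


def format_name(last: str, first: str) -> str:
    """Inverse direction: rebuild the combined field."""
    return f"{last}, {first}"


print(parse_name("DOE, JANE"))               # ('DOE', 'JANE')
print(format_name(*parse_name("DOE,JANE")))  # DOE, JANE  (space normalized)
```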

eric: But it wouldn't always work, e.g., if both "f" and "F" map to "Female".

pawel: An example is translating ODOAL from something specific to something more general, e.g., translating "bordeaux" to "red wine".

david: That's what led to the idea of carrying the (otherwise lost) information separately, alongside the translation result, in case you want to reconstruct the original.
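One way to sketch carrying the otherwise-lost information: the forward translation returns both the general term and a small residue record holding the specific term, and the reverse translation uses that residue when it is present. The vocabulary table and the `originalTerm` key are hypothetical; in RDF the residue could instead be extra triples kept alongside the translated data.

```python
# Hypothetical specific-to-general vocabulary mapping (many-to-one, so lossy).
GENERALIZE = {
    "bordeaux": "red wine",
    "chianti": "red wine",
    "chablis": "white wine",
}


def generalize(term: str) -> tuple[str, dict[str, str]]:
    """Translate to the general term, returning the specific term as residue."""
    return GENERALIZE[term], {"originalTerm": term}


def specialize(general: str, residue: dict[str, str]) -> str:
    """Reverse translation: only possible when the residue is available."""
    original = residue.get("originalTerm")
    if original is not None and GENERALIZE.get(original) == general:
        return original
    raise ValueError(f"cannot recover a specific term for {general!r}")


term, residue = generalize("bordeaux")
print(term, residue)              # red wine {'originalTerm': 'bordeaux'}
print(specialize(term, residue))  # bordeaux
```

Without the residue, "red wine" alone cannot tell "bordeaux" from "chianti", which is why the reverse direction refuses rather than guesses.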

Turtle files previously discussed: https://lists.w3.org/Archives/Public/www-archive/2018May/0003.html

Slides from earlier presentation: https://lists.w3.org/Archives/Public/www-archive/2018Apr/att-0002/INTER-IoT_Semantic_Interoperability_3-04-2018.pdf

david: What translates the bound "hasUserLastName" and "hasUserName" values into a concatenated iiot:hasName "Jane Doe"?

pawel: These get translated into SPARQL Updates.
... The translations are defined in the align:map; entity2 defines the target (result). The bound variables are named with the "iiot:node_" prefix.
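For illustration, an update of the kind Pawel describes, merging a first and last name into one iiot:hasName value, might look like the SPARQL 1.1 Update assembled below. This is a hand-written sketch, not IPSM's generated output: the `src:` and `iiot:` namespace IRIs are placeholders, and the variable names are arbitrary rather than IPSM's "iiot:node_" naming. The Python function only builds the query text, which could then be sent to whatever SPARQL store is in use.

```python
def name_merge_update(first_pred: str, last_pred: str, target_pred: str) -> str:
    """Build a SPARQL 1.1 DELETE/INSERT/WHERE update that replaces separate
    first- and last-name triples with one concatenated name triple.
    The PREFIX IRIs below are placeholders, not the real vocabularies."""
    return f"""\
PREFIX src:  <http://example.org/src#>
PREFIX iiot: <http://example.org/iiot#>

DELETE {{ ?person {first_pred} ?first ; {last_pred} ?last . }}
INSERT {{ ?person {target_pred} ?fullName . }}
WHERE {{
  ?person {first_pred} ?first ;
          {last_pred} ?last .
  BIND(CONCAT(?first, " ", ?last) AS ?fullName)
}}
"""


print(name_merge_update("src:hasUserName", "src:hasUserLastName", "iiot:hasName"))
```

The DELETE clause mirrors the IPSM behavior discussed earlier, where the translated source data is removed rather than duplicated.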

Next week

pawel: I'm away next week.

Next meeting in two weeks.


Summary of Action Items

[NEW] ACTION: David to find a shex translation example.

Summary of Resolutions

[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.152 (CVS log)
$Date: 2018/06/12 16:19:00 $
