W3C

Automotive Working Group Teleconference

21 May 2019

Attendees

Present
Frank, PatrickL, PatrickB, Marty, Laurent, Daniel, Ted, Ulf, Paul, Benjamin, Josh
Regrets
Chair
PatrickL
Scribe
Ted

Gen2 Ideas

https://lists.w3.org/Archives/Public/public-automotive/2019May/0009.html

Audio problems, stick with mail discussion

Uber ontology overview

Uber ontology overview slides

Josh: I was asked to share details about our data models
... at present going through process to open them fully and will do so before the September workshop, this is an overview
... 10k ft view at Uber
... 200k managed datasets, we passed the 10B trip mark last year, quite a bit more sensor data
... we measure the on-demand data in rows and the rest in TB
... we have been working on schemas for on-demand and streaming datasets, RPC
... not much top down structure, standardizing internally and putting quite a bit of effort in using some central standard schemas
... mostly for RPC, less for streaming and storage
... need to track drivers, devices, vehicles - easy and obvious identifiers but needed normalizing to be able to bring datasets together
... we are using this internally and interested in opening up
... UUIDs, timestamps - you would be surprised at all the different types of timestamps being used
... sensors, money, geospatial
... we have a notion of entities and their relationships linked by UUID
... we use primarily protobuf, Thrift, Avro, RDF and Property Graph
... we have a common data model we can map from
... we have tooling to map data schemas
... all of these 200k+ datasets have metadata around them, around privacy (GDPR), retention etc
... it is not readily intuitive what is user information or PII
... we are annotating our data to make clear what it is, user, vehicle etc so we can share information across schemas
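
A minimal sketch of the kind of field-level annotation Josh describes, not Uber's actual metadata service. The dataset, field names, retention periods and the `FieldAnnotation` type are all hypothetical; the only idea taken from the discussion is that each field is tagged with the entity it describes plus privacy and retention metadata.

```python
from dataclasses import dataclass
from enum import Enum


class Sensitivity(Enum):
    PUBLIC = "public"
    PII = "pii"


@dataclass(frozen=True)
class FieldAnnotation:
    entity: str               # what the field describes, e.g. "user" or "vehicle"
    sensitivity: Sensitivity  # privacy class of the field
    retention_days: int       # how long the field may be kept


# Hypothetical annotations for a hypothetical "trip" dataset.
TRIP_ANNOTATIONS = {
    "rider_uuid": FieldAnnotation("user", Sensitivity.PII, 365),
    "vehicle_uuid": FieldAnnotation("vehicle", Sensitivity.PUBLIC, 1825),
    "pickup_time": FieldAnnotation("trip", Sensitivity.PUBLIC, 1825),
}


def pii_fields(annotations):
    """Return the names of all fields annotated as PII, sorted."""
    return sorted(
        name for name, a in annotations.items()
        if a.sensitivity is Sensitivity.PII
    )


def redact(row, annotations):
    """Mask PII fields; unannotated fields are masked too (fail closed)."""
    out = {}
    for key, value in row.items():
        annotation = annotations.get(key)
        if annotation is None or annotation.sensitivity is Sensitivity.PII:
            out[key] = "<redacted>"
        else:
            out[key] = value
    return out
```

Masking unannotated fields by default is a design choice of this sketch: it reflects the remark that it is not readily intuitive which fields hold user information.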

[algebraic pg slide]

Josh: attended W3C workshop in May on Graphs
... here are some of the main formats used. at the top is our common logical format. we use rdf, yaml...
... this will give some glimpses of our schemas - position events...
... we author the schemas in YAML format, we can then map them to protobuf, thrift, avro, rdf, generate documentation
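
A rough sketch of the one-source, many-targets idea, not the tooling shown on the slides. The schema layout, the `PositionEvent` fields and the type tables are assumptions made for illustration, and a plain Python dict stands in for the parsed YAML so the example needs no YAML library.

```python
import json

# Stand-in for a parsed YAML schema; the layout is hypothetical.
POSITION_EVENT = {
    "name": "PositionEvent",
    "fields": [
        {"name": "vehicle_uuid", "type": "string"},
        {"name": "timestamp_ms", "type": "integer"},
        {"name": "latitude", "type": "double"},
        {"name": "longitude", "type": "double"},
    ],
}

# Logical type -> target type, one table per output format.
PROTO_TYPES = {"string": "string", "integer": "int64", "double": "double"}
AVRO_TYPES = {"string": "string", "integer": "long", "double": "double"}


def to_protobuf(schema):
    """Render the schema as a protobuf message definition (text)."""
    lines = [f"message {schema['name']} {{"]
    # Field numbers follow the order of the fields in the source schema.
    for number, f in enumerate(schema["fields"], start=1):
        lines.append(f"  {PROTO_TYPES[f['type']]} {f['name']} = {number};")
    lines.append("}")
    return "\n".join(lines)


def to_avro(schema):
    """Render the schema as an Avro record schema (a JSON-ready dict)."""
    return {
        "type": "record",
        "name": schema["name"],
        "fields": [
            {"name": f["name"], "type": AVRO_TYPES[f["type"]]}
            for f in schema["fields"]
        ],
    }


if __name__ == "__main__":
    print(to_protobuf(POSITION_EVENT))
    print(json.dumps(to_avro(POSITION_EVENT), indent=2))
```

Each extra target (Thrift, RDF, documentation) would be another function over the same source dict, which is what makes a single authoring format attractive.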

Ted: your YAML2x tools open source by chance? we do similar with our data model VSS - YAML2x

Josh: which language?

Ted: Benjamin or Daniel? (as I don't use it much)

Daniel: Python

[interest in leveraging each others' tooling]

Benjamin: can we go back to your mapping
... don't you lose information or need to draw assumptions in going between these formats?

Josh: if you ignore named graphs, RDF drops a subset; going back from it is a challenge
... in avro or protobuf, eg position event, you have ordering but not in rdf. there are things that don't map correctly and what we tend to do is embed the rest in comments, so that the conversion can retain them and recover them when possible
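
One way to picture the "embed the rest in comments" approach, as a sketch under stated assumptions rather than the converter Josh refers to. The `ex:` names, the use of an `rdfs:comment` literal holding JSON and the `fieldOrder` key are all invented here; the point is only that an unordered set of triples can carry the ordering as a side note and give it back on the return trip.

```python
import json


def to_triples(record_name, fields):
    """Convert an ordered list of (name, type) pairs to an unordered triple set."""
    subject = f"ex:{record_name}"
    triples = set()
    for name, type_name in fields:
        field_node = f"{subject}.{name}"
        triples.add((field_node, "ex:fieldOf", subject))
        triples.add((field_node, "ex:type", type_name))
    # The triple set has no ordering, so stash the order in a comment literal.
    order = json.dumps({"fieldOrder": [name for name, _ in fields]})
    triples.add((subject, "rdfs:comment", order))
    return triples


def from_triples(record_name, triples):
    """Rebuild the field list, restoring order from the comment when present."""
    subject = f"ex:{record_name}"
    prefix = subject + "."
    types = {
        s[len(prefix):]: o
        for s, p, o in triples
        if p == "ex:type" and s.startswith(prefix)
    }
    comments = [o for s, p, o in triples if s == subject and p == "rdfs:comment"]
    if comments:
        order = json.loads(comments[0])["fieldOrder"]
    else:
        # Ordering was lost; fall back to a deterministic but arbitrary order.
        order = sorted(types)
    return [(name, types[name]) for name in order]
```

If the comment is dropped along the way, the fields still come back, only without their original order, which matches the "recover when possible" caveat.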

Benjamin: ok understand your 1:1 with comments. based on your experiences which do you prefer, what is the primary choice for analysis?

Josh: our logical model which I called algebraic property graphs in earlier slide
... I gave a talk at GraphDay, the title is "A Graph is a Graph is a Graph", which explains these mappings in more detail

<PatrickLue> Other presentation from Joshua: https://www.slideshare.net/joshsh/a-graph-is-a-graph-is-a-graph-equivalence-transformation-and-composition-of-graph-data-models-129403012

Josh: RDF mapping is used mainly for visualization and integration with other systems

Ted: can you comment on other systems used?

Josh: we use an in-house system on top of Cassandra for OLTP, realtime graph
... patches for apache spark for our jobs, submitted by data scientists manually and they can get results back in hours
... for realtime we get 100ms responses
... we use GraphSAGE for knowledge graphs and related techniques
... in the case of the metadata, we have table and field level and a service to retrieve it on a dataset basis

Ted: field level granularity mostly for privacy handling or other metadata?

Josh: yes, privacy and retention
... we are considering different options for our graph db
... our PoC for inference on categories is CQL (Categorical Query Language, by David Spivak at MIT and his group)

Daniel: this effort is pretty transformative, how much manpower was behind this in the last year?

Josh: wish we had more people, looking to grow the team. it is an organizational challenge about where it should reside
... it is a diverse team, coordinating between the different departments

Benjamin: anxious to learn about the specific ontologies of course
... from the sample excerpt and your mention of coordinating on eg timestamp are you following spatial data best practices from w3c?

Josh: initial push is internally and then we need to work on externalizing, standardizing
... of the thousands of ontologies, we focused on schema.org for alignment
... our time ontology is roughly based on the owl one. we have a number of existing schemas dealing with time which was an issue

Benjamin: wanting something stable and not requiring major changes and wanting to map in a future proof way

Josh: mapping is hard and will remain so
... we did not find a good standard ontology for addresses for example. we started with a schema.org like format
... our address experts have pushed for a more nuanced internal standard. it is similar to format in google api, we have addresses for display and storage (key/value map), interservice communication

Ted: can you provide a list of your schemas under consideration for this workshop?

Josh: simple type defs URLs, UUIDs, timestamps. contact information emails, addresses. spherical geography, currency amounts, exchange rates

[all on Data Standardization slide]

Josh: Trip table is not a standard schema but a table
... I cannot elaborate too much on sensor events yet

Ted: sensors, along with vehicle signals could be useful for federated learning for autonomous vehicles. issue would be in the variation of placement etc across manufacturers

Benjamin: that would be just another schema to handle how they differ across vehicles for consumption
... another challenge would be the volume of data
... do you intend to contribute some of your fundamental ontologies to schema.org as well?

Josh: we are considering contributing to them or extending them. the rdfs format wasn't a good fit and we do not have a clear path for contributing to them at present

Benjamin: you can propose schemas for peer review. you should queue up what may be used by the community

Josh: we would have to look at the different datasets on the web and their relationships with our schemas
... there is an iot.schema.org but we have not looked at the vehicles one

Benjamin: best not to create competing ones. there is information about a vehicle going simply from a to b, but it is limited
... you will find lots of datasets, eg public taxi, it is too basic
... you do not have much about vehicle sensors, it is a different case
... you are more into the data mobility than vehicle sensor area

Josh: we have looked around very carefully for some schemas

Ted: could you perhaps ask someone from ATG to talk with us on sensors, adas etc signals?

Josh: hope to bring them in

Benjamin: there are some ways to coordinate via iot.schema.org and maybe bring in their perspective
... simple interaction patterns
... schema.org prefers step by step extensions, it is meant to be really central

Summary of Action Items

Summary of Resolutions

[End of minutes]

Minutes manually created (not a transcript), formatted by David Booth's scribe.perl version 1.154 (CVS log)
$Date: 2019/05/22 18:47:25 $