Data on the Web Best Practices Working Group Teleconference

23 Mar 2016


See also: IRC log


Present: PWinstanley, ericstephan, phila, BernadetteLoscio, annette_g, newton, laufer, antoine, riccardoAlbertoni
Regrets: Caroline, Yaso, Deirdre
Agenda: https://www.w3.org/2013/dwbp/wiki/Meetings:Telecon20160323
Scribe: phila


I'll check the WebEx but zakim doesn't know about any conferences since the phone bridge was killed.

<hadleybeeman> okay, good. Just checking :)

But that doesn't know about timings etc.

<PWinstanley> hi ... what's the webex pwd please?

I think we should be in https://mit.webex.com/mit/j.php?MTID=m60aee05d2d79493fa43d347c93c8bcbc

Which is not the usual one :-(

I thought it would be

I have been arranging and rearranging WebEx sessions a lot recently, and I think the problem with the system is me.

<PWinstanley> webex is needing a room id

646 117 896

<annette_g> what is our room id?

<laufer> hi all, I can't get access to webex... it asks for a room id...

<hadleybeeman> @laufer and @annette_g ID no. is 646 117 896

<newton> @laufer the room id is 646 117 896

<ericstephan> sounds like a high energy physics call

<laufer> tnx, hadley...

<annette_g> @laufer, 646 117 896 worked

<riccardoAlbertoni> hi all!

<BernadetteLoscio> hi Riccardo!

<newton> hi Riccardo

<scribe> scribe: phila

<scribe> scribeNick: phila

Previous minutes

<hadleybeeman> https://www.w3.org/2016/03/11-dwbp-minutes

PROPOSED: Accept last Telco minutes https://www.w3.org/2016/03/11-dwbp-minutes

<hadleybeeman> +1

<ericstephan> +1

<newton> +1

<antoine> +1

<PWinstanley> +1


<annette_g> +1

<riccardoAlbertoni> +1

RESOLUTION: Accept last Telco minutes https://www.w3.org/2016/03/11-dwbp-minutes

<laufer> +1

PROPOSED: Accept minutes of the Zagreb F2F https://www.w3.org/2016/03/14-dwbp-minutes and https://www.w3.org/2016/03/15-dwbp-minutes

<hadleybeeman> +1

<BernadetteLoscio> +1

<antoine> +1

<newton> +1


<ericstephan> +1 first day and 0 for second day I slept through

<riccardoAlbertoni> +1

RESOLUTION: Accept minutes of the Zagreb F2F https://www.w3.org/2016/03/14-dwbp-minutes and https://www.w3.org/2016/03/15-dwbp-minutes

BP on subsetting data http://w3c.github.io/dwbp/bp.html#EnableDataSubsetting

hadleybeeman: Anything to say, BernadetteLoscio and newton?

BernadetteLoscio: Yes, I sent a message about this today.
... I really appreciate that annette_g wrote the BP on this
... But I think the BP is too generic and too narrow, especially the approach to implementation and how to test.
... There was a long discussion on e-mail (60+ messages) and I don't feel comfortable with this, since our Recommendation should be technical. How can we test it?
... We need to define things like granularity. Can we say what the expected subset of the dataset is? The subset depends on the domain, the query, etc.
... It's not an easy subject

<laufer> +1 to bernadette

BernadetteLoscio: So in my opinion we should keep the text in the introduction to the Data Access section.

hadleybeeman: I'd like to talk about it for 5-10 mins and then see.

antoine: I agree that the BP is general, but that's what we agreed on, so that people like SDW can then specialise it. I slightly miss a reference to the spatial domain. Add that and I think it's fine.
... i.e. our mission is not to tell people how it's done; it's for others to take this and expand on it.

hadleybeeman: I think we said that annette_g would write some text and we'd respond, so we're on target.

laufer: I agree with both Berna and Antoine.
... We have a BP that says that you should make a bulk download available. Therefore it makes sense to talk about subsets.
... But we can be general.
... The approaches for implementation only talk about APIs. I think we have some examples where we have URIs built using a rule system.
... Maybe using the RFC on URI templates (RFC 6570).
... The process for providing these subsets needs to be more detailed than it is in the current text.
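laufer's point about generating subset URIs from a rule, for example with URI templates, can be illustrated with a minimal Python sketch of RFC 6570 Level 1 (simple string) expansion. It assumes a hypothetical template at data.example.org with hypothetical `year` and `agency` variables; it is not the template of any API mentioned in this discussion, and it covers only the simplest `{var}` form, not the operators defined at higher levels of the RFC.

```python
import re
from typing import Mapping
from urllib.parse import quote

# Hypothetical template: host, path and variable names are illustrative only.
TEMPLATE = "https://data.example.org/budget/{year}/{agency}.csv"

def expand_simple(template: str, values: Mapping[str, str]) -> str:
    """Expand RFC 6570 Level 1 expressions of the form {var}.

    Values are percent-encoded so that only unreserved characters
    (letters, digits, '-', '.', '_', '~') pass through unchanged.
    An undefined variable expands to the empty string.
    """
    def substitute(match):
        return quote(values.get(match.group(1), ""), safe="")

    return re.sub(r"\{([A-Za-z0-9_]+)\}", substitute, template)

# Each combination of values identifies one subset of the full dataset.
for year in ("2015", "2016"):
    print(expand_simple(TEMPLATE, {"year": year, "agency": "health ministry"}))
```

One possible design choice: publishing the template itself lets clients construct subset URIs directly, without needing a separate query API.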

PWinstanley: Making sure I'm commenting on the right thing...

<BernadetteLoscio> yes Peter!

hadleybeeman: That's BP 21

PWinstanley: And the example is three lines of text about MyCity.
... I'd like to flesh it out, see the example, and see what the difference is between getting a single route and getting all the routes.
... And in BP 22 there's more of an illustration with code.
... So I'd like to see for Example 22, a link to the API and an indication of how much of an improvement it makes to get one route cf. all of them.
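PWinstanley's request, showing how much smaller one route is than all routes, could be sketched along these lines. Everything here is hypothetical: the toy `STOPS` rows stand in for MyCity's timetable data, and `get_stops` stands in for whatever API endpoint the example would link to. The printed sizes come from invented data and say nothing about any real service.

```python
import json
from typing import Dict, List, Optional

# Hypothetical toy dataset standing in for MyCity's bus timetable;
# the field names and values are invented for illustration.
STOPS: List[Dict[str, str]] = [
    {"route": route, "stop": f"stop-{n}", "time": f"08:{n:02d}"}
    for route in ("R1", "R2", "R3", "R4")
    for n in range(30)
]

def get_stops(route: Optional[str] = None) -> List[Dict[str, str]]:
    """Return every row (bulk) or only the rows for one route (a subset)."""
    if route is None:
        return list(STOPS)
    return [row for row in STOPS if row["route"] == route]

def payload_size(rows: List[Dict[str, str]]) -> int:
    """Size in bytes of the JSON a client would have to download."""
    return len(json.dumps(rows).encode("utf-8"))

bulk = payload_size(get_stops())
subset = payload_size(get_stops("R2"))
print(f"all routes: {bulk} bytes, one route: {subset} bytes ({subset / bulk:.0%})")
```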

hadleybeeman: So that's a proposal for making the example more concrete.

PWinstanley: yes.

annette_g: So, antoine, were you thinking about adding in a spatial example?

antoine: Not necessarily. The notes in the minutes may be too strong. I think it's SDW that wants the best practice.
... We felt it was odd that we should be making suggestions.

annette_g: I've been saying this for months, not just since the SDW got involved.

antoine: Explicitly saying that we recognise that we don't have a ready made solution but we expect others to.

annette_g: I'm confused. I don't think our goal was to make specific suggestions.
... We need to give them enough of a possible approach.

hadleybeeman: So including the words 'specific domains'... this may depend on the topic of the data...

antoine: Yes, that gives a hint as to why we're not saying more or giving more examples.

annette_g: So... it seems that we want to say implementation is dependent on your domain, but whether you follow it is not domain dependent.

newton: On the 'How to Test' section... I'm not so sure about this part.
... I'm not comfortable with the 'How to Test' section.

BernadetteLoscio: I agree with Newton. But I also agree with Antoine that it should be generic and can be specialised.
... My problem is that I really don't know how (breaking up)

<newton> @Berna, may you speak again?

<newton> it's cutting

BernadetteLoscio: I think it will be hard for someone who didn't follow our discussion to see how to test this BP, but I agree that it should be generic

hadleybeeman: I have a potential solution... we acknowledge that in some circumstances we find it useful, but we're struggling with how to test it.

<BernadetteLoscio> +1 Hadley

hadleybeeman: Can we say in BP 20 on Bulk Download that sometimes a bulk download might be a subset?

annette_g: A subset is not a bulk download.
... Can we do both?

<BernadetteLoscio> +q

hadleybeeman: I was thinking of it as a smaller bulk download.

annette_g: A bulk download suggests you get everything, a subset is part of it

<newton> I liked the Hadley idea, or we could move it to a descriptive section...

phila: It's the word 'bulk'

<PWinstanley> I agree with hadley

annette_g: If you define a dataset as including any subsets

<BernadetteLoscio> We can move the text to the introduction of the Data Access section

<ericstephan> +1 annette_g I view bulk and subset differently as well

annette_g: I can't imagine a definition of bulk download that covers the case of subsetting.

laufer: I agree that 'bulk' download could be a subset, the biggest set. But this separation is OK.
... The bulk download - giving an API, or maybe a link, the method that you have to access data may be decided by the publisher.
... We don't focus too much on APIs; we talk more about things that look like search queries, and SPARQL endpoints.
... I think we can expand the implementation approach.
... A set of URIs is a way to provide subsets. I think we can talk more about these implementations, which remain general.

<ericstephan> -1 sparql endpoint would only appeal to a specific audience, I think we need something a bit more relatable to a larger audience

laufer: We have a gov org with info about @@ that are a set of URIs following a template.

hadleybeeman: Your LD perspective resonates with me.

annette_g: I chose to use APIs as the example as that's the way it's most often done.
... That's the way it's done in most places. URL templates look more like a way to select a bunch of IDs at once. That's pointing into things, rather than retrieving them.
... I wouldn't mind adding something about SPARQL queries as well. But generally, it's APIs that people use.

laufer: I don't agree it's the majority. It's the majority in a domain.
... The way the subset is returned is not given by an API, which is a query mechanism. I think we have to differentiate between an API that returns subsets and a query.

annette_g: You don't think an API gives a subset?

laufer: You can have things other than subsets back from an API.
... I can have an API that does SQL, or CONSTRUCT...

annette_g: You can definitely do this sort of subsetting with APIs. I can find you 20 examples where transport info is available through an API that returns subsets. I doubt I'd find a SPARQL endpoint.

newton: I like your idea, hadleybeeman, to move it to the intro part of the Data Access section. If it's too generic, we can't get evidence of it being implemented. And I don't want to lose the text/ideas.

<BernadetteLoscio> +1 to Newton!

hadleybeeman: It feels as if we're deciding whether a separate BP on subsetting is/could be testable
... How far do you feel, Annette, that you can make it testable?

annette_g: I can make it as testable as many of the others, including the bulk download one.

<ericstephan> +1 annette_g

annette_g: I don't think the testing here is any weaker than others.

<ericstephan> sub-bulk

annette_g: I could take the bulk download one and simply substitute the word 'subset' for 'bulk'.

hadleybeeman: If we can get to the point where there's something testable, then maybe we could agree on this.

BernadetteLoscio: If we do the test with the bulk download, then it's not subsetting, it's bulk download.

<laufer> if we have a site that uses uri template we can test the bp

annette_g: I'm suggesting that we change it so that it can also apply to subsets

BernadetteLoscio: If you make the test consider bulk download, then you can test the subsetting too.
... And if I understand correctly, there are other ways of doing subsetting.
... We need to say how someone can test the BP. You talk about creating subsets of data

<laufer> here we have an example of subsetting in Brasil: http://orcamento.dados.gov.br/api-config

<laufer> using an uri template

BernadetteLoscio: On the how to test section, if we just look at bulk download, how can we include a test on downloading a subset?

hadleybeeman: Question to the editors: how much time do we have to add new things?
... In a perfect world, I'd give it several weeks but...?

newton: I don't think Berna heard you. I'm asking her by text. I think we have a deadline of 1 April to start freezing the doc. We need to close the discussion by the next meeting.

hadleybeeman: I think we have a clear idea of what everyone's concerns are. Is this testable, is this representative, does it belong in its own BP?
... It seems Annette is clearest on what should be in there.
... Doable by Monday?

newton: Offers a Skype call.

annette_g: As long as we cooperate on moving it forward.

hadleybeeman: We can work on it on Monday, discuss it on Tuesday.
... Ideally it goes to the group on Monday so the WG can look at it before next Friday's call.

<hadleybeeman> phila: Although we resolved that we would have a meeting today and next Wednesday, I think that by the end of the F2F in Zagreb we decided that next week's meeting would be on Friday.

<hadleybeeman> ...Also the SDW BP editors are meeting in 12 minutes' time.

<hadleybeeman> ...I will ask them to look at Annette's draft here in BP 21, and see if they have any comments.

hadleybeeman: If it's OK, Annette and Newton, I'd still like to focus your attention on Monday, even though we're not meeting until Friday.

annette_g: We can do that if we're all responsive.

hadleybeeman: Thank you

<laufer> me

<hadleybeeman> http://agreiner.github.io/dwbp/bp.html#Re-use

Data reuse BP http://agreiner.github.io/dwbp/bp.html#Re-use

<laufer> http://orcamento.dados.gov.br/api-config

laufer: I just want to say to the editors and annette_g, it would be nice to see ^^ example

hadleybeeman: So, data reuse BP.
... The question is whether or not we include it

<ericstephan> I'm afraid I'm going to have to leave a bit early today...

BernadetteLoscio: We also discussed this
... We don't agree with the creation of a new section, but we like the idea of the BP - citation, feedback etc.
... If it's a way of republishing, then it should be separate.
... The BPs we have can also be followed by someone using an existing dataset.
... We have things like the feedback BP that are useful
... If we consider that reuse is a kind of publishing then we shouldn't make a special thing of this.
... We can include Annette's proposal in existing sections and BPs.

hadleybeeman: I am concerned on two levels. All of our charter and what we do is about encouraging reuse, so it seems odd to pull it out.
... and it also feels like a non-tech issue to me. i.e. out of scope.

<hadleybeeman> phila: Sorry, I think this is a good BP. I'll leave it to the editors whether it gets its own section.

<hadleybeeman> ...I guess it could be turned around to say what a publisher could expect, but I know that strays into policy issues which hadley is concerned we not go into

<hadleybeeman> ...Some of this crosses over into the Dataset Usage Vocab

<hadleybeeman> ...Which is in our charter precisely to encourage people to publish more. Publishers want to know more about who uses their data and why.

<hadleybeeman> ...Somehow it's incumbent on this group to address the topic, without straying into policy, in a way that also mentions the DUV. I think we should include it.

laufer: I agree with Hadley. I think all the things we do are about increasing reuse. I don't like the term use/reuse; it's all use.
... It would be nice to have a paragraph on this topic. Maybe we have to talk about versioning as well as licensing, feedback and provenance.
... How new vocabularies will be added to be compatible with what's there already. We end up with a list of BPs for reuse.
... We can do it in a paragraph or a deeper section.

hadleybeeman: I think Phil's right in wanting to encourage use of the Dataset Usage Vocabulary
... and think about why we created it
... If someone wants to demonstrate where the data came from, then, OK.
... Maybe if we said 'If you want to do this, here's how...'

<laufer> I have one

laufer: We say for example that we have to put the license of the original data in the new data, but I don't know what it is, so I can't.

<laufer> no problem

hadleybeeman: Reusing my own data doesn't need to cite where it came from

<BernadetteLoscio_> this is about provenance!

annette_g: I don't agree with that - people always need to know where it came from.

<laufer> but we have a BP about provenance

<laufer> I do not understand why we have to repeat this

hadleybeeman: I suggest we take this to the mailing list and try to come back with something for Monday. If we can't, we can't.

<laufer> we have the bp for license

BernadetteLoscio_: I'm OK with continuing by mail but I'm afraid it's going to be hard to get consensus.

<laufer> we have the bp for feedback

hadleybeeman: If we can't get consensus, something gets dropped.

BernadetteLoscio_: I agree of course.
... We have two threads on subsetting and one on reuse.

<annette_g> bye!

<laufer> bye all

<laufer> thank you, annette

<riccardoAlbertoni> bye all .

<BernadetteLoscio_> bye! thanks everyone!

Summary of Action Items

Summary of Resolutions

  1. Accept last Telco minutes https://www.w3.org/2016/03/11-dwbp-minutes
  2. Accept minutes of the Zagreb F2F https://www.w3.org/2016/03/14-dwbp-minutes and https://www.w3.org/2016/03/15-dwbp-minutes
[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.144 (CVS log)
$Date: 2016/03/23 15:03:37 $
