RE: My BP comments

… and spaghetti!
Un spaghetto (also linguino, trofia, orecchietta …) is an interesting concept ☺

From: Frans Knibbe [mailto:frans.knibbe@geodan.nl]
Sent: Wednesday, 13 January 2016 8:49 PM
To: Andrea Perego <andrea.perego@jrc.ec.europa.eu>
Cc: Bill Roberts <bill@swirrl.com>; Krzysztof Janowicz <janowicz@ucsb.edu>; Jeremy Tandy <jeremy.tandy@gmail.com>; SDW WG Public List <public-sdw-wg@w3.org>
Subject: Re: My BP comments

Whether 'data' is used as a plural or a singular noun probably does not have much to do with British versus US English. The same question arises in Dutch, and I imagine in some other languages too. I think it depends on awareness that the word is a plural form. Someone who recognizes 'data' as the plural of 'datum' is more likely to treat it as a plural. A similar word is 'media': I think it is used as a singular when it is not recognized as the plural of 'medium'. It happens with Italian words too - I often hear or read words like 'graffiti' or 'panini' being used as singular nouns.

Greetings,
Frans

2016-01-12 19:11 GMT+01:00 Andrea Perego <andrea.perego@jrc.ec.europa.eu>:
The Wiktionary may help here:

https://en.wiktionary.org/wiki/data#English


Quoting:

[[
Usage notes

This word is more often used as an uncountable noun with a singular verb than as a plural noun with singular datum.
]]


Andrea

On 12/01/2016 18:50, Bill Roberts wrote:
not perhaps our most important issue, but my opinion is that 'data'
reads most naturally as a singular word - probably because it's often
thought of as a non-countable noun, like water - you can have 'some
data', but few people would say 'I have 100 data'.

Some people like to be more faithful to its Latin roots and have plural
'data' and singular 'datum' - but use of 'datum' is very rare in English
(UK English anyway).  'Data point' is probably a more common way to
refer to a datum.

So probably either approach is acceptable if we are self-consistent, but
I would vote for singular 'data'.

Bill





On 12 January 2016 at 16:54, Krzysztof Janowicz <janowicz@ucsb.edu> wrote:
    > 2. I notice the word 'data' is taken as singular. That looks
    funny to me, but I know there are differences of opinion in that
    respect. Do W3C or OGC have a recommendation on whether to treat
    'data' as a singular or plural noun?

    As a native English speaker (OK, that doesn't mean much) "data"
    looks and sounds correct.

    @phila ... any comment from W3C perspective; I know I'm supposed
    to write in US-english :-)

    To the best of my knowledge data is plural, datum is the singular form.

    Krzysztof



    On 01/12/2016 08:44 AM, Jeremy Tandy wrote:
    Hi Frans. Thanks for your commentary ... responses below.

    @lvdbrink ... can you comment on number #4? Also, can you consider
    a redraft of Section 2 (see points #7 and #8 below) and the
    opening of section 6.1 (see point #11).

    > 1. (already discussed in the teleconference) The introduction or
    scope section could do with an explanation of how the document
    relates to the description of the Best Practices deliverable in
    the charter, especially the first and last bullet points.

    See PR 203 <https://github.com/w3c/sdw/pull/203> (already merged)
    ... hopefully this does the trick.

    > 2. I notice the word 'data' is taken as singular. That looks
    funny to me, but I know there are differences of opinion in that
    respect. Do W3C or OGC have a recommendation on whether to treat
    'data' as a singular or plural noun?

    As a native English speaker (OK, that doesn't mean much) "data"
    looks and sounds correct.

    @phila ... any comment from W3C perspective; I know I'm supposed
    to write in US-english :-)

    > 3. In paragraph 1.1 discoverability and accessibility are listed as
    the key problems. I think interoperability (between different
    publications of spatial data and between spatial data and other
    types of data) could be listed as a third main problem; many
    requirements have to do with interoperability.

    Created new issue for discussion: ISSUE 205
    <https://github.com/w3c/sdw/issues/205>

    > 4. section 1.1: problems that are experienced by different
    groups (commercial operators, geospatial experts, web developers,
    public sector) are described. I get the impression that those
    problems are the only or main problems that are experienced by a
    certain group, but I don't think that is the case. Perhaps the
    listed problems could be marked as examples? Or the list of
    problems per group could be expanded?

    Indeed, the list of problems is not exhaustive, only illustrative.
    As an introduction I felt that this reads OK. @lvdbrink - wdyt?

    > 5. Section 1.1: “we've adopted a Linked Data approach as the underlying
    principle of the best practices”: Such a statement might drive
    away people that for some reason resist the idea of Linked Data,
    or in general don't like to have to adopt a new unknown paradigm.
    It also looks like the WG was biased in identifying best practices
    (Linked Data or bust). How about stating that upon inspection of
    requirements and current problems and solutions concepts from the
    Linked Data paradigm transpired to be most applicable? Or perhaps
    Linked Data does not need to be mentioned at all.... Requirements
    like linkability, discoverability and interoperability
    automatically lead to recommending using HTTP(S) URIs and common
    semantics.

    The WG has agreed on several occasions (including F2F at
    Nottingham) that we would "adopt the linked data approach" because
    we feel this is the best way to surface spatial data on the web.
    Rereading the BP text, I can see how a bias might be taken. I've
    reworded as follows ...

    "Analysis of the requirements derived from scenarios that describe
    how spatial data is commonly published and used on the Web (as
    documented in [[UCR]]) indicates that, in contrast to the workings
    of a typical SDI, the <a
    href="http://www.w3.org/standards/semanticweb/data">Linked
    Data</a> approach is most appropriate for publishing and using
    spatial data on the Web. Linked Data provides a foundation to many
    of the best practices in this document."

    Hope that works for you.

    > 6. I think an explanation of the term 'spatial data' should be
    somewhere very high up in the document (abstract and/or
    introduction), especially that spatial <> geographic (geographical
    data is a subset of spatial data)

    Agreed. New issue added to the document at beginning of Intro.
    ISSUE 206 <https://github.com/w3c/sdw/issues/206>

    > 7. Section 2: There seems to be overlap with description of user
    groups in the introduction (1.1). This leads (or could lead) to
    duplicate information. Why not just mention in the introduction
    that there are multiple audiences and that they are described in
    section 2?

    Agreed. New issue added. ISSUE 207
    <https://github.com/w3c/sdw/issues/207>

    > 8. Section 2: I wonder if the three groups that are described
    cover all audience types. Some more I can think of are [...]

    Good point. Added to ISSUE 207
    <https://github.com/w3c/sdw/issues/207> as additional copy for a
    potential redraft of section 2.

    > 9. Section 3: “SDW focuses on exposing the individual; the
    entities, the SpatialThings, within a spatial dataset”. That
    seems to exclude spatial metadata, which is an important subject
    in SDW.

    Agreed. Now, referencing the deliverables from the charter, the
    Scope states: "The use of metadata to complement spatial data".

    > 10. “Can be tested by machines and/or data consumers”: I consider
    data consumers to be humans or machines. In fact, it could be used
    as a useful way of avoiding having to write 'humans or machines'
    each time. Most best practices should benefit both humans and
    machines. Only in some cases the distinction is meaningful.

    Reworded to: "Compliance with each best practice in this document
    can be tested programmatically and/or by human inspection."

    > 11. 6.1: Is the discussion about features, information resources and
    real world things really necessary? I find it slightly confusing
    and I can imagine others will too. Why not just say that if you
    want spatial data to be referenceable on the web you need to use
    URIs? Just that makes a lot of sense and could be less confusing.

    @lvdbrink has attempted to capture the discussion that occurred
    during the Sapporo F2F; this discussion certainly had value at the
    time. I'm wary of reducing the context to the single statement you
    suggest but agree that it's not currently straightforward. We may
    also want to talk about the difference between Features
    (information resources) and Spatial Things (the resources
    described by the information) and the fact that in the end, the
    distinction is often not helpful.

    I've added a new issue to capture this point. ISSUE 208
    <https://github.com/w3c/sdw/issues/208>

    > 12. Best practice 3: I notice best practices 1 and 2 are phrased
    as solutions or recommendations. I think it is a good idea to try
    to do that for all best practices. So instead of “Working with
    data that lacks globally unique identifiers for entity-level
    resources” one could write “make spatial relationships explicit”

    See ISSUE 193 <https://github.com/w3c/sdw/issues/193> that echoes
    your sentiment for BP style. That said, your suggested text misses
    the intended point. There's more content needed for BP3 (and
    perhaps a major redraft?) as stated in ISSUE 102
    <https://github.com/w3c/sdw/issues/102> ... the concern is not so
    much making spatial relationships explicit, but what to do if your
    data doesn't use URIs. How do you convert from locally scoped
    identifier to URI?
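
As a rough illustration of that last question, here is a minimal sketch of one way to turn a locally scoped identifier into an HTTP URI. The base URI, the dataset name and the function name `mint_uri` are all hypothetical and chosen only for the example; the thread does not prescribe any particular URI pattern.

```python
from urllib.parse import quote

# Hypothetical base URI, used only for illustration.
BASE_URI = "http://example.org/id/"


def mint_uri(dataset: str, local_id: str, base: str = BASE_URI) -> str:
    """Build an HTTP URI from a dataset name and a locally scoped identifier."""
    # Percent-encode both parts (including '/') so that characters in the
    # local identifier cannot change the path structure of the URI.
    return f"{base}{quote(dataset, safe='')}/{quote(local_id, safe='')}"


if __name__ == "__main__":
    print(mint_uri("buildings", "A 12/3"))
```

Prefixing the dataset name is one possible way to keep identifiers from different datasets from colliding; whether that is appropriate depends on how the publisher scopes its identifiers.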

    > 13. I appreciate seeing references to BP requirements from the UCR
    document. But they are placed in the 'Evidence' section of the BP
    template now. Is it appropriate to count requirements derived from
    use cases as evidence of a best practice? I would expect
    references to use cases and requirements to occur in the 'Why'
    section of the template. Or in a template section that is
    especially reserved for requirements, e.g. 'Relevant requirements'.

    We're following the pro-forma set out by DWBP (for example, see
    <http://w3c.github.io/dwbp/bp.html#identifiersWithinDatasets>).
    I'll admit to not thinking too hard about this so far. I have
    raised an issue in the WG tracker (ISSUE 36
    <https://www.w3.org/2015/spatial/track/issues/36>) so that we come
    back to this discussion post release of FPWD.

    > 14. Best practice 8: Is this based on the CRS wiki page
    <https://www.w3.org/2015/spatial/wiki/Coordinate_Reference_Systems>?
    It seems that WGS84 is recommended. But that is debatable and
    could be considered American-centric. European guidelines
    recommend ETRS89. Also, high-precision is not defined. Also, no
    mention is made of the need to add temporal data if a CRS with an
    increasing error with time (like WGS84) is needed. Also no mention
    is made of how to reconcile local CRSs (as in a building plan)
    with global CRSs. I think CRSs are one of the areas that do
    require some extra standardisation efforts outside of this
    document, but which could be instigated by our working group.

    I've added your comment to ISSUE 128
    <https://github.com/w3c/sdw/issues/128> which is associated with
    BP 8. We can improve the content post FPWD release.

    > 15. BP 10: I would at least recommend to be aware of significant digits.

    Added your comment to ISSUE 125
    <https://github.com/w3c/sdw/issues/125>
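
To make the point about significant digits concrete, here is a small sketch that estimates how many decimal places a latitude/longitude pair in degrees needs for a given ground precision, and trims the coordinates accordingly. It assumes roughly 111,320 m per degree of latitude. The function names are hypothetical and nothing here is taken from the BP draft itself.

```python
import math

# Approximate length of one degree of latitude in metres (an assumption
# that is good enough for a rule of thumb).
METRES_PER_DEGREE = 111_320.0


def decimals_for(precision_m: float) -> int:
    """Decimal places needed so one unit in the last place is <= precision_m."""
    return max(0, math.ceil(math.log10(METRES_PER_DEGREE / precision_m)))


def trim(lat: float, lon: float, precision_m: float) -> tuple[float, float]:
    """Round a coordinate pair to the decimal places implied by precision_m."""
    n = decimals_for(precision_m)
    return round(lat, n), round(lon, n)


if __name__ == "__main__":
    # A position known to about one metre does not need nine decimals.
    print(trim(52.123456789, 4.987654321, 1.0))
```

Using the latitude figure for both axes is deliberately conservative: a degree of longitude is shorter away from the equator, so the rounded longitude keeps at least the requested precision.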

    > 16. Appendix C: Why are all UC requirements listed? Why not only
    the BP requirements? That would make a more compact table.

    There were many requirements that were not specifically marked for
    the BP, but turned out to be related ... so we captured those.
    Also, while we are working on the BP, it's good to have this full
    list. Perhaps when we're complete, it would make sense to truncate.

    Thanks for all your efforts. Jeremy

    On Thu, 7 Jan 2016 at 12:30 Frans Knibbe
    <frans.knibbe@geodan.nl> wrote:

        Hello,

        Following are my comments, after reading the BP draft from top
        to bottom:

         1. (already discussed in the teleconference) The introduction
            or scope section could do with an explanation of how the
            document relates to the description of the Best Practices
            deliverable in the charter, especially the first and last
            bullet points.
         2. I notice the word 'data' is taken as singular. That looks
            funny to me, but I know there are differences of opinion
            in that respect. Do W3C or OGC have a recommendation on
            whether to treat 'data' as a singular or plural noun?
         3. In paragraph 1.1 discoverability and accessibility are
            listed as the key problems. I think interoperability
            (between different publications of spatial data and
            between spatial data and other types of data) could be
            listed as a third main problem; many requirements have to
            do with interoperability.
         4. section 1.1: problems that are experienced by different
            groups (commercial operators, geospatial experts, web
            developers, public sector) are described. I get the
            impression that those problems are the only or main
            problems that are experienced by a certain group, but I
            don't think that is the case. Perhaps the listed problems
            could be marked as examples? Or the list of problems per
            group could be expanded?
         5. Section 1.1: “we've adopted a Linked Data approach as the
            underlying principle of the best practices”: Such a
            statement might drive away people that for some reason
            resist the idea of Linked Data, or in general don't like
            to have to adopt a new unknown paradigm. It also looks
            like the WG was biased in identifying best practices
            (Linked Data or bust). How about stating that upon
            inspection of requirements and current problems and
            solutions concepts from the Linked Data paradigm
            transpired to be most applicable? Or perhaps Linked Data
            does not need to be mentioned at all.... Requirements like
            linkability, discoverability and interoperability
            automatically lead to recommending using HTTP(S) URIs and
            common semantics.
         6. I think an explanation of the term 'spatial data' should
            be somewhere very high up in the document (abstract and/or
            introduction), especially that spatial <> geographic
            (geographical data is a subset of spatial data)
         7. Section 2: There seems to be overlap with description of
            user groups in the introduction (1.1). This leads (or
            could lead) to duplicate information. Why not just mention
            in the introduction that there are multiple audiences and
            that they are described in section 2?
         8. Section 2: I wonder if the three groups that are described
            cover all audience types. Some more I can think of are
            A) People working with spatial data that is not
            geographical (e.g. SVG, CAD, BIM).
            B) People involved in development of standards that have
            something to do with spatial data on the web.
            C) People involved in development of software that can
            work with spatial data.
         9. Section 3: “SDW focuses on exposing the individual; the
            entities, the SpatialThings, within a spatial dataset”.
            That seems to exclude spatial metadata, which is an
            important subject in SDW.
        10. “Can be tested by machines and/or data consumers”: I
            consider data consumers to be humans or machines. In fact,
            it could be used as a useful way of avoiding having to
            write 'humans or machines' each time. Most best practices
            should benefit both humans and machines. Only in some
            cases the distinction is meaningful.
        11. 6.1: Is the discussion about features, information
            resources and real world things really necessary? I find
            it slightly confusing and I can imagine others will too.
            Why not just say that if you want spatial data to be
            referenceable on the web you need to use URIs? Just that
            makes a lot of sense and could be less confusing.
        12. Best practice 3: I notice best practices 1 and 2 are
            phrased as solutions or recommendations. I think it is a
            good idea to try to do that for all best practices. So
            instead of “Working with data that lacks globally unique
            identifiers for entity-level resources” one could write
            “make spatial relationships explicit”
        13. I appreciate seeing references to BP requirements from the
            UCR document. But they are placed in the 'Evidence'
            section of the BP template now. Is it appropriate to count
            requirements derived from use cases as evidence of a best
            practice? I would expect references to use cases and
            requirements to occur in the 'Why' section of the
            template. Or in a template section that is especially
            reserved for requirements, e.g. 'Relevant requirements'.
        14. Best practice 8: Is this based on the CRS wiki page
            <https://www.w3.org/2015/spatial/wiki/Coordinate_Reference_Systems>?
            It seems that WGS84 is recommended. But that is debatable
            and could be considered American-centric. European
            guidelines recommend ETRS89. Also, high-precision is not
            defined. Also, no mention is made of the need to add
            temporal data if a CRS with an increasing error with time
            (like WGS84) is needed. Also no mention is made of how to
            reconcile local CRSs (as in a building plan) with global
            CRSs. I think CRSs are one of the areas that do require
            some extra standardisation efforts outside of this
            document, but which could be instigated by our working group.
        15. BP 10: I would at least recommend to be aware of
            significant digits.
        16. Appendix C: Why are all UC requirements listed? Why not
            only the BP requirements? That would make a more compact
            table.


        Greetings, and keep up the good work!

        Frans


    --
    Krzysztof Janowicz

    Geography Department, University of California, Santa Barbara
    4830 Ellison Hall, Santa Barbara, CA 93106-4060

    Email: jano@geog.ucsb.edu
    Webpage: http://geog.ucsb.edu/~jano/

    Semantic Web Journal: http://www.semantic-web-journal.net


--
Andrea Perego, Ph.D.
Scientific / Technical Project Officer
European Commission DG JRC
Institute for Environment & Sustainability
Unit H06 - Digital Earth & Reference Data
Via E. Fermi, 2749 - TP 262
21027 Ispra VA, Italy

https://ec.europa.eu/jrc/

Received on Wednesday, 13 January 2016 23:14:35 UTC