Requirements analysis
From CSV on the Web Working Group Wiki
Mapping Requirements to Specifications
New requirements added (Jeremy, 2nd June 2014):
- R-LinkFromMetadataToData
- R-ListsAsRepeatedFields
- R-MultilingualContent
- R-CsvToXmlTransformation
- R-IndependentMetadataPublication
- R-SpecificationOfPropertyValuePairForEachRow
- R-RightToLeftCsvCheck
Restructured to reflect current UCR doc (Jeremy, 2nd June 2014)
Removed R-HeadingColumns requirement; the intent was akin to R-PrimaryKey (Jeremy, 2nd June 2014)
Removed the following requirements as duplicates (Jeremy, 3rd June 2014)
New requirements added (Jeremy, 5th June 2014)
Removed the following empty requirements categories and renumbered the remaining categories (Davide & Jeremy, 12th June 2014):
- 3.2.2 Requirements relating to annotation of CSV
- 3.2.3 Requirements relating to metadata discovery
3.2.1 Requirements relating to parsing of CSV
| Requirement Name | Definition | Davide's Comments | Eric's Comments | Jeremy's Comments |
|---|---|---|---|---|
| R-WellFormedCsvCheck | Ability to determine that a CSV is syntactically well formed | This requires the CSV file to adhere to the tabular data format defined by Jeni et al. I think that this is a crucial requirement, and that it should be accepted. I only wonder whether this might be too strict: what if that format is too strict for an existing CSV file? Should that existing CSV file simply be considered non-conformant? | Recommend Acceptance | Agree. |
| R-RightToLeftCsvCheck | Ability to determine that a CSV is using right-to-left (RTL) text direction. | New | New | New |
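A minimal Python sketch of how R-WellFormedCsvCheck and R-RightToLeftCsvCheck might be tested, using only the standard `csv` and `unicodedata` modules. It assumes a simplified notion of "well formed" (strict quoting plus the same number of fields in every record) rather than the full tabular data format mentioned in Davide's comment, and it treats "using RTL" as "contains characters whose Unicode bidirectional class is strongly right-to-left". Both definitions and both function names are assumptions for illustration. The `delimiter` parameter shows how R-NonStandardFieldDelimiter could be absorbed into the same check, as suggested in the comments on that requirement.

```python
import csv
import io
import unicodedata


def is_well_formed(text: str, delimiter: str = ",") -> bool:
    """Simplified check: strict quoting and a uniform field count (an assumption)."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter, strict=True)
    try:
        rows = list(reader)
    except csv.Error:
        # Raised in strict mode for text after a closing quote
        # or for a quoted field left open at the end of the data.
        return False
    if not rows:
        return False
    width = len(rows[0])
    return all(len(row) == width for row in rows)


def contains_rtl_text(text: str) -> bool:
    """Heuristic: True if any character has a strong right-to-left bidi class."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)
```

Counting fields against the first record is the simplest reading of "well formed"; headings spread over several rows (R-MultipleHeadingRows) would need the header row count to be supplied as well.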
3.2.2 Requirements relating to applications
| Requirement Name | Definition | Davide's Comments | Eric's Comments | Jeremy's Comments |
|---|---|---|---|---|
| R-CsvValidation | Ability to validate a CSV for conformance with a specified metadata definition | This requirement is also related to R-ExternalDataDefinitionResource, but it adds to the functional part. | Recommend Acceptance | Accept |
| R-CsvToRdfTransformation | Ability to automatically transform a CSV into RDF | Recommend removing. Is this perhaps a best practice as opposed to a requirement? | Retain this requirement; it underpins the ability to transform CSV into a different encoding | |
| R-CsvToJsonTransformation | Ability to automatically transform a CSV into JSON | Recommend removing. Is this perhaps a best practice as opposed to a requirement? | Retain this requirement; it underpins the ability to transform CSV into a different encoding | |
| R-CsvToXmlTransformation | Ability to transform a CSV into XML | New | New | New |
| R-CanonicalMappingInLieuOfAnnotation | Ability to transform CSV conforming to the core tabular data model yet lacking further annotation into an object / object graph serialisation | This seems to be related to R-CsvToRdfTransformation and R-CsvToJsonTransformation. If these two are kept, does this requirement go away? | This is intended to fulfil the requirement to transform a CSV file (to RDF, JSON or other!) when there is no additional metadata annotation available. Retain this requirement. | |
| R-IndependentMetadataPublication | Ability to publish metadata independently from the tabular data resource it describes. | New | New | New |
| R-SpecificationOfPropertyValuePairForEachRow | Ability to define a property-value pair for inclusion in each row. | New | New | New |
| R-ConditionalProcessingBasedOnCellValues | Ability to apply conditional processing based on the value of a specific cell. | New | New | New |
| R-CommentLines | Ability to identify comment lines within a CSV file and skip over them during parsing, format conversion or other processing. | New | New | New |
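The following sketch illustrates R-CanonicalMappingInLieuOfAnnotation together with R-CommentLines and R-SpecificationOfPropertyValuePairForEachRow. It assumes that comment lines start with `#`, that the first non-comment line is the header, and that a "canonical" mapping simply turns each row into an object keyed by column name with string values. None of this is defined by the working group, and `csv_to_records` is a hypothetical name.

```python
import csv
import io
import json
from typing import Dict, List, Optional


def csv_to_records(text: str, comment_prefix: str = "#",
                   extra: Optional[Dict[str, str]] = None) -> List[Dict[str, str]]:
    """Map each data row to a dict keyed by the header row (hypothetical canonical mapping)."""
    # R-CommentLines: drop lines that start with the (assumed) comment prefix.
    kept = [line for line in text.splitlines(keepends=True)
            if not line.startswith(comment_prefix)]
    rows = list(csv.reader(io.StringIO("".join(kept), newline="")))
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    records = []
    for row in body:
        record = dict(zip(header, row))
        # R-SpecificationOfPropertyValuePairForEachRow: the same pairs go into every row.
        record.update(extra or {})
        records.append(record)
    return records


if __name__ == "__main__":
    sample = "# exported 2014-06-05\nid,name\n1,Alice\n2,Bob\n"
    print(json.dumps(csv_to_records(sample, extra={"source": "example"})))
```

The records are serialised here with `json.dumps`; an RDF or XML serialisation could start from the same records. Filtering comment lines before parsing is a simplification: it would also drop a continuation line of a quoted multi-line field that happens to start with `#`.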
3.2.3 Non-functional requirements
| Requirement Name | Definition | Davide's Comments | Eric's Comments | Jeremy's Comments |
|---|---|---|---|---|
| R-ZeroEditAdditionOfSupplementaryMetadata | Ability to add supplementary metadata to an existing CSV file without requiring modification of that file | See above comments | Recommend Acceptance | Retain this requirement, but include a note that "as a result of being able to add supplementary metadata without modification of the original CSV file, those CSV files will continue to be compatible with existing tooling" |
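As a minimal sketch of R-ZeroEditAdditionOfSupplementaryMetadata (and, by extension, R-IndependentMetadataPublication), the metadata below lives in a separate JSON document and is applied only when the CSV is read. The CSV text itself is never rewritten, which is what keeps it usable by existing tooling. The JSON structure (a `columns` object giving each column a `datatype`), the datatype names and the `read_with_metadata` function are all hypothetical.

```python
import csv
import io
import json
from typing import Any, Callable, Dict, List

# Hypothetical datatype names; a real metadata vocabulary would define its own.
CONVERTERS: Dict[str, Callable[[str], Any]] = {"string": str, "integer": int, "decimal": float}


def read_with_metadata(csv_text: str, metadata_json: str) -> List[Dict[str, Any]]:
    """Type each cell using column metadata held outside the CSV file."""
    columns = json.loads(metadata_json).get("columns", {})
    records = []
    for row in csv.DictReader(io.StringIO(csv_text, newline="")):
        # Columns without metadata fall back to plain strings.
        records.append({
            name: CONVERTERS[columns.get(name, {}).get("datatype", "string")](value)
            for name, value in row.items()
        })
    return records
```

Because the function only reads `csv_text`, a tool that knows nothing about the metadata document still sees exactly the original file.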
3.2.4 Data Model requirements
| Requirement Name | Definition | Davide's Comments | Eric's Comments | Jeremy's Comments |
|---|---|---|---|---|
| R-CellMicrosyntax | Ability to parse internal data structure within a cell value | This requirement concerns the possibility of defining internal structures within cells (e.g., additional separators). It is important, in my opinion, and should also be covered by the tabular data model. | Recommend Acceptance | Agree. |
| R-NonStandardFieldDelimiter | Ability to parse tabular data with field delimiters other than comma | This requirement could also be covered by R-WellFormedCsvCheck, via the delimiter parameter. | Agree with Davide. Recommend removing. | Agree. |
| R-PrimaryKey | Ability to determine the primary key for entities described within a CSV file | * | Recommend Acceptance | Accept |
| R-ForeignKeyReferences | Ability to cross-reference between CSV files | * explicit dependency on R-LinksToExternallyManagedDefinitions, R-AssociationOfCodeValuesWithExternalDefinitions and R-ExternalDataDefinitionResource | Recommend Acceptance | Accept |
| R-AnnotationAndSupplementaryInfo | Ability to add annotation and supplementary information to CSV file | All starred (*) requirements could be considered as sub-requirements of this requirement. | Recommend Acceptance | Sub-requirements would be OK, but then a sub-requirement is still a requirement! I'd be concerned about making the requirement scope so broad that the relationship to the motivating use case becomes meaningless. I would be happy for the starred (*) requirements to be specified as dependencies of this requirement. |
| R-AssociationOfCodeValuesWithExternalDefinitions | Ability to associate a code value with an externally managed definition | * explicit dependency on R-LinksToExternallyManagedDefinitions, R-ForeignKeyReferences and R-ExternalDataDefinitionResource | Recommend Acceptance | Merge with R-LinksToExternallyManagedDefinitions |
| R-CsvAsSubsetOfLargerDataset | Ability to assert how a single CSV file is a facet or subset of a larger dataset | * | Recommend Acceptance | Accept |
| R-SyntacticTypeDefinition | Ability to declare syntactic type for data values | * | Recommend Acceptance | Accept |
| R-SemanticTypeDefinition | Ability to declare semantic type for data values | * | Recommend Acceptance | Accept |
| R-MissingValueDefinition | Ability to declare a "missing value" token and, optionally, a reason for the value to be missing | I think that this requirement is important and should also be incorporated in the tabular data model | Recommend Acceptance | Accept |
| R-URIMapping | Ability to map the values of a CSV row/column into corresponding URIs (e.g. by concatenating those values with a prefix). | Recommend Acceptance | Accept | |
| R-UnitMeasureDefinition | Ability to identify/express the unit of measure for the values reported in a given column. | subrequirement of R-SemanticTypeDefinition | Recommend Acceptance | Assert dependency on R-SemanticTypeDefinition |
| R-GroupingOfMultipleTables | Ability to group multiple data tables into a single package for publication | This requirement is related to R-CsvAsSubsetOfLargerDataset, although I think they are clearly two distinct ones. | Recommend Acceptance | Accept; I agree with the relationship noted by Davide, and they are distinct requirements |
| R-LinkFromMetadataToData | Ability for a metadata description to explicitly cite the tabular dataset it describes | New | New | New |
| R-MultilingualContent | Ability to declare a locale / language for content in a specified column | New | New | New |
| R-ListsAsRepeatedFields | Ability to provide multiple values for a given property within a single row using repeated columns | New | New | New |
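The sketch below gives one possible reading of R-MissingValueDefinition, R-CellMicrosyntax, R-URIMapping and R-ListsAsRepeatedFields at the level of individual cells and rows. The `NA` missing-value token, the `|` list separator, the example URI prefix and all function names are assumptions chosen for illustration. The URI minting follows the "concatenate with a prefix" idea in the R-URIMapping definition, with percent-encoding added so that values containing spaces or slashes still yield valid URIs.

```python
from typing import Dict, List, Optional
from urllib.parse import quote

MISSING_TOKEN = "NA"  # hypothetical token declared in the metadata


def parse_cell(value: str, missing: str = MISSING_TOKEN) -> Optional[str]:
    """R-MissingValueDefinition: map the declared missing-value token to None."""
    return None if value == missing else value


def split_microsyntax(value: str, separator: str = "|") -> List[str]:
    """R-CellMicrosyntax: split a cell that packs several values behind a separator."""
    return [part.strip() for part in value.split(separator)] if value else []


def to_uri(prefix: str, value: str) -> str:
    """R-URIMapping: concatenate a prefix with the percent-encoded cell value."""
    return prefix + quote(value, safe="")


def group_repeated(header: List[str], row: List[str]) -> Dict[str, List[str]]:
    """R-ListsAsRepeatedFields: collect values of repeated column names into lists."""
    grouped: Dict[str, List[str]] = {}
    for name, value in zip(header, row):
        grouped.setdefault(name, []).append(value)
    return grouped
```

For example, `to_uri("http://example.org/city/", "New York")` yields `http://example.org/city/New%20York`. The optional "reason for the value to be missing" is not modelled here.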
3.3 Deferred requirements
| Requirement Name | Definition | Davide's Comments | Eric's Comments | Jeremy's Comments |
|---|---|---|---|---|
| R-MultipleHeadingRows | Ability to handle headings spread across multiple initial rows, as well as to distinguish between single column headings and file headings. | Related to the requirement above, the header column count and header row count parameters of the tabular data format should fit this need, so I propose merging these requirements with the first one (R-WellFormedCsvCheck), unless they really need to be treated separately. | Recommend Acceptance | Agree with merge. |
| R-RandomAccess | Ability to access and/or extract part of a CSV file in a non-sequential manner. | This is a functional requirement that might be useful when handling large CSV files. | Recommend Acceptance | I think that this requirement is the same as R-CsvAsSubsetOfLargerDataset and should be merged. If I read it correctly, we want to describe regions of a LARGE dataset so that the content of those regions can be assessed and the data can be accessed by loading just that region (e.g. without needing to load the entire LARGE and/or distributed dataset)? |
| R-TableNormalization | Ability to normalize data that is not in normal form and possibly vice-versa. | I'm dubious about this requirement. It is important to be able to normalize/denormalize tables, and to find correspondences between normalized and denormalized tables. However, I wonder whether this is a requirement of the CSV itself or, rather, of the software required to handle it. | This requirement is referenced in 3.13 Use Case #13 - Representing Entities and Facts Extracted From Text. I recommend changing the requirement name to "R-TableRowsWithNullCellValues". I also recommend changing the definition to "Ability to treat tables with irregular numbers of cells as appended with null values". | I think that this (Use Case #13) is an example where we have a "not well-formed" CSV file that needs to be "normalised" so that it is well-formed before any transformation can be applied. Personally, I think that the requirement itself (i.e. the conversion to a well-formed CSV) is valid but, as Davide says, it is outside the scope of "CSV" itself. |
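Eric's proposed definition ("treat tables with irregular numbers of cells as appended with null values") is simple enough to sketch directly. The function below, whose name is hypothetical, pads every row with `None` up to the width of the longest row. It does not attempt the wider normalisation/denormalisation Davide mentions, which, as Jeremy notes, may well sit outside the scope of CSV itself.

```python
import csv
import io
from typing import List, Optional


def pad_ragged_rows(text: str) -> List[List[Optional[str]]]:
    """Pad each row with None so every row is as wide as the longest one."""
    rows = list(csv.reader(io.StringIO(text, newline="")))
    # An empty input has no rows, so the target width defaults to 0.
    width = max((len(row) for row in rows), default=0)
    return [row + [None] * (width - len(row)) for row in rows]
```

Padding to the longest row, rather than to the header width, is a design choice: it keeps every cell even when a data row is wider than the header.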