Requirements analysis

From CSV on the Web Working Group Wiki

New requirements added (Jeremy, 2nd June 2014):


Restructured to reflect current UCR doc (Jeremy, 2nd June 2014)


Removed R-HeadingColumns requirement; the intent was akin to R-PrimaryKey (Jeremy, 2nd June 2014)


Removed the following requirements as duplicates (Jeremy, 3rd June 2014)


New requirements added (Jeremy, 5th June 2014)


Empty requirements categories removed; remaining categories renumbered (Davide & Jeremy, 12th June 2014)

  • 3.2.2 Requirements relating to annotation of CSV
  • 3.2.3 Requirements relating to metadata discovery


3.2.1 Requirements relating to parsing of CSV

Requirement Name Definition Davide's Comments Eric's Comments Jeremy's Comments
R-WellFormedCsvCheck Ability to determine that a CSV is syntactically well formed "This requires the CSV file to adhere to the tabular data format defined by Jeni et al. I think that this is a crucial requirement, and that it should be accepted. I only wonder whether this might be too strict: what if that format is too strict with respect to an existing CSV file? Should that existing CSV file be just considered as non-conformant?" Recommend Acceptance Agree.
R-RightToLeftCsvCheck Ability to determine that a CSV is using RTL. New New New
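
As a rough illustration of what R-WellFormedCsvCheck might involve, here is a minimal Python sketch. It assumes that "well formed" means only that the file parses under strict quoting rules and that every row has the same number of fields as the first row; the tabular data format referred to above may define well-formedness differently. The function name check_well_formed is hypothetical.

```python
import csv
import io


def check_well_formed(text, delimiter=","):
    """Return a list of (line_number, message) problems; an empty list means
    the text passed this (assumed, simplified) well-formedness check."""
    problems = []
    # strict=True makes the csv module raise csv.Error on malformed quoting.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter, strict=True)
    expected = None
    try:
        for row in reader:
            if expected is None:
                expected = len(row)  # first row fixes the expected field count
            elif len(row) != expected:
                # Blank lines parse as empty rows and are reported here too.
                problems.append(
                    (reader.line_num, f"expected {expected} fields, found {len(row)}")
                )
    except csv.Error as exc:
        problems.append((reader.line_num, str(exc)))
    return problems
```

Passing delimiter=";" (or a tab) runs the same check on files that use a field delimiter other than comma.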

3.2.2 Requirements relating to applications

Requirement Name Definition Davide's Comments Eric's Comments Jeremy's Comments
R-CsvValidation Ability to validate a CSV for conformance with a specified metadata definition "This requirement is also related to R-ExternalDataDefinitionResource, but it adds to the functional part." Recommend Acceptance Accept
R-CsvToRdfTransformation Ability to automatically transform a CSV into RDF Recommend removing. Is this perhaps a best practice as opposed to a requirement? Retain this requirement - this underpins the function to be able to transform CSV into a different encoding
R-CsvToJsonTransformation Ability to automatically transform a CSV into JSON Recommend removing. Is this perhaps a best practice as opposed to a requirement? Retain this requirement - this underpins the function to be able to transform CSV into a different encoding
R-CsvToXmlTransformation Ability to transform a CSV into XML New New New
R-CanonicalMappingInLieuOfAnnotation Ability to transform CSV conforming to the core tabular data model yet lacking further annotation into an object / object graph serialisation "This seems to be related to R-CsvToRdfTransformation, R-CsvToJsonTransformation. If these two are kept does this requirement go away?" "This is intended to fulfil the requirement to transform a CSV file (to RDF, JSON or other!) when there is no additional metadata annotation available ... retain this requirement"
R-IndependentMetadataPublication Ability to publish metadata independently from the tabular data resource it describes. New New New
R-SpecificationOfPropertyValuePairForEachRow Ability to define a property-value pair for inclusion in each row. New New New
R-ConditionalProcessingBasedOnCellValues Ability to apply conditional processing based on the value of a specific cell. New New New
R-CommentLines Ability to identify comment lines within a CSV file and skip over them during parsing, format conversion or other processing. New New New
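
To make R-CsvToJsonTransformation, R-CanonicalMappingInLieuOfAnnotation and R-CommentLines more concrete, the following Python sketch converts an unannotated CSV into JSON, one object per row keyed by the header row, while skipping comment lines. It is an assumption-laden illustration, not a proposed canonical mapping: the "#" comment prefix and the function name csv_to_json are hypothetical, and all values are left as strings because no metadata is available to type them.

```python
import csv
import json


def csv_to_json(text, comment_prefix="#"):
    """Map each data row to a JSON object keyed by the header row.

    Lines starting with comment_prefix (an assumed convention) are skipped.
    Splitting on line breaks first means quoted cells that contain embedded
    newlines are not handled by this sketch.
    """
    lines = [line for line in text.splitlines() if not line.startswith(comment_prefix)]
    reader = csv.DictReader(lines)  # first remaining line supplies the keys
    return json.dumps(list(reader), indent=2)
```

The same row dictionaries could feed an RDF or XML serialiser instead of json.dumps; only the final step would change.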

3.2.3 Non-functional requirements

Requirement Name Definition Davide's Comments Eric's Comments Jeremy's Comments
R-ZeroEditAdditionOfSupplementaryMetadata Ability to add supplementary metadata to an existing CSV file without requiring modification of that file See above comments Recommend Acceptance "Retain this requirement - but include a note that ""as a result of being able to add supplementary metadata without modification of the original CSV file, those CSV files will continue to be compatible with existing tooling"""

3.2.4 Data Model requirements

Requirement Name Definition Davide's Comments Eric's Comments Jeremy's Comments
R-CellMicrosyntax Ability to parse internal data structure within a cell value "This requirement regards the possibility to define internal structures of cells (i.e., additional separators, etc.). It is important, in my opinion, and should be covered also by the tabular data model." Recommend Acceptance Agree.
R-NonStandardFieldDelimiter Ability to parse tabular data with field delimiters other than comma "This requirement could also be covered by R-WellFormedCsvCheck, via the delimiter parameter." Agree with Davide. Recommend removing. Agree.
R-PrimaryKey Ability to determine the primary key for entities described within a CSV file * Recommend Acceptance Accept
R-ForeignKeyReferences Ability to cross reference between CSV files "* explicit dependency with R-LinksToExternallyManagedDefinitions, R-AssociationOfCodeValuesWithExternalDefinitions and R-ExternalDataDefinitionResource" Recommend Acceptance Accept
R-AnnotationAndSupplementaryInfo Ability to add annotation and supplementary information to CSV file All starred (*) requirements could be considered as sub-requirements of this requirement. Recommend Acceptance Sub-requirements would be OK - but then a sub-requirement is still a requirement! I'd be concerned about making the requirement scope so broad that the relationship to the motivating use case becomes meaningless. Would be happy for the starred (*) requirements to be specified as dependencies of this requirement.
R-AssociationOfCodeValuesWithExternalDefinitions Ability to associate a code value with an externally managed definition * explicit dependency with R-LinksToExternallyManagedDefinitions, R-ForeignKeyReferences and R-ExternalDataDefinitionResource Recommend Acceptance Merge with R-LinksToExternallyManagedDefinitions
R-CsvAsSubsetOfLargerDataset Ability to assert how a single CSV file is a facet or subset of a larger dataset * Recommend Acceptance Accept
R-SyntacticTypeDefinition Ability to declare syntactic type for data values * Recommend Acceptance Accept
R-SemanticTypeDefinition Ability to declare semantic type for data values * Recommend Acceptance Accept
R-MissingValueDefinition "Ability to declare a ""missing value"" token and, optionally, a reason for the value to be missing" I think that this requirement is important and should also be incorporated in the tabular data model Recommend Acceptance Accept
R-URIMapping Ability to map the values of a CSV row/column into a corresponding URI (e.g. by concatenating those values with a prefix). Recommend Acceptance Accept
R-UnitMeasureDefinition Ability to identify/express the unit of measure for the values reported in a given column. Sub-requirement of R-SemanticTypeDefinition Recommend Acceptance Assert dependency on R-SemanticTypeDefinition
R-GroupingOfMultipleTables Ability to group multiple data tables into a single package for publication "This requirement is related to R-CsvAsSubsetOfLargerDataset, although I think they are clearly two distinct ones." Recommend Acceptance Accept - and agree with the relationship noted by Davide... and they are distinct requirements
R-LinkFromMetadataToData Ability for a metadata description to explicitly cite the tabular dataset it describes New New New
R-MultilingualContent Ability to declare a locale / language for content in a specified column New New New
R-ListsAsRepeatedFields Ability to provide multiple values for a given property within a single row using repeated columns New New New
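
Three of the data model requirements above (R-MissingValueDefinition, R-CellMicrosyntax and R-URIMapping) can be sketched together in a few lines of Python. Everything specific here is an assumption made for illustration: the "NA" missing-value token, the ";" in-cell separator, the http://example.org/id/ prefix, and the function names. In practice these would presumably be declared in metadata rather than hard-coded.

```python
from urllib.parse import quote

MISSING_TOKEN = "NA"                    # hypothetical "missing value" token
LIST_SEPARATOR = ";"                    # hypothetical in-cell separator (microsyntax)
URI_PREFIX = "http://example.org/id/"   # hypothetical URI prefix


def interpret_cell(value):
    """Return None for the missing-value token, a list for cells that use
    the in-cell separator, and the unchanged string otherwise."""
    if value == MISSING_TOKEN:
        return None
    if LIST_SEPARATOR in value:
        return [part.strip() for part in value.split(LIST_SEPARATOR)]
    return value


def row_uri(row, key_column):
    """Build a URI by concatenating the prefix with a percent-encoded cell value."""
    return URI_PREFIX + quote(row[key_column], safe="")
```

For example, row_uri({"code": "GB 01"}, "code") yields http://example.org/id/GB%2001, following the "concatenating those values with a prefix" approach mentioned in R-URIMapping.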

3.3 Deferred requirements

Requirement Name Definition Davide's Comments Eric's Comments Jeremy's Comments
R-MultipleHeadingRows "Ability to handle headings spread across multiple initial rows, as well as to distinguish between single column headings and file headings." "Related to the requirement above, the header column count and header row count parameters of the tabular data format should fit this need, so I propose to merge these requirements with the first one (R-WellFormedCsvCheck), unless these really need to be treated separately." Recommend Acceptance Agree with merge.
R-RandomAccess Ability to access and/or extract part of a CSV file in a non-sequential manner. This is a functional requirement that might be useful when handling large CSV files. Recommend Acceptance "I think that this requirement is the same as R-CsvAsSubsetOfLargerDataset and should be merged - if I read it correctly, we are wanting to describe regions of a LARGE dataset so that the content of those regions can be assessed and the data can be accessed by loading just that region (e.g. without needing to load the entire LARGE and/or distributed dataset)?"
R-TableNormalization Ability to normalize data that is not in normal form and possibly vice-versa. "I'm dubious about this requirement. It is important to be able to normalize/denormalize tables, and to find correspondences between normalized and denormalized tables. However, I wonder whether this is a requirement of the CSV itself or, rather, of the software required to handle it." "This requirement is referenced in 3.13 Use Case #13 - Representing Entities and Facts Extracted From Text. I recommend the requirement name be changed to ""R-TableRowsWithNullCellValues"". I also recommend the definition as ""Ability to treat tables with irregular numbers of cells as appended with null values""" "I think that this (use case #13) is an example where we have a ""not well-formed"" CSV file that needs to be ""normalised"" such that it is well-formed before any transformation can be applied. Personally, I think that the requirement itself is valid (e.g. the conversion to a well-formed CSV) - but, as Davide says, this is something that is outside the scope of ""CSV"" itself."
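
The alternative definition suggested for R-TableNormalization, treating tables with irregular numbers of cells as appended with null values, could look roughly like the Python sketch below. It assumes that the widest row determines the table width and that None stands in for a null cell; the function name pad_rows is hypothetical.

```python
import csv
import io


def pad_rows(text):
    """Parse CSV text and right-pad short rows with None so that every row
    has as many cells as the widest row (an assumed reading of the proposal)."""
    rows = list(csv.reader(io.StringIO(text, newline="")))
    width = max((len(row) for row in rows), default=0)
    return [row + [None] * (width - len(row)) for row in rows]
```

Whether this kind of repair belongs in a CSV specification or in the tooling that consumes CSV is exactly the question raised in the comments above; the sketch only shows that the operation itself is small.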