Smart Descriptions & Smarter Vocabularies (SDSVoc)
How do we search for data? - Towards user-driven dataset descriptions
Emilia Kacprzak, Laura Koesten, Luis-Daniel Ibáñez, Elena Simperl
We propose to the workshop the problem of understanding how people search for data in current portals and search engines. Our ultimate goal is to design and implement a dataset search engine, for which describing datasets in the most appropriate way is critical. We argue that, to converge on a description vocabulary that is optimal for dataset discovery and dataset search engines, we need to understand the needs of actual data users and identify the key differences from other kinds of search and description. This would make the process more user-driven, instead of being driven entirely by the datasets themselves.
We analysed this question from a qualitative angle, by conducting interviews with 20 data practitioners, and from a quantitative angle, by analysing three years' worth of queries submitted to data.gov.uk and ons.gov.uk. We would like to share with the workshop audience our findings regarding:
- How the interviewed data practitioners see the process of searching for data. We found that an evaluate-and-explore phase follows the search: once a dataset is found, its relevance, quality and usability must be evaluated by exploring it both in isolation and in connection with other previously found datasets, before the dataset is actually used.
- The types of queries used in data portals. We found that most queries are short and underspecified compared with Web search queries. Queries are often single keywords representing entities, such as "London" with no further indication of what the user wants to know about London, or even categories such as "Food" or "Crime", suggesting a strong exploratory use. We also characterize the frequency of time intervals and numerical values in the queries.
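As an illustration of the kind of query characterization described in the second finding, the following is a minimal sketch and not the analysis pipeline actually used on the portal logs. The sample queries (apart from "London", "Food" and "Crime"), the function name `characterize` and the regular expressions for years, year ranges and numbers are all assumptions made for this example.

```python
import re
from statistics import median

# Hypothetical sample: the first three queries come from the text,
# the others are invented purely for illustration.
SAMPLE_QUERIES = [
    "London",
    "Food",
    "Crime",
    "population 2011",
    "house prices 2010-2015",
    "census 2001 ward 12",
]

# Assumed patterns: a "year" is a four-digit number starting with 19 or 20,
# a "time interval" is two such years joined by "-" or "to".
YEAR = r"(?:19|20)\d{2}"
YEAR_RE = re.compile(rf"\b{YEAR}\b")
YEAR_RANGE_RE = re.compile(rf"\b{YEAR}\s*(?:-|to)\s*{YEAR}\b")
NUMBER_RE = re.compile(r"\b\d+(?:\.\d+)?\b")


def characterize(queries):
    """Return simple length and temporal/numeric statistics for a query list."""
    if not queries:
        raise ValueError("queries must not be empty")
    lengths = [len(q.split()) for q in queries]
    return {
        # Query length measured in whitespace-separated words.
        "median_words": median(lengths),
        "single_word_share": sum(1 for n in lengths if n == 1) / len(queries),
        # Number of queries containing each kind of token.
        "with_year_range": sum(1 for q in queries if YEAR_RANGE_RE.search(q)),
        "with_year": sum(1 for q in queries if YEAR_RE.search(q)),
        "with_number": sum(1 for q in queries if NUMBER_RE.search(q)),
    }


if __name__ == "__main__":
    for name, value in characterize(SAMPLE_QUERIES).items():
        print(f"{name}: {value}")
```

Counting whitespace-separated words is only one possible measure of query length; a real log analysis would also need to decide how to handle quoted phrases, duplicates and non-year numbers such as postcodes or identifiers.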
We hope to spark discussion among the attendees about the implications of our findings for current vocabularies and their evolution.