This extended abstract is a contribution to the Easy-to-Read on the Web Symposium. The contents of this paper were not developed by the W3C Web Accessibility Initiative (WAI) and do not necessarily represent the consensus view of its membership.
One of the factors that can decrease readability is the presence of specialized language and advanced terminology. Such terminology can affect how well lay persons understand a text and how easily they can read it. Several research studies have found that many existing web sites in specialized domains use advanced terminology that is often inappropriate for their target audience. Examples of such domains are medicine, technology, law and finance. Several Natural Language Processing (NLP) tools try to address this problem. In this paper we evaluate the efficiency of an NLP tool that labels medical terminology with the terms' definitions.
Research in the field has revealed a large number of web sites whose content uses advanced terminology. For example, in the medical domain, Miller et al. (2007) evaluated websites with medical content using a custom classifier and concluded: 'The classifier was then applied to existing consumer health Web pages. We found that only 4% of pages were classified at a layperson level, regardless of the Flesch reading ease scores, while the remaining pages were at the level of medical professionals. This indicates that consumer health Web pages are not using appropriate language for their target audience'. Apart from having content authors choose lay language for the information they present, there are several NLP tools designed to increase the readability of text with specialized terminology.
In this research we set out to evaluate the efficiency of the terminology labeling method for increasing the readability of texts. More precisely, the evaluation was done on content with medical language. By terminology labeling we mean appending the definition of a term (in different ways) to the text. We based our tests on the way the text4all tool (available at Text 4 all website (2012)) works. We chose this tool because it can recognize terms that are not in their canonical form by using a fuzzy matching approach.
For the evaluation we designed a questionnaire that was answered by 23 participants. The questionnaire tried to evaluate several aspects of the readability increase on a text that was adapted using the text4all tool and had its medical terms labeled. The participants were chosen from people not working in the medical field (although, when asked whether they had a high interest in and knowledge of the medical field, 5 answered yes). By age, 6 participants were under 25, 14 were between 25 and 50, and 3 were over 50. The participants were native Romanian speakers.
The evaluation was done on a medical text in Romanian about the treatment of thyroid cancer, selected from an interview with a doctor. The interview was published on a popular Romanian web site (Medlive Hotnews website (2012)) that offers medical information to end users.
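The labeling step described above can be illustrated with a minimal sketch. It is not the text4all implementation: it assumes a small hypothetical glossary (`GLOSSARY`, with illustrative English entries) and substitutes Python's standard `difflib` similarity matching for whatever fuzzy matching the tool actually uses. The function names `find_term` and `label_terms` and the `cutoff` value are assumptions made for the example.

```python
import difflib
import re

# Hypothetical glossary (illustrative entries only): canonical term -> lay definition.
GLOSSARY = {
    "thyroidectomy": "surgical removal of all or part of the thyroid gland",
    "metastasis": "spread of cancer cells to other parts of the body",
}

def find_term(word, glossary, cutoff=0.8):
    """Return the glossary term most similar to `word`, or None if none is close enough."""
    matches = difflib.get_close_matches(word.lower(), list(glossary), n=1, cutoff=cutoff)
    return matches[0] if matches else None

def label_terms(text, glossary, cutoff=0.8):
    """Append the definition in parentheses after every word that fuzzily matches a term."""
    def replace(match):
        word = match.group(0)
        term = find_term(word, glossary, cutoff)
        if term is None:
            return word  # not a known term: leave the word unchanged
        return f"{word} ({glossary[term]})"
    # [^\W\d_]+ matches runs of letters, including letters with diacritics.
    return re.sub(r"[^\W\d_]+", replace, text)

print(label_terms("Metastases were found.", GLOSSARY))
```

Because the match is similarity-based, inflected forms such as "metastases" or "thyroidectomies" still resolve to their canonical glossary entries. The trade-off is that a cutoff set too low would label unrelated words, so the threshold would need tuning for each language and glossary.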
The evaluated aspects were:
We asked the participants whether they understood the adapted text (with term definitions appended) better than the original and offered three possible answers. The distribution of answers is shown in Table 1.
|I understood the message better||It made no difference||It confused me|
We asked the participants about the impact of the explained terminology on reading ease. The distribution of answers is presented in Table 2.
|It helped me||It made no difference||It disturbed me|
This shows that inserting the definition of a term into the original text can harm readability. However, this depends on how the definitions are added to the original text. On the web, this can be done in several ways, for example: appending the definition in parentheses after the term in the original text; showing a tooltip over the term; or marking the term and letting the user request the definition with a right click.
The last question in the questionnaire concerned the preferred presentation mode for the definition of a term. The answers are presented in Table 3.
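The presentation modes listed above can be compared with a short sketch. It is only an illustration under stated assumptions: the function name `render_term`, the mode names, the `term` class and the `data-definition` attribute are hypothetical, and the on-request mode would also need a script (not shown) to reveal the definition. The `title` attribute used for the tooltip is standard HTML.

```python
from html import escape

def render_term(term, definition, mode="inline"):
    """Render one labeled term as an HTML fragment in the chosen presentation mode."""
    term_html = escape(term)
    definition_html = escape(definition, quote=True)
    if mode == "inline":
        # Definition appended in parentheses directly in the running text.
        return f"{term_html} ({definition_html})"
    if mode == "tooltip":
        # Standard `title` attribute; browsers typically show it on hover.
        return f'<span class="term" title="{definition_html}">{term_html}</span>'
    if mode == "on_request":
        # Definition stored in a hypothetical data attribute, to be shown on click.
        return f'<span class="term" data-definition="{definition_html}">{term_html}</span>'
    raise ValueError(f"unknown mode: {mode}")

for mode in ("inline", "tooltip", "on_request"):
    print(render_term("metastasis", "spread of cancer cells", mode))
```

Only the inline mode changes the visible text, which is the case the questionnaire evaluated. The other two modes leave the original sentence intact and surface the definition only when the reader asks for it.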
|Inline (inserted into the text)||Tooltip over the term||Click / Right click on the term||I don't know|
The method tested here (appending term definitions) showed an efficiency of 20% in terms of improved message understanding, and even less in terms of improved reading ease. Other NLP methods, such as replacing terminology with synonyms or short definitions and/or rephrasing, can have a bigger impact on readability. However, the main challenge in deciding which NLP methods to use goes beyond the impact on readability. In cases like medical texts, where the message can be highly important to the people reading it (imagine the text represents a diagnosis or a medical recommendation), there are other risks that need to be taken into account. An interesting study by Ogden et al. (2003) has shown that when only lay vocabulary is used instead of medical terms in a diagnosis, patients can underestimate their health status. The study concludes: 'Although much current prescriptive literature in general practice advocates the use of lay language in the consultation as a means to promote better doctor-patient partnerships, the issue of diagnosis is more complex than this. Patients attribute greater benefits to the use of medical labels for themselves and state that such medical labels are of greater benefit to the doctor'. So there has to be a balance between the readability increase and the risks to the user that can arise as side effects of language adaptation.
Another challenge in using this term labeling method is how to present the term definitions most efficiently. Our study showed that participants tend to prefer having the definition shown on request via right click, or as a tooltip.
Other challenges relate to correctly identifying terms that are not in their canonical form by using fuzzy matching, but we will not go into the details of this problem because it is not very relevant to this symposium.
The use of NLP tools for adapting specialized language to increase readability has to take into account the side-effect risks of replacing specialized terminology with lay language: the replacement can change the impression the message makes on the user and the authority it carries.
The method of adapting specialized text by adding term definitions to the text seems to be the ideal compromise between readability increase and user safety (related to the risks induced by replacing the original terms).
The preferred solutions for labeling terminology seem to be on request (for example, right click on the term) and a tooltip over the term.
Finding the ideal presentation mode for term definitions (tooltip, integrated into the text, footnotes, on request via click/right click...).
Examining how existing or new HTML tags or tag attributes can enhance the presentation of term definitions on the web.
Terminology can be classified by level of difficulty. Exploring how a web page can be marked up so that a user sees only the terminology of a specific level.
Researching methods that allow language adaptation for better readability without inducing risks such as a changed impression of the message.
Researching which types of texts/areas are more prone to such risks when language adaptation is done.
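As an illustration of the markup-related items above, the following sketch shows one way level-aware markup could look. Everything in it is an assumption made for the example: the three-step difficulty scale, the sample terms, the `data-term-level` attribute name, and the reading of "a specific level" as a minimum threshold chosen by the user.

```python
from html import escape

# Hypothetical difficulty scale (an assumption): 1 = common, 2 = intermediate, 3 = specialist.
TERMS = [
    {"term": "biopsy", "level": 1,
     "definition": "removal of a small tissue sample for examination"},
    {"term": "metastasis", "level": 3,
     "definition": "spread of cancer cells to other parts of the body"},
]

def mark_up(entry, min_level):
    """Tag a term with its level; attach the definition only at or above `min_level`."""
    attrs = f'class="term" data-term-level="{entry["level"]}"'
    if entry["level"] >= min_level:
        attrs += f' title="{escape(entry["definition"], quote=True)}"'
    return f'<span {attrs}>{escape(entry["term"])}</span>'

def mark_up_all(entries, min_level):
    """Mark up every entry for a reader who wants help from `min_level` upwards."""
    return [mark_up(entry, min_level) for entry in entries]

for fragment in mark_up_all(TERMS, min_level=2):
    print(fragment)
```

Keeping the level in the markup for every term, rather than dropping easy terms altogether, would let a client-side script or stylesheet change the threshold per reader without regenerating the page.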