W3C Internationalization (I18n) Activity: Making the World Wide Web truly world wide!

Contributors

If you own a blog with a focus on internationalization, and want to be added or removed from this aggregator, please get in touch with Richard Ishida at ishida@w3.org.

All times are UTC.

Planet Web I18n

The Planet Web I18n aggregates posts from various blogs that talk about Web internationalization (i18n). While it is hosted by the W3C Internationalization Activity, the content of the individual entries represents only the opinions of their respective authors and does not reflect the position of the Internationalization Activity.

May 04, 2016

Global By Design

Adobe: The best global consumer technology website of 2016

For the 2016 Web Globalization Report Card, we studied the following 15 consumer technology websites: Adobe, Apple, Canon, Dell, HP, HTC, Lenovo, LG, Microsoft, Nikon, Panasonic, Samsung, Sony, Toshiba, and Xiaomi. The consumer technology sector includes many of the most globally successful companies. So it’s no surprise that the top four companies are also in the top … Read more

by John Yunker at 04 May 2016 04:08 PM

May 03, 2016

W3C I18n Activity highlights

For review: Ruby Markup

A draft of a new article, Ruby Markup, is out for wide review. We are looking for comments by 5 May.

The article describes how to mark up HTML for ruby support. (It will later be followed by a similar article describing how to style ruby.)

Please send any comments as github issues by clicking on the link “Leave a comment” at the bottom of the article. (This will add some useful information to your comment.)

by Richard Ishida at 03 May 2016 07:22 PM

April 25, 2016

Global By Design

American Express: The best global financial services website of 2016

For the 2016 Web Globalization Report Card, we studied 9 financial services websites: Allianz, American Express, Axa, Citibank, HSBC, Marsh, MasterCard, Visa, and Western Union. American Express emerged on top with support for an impressive 41 languages; it most recently added Bosnian. Allianz finished in second place with regard to languages. The AmEx home page, shown here, … Read more

by John Yunker at 25 April 2016 12:02 AM

April 12, 2016

Global By Design

What’s the most multilingual website?

I often point to Wikipedia as one of the most multilingual websites on the Internet, which is a major reason why Wikipedia finished in third place in the 2016 Web Globalization Report Card. But Wikipedia is not the most multilingual website. For that, I’d have to point toward the Jehovah’s Witnesses website. As only partially illustrated by the screen … Read more

by John Yunker at 12 April 2016 01:26 AM

April 04, 2016

Global By Design

You can now register the Japanese equivalent of .com: .コム

And so it begins. Verisign, the registry that manages .com domains, has begun its rollout of non-Latin .com equivalents, beginning with Japanese: Now, if you don’t have a Japanese domain name, slapping .コム onto the end of your company’s name probably doesn’t make much sense from a branding perspective (though it absolutely does from an intellectual property perspective). But more and more … Read more

by John Yunker at 04 April 2016 01:35 AM

March 30, 2016

W3C I18n Activity highlights

Unicode Conference speaker submission deadline 4 April

For twenty-five years the Internationalization & Unicode® Conference (IUC) has been the preeminent event highlighting the latest innovations and best practices of global and multilingual software providers. The 40th conference will be held this year on November 1-3, 2016 in Santa Clara, California.

The deadline for speaker submissions is Monday, 4 April, so don’t forget to send in an abstract if you want to speak at the conference.

The Program Committee will notify authors by Friday, May 13, 2016. Final presentation materials will be required from selected presenters by Friday, July 22, 2016.

Tutorial presenters receive complimentary conference registration and two nights’ lodging, while session presenters receive a fifty percent conference discount and two nights’ lodging.

by Richard Ishida at 30 March 2016 03:57 PM

March 29, 2016

Global By Design

Say hello to the first .google domain

Google announced the launch of domains.google today: not a new service but a newly “domained” service. I think it’s fitting that the first public use of .google is applied to its domains business. The question is: What other business lines will begin using .google? And what will .google ultimately resolve to? A search window?

by John Yunker at 29 March 2016 09:42 PM

March 27, 2016

Global By Design

Chinese marathoners suffer from lack of translation

According to People’s Daily, a number of runners in a South China marathon suffered from more than simply lack of hydration. Try lack of translation. The bar of soap shown above was included in each runner’s swag bag — apparently a number of runners thought they were energy bars. Yes, folks, translation does matter! And even in English, that  … Read more

by John Yunker at 27 March 2016 01:43 AM

March 19, 2016

ishida>>blog » i18n

UniView now supports Unicode 9 beta

Picture of the page in action.
>> Use the picker

UniView now supports the characters introduced for the beta version of Unicode 9. Any changes made during the beta period will be added when Unicode 9 is officially released. (Images are not available for the Tangut additions, but the character information is available.)

It also brings in notes for individual characters where those notes exist, if Show notes is selected. These notes are not authoritative, but are provided in case they prove useful.

A new icon below the text area allows you to add commas between each character in the text.

Links to the help page that used to appear on mousing over a control have been removed. Instead there is a noticeable blue link to the help page, and the help page has been reorganised and uses image maps so that it is easier to find information. The reorganisation puts more emphasis on learning by exploration, rather than learning by reading.

Various tweaks were made to the user interface.

by r12a at 19 March 2016 10:22 PM

March 17, 2016

Global By Design

Intel: The best global enterprise technology website of 2016

For the 2016 Web Globalization Report Card, we studied 11 enterprise technology websites: Autodesk, Cisco Systems, EMC, IBM, Huawei, Intel, Oracle, SAP, Texas Instruments, Xerox, and VMware. With support for 23 languages, Intel is not the language leader in this category; Cisco Systems leads with 40 languages. But Intel leads in other ways. Such as global navigation. First … Read more

by John Yunker at 17 March 2016 03:08 PM

March 11, 2016

W3C I18n Activity highlights

New article: Guiding users to translated pages

This new article addresses the question: If my site contains alternative language versions of the same page, what can I do to help the user see the page in their preferred language?

This article is relevant for pages for which there are complete translations of the content. If your alternative pages have different content, or are regional variants rather than translations, you may need to do things differently.

Read the article.

The article is accompanied by a Swedish translation, thanks to Olle Olsson.

by Richard Ishida at 11 March 2016 11:12 AM

March 07, 2016

Global By Design

Chinese drawing even with English on global websites

Over the past decade Simplified Chinese has grown to become one of the most popular languages on global websites, second only to English. According to the Web Globalization Report Card, which has long monitored languages supported by the world’s leading brands, Chinese was seen on only about six out of ten websites in 2006. Today,  … Read more

by John Yunker at 07 March 2016 07:52 PM

March 05, 2016

Internet Globalization News

Globalization for All? Or Just Connections?

This article brings up an interesting issue, in my opinion. Is globalization truly "accessible" to everyone? Are there really more trans-border business opportunities just because small companies have Facebook fans who live in a different country? Or, is it just that "connections" have been made easier? The question is whether those "foreign" connections will someday generate new business for those small companies, or will just remain generators of "likes"; this is not easy to determine.

via www.mckinsey.com

Globalization was once driven almost exclusively by the world’s governments, large multinational corporations, and major financial institutions. But now, thanks to digital platforms with global reach, artisans, entrepreneurs, app developers, freelancers, small businesses, and even individuals can participate directly. New research from the McKinsey Global Institute (MGI) uses novel data to analyze the extent of the connections and their economic impact. Facebook, the biggest of these online platforms, has grown...

by blogalize.me at 05 March 2016 08:18 PM

February 25, 2016

ishida>>blog » i18n

New picker: Egyptian hieroglyphs

Picture of the page in action.
>> Use the picker

I have just published a picker for Egyptian Hieroglyphs.

This Unicode character picker allows you to produce or analyse runs of Egyptian Hieroglyph text.

Characters are grouped into standard categories. Click on one of the orange characters, chosen as a nominal representative of the class, to show below all the characters in that category. Click on one of those to add it to the output box. As you mouse over the orange characters, you’ll see the name of the category appear just below the output box.

Just above the orange characters you can find buttons to insert RLO and PDF controls. RLO makes the characters that follow it progress from right to left. Alternatively, you can select more controls > Output direction to set the direction of the output box to RTL/LTR override. The latter approach will align the text to the right of the box. I haven’t yet found a Unicode font that also flips the glyphs horizontally as a result. I’m not entirely sure about the best way to apply directionality to Egyptian hieroglyphs, so I’m happy to hear suggestions.
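
For illustration, here is a minimal Python sketch of what the RLO/PDF buttons insert; this is not the picker’s code, just the Unicode controls involved:

RLO = "\u202E"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING

def rtl_override(text):
    return RLO + text + PDF

print(rtl_override("𓀀𓅃𓆣𓁿"))  # forces this run of hieroglyphs to display right to left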

Alongside the direction controls are some characters used for markup in the Manuel de Codage, which allows you to prepare text for an engine that knows how to lay it out two-dimensionally. (The picker doesn’t do that.)

The Latin Characters panel, opened from the grey bar to the left, provides characters needed for transcription.

In case you’re interested, here is the text you can see in the picture. (You’ll need a font to see this, of course. Try the free Noto Sans Egyptian Hieroglyphs font, if you don’t have one – or copy-paste these lines into the picker, where you have a webfont.)
𓀀𓅃𓆣𓁿
<-i-mn:n-R4:t*p->
𓍹𓇋-𓏠:𓈖-𓊵:𓏏*𓊪𓍺

The last two lines spell the name of Amenhotep using Manuel de Codage markup, according to the Unicode Standard (p 432).

by r12a at 25 February 2016 05:43 PM

February 24, 2016

Global By Design

The best global automotive website of 2016: BMW

For the 2016 Web Globalization Report Card, we studied 13 automotive websites: Audi, BMW, Chevrolet, Ford, Honda, Hyundai, Land Rover, Lexus, Mercedes, Mini, Nissan, Toyota, and Volkswagen. I want to preface this post by saying that automotive websites have historically been strong on languages but weak on global consistency and global navigation. This year is no exception, though … Read more

by John Yunker at 24 February 2016 07:23 PM

February 23, 2016

Global By Design

Join me in Santa Clara next month for a web globalization event

I’m pleased to be presenting next month in Santa Clara, California on website globalization best practices. I’ll be drawing heavily on the most recent Report Card. And I’ll also be joined by a panel of web globalization experts. Here are the details: March 22, 2016, 11:30 am, Santa Clara, CA, Bourbon Steak & Pub at Levi’s … Read more

by John Yunker at 23 February 2016 03:56 PM

February 18, 2016

Global By Design

Is your website losing the language race?

For the past 12 years, the Web Globalization Report Card has closely tracked the languages supported by the leading global websites, including companies such as Apple, IBM, 3M, GE, Microsoft, and Google. This year, the average number of languages supported by these websites surpassed 30 languages, up from 14 languages in 2006. If you want to reach  … Read more

by John Yunker at 18 February 2016 06:28 PM

February 17, 2016

Global By Design

Most global websites now use country codes

As part of the 2016 Web Globalization Report Card I note the use of country codes among the world’s leading brands. It’s an imperfect process because different companies use country codes in different ways. For example, some websites use country codes as redirects back to the .com domain (not ideal, but better than nothing). Others use  … Read more

by John Yunker at 17 February 2016 04:08 PM

February 12, 2016

Global By Design

The top 25 global websites of 2016

I’m pleased to announce the publication of the 2016 Web Globalization Report Card and, with it, the top 25 websites: Google, Facebook, Wikipedia, Hotels.com, NIVEA, Booking.com, Nestlé, Pampers, Adobe, Intel, Twitter, Microsoft, American Express, BMW, 3M, Hitachi, Starbucks, Nike, Samsung, Cisco Systems, Nikon, TNT, Philips, Autodesk, and ABB. It’s hard to believe that this is … Read more

by John Yunker at 12 February 2016 07:49 PM

February 04, 2016

W3C I18n Activity highlights

For review: What is Ruby?

A new article, What is Ruby? is out for wide review. We are looking for comments by 10 February.

This new article will replace an older page, simply called Ruby, with more complete and up-to-date information. Other articles in preparation will address how to use markup and styling in HTML and CSS.

Please send any comments as github issues by clicking on the link “Leave a comment” at the bottom of the article. (This will add some useful information to your comment.) You may find that some links in the article won’t work, because this is a copy of the article which will eventually be published on the W3C site. There is no need to report those.

by Richard Ishida at 04 February 2016 05:09 PM

Putting Linguistic Linked Data Standards in Action: Webinar on the FREME Framework

FREME is a project that is developing a Framework for multilingual and semantic enrichment of digital content. A key aspect of the framework is that it puts standards and best practices in the area of linguistic linked data and multilingual content processing into action. We will introduce the framework in a dedicated webinar on 22 February at 4 p.m. CET. If you are interested in participating, please contact Nieves Sande and Felix Sasaki for further logistics.

by Richard Ishida at 04 February 2016 04:56 PM

January 18, 2016

Global By Design

Companies are blogging less and that’s a mistake

An interesting study courtesy of the Society for New Communications Research: Dr. Nora Ganim Barnes has been studying corporate communications strategies of the Fortune 500 for the past eight years. Key findings include: Twenty-one percent of the Fortune 500 (103 corporations) have a corporate blog, a decrease of 10% from 2014. Twitter is more popular … Read more

by John Yunker at 18 January 2016 10:42 PM

January 14, 2016

ishida>>blog » i18n

What characters are in or not in encoding X?

I just received a query from someone who wanted to know how to figure out what characters are in and what characters are not in a particular legacy character encoding. So rather than just send the information to her I thought I’d write it as a blog post so that others can get the same information. I’m going to write this quickly, so let me know if there are parts that are hard to follow, or that you consider incorrect, and I’ll fix it.

A few preliminary notes to set us up: When I refer to ‘legacy encodings’, I mean any character encoding that isn’t UTF-8. Though, actually, I will only consider those that are specified in the Encoding spec, and I will use the data provided by that spec to determine what characters each encoding contains (since that’s what it aims to do for Web-based content). You may come across other implementations of a given character encoding, with different characters in it, but bear in mind that those are unlikely to work on the Web.

Also, the tools I will use refer to a given character encoding using the preferred name. You can use the table in the Encoding spec to map alternative names to the preferred name I use.
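
As a quick illustration of that label-to-preferred-name mapping, here is a tiny Python sketch with a few entries copied from the Encoding spec’s table (the full, authoritative table lives in the spec itself):

PREFERRED_NAME = {
    "cseucpkdfmtjapanese": "euc-jp",
    "x-euc-jp": "euc-jp",
    "latin1": "windows-1252",
    "iso-8859-1": "windows-1252",
}

def preferred(label):
    # Labels are matched case-insensitively after trimming whitespace.
    return PREFERRED_NAME.get(label.strip().lower(), label)

print(preferred("cseucpkdfmtjapanese"))  # euc-jp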

What characters are in encoding X?

Let’s suppose you want to know what characters are in the character encoding you know as cseucpkdfmtjapanese. A quick check in the Encoding spec shows that the preferred name for this encoding is euc-jp.

Go to http://r12a.github.io/apps/encodings/ and look for the selection control near the bottom of the page labelled show all the characters in this encoding.

Select euc-jp. It opens a new window that shows you all the characters.

picture of the result

This is impressive, but so large a list that it’s not as useful as it could be.

So highlight and copy all the characters in the text area and go to https://r12a.github.io/apps/listcharacters/.

Paste the characters into the big empty box, and hit the button Analyse characters above.

This will now list for you those same characters, but organised by Unicode block. At the bottom of the page it gives a total character count, and adds up the number of Unicode blocks involved.

picture of the result
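
If you prefer to do this programmatically, here is a rough Python equivalent. Note the assumption: Python’s codec tables stand in for the Encoding spec data, and the two are close but not guaranteed identical.

def chars_in_encoding(enc, limit=0x10000):
    # Collect every BMP character that the codec can encode.
    chars = []
    for cp in range(limit):
        try:
            chr(cp).encode(enc)
        except UnicodeEncodeError:
            continue
        chars.append(chr(cp))
    return chars

print(len(chars_in_encoding("euc_jp")))  # total character count, as in the app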

What characters are not in encoding X?

If instead you actually want to know what characters are not in the encoding for a given Unicode block, you can follow these steps.

Go to UniView (http://r12a.github.io/uniview/) and select the block you are interested in where it says Show block, or alternatively type the range into the control labelled Show range (eg. 0370:03FF).

Let’s imagine you are interested in Greek characters and you have therefore selected the Greek and Coptic block (or typed 0370:03FF in the Show range control).

On the edit buffer area (top right) you’ll see a small icon with an arrow pointing upwards. Click on this to bring all the characters in the block into the edit buffer area. Then hit the icon just to its left to highlight all the characters and then copy them to the clipboard.

picture of the result

Next open http://r12a.github.io/apps/encodings/ and paste the characters into the input area labelled with Unicode characters to encode, and hit the Convert button.

picture of the result

The Encoding converter app will list all the characters in a number of encodings. If the character is part of the encoding, it will be represented as two-digit hex codes. If not, and this is what you’re looking for, it will be represented as decimal HTML escapes (eg. &#880;). This way you can get the decimal code point values for all the characters not in the encoding. (If all the characters exist in the encoding, the block will turn green.)

(If you want to see the list of characters, copy the results for the encoding you are interested in, go back to UniView and paste the characters into the input field labelled Find. Then click on Dec. Ignore all ASCII characters in the list that is produced.)

Note, by the way, that you can tailor the encodings that are shown by the Encoding converter by clicking on change encodings shown and then selecting the encodings you are interested in. There are 36 to choose from.
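
The same check can be sketched in Python for a single block, again using Python’s codecs as a stand-in for the Encoding spec data:

missing = []
for cp in range(0x0370, 0x0400):  # the Greek and Coptic block
    try:
        chr(cp).encode("euc_jp")
    except UnicodeEncodeError:
        # Record the decimal HTML escape, as the Encoding converter displays it.
        missing.append(f"&#{cp};")

print(missing)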

by r12a at 14 January 2016 08:29 PM

January 12, 2016

W3C I18n Activity highlights

Unicode Tutorial Workshop in Oman (Feb 14-16, 2016)

This tutorial workshop, sponsored by the Unicode Consortium and organized by the German University of Technology in Muscat, Oman, is a three-day event designed to familiarize the audience with the Unicode Standard and the concepts of internationalization. It is the first Unicode event to be held in the Middle East.

The workshop program includes an introduction to Writing Systems & Unicode, plus presentations on Arabic Typography, web best practices, mobile internationalization, and more.

The workshop website provides full information about the event. Early bird registration lasts until January 31, 2016, but register early to ensure a place.

by Richard Ishida at 12 January 2016 11:46 AM

January 02, 2016

ishida>>blog » i18n

New picker: Old English

Picture of the page in action.
>> Use the picker

Following closely on the heels of the Old Norse and Runic pickers comes a new Old English (Anglo-Saxon) picker.

This Unicode character picker allows you to produce or analyse runs of Old English text using the Latin script.

In addition to helping you to type Old English latin-based text, the picker allows you to automatically generate phonetic and runic transcriptions. These should be used with caution! The transcriptions are only intended to be a rough guide, and there may occasionally be slight inaccuracies that need patching.

The picture in this blog post shows examples of Old English text, and phonetic and runic transcriptions of the same, from the beginning of Beowulf. Click on it to see it larger, or copy-paste the following into the picker, and try out the commands on the top right: Hwæt! wē Gār-Dena in ġēar-dagum þēod-cyninga þrym gefrūnon, hū ðā æþelingas ellen fremedon.

If you want to work more with runes, check out the Runic picker.

by r12a at 02 January 2016 11:02 PM

January 01, 2016

ishida>>blog » i18n

New pickers: Runic & Old Norse

Picture of the page in action.
>> Use the picker

Character pickers are especially useful for people who don’t know a script well, as characters are displayed in ways that aid identification. These pickers also provide tools to manipulate the text.

The Runic character picker allows you to produce or analyse runs of Runic text. It allows you to type in runes for the Elder fuþark, Younger fuþark (both long-branch and short-twig variants), the Medieval fuþark and the Anglo-Saxon fuþork. To help beginners, each of the above has its own keyboard-style layout that associates the runes with characters on the keyboard to make it easier to locate them.

It can also produce a Latin transliteration for a sequence of runes, or automatically produce runes from a Latin transliteration. (Note that these transcriptions do not indicate pronunciation – they are standard Latin substitutes for graphemes, rather than actual Old Norse or Old English, etc, text. To convert Old Norse to runes, see the description of the Old Norse picker below. This will soon be joined by another picker which will do the same for Anglo-Saxon runes.)

Writing in runes is not an exact science. Actual runic text is subject to many variations dependent on chronology, location and the author’s idiosyncrasies. It should be particularly noted that the automated transcription tools provided with this picker are intended as aids to speed up transcription, rather than to produce absolutely accurate renderings of specific texts. The output may need to be tweaked to produce the desired results.

You can use the RLO/PDF buttons below the keyboard to make the runic text run right-to-left, eg. ‮ᚹᚪᚱᚦᚷᚪ‬, and if you have the right font (such as Junicode, which is included as the default webfont, or a Babelstone font), make the glyphs face to the left also. The Babelstone fonts also implement a number of bind-runes for Anglo-Saxon (but are missing those for Old Norse) if you put a ZWJ character between the characters you want to ligate. For example: ᚻ‍ᛖ‍ᛚ. You can also produce two glyphs mirrored around the central stave by putting ZWJ between two identical characters, eg. ᚢ‍ᚢ. (Click on the picture of the picker in this blog post to see examples.)
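
As a small illustration, this Python sketch shows the ZWJ technique (U+200D) just described; whether a ligature actually appears depends entirely on the font:

ZWJ = "\u200D"  # ZERO WIDTH JOINER
print(ZWJ.join("ᚻᛖᛚ"))   # requests h-e-l as a bind-rune, font permitting
print("ᚢ" + ZWJ + "ᚢ")   # two identical runes mirrored around the central stave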

Picture of the page in action.
>> Use the picker

The Old Norse picker allows you to produce or analyse runs of Old Norse text using the Latin script. It is based on a standardised orthography.

In addition to helping you to type Old Norse latin-based text, the picker allows you to automatically generate phonetic and runic transcriptions. These should be used with caution! The phonetic transcriptions are only intended to be a rough guide, and, as mentioned earlier, real-life runic text is often highly idiosyncratic, not to mention that it varies depending on the time period and region.

The runic transcription tools in this app produce runes of the Younger fuþark – used for Old Norse after the Elder and before the Medieval fuþarks. This transcription tool has its own idiosyncrasies, which may not always match real-life usage of runes. One particular idiosyncrasy is that the output always conforms to the same set of rules, but others include the decision not to remove homorganic nasals before certain following letters. More information about this is given in the notes.

You can see an example of the output from these tools in the picture of the Old Norse picker that is attached to this blog post. Here’s some Old Norse text you can play with: Ok sem leið at jólum, gørðusk menn þar ókátir. Bǫðvarr spurði Hǫtt hverju þat sætti; hann sagði honum at dýr eitt hafi komit þar tvá vetr í samt, mikit ok ógurligt.

The picker also has a couple of tools to help you work with A New Introduction to Old Norse.

by r12a at 01 January 2016 01:43 PM

December 05, 2015

ishida>>blog » i18n

New app: Encoding converter

Picture of the page in action.
>> Use the app

This app allows you to see how Unicode characters are represented as bytes in various legacy encodings, and vice versa. You can customise the encodings you want to experiment with by clicking on change encodings shown. The default selection excludes most of the single-byte encodings.

The app provides a way of detecting the likely encoding of a sequence of bytes if you have no context, and also allows you to see which encodings support specific characters. The list of encodings is limited to those described for use on the Web by the Encoding specification.

The algorithms used are based on those described in the Encoding specification, and thus describe the behaviour you can expect from web browsers. The transforms may not be the same as for other conversion tools. (In some cases the browsers may also produce a different result than shown here, while the implementation of the spec proceeds. See the tests.)

Encoding algorithms convert Unicode characters to sequences of double-digit hex numbers that represent the bytes found in the target character encoding. A character that cannot be handled by an encoder will be represented as a decimal HTML character escape.

Decoding algorithms take the byte codes just mentioned and convert them to Unicode characters. The algorithm returns replacement characters where it is unable to map a given byte to a character.

For the decoder input you can provide a string of hex numbers separated by space or by percent signs.
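
Here is a rough Python sketch of this behaviour; the assumption is that Python’s codecs are close to, though not guaranteed identical with, the Encoding spec algorithms:

text = "aé€ᚠ"

# Encoding: characters become two-digit hex bytes; a character the encoder
# cannot handle becomes a decimal HTML character escape (&#5792; for ᚠ).
encoded = text.encode("windows-1252", errors="xmlcharrefreplace")
print(" ".join(f"{b:02X}" for b in encoded))  # 61 E9 80 26 23 35 37 39 32 3B

# Decoder input: hex numbers separated by spaces or percent signs.
raw = bytes.fromhex("A4%CB%FF".replace("%", " "))
# Decoding: an unmappable byte comes back as U+FFFD REPLACEMENT CHARACTER.
print(raw.decode("euc_jp", errors="replace"))  # に followed by �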

Green backgrounds appear behind sequences where all characters or bytes were successfully mapped to a character in the given encoding. Beware, however, that the character mapped to may not be the one you expect – especially in the single byte encodings.

To identify characters and look up information about them you will find UniView extremely useful. You can paste Unicode characters into the UniView Edit Buffer and click on the down-arrow icon below to find out what they are. (Click on the name that appears for more detailed information.) It is particularly useful for identifying escaped characters. Copy the escape(s) to the Find input area on UniView and click on Dec just below.

by r12a at 05 December 2015 05:23 PM

November 22, 2015

ishida>>blog » i18n

Mongolian picker updated: standardised variants

Picture of the page in action.
>> Use the picker

An update to version 17 of the Mongolian character picker is now available.

When you hover over or select a character in the selection area, the box to the left of that area displays the alternate glyph forms that are appropriate for that character. By default, this only happens when you click on a character, but you can make it happen on hover by clicking on the V in the gray selection bar to the right.

The list includes the default positional forms as well as the forms produced by following the character with a Free Variation Selector (FVS). The latter forms have been updated, based on work which has been taking place in 2015 to standardise the forms produced by using FVS. At the moment, not all fonts will produce the expected shapes for all possible combinations. (For more information, see Notes on Mongolian variant forms.)

An additional new feature is that when the variant list is displayed, you can add an appropriate FVS character to the output area by simply clicking in the list on the shape that you want to see in the output.

This provides an easy way to check what shapes should be produced and what shapes are produced by a given font. (You can specify which font the app should use for display of the output.)
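
For instance, here is a minimal Python sketch of the FVS technique itself (which variant a given font shows still depends on its support):

FVS1 = "\u180B"  # MONGOLIAN FREE VARIATION SELECTOR ONE
base = "\u1820"  # MONGOLIAN LETTER A
print(base + FVS1)  # requests the standardised FVS1 variant of the letter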

Some small improvements were also made to the user interface. The picker works best in Firefox and Edge desktop browsers, since they now have pretty good support for vertical text. It works least well in Safari (which includes the iPad browsers).

For more information about the picker, see the notes at the bottom of the picker page.

About pickers: Pickers allow you to quickly create phrases in a script by clicking on Unicode characters arranged in a way that aids their identification. Pickers are likely to be most useful if you don’t know a script well enough to use the native keyboard. The arrangement of characters also makes it much more usable than a regular character map utility. See the list of available pickers.

by r12a at 22 November 2015 01:03 PM

November 17, 2015

Wikimedia Foundation

Image from the UK Ministry of Defence, freely licensed under OGL 1.0.

Google Summer of Code and Outreachy are two software development internship programs that Wikimedia participates in every year. For the last nine years, college students have applied to be a part of the coding summer, one of many outreach programs operated by the Wikimedia Foundation. After being accepted, they work on their project from May to August and are invited to a Mentor Summit at Google in November.

For the first time, all Wikimedia projects that passed the evaluation were immediately deployed in production or Wikimedia Labs. Take a look!

Reinventing Translation Search

Search translations. Screenshot by Phoenix303, freely licensed under CC BY-SA 4.0.

TranslateWiki is a popular translation platform used by many projects across Wikimedia and several times as many outside it. Originally developed single-handedly by Niklas Laxström, the platform has expanded significantly since its launch in 2006. This project aims to add a search feature to the Translate extension.

Without a search feature, it is difficult for translators to find specific messages they want to translate; traversing all the translations or strings of the project is inefficient. Translators also often want to check how a specific term was translated in a certain language across the project. This is solved by the special page Special:SearchTranslations. By default, translators can find the messages containing certain terms in any language and filter by various criteria. After searching, they can switch the results to the translations of said messages, for instance to find the existing, missing or outdated translations of a certain term. You can check it out here. Dibya Singh was the project intern.

Crosswatch. Screenshot by Sitic, freely licensed under CC0 1.0.

Cross-wiki watchlist

Crosswatch is a cross-wiki watchlist for all Wikimedia wikis. The goal of the project is to help editors who are active in several wikis to monitor changes and generally to provide a better watchlist for all editors. Among other things, crosswatch includes cross-wiki notifications, dynamic filtering and the ability to show diffs for edits. As an external tool which uses OAuth to retrieve the watchlists on behalf of the user, it doesn’t have the same constraints as MediaWiki and can experiment with the design and functionality of a watchlist without breaking existing workflows or code. Its design is much more similar to the mobile watchlist than to the classical MediaWiki watchlist layout; there is, however, an option to use the traditional layout. Crosswatch can show a unified watchlist for all wikis or a watchlist subdivided into wikis. One of the predominant features is the native support to show a diff for an edit. The project was completed by Jan Lebert.

Wikivoyage PageBanner extension

PageBanner extension Screenshot. Screenshot by Frédéric Bolduc and others, freely licensed under CC BY-SA 3.0.

Wikivoyage is a wiki about travel and holds rich details related to visiting places. This wiki has a special preference for showing page-wide banners at the top of each of its articles to enhance their aesthetic appeal. An example of such a banner can be seen here. These banners are traditionally shown using a template on the wiki. The banners shown through templates, however, had a few shortcomings, such as not delivering an optimally sized banner for each device, not rendering properly on mobile devices, appearing too small on small mobile screens, and not being able to show more than one level of table of contents inside the banners. The project is all about addressing these issues and adding capabilities through a MediaWiki extension to take the banner experience to the next level. You can test it out here. Sumit Asthana was the project intern.

Language Proofing Extension for VisualEditor

LanguageTool extension screenshot. Screenshot by Frédéric Bolduc and others, freely licensed under CC BY-SA 3.0.

LanguageTool is an extension for VisualEditor that enables language proofing support in about twenty languages. This includes spelling and grammar checking. Before this tool, VisualEditor relied on the browser’s native spelling and grammar checking tools. LanguageTool itself is an open source spelling and grammar proofing software created and maintained by Daniel Naber. This extension is an integration of the tool into VisualEditor. You can test this feature here and learn more about it here. Ankita Kumari completed the project.

Newsletter Extension for MediaWiki

Newsletter extension for Mediawiki. Screenshot by Frédéric Bolduc and others, freely licensed under CC BY-SA 3.0.

Many Wikimedia projects and developers use newsletters to broadcast recent developments or relevant news to other Wikimedians. But having to find a newsletter main page and then subscribing to it by adding your username to a wiki page doesn’t really sound appealing.
The main motivation of this project is to offer a catalog with all the newsletters available in a wiki farm, and the possibility to subscribe/unsubscribe and receive notifications without having to visit or be an active editor of any wiki. You can see this project in action here and learn more about it here. Tina Johnson was the intern for this project.

Flow support in Pywikibot

Flow extension to Pywikibot. Screenshot by Frédéric Bolduc and others, freely licensed under CC BY-SA 3.0.

This was a project to add support for Flow, MediaWiki’s new discussion framework, to Pywikibot, a Python framework widely used for bots operating on Wikimedia wikis. To accomplish this task, a module was implemented in Pywikibot with classes mapping to Flow data constructs, like topics and posts. Code supporting Flow-related API calls was also added, and tests were included for loading and saving operations. As it stands, Pywikibot-driven bots can now load Flow content, create new topics, reply to topics and posts, and lock and unlock topics. Learn more about this task here. This project was completed by Alexander Jones.

OAuth Support in Pywikibot

OAuth extension to Pywikibot. Screenshot by Frédéric Bolduc and others, freely licensed under CC BY-SA 3.0.

MediaWiki supports OAuth v1.0a as a method of authentication via the OAuth extension. This project adds OAuth v1.0a support to Pywikibot. Pywikibot may be used as an OAuth application for MediaWiki sites with the OAuth extension installed and configured properly. Developers may use Pywikibot to authenticate accounts and replace passwords with OAuth authentication as an alternative login method. This project also includes switching the HTTP library from httplib2 to requests, and unit tests related to OAuth authentication and its integration with Pywikibot. All integration builds of Pywikibot now test OAuth on Travis CI (Ubuntu) and Appveyor (Win32). This enables ‘logged in’ tests to be performed on some wiki sites, including beta wiki, which is deployed on Beta Cluster and is an environment where passwords are not considered secure. Learn more about this project here. The project was completed by Jiarong Wei.

Extension to identify and remove spam

SmiteSpam extension. Screenshot by Frédéric Bolduc and others, freely licensed under CC BY-SA 3.0.

SmiteSpam is a MediaWiki extension that helps wiki administrators identify and delete spam pages. Because wikis are openly editable, they make great targets for spammers. From product advertisements to absolute garbage, any kind of spam turns up on wikis. While accurate detection of a spam wiki page is an open problem in the field of computer science, this extension tries to detect possible spam using some simple checks: How frequently are external links occurring in the text? Are any of the external links repeating? How much wikitext is present on the page? The extension does a reasonably good job of finding particularly bad pages in a wiki and presents them to the administrators. They can see a list of pages, their creators, how confident SmiteSpam is of them being spam, the creation time of the page and options to delete the page and/or block the creator. They can also mark users as “trusted”. Pages created by trusted users are ignored by SmiteSpam and will hence reduce the number of false positives in the results. Vivek Ghaisas completed the project.
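
A hypothetical Python sketch of checks like the ones just described (the names and signals here are illustrative only, not SmiteSpam’s actual code):

import re

def spam_signals(wikitext):
    # Three simple checks: how frequent external links are, whether any
    # of them repeat, and how much wikitext is present on the page.
    links = re.findall(r"https?://\S+", wikitext)
    words = wikitext.split()
    return {
        "link_frequency": len(links) / max(len(words), 1),
        "repeated_links": len(links) - len(set(links)),
        "wikitext_length": len(wikitext),
    }

print(spam_signals("Buy now at http://spam.example http://spam.example"))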

VisualEditor Graph module

VE graph extension. Screenshot by Frédéric Bolduc and others, freely licensed under CC BY-SA 3.0.

ve-graph is a module within the Graph extension that aims to bring graph editing tools to VisualEditor in order to bridge the gap between editors and Vega, the visualization engine powering graphs in MediaWiki pages. Before, the only way for users to create and maintain graphs was to directly alter their specification in raw wikitext, which was not only hard to grasp for beginners but also very prone to errors. Those errors would simply render the graph unusable without offering any kind of feedback to the user as to what went wrong. With ve-graph, it is now possible to display graphs within VisualEditor and open up an interface to edit graph types, data and padding. The UI also offers a way to edit the raw JSON specification within VisualEditor without having to switch to the classic wikitext editor, in case more advanced users want to tweak settings not supported by the UI. This first step serves as a stepping stone for many possibilities with graph editing within VisualEditor, and there are a lot of ways in which ve-graph can be improved and expanded. This project is live, and you can see a demo here. Frédéric Bolduc completed the project.

 

Niharika Kohli, Wikimedia Foundation

by Niharika Kohli at 17 November 2015 07:30 AM

November 16, 2015

W3C I18n Activity highlights

Video published: Linguistic Linked Data and the LIDER project explained

This video explains what Linguistic Linked Data is and summarizes the outcomes of the LIDER project. This includes best practices for working with Linguistic Linked Data, a reference architecture, and a roadmap for future activities around Linguistic Linked Data. The video was produced by the LIDER project and was published during the European Data Forum 2015 event.

by Richard Ishida at 16 November 2015 11:51 AM


Contact: Richard Ishida (ishida@w3.org).