Tooling

From Cognitive Accessibility Task Force

The Task Force members find several of the editorial and workflow tools commonly used by W3C groups to be unsuitable. This page collects requirements and potential solutions. I (Steve Lee) started this as a 2020 W3C Geek Week Project looking at Less Technical Editorial Workflow.

Note: this document has not been reviewed by the Task Force.

Requirements

TL;DR: The Taskforce want a cloud-based collaborative WYSIWYG content editing environment that supports comments while tracking and showing changes. They are more concerned about the content than its presentation (via W3C templates). This needs to be integrated with the W3C HTML document process based on GitHub-flow, allowing W3C staff to support previews and publication of the HTML documents to TR space. Currently Google Docs works well but requires manual transfer of content to and from the HTML source used in the W3C process.

The Taskforce also need an easier way to process external feedback via GitHub issues and their own actions.

Overview

Like any distributed and diverse group, the Task Force need tooling to help them work both asynchronously, at times that suit the individual, and synchronously in weekly meetings. Historically, such tooling has been the province of Open Source collaboration between highly technical contributors, namely developers. This sets up barriers for the Taskforce, who really wish to concentrate on the content only, not the presentation.

The Taskforce do much work during the meetings but still want to use online collaboration tools then, often with one person sharing their desktop view over a video call. Currently Google Docs, IRC and Zoom are used during meetings, and Google Docs (especially comments) and the email list are used between meetings.

The W3C process tooling for documents is definitely developer-centric. Until recently it was CVS and script based but GitHub and GitHub-flow are now popular. HTML and metadata files are managed in git and GitHub tools with a GitHub-flow-like process, enhanced with W3C tools like reSpec and BikeShed.

The WAI website uses completely separate tooling based again on GitHub but also using Jekyll (static site generator) and Netlify builds and deploys.

The technical experience of Taskforce members varies broadly, as do their accessibility requirements. So the W3C tooling presents barriers to a significant number of members and most would appreciate improvements. Typical barriers met with the tools include:

  • Complex and arcane processes or workflows that are hard to remember without making mistakes
  • Requirement to learn and recall concepts that are not directly related to the content work being undertaken
  • Complex and busy UI that is hard to learn and leads to mistakes - designed for developers
  • Non-WYSIWYG editing - both for document source (eg HTML, CSS) and tooling (eg wiki markup, Markdown)
  • Hard to work with information that is spread about in many places and on many threads - eg Managing external feedback in issues
  • Previews and changes are not easy to view

Fortunately, it has so far proved possible for the Taskforce to use more comfortable tooling like Google Docs, while more technical members and W3C staff provide support with the more technical stages of integrating with W3C process, including previews and publishing. The transfer of content between Google Docs and HTML in git is flexible but is a costly and error-prone manual process.

There is room for improvements to the Taskforce UX such as better integration or automation of conversion between tools and simplification of UIs. That's the point of this project.

Current Process

Source code is HTML kept in git. Content is extracted to a Google Doc for collaborative editing and returned to git when ready.

Collaborative Cloud Content editing in Google <-> web page source using reSpec in GitHub and previewed -> published document

  1. Editors transfer latest content manually from the published document into Google Doc
  2. Taskforce content collaboration mostly during calls with "suggesting changes" and comments (no use of diffs)
  3. Editors manually transfer content back to HTML source using a variant of GitHub-flow
  4. W3C staff help with reSpec and other deeply technical issues
  5. Editors manually make previews available
  6. All review previews
  7. W3C staff publish to TR according to W3C process

General Constraints

  • Google Docs not available in China - maybe ignore for now due to low need?
  • There might be upcoming W3C restrictions on using github.io for previews
  • Cloud based so no need for local installs of tooling
  • W3C publication tooling almost certainly depends on HTML source in GitHub

Specific issues with main editing workflow

Specific areas of activity that would benefit from attention are:

  • Collaborative editing - dev style local editing in branches, push, PRs and managed merges (eg GitFlow) are not popular
    • Using Google Docs WYSIWYG with inline comments is current preferred solution (but is not available in China)
    • Problems locating specific Google documents and sharing safely outside those who can edit
    • Dangers with document ownership and continued access
    • A forthcoming W3C GSuite instance might solve these issues
  • Only interested in the content and the final rendered page. HTML and CSS are pure implementation details
  • Generation of diffs and previews of the various versions from content edits and during publishing cycle
    • Differences in GitHub are raw HTML
    • Currently exploring "side by side" content diffs in Google docs with highlights and comments
    • Google docs versioning is far from ideal
    • HTMLDiffs may work but currently has some technical issues with complex modular documents like Content Usable
  • Transfer of content between Google Docs and HTML in GitHub
    • Currently manual both ways
    • Copying from rendered HTML to Google Doc is fairly easy with cut and paste
    • Regenerating HTML from content is a slow manual process subject to errors

Other Issues

  • Processing external feedback that comes in via GitHub Issues and PRs
  • Action Tracking - GitHub issues are a problem for some due to distributed nature

Possible Approaches

  • Cloud based WYSIWYG collaborative content editing
    • Google Docs - used now
    • Office 365 - unlikely
    • GitHub editor is only partly collaborative as it requires PRs etc
    • Need preview generation integrations
  • Content only Diffs
    • Google Docs has versions, track changes and compare files
    • GitHub diffs for HTML derived content
  • Preview Diffs
    • HTMLDiffs with preview build - Netlify and Htmldiff?
  • Simplify GitHub UX so it is not such a barrier
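
The "Content only Diffs" idea above can be illustrated with a small sketch. It is a naive illustration only, not a proposed implementation: the function names `extractTextBlocks` and `contentDiff` are hypothetical, tags are stripped with regular expressions rather than a real HTML parser, entities are not decoded, and the comparison ignores ordering. It shows that markup-only changes (such as an added class or removed emphasis) disappear from the diff while changed wording remains.

```typescript
// Naive sketch (assumption): compare the visible text of two HTML fragments.
// A real tool would use an HTML parser and a proper diff algorithm.

function extractTextBlocks(html: string): string[] {
  return html
    // Treat the end of common block elements as a line break.
    .replace(/<\/(p|li|h[1-6]|div|section|tr)>/gi, "\n")
    // Drop every remaining tag, keeping only the text between them.
    .replace(/<[^>]+>/g, "")
    .split("\n")
    // Collapse runs of whitespace and discard empty lines.
    .map((line) => line.replace(/\s+/g, " ").trim())
    .filter((line) => line.length > 0);
}

interface ContentDiff {
  removed: string[]; // text blocks only present in the old version
  added: string[];   // text blocks only present in the new version
}

function contentDiff(oldHtml: string, newHtml: string): ContentDiff {
  const oldBlocks = extractTextBlocks(oldHtml);
  const newBlocks = extractTextBlocks(newHtml);
  return {
    removed: oldBlocks.filter((block) => newBlocks.indexOf(block) === -1),
    added: newBlocks.filter((block) => oldBlocks.indexOf(block) === -1),
  };
}
```

Because only text blocks are compared, reviewers would see added or removed sentences and headings rather than raw HTML changes.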

Potentially Interesting technologies

NB this project is requirements driven (see above) and tech should be selected based on them. That said, some technologies might be worth exploring for ideas and possibilities:

Thoughts

In a normal development flow the source content is kept under version control and a build process is used to derive the published artefacts. Changes to the source can then automatically trigger regeneration via CI and CD processes as required.

The current Taskforce workflow uses a kind of 'pre source' Google Doc that is transferred from and to the HTML source in GitHub with a highly expensive and error-prone process (at least in one direction). A few (overlapping) ideas for straightening this out are:

Make the Google content the main source

The main source becomes the content in the Google Doc and everything is built from that.

Pros:

  • No transferring content "back" from html to google doc, just from doc to HTML.
  • Traditional "straight line" build from source to derived content is easy to manage and automate
  • Ideal from taskforce collaboration point of view
  • Can extend Docs UI to provide integration with GitHub

Cons:

  • Google docs are not well organised and tend to 'float around' in Drive.
  • Google Docs have their own version control per document, which is different to usual code management
  • Need to programmatically access Google doc and trigger check-ins and builds (is manageable though)
  • The W3C process may expect HTML source to be in VC, so would have to be modified if the HTML becomes derived.

Reduce the gap between the two formats

HTML is a big step from Google Doc content. This gap could be reduced by using Markdown in Google Docs or using templated content which compiles to HTML.

Pros:

  • Easier to manually transfer content
  • Easier to correctly automate. Possibly faster too

Cons:

  • Taskforce do not like working with syntax like Markdown, preferring WYSIWYG.
  • Template syntax also complicates the content
  • Extended content grammar requires error checking and ideally alerts during edit
  • There is still a gap to be managed

Provide a WYSIWYG content editor

Ideally would edit HTML but only show simplified content

Pros:

  • great UX for taskforce
  • Can integrate workflow with GitHub

Cons:

  • complex collaborative interactive web app project
  • would reinvent all the useful Google Docs functionality

Simplify GitHub UX

Provide a simplified facade to GitHub, eg to transfer formats, manage PRs etc. Could also work for problems with Issues.

Pros:

  • Taskforce could more fully engage in the entire workflow
  • stepping stone to learning GitHub-flow
  • Might benefit many less technical users

Cons:

  • Might not make sense or be possible to do in a useful and meaningful way.
  • Complexity to manage
  • Might be better to provide training

Automate the format transfers

The transfer is expensive especially from content into HTML so automating this is a good step.

Pros:

  • reduce errors and cost
  • consistent results
  • build time so flexible use
  • can be triggered as part of automated workflow

Cons:

  • keeping two-way transfers idempotent can be hard so could introduce errors that get missed (one way is fine)
  • likely to require extra metadata or markup in content which must be kept up to date
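
The round-trip concern above can be made concrete with a small check. This is a toy sketch under stated assumptions: `htmlToText` and `textToHtml` are hypothetical stand-ins for the real check-out and check-in converters and handle only `h2` headings. The point is the test itself: convert out, convert back, and compare with the original.

```typescript
// Hypothetical toy converters: just enough to show the round-trip check.
type Converter<A, B> = (input: A) => B;

// Convert out and back again, returning whatever survives the trip.
function roundTrip<S, D>(source: S, out: Converter<S, D>, back: Converter<D, S>): S {
  return back(out(source));
}

// Toy "check-out": keep only the heading text, discarding any attributes.
const htmlToText: Converter<string, string> = (html) =>
  html.replace(/<h2[^>]*>(.*?)<\/h2>/g, "## $1");

// Toy "check-in": rebuild the heading from the text alone.
const textToHtml: Converter<string, string> = (text) =>
  text.replace(/^## (.*)$/gm, "<h2>$1</h2>");

// True when nothing was lost between check-out and check-in.
function isLossless(html: string): boolean {
  return roundTrip(html, htmlToText, textToHtml) === html;
}
```

With these toy converters a plain `<h2>Summary</h2>` survives the round trip, while `<h2 id="summary">Summary</h2>` does not, because the id is dropped on the way out. That is the kind of silent loss that the extra metadata or markup mentioned above would have to prevent, and a check like this could run automatically at check-in.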

Generate HTML from the Google doc source

Create the HTML + reSpec source from the google doc content using the Google API to get the structure like headings etc

Pros:

  • fully automated process from google doc content to HTML preview

Cons:

  • may not be possible to use required content forms - eg tables. Compare with markdown which lets you embed raw HTML
  • might be too complex or require extended syntax in google source
  • Google APIs can be a pain.
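
As a rough illustration of this approach, the sketch below walks a Google Docs API document (as JSON) and emits simple HTML. It is a minimal sketch, not a working converter: the interfaces model only a small subset of the document resource (paragraphs, text runs, `namedStyleType` and `bullet`), the function names are hypothetical, and tables, nested lists, inline styling and all reSpec-specific markup are ignored.

```typescript
// Minimal subset of the Docs API document JSON (assumption: only these fields are used).
interface TextRun { content?: string }
interface ParagraphElement { textRun?: TextRun }
interface Paragraph {
  elements: ParagraphElement[];
  paragraphStyle?: { namedStyleType?: string };
  bullet?: { listId?: string; nestingLevel?: number };
}
interface StructuralElement { paragraph?: Paragraph }
interface GoogleDoc { body: { content: StructuralElement[] } }

function escapeHtml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Join the text runs of a paragraph and drop the trailing newline.
function paragraphText(p: Paragraph): string {
  const raw = p.elements.map((e) => e.textRun?.content ?? "").join("");
  return escapeHtml(raw.trim());
}

function docToHtml(doc: GoogleDoc): string {
  const out: string[] = [];
  let inList = false;
  for (const element of doc.body.content) {
    const p = element.paragraph;
    if (!p) continue; // tables, section breaks etc. are skipped in this sketch
    const text = paragraphText(p);
    if (text === "") continue;
    const isBullet = p.bullet !== undefined;
    // Open or close a flat <ul> as bulleted paragraphs start and stop.
    if (isBullet && !inList) { out.push("<ul>"); inList = true; }
    if (!isBullet && inList) { out.push("</ul>"); inList = false; }
    const heading = /^HEADING_([1-6])$/.exec(p.paragraphStyle?.namedStyleType ?? "");
    if (isBullet) out.push(`<li>${text}</li>`);
    else if (heading) out.push(`<h${heading[1]}>${text}</h${heading[1]}>`);
    else out.push(`<p>${text}</p>`);
  }
  if (inList) out.push("</ul>");
  return out.join("\n");
}
```

Even this small sketch shows where the limitations would appear: anything without a direct Docs equivalent, such as section wrappers and ids, has no place to come from.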

Proposal for discussion and proof of concept

  • Taskforce use Google Doc content but it is actually stored in GitHub so files are temporary
  • A simple "check-out" process creates a doc with plain content that can be worked on and shared and deleted after check-in
  • A simple "check-in" creates a branch and PR for Editors to work with if required.
    • HTML + reSpec source code is used to generate a google doc
    • possibly commit and/or PR merge trigger a build and preview rather than commit process.
    • Google doc formatting data is used to generate HTML etc files
  • A final HTMLDiffs is available to compare the actual pages
  • Limit to single file for now - though Content Usable is multiple modules

Risks

  • Google formatting is too restrictive for creating required HTML and Taskforce will not accept extended syntax - only support minimal features
  • Can't easily create a usable HTML from Google doc content - is there an alternative?
  • Google APIs suck - yes they do, but I've used the Sheets and YouTube ones - fall back to manual cut and paste
  • W3C do not get a GSuite instance - probably makes life hard indeed for auth and management (can share docs though)!

Tech

  • GitHub actions / webhooks to run code transfer (node) on checkin/out events or perhaps Netlify Functions
  • Google API access (REST is OK) - auth (service) probably requires GSuite instance - use my own
  • Either: Netlify preview builds on CDN - eg trigger webhooks from GitHub actions.
  • Or: do any required build during checkin and use githack etc for direct static preview deploy from GitHub
  • GApps code to add Check-out check-in menu items to google docs - or use a separate app, perhaps GitHub.

Research and Spikes

In suggested priority order:

  1. [ ] Review W3C_Tooling_Policy and new Process 2020
  2. [ ] Create Google doc with just content from HTML + reSpec - find the limitations
  3. [ ] Parse Google doc content back to update HTML + reSpec - find the limitations
  4. [ ] Create, Delete and read Google doc via API from nodeJS backed code (eg Netlify Function or GitHub Action)
  5. [ ] Integrate with GitHub to get source and create PR (eg again Netlify Function or GitHub Action)
  6. [ ] Google Doc extension or perhaps an App to trigger a check in / out of Google doc
  7. [ ] Build all files required for a W3C TR document - yaml, reSpec and other stuff
  8. [ ] Trigger build and preview deploy

Parse Google doc content to make HTML - find the limitations

  • Some content like HTML header including reSpec config are part of template and not wanted in Google Doc
  • The required HTML semantics are fairly restricted but reSpec adds its own on top of HTML/CSS
  • for semantics (eg section id) to be kept between checkout and checkin they either need to be persisted in the Google Doc or some way of mapping and reattaching is needed
    • mapping and reattaching is probably too complex and fragile - check
    • If persisted in Doc then they must not get in the way of editing but must be visible enough so not accidentally deleted/changed
      • ???? Is there a suitable way to do this in Docs?
      • probably want to validate during checkin - though preview will show errors
      • We also need to allow adding new items, either in Doc or by highlighting in PR
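
One way to explore the "persisted in Doc" option is a visible marker at the end of each heading, validated at check-in. The sketch below assumes a purely hypothetical convention, `Heading text [#section-id]`, which is not something Google Docs or reSpec defines. The function names are also hypothetical. It reports ids that were lost or duplicated during editing and headings that have no id yet (new items).

```typescript
// Hypothetical marker convention: "Heading text [#section-id]".
interface ParsedHeading { title: string; id: string | null }

const MARKER = /\s*\[#([A-Za-z][\w-]*)\]\s*$/;

function parseHeading(text: string): ParsedHeading {
  const match = MARKER.exec(text);
  if (!match) return { title: text.trim(), id: null };
  return { title: text.slice(0, match.index).trim(), id: match[1] ?? null };
}

interface CheckInReport {
  missing: string[];     // ids present at check-out but no longer in the Doc
  duplicated: string[];  // ids that now appear on more than one heading
  newHeadings: string[]; // headings without a marker, probably new sections
}

function checkSectionIds(headingTexts: string[], checkedOutIds: string[]): CheckInReport {
  const counts: Record<string, number> = Object.create(null);
  const newHeadings: string[] = [];
  for (const text of headingTexts) {
    const { title, id } = parseHeading(text);
    if (id === null) newHeadings.push(title);
    else counts[id] = (counts[id] ?? 0) + 1;
  }
  return {
    missing: checkedOutIds.filter((id) => counts[id] === undefined),
    duplicated: Object.keys(counts).filter((id) => (counts[id] ?? 0) > 1),
    newHeadings,
  };
}
```

A report like this could be shown to Editors in the PR, so that a deleted or copied marker is caught at check-in rather than discovered later in the preview.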

Analysis of Google Doc format and compatibility with HTML + reSpec

See: Google Document docs, reSpec docs

  • Only actions are: get content as JSON, create empty (or copy) and add content using different data structures
  • Headings are Paragraphs with specific styles, including set of heading levels
  • styling is at paragraph and text level - paragraph looks most useful
  • Bullets are part of a paragraph declaration
  • Tables supported
  • No obvious way to store section ids - named ranges have problems. Auto Ids may work
    • ???? Do we actually need manually specified section ids?
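
To help judge whether auto ids may work, here is a minimal sketch of deriving section ids from heading text at check-in. The functions `slugify` and `assignSectionIds` are hypothetical and the scheme is an assumption, not how reSpec or Google Docs behaves. Repeated headings get a numeric suffix so that ids stay unique.

```typescript
// Turn heading text into a lowercase, hyphen-separated id candidate.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // any run of other characters becomes one hyphen
    .replace(/^-+|-+$/g, "");    // trim leading and trailing hyphens
}

// Assign an id to each heading in document order, numbering repeats.
function assignSectionIds(titles: string[]): string[] {
  const used: Record<string, number> = Object.create(null);
  return titles.map((title) => {
    const base = slugify(title) || "section";
    const count = (used[base] ?? 0) + 1;
    used[base] = count;
    return count === 1 ? base : `${base}-${count}`;
  });
}
```

The trade-off is that a generated id changes whenever the heading is reworded or reordered among duplicates, which would break existing links to that section. That is the main reason manually specified ids might still be needed.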

Analysis of HTML + reSpec and compatibility with Google Docs

  • sections with ids - ???? can we avoid specific ids?