dai11y 29/08/2023 – A Deep Dive into Accessibility APIs

Your daily frequent11y newsletter, brought to you by @ChrisBAshton:

A Deep Dive into Accessibility APIs

I’ve read this three-part series by Neill Hadder, who works at Knowbility. Below is a perhaps oversimplified summary – I’d encourage you to click through to the articles themselves if you’re keen to learn more!

Part 1: Swinging Through the Accessibility Tree Like a Ring-Tailed Lemur

This is an introductory article that still goes into a fair bit of depth, explaining the history of the document object model (DOM) following the earlier Windows component object model (COM) and Mac’s OS X Cocoa API. The ‘accessibility tree’ is a web document object whose children include all the accessible objects from the DOM. Some elements, such as SVG, are omitted from the tree, unless they’re given an explicit role.
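
To make the tree idea a bit more concrete, here is a minimal Python sketch of my own (it is not from Neill's article). The `DomNode` and `AccessibleNode` classes and the `IMPLICIT_ROLES` table are hypothetical stand-ins, and real browsers apply far richer rules. It only shows the shape of the idea: elements with a role become accessible objects, and elements without one are left out unless they are given an explicit role.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical tag-to-role table; real browsers use much more detailed mappings.
IMPLICIT_ROLES = {"button": "button", "a": "link", "h1": "heading", "p": "paragraph"}


@dataclass
class DomNode:
    tag: str
    text: str = ""
    role: Optional[str] = None  # explicit role, like a role attribute in markup
    children: list["DomNode"] = field(default_factory=list)


@dataclass
class AccessibleNode:
    role: str
    name: str
    children: list["AccessibleNode"] = field(default_factory=list)


def collect_accessible(node: DomNode) -> list[AccessibleNode]:
    """Return the accessible objects produced by this DOM subtree."""
    kids = [a for child in node.children for a in collect_accessible(child)]
    role = node.role or IMPLICIT_ROLES.get(node.tag)
    if role is None:
        # No role: skip the element itself but keep its accessible descendants.
        return kids
    return [AccessibleNode(role, node.text, kids)]


def build_tree(body: DomNode) -> AccessibleNode:
    """Wrap everything in a single 'document' object, the root of the tree."""
    return AccessibleNode("document", "", collect_accessible(body))


page = DomNode("body", children=[
    DomNode("h1", "Welcome"),
    DomNode("svg"),                      # no role, so it is omitted
    DomNode("svg", "Logo", role="img"),  # explicit role, so it is included
    DomNode("div", children=[DomNode("button", "Save")]),
])
tree = build_tree(page)
print([(n.role, n.name) for n in tree.children])
# [('heading', 'Welcome'), ('img', 'Logo'), ('button', 'Save')]
```

Note that the button ends up as a direct child of the document here because its wrapping `div` has no role; that flattening is a simplification I chose for the sketch.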

When something important happens, e.g. the display of new content, it’s up to the application to post an event notification to the platform API. Assistive tech (AT) registers which types of event it wants to listen for. It works the other way too: AT can send actions to the application. Passing messages between running applications is called inter-process communication (IPC).
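
The register-and-notify flow can be sketched in a few lines of Python. This is my own toy illustration, not a real operating system interface: `PlatformApi` and its method names are made up, and everything runs inside one process, whereas the real thing crosses process boundaries via IPC.

```python
from collections import defaultdict
from typing import Callable


class PlatformApi:
    """Toy stand-in for a platform accessibility API (hypothetical names throughout)."""

    def __init__(self) -> None:
        self._listeners: dict[str, list[Callable[[str], None]]] = defaultdict(list)
        self._actions: dict[str, Callable[[str], None]] = {}

    def register(self, event_type: str, callback: Callable[[str], None]) -> None:
        """AT side: say which type of event you want to hear about."""
        self._listeners[event_type].append(callback)

    def post_event(self, event_type: str, detail: str) -> int:
        """Application side: announce that something important happened."""
        listeners = self._listeners.get(event_type, [])
        for callback in listeners:
            callback(detail)
        return len(listeners)  # how many listeners were notified

    def expose_action(self, name: str, handler: Callable[[str], None]) -> None:
        """Application side: offer an action that AT may trigger."""
        self._actions[name] = handler

    def perform_action(self, name: str, target: str) -> bool:
        """AT side: send an action to the application."""
        handler = self._actions.get(name)
        if handler is None:
            return False
        handler(target)
        return True


api = PlatformApi()
heard: list[str] = []
pressed: list[str] = []

api.register("content-shown", heard.append)    # the screen reader listens
api.expose_action("press", pressed.append)     # the application accepts presses

api.post_event("content-shown", "Search results loaded")
api.post_event("focus-changed", "Search box")  # nobody registered, so nothing is delivered
api.perform_action("press", "Submit button")

print(heard)    # ['Search results loaded']
print(pressed)  # ['Submit button']
```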

Part 2: The Road to Good Intentions Is Paved with Hell

This article introduces the off-screen model (OSM), which is the idea of intercepting low-level drawing instructions from applications (and the operating system itself) “in order to build a database with which the screen reader could interact as if it were text mode”.

The first OSM program was OutSpoken for Mac, released in 1989 as a kind of “virtual white cane”. It used the numeric keypad to emulate a mouse to visualise and explore the screen layout. Mac’s next AT was VoiceOver, in 2005.

Meanwhile, a number of mostly short-lived screen readers were created for Windows. “Microsoft created the IAccessible specification, which put in place the now-familiar tree structure of semantic information about UI objects”.

Neill dives further into the inner workings of OSMs. Here’s a taster:

The work of an OSM is extremely complex. It starts with reading Strings of ASCII characters from text-drawing functions. The OSM also needs to keep track of the bitmap in order to insert text at the right place in the OSM when, for example, typed characters are being drawn on a line one instruction at a time as part of the same word. It has to keep track of what’s visible or not, too, such as when areas on the bitmap are transferred off screen and replaced in order to convey the illusion of 3D pull-down menus or layered windows
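
For a rough feel of what that involves, here is a heavily simplified Python sketch of my own. It assumes a character grid rather than a pixel bitmap, and the hook names `on_draw_text` and `on_clear_rect` are invented for illustration; a real OSM intercepted actual graphics calls and was vastly more complicated.

```python
class OffScreenModel:
    """Toy off-screen model: remembers which character was drawn at each grid cell."""

    def __init__(self) -> None:
        self._cells: dict[tuple[int, int], str] = {}  # (row, col) -> character

    def on_draw_text(self, row: int, col: int, text: str) -> None:
        """Hypothetical hook, called whenever an application draws text."""
        for offset, char in enumerate(text):
            self._cells[(row, col + offset)] = char

    def on_clear_rect(self, top: int, left: int, bottom: int, right: int) -> None:
        """Hypothetical hook, called when a region is erased (e.g. a menu closes)."""
        doomed = [
            (r, c) for (r, c) in self._cells
            if top <= r <= bottom and left <= c <= right
        ]
        for key in doomed:
            del self._cells[key]

    def read_line(self, row: int) -> str:
        """What a screen reader would ask for: the text currently on one row."""
        cols = {c: ch for (r, c), ch in self._cells.items() if r == row}
        if not cols:
            return ""
        return "".join(cols.get(c, " ") for c in range(min(cols), max(cols) + 1))


osm = OffScreenModel()

# A word typed one character at a time still has to end up as one word.
osm.on_draw_text(0, 0, "Hel")
osm.on_draw_text(0, 3, "l")
osm.on_draw_text(0, 4, "o")
print(osm.read_line(0))  # Hello

# A pull-down menu appears and is then erased again.
osm.on_draw_text(1, 0, "File")
osm.on_clear_rect(1, 0, 1, 10)
print(repr(osm.read_line(1)))  # ''
```

Even this toy version has to track positions and erasures by hand, which hints at why the real thing was so fragile.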

Evidently, this model wasn’t sustainable. “Developers had to slowly add special code for each application that didn’t exclusively use standard UI elements”. Misrecognitions, memory leaks and accumulated garbage were all problems.

Part 3: Your Browser May Be Having a Secret Relationship with a Screen Reader

Windows screen readers very early hit upon a terrific strategy for highly-efficient web page review that has endured, largely unchanged, for over twenty years. It centers around creating a buffered copy of a web page that the user can review like a standard text document.

Or, put a different way: “the one thing that screen readers no longer do is read the screen”.

Screen readers have two modes. “Browse mode” allows users to jump along headings, lists, tables and so on, and also to use virtual cursor movement (not unlike the “caret browsing mode” that is built into major browsers, and which I hadn’t heard of until today!). To interact with most controls on the page, screen reader users switch to something historically known as “forms mode”.
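
Here is a small Python sketch of how I picture browse mode working over that buffered copy. The `VirtualBuffer` class and its methods are hypothetical and not how JAWS or NVDA are actually implemented; the point is only that quick navigation is a search through a flat list of lines.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BufferLine:
    kind: str  # e.g. "heading", "text", "link"
    text: str


class VirtualBuffer:
    """Toy buffered copy of a page that can be reviewed like a text document."""

    def __init__(self, lines: list[BufferLine]) -> None:
        self.lines = list(lines)
        self.cursor = 0  # index of the line under the virtual cursor

    def next_line(self) -> BufferLine:
        """Virtual cursor movement: step down one line, stopping at the end."""
        if self.cursor < len(self.lines) - 1:
            self.cursor += 1
        return self.lines[self.cursor]

    def jump_to_next(self, kind: str) -> Optional[BufferLine]:
        """Quick navigation: move to the next line of the given kind, if there is one."""
        for index in range(self.cursor + 1, len(self.lines)):
            if self.lines[index].kind == kind:
                self.cursor = index
                return self.lines[index]
        return None  # nothing found, so the cursor stays where it was


buffer = VirtualBuffer([
    BufferLine("heading", "News"),
    BufferLine("text", "Intro paragraph"),
    BufferLine("link", "Read more"),
    BufferLine("heading", "Sport"),
    BufferLine("text", "Scores"),
])

print(buffer.jump_to_next("heading").text)  # Sport
print(buffer.jump_to_next("heading"))       # None
print(buffer.next_line().text)              # Scores
```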

“Screen reader access to page semantics came all at once with Internet Explorer 5’s Microsoft Active Accessibility (MSAA) support”. MSAA later lacked the vocabulary for all the new control types being added into HTML: this is where ARIA comes in.

MSAA also lacked dynamic change support, for anything that happened after the initial page load. “One work-around was the introduction of a screen reader hotkey to refresh the virtual buffer” – this only worked intermittently.

Microsoft introduced its newer UIA accessibility API in 2006 with Windows Vista. In late 2006, the IAccessible2 API arrived, a platform-independent open standard developed by IBM, working closely with screen reader developers and corporate stakeholders. Unlike UIA, IAccessible2 extended MSAA’s IAccessible code library to fix its shortcomings. Firefox quickly implemented it alongside its existing MSAA support, while Google Chrome followed suit in the early 2010s. Meanwhile, Internet Explorer would ultimately rely on a scaled-down version of UIA. IAccessible2, which is what JAWS and NVDA use today, is not a Windows platform API: the libraries are part of the browser.

IPC (from part 1) is the secure, reliable means of handing info back and forth between applications through an operating system API. Low-level hooks, on the other hand, are effectively ‘code injection’, insofar as AT has forced some of its code to run inside the other application’s space. ATs are basically the only non-malicious programs that use this technique.

IAccessible2 allowed screen reader developers “direct access to the browser’s API implementation” using low-level hooks:

When a web page loads, JAWS and NVDA need to go through every element on the page to create a virtual buffer. If they were to use IAccessible2 only through IPC, then they’d have to send many, many messages back and forth between the screen reader and browser processes; and, even as fast as computers are, that’s relatively slow. But with code injection, some of the screen reader’s code can run directly inside the browser, gather all the information it needs for a virtual buffer (which requires some complex logic specific to the screen reader), then communicate back to the main screen reader process at the end.
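
The trade-off can be illustrated by counting messages. This Python sketch is my own and purely illustrative: `Browser`, `IpcChannel` and `run_injected` are hypothetical names, and no real process boundary or code injection is involved. It simply counts how many times the boundary would be crossed in each approach.

```python
class Browser:
    """Toy browser exposing one call per piece of element information."""

    def __init__(self, elements: list[tuple[str, str]]) -> None:
        self._elements = elements  # (role, name) pairs

    def element_count(self) -> int:
        return len(self._elements)

    def get_role(self, index: int) -> str:
        return self._elements[index][0]

    def get_name(self, index: int) -> str:
        return self._elements[index][1]


class IpcChannel:
    """Counts every message that would cross the process boundary."""

    def __init__(self, browser: Browser) -> None:
        self._browser = browser
        self.round_trips = 0

    def call(self, method: str, *args):
        self.round_trips += 1
        return getattr(self._browser, method)(*args)

    def run_injected(self, routine):
        # One message: the routine runs "inside" the browser and ships its result back.
        self.round_trips += 1
        return routine(self._browser)


def build_buffer_over_ipc(channel: IpcChannel) -> list[tuple[str, str]]:
    count = channel.call("element_count")
    return [(channel.call("get_role", i), channel.call("get_name", i)) for i in range(count)]


def build_buffer_in_process(browser: Browser) -> list[tuple[str, str]]:
    # Direct calls, with no boundary to cross.
    return [(browser.get_role(i), browser.get_name(i)) for i in range(browser.element_count())]


browser = Browser([("heading", "News"), ("link", "Read more"), ("button", "Subscribe")])

ipc = IpcChannel(browser)
buffer_a = build_buffer_over_ipc(ipc)

injected = IpcChannel(browser)
buffer_b = injected.run_injected(build_buffer_in_process)

print(buffer_a == buffer_b)  # True
print(ipc.round_trips)       # 7
print(injected.round_trips)  # 1
```

With only three elements the pure IPC route already needs seven round trips (one for the count, two per element), so a real page with thousands of elements adds up quickly.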

However, “Apple and Google operating systems don’t allow code injection. Windows-based Firefox and Chrome increasingly keep their doors locked while continuing to give assistive technology a pass. [Code injection’s] days are numbered.” There is little incentive for screen reader developers to migrate all of their code from low-level hooks to IPC, especially as this can cause significant slowdown. Neill suggests the developers may need help from browser developers or Microsoft.

As for the current state of play, taken more or less verbatim from the article:

Windows screen readers rely on MSAA, as well as a few other Windows APIs, in older areas of Windows like the desktop and taskbar, while UI Automation provides access to components added since Windows 8.
JAWS and NVDA use IAccessible2 in Chrome, Firefox, and Chromium-based Edge. They additionally use ISimpleDOM when they need information not able to be plucked from the accessibility tree. These are code libraries incorporated into the browsers, not Windows.
Both Firefox and Chrome have more or less ignored UI Automation for all this time. The Edge accessibility team have contributed their UIA implementation to Chromium, but it’s still not turned on by default in Chrome.
Microsoft incorporated a bridge that allows ATs that rely on UIA in web browsers (Narrator) to communicate with applications that use IAccessible2 (Chrome and Firefox). This bridge continues to interact with ATs solely through IPC but injects its code into the browser whenever possible for the performance boost. This is what’s happening under the hood when using Narrator in those browsers. On the other hand, Narrator predictably uses UIA in Microsoft Edge.

Neill concludes “Mac, IOS, and Android all implement their platform APIs throughout their systems, including third-party browsers. If VoiceOver began to support IAccessible2 or UIA, other Mac and IOS browsers would be ready. What seems likely is that Windows will sooner or later fall in line with other operating systems by shutting down third-party code injection. Screen reader developers will then be forced to undertake the [work to replace hooks with IPCs], and everyone will indeed use the Windows platform API, the performance of which will by then very likely be up to the task”.

Prefer longer newsletters? You can subscribe to week11y, fortnight11y or even month11y updates! Every newsletter gets the same content; it is your choice to have short, regular emails or longer, less frequent ones. Curated with ♥ by developer @ChrisBAshton.