Inside DictationBridge

For many years, I’ve been the lead programmer for Serotek, makers of the System Access screen reader among other products. Now I’m the lead programmer for the DictationBridge project. So far, I’ve kept a low public profile. But there has been some confusion about Serotek’s role in this project, the licensing status of the code, and the limitations of the software, particularly when used in conjunction with JAWS. So I’d like to clear these things up.

The roots of this project go back nearly nine years. In the summer of 2007, we at Serotek wanted to enhance our System Access screen reader to take full advantage of the features of the then-new Windows Vista operating system. We were particularly excited about the speech recognition capabilities that were built into Windows for the first time with the release of Vista. We realized that speech recognition would be particularly useful for people with repetitive strain injury (RSI). Here was an opportunity to provide an alternative to expensive dictation products, using a feature that was built into the operating system itself. How could we resist that?

The trouble was that for speech recognition, particularly dictation, to be useful to a blind user, the dictated text needs to be echoed back, so the user can verify that the computer interpreted the input correctly. But the programmatic accessibility interfaces for Windows, such as Microsoft Active Accessibility (MSAA) and UI Automation (UIA), don’t provide a standard way for assistive technologies to detect when text is entered through an alternative input method such as speech recognition, which means such text is invisible to a screen reader that relies on these standard techniques alone.
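
To illustrate what I mean by “standard techniques”: a screen reader typically subscribes to a stream of MSAA events through the SetWinEventHook API, roughly like the deliberately simplified sketch below. Nothing in that event stream identifies text that arrived via dictation.

    // Simplified sketch of how a screen reader watches for UI changes
    // through MSAA. The event stream reports focus moves, value changes,
    // and so on, but no event says "this text came from dictation."
    #include <windows.h>

    void CALLBACK OnWinEvent(HWINEVENTHOOK hook, DWORD event, HWND hwnd,
                             LONG idObject, LONG idChild,
                             DWORD eventThread, DWORD eventTime)
    {
        // Inspect the event and update speech output accordingly.
    }

    void ListenForUiEvents()
    {
        SetWinEventHook(EVENT_MIN, EVENT_MAX, NULL, OnWinEvent,
                        0, 0, WINEVENT_OUTOFCONTEXT);
    }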

So, once the project got the green light, I rolled up my sleeves and looked for a way to detect the text coming from the Windows speech recognition subsystem. I found a solution using a technique called API hooking. API hooking basically means that the assistive technology product intercepts any attempt to invoke certain functions in the operating system, and does something with the information that it can gain from those intercepted calls. In this case, I figured out which OS functions were being used by the speech recognition subsystem to insert dictated text into an application, and wrote code to intercept calls to those functions, so System Access could get the text and read it back.
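
To make that more concrete, here’s a heavily simplified sketch of the technique, using Microsoft’s freely available Detours library (one of several ways to hook an API on Windows, and not necessarily the mechanism System Access uses). The choice of SendInput as the hooked function, and the EchoCharacter helper, are illustrative stand-ins only; which functions the speech recognition subsystem actually calls is the hard-won detail I’m glossing over here.

    // Simplified API-hooking sketch using Microsoft's Detours library.
    // SendInput is only a stand-in; the real code hooks the functions
    // that the speech recognition subsystem actually uses.
    #include <windows.h>
    #include <detours.h>

    // Keep a pointer to the original function so we can call through to it.
    static UINT (WINAPI *RealSendInput)(UINT, LPINPUT, int) = SendInput;

    // Illustrative placeholder: a real screen reader would queue the
    // character for speech output instead.
    static void EchoCharacter(wchar_t ch)
    {
        wchar_t text[2] = { ch, L'\0' };
        OutputDebugStringW(text);
    }

    // Our replacement: inspect the intercepted input, then pass it along
    // so the target application still receives the text.
    static UINT WINAPI HookedSendInput(UINT count, LPINPUT inputs, int size)
    {
        for (UINT i = 0; i < count; ++i) {
            const KEYBDINPUT &ki = inputs[i].ki;
            if (inputs[i].type == INPUT_KEYBOARD &&
                (ki.dwFlags & KEYEVENTF_UNICODE) &&
                !(ki.dwFlags & KEYEVENTF_KEYUP)) {
                // With KEYEVENTF_UNICODE, wScan holds the character itself.
                EchoCharacter((wchar_t)ki.wScan);
            }
        }
        return RealSendInput(count, inputs, size);
    }

    void InstallHook()
    {
        DetourTransactionBegin();
        DetourUpdateThread(GetCurrentThread());
        DetourAttach(&(PVOID&)RealSendInput, HookedSendInput);
        DetourTransactionCommit();
    }

Once the hook is committed, every call to the intercepted function in that process flows through the replacement first, which is what lets the screen reader see text that would otherwise bypass it. (In practice, the hooking code also has to be injected into each application’s process, which is part of what makes the technique so tricky.)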

API hooking is a tricky and error-prone technique, and very few software developers have experience with it, but it’s indispensable for assistive technology on Windows. To the best of my knowledge, there is no other way that we could have implemented robust support for Windows’s built-in speech recognition, even today. API hooking is also essential in implementing support for the Dragon line of products from Nuance.

Because I had over two years of experience with API hooking when I started working on speech recognition support in 2007, I was able to get the code written in a reasonable amount of time. That code has been working reliably ever since, on every version of Windows from Vista through Windows 10. By paying Serotek for this code, the DictationBridge project will be able to leverage a battle-tested solution to a difficult problem. Without paying Serotek, the DictationBridge project would need to find someone who could re-implement something equivalent from scratch, at a cost far greater than the fee we’re paying for the license. As I said, very few developers have experience using API hooking in the context of assistive technology, and I’m one of them. But because of my non-compete agreement with Serotek, I would not be able to work on this project, legally or ethically, unless Serotek is paid for the work I’ve already done. Serotek paid me for the time I took to write and maintain the original code; it’s only fair that they be compensated for allowing their software to be released for free and as open source.

It’s also important to emphasize that most of the funds we’re raising for DictationBridge will go toward new work, because there’s still a lot of work that needs to be done to make DictationBridge a success. Specifically:

  • The code that we’re licensing from Serotek currently depends on a proprietary, third-party component that Serotek purchased. We need to replace that component with an open-source equivalent.
  • While support for Windows Speech Recognition is mature and robust, support for the Dragon family of products from Nuance is currently at the prototype stage. A significant portion of my time on this project will be spent fleshing out and debugging that code, including more low-level API hooking.
  • We need to write scripts for NVDA, JAWS, Window-Eyes, ZoomText Fusion, and Dragon Professional to make the Dragon and Windows Speech Recognition user interfaces fully usable with each of these products. We will be paying other developers to help write these scripts.
  • We need to write end-user documentation, including instructions specific to each supported screen reader. We’ll also need to produce an audio demo of the finished product, and we’ll need to pay the people working on these tasks.
  • To ensure that users can get a useful level of technical support when they need it, we’ll need to hire and train the support staff, a process already underway.
  • Finally, there are incidental expenses, such as paying for Dragon licenses and administrative work.

So there’s a lot more to this project than the existing code I wrote for Serotek.

Even though this code is currently proprietary to Serotek, it will ultimately be included in the DictationBridge open-source release. Specifically, all DictationBridge code will be released under the Mozilla Public License, version 2. The MPL is compatible with the GNU General Public License (GPL), the license used by NVDA. But in contrast with the GPL, code released under the MPL can be mixed with proprietary code in the same program. This makes the MPL a better fit for an add-on package like DictationBridge, which will support commercial assistive technologies such as JAWS, Window-Eyes, and ZoomText Fusion, as well as the free NVDA.
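
In practice, the MPL’s file-level approach means that each DictationBridge source file will simply carry the license’s standard notice (this is Exhibit A of the MPL 2.0 text):

    // This Source Code Form is subject to the terms of the Mozilla Public
    // License, v. 2.0. If a copy of the MPL was not distributed with this
    // file, You can obtain a copy at http://mozilla.org/MPL/2.0/.

Files carrying that notice stay under the MPL, while other files in the same program can remain under different terms; that flexibility is exactly what an add-on for commercial screen readers needs.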

Finally, some have asked about the features and limitations of DictationBridge when used in conjunction with JAWS. The main difficulty in working with JAWS is that, due to limitations of the JAWS scripting facilities, it is not possible to add new system-wide scripts without modifying the default scripts that ship with JAWS. If we modify those default scripts, then we have to re-do those modifications for every new version of JAWS. Unless the DictationBridge project receives ongoing funding to support JAWS, this kind of ongoing work is not feasible.

Therefore, DictationBridge will work with JAWS to the extent that we can make it work without modifying the default scripts. Specifically, dictated text will be echoed back, for both Windows Speech Recognition and Dragon NaturallySpeaking. The Dragon user interface for making corrections to the dictated text will be accessible. But access to the equivalent UI for Windows Speech Recognition, if we can make it work at all, will be more limited with JAWS than with the other screen readers. Of course, since all of DictationBridge will be open source, other developers are always free to implement more extensive integration with JAWS if they wish. We would be especially happy if Freedom Scientific itself chose to integrate DictationBridge into JAWS; there would be nothing to prevent that from happening if Freedom Scientific were so inclined.

In closing, I believe that a crowdfunded project like DictationBridge is the best way to develop assistive technology in the open and make it freely available to everyone who can benefit from it. But to make it work, we need the support of the whole community. If everyone contributes, it only takes a little from each person. So let’s all pitch in and make this dream a reality.

You can make a contribution on our Indiegogo crowdfunding page, and I urge you to do so.