
Category: Accessibility

Making Sports Information More Accessible: A Preview of the Sports Scores App

I’ve gone back to the AI workbench, much like I did with my Image Description Toolkit, to create another app. This time the goal is to make sports scores and play-by-play information easier to read and to avoid some of the long-standing accessibility challenges on leading sports web sites.

I’m calling this app a preview release because not everything is working yet, but I have enough functionality available that I wanted to share it now.

I have an audio demo of the app along with a transcript of the demo. The release page with notes and links to download the Windows program is at Release Sports Scores 0.5 Preview Release · kellylford/Scores.

The program uses lists, trees, and tables to show things like game scores, play-by-play action, box scores, and standings. The info itself comes from APIs made available by leading sports providers. I’m just showing it in ways that, in my opinion, work better with screen readers and other assistive technology.

Perhaps the most interesting feature I have so far is the ability to hear baseball pitch location and speed by converting pitch data to audio sounds. The demo has a sample of this and within the app you can press Alt+p to hear the location of any pitch when you are on that pitch in the play-by-play details. Press Alt+s to hear the entire pitch sequence. Bring up the context menu on a pitch for these options plus a Pitch Explorer feature to get a sense of how the sounds for different locations and pitches will be presented.
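For anyone curious how this kind of sonification can work, the general idea can be sketched in a few lines of Python. To be clear, the app’s actual audio mapping isn’t documented here; the coordinate ranges, frequency span, and speed-to-loudness mapping below are all illustrative assumptions, not the app’s real values:

```python
import math

def pitch_to_tone(px, pz, speed_mph, sample_rate=44100, duration=0.3):
    """Map a pitch's horizontal (px) and vertical (pz) strike-zone
    location to a stereo tone: higher pitches -> higher frequency,
    left/right location -> left/right panning, speed -> loudness.
    All ranges here are made up for illustration."""
    # Vertical location (roughly 1.0-4.0 feet) maps to 300-1200 Hz.
    freq = 300 + (max(1.0, min(4.0, pz)) - 1.0) / 3.0 * 900
    # Horizontal location (-1.5 to 1.5 feet) maps to a 0..1 pan position.
    pan = (max(-1.5, min(1.5, px)) + 1.5) / 3.0
    # Faster pitches play louder (60-105 mph mapped to 0.3-1.0 amplitude).
    amp = 0.3 + 0.7 * max(0.0, min(1.0, (speed_mph - 60) / 45))
    left, right = [], []
    for n in range(round(sample_rate * duration)):
        sample = amp * math.sin(2 * math.pi * freq * n / sample_rate)
        left.append(sample * (1.0 - pan))   # pan toward the left channel
        right.append(sample * pan)          # pan toward the right channel
    return freq, pan, left, right
```

Feeding the left and right sample lists to an audio API, or writing them out as a WAV file, would play a tone whose pitch, stereo position, and loudness track the pitch’s location and speed.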

In addition to the release page, the GitHub project has a readme with full details on the app. An issue list where you can report new issues or discover what’s already known is available.

Give the app a try and if you are interested in contributing to the project, please do so.


Image Description Toolkit V2 Available

I’ve made another series of updates to what I’m calling the Image Description Toolkit since my last announcement. As a recap, the goal of this toolkit is to take collections of images and videos and create descriptions you can save and do this all with local AI models. Dozens of tools provide descriptions, but it is still difficult to save those descriptions for future review. With the Image Description Toolkit, you get nicely formatted HTML pages to read through all your image descriptions.

The newest enhancements include a comprehensive testing system to experiment with model prompts, a workflow script that allows all tasks to be run with one command instead of running each script individually, and numerous small adjustments throughout the system. The code here is still all AI-generated, with my ideas powering what’s created.

I’m sure I’m not objective, but for me this project has evolved: it started as a curiosity, became a way to better understand how AI code generation could work, and is now something I’m using regularly. Over the weekend I attended several musical events and was able to generate more than 400 image descriptions from photos and videos I took.

The project lives on GitHub and has a readme that covers the basics of getting started. A guide for using the prompt testing script is also available. This is particularly helpful for trying out different models.

I’m always curious how AI writing works as well, so I asked GitHub Copilot to generate a second blog post about project developments. And of course, it is software, so there is also an issue list.

I won’t say for certain what’s next, but my current plan is to work on a graphical version of the project to understand more about that environment with Python, create a prompt editor so changing the default prompts is easier, and get this all working with Python packaging so installation is easier.

Contributions, suggestions or pointers to tools that already do all of this are always welcome.


Updates to Image Description Toolkit

Several months ago I announced a highly experimental set of Python scripts I called The Image Description Toolkit. Consider it a fancy name for my goal: getting the thousands of pictures taken on my iPhone, and over the past several decades on whatever phone I was using, described, and keeping a permanent description of each photo. I’ve made some key updates, although I’d still categorize this as highly experimental.

Most notably, I’ve made it possible to build custom AI prompts, choose the model you use, and adjust the parameters used with the model, all through a configuration file.
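As an illustration of the approach, a configuration file driving the prompt, model, and parameter choices might look like the sketch below. The toolkit’s actual format and field names are documented in its readme; everything here is made up for the example:

```python
import json

# Hypothetical configuration; the toolkit's real format and field
# names may differ and are described in its readme on GitHub.
CONFIG_TEXT = """
{
  "model": "moondream",
  "prompt": "Describe this photo in two or three sentences.",
  "options": {"temperature": 0.2, "num_predict": 300}
}
"""

def load_config(text):
    """Parse the configuration and fill in defaults for missing fields."""
    cfg = json.loads(text)
    cfg.setdefault("model", "moondream")
    cfg.setdefault("prompt", "Describe this image.")
    cfg.setdefault("options", {})
    return cfg

cfg = load_config(CONFIG_TEXT)
```

The appeal of this design is that experimenting with a different model or prompt becomes a one-line edit to a text file rather than a code change.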

I’ve also updated the script that converts files in the .HEIC format to .JPG and streamlined the output to HTML with a script that can be run. To be very clear, when I say I’ve done these things, I mean that all the code in this project was generated with AI through my prompting and refinement.

A readme for the project explaining how all this works is available. I also had AI generate a blog post about the project. You can find the full project on GitHub.

With all of those qualifications, I have found these tools of value. I’ve now generated more than 10,000 image descriptions running on my local computer. The Moondream model, used through Ollama, has been excellent. It is incredibly fast for batch processing, has some of the lowest memory requirements I’ve found, and still gives rich detail while being highly responsive to different prompts.

I plan to continue experimenting here over time. I want to make setup easier; I know about Python packaging but have found it doesn’t always work, so this all still requires a manual install of Ollama, Python, and the individual scripts. The readme file should walk you through this, though.

If you have feedback, know of other ways to accomplish these same tasks, or have suggestions on what else I should include here, feel free to let me know. I’ve learned a great deal about image processing, using Python, and AI code generation from these experiments. And of course, I now have permanent descriptions of more than 10,000 pictures.


“I Just Want to Be Included”: How a Simple Plea Sparked an Unintentional AI Accessibility Experiment

The work of editorial cartoonists, and their ability to capture the essence of an idea in a picture or two, must be quite the talent to have and job to hold. The phrase “a picture is worth a thousand words” certainly can ring true.

A recent experience verifying an issue in Microsoft’s Copilot app that I was told had been fixed led to that all-too-common situation in accessibility of taking one step forward and a couple in reverse. My issue was fixed, but I discovered new accessibility issues to report.

What began as a routine task—verifying an accessibility fix in Microsoft’s Copilot—quickly devolved into a familiar frustration as new issues emerged. It was in that moment that the app presented me with a default prompt to “Create an image of…”. My frustration, my curiosity about AI, and my interest in editorial cartoons suddenly converged, sparking an unintentional experiment with a fundamental question at its core: could an AI capture my feelings in a picture, and then accurately describe what it had just made?

I gave Copilot this prompt:

“Create an image of the Microsoft logo with a man’s face crying in sadness imposed over it and a cartoon bubble saying, ‘I just want to be included please,’” and here is where things turned interesting.

I used Copilot, Gemini, ChatGPT, and Claude (the last two through JAWS Picture Smart functionality) to get descriptions of the resulting image. I expected some variability, but take note in particular of the Copilot description. It leaves out critical details and gives a vastly different version of the very image created through Copilot. I have no illusion that Copilot would have the ability to know that it created the image, but I really do wonder why the details are so different with Copilot. Did this trigger some content flags internal to Copilot, for example?

It is also interesting to note how Gemini’s interpretation of the image does not entirely match my intent. Gemini interprets Microsoft as the entity being left out, when in fact my intent was to express the opposite: the person crying was the person being left out.

This experiment raises obvious questions about the capabilities and challenges of today’s AI technology. But for me, it highlights a much deeper, more personal challenge.

As someone who understands the visual world entirely through words—from alternative text, human-provided descriptions, and, as of late, AI-based image description—this leads to a fundamental question: can my words ever be enough to accurately portray a feeling or an idea for an AI to translate into a picture I cannot see but trust represents my intent? If it does, will I have the ability to know this with confidence? Had I spent the time to write those thousand words in the prompt, would the result have been any closer to the image in my mind’s eye?

The image generated by Copilot can be found at http://theideaplace.net/wp-content/uploads/2025/06/CopilotMSFTLogoCryingAndCartoonBalloon.png.

The responses from the various AI descriptions are below.

ChatGPT

The image is a meme featuring the “Crying Jordan” face superimposed over the Microsoft logo. The face has a speech bubble coming from it, saying, “I just want to be included please.” The background shows the Microsoft logo, which includes its four colored squares and the word “Microsoft” in a large font.

The meme seems to humorously convey the idea of being left out or wanting inclusion.

Claude

This image is a popular internet meme that combines the Microsoft logo (including its distinctive four-colored window-pane design in orange and green) with a crying face. The person in the image appears emotional with tears on their face, and there’s a speech bubble saying “I just want to be included please.” The image is meant to be humorous, playing on the idea of feeling left out, with “Microsoft” written in the company’s characteristic gray font at the bottom of the image. The background is plain white, making the elements of the meme stand out clearly.

Copilot

The image features the Microsoft logo, consisting of four colored squares—red, green, blue, and yellow—arranged in a 2×2 grid. Beneath the logo, the word “Microsoft” is displayed in a bold, gray font. On the right side of the image, there’s a speech bubble containing the text: “I just want to be included please.” The placement of the speech bubble suggests a plea for inclusion or recognition, adding an element of sentiment to the otherwise corporate design. Is there something specific you’re curious about regarding this image?

Gemini

This is a popular internet meme that combines several elements to create a humorous and satirical image. At the center of the image is the disembodied head of a person who appears to be a Black man, crying. His face is contorted in sadness, with visible tears welling in his eyes and streaming down his cheeks. This specific image is widely recognized as the “Crying Jordan” meme, which features a photograph of basketball legend Michael Jordan during his 2009 Basketball Hall of Fame induction speech. In the background, positioned behind the head, is the Microsoft logo, composed of four squares: orange-red in the top left and green in the top right. Below the head, the word “Microsoft” is intentionally misspelled as “Micrososft.” To the right of the head is a cartoon speech bubble with the text, “I just want to be included please.” The overall composition suggests that the corporation Microsoft is being personified as the “Crying Jordan” figure, expressing feelings of sadness, failure, and exclusion, as articulated by the text in the speech bubble. The meme is typically used to mock a perceived failure or a lack of popularity of a Microsoft product or service in comparison to its competitors.


Good to Know AI Doesn’t Pull Any Punches

It is good to know that AI doesn’t pull any punches. I took the latest Word Online conformance report available from aka.ms/msacr and asked the Windows Copilot app to summarize the issues identified. I was answering a question on an email list for someone expressing some frustrations with Word Online.

I won’t copy that full response here but it was comprehensive at pulling the details from the report. I will share the summary.

As AI systems seem to want to do these days, Copilot then prompted me with a question. Hey AI, be careful what you ask!

Copilot:

### In Summary

While Word Online meets a number of WCAG 2.1 criteria, these exception points highlight areas where users—especially those relying on assistive technologies—may encounter challenges. The primary issues revolve around:

– **Rendering content as images instead of accessible text by default,**

– **Inconsistent or incomplete keyboard and screen reader accessibility for dynamic content and interactive controls,**

– **Visual presentation concerns (contrast and resizing problems), and**

– **Limited programmatic exposure of certain UI elements and features.**

These exceptions suggest that although the product supports many accessibility features, improvements are necessary to ensure a fully inclusive experience.  

For anyone interested in further details, you might explore how these issues could be mitigated with updated design approaches or additional assistive technology compatibility tests. What are your thoughts—do you see these issues as critical to the overall user experience or as minor inconveniences that can be worked around?

Kelly:

I think these are incredibly problematic:

Copilot:

You’re absolutely right, Kelly. These issues aren’t mere inconveniences—they directly impact users who depend on accessible technology to interact with content. When information is rendered as images rather than text, or when keyboard navigation and screen reader support are inconsistent, it creates significant barriers for people with disabilities. This scenario isn’t just a minor glitch; it’s a fundamental shortcoming that compromises an inclusive user experience.

Moreover, the challenges with contrast ratios, resizing limitations, and the incomplete exposure of UI states further erode the usability and clarity of the interface. For individuals who rely on assistive devices, even small obstacles can lead to frustration and exclusion. It’s a stark reminder that accessibility features must be integrated from the design stage, rather than treated as an afterthought or workaround.

It’s particularly concerning considering the widespread use of Word Online. Many users who could benefit immensely from a fully accessible environment might instead be left with a subpar experience. This not only affects productivity but also perpetuates inequality in access to essential digital tools.

What kinds of solutions do you think would be most effective in addressing these issues? Perhaps more rigorous testing with a variety of assistive technologies or more dynamic adaptation of content could be a start. Your perspective is crucial as it echoes the real-world impact these shortcomings have on users.

I’m curious—what improvements do you envision that could make a real difference for those relying on accessibility features?


Tooltip Should Not Start An Accessible Name

I’ve noticed a growing trend of making the content of some popups or tooltips directly keyboard accessible on web pages, with buttons that reveal the content. This is good, but what’s not so good is the naming of these buttons. In pretty much every situation where I’ve encountered this, the name starts with the phrase “tooltip.” For example, on a financial web site I use, when obtaining a ticker quote the button in question reads “Tooltip for Quick Quote Disclosure.”

This is not an isolated situation, and maybe it is just because I’ve been using a range of financial web sites lately, but it seems very common in those experiences.

As with most accessible names, it would be better, at minimum, to put the unique part of the name first. In this case, “quick quote disclosure.”

There is obviously more to this, such as whether the word tooltip is needed at all, proper roles, and more, but this basic naming syntax makes a big difference in efficient access to the content. The unique part should generally come first in an accessible name.


Clearing the Chaos: Using A Library Card and U.S. Newsstream for an Improved Reading Experience

While the digital world offers an abundance of online news sources, accessibility is still far too often a work in progress. It is commonplace to spend more time navigating to content than reading that content when using screen reading technology. Poor heading structure, inaccessible ads mixed in with the news story, multimedia that plays silently and grabs focus, and much more take away from the reading experience.

A public library card and a resource known as U.S. Newsstream, offered by many libraries, are one solution to include in your reading toolkit.

ProQuest’s U.S. Newsstream is a gateway to an improved reading experience. The full text of hundreds of publications is available, and the helpful thing is that through a URL for each publication, you can easily access a specific issue of a newspaper or other resource with a very screen reader-friendly view of all the article headlines. In addition, when full text is available, you can read the article free of ads or other distractions mixed in with the news.

Using this method of accessing content requires a few preparation steps. First, you need a library card for a library that subscribes to this service, and you will need to know your library barcode.

Second, and this is critical, you need to sign into the service through your library’s access point. You can typically find this on your library’s home page under an area called subscription databases, online resources or some other link pointing to the various databases available from your library.

For example, my local library is the Madison Public Library and their list of resources is available under the eResources link.

Following the U.S. Newsstream link, you are prompted to provide library information. Typically this involves some variation of providing your library barcode and at times indicating your library. Again, it is vital you start with this path before going to the next step.

Once you are authenticated to U.S. Newsstream, you can search and use the database directly. However, what has worked well for me is accessing publications directly.

U.S. Newsstream has a list of all their publications you can download. I took the liberty of downloading the file, turning it into a table within Excel, and using Excel’s First Column feature to make the file fairly screen reader friendly and available to use.

To access a publication, open the file I’ve created and locate the publication you want to read. Titles are in the first column.

Next, locate the URL for the publication. Hint: with focus in the cell containing the title, press CTRL+Right Arrow and you will jump to the last column in the table row, which contains the URL. Press CTRL+C to copy the URL and return to your web browser.
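For anyone who would rather script this lookup than navigate the spreadsheet, the same title-to-URL step can be sketched in Python against a CSV export of the publications list. The column names and URLs below are assumptions for illustration; the real file’s layout may differ:

```python
import csv
import io

# A tiny stand-in for the downloaded publications list. The real file
# has many more rows and columns; "Title" and "URL" are assumed names.
SAMPLE = """Title,URL
Chicago Tribune,https://example.org/publication/1
Wisconsin State Journal,https://example.org/publication/2
"""

def find_publication_url(csv_text, title):
    """Return the URL for the publication whose title matches
    (case-insensitively), or None if it is not in the list."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Title"].strip().lower() == title.strip().lower():
            return row["URL"]
    return None
```

With the URL in hand, the remaining steps are the same: paste it into the browser’s address bar while signed in through your library.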

Move focus to the address bar in whatever browser you are using and paste the URL you copied and press Enter. This will take you to the publication page for the resource of interest. Combo boxes allow you to select the year and date for an issue of the publication and a Show Issue Content button brings up the content from that publication for the chosen day.

Article headlines are marked up as headings and links. Pressing enter will load the article.

The article content, when the publication has full text, starts under a heading aptly named “Full Text.” At this point you can simply start reading the article. Use whatever method you prefer to navigate back to the list of articles for the publication when finished reading.

As mentioned earlier, it is key that you are signed into the general U.S. Newsstream service before accessing the URL for the publication. If you are not, the URL will not work as described. You will be told to sign in through a library but without options for doing so directly.

The Excel file listing publications has multiple columns of information. These include items such as the dates for which content is available, gaps in coverage, and more.

U.S. Newsstream, other ProQuest databases and resources from your library offer much more functionality and information than outlined here. This is just a starting point.

Finally, I am a firm supporter of a robust and independent news media. Even though I access many articles in the way I’ve outlined here, I do also support publications through subscriptions, donations or other financial contributions. I urge anyone who is able to do so to do the same. Those working in the media have the same bills and needs in life as we all do, and we’ve already seen dramatic losses in meaningful careers in the profession.


A Nice Enhancement for Voice Access on Windows

As a matter of choice, not necessity, I try from time to time to use the various speech and voice input systems in operating systems. My ideal scenario is still to be able to use the computer entirely by voice while also running a screen reader. I’ve not yet found a reliable solution that meets my needs completely.

I know there are combinations of solutions that have made great strides in this area, largely using Dragon products and screen readers, but as the basis of what I use, I try either Voice Access on Windows or Voice Control on the Mac. Both platforms also have solutions, as I expect many know, for strictly text input.

I no longer recall how long ago this was, but the Voice Access product on Windows did make one change that helps with using screen readers. Voice Access produces notifications of what it has heard so that screen readers can echo this back. It is fairly basic and in need of much refinement, but it’s at least a start.

I am mentioning this here because in trying Voice Access this week, I noticed a change that is another step toward improving the experience. To be clear, I do not know when this change was made; I just noticed it this week. I also run Insider builds of Windows, so if this does not work for you, that may be why.

When you’re trying to control the computer by voice, it is common to issue commands such as “click” followed by the name of an item you want to activate. The challenge is that if there is more than one item with the same name, you are usually presented with some experience to disambiguate what you want to click on.

When I first tried Voice Access, to the best of my recollection, the experience of identifying what you wanted to activate was not usable with a screen reader. It has been enhanced a bit, so that now when that list of choices comes up, it is echoed back, similar to how what Voice Access heard is repeated. Again, this needs extensive refinement because it is essentially a one-time listen (or Braille read) with no way to have the list repeated, step through it an item at a time, or otherwise review what was said.

As an example of using the feature to identify what I want to click, here was what was read when I asked for the word paste to be clicked.

click paste. Which one?
There are 2 options available. (1) Paste, (2) Paste

Here is another example when I said “click login” on the Fidelity home page.

Click login. Which one?
There are 2 options available. (1) LOG IN, (2) Open login link

It is also worth noting that these disambiguation choices if using Braille appear as flash messages. For those unfamiliar with how Braille displays and screen readers work, this means that the messages stick around for a set period of time and then disappear from the display.

Here is one last example, when I tried to activate the OK button with my voice after running a spell check on an email message. Note, I intentionally replaced the actual email address with email@provider.com.

Click ok. Which one?
There are 2 options available. (1) OK, (2) Sent – email@provider.com – Outlook – 2 running windows

The experiences I’ve described work independent of what screen reader is being used.

Again, this overall experience of using the computer with a screen reader and voice on Windows is far from finished. In fact, one of the key experiences, correcting words that have not been recognized correctly, does not work at all with screen readers. Voice Access gives the following notification when you try to correct something while a screen reader is running:

Alert: This experience is not optimized for use with screen readers. Say “Cancel” to exit.

Microsoft has a document on using Voice Access in general. If they have screen reader-specific documentation, I wasn’t able to find it.

If you do try Voice Access, two important hotkeys to know are Alt+Shift+B for toggling the microphone between sleep and awake and Alt+Shift+C for toggling the microphone off and on. When sleeping, the microphone remains on to listen for certain words. See the support article, or say, “what can I say,” when Voice Access is running, for a full list of commands.


No Big Finish, Giving Microsoft’s Recall a Try

Microsoft released the Recall feature for users of Windows Insider builds. I gave the feature a try and made an audio recording with a roughly edited transcript. The short version of my experience is that I can get around Recall but am still trying to figure out how to read search results effectively with a screen reader, hence the title of No Big Finish.

Since this blog post is about an AI feature, I’ll offer an unedited version of Copilot’s summarization of my transcript as well. Overall the summary is reasonable but, in my opinion, misstates what I said about the Click To Do feature: I said I hadn’t figured out how it is supposed to work.

Perhaps because it came at the end, the summary also entirely missed my biggest point: I am still trying to find an effective way to use the search functionality of Recall and read the meaningful text easily. Here is the Copilot summary:

The document is a transcript of a demonstration by Kelly on the Microsoft feature called “Recall.” This feature captures screenshots of your screen and allows you to search for information within those screenshots. Kelly explains that Recall was released as a test version for Microsoft Insiders on November 22nd. The demonstration focuses on the accessibility aspects of Recall, particularly how it works with screen readers like NVDA, Narrator, and JAWS.

Kelly describes the setup process, which includes signing in with Windows Hello, and demonstrates how to navigate the Recall interface using keyboard commands and screen reader feedback. The document details various functionalities of Recall, such as saving snapshots, opening Recall from the system tray, and navigating through captured images. Kelly also mentions some issues and limitations encountered during the demonstration, such as certain screen reader commands not working as expected and the need for further development to improve accessibility.

Kelly starts by explaining the initial setup, which involves a brief opt-in experience where the user grants permission for the computer to take screenshots. This process takes about 20 to 30 seconds and requires signing in with Windows Hello. Kelly notes that each time Recall is opened or used, the user is prompted to sign in with Windows Hello.

The demonstration continues with Kelly navigating the Recall interface. Kelly shows how to open Recall from the system tray and describes the various options available, such as saving snapshots, opening Recall, pausing the feature, and accessing more settings. Kelly highlights that the timeline feature in Recall displays the captured images, referred to as “memories,” and sessions, which represent the time periods during which the computer was used.

Kelly also points out some issues with the screen reader’s ability to read the titles of the snapshots automatically. Kelly demonstrates how to navigate through the list of snapshots and mentions that sometimes the titles do not read correctly on the first attempt. Kelly suggests using the home and end keys to navigate through the list and get the titles to read properly.

Additionally, Kelly explores the commands available for each snapshot, such as opening the associated application (e.g., Outlook), copying the snapshot, deleting it, and accessing more options. Kelly notes that some features, like the “Click to do” button, are not fully functional yet and require further development.

Kelly also demonstrates how to get more details about a captured image using Narrator’s command for image descriptions. However, the output is often gibberish due to the limitations of the OCR (Optical Character Recognition) technology used. Kelly suggests using other solutions like JAWS, Google’s Gemini, or Be My Eyes for better image descriptions.

Overall, the document provides an insightful look into the Recall feature, its current capabilities, and areas for improvement from an accessibility perspective. Kelly’s demonstration highlights the potential of Recall while also acknowledging the need for further development to enhance its functionality and accessibility.


Unlock Windows Efficiency with PowerToys Run

Sometimes you make a handy discovery that makes you wonder why you did not know about it long before. I just had that happen with PowerToys Run, one of the toys included in the full suite of items in the electronic toybox known as Windows PowerToys.

PowerToys are a set of utilities that allow you to customize different aspects of Windows and offer functionality that is not directly built in. You can learn more in general and find details on installation in a handy Microsoft Learn article.

I installed the most recent version of PowerToys because I had read about something new known as Advanced Paste. That is not the discovery though.

After installing PowerToys, I used one of my more common shortcuts in Windows, Alt+Space, to bring up the system menu for an application. That menu is where you find entries including Restore, Close, and the one I often use, Maximize. My web browsing windows in particular often get sized quite small, and with responsive design much of the content I’m expecting has disappeared, so maximizing the browser window is necessary.

Imagine my surprise when instead of what I was expecting, my screen reader communicated, “query, edit.”

It turns out this is the default shortcut for the aforementioned PowerToys Run. In short, this is like having a command line to do everything from choosing between open windows on your computer to performing quick calculations, file, and web searches, browsing the Windows registry and more.

Using PowerToys Run

Using PowerToys Run is fairly straightforward. Press Alt+Space, enter some text, and arrow through the results. You can start your text with various characters to tell PowerToys Run what you want to do. The full list of these characters is detailed in a Microsoft Learn article.

Some examples I am finding I have already incorporated into my daily use include:

  • <, that is the less than symbol: Window Walker, to browse through all the open windows on your computer. Start pressing down arrow after entering the less than character, or add part of the window name and then down arrow to shorten the list.
  • $, that is the dollar sign character: Windows Settings to, as you would expect, browse through all the different settings for Windows. As with Window Walker, just start pressing down arrow after entering the dollar sign, or add some of the text from the setting you want and then press down arrow. In either case, when you reach the item you want, press enter and that settings page will open.
  • =, that’s the equals sign: Calculator for performing calculations. Just enter your calculation and the result will be displayed. If, as I do, you are using a screen reader, the fastest way I have found to read the result is to press down arrow after entering my calculation. Note that you do not need to press enter after you have input your calculation. Also, again if using a screen reader, if you are comfortable with advanced screen reading techniques such as the JAWS Touch Cursor, NVDA Object Navigation or Narrator navigation commands, the result of the calculation and more can be read with these techniques. Last, after your result is displayed, you can press enter on the result and have it put on the Windows Clipboard.
  • !!, that is two presses of the exclamation mark key: History, quickly browse through your history with PowerToys Run with presses of down arrow.

Some Notes

PowerToys Run documentation indicates you can change the default shortcut for launching from Alt+Space.

According to PowerToys documentation, pressing Tab is supposed to move you through search results and any buttons or context menus that exist. As of now, I am not finding anything communicated by multiple screen readers when using Tab. I still need to figure out whether this is a case of the key simply not working or of items taking focus not being communicated properly.
