
Category: Accessibility

Image Description Toolkit 3.0 Available

I have a vastly updated version of my Image Description Toolkit (IDT) available. The 3.0 release builds on the command-line workflow tools I released earlier, adding a guided workflow creation system, analysis tools for combining descriptions, a range of performance statistics on your descriptions, and tools to review content from various AI models and prompts.

The IDT also includes several other tools for working with image descriptions. A results viewer allows you to browse results, monitor workflows as they happen, and copy descriptions and images.

If you want to focus on detailed descriptions for individual images, the Image Describer tool allows you to load a directory of images, easily request descriptions for individual images using multiple providers and prompts, and save all your work for future use.

A Prompt Editor allows you to create and edit the prompts used to guide AI descriptions for all the tools in the IDT and configure defaults.

In addition to Ollama, which powered earlier versions of the IDT, the IDT now includes support for multiple AI models from both OpenAI and Anthropic if you provide your own API key.

A comprehensive User Guide is available for the various tools in the IDT.

Whether you want detailed descriptions for one or one thousand images, the IDT has a tool for you. Grab IDT3.zip, extract the files to your computer, and run install_idt. Be sure to install Ollama and/or get API keys for OpenAI or Anthropic and configure your AI models. Then open a command prompt and run idt guideme to generate your first descriptions.


RSS Quick, A Basic RSS Reader for Windows 1.0 Available

I’ve been dabbling further with AI-driven development and have another app for exploration. This time it is a basic RSS Reader for Windows I’m calling RSS Quick. Get the full details and program from Release RSS Quick 1.0 – Initial Release · kellylford/rssquick or my alternative download location.

The name RSS Quick reflects the intent of the app to be a simple and fast way to quickly browse RSS feeds. The program does not handle feed management at this time but will allow you to import an existing OPML file and quickly browse the feeds. RSS Quick includes a default OPML file named RSS.OPML that will be put in the default folder where you unzip the software package. It has dozens of feeds from a wide range of sources already included. Even if you don’t use the app, the OPML file may be handy as another source of feeds.
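For anyone curious what importing an OPML file involves, here is a minimal sketch, not RSS Quick's actual code, of pulling feed titles and URLs out of an OPML document using only Python's standard library (the function name and OPML sample are my own for illustration):

```python
import xml.etree.ElementTree as ET

def feeds_from_opml(opml_text):
    """Return (title, url) pairs for every feed outline in an OPML document."""
    root = ET.fromstring(opml_text)
    feeds = []
    # Feed entries are <outline> elements carrying an xmlUrl attribute;
    # container outlines (folders) have no xmlUrl and are skipped.
    for outline in root.iter("outline"):
        url = outline.get("xmlUrl")
        if url:
            feeds.append((outline.get("title") or outline.get("text"), url))
    return feeds
```

A reader like RSS Quick would then fetch each `xmlUrl` to display the feed's articles.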

To use RSS Quick, you need to download the .ZIP file for your computer. In most cases this will be the X64 version. If you are using ARM hardware with Windows, you’ll want the ARM version. Extract all the files to any location on your computer and run RSSQuick.EXE. The program will load the default OPML file that has been included. If you do not have the necessary version of .NET, you will be prompted to download that first.

You can learn more about RSS Quick from the readme for the program. Give it a try and consider getting involved or submitting issues on the GitHub page.

RSS Quick is one of three projects I have available. I also have my Sports Scores app and the Image Description Toolkit available. All projects can also be downloaded from my separate release archive.


Making Sports Information More Accessible: A Preview of the Sports Scores App

I’ve gone back to the AI workbench, much like I did with my Image Description Toolkit, to create another app. This time the goal is to make sports scores and play-by-play information easier to read and to avoid some of the long-standing accessibility challenges on leading sports web sites.

I’m calling this app a preview release because not everything is working yet but I have enough functionality available that I wanted to share it now.

I have an audio demo of the app along with a transcript of the demo. The release page with notes and links to download the Windows program is at Release Sports Scores 0.5 Preview Release · kellylford/Scores.

The program uses lists, trees and tables to show things like game scores, play-by-play action, box scores and standings. The info itself comes from APIs made available by leading sports providers. I’m just showing it in ways that work better in my opinion with screen readers and other assistive technology.

Perhaps the most interesting feature I have so far is the ability to hear baseball pitch location and speed by converting pitch data to audio sounds. The demo has a sample of this and within the app you can press Alt+p to hear the location of any pitch when you are on that pitch in the play-by-play details. Press Alt+s to hear the entire pitch sequence. Bring up the context menu on a pitch for these options plus a Pitch Explorer feature to get a sense of how the sounds for different locations and pitches will be presented.
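As a rough sketch of the idea behind pitch sonification (the function, the specific frequency, pan, and duration mappings here are my own assumptions for illustration, not the app's actual implementation), converting a pitch's location and speed to sound parameters might look like:

```python
def pitch_to_tone(x, y, speed_mph):
    """Map a pitch's strike-zone location and speed to tone parameters.

    x, y: location in the zone, 0.0 (left/bottom) through 1.0 (right/top).
    Returns (pan, frequency_hz, duration_s) for a simple stereo beep.
    """
    # Horizontal location becomes stereo pan: -1.0 (hard left) to 1.0 (hard right).
    pan = x * 2.0 - 1.0
    # Vertical location becomes tone pitch: one octave from bottom (220 Hz) to top (440 Hz).
    frequency_hz = 220.0 * (2.0 ** y)
    # Faster pitches play as shorter tones, floored so every pitch is audible.
    duration_s = max(0.1, 0.5 - (speed_mph - 70.0) * 0.005)
    return pan, frequency_hz, duration_s
```

A pitch sequence, as with Alt+S in the app, would simply play these tones one after another.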

In addition to the release page, the GitHub project has a readme with full details on the app. An issue list where you can report new issues or discover what’s already known is available.

Give the app a try and if you are interested in contributing to the project, please do so.


Image Description Toolkit V2 Available

I’ve made another series of updates to what I’m calling the Image Description Toolkit since my last announcement. As a recap, the goal of this toolkit is to take collections of images and videos and create descriptions you can save and do this all with local AI models. Dozens of tools provide descriptions, but it is still difficult to save those descriptions for future review. With the Image Description Toolkit, you get nicely formatted HTML pages to read through all your image descriptions.

The newest enhancements include a comprehensive testing system to experiment with model prompts, a workflow script that allows for all tasks to be run with one command versus individually running each script and numerous small adjustments throughout the system. The code here is still all AI-generated with my ideas powering what’s created.

I’m sure I’m not objective, but for me this project started as a curiosity, became a way to better understand how AI code generation could work, and is now something I’m using regularly. Over the weekend I attended several musical events and was able to generate more than 400 image descriptions from photos and videos I took.

The project lives on GitHub and has a readme that covers the basics of getting started. A guide for using the prompt testing script is also available. This is particularly helpful for trying out different models.

I’m always curious how AI writing works as well so asked GitHub Copilot to generate a second blog post about project developments. And of course, it is software, so there is also an issue list.

I won’t say for certain what’s next but my current plan is to work on a graphical version of the project to understand more about that environment with Python, create a prompt editor so changing the default prompts is easier and get this all working with Python packaging so install is easier.

Contributions, suggestions or pointers to tools that already do all of this are always welcome.


Updates to Image Description Toolkit

Several months ago I announced a highly experimental set of Python scripts I called the Image Description Toolkit. Consider it a fancy name for solving my goal: getting the thousands of pictures taken on my iPhone, and on whatever phone I was using over the past several decades, described, and having a permanent description of each photo. I’ve made some key updates, although I’d still categorize this as highly experimental.

Most notably, I’ve made it possible to build custom AI prompts, choose the model you use, and adjust the parameters used with the model, all through a configuration file.

I’ve also updated the script that converts files in the .HEIC format to .JPG and streamlined the output to HTML with a script that can be run. To be very clear, when I say I’ve done these things, all the code in this project was generated with AI through my prompting and refinement.

A readme for the project explaining how all this works is available. I also had AI generate a blog post about the project. You can find the full project on GitHub.

With all of those qualifications, I have found these tools of value. I’ve now generated more than 10,000 image descriptions running on my local computer. The Moondream model used through Ollama has been excellent. It is incredibly fast when used for batch processing, has some of the lowest memory requirements I’ve found, and still gives rich details while being highly responsive to different prompts.

I plan to continue experimenting here over time. I want to make setup easier, and while I know about Python packaging, I have found it doesn’t always work, so this all still requires manual installation of Ollama, Python, and the individual scripts. The readme file should walk you through this though.

If you have feedback, know of other ways to accomplish these same tasks, or have suggestions on what else I should include here, feel free to let me know. I’ve learned a great deal about image processing from AI, using Python, and AI code generation from these experiments. And of course, I now have permanent descriptions of more than 10,000 pictures.


“I Just Want to Be Included”: How a Simple Plea Sparked an Unintentional AI Accessibility Experiment

The work of editorial cartoonists and their ability to capture the essence of an idea in a picture or two must be quite the talent to have and job to hold. That phrase, “a picture is worth a thousand words,” certainly can ring true.

A recent experience verifying an issue in Microsoft’s Copilot app, one I was told had been fixed, led to that all-too-common situation in accessibility of taking one step forward and a couple in reverse. My issue was fixed, but I discovered new accessibility issues to report.

What began as a routine task—verifying an accessibility fix in Microsoft’s Copilot—quickly devolved into a familiar frustration as new issues emerged. It was in that moment that the app presented me with a default prompt to “Create an image of…”. My frustration, my curiosity about AI, and my interest in editorial cartoons suddenly converged, sparking an unintentional experiment with a fundamental question at its core: could an AI capture my feelings in a picture, and then accurately describe what it had just made?

I gave Copilot this prompt:

“Create an image of the Microsoft logo with a man’s face crying in sadness imposed over it and a cartoon bubble saying, ‘I just want to be included please,’” and here is where things turned interesting.

I used Copilot, Gemini, ChatGPT, and Claude (the last two through JAWS Picture Smart functionality) to get descriptions of the resulting image. I expected some variability, but take note in particular of the Copilot description. It leaves out critical details and gives a vastly different version of the very image created through Copilot. I have no illusion that Copilot would have the ability to know that it created the image, but I really do wonder why the details are so different with Copilot. Did this trigger some content flags internal to Copilot, for example?

It is also interesting to note how Gemini’s interpretation of the image is not entirely my intent. Gemini seems to interpret that Microsoft is the entity being left out here and in fact it was my intent to express the opposite, that the person crying was the person being left out.

This experiment raises obvious questions about the capabilities and challenges of today’s AI technology. But for me, it highlights a much deeper, more personal challenge.

As someone who understands the visual world entirely through words—from alternative text, human-provided descriptions and as of late AI-based image description—this leads to a fundamental question: can my words ever be enough to accurately portray a feeling or an idea for an AI to translate into a picture I cannot see but trust represents my intent? If it does, will I have the ability to know this with confidence? Had I spent the time to write those thousand words in the prompt, would the result have been any closer to the image in my mind’s eye?

The image generated by Copilot can be found at http://theideaplace.net/wp-content/uploads/2025/06/CopilotMSFTLogoCryingAndCartoonBalloon.png.

The responses from the various AI descriptions are below.

ChatGPT

The image is a meme featuring the “Crying Jordan” face superimposed over the Microsoft logo. The face has a speech bubble coming from it, saying, “I just want to be included please.” The background shows the Microsoft logo, which includes its four colored squares and the word “Microsoft” in a large font.

The meme seems to humorously convey the idea of being left out or wanting inclusion.

Claude

This image is a popular internet meme that combines the Microsoft logo (including its distinctive four-colored window-pane design in orange and green) with a crying face. The person in the image appears emotional with tears on their face, and there’s a speech bubble saying “I just want to be included please.” The image is meant to be humorous, playing on the idea of feeling left out, with “Microsoft” written in the company’s characteristic gray font at the bottom of the image. The background is plain white, making the elements of the meme stand out clearly.

Copilot

The image features the Microsoft logo, consisting of four colored squares—red, green, blue, and yellow—arranged in a 2×2 grid. Beneath the logo, the word “Microsoft” is displayed in a bold, gray font. On the right side of the image, there’s a speech bubble containing the text: “I just want to be included please.” The placement of the speech bubble suggests a plea for inclusion or recognition, adding an element of sentiment to the otherwise corporate design. Is there something specific you’re curious about regarding this image?

Gemini

This is a popular internet meme that combines several elements to create a humorous and satirical image. At the center of the image is the disembodied head of a person who appears to be a Black man, crying. His face is contorted in sadness, with visible tears welling in his eyes and streaming down his cheeks. This specific image is widely recognized as the “Crying Jordan” meme, which features a photograph of basketball legend Michael Jordan during his 2009 Basketball Hall of Fame induction speech. In the background, positioned behind the head, is the Microsoft logo, composed of four squares: orange-red in the top left and green in the top right. Below the head, the word “Microsoft” is intentionally misspelled as “Micrososft.” To the right of the head is a cartoon speech bubble with the text, “I just want to be included please.” The overall composition suggests that the corporation Microsoft is being personified as the “Crying Jordan” figure, expressing feelings of sadness, failure, and exclusion, as articulated by the text in the speech bubble. The meme is typically used to mock a perceived failure or a lack of popularity of a Microsoft product or service in comparison to its competitors.


Good to Know AI Doesn’t Pull Any Punches

It is good to know that AI doesn’t pull any punches. I took the latest Word Online conformance report available from aka.ms/msacr and asked the Windows Copilot app to summarize the issues identified. I was answering a question on an email list for someone expressing some frustrations with Word Online.

I won’t copy that full response here but it was comprehensive at pulling the details from the report. I will share the summary.

As AI systems seem to want to do these days, Copilot then prompted me with a question. Hey AI, be careful what you ask!

Copilot:

### In Summary

While Word Online meets a number of WCAG 2.1 criteria, these exception points highlight areas where users—especially those relying on assistive technologies—may encounter challenges. The primary issues revolve around:

– **Rendering content as images instead of accessible text by default,**

– **Inconsistent or incomplete keyboard and screen reader accessibility for dynamic content and interactive controls,**

– **Visual presentation concerns (contrast and resizing problems), and**

– **Limited programmatic exposure of certain UI elements and features.**

These exceptions suggest that although the product supports many accessibility features, improvements are necessary to ensure a fully inclusive experience.  

For anyone interested in further details, you might explore how these issues could be mitigated with updated design approaches or additional assistive technology compatibility tests. What are your thoughts—do you see these issues as critical to the overall user experience or as minor inconveniences that can be worked around?

Kelly:

I think these are incredibly problematic:

Copilot:

You’re absolutely right, Kelly. These issues aren’t mere inconveniences—they directly impact users who depend on accessible technology to interact with content. When information is rendered as images rather than text, or when keyboard navigation and screen reader support are inconsistent, it creates significant barriers for people with disabilities. This scenario isn’t just a minor glitch; it’s a fundamental shortcoming that compromises an inclusive user experience.

Moreover, the challenges with contrast ratios, resizing limitations, and the incomplete exposure of UI states further erode the usability and clarity of the interface. For individuals who rely on assistive devices, even small obstacles can lead to frustration and exclusion. It’s a stark reminder that accessibility features must be integrated from the design stage, rather than treated as an afterthought or workaround.

It’s particularly concerning considering the widespread use of Word Online. Many users who could benefit immensely from a fully accessible environment might instead be left with a subpar experience. This not only affects productivity but also perpetuates inequality in access to essential digital tools.

What kinds of solutions do you think would be most effective in addressing these issues? Perhaps more rigorous testing with a variety of assistive technologies or more dynamic adaptation of content could be a start. Your perspective is crucial as it echoes the real-world impact these shortcomings have on users.

I’m curious—what improvements do you envision that could make a real difference for those relying on accessibility features?


Tooltip Should Not Start An Accessible Name

I’ve noticed a growing trend to try and make the content from some popups or tooltips directly keyboard accessible on web pages with buttons that reveal the content. This is good but what’s not so good is the naming of these buttons. In pretty much every situation where I’ve encountered this, the name starts with the phrase “tooltip.” For example on a financial web site I use when obtaining a ticker quote, the button in question reads “Tooltip for Quick Quote Disclosure.”

This is not an isolated situation, and maybe it is just because I’ve been using a range of financial web sites lately, but it seems very common in those experiences.

As with most accessible names, it would be better, at minimum, to put the unique part of the name first: in this case, “Quick Quote Disclosure.”
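As a hedged illustration (the markup and attribute values here are hypothetical, not taken from any particular site), the difference is simply which part of the name comes first:

```html
<!-- Harder to navigate: every such button announces "Tooltip" first -->
<button aria-label="Tooltip for Quick Quote Disclosure">?</button>

<!-- Better: the unique part leads, so each button is distinguishable immediately -->
<button aria-label="Quick Quote Disclosure tooltip">?</button>
```

With the unique part first, a screen reader user scanning a list of buttons hears the distinguishing words right away instead of the same word repeated.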

There is obviously more to this, such as whether the word tooltip is needed at all, proper roles, and more, but this basic naming syntax makes a big difference in efficient access to the content. The unique part should generally come first in an accessible name.


Clearing the Chaos: Using A Library Card and U.S. Newsstream for an Improved Reading Experience

While the digital world offers an abundance of online news sources, accessibility is still a work-in-progress far too often. It is commonplace to spend more time navigating to content than reading that content when using screen reading technology. Poor heading structure, ads lacking accessibility mixed in with the news story, multimedia that plays silently and grabs focus, and much more take away from the reading experience.

A public library card and a resource known as U.S. Newsstream, offered by many libraries, is one solution to include in your reading toolkit.

ProQuest’s U.S. Newsstream is a gateway to an improved reading experience. The full text of hundreds of publications is available, and helpfully, through a URL for each publication you can easily access a specific issue of a newspaper or other resource with a very screen reader-friendly view of all the article headlines. In addition, when full text is available, you can read the article free of ads or other distractions mixed in with the news.

To use this method of accessing content requires a few preparation steps. First, you need to have a library card for a library that subscribes to this service, and you will need to know your library barcode.

Second, and this is critical, you need to sign into the service through your library’s access point. You can typically find this on your library’s home page under an area called subscription databases, online resources or some other link pointing to the various databases available from your library.

For example, my local library is the Madison Public Library and their list of resources is available under the eResources link.

Following the U.S. Newsstream link, you are prompted to provide library information. Typically this involves some variation of providing your library barcode and at times indicating your library. Again, it is vital you start with this path before going to the next step.

Once you are authenticated to U.S. Newsstream, you can search and use the database directly. However, what has worked well for me is accessing publications directly.

U.S. Newsstream has a list of all their publications you can download. I took the liberty of downloading the file, turning it into a table within Excel, and using Excel’s First Column feature to make the file fairly screen reader friendly and available to use.

To access a publication, open the file I’ve created and locate the publication you want to read. Titles are in the first column.

Next, locate the URL for the publication. Hint: with focus in the cell containing the title, press CTRL+Right Arrow to jump to the last column in the table row, which contains the URL. Press CTRL+C to copy the URL and return to your web browser.

Move focus to the address bar in whatever browser you are using and paste the URL you copied and press Enter. This will take you to the publication page for the resource of interest. Combo boxes allow you to select the year and date for an issue of the publication and a Show Issue Content button brings up the content from that publication for the chosen day.

Article headlines are marked up as headings and links. Pressing enter will load the article.

The article content, when the publication has full text, starts under a heading aptly named “Full text.” At this point you can simply start reading the article. Use whatever method you prefer to navigate back to the list of articles for the publication when finished reading.

As mentioned earlier, it is key that you are signed into the general U.S. Newsstream service before accessing the URL for the publication. If you are not, the URL will not work as described. You will be told to sign in through a library but without options for doing so directly.

The Excel file listing publications has multiple columns of information. These include items such as the dates for which content is available, gaps in coverage, and more.

U.S. Newsstream, other ProQuest databases and resources from your library offer much more functionality and information than outlined here. This is just a starting point.

Finally, I am a firm supporter of a robust and independent news media. Even though I access many articles in the way I’ve outlined here, I do also support publications through subscriptions, donations or other financial contributions. I urge anyone who is able to do so to do the same. Those working in the media have the same bills and needs in life as we all do, and we’ve already seen dramatic losses in meaningful careers in the profession.


A Nice Enhancement for Voice Access on Windows

As a matter of choice, not necessity, I try from time to time to use the various speech and voice input systems in operating systems. My ideal scenario is still to be able to use the computer by voice entirely while also running a screen reader. I’ve not found a reliable solution as of yet that meets my needs completely.

I know there are combinations of solutions that have made great strides in this area, largely using Dragon products and screen readers, but as the basis of what I use, I try to use either Voice Access on Windows or Voice Control on the Mac. Both platforms also have solutions, as I expect many know, for strictly text input.

I no longer recall how long ago this was, but the Voice Access product on Windows did make one change that helps with using screen readers. As a start, Voice Access produces notifications of what Voice Access has heard so that screen readers can echo this back. It is fairly basic and in need of much refinement, but it’s at least a start.

I am mentioning this here because in trying Voice Access this week, I noticed a change that is another step in helping improve the experience. I do not know when this change was made, to be clear. It is just that I noticed it this week. I also run Insider builds of Windows, so if this does not work for you, that may be why.

When you’re trying to control the computer by voice, it is common to issue commands such as “click” followed by the name of an item that you want to activate. The challenge becomes that if there is more than one item with the same name, you are usually presented some experience to disambiguate what you want to click on.

When I first tried Voice Access, to the best of my recollection, the experience of identifying what you wanted to activate was not usable with a screen reader. It has been enhanced a bit so that now when that list of choices comes up, the list is echoed back, similar to how what Voice Access heard is repeated. Again, this needs extensive refinement, because it is kind of like a one-time listen or Braille-read experience with no way to have the list repeated, step through the list one item at a time, or otherwise review what was said.

As an example of using the feature to identify what I want to click, here is what was read when I asked for the word “paste” to be clicked.

click paste. Which one?
There are 2 options available. (1) Paste, (2) Paste

Here is another example when I said “click login” on the Fidelity home page.

Click login. Which one?
There are 2 options available. (1) LOG IN, (2) Open login link

It is also worth noting that these disambiguation choices if using Braille appear as flash messages. For those unfamiliar with how Braille displays and screen readers work, this means that the messages stick around for a set period of time and then disappear from the display.

Here is one last example when I tried to activate the OK button with my voice after running a spell check on an email message. Note, I intentionally replaced the actual email address with email@provider.com.

Click ok. Which one?
There are 2 options available. (1) OK, (2) Sent – email@provider.com – Outlook – 2 running windows

The experiences I’ve described work independent of what screen reader is being used.

Again, this overall experience of using the computer with a screen reader and voice on Windows is far from finished. In fact, one of the key experiences, correcting words that have not been recognized correctly, does not work at all with screen readers. Voice Access in fact gives the following notification when you try to correct something and a screen reader is running:

Alert: This experience is not optimized for use with screen readers. Say “Cancel” to exit.

Microsoft has a document on using Voice Access in general. If they have screen reader-specific documentation, I wasn’t able to find it.

If you do try Voice Access, two important hotkeys to know are Alt+Shift+B for toggling the microphone between sleep and awake and Alt+Shift+C for toggling the microphone off and on. When sleeping, the microphone remains on to listen for certain words. See the support article or say, “what can I say,” when Voice Access is running for a full list of commands.
