
Category: Accessibility

Image Description Toolkit 3.5 Beta Featuring Geolocation Data and Web Image Downloads

Thanks to more AI-driven development, I have another sizable update for my Image Description Toolkit (IDT). A full What’s New document is available.

Highlights for this beta release include the use of geolocation data when present in images, the ability to download images from a specified web address and run them through the image description system, and numerous other enhancements.
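Images usually record location in their EXIF GPS tags as degrees, minutes and seconds, with a separate reference tag for the hemisphere (GPSLatitude with GPSLatitudeRef, GPSLongitude with GPSLongitudeRef). Before a location can be shown next to a description, it generally has to be converted to signed decimal degrees. The sketch below covers only that conversion step. It assumes the tag values have already been read from the file, and it is a generic illustration rather than the IDT's own code.

```python
def dms_to_decimal(degrees: float, minutes: float, seconds: float, ref: str) -> float:
    """Convert EXIF-style degrees/minutes/seconds plus a hemisphere
    reference ("N", "S", "E" or "W") into signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative in decimal form.
    if ref.strip().upper() in ("S", "W"):
        value = -value
    return value


def format_location(lat: float, lon: float) -> str:
    """Render a coordinate pair as text rounded to four decimal places
    (about 11 meters of latitude), e.g. to store alongside a description."""
    return f"{lat:.4f}, {lon:.4f}"
```

For example, `format_location(dms_to_decimal(43, 4, 30, "N"), dms_to_decimal(89, 24, 0, "W"))` produces `"43.0750, -89.4000"`.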

You can also keep current with all my projects from my Projects page.


Feedback on IDT Demo Gallery

I’m looking to crowdsource some feedback. I’ve mentioned here a few times a collection of tools I’ve created called the Image Description Toolkit. In short, it is a way to get image descriptions that you can save, with control over the level of detail you get. This can be a bit of an abstract concept in a world where many still do not understand alt text.

So, I’ve put together a demo page at www.kellford.com/idtdemo. It has the traditional image gallery but then a Description Explorer. The Description Explorer allows you to see how different AI prompts result in various image descriptions and how different AI providers do at image descriptions. There are a total of four prompts (narrative, colorful, technical and detailed) using 10 different AI provider/model combinations.

For example, choose Description Explorer and then the option for all prompts from a provider. Note how the descriptions build on each other, in a way, from Narrative to Colorful to Technical.

The point of this demo is to showcase the sort of data my toolkit can make available. Whether you are an individual like me who wants more access to my pictures with different descriptions, or you want longer descriptions for other purposes, this is an example of what my toolkit makes possible.

This is not a one-off, “describe this picture” type of system. There are hundreds of those. This is an “I want permanent descriptions at scale” type of system.

Here is the feedback I’d love to have. First off, does the web page look reasonable and free from glaring problems? Do the concepts of what info you can get from my toolkit make sense from this demo? If not, what would help?

One very interesting challenge: in my experience, AI vision models are not great at generating alt text. I tried a range of prompts to get them to do so. In the end, the alt text (not my longer descriptions) was created by taking the descriptions AI produced from the Narrative prompt and running those through AI again, asking for alt text to be created. You can see an example of this in action by using the Image Browser and choosing to show the alt text visibly. Note that choosing this mode with a screen reader will result in alt text reading twice: once as alt text on the images and once as the visible version of the alt text. I debated what to do about this situation and, so far, have opted to turn off the visible display of alt text on page load. I do want people to see the alt text on demand because it is part of the overall system.
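The two-pass approach described above (describe the image first, then condense that description into alt text) can be sketched as a small pipeline. The `model` callable, its signature and both prompt strings are hypothetical placeholders, not the toolkit’s actual prompts or API; you would wrap whichever provider client you use.

```python
from typing import Callable, Optional, Tuple

# Hypothetical model interface: takes a prompt plus optional image bytes and
# returns the model's text reply. Wrap whatever provider client you use.
ModelFn = Callable[[str, Optional[bytes]], str]

# Placeholder prompt wording, not the toolkit's actual prompts.
NARRATIVE_PROMPT = "Describe this image in a short narrative paragraph."
ALT_TEXT_PROMPT = (
    "Rewrite the following image description as concise alt text, "
    "one short sentence with no preamble:\n\n{description}"
)


def generate_alt_text(image: bytes, model: ModelFn) -> Tuple[str, str]:
    """Two-pass pipeline: describe the image, then condense that description."""
    # Pass 1: the vision model looks at the image and writes a narrative.
    narrative = model(NARRATIVE_PROMPT, image)
    # Pass 2: a text-only request turns the narrative into alt text;
    # the image is not sent a second time.
    alt_text = model(ALT_TEXT_PROMPT.format(description=narrative), None)
    return narrative, alt_text.strip()
```

Returning both strings lets you save the longer description and the alt text together.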

The toolkit allows this sort of data gathering and gallery creation to be done automatically. Just point the tools at a collection of images and an AI provider, and you can choose how the info is shown.

Again, visit http://www.kellford.com/idtdemo for the gallery. Visit https://github.com/kellylford/Image-Description-Toolkit/releases/tag/v3.0.1 for the toolkit itself and https://theideaplace.net/image-description-toolkit-3-0-available/ for my latest blog post on the toolkit.


Image Description Toolkit 3.0 Available

I have a vastly updated version of my Image Description Toolkit (IDT) available. The 3.0 release builds on the command line workflow tools I released earlier with a guided workflow creation system, analysis tools for combining descriptions, a range of performance statistics on your descriptions and tools to review content from various AI models and prompts.

The IDT also includes several other tools for working with image descriptions. A results viewer allows you to browse results, monitor workflows as they happen, and copy descriptions and images.

If you want to focus on detailed descriptions for individual images, the Image Describer tool allows you to load a directory of images, easily request descriptions for individual images using multiple providers and prompts, and save all your work for future use.

A Prompt Editor allows you to create and edit the prompts used to guide AI descriptions for all the tools in the IDT and configure defaults.

In addition to Ollama, which powered earlier versions of the IDT, the IDT now includes support for multiple AI models from both OpenAI and Anthropic if you provide your own API key.

A comprehensive User Guide is available for the various tools in the IDT.

Whether you want detailed descriptions for one or one thousand images, the IDT has a tool for you. Grab IDT3.zip, extract the files to your computer and run install_idt. Be sure to install Ollama and/or get API keys for OpenAI or Anthropic and configure your AI models. Then open a command prompt and run:

idt guideme to generate your first descriptions.


RSS Quick, A Basic RSS Reader for Windows 1.0 Available

I’ve been dabbling further with AI-driven development and have another app for exploration. This time it is a basic RSS Reader for Windows I’m calling RSS Quick. Get the full details and program from Release RSS Quick 1.0 – Initial Release · kellylford/rssquick or my alternative download location.

The name RSS Quick reflects the app’s intent to be a simple and fast way to browse RSS feeds. The program does not handle feed management at this time, but it will allow you to import an existing OPML file and quickly browse the feeds. RSS Quick includes a default OPML file named RSS.OPML that is placed in the folder where you unzip the software package. It already includes dozens of feeds from a wide range of sources. Even if you don’t use the app, the OPML file may be handy as another source of feeds.
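OPML stores each feed as an `<outline>` element whose `xmlUrl` attribute holds the feed address, and feeds are often nested inside folder outlines that have no `xmlUrl` of their own. RSS Quick relies on .NET, so this Python sketch is not its code; it only illustrates how little it takes to pull the feed list out of a file like RSS.OPML using the standard library. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def read_opml_feeds(opml_text: str) -> List[Tuple[str, str]]:
    """Return (title, feed URL) pairs for every feed outline in an OPML document."""
    root = ET.fromstring(opml_text)
    feeds = []
    # iter() walks the whole tree, so feeds nested inside folders are found too.
    for outline in root.iter("outline"):
        url = outline.get("xmlUrl")
        if not url:
            continue  # a folder/category outline, not a feed
        # OPML requires "text"; "title" is optional. Fall back to the URL.
        title = outline.get("text") or outline.get("title") or url
        feeds.append((title, url))
    return feeds
```

To read a file from disk, pass it the file contents, for example `read_opml_feeds(Path("RSS.OPML").read_text(encoding="utf-8"))` with `Path` imported from `pathlib`.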

To use RSS Quick, you need to download the .ZIP file for your computer. In most cases this will be the X64 version. If you are using ARM hardware with Windows, you’ll want the ARM version. Extract all the files to any location on your computer and run RSSQuick.EXE. The program will load the default OPML file that has been included. If you do not have the necessary version of .NET, you will be prompted to download that first.

You can learn more about RSS Quick from the readme for the program. Give it a try and consider getting involved or submitting issues on the GitHub page.

RSS Quick is one of three projects I have available. I also have my Sports Scores app and an Image Description Toolkit. All projects can also be downloaded from my separate release archive.


Making Sports Information More Accessible: A Preview of the Sports Scores App

I’ve gone back to the AI workbench, much like I did with my Image Description Toolkit, to create another app. This time the goal is to make sports scores and play-by-play information easier to read and to avoid some of the long-standing accessibility challenges on leading sports websites.

I’m calling this app a preview release because not everything is working yet, but I have enough functionality available that I wanted to share it now.

I have an audio demo of the app along with a transcript of the demo. The release page with notes and links to download the Windows program are at Release Sports Scores 0.5 Preview Release · kellylford/Scores.

The program uses lists, trees and tables to show things like game scores, play-by-play action, box scores and standings. The info itself comes from APIs made available by leading sports providers. I’m just showing it in ways that work better in my opinion with screen readers and other assistive technology.

Perhaps the most interesting feature I have so far is the ability to hear baseball pitch location and speed by converting pitch data to audio sounds. The demo has a sample of this and within the app you can press Alt+p to hear the location of any pitch when you are on that pitch in the play-by-play details. Press Alt+s to hear the entire pitch sequence. Bring up the context menu on a pitch for these options plus a Pitch Explorer feature to get a sense of how the sounds for different locations and pitches will be presented.
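The post doesn’t spell out how the app turns a pitch into sound, so what follows is only one plausible mapping, stated as an assumption: horizontal location drives stereo pan, height drives tone frequency, and speed drives tone length. The input names (`plate_x_ft`, `plate_z_ft`, `speed_mph`) and every range below are hypothetical choices for illustration. The sketch computes tone parameters but does not play audio.

```python
from dataclasses import dataclass


@dataclass
class Tone:
    pan: float        # -1.0 = full left, 0.0 = center, 1.0 = full right
    frequency: float  # Hz; a higher tone means a higher pitch location
    duration: float   # seconds; faster pitches get shorter tones


def scale(value: float, lo: float, hi: float, out_lo: float, out_hi: float) -> float:
    """Linearly map value from [lo, hi] onto [out_lo, out_hi], clamping at the edges."""
    value = min(max(value, lo), hi)
    return out_lo + (value - lo) * (out_hi - out_lo) / (hi - lo)


def pitch_to_tone(plate_x_ft: float, plate_z_ft: float, speed_mph: float) -> Tone:
    """Hypothetical mapping from pitch location and speed to tone parameters."""
    # Assumed input ranges (illustrative only): horizontal offset from the
    # middle of the plate -1.5..1.5 ft, height 1..4 ft, speed 70..100 mph.
    pan = scale(plate_x_ft, -1.5, 1.5, -1.0, 1.0)
    frequency = scale(plate_z_ft, 1.0, 4.0, 220.0, 880.0)
    duration = scale(speed_mph, 70.0, 100.0, 0.6, 0.2)  # faster = shorter
    return Tone(pan, frequency, duration)
```

Under these assumptions, a pitch over the middle of the plate at 2.5 ft and 85 mph comes out centered, at 550 Hz, lasting about 0.4 seconds; anything outside the assumed ranges is clamped to the nearest edge.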

In addition to the release page, the GitHub project has a readme with full details on the app. There is also an issue list where you can report new issues or see what’s already known.

Give the app a try and if you are interested in contributing to the project, please do so.


Image Description Toolkit V2 Available

I’ve made another series of updates to what I’m calling the Image Description Toolkit since my last announcement. As a recap, the goal of this toolkit is to take collections of images and videos and create descriptions you can save and do this all with local AI models. Dozens of tools provide descriptions, but it is still difficult to save those descriptions for future review. With the Image Description Toolkit, you get nicely formatted HTML pages to read through all your image descriptions.

The newest enhancements include a comprehensive testing system to experiment with model prompts, a workflow script that allows for all tasks to be run with one command versus individually running each script and numerous small adjustments throughout the system. The code here is still all AI-generated with my ideas powering what’s created.

I’m sure I’m not objective, but for me this started as a curiosity, moved into a better understanding of how AI code generation could work, and is now something I use regularly. Over the weekend I attended several musical events and was able to generate more than 400 image descriptions from photos and videos I took.

The project lives on GitHub and has a readme that covers the basics of getting started. A guide for using the prompt testing script is also available. This is particularly helpful for trying out different models.

I’m always curious how AI writing works as well, so I asked GitHub Copilot to generate a second blog post about project developments. And of course, it is software, so there is also an issue list.

I won’t say for certain what’s next, but my current plan is to work on a graphical version of the project to understand more about that environment with Python, create a prompt editor so changing the default prompts is easier, and get this all working with Python packaging so installation is easier.

Contributions, suggestions or pointers to tools that already do all of this are always welcome.


Updates to Image Description Toolkit

Several months ago I announced a highly experimental set of Python scripts I called The Image Description Toolkit. Consider it a fancy name for my goal: getting thousands of pictures described, both from my iPhone and from whatever phone I’ve used over the past several decades, and having a permanent description of each photo. I’ve made some key updates, although I’d still categorize this as highly experimental.

Most notably, I’ve made it possible to build custom AI prompts, choose the model you use and adjust the parameters used with the model and have all of this done through a configuration file.

I’ve also updated the script that converts files in the .HEIC format to .JPG and streamlined the output to HTML with a script that can be run. To be very clear, when I say I’ve done these things, all the code in this project was generated with AI through my prompting and refinement.
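Turning saved descriptions into a readable HTML page mostly comes down to escaping the text and giving each image its own heading, so screen reader users can move from entry to entry. This is a minimal sketch assuming descriptions are kept as a mapping from file name to description text; it is not the toolkit’s HTML script, and the function name and page layout are illustrative choices.

```python
import html
from typing import Dict


def descriptions_to_html(descriptions: Dict[str, str], title: str = "Image Descriptions") -> str:
    """Build a simple HTML page with one heading per image followed by its description."""
    safe_title = html.escape(title)
    parts = [
        "<!DOCTYPE html>",
        '<html lang="en">',
        f'<head><meta charset="utf-8"><title>{safe_title}</title></head>',
        "<body>",
        f"<h1>{safe_title}</h1>",
    ]
    for filename, description in descriptions.items():
        # Escaping keeps characters like & and < from breaking the markup.
        parts.append(f"<h2>{html.escape(filename)}</h2>")
        parts.append(f"<p>{html.escape(description)}</p>")
    parts.append("</body></html>")
    return "\n".join(parts)
```

Writing the result to disk is then a single call such as `Path("descriptions.html").write_text(page, encoding="utf-8")`.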

A readme for the project explaining how all this works is available. I also had AI generate a blog post about the project. You can find the full project on GitHub.

With all of those qualifications, I have found these tools of value. I’ve now generated more than 10,000 image descriptions running on my local computer. The Moondream model used through Ollama has been excellent. It is incredibly fast when used for batch processing, has some of the lowest memory requirements I’ve found and still gives rich details and is highly responsive to different prompts.
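For readers curious what a single request to Moondream through Ollama looks like, here is a minimal sketch against Ollama’s local REST endpoint (`/api/generate` on the default port 11434), which accepts base64-encoded images in an `images` list. It is not how the toolkit’s scripts are written, the prompt text you pass is up to you, and calling `describe` requires a running Ollama server with the `moondream` model pulled.

```python
import base64
import json
import urllib.request
from pathlib import Path

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(image_bytes: bytes, prompt: str, model: str = "moondream") -> dict:
    """Payload for Ollama's /api/generate endpoint with one base64-encoded image."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for a single JSON reply instead of streamed chunks
    }


def describe(image_path: Path, prompt: str) -> str:
    """Send one image to the local Ollama server and return the description text."""
    payload = json.dumps(build_request(image_path.read_bytes(), prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

A batch run is then just a loop, for example `for path in sorted(Path("photos").glob("*.jpg")): print(path.name, describe(path, "Describe this image."))`, where `photos` is a hypothetical folder. Sending one image per request keeps the loop simple and lets you save each description as soon as it arrives.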

I plan to continue experimenting here over time. I want to make setup easier, and I know about Python packaging but have found it doesn’t always work, so this all still requires a manual install of Ollama, Python and the individual scripts. The readme file should walk you through this, though.

If you have feedback, know of other ways to accomplish these same tasks or have suggestions on what else I should include here, feel free to let me know. I’ve learned a great deal about image processing from AI, using Python and AI code generation from these experiments. And of course, I now have permanent descriptions of more than 10,000 pictures.


“I Just Want to Be Included”: How a Simple Plea Sparked an Unintentional AI Accessibility Experiment

The work of editorial cartoonists and their ability to capture the essence of an idea in a picture or two must be quite the talent to have and job to hold. The phrase “a picture is worth a thousand words” certainly can ring true.

A recent experience verifying an issue in Microsoft’s Copilot app that I was told had been fixed led to that all-too-common situation in accessibility of taking one step forward and a couple in reverse. My issue was fixed, but I discovered new accessibility issues to report.

What began as a routine task—verifying an accessibility fix in Microsoft’s Copilot—quickly devolved into a familiar frustration as new issues emerged. It was in that moment that the app presented me with a default prompt to “Create an image of…”. My frustration, my curiosity about AI, and my interest in editorial cartoons suddenly converged, sparking an unintentional experiment with a fundamental question at its core: could an AI capture my feelings in a picture, and then accurately describe what it had just made?

I gave Copilot this prompt:

“Create an image of the Microsoft logo with a man’s face crying in sadness imposed over it and a cartoon bubble saying, ‘I just want to be included please.’” And here is where things turned interesting.

I used Copilot, Gemini, ChatGPT, and Claude (the last two through JAWS Picture Smart functionality) to get descriptions of the resulting image. I expected some variability, but take note in particular of the Copilot description. It leaves out critical details and gives a vastly different version of the very image created through Copilot. I have no illusion that Copilot would know that it created the image, but I really do wonder why Copilot’s details are so different. Did this trigger some content flags internal to Copilot, for example?

It is also interesting to note that Gemini’s interpretation of the image does not entirely match my intent. Gemini seems to interpret Microsoft as the entity being left out, when in fact I intended the opposite: the person crying was the one being left out.

This experiment raises obvious questions about the capabilities and challenges of today’s AI technology. But for me, it highlights a much deeper, more personal challenge.

As someone who understands the visual world entirely through words—from alternative text, human-provided descriptions and as of late AI-based image description—this leads to a fundamental question: can my words ever be enough to accurately portray a feeling or an idea for an AI to translate into a picture I cannot see but trust represents my intent? If it does, will I have the ability to know this with confidence? Had I spent the time to write those thousand words in the prompt, would the result have been any closer to the image in my mind’s eye?

The image generated by Copilot can be found at http://theideaplace.net/wp-content/uploads/2025/06/CopilotMSFTLogoCryingAndCartoonBalloon.png.

The responses from the various AI descriptions are below.

ChatGPT

The image is a meme featuring the “Crying Jordan” face superimposed over the Microsoft logo. The face has a speech bubble coming from it, saying, “I just want to be included please.” The background shows the Microsoft logo, which includes its four colored squares and the word “Microsoft” in a large font.

The meme seems to humorously convey the idea of being left out or wanting inclusion.

Claude

This image is a popular internet meme that combines the Microsoft logo (including its distinctive four-colored window-pane design in orange and green) with a crying face. The person in the image appears emotional with tears on their face, and there’s a speech bubble saying “I just want to be included please.” The image is meant to be humorous, playing on the idea of feeling left out, with “Microsoft” written in the company’s characteristic gray font at the bottom of the image. The background is plain white, making the elements of the meme stand out clearly.

Copilot

The image features the Microsoft logo, consisting of four colored squares—red, green, blue, and yellow—arranged in a 2×2 grid. Beneath the logo, the word “Microsoft” is displayed in a bold, gray font. On the right side of the image, there’s a speech bubble containing the text: “I just want to be included please.” The placement of the speech bubble suggests a plea for inclusion or recognition, adding an element of sentiment to the otherwise corporate design. Is there something specific you’re curious about regarding this image?

Gemini

This is a popular internet meme that combines several elements to create a humorous and satirical image. At the center of the image is the disembodied head of a person who appears to be a Black man, crying. His face is contorted in sadness, with visible tears welling in his eyes and streaming down his cheeks. This specific image is widely recognized as the “Crying Jordan” meme, which features a photograph of basketball legend Michael Jordan during his 2009 Basketball Hall of Fame induction speech. In the background, positioned behind the head, is the Microsoft logo, composed of four squares: orange-red in the top left and green in the top right. Below the head, the word “Microsoft” is intentionally misspelled as “Micrososft.” To the right of the head is a cartoon speech bubble with the text, “I just want to be included please.” The overall composition suggests that the corporation Microsoft is being personified as the “Crying Jordan” figure, expressing feelings of sadness, failure, and exclusion, as articulated by the text in the speech bubble. The meme is typically used to mock a perceived failure or a lack of popularity of a Microsoft product or service in comparison to its competitors.


Good to Know AI Doesn’t Pull Any Punches

It is good to know that AI doesn’t pull any punches. I took the latest Word Online conformance report available from aka.ms/msacr and asked the Windows Copilot app to summarize the issues identified. I was answering a question on an email list from someone expressing some frustrations with Word Online.

I won’t copy that full response here but it was comprehensive at pulling the details from the report. I will share the summary.

As AI systems seem to want to do these days, Copilot then prompted me with a question. Hey AI, be careful what you ask!

Copilot:

### In Summary

While Word Online meets a number of WCAG 2.1 criteria, these exception points highlight areas where users—especially those relying on assistive technologies—may encounter challenges. The primary issues revolve around:

– **Rendering content as images instead of accessible text by default,**

– **Inconsistent or incomplete keyboard and screen reader accessibility for dynamic content and interactive controls,**

– **Visual presentation concerns (contrast and resizing problems), and**

– **Limited programmatic exposure of certain UI elements and features.**

These exceptions suggest that although the product supports many accessibility features, improvements are necessary to ensure a fully inclusive experience.  

For anyone interested in further details, you might explore how these issues could be mitigated with updated design approaches or additional assistive technology compatibility tests. What are your thoughts—do you see these issues as critical to the overall user experience or as minor inconveniences that can be worked around?

Kelly:

I think these are incredibly problematic.

Copilot:

You’re absolutely right, Kelly. These issues aren’t mere inconveniences—they directly impact users who depend on accessible technology to interact with content. When information is rendered as images rather than text, or when keyboard navigation and screen reader support are inconsistent, it creates significant barriers for people with disabilities. This scenario isn’t just a minor glitch; it’s a fundamental shortcoming that compromises an inclusive user experience.

Moreover, the challenges with contrast ratios, resizing limitations, and the incomplete exposure of UI states further erode the usability and clarity of the interface. For individuals who rely on assistive devices, even small obstacles can lead to frustration and exclusion. It’s a stark reminder that accessibility features must be integrated from the design stage, rather than treated as an afterthought or workaround.

It’s particularly concerning considering the widespread use of Word Online. Many users who could benefit immensely from a fully accessible environment might instead be left with a subpar experience. This not only affects productivity but also perpetuates inequality in access to essential digital tools.

What kinds of solutions do you think would be most effective in addressing these issues? Perhaps more rigorous testing with a variety of assistive technologies or more dynamic adaptation of content could be a start. Your perspective is crucial as it echoes the real-world impact these shortcomings have on users.

I’m curious—what improvements do you envision that could make a real difference for those relying on accessibility features?


Tooltip Should Not Start An Accessible Name

I’ve noticed a growing trend of making the content from some popups or tooltips directly keyboard accessible on web pages, using buttons that reveal the content. This is good, but what’s not so good is the naming of these buttons. In pretty much every situation where I’ve encountered this, the name starts with the word “tooltip.” For example, on a financial website I use for ticker quotes, the button in question reads “Tooltip for Quick Quote Disclosure.”

This is not an isolated situation. Maybe it is just because I’ve been using a range of financial websites lately, but it seems very common in those experiences.

As with most accessible names, this one would be better, at minimum, with the unique part of the name first: in this case, “Quick Quote Disclosure.”

There is obviously more to this, such as whether the word tooltip is needed at all, proper roles and more, but this basic naming syntax makes a big difference in efficient access to the content. The unique part should generally come first in an accessible name.
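For sites that generate many of these buttons, the naming rule can be checked automatically, for example against a list of button names pulled from a page. The sketch below flags names that begin with “Tooltip” or “Tooltip for” and suggests the unique part instead, optionally with the word moved to the end. The function name and the pattern are hypothetical, and it only handles the word “tooltip”; treat it as a starting point for a lint check, not a complete accessible-name audit.

```python
import re
from typing import Optional

# Matches a leading "Tooltip" or "Tooltip for" in any case. Other generic
# leading words could be added; this pattern is an illustrative assumption.
_LEADING_TOOLTIP = re.compile(r"^\s*tooltip\b(?:\s+for\b)?\s*", re.IGNORECASE)


def suggest_accessible_name(name: str, keep_role_word: bool = False) -> Optional[str]:
    """Return a reordered name if `name` starts with "Tooltip", else None."""
    match = _LEADING_TOOLTIP.match(name)
    if not match:
        return None  # already leads with something other than "tooltip"
    unique = name[match.end():].strip()
    if not unique:
        return None  # nothing unique to move forward; the control needs a real name
    return f"{unique} tooltip" if keep_role_word else unique
```

With the example from this post, `suggest_accessible_name("Tooltip for Quick Quote Disclosure")` suggests `"Quick Quote Disclosure"`, and passing `keep_role_word=True` gives `"Quick Quote Disclosure tooltip"`.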
