I’m looking to crowdsource some feedback. I’ve mentioned here a few times a collection of tools I’ve created called the Image Description Toolkit. The short version: it’s a way to generate image descriptions that you can save and whose level of detail you can customize. That can be a bit of an abstract concept in a world where many people still don’t understand alt text.
So, I’ve put together a demo page at www.kellford.com/idtdemo. It has a traditional image gallery, plus a Description Explorer. The Description Explorer lets you see how different AI prompts result in different image descriptions, and how different AI providers compare at describing images. There are four prompts in total (narrative, colorful, technical, and detailed), run through 10 different AI provider/model combinations.
For example, choose Description Explorer and then the option for all prompts from a single provider. Note how the descriptions build on each other, in a way, from Narrative to Colorful to Technical.
The point of this demo is to showcase the sort of data my toolkit can make available. Whether you are an individual like me who wants more access to your pictures through different descriptions, or you want longer descriptions for other purposes, this is an example of what my toolkit makes possible.
This is not the one-off, random, “describe this picture” type of system; there are hundreds of those. This is the “I want permanent descriptions at scale” type of system.
Feedback I’d love to have: First, does the web page look reasonable and free from glaring problems? Do the concepts of what information you can get from my toolkit make sense from this demo? If not, what would help?
One very interesting challenge: in my experience, AI vision models are not great at generating alt text. I tried a range of prompts to get them to do so. In the end, the alt text (not my longer descriptions) was created by taking the Narrative descriptions generated by AI and running those through AI again, asking for alt text to be created. You can see an example of this in action by using the Image Browser and choosing to show the alt text visibly. Note: choosing this mode with a screen reader will result in the alt text being read twice, once as alt text on the images and once as the visible version of the alt text. I debated what to do about this situation and, so far, have opted to turn off the visible display of alt text on page load. I do want people to be able to see the alt text on demand, because it is part of the overall system.
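The two-pass approach described above can be sketched roughly as follows. This is a minimal illustration, not the toolkit’s actual code: the `call_model` function here is a stub standing in for whatever AI provider call the toolkit makes, and the prompt wording is invented for the example.

```python
def call_model(prompt, image_path=None):
    """Stub standing in for a real vision/text model call.

    In the real pipeline this would hit an AI provider's API;
    here it returns canned text so the example is runnable.
    """
    if image_path is not None:
        # Pretend this came from a vision model looking at the image.
        return ("A golden retriever runs across a grassy park, "
                "tongue out, chasing a red frisbee on a sunny day.")
    # Pretend this came from a text model condensing the input.
    return "Golden retriever chasing a red frisbee in a park."


def describe_then_condense(image_path):
    # Pass 1: ask the vision model for a Narrative description.
    narrative = call_model(
        "Describe this image as a flowing narrative.", image_path)
    # Pass 2: feed that description back as plain text,
    # asking for short alt text rather than a long description.
    alt_text = call_model(
        "Condense the following description into one short "
        "sentence of alt text:\n" + narrative)
    return {"narrative": narrative, "alt": alt_text}


result = describe_then_condense("dog.jpg")
print(result["alt"])
```

The key point is that the second pass operates on text, not on the image, which in practice seems to yield better-behaved alt text than asking a vision model for alt text directly.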
The toolkit allows all of this data gathering and gallery creation to be done automatically. Just point the tools at a collection of images and an AI provider, and you can choose how the information is displayed.
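The batch idea, running every prompt over every image and collecting the results into structured data a gallery page can consume, might look something like this sketch. The prompt names match the four from the demo, but the prompt text, the `fake_model` stub, and the JSON shape are all my invention for illustration, not the toolkit’s actual format.

```python
import json

# The four prompt styles from the demo; the wording is assumed.
PROMPTS = {
    "narrative": "Describe this image as a flowing narrative.",
    "colorful": "Describe this image with vivid, colorful language.",
    "technical": "Describe this image in technical terms.",
    "detailed": "Describe this image in full detail.",
}


def fake_model(prompt, image):
    # Stand-in for a real provider/model call.
    return f"Description of {image} for prompt: {prompt}"


def build_gallery(images):
    """Run every prompt over every image and emit gallery data as JSON."""
    entries = [
        {
            "file": img,
            "descriptions": {
                name: fake_model(text, img)
                for name, text in PROMPTS.items()
            },
        }
        for img in images
    ]
    return json.dumps(entries, indent=2)


print(build_gallery(["dog.jpg", "lake.jpg"]))
```

A structure like this, one record per image with all description variants attached, is what makes it possible for a page like the demo to switch between prompts and providers on demand.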
Again, visit http://www.kellford.com/idtdemo for the gallery. Visit https://github.com/kellylford/Image-Description-Toolkit/releases/tag/v3.0.1 for the toolkit itself and https://theideaplace.net/image-description-toolkit-3-0-available/ for my latest blog post on the toolkit.