
Category: AI

Updates to Image Description Toolkit

Several months ago I announced a highly experimental set of Python scripts I called The Image Description Toolkit. Consider it a fancy name for meeting a goal of mine: getting the thousands of pictures I have taken with my iPhone, and over the past several decades with whatever phone I was using, described, and keeping a permanent description of each photo. I’ve made some key updates, although I’d still categorize this as highly experimental.

Most notably, I’ve made it possible to build custom AI prompts, choose the model you use, and adjust the parameters used with the model, all through a configuration file.
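To make the configuration idea concrete, here is a minimal sketch of how a prompt, model name and model parameters might be read from a file and sent to a local Ollama server. This is not the toolkit's actual code. The config key names, the function names and the example prompt are assumptions made up for illustration; the request shape follows Ollama's `/api/generate` endpoint, and the sketch assumes Ollama is running on its default port with the model already pulled.

```python
import base64
import json
import urllib.request

# Hypothetical config layout; the toolkit's real file may use different keys.
EXAMPLE_CONFIG = """
{
  "model": "moondream",
  "prompt": "Describe this photo in detail for someone who cannot see it.",
  "options": {"temperature": 0.2, "num_predict": 300}
}
"""


def load_config(text):
    """Parse the JSON config and fill in defaults for any missing keys."""
    config = json.loads(text)
    config.setdefault("model", "moondream")
    config.setdefault("prompt", "Describe this image.")
    config.setdefault("options", {})
    return config


def build_payload(config, image_bytes):
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {
        "model": config["model"],
        "prompt": config["prompt"],
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "options": config["options"],
        "stream": False,  # ask for one complete JSON reply
    }


def describe_image(config, image_path, host="http://localhost:11434"):
    """Send one image to a locally running Ollama server and return its text."""
    with open(image_path, "rb") as f:
        payload = build_payload(config, f.read())
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

With a layout like this, changing the prompt or switching models means editing the file rather than the script, for example `describe_image(load_config(EXAMPLE_CONFIG), "photo.jpg")`.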

I’ve also updated the script that converts files in the .HEIC format to .JPG and streamlined the output to HTML with a script that can be run. To be very clear, when I say I’ve done these things, all the code in this project was generated with AI through my prompting and refinement.
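For a sense of what the HTML output step could look like, here is a small sketch that turns a set of file names and descriptions into a simple web page. It is an assumption about the general approach, not the project's real script: the function name `descriptions_to_html` and the page layout are invented for this example. It uses only the Python standard library.

```python
import html


def descriptions_to_html(descriptions, title="Image Descriptions"):
    """Return a simple HTML page listing each file name with its description."""
    parts = [
        "<!DOCTYPE html>",
        '<html lang="en">',
        f'<head><meta charset="utf-8"><title>{html.escape(title)}</title></head>',
        "<body>",
        f"<h1>{html.escape(title)}</h1>",
    ]
    # Sort by file name so the page order is predictable.
    for filename, text in sorted(descriptions.items()):
        # Escape everything so characters like < and & display as text.
        parts.append(f"<h2>{html.escape(filename)}</h2>")
        parts.append(f"<p>{html.escape(text)}</p>")
    parts.append("</body>")
    parts.append("</html>")
    return "\n".join(parts)
```

Putting each file name in a heading lets a screen reader user jump from picture to picture with heading navigation.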

A readme for the project explaining how all this works is available. I also had AI generate a blog post about the project. You can find the full project on GitHub.

With all of those qualifications, I have found these tools of value. I’ve now generated more than 10,000 image descriptions running on my local computer. The Moondream model used through Ollama has been excellent. It is incredibly fast when used for batch processing, has some of the lowest memory requirements I’ve found, and still gives rich details and is highly responsive to different prompts.

I plan to continue experimenting here over time. I want to make setup easier. I know about Python packaging but have found it doesn’t always work, so this all still requires a manual install of Ollama, Python and the individual scripts. The readme file should walk you through this, though.

If you have feedback, know of other ways to accomplish these same tasks, or have suggestions on what else I should include here, feel free to let me know. I’ve learned a great deal about image processing from AI, using Python and AI code generation from these experiments. And of course, I now have permanent descriptions of more than 10,000 pictures.


“I Just Want to Be Included”: How a Simple Plea Sparked an Unintentional AI Accessibility Experiment

The work of editorial cartoonists and their ability to capture the essence of an idea in a picture or two must be quite the talent to have and job to hold. That phrase “a picture is worth a thousand words,” certainly can ring true.

A recent experience verifying that an issue in Microsoft’s Copilot app had been fixed, as I was told it had, led to that all-too-common situation in accessibility of taking one step forward and a couple in reverse. My issue was fixed, but I discovered new accessibility issues to report.

What began as a routine task—verifying an accessibility fix in Microsoft’s Copilot—quickly devolved into a familiar frustration as new issues emerged. It was in that moment that the app presented me with a default prompt to “Create an image of…”. My frustration, my curiosity about AI, and my interest in editorial cartoons suddenly converged, sparking an unintentional experiment with a fundamental question at its core: could an AI capture my feelings in a picture, and then accurately describe what it had just made?

I gave Copilot this prompt:

“Create an image of the Microsoft logo with a man’s face crying in sadness imposed over it and a cartoon bubble saying, ‘I just want to be included please.’”

Here is where things turned interesting.

I used Copilot, Gemini, ChatGPT, and Claude (the last two through JAWS Picture Smart functionality) to get descriptions of the resulting image. I expected some variability, but take note in particular of the Copilot description. It leaves out critical details and gives a vastly different version of the very image created through Copilot. I have no illusion that Copilot would have the ability to know that it created the image, but I really do wonder why the details are so different with Copilot. Did this trigger some content flags internal to Copilot, for example?

It is also interesting to note that Gemini’s interpretation of the image is not entirely what I intended. Gemini seems to interpret Microsoft as the entity being left out here, when in fact my intent was the opposite: that the person crying was the one being left out.

This experiment raises obvious questions about the capabilities and challenges of today’s AI technology. But for me, it highlights a much deeper, more personal challenge.

As someone who understands the visual world entirely through words—from alternative text, human-provided descriptions and as of late AI-based image description—this leads to a fundamental question: can my words ever be enough to accurately portray a feeling or an idea for an AI to translate into a picture I cannot see but trust represents my intent? If it does, will I have the ability to know this with confidence? Had I spent the time to write those thousand words in the prompt, would the result have been any closer to the image in my mind’s eye?

The image generated by Copilot can be found at http://theideaplace.net/wp-content/uploads/2025/06/CopilotMSFTLogoCryingAndCartoonBalloon.png.

The responses from the various AI descriptions are below.

ChatGPT

The image is a meme featuring the “Crying Jordan” face superimposed over the Microsoft logo. The face has a speech bubble coming from it, saying, “I just want to be included please.” The background shows the Microsoft logo, which includes its four colored squares and the word “Microsoft” in a large font.

The meme seems to humorously convey the idea of being left out or wanting inclusion.

Claude

This image is a popular internet meme that combines the Microsoft logo (including its distinctive four-colored window-pane design in orange and green) with a crying face. The person in the image appears emotional with tears on their face, and there’s a speech bubble saying “I just want to be included please.” The image is meant to be humorous, playing on the idea of feeling left out, with “Microsoft” written in the company’s characteristic gray font at the bottom of the image. The background is plain white, making the elements of the meme stand out clearly.

Copilot

The image features the Microsoft logo, consisting of four colored squares—red, green, blue, and yellow—arranged in a 2×2 grid. Beneath the logo, the word “Microsoft” is displayed in a bold, gray font. On the right side of the image, there’s a speech bubble containing the text: “I just want to be included please.” The placement of the speech bubble suggests a plea for inclusion or recognition, adding an element of sentiment to the otherwise corporate design. Is there something specific you’re curious about regarding this image?

Gemini

This is a popular internet meme that combines several elements to create a humorous and satirical image. At the center of the image is the disembodied head of a person who appears to be a Black man, crying. His face is contorted in sadness, with visible tears welling in his eyes and streaming down his cheeks. This specific image is widely recognized as the “Crying Jordan” meme, which features a photograph of basketball legend Michael Jordan during his 2009 Basketball Hall of Fame induction speech. In the background, positioned behind the head, is the Microsoft logo, composed of four squares: orange-red in the top left and green in the top right. Below the head, the word “Microsoft” is intentionally misspelled as “Micrososft.” To the right of the head is a cartoon speech bubble with the text, “I just want to be included please.” The overall composition suggests that the corporation Microsoft is being personified as the “Crying Jordan” figure, expressing feelings of sadness, failure, and exclusion, as articulated by the text in the speech bubble. The meme is typically used to mock a perceived failure or a lack of popularity of a Microsoft product or service in comparison to its competitors.
