

Introducing IDT 4.0 Beta 1: An Enhanced Way to Describe Your Digital Images

A 4.0 Beta 1 release of my Image Description Toolkit is now available. If you are unfamiliar with this project, it provides apps for describing images using Ollama AI models that can run locally. The toolkit is specifically aimed at handling large numbers of images and videos. It offers both a command line version (IDT) and a graphical version (ImageDescriber), and it lets you generate descriptions based on a series of predefined prompts or add your own to tailor the sort of information you get back from an AI model. The app also supports OpenAI and Claude models with your own user-supplied API key.

What the Image Description Toolkit Is Not

It is important to understand what the toolkit is not. Countless apps exist for getting image descriptions. I use many myself. Most of these are aimed at describing one or two images at a time, though, and often the descriptions must be generated anew each time you want an image described. IDT is not meant to replace these experiences directly.

What the Image Description Toolkit Is

IDT is aimed at situations where you have dozens, hundreds, or thousands of photos you want described and you want permanent descriptions of the images. IDT also makes it easy to generate multiple descriptions based on different AI prompts and to ask follow-up questions. If you want more emphasis on colors in an image, for example, you can use the ‘Colorful’ prompt. The ‘Technical’ prompt gives you more details about the image quality. All prompts are customizable and you can add your own prompts to the system.

Quick Start

After installing the IDT, you have two main tools for image description. ImageDescriber is the first option you’ll want to explore for describing smaller groups of images and generating multiple descriptions per image. The default install puts this in ‘c:\idt\imagedescriber.exe’. Load the app, press ctrl+l or choose File:Load Directory, and then arrow through the images that have been loaded. Press the letter p on any image to process that image, or choose Processing:Process All Undescribed to describe the full set of loaded images. In either case you’ll be prompted for the AI provider you want to use, the model, and the prompt. The system defaults to Ollama, Moondream, and the Narrative prompt.

The second option is IDT command line. Open a command prompt and change to the default install location of ‘c:\idt’, assuming a standard install. Enter ‘idt guideme’ and the system will ask you a series of questions and then kick off a workflow to describe your images. Alternatively enter ‘idt workflow <path to images>’ and the system will just start describing images using the same defaults as mentioned earlier.

Additional Functionality

In addition to providing image descriptions, both IDT and the ImageDescriber can download images from a web page and provide descriptions. In IDT just enter ‘idt workflow <url>’ and in ImageDescriber press ctrl+u and provide the address of a web page.
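Under the hood, grabbing images from a web page starts with finding every img tag in the page’s HTML. The sketch below does that with Python’s standard html.parser module; it illustrates the general approach, not the toolkit’s actual downloader.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageTagParser(HTMLParser):
    """Collect the src attribute of every <img> tag on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    # Resolve relative paths against the page address
                    self.image_urls.append(urljoin(self.base_url, value))


def find_image_urls(page_html, base_url):
    """Return every image URL referenced by the given HTML text."""
    parser = ImageTagParser(base_url)
    parser.feed(page_html)
    return parser.image_urls
```

Each returned URL would then be fetched and fed into the same description pipeline used for local files.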

ImageDescriber also allows you to ask follow-up questions about any image using the original AI provider and model or by selecting another of your choice. Just press f on any image or description in the app.

Monitoring Costs for Paid Models

When using Claude or OpenAI models, the number of tokens used for the AI prompt (which includes the image) and for the response is shown as part of the image metadata. Review these to gauge the cost of using the various models from these companies; both publish per-token pricing for their different models.
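As a rough sketch of how those token counts translate into dollars, the function below multiplies each count by a per-million-token rate. The rates you pass in are placeholders to look up on the provider’s pricing page; they are not values from the toolkit itself.

```python
def estimate_cost(prompt_tokens, response_tokens,
                  input_price_per_m, output_price_per_m):
    """Estimate the dollar cost of one description from the token
    counts IDT records in the image metadata.

    Prices are dollars per million tokens; check the provider's
    current pricing page, since rates change and vary by model.
    """
    return (prompt_tokens * input_price_per_m
            + response_tokens * output_price_per_m) / 1_000_000
```

For example, 1,000 prompt tokens and 200 response tokens at $3 and $15 per million tokens work out to $0.006 for that image.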

A Basic Chat with AI

Finally, ImageDescriber has a basic chat system, which does not require images, for any AI models you have installed or configured. Press c when the program is open or choose Process:Chat. Next select your model and the chat experience will open. Enter your question and the AI will respond. You can press shift+tab to go to the chat history. Note that currently the only way to save the chat history is to press ctrl+a and then ctrl+c to copy the full chat and paste it into another app. Work to fully integrate saving of chats into an Image Description Toolkit workspace file is ongoing.

User Guide and Issue Reporting

A full User Guide is available for the IDT. You can also report issues from within the program with the Report an Issue menu item on the Help menu or by visiting the New Issue form on GitHub. Check the list of current known issues as well.

Windows Version Today, Mac Coming Soon

This release is for Windows, with a Mac version coming soon. The Mac version is in fact done, and those who are so inclined can build it from source on GitHub; look in the wxmigration branch under MacBuilds for build scripts. I’m not officially releasing the Mac version yet because I am still refining how VoiceOver reads lists within the program and doing further testing.

Getting the Image Description Toolkit

You can download the 4.0 Beta 1 release directly from my projects page.

2/14 Update on Downloading

My application is currently not code signed. As a result, you may be warned that the file is untrusted or otherwise not safe to download. I run security checks and such before making my files available and am still researching costs for code signing on Windows. You can try downloading a .zip file with the installer. Unzip the file and run the installer. You’ll still likely have to deal with untrusted warnings, but they are a bit easier to work through than the ones presented during the browser download process. Some browsers may still warn about the .zip download as well.


Image Description Toolkit 3.5 Beta Featuring Geolocation Data and Web Image Downloads

With more AI-driven development I have another sizable update for my Image Description Toolkit or IDT. There is a full What’s New document available.

Highlights for this beta release include use of geolocation data when present in images, the ability to download images from a specified web address and run them through the image description system, and numerous other enhancements.
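The post doesn’t detail how the geolocation data is read, but photos typically store it as EXIF GPS tags in degrees, minutes, and seconds. Assuming the tags have already been extracted (for example via Pillow’s Image.getexif()), the conversion to the signed decimal degrees most mapping tools expect looks like this:

```python
def gps_to_decimal(dms, ref):
    """Convert EXIF-style GPS coordinates, stored as a
    (degrees, minutes, seconds) tuple plus a hemisphere reference
    ('N', 'S', 'E', or 'W'), into signed decimal degrees."""
    degrees, minutes, seconds = dms
    value = degrees + minutes / 60 + seconds / 3600
    # Southern and western hemispheres are negative in decimal form
    return -value if ref in ("S", "W") else value
```

A description workflow can then feed the decimal coordinates to a reverse-geocoding step or simply record them alongside the description.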

You can also keep current with all my projects from my Projects page.


Feedback on IDT Demo Gallery

I’m looking to crowd-source some feedback. I’ve mentioned here a few times a collection of tools I’ve created called the Image Description Toolkit. The short version is that it is a way to get image descriptions you can save, with control over the level of detail you get. This can be a bit of an abstract concept in a world where many still do not understand alt text.

So, I’ve put together a demo page at www.kellford.com/idtdemo. It has a traditional image gallery plus a Description Explorer. The Description Explorer lets you see how different AI prompts result in various image descriptions and how different AI providers compare at describing images. There are a total of four prompts (narrative, colorful, technical and detailed) using 10 different AI provider/model combinations.

For example, choose Description Explorer and then the option for all prompts from a provider. Note how the descriptions build on each other, in a way, from Narrative to Colorful to Technical.

The point of this demo is to showcase the sort of data my toolkit can make available. Whether you are an individual like me who wants more access to my pictures with different descriptions, or you want longer descriptions for other purposes, this is an example of what my toolkit makes possible.

This is not the one-off, random ‘describe this picture’ type of system; there are hundreds of those. This is the ‘I want permanent descriptions at scale’ type of system.

Feedback I’d love to have. First off, does the web page look reasonable and free from glaring problems? Do the concepts of what info you can get from my toolkit make sense from this demo? If not, what would help?

One very interesting challenge: AI vision models are, in my experience, not great at generating alt text. I tried a range of prompts to get them to do so. In the end, the alt text (not my longer descriptions) was created by taking the Narrative descriptions created by AI and running those through AI again, asking for alt text to be created. You can see an example of this in action by using the Image Browser and choosing to show the alt text visibly. Note that choosing this mode with a screen reader will result in alt text reading twice: once as alt text on the images and once as the visible version of the alt text. I debated what to do about this situation and, so far, opted to turn off the visible display of alt text on page load. I do want people to see the alt text on demand because it is part of the overall system.
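That two-pass approach can be sketched in a few lines. The ask_model callable below is a stand-in for whatever function actually sends a text prompt to an AI provider, and the prompt wording is my illustration, not the toolkit’s actual prompt.

```python
def make_alt_text(narrative_description, ask_model):
    """Second pass of the alt text approach described above: rather
    than asking the vision model for alt text directly, condense an
    existing narrative description by asking the model again.

    ask_model: a function that sends a text prompt to an AI provider
    and returns the reply as a string.
    """
    prompt = (
        "Rewrite the following image description as concise alt text, "
        "one sentence of no more than 125 characters:\n\n"
        + narrative_description
    )
    return ask_model(prompt)
```

Because the second pass works on plain text, it can use a cheap text-only model even when the first-pass description came from a vision model.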

The toolkit allows for this sort of data gathering and gallery creation to be done all automatically. Just point the tools at a collection of images and an AI provider and you can choose how the info is shown.

Again, visit http://www.kellford.com/idtdemo for the gallery. Visit https://github.com/kellylford/Image-Description-Toolkit/releases/tag/v3.0.1 for the toolkit itself and https://theideaplace.net/image-description-toolkit-3-0-available/ for my latest blog post on the toolkit.


Image Description Toolkit V2 Available

I’ve made another series of updates to what I’m calling the Image Description Toolkit since my last announcement. As a recap, the goal of this toolkit is to take collections of images and videos and create descriptions you can save and do this all with local AI models. Dozens of tools provide descriptions, but it is still difficult to save those descriptions for future review. With the Image Description Toolkit, you get nicely formatted HTML pages to read through all your image descriptions.
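The post doesn’t show what those HTML pages contain, but the basic shape of turning saved descriptions into a reviewable page can be sketched as below. The markup here is a deliberately bare-bones stand-in for the toolkit’s real, richer output.

```python
import html


def build_gallery_page(descriptions, title="Image Descriptions"):
    """Render a dict of {image filename: description} as a simple
    HTML page, with one heading and one paragraph per image."""
    parts = [
        f"<html><head><title>{html.escape(title)}</title></head><body>",
        f"<h1>{html.escape(title)}</h1>",
    ]
    for filename, description in descriptions.items():
        parts.append(f"<h2>{html.escape(filename)}</h2>")
        parts.append(f"<p>{html.escape(description)}</p>")
    parts.append("</body></html>")
    return "\n".join(parts)
```

Writing the returned string to a file gives a single page a screen reader or browser can walk through, heading by heading.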

The newest enhancements include a comprehensive testing system to experiment with model prompts, a workflow script that allows for all tasks to be run with one command versus individually running each script and numerous small adjustments throughout the system. The code here is still all AI-generated with my ideas powering what’s created.

I’m sure I’m not objective, but for me this has turned into something real: it started as a curiosity, moved into a better understanding of how AI code generation could work, and is now something I’m using regularly. Over the weekend I attended several musical events and was able to generate more than 400 image descriptions from photos and videos I took.

The project lives on GitHub and has a readme that covers the basics of getting started. A guide for using the prompt testing script is also available. This is particularly helpful for trying out different models.

I’m always curious how AI writing works as well so asked GitHub Copilot to generate a second blog post about project developments. And of course, it is software, so there is also an issue list.

I won’t say for certain what’s next but my current plan is to work on a graphical version of the project to understand more about that environment with Python, create a prompt editor so changing the default prompts is easier and get this all working with Python packaging so install is easier.

Contributions, suggestions or pointers to tools that already do all of this are always welcome.


Updates to Image Description Toolkit

Several months ago I announced a highly experimental set of Python scripts I called The Image Description Toolkit. Consider it a fancy name for solving my goal: getting descriptions of the thousands of pictures taken on my iPhone, and on whatever phones I used over the past several decades, and keeping a permanent description of each photo. I’ve made some key updates, although I’d still categorize this as highly experimental.

Most notably, I’ve made it possible to build custom AI prompts, choose the model you use, and adjust the parameters used with the model, all through a configuration file.
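The post doesn’t specify the configuration file’s format, so the layout below is a hypothetical illustration of the idea only: named prompts, a model choice, and model parameters in one JSON file that a script reads at startup.

```python
import json

# Hypothetical layout -- the toolkit's actual schema may differ;
# see its readme for the real configuration format.
EXAMPLE_CONFIG = """
{
  "model": "moondream",
  "parameters": {"temperature": 0.3},
  "prompts": {
    "narrative": "Describe this image as a short story.",
    "technical": "Describe the photographic qualities of this image."
  }
}
"""


def load_prompt(config_text, prompt_name):
    """Pull the model name, model parameters, and one named prompt
    out of the text of a JSON configuration file."""
    config = json.loads(config_text)
    return config["model"], config["parameters"], config["prompts"][prompt_name]
```

Keeping prompts in a file like this is what lets you add or edit prompts without touching any script code.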

I’ve also updated the script that will convert files in the .HEIC format to .JPG and streamlined the output to HTML with a script that can be run. To be very clear, when I say I’ve done these things, I mean that all the code in this project was generated with AI through my prompting and refinement.

A readme for the project explaining how all this works is available. I also had AI generate a blog post about the project. You can find the full project on GitHub.

With all of those qualifications, I have found these tools of value. I’ve now generated more than 10,000 image descriptions running on my local computer. The Moondream model used through Ollama has been excellent: it is incredibly fast when used for batch processing, has some of the lowest memory requirements I’ve found, and still gives rich detail while being highly responsive to different prompts.
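For the curious, asking a local Ollama model such as Moondream to describe an image comes down to one chat call with the image attached to the message. The helper below only builds the request payload, so it runs without a server; the commented call shows how it would be used with the ollama Python package. This is my illustration of the general API shape, not the toolkit’s actual code.

```python
def build_request(image_path,
                  prompt="Describe this image in detail.",
                  model="moondream"):
    """Build the payload the Ollama chat API expects for image
    description: a single user message carrying the prompt text
    with the image file attached."""
    return {
        "model": model,
        "messages": [
            {"role": "user", "content": prompt, "images": [image_path]}
        ],
    }


# With the `ollama` package installed and a local server running
# (after `ollama pull moondream`), the actual call would be:
#   import ollama
#   reply = ollama.chat(**build_request("photo.jpg"))
#   print(reply["message"]["content"])
```

Batch processing is then just a loop over files, which is why a small, fast model matters so much at this scale.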

I plan to continue experimenting here over time. I want to make setup easier; I know about Python packaging but have found it doesn’t always work, so this all still requires manual installation of Ollama, Python, and the individual scripts. The readme file should walk you through this, though.

If you have feedback, know of other ways to accomplish these same tasks, or have suggestions on what else I should include here, feel free to let me know. I’ve learned a great deal about image processing, using Python, and AI code generation from these experiments. And of course, I now have permanent descriptions of more than 10,000 pictures.
