A 4.0Beta1 release of my Image Description Toolkit is now available. If you are unfamiliar with this project, it provides apps to handle image description using Ollama AI models that can run locally. The toolkit is specifically aimed at handling large numbers of images and videos. It offers both a command line version (IDT) and a graphical version (ImageDescriber), and lets you generate descriptions from a series of predefined prompts or add your own to tailor the sort of information you get back from an AI model. The toolkit also supports OpenAI and Claude models if you supply your own API key.
What the Image Description Toolkit Is Not
It is important to understand what the toolkit is not. Countless apps exist for getting image descriptions. I use many myself. Most of these are aimed at describing one or two images, though, and often the description must be generated anew each time you want the image described. IDT is not meant to replace these experiences directly.
What the Image Description Toolkit Is
The IDT is aimed at situations where you have dozens, hundreds or thousands of photos you want described and you want permanent descriptions of the images. IDT also makes it easy to generate multiple descriptions based on different AI prompts and to ask follow-up questions. If you want more emphasis on colors in an image, for example, you can use the ‘Colorful’ prompt. The ‘Technical’ prompt gives you more details about the image quality. All prompts are customizable and you can add your own prompts to the system.
Quick Start
After installing the IDT, you have two main tools for image description. ImageDescriber is the first option you’ll want to explore for describing smaller groups of images and generating multiple descriptions per image. The default install puts this in ‘c:\idt\imagedescriber.exe’. Launch the app, press ctrl+l or choose File:Load Directory, and then arrow through the images that have been loaded. Press the letter p on any image to process that image, or choose Processing:Process All Undescribed to have the full set of loaded images described. In either case, you’ll be prompted for the AI provider, model and prompt you want to use. The system defaults to Ollama, Moondream and the Narrative prompt.
The second option is the IDT command line. Open a command prompt and change to the install location, which is ‘c:\idt’ for a standard install. Enter ‘idt guideme’ and the system will ask you a series of questions and then kick off a workflow to describe your images. Alternatively, enter ‘idt workflow <path to images>’ and the system will just start describing images using the same defaults as mentioned earlier.
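If you have several folders to describe, a small script can queue one workflow after another. The following is only a minimal Python sketch and not part of the toolkit. It assumes the standard ‘c:\idt’ install location and that the command line tool is a file named ‘idt.exe’ in that folder (the file name is my guess based on the ‘idt’ command). The folder paths are hypothetical, and the only command used is the ‘idt workflow <path to images>’ form described above.

```python
import subprocess
from pathlib import PureWindowsPath

# Assumption: standard install location from the text. The executable
# name "idt.exe" is a guess based on the 'idt' command.
INSTALL_DIR = PureWindowsPath(r"c:\idt")


def build_workflow_command(image_folder, install_dir=INSTALL_DIR):
    """Return the argument list for 'idt workflow <path to images>'."""
    return [str(install_dir / "idt.exe"), "workflow", str(image_folder)]


def describe_all(image_folders, dry_run=True):
    """Build one workflow command per folder and return the list.

    With dry_run=True the commands are only printed, which is a safe
    way to check them before any images are processed.
    """
    commands = [build_workflow_command(folder) for folder in image_folders]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(cmd, cwd=str(INSTALL_DIR), check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical folders; replace with your own.
    describe_all([r"D:\photos\2023", r"D:\photos\2024"], dry_run=True)
```

Running it as written only prints the commands. Change dry_run to False once the printed commands look right for your setup.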
Additional Functionality
In addition to describing images you already have, both IDT and ImageDescriber can download images from a web page and describe them. In IDT, just enter ‘idt workflow <url>’. In ImageDescriber, press ctrl+u and provide the address of a web page.
ImageDescriber also allows you to ask follow-up questions about any image using the original AI provider and model or by selecting another of your choice. Just press f on any image or description in the app.
Monitoring Costs for Paid Models
When you use Claude or OpenAI models, the number of tokens used for the AI prompt (which includes the image) and for the response is shown as part of the image metadata. Review these numbers to gauge the cost of using the various models from these companies. Both companies publish per-token pricing for their different models.
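To turn those token counts into a rough dollar figure, you can multiply them by the provider's published prices. The sketch below is a hedged illustration of that arithmetic only. The TokenUsage class, the function name and the prices are all hypothetical, and it assumes prices are quoted per million tokens with separate rates for prompt and response tokens. Check each provider's current pricing page for real numbers.

```python
from dataclasses import dataclass


@dataclass
class TokenUsage:
    """Token counts for one described image (hypothetical structure)."""
    prompt_tokens: int    # prompt tokens, which include the image
    response_tokens: int  # tokens in the model's response


def estimate_cost(usages, prompt_price_per_million, response_price_per_million):
    """Estimate total cost from token counts and per-million-token prices."""
    prompt_total = sum(u.prompt_tokens for u in usages)
    response_total = sum(u.response_tokens for u in usages)
    return (
        prompt_total * prompt_price_per_million
        + response_total * response_price_per_million
    ) / 1_000_000


if __name__ == "__main__":
    # Placeholder token counts and prices, not real provider rates.
    usages = [TokenUsage(1200, 300), TokenUsage(800, 200)]
    cost = estimate_cost(usages, prompt_price_per_million=2.0,
                         response_price_per_million=8.0)
    print(f"Estimated cost: ${cost:.4f}")
```

Dividing the result by the number of images gives a per-image average, which is handy for projecting the cost of a larger batch before you run it.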
A Basic Chat with AI
Finally, ImageDescriber has a basic chat system that works without images and can use any AI model you have installed or configured. Press c when the program is open or choose Process:Chat. Next, select your model and the chat experience will open. Enter your question and the AI will respond. You can press shift+tab to go to the chat history. Note that currently the only way to save the chat history is to press ctrl+a and then ctrl+c to copy the full chat and paste it into another app. Work to fully integrate saving of chats into an Image Description Toolkit workspace file is ongoing.
User Guide and Issue Reporting
A full User Guide is available for the IDT. You can also report issues from within the program with the Report an Issue menu item on the Help menu or by visiting the New Issue form on GitHub. Check the list of current known issues as well.
Windows Version Today, Mac Coming Soon
This release is for Windows, with a Mac version coming soon. The Mac version is in fact done, and those who are so inclined can build it from source on GitHub. Look in the wxmigration branch under MacBuilds for build scripts. I’m not officially releasing the Mac version yet because I am still refining how VoiceOver reads lists within the program and doing further testing.
Getting the Image Description Toolkit
You can download the 4.0Beta1 release directly from my projects page.
2/14 Update on Downloading
My application is currently not code signed. As a result, you may be warned that the file is untrusted or otherwise not safe to download. I run security checks before making my files available and am still researching costs for code signing on Windows. You can try downloading a .zip file that contains the installer. Unzip the file and run the installer. You’ll still likely have to deal with untrusted warnings, but they are a bit easier to get through at that point than during the browser download process. Some browsers may still present the warning when you download the .zip file.