Since writing about the Image Description Toolkit (IDT) 4.0 Beta 1 in February, a lot of work has gone into the toolkit. A new beta is out, new models have been added, the prompt library has grown, and I found a perfect real-world dataset to put the whole system through its paces: NASA’s Image of the Day collection. This post is about what I learned, what the data showed, and why the right prompt can make all the difference.
What Has Changed Since Beta 1
The February post introduced IDT 4.0’s two main tools — the ImageDescriber GUI and the idt command line — along with support for Ollama, OpenAI, and Claude models. Since then, 4.0.0 Beta 3 has shipped with a number of additions worth knowing about:
- idt describe — a friendlier alias for idt workflow. Same command, easier to remember.
- idt redescribe — re-runs AI description on a set of images you have already processed, skipping the video extraction and conversion steps. Useful for quickly testing a new model or prompt on the same images without starting from scratch.
- idt guideme now accepts URLs — in addition to a folder path, you can paste a web page URL at the image folder prompt and IDT will download images from that page and continue the workflow automatically.
- Downloaded images organized by source — images downloaded from a URL are now placed in a subfolder named after the domain and page title, so you always know where a set of images came from.
The full list of prompts has also grown. IDT now ships with twelve built-in prompts, each designed to draw out a different aspect of an image:
- Narrative — a straightforward scene description, grounding you in what is present
- Detailed — everything Narrative gives you, plus more technical depth
- Concise — a brief, single-sentence summary
- Colorful — emphasis on color, tone, and visual atmosphere
- Technical — image quality, composition, exposure, and photographic characteristics
- Artistic — style, mood, and creative interpretation
- Simple — plain-language description suitable for a general audience
- Accessibility — structured for screen readers and assistive technology
- Comparison — designed for comparing two versions of the same image
- Mood — emotional tone and atmosphere
- Functional — what the image is for, not just what it shows
- AI Alt Text — three website alt text options at 25, 50, and 100 words, designed as a starting draft for human review before publishing
You can edit any of these or add your own through the Tools menu in ImageDescriber or by editing prompts.json directly.
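If you script your setup, you can also add a prompt programmatically. Here is a minimal sketch; the layout it assumes for prompts.json (a top-level "prompts" object mapping prompt names to prompt text) and the custom "historical" prompt are both illustrative assumptions, so check the file that ships with your install for the real schema.

```python
import json
from pathlib import Path

path = Path("prompts.json")  # wherever your IDT install keeps it
data = json.loads(path.read_text(encoding="utf-8"))

# Assumed layout: a top-level "prompts" object mapping prompt names to
# prompt text. Verify against your actual prompts.json before running.
data["prompts"]["historical"] = (
    "Describe this image, noting any details that suggest when and "
    "where it was taken."
)

path.write_text(json.dumps(data, indent=2), encoding="utf-8")
```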
When You Download Images, IDT Now Captures Alt Text Too
One feature that did not get much attention in the Beta 1 post: when IDT downloads images from a web page, it also captures the alt text that is already on those images. That data is stored alongside your downloaded images and is available for comparison when you run the analysis tools.
This turned out to be far more interesting than I expected once I chose my test dataset.
The NASA Opportunity
On April 1, 2026, NASA launched the Artemis II mission — the first crewed mission to the Moon since the Apollo program. NASA’s Image of the Day page had been building up to this for weeks with stunning photographs of the crew in training and the rocket being prepared, and then of the mission itself: Earthset seen from lunar orbit, the Milky Way from deep space, and the view from a crew capsule window looking back at a shrunken planet.
This was a natural fit for IDT. I used idt workflow to download 64 images directly from the NASA Image of the Day page and ran all twelve built-in prompts across multiple AI models. The dataset includes descriptions from Claude Haiku 4.5, Claude Sonnet 4.6, Ollama Gemma4 31b (cloud), Ollama Kimi-K2.5 (cloud), Ollama Moondream, Ollama Qwen3-VL 235b (cloud), GPT-4.1 Mini, and GPT-4.1 Nano — over 700 descriptions in total.
And because IDT captured the alt text from NASA’s page during download, I had a built-in comparison baseline.
NASA Does Quality Alt Text Already
Before getting into what IDT produced, it is worth saying clearly: NASA already writes excellent alt text. This is not a critique. NASA’s image descriptions are thoughtful, specific, and informative. They name people, identify locations, and include context that no AI can supply from an image alone.
Here is an example. The image is a photograph taken from inside the Orion spacecraft. An astronaut is silhouetted against a window, looking out at Earth floating in space.
NASA’s alt text:
NASA astronaut Reid Wiseman looks out of the Orion spacecraft’s rectangular window at Earth. He is barely visible in the dimly lit cabin, but Earth is bright. The entirety of Earth cannot be seen; in this view, Earth’s round shape is cut off. Earth itself looks like a bright blue ball with swirling white clouds all over it; at center is a brown and green landmass.
That is excellent alt text. It names the astronaut. It identifies the spacecraft. It describes what Earth looks like from that vantage point. An AI looking at the same image cannot do any of that.
Here is what Kimi-K2.5 produced for the same image with the Narrative prompt:
The image shows the interior of a spacecraft. The left side of the frame is filled with solid black darkness. Within this darkness, the silhouette of a human head appears in left profile, positioned approximately one-third from the left edge. The person’s forehead, nose, closed mouth, and chin are outlined by light entering from the right.
Good, accurate description of what is visible — but no name, no mission context, no spacecraft identification. The AI sees the scene; NASA’s writer knows the story.
Prompts Change What You Get
Here is where the power of prompting shows up. Take a different image: Earthset from the far side of the Moon — the Earth appearing as a crescent above the cratered lunar horizon.
NASA’s alt text:
Earthset From the Lunar Far Side
That is the title, essentially. It tells you what it is but nothing about what you are seeing.
Now watch what happens as IDT works through different prompts, using Kimi-K2.5 (cloud):
Narrative prompt:
The image presents a view of space dominated by the curved bulk of the Moon occupying the lower two-thirds of the frame. The scene contains no human figures, clothing, or artificial objects.
A starting point, but sparse.
Colorful prompt:
This image depicts Earth rising above the Moon’s horizon, with the planet’s partially illuminated disk displaying navy oceans and ivory cloud formations against the charcoal blackness of space, while the foreground shows the Moon’s cratered, uneven surface in shades of slate and charcoal gray. The lighting originates from the right side of the frame, producing hard-edged shadows across the lunar terrain.
Now we have color, light direction, texture. A vivid picture is forming.
Accessibility prompt (Gemma4 31b):
A wide-angle shot of Earth rising over the horizon of the Moon against a black void.
Concise. Functional. Ready for a screen reader.
AI Alt Text prompt (Kimi-K2.5, 25/50/100 words):
25 words: A stunning view of Earth rising over the Moon’s cratered horizon, with the blue planet visible as a crescent against the vast blackness of space.
50 words: A dramatic photograph showing the heavily cratered surface of the Moon in the foreground, with planet Earth visible as a blue and white crescent rising above the lunar horizon. The stark contrast between the gray moonscape and vibrant Earth suspended in black void of space creates a breathtaking cosmic perspective.
100 words: This awe-inspiring image captures the iconic “Earthrise” phenomenon, showing the rugged, crater-covered surface of the Moon dominating the foreground while the delicate blue and white crescent of Earth emerges above the lunar horizon. The photograph reveals the stark beauty of the moonscape’s gray, pockmarked terrain, illuminated by sunlight, contrasting dramatically with the vibrant colors of our home planet suspended in the infinite blackness. This perspective, captured from lunar orbit, emphasizes the profound isolation of the Moon and the fragility of Earth, offering a humbling reminder of our planet’s place in the cosmos.
Each prompt gives you something different. None replaces NASA’s contextual knowledge. But together they build a much fuller picture than any single description could.
Introducing the AI Alt Text Prompt (Experimental)
The eleven standard prompts I originally planned to include in this version of IDT are all about understanding an image in depth. But there is also a practical question: can AI generate usable alt text for web images?
I added a twelfth experimental prompt — aialttext — that asks the AI for three versions of website alt text at different lengths: 25 words, 50 words, and 100 words. The goal is to give whoever is publishing the image options to choose from depending on the context. I will give away the ending: the answer is a qualified yes, and IDT now ships with twelve standard prompts.
The yes is qualified because AI-generated text should always be reviewed by a human. “AI can assist in generating alt text” is probably the more accurate answer.
The results above show what Kimi-K2.5 produced for the Earthset image. Let me give one more example. Here is NASA’s alt text for the Saturn image taken by the James Webb Space Telescope:
NASA’s alt text:
NASA Webb, Hubble Share Most Comprehensive View of Saturn to Date
Kimi AI Alt Text — 25 words:
Infrared image of Saturn captured by Webb Telescope on November 29, 2024, showing bright glowing blue rings and three labeled moons: Janus, Dione, and Enceladus.
This is interesting. The AI correctly read the labels on the image — Webb infrared, the date, the moon names. NASA’s alt text is a headline; the AI’s version is descriptive. Neither is wrong. They serve different purposes.
But here is the point that matters most: you should not publish AI-generated alt text without reviewing it first.
Unless you would automatically publish an AI-generated image on your website, you should not automatically publish AI-generated alt text either. Alt text is not decoration. It is the description that a person using a screen reader gets instead of seeing the image. Getting it wrong — or getting it confidently wrong, as AI sometimes does — causes real harm.
The AI cannot name people it has not been trained on. It may misread or invent labels. It may describe what looks plausible instead of what is actually there. Domain knowledge matters enormously. NASA’s writers know the astronauts by name, the spacecraft by model, the mission by number. AI does not.
Used as a starting point for a human editor, the AI alt text prompt can save time and surface details a writer might miss. Used as a replacement for human review, it introduces exactly the kind of accessibility failure that ruins the experience for the people who most need it.
Kimi-K2.5 and Gemma4: Same Prompt, Different Voice
One of the things I wanted to know was whether two different models given the exact same alt text prompt would produce meaningfully different results — or whether the prompt would dominate and the model would not matter much.
The short answer: both models followed the three-length structure cleanly, but they have genuinely different voices. And the differences matter for publishing.
Speed is the most obvious difference — and it comes with a lesson. Kimi-K2.5 averaged 116 seconds per image for this run. Gemma4 averaged 14 seconds. For 64 images, Kimi took just over two hours; Gemma took about 15 minutes.
What makes that gap interesting is that Kimi is not normally a slow model. Across all the other prompts in this dataset, it averaged between 11 and 36 seconds per image. The aialttext run was a significant outlier. My hypothesis is that asking for three outputs of different lengths in a single prompt caused Kimi to generate substantially more text per image — and longer outputs simply take longer to produce. Gemma handled the same multi-part prompt without the same time cost.
The practical takeaway: any time you combine things in a prompt — multiple outputs, multiple formats, multiple lengths — it is worth running a small test batch first to understand how your model of choice responds before committing to 64 images. What works efficiently with one model may behave very differently with another.
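To make the arithmetic concrete, here is a minimal sketch of that test-batch approach: time a few images, then extrapolate to the full run. The per-image averages below are the ones reported above; everything else is illustrative.

```python
# Extrapolate a full run from a small timed test batch.
def estimate_minutes(avg_seconds_per_image: float, total_images: int) -> float:
    return avg_seconds_per_image * total_images / 60

# Per-image averages observed in this dataset's aialttext run.
for model, avg in [("Kimi-K2.5", 116), ("Gemma4 31b", 14)]:
    print(f"{model}: ~{estimate_minutes(avg, 64):.0f} min for 64 images")
# Kimi-K2.5: ~124 min (just over two hours); Gemma4 31b: ~15 min
```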
Formatting differed in small but telling ways. Kimi labeled its sections **25 words:**, **50 words:**, **100 words:** — a clean, precise claim. Gemma used **Short (approx. 25 words)**, **Medium (approx. 50 words)**, **Long (approx. 100 words)**. Gemma’s “approx.” qualifier is more honest: neither model is actually hitting the word counts precisely, but Gemma at least acknowledges it.
Here is the same image — an astronaut silhouetted against a spacecraft window, with Earth filling the frame outside — described by both models.
Kimi-K2.5:
25 words: Silhouette of an astronaut gazing at Earth through a spacecraft window, showing blue oceans, swirling white clouds, and the planet’s curvature against the black space.
50 words: Astronaut’s silhouette visible against a spacecraft window, gazing at Earth suspended in the blackness of space. The planet displays vibrant blue oceans, swirling white cloud formations, and hints of landmasses. The curved horizon emphasizes Earth’s spherical shape, creating a striking contrast between the dark interior and the illuminated planet beyond.
100 words: The dark silhouette of an astronaut’s profile is visible against a spacecraft window, gazing at the stunning view of Earth from orbit. The planet fills the window with vibrant blue oceans, swirling white cloud formations, and visible landmasses partially obscured by haze. The curvature of Earth is prominently displayed against the infinite blackness of space, creating a dramatic contrast between the illuminated planet and the dark void surrounding it. The spacecraft interior remains shadowed, emphasizing the isolation of space travel while highlighting the breathtaking beauty of our home planet. This perspective showcases Earth’s fragility and the profound experience of orbital observation.
Gemma4:
Short (approx. 25 words): Silhouette of an astronaut looking through a spacecraft window at a vibrant blue and white Earth against the deep black void of space.
Medium (approx. 50 words): A dramatic silhouette of a person’s profile, likely an astronaut, gazing through a rounded spacecraft portal. Outside the window, the Earth is visible as a brilliant blue marble with swirling white clouds and brown landmasses, contrasting sharply against the absolute darkness of the surrounding outer space.
Long (approx. 100 words): A poignant, high-contrast photograph taken from inside a spacecraft. In the foreground, the dark silhouette of an astronaut’s face is seen in profile, looking out through a curved observation window. The window frames a breathtaking view of planet Earth, showcasing deep blue oceans, intricate white cloud patterns, and the reddish-brown hues of a continent. The stark contrast between the pitch-black interior of the cabin and the luminous, glowing sphere of the planet emphasizes the isolation of space and the fragile beauty of Earth seen from a low-orbit perspective.
Neither names the astronaut (Reid Wiseman). Neither identifies the spacecraft (Orion). That knowledge has to come from the human editor. But as starting drafts, both are genuinely useful.
Kimi reads as more narrative and expansive. The 100-word version builds toward a conclusion — “the profound experience of orbital observation.” Gemma is tighter and more visual — “the reddish-brown hues of a continent.” Different writers will reach for one or the other depending on the context and tone of the publication.
There is also a case where model voice led to an actual error worth noting. For the Earthset image — Earth appearing above the lunar horizon, taken from the far side of the Moon — Kimi’s 100-word version referred to it as the “Earthrise” phenomenon. Earthrise and Earthset are different events. Earthrise famously refers to the Apollo 8 photograph from lunar orbit. This image, taken by the Artemis II crew, shows Earthset from the lunar far side. Kimi reached for the famous reference but got it wrong. NASA’s captured alt text — “Earthset From the Lunar Far Side” — was correct. Gemma described it accurately without using either term.
This is exactly why human review matters before publishing. The AI knew the image was significant. It chose a famous, evocative label. But the label was wrong. A human editor who knew the difference would catch it instantly. A human editor who did not might publish the error.
Follow-Up Questions and Chat
One more IDT capability worth highlighting, now that the NASA dataset illustrates it well. After any image has been described, IDT lets you ask follow-up questions — press F in the ImageDescriber GUI or use the CLI. You can switch to a different model for the follow-up if you want.
For example: after running the Narrative prompt on the Artemis II launch photograph, you might ask: “How does this compare to a Saturn V launch in terms of visual scale?” Or for the supernova remnant: “What telescope instruments were used to capture the different wavelengths in this image?” The AI does not always get these right, but the capability is there, and for scientific images especially, the follow-up question is often where the real value is.
IDT also has a freestanding chat mode (press C in ImageDescriber) for model-to-model conversation without any image attached.
The NASA Meatball
Some images are so iconic that AI models reference them by name without stopping to describe them — and the NASA meatball logo is the clearest example in this dataset.
The term “meatball” appeared in 61 descriptions across 12 images. In 11 of those images, the logo appeared as an incidental background element — on a wall at JPL, projected as a backdrop behind a rock sample display, on the tail of a T-38 jet, on the building behind the Artemis II crew. Across all of those background appearances, models named the logo 26 times and actually described what it looks like only 7 times. Most of those 7 were borderline — noting that the logo was “circular” or appeared on the wall. Almost no description explained what a viewer would actually see.
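Counts like these are straightforward to reproduce once the descriptions are compiled into a CSV (see the Data section below). A minimal sketch, assuming a hypothetical filename and hypothetical file and description columns:

```python
import csv
from collections import defaultdict

# Tally which images' descriptions mention a keyword. The filename and
# the "file"/"description" columns are assumptions for illustration.
mentions = defaultdict(int)
with open("combined_descriptions.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        if "meatball" in row["description"].lower():
            mentions[row["file"]] += 1

print(f"{sum(mentions.values())} descriptions across {len(mentions)} images")
```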
NASA’s own alt text set the tone. For the JPL auditorium selfie, NASA’s caption reads: “In the far background, there is a NASA ‘meatball’ insignia and the letters ‘JPL’ on the wall.” The name, nothing more. The AI models followed the same pattern. If you have never seen the NASA meatball and no one has ever described it to you, most of these descriptions left you exactly where you started.
The one exception: when the NASA logo itself was the image subject (a standalone logo PNG in the dataset), models gave rich, accurate descriptions across virtually every prompt. Gemma4’s AI alt text treatment is a good example: “The NASA insignia, known as the ‘meatball’ logo. It consists of a blue sphere representing a planet, containing the word ‘NASA’ in bold white text. A red V-shaped vector symbolizes aeronautics, while a white orbital ring and scattered white stars represent space exploration.” That’s exactly what someone unfamiliar with the logo needs. The problem is that this only happened when the logo was the foreground subject — never when it appeared in the background.
What the NASA meatball actually looks like: A circular emblem on a deep blue field, with the word “NASA” in large bold white letters across the center. A red swoosh — sometimes described as a wing or chevron — cuts diagonally across the circle from lower-left to upper-right, representing aeronautics. A white curved orbital path arcs around the circle, suggesting a spacecraft in orbit. Small white stars are scattered through the blue field. The overall effect is dense but balanced: space, flight, and identity in one symbol.
Follow-up questions help, but they aren’t foolproof. When asked as a standalone follow-up — “Please describe what the NASA meatball looks like to someone who has never seen it” — GPT-4.1 Mini returned an accurate description on the first try: blue background, red wing-shaped vector, white stars, bold white NASA lettering. Claude Haiku’s response, by contrast, described it as having “a large red circle dominating the center” as the main element — significantly wrong. The logo is primarily blue. Even when a model knows the name of an iconic image, its ability to accurately describe that image from memory alone varies.
The prompt implication. The accessibility and AI alt text prompts in IDT currently do not explicitly instruct models to describe iconic logos and emblems when they appear incidentally in an image. Adding language like “when referencing iconic logos, insignia, or well-known symbols by name, include a brief visual description of what the symbol looks like” may close this gap in a future prompt revision.
Data Completeness
In a few instances, the data for a prompt and model combination are incomplete. In the case of Moondream, some prompts — such as Technical — failed to return any response after multiple tries.
In the case of Claude, not all prompts were run, given the amount of data already gathered and the costs associated with Claude models. In addition, because of image sizing and the way IDT currently handles it, some images exceeded Claude’s size limit and were not described.
The AI Alt Text prompt was only run on Kimi-K2.5 and Gemma4 models for this dataset.
The Data
You can obtain the full set of image descriptions and prompts used for this set of NASA images at the following locations.
The CSV includes descriptions from eight models across six providers: Claude Haiku 4.5, Claude Sonnet 4.6, Gemma4 31b, Kimi-K2.5, Moondream, Qwen3-VL 235b, GPT-4.1 Mini, and GPT-4.1 Nano. For a brief description of each model and links to official documentation, see the models file above.
Try It Yourself
The latest version of the Image Description Toolkit can be obtained from the GitHub releases page or on the project page at theideaplace.net. Full documentation is in the User Guide on GitHub.
If you want to replicate the NASA experiment, try:
idt workflow https://www.nasa.gov/image-of-the-day/
That will download the current Image of the Day collection and describe the images using your configured model and prompt. From there, idt combinedescriptions will compile everything into a CSV you can explore in Excel or any spreadsheet tool.
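If you would rather explore in code than in a spreadsheet, here is a minimal sketch with pandas. The filename and the file, prompt, model, and description column names are assumptions, so check the header row of the CSV that idt combinedescriptions actually writes.

```python
import pandas as pd

# Filename and column names are illustrative assumptions; inspect the
# CSV's header row first.
df = pd.read_csv("combined_descriptions.csv")

print(df["model"].value_counts())  # descriptions per model

# Every description of one image, across all prompts and models.
earthset = df[df["file"].str.contains("earthset", case=False, na=False)]
print(earthset[["model", "prompt", "description"]])
```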
Questions, issues, and pull requests are welcome at github.com/kellylford/Image-Description-Toolkit.
*This blog post used AI for parts of the image research, content organization, and overall editorial review.*