Tag: accessibility

Image Description Toolkit 3.6.0 Available

A 3.6.0 release for my Image Description Toolkit is now available. The release features a new --redescribe option as well as support for two Microsoft Florence-2 models via Hugging Face.

Redescribe

The redescribe option can be used with the command line version of the toolkit to redescribe the same set of images without the need to extract video frames or convert image files from HEIC to JPG again. This can save time and allow you to quickly test different prompts on the same set of images. To use the redescribe option, simply add --redescribe as the first option in an “idt workflow” command, followed by the standard --provider, --model and --prompt-style options. Use the workflow directory from a previous image description workflow as the file directory for images to be described. Product documentation has full details.
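As a sketch of the general shape such a command takes (the angle-bracket values are placeholders of my own, not values from the release; the product documentation has the authoritative syntax):

    idt workflow --redescribe --provider <provider> --model <model> --prompt-style <style> <previous-workflow-directory>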

Hugging Face and Florence Models

Hugging Face is described as “The platform where the machine learning community collaborates on models, datasets, and applications.” It contains a wide range of AI models, datasets and apps across the full spectrum of AI tasks.

The Florence-2 AI model is described in part as “an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks.” It runs locally and requires no separate installation. Note that the model will download on first use, so expect the first image description run to take longer. Also be aware that, for a variety of reasons, the Florence-2 model is only supported in the command line version of the Image Description Toolkit.
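For readers curious what driving Florence-2 through Hugging Face looks like, here is a minimal sketch following the pattern on the Florence-2 model card. It is an illustration, not the toolkit’s actual code, and the model variant and image filename are placeholders:

    import torch
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Weights download from Hugging Face on first use, hence the longer first run.
    model = AutoModelForCausalLM.from_pretrained(
        "microsoft/Florence-2-base", trust_remote_code=True
    ).to(device)
    processor = AutoProcessor.from_pretrained(
        "microsoft/Florence-2-base", trust_remote_code=True
    )

    image = Image.open("example.jpg")  # placeholder filename
    task = "<DETAILED_CAPTION>"  # Florence-2 task prompts select the kind of output
    inputs = processor(text=task, images=image, return_tensors="pt").to(device)
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=256,
    )
    raw = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    print(processor.post_process_generation(raw, task=task, image_size=image.size))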

You can get the latest update for the Image Description Toolkit from my projects page at www.theideaplace.net/projects or the GitHub release page.

2 Comments

“I Just Want to Be Included”: How a Simple Plea Sparked an Unintentional AI Accessibility Experiment

Editorial cartoonists, with their ability to capture the essence of an idea in a picture or two, must have quite the talent and quite the job. The phrase “a picture is worth a thousand words” certainly can ring true.

A recent experience verifying a reportedly fixed issue in Microsoft’s Copilot app led to that all-too-common situation in accessibility of taking one step forward and a couple in reverse. My issue was indeed fixed, but I discovered new accessibility issues to report.

It was in that moment of familiar frustration that the app presented me with a default prompt to “Create an image of…”. My frustration, my curiosity about AI, and my interest in editorial cartoons suddenly converged, sparking an unintentional experiment with a fundamental question at its core: could an AI capture my feelings in a picture, and then accurately describe what it had just made?

I gave Copilot this prompt:

“Create an image of the Microsoft logo with a man’s face crying in sadness imposed over it and a cartoon bubble saying, ‘I just want to be included please.’” And here is where things turned interesting.

I used Copilot, Gemini, ChatGPT, and Claude (the last two through JAWS Picture Smart functionality) to get descriptions of the resulting image. I expected some variability, but take note in particular of the Copilot description. It leaves out critical details and gives a vastly different version of the very image created through Copilot. I have no illusion that Copilot would have the ability to know that it created the image, but I really do wonder why the details are so different with Copilot. Did this trigger some internal content flags in Copilot, for example?

It is also interesting to note how Gemini’s interpretation of the image does not entirely match my intent. Gemini seems to interpret Microsoft as the entity being left out, when in fact my intent was the opposite: the person crying was the one being left out.

This experiment raises obvious questions about the capabilities and challenges of today’s AI technology. But for me, it highlights a much deeper, more personal challenge.

As someone who understands the visual world entirely through words (alternative text, human-provided descriptions and, of late, AI-based image descriptions), this leads me to a fundamental question: can my words ever be enough to accurately portray a feeling or an idea for an AI to translate into a picture I cannot see but trust represents my intent? If they are, will I have the ability to know this with confidence? Had I spent the time to write those thousand words in the prompt, would the result have been any closer to the image in my mind’s eye?

The image generated by Copilot can be found at http://theideaplace.net/wp-content/uploads/2025/06/CopilotMSFTLogoCryingAndCartoonBalloon.png.

The responses from the various AI descriptions are below.

ChatGPT

The image is a meme featuring the “Crying Jordan” face superimposed over the Microsoft logo. The face has a speech bubble coming from it, saying, “I just want to be included please.” The background shows the Microsoft logo, which includes its four colored squares and the word “Microsoft” in a large font.

The meme seems to humorously convey the idea of being left out or wanting inclusion.

Claude

This image is a popular internet meme that combines the Microsoft logo (including its distinctive four-colored window-pane design in orange and green) with a crying face. The person in the image appears emotional with tears on their face, and there’s a speech bubble saying “I just want to be included please.” The image is meant to be humorous, playing on the idea of feeling left out, with “Microsoft” written in the company’s characteristic gray font at the bottom of the image. The background is plain white, making the elements of the meme stand out clearly.

Copilot

The image features the Microsoft logo, consisting of four colored squares—red, green, blue, and yellow—arranged in a 2×2 grid. Beneath the logo, the word “Microsoft” is displayed in a bold, gray font. On the right side of the image, there’s a speech bubble containing the text: “I just want to be included please.” The placement of the speech bubble suggests a plea for inclusion or recognition, adding an element of sentiment to the otherwise corporate design. Is there something specific you’re curious about regarding this image?

Gemini

This is a popular internet meme that combines several elements to create a humorous and satirical image. At the center of the image is the disembodied head of a person who appears to be a Black man, crying. His face is contorted in sadness, with visible tears welling in his eyes and streaming down his cheeks. This specific image is widely recognized as the “Crying Jordan” meme, which features a photograph of basketball legend Michael Jordan during his 2009 Basketball Hall of Fame induction speech. In the background, positioned behind the head, is the Microsoft logo, composed of four squares: orange-red in the top left and green in the top right. Below the head, the word “Microsoft” is intentionally misspelled as “Micrososft.” To the right of the head is a cartoon speech bubble with the text, “I just want to be included please.” The overall composition suggests that the corporation Microsoft is being personified as the “Crying Jordan” figure, expressing feelings of sadness, failure, and exclusion, as articulated by the text in the speech bubble. The meme is typically used to mock a perceived failure or a lack of popularity of a Microsoft product or service in comparison to its competitors.

Leave a Comment

Clearing the Chaos: Using A Library Card and U.S. Newsstream for an Improved Reading Experience

While the digital world offers an abundance of online news sources, accessibility is still a work in progress far too often. It is commonplace to spend more time navigating to content than reading that content when using screen reading technology. Poor heading structure, inaccessible ads mixed in with the news story, multimedia that plays silently and grabs focus, and much more take away from the reading experience.

A public library card and a resource known as U.S. Newsstream, offered by many libraries, are one solution to include in your reading toolkit.

ProQuest’s U.S. Newsstream is a gateway to an improved reading experience. The full text of hundreds of publications is available, and helpfully, each publication has its own URL, so you can easily access a specific issue of a newspaper or other resource with a very screen reader-friendly view of all the article headlines. In addition, when full text is available, you can read the article free of ads or other distractions mixed in with the news.

Using this method of accessing content requires a few preparation steps. First, you need a library card for a library that subscribes to this service, and you will need to know your library barcode.

Second, and this is critical, you need to sign into the service through your library’s access point. You can typically find this on your library’s home page under an area called subscription databases, online resources or some other link pointing to the various databases available from your library.

For example, my local library is the Madison Public Library and their list of resources is available under the eResources link.

Following the U.S. Newsstream link, you are prompted to provide library information. Typically this involves some variation of providing your library barcode and at times indicating your library. Again, it is vital you start with this path before going to the next step.

Once you are authenticated to U.S. Newsstream, you can search and use the database directly. However, what has worked well for me is accessing publications directly.

U.S. Newsstream has a downloadable list of all their publications. I took the liberty of downloading the file, turning it into a table within Excel and using Excel’s First Column feature to make it fairly screen reader friendly, and have made the file available to use.

To access a publication, open the file I’ve created and locate the publication you want to read. Titles are in the first column.

Next, locate the URL for the publication. Hint: with focus in the cell containing the title, press CTRL+Right Arrow and you will jump to the last column in the table row, which contains the URL. Press CTRL+C to copy the URL and return to your web browser.

Move focus to the address bar in whatever browser you are using and paste the URL you copied and press Enter. This will take you to the publication page for the resource of interest. Combo boxes allow you to select the year and date for an issue of the publication and a Show Issue Content button brings up the content from that publication for the chosen day.

Article headlines are marked up as headings and links. Pressing enter will load the article.

The article content, when the publication has full text, starts under a heading aptly named Full Text. At this point you can simply start reading the article. Use whatever method you prefer to navigate back to the list of articles for the publication when finished reading.

As mentioned earlier, it is key that you are signed into the general U.S. Newsstream service before accessing the URL for the publication. If you are not, the URL will not work as described. You will be told to sign in through a library but without options for doing so directly.

The Excel file listing publications has multiple columns of information. These include items such as the dates for which content is available, gaps in coverage, and more.

U.S. Newsstream, other ProQuest databases and resources from your library offer much more functionality and information than outlined here. This is just a starting point.

Finally, I am a firm supporter of a robust and independent news media. Even though I access many articles in the way I’ve outlined here, I do also support publications through subscriptions, donations or other financial contributions. I urge anyone who is able to do so to do the same. Those working in the media have the same bills and needs in life as we all do, and we’ve already seen dramatic losses in meaningful careers in the profession.

Leave a Comment

If You Tag, I Will Read

In my experience, a fundamental disconnect exists between accessibility and the investing world when it comes to a statement blasted all over every investment web site when you are about to invest: you should carefully read the prospectus before making any investment. Good luck with that, as in my experience these documents are rarely, if ever, properly tagged for accessibility.

As just one example, deep within these multipage documents are typically tables of the individual investments the mutual fund or ETF holds. Yet in every prospectus I’ve tried to read, more than 50 in the last few weeks from at least 20 different companies, these tables are not tagged properly.

This is just the tip of the iceberg when it comes to accessibility of these documents. Should we talk about the charts and graphs in the same documents?

The Securities and Exchange Commission should mandate that, at least to sell securities in the U.S., all investment materials be WCAG 2.1/2.2 AA conformant, and give the industry one year to comply. If legislation is needed to make such a mandate enforceable, then Congress should craft and enact it.

Given the number of employers who include 401(k) programs as part of employment, every company that offers one to employees should be holding the investing world accountable for this today.

If anyone knows of an investment company or ETF or mutual fund provider who actually does these documents correctly today, I’d love to hear about it.

Leave a Comment

Variations on an Automatic Image Description

Reading through Twitter today, the following tweet showed up as a retweet on the timeline of one of the people I follow.

Doc🐕 – @DocAtCDI: A truck loaded with thousands of copies of Roget’s Thesaurus spilled its load leaving New York

Witnesses were stunned, startled, aghast, stupefied, confused, shocked, rattled, paralyzed, dazed, bewildered, surprised, dumbfounded, flabbergasted, confounded, astonished, and numbed.

I found the tweet amusing and was going to retweet it, but noticed it had a picture without any alt text. This led me to be curious about what was in the picture. From the tweet text, I’m assuming some form of vehicles on a road with a bunch of books scattered about is most likely.

I suspect most reading this know that iOS has the ability to automatically describe pictures. This functionality started in iOS 14. When using VoiceOver, you can get a short machine-generated description of a picture such as the one attached to this tweet.

Newer versions of iOS extended this functionality with a feature called Explore Image, which allows you to use VoiceOver to step through individual objects recognized in the image. It can be accessed with a rotor option when focused on the image. Here is where the experience gets a bit interesting.

My go-to Twitter app on the iPhone is Twitterrific. The accessibility of the app has been outstanding for years, and the maker has been highly responsive if issues do creep in.

I’ve also been exploring another highly accessible Twitter app named Spring. So far I’ve had a great experience with this app as well.

As one would expect, both Twitterrific and Spring offer the ability to view images included with tweets. When images are viewed in either app, the VoiceOver automatic image description and Explore Image functionality work. Differences are plainly obvious, though, when the same picture is viewed in the two apps using the same automatic image description and exploration technology.

First off, the automatic description when viewing the image in Twitterrific says:

an illustration of vehicles on a road X. VETERAN’S PLUMRNO. Rall

That same image viewed in Spring yields the following automatic description:

a group of cars driving on a highway ETERAN ‘S PLUMPING

Both descriptions mention that the picture deals with vehicles on a road in some fashion and include what I’d suspect is the text of a sign from a plumbing company on a van or truck in the picture. Again, the descriptions come from Apple, not the individual apps.

A picky point, but cars do not drive; people drive them. I might not know for certain what is in the photo, but I am quite confident it isn’t a bunch of Teslas with self-driving mode engaged.

It is also interesting how the image description when using Spring is a bit more detailed. It uses the terms highway and cars, whereas the Twitterrific version is more generic in nature. The detail about cars when using Spring is even more interesting when using the Explore Image feature to review the individual objects in the picture.

Again, the newest versions of iOS added a feature called Explore Image to VoiceOver. Focus an image, change the VoiceOver rotor to Actions and one of the choices will be Explore Image. This opens a pop-over experience with individual objects from the picture. You can use VoiceOver previous and next commands to move from object to object and have them highlighted visually in the picture.

Here are the objects from the picture in the tweet I mentioned when explored with Twitterrific:

  • Automobile near left edge
  • Automobile Centered
  • Automobile near right edge

Recall how the automatic description for Spring talked about cars driving on a highway? One can only wonder where the cars went and where the train came from when using the Explore Image feature. Here is what is reported when exploring the image in Spring.

  • Van near bottom-left edge
  • Van near right edge
  • Van near bottom-left edge
  • Train near top edge

Automatic image descriptions are another helpful tool for shaping the accessibility landscape. They’ll be even more impactful if the technology continues to advance, reducing the variability that something as simple as viewing an image in a different program seems to introduce, and improving the accuracy and detail of what is described.

Leave a Comment

A Good Example Of What Not To Do and Using Color To Convey Information

I’m not a huge golf fan but with the Ryder Cup being played here in Wisconsin, I was a bit curious about the Whistling Straits course design. There is a good hole-by-hole guide that gives a description of the course.

Selecting a more details link for any hole takes you to a page with additional information. I know enough to know that golfers can start from different tee positions, and I assumed the numbers for each hole represented the distance from those locations. That’s largely where my understanding of the numbers stops, so I was curious why there were five numbers for each hole.

A brother of mine tells me that for those familiar with golf, the colors for each number are fairly established as far as what they mean. Black represents the distance for professionals for example.

This, to me, is an excellent example of what not to do in web accessibility: conveying information with color alone. For those who do not see the colors, however obvious they might be to golfers, the numbers by themselves are clearly not meaningful. Similarly, for those less familiar with golf, I contend attaching a descriptive word to each number would be of benefit.

This is also an illustration of why manual review of web accessibility is so important. I ran multiple accessibility tools on one of these pages. Some contrast errors with other text on the page were flagged but not a single automated tool called attention to these numbers. Automated testing is just not at the point to handle that level of analysis.

This is going to be an example I add to my learning materials on web accessibility. For me it illustrates the concept of not using color alone quite well.

Leave a Comment

University of Wisconsin Hangs Out No Screen Readers Allowed Sign For Big Ten Opener

On Friday, the University of Wisconsin Badgers kicked off the COVID-19-influenced 2020 football campaign with a resounding 45-7 victory over the Illinois Fighting Illini. Like much in this year of change, Camp Randall was empty of the typical 80,000 fans.

To bring some of the gameday experience into the home, Wisconsin social media touted a new Badgers Live gameday experience.  Unfortunately, what Wisconsin Athletics clearly failed to do was ensure this experience was open to all fans. Instead, they hung out a sign to people who use keyboards and screen readers saying, “You are not welcome.”

Anyone familiar with web accessibility will recognize obvious WCAG failures on the opening signup screen: missing form labels and a lack of keyboard access to needed controls, just to name a couple.

If you manage to get past that, the signup experience has another basic failure where you are asked to pick an image to represent your user account.  The images are not reachable from the keyboard and are missing proper alt text.

There are likely many other failures beyond this.  I gave up after the inability to pick an image in the account creation process.

Web accessibility is not new and in fact is not optional for public institutions such as the University of Wisconsin. The university has detailed accessibility policies at https://www.wisc.edu/accessibility/.

At this point, in my mind, there is no reason for these situations to keep happening beyond institutional indifference to accessibility from, at minimum, the Athletics department. This is not the first time I have experienced accessibility issues with web offerings from the athletics department.

It is far beyond time that Director of Athletics Barry Alvarez and Chancellor Becky Blank take accessibility of the online experiences for Wisconsin Athletics seriously. This new gameday experience may be exciting or it may be of absolutely no interest to me. But I, like any other fan, should have the opportunity to join and evaluate for myself.

As of Sunday, inquiries to Chancellor Blank on Twitter have gone unacknowledged. Email to a member of the athletic department indicated the issue would be investigated but with no date for an answer.

We are in unique times with all of us facing many challenges that were unexpected at the start of the year. But it is important that as we respond to those challenges, as Wisconsin Athletics has here, we keep our values and responsibilities in mind. Clearly someone at the university had the time to find this service. In fact, pregame radio interviews with members of the athletic marketing department repeatedly promoted how the team was looking to respond to COVID-19 and still create quality experiences for players and fans. This should have included accessibility and failing to do so is simply unacceptable.

Leave a Comment

Improved Sports Descriptions

I enjoy sports but often find myself wanting more details than even the best play-by-play announcer can provide. I wonder if technology could help in this situation.

I’ll start by taking football as an example. As most fans of the sport will know, there are 22 players on the field at any given time. It would be simply impossible for any announcer to describe the location of all 22 players, let alone what each of them is doing during a play.

Imagine though if you divided a football field up into a complete grid where you could express the location of any spot by a set of coordinates. Then imagine that you can describe the location of players based on this grid. So at the start of the play you could indicate where all 22 players are lined up. Then as the players move during a play, you could have technology that would communicate how any player is moving throughout the grid.

Or imagine if you could combine this technology with the accessibility improvements that have come in various touch devices so you could slide your finger around in the application window and tell the location of any player at any time. Obviously doing any of this in real time would likely still be tough but imagine if you could do this on demand for certain plays just to get a sense of what’s really going on.
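As a rough sketch of the grid idea, and purely my own illustration rather than any existing system, player positions could be recorded as coordinates in snapshots, with movement described as the difference between them:

    from dataclasses import dataclass

    # The field as a grid: x is the yard line (0 to 100), y is a cell
    # across the width of the field.
    @dataclass
    class Snapshot:
        positions: dict  # player label -> (yard line, width cell)

    def describe_movement(before, after):
        """Produce spoken-style descriptions of how each player moved."""
        lines = []
        for player, (x0, y0) in before.positions.items():
            x1, y1 = after.positions.get(player, (x0, y0))
            if (x0, y0) != (x1, y1):
                lines.append(f"{player} moved from yard {x0}, row {y0} to yard {x1}, row {y1}")
        return lines

    start = Snapshot({"QB": (25, 5), "WR1": (25, 1)})
    later = Snapshot({"QB": (22, 5), "WR1": (40, 2)})
    for line in describe_movement(start, later):
        print(line)

On-demand review of a play, or the touch exploration idea, would then amount to replaying or querying a sequence of these snapshots.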

I have no idea how possible this is with today’s technology. It seems like it could be an interesting research project for someone in the right environment. I’m tossing this out there because it’s something I’ve thought about lately. Anyone want to take up the challenge?

Leave a Comment

Another Day, Another Example of Missing Alt Text

I’m sure anyone familiar with web accessibility doesn’t need yet another example of why alt text matters, but as a consumer of web content I certainly am impacted when it is missing.

For anyone exploring cutting the cord, the U.S. Federal Communications Commission (FCC) has a handy resource to show you what digital television stations you can receive in your area with an antenna. Navigate to https://www.fcc.gov/media/engineering/dtvmaps and enter an address, city and state or zip code to get this information. Results are in a table that has headers and such. This is good.

Unfortunately, one of the key pieces of information in these results, the signal strength, is conveyed as a graphic. As you can expect from the title of this post, there is no alt text on these graphics.

Section 508 has been around for quite some time, as have the Web Content Accessibility Guidelines. Proper alt text, as I’m sure pretty much anyone working on the web knows, is a requirement. One can only wonder why something this basic was missed.

One Comment

A Request to Librarians: Please Ask OverDrive About Libby Accessibility

I’m a big fan of public libraries and the wide range of resources they make available. As a child, stops at my local bookmobile and summer afternoons spent at “Story Time Tree” hearing a fun adventure were two of my favorite activities. As an adult, I make frequent use of the eBook services, databases and other resources libraries make available.

OverDrive is, as far as I know, the largest player in making eBooks available to libraries. In many ways they provide a quality service, but I’d encourage every librarian to fully understand the bargain you are making when you use OverDrive.

Would your library tolerate an author or other speaker coming to give a talk in your facility while secretly whispering to some visitors that they should not attend? I think not, yet when you invite OverDrive into your facility, that is close to what you are doing.

OverDrive heavily promotes their Libby app as part of the eBook services they offer. What I suspect most librarians do not know is that, for patrons who rely on screen reading technology, the following is what greets them when the Libby app is launched:

Welcome to Libby! This is a secret message for screen readers. We are working to improve your experience with this app. In the meantime, our OverDrive app is more accessible. You can find it in the app store. We thank you for your patience.

Libby is hardly a new app at this point, and it should have been accessible from the start. This message has been present, to the best of my knowledge, for close to two years now. My own requests to OverDrive asking for any updates have gone without any meaningful response on multiple occasions.

Accessibility requirements, too, are nothing new. Nor are the technical details of making an app accessible a mystery. Apple, on whose iOS platform this message appears in the Libby app, has a wealth of resources. OverDrive itself, by directing users to its older app and claiming it is more accessible, also demonstrates it understands accessibility to some degree.

I’d encourage librarians to ask OverDrive when this app will be accessible, and to ask why this message indicating the app has accessibility issues is “secret.” It is beyond time that these sorts of challenges not be hidden away. It is time for them to be fixed, and most definitely not hidden.

One Comment