
The Idea Place Posts

Experimental Image Description Toolkit For Batch Processing Available

I want to say upfront that the vast majority of the code here was written using prompts to ChatGPT. I wanted to see how the AI tool worked for a simple coding project and to jump-start my own use of Python. Once I started with ChatGPT, I found that making adjustments in the scripts myself and then going back to ChatGPT became a challenge, so for this effort I opted to use prompts to ChatGPT exclusively.

I suspect, like many, I have thousands of pictures from over the years. I wanted a way to process these in bulk and get descriptions.

I had hoped to use OpenAI, but I never found a way to use a vision model from them, and their support department made it sound like it wouldn’t be available with the basic OpenAI subscription I have. If someone knows differently, please share more details. I certainly do not want to upload images individually.

That led me to explore Ollama and its llama3.2-vision model, which you can run locally. I’ve published scripts and instructions in a GitHub project that will take a directory of images, read the prompt you want to use from a file, and write out individual description files as well as an HTML file with all descriptions.
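To give a rough idea of the approach, here is a minimal sketch of the batch loop, not the published script itself. It assumes the ollama Python package is installed, the Ollama app is running locally, and the llama3.2-vision model has already been pulled; the folder and file names below are placeholders, not the names used in the project.

```python
# Illustrative sketch only, not the published script. Assumes the "ollama"
# Python package, a locally running Ollama server, and the llama3.2-vision
# model already pulled. Folder and file names below are placeholders.
from pathlib import Path

import ollama

IMAGE_DIR = Path("images")           # hypothetical folder of pictures
PROMPT_FILE = Path("prompt.txt")     # hypothetical file holding the prompt text
OUTPUT_DIR = Path("descriptions")    # hypothetical folder for results


def describe_images() -> None:
    prompt = PROMPT_FILE.read_text(encoding="utf-8").strip()
    OUTPUT_DIR.mkdir(exist_ok=True)
    html_parts = ["<html><body><h1>Image Descriptions</h1>"]

    for image in sorted(IMAGE_DIR.glob("*.jpg")):
        # Send the prompt plus one image to the local vision model.
        response = ollama.chat(
            model="llama3.2-vision",
            messages=[{"role": "user", "content": prompt, "images": [str(image)]}],
        )
        text = response["message"]["content"]

        # One description file per image, plus a section in a combined HTML page.
        (OUTPUT_DIR / f"{image.stem}.txt").write_text(text, encoding="utf-8")
        html_parts.append(f"<h2>{image.name}</h2>\n<p>{text}</p>")

    html_parts.append("</body></html>")
    (OUTPUT_DIR / "descriptions.html").write_text("\n".join(html_parts), encoding="utf-8")


if __name__ == "__main__":
    describe_images()
```

The real scripts in the GitHub project handle more than this, so treat the sketch only as an illustration of the flow: read a prompt, send each image to the model, save a text file per image, and build one HTML page at the end.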

This does work but is raw and still needs refinement. “Works” here is definitely defined as “works on my equipment and in the environments where I’ve tried it.” I wanted to share what I have so far because even in this form, it works well for my task. Again, others may already know of better ways to do this. Some of the enhancements I want to add include:

* Better image selection versus just a directory of images.

* Linking to the image file in the HTML descriptions.

* Extracting metadata from the image files, such as the date, to help recall when the images were taken (see the sketch after this list).

* If possible, using GPS data that may be embedded in the image to provide location information for the images (also covered in the sketch below).

* Learning more about the llama model and its processing to ensure I’m taking advantage of all it offers.

* Cleaning up file handling and allowing things such as the image path and results location to be configured outside the scripts.

* Figuring out how to make this work on Windows and Mac from one script if possible. I’ve run it on both with success, but this documentation and script are based on Windows.

* Packaging this up as an executable to make it easier to use.

* Exploring a way to flag descriptions for another pass when you want more details.

* Long term, again assuming something doesn’t already exist, exploring building GUI apps here.
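For the metadata and GPS items above, here is a rough sketch of how a future version might pull the date and location out of a photo’s EXIF data. This is not part of the current scripts; it assumes the Pillow package, and EXIF handling varies by camera, so it is only a starting point.

```python
# Rough sketch of the metadata idea, not part of the current scripts.
# Assumes the Pillow package (pip install pillow); the file name is a placeholder.
from PIL import ExifTags, Image


def get_date_and_gps(path: str):
    """Return (date taken, GPS tags) from a photo's EXIF data, if present."""
    exif = Image.open(path).getexif()

    # DateTimeOriginal (0x9003) lives in the Exif sub-IFD; fall back to the
    # plain DateTime tag (0x0132) from the main IFD if it is missing.
    exif_ifd = exif.get_ifd(ExifTags.IFD.Exif)
    date_taken = exif_ifd.get(0x9003) or exif.get(0x0132)

    # GPS values live in their own IFD; map numeric tag IDs to readable names.
    gps_ifd = exif.get_ifd(ExifTags.IFD.GPSInfo)
    gps_info = {ExifTags.GPSTAGS.get(tag, tag): value for tag, value in gps_ifd.items()}

    return date_taken, gps_info


if __name__ == "__main__":
    print(get_date_and_gps("example.jpg"))  # hypothetical file name
```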

My primary goal is processing images in bulk. I went to a museum recently and ended up with more than 150 images taken with Meta glasses. I did get some descriptions there, but I want more, and again, I have thousands of pictures from over the years.

As I said at the outset, I do not want to take any credit for what ChatGPT did here with the code. I guided it toward the goals I had in mind, and that in itself was an interesting activity. It is by no means automatic. It is also possible there is already a better way to do this, so if someone reads all this and says, hey, just use this tool, I have no investment in this being the end-all of image description experiences. I tried finding something that would do what I wanted but didn’t have success, so this was my attempt.

It is my understanding that running Ollama on Windows used to require WSL. I don’t know when that changed, but the documentation and my own use show that you can now run Ollama on Windows without WSL, and that’s what I’ve done here.

If you do try this and want to interrupt the script, just press Ctrl+C at the command prompt. You’ll get an error from Python, but processing will stop.
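For what it’s worth, that error is just Python’s traceback from an uncaught KeyboardInterrupt. A future cleanup could catch it for a tidier exit; the sketch below is an illustration of that idea, not how the current script behaves.

```python
# Sketch of a cleaner Ctrl+C exit, not how the current script behaves.
import sys


def main() -> None:
    ...  # the batch-processing loop would go here


if __name__ == "__main__":
    try:
        main()
    except KeyboardInterrupt:
        # Ctrl+C raises KeyboardInterrupt; catching it avoids the raw traceback.
        print("Processing interrupted; any descriptions already written are kept.")
        sys.exit(1)
```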

If there is value in this effort and you want to contribute, I have a GitHub project for it. You can also grab the scripts mentioned from the project page.

Last, this is by no means instantaneous. On an M1 MacBook Air and a Lenovo ARM Slim 7, it is taking about three minutes per image. According to Ollama documentation, you do not need an ARM processor on Windows, though. This is the sort of thing you run in the background.

If you opt to try this, review the scripts and note areas where you need to modify file paths and such. Feedback is of course welcome. If you try this and it doesn’t work, please do your best to troubleshoot. Until I make more progress, this is an as-is idea and not something where I can offer a lot of assistance. Most errors are likely from a missing Python dependency or a file path that isn’t configured.


Clearing the Chaos: Using A Library Card and U.S. Newsstream for an Improved Reading Experience

While the digital world offers an abundance of online news sources, accessibility is still a work in progress far too often. It is commonplace to spend more time navigating to content than reading that content when using screen reading technology. Poor heading structure, inaccessible ads mixed in with the news story, multimedia that plays silently and grabs focus, and much more take away from the reading experience.

A public library card and a resource known as U.S. Newsstream, offered by many libraries, are one solution to include in your reading toolkit.

ProQuest’s U.S. Newsstream is a gateway to an improved reading experience. The full text of hundreds of publications is available, and the helpful thing is that through a URL for each publication, you can easily access a specific issue of a newspaper or other resource with a very screen reader-friendly view of all the article headlines. In addition, when full text is available, you can read the article free of ads or other distractions mixed in with the news.

Using this method of accessing content requires a few preparation steps. First, you need a library card for a library that subscribes to this service, and you will need to know your library barcode.

Second, and this is critical, you need to sign into the service through your library’s access point. You can typically find this on your library’s home page under an area called subscription databases, online resources or some other link pointing to the various databases available from your library.

For example, my local library is the Madison Public Library and their list of resources is available under the eResources link.

Following the U.S. Newsstream link, you are prompted to provide library information. Typically this involves some variation of providing your library barcode and at times indicating your library. Again, it is vital you start with this path before going to the next step.

Once you are authenticated to U.S. Newsstream, you can search and use the database directly. However, what has worked well for me is accessing publications directly.

U.S. Newsstream has a list of all their publications you can download. I took the liberty of downloading the file and turning it into a table within Excel and using Excel’s First Column feature to make the file fairly screen reader friendly and available to use.

To access a publication, open the file I’ve created and locate the publication you want to read. Titles are in the first column.

Next, locate the URL for the publication. Hint: with focus in the cell containing the title, press Ctrl+Right Arrow and you will jump to the last column in the table row, which contains the URL. Press Ctrl+C to copy the URL and return to your web browser.

Move focus to the address bar in whatever browser you are using and paste the URL you copied and press Enter. This will take you to the publication page for the resource of interest. Combo boxes allow you to select the year and date for an issue of the publication and a Show Issue Content button brings up the content from that publication for the chosen day.

Article headlines are marked up as headings and links. Pressing Enter will load the article.

The article content, when the publication has full text, starts under a heading aptly named Full Text. At this point you can simply start reading the article. Use whatever method you prefer to navigate back to the list of articles for the publication when finished reading.

As mentioned earlier, it is key that you are signed into the general U.S. Newsstream service before accessing the URL for the publication. If you are not, the URL will not work as described. You will be told to sign in through a library but without options for doing so directly.

The Excel file listing publications has multiple columns of information. These include items such as the dates for which content is available, gaps in coverage, and more.

U.S. Newsstream, other ProQuest databases and resources from your library offer much more functionality and information than outlined here. This is just a starting point.

Finally, I am a firm supporter of a robust and independent news media. Even though I access many articles in the way I’ve outlined here, I do also support publications through subscriptions, donations or other financial contributions. I urge anyone who is able to do so to do the same. Those working in the media have the same bills and needs in life as we all do, and we’ve already seen dramatic losses in meaningful careers in the profession.


AI on AI

Note: I have tried to make it clear in the text here, but everything in this post starting with the text “### blog post” and beyond is a direct copy and paste from ChatGPT and not authored by myself.

As AI is introduced into more and more of society, I use it myself for a range of tasks. There are the obvious picture descriptions, where, as someone who’s blind, AI has made meaningful enhancements to many things I do.

I also like to try AI for various tasks to get a sense of what’s possible and where the technology still has room for improvement. Today I had a seemingly straightforward task for ChatGPT:

Take my exported reading list of more than 2,000 books from Goodreads, identify books that came from series and how many books from those series I’ve read. In addition, for all books, add a two-line summary of the book as well as a list of the main characters.

The odd things AI does on the most basic of tasks are always interesting. It did a reasonable job of identifying books in the proper series but failed completely to indicate how many books were in a given series. In fact, more than once, despite repeated prompting, ChatGPT indicated that the more than 100 series it identified from my reading list all had the same number of titles. The first time this was five books in each series; the second, nine. Only after a third prompt pointing out the errors and asking it to investigate further did the numbers start to match reality.

I didn’t really expect to get quality results for the second part of my request around plot summaries or lists of characters. I was largely curious to see what happened. Here again, though, AI was a bit perplexing.

The first attempt at this information inserted the same generic description and list of characters for each book. I also asked for a confidence level about the information, and ChatGPT rated each book’s information as highly confident in its accuracy.

A follow-up prompt pointing out how the information was all duplicated and again asking for improvement produced both the same and different results. The same was that the information was repeated for each book. The different came in the form of rating the same duplicate responses as having medium confidence this time around and a slight rewording of the repeated text.

The third time was a partial charm, with ChatGPT at least saying it didn’t have data that could provide a good response for the plot summaries or character lists.

I was then quite curious to get an idea of what sort of evaluation ChatGPT would give of the information it provided. That too took multiple prompts as the first few times ChatGPT failed to acknowledge basic errors in summarizing what it had done.

ChatGPT’s assertion that trust requires accountability is fundamental. Boilerplate statements that systems use AI, so check for mistakes, are far too inadequate.

Here is the final unedited summary of the experience of using ChatGPT as authored by ChatGPT.


### Blog Post: The Challenges of AI in Goodreads Library Management

#### **Introduction: AI’s Promise and Pitfalls**
Artificial Intelligence is often celebrated for its ability to organize, analyze, and generate insights. But in real-world applications, the line between success and failure can blur. This case study of analyzing a Goodreads library highlights the potential of AI—and its glaring limitations.

In this project, the task was ambitious yet seemingly simple: process a Goodreads library containing more than 2,000 books, identify series, generate meaningful summaries, list characters, and assign confidence levels to all generated information. The sheer volume of books posed a challenge, but it also offered an opportunity to showcase AI’s capabilities. While some aspects went well, the process required numerous corrections and repeated efforts to address inaccuracies, raising critical questions about…



#### **The Task: Managing a Goodreads Library**
The goal was clear:
1. Identify book series and track progress.
2. Enrich data with meaningful summaries and character lists.
3. Provide confidence ratings for all generated content.

Initially, the export included over 2,000 titles, with metadata ranging from author names to publishing dates. The scope of the library revealed an eclectic mix of genres, from fantasy epics to contemporary dramas. Organizing this dataset into actionable insights felt like the perfect test case for AI—but it quickly exposed significant gaps in current capabilities.



#### **What Went Right: The Highlights**
1. **Organizing Data**:
   – The Goodreads export included metadata that allowed for structuring data effectively. Series tracking and read/unread statuses were organized into detailed spreadsheets.
   – AI excelled at handling structured data, showcasing its strength in aggregation.

2. **Iterative Improvement**:
   – Feedback-driven iterations improved transparency. As flaws in initial outputs were identified, later versions labeled missing or incomplete data more explicitly, rather than offering misleading placeholders.

3. **Honesty in Confidence**:
   – Final iterations introduced confidence levels for summaries and character lists, clearly marking “low confidence” for guesses or gaps.



#### **What Went Wrong: The Reality Check**
1. **Repeated Inaccuracies**:
   – Initial outputs overstated accuracy. Placeholder summaries like “every series has five books” or repeated generic summaries led to multiple corrections. Only after prompting did the AI acknowledge the need for transparency.
   – Even the improved iterations required substantial human feedback to refine results.

2. **Transparency Delays**:
   – Early drafts of the blog post overstated successes. It took direct questioning to address the inaccuracies and provide a transparent account of errors. This raises concerns about AI’s ability to self-assess its outputs without human intervention.

3. **Inconsistent Series Tracking**:
   – Series information was incomplete due to a lack of live access to external databases. Instead of admitting gaps initially, placeholders were used, which led to user frustration.



#### **Why Did the Initial Number of Books Seem So Low?**
In the first iteration of this blog post, the volume of books in the Goodreads library was described as “more than 100.” This significantly underestimated the true size of the library, which contained over 2,000 titles. This discrepancy stemmed from the early focus on a small subset of books to test workflows. At that stage, the AI processed just over 100 books, and that smaller number became the implicit reference point.

Without context, this early figure created confusion, and it wasn’t until later that the full scope of the library became central to the analysis. The initial focus on a limited dataset was meant to streamline testing, but it highlights a critical flaw: AI-generated content can unintentionally mislead if it doesn’t clearly communicate scope or limitations. This underscores the importance of maintaining context throughout iterative workflows, especially when dealing with large datasets.



#### **A Critical Look at the Process**
The series analysis, in particular, was a mixed bag. Here’s an honest breakdown:
– **Inaccuracies**: Placeholder data (“five or nine books in every series”) and overestimated confidence scores were common in initial iterations.
– **Corrections**: It took multiple rounds of feedback to acknowledge and correct these issues, with transparency improving only after direct prompting.
– **Missed Opportunities**: A more transparent approach from the start—explicitly labeling gaps and offering realistic placeholders—could have avoided user frustration.

Why wasn’t this transparency present initially? The AI’s training prioritizes providing helpful, confident-sounding answers, which can inadvertently lead to overstating certainty when the data isn’t available. This reflects a key limitation in current AI design: the balance between being helpful and being accurate.



#### **Lessons Learned: Where AI Stands Today**
1. **Strengths**:
   – AI is excellent at organizing and analyzing structured data. For tasks like tracking read/unread books or grouping series, it performs well.
   – Iterative feedback loops improve accuracy when guided by a user.

2. **Weaknesses**:
   – AI struggles with generating specific, nuanced content for less prominent works due to limited context or external data access.
   – Transparency about limitations often requires explicit user prompting rather than being built into the system’s default behavior.

3. **Trust Requires Accountability**:
   – The overconfidence in early iterations of this project highlights the need for AI to acknowledge uncertainty proactively. Users must trust the process to rely on the results.



#### **Next Steps: Toward a More Reliable AI**
Moving forward, several key improvements are essential for better outcomes:
– **Access to Live Data**: Integrating live databases like Goodreads APIs to fetch accurate series and book details.
– **Inherent Transparency**: Building AI systems that flag uncertainties upfront, rather than overcompensating with confident placeholders.
– **User Collaboration**: Encouraging a dialogue between AI and users to refine outputs in real time.



#### **Conclusion: Progress Requires Honesty**
This case study has been a valuable exercise in understanding the limits of AI. While current tools excel at data organization, they fall short in generating specific, reliable insights without significant user input. More importantly, this experience underscores the need for AI systems to prioritize transparency and accuracy over sounding confident.

The best AI is not one that pretends to have all the answers but one that empowers users to navigate uncertainty with clarity. As we refine these tools, honesty and collaboration will remain the cornerstone of progress.



#### **Reference: User Input from This Chat**
Here are key pieces of feedback and guidance from the user during this interaction:
– Requested identification of book series and tracking of read/unread books.
– Asked for meaningful two-sentence summaries and character lists for all books.
– Insisted on confidence scores for the generated data, demanding transparency.
– Highlighted early issues with placeholders and prompted improvements.
– Emphasized the need for honesty in blog posts about the process.
– Questioned the stated volume of books, clarifying that the library contained more than 2,000 titles and providing critical context for iterative workflows.


A Small Refinement to iOS 18 Audio Ducking

A few months ago, I wrote about audio ducking in iOS 18. Since the initial release came out, there has been a small refinement to what I originally described.

The ducking choices themselves remain the same: off, when speaking, and always. The ducking amount, however, has been moved to a separate setting named Ducking Amount that you can add to the VoiceOver rotor.

To find this setting, open Settings, Accessibility, VoiceOver, Rotor Items and add Ducking Amount.

The Ducking Amount setting allows you to adjust the amount from zero to 100 percent. This is far less confusing than having this controlled by setting volume percentages of greater than 100, as it used to work.


A Nice Enhancement for Voice Access on Windows

As a matter of choice, not necessity, I try from time to time to use the various speech and voice input systems in operating systems. My ideal scenario is still to be able to use the computer entirely by voice while also running a screen reader. I’ve not yet found a reliable solution that meets my needs completely.

I know there are combinations of solutions that have made great strides in this area, largely using Dragon products and screen readers, but as my baseline I try to use either Voice Access on Windows or Voice Control on the Mac. Both platforms also have solutions, as I expect many know, for strictly text input.

I no longer recall how long ago this was, but the Voice Access product on Windows did make one change that helps with using screen readers. As a start, Voice Access produces notifications of what it has heard so that screen readers can echo this back. It is fairly basic and in need of much refinement, but it’s at least a start.

I am mentioning this here because in trying Voice Access this week, I noticed a change that is another step toward improving the experience. To be clear, I do not know when this change was made; it is just that I noticed it this week. I also run Insider builds of Windows, so if this does not work for you, that may be why.

When you’re trying to control the computer by voice, it is common to issue commands such as click followed by the item you want to activate. The challenge is that if there is more than one item with the same name, you are usually presented with some experience to disambiguate what you want to click on.

When I first tried Voice Access, to the best of my recollection, the experience of identifying what you wanted to activate was not usable with a screen reader. It has been enhanced a bit so that now, when that list of choices comes up, it is echoed back similar to how what Voice Access heard is repeated. Again, this needs extensive refinement, because it is kind of a one-time listen or Braille-read experience with no way to have the list repeated, step through the list an item at a time, or otherwise review what was said.

As an example of using the feature to identify what I want to click, here is what was read when I asked for the word paste to be clicked.

click paste. Which one?
There are 2 options available. (1) Paste, (2) Paste

Here is another example when I said “click login” on the Fidelity home page.

Click login. Which one?
There are 2 options available. (1) LOG IN, (2) Open login link

It is also worth noting that these disambiguation choices, if you are using Braille, appear as flash messages. For those unfamiliar with how Braille displays and screen readers work, this means that the messages stick around for a set period of time and then disappear from the display.

Here is one last example, when I tried to activate the OK button with my voice after running a spell check on an email message. Note, I intentionally replaced the actual email address with email@provider.com.

Click ok. Which one?
There are 2 options available. (1) OK, (2) Sent – email@provider.com – Outlook – 2 running windows

The experiences I’ve described work independently of which screen reader is being used.

Again, this overall experience of using the computer with a screen reader and voice on Windows is far from finished. In fact, one of the key experiences, correcting words that have not been recognized correctly, does not work at all with screen readers. Voice Access in fact gives the following notification when you try to correct something and a screen reader is running:

Alert: This experience is not optimized for use with screen readers. Say “Cancel” to exit.

Microsoft has a document on using Voice Access in general. If they have screen reader-specific documentation, I wasn’t able to find it.

If you do try Voice Access, two important hotkeys to know are Alt+Shift+B for toggling the microphone between sleep and awake and Alt+Shift+C for toggling the microphone off and on. When sleeping, the microphone remains on to listen for certain words. See the support article or say, “what can I say,” when Voice Access is running for a full list of commands.


No Big Finish, Giving Microsoft’s Recall a Try

Microsoft released the Recall feature for users of Windows Insider builds. I gave the feature a try and made an audio recording with a roughly edited transcript. The short version of my experience is that I can get around Recall but am still trying to figure out how to read search results effectively with a screen reader, hence the title of No Big Finish.

Since this blog post is about an AI feature, I’ll offer an unedited version of Copilot’s summarization of my transcript as well. Overall the summary is reasonable but in my opinion misstates what I said about the Click To Do feature. I said I hadn’t figured out how it is supposed to work.

Perhaps because it was at the end, the summary also entirely missed my biggest point about still trying to find an effective way to use the search functionality of Recall and read the meaningful text easily. Here is the Copilot summary:

The document is a transcript of a demonstration by Kelly on the Microsoft feature called “Recall.” This feature captures screenshots of your screen and allows you to search for information within those screenshots. Kelly explains that Recall was released as a test version for Microsoft Insiders on November 22nd. The demonstration focuses on the accessibility aspects of Recall, particularly how it works with screen readers like NVDA, Narrator, and JAWS.

Kelly describes the setup process, which includes signing in with Windows Hello, and demonstrates how to navigate the Recall interface using keyboard commands and screen reader feedback. The document details various functionalities of Recall, such as saving snapshots, opening Recall from the system tray, and navigating through captured images. Kelly also mentions some issues and limitations encountered during the demonstration, such as certain screen reader commands not working as expected and the need for further development to improve accessibility.

Kelly starts by explaining the initial setup, which involves a brief opt-in experience where the user grants permission for the computer to take screenshots. This process takes about 20 to 30 seconds and requires signing in with Windows Hello. Kelly notes that each time Recall is opened or used, the user is prompted to sign in with Windows Hello.

The demonstration continues with Kelly navigating the Recall interface. Kelly shows how to open Recall from the system tray and describes the various options available, such as saving snapshots, opening Recall, pausing the feature, and accessing more settings. Kelly highlights that the timeline feature in Recall displays the captured images, referred to as “memories,” and sessions, which represent the time periods during which the computer was used.

Kelly also points out some issues with the screen reader’s ability to read the titles of the snapshots automatically. Kelly demonstrates how to navigate through the list of snapshots and mentions that sometimes the titles do not read correctly on the first attempt. Kelly suggests using the home and end keys to navigate through the list and get the titles to read properly.

Additionally, Kelly explores the commands available for each snapshot, such as opening the associated application (e.g., Outlook), copying the snapshot, deleting it, and accessing more options. Kelly notes that some features, like the “Click to do” button, are not fully functional yet and require further development.

Kelly also demonstrates how to get more details about a captured image using Narrator’s command for image descriptions. However, the output is often gibberish due to the limitations of the OCR (Optical Character Recognition) technology used. Kelly suggests using other solutions like JAWS, Google’s Gemini, or Be My Eyes for better image descriptions.

Overall, the document provides an insightful look into the Recall feature, its current capabilities, and areas for improvement from an accessibility perspective. Kelly’s demonstration highlights the potential of Recall while also acknowledging the need for further development to enhance its functionality and accessibility.


Unlock Windows Efficiency with PowerToys Run

Sometimes you make a handy discovery that makes you wonder why you did not know about it long before. I just had that happen with PowerToys Run, one of the toys included in the full suite of items in the electronic toybox known as Windows PowerToys.

PowerToys are a set of utilities that allow you to customize different aspects of Windows and offer functionality that is not directly built in. You can learn more in general and find details on installation in a handy Microsoft Learn article.

I installed the most recent version of PowerToys because I had read about something new known as Advanced Paste. That is not the discovery though.

After installing PowerToys, I used one of my more common Windows shortcuts, Alt+Space, to bring up the system menu for an application. That menu is where you find entries including Restore, Close, and the one I often use, Maximize. My web browsing windows in particular often get sized quite small, and with responsive design much of the content I’m expecting has disappeared, so maximizing the browser window is necessary.

Imagine my surprise when instead of what I was expecting, my screen reader communicated, “query, edit.”

It turns out this is the default shortcut for the aforementioned PowerToys Run. In short, this is like having a command line to do everything from choosing between open windows on your computer to performing quick calculations, running file and web searches, browsing the Windows registry, and more.

Using PowerToys Run

Using PowerToys Run is fairly straightforward. Press Alt+Space, enter some text, and arrow through the results. You can start your text with various characters to tell PowerToys Run what you want to do. The full list of these characters is detailed in a Microsoft Learn article.

Some examples I am finding I have already incorporated into my daily use include:

  • <, that is the less than symbol: Window Walker, to browse through all the open windows on your computer. Start pressing Down Arrow after entering the less than character, or add part of the window name and then Down Arrow to shorten the list.
  • $, that is the dollar sign character: Windows Settings, to browse, as you would expect, through all the different settings for Windows. As with Window Walker, just start pressing Down Arrow after entering the dollar sign, or add some of the text from the setting you want and then press Down Arrow. In either case, when you reach the item you want, press Enter and that settings page will open.
  • =, that is the equals sign: Calculator, for performing calculations. Just enter your calculation and the result will be displayed. If, as I do, you are using a screen reader, the fastest way I have found to read the result is to press Down Arrow after entering my calculation. Note that you do not need to press Enter after you have input your calculation. Also, again if using a screen reader, if you are comfortable with advanced screen reading techniques such as the JAWS Touch Cursor, NVDA Object Navigation, or Narrator navigation commands, the result of the calculation and more can be read with these techniques. Last, after your result is displayed, you can press Enter on the result to have it put on the Windows Clipboard.
  • !!, that is two presses of the exclamation mark key: History, to quickly browse through your PowerToys Run history with presses of Down Arrow.

Some Notes

PowerToys Run documentation indicates you can change the default shortcut for launching from Alt+Space.

According to PowerToys documentation, pressing Tab is supposed to move you through search results and any buttons or context menus that exist. As of now, I am not finding anything communicated by multiple screen readers when pressing Tab. I still need to figure out whether this is a case of the key simply not working or of items taking focus not being communicated properly.


Accessible Entertainment in the Air

Flying home from a recent vacation, I had a first for myself. I independently used the in-flight entertainment system to track my flight, listen to music, and watch a movie with audio descriptions. I even played a bit of trivia. How fun!

I’m not sure when United Airlines added accessibility to their in-flight technology, but it was available on the return flights of my trip. The system used TalkBack and a two-finger triple-tap to start the screen reader. There was a video offered to show you how to use the system. I was in the proverbial cheap seats, so I used just the touch screen option for control. Apparently, premium seats get physical buttons in the arm of the seat as well.

Aside from the map showing you flight progress and some games, I found all the other experiences worked well with TalkBack. Those that didn’t were indicated by a message talking about not being available with TalkBack. In the case of the flight map, the alternative to tracking flight distance, elevation, and such did work with TalkBack. I do wish that display had a compass option as well, but the experience just worked, so what more can you ask for when it comes to accessibility? Picking my own movie, having audio descriptions, and being able to check on my flight independently was pretty sweet!


Command Customization in Mac OS 15 with VoiceOver

One of the larger changes I’ve encountered in the Mac OS 15 betas is not immediately obvious unless you know to hunt for it. The VoiceOver Commanders item within VoiceOver Settings has been replaced with an item named Commands. But this is far more than just the renaming of the same old controls.

The various tabs for the different commanders such as Numpad and Trackpad have been replaced with check boxes to turn control on or off with each of those items as well as the Option key. Again none of this is new, just a bit of a different way to turn things on or off.

However, move past all those check boxes and you’ll find the new and powerful ability to customize VoiceOver commands. First is a set of radio buttons to choose whether you want to control VoiceOver with Built-In commands or Custom commands. Choose Built-In and VoiceOver works the way it always has. Pick Custom commands and that’s where the power for customization appears.

An Edit button becomes enabled, and activating it lets you customize more than 350 VoiceOver commands.

For each command, the customization options are wide-ranging. You can capture a keyboard command, assign a trackpad gesture, assign a numpad command, assign a QuickNav gesture, and assign an Option key command.

Choices within the dialog allow you to filter the command list by the areas you’d expect including tables, text, information and the remaining VoiceOver command areas that appear when you bring up the VoiceOver command list. You can also filter based on the different command types, such as trackpad, numpad, assigned commands, unassigned commands, commands that are not able to be changed and more.

Using all of this is quite straightforward and a welcome enhancement for VoiceOver.
