ProgramAlly: Creating Custom Visual Access Programs via Multi-Modal End-User Programming

University of Michigan
ACM Symposium on User Interface Software and Technology 2024 (UIST '24)
ProgramAlly system diagram showing the 3 main components: (1) program representation for creating and running programs, (2) program creation interfaces, and (3) a program generation server. (1a) The underlying program representation is in the form 'find ADJECTIVE TARGET on ADJECTIVE TARGET', with adjectives being color, location, or size, and targets being types of objects or text. (1b) Shows a diagram of how programs are run. For the example program 'find NUMBER on BUS', the app first looks for buses in the frame. Then, if a bus is found, the image is cropped to contain just the bus. In the cropped frame, text detection is used to look for numbers. (2) The app has 3 ways to create programs: block-based programming, programming by example, and natural language programming. (3) The program generation server is used to generate programs from the latter two interfaces. For programming by example, the app lists all possible nodes/items to filter for. Then the server generates a scene graph from frames that contain the selected node, and uses that graph to reconstruct the program. For the natural language programming mode, the app uses a few-shot prompting approach with GPT-4.

ProgramAlly is an end-user programming tool for creating visual information filtering programs. ProgramAlly provides a multi-modal interface, with block-based, natural language, and programming by example approaches. ProgramAlly has three main components: (1) an underlying program representation (1a) and the framework for running visual filtering programs (1b); (2) a set of three multi-modal programming interfaces to support programmers with different levels of expertise; and (3) a program generation server that synthesizes filtering programs from images or natural language.
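
The 'find TARGET on TARGET' representation and its detect-then-crop execution can be illustrated with a minimal sketch. This is not the ProgramAlly implementation: the `Detection`, `Clause`, `detect`, and `run_program` names are hypothetical, and the detector is a stand-in that filters pre-labeled items instead of running vision models. "Cropping" is modeled as descending into the items nested inside a matched region.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Detection:
    """Hypothetical detector output: one labeled item in a frame."""
    label: str                                # e.g. "bus", "number"
    value: str = ""                           # recognized text, if any
    adjectives: frozenset = frozenset()       # e.g. {"red", "large"}
    children: List["Detection"] = field(default_factory=list)  # items inside it


@dataclass
class Clause:
    """One 'ADJECTIVE TARGET' pair; 'any' matches every adjective."""
    target: str
    adjective: str = "any"


def detect(region: List[Detection], clause: Clause) -> List[Detection]:
    """Stand-in for running an object or text detector on a (cropped) frame."""
    return [
        d for d in region
        if d.label == clause.target
        and (clause.adjective == "any" or clause.adjective in d.adjectives)
    ]


def run_program(frame: List[Detection], find: Clause,
                on: List[Clause]) -> List[Detection]:
    """Run 'find <find> on <on[0]> on <on[1]> ...' (outermost container last)."""
    regions = [frame]
    for clause in reversed(on):               # start with the outermost container
        cropped = []
        for region in regions:
            for hit in detect(region, clause):
                cropped.append(hit.children)  # "crop" to the matched item
        regions = cropped
    results: List[Detection] = []
    for region in regions:
        results.extend(detect(region, find))
    return results


# 'find NUMBER on BUS': the number on the bus matches, the loose one does not.
frame = [
    Detection("bus", children=[Detection("number", value="42")]),
    Detection("number", value="7"),
]
found = run_program(frame, Clause("number"), [Clause("bus")])
print([d.value for d in found])
```

Keeping each clause as plain data means the same program could, in principle, be produced by any of the three creation interfaces and run by one shared executor.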

Abstract

Existing visual assistive technologies are built for simple and common use cases, and have few avenues for blind people to customize their functionalities. Drawing from prior work on DIY assistive technology, this paper investigates end-user programming as a means for users to create and customize visual access programs to meet their unique needs. We introduce ProgramAlly, a system for creating custom filters for visual information, e.g., 'find NUMBER on BUS', leveraging three end-user programming approaches: block programming, natural language, and programming by example. To implement ProgramAlly, we designed a representation of visual filtering tasks based on scenarios encountered by blind people, and integrated a set of on-device and cloud models for generating and running these programs. In user studies with 12 blind adults, we found that participants preferred different programming modalities depending on the task, and envisioned using visual access programs to address unique accessibility challenges that are otherwise difficult with existing applications. Through ProgramAlly, we present an exploration of how blind end-users can create visual access programs to customize and control their experiences.

An overview of ProgramAlly's interface. (1) Block-based programming mode. This shows a program called 'ride share program'. The screen has a program summary which reads: 'Find any text on any license plate on any car. Then, find any color on any car'. Below that there is a text box which says: 'Edit program with a question'. Finally, the screen shows each 'Find' and 'On' statement in the program and the interface for editing them. For VoiceOver users, this sounds like: 'Find any text, actions available. Edit adjective. Edit object text. Delete item'. (2) The next screen shows the menu that appears when activating one of the edit actions. It has a list of items like 'any text', 'email', 'number', etc. (3) Natural language programming mode. It shows a text box that reads 'Are my keys on the table?' with a button that says 'submit question'. (4) Programming by example mode. The interface shows a camera view on top, displaying a package. Underneath is a list of all text and items found in the camera feed, with 'Eliza Li' selected. Overlaid on the screen is an alert that says 'Generating program, please wait'. Underneath, the caption says 'demonstrate filtering by selecting an item'. (5) Shows the app running a filtering program. The title reads 'Expiration date program'. In the camera feed, there is a can of beans with an expiration date on the label. The program output reads 'Found date, JAN 10 2024'.

ProgramAlly's multi-modal programming interfaces using block-based programming, natural language programming, and programming by example approaches. Each approach is suited to different tasks or skill levels, and providing options makes ProgramAlly more broadly approachable and accessible.
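
As an illustration of how a block-based program might map to the spoken summary shown in the interface ('Find any text on any license plate on any car. Then, find any color on any car'), here is a small sketch. The `summarize` function and the list-of-pairs program encoding are assumptions made for this example, not the app's actual data model.

```python
from typing import List, Tuple

# Hypothetical encoding: each statement is a list of (adjective, target) pairs,
# where the first pair is the 'Find' item and the rest are nested 'On' items.
Item = Tuple[str, str]


def summarize(statements: List[List[Item]]) -> str:
    """Render a multi-statement program as a readable summary sentence."""
    sentences = []
    for index, items in enumerate(statements):
        phrases = [f"{adjective} {target}" for adjective, target in items]
        verb = "Find" if index == 0 else "Then, find"
        sentences.append(f"{verb} " + " on ".join(phrases))
    return ". ".join(sentences)


ride_share_program = [
    [("any", "text"), ("any", "license plate"), ("any", "car")],
    [("any", "color"), ("any", "car")],
]
print(summarize(ride_share_program))
```

A plain-text summary like this is one way to give screen reader users a quick overview of a program before they step through its individual 'Find' and 'On' blocks.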

Video Presentation (full talk coming soon!)

Citation

Jaylin Herskovitz, Andi Xu, Rahaf Alharbi, and Anhong Guo. 2024. ProgramAlly: Creating Custom Visual Access Programs via Multi-Modal End-User Programming. In Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology (UIST '24). Association for Computing Machinery, New York, NY, USA, Article 85, 1–15. https://doi.org/10.1145/3654777.3676391

BibTeX


@inproceedings{10.1145/3654777.3676391,
  author = {Herskovitz, Jaylin and Xu, Andi and Alharbi, Rahaf and Guo, Anhong},
  title = {ProgramAlly: Creating Custom Visual Access Programs via Multi-Modal End-User Programming},
  year = {2024},
  isbn = {9798400706288},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3654777.3676391},
  doi = {10.1145/3654777.3676391},
  booktitle = {Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology},
  articleno = {85},
  numpages = {15},
  keywords = {Accessibility, Assistive technology, Blind, Design, Do-it-yourself, End-user programming, Visual impairment},
  location = {Pittsburgh, PA, USA},
  series = {UIST '24}
}