thunderbong 3 days ago

From the article -

> Windows might not be trendy among developers, but it’s where accessibility works best. I don’t have to worry about whether I can get audio working reliably. The NVDA screen reader works on Windows and is free and open source, actively maintained, and designed by people who are screen reader users themselves.

> That said, I’m not actually developing on Windows in the traditional sense. WSL2 gives me a full Linux environment where I can run Docker containers, use familiar command-line tools, and run the same scripts and tools that my colleagues use. Windows is just the accessibility layer on top of my real development environment.

> I use VS Code. Microsoft has made accessibility a core engineering priority, treating accessibility bugs with the same urgency as bugs affecting visual rendering. The VS Code team regularly engages with screen reader users, and it shows in the experience.

  • mbb70 3 days ago

    His comments on VS Code remind me of the quote "Good accessibility design is mostly just good design":

    > Consistent keyboard shortcuts across all features, and the ability to jump to any part of the interface by keyboard

    This is something I notice and appreciate about VS Code as a fully sighted person. Just like I appreciate sloped sidewalk cutouts when I'm walking with luggage.

    A11y is a big commitment and cost, and of course not all a11y features benefit everyone equally, but it has a larger impact than most people realize.

    • Neurrone 3 days ago

      Yup, it does. I was an early adopter of VS Code. It has been extremely satisfying seeing the progress they've made with accessibility. I provide feedback on a semi-frequent basis. Nowadays it's only to flag regressions.

    • hombre_fatal 3 days ago

      I've been working on polishing accessibility features for a hobby web app, mostly out of curiosity to see what a deep dive would look like.

      Some of it is definitely UX polish: if a button can be pressed to toggle a sidebar, then ESC should dismiss the sidebar and return focus to the button that toggled it, and when the sidebar opens, focus should move to the top of the sidebar.
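
      A minimal sketch of that pattern (plain DOM, with hypothetical element IDs; it assumes the sidebar element has tabindex="-1" so it can receive focus programmatically):

          // Hypothetical IDs. Move focus into the sidebar on open,
          // and return focus to the toggle button on Escape.
          const toggleBtn = document.getElementById("sidebar-toggle");
          const sidebar = document.getElementById("sidebar");

          toggleBtn.addEventListener("click", () => {
            const nowHidden = sidebar.toggleAttribute("hidden");
            toggleBtn.setAttribute("aria-expanded", String(!nowHidden));
            if (!nowHidden) sidebar.focus(); // move focus to the top of the sidebar
          });

          sidebar.addEventListener("keydown", (event) => {
            if (event.key === "Escape") {
              sidebar.setAttribute("hidden", "");
              toggleBtn.setAttribute("aria-expanded", "false");
              toggleBtn.focus(); // back to the button that opened it
            }
          });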

      Though you can also get trapped in a fractal of polish-chasing. When it comes to screen readers and live content like a rich chat app or MUD client, I'm not sure how you would target anything broader than, say, Safari + VoiceOver on macOS and then some other combo on Windows. You quickly realize the behavior you see is an idiosyncrasy of the screen reader itself.
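
      For the live-content case, the baseline mechanism is an ARIA live region; exactly how and when each screen reader announces the additions is where those idiosyncrasies show up. A rough sketch:

          <!-- A chat log as a polite live region: appended messages get
               announced, but each screen reader decides how and when. -->
          <div id="chat-log" role="log" aria-live="polite"></div>
          <script>
            // Hypothetical helper: add a message and let the live region announce it.
            function appendMessage(text) {
              const item = document.createElement("p");
              item.textContent = text;
              document.getElementById("chat-log").appendChild(item);
            }
          </script>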

      • Neurrone 2 days ago

        > Though you can also get trapped in a fractal of polish-chasing

        I think this applies to anything :)

        > You quickly realize the behavior you see is an idiosyncrasy in the screen reader itself.

        Yeah this is definitely a pain point during development. There is standardization and efforts to reduce these differences though, so I hope this gets better over time.

    • gtirloni 3 days ago

      Fixing issues that seem to only affect people with disabilities often has unexpected positive consequences for others, indeed.

    • necovek 2 days ago

      Apart from rolling luggage, one also experiences a lot of physical-world a11y issues when taking kids around in baby strollers.

  • esafak 3 days ago

    MS has to make accessibility a priority because it's mandated by government (its customer).

    Smaller companies would benefit from better libraries and design systems that make it easier to incorporate accessibility. Make accessibility the default.

gizmo686 4 days ago

Some vaguely related research: https://www.science.org/doi/10.1126/sciadv.aaw2594

If you compare different languages, the speed at which people tend to speak (measured in syllables per second) varies significantly. However, the number of possible syllables also varies significantly. Once you account for that, the speed of speaking in terms of information conveyed is fairly consistent across languages.

I'm not aware of any research directly addressing what the author of the posted blog describes, but his hypothesis that having a consistent speaker reduces the cognitive overhead of decoding seems to be part of the story.

We would expect a similar effect in people who read, as writing is also highly standardized. However, I've generally seen silent reading speeds for English estimated at around 250 WPM, so getting up to 800 WPM puts you well into speed-reading territory.

The relatively high structure of code and rote emails probably helps too.

  • ljf 3 days ago

    I've been using Optimal Recognition Point (ORP) style apps (I use Balto Speed Reader) to comfortably read at 800 WPM or more. For non-fiction books and materials I can keep this up for quite some time, though for fiction I find it too fast to parse all the characters and tones of voice (https://www.researchgate.net/figure/Spritz-display-The-Optim... )

    • giantrobot 3 days ago

      Holy shit, getting to that paper was such a hassle. I had to jump through so many hoops just to download a PDF. Not your fault at all, but anti-bot stuff and general enshittification are ruining the web for actual people.

  • Neurrone 3 days ago

    I was referring to how synthetic speech always talks in exactly the same way, down to inflection and pauses, whenever it encounters the same phrase, which isn't how people talk. That consistency helps a lot with comprehension. The structure of the content does help as well.

  • poulpy123 2 days ago

    The fact that the rate of information is consistent across languages makes a speedup like the one described in the article seem unlikely.

Fokamul 4 days ago

Big respect to you, I cannot imagine being functional without sight. Pretty interesting post.

"I use my computer during natural pauses in the conversation or presentation."

Casually dropping that everyone speaks so slowly that you must use the time between sentences for something meaningful, pretty funny. :-)

  • Neurrone 3 days ago

    > Casually dropping that everyone speaks so slowly that you must use the time between sentences for something meaningful, pretty funny. :-)

    Didn't mean it that way. But that is truly the only way I can use the computer while in a meeting.

    • joseda-hg 3 days ago

      If I'm listening to a podcast or a video at 2x or 3x, coming back to normal speed feels glacial by comparison.

      If you're used to 800 WPM bursts, I'd assume a normal speaking pace will feel slow any way you cut it.

      On a slightly related note, have you felt this affecting your speech patterns?

      • Neurrone 3 days ago

        > If you're used to 800 WPM bursts, I'd assume a normal speaking pace will feel slow any way you cut it.

        Edit: if you're referring to videos or podcasts, then yes, assuming the objective is to get information as quickly as possible.

        Actually that isn't really the case. That might happen if you're asking someone to read something to you for an extended period of time, but that's not how normal conversations happen.

        > On a slightly related note, have you felt this affecting your speech patterns?

        Nope.

Neurrone 3 days ago

Author here, surprised this somehow got onto HN since I only posted on Mastodon.

Happy to answer any questions.

  • c6401 3 days ago

    Really liked the article.

    The interesting part for me was that you can recognize synthetic voices much faster than human speech. Is there a specific voice you use for 800 WPM, or can it be any TTS? Also, I think older voices sound more robotic than the newer ones (I mean pre-AI; the default on Android counts as newer for me). Is there a difference in how fast you can listen to the newer, nicer-sounding ones versus the older, more robotic ones?

    • Neurrone 3 days ago

      > Is there a difference in how fast you can listen to the newer, nicer-sounding ones versus the older, more robotic ones?

      Yes. The main requirements for the TTS I use are that it must be intelligible at very high speeds and that it must have no perceivable latency (i.e., the time it takes to convert a string of text into audio). This rules out almost all voices, since a lot of them focus on sounding as human as possible, which comes at the expense of intelligibility at high rates. The newer voices also usually don't have low latency.

      > Is there a specific voice you use for 800 WPM, or can it be any TTS?

      I'm using ETI Eloquence. If I switched to another voice capable of remaining intelligible at speed, like eSpeak, I would have to slow down because I'm not used to it, and I'd have to train myself to get back to the speeds I'm used to.

      • c6401 2 days ago

        Thank you for the answers. Even though I'm not new to TTS usage, this feels a bit like cyberpunk to me, like a neural interface that can provide information as fast as you can consume it, not just as fast as your "ears" can recognize it. Like a human modem.

  • Vinnl 3 days ago

    Great post. When you're reading a Mermaid diagram, do you just happen to have memorised that "dash dash greater than" means "arrow"? I assume the screen reader doesn't understand ASCII art.

    And how painful is reading emails? HTML email is notoriously limited compared to HTML (and CSS) in the browser, but it's pretty hard to add structure to a plain text email too. How annoying is it when I do so using e.g. a "line" made out of repeated dashes?

    • Neurrone 3 days ago

      Oh boy, don't get me started on emails. HTML emails are such a pain because of the hacks needed to get them to render properly across multiple devices. I end up hearing a lot about the tables being used purely for layout, which is a pain because those tables are not semantically meaningful at all. And then there are emails that are just one or more images.

      For a line of dashes like "-------", most screen readers can recognize repeating characters, so that string gets read for me as "7 dash". If using an <hr> element, then there is no ambiguity about what it means.
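
      For what it's worth, the authoring-side fixes are simple, though support varies across email clients. A markup-only sketch:

          <!-- Mark a layout-only table as presentational so screen readers
               don't announce rows, columns and cells that carry no meaning. -->
          <table role="presentation">
            <tr>
              <td>Actual email content goes here.</td>
            </tr>
          </table>

          <!-- A real separator gets announced as one, instead of "7 dash". -->
          <hr>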

      • cess11 3 days ago

        Users of the email client mutt have a similar problem: it doesn't render HTML and CSS and displays it as text, so they've developed a variety of workarounds, like piping the email body through a terminal web browser before showing it in mutt.

        Might work for you too.

        Edit: Also, do you MUD?

        • Neurrone 3 days ago

          Oh yes. It was one of my formative childhood experiences. My first MUD was Alter Aeon, but I haven't played in almost 10 years. I enjoyed myself during the 5 years or so that I played and got to know a lot of people. The first thing I ever programmed was a bot to automatically heal group members.

          Then Empire Mud, but I left due to disagreements with the admin. I loved the concept but it didn't really have the playerbase to support it.

          More recently, I was on Procedural Realms. But I was affected by 3 separate instances of data corruption / loss, the last of which resulted in an unplanned pwipe since there were no offsite backups and the drive on the server failed. Years of progress gone due to lack of backups, so I'm never going back.

          Ever since, I've been trying to find something else. Perhaps I'm just getting older but I don't have the patience to grind that I once had, which rules out most hack and slash muds. These days, I prefer something with interesting quests, places to explore and mechanics.

          What muds do you play?

          • cess11 2 days ago

            Neat. I mostly play Discworld MUD, which isn't very often due to small kids these days. It's a good all-rounder, has both fine grind and massive amounts of quests, exploration and crafting. Over the years I've become friends with many screen reader users there, and some of them were the fastest hunting group leaders I've seen.

            http://discworld.starturtle.net/lpc/

            • Neurrone 2 days ago

              I tried it briefly but bounced off after the tutorial finished; I couldn't figure out what to do.

              Is reading the books required for enjoyment? I haven't read anything from the Discworld series.

              • cess11 2 days ago

                After the tutorial, if you choose morporkian as language and Ankh-Morpork as starting location you'll be put in one of the busiest places in the world, outside a bar. Either outside or inside you'll find people who can help you get started. The 'say' command says something to the entire room, and 'tell username message' sends them a private message.

                There's also a newbie group chat where you can ask for help; the syntax is 'newbie' followed by your message. It'll go away once you get too many levels in your skills.

                A drawback with Ankh-Morpork is that it has cops, they might interfere if you decide to attack something that isn't a rat or cockroach or somesuch, but if you get caught and put in jail you'll eventually be released. Getting killed is a bit worse, you either waste your experience points by getting a raise from an NPC, or send a message to a particular type of priest that can resurrect you.

  • dcre 3 days ago

    Great article. I was of course surprised that it's possible to learn to understand the super-fast TTS, since videos and podcasts start to get very tough to follow around 2.5x and higher. I've been wondering: surely better algorithms for generating high-speed speech are possible, especially as we have more and more compute to throw at it. It's not easy to search for, since "speed" for most tools is about speed of generation rather than WPM. As normal-speed neural net TTS models get incredibly good, I am hoping to see more attention paid to the high-speed use case.

  • xiande04 3 days ago

    Thanks for the blog post!

    I was wondering what TTS voices you use. I've heard from other blind people that they tend to prefer the classic, robotic voices rather than modern ML-enhanced voices. Is that true in your experience, too?

    • noahjk 3 days ago

      That was my initial thought, too - "I bet they can use a nicer voice now!"

      Sounds like the robotic voice is more important than we give it credit for, though - from the article's "Do You Really Understand What It’s Saying?" section:

      > Unlike human speech, a screen reader’s synthetic voice reads a word in the same way every time. This makes it possible to get used to how it speaks. With years of practice, comprehension becomes automatic. This is just like learning a new language.

      When I listened to the voice sample in that section of the article, it sounded very choppy, almost like every phoneme wasn't captured. Now, maybe they (the phonemes) are all captured, or maybe they actually aren't - but the fact that the sound per word is _exactly_ the same, every time, possibly means that each sound is a precise substitute for the 'full' or 'slow' word, meaning that any variation introduced by a "natural" voice could actually make the 8x speech unintelligible.

      Hope the author can shed a bit of light, it's so neat! I remember ~20 years ago the Sidekick (or a similar phone) seemed to be popular in blind communities because it also had settings to significantly speed up TTS, which someone let me listen to once, and it sounded just as foreign as the recording in TFA.

      • xiande04 3 days ago

        Yeah, that bit about each phoneme sounding exactly the same every time really made a lot of sense. Even if the TTS phoneme sounds nothing like a human would say it, once you've heard it enough times, you just memorize it.

        I guess sounding "natural" really just amounts to adding variation across the sentence, which destroys phoneme-level accuracy.

      • Neurrone 3 days ago

        > When I listened to the voice sample in that section of the article, it sounds very choppy and almost like every phoneme isn't captured.

        Every syllable is being captured, just sped up so that the pauses between them are much smaller than usual.

dherikb 3 days ago

Very good article.

After reading about the poor state of Linux accessibility tools today (https://fireborn.mataroa.blog/blog/i-want-to-love-linux-it-d...), I was wondering: if developers started to use these accessibility tools to improve their reading speed (and productivity as well), it could also help prioritize accessibility features and bug fixes in GNOME, KDE, Qt, etc.

ray__ 4 days ago

This is really interesting. I wonder: would it be possible to listen to an audiobook or PDF at 800 WPM once one learns how to understand the screen reader "language"? Presumably the cognitive load would get heavy if the content were a stream of unstructured prose as opposed to code.

  • Neurrone 3 days ago

    Yes, that is how I usually consume my content. Cognitive load is actually lower for unstructured prose compared to code; think about fiction, for example. Code is much denser.

    When I read to relax, it is for enjoyment, so I don't aim to read as fast as possible. This is why I still listen to human narrated audiobooks, since a good narrator adds to the experience.

supriyo-biswas 4 days ago

I briefly worked with a developer who would produce code at an extremely high speed (this was before LLMs), and I've observed them write 50% of the code for two projects in a matter of days.

While I never got around to asking them how they coded so fast, this was probably one of the tools in their arsenal.

  • Cthulhu_ 3 days ago

    I can imagine it's practice and self-discipline; pick and know a language/tool instead of reinventing the wheel or doing research first, write like you've always written, do new projects in that same toolkit multiple times, and (self-discipline) just get on with it.

    I struggle with that, tbh; every time I'm on a new project I get into a "beginner's mindset" and look up the basics for a tool again instead of trusting myself that I know enough and that what I write will be good enough.

  • elevaet 4 days ago

    FYI in case you didn't read the article - it's not about LLM coding but about a blind software developer who uses a screen reader at 800 WPM to read code. It's really astounding to hear how fast that is; I recommend checking it out!

    • supriyo-biswas 4 days ago

      I understand it's about LLM coding, but if I don't put that disclaimer there'll be tons of comments telling me how programming skills are obsolete in the LLM age.

      • brna-2 4 days ago

        Please read again.

narmiouh 3 days ago

This gave me a new perspective, both on what's possible with our ears/cognition and on what our eyes do for us (for example, seeing the extra quote so easily). I appreciate this very much, and thank you for what you do, Neurrone!

  • Neurrone 3 days ago

    Thanks for reading :)

spyrja 3 days ago

For anyone wanting to try this out on YouTube, select your video, open the Web Developer Tools with Ctrl+Shift+I, then type this (followed by a carriage return):

  document.getElementsByClassName("video-stream html5-main-video")[0].playbackRate = 5; // Or whatever speed you choose
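
A shorter variation (just a sketch; it assumes the first <video> element on the page is the main player) can be saved as a bookmarklet, so you don't have to open the developer tools each time:

  javascript:void(document.querySelector("video").playbackRate = 5); // pick your speed
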
mtlynch 3 days ago

I'm getting intense deja vu reading this. I remember a very similar article from 3-5 years ago. Even the title feels like I've seen it before. I thought this was a repost of that one, but I remember the previous post had a photo of the author's desk with a keyboard, desktop, and no monitor.

I tried searching Algolia, but I can't find it.

markasoftware 4 days ago

OK, so you can understand the words at 800 WPM... but can you really comprehend what's being said? When I listen to YouTube videos at 2x speed I can usually pick apart all the words just fine, but I often have to slow it down to properly process the meaning behind those words.

  • klabb3 3 days ago

    I think it’s about different modes of operation. If I’m skimming an academic text, it’s not that I’m reading faster; I’m just jumping through the text to find interesting keywords and sentences, a way to pick the raisins out of the cookie.

    When listening to a podcast or reading a book, the pleasure is important, so I almost never speed up. If the pacing (or information density) is too low, I just don’t listen. On the contrary, I’ve read books that are so dense I have to slow down and repeat. Those can be great works.

    One mode is for navigation, the other is for embedding the brain in the story, the knowledge, or whatever it is. It’s like looking out the window on a train; I’m not thinking “wow, I have seen these cows for 1.3 seconds, what a waste of time, I could have processed them in less than a second”.

    Not to speak for the blind, but I assume there is an enormous need for navigating and structuring information from a linear medium into whatever the representation in our brains is.

  • photios 4 days ago

    It's a matter of practice and progression. I listen to podcasts and audiobooks in English at 3x speed, but it took me maybe 2-3 years to get to this level.

    I understand all casual and technical content just fine. The only thing tripping me up is fiction, where I struggle with character names that I don't know how to spell (my visual memory needs it!). That's an English-only problem though; I don't have any of those issues with content in my native language (Bulgarian).

    • keeeba 4 days ago

      How have you tested your recall in the long and short term? And what were the results?

      • photios 4 days ago

        Gut feeling, of course :)

  • brna-2 4 days ago

    I don't know exactly what top speeds I used, somewhere above 2.5x and under 5x, but I listened to some lectures on YouTube as a background process while doing some work. Whenever I was sure I had missed something while working, I would set the video time back and check whether it was stuff I already knew. I was amazed to find that almost all of it had been retained, and I had the feeling of listening to the same stuff twice.

    These speeds were already incomprehensible to my colleagues, but I had thought there was no point in trying to get used to higher speeds. Seeing the author process these kinds of data flows is just inspiring and amazing; I'll try to get better. But I am not planning on listening to code :D

    • brna-2 4 days ago

      I see my limit is now around 3x, and FYI Lex Fridman is much easier on the ears at 3x than the guests. :D

      • brna-2 4 days ago

        Also helpful: try listening to something for a minute at a higher speed, e.g. 5x, and slow down to 2.5x or 3x after that as a warm-up.

  • smusamashah 3 days ago

    From the article:

        I adjust its speed based on cognitive load. For routine tasks like reading emails, documentation, or familiar code patterns, 800 WPM works perfectly and allows me to process information far faster than one can usually read. I’m not working to understand what the screen reader is saying, so I can focus entirely on processing the meaning of the content. However, I slow down a little when debugging complex logic or working through denser material. At that point, the limiting factor isn’t how fast I can hear the words but how quickly I can understand their meaning.
  • precompute 4 days ago

    I can read at speeds higher than 800 WPM and comprehend everything. It's about practice. Although, yes, simpler material is easier to read at higher speeds.

  • nottorp 3 days ago

    I'm sure being blind stimulates developing your remaining senses a lot.

    Tbh even closing your eyes should.

  • raincole 4 days ago

    Uh, the author is visually impaired, so they really have no choice other than to really comprehend what's being said.

thewileyone 3 days ago

I've watched a young blind coder in person, using a screen reader to build a webpage. I was impressed by his ability despite his disadvantage!

evertedsphere 4 days ago

i've always wondered about the possibility of hybrid visual-audio interfaces for sighted users. a screen reader producing lots of words at the same time that you're looking around the screen—perhaps the visual ui would have to privilege symbols over words, to not force you to perform two language processing tasks at once

imagine the bandwidth

collectedparts 4 days ago

Just FYI OP (assuming OP is the author of the post) there's no margin on your blog. Text goes all the way to the edges.

As a related sidenote, I wonder how quickly ChatGPT replaces many of the customized tools here? ChatGPT is probably pretty proficient at describing the contents of, e.g., a screen share or a screenshot of a website.

  • Neurrone 3 days ago

    Edit: typo

    I didn't post this onto HN, so I only just found out about this thread.

    Thanks for mentioning the margin issue, I've tried fixing it now. Let me know if it's still an issue.

    > I wonder how quickly ChatGPT replaces many of the customized tools here?

    Probably not many. It is prone to hallucinations, and the latency involved in getting a response means that I only use it when I have to.

    • collectedparts 3 days ago

      Awesome, looks great! Thanks for jumping into the thread here.