The SpeakFaster UI

Word completion and next-word prediction based on n-gram LMs24,32 exploit the statistical dependence of a word on a small number of (typically up to four) preceding words. By contrast, LLMs are able to take advantage of broader context, including tens or hundreds of preceding words entered by the user and previous turns of the ongoing conversation. We have previously demonstrated33 that a fine-tuned 64-billion-parameter Google LaMDA25 model can expand abbreviations of the word-initial form (e.g. ‘ishpitb’) into full phrases (e.g. ‘I saw him play in the bedroom’, Fig. 1) at a top-5 exact-match accuracy of 48–77% when provided with conversational context, i.e. previous dialogue turn(s). Failures to find exact matches tend to occur on longer and more complex phrases. While promising, a practical solution needs to ensure that the user is able to type any arbitrary phrase in subsequent attempts in case of a failure in the initial abbreviation expansion (AE), i.e. the user will never run into a ‘dead end’ in the UI.

Fig. 1: The primary interaction pathway of abbreviated text entry in the SpeakFaster UI: the initials-only pathway. The KeywordAE LLM serves this UI pathway, which requires the user to enter only the word-initial letters of the intended phrase. The ‘Speaker’ button to the right of each candidate phrase (e.g. Label 4) allows the user to select and speak out the correct phrase via text-to-speech. Gaze-clicking the ‘Expand’ button (Label 2) is optional since calls to the LLM are triggered automatically following the eye-gaze keystrokes. The gaze-driven on-screen keyboard is omitted from the screenshots in this figure (also Figs. 2 and 3).

We therefore developed a UI and two underlying fine-tuned LLMs as a complete, practical solution. LLM ‘KeywordAE’ is capable of expanding abbreviations that mix initials with words that are fully or incompletely spelled out (Fig. 2).
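As a concrete illustration of the word-initial abbreviation scheme (a minimal sketch; the helper name is hypothetical and not SpeakFaster's actual code), an initials-only abbreviation can be derived from a phrase as follows:

```python
def initials_abbreviation(phrase: str) -> str:
    """Return the word-initial ('initials-only') abbreviation of a phrase.

    Hypothetical helper for illustration: lowercases the phrase and keeps
    the first letter of each word, as in 'ishpitb' for
    'I saw him play in the bedroom'.
    """
    words = phrase.lower().split()
    return "".join(w[0] for w in words)

print(initials_abbreviation("I saw him play in the bedroom"))  # ishpitb
```

The AE task is the inverse, much harder mapping: recovering the full phrase from such an abbreviation plus conversational context.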
The KeywordAE model is also capable of expanding initials-only abbreviations, and hence provides a superset of the capabilities of the fine-tuned LLM in ref. 33. LLM ‘FillMask’ is capable of providing alternative words that begin with a given initial letter in the context of surrounding words (Fig. 3). The two models were each fine-tuned with ~1.8 million unique triplets of {context, abbreviation, fullphrase} synthesised from four public datasets of dialogues in English. Here, ‘context’ refers to the previous turns of the ongoing dialogue, including the turns authored by the user and the conversation partner (see Supplementary Section 1.1 for details of LLM fine-tuning and evaluation).

Fig. 2: The Keyword Abbreviation Expansion (KeywordAE) UI pathway. This is an extension of initials-only AE (Fig. 1) that allows words spelled out fully or partially to be mixed with initials in the original abbreviation, in an order that matches the intended phrase. We refer to such fully- or partially-spelled words as ‘keywords’. They guide the system towards the intended phrase. Label 3 (‘bedr’ for ‘bedroom’) illustrates the support for partially-spelled keywords in KeywordAE v2; in KeywordAE v1, only fully-spelled keywords (e.g. ‘bedroom’) are supported.

Fig. 3: The FillMask UI pathway. This is an additional interaction flow that allows users to recover from failure to find the full phrase from the initials-only abbreviation. The FillMask UI allows the user to find sensible replacements for an incorrect word that starts with the same letter.

To form conduits to the fine-tuned LLMs, we designed a UI with three pathways, namely initials-only AE, KeywordAE and FillMask, to support a complete abbreviated text input experience (see Fig. 1). The initials-only pathway is the common starting point of all phrase-entry workflows in the SpeakFaster UI.
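The {context, abbreviation, fullphrase} triplets used for fine-tuning can be illustrated with a sketch like the following (the function and field names are assumptions for illustration; the actual data pipeline is described in Supplementary Section 1.1):

```python
def make_triplet(dialogue_turns, target_index):
    """Synthesise a {context, abbreviation, fullphrase} training triplet
    from a dialogue, mirroring the data format described in the text:
    'context' holds the previous turns, and the abbreviation is the
    initials-only form of the target turn. (Illustrative sketch only.)
    """
    fullphrase = dialogue_turns[target_index]
    context = dialogue_turns[:target_index]
    abbreviation = "".join(w[0].lower() for w in fullphrase.split())
    return {"context": context,
            "abbreviation": abbreviation,
            "fullphrase": fullphrase}

triplet = make_triplet(
    ["Where is Tommy?", "I saw him play in the bedroom"], target_index=1)
print(triplet["abbreviation"])  # ishpitb
```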
Among the three pathways, it involves the fewest keystrokes and gaze clicks, and alone suffices for short and predictable phrases. The user starts by typing an initialism that represents the intended phrase (Label 1 in Fig. 1). The abbreviation is typed with a conventional soft keyboard or a gaze-driven on-screen keyboard (e.g. Tobii® Computer Control). As the user enters the abbreviation, the UI automatically triggers calls to the KeywordAE LLM after every keystroke, sending the user-typed abbreviation along with all previous turns of the conversation as input to the LLM. Each call returns the top-5 most likely options based on the conversational context and the abbreviation, which are rendered in the UI for the user to peruse and select (Label 3 in Fig. 1). The number of options provided (5) is based on the screen size of gaze tablet devices and on the point of ‘diminishing returns’ for the motor saving rate in offline simulation results (see next section). If one of the candidate phrases matches the intended phrase, the user selects it by clicking the ‘Speaker’ button associated with it (Label 4 in Fig. 1), which dispatches the phrase for text-to-speech output and ends the phrase-entry workflow. The selected phrase (‘I saw him playing in the bedroom’ in this example) becomes part of the conversational context for future turns.

If the intended phrase is not found via the initials-only pathway, however, two alternative UI pathways are available to assist the user in finding the intended phrase. One of the pathways is KeywordAE. The user gaze-clicks the ‘Spell’ button (Label 2 in Fig. 2), which turns the abbreviation in the input bar into gaze-clickable chips, one for each character of the initials-only abbreviation (e.g. bottom of Fig. 2). The user selects a word to spell by gaze-clicking the corresponding chip. This turns the chip into an input box, in which the user types the word by using the on-screen keyboard (Label 3 in Fig. 2).
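The relationship between such a mixed abbreviation and a candidate phrase can be sketched as a simple consistency check (illustrative only: `allow_prefix=True` mimics the partially-spelled keywords of KeywordAE v2, `False` the fully-spelled keywords of v1):

```python
def matches_abbreviation(phrase, tokens, allow_prefix=True):
    """Check whether a candidate phrase is consistent with a KeywordAE-style
    abbreviation, in which each token is either a single initial or a
    (partially) spelled keyword. Sketch for illustration, not the scoring
    actually performed by the fine-tuned LLM.
    """
    words = phrase.lower().split()
    if len(words) != len(tokens):
        return False
    for word, tok in zip(words, tokens):
        tok = tok.lower()
        if len(tok) == 1:
            if word[0] != tok:          # single letter: treat as an initial
                return False
        elif allow_prefix:
            if not word.startswith(tok):  # v2: keyword may be a prefix
                return False
        elif word != tok:               # v1: keyword must be fully spelled
            return False
    return True

# 'bedr' is accepted as a partial keyword under v2:
print(matches_abbreviation("I saw him play in the bedroom",
                           ["i", "s", "h", "p", "i", "t", "bedr"]))  # True
```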
Subsequent calls to the LLM will contain the partially- or fully-spelled words in addition to the initials of the unspelled words. A call to the KeywordAE LLM is triggered automatically after every keystroke. After each call, the UI renders the latest top-5 phrase expansions returned by the KeywordAE LLM (Label 4 in Fig. 2). If the intended phrase is found, the user selects it by a gaze click of the speaker button as described before (Label 5 in Fig. 2). We constructed two versions of KeywordAE models: KeywordAE v1 requires each keyword to be typed in full, while KeywordAE v2 allows a keyword to be typed incompletely (e.g. ‘bedroom’ as ‘bedr’). Simulation results below show that v2 leads to greater keystroke saving than v1.

The KeywordAE UI pathway is not limited to spelling out a single word. The UI allows the user to spell multiple words, which is necessary for longer and more unpredictable phrases. In the unlikely case where the AE LLM predicts none of the words of the sentence correctly, the KeywordAE pathway reduces to spelling out all the words of the phrase.

FillMask is another way to recover from the failure to find the exact intended phrase. Unlike KeywordAE, FillMask only suits the cases in which very few words (typically one word) of an expansion are incorrect (i.e. the phrase is a ‘near miss’). For instance, the candidate phrase ‘I saw him play in the backyard’ missed the intended phrase ‘I saw him play in the bedroom’ by only one incorrect word (‘backyard’, Label 2 in Fig. 3). The user clicks the near-miss phrase, which causes the words of the phrase to appear as chips in the input bar. Clicking the chip that corresponds to the incorrect word (‘backyard’, Label 3 in Fig. 3) triggers a call to the FillMask LLM, the response from which contains alternative words that start with the same initial letter and fit the context formed by the other words of the sentence and by the previous turn(s) of the conversation.
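A FillMask request of this kind can be sketched as follows (the field names and the `_` mask token are assumptions for illustration, not the actual LLM API):

```python
def fillmask_query(phrase, bad_index):
    """Build an illustrative FillMask request: the phrase with the incorrect
    word masked out, plus the initial-letter constraint described above.
    The previous conversation turns would also be sent as context."""
    words = phrase.split()
    initial = words[bad_index][0].lower()
    masked = words[:bad_index] + ["_"] + words[bad_index + 1:]
    return {"masked_phrase": " ".join(masked), "initial_letter": initial}

q = fillmask_query("I saw him play in the backyard", bad_index=6)
print(q["masked_phrase"])   # I saw him play in the _
print(q["initial_letter"])  # b
```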
The user selects the correct word (Label 5) by clicking it and then clicking the speaker button to finalise the phrase entry (Label 6). In addition to the single-word interaction shown in Fig. 3, the FillMask pathway allows the UI to replace multiple words (or to perform replacement multiple times in a given word slot) after the initial replacement. In rare cases where the FillMask LLM fails to provide the intended word, the user can fall back to typing the correct word in the input box by using the eye-gaze keyboard.

As shown above, KeywordAE and FillMask are two alternative interaction modes for recovering from a failure to obtain the intended phrase via the initials-only pathway. The user’s decision on which pathway to choose should be determined by whether a near-miss option exists. This proposition is supported by simulation results in the next section. In the current study, the SpeakFaster UI allows the user to use the FillMask mode after using the KeywordAE mode, which is useful for finding the correct words in hard-to-predict phrases. However, entering the KeywordAE mode is not allowed after using the FillMask mode, because FillMask should be used only during the final phase of a phrase-entry workflow, where all but one or two words of a candidate phrase are correct. These heuristics and design considerations of the UI pathways were made clear to the users through initial training and practice at the beginning of the user studies described below.

The SpeakFaster UI is only one of many possible UI designs for supporting AAC text entry with LLMs19,20.
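The near-miss heuristic described above, i.e. preferring FillMask when exactly one word of the best candidate is wrong and KeywordAE otherwise, can be sketched as:

```python
def incorrect_word_count(candidate, intended):
    """Count position-wise word mismatches between a candidate expansion and
    the intended phrase (a simple proxy for 'near miss'; sketch only)."""
    c, t = candidate.lower().split(), intended.lower().split()
    if len(c) != len(t):
        return max(len(c), len(t))  # length mismatch: not a near miss
    return sum(1 for a, b in zip(c, t) if a != b)

def choose_pathway(candidate, intended):
    """Heuristic from the text: select the phrase if it is fully correct,
    use FillMask when exactly one word is wrong, else fall back to KeywordAE."""
    n = incorrect_word_count(candidate, intended)
    return "select" if n == 0 else ("FillMask" if n == 1 else "KeywordAE")

print(choose_pathway("I saw him play in the backyard",
                     "I saw him play in the bedroom"))  # FillMask
```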
Its justification comes from prior studies of LLMs’ capabilities in expanding abbreviations33, its consistency with the conventional lookup-based AE in AAC30 and the empirical results from the user studies reported below.

Simulation results

To measure the approximate upper bound of the motor-action savings in our text-entry UI, we ran simulations on the test split of a corrected version of the Turk Dialogues corpus (TDC)33,34. To simulate an ideal user’s actions in the SpeakFaster UI when entering the text for a dialogue turn, we first invoked AE without keywords (Fig. 1). If the matching phrase was found, the phrase was selected and the simulation ended. If no matching phrase was found, however, we tested three interaction strategies. The first strategy (Strategy 1) invoked KeywordAE (Fig. 2) iteratively, spelling out more of the words, until the matching phrase was found. The second strategy (Strategy 2) was identical to Strategy 1, except that FillMask (Fig. 3) was used in lieu of KeywordAE whenever only a single incorrect word remained in the best-matching phrase candidate. The flowcharts for Strategies 1 and 2 are shown in Fig. 4A, B, respectively. The third strategy, referred to as Strategy 2A, was a variant of Strategy 2. It utilised FillMask more aggressively, i.e. as soon as two or fewer incorrect words remained in the best option. In all three strategies, KeywordAE was invoked incrementally by spelling out more words in the best-matching candidate. This incremental spelling was implemented differently for the two versions of KeywordAE, owing to differences in the abbreviations they support. For KeywordAE v1, which supported only fully-spelled keywords, the simulation spelled out the first incorrect word in the best option; for v2, in which keywords could be partly spelled, the simulation added one additional letter at a time in the first incorrect word.
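The incremental spelling step of the KeywordAE v2 simulation, appending one letter to the token covering the first incorrect word, can be sketched as follows (illustrative only; the real simulator also re-invokes the LLM after each step):

```python
def next_abbreviation_v2(tokens, best_candidate, intended):
    """One incremental KeywordAE v2 simulation step: find the first incorrect
    word in the best-matching candidate and extend the user's token for that
    word by one more letter of the intended word. Sketch of the strategy
    described in the text, not the production simulator."""
    best = best_candidate.lower().split()
    target = intended.lower().split()
    for i, (b, t) in enumerate(zip(best, target)):
        if b != t:
            tokens = list(tokens)
            tokens[i] = target[i][: len(tokens[i]) + 1]
            return tokens
    return tokens  # candidate already matches: nothing to spell

toks = next_abbreviation_v2(["i", "s", "h", "p", "i", "t", "b"],
                            "I saw him play in the backyard",
                            "I saw him play in the bedroom")
print(toks[-1])  # be
```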
To leverage the contextual understanding of the AE and FillMask LLMs, all the previous turns of a dialogue from the TDC corpus were used for expanding abbreviations and finding alternative words, unless otherwise stated.

Fig. 4: Simulation strategies of phrase entry assisted by the Keyword Abbreviation Expansion (KeywordAE) and FillMask LLMs. A Simulation Strategy 1: AE with only initials is followed by KeywordAE if the initials-only AE doesn’t return the desired phrase. KeywordAE in v1 iteratively spells out the first (leftmost) incorrect words in the best-matching phrase option until the desired phrase is found. KeywordAE in v2 iteratively appends letters to the first (leftmost) incorrect word in the best-matching phrase option. B Simulation Strategy 2: same as Strategy 1, except that FillMask is used whenever only one incorrect word remains in the best-matching phrase.

As a baseline for comparison and to emulate the traditional n-gram-based text-prediction paradigm, we also ran simulations on the same corpus with an n-gram LM (Gboard’s finite state transducer35 trained on 164,000 unigrams and 1.3 million n-grams in US English) that supported word completion and prediction, modelling ideal user behaviour that selects the intended word as soon as it becomes available among the top-n word completion or prediction options.

To quantify the number of motor actions, we broadened the definition of keystrokes to include not only keypresses on the keyboard but also the UI actions required to use the phrase- and word-prediction features in SpeakFaster, including entering the KeywordAE mode (a click on the ‘Spell’ button in Fig. 2), specifying a word to spell for KeywordAE, entering the FillMask mode, selecting a word to replace through FillMask, and selecting phrase and word options returned by the LLMs.
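Under this broadened action definition, the keystroke-saving rate can be computed with the common formulation KSR = 1 − K/C (an assumption here; the paper's exact accounting may differ in detail), where K is the number of motor actions and C the character length of the committed text:

```python
def keystroke_saving_rate(n_actions, final_text):
    """Keystroke-saving rate under the broadened action definition described
    above: K counts all motor actions (keystrokes plus UI clicks) and C is
    the character count of the final committed text. Assumes the common
    formulation KSR = 1 - K / C; sketch for illustration."""
    n_chars = len(final_text)
    return 1.0 - n_actions / n_chars

# A 30-character phrase entered with 7 abbreviation keystrokes plus
# 1 option-selection click:
print(round(keystroke_saving_rate(8, "I saw him play in the bedroom."), 3))  # 0.733
```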
Similarly, the number of keystrokes in the Gboard simulation included selections of options from word completion and next-word prediction.

The results of the SpeakFaster simulations indicated a significant saving of motor actions compared to the baseline from Gboard’s forward predictions (Fig. 5A). This held true for both KeywordAE v1 and v2. Under KeywordAE v2, given that SpeakFaster utilised all previous dialogue turns as the context and provided five options at each step (orange bars in Fig. 5A), Strategy 1 and Strategy 2 led to keystroke-saving rate (KSR) values of 0.640 and 0.657, respectively, significantly exceeding the Gboard KSR (0.482). The best KSR from the KeywordAE v2 model also beat the best KSR from KeywordAE v1 (0.647) by a small but noticeable margin, reflecting a benefit of allowing keywords to be partially spelled. The superior KSR of Strategy 2 relative to Strategy 1 indicates a benefit of augmenting KeywordAE with FillMask, which surfaced the correct word options with fewer motor actions required. However, the comparison between Strategy 2 and Strategy 2A shows that FillMask negatively impacts the motor saving rate if used too aggressively. Specifically, premature uses of FillMask, i.e. whenever two incorrect words remained (instead of one incorrect word as in Strategy 2), reduced the KSR from 0.657 to 0.647 (Fig. 5A). In Fig. 5A, the grey bars show that SpeakFaster outperformed the Gboard baseline in KSR even without utilising the previous dialogue turns as context, although the KSR gains were significantly lower than when the context was utilised.

Fig. 5: Simulation results show significant motor savings in the SpeakFaster UI. These results are based on the simulation strategies presented in Fig. 4, utilising full or no conversation context as shown. A Keystroke-saving rates (KSRs), compared with a forward-prediction baseline from Gboard (blue bar).
The orange bars show the KSRs when conversational context is utilised, while the grey bars show the KSRs without conversational context. All data in this plot are based on 5-best options from the KeywordAE and FillMask LLMs. B The KSRs from Strategy 2 as a function of the number of LLM options, in comparison with Gboard forward prediction. C The fraction of dialogue turns successfully entered with a single initials-only AE call, as a function of the number of options provided and the availability of conversational context. Each data point in this figure is the average from a single simulation run on the dialogue turns for which the sentence length was 10 or shorter (counting words and mid-sentence punctuation) from the 280 dialogues in the test split of the Turk Dialogues (Corrected) dataset33,34.

The results in Fig. 5A are all based on providing 5-best phrase options from the KeywordAE and FillMask LLMs. To illustrate the effect of varying the number of LLM options, Fig. 5B plots the KSRs against the number of LLM options. Similar to the trend from Gboard, KSRs in SpeakFaster increased monotonically with the number of options, but started to level off at approximately five, which forms the basis for our UI design decision of including 5-best options (Fig. 1). Fig. 5C shows that when conversational context was made available to the KeywordAE LLM (either v1 or v2), approximately two-thirds of the dialogue turns in the test corpus could be found with only the initials-only AE call (i.e. a single LLM call). This fraction was approximately halved when the conversational context was unavailable, which again highlights the importance of conversational context to the predictive power of the LLMs.

The simulation results above show that the theoretical motor-action saving afforded by context-aware AE and FillMask surpassed that of traditional forward prediction by 30–40% (relative). This result builds on the previous AE LLM in ref.
33 and goes a step further by supporting abbreviations that include spelled words (KeywordAE) and suggesting alternative words (FillMask), which removes ‘dead ends’ and thus allows any arbitrary phrase to be entered. However, as shown by prior studies15,24,32,36, the predictive power of motor-action-saving features in text-entry UIs is often offset by the added visual and cognitive burden involved in using these features, besides human errors such as misclicks and misspellings. Our system additionally involved network latencies due to calls to LLMs running in the cloud (see Supplementary Section 1.3). Therefore, the practical performance of the LLM-powered AE-based text-entry paradigm in SpeakFaster must be tested with empirical user studies. To this end, we conducted a controlled lab study on a group of users typing manually on a non-AAC mobile device as a pilot study of the novel text-entry paradigm, followed by lab and field studies on two eye-gaze typing users with ALS.

User study overview

We tested the SpeakFaster text-entry UI with two groups of users. First, a group of non-AAC touch-typing users typed with their hands on a mobile touch-screen device running SpeakFaster powered by the KeywordAE v1 and FillMask LLMs. In a separate AAC eye-gaze user study, users with ALS who were experienced eye-gaze typists entered text by using eye trackers integrated with SpeakFaster. In all our studies, participants only viewed the partner turns of the conversations and typed their own conversation turns. Except in the AAC eye-gaze field user study, where users participated in conversations in their natural environment, all partner conversation turns for the other AAC and non-AAC user studies were presented as text to the user. The non-AAC user study served as a pilot for the AAC eye-gaze user study by demonstrating the learnability and practicality of the LLM-based text-entry UI.
A common goal of the two studies was to understand the cognitive and temporal cost introduced by the SpeakFaster UI and how that affects the overall text-entry rate compared to a conventional baseline. To study this under different levels of spontaneity and authoring task load, our study protocol consisted of both a scripted phase and an unscripted phase.

The scripted phase consisted of 10 dialogues from the test split of the TDC corpus33,34. Each dialogue is a two-person conversation with a total of six turns, three uttered by each person. Our user study participant played the role of one of the persons in the conversation, and the to-be-entered text was displayed to them close to their typing area. In the unscripted phase, the user engaged in a set of five six-turn, text-based dialogues with the experimenter, in which only the starting question was predetermined and the rest was spontaneous, with the constraint that the user keep each conversation turn under ten words and not include any personal information. The experimenter, who entered text on a custom mobile app connected with the study participant’s device, started with an open-ended question such as ‘What kind of music do you listen to?’ (see Supplementary Section 1.4 for the full list), and then the user would reply. The experimenter and user then followed up in alternating turns, until a total of six turns was reached for each dialogue. In both the scripted and unscripted phases, for each block of five dialogues, the first and last dialogues formed the baseline, in which the user typed with the regular keyboard, i.e. either Gboard in the non-AAC user study or the Tobii eye-gaze keyboard in the AAC eye-gaze user study, utilising the word suggestions provided by the default keyboard at will.
In the other three dialogues, the user entered text by using the SpeakFaster UI, starting out with the initials-only abbreviation scheme and using any of the pathways to either spell out words (KeywordAE) or select replacements (FillMask) as per their preference until they were satisfied.

Prior to the data-collection portion of the lab study, each user watched a video demonstration of the SpeakFaster system and took a short practice run to familiarise themselves with the SpeakFaster UI and the AE text-entry paradigm. Each non-AAC user study participant completed a minimum of five practice dialogues. The content of these practice dialogues was different from the ones used in the subsequent study blocks. Prior to the unscripted phase, the users were also familiarised with the unscripted test condition by practicing a single dialogue. The eye-gaze user practiced for 4 h over 2 days, as described further in Supplementary Section 1.4.

Non-AAC users’ SpeakFaster text-entry rates are similar to baseline

In order to study abbreviated text entry under different degrees of motor cost, the 19 participants (10 female, 9 male, all adults) who provided informed consent were randomly assigned to two typing-posture groups in the non-AAC user study. Nine users were assigned to the one-finger group and instructed to type with only the index finger of their dominant hand (the right hand for all these users). The remaining ten users were assigned to the no-constraint group and were given no limitation related to typing posture. They all operated with both hands during the experiment, with variation in which fingers or thumbs were used.

In the scripted portion of the user study, no significant difference was observed in the accuracy of text entry between SpeakFaster and the Gboard baseline. The average word error rates (WERs) of the one-finger group were 1.55% and 2.53% under the baseline and SpeakFaster conditions, respectively.
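Word error rate as reported here is conventionally the word-level Levenshtein distance between the stimulus text and the committed text, normalised by the reference length; a minimal sketch:

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: word-level Levenshtein (edit) distance between the
    reference (stimulus) text and the committed text, divided by the
    reference length. Standard definition; sketch for illustration."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[-1][-1] / len(ref)

print(word_error_rate("I saw him play in the bedroom",
                      "I saw him play in the backyard"))  # 1/7, one word wrong
```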
For the no-constraint group, the respective average WERs were 3.96% and 2.89%. A two-way linear mixed model (Posture × UI) on the WERs showed no significant main effect of Posture (z = −1.758, p = 0.079) or UI (z = 0.079, p = 0.250), nor was there a significant Posture × UI interaction in WER (z = 1.516, p = 0.129).

The effect of the LLM-powered SpeakFaster UI on the text-entry rate showed an intricate mixed pattern. While the text-entry rate increased on average under the SpeakFaster UI relative to the baseline (Gboard) UI during the scripted dialogues, the average rate decreased when the users engaged in unscripted ones. Analysis by a linear mixed model did not reveal a significant main effect of UI (z = 0.141, p = 0.888). However, a significant two-way interaction was found between UI and DialogType (z = −2.933, p = 0.003). A post hoc paired t-test (two-tailed, same below) confirmed a significant difference in the SpeakFaster-induced changes in the text-entry rate (relative to baseline) between the scripted and unscripted dialogues (t18 = −4.85, p < 0.001, Cohen’s d = 1.11, 95% CI of difference, same below: −6.00 ± 2.53 WPM). Specifically, while SpeakFaster increased the average rate by 2.510 ± 3.024 WPM (95% CI of mean; relative: 13.0% ± 24.5%) in the scripted dialogues, it decreased the average rate by 3.494 ± 3.294 WPM (relative: 10.2% ± 25.0%) in the unscripted ones. The three-way linear mixed model did not reveal any other significant main effects or interactions.

Significant motor savings under SpeakFaster

While the effect of SpeakFaster on the text-entry rate exhibited a complex pattern of interactions, with an absence of overall significant change from the baseline, the KSR was affected in a clear-cut and pronounced manner by SpeakFaster (Fig. 6B). The same three-way linear mixed model, applied with the KSR as the dependent variable, revealed a significant main effect of UI (z = 10.317, p < 0.001).
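The post hoc comparisons above reduce to a standard two-tailed paired t-test on per-participant differences; a stdlib-only sketch with made-up numbers (not the study's data):

```python
import math
from statistics import mean, stdev

def paired_t(baseline, treatment):
    """Paired t statistic and degrees of freedom for per-participant values
    (e.g. KSR or WPM) under two conditions, as used in the post hoc tests.
    Sketch only; the study used dedicated statistical software."""
    diffs = [b2 - b1 for b1, b2 in zip(baseline, treatment)]
    n = len(diffs)
    t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t_stat, n - 1

# Hypothetical per-user KSRs under the baseline vs SpeakFaster:
t_stat, df = paired_t([0.10, 0.15, 0.12, 0.08], [0.62, 0.70, 0.58, 0.66])
print(df)  # 3
```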
The linear mixed model revealed no other significant main effects or interactions. Relative to the Gboard baseline, the SpeakFaster UI paradigm led to a large and significant increase in KSR for both the scripted dialogues (+0.564 ± 0.080 abs., t18 = 13.44, p < 0.001) and the unscripted ones (+0.450 ± 0.114 abs., t18 = 7.556, p < 0.001).

Fig. 6: Non-AAC user study results show improved keystroke-saving rate (KSR) with mixed changes in text-entry rate. The results shown are the text-entry rate (A), the KSR (B), the percentage of sentences that involved only a single AE server call (C) and the percentage of FillMask responses that contained options selected by users (D) from the non-AAC user study. In each plot, the two groups of bars correspond to the two posture groups (OneFinger and NoConstraint), as labelled at the top. The error bars in these plots show 95% confidence intervals (CIs) of the mean. The numbers of subjects are 9 and 10 for the one-finger and no-constraint groups, respectively. This user study was conducted once. Overall, the text-entry speed (WPM) using SpeakFaster was not significantly different from the baseline despite the introduction of significant keystroke savings.

Panel C of Fig. 6 shows the percentage of dialogue turns in which the user successfully entered the sentence by using only the initials-only AE call, i.e. without spelling out words in the abbreviation or using FillMask. As the orange bars show, the percentages were on par with the results from the simulation on the scripted dialogues (cf. Fig. 5C). The percentages of sentences that succeeded with a single AE call were lower for the unscripted dialogues (magenta bars, 65% on average), reflecting the slight domain mismatch between the unscripted text content and the scripted material that the AE and FillMask models were trained on.

Simulation accurately predicts users’ keystroke savings

The KSR values observed from the users in the lab study could be predicted with high accuracy by the simulation results.
The blue dots in the top panel of Fig. 7 show a significant positive correlation between the average KSR values from all users and the simulated ones on a turn-by-turn basis among the scripted dialogues (Pearson’s correlation: R158 = 0.905, p < 0.001). The unscripted dialogues (orange dots) also exhibited a significant correlation between the simulated and observed KSRs (R158 = 0.636, p < 0.001). However, the users’ average performance did not fully realise the motor-saving potential predicted by the offline simulations, as most data points in Fig. 7 fall below the line of equality (the solid diagonal line), potentially reflecting human errors such as typos and mis-operations, as well as the actions needed to recover from them. The degree to which the user behaviour underperformed the simulation results was greater during the unscripted dialogues than the scripted ones. For example, several unscripted dialogue turns showed negative KSRs, despite the fact that the offline simulation predicted positive KSRs based on the final committed sentences. This likely reflects greater revisions due to human errors and changes of mind when the users operated under the dual cognitive load of formulating a dialogue response and operating the SpeakFaster UI to enter the response.

Fig. 7: User behaviour in SpeakFaster is well predicted by simulation results. The top panel shows correlations between the average KSRs observed in the users (y-axis) and those from simulation (x-axis), on a turn-by-turn basis. The blue and orange dots correspond to the scripted and unscripted dialogue turns, respectively. The simulation results are based on KeywordAE model v1, Strategy 2, five predictions and using all previous turns of the ongoing dialogue as the context. Each dot corresponds to a unique turn from one of the dialogues used as the text materials in the lab study. The numbers of unique scripted and unscripted dialogue turns are 60 and 160, respectively.
The bottom panel shows a comparison of the simulated KSR values from the scripted (blue) and unscripted (orange) dialogue turns. The KSRs from the scripted turns were significantly higher than those from the unscripted ones (unpaired t-test: t218 = −3.656, p < 0.001).

The bottom panel of Fig. 7 shows that the simulated KSR values were significantly lower for the unscripted dialogues than the scripted dialogues (mean: 0.527 vs. 0.650, −18.99% relative, unpaired t-test: t218 = −3.656, p < 0.001, Cohen’s d = 0.553, 95% CI of difference: −0.124 ± 0.067). This was likely due to a domain mismatch between the TDC dataset and the unscripted content composed by the users during the lab study. However, the fact that SpeakFaster significantly boosted the KSR even for the unscripted dialogues (Fig. 6) underlines the robustness of the motor saving against domain shifts.

Temporal aspects of user interactions in SpeakFaster

Figure 8 shows the temporal aspects of how the non-AAC users interacted with the SpeakFaster UI. An inter-keystroke interval (IKI) is defined as the time interval between two consecutive keystrokes issued directly by the user on the soft keyboard (Gboard). IKIs exclude non-keystroke actions and the keystrokes that are applied automatically when the user selects options for word completion and prediction. The IKIs were significantly longer under the SpeakFaster UI than under the baseline UI, indicating that it was slower on average for the users to plan and perform keystrokes when typing the characters of the abbreviations and when spelling out words than when typing words in the familiar, sequential fashion (main effect of UI: z = 3.671, p < 0.001). The linear mixed model also identified a significant UI × DialogType interaction (z = 2.303, p = 0.021). A post hoc t-test confirmed that the increase in the IKI was greater in the unscripted dialogues (+381.1 ms abs., 87.1% rel.) than in the scripted ones (+194.8 ms abs.
51.7% rel., t18 = 3.066, p = 0.0067, Cohen’s d = 0.703, 95% CI of difference: 186.366 ± 124.2 ms). Similar to the observations related to KSRs above, this differential effect of SpeakFaster on the IKI increase may be interpreted as the greater cognitive load under the dual task of composing a free-form dialogue response and abbreviating the response in SpeakFaster’s abbreviation regime.

Fig. 8: Evaluating and selecting LLM-provided options consumes a large fraction of the time during SpeakFaster UI usage. Temporal analyses of text entry in the SpeakFaster UI and the Gboard baseline in the non-AAC user study. A The inter-keystroke intervals (IKIs), the response times for selecting AE and FillMask (FM) options returned from the LLMs, and the latencies of calls to cloud-based LLMs. The bars are organised into five groups. The first two groups are from the OneFinger posture group, showing scripted and unscripted dialogues, respectively. Likewise, the third and fourth groups show the results from the NoConstraint posture group. The fifth group shows LLM latencies. B The response time for AE option selection showed a clear increasing relation with the length of the underlying phrase, measured as the number of words, defined as the character length of the phrase divided by 5. The numbers of subjects in the one-finger and no-constraint groups are 9 and 10, respectively. The rightmost two bars in (A) and the bars in (B) are computed on the pooled group of 19 participants.

The dark and light grey bars in Panel A of Fig. 8 show the temporal metrics related to using the LLMs in the AE (including KeywordAE) and FillMask workflows, respectively. Compared with the latencies of the calls to the cloud-based LLMs (the two rightmost bars in Panel A), the users’ average response times in selecting the AE-suggested phrases and FillMask-suggested words were 2–3 times longer.
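The IKI computation defined above, excluding keystrokes that were auto-applied by option selection, can be sketched as follows (the event-log format is an assumption for illustration):

```python
def inter_keystroke_intervals(events):
    """Compute IKIs (ms) from a keystroke log, keeping only keystrokes the
    user issued directly and excluding auto-applied ones, per the IKI
    definition above. Each event is (timestamp_ms, is_manual); this log
    format is hypothetical."""
    manual = [t for t, is_manual in events if is_manual]
    return [b - a for a, b in zip(manual, manual[1:])]

# The third event is an auto-applied keystroke from an option selection,
# so no interval is measured across it as a keystroke of its own:
log = [(0, True), (480, True), (950, True), (1200, False), (1760, True)]
print(inter_keystroke_intervals(log))  # [480, 470, 810]
```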
These response times not only exceeded the LLM latencies but were also 3–6 times longer than the average IKIs, indicating that they were a significant component of the total time it took to use the SpeakFaster UI. The average response times for AE were approximately twice as long as those for FillMask. This is attributable to the fact that the AE options are multi-word phrases whereas the FillMask options are single words, and it highlights an additional benefit of the word-replacement interaction in FillMask. As Fig. 8B shows, the AE response time was strongly positively correlated with the length of the phrase (Spearman’s ρ423 = 0.428, p < 0.001). While selecting phrase options two words long or shorter took only ~2000 ms on average, selecting phrases eight words long or longer took more than twice as long.

The results of the non-AAC user study showed that the LLM-based SpeakFaster text entry UI enabled savings in motor actions, with keystroke savings up to 50 percentage points (absolute) higher than the conventional way of mobile text entry. In terms of speed, the results were mixed. While in the scripted dialogue condition the users achieved an average 13% speedup, they showed a 10% slowdown under the unscripted condition, reflecting an interplay between the cognitive load added by the UI and that required to compose a spontaneous text message. Timing analysis revealed that reviewing the phrase options from the KeywordAE LLM took about 3–6 times the average IKI; reviewing the word options in FillMask took relatively less time, but the review cost was still significant (2–3 times the average IKI). These timing findings highlight an important trade-off between the cost of reviewing LLM outputs and the savings in motor actions. Mobile touch-typing IKIs were relatively short (≈500 ms, Fig. 8A), which may mask the benefit of the predictive power of the LLMs.
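The two metrics underpinning these results can be stated compactly: the KSR is the fraction of per-character keystrokes saved relative to typing every character of the final phrase, and phrase length in words follows the standard characters-divided-by-5 convention used in Fig. 8B. A sketch, where the example keystroke count is illustrative rather than taken from the study logs:

```python
def keystroke_saving_rate(n_keystrokes: int, n_chars: int) -> float:
    """KSR relative to typing every character of the final phrase:
    1.0 = no keystrokes needed, 0.0 = one keystroke per character,
    negative = more keystrokes than characters."""
    return 1.0 - n_keystrokes / n_chars

def phrase_length_words(phrase: str) -> float:
    # Standard text-entry convention: one "word" = 5 characters (incl. spaces).
    return len(phrase) / 5.0

phrase = "I saw him play in the bedroom"   # 29 characters
# Hypothetical count: initials-only abbreviation 'ishpitb' (7 keystrokes)
# plus one selection keystroke = 8 keystrokes total.
print(round(keystroke_saving_rate(8, len(phrase)), 3))  # 0.724
print(phrase_length_words(phrase))                      # 5.8
```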
However, in eye-gaze typing, the IKIs can be significantly longer, owing to the dwell time and the gaze travel time between keys. These considerations indicate a stronger potential for acceleration in eye-gaze text entry than in mobile text entry. With this hypothesis, we proceeded to study AAC eye-gaze users’ interaction with SpeakFaster.

SpeakFaster enables significant KSR and WPM speed-up for eye-gaze users

The two study participants were both adult males diagnosed with ALS (17 and 11 years prior to the study, respectively) and provided informed consent before participating in this study. Both users were native speakers of American English and experienced eye-gaze typists who communicate daily through eye trackers and associated keyboards. At the time of the study, both participants were quadriplegic and unable to speak; however, their eye movements remained functional and their cognitive abilities were reported to be within normal limits. They were experienced with the Tobii® eye-gaze on-screen keyboard and its n-gram word completion and prediction features, which are similar to Gboard’s. The first participant engaged in a controlled lab study (LP1) and the second in a field deployment to gather more naturalistic data (FP1).

In the controlled lab study, our participant (LP1) followed a structured study protocol consisting of a scripted part followed by an unscripted part, identical to that of the mobile user study described above. However, LP1 used the KeywordAE v2 model, which supported partially-spelled keywords and triggered LLM calls for initials-only AE and KeywordAE after every eye-gaze keystroke, eliminating the need to trigger the AE LLM calls manually as in the non-AAC user study.
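The group comparisons in this section are reported as two-sample t-tests with Cohen’s d and a 95% CI on the mean difference. A sketch of these computations, assuming SciPy is available; the arrays below are synthetic illustrations, not study data:

```python
import numpy as np
from scipy import stats

def compare_groups(a, b, alpha=0.05):
    """Unpaired two-sample t-test with Cohen's d and a CI on the mean
    difference, mirroring the statistics reported in this section."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    t_stat, p_val = stats.ttest_ind(a, b)  # pooled-variance (Student) t-test
    n1, n2 = len(a), len(b)
    dof = n1 + n2 - 2
    # Pooled standard deviation, used for both Cohen's d and the CI.
    sp = np.sqrt(((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / dof)
    d = (a.mean() - b.mean()) / sp
    half_width = stats.t.ppf(1 - alpha / 2, dof) * sp * np.sqrt(1 / n1 + 1 / n2)
    return t_stat, p_val, d, (a.mean() - b.mean(), half_width)

# Synthetic KSR-like samples with a true mean difference of ~0.12.
rng = np.random.default_rng(0)
t_stat, p_val, d, (diff, half_width) = compare_groups(
    rng.normal(0.650, 0.2, 110), rng.normal(0.527, 0.2, 110))
```

The rank-sum tests quoted for the KSR comparisons would use `scipy.stats.ranksums` analogously.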
Prior to the data collection, the user practiced the SpeakFaster text entry paradigm under the direction of the experimenter for a total of 4.1 h over two separate days before the lab study; the learning curve can be found in Supplementary Section 1.4.

Figure 9A compares the mean text-entry rates in WPM between the SpeakFaster paradigm and the Tobii keyboard baseline. Averaged over the scripted dialogues, the user achieved an average text-entry speed of 6.54 WPM while using SpeakFaster, which exceeded the baseline typing speed (4.05 WPM) by 61.3% (two-sample t-test: t28 = 2.76, p = 0.010, Cohen’s d = 1.03, 95% CI of difference: 2.484 ± 1.841 WPM). A similar rate enhancement occurred for the unscripted dialogues (SpeakFaster: 6.38 WPM, baseline: 4.37 WPM, a 46.4% increase), although the difference did not reach significance (t-test: t13 = 0.818, p = 0.43, Cohen’s d = 0.575). In addition to the increased speed, an increase in the KSR was also observed for the user with SpeakFaster for both the scripted and unscripted dialogues, although, again, statistical significance was reached only for the scripted dialogues (0.360 vs. the baseline of 0.227, rank-sum test: z = 1.97, p = 0.049, Cohen’s d = 0.575, 95% CI of difference: 0.133 ± 0.177, Fig. 9B).

Fig. 9: AAC eye-gaze users’ text-entry rates and KSR show gains under the SpeakFaster testing condition compared to a non-LLM-assisted baseline.

A, B Show the comparisons of the text-entry speed and keystroke saving rate (KSR) between SpeakFaster’s AE UI and a forward-prediction baseline based on the Tobii eye-gaze keyboard for the lab-study participant LP1. C, D Compare the text-entry rate and KSR of the field-study participant FP1 while using SpeakFaster versus a Tobii keyboard baseline, for both scripted dialogues and unscripted dialogues in a controlled lab study.
Error bars show the 95% confidence interval (CI) of the mean over the number of dialogue turns in the lab and field studies (each conducted once), as indicated in the x-axis labels. Note the different y-axis scales between the panels.

In the lab study, 77.8% of the scripted dialogues and 66.7% of the unscripted ones required only a single initials-only AE LLM call. For the trials in which LP1 used FillMask, the LLM predicted the correct words (as determined by user selection) 58.3% of the time for the scripted dialogues and 21.4% of the time for the unscripted ones. Although FillMask succeeded a majority of the time on the scripted dialogues (58.3%), this success rate is lower than that predicted from offline simulation (70.7%), indicating that the user occasionally failed to choose the correct words when they appeared. The fact that the success rates of initials-only AE and FillMask were lower for the unscripted dialogues reflects the domain mismatch between the user’s personal vocabulary and the model’s training corpus, highlighting personalisation of the model as a useful future direction.

To study the efficacy of the SpeakFaster paradigm under more natural usage, we conducted a field study with another eye-gaze user (FP1). As we reported previously37, FP1 showed an average eye-gaze text-entry speed of 8.1 ± 0.26 WPM (95% CI) over a period of 6 months of measurement in his real-life communication with close relatives and caregivers. This baseline speed of gaze typing is based on 856 utterances typed with the Tobii® Windows Control eye-gaze keyboard and a PCEye Mini IS4 eye tracker. FP1 also engaged in test dialogues with an experimenter using the SpeakFaster UI, based on KeywordAE v1 with manual triggering of the AE LLM calls.
Over the 27 unscripted phrases entered with SpeakFaster on six different days, the user achieved an average speed of 10.4 ± 2.6 WPM, which is 28.8% faster than the daily baseline (two-sample t-test: t881 = 2.97, p = 0.0031, Cohen’s d = 0.580, 95% CI of difference: 2.335 ± 1.544 WPM, Fig. 9C). Accompanying this increase in the average speed of text entry was an increase in the KSR from −0.14 to 0.32 (rank-sum test: z = 4.37, p < 0.001, Cohen’s d = 0.566, 95% CI of difference: 0.463 ± 0.314, Fig. 9D).

Motor savings outweigh cognitive overhead for eye-gaze typing

The latencies of the AE calls were 843.0 ± 55.4 ms and 832.7 ± 120.4 ms (95% CI of the mean) for the scripted and unscripted dialogues, respectively (Fig. 10A). The latencies of the FillMask calls were shorter than those of AE (scripted: 617.9 ± 41.8 ms; unscripted: 745.2 ± 67.1 ms), owing to a serving configuration that took advantage of the shorter output lengths (Supplementary Section 1.3). These LLM-serving latencies were roughly a quarter of the average eye-gaze IKIs measured on user LP1 (Fig. 10A, blue and orange bars, 3511–4952 ms) and therefore had only a minor slowing effect on the text-entry rate of AAC eye-gaze typers. In comparison, the time it took the user to select the correct AE responses was significantly longer (Fig. 10A: scripted: 12,732 ± 5207 ms; unscripted: 21,225 ± 19,807 ms), 3–6 times the average duration of a keypress, reflecting the significant cost of scanning the phrase options returned by the AE calls. By contrast, FillMask involved a much shorter (2–3×) candidate selection time than AE (Fig. 10A: scripted: 7032 ± 4584 ms; unscripted: 4745 ± 2023 ms), reflecting the benefit of the FillMask interaction in providing shorter, single-word candidates, which reduced the scanning cost.

Fig. 10: Savings in motor actions outweigh the cost of LLM option evaluation in eye-gaze AAC users.

Comparison of inter-keystroke intervals, server latencies, and user-selection reaction times observed in the lab study on user LP1. A Compares the inter-keystroke intervals (IKIs) of LP1’s eye-gaze keypresses, the time it took the user to select the LLM-provided options for KeywordAE and FillMask, and the server latencies (including network latencies) of the LLM calls. The error bars show the 95% CI of the mean over the number of events specified in the x-axis labels. B Shows a breakdown of the amount of time spent on different tasks under the baseline and SpeakFaster typing conditions, observed in the eye-gaze user LP1 (two bars at the top) and averaged over the 19 mobile-study users (two bars at the bottom). Only the data from the scripted typing condition are shown, for clarity. The colour code of the bar segments is omitted in the bottom two bars due to space limits but is identical to that of the top two bars.

Compared with the average IKIs of the non-AAC users (Fig. 8), the IKIs of the eye-gaze typist shown in Fig. 10A were 3–6 times longer. This provides insight into why SpeakFaster led to a significant speed-up for the eye-gaze typists in this study while introducing minimal speed changes among the participants in the non-AAC user study described above. Specifically, Panel B of Fig. 10 shows a breakdown of the time spent on several different types of actions during the baseline and SpeakFaster typing conditions, as shown by the different colours. The overhead of using SpeakFaster breaks down into the LLM latencies, the actions involved in using KeywordAE (entering the spelling mode, selecting a word to spell, and reviewing and selecting phrase options), and those involved in using FillMask (entering the FillMask mode, selecting the target word, and reviewing and selecting word options).
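A per-word time breakdown of this kind amounts to summing the logged action durations by category and dividing by the number of words produced. A minimal sketch; the event schema, category names, and durations are illustrative, not the study’s actual data:

```python
from collections import defaultdict

# Hypothetical action log: (category, duration_ms) per logged UI event.
actions = [
    ("keystroke", 3600), ("keystroke", 3400),
    ("llm_latency", 840),
    ("ae_review_select", 12700),
]

def per_word_breakdown(actions, n_words):
    """Total time per action category, normalised to ms/word."""
    totals = defaultdict(float)
    for category, duration_ms in actions:
        totals[category] += duration_ms
    return {c: t / n_words for c, t in totals.items()}

breakdown = per_word_breakdown(actions, n_words=4)
print(breakdown)
# {'keystroke': 1750.0, 'llm_latency': 210.0, 'ae_review_select': 3175.0}
```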
Among these subtypes of overhead, the LLM latencies were a relatively minor factor. The total overhead of using the SpeakFaster system was 5215 ms/word for the AAC eye-gaze user and 1735 ms/word for the non-AAC users. While the non-AAC overhead was close to the time those users spent on keystrokes (2252 ms/word), the AAC overhead was much shorter than the time the eye-gaze user spent on keystrokes (14,774 ms/word), leading to a reduction in the overall time on a per-word basis. By contrast, the non-AAC users showed only a more modest average reduction in the time spent on keystrokes (1416 ms/word) when using SpeakFaster, which was insufficient to fully offset the increased overhead. This was largely because the average IKIs were already much shorter in mobile typing than in eye-gaze typing.

In summary, the quantitative measurements of eye-gaze typing in both users found a considerable advantage of the SpeakFaster text-input UI over the baseline of conventional eye-typing systems with forward prediction. The evidence was seen both in a controlled lab setting and in a field study against a real-life, long-term baseline.