Capturing an org-mode entry using speech to text

I have a Cosmo Communicator from Planet Computers. I got it primarily because I need a 'phone with a physical keyboard, and I need a 'phone with a physical keyboard for two main reasons: I prefer the tactile nature of physical keyboards, which let me look at the screen as I type, and I depend on GNU/Emacs and org-mode for organising my life. Currently (and probably for the foreseeable future), I access GNU/Emacs through a terminal using the Termux Android app.

I recently figured out how to take voice notes that can be recorded directly into my org-mode set up, which reduces the time and effort I need to get something noted quickly.

Before now, if I wanted to take a note into my org-mode set-up, I would have to first open the clam-shell 'phone, unlock it, start or find a termux session, launch emacsclient (which will launch a new GNU/Emacs session if one isn't running already), find the correct org-mode file and then the correct org-mode headline, and only then start typing the note. Very labour intensive. I understand that org-capture is designed to help with this, but I've never been able to figure out how it's really supposed to be used. Also – and I am open to correction here – I don't think it helps until the point when I have an open GNU/Emacs window in front of me, by which time I'm already at the third-last step in that sequence.

What I outline below allows me to go to my 'phone's home screen, tap a widget, speak, and all the rest is done automatically.

Before embarking on this, it's important to note that my set up depends on "Google" being installed on my Android device, and it having been "Enabled". This means that the solution isn't fully Free Software. If someone knows of a Free Software speech-to-text implementation I could use instead, I will happily look into transferring over to it. At the time of writing, I have not found such a thing.

So, the sequence of set-up steps

Install F-Droid

This solution depends on Termux. Although Termux is available on Google Play, a recent change over there means that the developer isn't providing any more updates to Termux there. If you're already using Termux from Google Play, I think this set-up will still work, but if it doesn't (and for other reasons anyway), I recommend using Termux from F-Droid. In that case, you need to install F-Droid first. These instructions won't go into that, but the F-Droid site is a perfect resource for learning how to do it.

Install Termux, Termux:API and Termux:Widget

If you're using the Google Play version of Termux, you'll have to install the others from Google Play, too. Otherwise, install them all from F-Droid. You'll need:
- Termux as the main terminal utility.
- Termux:API to integrate the speech-to-text capture into Termux, and
- Termux:Widget to allow for invoking a small script from the Android home screen.
Start Termux.
From the Termux command line, run pkg upgrade to ensure that you have the latest version of all the packages that come with it.
From the Termux command line, run…
```
pkg install emacs termux-api termux-tools jq
```
which will install GNU/Emacs, jq for processing JSON objects, termux-api and termux-tools for the termux-related functionality.
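Before going further, it can help to confirm that the commands this set-up relies on are actually on your PATH. The following is a minimal sketch; check_deps is a hypothetical helper name (not part of Termux), and the list of commands in the usage note is my assumption based on what the scripts in this post call.

```shell
# check_deps (hypothetical helper): report every command that cannot be
# found on PATH, and return non-zero if any of them is missing.
check_deps() {
  local cmd missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: ${cmd}" >&2
      missing=1
    fi
  done
  return "${missing}"
}
```

For example, check_deps emacs emacsclient jq termux-dialog termux-toast should print nothing and succeed once the packages above are installed. Add git to that list if you intend to commit your notes, as the pkg install line above doesn't install it.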
Perform whatever GNU/Emacs setup you need to perform. This is up to your taste, as these instructions assume you're familiar with GNU/Emacs, for why else would this page interest you?
Perform whatever org-mode setup you need to perform. Again, I'm assuming you have an org-mode set-up that suits your way of working.

I have integrated my Org Agenda files with a revision control system for years. Currently it uses git, and if this is something that works for you, I recommend it. You can – of course – use an equally-good revision control system, like subversion, if that suits. However, where relevant, these instructions make use of git.

You should create a new org-mode file and add it to your org-agenda-files setting, so that any changes to it will be incorporated into your org-agenda calls. The value here is that your captured notes will go into it and your other files won't be affected.
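If you want to create that file from the Termux shell, here is a minimal sketch. init_capture_file is a hypothetical helper, and the path you pass it is up to you; the header mirrors the capture.org template in my personal setup at the end of this post. It deliberately does nothing if the file already exists, so it can't clobber notes you've already captured. You still need to add the file to org-agenda-files in your own GNU/Emacs configuration.

```shell
# init_capture_file (hypothetical helper): create a dedicated capture file
# with the CAPTURED/TRANSFERRED keywords and a top-level headline for the
# captured notes. An existing file is left untouched.
init_capture_file() {
  local file="$1"
  if [ -e "${file}" ]; then
    return 0
  fi
  mkdir -p "$(dirname "${file}")"
  cat > "${file}" <<'EOF'
#+TODO: CAPTURED | TRANSFERRED

* TODOS -- Remove from here as each is moved to the respective target location :captured:
EOF
}
```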
In Termux, create a directory to contain the script that will capture the spoken message:
```
$ mkdir -pv ~/.shortcuts/tasks
```
Create a new script in that directory. Call it what you want. However, the following is vitally important: make sure that the "shebang" is correct.

If you don't know, the "shebang" is the first line of a shell script that informs the calling shell what command to use to execute it. For Termux:Widget scripts to work, they must have a shebang, and the shebang must be correct. If either of these isn't the case, the script won't work, and you won't get any feedback to help you identify the problem.

To get the shebang correct, type the following command into a Termux shell:
```
$ which bash
```
This will return something like /data/data/com.termux/files/usr/bin/bash. The shebang, therefore, will be the following, on its own as the very first line of the script (not the second line after, say, a blank first line; it has to be the very first line):
```
#!/data/data/com.termux/files/usr/bin/bash
```
Whatever follows the exclamation mark must be exactly what the which bash command returns.
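To avoid typos in the shebang, you can have the shell write it for you. A minimal sketch, assuming you're happy to start the script from an empty file; new_widget_script is a hypothetical helper name, and it uses command -v bash, the bash builtin that reports the same path as which bash.

```shell
# new_widget_script (hypothetical helper): create a script whose very first
# line is the shebang for the bash found on PATH, and make it executable
# for its owner. It refuses to overwrite an existing file.
new_widget_script() {
  local path="$1" bash_path
  if [ -e "${path}" ]; then
    echo "${path} already exists; not overwriting it" >&2
    return 1
  fi
  bash_path="$(command -v bash)" || return 1
  mkdir -p "$(dirname "${path}")"
  # '>' creates the file, so the shebang is guaranteed to be line 1.
  printf '#!%s\n' "${bash_path}" > "${path}"
  chmod u+x "${path}"
}
```

For example, new_widget_script ~/.shortcuts/tasks/voice_command.sh, after which you can open the file in GNU/Emacs and write the rest of the script beneath that first line.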
After the shebang (of course, because the shebang has to be the very first line), your script should do the following:
1. Capture spoken instructions using the following piped sequence of commands:
```
termux-dialog speech -t "Termux" | jq .text | sed 's/"//g'
```
  termux-dialog speech -t "Termux" presents a speech-capture dialogue with the title Termux. You can set the title to something that suits you. This will send a small JSON object to stdout which contains a field called "text".
  
  jq .text (don't forget the space between the q and the .) extracts the value of the "text" field in the JSON object and sends it to stdout
  
  sed 's/"//g' strips " characters from the output. This may be a little crude, but you're hardly going to speak double-quotes, are you?
2. Your script will then send the output of the above sequence into your dedicated org-mode file for capturing these notes, with the appropriate context around it. For example (see below), my setup creates a new 2nd-level headline as a TODO item, and it sets the SCHEDULED date cookie on that item to yesterday (so that it appears at the top of my agenda), and it sets the priority cookie to [#A].
  
  Your script will append all of this into the relevant file.
3. If you're using a revision control system, then your script should commit the new note into it so that it can be propagated to where you need it. Again, see below for how I use git for this.
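Steps 1 and 2 can be sketched as two small shell functions. This is a minimal sketch rather than my actual script: speech_text and append_capture are hypothetical names, the CAPTURED keyword and [#A] priority mirror my capture.org template, and date -d '1 day ago' assumes GNU date, as my own script does on Termux. One deliberate difference from the pipeline in step 1: jq's -r (--raw-output) option prints the "text" field as plain text, decoding JSON escapes, so the sed step isn't needed and any quotes in the text survive rather than being stripped.

```shell
# speech_text (hypothetical): read the JSON object that
# "termux-dialog speech" prints on stdin and print only its "text" field.
# jq -r prints the raw string, so no sed step is needed to drop quotes.
speech_text() {
  jq -r '.text'
}

# append_capture (hypothetical): append a CAPTURED, priority-A,
# second-level headline to FILE, scheduled for yesterday so that it
# floats to the top of the agenda. Needs GNU date for -d '1 day ago'.
append_capture() {
  local file="$1" note="$2"
  {
    printf '** CAPTURED [#A] %s\n' "${note}"
    printf '   SCHEDULED: <%s>\n' "$(date -d '1 day ago' '+%F %a')"
  } >> "${file}"
}
```

Wired together in a widget script, that would be something like NOTE="$(termux-dialog speech -t "Termux" | speech_text)" followed by append_capture ~/org/capture.org "${NOTE}", where ~/org/capture.org stands in for wherever your capture file lives. Step 3 is then a git add, git commit and git push, as in my org-capture.sh below.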

Once you've written the script, set its permissions to make it executable with one of the following:

```
chmod -v +x ~/.shortcuts/tasks/<script_name> # to make it generally executable

chmod -v u+x ~/.shortcuts/tasks/<script_name> # to make it executable for the script's owner only
```

Follow the instructions to set up Termux:Widget on your home screen, which will present to you all the executable scripts in ~/.shortcuts/ and ~/.shortcuts/tasks/, which you can launch by tapping on them. Placing the script into ~/.shortcuts/ will launch a terminal screen to run it, but placing it into ~/.shortcuts/tasks will cause it to run in the background, which is what you probably want.
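Because a bad shebang or a missing execute bit fails silently, a quick check of the shortcut directories can save some head-scratching. A minimal sketch: check_widget_scripts is a hypothetical helper, and it assumes every script should use the bash shebang described above, so a script with a different (but valid) interpreter will be reported as BAD SHEBANG.

```shell
# check_widget_scripts (hypothetical helper): for each regular file in the
# given directories, report whether it is executable and whether its first
# line is the shebang for the bash found on PATH.
check_widget_scripts() {
  local want dir f first
  # Single quotes keep an interactive shell from history-expanding '!'.
  want='#!'"$(command -v bash)"
  for dir in "$@"; do
    for f in "${dir}"/*; do
      [ -f "${f}" ] || continue
      first="$(head -n 1 "${f}")"
      if [ ! -x "${f}" ]; then
        echo "NOT EXECUTABLE: ${f}"
      elif [ "${first}" != "${want}" ]; then
        echo "BAD SHEBANG: ${f} (${first:-empty first line})"
      else
        echo "OK: ${f}"
      fi
    done
  done
}
```

Run it as check_widget_scripts ~/.shortcuts ~/.shortcuts/tasks; anything not reported as OK is a likely reason for a widget that does nothing.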

Now, you should test the script from the command line to confirm it works, simply by calling it from the terminal prompt:

```
$ ~/.shortcuts/tasks/<script_name>
```

which will present the speech-entry dialogue screen, and after you have spoken, it will close and you will see the note captured into the org-mode file.

If that works, then you can test it from the home screen widget.

Once you have it working, then you may consider some of the other possibilities. See below for how I have implemented it, which does some other fancy things:

- I use keywords for different actions: "note" for org-mode notes, "wiki" to perform a Wikipedia search (using termux-open), "duck" to perform a DuckDuckGo search and "locate" to perform an OpenStreetMap.org search.
- I send feedback to the Android screen using the termux-toast command. I also capture output into a log file.
- As I am a heavy user of GNU/Emacs' --bg-daemon mode, I use emacsclient commands to instruct Emacs to do other things when capturing the note, like invoking org-agenda-list, which refreshes my org agenda.
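The keyword dispatch in the first item can be sketched with bash parameter expansion and a case statement. This is an alternative to the cut/sed/if-elif chain in my voice_command.sh below, not what that script does; parse_command is a hypothetical name, it only prints what it would do, and ${lead,,} needs bash 4 or later.

```shell
# parse_command (hypothetical): split the recognised speech into a
# lower-cased lead word and the rest of the sentence, then dispatch with
# case. It prints the chosen action instead of running anything.
parse_command() {
  local note="$1" lead action
  lead="${note%% *}"          # everything before the first space
  if [ "${lead}" = "${note}" ]; then
    action=""                 # a one-word note has no action
  else
    action="${note#* }"       # everything after the first space
  fi
  lead="${lead,,}"            # lower-case the whole word (bash 4+)
  case "${lead}" in
    note)   echo "capture: ${action}" ;;
    duck)   echo "duckduckgo: ${action}" ;;
    wiki)   echo "wikipedia: ${action}" ;;
    locate) echo "openstreetmap: ${action}" ;;
    *)      echo "unknown: ${note}"; return 1 ;;
  esac
}
```

In a real script, each branch would call the capture script or termux-open with the URL-encoded action instead of echoing.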

Finally, all my other GNU/Emacs instances (on my many computers at home and at work) will automatically pick up the new note from the git repository. Whenever I refresh the agenda, I will see the new note as an overdue CAPTURED item, which will prompt me to do something about it.

Have fun, and let me know if you see any faults with my set up.

My personal setup

~/.shortcuts/tasks/voice_command.sh – Script to capture voice command.

```
#!/data/data/com.termux/files/usr/bin/bash

# Where the log outputs are to be sent.
export LOG_FILE=${HOME}/tmp/widget-test.out

# To convert a string into text suitable for a web query
urlencode() {
  # urlencode <string>

  old_lc_collate=$LC_COLLATE
  LC_COLLATE=C

  local length="${#1}"
  for (( i = 0; i < length; i++ )); do
    local c="${1:$i:1}"
    case $c in
      [a-zA-Z0-9.~_-]) printf '%s' "$c" ;;
      *) printf '%%%02X' "'$c" ;;
    esac
  done

  LC_COLLATE=$old_lc_collate
}

# To convert text suitable for a web query back into normal text
urldecode() {
  # urldecode <string>

  local url_encoded="${1//+/ }"
  printf '%b' "${url_encoded//%/\\x}"
}

# To send a notification to the 'phone
notify () {
  termux-toast -g top "${1}"
  echo "Notification ${1}" >> ${LOG_FILE}
}

# A debug notification/entry
debug () {
  if [ "${DEBUG}" = "Y" ]; then
    notify "DEBUG: ${1}"
  fi
}

# Processing command-line options.
export DEBUG=N
while getopts "d" opt; do
  case ${opt} in
    d) # We want DEBUG output
       if [ "${DEBUG}" = "N" ]; then
         DEBUG=Y
       else
         set -x
       fi
       # if the command-line includes " -d -d" or "-dd" bash -x is used.
       ;;
    *) echo "Oops"
       exit 1
       ;;
  esac
done

# NOTE is what is converted from speech to text, in text form
export NOTE
# LEAD_WORD is the first word of the note, which is used to instruct
# this script
export LEAD_WORD
# ACTION is everything after the LEAD_WORD
export ACTION

# Ask for the voice command.
NOTE="$(termux-dialog speech -t "Termux" | jq .text | sed 's/"//g')"
debug "NOTE is \"${NOTE}\""

# Separate out the lead word and the instruction
LEAD_WORD="$(echo "${NOTE}" | cut -d' ' -f1)"
debug "LEAD_WORD is \"${LEAD_WORD}\""

ACTION="$(echo "${NOTE}" | sed "s/^${LEAD_WORD} //")"
debug "ACTION is \"${ACTION}\""

# Convert the whole lead word to lower case, making it easier to test for.
LEAD_WORD="${LEAD_WORD,,}"
debug "LEAD_WORD is \"${LEAD_WORD}\""

# If the LEAD_WORD is "note", then this is an org capture
# note. ("org", "org mode" and "capture" were too fluffy and failed a
# lot)
if [ "${LEAD_WORD}" = "note" ]; then
  debug "ACTION is \"${ACTION}\""
  # org-capture.sh's own shebang is for /bin/bash, but that won't work
  # on Termux, so we invoke the script through Termux' bash.
  /data/data/com.termux/files/usr/bin/bash ${HOME}/eibhear_org/scripts/org-capture.sh "${ACTION}" >> ${LOG_FILE}
  notify "ORG Capture of \"${ACTION}\" complete"
# If the LEAD_WORD is "duck", perform a DuckDuckGo search on the ACTION
elif [ "${LEAD_WORD}" = "duck" ]; then
  export SEARCH_TERM="$(urlencode "${ACTION}")"
  debug "SEARCH_TERM is \"${SEARCH_TERM}\""
  termux-open "https://duckduckgo.com/?q=${SEARCH_TERM}&t=termux-open"
# If the LEAD_WORD is "wiki", perform a wikipedia search on the ACTION
elif [ "${LEAD_WORD}" = "wiki" ]; then
  export SEARCH_TERM="$(urlencode "${ACTION}")"
  debug "SEARCH_TERM is \"${SEARCH_TERM}\""
  termux-open "https://en.wikipedia.org/wiki/Special:Search?search=${SEARCH_TERM}&sourceid=termux-open"
# If the LEAD_WORD is "locate", perform a openstreetmap.org search on the ACTION
elif [ "${LEAD_WORD}" = "locate" ]; then
  export SEARCH_TERM="$(urlencode "${ACTION}")"
  debug "SEARCH_TERM is \"${SEARCH_TERM}\""
  termux-open "https://www.openstreetmap.org/search?query=${SEARCH_TERM}"
else
  notify "Can't parse \"${NOTE}\", so don't know what to do with it."
fi
```

${HOME}/eibhear_org/scripts/org-capture.sh – Script to capture an org-mode entry for later processing.

```
#!/bin/bash

# Processing command-line options.
export DEBUG=N
export DONT_COMMIT=N
while getopts "dn" opt; do
  case ${opt} in
    d) # We want DEBUG output
       if [ "${DEBUG}" = "N" ]; then
         DEBUG=Y
       else
         set -x
       fi
       # if the command-line includes " -d -d" or "-dd" bash -x is used.
       ;;
    n) # We don't want to commit and push this note
       DONT_COMMIT=Y
       ;;
    *) echo "Oops"
       exit 1
       ;;
  esac
done
# shift to the first non-option parameter.
shift $(( ${OPTIND} - 1 )); unset OPTIND

if [ "${DONT_COMMIT}" = "N" ]; then
  # fetch and update -- it doesn't really matter if this doesn't work
  # egit-update-org-agenda-files is a personal elisp utility I have
  # that updates the org-agenda-files from my git repository.
  emacsclient -e '(egit-update-org-agenda-files)'
fi

# Send the information into an agenda file. To get it to pop up to the
# top of the agenda, set the scheduled date to yesterday. A stupid,
# but effective, hack.
echo "** CAPTURED [#A] (from org-capture.sh) ${1}" >> ${HOME}/eibhear_org/capture.org
echo "   SCHEDULED: <$(date -d '1 day ago' '+%F %a')>" >> ${HOME}/eibhear_org/capture.org

# Revert all the agenda files and rebuild the
# agenda. e-revert-org-agenda-file-buffers is another personal elisp
# function I wrote to do this.
emacsclient -e '(progn (e-revert-org-agenda-file-buffers) (org-agenda-list))'

if [ "${DONT_COMMIT}" = "N" ]; then

  # Add the updated capture file
  git -C ${HOME}/eibhear_org add capture.org

  retcode=${?}

  # egit-get-alerter-func is a personal elisp function I use to get
  # the name of the function that will send alerts, as this will
  # differ from system to system (GNU/Linux, Windows, Android,
  # Sailfish OS, etc.)
  if [ ${retcode} -ne 0 ]; then
    emacsclient -e "(apply (egit-get-alerter-func) \"Org Capture\" (list \"Problem git-adding capture.org\"))"
    exit ${retcode}
  fi

  # Commit the updated capture file
  git -C ${HOME}/eibhear_org commit -m "A new note recorded through org-capture.sh"

  retcode=${?}

  if [ ${retcode} -ne 0 ]; then
    emacsclient -e "(apply (egit-get-alerter-func) \"Org Capture\" (list \"Problem git-committing new note\"))"
    exit ${retcode}
  fi

  # Push the updated capture file
  git -C ${HOME}/eibhear_org push

  retcode=${?}

  if [ ${retcode} -ne 0 ]; then
    emacsclient -e "(apply (egit-get-alerter-func) \"Org Capture\" (list \"Problem git-pushing new note\"))"
    exit ${retcode}
  fi
fi

# Notify of the completion of the capture and update the agenda.
emacsclient -e "(progn (e-revert-org-agenda-file-buffers) (org-agenda-list) (apply (egit-get-alerter-func) \"Org Capture\" (list (format \"Note (%s) taken\" \"${1}\"))))"
```

capture.org – Template org-mode file

```
# For quick capture of notes. The TODO keywords are CAPTURED, denoting
# that it was entered by the org-capture script, and TRANSFERRED,
# denoting that the note has been transferred to another org-mode file
# and therefore has been processed from here.

#+TODO: CAPTURED | TRANSFERRED

* TODOS -- Remove from here as each is moved to the respective target location :captured:
** TRANSFERRED [#A] (from org-capture.sh) call mother to wish her a happy birthday
   SCHEDULED: <2021-03-01 Mon>
** CAPTURED [#A] (from org-capture.sh) dentist appointment on the 25th at 1:30
   SCHEDULED: <2021-03-03 Wed>
```
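Since anything still marked CAPTURED hasn't yet been dealt with, a grep is a quick way to see the backlog from a shell without opening GNU/Emacs. A minimal sketch, with list_pending_captures as a hypothetical name; it assumes the headline layout shown in the template above.

```shell
# list_pending_captures (hypothetical): print every headline in FILE whose
# TODO keyword is CAPTURED, i.e. notes not yet marked TRANSFERRED.
# Like grep itself, it returns non-zero when nothing is pending.
list_pending_captures() {
  grep -E '^\*+ CAPTURED ' "$1"
}
```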

Update [2022-08-25 Thu]

Following Uwe's suggestion, I was able to confirm that this now works for me on Android 11. I've moved these addendums to the bottom of the post so that they are no longer in the way.

Update [2022-08-16 Tue]

See the comment below from Uwe, who has been able to get this to work on Android 11.

I don't have an Android 11 device to hand right now, but once I do and get this working, I'll remove these updates and correct the post where necessary.

Update [2021-08-10 Tue]

The termux-dialog speech command doesn't work on Android 11. Looking briefly into it, it seems that Google changed the speech-to-text API in Android 11, and termux-api hasn't adopted that change. It's not fair to call it a bug, as such, but until the termux guys get a chance to fix it, what follows can be described as not working on Android 11.

agdad on github has raised an issue about this, but as I am not on github, I can't promote it as something I would like to see fixed, too. I have brought it to the attention of the termux gitter/matrix room, but I don't know if it has been seen.

Therefore, if this is something of interest to you, and if you are on github or have some other way to bring this to the attention of the termux team, please let them know.

Once I learn that this works on Android 11, I'll update this post.


Éibhear/Gibiris
