New action node for Speech to Text

Theo · March 7, 2025, 10:26am

We are (somewhat surprisingly) receiving a lot of requests from end users to implement speech to text features, as this will make it easier for users to enter information on the go.

Has Appfarm considered creating a speech-to-text action, or something else that allows the user to generate strings using the microphone?

Apart from the general usability and accessibility benefits, this would also be a great addition because we are beginning to integrate AI agents in our apps. Being able to record a message, convert it to a string, and then send it to the agent would be a really great feature.

Olav · March 11, 2025, 8:02am

Thank you, that is a very interesting feature request!

We haven’t explicitly considered this feature yet, but we are currently working to bring several smart features to Create. I see several potential solutions here: a separate action node, building it into the text edit component as a setting you can enable, or releasing it as a shareable component/integration.

I will register the feature request internally and we will discuss the possibilities!

oskar.bragnes · March 12, 2025, 7:24am

This is a very relevant discussion! Hæhre is also looking for speech-to-text functionality and would greatly benefit from such a feature. We see significant potential in making it easier for users to input information on the go, particularly in field operations where typing can be impractical.

Additionally, as we are starting to integrate AI agents into our applications, having the ability to record messages, convert them into text, and process them accordingly would be extremely valuable. We would love to see this implemented, whether as an action node, a setting in the text edit component, or a shareable integration.

Looking forward to hearing more about the possibilities!

ErikAKSkallevold · March 12, 2025, 12:00pm

Hi,

What is preventing your users from using their OS-provided Speech-To-Text functionality?

This is present in all major operating systems:

In which scenario would you need a dedicated Speech-To-Text actionNode?

// Erik

Olav · March 12, 2025, 12:42pm

I just built an example in an Appfarm app downloaded as a PWA on my iPhone. I have also tested that this works very well on Windows.

Theo · March 12, 2025, 1:01pm

That is a really good point, and I think it solves all our needs.

The only reason I can think of for a dedicated action is that it would be easier for end users if we could trigger the microphone automatically for certain actions, but the OS functionality should probably be enough. Maybe it is possible to trigger the built-in speech-to-text function from a script in the app; if so, a dedicated action node would definitely not be necessary.

Thanks!

Theo · March 12, 2025, 1:20pm

I quickly tested this, and it turns out you can use the browser's built-in speech recognition as well!

I placed a button on a page and ran the following code:

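// Use the standard constructor where available, otherwise the WebKit-prefixed one.
const recognition = new (window.SpeechRecognition || window.webkitSpeechRecognition)();
recognition.lang = 'en-US';          // language to recognize
recognition.interimResults = false;  // deliver final results only
recognition.maxAlternatives = 1;     // one transcript per result

recognition.onresult = (event) => {
    // The first alternative of the first result holds the recognized text.
    const speechResult = event.results[0][0].transcript;

    // updateText is another action; it writes the transcript to a text variable.
    updateText({
        text: speechResult,
    });

    // Finish the script once the text has been stored.
    resolve();
};

// Start listening; the browser asks for microphone permission if needed.
recognition.start();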

Here, updateText is another action that updates a text variable. The result was that I was able to trigger speech-to-text on any device, without the user having to set it up in the device settings.
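
The snippet above never finishes if the browser lacks the API, the user denies microphone access, or nothing is recognized. A minimal sketch of a more defensive variant follows, assuming the same Web Speech API. The helper names `getSpeechRecognitionCtor` and `recognizeOnce` are made up for illustration and are not part of Appfarm or the browser. Note that not every browser exposes `SpeechRecognition`, so the availability check matters.

```javascript
// Hypothetical helper: pick the standard or WebKit-prefixed constructor, or null.
function getSpeechRecognitionCtor(globalObj) {
  return globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition || null;
}

// Hypothetical helper: run one recognition session and return the transcript.
// Resolves with '' if the session ends without a result; rejects on errors.
function recognizeOnce(RecognitionCtor, lang = 'en-US') {
  return new Promise((resolve, reject) => {
    if (!RecognitionCtor) {
      reject(new Error('Speech recognition is not supported in this browser'));
      return;
    }

    const recognition = new RecognitionCtor();
    recognition.lang = lang;
    recognition.interimResults = false;
    recognition.maxAlternatives = 1;

    let settled = false;

    recognition.onresult = (event) => {
      settled = true;
      resolve(event.results[0][0].transcript);
    };

    // event.error is a short code such as 'not-allowed' or 'no-speech'.
    recognition.onerror = (event) => {
      settled = true;
      reject(new Error(event.error));
    };

    // Fires when the session stops; covers the "nothing recognized" case.
    recognition.onend = () => {
      if (!settled) resolve('');
    };

    recognition.start();
  });
}
```

In a page script this could be called as `recognizeOnce(getSpeechRecognitionCtor(window)).then((text) => updateText({ text }))`, with `updateText` being the action from the post above. Passing the constructor in as an argument also makes it possible to try the logic with a stand-in object outside the browser.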

Theo · March 13, 2025, 7:29am

After looking into data handling and privacy, it turns out this is not suitable for enterprise apps after all, as the features provided by both the OS and the browser collect data, which is a no-go for us.

The only solution I see for now is to download and host our own models, using something open source, and then integrate with that.

Secure speech-to-text would be very desirable for enterprise use, and I hope this is something Appfarm would consider.
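
The self-hosted route could look roughly like the sketch below: the app records audio and sends it to a transcription service running inside the organization. Everything specific here is an assumption for illustration, including the function name `transcribeWithOwnService`, the endpoint URL, and the `{ text: "..." }` response shape; a real service defines its own contract.

```javascript
// Hypothetical client for a self-hosted transcription service.
// Assumptions: the service accepts raw audio in a POST body and
// answers with JSON of the form { "text": "..." }.
async function transcribeWithOwnService(audioBlob, endpointUrl, fetchImpl = fetch) {
  const response = await fetchImpl(endpointUrl, {
    method: 'POST',
    headers: { 'Content-Type': audioBlob.type || 'application/octet-stream' },
    body: audioBlob,
  });

  if (!response.ok) {
    throw new Error(`Transcription failed with status ${response.status}`);
  }

  const payload = await response.json();
  if (typeof payload.text !== 'string') {
    throw new Error('Unexpected response shape from transcription service');
  }
  return payload.text;
}
```

In the browser, the audio could be captured with `navigator.mediaDevices.getUserMedia({ audio: true })` and `MediaRecorder`, and the resulting Blob passed to this function. With this approach the audio goes only to the endpoint you configure, rather than to the OS or browser vendor's dictation service.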

Olav · March 13, 2025, 9:19am

I tried to look into this and as far as I can see, neither Microsoft nor Apple stores the content provided by speech-to-text at rest.

Could you provide us with your privacy requirements for a speech-to-text feature, and explain how the current implementations in the different operating systems and browsers breach them? This would inform our choices when exploring ways to support this feature.

Theo · March 13, 2025, 10:09am

From what I understand, voice input is shared with Apple, on iOS at least: Legal - Siri, Dictation & Privacy - Apple

Since we handle a lot of non-public market information, and many of the use cases include sensitive data, we cannot allow users to dictate or transcribe if there is a chance that the data is shared outside the organization.

From what I have read, Apple's dictation appears to take this into consideration. However, the policy states that requests are sent to Apple's servers and may be stored if the user has opted in.

In my case (user who has not opted in), they state that the voice input is sent to Apple:

In short, our requirement is that the voice input and text output must not be shared with or accessible to any third parties, and users must not be able to inadvertently allow this.

houman.mohebbi · March 21, 2025, 3:39pm

I have written an article on how to use AI models locally and offline in the client in Appfarm (in Norwegian only). It might solve your privacy issues.
