Learn how to build a real-time translated transcription service using the Agora Web SDK and Google Cloud.

This post was originally published on Agora.io.


Introduction

Doing business globally is a goal for almost every company. The chance to scale up to an international level can increase profits but may require knowledge of multiple languages to communicate with clients or partners from around the world.

Hiring an interpreter for multilingual video conferences is often impractical: it interrupts the flow of conversation and makes meetings longer than necessary. You may also want to keep some information confidential.

In this tutorial, we will develop a web application that supports speech-to-text transcription and translation using JavaScript’s Web Speech API, the Agora Web SDK, the Agora RTM SDK, and the Google Cloud Translation API. The goal is to avoid depending on human translators and to remove the language barrier during video calls.

Screenshot of the project we will be developing in this tutorial

Prerequisites

  • Basic knowledge of how to work with JavaScript, jQuery, Bootstrap, and Font Awesome
  • Agora Developer Account — Sign up here
  • Know how to use the Agora Web SDK and the Agora RTM SDK
  • Google Cloud account
  • Understand how to make requests and receive responses from REST APIs

Project Setup

We will build on our existing project: Building Your Own Transcription Service Within a Video Call Web App. You can begin by cloning this project’s GitHub repository. You will now have a project that looks like this:

Screenshot of the Agora Transcription Service UI

We will now remove the self-note related HTML and the extra buttons. If you have difficulty understanding what the existing code does, see this tutorial.

We have also added code for muting and unmuting video and audio to the video calling application. You can learn more about muting and unmuting from the Agora documentation. Your code will now look like this.

You now have a fully functional transcription service along with muting and unmuting capabilities.

Adding Real-Time Translation to Our Application

We will now add the following code to our HTML file, below the existing row of input fields. It gives the user a field for their Google Cloud project’s API key and a dropdown for choosing the language they understand.

<div class="row mt-3 mb-3">
        <div class="col-sm">
          <p class="join-info-text">Google Cloud API Key</p>
          <input id="gcpKey" type="text" placeholder="Enter Google Cloud API Key" required class="form-control">
        </div>
        <div class="col-sm">
          <label for="transcriptionLang">Select a Language you Understand</label>
          <select class="form-control" name="transcriptionLang" id="transcriptionLang">
            <option value="af">Afrikaans</option>
            <option value="sq">Albanian</option>
            <option value="ar">Arabic</option>
            <option value="hy">Armenian</option>
            <option value="ca">Catalan</option>
            <option value="zh">Chinese</option>
            <option value="hr">Croatian</option>
            <option value="cs">Czech</option>
            <option value="da">Danish</option>
            <option value="nl">Dutch</option>
            <option value="en" selected>English</option>
            <option value="eo">Esperanto</option>
            <option value="fi">Finnish</option>
            <option value="fr">French</option>
            <option value="de">German</option>
            <option value="el">Greek</option>
            <option value="ht">Haitian Creole</option>
            <option value="hi">Hindi</option>
            <option value="hu">Hungarian</option>
            <option value="is">Icelandic</option>
            <option value="id">Indonesian</option>
            <option value="it">Italian</option>
            <option value="ja">Japanese</option>
            <option value="ko">Korean</option>
            <option value="la">Latin</option>
            <option value="lv">Latvian</option>
            <option value="mk">Macedonian</option>
            <option value="no">Norwegian</option>
            <option value="pl">Polish</option>
            <option value="pt">Portuguese</option>
            <option value="pt-br">Portuguese (Brazil)</option>
            <option value="ro">Romanian</option>
            <option value="ru">Russian</option>
            <option value="sr">Serbian</option>
            <option value="sk">Slovak</option>
            <option value="es">Spanish</option>
            <option value="es-es">Spanish (Spain)</option>
            <option value="es-us">Spanish (United States)</option>
            <option value="sw">Swahili</option>
            <option value="sv">Swedish</option>
            <option value="ta">Tamil</option>
            <option value="th">Thai</option>
            <option value="tr">Turkish</option>
            <option value="vi">Vietnamese</option>
            <option value="cy">Welsh</option>
          </select>
        </div>
</div>
Screenshot of how our application looks after adding the new row
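
Before the call starts, it can help to check the two new fields. The following is a minimal sketch, not part of the original project: `readTranslationSettings` and `SUPPORTED_LANGS` are hypothetical names, and the language set is an abbreviated copy of the `<option>` values above.

```javascript
// Hypothetical helper (not from the original project): validates the values
// read from the #gcpKey and #transcriptionLang fields.
// Abbreviated subset of the <option> values; extend it to match the dropdown.
const SUPPORTED_LANGS = new Set(["en", "es", "fr", "de", "hi", "ja", "pt-br", "zh"]);

function readTranslationSettings(rawKey, rawLang) {
  const gcpKey = String(rawKey || "").trim();
  const lang = String(rawLang || "").trim().toLowerCase();
  if (gcpKey === "") {
    return { ok: false, error: "Google Cloud API key is required" };
  }
  if (!SUPPORTED_LANGS.has(lang)) {
    return { ok: false, error: "Unsupported language: " + lang };
  }
  return { ok: true, gcpKey, lang };
}
```

In the page, this could be called as `readTranslationSettings($("#gcpKey").val(), $("#transcriptionLang").val())` and the `error` string shown to the user when `ok` is false.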

Create a Google Cloud Translation API Key

  1. In the Cloud Console, go to the Create service account page.
  2. Select a project.
  3. In the Service account name field, enter a name. The Cloud Console completes the Service account ID field based on this name.
  4. In the Service account description field, enter a description. For example, Agora Live Translated Transcription.
  5. Click on Create and Continue.
Screenshot of the page where you can create a service account

6. Click the Select a role field and choose the Cloud Translation API Admin role.

Screenshot of the role set screen for service accounts

7. Click Continue.

8. Click Done to finish creating the service account.

9. Enable the Cloud Translation API from here.

Enable the Translation API on GCP

10. Click the Credentials tab in the left sidebar, and then click on Create Credentials.

11. Create and copy the generated API Key.

The generated API key has to be copied from this page
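
The generated key is passed to the Translation API as a query parameter. As a rough sketch of how such a request URL might be assembled, the hypothetical helper `buildTranslateUrl` below uses the endpoint and the `key`, `source`, `target`, and `q` parameters that appear in this tutorial's code; the helper name and constant are assumptions made for illustration.

```javascript
// Endpoint and parameter names are taken from the tutorial's code.
const TRANSLATE_ENDPOINT = "https://www.googleapis.com/language/translate/v2";

// Hypothetical helper: builds the request URL. URLSearchParams encodes every
// value, so spaces, "&", "?" and "#" in the text cannot break the query.
function buildTranslateUrl(gcpKey, sourceLang, targetLang, text) {
  const url = new URL(TRANSLATE_ENDPOINT);
  url.searchParams.set("key", gcpKey);
  url.searchParams.set("source", sourceLang);
  url.searchParams.set("target", targetLang);
  url.searchParams.set("q", text);
  return url.toString();
}
```

Because the key travels in the URL of a request made from the browser, anyone using the page can see it, so it is worth restricting the key in the Cloud Console to the Translation API.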

Core Functionality (in JavaScript)

Now that we have the basic structure laid out as well as the keys generated, we can begin adding functionality to the translation service. It may look intimidating at first, but if you follow GCP’s official docs, it’ll be a piece of cake.

The code below reads the GCP key the user entered and their preferred transcription language. As soon as the user stops speaking, their words are transcribed in the chosen language using JavaScript’s Web Speech API.
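
For readers who have not set up the Web Speech API before, here is a minimal sketch of the recognition setup. `latestTranscript` and `createRecognizer` are hypothetical helpers, and the `continuous` and `interimResults` settings are assumptions rather than the project's confirmed configuration.

```javascript
// Hypothetical helper: extracts the newest transcript from a "result" event,
// mirroring event.results[event.resultIndex][0].transcript.
function latestTranscript(event) {
  const result = event.results[event.resultIndex];
  return result && result[0] ? result[0].transcript.trim() : "";
}

// Hypothetical factory. In a browser, pass
// window.SpeechRecognition || window.webkitSpeechRecognition as Ctor.
function createRecognizer(Ctor, lang, onTranscript) {
  const recognition = new Ctor();
  recognition.lang = lang;            // BCP 47 tag such as "en" or "es-ES"
  recognition.continuous = true;      // assumption: keep listening across pauses
  recognition.interimResults = false; // assumption: deliver final results only
  recognition.onresult = (event) => {
    const text = latestTranscript(event);
    if (text !== "") onTranscript(text); // skip empty results
  };
  return recognition;
}
```

Taking the constructor as a parameter keeps the sketch independent of the browser prefix and makes it easy to exercise with a stand-in class outside the browser.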

This same message is sent in the speaker’s language to all users through the Agora RTM SDK. When the message is received, we check the receiver’s preferred language and use the Google Cloud Translation API to convert the original message into that language. This way, the logic still works as expected even when the remote user’s language differs from the local user’s.
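
The message flow described here can be sketched with three small helpers. They are hypothetical and not part of the original project; only the field names `singleMessage`, `senderLang`, and `time` come from the tutorial's payload. The ISO timestamp is a choice made for this sketch so that each receiver can format the time locally.

```javascript
// Hypothetical helper: serializes a transcript for the `text` field of an RTM message.
function encodeRtmPayload(singleMessage, senderLang, now = new Date()) {
  return JSON.stringify({
    singleMessage,
    senderLang,
    time: now.toISOString(), // assumption: ISO time instead of a localized string
  });
}

// Hypothetical helper: returns the parsed payload, or null if the text is not
// valid JSON or lacks the expected string fields.
function decodeRtmPayload(text) {
  try {
    const payload = JSON.parse(text);
    if (typeof payload.singleMessage !== "string" || typeof payload.senderLang !== "string") {
      return null;
    }
    return payload;
  } catch (err) {
    return null;
  }
}

// Hypothetical helper: a translation request is only needed when the
// sender's and receiver's language codes differ.
function needsTranslation(senderLang, receiverLang) {
  return senderLang.toLowerCase() !== receiverLang.toLowerCase();
}
```

A receiver could call `decodeRtmPayload(text)` in its `ChannelMessage` handler, ignore `null` results, and request a translation only when `needsTranslation` returns true.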

...
recognition.start();
            // Start transcribing and translating
            var gcpKey = $("#gcpKey").val();
            var transcriptionLang = $('#transcriptionLang').val();
            recognition.onresult = function (event) {
                var current = event.resultIndex;
                var transcript = event.results[current][0].transcript;
                transContent = transContent + transcript;
                singleMessage = transContent;

                // Write code to send, process and show translated transcription to host.
                rtmText = {
                    singleMessage: singleMessage,
                    senderLang: $('#transcriptionLang').val(),
                    time: new Date().toLocaleString("en-US", { year: 'numeric', month: 'long', day: 'numeric', hour12: true, hour: 'numeric', minute: 'numeric', second: 'numeric' })
                };
                msg = {
                    messageType: 'TEXT',
                    rawMessage: undefined,
                    text: JSON.stringify(rtmText)
                };
                channel.sendMessage(msg).then(() => {
                    console.log("Message sent successfully.");
                    console.log("Your message was: " + rtmText.singleMessage + " by " + accountName + " in the following language: " + rtmText.senderLang + " sent at: " + rtmText.time);
                    if (rtmText.senderLang == transcriptionLang) {
                        $("#actual-text").append("<br> <b>Speaker:</b> " + accountName + "<br> <b>Message:</b> " + rtmText.singleMessage + "<br> <b>Sent On:</b> " + rtmText.time + "<br>");
                        transContent = '';
                    } else {
                        var xhr = new XMLHttpRequest();
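                        // URL-encode the message so characters such as & or # do not break the query string
                        xhr.open("POST", `https://www.googleapis.com/language/translate/v2?key=${gcpKey}&source=${rtmText.senderLang}&target=${transcriptionLang}&callback=translateText&q=${encodeURIComponent(singleMessage)}`, true);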
                        xhr.send();
                        xhr.onload = function () {
                            if (this.status == 200) {
                                var data = JSON.parse(this.responseText);
                                console.log(data.data.translations[0].translatedText);
                                $("#actual-text").append("<br> <b>Speaker:</b> " + accountName + "<br> <b>Message:</b> " + data.data.translations[0].translatedText + "<br> <b>Sent On:</b> " + rtmText.time + "<br>");
                                transContent = '';
                            } else {
                                var data = JSON.parse(this.responseText);
                                console.log(data);
                            }
                        };
                    }
                }).catch(error => {
                    console.log("Message wasn't sent due to an error: ", error);
                });
            };
            // Receive RTM Channel Message
            channel.on('ChannelMessage', ({
                text
            }, senderId) => {
                // Write code to receive, process and show translated transcription to all users.
                rtmText = JSON.parse(text);
                console.log("Message received successfully.");
                console.log("The message is: " + rtmText.singleMessage + " by " + senderId + " in the following language: " + rtmText.senderLang + " sent at: " + rtmText.time);
                var xhr = new XMLHttpRequest();
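                // URL-encode the message so characters such as & or # do not break the query string
                xhr.open("POST", `https://www.googleapis.com/language/translate/v2?key=${gcpKey}&source=${rtmText.senderLang}&target=${transcriptionLang}&callback=translateText&q=${encodeURIComponent(rtmText.singleMessage)}`, true);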
                xhr.send();
                xhr.onload = function () {
                    if (this.status == 200) {
                        var data = JSON.parse(this.responseText);
                        console.log(data.data.translations[0].translatedText);
                        $("#actual-text").append("<br> <b>Speaker:</b> " + senderId + "<br> <b>Message:</b> " + data.data.translations[0].translatedText + "<br> <b>Sent On:</b> " + rtmText.time + "<br>");
                        transContent = '';
                    } else {
                        var data = JSON.parse(this.responseText);
                        console.log(data);
                    }
                };
            });
...
Screenshot of the demo from the user’s perspective.
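
The handlers above assume a well-formed response and build the transcript markup by string concatenation. As a possible hardening step, here is a sketch of two hypothetical helpers that are not part of the original project: one reads the translation defensively, using the same `data.translations[0].translatedText` path as the code above, and one escapes values that come straight from users, such as the sender ID or an untranslated transcript.

```javascript
// Hypothetical helper: returns the translated text, or null when the body is
// not valid JSON or does not contain a translation (for example, an error body).
function extractTranslation(responseText) {
  try {
    const body = JSON.parse(responseText);
    const first = body && body.data && body.data.translations && body.data.translations[0];
    return first && typeof first.translatedText === "string" ? first.translatedText : null;
  } catch (err) {
    return null;
  }
}

// Hypothetical helper: escapes a value before it is concatenated into HTML.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}
```

With these in place, a handler could skip the `append` call when `extractTranslation` returns `null` and wrap the speaker name in `escapeHtml` before inserting it into `#actual-text`.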

Note: For testing, you can use two or more browser tabs to simulate multiple users on the call.

Conclusion

You did it!

You have successfully built a multilingual transcription service inside a web video call application. In case you weren’t coding along or want to see the finished product all together, we have uploaded all the code to GitHub.

About the author 

Radiostud.io Staff

Showcasing and curating a knowledge base of tech use cases from across the web.
