Speech to Text with Project Oxford using Node.js

Microsoft’s Project Oxford APIs can be more complicated to use than many people expect, especially from Node.js! The Project Oxford SDK does not provide Node code for every API, e.g. Speech to Text. The online documentation for the REST APIs can be a bit confusing, but I think I can help out a little. Note that this API takes in a pre-recorded wav file and listens for speech in it.

Your first step is getting an access token. You need to make a POST request to https://oxford-speech.cloudapp.net/token/issueToken. The body of your POST request must include the following as x-www-form-urlencoded data:

  • grant_type: ‘client_credentials’
  • client_id: <whatever you would like to call it>
  • client_secret: <your API key from the Project Oxford site>
  • scope: ‘https://speech.platform.bing.com’

You should receive a JSON object with an access token, a token type, an expiration time (in seconds) and the scope.
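Here is a minimal sketch of that token request using only Node's built-in https and querystring modules. The function names buildTokenRequestBody and requestAccessToken are my own for illustration, not part of any SDK, and I am assuming the grant type is spelled with an underscore (client_credentials), as is usual for OAuth.

```javascript
const https = require('https');
const querystring = require('querystring');

// Build the x-www-form-urlencoded body for the token request.
// (Hypothetical helper; grant_type spelled with an underscore is an assumption.)
function buildTokenRequestBody(clientId, apiKey) {
  return querystring.stringify({
    grant_type: 'client_credentials',
    client_id: clientId,
    client_secret: apiKey,
    scope: 'https://speech.platform.bing.com'
  });
}

// POST the form body to the token endpoint and pass the parsed JSON
// response to callback(err, tokenResponse).
function requestAccessToken(clientId, apiKey, callback) {
  const body = buildTokenRequestBody(clientId, apiKey);
  const req = https.request({
    method: 'POST',
    hostname: 'oxford-speech.cloudapp.net',
    path: '/token/issueToken',
    headers: {
      'Content-Type': 'application/x-www-form-urlencoded',
      'Content-Length': Buffer.byteLength(body)
    }
  }, (res) => {
    let data = '';
    res.setEncoding('utf8');
    res.on('data', (chunk) => { data += chunk; });
    res.on('end', () => {
      let parsed;
      try {
        parsed = JSON.parse(data);
      } catch (err) {
        return callback(err);
      }
      callback(null, parsed);
    });
  });
  req.on('error', callback);
  req.end(body);
}
```

You would call requestAccessToken('my-app', '<your api key>', callback) and pull the token out of the parsed response in the callback. I would expect the field to be named access_token, but check the JSON you actually get back.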

From here, your next step is to make another POST request to https://speech.platform.bing.com/recognize/query with the following query parameters:

  • version: 3.0
  • requestid: <this can be any unique GUID>
  • appID: D4D52672-91D7-4C74-8AD8-42B1D98141A5  (this is the magic value for this to work)
  • format: json
  • locale: en-US (or whichever language you prefer)
  • device.os: <the operating system of whichever device you are using>
  • scenarios: ulm
  • instanceid: <this can be any unique GUID>

You can get newly created GUIDs for your instanceid and requestid from various online sites, or generate them randomly with an npm module named guid. The body of your POST must be the wave data (the raw contents of the wav file). Your headers are as follows:

  • Authorization: Bearer <authorization token we received in the first POST>
  • Content-Type: audio/wav; samplerate=8000  (be sure the sample rate matches the wav file you are using)
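To make the parameters and headers above concrete, here is a rough sketch that assembles them in Node. The helper names buildRecognizeUrl and buildRecognizeHeaders are made up for this post. Instead of the guid module I am using the built-in crypto.randomUUID(), which assumes Node 14.17 or newer, and the 'osx' default for device.os is just an example value.

```javascript
const crypto = require('crypto');

// Build the full recognize URL with its query string.
// (Hypothetical helper; GUIDs come from crypto.randomUUID unless supplied.)
function buildRecognizeUrl(options = {}) {
  const params = new URLSearchParams({
    version: '3.0',
    requestid: options.requestId || crypto.randomUUID(),
    // The "magic" app id from the list above, spelled lowercase as in a query string.
    appid: 'D4D52672-91D7-4C74-8AD8-42B1D98141A5',
    format: 'json',
    locale: options.locale || 'en-US',
    'device.os': options.deviceOs || 'osx',
    scenarios: options.scenarios || 'ulm',
    instanceid: options.instanceId || crypto.randomUUID()
  });
  return 'https://speech.platform.bing.com/recognize/query?' + params.toString();
}

// Build the two headers; sampleRate must match the wav file being sent.
function buildRecognizeHeaders(accessToken, sampleRate) {
  return {
    Authorization: 'Bearer ' + accessToken,
    'Content-Type': 'audio/wav; samplerate=' + sampleRate
  };
}
```

Note the space after Bearer in the Authorization header: the token is appended after it, not glued straight onto the word.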

Testing the API using an app like Postman would look something like this:

POST https://speech.platform.bing.com/recognize/query?scenarios=catsearch&appid=D4D52672-91D7-4C74-8AD8-42B1D98141A5&locale=en-US&version=3.0&format=json&requestid=b2c95ede-97eb-4c88-81e4-80f32d6aef74&instanceid=106a4690-b664-ca61-addb-cdc705560791&device.os=osx

Then add the headers, of course. If you are using a module like request in your app, your POST request may look something like:

[Screenshot: example POST request made with the request module]

Note that in this example accessToken is a variable holding the token, rather than the long access token string pasted in directly. This should help make the code a lot cleaner.
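Since a screenshot can't be copied and pasted, here is a rough text equivalent. It is not the code from the screenshot: I have swapped the request module for Node's built-in https so it runs without installing anything, and the names buildRecognizeOptions and recognizeWavFile are my own. It assumes you already have the full query URL and a headers object containing Authorization and Content-Type.

```javascript
const fs = require('fs');
const https = require('https');

// Turn a full query URL, a headers object and the audio bytes into
// options for https.request. (Hypothetical helper for illustration.)
function buildRecognizeOptions(queryUrl, headers, wavData) {
  const url = new URL(queryUrl);
  return {
    method: 'POST',
    hostname: url.hostname,
    path: url.pathname + url.search,
    // Content-Length is the size of the raw wav bytes being sent.
    headers: Object.assign({}, headers, { 'Content-Length': wavData.length })
  };
}

// Read the wav file, POST its raw bytes, and pass the parsed JSON
// response to callback(err, result).
function recognizeWavFile(queryUrl, headers, wavPath, callback) {
  fs.readFile(wavPath, (readErr, wavData) => {
    if (readErr) return callback(readErr);
    const options = buildRecognizeOptions(queryUrl, headers, wavData);
    const req = https.request(options, (res) => {
      const chunks = [];
      res.on('data', (chunk) => chunks.push(chunk));
      res.on('end', () => {
        let parsed;
        try {
          parsed = JSON.parse(Buffer.concat(chunks).toString('utf8'));
        } catch (err) {
          return callback(err);
        }
        callback(null, parsed);
      });
    });
    req.on('error', callback);
    req.end(wavData);
  });
}
```

The wav file goes out as the raw request body, with no form encoding or multipart wrapping, which matches the "body must be the wave data" requirement above.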

For a good example of how to use Project Oxford Speech to Text, check out this GitHub gist by Luke Hoban. He does a great job of making all of this into one seamless function.

Good luck 🙂

