Speech to Text with Project Oxford using Node.js

Microsoft’s Project Oxford APIs are a bit more complicated to use than many expect, especially from Node.js! The Project Oxford SDK does not provide Node code for every API, Speech to Text being one example. The online documentation for the REST APIs can be a bit confusing, but I think I can help out a little. Note that this API takes in a pre-recorded wav file and listens for speech.

So your first step is getting your access token. You need to make a POST request to the token endpoint. The body of your POST request must include the following as x-www-form-urlencoded data:

  • grant_type: ‘client_credentials’
  • client_id: <whatever you would like to call it>
  • client_secret: <your api key from the project oxford site>
  • scope: <the URL of the speech service>

You should receive a JSON object with an access token, a token type, an expiration time (in seconds) and the scope.
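As a rough sketch, the token request could be put together like this in Node. It assumes Node 18 or later (for the built-in fetch) and the standard OAuth grant type value client_credentials. The names buildTokenRequest, getAccessToken, tokenUrl and fetchImpl are made up for illustration; the token URL and scope are not filled in here, so pass in the values from the Project Oxford documentation.

```javascript
// Build the x-www-form-urlencoded body for the token request.
// clientId, clientSecret and scope are supplied by the caller.
function buildTokenRequest({ clientId, clientSecret, scope }) {
  const form = new URLSearchParams({
    grant_type: 'client_credentials', // assumed standard OAuth value
    client_id: clientId,
    client_secret: clientSecret,
    scope: scope,
  });
  return {
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body: form.toString(),
  };
}

// Send the request and return the parsed JSON reply.
// tokenUrl is a placeholder for the token endpoint; fetchImpl is a
// hypothetical hook so the network call can be swapped out in tests.
async function getAccessToken(tokenUrl, credentials, fetchImpl = fetch) {
  const response = await fetchImpl(tokenUrl, buildTokenRequest(credentials));
  if (!response.ok) {
    throw new Error(`Token request failed with status ${response.status}`);
  }
  return response.json();
}
```

Keeping the body-building step separate from the network call makes it easy to check what would be sent before spending a real request on it.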

From here your next step is to make another POST request to the recognition endpoint, with the following parameters:

  • version: 3.0
  • requestid: <this can be any unique GUID>
  • appID: D4D52672-91D7-4C74-8AD8-42B1D98141A5  (this is the magic value for this to work)
  • format: json
  • locale: en-US (or whichever language you prefer)
  • device.os: <whichever device you are using>
  • scenarios: ulm
  • instanceid: <this can be any unique GUID>
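A minimal sketch of building these parameters in Node, assuming they travel in the URL query string (the list above only calls them parameters). It uses crypto.randomUUID from Node's standard library (Node 14.17 or later) for the two GUIDs. The function name buildRecognitionQuery and the 'Windows' default for device.os are placeholders for illustration only.

```javascript
const { randomUUID } = require('crypto');

// Build the query string for the recognition request.
// A fresh GUID is generated for requestid and instanceid on every call.
function buildRecognitionQuery({ locale = 'en-US', deviceOs = 'Windows' } = {}) {
  const params = new URLSearchParams({
    version: '3.0',
    requestid: randomUUID(),
    appID: 'D4D52672-91D7-4C74-8AD8-42B1D98141A5', // fixed value from the list above
    format: 'json',
    locale: locale,
    'device.os': deviceOs, // hypothetical default; use your own device
    scenarios: 'ulm',
    instanceid: randomUUID(),
  });
  return params.toString();
}
```

Calling buildRecognitionQuery({ locale: 'en-GB' }) would give a string you can append after a `?` on the recognition URL.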

You can get newly created GUIDs for your instanceid and requestid from various online sites, or from an npm module named guid that generates them randomly. The body of your POST must be the raw audio data from your wav file (waveData). Your headers are as follows:

  • Authorization: Bearer <access token we received in the first POST>
  • Content-Type: audio/wav; samplerate=8000  (be sure the sample rate matches the wav file you are using)
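To make the body and headers more concrete, here is a small sketch, assuming the wav file sits on disk and is sent as-is. The helper names loadWaveData, readSampleRate and buildSpeechHeaders are invented for this example. readSampleRate also assumes the common wav layout where the "fmt " chunk comes right after the RIFF header, so the sample rate sits at byte offset 24; files laid out differently will need the rate set by hand.

```javascript
const fs = require('fs');

// Read the whole wav file into a Buffer; this Buffer is the request body.
function loadWaveData(wavPath) {
  return fs.readFileSync(wavPath);
}

// Pull the sample rate out of a wav header with the common layout
// (assumption: the "fmt " chunk starts at byte 12).
function readSampleRate(waveData) {
  if (waveData.length < 28 || waveData.toString('ascii', 12, 16) !== 'fmt ') {
    throw new Error('Unexpected wav layout; set the sample rate by hand');
  }
  return waveData.readUInt32LE(24);
}

// Build the two headers listed above.
function buildSpeechHeaders(accessToken, sampleRate = 8000) {
  return {
    Authorization: `Bearer ${accessToken}`, // note the space after Bearer
    'Content-Type': `audio/wav; samplerate=${sampleRate}`,
  };
}
```

Reading the rate from the file itself is one way to keep the Content-Type header in step with the audio you actually send.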

Testing the API using an app like Postman would look something like this:

[Screenshot: the request set up in Postman]

and then adding the headers, of course. If you are using a tool like request in your app, then your POST request may look something like:

[Screenshot: the POST request made with request]

Note that in this example accessToken is a variable holding the token, rather than the long token string pasted in directly. This should help make the code a lot cleaner.
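For readers who want something to copy, here is a hedged sketch of that second POST. It swaps in the fetch built into Node 18 and later rather than the request module, so it is not the code from the original example. The names recognizeSpeech, speechUrl, query and fetchImpl are placeholders: speechUrl stands for the recognition URL, query for the parameter string, and headers for the Authorization and Content-Type headers described earlier.

```javascript
// Post the audio to the recognition endpoint and return the parsed JSON reply.
// fetchImpl is a hypothetical hook so the network call can be stubbed in tests.
async function recognizeSpeech({ speechUrl, query, headers, waveData }, fetchImpl = fetch) {
  const response = await fetchImpl(`${speechUrl}?${query}`, {
    method: 'POST',
    headers: headers,
    body: waveData, // raw bytes of the wav file
  });
  if (!response.ok) {
    throw new Error(`Recognition request failed with status ${response.status}`);
  }
  return response.json();
}
```

Because the token is passed in through headers, the long token string never has to appear in the source file.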

For a good example of how to use Project Oxford Speech to Text, check out this GitHub gist by Luke Hoban. He does a great job of making all of this into one seamless function.

Good luck 🙂


Published by Gabrielle

The Unexpected Nerd | Software Engineer | Gamer | Sneaker Enthusiast | Alpha Woman💗💚

3 thoughts on “Speech to Text with Project Oxford using Node.js”

  1. This is great! I was also trying to work on this for my project but I had to stop because I would not have been able to finish on time. I had trouble understanding how to send the wavData and where you got the wavData from. If it’s not much trouble, could you explain that part?

  2. Hi there, this is really great! I was having a hard time trying to use the Microsoft API and couldn’t get it to work. This blog really helped me out but I still don’t understand how you get your audio file, wavData?
