Python? CAN YOU HEAR ME?

4 min read · Jun 12, 2020

Despite the non-descriptive title, this is a tutorial on how to make a simple Python application that recognizes your voice and transforms it into text. With this setup you can build a lot of cool things, and it is fairly simple.

We are going to use the speech_recognition library to capture what we say and Google Speech Recognition to understand what was said and send the text back. Sounds cool, right?

I learned this while watching the edX CS50 course, and I can’t recommend it enough, especially for students like me who don’t come from a university and might never have heard about big O notation, memory allocation and other such things before.

This is a step-by-step guide to installing everything, and you will be able to see the final code at the end.

Setting up

For this tutorial you will need to have the following installed on your machine:

Python

To check if you have Python installed on your machine, open the terminal and write “python” on the command line.

$ python
Python 3.7.6 (default, Jan  8 2020, 13:42:34)
[Clang 4.0.1 (tags/RELEASE_401/final)] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>

If Python is installed, this opens the Python console like the one above. To exit it, type “exit()” and press Enter.
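The same check can be done from a script. This is only a small sketch: the function name and the (3, 6) minimum are my own assumptions for illustration, not a documented requirement of any library used here.

```python
import sys


def python_is_recent_enough(minimum=(3, 6)):
    """Return True if the running interpreter is at least `minimum`.

    The default of (3, 6) is an arbitrary, assumed threshold.
    """
    return sys.version_info[:2] >= minimum


# Print the full version string, then a short verdict.
print(sys.version)
print("OK" if python_is_recent_enough() else "Please upgrade Python")
```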

PyAudio / PortAudio

You need PyAudio to use your microphone (which is our case), and PyAudio in turn needs PortAudio. You can find the complete documentation for both here and here.

Also, make sure you have Homebrew installed on your machine, since the command below uses it to install PortAudio on macOS.

brew install portaudio
pip install PyAudio

Speech_recognition

You can find the documentation for speech_recognition here.
To install it, run in your terminal:

pip install SpeechRecognition

To quickly test if it’s working, run in your terminal:

python -m speech_recognition

You should be able to talk, and speech_recognition will write what it thinks you said. It spells my name right about 2 times out of 5.
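If something fails at this point, it can help to confirm that Python can actually find both packages. A minimal sketch using only the standard library; `missing_modules` is a hypothetical helper name, and it assumes the import names are `speech_recognition` and `pyaudio`:

```python
import importlib.util


def missing_modules(names=("speech_recognition", "pyaudio")):
    """Return the module names that Python cannot find on this machine."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("not installed:", ", ".join(missing))
else:
    print("everything is installed")
```

An empty result only means the packages are importable; it says nothing about whether the microphone itself works.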

Writing your Python file

I created a file called voice_recognition.py. The first thing you will want to do is import speech_recognition:

import speech_recognition

Now, let’s create a variable that holds an instance of the Recognizer class, which is responsible for recognizing speech from an audio source.

recognizer = speech_recognition.Recognizer()

Next, we will create a function called speech() that opens and closes an instance of speech_recognition.Microphone() using a “with” statement, and we will call that instance source:

def speech():
    with speech_recognition.Microphone() as source:
        print("say something")
        audio = recognizer.listen(source)

speech()

Cool, now just to test if everything is working, write the following under audio, inside the “with” block:

        response = recognizer.recognize_google(audio)
        print(response)

If you run the program it should print whatever you said in the terminal.

python voice_recognition.py
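If the transcription is often wrong, background noise may be part of the reason. Here is a minimal sketch of calibrating before listening and of not waiting forever for speech. The helper name `listen_once` and its default values are my own hypothetical choices; `adjust_for_ambient_noise` and the `timeout` and `phrase_time_limit` arguments of `listen` come from the speech_recognition library.

```python
def listen_once(recognizer, source, timeout=5, phrase_time_limit=10):
    """Calibrate for background noise, then record a single phrase.

    `recognizer` is a speech_recognition.Recognizer and `source` is an
    already opened speech_recognition.Microphone. The default numbers
    are assumptions picked for illustration.
    """
    # Sample the microphone for one second to adjust the energy threshold.
    recognizer.adjust_for_ambient_noise(source, duration=1)
    print("say something")
    # timeout: seconds to wait for speech to start; the library raises
    #          WaitTimeoutError if nothing is heard in that time.
    # phrase_time_limit: maximum seconds recorded once speech has started.
    return recognizer.listen(
        source, timeout=timeout, phrase_time_limit=phrase_time_limit
    )
```

Inside the “with” block, this would replace the print and listen lines: audio = listen_once(recognizer, source).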

Cool, but the thing about recognize_google is that it needs an internet connection to work (there is another engine called Sphinx that works offline, by the way). So it is a good idea to catch any errors if they occur. To do that, we will delete the “response” and “print” lines and add the following under “audio”:

try:
    response = recognizer.recognize_google(audio)
    print(response)
except speech_recognition.UnknownValueError:
    print("I CAN'T HEAR YOU")
except speech_recognition.RequestError:
    print("too busy, talk to you later")

With this statement, we first try to print what the user said. If the recognizer can’t understand the audio, it raises an UnknownValueError and we print “I CAN’T HEAR YOU”.
Something similar happens if the user is not connected to the internet or if there is a problem with the servers: on any RequestError, we print “too busy, talk to you later”.
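Since Sphinx works offline, one option is to fall back to it when the Google request fails. This is a rough sketch and not part of the tutorial's code: `transcribe` is a hypothetical helper, and `recognize_sphinx` only works if the separate pocketsphinx package is installed.

```python
def transcribe(recognizer, audio):
    """Return the recognized text, or None if the speech was unintelligible.

    Tries Google first and falls back to the offline Sphinx engine when
    the request fails (no connection, server problem).
    """
    # Imported here so this sketch can be loaded even where the library
    # is not installed yet; a top-level import works just as well.
    import speech_recognition

    try:
        try:
            return recognizer.recognize_google(audio)
        except speech_recognition.RequestError:
            # Offline fallback. This raises RequestError again if
            # pocketsphinx is missing, and that error is not caught here.
            return recognizer.recognize_sphinx(audio)
    except speech_recognition.UnknownValueError:
        # Neither engine could make sense of the audio.
        return None
```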

Now you should run it again and test if everything is working fine.

See the final code below:
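Putting the snippets above together gives roughly the following file. It is a sketch assembled from the steps in this post, not a verbatim copy of the original file. Moving the import and the recognizer inside speech() is my own assumption, made so the file can be loaded on a machine where the library is not installed yet.

```python
def speech():
    """Listen on the microphone once and print what was recognized."""
    # Assumption: imported and created here instead of at the top of the file.
    import speech_recognition

    recognizer = speech_recognition.Recognizer()

    # The "with" statement opens the microphone and closes it afterwards.
    with speech_recognition.Microphone() as source:
        print("say something")
        audio = recognizer.listen(source)

        try:
            # Sends the audio to Google, so it needs an internet connection.
            response = recognizer.recognize_google(audio)
            print(response)
        except speech_recognition.UnknownValueError:
            # The audio could not be understood.
            print("I CAN'T HEAR YOU")
        except speech_recognition.RequestError:
            # No connection, or a problem on the server side.
            print("too busy, talk to you later")
```

As in the earlier snippet, finish the file with a call to speech() so that running python voice_recognition.py starts listening.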

You can also fork this repo from my GitHub account here.

With this tutorial, you can transform speech into text and store that text in a variable that you can access in your Python file, so why not use it to have a conversation with your computer? Or maybe create commands so that, from your terminal, you can tell your computer to open certain browsers or play your favourite song on YouTube.
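As a starting point for the commands idea, here is a minimal sketch that maps recognized text to a web page. The phrases, URLs and function names are all hypothetical and chosen only for illustration; `webbrowser` is part of Python's standard library.

```python
import webbrowser

# Hypothetical phrases and the pages they should open.
COMMANDS = {
    "open youtube": "https://www.youtube.com",
    "open python docs": "https://docs.python.org/3/",
}


def find_command(text):
    """Return the URL for the first known phrase found in `text`, or None."""
    spoken = text.lower()
    for phrase, url in COMMANDS.items():
        if phrase in spoken:
            return url
    return None


def run_command(text, opener=webbrowser.open):
    """Open the page matching `text`; return True if a command matched."""
    url = find_command(text)
    if url is None:
        print("no command matched")
        return False
    opener(url)
    return True
```

In the tutorial's code, the response variable holding the recognized text could be passed to run_command(response) right after it is printed.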

If you do something cool, let me know on Twitter.

Also, check out my other posts on Python:

Using Python and Selenium to Tweet

Drawing With Python

Written by Renata Miriuk