By abhi
Petoi's Bittle is a palm-sized, open-source, programmable robot dog for STEM and fun. Bittle can connect to a Raspberry Pi and can be easily extended.
I used PyAudio at first, but it is an old library, so I switched to sounddevice and SoundFile instead.
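As a rough illustration of how the two libraries fit together, here is a minimal recording sketch. The function names, the 16 kHz mono format, and the 2-second duration are assumptions made for the example, not values taken from this project.

```python
def frame_count(seconds, samplerate):
    """Number of samples needed to cover `seconds` at `samplerate` Hz."""
    return int(round(seconds * samplerate))


def record_command(path, seconds=2.0, samplerate=16000):
    """Record mono audio from the default input device and save it as a WAV file.

    Hypothetical helper: the defaults are illustrative assumptions.
    """
    # Imported lazily so frame_count() can be used on a machine without audio hardware.
    import sounddevice as sd
    import soundfile as sf

    audio = sd.rec(frame_count(seconds, samplerate),
                   samplerate=samplerate, channels=1, dtype="float32")
    sd.wait()  # block until the recording has finished
    sf.write(path, audio, samplerate)
    return audio
```

Calling `record_command("command.wav")` on a machine with a microphone should leave a short WAV file that `soundfile.read` can load again later.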
From a functional point of view, the methods to do this can be divided into:
This belongs to the second category and is similar to template matching. DTW (Dynamic Time Warping) computes the cost of aligning one piece of audio with a template recording, and we pick the template with the lowest cost. This method needs no training, so it still works when you add new commands. The drawback is that the computation is time-consuming. However, the command recordings are short, and we can reduce the work further by trimming the silence and extracting MFCC (Mel-frequency Cepstral Coefficients) features.
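The matching step described above can be sketched in plain NumPy. This is a textbook DTW under simple assumptions (Euclidean frame distance, no window constraint), not necessarily the implementation used in this project. The names `dtw_cost` and `best_match` are hypothetical. The feature sequences would typically be MFCC frames, for example from `librosa.feature.mfcc(y=y, sr=sr).T`.

```python
import numpy as np


def _as_frames(x):
    """Convert input to a 2-D float array of shape (frames, dims)."""
    x = np.asarray(x, dtype=float)
    return x[:, None] if x.ndim == 1 else x


def dtw_cost(a, b):
    """DTW alignment cost between two feature sequences."""
    a, b = _as_frames(a), _as_frames(b)
    n, m = len(a), len(b)
    # Euclidean distance between every frame of a and every frame of b.
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    # acc[i, j] = cheapest cost of aligning the first i frames of a with the first j of b.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    return acc[n, m]


def best_match(query, templates):
    """Return the name of the lowest-cost template and all costs."""
    costs = {name: dtw_cost(query, feats) for name, feats in templates.items()}
    return min(costs, key=costs.get), costs
```

The double loop is what makes DTW slow: its cost grows with the product of the two sequence lengths, which is why short, silence-trimmed recordings help.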
There is an official PyTorch demo, "Speech Command Recognition with torchaudio" (PyTorch Tutorials). However, the model has to be retrained whenever new commands are added.
I was inspired by the blog post "Audio Handling Basics: Process Audio Files In Command-Line or Python" on Hacker Noon. The post explains that the silent parts of a recording can be removed based on the short-term energy of the audio data. The Python library librosa provides functions for doing that.
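To make the short-term energy idea concrete, here is a minimal NumPy sketch that trims leading and trailing silence. The frame length, hop length, and the -40 dB threshold are assumed values chosen for illustration, and `trim_silence` is a hypothetical name. In practice `librosa.effects.trim` does a similar job with its `top_db` parameter.

```python
import numpy as np


def trim_silence(y, frame_length=400, hop_length=160, threshold_db=-40.0):
    """Drop leading and trailing frames whose energy is far below the loudest frame."""
    y = np.asarray(y, dtype=float)
    if len(y) < frame_length:
        return y  # too short to analyse frame by frame
    n_frames = 1 + (len(y) - frame_length) // hop_length
    # Short-term energy: mean squared amplitude of each frame.
    energy = np.array([
        np.mean(y[i * hop_length: i * hop_length + frame_length] ** 2)
        for i in range(n_frames)
    ])
    # Energy in dB relative to the loudest frame (small constant avoids log(0)).
    db = 10.0 * np.log10((energy + 1e-12) / (energy.max() + 1e-12))
    voiced = np.flatnonzero(db > threshold_db)
    # Keep everything from the first voiced frame to the end of the last one.
    start = voiced[0] * hop_length
    end = voiced[-1] * hop_length + frame_length
    return y[start:end]
```

Because the threshold is relative to the loudest frame, a recording that contains only background noise is returned largely untrimmed. An absolute threshold might suit a noisy room better.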
I tried some open-source methods:
bash rpi_simple.sh
sudo apt-get update && sudo apt-get upgrade
sudo apt-get install make build-essential libportaudio0 libportaudio2 libportaudiocpp0 portaudio19-dev libatlas-base-dev
pip install your_filename
sudo apt install libblas-dev llvm llvm-dev
export LLVM_CONFIG=/usr/bin/llvm-config
pip install -r requirements_pi.txt
vosk-model-small-en-us and vosk-model-small-cn. Move the folder into ./models.
python main.py