Real-Time Speech Recognition and Voice-Enabled AI Chatbot Integration using BING and OpenAI

sai harish cherukuri
4 min read · Jun 28, 2023

Are you tired of typing away questions manually into ChatGPT or BingAI?
Do you want to talk to it instead, and have it respond with audio?
Do you want to compare the responses generated by ChatGPT and BingAI?

This blog guides you in building a personal voice assistant powered by BingAI and ChatGPT using Python. The voice assistant will be able to answer questions, provide information, and engage in conversations with users. The chatbot will utilize speech recognition to capture user input, interact with AI models, and provide relevant responses.

If you enjoy reading this, please give it a clap!

Prerequisites:

Before getting started, ensure that you have the following installed:

To install these, you can use either the ‘pip install’ or ‘pip3 install’ command.

Python (version 3.10.11)
SpeechRecognition library (pip install SpeechRecognition==3.10.0)
Whisper library (pip install -U openai-whisper)
Boto3 library (pip install boto3==1.26.114)
PyDub library (pip install pydub==0.25.1)
OpenAI library (pip install openai==0.27.4)
EdgeGPT library (pip install EdgeGPT==0.1.23)
Streamlit library (pip install streamlit==1.23.1)

Speech Recognition Library — It allows you to convert spoken language into written text, enabling your applications to process and understand spoken commands or transcribe audio recordings.

Whisper — An automatic speech recognition (ASR) system developed by OpenAI. It provides high-quality speech recognition capabilities, allowing you to transcribe audio input into text.

Boto3 — Helps integrate Python applications with AWS services, such as Polly (text-to-speech), S3 (storage), EC2 (compute), and more.

PyDub — A Python audio-processing library that provides a simple, easy-to-use interface for manipulating audio files.

OpenAI — It allows Python applications to interact with powerful AI models, such as GPT-3.

EdgeGPT — EdgeGPT provides methods for text generation and conversation modeling using GPT.

Streamlit — Python framework that simplifies the process of building interactive web applications.

Follow the steps in the EdgeGPT project documentation (linked in the references below) to obtain the cookies.json file.


The obtained JSON file is used to store cookies for the EdgeGPT Chatbot. This file allows the Chatbot to maintain the conversation context and continue the conversation with the user across multiple interactions.

from EdgeGPT import Chatbot, ConversationStyle

# Note: bot.ask() is a coroutine, so this code must run inside an async function
bot = Chatbot('cookies.json')
response = await bot.ask(prompt=user_input, conversation_style=ConversationStyle.precise)
for message in response["item"]["messages"]:
    if message["author"] == "bot":
        bot_response = message["text"]

The response generated by the Chatbot includes references or citations represented within square brackets and caret symbols, such as ‘[^’ or ‘^]’. We remove the citation tags, to ensure that the final displayed response to the user does not include these tags, providing a cleaner and more readable output.

import re

bot_response = re.sub(r'\[\^\d+\^\]', '', bot_response)
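To see what the citation-stripping regex does in isolation, here is a small self-contained sketch (the helper name strip_citations is my own, not from the original code):

```python
import re

def strip_citations(text):
    # Remove Bing-style citation tags such as [^1^] or [^2^] from a response
    return re.sub(r'\[\^\d+\^\]', '', text)

print(strip_citations("The Eiffel Tower is in Paris[^1^][^2^]."))
# The tags are removed, leaving only the readable sentence
```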

For transcribing audio to text, we can use the openai-whisper library. Since Whisper does not have a built-in feature for recording microphone input, we must also install SpeechRecognition.

import speech_recognition as sr
recognizer = sr.Recognizer()
microphone = sr.Microphone()
  • The code creates a recognizer object from the speech_recognition library, which is used for audio processing and speech recognition.
  • A microphone object is also created to capture the user’s voice input.

For higher-quality voice output, we can use the AWS Polly engine to create a realistic text-to-speech voice. Integration with AWS services is handled by the Boto3 library.

AWS Polly allows you to select the voiceId based on your preferences.

import boto3

def synthesize_speech(text, output_filename):
    polly = boto3.client('polly', region_name='us-west-2')
    response = polly.synthesize_speech(
        Text=text,
        OutputFormat='mp3',
        VoiceId='Matthew',
        Engine='neural'
    )
    # Save the returned audio stream so it can be played back later
    with open(output_filename, 'wb') as f:
        f.write(response['AudioStream'].read())

The PyDub library then helps in playing the audio file received from AWS Polly.

import pydub
from pydub import playback

def play_audio(file):
    sound = pydub.AudioSegment.from_file(file, format="mp3")
    playback.play(sound)

For ChatGPT integration with the Python application, we need the OpenAI secret key. Generate a new secret API key in the OpenAI platform.

https://platform.openai.com/account/api-keys

Use the generated secret API key for your OpenAI API call

st.title("CHAT GPT RESULTS")
openai.api_key = key
response = openai.Completion.create(
    engine="text-davinci-003",
    prompt=user_input,
    max_tokens=130
)
bot_response = response.choices[0].text.strip()
bot_response = re.sub(r'\[\^\d+\^\]', '', bot_response)
st.write(bot_response)
synthesize_speech(bot_response, 'response.mp3')
play_audio('response.mp3')

Finally, we have built a personal voice assistant that combines responses from both BingAI and ChatGPT in a sample web app using Streamlit.

Speech/audio input:

Text input:

References:

https://github.com/openai/whisper

https://github.com/Ai-Austin/Bing-GPT-Voice-Assistant

https://awesomeopensource.com/project/acheong08/EdgeGPT

LinkedIn Profile

https://www.linkedin.com/in/saiharish-ch/
