
Creating a cross platform GUI for OpenAI's Whisper

GitHub Repository

Why use it?

The initial motivation for this repo was to have a GUI for Whisper, so that I can generate first-pass video-to-text transcripts for the kivyschool.com website. One good thing about this initial integration is that I can later switch models, improve the GUI, or run it when I have no access to a terminal/cmd/PowerShell. Having complete control is really nice, and being able to use Kivy to rapidly prototype a GUI is invaluable (this was built in 2 days, discounting all the time I spent documenting it; total time was 6 days). Later on I still need to integrate multiprocessing to push the blocking code onto a Python subprocess.

What can you learn from this?

In the associated GitHub repo and YouTube playlist, you will learn:

  • How to integrate Kivy and OpenAI's Whisper automatic speech recognition system.
  • How to deal with various Poetry bugs very easily (assuming you followed the Pyenv + python-poetry setup in the PREREQUISITE SETUP video)
  • How to integrate machine learning/tensorflow Python packages with Kivy and package them with PyInstaller
  • How to sketch out a Kivy app's layout, then build it quickly.
  • How to use and manipulate Kivy's FileChooser widget on desktop.
  • How to get and set data on various Kivy widgets.
  • How to manipulate Kivy popups
  • How to manipulate filepaths with Pathlib
  • How to package a complex Kivy app with PyInstaller
  • See firsthand how to create an app from concept to completion.
  • See the app at various stages by running the various midpointX.py files

Playlist for this series

AI Transcript provided by KivyWhisper

Hello and welcome back to Kivy School. Harness OpenAI's automatic speech recognition system with a custom, easy to use cross-platform GUI.

Now what's the motivation? Number one, I want AI written transcripts for kivyschool.com. Number two, I want to check out Whisper AI. Number three, I want to show that Kivy is easy to build with.

So if you would like to check it out, make sure Git is installed. You git clone this repo right here, and then you have two options: you can do poetry update if you use python-poetry, or you can do pip install -r with the path to requirements.txt. This is because Poetry allows you to create a requirements.txt for any pip users out there.
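Those steps as a command sketch (the repo URL and folder name are placeholders here; use the ones from the GitHub Repository link above):

```shell
git clone <repo-url>                # URL from the GitHub Repository link above
cd <repo-folder>
poetry update                       # option 1: if you use python-poetry
pip install -r requirements.txt     # option 2: plain pip, inside your own venv
```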

Then you should enter your virtual environment, and then you type python main.py, and then check it out. You can also check out the PyInstaller executable that I've made and released on the GitHub repo. It's right here, and I'm showing you how to get to it.

So this is the quote-unquote "final product" review. It is a basic Kivy GUI with OpenAI's Whisper integration, packaged to a one-file exe with PyInstaller. It has a file chooser, so you can pick which file to transcribe. It also has basic error checks and a pop-up when an error occurs. And it's a good project for any first-time Kivy or Python user.

So this is the KivyWhisper app that I've made, and it's run through the exe that is created in the distribution folder of my project. One thing to note is that this is running inside my virtual machine, so I'm not going to be able to run the Whisper portion, because the virtual machine is not powerful enough to run Whisper.

But I can still show you that the GUI works. A couple of things: it's packaged to a one-file exe with PyInstaller, which I've shown you right here. It has a file chooser, so you can pick which file to transcribe; you see the file chooser right here, and you can also choose different drives. It also has basic error checks. For example, if you try to transcribe with nothing selected, it says: "Something went wrong. Please check the file type and destination. Press any key to close."

It's also a good project for any first-time Kivy or Python user. And it's only 105 lines of code. If you go to main.py, which is the main thing I'm running, it's only 105 lines. That's possible because Python and Kivy abstract away the underlying concepts so you can quickly prototype an app.

It would be even less if I did not count comments. As you can see, there's already one comment here. There's some comments here. 105, it's pretty small for what it does. There's also huge room for improvement to make it pretty. Another point of improvement is multiprocessing to handle Whisper.

Now, the big deal there is that Whisper runs blocking code, and you need to offload that blocking code to a multiprocessing subprocess. So that's one more improvement I would make if I had a little bit more time to continue with this project.
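As a hedged sketch of that improvement (the queue-based hand-off and the `blocking_transcribe` stand-in are my assumptions, not code from the repo), the blocking call can be pushed into a subprocess like this:

```python
import multiprocessing as mp

def blocking_transcribe(path, queue):
    # Stand-in for the real blocking call, which in the app would be
    # roughly: whisper.load_model("base").transcribe(path, fp16=False)
    queue.put(f"transcript of {path}")

def start_transcription(path):
    # Run the blocking work in a separate process so the Kivy event
    # loop (and therefore the GUI) stays responsive.
    queue = mp.Queue()
    proc = mp.Process(target=blocking_transcribe, args=(path, queue))
    proc.start()
    return proc, queue

if __name__ == "__main__":
    proc, queue = start_transcription("video.mp4")
    print(queue.get())  # in the app, poll the queue from a Clock callback instead
    proc.join()
```

In the Kivy app you would poll the queue from a `Clock.schedule_interval` callback rather than blocking on `queue.get()`.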

So this version, I've shown this already; this is just an image. This version is also built from scratch with PyInstaller from the git-cloned repo on a Windows virtual machine, so it is tested on a blank install. I didn't test Whisper, though, as my virtual machine cannot run Whisper, unlike the host.

So let's begin. What's the plan? The plan is we make a new virtual environment with Poetry, we install Whisper with Poetry, apply Whisper to a test video, and then make the Kivy GUI. When we're going to make the Kivy GUI, there's a couple of features we want. The one feature is you select a file, and for that we can use a file chooser.

We output a text file in the same current working directory, and then we make the text file markdown compliant for the website. (Now that I've finished this project: you don't actually need to do anything to make it markdown compliant.)
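That output step can be sketched with pathlib (the helper name `output_path_for` is mine, not from the repo):

```python
from pathlib import Path

def output_path_for(source: str) -> Path:
    # Place the transcript in the current working directory,
    # named after the source video: lesson1.mp4 -> lesson1.txt
    return Path.cwd() / (Path(source).stem + ".txt")

# e.g. output_path_for("videos/lesson1.mp4") -> <cwd>/lesson1.txt
```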

If anything, markdown can just display the text. You do have to manually edit the output, because, for example, Kivy is being heard as Kiwi, and some other words are not heard correctly, so it still requires some manual editing. But it saves a lot of time: Whisper AI does the first pass for you, you just read it and do the second pass manually, and after that, paste it onto a markdown file and have it uploaded to the website.

And that's pretty easy and really fast, and it saves a lot of time after making the Kivy GUI. So let's continue: let's make a new virtual environment with Poetry and install Whisper with Poetry. There was one problem when I did this: poetry add openai-whisper does not work out of the box. And what I've noticed with these AI/machine-learning/TensorFlow libraries is that they just don't work well with Poetry.

And there's a comment on the issue saying they don't use PEP 508 markers. These AI/machine-learning/TensorFlow libraries are always tested with pip, but they're not tested with Poetry. I will show you the comment here, on the issue "cannot install openai-whisper": it says that Poetry assumes that all distributions of a package have the same requirements.

"If you want to use openai-whisper with Poetry, you should ask them to express platform-specific variations in their dependencies by using PEP 508 markers." Closed. So that's just one thing I noticed; it does not mean you should not use Poetry with machine learning or TensorFlow. You just need to Google for the fixes, and the fix was on this GitHub page. Let me just show it right here.

It says: OK, Poetry can still install Whisper. All you have to do is set the Git repo and the revision in the pyproject.toml. I'll even show you my own pyproject.toml; it's right here, and it works. So next step, let's continue. Now that we have Whisper installed, let's apply Whisper to a test video. What's the point of making a Kivy GUI and making it work if Whisper itself doesn't work?
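The fix looks roughly like this in pyproject.toml (the exact `rev` is whatever commit or tag you pin; `main` here is just a placeholder, not the revision used in the video):

```toml
[tool.poetry.dependencies]
# Install straight from the Git repo instead of PyPI, pinned to a revision
openai-whisper = { git = "https://github.com/openai/whisper.git", rev = "main" }
```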

So we always test the things that fail first, and the thing that would fail first is Whisper not running at all. So let's apply Whisper to a test video. Here's our example code, whispertest.py: we import whisper, we load the "base" model, we transcribe this video (one of the old clock videos), and then we print the result, right? So what did I get? "A module that was compiled using NumPy 1.x cannot be run with NumPy 2 as it might crash." So I got a strange error, right?

It's using an old version of NumPy; it's asking for NumPy 1.x, so less than 2, right? So what we can do is just manually add NumPy to pyproject.toml and specify less than 2. Again, I will show you my pyproject.toml. I just said less than 2, because it told me it would crash with NumPy 2 or greater, so I'll just use a version less than 2, right? So we did that, and then I run the test again.
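In pyproject.toml that pin looks like:

```toml
[tool.poetry.dependencies]
# Whisper's compiled dependencies here were built against NumPy 1.x
numpy = "<2"
```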

It works now, but there's a new error, and what's the new error? It says something about FP16. I had no idea what it was; I just Googled the error, and let's see what the solution is. The solution: using the second example in the readme, you can just set fp16 to False. And for my notes, see line 76 of main.py. We can go here, Ctrl+G, 76: right here, you can just set it to False, and you won't get the error message.

It still ran, as you can see here; it just says FP16 is not supported. So I don't know, I just don't want to see this error message. The message is there for a reason; I don't know what the reason is, but I know I don't want to see it. So I have that little fix, and now the Whisper test works.

And you can see right here: "welcome back to Kibii School. I'm going to be teaching about Kibii Clow," right? It's not "Kibii Clow." So it's still really good for just a first pass, but definitely, you still need some human touch to get this good enough to put on the website.

But it saved me like two hours of watching the video and transcribing it manually; I think that in itself saved a lot of time for me, and it's worth it because of that. So here's more working example code: we import whisper and load the "base" model. The only change is that fp16 is set to False. So now we know that Whisper works.
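A sketch of that working test script (the video filename is a placeholder, and I've wrapped the calls in a function with the import deferred so the sketch can be defined without Whisper installed; the `whisper` calls themselves match the usage described above):

```python
def transcribe(path: str) -> str:
    import whisper  # deferred import: Whisper is a heavy dependency
    model = whisper.load_model("base")           # the "base" model, as in the video
    result = model.transcribe(path, fp16=False)  # fp16=False avoids the FP16 warning on CPU
    return result["text"]

# usage: print(transcribe("clock_video.mp4"))   # placeholder filename
```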

We know how to run it. The next step is to make a Kivy GUI. So in my mind, this is what the basic app should look like. This is going to be the source file. This will be the destination file. And there's a button here that just says transcribe. And in the middle here, there's going to be some file chooser, right? So let's do it.

And then this is just a plan of attack, right? The idea is there's going to be a root box layout right here. And in this box layout, it will be this second box layout right here and this file chooser. This box layout is going to be vertical, right? There's two widgets, this box layout and this file chooser.

And within box layout two, it will be a horizontal box layout like this. And it will contain three widgets, source button, destination button, and transcribe button. So that's the plan of attack. Let's see it work in action. So this version is saved as midpoint1.py. Now let's look at midpoint1.py and check out the KV. So we can go here.

We can look at midpoint1.py. As you can see, there are two box layouts, right? The root widget is just a BoxLayout with orientation vertical; let's reference the image right here. As you can see, this first BoxLayout, orientation vertical, is going to be the yellow box layout, and it has two sub-widgets: a BoxLayout and a Button, right?

And within this BoxLayout, there are going to be three buttons. Then there's the fourth button, which is a placeholder for the file chooser; it just says HW, for hello world. So let's enter our virtual environment, go to the KivyWhisper folder, and run python midpoint1.py. Let's check it out, right? So this is going to be my base of what the app is going to look like.

And it's similar to the one that I envisioned beforehand, right? And again, this version is saved as midpoint1.py, right?
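As a rough reconstruction of the KV structure being described (an approximation from the walkthrough, not the literal contents of midpoint1.py):

```kv
BoxLayout:
    orientation: "vertical"
    BoxLayout:
        orientation: "horizontal"
        Button:
            text: "Source"
        Button:
            text: "Destination"
        Button:
            text: "Transcribe"
    Button:
        text: "HW"  # placeholder where the FileChooser will go
```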

So, with the plan completion, what have we done? We've made a new virtual environment with Poetry; we've installed Whisper with Poetry and fixed some bugs. We've also applied Whisper to a test video with the basic Whisper code. And now we're on the step of making a Kivy GUI: selecting a file and then outputting the text file in the same current working directory.

So that's it for part one. Part two is we're going to finish the Kivy GUI.

Thank you for watching. This has been Kivy School. Have a great day.

Article Error Reporting

Message @BadMetrics on the Kivy Discord.