Creating a cross platform GUI for OpenAI's Whisper Part 4

GitHub link

Why use it?

The initial motivation for this repo is to have a GUI to manipulate Whisper. This is so that I can generate initial video to text files to put up transcripts on the kivyschool.com website. One good thing about this initial integration is that I can then switch models, make the GUI better, or even run this when I have no access to a terminal/cmd/powershell. Having complete control is really nice and being able to use Kivy to rapidly prototype a GUI is invaluable (this was built in 2 days discounting all the time I spent documenting it, total time was 6 days). Later on I still need to integrate multiprocessing to push blocking code onto a Python subprocess.

What can you learn from this?

In the associated GitHub repo and YouTube playlist, you will learn:

How to integrate Kivy and OpenAI's Whisper automatic speech recognition system.
How to deal with various Poetry bugs very easily (assuming you got the setup in the PREREQUISITE SETUP video of Pyenv + python-poetry)
How to integrate machine learning/tensorflow Python packages with Kivy and package them with PyInstaller
How to mock up and visualize a Kivy app, then build it.
How to use and manipulate Kivy's FileChooser widget on desktop.
How to get and set data on various Kivy widgets.
How to manipulate Kivy popups
How to manipulate filepaths with Pathlib
How to package a complex Kivy app with PyInstaller
See firsthand how to create an app from concept to completion.
See the app at various stages by running the various midpointX.py files

AI Transcript provided by KivyWhisper

Hello and welcome back to Kivy School. This is part four of Kivy Whisper. Now what we're going to do is connect Whisper AI in Kivy for real. So we've already done the prerequisite work.

Now we're going to remember from the very beginning, what's the Whisper code in whispertest.py. Import whisper, you load your model, you transcribe this audio file with fp16 set to False, and then you can see the text right here. What are the important bits? It's to set the file name to transcribe, and instead of printing to standard out or command prompt, we want to write to a file. So we need to change this part totally.

So what we have, we have set a file name right here. It comes from the file chooser selection, stored in the text of the selected-file label that we look up through root.ids. Now we just need to remove the initial "selected file" string. And instead of printing to standard out or command prompt, now we have to write to a file, right?
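As a minimal sketch of removing that leading string: the prefix text `"Selected file: "` and the helper name `file_name_from_label` are assumptions for illustration, since the transcript does not spell out the exact label text. `str.removeprefix` needs Python 3.9 or newer.

```python
# Hypothetical label prefix; the real app's label text may differ.
LABEL_PREFIX = "Selected file: "


def file_name_from_label(label_text: str) -> str:
    """Return the path shown in the selection label, without the prefix."""
    # removeprefix leaves the string untouched when the prefix is absent.
    return label_text.removeprefix(LABEL_PREFIX)


print(file_name_from_label("Selected file: /media/2024/talk.mp3"))
```

If the prefix is missing, the text comes back unchanged, so calling the helper twice is harmless.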

So I'll show you here. It's going to be midpoint4a.py. So as we select a file, pretend this is an audio file or video file, we have the selected file. This is what we want very specifically: the vbox post install .log. This is the file location. And then this is what we will supply to Whisper to transcribe. And we'll continue.

Then this is the reminder of the original mockup, right? We still need to add the destination button. Also, the file name destination can be the old file name, and all we have to do is append the string whisper transcript. But for that, let's change the button to a text input and then add to the selection response by setting the text input's text.

And if the user wants to change the file name, they can change the text input. So let's run this again. Let's say I click this, right? I can say: whisper transcript one. And then when I press this button, ideally, what it should do is this file right here is going to be converted by Whisper into text. And then this text box right here will be the file name of the text file once we press the button. So that's the idea, right?

That's the destination button. But it's not a button anymore. It's going to be a text input. Okay, next. So I changed the button to a text input. I add it to the selection response by setting the text input's text right here. I also set multiline to False, as a file name cannot have multiple lines, right? And one problem I had when dealing with this widget is that valign center does not work in text input.

But this guy right here, he said, okay, if you want to center your text input, you can simply add padding to the top and to the bottom. And this is the code you can see right here. Right. Kivy text input horizontal and vertical align. So for me, I just want the vertical align. And then he said right here: halign center plus padding on the y axis only does the job well. That's what I'm using to center my text. Otherwise, the text will be down here. And it would be kind of ugly.
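The padding trick comes down to simple arithmetic: the top padding is half of whatever height is left over after the text line. A minimal sketch, with a hypothetical helper `centered_padding_y` and a hypothetical kv rule name `CenteredTextInput` (neither is taken from the repo, and the kv string is only an illustration of where the same expression could live):

```python
def centered_padding_y(widget_height: float, line_height: float, lines: int = 1) -> float:
    """Top padding that vertically centers `lines` lines of text in a widget."""
    leftover = widget_height - line_height * lines
    # Never return a negative padding when the text is taller than the widget.
    return max(0.0, leftover / 2.0)


# The same arithmetic written as a kv rule (a sketch, not the repo's code).
# TextInput.padding accepts a two-value [horizontal, vertical] form.
KV_RULE = """
<CenteredTextInput@TextInput>:
    multiline: False
    halign: "center"
    padding: [6, (self.height - self.line_height) / 2]
"""

print(centered_padding_y(40, 20))
```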

Okay. Now the app looks like this. midpoint4a.py, which is what I'm showing. So there's still some problems. You still need to respond to the selection and fill in the text input when a file is selected. So if you see right here, it's not changing the text input to respond to my selection. Let's go back here. Drive D. Let's go back here. Okay. So what you need to do is, I gave the text input an ID through KV: the text input ID.

And in the file chooser's on_selection, I can change the text input's text like so. Which is this long thing that I will explain. And then another example from Stack Overflow is pathlib. And the important thing about this pathlib stem, pathlib.Path.stem, is that you get the file name, you don't get the extension. More precisely, you don't get the last extension. As you can see right here, the file is .tar.gz.

It just removes the .gz. So you have a file.tar, which is okay. For me, I assume you're just using like a .mp4. You remove the .mp4 with the stem. Or I assume it's a .wav. You just remove the .wav with .stem. So all this does is it takes off the last file extension. But there is a catch, which is it's only the last extension. It doesn't remove all the extensions. Which is a very strange edge case that you might experience.
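A short illustration of that behavior with the standard library; the file names here are made up for the example:

```python
from pathlib import Path

single = Path("lesson.mp4").stem        # only one extension to drop
double = Path("archive.tar.gz").stem    # only the last extension is dropped
suffixes = Path("archive.tar.gz").suffixes

print(single)    # lesson
print(double)    # archive.tar
print(suffixes)  # ['.tar', '.gz']
```

`Path.suffixes` is handy if you ever do need to detect the multi-extension case.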

I'm just letting you know. That's what stem does. Okay, so let's check midpoint. Right here, if we select, it will say the file name is vbox post install. Instead of .log, we just say vbox post install whisper transcript. And then .txt is assumed. The root.ids text input's text equals pathlib.Path of self.selection[0]. So this selection right here is the first selection that you've made in the file chooser.

And then .stem to remove the end extension. Plus whisper transcript, right? So you just say whisper transcript. Now there's another problem. If self.selection is an empty list, this will bug out because you'll have an IndexError. You're getting the first element of an empty list. So you have to say if the length of self.selection is greater than 0, right? Or else empty string.

Now why else empty string? Because if you have no choice, you should just remove everything. Because you have no selection, right? Right here is empty string. But if I select something, now I fill it in. But if I change folders, my selection is suddenly empty, right? And then it becomes empty string. Go up. I select something. You have something to say. So plus whisper transcript. OK, I lost the selection.

Now you go back to empty. And right now, if you check the selection of this file chooser, it will be nothing. Because I changed folders. And that's the behavior of file chooser. Now here again from Stack Overflow. As an example right here, it says Path(text).stem. This turns a path to a file into just the file name itself. An easy cross platform way to remove extensions. But it fails if there are multiple periods in the file name. Which I will not worry about. That's a strange edge case.

But it does exist. Similar to part 3, this will set the text input widget's text through the text input ID. It first checks if the length of self.selection is greater than 0, meaning the list is not empty. If that's true, we will get the base file name without the extension using Path.stem. And then append whisper transcript. And then if the selection list is empty, we will set the text input's text to be empty as well. So again, a reminder of the whisper code.
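The selection check described above can be sketched as a plain function. The name `transcript_name_for` and the exact suffix string are assumptions for illustration; in the app the same expression sits in the file chooser's `on_selection` handler and is assigned to the text input's `text`.

```python
from pathlib import Path

# Hypothetical suffix; the app appends some form of "whisper transcript".
TRANSCRIPT_SUFFIX = " whisper transcript"


def transcript_name_for(selection: list[str]) -> str:
    """Suggested transcript name for the first selected file, or '' if none."""
    if len(selection) > 0:
        # stem drops the folder and the last extension.
        return Path(selection[0]).stem + TRANSCRIPT_SUFFIX
    # Changing folders empties the selection, so clear the text input too.
    return ""


print(transcript_name_for(["/media/VBoxPostInstall.log"]))
print(repr(transcript_name_for([])))
```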

The important bits are we have a file name. This is the file that we want to transcribe. And then we need to write to a file. Right? So we have a file name. A potential file name destination. All that's left is to write to a file. Right? And then this is some easy Python writing code. You say f equals open of the file name .txt. You use w. And then f.write of the string right here. And then close the file. If you don't close it, it will stay open. You don't want that, because when you want to open it and write to it again, it wasn't closed the first time. And then your seek position is messed up. Things don't work well.

Every time you open a file, you should remember to close it. TLDR: I chose w. Because if you use x as an option right here, it will complain if the file already exists, which it will as soon as you transcribe the same file a second time. That's not good. I don't want to use x. a will create a file, but it will append to the end. This might create insanely long text files.

So one example I could think of is, you want to transcribe this file over and over and over again, just for some reason. Right? a will create a file, and it will just append the transcript over and over and over again. I think that's strange. Let's just say when you transcribe a file, you get a transcript of the correct length.

So W is good for my use case. Because it will create a file if it does not exist. And replace the old text with the new updated text. So on the third button. Right? Now it's time to call whisper. And get the text and plug the info in. Here I've saved the progress as midpoint 4b.py.
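A minimal sketch comparing the three modes in a temporary folder (the file name is made up). Note that `"x"` raises `FileExistsError` when the target is already there, `"a"` keeps growing the file, and `"w"` replaces the contents each time:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as folder:
    target = Path(folder) / "transcript.txt"

    # "w" creates the file, then replaces its contents on the second run.
    with open(target, "w") as f:
        f.write("first run")
    with open(target, "w") as f:
        f.write("second run")
    after_w = target.read_text()

    # "a" keeps the old text and adds to the end.
    with open(target, "a") as f:
        f.write(" + appended")
    after_a = target.read_text()

    # "x" refuses to open a file that already exists.
    try:
        with open(target, "x") as f:
            f.write("never written")
        x_failed = False
    except FileExistsError:
        x_failed = True

print(after_w)   # second run
print(after_a)   # second run + appended
print(x_failed)  # True
```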

Now we will call this function and then write to a file. Let's check 4b.py. So again right here. You've selected something. Now when I click this. It's supposed to transcribe. So before running whisper code. I want to open a popup. In case whisper fails. Right? And the popup looks like this. It's saved as midpoint 4c.py. So again.

Let's just create an easy error, where we have no selection. Right? If you have no selection, you can't transcribe nothing. Right? You don't want to waste CPU time. "Something went wrong. Please check the file type and destination. Press any key to close." Like this. Okay. So let's continue again. And then how did I make the popup? Right? First I changed the button text. I need something to hold the custom popup code. Right? I turned the root box layout widget into a custom box layout widget that inherits from box layout. Its only real job is to hold the popup code and the custom code that calls whisper. I saved this as midpoint4c.py.

So let's compare between 4b and 4c. Very quickly. As you can see right here. I used to have just a box layout. Right? But because I need something to hold the popup code. I turned it into custom box layout. And then you can see right here. This becomes just a class definition. It's not actually custom box layout. But when I go down here. Now I'm actually using the custom box layout. So it's a box layout instance in the KV.

Right here it's just like describing an abstract class. If you scroll down here, I'm actually using it. Then what else? All right, here also. I did not define the box layout, because it's predefined. But because it's a custom box layout, I should say custom box layout, box layout right here. If you are a Kivy veteran. You know I already use the app. But I just want to be clear.

Then here we say just the one simple thing. Define a function called popup open. And the popup is like this. The title is something went wrong. And then the content. It just has one widget. Which is a label. Please check the file type and destination. New line. Press any key to close. Size hint none none. And size 400 400. And then we open the popup.
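A sketch of the popup described above, written as a standalone function rather than a method on the custom box layout so it fits in one block. The constant names are hypothetical. Kivy is imported inside the function because building widgets requires Kivy to be installed and an app window to be running.

```python
ERROR_TITLE = "Something went wrong"
# The manual "\n" keeps the long message inside the 400 px wide popup.
ERROR_MESSAGE = "Please check the file type and destination.\nPress any key to close."


def popup_open():
    """Build and open the error popup. Call this from a running Kivy app."""
    from kivy.uix.label import Label
    from kivy.uix.popup import Popup

    popup = Popup(
        title=ERROR_TITLE,
        content=Label(text=ERROR_MESSAGE),
        size_hint=(None, None),  # turn off relative sizing so `size` applies
        size=(400, 400),
    )
    popup.open()
    return popup
```

With the default `auto_dismiss` setting, clicking outside the popup closes it, so no extra close button is needed for this simple case.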

And the good thing about popup is, it's very easy to close. But if you would like to have more ways to close it, you should just call popup.dismiss, if you really want to close the popup in another way. So we've compared. So how did I make the popup? Another problem with the popup is that the text was too long. You see right here, it's too long. So I ended up manually adding a backslash-n newline to make the line fit.

Here's another Stack Overflow answer, about a Kivy label in a popup with long text. They're saying: limit the text area to a specific size, that's the size of the widget itself. You can do this. But I didn't want to do it. I just want the simple answer: new line. But if you have a problem similar to this, you can check this answer out. And next: now we have a popup when whisper errors out.

Time to actually run whisper. So reminder again. Of the whisper code. You need the file name and we need to write to a file. Now we've got the file name from our file chooser. And we know how to write a file because we have the basic. File writing code from earlier. So now we have the custom box layout class. To hold our custom code.

Let's add a new function. Run whisper. What are the args? Let's keep it simple and only use the medium model. Requires five gigabytes of VRAM. If your machine does not have that much VRAM. You can change the whisper code to use small. Base or tiny. Now what are the args for run whisper? Run whisper also needs the file to read. And the file to output. Which I've said three times already.

I personally like using *args and positional arguments. In my case the positional arguments will be: number one, the file to read. Number two, the file to output. However, because we are running whisper in the root widget, we can just query the children of the box layout instead: the selection label and the output text input. That makes the function simpler and requires zero arguments. Just because this is a very simple app.

Run whisper looks like this. This is just a note for me. Let's go. The first part is to run root.transcribe. And the transcribe function checks if there's a selected file. Because one of the fail states is you have not selected a file. You shouldn't transcribe nothing. So if there is a file that you've chosen. You go to 2A. Which is actually to run whisper. If there is no file selected. It will run 2B. Which opens the error popup. So the popup code is not crazy. Right here. Just create a popup and open it. By default. Any interaction of popup will dismiss it. So we don't need to worry about removing it. As for the whisper code.
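The control flow just described (check the selection, then go to 2A or 2B) can be sketched without any Kivy code. Here `run_whisper` and `popup_open` are passed in as plain callables purely so the example runs on its own; in the app they are methods on the root widget.

```python
def transcribe(selection, run_whisper, popup_open):
    """Step 1: decide between running Whisper and showing the error popup."""
    if len(selection) > 0:
        run_whisper(selection[0])   # step 2A: a file is selected
    else:
        popup_open()                # step 2B: nothing to transcribe


# Stand-ins that only record which branch was taken.
calls = []
transcribe(["/media/talk.mp3"], lambda path: calls.append(("whisper", path)), lambda: calls.append(("popup",)))
transcribe([], lambda path: calls.append(("whisper", path)), lambda: calls.append(("popup",)))

print(calls)  # [('whisper', '/media/talk.mp3'), ('popup',)]
```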

You can see that I get the file name right here, from the selected text ID widget. And I remove this unnecessary string right here in the beginning: selected file. Next I get the transcript name from here, which is the text input ID widget. And then I just add the file extension .txt. The next step is to set up the transcript location. This requires me to get the folder that holds the file in the file name variable, and then add the transcript name as the file name. Right.

So. What does this mean? TLDR. This is our file name, right. This whole file name. This is the transcript name. Right. Which is this and this. So it's a very big name. Transcript.txt. The question is, where do you write the file? Right. Because by default, Python will write a relative file name to the current working directory, and the current working directory is not always the location of the file that you have selected. In my mind, the easiest way is just to put the transcript text not in the current working directory, which may be different from your selected file, but in the directory where your file is.

So let's say you have an audio file in the 2024 folder. Right. If you just put the transcript in the current working directory, it's not the 2024 folder. So I want to put the transcript text in the 2024 folder along with the audio file. So this is how the transcript location line works. So first, we've got pathlib.Path of file name. Right. You have the full file name of your audio file.

Now what if you say parent. Right. You want parent. So parent gets the directory that the audio file lives in. Right. Now we have the first half. And the second half is we need to create the text file name. So we go here. Transcript name is this text right here, plus the .txt extension. So what is this? self.ids, the text input, .text. Right. But this text input doesn't always have the .txt extension. Right. So we add it to the text input's text.

And then, you have a beginning, which is going to be the folder that holds your media file. And then in the end, you have the transcript text file, and then you just do pathlib.Path of file name, .parent, slash, transcript name. And that combines the two together. And then this becomes the full file location of my whisper transcript text. For any media file that exists and is transcribable, it will put the text file in the same folder, so you don't have to go look around for where the current working directory of Python is when you run it.
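The transcript location line can be sketched like this. The function name `transcript_location` and the example paths are made up for illustration; the `parent` and `/` operations are standard `pathlib`.

```python
from pathlib import Path


def transcript_location(file_name: str, text_input_text: str) -> Path:
    """Place the transcript next to the media file, not in the working directory."""
    transcript_name = text_input_text + ".txt"   # the text input has no extension
    # parent is the folder holding the media file; "/" joins folder and name.
    return Path(file_name).parent / transcript_name


location = transcript_location("/media/2024/talk.mp3", "talk whisper transcript")
print(location)
```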

If you have a media file that exists, you're going to put the transcript in the same folder. That's all this is. Okay. And then this was the whisper code discussed in part one. The only difference is setting the model to medium instead of base. There's also code to write the text transcript to a file, as specified by the transcript location on the last slide.

So again, I chose w because I wanted to create a new file if it doesn't exist, and I wanted to replace the old text if it existed. Then with open as file, file.write of the result text. And the with keyword, very specifically, will automatically close the file. That's all you need to know. When you open a file with with like this, the file is closed for you. You don't need to call file.close().
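Putting the Whisper call and the file write together, a sketch might look like the following. The `transcribe` parameter is an assumption added here so the example can run with a stand-in instead of downloading the medium model; when it is left out, the function uses the openai-whisper package (`whisper.load_model`, `model.transcribe(..., fp16=False)`, and the `"text"` key of the result).

```python
import tempfile
from pathlib import Path


def run_whisper(file_name, transcript_location, transcribe=None):
    """Transcribe file_name and write the text to transcript_location."""
    if transcribe is None:
        import whisper  # third-party: openai-whisper

        model = whisper.load_model("medium")

        def transcribe(path):
            return model.transcribe(path, fp16=False)

    result = transcribe(str(file_name))
    # "w" creates or replaces the file; `with` closes it automatically.
    with open(transcript_location, "w", encoding="utf-8") as file:
        file.write(result["text"])
    return result["text"]


def fake_transcribe(path):
    """Stand-in for Whisper so the sketch runs without the model."""
    return {"text": f"pretend transcript of {Path(path).name}"}


with tempfile.TemporaryDirectory() as folder:
    destination = Path(folder) / "talk whisper transcript.txt"
    run_whisper("talk.mp3", destination, transcribe=fake_transcribe)
    written = destination.read_text(encoding="utf-8")

print(written)  # pretend transcript of talk.mp3
```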

And then now, there's another thing, which is: what does the try/except block do? If for some reason Whisper fails, it will raise an error. In case the error occurs, I want to print to standard out what the error is, right here. And then.

I want to also get the traceback. So here it is: import traceback, then print traceback.format_exc(). Because the problem is, when you just print the error, it will only print the last line. If you want the entire error, you've got to import traceback and then print the entire error.

After I print out the error. I want the pop up to open to provide feedback to the user. So I just open the pop up.
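A sketch of that error handling, with the Whisper call and the popup replaced by stand-in callables so it runs on its own (the names `run_with_feedback` and `failing_job` are hypothetical):

```python
import traceback


def run_with_feedback(job, popup_open):
    """Run job; on failure print the full traceback and open the popup."""
    try:
        job()
    except Exception as error:
        print(error)                        # only the final message
        full_error = traceback.format_exc()
        print(full_error)                   # every frame leading to the error
        popup_open()                        # give the user visible feedback
        return full_error
    return None


def failing_job():
    """Stand-in for a Whisper call that fails."""
    raise ValueError("unsupported file type")


popups_opened = []
report = run_with_feedback(failing_job, lambda: popups_opened.append("popup"))
```

`traceback.format_exc()` only returns something useful inside an `except` block, which is why it is called there rather than afterwards.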

So as a recap from the beginning. We have made a virtual env with Poetry. We've installed whisper with Poetry. We have applied whisper to a test video. We've also finished making the Kivy GUI. You selected a file. We've used the file chooser. We've output the text file in the same directory as the selected file. We've done it. We've made a text file markdown compliant. It's not necessary. And again, we've also connected Whisper AI and Kivy. Right. We know how to change the file chooser's drive: we change the file chooser's path. We make a drop down to change the file chooser drive. And in addition to the old to do.

We've made the Kivy GUI. We've used the file chooser. And we also output the text file. So here's a feature list.

If I were to continue this project, probably in one month, there's one more problem. Which is, running Whisper AI means running blocking code. It's a very easy fix with multiprocessing. All you have to do is spawn a subprocess and put the whisper code there, freeing Kivy to run without locking up. So Kivy's event loop is like a for loop. But if you block the for loop, the Kivy GUI itself will freeze. So in order to have Kivy not freeze, you just use multiprocessing.

Another thing I'd like, now that I'm using this myself, is a search feature, because sometimes I don't want to look through the file directory. Sometimes I want to search for the audio file. So the hint is just to use rglob, the rglob pattern for directory search. Basically this is how you use Python to find files in your file system recursively. It says: Python rglob pattern for directory search. So if you want to do something similar, you can look at it.
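A minimal sketch of a recursive search with `Path.rglob`. The helper name `find_media` and the `*.mp3` pattern are only examples; the temporary folder is there so the block runs anywhere.

```python
import tempfile
from pathlib import Path


def find_media(root, pattern="*.mp3"):
    """Recursively list files under root whose names match pattern."""
    return sorted(Path(root).rglob(pattern))


with tempfile.TemporaryDirectory() as folder:
    nested = Path(folder) / "2024" / "talks"
    nested.mkdir(parents=True)
    (nested / "intro.mp3").touch()
    (Path(folder) / "notes.txt").touch()
    found = [path.name for path in find_media(folder)]

print(found)  # ['intro.mp3']
```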

And then thank you for watching. The next part is going to be, now that you have this working Kivy GUI for Whisper, we're going to create an executable with PyInstaller so you can send it to your friends. Thank you for watching and have a great day.

Article Error Reporting

Message @BadMetrics on the Kivy Discord.