Chapter 13: Integrating OpenAI API - Enhancing Your Application with AI | Lovable.dev Course

Description

Let's implement the AI magic! In this tutorial, I demonstrate how to integrate OpenAI's Whisper for transcription and GPT-4 for metadata generation in our ClipCaption application using Lovable.dev.
🧠 What we'll implement:

Setting up secure OpenAI API connections
Using Whisper API for audio transcription
Implementing GPT-4 for metadata generation
Creating SRT subtitle files
Generating platform-specific content descriptions

⚡ This AI integration is what makes our application truly valuable for content creators, automating hours of manual work.
#OpenAI #Whisper #GPT4 #AIIntegration #SaaS #Lovable #AIApplications

Summary

Integrating OpenAI API to Enhance Applications with AI: A Step-by-Step Guide

In this comprehensive tutorial, you'll learn how to integrate OpenAI's powerful AI capabilities into your applications, specifically focusing on the ClipCaption tool built with Lovable.dev. The instructor walks through the complete process of setting up secure connections with OpenAI's API and implementing two key AI features: Whisper for audio transcription and GPT-4 for intelligent metadata generation.

The video begins with obtaining and configuring an OpenAI API key, explaining the importance of keeping these credentials secure. You'll then see how to set up edge functions that act as middlemen between your application and OpenAI's services. The tutorial addresses common implementation challenges, including the Whisper API's 25 MB file size limit, and demonstrates how to convert video files to compatible audio formats using FFmpeg.

A significant portion of the video focuses on creating an automated workflow that: uploads videos to temporary storage, generates thumbnails, converts videos to optimized audio files, sends these files to Whisper for transcription, and then leverages GPT-4 to generate platform-specific content descriptions and metadata. The instructor also covers language detection capabilities to ensure the AI generates appropriate content based on the video's spoken language.

Throughout the tutorial, you'll learn practical prompt engineering techniques to get the best results from AI models and troubleshooting strategies for when implementations don't work as expected. The video provides valuable insights into building robust AI integrations that can save content creators hours of manual work by automatically generating transcriptions, SRT subtitle files with proper timing, and customized descriptions for different social media platforms.

This tutorial is ideal for developers looking to enhance their applications with AI capabilities and demonstrates how combining video processing with OpenAI's language models can create powerful tools for content creators and marketers.

Transcript

0:02 To get your OpenAI API key, let's first understand again what an API is. An API is just a way for you to talk with other services in the cloud. In our case, we want to send an audio file to OpenAI — the company that owns, for example, ChatGPT. They have a model called Whisper which can take our audio file and convert it to text. So we want a way to talk with them, and to do that you need an API key.

0:32 What is an API key? Let's search for "OpenAI API". A lot of companies have their own API, not only OpenAI — you have it with xAI by Elon Musk, you have it with Facebook with Llama, and so on. So let's search for "OpenAI API", open the first website, and log in with your user — I already have one. Now go to your project (you can create a new one if you need), manage projects, go to API keys, and create a new key. Understand that an API key is something secret, so I can't show you this process, but basically I'll create a new secret key, give it the name "clipcaption", choose my project, and create it. I'll get a new key, so I'm going to pause the recording.

1:36 Okay guys, I created my new secret key and already copied it, so let's get back to Lovable. Lovable is telling you: just add the API key and it will connect it to this edge function. I remind you, an edge function is like a middleman — it talks with your platform and with the OpenAI API and instructs it what to do. So let's add the API key: I'll paste it, submit, and see what happens. Basically it will look like
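A minimal sketch of what that "middleman" does, assuming a generic fetch-capable server runtime. The endpoint and `whisper-1` model name are OpenAI's real audio-transcription API; the helper names (`authHeader`, `transcribeAudio`) are hypothetical, not Lovable's generated code.

```typescript
// The key stays server-side: the browser talks to the edge function,
// and only the edge function talks to OpenAI.

export function authHeader(apiKey: string | undefined): { Authorization: string } {
  // Fail fast if the secret was never configured.
  if (!apiKey) throw new Error("OPENAI_API_KEY is not set");
  return { Authorization: `Bearer ${apiKey}` };
}

export async function transcribeAudio(
  audio: Blob,
  apiKey: string | undefined,
): Promise<string> {
  const form = new FormData();
  form.append("file", audio, "audio.mp3"); // the converted audio file
  form.append("model", "whisper-1");       // OpenAI's Whisper transcription model
  const res = await fetch("https://api.openai.com/v1/audio/transcriptions", {
    method: "POST",
    headers: authHeader(apiKey),
    body: form,
  });
  if (!res.ok) throw new Error(`Whisper request failed: ${res.status}`);
  return (await res.json()).text; // plain transcription text
}
```

Because the key is read from the server's environment, pasting it into Lovable's secret prompt (rather than into your frontend code) is what keeps it out of the browser bundle.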
nothing happened. We also had a small error, so we fixed it, but basically if I upload a video now, still nothing will really happen. So let's just upload a video. The video failed — let's check the logs: "bucket not found". The reason is that it probably tried to upload to a bucket that doesn't exist, or one that's no longer assigned; it's supposed to go in the temp files. Let's try to fix it. As I thought, it's asking to change some stuff and redirect to the temp files — let's apply the changes and see what happens when we upload a video.

3:01 "Video uploaded but processing failed to start. Our team has been notified." The reason is that we don't actually tell the API to do anything yet. So let's write instructions for our API. At this point I can go back to my documentation: "Integrate GPT-4 for platform-specific metadata". Let's copy both of them. We need to make sure that after we upload a video and it's converted to an audio file, we send it to the OpenAI API and do the following. Then we'll go down and make sure it works with our existing Supabase tables and columns. Great — let's go to chat mode.

4:11 This is basically the trickiest part of our platform. This is the moment we take the video file, convert it to audio, and send it to the AI to actually do its magic. You can notice I still haven't even mentioned that we need to get the result back and populate our fields in Supabase — that's the next thing I'll do after it gives me its answer. Great, it made a plan. Let's tell it: "Great — make a plan, and make sure that the results we will get from the AI will
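One plausible way to phrase that metadata request to GPT-4 — a hypothetical prompt builder, not the actual prompt Lovable generated. The platform list and field shape are assumptions meant to mirror the Supabase columns described in the video.

```typescript
// Hypothetical: build the chat messages that ask GPT-4 for platform-specific
// metadata, returned as JSON so it can be inserted into our Supabase columns.

interface MetadataRequest {
  transcript: string;
  language: string;    // e.g. "English" — detected from the audio
  platforms: string[]; // e.g. ["youtube", "instagram"]
}

export function buildMetadataMessages(req: MetadataRequest) {
  const system =
    "You generate social-media metadata for videos. " +
    `Respond in ${req.language} as strict JSON with one key per platform, ` +
    "each containing a title, a description, and hashtags.";
  const user =
    `Platforms: ${req.platforms.join(", ")}\n` +
    `Transcript:\n${req.transcript}`;
  return [
    { role: "system", content: system },
    { role: "user", content: user },
  ];
}
```

Asking for strict JSON keyed by platform is what makes the response easy to map onto table columns like a YouTube title or an Instagram caption.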
populate and be inserted into the relevant tables and columns we have in our Supabase."

5:10 As you can see, it's actually making a really big plan. I just realized we need to make sure that the data it generates will be based on the language spoken in the video, so let's make sure that happens: "Make sure that the translations and the generations will be based on automatic detection of the spoken language in the video. It should recognize it automatically based on the audio file."

6:00 The reason I did that is because, yes, it planned to use Whisper, but as you can see it wasn't taking advantage of Whisper's language detection capabilities. Now we're making sure we use that too — it's really important. Let's apply the changes it asked for. It keeps asking me to apply a lot of changes to our Supabase edge functions and tables — that's fine, it makes sense; it's a big change and it's part of a big plan. It says it finished, but I'm not at all sure it's finished, so I'll copy this plan, paste it, and at the beginning say: "Check if you did everything we need in this plan."

6:55 Yeah, as I suspected, a lot of stuff is missing, so: "Implement the plan." I hate when it does this, but sometimes the AI is lazy — maybe they want to save money — and it still claims everything is done, as if it's your fault. You sometimes need to double-check it. It says it did everything, but let's ask again: "Did you complete the whole plan fully?" and paste the plan again. Let's try to upload a video and see what happens — I suspect it still won't
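Whisper exposes the detected language when you request the `verbose_json` response format from OpenAI's transcription endpoint. A small sketch of reading it — the interface below shows only the fields this workflow uses, and the fallback choice is mine:

```typescript
// With response_format=verbose_json, the transcription response includes the
// detected language alongside the text and timed segments.
interface VerboseTranscription {
  language: string; // e.g. "english" — detected automatically from the audio
  text: string;
  segments: { start: number; end: number; text: string }[];
}

// Pick the language to generate metadata in, falling back to English
// if detection produced nothing usable.
export function detectedLanguage(t: Partial<VerboseTranscription>): string {
  return t.language && t.language.trim() !== "" ? t.language : "english";
}
```

Feeding this detected language into the metadata prompt is what keeps the generated titles and captions in the same language the speaker uses.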
work. Yeah, as I thought, it failed. Let's see the logs. Something about the functions, which means it's something internal in Supabase — let's just try to fix it.

7:48 Okay, I understand what happened: the Whisper API can only handle up to 25 megabytes. "Let's make sure we upload the actual video to our temp storage, then convert it internally to the smallest supported audio file format, then send it to Whisper. Make a plan for it." So let me review the problem and how we're trying to fix it: right now, even if the video uploads, the Whisper API can't handle it — it only accepts files up to 25 megabytes. I still need to solve the fact that our platform doesn't convert the video to an audio file. We tried to do it through the browser and it kind of didn't work, so now I'm letting it find a way, because I don't really know what's possible.

8:45 So I got into a chain of errors — four or five of them. In those cases I really suggest you either revert versions or do what I'm doing now: telling it something is messed up. "I think we should remove all changes in Supabase and in our code and start implementing the video-upload changes again. I remind you: we should have the video uploaded to a temp bucket, then convert it to an audio file, also in a temp bucket, using FFmpeg, then send this temp audio file through an edge function to Whisper. Then we should generate a transcription of the text of the video, and an SRT file with subtitle timing. Then we should send the result to GPT-4o for fine-tuning of the SRT file so it matches the video's subtitles, and to fill all the columns we have in the metadata generations — like YouTube title, descriptions, Instagram captions, and so on." Then I
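Whisper's timed segments map directly onto the SRT format the plan asks for. A sketch of that conversion — the helper names are mine, not Lovable's generated code:

```typescript
// Convert seconds to the SRT timestamp format HH:MM:SS,mmm.
export function toSrtTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const pad = (n: number, width = 2) => String(n).padStart(width, "0");
  const h = Math.floor(totalMs / 3_600_000);
  const m = Math.floor((totalMs % 3_600_000) / 60_000);
  const s = Math.floor((totalMs % 60_000) / 1000);
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(totalMs % 1000, 3)}`;
}

// Turn Whisper's timed segments into a numbered SRT document:
// index, "start --> end" timing line, then the subtitle text.
export function segmentsToSrt(
  segments: { start: number; end: number; text: string }[],
): string {
  return segments
    .map(
      (seg, i) =>
        `${i + 1}\n${toSrtTimestamp(seg.start)} --> ${toSrtTimestamp(seg.end)}\n${seg.text.trim()}\n`,
    )
    .join("\n");
}
```

Since Whisper already returns per-segment start and end times in `verbose_json` mode, a GPT-4o pass is only needed to tidy the wording of subtitle lines, not to invent the timing.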
will do it in chat mode and send it. What I did here is called prompt engineering: I described the problem, I offered solutions, and I gave examples, because this can get very messy for it — we're trying to do a lot of things at once, as you can see. I described that I wanted to reset everything and said: this is what I want, this is what we should do. Help me help you.

11:07 So now it's making a plan, and I'll go along with it. Right now when I upload a video it's actually uploading, and here in Supabase we can see the buckets it created — we have temporary buckets, which is really cool: it means it will delete them after one hour. And if you look in the folders, we actually have a video here. But right now the problem is that after I upload a video, nothing happens, and I don't really know why. That's a problem that can sometimes occur: it uploads the video and tells me "processing will soon begin", but I have no idea anymore what's going on in my system — I have tons of buckets. So what I'll do in this case is take a screen capture, paste it, and tell it: "Do we need all those buckets? I don't understand what's going on after we upload a video. Also, let's make our video uploading as simple as possible — no video chunks, just regular video uploading. Then tell me if we even have video-to-audio conversion happening."

12:52 I just want it to tell me what's going on. The beautiful thing about chat mode is that it can actually look at your project and tell you what's happening. As I thought, there's a lot of stuff that's not working
well in the system.

13:06 It says it did what we asked: "Would you like me to update the process-video edge function?" Before that, let's fix this error — and then yes, we want it to update that, because until we do, we can't really send anything to the AI. I remind you: yes, you can actually upload video directly to Whisper, but the file needs to be smaller than 25 megabytes — the reason I know that is because Lovable told me so earlier.

13:42 It says it's done — let's see if it actually is. I'll refresh. Perfect — now we probably have only one video here, which is great. Let's try to upload a video. Okay, great, I got the error related to the AI API, so that's progress. Now we can paste this command: "Update the process-video edge function to do the actual video-to-audio conversion" — because right now we have videos but they aren't actually converted to audio; I can see my audio bucket is empty. Again, it's doing the conversion with this package called FFmpeg. Honestly, I don't care about the details — I just want this thing to happen — so let's send it and really hope it works now.

14:38 I'll upload a video. It failed — let's see. I think it failed to make the thumbnail, probably. Let's see what's going on here. I'll actually delete it; I think it's confusing. Okay, it looks like we didn't even add the FFmpeg conversion, so: "Yes, let's do what you just said." I really hope it works. Oh my God, guys, look what it's saying to me here: "Ready to begin with installing dependencies." That means it never installed the FFmpeg packages — which means it could never have worked. Okay,
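The 25 MB ceiling and the FFmpeg conversion can be sketched like this. The limit is Whisper's documented upload maximum; the codec and bitrate choices below are illustrative assumptions, not what Lovable actually picked:

```typescript
// Whisper's documented upload limit: 25 MB.
export const WHISPER_MAX_BYTES = 25 * 1024 * 1024;

export function fitsWhisperLimit(sizeBytes: number): boolean {
  return sizeBytes <= WHISPER_MAX_BYTES;
}

// Illustrative FFmpeg arguments: drop the video stream (-vn), downmix to
// mono (-ac 1), and use a low MP3 bitrate so the result stays small.
export function ffmpegAudioArgs(input: string, output: string): string[] {
  return ["-i", input, "-vn", "-ac", "1", "-b:a", "64k", output];
}
```

Checking the converted file's size against `fitsWhisperLimit` before calling the API turns the opaque "processing failed" into an actionable error for very long videos.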
guys, so we installed everything it asked for. Let's upload a video and see what happens — I actually uploaded a long video with speaking. We have a new progress bar here saying "initializing FFmpeg". I wouldn't want users to see that, but it's nice to see it right now. I can already tell it's stuck, though, which is a bad sign. Let's see what actually happened. Yeah, I can see the video isn't actually able to upload. Let's refresh to make sure. I'll take a picture, paste it here, and say: "Currently, when I'm trying to upload a video, it gets stuck in this state. What we want to happen is: 1. upload the video directly to Supabase; 2. generate the video thumbnail; 3. convert the video to a low-size audio format; 4. send the audio file to the Whisper API; 5. send the transcription to ChatGPT for processing according to our metadata; 6. present the results to the user in the UX/UI in an organized way, with the video thumbnail." Let's send it to chat mode and see what happens. It made a plan for me; I'll say yes.

18:02 Okay guys, let me show you where we are now. It made a lot of changes, and I also have a lot of errors. My point — because you guys will be building a different app — my point is: don't give up; you will find a way. Just work around things, try to revert versions, and don't add more features while something is broken — really insist on fixing everything. Let me show you where we are now and how I know what I need to do next. I need to open the preview, because if I try to upload a video here, you
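The six steps dictated above can be sketched as one sequential pipeline. Everything here is a placeholder — in the real app each stage would be an edge function or a Supabase call — but injecting the stages keeps the flow itself trivially testable:

```typescript
// Each stage takes the accumulated context (file paths, transcription, metadata…)
// and returns an updated copy. Stage order follows the numbered plan:
// upload → thumbnail → convert → Whisper → GPT → present.
export type Stage = (
  ctx: Record<string, unknown>,
) => Promise<Record<string, unknown>>;

export async function runPipeline(
  initial: Record<string, unknown>,
  stages: Stage[],
): Promise<Record<string, unknown>> {
  let ctx = initial;
  for (const stage of stages) {
    ctx = await stage(ctx); // run stages strictly in order
  }
  return ctx;
}
```

Running the stages strictly in order matters here: the audio conversion must finish before Whisper is called, and the transcription must exist before GPT can generate metadata.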
will see it gets stuck. I discovered I just need to go to the preview, and now if I upload a video you'll see it processing the video, uploading the video, and making the thumbnail — and you can see it kept the aspect ratio. Here we're still not using AI; we're just simulating the generation of all the metadata. On top of that, if I go to the history page I'll see the video, and if I open it, it should show the metadata — but right now we don't have anything. And you can see we have various other videos.

19:16 Let's look at what's going on in Supabase. As you can see, inside Lovable it gets stuck, because here we're in our programming environment and it just can't do it. The reason is that the thumbnail conversion is done through the browser itself, and here the browser isn't really in play because we're using some kind of embedded browser inside Lovable. That's one thing. Now I want to see what's happening in Supabase: every time I upload a video now, we create a thumbnail image, as you can see, and we create a video upload. For every thumbnail image or video file in those buckets, if you right-click you get a URL — they each have a URL address. For example, if I open a new tab and paste it, you'll see I have this image. And if I go to my table editor, you can see it's actually creating a row for each video you upload, and storing the thumbnail URL of that video. That's how the history page can show the appropriate thumbnail for each video. So that was a
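That right-click URL follows Supabase Storage's public-object URL pattern, so the value stored in the table's thumbnail column can be built predictably. A tiny helper — the project ref and bucket name below are made-up placeholders:

```typescript
// Supabase serves objects in public buckets at a predictable path:
// https://<project-ref>.supabase.co/storage/v1/object/public/<bucket>/<path>
export function publicStorageUrl(
  projectRef: string,
  bucket: string,
  path: string,
): string {
  return `https://${projectRef}.supabase.co/storage/v1/object/public/${bucket}/${path}`;
}
```

In practice the supabase-js client's `storage.from(bucket).getPublicUrl(path)` returns the same URL; the helper just makes the shape of what lands in the `thumbnail_url` column explicit.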
summary of what's going 20:43 on now now let's proceed um working on 20:46 our software