Chapter 13: Integrating OpenAI API - Enhancing Your Application with AI | Lovable.dev Course
Description
Let's implement the AI magic! In this tutorial, I demonstrate how to integrate OpenAI's Whisper for transcription and GPT-4 for metadata generation in our ClipCaption application using Lovable.dev.
🧠 What we'll implement:
Setting up secure OpenAI API connections
Using Whisper API for audio transcription
Implementing GPT-4 for metadata generation
Creating SRT subtitle files
Generating platform-specific content descriptions
⚡ This AI integration is what makes our application truly valuable for content creators, automating hours of manual work.
#OpenAI #Whisper #GPT4 #AIIntegration #SaaS #Lovable #AIApplications
Summary
Integrating OpenAI API to Enhance Applications with AI: A Step-by-Step Guide
In this comprehensive tutorial, you'll learn how to integrate OpenAI's powerful AI capabilities into your applications, specifically focusing on the ClipCaption tool built with Lovable.dev. The instructor walks through the complete process of setting up secure connections with OpenAI's API and implementing two key AI features: Whisper for audio transcription and GPT-4 for intelligent metadata generation.
The video begins with obtaining and configuring an OpenAI API key, explaining the importance of keeping these credentials secure. You'll then see how to set up edge functions that act as middlemen between your application and OpenAI's services. The tutorial addresses common implementation challenges, including the file size limit of the Whisper API (25 MB maximum), and demonstrates how to convert video files to compatible audio formats using FFmpeg.
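The 25 MB cap is easy to guard against before calling the API. A minimal sketch, assuming the helper name is illustrative (the 25 MB figure is OpenAI's documented upload limit for the transcription endpoint):

```typescript
// Whisper's documented upload limit is 25 MB per file.
const WHISPER_MAX_BYTES = 25 * 1024 * 1024;

// Returns true when the file is small enough to send to the transcription API.
function fitsWhisperLimit(fileSizeBytes: number): boolean {
  return fileSizeBytes <= WHISPER_MAX_BYTES;
}
```

If the audio is still too large after conversion, it has to be compressed further or split into chunks before transcription.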
A significant portion of the video focuses on creating an automated workflow that: uploads videos to temporary storage, generates thumbnails, converts videos to optimized audio files, sends these files to Whisper for transcription, and then leverages GPT-4 to generate platform-specific content descriptions and metadata. The instructor also covers language detection capabilities to ensure the AI generates appropriate content based on the video's spoken language.
Throughout the tutorial, you'll learn practical prompt engineering techniques to get the best results from AI models and troubleshooting strategies for when implementations don't work as expected. The video provides valuable insights into building robust AI integrations that can save content creators hours of manual work by automatically generating transcriptions, SRT subtitle files with proper timing, and customized descriptions for different social media platforms.
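SRT timing has a rigid format (`HH:MM:SS,mmm`), so the subtitle-file step reduces to a small formatter. A sketch, assuming Whisper-style segments with `start`/`end` in seconds (the function names are illustrative):

```typescript
interface Segment { start: number; end: number; text: string; }

// Format seconds as the SRT timestamp "HH:MM:SS,mmm".
function srtTime(seconds: number): string {
  const ms = Math.round(seconds * 1000);
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  const rem = ms % 1000;
  const pad = (n: number, w = 2) => String(n).padStart(w, "0");
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(rem, 3)}`;
}

// Build an SRT file from transcription segments: index, time range, text, blank line.
function toSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) => `${i + 1}\n${srtTime(seg.start)} --> ${srtTime(seg.end)}\n${seg.text.trim()}\n`)
    .join("\n");
}
```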
This tutorial is ideal for developers looking to enhance their applications with AI capabilities and demonstrates how combining video processing with OpenAI's language models can create powerful tools for content creators and marketers.
Transcript
0:00
[Music]
0:02
to get your OpenAI API, let's first
0:05
understand again what an API is. An API is just
0:08
a way for you to talk with other
0:10
services in the cloud. So in our case, we
0:13
want to send an audio file to OpenAI, which is
0:17
the company that owns, for example, ChatGPT.
0:20
They have a model called Whisper, which
0:22
can take our audio file and convert it
0:25
to text. Okay, so we want a way to talk
0:28
with them. To do that, you need an API key.
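As the video stresses, the key is secret: it should never be hardcoded or shipped to the browser. In an edge function it would typically be read from an environment variable. A minimal sketch (`buildAuthHeader` is a hypothetical helper; `OPENAI_API_KEY` is the conventional variable name, an assumption, not something shown in the video):

```typescript
// Hypothetical helper: build the Authorization header for OpenAI requests.
// The key comes from the environment, never from source code.
function buildAuthHeader(apiKey: string): { Authorization: string } {
  if (!apiKey) {
    throw new Error("Missing OpenAI API key: set OPENAI_API_KEY in your environment");
  }
  return { Authorization: `Bearer ${apiKey}` };
}

// In a Deno edge function the key might come from:
//   const key = Deno.env.get("OPENAI_API_KEY") ?? "";
// In Node:
//   const key = process.env.OPENAI_API_KEY ?? "";
```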
0:32
What is an API key? So let's write here "OpenAI
0:35
API". What I'm doing now: a lot
0:38
of companies have their own API, okay,
0:41
it's not only OpenAI. You have it with xAI
0:44
by Elon Musk, you have it with Facebook
0:48
with Llama. Okay, we have a lot of those,
0:51
so let's just write "OpenAI API" and get
0:55
into the first website. Log in with your
0:58
user. Okay, I already have one,
1:00
and now you need to go to your project.
1:02
You can make a new one if you need. Manage
1:06
projects, go to API keys, and create a new
1:10
key. Now you need to understand: an API key
1:12
is something secret, so I can't show you
1:14
this process, but basically, right now
1:18
I will create a new secret key and I
1:19
will give it the name
1:23
"clip
1:26
caption". And I guess I can choose
1:30
my project, and I will create the secret
1:31
key. I will get a new key. Okay, so I'm
1:33
going to pause the recording. Okay guys,
1:36
so I created my new secret key. I already
1:38
copied it, so let's get back to Lovable.
1:40
And Lovable is just telling you: hey, just
1:43
add the API key and it's going to
1:45
connect it to this edge function. I
1:47
remind you, an edge function is actually
1:49
like a middleman, okay? It's
1:51
talking with your platform and with
1:54
the OpenAI API, telling it
1:57
what it should do, okay, it
2:00
instructs it. So let's add the API key; I'm
2:03
just going to paste it,
2:06
submit, and let's see what
2:09
will happen. Okay, so let's see what
2:12
happened. Basically, it will look like
2:13
nothing happened. Okay, we also had a
2:16
small error, so we fixed it. But basically,
2:19
now if I upload a video, still
2:20
nothing will really happen. Okay, so let's
2:23
just upload a
2:25
video. The video failed; let's see the
2:28
logs:
2:30
"bucket not found". Now, the reason for that
2:32
is because it probably tried to upload
2:34
it to a bucket that doesn't exist, or a
2:37
bucket that is no longer assigned.
2:39
Okay, it's supposed to be in the temp
2:41
files, so let's try to fix it. As I
2:45
thought, it's asking to change some
2:47
stuff and direct it to temp files. Let's
2:50
apply the changes. Okay, let's see now
2:53
what's going on when we upload a
2:54
video.
3:01
"Video uploaded, but processing failed to
3:03
start. Our team has been notified." Okay,
3:06
the reason for that is because we
3:08
don't really tell the API to do anything
3:11
yet. So let's now write instructions
3:14
for our API, and I think at this point I
3:17
can go back to my
3:20
documentation. Okay: "Integrate GPT-4 for
3:22
platform-specific metadata". Okay, let's
3:24
copy both of them.
3:35
"We need to make
3:37
sure that after we upload a video and
3:41
it's
3:43
converted to an audio file, we send it to
3:48
the OpenAI
3:51
API, and we do the
3:54
following." Then we'll go down: "Make sure
3:58
it works
4:00
with our existing
4:05
Supabase tables and columns." Okay, great,
4:10
let's go to chat
4:11
mode. That's basically the trickiest
4:14
part of our platform, okay? This is the
4:16
moment we're going to take the video
4:19
file, convert it to audio, and send it to
4:22
the AI to actually do its
4:24
magic. You can notice I still didn't
4:27
even mention that we need to
4:29
get the result and populate our fields
4:31
in Supabase, so that's the
4:35
next thing I'm going to do after it
4:36
gives me its answer,
4:39
okay. Okay, great, it made a plan. Let's
4:42
tell it:
4:43
"Great, make a plan, and make sure
4:49
that the results we get from the AI
4:55
will
4:56
be populated and inserted
5:00
into the
5:01
relevant tables and columns we have in
5:06
our
5:08
Supabase."
5:10
Okay. Okay, as you can see, it's actually
5:13
making a really big plan. I just
5:16
realized we need to make sure that the
5:19
data it's going to generate will be based
5:21
on the language that is spoken in the
5:23
video. So let's make sure that
5:26
happens: "Make sure that the
5:30
translations and the
5:33
generations will be based on automatic
5:37
detection of the
5:41
spoken language in the
5:45
video. It should recognize it
5:50
automatically, based on
5:53
the audio file."
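Whisper can report the detected language itself: with `response_format: "verbose_json"` the transcription response includes a `language` field, so the downstream prompt can be steered without a separate detection step. A minimal sketch (the helper and the interface are hypothetical; only the fields used here are listed):

```typescript
// Shape of the relevant part of a Whisper verbose_json response
// (assumption: only the fields we use here are declared).
interface TranscriptionResult {
  language: string; // e.g. "english", "hebrew"
  text: string;
}

// Hypothetical helper: steer the metadata-generation prompt toward the
// language Whisper detected in the audio.
function languageInstruction(result: TranscriptionResult): string {
  return `Write all titles, descriptions, and captions in ${result.language}, ` +
         `the language spoken in the video.`;
}
```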
5:57
Okay,
6:00
great. And that's why I did it, because,
6:03
yeah, it planned to do it with Whisper, but
6:05
as you can see it's not taking advantage
6:07
of its language detection capabilities,
6:10
so now we're making sure we're also
6:11
using that. Okay, it's really
6:13
important. Okay, let's apply the changes
6:16
that it
6:17
asked for. Basically, now it keeps asking me to
6:20
apply a lot of changes to our Supabase
6:22
edge functions and tables. Okay,
6:26
that's fine, that makes sense: it's a big
6:28
change that we're doing now, and it's
6:31
doing it as part of a big plan. Okay, it's
6:34
saying it finished. I'm not sure at all
6:36
that it's finished,
6:39
so I'm going to copy this
6:43
plan, I'm going to paste it, and at the
6:46
beginning I'm going to
6:48
say: "Check if you
6:51
did everything we need in this
6:55
plan." Yeah, so as I suspected,
7:00
a lot of stuff is missing. So, yeah:
7:03
implement the plan. I hate when it does
7:05
this, but sometimes it does. They make
7:08
the AI lazy sometimes, I don't know; maybe
7:10
they want to save money and stuff and
7:11
still say that it's your fault. But
7:15
yeah, sometimes you need to double-check
7:18
it. Okay, great, it's saying it did
7:20
everything, but let's ask it again: "Did
7:23
you complete the whole plan fully?" And again I
7:28
will paste it. Okay, let's try to upload
7:31
a video and just see what happens. I
7:33
suspect it still won't
7:36
work. Yeah, as I thought, it failed. Let's see
7:39
the
7:40
logs. Okay, something about the functions;
7:43
that means it's something internal to
7:45
Supabase. Let's just try to fix it. Okay,
7:48
I understand what happened: the Whisper API
7:51
can only handle up to 25 megabytes. "Let's
7:55
make sure we upload the actual video to
8:00
our temp
8:02
storage, then convert it internally to the
8:07
smallest
8:09
supported audio file format, then send it
8:13
to
8:14
Whisper. Make a plan for it." So basically,
8:19
again, let me review the problem and how
8:21
we're trying to fix it. Right now, when I'm
8:23
uploading a video, even if it uploads the
8:26
video, the Whisper API can't handle the
8:29
video. Okay, it can only handle up to 25
8:31
megabytes. I still need to solve the fact
8:34
that our platform doesn't convert the
8:36
video to an audio file. We tried to do it
8:38
through the browser; it kind of didn't
8:40
work. Now I'm trying to make it find a
8:43
way, because I don't really know what's
8:45
possible. Okay, so I got into a chain of
8:47
errors, like four or five errors. In those
8:50
cases I really suggest you either
8:52
revert versions or do what I'm doing
8:55
now: I'm telling it something is messed
8:57
up. "Do you think we should remove all
8:58
changes in Supabase and in our code and
9:00
start again implementing the changes for
9:02
uploading a video? I remind
9:06
you: we
9:08
should have the video uploaded to a temp
9:15
bucket, then convert it to an audio file,
9:20
also in a temp bucket, using
9:26
FFmpeg; then send this
9:29
temp audio
9:32
file through an edge function to
9:39
Whisper. Then we
9:42
should
9:46
generate a
9:48
transcription of the text of the video
9:52
and an SRT file with subtitle
9:58
timing. Then we
10:01
should send the result to
10:07
GPT-4o for fine-tuning of the SRT file to be
10:15
matched to the subtitles of the
10:19
video, and to fill all the columns we
10:25
have in the
10:28
metadata generations, like
10:33
YouTube
10:35
titles, descriptions,
10:38
Instagram captions, and so on. Then I
10:42
will do it in chat mode and I will send
10:44
it. Now, what I did here is called prompt
10:47
engineering. Okay: I described a problem, I
10:50
offered solutions, and I gave examples, because
10:54
it can be very messy for it; we're
10:56
trying to do a lot of stuff together, as
10:58
you can see. So I just described that I wanted
11:00
to reset everything and just say: this
11:03
is what I want to do, this is what we should do,
11:05
like, help me help you.
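This kind of prompt, stating the problem, the desired outputs, and the exact fields to fill, can be captured in a single chat-completion request body. A sketch under assumptions: column names like `youtube_title` mirror the kind of metadata fields described in the video, but the exact schema is up to your Supabase design, and the strict-JSON reply relies on the API's `response_format: { type: "json_object" }` option.

```typescript
// Build the JSON body for a chat-completion call that returns
// platform-specific metadata as strict JSON. The field names are
// illustrative; match them to your own Supabase columns.
function buildMetadataRequest(transcript: string, language: string): string {
  return JSON.stringify({
    model: "gpt-4o",
    response_format: { type: "json_object" },
    messages: [
      {
        role: "system",
        content:
          `You generate social-media metadata. Reply with a JSON object ` +
          `containing youtube_title, youtube_description, and ` +
          `instagram_caption, all written in ${language}.`,
      },
      { role: "user", content: `Transcript:\n${transcript}` },
    ],
  });
}
```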
11:07
Okay. Okay, so now it's making a plan
11:11
for it, and I will go along with it.
11:14
Right now, when I upload a video, it's
11:16
actually uploading the video, and we can
11:19
see here in Supabase we have buckets
11:23
that it created. As you can see, we have
11:25
temporary buckets, which is really cool:
11:27
it means it will delete them after
11:29
one
11:30
hour. And if you look here in
11:32
the folders, we can see we
11:35
actually have a video here.
11:37
Okay. But right now, the problem is
11:40
that after I upload a video, nothing
11:41
happens. I don't really know what
11:43
happened. Okay, that's a problem that can
11:46
sometimes happen: it will upload
11:49
the video, and right now it will say to me
11:51
"processing will soon begin", but I have no
11:54
idea anymore what's going on in my
11:56
system; I have tons of buckets. So what I
11:58
will do in this case: I will do a screen
12:01
capture, I will take this picture, and I
12:04
will tell
12:05
it: "Do we need all those
12:10
buckets? I don't understand what's going
12:14
on after we
12:18
upload a video.
12:22
Also, let's make our video
12:27
uploading as simple as
12:30
possible: no video
12:33
chunks, just regular video
12:38
uploading. Then tell me if we even have
12:46
video-to-audio
12:48
converting
12:52
happening." Okay, I just want it to tell
12:54
me what's going on. The beautiful
12:56
thing about the chat mode is that it
12:58
can actually look at your system and tell you
12:59
what's going on. Okay, as I thought, it
13:02
has a lot of stuff that's not working
13:05
well in the
13:06
system. Okay, it's saying it did the stuff
13:09
that we wanted it to do. "Would you like
13:12
me to update the process-video edge
13:16
function?" Okay, before that, let's fix
13:20
this error, and then, yeah, we want it to
13:23
update that, because
13:25
before we do it we can't really send
13:28
anything to the AI. Okay, I remind you:
13:31
Whisper, yes, you can upload video to
13:33
Whisper, by the way, but the video
13:35
needs to be smaller than 25 megabytes;
13:38
the reason I know that is because
13:40
Lovable told me that
13:42
before. Okay, it's saying it did it; let's see
13:45
if it actually did it. I will refresh.
13:48
Yeah, perfect. See, now what we have here
13:52
is probably only one video, which is
13:55
great. Let's try to upload a video.
14:01
Yeah, okay, great: I got the error
14:03
related to the AI API, so that's good. So
14:06
now we can paste this command: "Update
14:11
the process-video edge function to do the
14:13
actual video-to-audio conversion." Because right
14:15
now we have videos, but they don't
14:17
actually get converted to audio, because I
14:18
can see my audio bucket is empty. And
14:21
again, it's doing the conversion with
14:22
this package called
14:25
FFmpeg. Honestly, I don't care about it;
14:28
I just want this thing to happen, so
14:31
let's send
14:33
it. Okay,
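For the conversion step itself, the usual FFmpeg recipe for transcription-friendly audio is mono, low sample rate, low bitrate. A sketch that only builds the argument list (the specific bitrate and sample-rate values are my assumptions, not settings shown in the video):

```typescript
// Build FFmpeg arguments to turn a video into a small mono MP3 suitable
// for transcription: drop the video stream, downmix to one channel,
// resample to 16 kHz, and use a low audio bitrate. Values are illustrative.
function ffmpegAudioArgs(inputPath: string, outputPath: string): string[] {
  return [
    "-i", inputPath,
    "-vn",          // no video stream
    "-ac", "1",     // mono
    "-ar", "16000", // 16 kHz sample rate
    "-b:a", "64k",  // low audio bitrate
    outputPath,
  ];
}
// Equivalent command line:
//   ffmpeg -i input.mp4 -vn -ac 1 -ar 16000 -b:a 64k output.mp3
```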
14:35
and I really hope now it will work.
14:38
So I will upload a
14:44
video. So it failed; let's
14:49
see.
14:51
Mhm, I think it failed to make the
14:53
thumbnail,
14:57
probably. Let's see what's going on
14:59
here. I actually will delete it; I think
15:02
it's
15:13
confusing. Okay,
15:18
okay, looks like we didn't even add
15:22
the FFmpeg conversion, so yeah, let's do
15:26
what you just
15:27
said. I really hope it will work. Oh my
15:31
god, guys, look what it's saying to
15:32
me here: "ready to begin with installing
15:35
dependencies". It means it never installed
15:39
the FFmpeg packages; it means we could
15:43
never make it
15:44
work. Okay guys, so we installed
15:46
everything it asked for. Let's upload a
15:48
video and see what happens. So I
15:51
actually uploaded a long video with
15:53
speaking. Okay, we have here a new
15:55
progress bar saying
15:58
"initializing
15:59
FFmpeg". I wouldn't want users to see it,
16:02
but it's nice to see right now. But I
16:04
can already see it's stuck, which is a
16:07
bad thing. Let's wait, let's see what
16:09
actually happened
16:10
here. Yeah, I can see the video is not
16:13
actually able to upload. Let's refresh
16:17
and make sure.
16:18
I will take a picture, I will paste
16:22
it here, and I will say: "Currently, when
16:26
I'm trying to upload a video,
16:30
it's stuck
16:32
in this
16:34
state. What we want to happen
16:39
is:
16:40
one:
16:42
generate video thumbnail;
16:47
two:
16:50
convert... okay, wait,
16:52
actually this is
16:54
two; one should
16:56
be: upload video
17:00
directly to Supabase; generate video
17:04
thumbnail; three: convert video
17:08
to a low
17:11
size audio
17:13
format;
17:15
four: send the audio file to the Whisper
17:21
API;
17:24
five: send the
17:27
transcription to ChatGPT for
17:33
processing
17:35
according to our
17:39
metadata; six: present to the user, in the
17:44
UX/UI, the results in an organized
17:50
way, with the video
17:53
thumbnail." Okay, so let's send it to the
17:56
chat mode and see what will happen. Okay,
17:58
and it made a plan for me; I will say yes.
18:02
Okay guys, so let me show you where we
18:04
are now. Okay, so it did a lot of changes,
18:07
and I also have a lot of errors. My point is,
18:11
because you guys will make a different app,
18:13
my point is: hey, don't give up, you
18:16
will find a way. Okay, just tinker with
18:19
stuff, try to revert versions, don't try
18:23
to add more features if something
18:24
doesn't work; really insist on fixing
18:27
everything. Let me show you where we are now,
18:30
and how I know what I need to do next.
18:33
So I now need to open the preview, because if
18:36
I try to upload a video here, you
18:38
will see it gets stuck. So I
18:41
discovered I just need to go to the preview,
18:43
and now if I upload a video, you
18:46
will see it's processing the
18:48
video, it's uploading the
18:51
video, it made the thumbnail of the video,
18:54
and you can see it kept the aspect ratio.
18:57
And here we're still not using AI; we're just
19:01
simulating the generation of all the
19:04
metadata. And on top of that, if I go
19:06
now to the history page, I will see the
19:09
video, and if I open it, it should show
19:12
the metadata, but right now we don't have
19:13
anything. Okay, and you can see we have
19:16
various other videos. Let's look at what's
19:19
going on in Supabase. Okay, so as you can
19:21
see, again, in Lovable it's stuck,
19:23
because here we are in our
19:26
programming environment and it just
19:30
can't do it. And the reason for that is
19:32
because the conversion it's doing
19:34
with the thumbnail is something it's doing
19:36
through the browser itself. Okay, it's
19:39
something the browser itself does, and
19:41
here the browser doesn't come into
19:43
play, because we're using some
19:45
kind of embedded browser in Lovable. Okay,
19:48
so that's one thing. Now I want to see
19:51
what's happening in Supabase. So every
19:54
time I upload a video now, we create a
19:58
thumbnail image, okay, as you can
20:01
see; we create a video upload, as you can
20:06
see. Now, every thumbnail image or video
20:10
file in those buckets, if you
20:12
right-click, you will get a URL. Okay, they
20:14
have a URL address, like for
20:17
example, if I open a new tab here
20:19
and I paste it, you will see I have
20:20
this image. So if we now go to my table
20:23
editor, you can see that it's actually
20:26
creating a video row for each video you upload,
20:30
and here it's passing the thumbnail URL of
20:33
that video. Okay, so this is how we can
20:35
actually see in the history the
20:37
appropriate thumbnails that we used. So
20:41
that was like a summary of what's going
20:43
on. Now let's proceed with working on
20:46
our software.
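The workflow assembled over this chapter, upload, thumbnail, audio conversion, Whisper, GPT-4o, save, can be summarized as one orchestration function. A sketch with every step injected as a callback, so only the order of operations is fixed; all names are illustrative, and the real implementations would be the Supabase uploads, FFmpeg conversion, and OpenAI calls discussed above:

```typescript
// Each processing step is passed in, so this sketch only captures the
// order of operations described in the chapter.
interface PipelineSteps {
  uploadVideo: (file: string) => string;          // returns temp-bucket path
  makeThumbnail: (videoPath: string) => string;   // returns thumbnail URL
  convertToAudio: (videoPath: string) => string;  // returns audio path
  transcribe: (audioPath: string) => string;      // Whisper: returns transcript
  generateMetadata: (transcript: string) => Record<string, string>; // GPT-4o
  saveResults: (thumbnailUrl: string, metadata: Record<string, string>) => void;
}

function processVideo(file: string, steps: PipelineSteps): Record<string, string> {
  const videoPath = steps.uploadVideo(file);
  const thumbnailUrl = steps.makeThumbnail(videoPath);
  const audioPath = steps.convertToAudio(videoPath);
  const transcript = steps.transcribe(audioPath);
  const metadata = steps.generateMetadata(transcript);
  steps.saveResults(thumbnailUrl, metadata);
  return metadata;
}
```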