Can you really create AI voice agents just by talking? (Lovable tutorial)

Description

📌 Try Lovable: https://lovable.dev/?via=lena

🔔 Follow me on LinkedIn for more tips on GenAI & Conversational AI: https://www.linkedin.com/in/lena-shakurova/
🤝 Need help with your AI Assistant? Schedule a consultation: https://calendly.com/lena-shakurova/consultation
☕ Support the work I do: https://buymeacoffee.com/lenashakurova

In this video, we test the limits of Lovable and try to build an advanced voice agent. Vibe-coding chatbots and voice agents: is it already possible? Watch the full video to find out.

Timestamps
00:00 Intro
00:37 Initial setup
01:36 Connect to GitHub
01:56 Test chatbot
03:11 Connect to ElevenLabs
05:14 Update the prompt
06:18 Connect Lovable to Supabase
06:43 Store conversation logs
08:00 Change voice
09:13 Set up RAG pipeline
16:06 Outro

📌 Subscribe to my weekly newsletter with tips from working on 89+ Conversational AI projects since 2018: https://lenashakurova.substack.com/

Summary

Lovable AI Voice Agent Tutorial: Building Voice Assistants Through Conversation

In this hands-on tutorial, Lena explores whether it's possible to create sophisticated AI voice agents simply by talking to Lovable, an AI-powered platform that builds web apps and software from plain-English instructions. The video demonstrates the process of building a voice-enabled chatbot without writing code, testing the limits of what's possible with current no-code AI tools.

Lena begins by creating a basic text chatbot using simple natural-language instructions to Lovable, which automatically generates both frontend and backend code. She then transforms this into a voice agent by adding speech recognition via Whisper and text-to-speech via ElevenLabs. The tutorial shows how to customize the user interface, streamlining it down to just a microphone button for interaction.

The video covers several advanced features including connecting to GitHub for code storage, implementing conversation history logging with Supabase database integration, and customizing the voice responses. Lena also attempts to build a RAG (Retrieval-Augmented Generation) pipeline to enable the voice agent to answer questions about her company, Parlabs, by crawling website content and storing it in a vector database.

Throughout the demonstration, viewers can see both the successes and limitations of using conversational AI tools for development. While basic functionalities like voice input/output and conversation logging work well, more complex features like implementing RAG prove challenging without coding knowledge. Lena provides honest feedback about when the tool excels and where it falls short, noting that programming knowledge is still valuable for debugging and implementing more sophisticated features.

The video serves as a practical exploration of AI-assisted development tools, offering insights into the current state of no-code voice agent creation. It's particularly valuable for developers, conversational AI enthusiasts, and those interested in the evolving landscape of AI development tools. The tutorial demonstrates that while platforms like Lovable can create functional prototypes quickly, building truly advanced voice agents still requires technical expertise and understanding of conversation design principles.

Transcript

0:00 Today we are going to try something different. We're going to be using Lovable, which is an AI-powered platform that allows you to build any kind of web apps and software by simply talking in plain English and describing what you want. And I want to see how far we can push it and whether we can use Lovable to build voice agents. To be precise, I wonder if we can create a voice agent that tells the story of my company, called Parlabs, in a natural and engaging way. Let's find out.

0:28 If you've never tried Lovable before, this is the Lovable main page. Once you land on it, you're supposed to explain, in a very short way, what it is that you want to build. Let's start with a text chatbot first, and then try to make it into a voice bot. Let's say something like: "Please create both backend and frontend for a chatbot that is powered by LLMs and can answer user questions using OpenAI LLMs." Let's see what it can build.

1:04 Okay, it is making a plan of what it needs to build. While it's doing that, let's make sure it's connected to GitHub. Yes, that's correct. So now all the code that Lovable writes will be stored in our GitHub, so that later, if we want to work with it ourselves, we can open VS Code or Cursor and edit it after the MVP is done.

1:33 All right, so we've supposedly got something, and GitHub is connected as well. Let's try to refresh the page and... "Build unsuccessful." Okay, we click "Try to fix" and see if that helps. All right, now the code should be fixed. Let's try to refresh it. This is where it wants me to paste my code. Okay, it saved it. Let's try again: "Hey, how are you?" Okay, so the chatbot is working now.

2:09 Let's see if we can turn this chatbot into a voice bot: "Now turn this chatbot into a voice bot. There's going to be a microphone. I'm going to click on it and start talking. You're going to transcribe it using Whisper and respond back to me using LLMs." Okay, so it should be done now. Now we have this microphone button. Let's try: "Hey, how are you doing?" "Hello. I'm just a program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?" Okay, we even got the voice, even though I was not asking for the voice.

2:47 "Can you connect to ElevenLabs?" Okay, let's see if it can connect to ElevenLabs. Probably it's going to ask me for my ElevenLabs API key, so I'm going to go and search for that one. ElevenLabs login... Okay, where would I have the API keys? Here: create an API key. Okay, it said that I can specify my API key here in Lovable, and now it should be able to work. "Hey, what's the weather like today?" "I'm sorry, but I can't provide real-time weather updates. You might want to check a weather app or website for the latest information." Okay, that is better.

3:46 Now, I don't like the interface, so I want to simplify it and say: "Don't show me the logs. Make a very simple interface instead, where you have just a mic in the middle of the screen, and I can click on it, talk to it, and once I finish speaking, I get a response generated using LLMs."
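The video never shows the code Lovable generated, but the pipeline described here (microphone in, Whisper transcription, LLM reply, ElevenLabs speech out) boils down to roughly three API calls. Below is a minimal TypeScript sketch of that flow, not the actual generated code; the model names and the ElevenLabs voice ID are placeholder assumptions.

```ts
// Sketch of the voice-bot backend: transcribe audio with Whisper,
// generate a reply with an LLM, then synthesize speech with ElevenLabs.
// Assumes OPENAI_API_KEY and ELEVENLABS_API_KEY are set in the environment.

async function transcribe(audio: Blob): Promise<string> {
  const form = new FormData();
  form.append("file", audio, "recording.webm");
  form.append("model", "whisper-1");
  const res = await fetch("https://api.openai.com/v1/audio/transcriptions", {
    method: "POST",
    headers: { Authorization: `Bearer ${process.env.OPENAI_API_KEY}` },
    body: form,
  });
  return (await res.json()).text;
}

async function reply(userText: string): Promise<string> {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "gpt-4o-mini", // placeholder model choice
      messages: [{ role: "user", content: userText }],
    }),
  });
  return (await res.json()).choices[0].message.content;
}

async function speak(text: string): Promise<ArrayBuffer> {
  // "VOICE_ID" is a placeholder; every ElevenLabs voice has its own ID.
  const res = await fetch("https://api.elevenlabs.io/v1/text-to-speech/VOICE_ID", {
    method: "POST",
    headers: {
      "xi-api-key": process.env.ELEVENLABS_API_KEY!,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ text, model_id: "eleven_multilingual_v2" }),
  });
  return res.arrayBuffer(); // audio bytes to play back in the browser
}
```

In the app, the frontend would POST the recorded audio blob to a backend route that chains these three calls and streams the audio back.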
4:10 Okay, now it got very simple. That's exactly what I asked for. Let's see: "Hey, what are you up to today?" "Hello, I'm here to help answer your questions and provide information. How can I assist you today?" "What can you do?" It's going to generate it; it's just slow. "I can assist with a wide range of tasks, such as: one, providing information and answering questions on various topics..." "Please stop offering explanations." "...You're welcome. If there's anything else you need, feel free to ask." Okay, we can see that it's not very conversational. So let's try and see if we can update the prompt.

4:49 "Can you show me the current prompt that you use to generate the answer?" Okay, so here's the prompt: "You're a helpful, friendly assistant. Please provide clear and concise responses." "Please update the prompt and say that you are a helpful and friendly assistant and you are leading a conversation. It's a voice conversation, so your text needs to be simple and conversational. The sentence structure needs to be simple, and your responses need to be rather short, maximum three sentences." Okay, let's see what happens once it updates the prompt and I ask the exact same question, which was "How can you help?"

5:32 Okay, so now it's updated. Let's try again: "Hey, how can you help?" "Hi, I can help answer questions, provide information, or assist with tasks like reminders or finding things online. Just let me know what you need." Okay, that was way more conversational. That's good.

5:51 The next step would be to store the conversation history in Supabase. Let's try that: "Can you create a new Supabase table called conversation log and store all our conversation history there?" Okay, it said that I need to first connect my project to Supabase. Let's see how to do that. I'll click Connect, because I already created my Supabase project, and now it is connected. Okay, perfect. "Please create a new Supabase table. There we will store all the conversation logs that happen between our voice assistant and the user." This is going to be important for us, because later we can analyze those logs at scale, and we can even create smart dashboards using Lovable.

6:45 So let's see: it generated code to create a new table. I just need to click "Apply changes", and it will create the Supabase table and connect our frontend and backend to it so that we can store the logs. Let's see. Okay, let's try again: "Hi, what are you up to today?" "Hi, I'm here to help you with any questions or tasks you have. What about you?" Okay, that's perfect. Let's now go to Supabase and see which tables we have. So we have one called "conversation logs"; I think that's the recent one. This is the structure of that table, and if we click here, we can see the logs: "Hi, what are you up to today?" "Hi, I'm here to help you with any questions..." So that was exactly what we just asked. We are now storing conversation logs, and later we can analyze them. That is perfect.
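For reference, the logging step Lovable set up here usually comes down to a single insert with the supabase-js client, called once per exchange. A minimal sketch; the conversation_logs table name matches what appears in the video, but the column names are assumptions, since they aren't shown on screen.

```ts
import { createClient } from "@supabase/supabase-js";

// Supabase project URL and anon key come from the project settings.
const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_ANON_KEY!
);

// Store one user/assistant exchange; created_at can default to now() in SQL.
async function logTurn(userMessage: string, assistantMessage: string) {
  const { error } = await supabase.from("conversation_logs").insert({
    user_message: userMessage,
    assistant_message: assistantMessage,
  });
  if (error) console.error("Failed to store conversation log:", error.message);
}
```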
7:47 Now let's go back to our project and try another thing: "Please switch my current voice to Liam." Okay. Did the voice change? 8:00 "Hey, tell me more about Edinburgh." "Edinburgh is the capital of Scotland, known for its historic and cultural attractions. It features the famous Edinburgh Castle..." Okay, got it, thank you. "...the Edinburgh Festival." "You're welcome. If you have more questions, feel free to ask." Okay. To be honest, I liked Sarah more, but I just wanted to see if we can also easily change the voices.

8:26 The next thing we need to do is create a RAG pipeline, because, if you remember, we initially wanted to create a storytelling voice assistant that tells about my company. So what I'm going to do is say: "Create a new Supabase table where you're going to store information about my company. My company website is https://parslabs.org. You need to crawl information from my website and store it there, and then create a RAG pipeline, so that when I am asking questions, you answer using an LLM, based on my data and the information you have about my company called Parlabs."

9:09 Okay, this sounds a little bit complex. So let's see if it can actually crawl information from the website, store it in Supabase, and then set up a vector database as well. Okay, so for Supabase, it now suggests creating a new table. Okay, let's do that. And now we got an issue, so it's trying to fix that. Okay, the SQL migration was successful; that's good. Now it's trying to implement the RAG pipeline. Meanwhile, I do wonder which information it is storing in Supabase, so let's go back here. It has something called "content embeddings", which for now is empty. So it only created the structure.

10:02 Okay, so it said that it implemented everything. Let's try if it works. Okay, let's give it some feedback. Okay, so the interface changed a lot. Maybe I was not very clear about what it is that I wanted. But let's ask about Parlabs: "What is Parlabs?" I guess it's no longer using voice. "I don't have enough information about Parlabs from the provided context. Could you please provide more details or clarify your question?" This did not work as I would have wished.

10:42 Okay, let's give it a simpler task. Let's not ask it to crawl the website; let's just copy the website information and then say: "Store this information in a new Supabase table called data." Yeah, I'm really not liking what it's doing with the UI. It's been going so well, and now we got something really complicated. And I think that's it with these kinds of tools: yes, they are already quite powerful if you know how to use them, and simple things they can do rather well. However, it can also really quickly go wrong, especially if you are not very clear in what you want. If you don't formulate things very well, it will just make up things that you didn't ask for.

11:36 Okay, now it is storing information in Supabase. That's good. What we're going to try afterwards is ask it to train a vector database based on this data. Let's see.
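For comparison, a working ingestion step for this kind of RAG pipeline would look roughly like the sketch below: split the site text into chunks, embed each chunk, and store content plus embedding in a table with a pgvector column. This is not what Lovable generated; the content_embeddings table name comes from what's visible in the video, while the columns and embedding model are assumptions.

```ts
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
);

// Embed one text chunk with OpenAI (model choice is an assumption).
async function embed(text: string): Promise<number[]> {
  const res = await fetch("https://api.openai.com/v1/embeddings", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: "text-embedding-3-small", input: text }),
  });
  return (await res.json()).data[0].embedding;
}

// Write each chunk with its vector; assumes a pgvector column "embedding".
async function ingest(chunks: string[]) {
  for (const chunk of chunks) {
    const embedding = await embed(chunk);
    const { error } = await supabase
      .from("content_embeddings")
      .insert({ content: chunk, embedding });
    if (error) console.error("Insert failed:", error.message);
  }
}
```

At query time, the user's question gets embedded the same way, the nearest chunks are fetched from Postgres, and they are pasted into the LLM prompt, which is exactly the flow requested from Lovable a few minutes later.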
11:52 Let me know in the comments below: do you think it would have been faster to just code it instead of talking in English with Lovable to make this voice agent? I'm really curious whether you think it's worth the time. Let me know. Now it's updating the UI, which I didn't necessarily ask for. I don't need the UI updated, but let's see if it stored the information. Let's check Supabase again to see where it is storing things. Okay, so it has section type, section content, section title. Okay, good. So it did create the database; that's good. And here's all the Parlabs information. You can even filter it: "Why choose Parlabs?" Okay, that looks kind of fancy. I didn't ask for it, but all right.

12:51 Do you think that being a programmer would help with talking to Lovable? Can you imagine someone who doesn't know how to program at all, and who has never written SQL queries, being just as able to build apps with Supabase, without any coding knowledge, in the state Lovable is in right now? Let me know. I am not sure. For now, it seems it is still very helpful to be able to understand the code, and I'm not reading it now purely for the sake of the experiment. But if I had been reading the code, I think I would have been able to give better feedback to Lovable and to control and guide it in a better way.

13:43 Okay, one last try. Let's see if we can simplify it: "Hi, I only need one view, where I'm talking with my voice. My voice is being transcribed, and the transcription is passed through the RAG pipeline to find the information that is relevant to my question. This information is then pasted into the LLM prompt. The LLM generates the response given the relevant information, and then you use text-to-speech to respond back to me. Please fix everything and simplify the frontend. I only need the microphone; everything else, training the embeddings, the RAG pipeline, working with the data, all needs to happen on the backend." Okay, I gave it another try.

14:38 When I clicked "generate embeddings", did it actually generate any embeddings? And if so, where would it be storing them? It doesn't look like it's in "parlabs info". It's also not in "rag queries", and it's not in "website content". So that's still a mystery to me.

15:13 Okay, we got a simpler interface. Let's try it: "Tell me about Parlabs." Oh god. "No, I was referring to the company that you know about." "I'm here to provide information about Parlabs. How can I assist you?" "What do they do?" "Parlabs specializes in providing services and solutions related to data processing and management..." No, this is wrong. "...If you have specific questions about their services or products, feel free to ask." That's it. I think RAG was not implemented, and if you don't know how to code, there is no simple way to debug it and test whether it works or doesn't.
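If you can read code, one quick way to debug this would be to count the rows in each table directly instead of guessing from the UI. A hypothetical check with supabase-js; the table names are the ones visible in the video, with underscores assumed.

```ts
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_ANON_KEY!
);

// Print a row count per table to see where (if anywhere) the embeddings landed.
async function checkTables() {
  const tables = ["content_embeddings", "parlabs_info", "rag_queries", "website_content"];
  for (const table of tables) {
    const { count, error } = await supabase
      .from(table)
      .select("*", { count: "exact", head: true });
    console.log(table, error ? `error: ${error.message}` : `${count} rows`);
  }
}
```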
16:06 So my conclusion, for now, is that you cannot create great chatbots or voice assistants using Lovable. You can create simple prototypes that just work, but if you need something a little bit more complex, then you definitely need to know how to code and how to put it all together. And I'm not even talking about making your AI assistants and voice agents sound more human; that requires a completely different set of expertise, and you won't be able to fix it just by using tools like Lovable. That said, it was a fun experiment for me. I hope you also enjoyed it and learned something new about what is possible with the current technology. If you want to see more videos like this, and more serious videos about voice technology, chatbot development, and conversation design, then follow me on this channel, click subscribe, and like this video if you liked it. And I'll see you in the next one.