A step-by-step tutorial to build and deploy a web application for topic modelling of a Spotify Podcast
This article continues the story How to build a Web App to Transcribe and Summarize audio with Python. In the previous post, I showed how to build an app that transcribes and summarizes the content of your favourite Spotify podcast. The summary of a text can help listeners decide whether an episode is interesting before listening to it.
But other features can be extracted from audio too, such as the topics. Topic modelling is one of the many natural language processing techniques that enable the automatic extraction of topics from different types of sources, such as hotel reviews, job offers, and social media posts.
In this post, we are going to build an app that collects the topics from a podcast episode with Python and analyzes the importance of each topic extracted with nice data visualizations. In the end, we’ll deploy the web app to Heroku for free.
- Create a GitHub repository, which is needed to deploy the web application to Heroku.
- Clone the repository to your local PC with git clone. In my case, I use VS Code, an IDE that works well with Python scripts, includes Git support and integrates a terminal. Copy the following commands into the terminal:
git commit -m "first commit"
git branch -M master
git remote add origin https://github.com/
git push -u origin master
- Create a virtual environment in Python.
Part 1: Create the Web Application to extract topics
This tutorial is split into two main parts. In the first part, we create our simple web application to extract the topics from the podcast. The remaining part focuses on the deployment of the app, which is an important step for sharing your app with the world anytime. Let’s get started!
1. Extract Episode’s URL from Listen Notes
We are going to discover the topics from an episode of Unconfirmed, called Want a Job in Crypto? Exchanges are hiring — Ep. 110. You can find the link to the episode here. As you may know from television and newspapers, the blockchain industry is booming, and it pays to keep up with job openings in the field. These companies will surely need data engineers and data scientists to manage their data and extract value from it.
Listen Notes is an online podcast search engine and database that gives us access to podcast audio through its API. We need to define a function that extracts the episode’s audio URL. First, create an account and subscribe to the free plan to use the Listen Notes API.
Then, click the episode you are interested in and select the option “Use API to fetch this episode” on the right of the page. Once you have pressed it, change the default coding language to Python and click the requests option to use that Python package. Finally, copy the code and adapt it into a function.
The function takes its credentials from a separate file, secrets.yaml, which is a collection of key-value pairs, much like a Python dictionary.
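A minimal sketch of such a function could look like the one below. It assumes the Listen Notes endpoint `https://listen-api.listennotes.com/api/v2/episodes/{id}`, the `X-ListenAPI-Key` header and an `audio` field in the JSON response. The key names in secrets.yaml, the function names and the `http_get` parameter (which defaults to `requests.get` and exists only so the function can be tried without network access) are illustrative choices, not necessarily those of the original code.

```python
LISTEN_NOTES_EPISODES = "https://listen-api.listennotes.com/api/v2/episodes"


def load_secrets(path="secrets.yaml"):
    """Read the API keys from a YAML file of flat key-value pairs.

    Assumed layout (key names are hypothetical):
        api_key: "<your AssemblyAI key>"
        api_key_listennotes: "<your Listen Notes key>"
    """
    import yaml  # PyYAML; imported here so the rest runs without it installed

    with open(path, encoding="utf-8") as handle:
        return yaml.safe_load(handle)


def retrieve_audio_url(api_key, episode_id, http_get=None):
    """Return the audio URL of a Listen Notes episode."""
    if http_get is None:
        import requests  # the package selected in the Listen Notes code generator

        http_get = requests.get

    response = http_get(
        f"{LISTEN_NOTES_EPISODES}/{episode_id}",
        headers={"X-ListenAPI-Key": api_key},
        timeout=30,
    )
    response.raise_for_status()  # fail loudly on a bad key or unknown episode id
    return response.json()["audio"]
```

With real credentials, the call would be something like `retrieve_audio_url(load_secrets()["api_key_listennotes"], episode_id)`.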
2. Retrieve Transcription and Topics from audio
To extract the topics, we first send a POST request to AssemblyAI’s transcript endpoint, passing the audio URL retrieved in the previous step as input. Afterwards, we obtain the transcription and the topics of our podcast by sending a GET request to AssemblyAI.
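The two requests can be sketched roughly as follows. The sketch assumes AssemblyAI’s v2 endpoint `https://api.assemblyai.com/v2/transcript`, the `iab_categories` flag to switch on topic detection, and a `status` field that ends up as `completed` or `error`. The function names, the polling interval and the `http_post`/`http_get` parameters (defaulting to the requests package) are hypothetical.

```python
import time

ASSEMBLYAI_TRANSCRIPT = "https://api.assemblyai.com/v2/transcript"


def request_transcript(api_key, audio_url, http_post=None):
    """Submit the audio URL with topic detection enabled; return the job id."""
    if http_post is None:
        import requests

        http_post = requests.post

    response = http_post(
        ASSEMBLYAI_TRANSCRIPT,
        headers={"authorization": api_key},
        json={"audio_url": audio_url, "iab_categories": True},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["id"]


def wait_for_transcript(api_key, transcript_id, http_get=None, poll_seconds=5.0):
    """Poll the transcript endpoint until the job finishes; return the result JSON."""
    if http_get is None:
        import requests

        http_get = requests.get

    while True:
        response = http_get(
            f"{ASSEMBLYAI_TRANSCRIPT}/{transcript_id}",
            headers={"authorization": api_key},
            timeout=30,
        )
        response.raise_for_status()
        result = response.json()
        if result["status"] == "completed":
            return result  # holds "text" and "iab_categories_result"
        if result["status"] == "error":
            raise RuntimeError(result.get("error", "transcription failed"))
        time.sleep(poll_seconds)  # still queued or processing
```

Transcribing a full episode takes a while, so the GET request has to be repeated until the status changes; a single call right after the POST would usually return an unfinished job.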
The results will be saved into two different files:
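As a rough sketch, the two files could be written as follows. The names transcript.txt and topics.json, as well as the function name, are placeholders rather than the names used in the original project. The `text` and `iab_categories_result` keys are assumed to come from the AssemblyAI response.

```python
import json
from pathlib import Path


def save_results(result, out_dir=".", text_name="transcript.txt", topics_name="topics.json"):
    """Write the transcription and the detected topics to two separate files."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    text_path = out_dir / text_name      # placeholder file name
    topics_path = out_dir / topics_name  # placeholder file name

    text_path.write_text(result["text"], encoding="utf-8")
    topics_path.write_text(
        json.dumps(result["iab_categories_result"], indent=2), encoding="utf-8"
    )
    return text_path, topics_path
```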
Below I show an example of transcription:
Hi everyone. Welcome to Unconfirmed, the podcast that reveals how the marketing names and crypto are reacting to the week's top headlines and gets the insights you on what they see on the horizon. I'm your host, Laura Shin. Crypto, aka Kelman Law, is a New York law firm run by some of the first lawyers to enter crypto in 2013 with expertise in litigation, dispute resolution and anti money laundering. Email them at info at kelman law. ....
Now, I show the output of the topics extracted from the podcast’s episode:
We have obtained a JSON file containing all the topics detected by AssemblyAI. Essentially, the podcast is transcribed into text, which is split into sentences. For each sentence, we have a list of topics with their corresponding relevance. At the end of this big dictionary, there is a summary of the topics extracted from all the sentences.
It’s worth noticing that Careers and JobSearch constitute the most relevant topic. In the top five labels, we also find Business and Finance, Startups, Economy, Business and Banking, Venture Capital and other similar topics.
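Ranking the summary takes only a few lines. The sketch below assumes that the `summary` entry maps each topic label to a relevance score, as described above. The function name is hypothetical, and the sample values are made up for illustration; they are not the scores returned for this episode.

```python
def top_topics(topics_result, n=5):
    """Return the n (label, relevance) pairs with the highest relevance."""
    summary = topics_result["summary"]
    ranked = sorted(summary.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Illustrative values only; these are NOT the scores returned for the episode.
example_result = {
    "summary": {
        "BusinessAndFinance>Business>Startups": 0.42,
        "Careers>JobSearch": 0.97,
        "BusinessAndFinance>Economy": 0.35,
    }
}

for label, relevance in top_topics(example_result, n=2):
    print(f"{label}: {relevance:.2f}")
```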
3. Build Web Application with Streamlit
Now, we put all the functions defined in the previous steps into the main block, in which we build our web application with Streamlit, a free open-source framework for building applications with a few lines of Python code:
- The main title of the app is displayed using
- A left sidebar is created using st.sidebar. We need it to enter the episode id of our podcast.
- After pressing the “Submit” button, a bar plot appears, showing the five most relevant topics extracted.
- There is a Download button in case you want to download the transcription, the topics and the data visualization.
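The layout described in this list can be sketched as below. This is an assumption-laden outline rather than the app’s actual code: the title text, the widget labels, the `render_app` and `get_topics` names and the file name are all hypothetical, and only the topics are offered for download. The Streamlit module is passed in as `st` so the layout can be exercised without a running server, and the sketch assumes a Streamlit version whose `st.bar_chart` accepts the `x` and `y` arguments.

```python
import json


def render_app(st, get_topics):
    """Lay out the page: title, sidebar input, bar plot, download button.

    st         -- the imported streamlit module
    get_topics -- callable taking an episode id and returning {label: relevance}
    """
    st.title("Podcast topic explorer")  # hypothetical title

    episode_id = st.sidebar.text_input("Episode id")
    submitted = st.sidebar.button("Submit")
    if not (submitted and episode_id):
        return  # nothing to show until an id has been submitted

    summary = get_topics(episode_id)
    top_five = sorted(summary.items(), key=lambda item: item[1], reverse=True)[:5]

    # Column-oriented data: one column of labels, one column of scores.
    st.bar_chart(
        {
            "topic": [label for label, _ in top_five],
            "relevance": [score for _, score in top_five],
        },
        x="topic",
        y="relevance",
    )
    st.download_button(
        "Download topics",
        data=json.dumps(summary, indent=2),
        file_name="topics.json",  # placeholder file name
        mime="application/json",
    )
```

In topic_app.py, this would be wired up with `import streamlit as st` followed by `render_app(st, get_topics)`, where `get_topics` chains the functions from the previous steps.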
To run the web application, you need to write the following command line on the terminal:
streamlit run topic_app.py
Amazing! Two URLs should now appear. Click one of them and the web application is ready to be used!
Part 2: Deploy the Web Application to Heroku
Once you have completed the code of the web application and checked that it works well, the next step is to deploy it on the Internet with Heroku.
You are probably wondering what Heroku is. It’s a cloud platform that allows the development and deployment of web applications in different programming languages. The deployment takes two steps:
- Create requirements.txt, Procfile and setup.sh
- Connect to Heroku
1. Create requirements.txt, Procfile and setup.sh
Next, we create a requirements.txt file that lists all the Python packages required by the script. We can generate it automatically from the command line with the Python library pipreqs.
It will generate a requirements.txt file:
Avoid using the command pip freeze > requirements.txt, as this article suggested. The problem is that it lists every package installed in the environment, including ones that this specific project does not need.
In addition to requirements.txt, we also need Procfile, which specifies the commands that are needed to run the web application.
The last requirement is to have a setup.sh file that contains the following code:
mkdir -p ~/.streamlit/
echo "[server]
port = $PORT
enableCORS = false
headless = true
" > ~/.streamlit/config.toml
2. Connect to Heroku
If you haven’t registered on Heroku’s website yet, you need to create a free account to use its services. You also need to install the Heroku CLI on your local PC. Once you have met these two requirements, we can begin the fun part! Copy the following command into the terminal:
heroku login
After running the command, a Heroku window will open in your browser, where you enter the email and password of your account. If it works, you should have the following result:
Now you can return to VS Code and run the command that creates your web application from the terminal:
heroku create topic-web-app-heroku
To deploy the app to Heroku, we need this command line:
git push heroku master
This pushes the code from the local repository’s master branch to the heroku remote. Only committed changes are deployed, so if you change the code afterwards, commit the changes with the following commands and push again:
git add -A
git commit -m "App over!"
We are finally done! Now you should see your app that is finally deployed!
I hope you appreciated this mini-project! It can be really fun to create and deploy apps. The first time can be a little intimidating, but once you finish, you won’t have any regrets! The GitHub code is here. Thanks for reading. Have a nice day!