Python Project to Scrape YouTube using YouTube Data API

Do you want to build a simple portfolio project, especially as an aspiring Data Analyst?

If yes, then this blog post and video are for you. In this video, we build a Python project to scrape YouTube data using the YouTube Data API. Using the API, we extract the data, load it into a pandas DataFrame, and then analyze it. Finally, we build simple visualizations from this data using the Python Seaborn library.

We start this project by creating a YouTube API key, which will be our credential to access YouTube data. I will show you in detail how to create an API key. Once the API key is generated, we will learn how to use it to access different YouTube data, i.e. we will walk through the documentation Google provides for the YouTube API. We will look at the sections of the documentation that cover the data we need for this project. We will also look at the sample Python code Google provides for calling the different resources and methods that fetch YouTube data.
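To make the role of the API key concrete, here is a minimal sketch of how a key is typically handed to Google's Python client and used for one request. The helper names (`build_youtube_client`, `channel_request_params`) and the `YOUR_API_KEY` placeholder are my own assumptions for illustration and are not taken from the video; the `build("youtube", "v3", developerKey=...)` call and the `channels().list` method come from the google-api-python-client package.

```python
def build_youtube_client(api_key):
    """Create a YouTube Data API v3 client authenticated with an API key."""
    # Imported inside the function so the helper below can be used
    # even before google-api-python-client is installed.
    from googleapiclient.discovery import build

    return build("youtube", "v3", developerKey=api_key)


def channel_request_params(channel_ids):
    """Build the keyword arguments for youtube.channels().list().

    The API accepts several channel IDs in one call as a comma-separated string.
    """
    return {
        "part": "snippet,contentDetails,statistics",
        "id": ",".join(channel_ids),
    }


# Typical use (needs a real key and network access, so it is left commented out):
# youtube = build_youtube_client("YOUR_API_KEY")  # placeholder, not a real key
# params = channel_request_params(["<channel id 1>", "<channel id 2>"])
# response = youtube.channels().list(**params).execute()
```

Keeping the key in one place, rather than pasting it into every request, also makes it easier to keep it out of a notebook you plan to share.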

Finally, we will get into writing the Python code to build this project. I will be using Jupyter Notebook to write my Python code. Since it is a new project, we will create a new virtual environment for it using Anaconda. Once the virtual environment is set up, we will install all the required Python packages: "google-api-python-client" (the Google Python package required to access YouTube API data), pandas, and seaborn. I will show you in detail how to create the virtual environment and how to install all these packages.
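Once the packages are installed (for example with `pip install google-api-python-client pandas seaborn` inside the new environment), a quick check in the first notebook cell can confirm that the notebook is really running in that environment. The sketch below is only an illustration; the `REQUIRED` mapping and the `missing_packages` helper are hypothetical names of my own.

```python
from importlib.util import find_spec

# Import name -> name used when installing the package.
REQUIRED = {
    "googleapiclient": "google-api-python-client",
    "pandas": "pandas",
    "seaborn": "seaborn",
}


def missing_packages(required=REQUIRED):
    """Return the install names of packages that cannot be imported."""
    return [
        install_name
        for module_name, install_name in required.items()
        if find_spec(module_name) is None
    ]


missing = missing_packages()
if missing:
    print("Still to install: pip install " + " ".join(missing))
else:
    print("All required packages are available.")
```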

Once our environment is set up and the required packages are installed, we will start writing the code in Jupyter Notebook. I have divided this project into 2 parts. In the first part, we extract channel details from YouTube, i.e. details such as the channel name, total number of subscribers, total views, and total number of videos posted by each channel. We gather these details for a few Data Analyst/Data Scientist channels and then compare the channels with each other. We shall see who has the most subscribers, who gets the most views, and how many videos each channel has posted. We will load all of this data into a pandas DataFrame and analyze it. We will also generate some basic visualizations from this data so we can easily compare the channels.
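As a rough sketch of this first part, the helper below turns the response of a `channels().list` request (made with `part="snippet,contentDetails,statistics"`) into rows that pandas can load directly. The function names and column names are my own assumptions and may differ from the notebook used in the video; the field names (`subscriberCount`, `viewCount`, `videoCount`, `relatedPlaylists.uploads`) are the ones the API returns, and the counts arrive as strings, so they are converted to integers here.

```python
def parse_channel_stats(response):
    """Flatten a channels().list response into one dict per channel."""
    rows = []
    for item in response.get("items", []):
        stats = item.get("statistics", {})
        uploads = (
            item.get("contentDetails", {})
            .get("relatedPlaylists", {})
            .get("uploads")
        )
        rows.append({
            "channel_name": item["snippet"]["title"],
            # Counts are returned as strings; a channel can hide its
            # subscriber count, so fall back to 0 when a field is absent.
            "subscribers": int(stats.get("subscriberCount", 0)),
            "views": int(stats.get("viewCount", 0)),
            "total_videos": int(stats.get("videoCount", 0)),
            # ID of the playlist holding every upload of the channel.
            "playlist_id": uploads,
        })
    return rows


def plot_channel_comparison(rows, column="subscribers"):
    """Load the rows into a DataFrame and draw a bar chart for one column."""
    # Imported here so parse_channel_stats works without these packages.
    import pandas as pd
    import seaborn as sns

    df = pd.DataFrame(rows).sort_values(column, ascending=False)
    ax = sns.barplot(x="channel_name", y=column, data=df)
    return df, ax
```

Calling `plot_channel_comparison(rows, "views")` or `plot_channel_comparison(rows, "total_videos")` gives the other two comparisons described above.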

In the second part of the video, we build the logic to extract video details from a particular channel. We extract details such as the video title and the total number of views, likes, dislikes, and comments each video has received. We extract these details for all of the videos posted by that channel. We then analyze this data by loading it into a pandas DataFrame. At the end, we create some simple visualizations using the Seaborn Python library.
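One way to sketch this second part is in two steps: first page through the channel's uploads playlist to collect every video ID, then request the statistics for those IDs in batches of 50, the maximum the API returns per page. The function names below are my own assumptions and not necessarily those used in the video. Note also that YouTube made dislike counts private in late 2021, so `dislikeCount` may be missing from responses to API-key requests; the sketch records `None` in that case.

```python
def get_video_ids(youtube, playlist_id):
    """Collect all video IDs from a playlist, following nextPageToken."""
    video_ids = []
    page_token = None
    while True:
        response = youtube.playlistItems().list(
            part="contentDetails",
            playlistId=playlist_id,
            maxResults=50,
            pageToken=page_token,
        ).execute()
        for item in response.get("items", []):
            video_ids.append(item["contentDetails"]["videoId"])
        page_token = response.get("nextPageToken")
        if page_token is None:  # no further pages
            break
    return video_ids


def get_video_details(youtube, video_ids):
    """Fetch title and statistics for each video, 50 IDs per request."""
    rows = []
    for start in range(0, len(video_ids), 50):
        batch = video_ids[start:start + 50]
        response = youtube.videos().list(
            part="snippet,statistics",
            id=",".join(batch),
        ).execute()
        for video in response.get("items", []):
            stats = video.get("statistics", {})
            rows.append({
                "title": video["snippet"]["title"],
                "published": video["snippet"]["publishedAt"],
                "views": int(stats.get("viewCount", 0)),
                "likes": int(stats.get("likeCount", 0)),
                # May be absent since dislike counts became private.
                "dislikes": int(stats["dislikeCount"]) if "dislikeCount" in stats else None,
                "comments": int(stats.get("commentCount", 0)),
            })
    return rows
```

The returned rows can be passed straight to `pandas.DataFrame(rows)`, after which sorting by `views` and plotting the top videos with Seaborn follows the same pattern as the channel comparison.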

Hopefully, this can be a good starting project for anyone aspiring to become a Data Analyst. If you find this video useful, please like the video and subscribe to the channel.

Click on the link below to download the Python script for this project and the notebook used in this video.
