How To Import Datasets Into Google Colab Notebook
Google Colab, the go-to development environment for machine learning, empowers you to seamlessly write and execute Python code directly within your web browser. In this article, we will explore various methods for uploading datasets to Google Colab, a fundamental step in your data science and machine learning workflows.
First, open Google Colab in your web browser — it runs entirely online, so no local installation is required.
Once you’ve accessed Google Colab, create a new Colab Notebook. This is where you’ll write and execute your Python code.
And let's start!
1. Import from Your Local Machine
One of the simplest methods is uploading data directly from your local computer to Google Colab. While this method is user-friendly, it is inefficient for extensive datasets or files that take a long time to upload, such as video and audio. Uploading files one by one can be time-consuming, but it remains a viable option for smaller datasets.
from google.colab import files
files.upload()
When you run the cell, a file-selection button appears; clicking it opens a file explorer where you can choose the file(s) you want from your local machine. Unfortunately, you can't select an entire folder at once, so if you have many files, select them in bulk instead.
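Once the upload finishes, `files.upload()` returns a dict mapping each uploaded filename to its raw bytes. Here is a minimal sketch of how you might parse an uploaded CSV from that dict — the dict below simulates the value Colab would return for a hypothetical `data.csv`:

```python
import csv
import io

# In Colab this dict would come from: uploaded = files.upload()
# Here we simulate it with a small hypothetical CSV file.
uploaded = {'data.csv': b'label,value\ncat,1\ndog,2\n'}

for name, content in uploaded.items():
    # Decode the raw bytes and parse each row into a dict.
    rows = list(csv.DictReader(io.StringIO(content.decode('utf-8'))))
    print(name, rows)
```

This keeps everything in memory; for large files you may prefer writing the bytes to disk first.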
2. Import with drive.mount() / Google Drive Sync
Let's examine the drive.mount() method first. Write this code into your Google Colab notebook and run the cell.
from google.colab import drive
drive.mount('/content/gdrive')
When you choose 'Connect to Google Drive', it will redirect you to a login panel. After you log in, all the folders and files you previously uploaded to Google Drive will be displayed automatically.
To check, click the file icon on the vertical sidebar and you will see all your data in the pane that opens.
In fact, you can synchronize your Google Drive with your Google Colab Notebook without having to write this code. You can achieve this by following the steps illustrated in the images below:
Select the files icon:
Select the Mount Drive icon:
After clicking the icon, you will be directed to a panel where you will be prompted to grant access permissions and log in, just like when using the drive.mount() method. Upon signing in, your Google Drive will sync automatically.
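After mounting, your Drive contents live under /content/gdrive/MyDrive. A small helper like the one below makes paths into your Drive easy to build — the `drive_path` name and the `datasets/train.csv` location are our own illustrative choices, not part of the Colab API:

```python
from pathlib import Path

# After drive.mount('/content/gdrive'), your files appear under this root.
DRIVE_ROOT = Path('/content/gdrive/MyDrive')

def drive_path(*parts):
    """Build an absolute path to a file inside your mounted Google Drive."""
    return DRIVE_ROOT.joinpath(*parts)

# Hypothetical dataset location -- adjust to match your own Drive layout.
csv_path = drive_path('datasets', 'train.csv')
print(csv_path)
```

From there, any library that accepts a file path (pandas, PIL, librosa, etc.) can read the file directly.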
3. Import from Kaggle
In this section, we will explore how to import pre-existing datasets from Kaggle, a popular platform for machine learning enthusiasts that offers a wide range of datasets.
If you prefer not to consume the storage capacity of your local machine or Google Drive, you can use the API that Kaggle provides. However, a few preparations are required before proceeding with this method.
First, we need to create an API token. To generate one, open the settings panel through your profile picture, navigate to the API section, and select "Create New Token." This downloads your token as a JSON file named kaggle.json.
Next, upload the kaggle.json file you downloaded to your local machine into Colab using the method described above.
After this step, we install the Kaggle command-line tool in Google Colab and then set up the required configuration.
! pip install -q kaggle
! mkdir -p ~/.kaggle
! cp kaggle.json ~/.kaggle/
! chmod 600 ~/.kaggle/kaggle.json
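If you'd rather stay in Python, the shell steps above can be sketched with the standard library. The `install_kaggle_token` helper name is our own, and the token values shown are placeholders — use the contents of your downloaded kaggle.json:

```python
import json
from pathlib import Path

def install_kaggle_token(token: dict, home: Path = Path.home()) -> Path:
    """Write kaggle.json where the Kaggle CLI expects it (~/.kaggle)."""
    kaggle_dir = home / '.kaggle'
    kaggle_dir.mkdir(parents=True, exist_ok=True)
    token_path = kaggle_dir / 'kaggle.json'
    token_path.write_text(json.dumps(token))
    token_path.chmod(0o600)  # the CLI rejects tokens readable by other users
    return token_path

# Placeholder values -- replace with your real username and key:
# install_kaggle_token({"username": "your-username", "key": "your-api-key"})
```

The chmod step matters: the Kaggle CLI warns about (or refuses) credentials that other users on the machine can read.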
After configuring the settings, go to Kaggle and select the dataset you want. Click the three-dot menu and copy the API command.
Go back to your Google Colab notebook, paste the command into a new cell, and prepend the ! sign.
! kaggle datasets download -d chrisfilo/urbansound8k
You can then unzip the dataset and remove the archive to free up disk space.
! unzip urbansound8k.zip && rm urbansound8k.zip
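If you'd rather do this from Python, the standard-library zipfile module gives the same result. The `extract_and_remove` helper name is our own:

```python
import os
import zipfile

def extract_and_remove(archive_path, dest_dir='.'):
    """Extract a zip archive, then delete it to free Colab disk space."""
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest_dir)
    os.remove(archive_path)

# In the notebook you would run:
# extract_and_remove('urbansound8k.zip', 'urbansound8k')
```

Extracting into a named directory keeps the notebook's working directory tidy when an archive contains many loose files.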
And there you have it! Your dataset is now ready and at your disposal for all your project needs. Happy exploring and analyzing!