How to run SQL in Python (Jupyter Notebook)
What is SQL? It is a domain-specific language used in programming and designed for managing data held in a relational database management system, or for stream processing in a relational data stream management system. Wikipedia
What is Python? It is a high-level and general-purpose programming language that can be used for web development, data science and scripting.
What is Jupyter Notebook? It is a is a web application for creating and sharing computational documents. It offers a simple, streamlined, document-centric experience.
We do have a lot of information for setting up an extensive database. For now, I am going to take you through a quick and efficient technique to start working with SQL (Structured Query Language) with Pandas.
Often we come across scenarios where upon reading a file like csv or excel, we do feel a need to perform some quick analysis through SQL (Obviously for SQL lovers 😉). Let’s jump in… ⛷️
Workflow:
Pre-requisites : A working Jupyter NB, preferably Anaconda installation will provide all the requirements.
- Import required libraries.
- Create a DataFrame.
- Create a Database.
- Export the DataFrame to the Database.
- Start querying with SQL.
Note: Although, I have mentioned Jupyter Notebook in the title, the following technique and codes can be used on any IDE running on python.
Step 1: Import Libraries
Import pandas and sqlite3 library.
import pandas as pd
import sqlite3 as sql
Step 2: Create a DataFrame
Create a new dataframe or read an external file like csv, excel, blah..
df = pd.read_csv('./airport')
Step 3: Create a Database
This will connect to an existing db or create a new one if not exists.
conn = sql.connect('default.db')
Step 4: Export the DataFrame to the Database
Now, we will export the existing dataframe to the created database. The table name should be passed explicitly, as ‘sql_table’ here.
df.to_sql('sql_table', conn)
Step 4: Start querying through SQL
Create the query to execute and store the result into the dataframe to check it again.
query = 'select * from sql_table'
new_df = pd.read_sql(query, conn)
new_df.head(3)
While the entire process is simple, there is a catch 🙄 in case of re-creating a table i.e, with an existing name. All you need to do is drop the old table and then create one. Check out the below code. ❤️
Feel free to share this with data enthusiasts over social platforms and do post follow up questions, if any. ✌️