Business Intelligence with Python and SQL: A Beginner’s Guide

In Business Intelligence (BI), SQL is used to extract and manipulate data from databases, while Python adds flexibility for data processing, visualization, and automation. Combining both enables you to build powerful, automated BI pipelines and dashboards.


What You Need to Get Started

Before diving in, ensure you have the following installed:

  • Python 3.x
  • pip (Python package manager)
  • A SQL database (e.g., PostgreSQL, MySQL, SQLite, or Microsoft SQL Server)

Recommended Python Libraries (see the install command below):

  • pandas: Data manipulation
  • SQLAlchemy: Connect to SQL databases
  • matplotlib/seaborn: Data visualization
  • pyspark: Spark SQL and distributed data processing
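
If any of these are missing, they can all be installed from PyPI with pip:

pip install pandas sqlalchemy matplotlib seaborn pyspark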

1. Connecting Python to Relational Databases

Here’s how to connect to a database using Python’s built-in sqlite3 module (for quick prototyping) or SQLAlchemy (for most databases).

Example 1: SQLite

import sqlite3

# Connect to a SQLite database file (sqlite3 takes a file path, not a URL)
con = sqlite3.connect('example.db')

cur = con.cursor()

# The result of a "cursor.execute" can be iterated over by row
for row in cur.execute('SELECT * FROM tablename;'):
    print(row)

# Be sure to close the connection
con.close()
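
If you'd rather not manage close() by hand, a minimal alternative using only the standard library is contextlib.closing:

import sqlite3
from contextlib import closing

# closing() guarantees con.close() runs even if the query raises
with closing(sqlite3.connect('example.db')) as con:
    for row in con.execute('SELECT * FROM tablename;'):  # Connection.execute is a cursor shortcut
        print(row)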

Alternatively, use pandas with SQLite:

import sqlite3
import pandas as pd

# Connect to a SQLite database file (sqlite3 takes a file path, not a URL)
con = sqlite3.connect('example.db')

# Read data from SQL into Pandas
df = pd.read_sql_query("SELECT * FROM tablename", con)

print(df.head())
con.close()
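
pd.read_sql_query also accepts a params argument for safe, parameterized filters. A small sketch (run before closing the connection; the region column is hypothetical):

# '?' is SQLite's placeholder style; the driver binds the value safely
df = pd.read_sql_query(
    "SELECT * FROM tablename WHERE region = ?",
    con,
    params=("West",),
)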

Example 2: PostgreSQL (or MySQL)

import pandas as pd
import sqlalchemy as db

# Format: dialect+driver://username:password@host:port/database
engine = db.create_engine('postgresql://user:password@localhost:5432/mydatabase')

df = pd.read_sql_query('SELECT * FROM tablename', engine)

print(df.head())
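
Pandas can also write results back through the same engine, which is handy for publishing aggregates that dashboards read. A sketch, assuming hypothetical region and sales columns:

# Aggregate in pandas, then persist the summary as a table for BI tools
summary = df.groupby('region', as_index=False)['sales'].sum()
summary.to_sql('sales_by_region', engine, if_exists='replace', index=False)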

Example 3: MS SQL

import pyodbc
import pandas as pd

cnxn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER=your_db_server,your_db_port;DATABASE=your_db_name;UID=your_db_username;PWD=your_db_password')

query = "SELECT * FROM schema.tablename;"

df = pd.read_sql(query, cnxn)
print(df)

You can also use SQLAlchemy to make that connection:

import pandas as pd
import sqlalchemy as sql
import pyodbc

server = '100.10.10.10'
driver = 'ODBC+Driver+17+for+SQL+Server'  # URL-encoded driver name (spaces become '+')
db = 'master'

# With no username/password in the URL, this uses Windows (trusted) authentication
engine = sql.create_engine('mssql+pyodbc://{}/{}?driver={}'.format(server, db, driver))
df = pd.read_sql_query('SELECT * FROM tablename', engine)
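
On SQLAlchemy 1.4+, URL.create is a less error-prone way to build that URL, since it handles the encoding (such as the '+' in the driver name) for you:

from sqlalchemy.engine import URL

# Build the connection URL from parts instead of string formatting
connection_url = URL.create(
    "mssql+pyodbc",
    username="your_db_username",
    password="your_db_password",
    host="100.10.10.10",
    database="master",
    query={"driver": "ODBC Driver 17 for SQL Server"},
)
engine = sql.create_engine(connection_url)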

2. Using PySpark for Spark SQL

Spark SQL is ideal for big data and distributed clusters (e.g., Databricks, AWS EMR, or local Spark).

from pyspark.sql import SparkSession

# Start Spark session
spark = (
    SparkSession.builder
    .appName("BI with Spark SQL")
    .config("spark.some.config.option", "some-value")
    .getOrCreate()
)

# Read CSV or Parquet
df_spark = spark.read.csv("sales.csv", header=True, inferSchema=True)
df_spark.createOrReplaceTempView("sales")

# Run SQL
result = spark.sql("SELECT region, SUM(sales) as total_sales FROM sales GROUP BY region")
result.show()

spark.stop()
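
Spark can also read from relational databases directly over JDBC. A sketch, assuming the PostgreSQL JDBC driver jar is on Spark's classpath:

# Load a table over JDBC and register it for Spark SQL
df_jdbc = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://localhost:5432/mydatabase")
    .option("dbtable", "tablename")
    .option("user", "user")
    .option("password", "password")
    .load()
)
df_jdbc.createOrReplaceTempView("tablename")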

3. Common BI Use Cases (SQL + Python + Spark)

Let’s walk through real-world BI tasks using both SQLAlchemy + pandas and Spark SQL.

Use Case 1: Sales Over Time

SQLAlchemy Version

import matplotlib.pyplot as plt

# Aggregate sales by day (same query as the Spark version below)
query = """
SELECT order_date, SUM(sales) AS total_sales
FROM sales
GROUP BY order_date
ORDER BY order_date
"""
df = pd.read_sql_query(query, engine)

plt.figure(figsize=(10, 5))
plt.plot(df['order_date'], df['total_sales'], marker='o')
plt.title('Sales Over Time')
plt.xlabel('Date')
plt.ylabel('Total Sales')
plt.grid(True)
plt.show()
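
Daily points can be noisy. One optional follow-up, assuming order_date parses as a date, is to roll the series up to months in pandas:

# Monthly rollup: index by date, then resample and sum
df['order_date'] = pd.to_datetime(df['order_date'])
monthly = df.set_index('order_date')['total_sales'].resample('M').sum()
monthly.plot(kind='bar', title='Monthly Sales')
plt.show()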

Spark SQL Version

query = """
SELECT order_date, SUM(sales) AS total_sales
FROM sales
GROUP BY order_date
ORDER BY order_date
"""
result = spark.sql(query)
result.show()

Tip: For Spark, use .toPandas() to convert results to pandas for visualization:

result.toPandas().plot(x='order_date', y='total_sales')
  

Use Case 2: Top 10 Products by Revenue

Python + SQLAlchemy:

import seaborn as sns

# Example query (adjust to your schema): top 10 products by total revenue
query = """
SELECT product_name, SUM(sales) AS revenue
FROM sales
GROUP BY product_name
ORDER BY revenue DESC
LIMIT 10
"""
df = pd.read_sql_query(query, engine)

sns.barplot(data=df, x='revenue', y='product_name')
plt.title("Top 10 Products by Revenue")
plt.xlabel("Revenue")
plt.ylabel("Product")
plt.show()
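
To embed the chart in a report instead of only displaying it, save it with matplotlib (call savefig before plt.show(), since show() can clear the figure):

plt.savefig('top_products.png', dpi=150, bbox_inches='tight')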

Spark SQL:

# Reuse the same query; Spark SQL also supports LIMIT
spark.sql(query).toPandas().plot(kind='barh', x='product_name', y='revenue')
plt.show()

Use Case 3: Flagging Churned Customers (Spark SQL)

Flag customers as churned when their last order is more than 180 days old:

query = """
SELECT customer_id,
       MAX(order_date) AS last_order,
       CASE WHEN MAX(order_date) < DATE_SUB(CURRENT_DATE(), 180) THEN 1 ELSE 0 END AS churn
FROM sales
GROUP BY customer_id
"""
spark.sql(query).show()
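
For smaller datasets, the same churn logic is easy to express in pandas. A sketch, where df is a hypothetical frame with customer_id and order_date columns:

# Last order per customer, then flag anyone inactive for 180+ days
df['order_date'] = pd.to_datetime(df['order_date'])
last_orders = df.groupby('customer_id')['order_date'].max().reset_index(name='last_order')
cutoff = pd.Timestamp.today() - pd.Timedelta(days=180)
last_orders['churn'] = (last_orders['last_order'] < cutoff).astype(int)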

Bonus: Automating BI Pipelines

You can automate Spark + SQL pipelines by:

  • Scheduling jobs with Apache Airflow (see the sketch below)
  • Running Databricks Jobs or EMR Steps
  • Saving results to Delta/Parquet for dashboards
  • Emailing PDF/Excel reports with Python

For example, exporting a daily summary to Excel (requires an Excel writer such as openpyxl):

result.toPandas().to_excel("daily_summary.xlsx", index=False)
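
As an illustration of the Airflow option, here is a minimal DAG sketch (assuming Airflow 2.x; the dag_id, task_id, and callable are hypothetical):

from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def run_daily_summary():
    # Connect, run the SQL, and export the Excel file as shown above
    ...

with DAG(
    dag_id="daily_bi_summary",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="build_report", python_callable=run_daily_summary)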
  

Final Thoughts

Using Python + SQL + Spark SQL gives you:

✅ Traditional database querying (SQL)
✅ Powerful processing at scale (Spark SQL)
✅ Rich analysis, visualization, and automation (Python)

This tech stack is essential for modern BI analysts who work with growing data volumes and need deeper insights.

