If you’re working as a data analyst today, Python is no longer just a “nice to have.” It’s part of the daily toolkit.
But here’s the reality:
Most analysts don’t need dozens of libraries — they need a small, reliable stack, used well.
This guide goes beyond listing packages. It shows:
- what each library is actually used for
- where it fits in real workflows
- how analysts use it with practical code examples
1. Data Manipulation: Where Analysts Spend Most of Their Time
Pandas — Your Core Tool
Pandas is the backbone of almost every Python-based analysis.
Example: Cleaning and Aggregating Data
import pandas as pd

# Load data
df = pd.read_csv("orders.csv")

# Basic inspection
print(df.head())
df.info()

# Convert timestamp
df["order_date"] = pd.to_datetime(df["order_date"])

# Filter valid orders
df = df[df["status"] == "completed"]

# Create revenue column
df["revenue"] = df["price"] * df["quantity"]

# Aggregate by week
weekly_revenue = (
    df.groupby(pd.Grouper(key="order_date", freq="W"))
    .agg(total_revenue=("revenue", "sum"))
    .reset_index()
)

print(weekly_revenue.head())
What this reflects in real work:
- cleaning messy data
- building derived metrics
- time-based aggregation
NumPy — Efficient Calculations
Used for vectorized operations and performance-critical calculations.
Example: Conditional Logic at Scale
import numpy as np

df["high_value_flag"] = np.where(df["revenue"] > 100, 1, 0)
This avoids slow row-by-row operations.
2. SQL + Python: Real-World Data Access
Most analysts don’t work with CSV files — they work with databases.
SQLAlchemy + Pandas
Example: Pull Data from Database
from sqlalchemy import create_engine
import pandas as pd

engine = create_engine("postgresql://user:password@host:port/db")

query = """
SELECT user_id, order_date, price, quantity
FROM orders
WHERE order_date >= '2024-01-01'
"""

df = pd.read_sql(query, engine)
Why this matters:
- integrates SQL + Python workflow
- avoids exporting data manually
- enables reproducible pipelines
3. Data Visualization: Making Patterns Visible
Seaborn — Fast, Clean Visuals
Example: Distribution Analysis
import seaborn as sns
import matplotlib.pyplot as plt

sns.histplot(df["revenue"], bins=50)
plt.title("Revenue Distribution")
plt.show()
Example: Relationship Between Variables
sns.scatterplot(x="price", y="revenue", data=df)
plt.title("Price vs Revenue")
plt.show()
Example: Group Comparison
sns.boxplot(x="region", y="revenue", data=df)
plt.xticks(rotation=45)
plt.title("Revenue by Region")
plt.show()
What this reflects:
- understanding distributions
- detecting outliers
- comparing segments
Plotly — Interactive Charts
import plotly.express as px

fig = px.line(
    weekly_revenue,
    x="order_date",
    y="total_revenue",
    title="Weekly Revenue Trend",
)
fig.show()
Useful when sharing with stakeholders.
4. Data Cleaning & Feature Creation
Handling Missing Data
# Check missing values
df.isnull().sum()

# Fill missing values
df["price"] = df["price"].fillna(df["price"].median())

# Drop rows with critical missing data
df = df.dropna(subset=["user_id"])
Feature Engineering (Very Important)
# Recency feature
df["days_since_last_order"] = (
    df["order_date"].max() - df["order_date"]
).dt.days

# Customer-level aggregation
user_features = (
    df.groupby("user_id")
    .agg(
        total_orders=("order_id", "count"),
        avg_order_value=("revenue", "mean"),
        last_order_days=("days_since_last_order", "min"),
    )
    .reset_index()
)
This is the bridge between BI and data science.
5. Statistical Analysis & Experimentation
statsmodels — Interpretable Models
Example: Linear Regression
import statsmodels.api as sm

X = user_features[["total_orders", "avg_order_value"]]
y = user_features["last_order_days"]

X = sm.add_constant(X)
model = sm.OLS(y, X).fit()
print(model.summary())
Why analysts use this:
- clear coefficients
- statistical significance
- explainability
A/B Test Example (Simple)
from scipy import stats

control = df[df["group"] == "control"]["conversion"]
treatment = df[df["group"] == "treatment"]["conversion"]

t_stat, p_value = stats.ttest_ind(treatment, control)
print(f"T-stat: {t_stat}, P-value: {p_value}")
6. Machine Learning (Applied, Not Overkill)
scikit-learn — Practical Modeling
Example: Simple Churn Model
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X = user_features[["total_orders", "avg_order_value"]]
y = user_features["churn_flag"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = LogisticRegression()
model.fit(X_train, y_train)

preds = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, preds))
Key point:
Analysts don’t need complex models — but understanding simple models is powerful.
7. Working with APIs
Requests — Pull External Data
import requests

response = requests.get("https://api.example.com/users")
data = response.json()

df_api = pd.DataFrame(data)
Used for:
- data enrichment
- automation
- integrations
8. Automation & Workflow Tools
Jupyter Notebook — Where Analysis Happens
- combine code + explanation
- share results easily
- ideal for exploration
tqdm — Track Long Processes
from tqdm import tqdm

for i in tqdm(range(1000)):
    pass
File Automation
import os

files = os.listdir("data/")
print(files)
9. A Real End-to-End Workflow
Let’s connect everything.
Example: Revenue Analysis + Prediction
- Extract data → SQLAlchemy
- Clean data → Pandas
- Explore patterns → Seaborn
- Build features → Pandas
- Test hypothesis → SciPy
- Build model → scikit-learn
- Visualize output → Plotly
This is what real analyst workflows look like.
10. What Actually Matters (Not the Number of Libraries)
It’s easy to think:
“Good analysts know many libraries.”
In reality:
Good analysts know how to use a few tools extremely well.
Focus on:
- Pandas
- SQL integration
- visualization
- basic statistics
- simple models
Everything else builds on top of that.
Final Thoughts
Python is not about memorizing libraries.
It’s about building a system that allows you to:
- move from raw data → insight
- move from insight → decision
- do it reliably and repeatedly
Most analysts succeed not because they know more tools,
but because they use the right ones with clarity and judgment.