# First Post

## Introduction
Welcome to my first blog post. This is a space where I explore various topics in software engineering, from database queries to data pipelines and system design. Grab a coffee and let’s dive in.
The best way to learn is by doing. Throughout this post, I’ll walk through some real examples that cover SQL analytics, Python scripting, system architecture, and a bit of web styling.
## Analyzing User Engagement with SQL
One of the first things you might want to do with a new product is understand how users are interacting with it. Here's a query that calculates daily active users and session counts over the last 90 days, broken down by signup cohort:
```sql
WITH daily_activity AS (
    SELECT
        u.id AS user_id,
        DATE_TRUNC('week', u.created_at) AS cohort_week,
        d.activity_date,
        COUNT(DISTINCT d.session_id) AS sessions
    FROM users u
    JOIN daily_events d ON u.id = d.user_id
    WHERE d.activity_date >= CURRENT_DATE - INTERVAL '90 days'
    GROUP BY u.id, cohort_week, d.activity_date
),
cohort_daily AS (
    SELECT
        cohort_week,
        activity_date,
        COUNT(DISTINCT user_id) AS active_users,
        SUM(sessions) AS total_sessions
    FROM daily_activity
    GROUP BY cohort_week, activity_date
)
SELECT
    cohort_week,
    activity_date,
    active_users,
    total_sessions,
    ROUND(total_sessions::NUMERIC / active_users, 2) AS avg_sessions_per_user
FROM cohort_daily
ORDER BY cohort_week, activity_date;
```
This gives you a clear picture of whether newer cohorts are more or less engaged than older ones. If the avg_sessions_per_user is declining across cohorts, it might be time to revisit the onboarding flow.
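If you want a true rolling 7-day active user count rather than daily snapshots, note that Postgres won't let you use `COUNT(DISTINCT ...)` as a window function. One approach is to join a calendar of days to the activity that falls in the preceding seven days. Here's an untested sketch that assumes the `daily_activity` CTE from the query above is repeated in the same `WITH` clause:

```sql
-- Untested sketch: rolling 7-day active users per cohort. Assumes daily_activity
-- is the CTE defined in the previous query.
SELECT
    days.day::DATE AS as_of_date,
    a.cohort_week,
    COUNT(DISTINCT a.user_id) AS rolling_7d_active_users
FROM generate_series(
         CURRENT_DATE - INTERVAL '90 days',
         CURRENT_DATE,
         INTERVAL '1 day'
     ) AS days(day)
JOIN daily_activity a
  ON a.activity_date >  days.day - INTERVAL '7 days'
 AND a.activity_date <= days.day
GROUP BY as_of_date, a.cohort_week
ORDER BY a.cohort_week, as_of_date;
```
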
## Building a Simple ETL Pipeline in Python
Once you have the data, you’ll want to move it somewhere useful. Here’s a lightweight ETL script that extracts records from a Postgres database, transforms them, and loads them into a target table:
```python
import logging
from dataclasses import dataclass
from datetime import datetime

import psycopg2
from psycopg2.extras import execute_values

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


@dataclass
class PipelineConfig:
    source_dsn: str
    target_dsn: str
    batch_size: int = 5000


def extract(config: PipelineConfig, since: datetime) -> list[dict]:
    """Pull raw event records from the source database."""
    query = """
        SELECT user_id, event_type, payload, created_at
        FROM raw_events
        WHERE created_at >= %s
        ORDER BY created_at
    """
    with psycopg2.connect(config.source_dsn) as conn:
        with conn.cursor() as cur:
            cur.execute(query, (since,))
            columns = [desc[0] for desc in cur.description]
            return [dict(zip(columns, row)) for row in cur.fetchall()]


def transform(records: list[dict]) -> list[tuple]:
    """Clean and reshape records for the analytics schema."""
    transformed = []
    for r in records:
        event_date = r["created_at"].date()
        category = classify_event(r["event_type"])
        transformed.append((
            r["user_id"],
            event_date,
            r["event_type"],
            category,
            r["payload"],
        ))
    return transformed


def classify_event(event_type: str) -> str:
    """Map a raw event type onto a broad reporting category."""
    mapping = {
        "page_view": "engagement",
        "signup": "acquisition",
        "purchase": "monetization",
        "referral": "growth",
    }
    return mapping.get(event_type, "other")


def load(config: PipelineConfig, rows: list[tuple]) -> int:
    """Bulk-insert transformed rows into the target table."""
    insert_sql = """
        INSERT INTO analytics_events
            (user_id, event_date, event_type, category, payload)
        VALUES %s
        ON CONFLICT (user_id, event_date, event_type) DO NOTHING
    """
    with psycopg2.connect(config.target_dsn) as conn:
        with conn.cursor() as cur:
            # Insert in batches so a large backfill doesn't build one giant statement.
            for i in range(0, len(rows), config.batch_size):
                batch = rows[i : i + config.batch_size]
                execute_values(cur, insert_sql, batch)
        conn.commit()
    return len(rows)


if __name__ == "__main__":
    cfg = PipelineConfig(
        source_dsn="postgresql://src_user:pass@localhost:5432/source_db",
        target_dsn="postgresql://tgt_user:pass@localhost:5432/target_db",
    )
    since = datetime(2026, 1, 1)

    logger.info("Starting extraction since %s", since)
    raw = extract(cfg, since)
    logger.info("Extracted %d records", len(raw))

    cleaned = transform(raw)
    logger.info("Transformed %d records", len(cleaned))

    count = load(cfg, cleaned)
    logger.info("Loaded %d records into target", count)
```
The pattern is intentionally simple – extract, transform, load – but it covers batch processing, conflict handling, and basic logging, which is enough to get a pipeline running in production.
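Because `transform()` and `classify_event()` are pure functions, they're easy to unit-test before any databases get involved. Here's a minimal pytest-style check; the `pipeline` module name is hypothetical and assumes the script above is saved as `pipeline.py`:

```python
from datetime import datetime

from pipeline import transform  # hypothetical module name for the script above


def test_transform_maps_known_and_unknown_events():
    records = [
        {"user_id": 1, "event_type": "purchase", "payload": "{}",
         "created_at": datetime(2026, 1, 2, 12, 0)},
        {"user_id": 2, "event_type": "mystery", "payload": "{}",
         "created_at": datetime(2026, 1, 3, 9, 30)},
    ]

    rows = transform(records)

    # Known event types map to their category; unknown ones fall back to "other".
    assert rows[0] == (1, datetime(2026, 1, 2).date(), "purchase", "monetization", "{}")
    assert rows[1][3] == "other"
```
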
## System Architecture Overview
Here’s a high-level view of how these pieces fit together in a typical data platform:
```mermaid
graph TD
    A[Web Application] -->|Events| B[Message Queue]
    B --> C[Stream Processor]
    C --> D[(Raw Events DB)]
    D --> E[ETL Pipeline]
    E --> F[(Analytics DB)]
    F --> G[Dashboard]
    F --> H[ML Feature Store]

    style A fill:#4a90d9,stroke:#2c5f8a,color:#fff
    style B fill:#f5a623,stroke:#c17d1a,color:#fff
    style D fill:#7ed321,stroke:#5a9a18,color:#fff
    style F fill:#7ed321,stroke:#5a9a18,color:#fff
    style G fill:#bd10e0,stroke:#8a0ba3,color:#fff
    style H fill:#bd10e0,stroke:#8a0ba3,color:#fff
```
And a sequence diagram showing how a single event flows through the system:
```mermaid
sequenceDiagram
    participant User
    participant App
    participant Queue
    participant Processor
    participant DB

    User->>App: Clicks "Purchase"
    App->>Queue: Publish purchase_event
    Queue->>Processor: Deliver message
    Processor->>Processor: Validate & enrich
    Processor->>DB: INSERT INTO raw_events
    DB-->>Processor: Acknowledge
    Processor-->>Queue: ACK message
```
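To make the Processor step concrete, here's a sketch of what that consumer loop could look like in Python. The diagram doesn't prescribe a broker, so this assumes RabbitMQ via pika purely for illustration; the queue name, validation rules, and DSN are all placeholders:

```python
import json
from datetime import datetime, timezone

import pika
import psycopg2

# Placeholder DSN; in practice this comes from config or the environment.
db_conn = psycopg2.connect("postgresql://etl_user:pass@localhost:5432/source_db")


def handle_message(ch, method, properties, body):
    event = json.loads(body)

    # Validate: dead-letter events missing required fields instead of requeueing them.
    if "user_id" not in event or "event_type" not in event:
        ch.basic_nack(delivery_tag=method.delivery_tag, requeue=False)
        return

    # Enrich: stamp the event with the time it was processed.
    event["processed_at"] = datetime.now(timezone.utc).isoformat()

    with db_conn.cursor() as cur:
        cur.execute(
            "INSERT INTO raw_events (user_id, event_type, payload, created_at) "
            "VALUES (%s, %s, %s, NOW())",
            (event["user_id"], event["event_type"], json.dumps(event)),
        )
    db_conn.commit()

    # Ack only after the insert has committed, so a crash means redelivery, not data loss.
    ch.basic_ack(delivery_tag=method.delivery_tag)


channel = pika.BlockingConnection(pika.ConnectionParameters("localhost")).channel()
channel.basic_consume(queue="purchase_events", on_message_callback=handle_message)
channel.start_consuming()
```
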
## Styling a Metrics Card with HTML
Sometimes you want to embed a quick visual element directly in your page. Here's a simple metrics card styled with inline HTML (the numbers are just placeholders):
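```html
<div style="display: inline-block; padding: 16px 24px; border-radius: 8px;
            background: #1e2a38; color: #fff; font-family: sans-serif;">
  <div style="font-size: 13px; text-transform: uppercase; letter-spacing: 1px; opacity: 0.7;">
    Weekly Active Users
  </div>
  <div style="font-size: 32px; font-weight: bold; margin-top: 4px;">12,480</div>
  <div style="font-size: 13px; color: #7ed321; margin-top: 4px;">&#9650; 8.2% vs last week</div>
</div>
```
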
And for structured comparisons, a simple table does the job:
| Database | Best For | Write Speed | Query Flexibility |
|---|---|---|---|
| PostgreSQL | General purpose OLTP | Medium | High |
| ClickHouse | Analytics / OLAP | Very High | Medium |
| Redis | Caching / Real-time | Very High | Low |
| DuckDB | Embedded analytics | High | High |
## Wrapping Up
That covers a quick tour through SQL analytics, Python ETL pipelines, system architecture diagrams, and HTML-based content formatting. Each of these tools has its place in a modern data stack, and knowing how to combine them is what makes the difference between a good engineer and a great one.
Stay curious. Ship things. Iterate.