# First Post

## Introduction
Welcome to my first blog post. This is a space where I explore various topics in software engineering, from database queries to data pipelines and system design. Grab a coffee and let’s dive in.
The best way to learn is by doing. Throughout this post, I’ll walk through some real examples that cover SQL analytics, Python scripting, system architecture, and a bit of web styling.
## Analyzing User Engagement with SQL
One of the first things you might want to do with a new product is understand how users are interacting with it. Here's a query that calculates daily active users and session counts over the last 90 days, broken down by signup cohort:
```sql
WITH daily_activity AS (
    SELECT
        u.id AS user_id,
        DATE_TRUNC('week', u.created_at) AS cohort_week,
        d.activity_date,
        COUNT(DISTINCT d.session_id) AS sessions
    FROM users u
    JOIN daily_events d ON u.id = d.user_id
    WHERE d.activity_date >= CURRENT_DATE - INTERVAL '90 days'
    GROUP BY u.id, cohort_week, d.activity_date
),
cohort_daily AS (
    SELECT
        cohort_week,
        activity_date,
        COUNT(DISTINCT user_id) AS active_users,
        SUM(sessions) AS total_sessions
    FROM daily_activity
    GROUP BY cohort_week, activity_date
)
SELECT
    cohort_week,
    activity_date,
    active_users,
    total_sessions,
    ROUND(total_sessions::NUMERIC / active_users, 2) AS avg_sessions_per_user
FROM cohort_daily
ORDER BY cohort_week, activity_date;
```
This gives you a clear picture of whether newer cohorts are more or less engaged than older ones. If the avg_sessions_per_user is declining across cohorts, it might be time to revisit the onboarding flow.
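If you want a true rolling 7-day active user count rather than daily snapshots, note that Postgres won't let you use `COUNT(DISTINCT ...)` as a window function. One approach is to join a calendar of days to the activity that falls in the preceding seven days. Here's an untested sketch that assumes the `daily_activity` CTE from the query above is repeated in the same `WITH` clause:

```sql
-- Untested sketch: rolling 7-day active users per cohort. Assumes daily_activity
-- is the CTE defined in the previous query.
SELECT
    days.day::DATE AS as_of_date,
    a.cohort_week,
    COUNT(DISTINCT a.user_id) AS rolling_7d_active_users
FROM generate_series(
         CURRENT_DATE - INTERVAL '90 days',
         CURRENT_DATE,
         INTERVAL '1 day'
     ) AS days(day)
JOIN daily_activity a
  ON a.activity_date >  days.day - INTERVAL '7 days'
 AND a.activity_date <= days.day
GROUP BY as_of_date, a.cohort_week
ORDER BY a.cohort_week, as_of_date;
```
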
## Building a Simple ETL Pipeline in Python
Once you have the data, you’ll want to move it somewhere useful. Here’s a lightweight ETL script that extracts records from a Postgres database, transforms them, and loads them into a target table:
```python
import logging
from dataclasses import dataclass
from datetime import datetime

import psycopg2
from psycopg2.extras import execute_values

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


@dataclass
class PipelineConfig:
    source_dsn: str
    target_dsn: str
    batch_size: int = 5000


def extract(config: PipelineConfig, since: datetime) -> list[dict]:
    """Pull raw event records from the source database."""
    query = """
        SELECT user_id, event_type, payload, created_at
        FROM raw_events
        WHERE created_at >= %s
        ORDER BY created_at
    """
    with psycopg2.connect(config.source_dsn) as conn:
        with conn.cursor() as cur:
            cur.execute(query, (since,))
            columns = [desc[0] for desc in cur.description]
            return [dict(zip(columns, row)) for row in cur.fetchall()]


def transform(records: list[dict]) -> list[tuple]:
    """Clean and reshape records for the analytics schema."""
    transformed = []
    for r in records:
        event_date = r["created_at"].date()
        category = classify_event(r["event_type"])
        transformed.append((
            r["user_id"],
            event_date,
            r["event_type"],
            category,
            r["payload"],
        ))
    return transformed


def classify_event(event_type: str) -> str:
    """Map a raw event type onto a broad reporting category."""
    mapping = {
        "page_view": "engagement",
        "signup": "acquisition",
        "purchase": "monetization",
        "referral": "growth",
    }
    return mapping.get(event_type, "other")


def load(config: PipelineConfig, rows: list[tuple]) -> int:
    """Bulk-insert transformed rows into the target table."""
    insert_sql = """
        INSERT INTO analytics_events
            (user_id, event_date, event_type, category, payload)
        VALUES %s
        ON CONFLICT (user_id, event_date, event_type) DO NOTHING
    """
    with psycopg2.connect(config.target_dsn) as conn:
        with conn.cursor() as cur:
            # Insert in batches so a large backfill doesn't build one giant statement.
            for i in range(0, len(rows), config.batch_size):
                batch = rows[i : i + config.batch_size]
                execute_values(cur, insert_sql, batch)
        conn.commit()
    return len(rows)


if __name__ == "__main__":
    cfg = PipelineConfig(
        source_dsn="postgresql://src_user:pass@localhost:5432/source_db",
        target_dsn="postgresql://tgt_user:pass@localhost:5432/target_db",
    )
    since = datetime(2026, 1, 1)

    logger.info("Starting extraction since %s", since)
    raw = extract(cfg, since)
    logger.info("Extracted %d records", len(raw))

    cleaned = transform(raw)
    logger.info("Transformed %d records", len(cleaned))

    count = load(cfg, cleaned)
    logger.info("Loaded %d records into target", count)
```
The pattern is intentionally simple – extract, transform, load – but it covers batch processing, conflict handling, and basic logging, which is enough to get a pipeline running in production.
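Because `transform()` and `classify_event()` are pure functions, they're easy to unit-test before any databases get involved. Here's a minimal pytest-style check; the `pipeline` module name is hypothetical and assumes the script above is saved as `pipeline.py`:

```python
from datetime import datetime

from pipeline import transform  # hypothetical module name for the script above


def test_transform_maps_known_and_unknown_events():
    records = [
        {"user_id": 1, "event_type": "purchase", "payload": "{}",
         "created_at": datetime(2026, 1, 2, 12, 0)},
        {"user_id": 2, "event_type": "mystery", "payload": "{}",
         "created_at": datetime(2026, 1, 3, 9, 30)},
    ]

    rows = transform(records)

    # Known event types map to their category; unknown ones fall back to "other".
    assert rows[0] == (1, datetime(2026, 1, 2).date(), "purchase", "monetization", "{}")
    assert rows[1][3] == "other"
```
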
## System Architecture Overview
Here’s a high-level view of how these pieces fit together in a typical data platform:
```mermaid
graph TD
    A[Web Application] -->|Events| B[Message Queue]
    B --> C[Stream Processor]
    C --> D[(Raw Events DB)]
    D --> E[ETL Pipeline]
    E --> F[(Analytics DB)]
    F --> G[Dashboard]
    F --> H[ML Feature Store]

    style A fill:#4a90d9,stroke:#2c5f8a,color:#fff
    style B fill:#f5a623,stroke:#c17d1a,color:#fff
    style D fill:#7ed321,stroke:#5a9a18,color:#fff
    style F fill:#7ed321,stroke:#5a9a18,color:#fff
    style G fill:#bd10e0,stroke:#8a0ba3,color:#fff
    style H fill:#bd10e0,stroke:#8a0ba3,color:#fff
```
And a sequence diagram showing how a single event flows through the system:
```mermaid
sequenceDiagram
    participant User
    participant App
    participant Queue
    participant Processor
    participant DB

    User->>App: Clicks "Purchase"
    App->>Queue: Publish purchase_event
    Queue->>Processor: Deliver message
    Processor->>Processor: Validate & enrich
    Processor->>DB: INSERT INTO raw_events
    DB-->>Processor: Acknowledge
    Processor-->>Queue: ACK message
```
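To make the Processor step concrete, here's a sketch of what that consumer loop could look like in Python. The diagram doesn't prescribe a broker, so this assumes RabbitMQ via pika purely for illustration; the queue name, validation rules, and DSN are all placeholders:

```python
import json
from datetime import datetime, timezone

import pika
import psycopg2

# Placeholder DSN; in practice this comes from config or the environment.
db_conn = psycopg2.connect("postgresql://etl_user:pass@localhost:5432/source_db")


def handle_message(ch, method, properties, body):
    event = json.loads(body)

    # Validate: dead-letter events missing required fields instead of requeueing them.
    if "user_id" not in event or "event_type" not in event:
        ch.basic_nack(delivery_tag=method.delivery_tag, requeue=False)
        return

    # Enrich: stamp the event with the time it was processed.
    event["processed_at"] = datetime.now(timezone.utc).isoformat()

    with db_conn.cursor() as cur:
        cur.execute(
            "INSERT INTO raw_events (user_id, event_type, payload, created_at) "
            "VALUES (%s, %s, %s, NOW())",
            (event["user_id"], event["event_type"], json.dumps(event)),
        )
    db_conn.commit()

    # Ack only after the insert has committed, so a crash means redelivery, not data loss.
    ch.basic_ack(delivery_tag=method.delivery_tag)


channel = pika.BlockingConnection(pika.ConnectionParameters("localhost")).channel()
channel.basic_consume(queue="purchase_events", on_message_callback=handle_message)
channel.start_consuming()
```
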
## Styling a Metrics Card with HTML
Sometimes you want to embed a quick visual element directly in your page. Here's a simple metrics card styled with inline HTML (the numbers are just placeholders):
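```html
<div style="display: inline-block; padding: 16px 24px; border-radius: 8px;
            background: #1e2a38; color: #fff; font-family: sans-serif;">
  <div style="font-size: 13px; text-transform: uppercase; letter-spacing: 1px; opacity: 0.7;">
    Weekly Active Users
  </div>
  <div style="font-size: 32px; font-weight: bold; margin-top: 4px;">12,480</div>
  <div style="font-size: 13px; color: #7ed321; margin-top: 4px;">&#9650; 8.2% vs last week</div>
</div>
```
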
And for structured comparisons, a simple table does the job:
| Database | Best For | Write Speed | Query Flexibility |
|---|---|---|---|
| PostgreSQL | General purpose OLTP | Medium | High |
| ClickHouse | Analytics / OLAP | Very High | Medium |
| Redis | Caching / Real-time | Very High | Low |
| DuckDB | Embedded analytics | High | High |
## Wrapping Up
That covers a quick tour through SQL analytics, Python ETL pipelines, system architecture diagrams, and HTML-based content formatting. Each of these tools has its place in a modern data stack, and knowing how to combine them is what makes the difference between a good engineer and a great one.
Stay curious. Ship things. Iterate.