Tracking Core Web Vitals Over Time with Python

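Core Web Vitals (CWVs) are not just buzzwords; they are measurable metrics that reflect the real-world user experience of a webpage. Google uses these metrics, Largest Contentful Paint (LCP), First Input Delay (FID), and Cumulative Layout Shift (CLS), as a page-experience signal in search ranking. Note that Google replaced FID with Interaction to Next Paint (INP) as a Core Web Vital in March 2024; the examples in this guide keep the `fid` field, and the same workflow applies to INP.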

However, simply tracking a snapshot of CWVs is insufficient. To truly improve performance, you need historical data: how did LCP change month-over-month? Did the CLS improvement from last quarter hold up? Python provides a powerful, programmatic way to scrape, store, process, and visualize this crucial time-series performance data.

This guide details the workflow for using Python to track Core Web Vitals over time, moving from basic data collection to actionable analysis.

⚙️ The Core Workflow

Tracking CWVs over time generally involves four distinct phases:

Data Collection: Acquiring the raw CWV data for a specific URL at a specific point in time.
Data Storage: Storing the collected data into a structured, time-series database.
Data Processing: Cleaning, normalizing, and calculating trends from the stored data.
Data Visualization: Presenting the historical trends in an understandable format.

🐍 Phase 1: Data Collection (The Scraper)

Since CWVs are deeply integrated into browser performance, reliable collection requires simulating a real user visit.

Choosing Your Tool

While you could try direct API calls (like those from Google Search Console, which are often rate-limited or lack historical granular data), a more universal approach is to use a headless browser framework.

Recommended Library: Selenium or Playwright. Playwright is often preferred for its speed and modern API.

Example: Basic Scraper Setup (Conceptual)

A basic script would initialize the browser, navigate to the target URL, wait for key elements to load, and then capture performance metrics.

```python
# Note: the actual implementation depends heavily on the metrics source (Lighthouse/API).

from datetime import datetime

from playwright.sync_api import sync_playwright

# JavaScript run inside the page: resolves with the start time (ms) of the
# latest buffered LCP entry, or null if none is reported within 5 seconds.
LCP_SCRIPT = """
() => new Promise((resolve) => {
    new PerformanceObserver((list) => {
        const entries = list.getEntries();
        resolve(entries[entries.length - 1].startTime);
    }).observe({type: 'largest-contentful-paint', buffered: true});
    setTimeout(() => resolve(null), 5000);
})
"""


def collect_cwv_data(url: str) -> dict:
    """Navigates to a URL and collects (partly simulated) performance metrics."""
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()

        # Load the page
        page.goto(url, wait_until="networkidle")

        # For accurate CWVs you would ideally run a full audit (e.g. Lighthouse)
        # or inject a complete measurement script. Here only LCP is read from
        # the browser; CLS and FID remain placeholders.
        metrics = {
            "timestamp": datetime.now().isoformat(),
            "url": url,
            "lcp": page.evaluate(LCP_SCRIPT),  # ms, from the Performance API
            "cls": 0.01,  # Placeholder
            "fid": 120,   # Placeholder
        }

        browser.close()
        return metrics


# Example usage
data = collect_cwv_data("https://example.com")
print(data)
```

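💡 Pro-Tip: Using Lighthouse: For the most authoritative lab scores, consider running Google Lighthouse audits, for example through the Lighthouse CLI or the PageSpeed Insights API, instead of hand-rolled measurements. Lighthouse is explicitly designed to calculate LCP and CLS accurately. Because FID requires a real user interaction, lab tools cannot measure it and report Total Blocking Time as a proxy.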
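The Lighthouse route can be sketched roughly as follows. This is a minimal example under stated assumptions, not a definitive implementation: it assumes the Lighthouse CLI is installed (for instance via npm) and available on the PATH, and the function names `extract_lab_metrics` and `run_lighthouse` are hypothetical helpers, not part of any library.

```python
import json
import subprocess
from datetime import datetime, timezone


def extract_lab_metrics(report: dict, url: str) -> dict:
    """Pull LCP and CLS out of a parsed Lighthouse JSON report."""
    audits = report["audits"]
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "url": url,
        # Lighthouse reports LCP in milliseconds and CLS as a unitless score.
        "lcp": audits["largest-contentful-paint"]["numericValue"],
        "cls": audits["cumulative-layout-shift"]["numericValue"],
    }


def run_lighthouse(url: str) -> dict:
    """Run the Lighthouse CLI (assumed to be on PATH) and return key metrics."""
    result = subprocess.run(
        [
            "lighthouse", url,
            "--output=json",
            "--output-path=stdout",
            "--only-categories=performance",
            "--chrome-flags=--headless",
            "--quiet",
        ],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the audit fails
    )
    return extract_lab_metrics(json.loads(result.stdout), url)
```

Calling `run_lighthouse("https://example.com")` would return a dictionary shaped like the scraper's output, minus `fid`, which a lab audit cannot observe. Keeping the parsing in its own function makes it easy to check against a saved report without launching Chrome.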

💾 Phase 2: Data Storage (The Time-Series Database)

As you run your scraper daily or weekly across many URLs, you will generate thousands of records. A database that handles time-series queries well makes this data much easier to work with.

Recommended Tools

PostgreSQL with TimescaleDB Extension: Excellent balance of power and reliability. It handles time-series queries incredibly well.
InfluxDB: A purpose-built time-series database. Highly optimized for metrics and performance tracking.
SQLite (for local testing): Suitable if your tracking is limited to a single machine or small scale.
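For the local-testing option, a minimal sketch using Python's built-in `sqlite3` module might look like this. The table name `core_web_vitals` matches the one used in the PostgreSQL example; the column types, the index name, the file name `cwv_metrics.db`, and the helper names are assumptions made for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS core_web_vitals (
    timestamp TEXT NOT NULL,   -- ISO 8601 string
    url       TEXT NOT NULL,
    lcp       REAL,            -- milliseconds
    cls       REAL,            -- unitless score
    fid       REAL             -- milliseconds
);
CREATE INDEX IF NOT EXISTS idx_cwv_url_time
    ON core_web_vitals (url, timestamp);
"""


def open_db(path: str = "cwv_metrics.db") -> sqlite3.Connection:
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def store_metrics_sqlite(conn: sqlite3.Connection, metrics: dict) -> None:
    """Insert one measurement using named, parameterized placeholders."""
    conn.execute(
        "INSERT INTO core_web_vitals (timestamp, url, lcp, cls, fid) "
        "VALUES (:timestamp, :url, :lcp, :cls, :fid)",
        metrics,
    )
    conn.commit()


def history_for(conn: sqlite3.Connection, url: str) -> list:
    """Return all measurements for one URL, oldest first."""
    rows = conn.execute(
        "SELECT timestamp, lcp, cls, fid FROM core_web_vitals "
        "WHERE url = ? ORDER BY timestamp",
        (url,),
    )
    return rows.fetchall()
```

The index on `(url, timestamp)` reflects the typical query, which filters by URL and orders by time. Sorting ISO 8601 strings gives chronological order only if every timestamp uses the same format and time zone, so that is worth enforcing at collection time.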

Implementation using `psycopg2` (for PostgreSQL)

Assuming you are using PostgreSQL, the structure of your table should be simple and optimized for time and URL.

```python
import psycopg2

DB_CONFIG = {
    "host": "localhost",
    "database": "cwv_metrics",
    "user": "user",
    "password": "password",
}


def store_metrics(metrics: dict):
    """Inserts collected CWV metrics into the PostgreSQL database."""
    conn = None
    try:
        conn = psycopg2.connect(**DB_CONFIG)
        cursor = conn.cursor()

        # SQL injection protection via parameterized queries
        sql = """
        INSERT INTO core_web_vitals (
            timestamp, url, lcp, cls, fid
        ) VALUES (%s, %s, %s, %s, %s);
        """

        cursor.execute(
            sql,
            (
                metrics["timestamp"],
                metrics["url"],
                metrics["lcp"],
                metrics["cls"],
                metrics["fid"],
            ),
        )
        conn.commit()
        print("Data successfully stored.")

    except Exception as e:
        print(f"Error storing data: {e}")
    finally:
        # Always release the connection, even if the insert failed
        if conn:
            conn.close()
```

📊 Phase 3 & 4: Analysis and Visualization

Once the data is consistently stored, the real power of Python comes into play for analysis. We use pandas for data manipulation and matplotlib or plotly for visualization.

Retrieving and Processing Data

This function pulls all historical data for a single URL.

```python
import pandas as pd

# Assuming you have a function to connect and query the DB


def get_historical_data(target_url: str) -> pd.DataFrame:
    """Fetches all CWV data for a given URL and returns a DataFrame."""
    # Placeholder: in reality, this function queries your database
    data = {
        "timestamp": pd.to_datetime(["2023-10-01", "2023-11-01", "2023-12-01", "2024-01-01"]),
        "lcp": [2800, 2500, 2300, 2100],  # Improving LCP (lower is better)
        "cls": [0.15, 0.12, 0.09, 0.07],  # Improving CLS (lower is better)
        "fid": [180, 150, 120, 100],
    }
    df = pd.DataFrame(data)
    df["url"] = target_url
    df = df.sort_values(by="timestamp").reset_index(drop=True)
    return df


# Get the data
history_df = get_historical_data("https://your-tracked-site.com")
print("Historical Data Snapshot:")
print(history_df)
```
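The "calculating trends" part of the processing phase comes down to simple arithmetic. The sketch below computes the month-over-month percentage change for the sample LCP values from the placeholder data above. It is deliberately dependency-free, and `percent_change` is a hypothetical helper name; on a DataFrame, pandas' `pct_change()` method does a similar job.

```python
def percent_change(values: list) -> list:
    """Return the percentage change from each value to the next.

    An entry is None when there is no previous value to compare against,
    or when the previous value is zero (the change is undefined).
    """
    changes = []
    for i, current in enumerate(values):
        previous = values[i - 1] if i > 0 else None
        if not previous:  # first entry, or a zero baseline
            changes.append(None)
        else:
            changes.append(round((current - previous) / previous * 100, 1))
    return changes


# Sample LCP values (ms), one per month
lcp = [2800, 2500, 2300, 2100]
print(percent_change(lcp))  # → [None, -10.7, -8.0, -8.7]
```

Negative values mean the metric dropped, which is an improvement for LCP, CLS, and FID alike.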

Visualizing Trends with Plotly

For interactive dashboards, Plotly is often a better fit than standard matplotlib, since its charts support hovering, zooming, and panning out of the box.

```python
import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots


def visualize_cwv_trends(df: pd.DataFrame):
    """Generates interactive trend lines for LCP, CLS, and FID."""
    fig = make_subplots(
        rows=3, cols=1,
        shared_xaxes=True,
        vertical_spacing=0.05,
        subplot_titles=("LCP Trend (ms)", "CLS Trend (Score)", "FID Trend (ms)"),
    )

    # LCP (Goal: Decrease)
    fig.add_trace(go.Scatter(x=df["timestamp"], y=df["lcp"], mode="lines+markers", name="LCP"), row=1, col=1)

    # CLS (Goal: Decrease)
    fig.add_trace(go.Scatter(x=df["timestamp"], y=df["cls"], mode="lines+markers", name="CLS"), row=2, col=1)

    # FID (Goal: Decrease)
    fig.add_trace(go.Scatter(x=df["timestamp"], y=df["fid"], mode="lines+markers", name="FID"), row=3, col=1)

    fig.update_layout(
        title_text="Core Web Vitals Performance Over Time",
        height=800,
    )
    # Label the shared x-axis once, on the bottom subplot; the subplot
    # titles already carry the unit of each y-axis.
    fig.update_xaxes(title_text="Date", row=3, col=1)

    fig.show()  # In a Jupyter/Colab environment, this displays the graph
```

💡 Summary and Best Practices

By combining scraping, time-series storage, and advanced visualization, you transform raw performance numbers into actionable insights.

Automation is Key: Schedule the entire process (Collection → Storage) using a job scheduler like cron (Linux) or Airflow.
Define Baselines: Always track against a clear baseline (e.g., the performance metrics collected the month before the major site redesign).
Correlate with Changes: When a metric changes dramatically, immediately cross-reference the timestamp with recent code deployments or content changes to determine root causes.
Focus on Trends: Don’t panic over a single bad score. Focus on the long-term trend line. Consistent improvement, even slight, indicates successful performance engineering efforts.
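The baseline and trend practices above can be sketched as a small check that runs after each collection. The "good" and "poor" boundaries are Google's published thresholds for each metric; the 10% tolerance and the function names `rate` and `flag_regressions` are assumptions chosen for illustration, so tune them to your own data.

```python
# Google's published (good, poor) boundaries for each metric.
THRESHOLDS = {
    "lcp": (2500, 4000),  # milliseconds
    "cls": (0.1, 0.25),   # unitless score
    "fid": (100, 300),    # milliseconds
}


def rate(metric: str, value: float) -> str:
    """Classify a single measurement against the thresholds."""
    good, poor = THRESHOLDS[metric]
    if value <= good:
        return "good"
    if value <= poor:
        return "needs improvement"
    return "poor"


def flag_regressions(baseline: dict, latest: dict, tolerance: float = 0.10) -> list:
    """List metrics that are worse than the baseline by more than `tolerance`.

    All three metrics are lower-is-better, so "worse" means "higher".
    The default tolerance of 10% is an arbitrary starting point.
    """
    flagged = []
    for metric in THRESHOLDS:
        if latest[metric] > baseline[metric] * (1 + tolerance):
            flagged.append(metric)
    return flagged


baseline = {"lcp": 2800, "cls": 0.15, "fid": 180}
latest = {"lcp": 3200, "cls": 0.15, "fid": 150}
print(flag_regressions(baseline, latest))  # → ['lcp']
```

A tolerance band keeps ordinary run-to-run noise from triggering alerts, which fits the advice to watch the trend instead of a single score.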
