Building AI Agents That React to Website Changes

Most AI agents are reactive. They answer questions, process requests, and follow instructions. But the most useful agents are proactive. They watch for changes in the world and take action before anyone asks.

Website changes are one of the richest signals an AI agent can monitor. A competitor changes their pricing. A regulatory body publishes new guidance. A documentation page gets updated. A product goes back in stock. Each of these events could trigger an agent to update a knowledge base, alert a team, adjust a strategy, or take automated action.

This tutorial shows how to build AI agents that react to website changes using PageCrawl webhooks. We will cover three practical patterns: keeping a RAG knowledge base fresh, generating competitive intelligence alerts, and monitoring regulatory pages for compliance.

How It Works

The architecture is simple:

  1. PageCrawl monitors web pages on a schedule you configure (from once a minute to once a week)
  2. When content changes, PageCrawl detects the difference, generates an AI summary, and fires a webhook
  3. Your agent receives the webhook with structured change data (what changed, AI summary, diff, screenshots)
  4. Your agent takes action based on the change (update a database, send an alert, trigger a workflow)

You do not need to build any scraping, comparison, or scheduling logic. PageCrawl handles all of that. Your agent only needs an HTTP endpoint that processes incoming change events.
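
As a rough illustration of step 4, the sketch below routes an incoming change event to a handler based on the hostname of the changed page. The handler functions, the ROUTES table, and the example hostnames are hypothetical placeholders; the only payload field it relies on is page.url, which appears in the payload reference later in this tutorial.

```python
from urllib.parse import urlparse

# Hypothetical handlers, one per pattern covered in this tutorial.
def refresh_knowledge_base(event: dict) -> str:
    return "rag"

def send_competitor_brief(event: dict) -> str:
    return "competitive"

def log_compliance_change(event: dict) -> str:
    return "compliance"

# Hypothetical mapping from monitored hostname to handler.
ROUTES = {
    "docs.example.com": refresh_knowledge_base,
    "competitor.example.com": send_competitor_brief,
    "regulator.example.org": log_compliance_change,
}

def dispatch(event: dict) -> str:
    """Pick a handler from the hostname of the changed page."""
    url = event.get("page", {}).get("url", "")
    host = urlparse(url).hostname or ""
    handler = ROUTES.get(host)
    if handler is None:
        return "ignored"  # Unknown source: acknowledge and do nothing
    return handler(event)
```

A single endpoint with a routing table like this is one way to serve several monitors from one agent; separate webhooks per pattern would work just as well.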

Setting Up Monitors via API

First, create monitors for the pages your agent should watch. You can do this via the PageCrawl API or the MCP server.

import requests

API_TOKEN = "your_api_token"
BASE_URL = "https://pagecrawl.io/api"
HEADERS = {"Authorization": f"Bearer {API_TOKEN}"}

# Monitor a documentation page
response = requests.post(
    f"{BASE_URL}/track-simple",
    headers=HEADERS,
    json={
        "url": "https://docs.example.com/api-reference",
        "tracking_mode": "content_only",
        "frequency": 60,  # hourly
    },
)
print(f"Monitor created: {response.json()['id']}")
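
If you need to watch several pages, the same request can be repeated in a loop. The following is a minimal sketch that reuses the track-simple endpoint and the three body fields from the example above. The helper names build_monitor_payloads and create_monitors are hypothetical, and the HTTP function is passed in as a parameter (for example requests.post) so the payload logic can be checked without network access.

```python
def build_monitor_payloads(urls, frequency=60, tracking_mode="content_only"):
    """Return one request body per URL, skipping blanks and duplicates."""
    seen = set()
    payloads = []
    for url in urls:
        url = url.strip()
        if not url or url in seen:
            continue
        seen.add(url)
        payloads.append(
            {"url": url, "tracking_mode": tracking_mode, "frequency": frequency}
        )
    return payloads

def create_monitors(urls, post, base_url, headers):
    """Send each payload with `post` (e.g. requests.post) and collect the JSON replies."""
    results = []
    for payload in build_monitor_payloads(urls):
        response = post(
            f"{base_url}/track-simple", headers=headers, json=payload, timeout=30
        )
        response.raise_for_status()  # Stop on the first failed request
        results.append(response.json())
    return results
```

With the variables defined earlier, a call would look like create_monitors(urls, requests.post, BASE_URL, HEADERS).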

Setting Up Webhooks

Create a webhook that PageCrawl will call when changes are detected:

response = requests.post(
    f"{BASE_URL}/hooks",
    headers=HEADERS,
    json={
        "target_url": "https://your-agent.example.com/webhook",
        "match_type": "all",
        "events": ["change_detected"],
        "payload_fields": [
            "title",
            "contents",
            "markdown_difference",
            "ai_summary",
            "ai_priority_score",
            "page",
        ],
    },
)
print(f"Webhook created: {response.json()['id']}")

The payload_fields parameter controls exactly what data your agent receives. To keep payloads lean, request only the fields your agent actually uses.
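
Because the payload depends on which fields you request, a handler is more robust if it tolerates missing values. The sketch below normalizes the six fields requested above into a small dataclass. The ChangeEvent class and parse_change_event function are hypothetical names, and the handling of absent or null fields is a defensive assumption rather than documented PageCrawl behavior.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ChangeEvent:
    url: str
    title: str
    contents: str
    diff: str
    summary: str
    priority: Optional[int]

def parse_change_event(payload: dict) -> ChangeEvent:
    """Read the requested fields, tolerating ones that are absent or null."""
    page = payload.get("page") or {}
    score = payload.get("ai_priority_score")
    return ChangeEvent(
        url=page.get("url", ""),
        title=payload.get("title") or "",
        contents=payload.get("contents") or "",
        diff=payload.get("markdown_difference") or "",
        summary=payload.get("ai_summary") or "",
        priority=int(score) if score is not None else None,
    )
```

Parsing once at the top of the handler keeps the rest of the agent free of scattered data.get calls.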

Pattern 1: Keeping a RAG Knowledge Base Fresh

The most common problem with RAG pipelines is stale data. You ingest documentation once, and within weeks the source material has changed but your vector database still has the old content. Your AI gives wrong answers because it is working with outdated information.

PageCrawl webhooks solve this by triggering re-ingestion only when content actually changes.

Python Implementation

from flask import Flask, request
import openai
import chromadb

app = Flask(__name__)
chroma = chromadb.HttpClient()
collection = chroma.get_or_create_collection("docs")

@app.route("/webhook", methods=["POST"])
def handle_change():
    data = request.json

    page_url = data["page"]["url"]
    new_content = data["contents"]
    ai_summary = data.get("ai_summary", "")

    # Generate embedding for the updated content
    embedding = openai.embeddings.create(
        model="text-embedding-3-small",
        input=new_content,
    )

    # Upsert into vector database
    collection.upsert(
        ids=[page_url],
        documents=[new_content],
        embeddings=[embedding.data[0].embedding],
        metadatas=[{
            "url": page_url,
            "title": data["title"],
            "last_updated": data["page"].get("last_checked_at", ""),
            "change_summary": ai_summary,
        }],
    )

    print(f"Updated RAG entry: {data['title']}")
    print(f"Change: {ai_summary}")

    return "", 200

Node.js Implementation

import express from "express";
import { OpenAI } from "openai";
import { ChromaClient } from "chromadb";

const app = express();
app.use(express.json());

const openai = new OpenAI();
const chroma = new ChromaClient();

app.post("/webhook", async (req, res) => {
  const data = req.body;
  const pageUrl = data.page.url;
  const newContent = data.contents;

  const embedding = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: newContent,
  });

  const collection = await chroma.getOrCreateCollection({ name: "docs" });
  await collection.upsert({
    ids: [pageUrl],
    documents: [newContent],
    embeddings: [embedding.data[0].embedding],
    metadatas: [{
      url: pageUrl,
      title: data.title,
      changeSummary: data.ai_summary || "",
    }],
  });

  console.log(`Updated: ${data.title}`);
  res.sendStatus(200);
});

app.listen(3000);

This agent only processes pages that actually changed. If you monitor 500 documentation pages and only 3 change in a week, you re-embed 3 documents, not 500.
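
Both handlers above embed each page as a single input, which can fail or lose detail on long pages because embedding models cap input length. One common workaround is to split the content into chunks and give each chunk a stable ID derived from the page URL. The sketch below is an assumption about how you might do that; the function names, the 2000-character default, and the ID format are illustrative, and character count is only a rough stand-in for tokens.

```python
def chunk_text(text: str, max_chars: int = 2000) -> list[str]:
    """Group paragraphs into chunks of at most max_chars characters."""
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # Hard-split any single paragraph that is too long on its own
        while len(paragraph) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) > max_chars:
            chunks.append(current)  # Close the current chunk, start a new one
            current = paragraph
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

def chunk_ids(page_url: str, count: int) -> list[str]:
    """Stable IDs so a later upsert overwrites the same chunk positions."""
    return [f"{page_url}#chunk-{i}" for i in range(count)]
```

Each chunk would then be embedded and upserted under its own ID. If a page shrinks, higher-numbered chunks from the previous version would linger, so deleting the existing entries for that URL before upserting (for example with a metadata filter on url) is worth considering.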

Pattern 2: Competitive Intelligence Agent

This agent monitors competitor websites and generates a structured intelligence brief whenever a change is detected.

from flask import Flask, request
from openai import OpenAI

app = Flask(__name__)
llm = OpenAI()  # Or any LLM provider of your choice

@app.route("/webhook", methods=["POST"])
def handle_change():
    data = request.json

    # Use the AI summary and diff to generate intelligence
    analysis = llm.chat.completions.create(
        model="gpt-4o",  # Use any model you prefer
        messages=[{
            "role": "user",
            "content": f"""Analyze this competitor website change and provide a structured intelligence brief.

Page: {data['title']} ({data['page']['url']})
AI Summary of Change: {data.get('ai_summary', 'N/A')}
Priority Score: {data.get('ai_priority_score', 'N/A')}/100
Diff: {data.get('markdown_difference', 'N/A')}

Provide:
1. What changed (one sentence)
2. Strategic significance (low/medium/high and why)
3. Recommended action for our team
4. Which internal team should be notified (sales, product, marketing, legal)"""
        }],
    )

    brief = analysis.choices[0].message.content

    # Send to Slack, email, or your internal tool (send_to_slack is a helper you provide)
    send_to_slack(
        channel="#competitive-intel",
        text=f"*Competitor Change Detected*\n"
             f"*Page:* {data['title']}\n"
             f"*URL:* {data['page']['url']}\n\n"
             f"{brief}",
    )

    return "", 200

This agent adds a layer of strategic analysis on top of PageCrawl's change detection: PageCrawl tells you what changed, and the agent tells you what it means.
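
The handler above calls send_to_slack without defining it. A minimal sketch of such a helper, assuming you use Slack incoming webhooks, might look like the following. The SLACK_WEBHOOKS table and the environment variable name are hypothetical; since an incoming webhook URL is tied to a channel when it is created, this version looks up one URL per channel name.

```python
import json
import os
import urllib.request

# Assumption: one Slack incoming-webhook URL per channel, read from the environment.
SLACK_WEBHOOKS = {
    "#competitive-intel": os.environ.get("SLACK_WEBHOOK_COMPETITIVE_INTEL", ""),
}

def build_slack_request(channel: str, text: str) -> urllib.request.Request:
    """Build the POST request for the webhook mapped to `channel`."""
    webhook_url = SLACK_WEBHOOKS.get(channel)
    if not webhook_url:
        raise ValueError(f"No Slack webhook configured for {channel}")
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def send_to_slack(channel: str, text: str) -> None:
    """Post `text` to Slack; raises on network or HTTP errors."""
    request = build_slack_request(channel, text)
    with urllib.request.urlopen(request, timeout=10) as response:
        response.read()
```

Keeping request construction separate from the network call makes the helper easy to check in isolation.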

Pattern 3: Compliance Monitoring Agent

This agent watches regulatory and policy pages, extracts the changes that affect compliance-relevant sections, and creates structured audit records.

from flask import Flask, request
from openai import OpenAI
import datetime

app = Flask(__name__)
llm = OpenAI()  # Or any LLM provider of your choice

@app.route("/webhook", methods=["POST"])
def handle_change():
    data = request.json

    # Only process changes above a priority threshold
    priority = data.get("ai_priority_score") or 0  # Treat a missing or null score as 0
    if priority < 30:
        return "", 200  # Skip low-priority noise

    # Extract compliance-relevant changes
    extraction = llm.chat.completions.create(
        model="gpt-4o",  # Use any model you prefer
        messages=[{
            "role": "user",
            "content": f"""Review this regulatory page change for compliance impact.

Page: {data['title']} ({data['page']['url']})
Change Summary: {data.get('ai_summary', 'N/A')}
Full Diff: {data.get('markdown_difference', 'N/A')}

Extract:
1. Which specific sections or clauses changed
2. Whether any deadlines, requirements, or obligations were added, modified, or removed
3. Compliance impact assessment (none/low/medium/high/critical)
4. Whether this requires immediate action or can be reviewed in the next cycle
5. Affected business areas

Return as JSON."""
        }],
    )

    # Store in compliance audit log (save_audit_record is a helper you provide)
    save_audit_record({
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "source_url": data["page"]["url"],
        "source_title": data["title"],
        "change_summary": data.get("ai_summary"),
        "priority_score": priority,
        "compliance_analysis": extraction.choices[0].message.content,
    })

    return "", 200
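
The handler above relies on a save_audit_record helper that is not defined in this tutorial. One simple way to implement it, sketched here as an assumption rather than a recommended design, is an append-only JSON Lines file. The file path and the parse_analysis function are hypothetical. Because the prompt only asks the model to return JSON and nothing enforces it, the sketch falls back to storing the raw text when parsing fails.

```python
import json
from pathlib import Path

AUDIT_LOG_PATH = Path("compliance_audit.jsonl")  # Hypothetical location

def parse_analysis(raw: str):
    """Return the model's reply as parsed JSON, or the raw text if it is not valid JSON."""
    try:
        return json.loads(raw.strip())
    except json.JSONDecodeError:
        return raw

def save_audit_record(record: dict, path: Path = AUDIT_LOG_PATH) -> None:
    """Append one audit record as a single JSON line."""
    entry = dict(record)  # Copy so the caller's dict is not modified
    entry["compliance_analysis"] = parse_analysis(
        entry.get("compliance_analysis") or ""
    )
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry, ensure_ascii=False) + "\n")
```

An append-only file is easy to inspect during a prototype; a production audit trail would more likely live in a database with access controls.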

Using the MCP Server with AI Assistants

If you use Claude, ChatGPT, or Cursor, you can manage monitors directly through conversation. The PageCrawl MCP server exposes tools that AI assistants can call.

For example, in Claude:

  • "Monitor example.com/pricing and check every hour"
  • "What changed on my monitored pages today?"
  • "Show me the diff for the last change on the competitor pricing page"

The get-changes-since tool is particularly useful for agents that need a cross-monitor view of all recent changes.

Webhook Payload Reference

Key fields available in webhook payloads:

Field                  Description
title                  Monitor name
contents               Current value of the tracked element
markdown_difference    Diff in markdown format (additions/removals)
ai_summary             AI-generated plain-language summary
ai_priority_score      0-100 importance score
page                   Monitor metadata (id, name, url, slug, folder)
page_elements          All tracked elements with current values
previous_check         Full data from the previous check for comparison
page_screenshot_image  Signed URL to the page screenshot

Customize which fields are included when creating webhooks via the payload_fields parameter. See the full API reference for all options.

Getting Started

Start with one pattern. Pick the 5 pages that matter most to your use case, set up monitors, and build a simple webhook handler. Run it for a week and observe what changes come through.

Most teams find that the combination of PageCrawl's change detection and a lightweight AI agent creates more value than either tool alone. The monitoring catches every change automatically, and the agent adds context, analysis, and automated action.

PageCrawl was built with developers in mind from day one. The REST API, webhooks, and MCP server are first-class features, not add-ons. The free tier includes 6 monitors with webhooks and AI summaries, so you can prototype your agent without any upfront cost.

Last updated: 18 May, 2026
