
How to Integrate PageCrawl.io with Home Assistant: Complete Guide to Web Change Detection Automation

Home Assistant users love automation. Whether it's turning on lights when you arrive home or adjusting the thermostat based on weather forecasts, the power of Home Assistant lies in connecting different services and triggering actions automatically.


But what about monitoring websites? Home Assistant has a built-in scrape sensor that works great for simple, static HTML pages. However, many sites you actually want to monitor—Raspberry Pi stock on rpilocator.com, Ubiquiti UniFi gear, sneaker drops, protected member portals—require something more powerful.


Why use PageCrawl.io instead of the built-in scrape sensor?

  • Real browser rendering: Modern sites load content via JavaScript. The scrape sensor only sees raw HTML, missing dynamic prices, stock status, and interactive elements. PageCrawl uses a real browser.
  • Anti-bot protection bypass: Sites like Ubiquiti, NVIDIA, and many retailers use Cloudflare, DataDome, or similar protection that blocks simple HTTP requests. PageCrawl handles this automatically.
  • Don't get your IP blocked: Scraping from your home IP can get you blocked or rate-limited. PageCrawl uses rotating proxies and residential IPs.
  • Don't overload servers: Running frequent checks from your HA instance puts load on both your server and the target site. PageCrawl manages check frequency and caching intelligently.
  • Login authentication: Monitor pages behind login walls using browser-based authentication that the scrape sensor can't handle.
  • AI-powered summaries: Get intelligent change summaries instead of raw diffs.
  • Visual screenshots: See exactly what changed with before/after screenshots.

PageCrawl.io is a web monitoring and change detection service built for the sites that are hard to monitor. In this guide, we'll show you how to integrate PageCrawl with Home Assistant using two methods:

  1. Webhooks (Push): PageCrawl sends notifications directly to Home Assistant when changes are detected. Available on the free plan.
  2. REST API Polling (Pull): Home Assistant periodically queries PageCrawl for status updates. Requires a paid plan.

Both approaches have their use cases, and we'll cover everything from basic setup to advanced automation examples.


Prerequisites

Before you begin, you'll need:

  • A PageCrawl.io account with at least one monitored page
  • Home Assistant installed and accessible (local or Nabu Casa cloud)
  • Basic familiarity with Home Assistant YAML configuration
  • For webhooks: a way to expose Home Assistant to the internet (Nabu Casa, reverse proxy, or Cloudflare Tunnel)
  • For REST API polling: a paid PageCrawl plan (API access is not available on the free tier)

Method 1: Webhook Integration (Recommended)

Webhooks are the recommended approach for real-time notifications. When PageCrawl detects a change on your monitored page, it immediately sends an HTTP POST request to your Home Assistant instance with all the change details.


Basic Webhook Setup

Step 1: Create a Webhook Automation in Home Assistant

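First, create an automation that listens for incoming webhooks. You can build it through the UI or in YAML. The YAML examples in this guide use the top-level automation: key, which belongs in configuration.yaml. If your configuration.yaml already contains the default automation: !include automations.yaml line, paste the list items into automations.yaml instead, without the automation: key and de-indented by two spaces: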

```yaml
automation:
  - id: pagecrawl_change_detected
    alias: "PageCrawl - Change Detected"
    description: "Triggered when PageCrawl detects a website change"
    trigger:
      - platform: webhook
        webhook_id: pagecrawl_change_notification
        allowed_methods:
          - POST
        local_only: false
    action:
      - service: notify.persistent_notification
        data:
          title: "Website Changed: {{ trigger.json.title }}"
          message: >
            {{ trigger.json.page.name }} has changed!

            URL: {{ trigger.json.page.url }}
            Changed at: {{ trigger.json.changed_at }}
            Difference: {{ trigger.json.human_difference }}
            {% if trigger.json.ai_summary %}

            AI Summary: {{ trigger.json.ai_summary }}
            {% endif %}
```

Step 2: Get Your Webhook URL

Your webhook URL will be in one of these formats:


With Nabu Casa:

https://YOUR_NABU_CASA_URL.ui.nabu.casa/api/webhook/pagecrawl_change_notification

With external access (reverse proxy, Cloudflare Tunnel, etc.):

https://your-homeassistant-domain.com/api/webhook/pagecrawl_change_notification

Important: Since PageCrawl is a cloud-hosted service, your Home Assistant instance must be accessible from the internet to receive webhooks. Local-only URLs (like homeassistant.local) will not work.
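Before you configure PageCrawl, you can check that Home Assistant receives the webhook and renders the notification by posting a PageCrawl-style payload yourself. The minimal Python sketch below uses only the standard library. HA_BASE_URL is a placeholder, and SAMPLE_PAYLOAD is a hypothetical, trimmed-down payload that contains only the fields the Step 1 automation reads (a subset of those described under Webhook Payload Structure).

```python
import json
import urllib.request
from typing import Any, Dict

# Placeholders: replace with your own external Home Assistant URL and webhook_id.
HA_BASE_URL = "https://your-homeassistant-domain.com"
WEBHOOK_ID = "pagecrawl_change_notification"

# Hypothetical test payload: only the fields the Step 1 automation reads.
SAMPLE_PAYLOAD: Dict[str, Any] = {
    "title": "Product Price Monitor",
    "changed_at": "2024-01-15T10:30:00Z",
    "human_difference": "Medium change detected",
    "ai_summary": "The product price dropped from $99.99 to $79.99",
    "page": {"name": "Amazon Product Page", "url": "https://amazon.com/product/..."},
}


def build_webhook_request(base_url: str, webhook_id: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Build a JSON POST aimed at Home Assistant's /api/webhook/<webhook_id> endpoint."""
    url = f"{base_url.rstrip('/')}/api/webhook/{webhook_id}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> int:
    """Send the request and return the HTTP status code (raises on network errors)."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

From a Python shell, run send(build_webhook_request(HA_BASE_URL, WEBHOOK_ID, SAMPLE_PAYLOAD)). If the automation fired, a persistent notification titled "Website Changed: Product Price Monitor" appears in Home Assistant.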


Step 3: Configure the Webhook in PageCrawl

  1. Log in to PageCrawl.io
  2. Go to Settings → Webhooks
  3. Click Add Webhook
  4. Enter your Home Assistant webhook URL
  5. Select the pages you want to trigger this webhook
  6. Choose which fields to include in the payload
  7. Save and test the webhook

Webhook Payload Structure

PageCrawl sends a comprehensive JSON payload with each webhook. Here's the complete structure:

```json
{
  "id": 12345,
  "title": "Product Price Monitor",
  "status": "ok",
  "content_type": "text/html",
  "visual_diff": 15,
  "changed_at": "2024-01-15T10:30:00Z",

  "contents": "Current page content...",
  "original": "Previous page content...",
  "difference": 25.5,
  "human_difference": "Medium change detected",
  "elements": 5,

  "page_screenshot_image": "https://pagecrawl.io/api/webhook/...",
  "text_difference_image": "https://pagecrawl.io/api/webhook/...",
  "html_difference": "<div class='diff'>...</div>",
  "markdown_difference": "## Changes\n- Price changed from $99 to $79",

  "page": {
    "id": 123,
    "name": "Amazon Product Page",
    "url": "https://amazon.com/product/...",
    "url_tld": "amazon.com",
    "slug": "amazon-product-page",
    "link": "https://pagecrawl.io/app/pages/amazon-product-page",
    "folder": "Price Monitors"
  },

  "page_elements": [
    {
      "id": 456,
      "label": "Price",
      "type": "css",
      "contents": "$79.99",
      "original": "$99.99",
      "difference": 20.0,
      "human_difference": "Significant change",
      "changed": true
    }
  ],

  "ai_summary": "The product price dropped from $99.99 to $79.99, a 20% discount",
  "ai_priority_score": 85.5,

  "previous_check": {
    "id": 12344,
    "changed_at": "2024-01-14T10:30:00Z",
    "contents": "...",
    "difference": 5.2
  }
}
```

Key Fields for Automations

| Field | Description | Use Case |
|---|---|---|
| title | Page name | Notification titles |
| status | Page status (ok, error) | Error alerting |
| difference | Change percentage (0-100) | Threshold-based triggers |
| human_difference | Readable change description | Notifications |
| ai_summary | AI-generated change summary | Smart notifications |
| ai_priority_score | Importance score (0-100) | Priority filtering |
| page.url | Monitored URL | Deep linking |
| page_elements[].contents | Specific tracked element value | Price/stock monitoring |
| changed_at | ISO 8601 timestamp | Time-based logic |
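These fields sit at different depths of the payload: page.url is nested, page_elements is a list, and some values can be null. The Python sketch below flattens them into one dictionary with safe defaults. That is handy for prototyping templates or unit-testing a payload captured from PageCrawl. The helper names, and the choice to treat missing numbers as 0, are assumptions of this sketch and not part of PageCrawl's API.

```python
from datetime import datetime
from typing import Any, Dict, Optional


def _as_float(value: Any) -> float:
    """Treat missing, null, or non-numeric values as 0.0 (an assumption for prototyping)."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return 0.0


def _parse_iso(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 timestamp; a trailing 'Z' is rewritten for Python < 3.11."""
    if not value:
        return None
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def key_fields(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten the commonly used webhook fields into one dictionary."""
    page = payload.get("page") or {}
    elements = payload.get("page_elements") or []
    return {
        "title": payload.get("title"),
        "status": payload.get("status"),
        "difference": _as_float(payload.get("difference")),
        "human_difference": payload.get("human_difference"),
        "ai_summary": payload.get("ai_summary"),
        "ai_priority_score": _as_float(payload.get("ai_priority_score")),
        "url": page.get("url"),
        # Tracked element values keyed by label, e.g. {"Price": "$79.99"}
        "elements": {e.get("label"): e.get("contents") for e in elements},
        "changed_at": _parse_iso(payload.get("changed_at")),
    }
```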

Advanced Webhook Automations

Price Drop Alert with Mobile Notification

```yaml
automation:
  - id: pagecrawl_price_drop_alert
    alias: "PageCrawl - Price Drop Alert"
    trigger:
      - platform: webhook
        webhook_id: pagecrawl_change_notification
        allowed_methods:
          - POST
        local_only: false
    condition:
      # Only trigger for significant changes with high AI priority
      # float(0) treats a missing or null value as 0 instead of raising a template error
      - condition: template
        value_template: "{{ trigger.json.ai_priority_score | float(0) > 70 }}"
      - condition: template
        value_template: "{{ trigger.json.difference | float(0) > 10 }}"
    action:
      - service: notify.mobile_app_your_phone
        data:
          title: "Price Alert: {{ trigger.json.title }}"
          # default(..., true) also replaces a null or empty summary
          message: "{{ trigger.json.ai_summary | default('Change detected', true) }}"
          data:
            url: "{{ trigger.json.page.url }}"
            clickAction: "{{ trigger.json.page.url }}"
            image: "{{ trigger.json.page_screenshot_image }}"
            priority: high
            ttl: 0
            actions:
              # "URI" is the action name the companion app uses to open the uri value
              - action: URI
                title: "View Page"
                uri: "{{ trigger.json.page.url }}"
              - action: URI
                title: "View in PageCrawl"
                uri: "{{ trigger.json.page.link }}"
```
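Both template conditions must pass before the notification is sent. Restating them in Python makes it easy to see which payloads qualify. The thresholds (70 and 10) come from the automation. Treating a missing or null value as 0 is this sketch's own assumption.

```python
from typing import Any, Dict


def as_float(value: Any, default: float = 0.0) -> float:
    """Convert to float, falling back to `default` for None or non-numeric input."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return default


def should_alert(payload: Dict[str, Any], min_priority: float = 70, min_difference: float = 10) -> bool:
    """True only when the AI priority exceeds min_priority AND the difference exceeds min_difference."""
    return (
        as_float(payload.get("ai_priority_score")) > min_priority
        and as_float(payload.get("difference")) > min_difference
    )
```

Both comparisons are strict, so a score of exactly 70 does not trigger the alert.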

Raspberry Pi Stock Alert (rpilocator.com)

Monitor rpilocator.com for Raspberry Pi 5 availability and get instant alerts when stock appears:

```yaml
automation:
  - id: pagecrawl_rpi_stock_alert
    alias: "PageCrawl - Raspberry Pi Back in Stock"
    trigger:
      - platform: webhook
        webhook_id: pagecrawl_rpi_stock
        allowed_methods:
          - POST
        local_only: false
    condition:
      - condition: template
        value_template: >
          {% set content = trigger.json.contents | lower %}
          {{ 'in stock' in content or 'add to cart' in content or 'buy now' in content }}
    action:
      - service: notify.all_devices
        data:
          title: "Raspberry Pi In Stock!"
          message: "{{ trigger.json.title }} is available! {{ trigger.json.page.url }}"
      - service: light.turn_on
        target:
          entity_id: light.office_alert
        data:
          color_name: green
          brightness: 255
      - delay:
          seconds: 30
      - service: light.turn_off
        target:
          entity_id: light.office_alert
```
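The stock condition is a case-insensitive substring search for three phrases. Restating it in Python exposes one pitfall: "not in stock" also contains "in stock". The optional exclude parameter is an addition of this sketch and is not part of the automation. By default the function behaves like the template.

```python
from typing import Optional, Tuple

STOCK_KEYWORDS: Tuple[str, ...] = ("in stock", "add to cart", "buy now")


def looks_in_stock(
    contents: Optional[str],
    keywords: Tuple[str, ...] = STOCK_KEYWORDS,
    exclude: Tuple[str, ...] = (),
) -> bool:
    """Case-insensitive substring test mirroring the automation's template condition.

    `exclude` (not in the YAML) vetoes a match when any listed phrase is present.
    """
    text = (contents or "").lower()
    if any(phrase in text for phrase in exclude):
        return False
    return any(keyword in text for keyword in keywords)
```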

Multi-Page Router with Different Actions

Route different monitors to different actions based on the page folder or URL:

```yaml
automation:
  - id: pagecrawl_router
    alias: "PageCrawl - Notification Router"
    trigger:
      - platform: webhook
        webhook_id: pagecrawl_change_notification
        allowed_methods:
          - POST
        local_only: false
    action:
      - choose:
          # Home Assistant / ESPHome / Zigbee2mqtt releases (GitHub)
          - conditions:
              - condition: template
                value_template: "{{ 'github.com' in trigger.json.page.url }}"
            sequence:
              - service: notify.mobile_app_phone
                data:
                  title: "New Release: {{ trigger.json.title }}"
                  message: "{{ trigger.json.ai_summary | default('New version available', true) }}"
                  data:
                    url: "{{ trigger.json.page.url }}"
                    priority: high

          # Hardware stock alerts (rpilocator, Ubiquiti store, ThePiHut)
          - conditions:
              - condition: template
                value_template: "{{ trigger.json.page.folder == 'Stock Alerts' }}"
            sequence:
              - service: notify.all_devices
                data:
                  title: "In Stock: {{ trigger.json.title }}"
                  message: "{{ trigger.json.page.url }}"
              - service: light.turn_on
                target:
                  entity_id: light.office_notification
                data:
                  color_name: green
                  brightness: 255

          # Firmware updates (Shelly, Tasmota)
          - conditions:
              - condition: template
                value_template: "{{ trigger.json.page.folder == 'Firmware' }}"
            sequence:
              - service: tts.speak
                target:
                  entity_id: tts.home_assistant_cloud
                data:
                  message: "Firmware update available for {{ trigger.json.title }}"
                  media_player_entity_id: media_player.office_speaker

        # Default action for any other page
        default:
          - service: notify.persistent_notification
            data:
              title: "PageCrawl: {{ trigger.json.title }}"
              message: "{{ trigger.json.human_difference }}"
```
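Inside choose, Home Assistant runs the sequence of the first option whose conditions pass and falls back to default only when none match, so the order of the options matters. A Python restatement of the branch selection makes that easy to test. The branch names it returns are labels invented for this sketch.

```python
from typing import Any, Dict


def route(payload: Dict[str, Any]) -> str:
    """Return which branch of the choose: block would run (first match wins)."""
    page = payload.get("page") or {}
    if "github.com" in (page.get("url") or ""):
        return "release"
    if page.get("folder") == "Stock Alerts":
        return "stock"
    if page.get("folder") == "Firmware":
        return "firmware"
    return "default"
```

For example, a GitHub page filed in the Stock Alerts folder still takes the release branch, because that option is checked first.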

Track Changes in Input Text Helper

```yaml
automation:
  - id: pagecrawl_store_price
    alias: "PageCrawl - Store Latest Price"
    trigger:
      - platform: webhook
        webhook_id: pagecrawl_price_tracker
        allowed_methods:
          - POST
        local_only: false
    action:
      - service: input_text.set_value
        target:
          entity_id: input_text.tracked_product_price
        data:
          value: "{{ trigger.json.page_elements[0].contents }}"
      - service: input_datetime.set_datetime
        target:
          entity_id: input_datetime.price_last_updated
        data:
          datetime: "{{ now().strftime('%Y-%m-%d %H:%M:%S') }}"
```

Method 2: REST API Polling

REST API polling is useful when you need to:

  • Check status on a schedule regardless of changes
  • Monitor page health and error states
  • Build dashboard sensors
  • Integrate with systems that can't receive webhooks

Getting Your API Token

  1. Log in to PageCrawl.io
  2. Navigate to Settings → API
  3. Copy your API token (or generate a new one)

Your API token is a 60-character string that authenticates your requests.
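To confirm a token works before you paste it into Home Assistant, you can call the API from any machine with Python 3. This sketch requests the /api/changes/&lt;page id&gt; endpoint that the REST sensors in this section poll, with the same Authorization and Accept headers. The page ID and token are placeholders, and the response is assumed to have the shape those sensors read.

```python
import json
import urllib.request
from typing import Any, Dict

API_BASE = "https://pagecrawl.io/api"


def build_page_request(page_id: int, token: str) -> urllib.request.Request:
    """GET /api/changes/<page_id> with a Bearer token and a JSON Accept header."""
    return urllib.request.Request(
        f"{API_BASE}/changes/{page_id}",
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
        method="GET",
    )


def fetch_page(page_id: int, token: str) -> Dict[str, Any]:
    """Fetch and decode the page JSON; raises urllib.error.HTTPError on 401 or 429."""
    with urllib.request.urlopen(build_page_request(page_id, token), timeout=10) as resp:
        return json.load(resp)
```

With a wrong token, fetch_page raises urllib.error.HTTPError with code 401, the same error the REST sensor runs into.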


Basic REST Sensor Setup

Add this to your configuration.yaml:

```yaml
rest:
  - resource: "https://pagecrawl.io/api/changes/YOUR_PAGE_ID"
    scan_interval: 300  # Poll every 5 minutes
    headers:
      Authorization: "Bearer YOUR_API_TOKEN"
      Accept: "application/json"
    sensor:
      - name: "PageCrawl Example Page"
        unique_id: "pagecrawl_example_page_status"
        value_template: >
          {% if value_json.disabled %}
            disabled
          {% elif value_json.unseen > 0 %}
            changed
          {% elif value_json.failed > 0 %}
            error
          {% else %}
            ok
          {% endif %}
        json_attributes:
          - name
          - url
          - last_checked_at
          - unseen
          - failed
          - disabled
          - frequency
```

Advanced REST Sensors

Complete Page Status with All Attributes

```yaml
rest:
  - resource: "https://pagecrawl.io/api/changes/123"
    scan_interval: 300
    headers:
      Authorization: "Bearer YOUR_API_TOKEN"
      Accept: "application/json"
    sensor:
      - name: "Product Monitor Status"
        unique_id: "pagecrawl_product_monitor"
        icon: mdi:web-sync
        value_template: >
          {% if value_json.disabled %}disabled
          {% elif value_json.pending %}checking
          {% elif value_json.unseen > 0 %}changed
          {% elif value_json.failed > 0 %}error
          {% else %}ok{% endif %}
        json_attributes:
          - id
          - name
          - url
          - status
          - last_checked_at
          - unseen
          - failed
          - disabled
          - pending
          - frequency
          - screenshots
          - latest

    binary_sensor:
      - name: "Product Monitor Has Changes"
        unique_id: "pagecrawl_product_monitor_changed"
        icon: mdi:alert-circle
        value_template: "{{ value_json.unseen > 0 }}"
        device_class: problem

      - name: "Product Monitor Has Errors"
        unique_id: "pagecrawl_product_monitor_error"
        icon: mdi:alert
        value_template: "{{ value_json.failed > 0 }}"
        device_class: problem

      - name: "Product Monitor Active"
        unique_id: "pagecrawl_product_monitor_active"
        icon: mdi:eye
        value_template: "{{ not value_json.disabled }}"
```
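The value_template checks disabled first, then pending, then unseen, then failed. A disabled monitor with unseen changes therefore still reports disabled. The Python sketch below restates that precedence and the three binary_sensor templates for quick experiments. Treating missing counts as 0 is an assumption of the sketch.

```python
from typing import Any, Dict


def monitor_state(data: Dict[str, Any]) -> str:
    """Same precedence as the value_template: disabled > checking > changed > error > ok."""
    if data.get("disabled"):
        return "disabled"
    if data.get("pending"):
        return "checking"
    if (data.get("unseen") or 0) > 0:
        return "changed"
    if (data.get("failed") or 0) > 0:
        return "error"
    return "ok"


def binary_flags(data: Dict[str, Any]) -> Dict[str, bool]:
    """The three binary_sensor templates: has changes, has errors, active."""
    return {
        "has_changes": (data.get("unseen") or 0) > 0,
        "has_errors": (data.get("failed") or 0) > 0,
        "active": not data.get("disabled"),
    }
```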

Latest Check Details with AI Summary

```yaml
rest:
  - resource: "https://pagecrawl.io/api/changes/123/zapier/poll"
    scan_interval: 300
    headers:
      Authorization: "Bearer YOUR_API_TOKEN"
      Accept: "application/json"
    sensor:
      - name: "Product Monitor Latest Check"
        unique_id: "pagecrawl_product_latest"
        icon: mdi:clipboard-text-clock
        value_template: "{{ value_json[0].id }}"
        json_attributes_path: "$[0]"
        json_attributes:
          - title
          - status
          - changed_at
          - difference
          - human_difference
          - ai_summary
          - ai_priority_score
          - contents
          - page

      - name: "Product Monitor AI Priority"
        unique_id: "pagecrawl_product_ai_priority"
        icon: mdi:robot
        unit_of_measurement: "%"
        value_template: "{{ value_json[0].ai_priority_score | float(0) | round(1) }}"

      - name: "Product Monitor Difference"
        unique_id: "pagecrawl_product_difference"
        icon: mdi:compare
        unit_of_measurement: "%"
        value_template: "{{ value_json[0].difference | float(0) | round(1) }}"
```

Extract Specific Element Value (e.g., Price)

```yaml
rest:
  - resource: "https://pagecrawl.io/api/changes/123/zapier/poll"
    scan_interval: 600
    headers:
      Authorization: "Bearer YOUR_API_TOKEN"
    sensor:
      - name: "Tracked Product Price"
        unique_id: "pagecrawl_product_price"
        icon: mdi:currency-usd
        unit_of_measurement: "$"
        value_template: >
          {% set price_element = value_json[0].page_elements | selectattr('label', 'equalto', 'Price') | first %}
          {% if price_element %}
            {{ price_element.contents | regex_replace('[^0-9.]', '') | float(0) }}
          {% else %}
            unknown
          {% endif %}
        json_attributes_path: "$[0]"
        json_attributes:
          - changed_at
          - ai_summary
```
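The price template picks the element labelled Price from the first item in the poll response (the item the sensors above treat as the latest check). It then strips everything except digits and dots and converts the result to a number. The Python equivalent below is useful for checking how your element's text will be parsed. It returns None where the template would fall back to unknown or 0, and, like the template, it does not understand comma decimal separators.

```python
import re
from typing import Any, Dict, List, Optional


def extract_price(checks: List[Dict[str, Any]], label: str = "Price") -> Optional[float]:
    """Find the element with the given label in checks[0] and parse its numeric value."""
    if not checks:
        return None
    elements = checks[0].get("page_elements") or []
    element = next((e for e in elements if e.get("label") == label), None)
    if element is None:
        return None
    # Same idea as regex_replace('[^0-9.]', ''): drop currency symbols and separators
    digits = re.sub(r"[^0-9.]", "", element.get("contents") or "")
    try:
        return float(digits)
    except ValueError:
        return None
```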

Monitoring Multiple Pages

Template for Multiple Pages

Monitor multiple tinkerer-relevant pages with YAML anchors to reduce duplication:

```yaml
rest:
  # Home Assistant Releases (GitHub)
  - resource: "https://pagecrawl.io/api/changes/101"
    scan_interval: 300
    headers:
      Authorization: !secret pagecrawl_api_token
    sensor:
      - name: "HA Releases Monitor"
        unique_id: "pagecrawl_ha_releases"
        icon: mdi:home-assistant
        value_template: &status_template >
          {% if value_json.disabled %}disabled
          {% elif value_json.unseen > 0 %}changed
          {% elif value_json.failed > 0 %}error
          {% else %}ok{% endif %}
        json_attributes: &common_attributes
          - name
          - url
          - last_checked_at
          - unseen
          - failed

  # Raspberry Pi Stock (rpilocator.com)
  - resource: "https://pagecrawl.io/api/changes/102"
    scan_interval: 300
    headers:
      Authorization: !secret pagecrawl_api_token
    sensor:
      - name: "RPi Stock Monitor"
        unique_id: "pagecrawl_rpi_stock"
        icon: mdi:raspberry-pi
        value_template: *status_template
        json_attributes: *common_attributes

  # Zigbee2mqtt Releases (GitHub)
  - resource: "https://pagecrawl.io/api/changes/103"
    scan_interval: 600
    headers:
      Authorization: !secret pagecrawl_api_token
    sensor:
      - name: "Z2M Releases Monitor"
        unique_id: "pagecrawl_z2m_releases"
        icon: mdi:zigbee
        value_template: *status_template
        json_attributes: *common_attributes
```

Store your API token in secrets.yaml:

```yaml
pagecrawl_api_token: "Bearer your_60_character_api_token_here"
```

Dashboard Card for Multiple Monitors

Create a custom dashboard card showing all your monitors:

```yaml
type: entities
title: Tinkerer Monitors
entities:
  - entity: sensor.ha_releases_monitor
    name: Home Assistant Releases
    secondary_info: last-changed
  - entity: sensor.rpi_stock_monitor
    name: Raspberry Pi Stock
    secondary_info: last-changed
  - entity: sensor.z2m_releases_monitor
    name: Zigbee2mqtt Releases
    secondary_info: last-changed
```

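Or, if you have the custom auto-entities card installed (it is available through HACS), let it list every matching monitor automatically. The sensor.pagecrawl_* filter only matches entity IDs that start with pagecrawl_. Home Assistant derives entity IDs from each sensor's name (for example sensor.ha_releases_monitor), not from its unique_id, so rename the entities or adjust the pattern. Tapping a monitor opens its more-info dialog, where the url attribute is listed:

```yaml
type: custom:auto-entities
card:
  type: glance
  title: PageCrawl Monitors
filter:
  include:
    - entity_id: sensor.pagecrawl_*
      options:
        tap_action:
          action: more-info
```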

Advanced Automation Examples

Price History Tracking with Statistics

```yaml
sensor:
  - platform: statistics
    name: "Product Price Stats"
    entity_id: sensor.tracked_product_price
    state_characteristic: mean
    max_age:
      days: 30

  - platform: statistics
    name: "Product Price Min 30d"
    entity_id: sensor.tracked_product_price
    state_characteristic: value_min
    max_age:
      days: 30

automation:
  - id: pagecrawl_price_at_30day_low
    alias: "PageCrawl - Price at 30-Day Low"
    trigger:
      - platform: template
        # Only compare when both sensors hold numeric values (not unknown/unavailable)
        value_template: >
          {% set price = states('sensor.tracked_product_price') %}
          {% set low = states('sensor.product_price_min_30d') %}
          {{ price | is_number and low | is_number and price | float == low | float }}
    action:
      - service: notify.mobile_app_phone
        data:
          title: "30-Day Low Price!"
          message: >
            {{ state_attr('sensor.tracked_product_price', 'friendly_name') }}
            is at its lowest price in 30 days: ${{ states('sensor.tracked_product_price') }}
```
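The 30-day-low check does three things. It keeps only samples younger than 30 days (what max_age does), takes their minimum (value_min), and checks whether the current price equals that minimum. The Python sketch below assumes you already have a list of (timestamp, price) samples, for example exported from the recorder. It illustrates the comparison only and is not how the statistics integration is implemented internally.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def is_at_window_low(
    samples: List[Tuple[datetime, float]],
    now: datetime,
    window: timedelta = timedelta(days=30),
) -> bool:
    """True when the newest sample equals the minimum of all samples inside the window."""
    recent = [(ts, price) for ts, price in samples if now - ts <= window]
    if not recent:
        return False
    latest_price = max(recent, key=lambda sample: sample[0])[1]
    return latest_price == min(price for _, price in recent)
```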

Combine Webhook and Polling for Reliability

```yaml
# REST sensor for baseline status
rest:
  - resource: "https://pagecrawl.io/api/changes/123"
    scan_interval: 900  # 15 minutes as backup
    headers:
      Authorization: !secret pagecrawl_api_token
    sensor:
      - name: "Monitor Backup Status"
        unique_id: "pagecrawl_backup_sensor"
        value_template: "{{ value_json.unseen }}"
        # Expose the timestamp that the stale-data check reads
        json_attributes:
          - last_checked_at

# Webhook for real-time notifications
automation:
  - id: pagecrawl_webhook_handler
    alias: "PageCrawl - Webhook Handler"
    trigger:
      - platform: webhook
        webhook_id: pagecrawl_notification
        allowed_methods:
          - POST
        local_only: false
    action:
      - service: input_number.set_value
        target:
          entity_id: input_number.pagecrawl_unseen_count
        data:
          value: "{{ trigger.json.unseen | default(1) }}"
      - service: notify.mobile_app_phone
        data:
          title: "{{ trigger.json.title }}"
          message: "{{ trigger.json.ai_summary | default(trigger.json.human_difference, true) }}"

  # Alert if no updates received for too long
  - id: pagecrawl_stale_check
    alias: "PageCrawl - Stale Data Alert"
    trigger:
      - platform: template
        value_template: >
          {% set last = state_attr('sensor.monitor_backup_status', 'last_checked_at') %}
          {{ last is not none and (as_timestamp(now()) - as_timestamp(last, 0)) > 7200 }}
    action:
      - service: notify.admin
        data:
          title: "PageCrawl Monitor Stale"
          message: "No check received in over 2 hours"
```
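The stale-data automation compares the sensor's last_checked_at attribute with the current time and alerts once more than 7200 seconds have passed. The same arithmetic in Python is sketched below. It assumes last_checked_at is an ISO 8601 string (the format PageCrawl uses for changed_at in webhook payloads), treats a value without an offset as UTC, and treats a missing value as not stale.

```python
from datetime import datetime, timezone
from typing import Optional

STALE_AFTER_SECONDS = 7200  # two hours, the threshold used in the automation


def is_stale(last_checked_at: Optional[str], now: datetime, threshold: float = STALE_AFTER_SECONDS) -> bool:
    """True when more than `threshold` seconds have passed since last_checked_at.

    A missing value counts as "not stale" (no data yet); a timestamp without a
    UTC offset is assumed to be UTC. `now` must be timezone-aware.
    """
    if not last_checked_at:
        return False
    last = datetime.fromisoformat(last_checked_at.replace("Z", "+00:00"))
    if last.tzinfo is None:
        last = last.replace(tzinfo=timezone.utc)
    return (now - last).total_seconds() > threshold
```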

Trigger Manual Check via Home Assistant

```yaml
rest_command:
  pagecrawl_trigger_check:
    url: "https://pagecrawl.io/api/changes/{{ page_id }}/check"
    method: PUT
    headers:
      Authorization: !secret pagecrawl_api_token
      Content-Type: "application/json"

script:
  check_all_monitors:
    alias: "Trigger All PageCrawl Checks"
    sequence:
      - service: rest_command.pagecrawl_trigger_check
        data:
          page_id: 101
      - delay:
          seconds: 2
      - service: rest_command.pagecrawl_trigger_check
        data:
          page_id: 102
      - delay:
          seconds: 2
      - service: rest_command.pagecrawl_trigger_check
        data:
          page_id: 103
```
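The script fires the checks one after another with a two-second pause between them. The same sequence in Python can be handy for testing the endpoint from a terminal. This sketch reuses the URL pattern and PUT method from the rest_command. Passing the raw token (without the Bearer prefix) to these helpers is an assumption of the sketch.

```python
import time
import urllib.request
from typing import Iterable, List

API_BASE = "https://pagecrawl.io/api"


def build_check_requests(page_ids: Iterable[int], token: str) -> List[urllib.request.Request]:
    """One PUT /api/changes/<id>/check request per page, matching the rest_command."""
    return [
        urllib.request.Request(
            f"{API_BASE}/changes/{page_id}/check",
            headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
            method="PUT",
        )
        for page_id in page_ids
    ]


def trigger_checks(page_ids: Iterable[int], token: str, pause_seconds: float = 2.0) -> List[int]:
    """Send the requests in order, pausing between them like the script's delay steps."""
    statuses = []
    for index, request in enumerate(build_check_requests(page_ids, token)):
        if index:
            time.sleep(pause_seconds)
        with urllib.request.urlopen(request, timeout=10) as resp:
            statuses.append(resp.status)
    return statuses
```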

Visual Dashboard with Change Screenshots

```yaml
# Download and cache screenshot when change detected
automation:
  - id: pagecrawl_cache_screenshot
    alias: "PageCrawl - Cache Screenshot"
    trigger:
      - platform: webhook
        webhook_id: pagecrawl_notification
        allowed_methods:
          - POST
        local_only: false
    condition:
      - condition: template
        value_template: "{{ trigger.json.page_screenshot_image is defined }}"
    action:
      - service: downloader.download_file
        data:
          url: "{{ trigger.json.page_screenshot_image }}"
          filename: "pagecrawl_latest_{{ trigger.json.page.id }}.png"
          overwrite: true

# Display in dashboard using local image
# file_path assumes the downloader's download directory is /config/downloads
camera:
  - platform: local_file
    name: "PageCrawl Latest Screenshot"
    file_path: /config/downloads/pagecrawl_latest_123.png
```

Troubleshooting

Webhook Issues

Problem: Webhooks not arriving

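  • Verify that your Home Assistant instance is accessible from the internet
  • Check that the webhook URL is correctly formatted
  • Test with curl, using the webhook_id from your automation:

```bash
curl -X POST https://your-ha-url/api/webhook/pagecrawl_change_notification \
  -H "Content-Type: application/json" \
  -d '{"title": "Test", "changed_at": "2024-01-15T10:30:00Z", "human_difference": "Test change", "page": {"name": "Test page", "url": "https://example.com"}}'
```

  • Home Assistant answers 200 OK even for a webhook_id that no automation listens on, so confirm in the automation's trace that it actually ran
  • Check the Home Assistant logs for webhook errors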

Problem: Webhook automation not triggering

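  • Ensure local_only: false is set in your trigger; otherwise the webhook trigger only accepts requests from your local network, and PageCrawl's requests arrive from the internet
  • Verify the webhook_id matches exactly
  • Check that allowed_methods includes POST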

REST API Issues

Problem: Authentication errors (401)

  • Verify your API token is correct
  • Ensure the Authorization header format is Bearer YOUR_TOKEN (with a space); when the header comes from !secret, the secret value itself must include the Bearer prefix
  • Check the token hasn't been rotated

Problem: Empty or null values

  • Use default filters in templates: {{ value_json.field | default('N/A') }}
  • Check that the page ID/slug exists in your PageCrawl account

Problem: Rate limiting (429)

  • Increase scan_interval to reduce polling frequency
  • Recommended: 300 seconds (5 minutes) minimum
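Every rest: resource makes one request per scan_interval, shared by all the sensors and binary sensors listed under it, so the daily load is easy to estimate. For the three resources in the Monitoring Multiple Pages example (300, 300, and 600 seconds), it comes to 288 + 288 + 144 = 720 requests per day. The small Python helper below does this arithmetic. This guide does not state PageCrawl's actual rate limits, so use the helper to compare configurations rather than to predict a 429.

```python
from typing import Iterable

SECONDS_PER_DAY = 24 * 60 * 60


def requests_per_day(scan_intervals: Iterable[int]) -> int:
    """Daily request count: each rest: resource polls once per scan_interval (seconds)."""
    return sum(SECONDS_PER_DAY // interval for interval in scan_intervals)
```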

Template Errors

Problem: Template errors in value_template

  • Use the Template Editor in Developer Tools to test
  • Wrap optional fields in {% if %} checks
  • Use the | default() filter for nullable values

When to Use PageCrawl vs. Native HA Sensors

Use HA's built-in scrape or rest sensors when:

  • The site is simple, static HTML without JavaScript
  • There is no bot protection (Cloudflare, etc.)
  • The data comes from a public JSON API with stable endpoints
  • The source is a GitHub Atom feed (/releases.atom)
  • You don't mind your home IP making requests

Use PageCrawl when the site is hard to monitor:


Protected E-commerce & Stock Alerts

These sites actively block scrapers and need real browser rendering:

  • Ubiquiti Store: Heavy Cloudflare protection, dynamic inventory
  • NVIDIA GeForce Store: Bot detection, JavaScript-loaded stock status
  • rpilocator.com: Aggregates Raspberry Pi stock across retailers
  • Sneaker drops (Nike SNKRS, Shopify stores): Aggressive bot protection
  • Limited releases: PS5/Xbox restocks, GPU drops

Behind-Login Content

PageCrawl can authenticate and monitor pages that require login:

  • Company career pages: New job postings behind applicant portals
  • Customer portals: Order status, account changes
  • School/University portals: Grades, schedules, announcements
  • ISP status dashboards: Outage info requiring login
  • Member-only forums: Specific threads or categories

Use PageCrawl's browser steps feature to log in before capturing content.


JavaScript-Heavy Sites

Modern sites that load content dynamically after page load:

  • Single-page applications (React, Vue, Angular sites)
  • Lazy-loaded pricing: Price appears after JavaScript executes
  • Interactive dashboards: Content loaded via API calls
  • Infinite scroll pages: Content not in initial HTML

Sites You Don't Want to Overload

Avoid hammering servers from your home IP or getting blocked:

  • Small business websites: Don't overload their hosting
  • Government portals: Rate limiting concerns
  • Competitor monitoring: Don't reveal your IP
  • Frequent checks: Let PageCrawl manage the rate limiting

Conclusion

Integrating PageCrawl with Home Assistant opens up powerful automation possibilities for tinkerers and home automation enthusiasts.


Webhooks provide instant notifications with rich payload data—perfect for time-sensitive alerts like Raspberry Pi stock drops or Ubiquiti UniFi availability. REST API polling gives you dashboard-ready sensors for monitoring Home Assistant releases, Zigbee2mqtt updates, or Shelly firmware changes.


Whether you're tracking hardware availability on rpilocator.com, monitoring GitHub releases for your favorite smart home projects, watching for deals on Zigbee devices, or keeping tabs on any web content that matters to your setup, this integration brings web change detection into your smart home ecosystem.


Last updated: 12 January, 2026