Home Assistant users love automation. Whether it's turning on lights when you arrive home or adjusting the thermostat based on weather forecasts, the power of Home Assistant lies in connecting different services and triggering actions automatically.
But what about monitoring websites? Home Assistant has a built-in scrape sensor that works great for simple, static HTML pages. However, many sites you actually want to monitor—Raspberry Pi stock on rpilocator.com, Ubiquiti UniFi gear, sneaker drops, protected member portals—require something more powerful.
Why use PageCrawl.io instead of the built-in scrape sensor?
- Real browser rendering — Modern sites load content via JavaScript. The scrape sensor only sees raw HTML, missing dynamic prices, stock status, and interactive elements. PageCrawl uses a real browser.
- Anti-bot protection bypass — Sites like Ubiquiti, NVIDIA, and many retailers use Cloudflare, DataDome, or similar protection that blocks simple HTTP requests. PageCrawl handles this automatically.
- Don't get your IP blocked — Scraping from your home IP can get you blocked or rate-limited. PageCrawl uses rotating proxies and residential IPs.
- Don't overload servers — Running frequent checks from your HA instance puts load on both your server and the target site. PageCrawl manages check frequency and caching intelligently.
- Login authentication — Monitor pages behind login walls using browser-based authentication that the scrape sensor can't handle.
- AI-powered summaries — Get intelligent change summaries instead of raw diffs.
- Visual screenshots — See exactly what changed with before/after screenshots.
PageCrawl.io is a web monitoring and change detection service built for the sites that are hard to monitor. In this guide, we'll show you how to integrate PageCrawl with Home Assistant using two methods:
- Webhooks (Push) — PageCrawl sends notifications directly to Home Assistant when changes are detected. Available on the free plan.
- REST API Polling (Pull) — Home Assistant periodically queries PageCrawl for status updates. Requires paid plan.
Both approaches have their use cases, and we'll cover everything from basic setup to advanced automation examples.
Prerequisites
Before you begin, you'll need:
- A PageCrawl.io account with at least one monitored page
- Home Assistant installed and accessible (local or Nabu Casa cloud)
- Basic familiarity with Home Assistant YAML configuration
- For webhooks: A way to expose Home Assistant to the internet (Nabu Casa, reverse proxy, or Cloudflare Tunnel)
- For REST API polling: A paid PageCrawl plan (API access is not available on the free tier)
Method 1: Webhook Integration (Recommended)
Webhooks are the recommended approach for real-time notifications. When PageCrawl detects a change on your monitored page, it immediately sends an HTTP POST request to your Home Assistant instance with all the change details.
Basic Webhook Setup
Step 1: Create a Webhook Automation in Home Assistant
First, create an automation that listens for incoming webhooks. Add this to configuration.yaml (if you use automations.yaml instead, omit the top-level automation: key and add only the list item), or create it through the UI:
automation:
  - id: pagecrawl_change_detected
    alias: "PageCrawl - Change Detected"
    description: "Triggered when PageCrawl detects a website change"
    trigger:
      - platform: webhook
        webhook_id: pagecrawl_change_notification
        allowed_methods:
          - POST
        local_only: false
    action:
      - service: notify.persistent_notification
        data:
          title: "Website Changed: {{ trigger.json.title }}"
          message: >
            {{ trigger.json.page.name }} has changed!
            URL: {{ trigger.json.page.url }}
            Changed at: {{ trigger.json.changed_at }}
            Difference: {{ trigger.json.human_difference }}
            {% if trigger.json.ai_summary %}
            AI Summary: {{ trigger.json.ai_summary }}
            {% endif %}
Step 2: Get Your Webhook URL
Your webhook URL will be in one of these formats:
With Nabu Casa:
https://YOUR_NABU_CASA_URL.ui.nabu.casa/api/webhook/pagecrawl_change_notification
With external access (reverse proxy, Cloudflare Tunnel, etc.):
https://your-homeassistant-domain.com/api/webhook/pagecrawl_change_notification
Important: Since PageCrawl is a cloud-hosted service, your Home Assistant instance must be accessible from the internet to receive webhooks. Local-only URLs (like homeassistant.local) will not work.
Step 3: Configure the Webhook in PageCrawl
- Log in to PageCrawl.io
- Go to Settings → Webhooks
- Click Add Webhook
- Enter your Home Assistant webhook URL
- Select the pages you want to trigger this webhook
- Choose which fields to include in the payload
- Save and test the webhook
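When testing, it can help to see exactly what arrives before writing any real logic. One option is a throwaway automation that writes the incoming payload to the Home Assistant log. This is a minimal sketch: the webhook ID pagecrawl_debug and the logger name are hypothetical, so point the PageCrawl webhook at /api/webhook/pagecrawl_debug temporarily while you test.

```yaml
automation:
  - id: pagecrawl_webhook_debug
    alias: "PageCrawl - Log Test Payload"
    trigger:
      - platform: webhook
        webhook_id: pagecrawl_debug        # hypothetical ID, used only for testing
        allowed_methods:
          - POST
        local_only: false
    action:
      # Write the full JSON body to the log so the field names can be inspected
      - service: system_log.write
        data:
          level: warning                   # warning so it shows up with default log settings
          logger: pagecrawl.webhook        # hypothetical logger name
          message: "PageCrawl payload: {{ trigger.json | tojson }}"
```

After a test delivery, the entry should appear under Settings → System → Logs. Remove the automation once the payload looks as expected.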
Webhook Payload Structure
PageCrawl sends a comprehensive JSON payload with each webhook. Here's the complete structure:
{
  "id": 12345,
  "title": "Product Price Monitor",
  "status": "ok",
  "content_type": "text/html",
  "visual_diff": 15,
  "changed_at": "2024-01-15T10:30:00Z",
  "contents": "Current page content...",
  "original": "Previous page content...",
  "difference": 25.5,
  "human_difference": "Medium change detected",
  "elements": 5,
  "page_screenshot_image": "https://pagecrawl.io/api/webhook/...",
  "text_difference_image": "https://pagecrawl.io/api/webhook/...",
  "html_difference": "<div class='diff'>...</div>",
  "markdown_difference": "## Changes\n- Price changed from $99 to $79",
  "page": {
    "id": 123,
    "name": "Amazon Product Page",
    "url": "https://amazon.com/product/...",
    "url_tld": "amazon.com",
    "slug": "amazon-product-page",
    "link": "https://pagecrawl.io/app/pages/amazon-product-page",
    "folder": "Price Monitors"
  },
  "page_elements": [
    {
      "id": 456,
      "label": "Price",
      "type": "css",
      "contents": "$79.99",
      "original": "$99.99",
      "difference": 20.0,
      "human_difference": "Significant change",
      "changed": true
    }
  ],
  "ai_summary": "The product price dropped from $99.99 to $79.99, a 20% discount",
  "ai_priority_score": 85.5,
  "previous_check": {
    "id": 12344,
    "changed_at": "2024-01-14T10:30:00Z",
    "contents": "...",
    "difference": 5.2
  }
}
Key Fields for Automations
| Field | Description | Use Case |
|---|---|---|
| title | Page name | Notification titles |
| status | Page status (ok, error) | Error alerting |
| difference | Change percentage (0-100) | Threshold-based triggers |
| human_difference | Readable change description | Notifications |
| ai_summary | AI-generated change summary | Smart notifications |
| ai_priority_score | Importance score (0-100) | Priority filtering |
| page.url | Monitored URL | Deep linking |
| page_elements[].contents | Specific tracked element value | Price/stock monitoring |
| changed_at | ISO 8601 timestamp | Time-based logic |
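To show how these fields combine in practice, here is a minimal sketch of an automation that only fires when the check succeeded and at least one tracked element is flagged as changed. It relies on the status and page_elements[].changed fields from the sample payload above; the webhook ID pagecrawl_element_watch is a hypothetical name.

```yaml
automation:
  - id: pagecrawl_element_changed
    alias: "PageCrawl - Tracked Element Changed"
    trigger:
      - platform: webhook
        webhook_id: pagecrawl_element_watch   # hypothetical ID
        allowed_methods:
          - POST
        local_only: false
    condition:
      # Require a successful check and at least one element with changed: true
      - condition: template
        value_template: >
          {{ trigger.json.status == 'ok'
             and trigger.json.page_elements | default([])
                 | selectattr('changed') | list | count > 0 }}
    action:
      - service: notify.persistent_notification
        data:
          title: "{{ trigger.json.title }}"
          # List each changed element with its old and new value
          message: >
            {% for el in trigger.json.page_elements | selectattr('changed') %}
            {{ el.label }}: {{ el.original }} -> {{ el.contents }}
            {% endfor %}
```

Filtering on the element level avoids notifications for changes elsewhere on the page, such as rotating banners.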
Advanced Webhook Automations
Price Drop Alert with Mobile Notification
automation:
  - id: pagecrawl_price_drop_alert
    alias: "PageCrawl - Price Drop Alert"
    trigger:
      - platform: webhook
        webhook_id: pagecrawl_change_notification
        allowed_methods:
          - POST
        local_only: false
    condition:
      # Only trigger for significant changes with high AI priority.
      # float(0) falls back to 0 when a field is missing from the payload.
      - condition: template
        value_template: "{{ trigger.json.ai_priority_score | float(0) > 70 }}"
      - condition: template
        value_template: "{{ trigger.json.difference | float(0) > 10 }}"
    action:
      - service: notify.mobile_app_your_phone
        data:
          title: "Price Alert: {{ trigger.json.title }}"
          message: "{{ trigger.json.ai_summary | default('Change detected') }}"
          data:
            url: "{{ trigger.json.page.url }}"
            clickAction: "{{ trigger.json.page.url }}"
            image: "{{ trigger.json.page_screenshot_image }}"
            priority: high
            ttl: 0
            actions:
              - action: OPEN_URL
                title: "View Page"
                uri: "{{ trigger.json.page.url }}"
              - action: VIEW_PAGECRAWL
                title: "View in PageCrawl"
                uri: "{{ trigger.json.page.link }}"
Raspberry Pi Stock Alert (rpilocator.com)
Monitor rpilocator.com for Raspberry Pi 5 availability and get instant alerts when stock appears:
automation:
  - id: pagecrawl_rpi_stock_alert
    alias: "PageCrawl - Raspberry Pi Back in Stock"
    trigger:
      - platform: webhook
        webhook_id: pagecrawl_rpi_stock
        allowed_methods:
          - POST
        local_only: false
    condition:
      - condition: template
        value_template: >
          {% set content = trigger.json.contents | lower %}
          {{ 'in stock' in content or 'add to cart' in content or 'buy now' in content }}
    action:
      - service: notify.all_devices
        data:
          title: "Raspberry Pi In Stock!"
          message: "{{ trigger.json.title }} is available! {{ trigger.json.page.url }}"
      - service: light.turn_on
        target:
          entity_id: light.office_alert
        data:
          color_name: green
          brightness: 255
      - delay:
          seconds: 30
      - service: light.turn_off
        target:
          entity_id: light.office_alert
Multi-Page Router with Different Actions
Route different monitors to different actions based on the page folder or URL:
automation:
  - id: pagecrawl_router
    alias: "PageCrawl - Notification Router"
    trigger:
      - platform: webhook
        webhook_id: pagecrawl_change_notification
        allowed_methods:
          - POST
        local_only: false
    action:
      - choose:
          # Home Assistant / ESPHome / Zigbee2mqtt releases (GitHub)
          - conditions:
              - condition: template
                value_template: "{{ 'github.com' in trigger.json.page.url }}"
            sequence:
              - service: notify.mobile_app_phone
                data:
                  title: "New Release: {{ trigger.json.title }}"
                  message: "{{ trigger.json.ai_summary | default('New version available') }}"
                  data:
                    url: "{{ trigger.json.page.url }}"
                    priority: high
          # Hardware stock alerts (rpilocator, Ubiquiti store, ThePiHut)
          - conditions:
              - condition: template
                value_template: "{{ trigger.json.page.folder == 'Stock Alerts' }}"
            sequence:
              - service: notify.all_devices
                data:
                  title: "In Stock: {{ trigger.json.title }}"
                  message: "{{ trigger.json.page.url }}"
              - service: light.turn_on
                target:
                  entity_id: light.office_notification
                data:
                  color_name: green
                  brightness: 255
          # Firmware updates (Shelly, Tasmota)
          - conditions:
              - condition: template
                value_template: "{{ trigger.json.page.folder == 'Firmware' }}"
            sequence:
              - service: tts.speak
                target:
                  entity_id: tts.home_assistant_cloud
                data:
                  message: "Firmware update available for {{ trigger.json.title }}"
                  media_player_entity_id: media_player.office_speaker
        # Default action for any other page
        default:
          - service: notify.persistent_notification
            data:
              title: "PageCrawl: {{ trigger.json.title }}"
              message: "{{ trigger.json.human_difference }}"
Track Changes in Input Text Helper
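The automation below writes to two helpers, input_text.tracked_product_price and input_datetime.price_last_updated, which need to exist before it runs. They can be created in the UI under Settings → Devices & Services → Helpers, or in YAML. A minimal sketch of the YAML form, where the display names are only suggestions:

```yaml
input_text:
  tracked_product_price:
    name: "Tracked Product Price"
    max: 255                 # maximum length of the stored string

input_datetime:
  price_last_updated:
    name: "Price Last Updated"
    has_date: true           # the automation sets both a date and a time
    has_time: true
```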
automation:
  - id: pagecrawl_store_price
    alias: "PageCrawl - Store Latest Price"
    trigger:
      - platform: webhook
        webhook_id: pagecrawl_price_tracker
        allowed_methods:
          - POST
        local_only: false
    action:
      - service: input_text.set_value
        target:
          entity_id: input_text.tracked_product_price
        data:
          value: "{{ trigger.json.page_elements[0].contents }}"
      - service: input_datetime.set_datetime
        target:
          entity_id: input_datetime.price_last_updated
        data:
          datetime: "{{ now().strftime('%Y-%m-%d %H:%M:%S') }}"
Method 2: REST API Polling
REST API polling is useful when you need to:
- Check status on a schedule regardless of changes
- Monitor page health and error states
- Build dashboard sensors
- Integrate with systems that can't receive webhooks
Getting Your API Token
- Log in to PageCrawl.io
- Navigate to Settings → API
- Copy your API token (or generate a new one)
Your API token is a 60-character string that authenticates your requests.
Basic REST Sensor Setup
Add this to your configuration.yaml:
rest:
  - resource: "https://pagecrawl.io/api/changes/YOUR_PAGE_ID"
    scan_interval: 300 # Poll every 5 minutes
    headers:
      Authorization: "Bearer YOUR_API_TOKEN"
      Accept: "application/json"
    sensor:
      - name: "PageCrawl Example Page"
        unique_id: "pagecrawl_example_page_status"
        value_template: >
          {% if value_json.disabled %}
          disabled
          {% elif value_json.unseen > 0 %}
          changed
          {% elif value_json.failed > 0 %}
          error
          {% else %}
          ok
          {% endif %}
        json_attributes:
          - name
          - url
          - last_checked_at
          - unseen
          - failed
          - disabled
          - frequency
Advanced REST Sensors
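A polled sensor can drive automations as well as dashboards. As a minimal sketch, assuming the basic sensor above ends up with the entity ID sensor.pagecrawl_example_page (Home Assistant derives it from the sensor name, so confirm yours in Developer Tools → States), a state trigger can react when the sensor flips to changed:

```yaml
automation:
  - id: pagecrawl_polled_change
    alias: "PageCrawl - Polled Change Detected"
    trigger:
      # Fires when the REST sensor's state becomes "changed"
      - platform: state
        entity_id: sensor.pagecrawl_example_page   # assumed entity ID
        to: "changed"
    action:
      - service: notify.persistent_notification
        data:
          # name and url come from the sensor's json_attributes
          title: "Change detected: {{ state_attr('sensor.pagecrawl_example_page', 'name') }}"
          message: "{{ state_attr('sensor.pagecrawl_example_page', 'url') }}"
```

Because this depends on polling, the notification can lag the actual change by up to one scan_interval.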
Complete Page Status with All Attributes
rest:
  - resource: "https://pagecrawl.io/api/changes/123"
    scan_interval: 300
    headers:
      Authorization: "Bearer YOUR_API_TOKEN"
      Accept: "application/json"
    sensor:
      - name: "Product Monitor Status"
        unique_id: "pagecrawl_product_monitor"
        icon: mdi:web-sync
        value_template: >
          {% if value_json.disabled %}disabled
          {% elif value_json.pending %}checking
          {% elif value_json.unseen > 0 %}changed
          {% elif value_json.failed > 0 %}error
          {% else %}ok{% endif %}
        json_attributes:
          - id
          - name
          - url
          - status
          - last_checked_at
          - unseen
          - failed
          - disabled
          - pending
          - frequency
          - screenshots
          - latest
    binary_sensor:
      - name: "Product Monitor Has Changes"
        unique_id: "pagecrawl_product_monitor_changed"
        icon: mdi:alert-circle
        value_template: "{{ value_json.unseen > 0 }}"
        device_class: problem
      - name: "Product Monitor Has Errors"
        unique_id: "pagecrawl_product_monitor_error"
        icon: mdi:alert
        value_template: "{{ value_json.failed > 0 }}"
        device_class: problem
      - name: "Product Monitor Active"
        unique_id: "pagecrawl_product_monitor_active"
        icon: mdi:eye
        value_template: "{{ not value_json.disabled }}"
Latest Check Details with AI Summary
rest:
  - resource: "https://pagecrawl.io/api/changes/123/zapier/poll"
    scan_interval: 300
    headers:
      Authorization: "Bearer YOUR_API_TOKEN"
      Accept: "application/json"
    sensor:
      - name: "Product Monitor Latest Check"
        unique_id: "pagecrawl_product_latest"
        icon: mdi:clipboard-text-clock
        value_template: "{{ value_json[0].id }}"
        json_attributes_path: "$[0]"
        json_attributes:
          - title
          - status
          - changed_at
          - difference
          - human_difference
          - ai_summary
          - ai_priority_score
          - contents
          - page
      - name: "Product Monitor AI Priority"
        unique_id: "pagecrawl_product_ai_priority"
        icon: mdi:robot
        unit_of_measurement: "%"
        value_template: "{{ value_json[0].ai_priority_score | float(0) | round(1) }}"
      - name: "Product Monitor Difference"
        unique_id: "pagecrawl_product_difference"
        icon: mdi:compare
        unit_of_measurement: "%"
        value_template: "{{ value_json[0].difference | float(0) | round(1) }}"
Extract Specific Element Value (e.g., Price)
rest:
  - resource: "https://pagecrawl.io/api/changes/123/zapier/poll"
    scan_interval: 600
    headers:
      Authorization: "Bearer YOUR_API_TOKEN"
    sensor:
      - name: "Tracked Product Price"
        unique_id: "pagecrawl_product_price"
        icon: mdi:currency-usd
        unit_of_measurement: "$"
        value_template: >
          {% set price_element = value_json[0].page_elements | selectattr('label', 'equalto', 'Price') | first %}
          {% if price_element %}
          {{ price_element.contents | regex_replace('[^0-9.]', '') | float(0) }}
          {% else %}
          unknown
          {% endif %}
        json_attributes_path: "$[0]"
        json_attributes:
          - changed_at
          - ai_summary
Monitoring Multiple Pages
Template for Multiple Pages
Monitor multiple tinkerer-relevant pages with YAML anchors to reduce duplication:
rest:
  # Home Assistant Releases (GitHub)
  - resource: "https://pagecrawl.io/api/changes/101"
    scan_interval: 300
    headers:
      Authorization: !secret pagecrawl_api_token
    sensor:
      - name: "HA Releases Monitor"
        unique_id: "pagecrawl_ha_releases"
        icon: mdi:home-assistant
        value_template: &status_template >
          {% if value_json.disabled %}disabled
          {% elif value_json.unseen > 0 %}changed
          {% elif value_json.failed > 0 %}error
          {% else %}ok{% endif %}
        json_attributes: &common_attributes
          - name
          - url
          - last_checked_at
          - unseen
          - failed
  # Raspberry Pi Stock (rpilocator.com)
  - resource: "https://pagecrawl.io/api/changes/102"
    scan_interval: 300
    headers:
      Authorization: !secret pagecrawl_api_token
    sensor:
      - name: "RPi Stock Monitor"
        unique_id: "pagecrawl_rpi_stock"
        icon: mdi:raspberry-pi
        value_template: *status_template
        json_attributes: *common_attributes
  # Zigbee2mqtt Releases (GitHub)
  - resource: "https://pagecrawl.io/api/changes/103"
    scan_interval: 600
    headers:
      Authorization: !secret pagecrawl_api_token
    sensor:
      - name: "Z2M Releases Monitor"
        unique_id: "pagecrawl_z2m_releases"
        icon: mdi:zigbee
        value_template: *status_template
        json_attributes: *common_attributes
Store your API token in secrets.yaml (including the Bearer prefix, since the secret is used as the whole header value):
pagecrawl_api_token: "Bearer your_60_character_api_token_here"
Dashboard Card for Multiple Monitors
Create a custom dashboard card showing all your monitors:
type: entities
title: Tinkerer Monitors
entities:
  - entity: sensor.ha_releases_monitor
    name: Home Assistant Releases
    secondary_info: last-changed
  - entity: sensor.rpi_stock_monitor
    name: Raspberry Pi Stock
    secondary_info: last-changed
  - entity: sensor.z2m_releases_monitor
    name: Zigbee2mqtt Releases
    secondary_info: last-changed
Or use a more visual approach with conditional formatting:
type: custom:auto-entities
card:
  type: glance
  title: PageCrawl Monitors
filter:
  include:
    # Matches the sensors defined above (sensor.ha_releases_monitor, etc.);
    # entity IDs come from the sensor name, not the unique_id
    - entity_id: sensor.*_monitor
      options:
        tap_action:
          action: url
          url_path: "{{ state_attr(config.entity, 'url') }}"
Advanced Automation Examples
Price History Tracking with Statistics
sensor:
  - platform: statistics
    name: "Product Price Stats"
    entity_id: sensor.tracked_product_price
    state_characteristic: mean
    max_age:
      days: 30
  - platform: statistics
    name: "Product Price Min 30d"
    entity_id: sensor.tracked_product_price
    state_characteristic: value_min
    max_age:
      days: 30
automation:
  - id: pagecrawl_price_at_30day_low
    alias: "PageCrawl - Price at 30-Day Low"
    trigger:
      - platform: template
        # has_value() guards against unknown/unavailable states;
        # <= avoids relying on exact float equality
        value_template: >
          {{ has_value('sensor.tracked_product_price')
             and has_value('sensor.product_price_min_30d')
             and states('sensor.tracked_product_price') | float(0) <=
                 states('sensor.product_price_min_30d') | float(0) }}
    action:
      - service: notify.mobile_app_phone
        data:
          title: "30-Day Low Price!"
          message: >
            {{ state_attr('sensor.tracked_product_price', 'friendly_name') }}
            is at its lowest price in 30 days: ${{ states('sensor.tracked_product_price') }}
Combine Webhook and Polling for Reliability
# REST sensor for baseline status
rest:
  - resource: "https://pagecrawl.io/api/changes/123"
    scan_interval: 900 # 15 minutes as backup
    headers:
      Authorization: !secret pagecrawl_api_token
    sensor:
      - name: "Monitor Backup Status"
        unique_id: "pagecrawl_backup_sensor"
        value_template: "{{ value_json.unseen }}"
        # Expose last_checked_at so the stale check can read it
        json_attributes:
          - last_checked_at
# Webhook for real-time notifications
automation:
  - id: pagecrawl_webhook_handler
    alias: "PageCrawl - Webhook Handler"
    trigger:
      - platform: webhook
        webhook_id: pagecrawl_notification
        allowed_methods:
          - POST
        local_only: false # PageCrawl calls in from the internet
    action:
      - service: input_number.set_value
        target:
          entity_id: input_number.pagecrawl_unseen_count
        data:
          value: "{{ trigger.json.unseen | default(1) }}"
      - service: notify.mobile_app_phone
        data:
          title: "{{ trigger.json.title }}"
          message: "{{ trigger.json.ai_summary | default(trigger.json.human_difference) }}"
  # Alert if no updates received for too long
  - id: pagecrawl_stale_check
    alias: "PageCrawl - Stale Data Alert"
    trigger:
      - platform: template
        # Skip the comparison while the attribute is missing or unparseable
        value_template: >
          {% set last = as_timestamp(state_attr('sensor.monitor_backup_status', 'last_checked_at'), none) %}
          {{ last is not none and (as_timestamp(now()) - last) > 7200 }}
    action:
      - service: notify.admin
        data:
          title: "PageCrawl Monitor Stale"
          message: "No check received in over 2 hours"
Trigger Manual Check via Home Assistant
rest_command:
  pagecrawl_trigger_check:
    url: "https://pagecrawl.io/api/changes/{{ page_id }}/check"
    method: PUT
    headers:
      Authorization: !secret pagecrawl_api_token
      Content-Type: "application/json"
script:
  check_all_monitors:
    alias: "Trigger All PageCrawl Checks"
    sequence:
      - service: rest_command.pagecrawl_trigger_check
        data:
          page_id: 101
      - delay:
          seconds: 2
      - service: rest_command.pagecrawl_trigger_check
        data:
          page_id: 102
      - delay:
          seconds: 2
      - service: rest_command.pagecrawl_trigger_check
        data:
          page_id: 103
Visual Dashboard with Change Screenshots
# Download and cache screenshot when change detected
automation:
  - id: pagecrawl_cache_screenshot
    alias: "PageCrawl - Cache Screenshot"
    trigger:
      - platform: webhook
        webhook_id: pagecrawl_notification
        allowed_methods:
          - POST
        local_only: false # PageCrawl calls in from the internet
    condition:
      - condition: template
        value_template: "{{ trigger.json.page_screenshot_image is defined }}"
    action:
      - service: downloader.download_file
        data:
          url: "{{ trigger.json.page_screenshot_image }}"
          filename: "pagecrawl_latest_{{ trigger.json.page.id }}.png"
          overwrite: true
# Display in dashboard using local image
camera:
  - platform: local_file
    name: "PageCrawl Latest Screenshot"
    file_path: /config/downloads/pagecrawl_latest_123.png
Troubleshooting
Webhook Issues
Problem: Webhooks not arriving
- Verify your Home Assistant is accessible from the internet
- Check the webhook URL is correctly formatted
- Test with curl:
curl -X POST https://your-ha-url/api/webhook/pagecrawl_test \
  -H "Content-Type: application/json" \
  -d '{"test": "data"}'
- Check Home Assistant logs for webhook errors
Problem: Webhook automation not triggering
- Ensure local_only: false is set in your trigger
- Verify the webhook_id matches exactly
- Check that allowed_methods includes POST
REST API Issues
Problem: Authentication errors (401)
- Verify your API token is correct
- Ensure the Authorization header format is Bearer YOUR_TOKEN (with space)
- Check the token hasn't been rotated
Problem: Empty or null values
- Use default filters in templates: {{ value_json.field | default('N/A') }}
- Check if the page ID/slug exists in your PageCrawl account
Problem: Rate limiting (429)
- Increase scan_interval to reduce polling frequency
- Recommended: 300 seconds (5 minutes) minimum
Template Errors
Problem: Template errors in value_template
- Use the Template Editor in Developer Tools to test
- Wrap in {% if %} checks for optional fields
- Use | default() filter for nullable values
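Applied to the basic status sensor, those two tips might look like the sketch below. It assumes the same response fields used earlier in this guide (disabled, unseen, failed) and simply makes each lookup tolerant of a missing or null value; it does not help if the response is not valid JSON at all.

```yaml
rest:
  - resource: "https://pagecrawl.io/api/changes/YOUR_PAGE_ID"
    scan_interval: 300
    headers:
      Authorization: !secret pagecrawl_api_token
    sensor:
      - name: "PageCrawl Example Page"
        unique_id: "pagecrawl_example_page_status"
        value_template: >
          {# default() covers a missing key, int(0) covers null or non-numeric values #}
          {% set unseen = value_json.unseen | default(0) | int(0) %}
          {% set failed = value_json.failed | default(0) | int(0) %}
          {% if value_json.disabled | default(false) %}disabled
          {% elif unseen > 0 %}changed
          {% elif failed > 0 %}error
          {% else %}ok{% endif %}
```

Pasting the template body into Developer Tools → Template with a sample value_json is a quick way to check each branch.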
When to Use PageCrawl vs. Native HA Sensors
Use HA's built-in scrape or rest sensors when:
- The site is simple, static HTML without JavaScript
- No bot protection (Cloudflare, etc.)
- Public JSON APIs with stable endpoints
- GitHub Atom feeds (/releases.atom)
- You don't mind your home IP making requests
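For comparison, here is what the native route can look like for the public JSON API case. This is a minimal sketch using GitHub's public releases endpoint with the built-in rest integration; the repository path and sensor names are illustrative, and unauthenticated GitHub requests are rate-limited, so the polling interval is kept long.

```yaml
rest:
  - resource: "https://api.github.com/repos/home-assistant/core/releases/latest"
    scan_interval: 3600          # hourly is plenty for release tracking
    headers:
      Accept: "application/vnd.github+json"
    sensor:
      - name: "HA Core Latest Release"
        unique_id: "github_ha_core_latest_release"
        icon: mdi:github
        # tag_name holds the release tag in GitHub's response
        value_template: "{{ value_json.tag_name }}"
        json_attributes:
          - name
          - published_at
          - html_url
```

No browser rendering or proxying is involved here, which is why this approach only suits endpoints that serve plain JSON or static HTML.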
Use PageCrawl when the site is hard to monitor:
Protected E-commerce & Stock Alerts
These sites actively block scrapers and need real browser rendering:
- Ubiquiti Store — Heavy Cloudflare protection, dynamic inventory
- NVIDIA GeForce Store — Bot detection, JavaScript-loaded stock status
- rpilocator.com — Aggregates Raspberry Pi stock across retailers
- Sneaker drops (Nike SNKRS, Shopify stores) — Aggressive bot protection
- Limited releases — PS5/Xbox restocks, GPU drops
Behind-Login Content
PageCrawl can authenticate and monitor pages that require login:
- Company career pages — New job postings behind applicant portals
- Customer portals — Order status, account changes
- School/University portals — Grades, schedules, announcements
- ISP status dashboards — Outage info requiring login
- Member-only forums — Specific threads or categories
Use PageCrawl's browser steps feature to log in before capturing content.
JavaScript-Heavy Sites
Modern sites that load content dynamically after page load:
- Single-page applications (React, Vue, Angular sites)
- Lazy-loaded pricing — Price appears after JavaScript executes
- Interactive dashboards — Content loaded via API calls
- Infinite scroll pages — Content not in initial HTML
Sites You Don't Want to Overload
Avoid hammering servers from your home IP or getting blocked:
- Small business websites — Don't overload their hosting
- Government portals — Rate limiting concerns
- Competitor monitoring — Don't reveal your IP
- Frequent checks — Let PageCrawl manage the rate limiting
Conclusion
Integrating PageCrawl with Home Assistant opens up powerful automation possibilities for tinkerers and home automation enthusiasts.
Webhooks provide instant notifications with rich payload data—perfect for time-sensitive alerts like Raspberry Pi stock drops or Ubiquiti UniFi availability. REST API polling gives you dashboard-ready sensors for monitoring Home Assistant releases, Zigbee2mqtt updates, or Shelly firmware changes.
Whether you're tracking hardware availability on rpilocator.com, monitoring GitHub releases for your favorite smart home projects, watching for deals on Zigbee devices, or keeping tabs on any web content that matters to your setup, this integration brings web change detection into your smart home ecosystem.
Related Resources
- PageCrawl API Documentation (requires login)
- PageCrawl Webhook Documentation (requires login)
- Home Assistant Webhook Trigger Documentation
- Home Assistant RESTful Sensor Documentation
- Home Assistant Template Documentation

