PageCrawl.io Help Center

Cancel or Upgrade Account

2026-05-06T14:16:21+00:00

Changing plan or billing interval

To change or upgrade your plan, go to your Subscription settings and choose the plan you want to switch to. Upgrades and downgrades are prorated, meaning that the unused time on your current plan is applied as a credit toward the next payment. For example, suppose you subscribed to the $8/mo plan, used it for half a month, and then decided to upgrade to the $30/mo plan. When you upgrade, $4 is credited back, so the remaining half month of the $30/mo plan costs you only $11.
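The arithmetic in this example can be sketched as follows. This is a simplified model that assumes both the credit and the new charge are proportional to the fraction of the billing period remaining; the function name is illustrative, and actual invoices may be calculated with finer precision.

```python
def prorated_upgrade(old_price, new_price, fraction_remaining):
    """Return (credit, amount_due) for switching plans mid-period."""
    # Unused portion of the plan you are leaving
    credit = old_price * fraction_remaining
    # Cost of the new plan for the rest of the period
    new_plan_cost = new_price * fraction_remaining
    return credit, new_plan_cost - credit


# $8/mo plan, upgraded to $30/mo with half of the month left
credit, amount_due = prorated_upgrade(8, 30, 0.5)
print(credit, amount_due)
```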

Canceling or Suspending your account

You can cancel your subscription by going to your Subscription settings and clicking on the red "Downgrade to Free" button. This will open a multi-step confirmation modal where you can optionally provide feedback about why you are canceling. To complete the cancellation, you will need to type "CANCEL" and confirm.

Once confirmed, your subscription will not end immediately. You will retain full access to your paid features until the end of your current billing period (grace period). After that date, your account will automatically downgrade to the Free plan.

How to Change Email Address

2026-03-05T10:31:11+00:00

Unfortunately, for security and to prevent service abuse, email addresses cannot be changed directly by users.

To change your email address, please contact support at help_me@pagecrawl.io from your originally registered email address. We will verify the information and get back to you as soon as possible.

To prevent service abuse, email addresses of 'Free Forever' plan users cannot be changed.

Can I pay by PayPal?

2026-05-06T14:16:21+00:00

Unfortunately, it is not yet possible to pay via PayPal.

We support subscription billing by credit/debit card, Apple Pay, and Google Pay for monthly and annual billing intervals.

How do I get invoices?

2026-03-05T10:31:13+00:00

You can find all your invoices here.

If you wish to receive invoices by email each month or year, enter your email address in the billing details section.

Is it possible to pay by a bank transfer or purchase order?

2026-03-05T10:31:13+00:00

We accept all major credit and debit cards for subscriptions.

For Ultimate plans paid annually, we also support:

Bank transfers (wire/ACH)
Purchase orders (PO)
Invoicing

If you would like to arrange an alternative payment method, please contact support at support@pagecrawl.io.

Why does my card keep getting declined?

2026-03-05T10:31:13+00:00

The most common reasons for a failed transaction include insufficient funds, incorrect card details, and suspicions of fraud.

If a transaction fails, first check that the card details you entered are correct and that there are enough funds in your account to make the purchase.

If the transaction keeps getting declined, try using another card or contact your card issuer. In most cases, your card issuer will be able to remove the block and allow the transaction to go through.

Common reasons for a payment failure:

Insufficient funds
Your card has expired
Incorrectly entered information
Account flagged for fraud
Credit limit has been maxed out
Transaction blocked
Your card doesn't allow international transactions
Wrong billing address

Tracking Text Changes in PDF Files

2026-03-05T10:31:12+00:00

PageCrawl can monitor PDF files hosted online and notify you when the text content changes. It extracts text from the PDF, compares it against the previous version, and highlights exactly what was added, removed, or modified.

How It Works

PageCrawl downloads the PDF file at your configured check frequency
Text is extracted from the PDF
The extracted text is compared against the previous version
If changes are detected, you receive a notification with a diff showing exactly what changed
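The comparison step can be illustrated with Python's standard difflib module. This is only a sketch of line-level text comparison under the assumption that the text has already been extracted; it is not PageCrawl's own diff logic, and the function name and sample text are made up.

```python
import difflib


def changed_lines(old_text, new_text):
    """Return (removed, added) lines between two versions of extracted text."""
    old, new = old_text.splitlines(), new_text.splitlines()
    removed, added = [], []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed.extend(old[i1:i2])
        if tag in ("replace", "insert"):
            added.extend(new[j1:j2])
    return removed, added


previous = "Price: 10\nStock: yes"
current = "Price: 12\nStock: yes"
removed, added = changed_lines(previous, current)
print("removed:", removed)
print("added:", added)
```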

Setup

Click Track New Page
Paste the direct URL to the PDF file
PageCrawl automatically detects it as a PDF and shows the appropriate configuration options
Choose your check frequency and notification preferences
Save

Password-Protected PDFs

PDFs behind login authentication are also supported. Configure an authentication setup first, then select it when adding the PDF to monitor.

PDF vs File Checksum

Method	What It Detects	Diff Available
PDF text tracking	Text content changes (additions, deletions, edits)	Yes, line-by-line diff
File checksum	Any modification to the file (including metadata, images)	No, only detects that something changed

Use PDF text tracking when you need to see exactly what text changed. Use file checksum monitoring when you need to detect any modification, including non-text changes.
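To make the trade-off concrete, here is a minimal sketch of checksum-based detection using SHA-256 from Python's standard hashlib module. The function names are illustrative. Note how the result only says whether the bytes differ, not what changed.

```python
import hashlib


def file_checksum(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def has_changed(previous_checksum: str, data: bytes) -> bool:
    """True if the file's bytes no longer match the stored checksum."""
    return file_checksum(data) != previous_checksum


stored = file_checksum(b"abc")
print(has_changed(stored, b"abc"))  # same bytes, so no change is reported
print(has_changed(stored, b"abd"))  # a single changed byte is detected
```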

File Checksum Monitoring - Detect any file modification using SHA-256
Tracking PDF Files (Tutorial) - Step-by-step PDF monitoring guide
Excel Spreadsheets - Monitor Excel file changes

Send SMS message when website change is detected

2026-03-05T10:31:12+00:00

SMS messages can be useful for mission-critical applications, but to avoid increasing subscription costs, we do not include native SMS notifications in our subscription plans.

For personal use, we suggest using Telegram Messenger as an alternative to SMS notifications. It is free of charge and only requires an Internet connection on your mobile phone, which you most likely already have and will need anyway to review what has changed on your monitored page.

Send SMS via Zapier Integration

If you really need to receive change notifications by SMS, you can set up the Zapier integration to send SMS messages. Zapier makes it easy to connect our application to over 2000 services (at an additional cost, and there may be a limit on the number of SMS messages each month).

Other notification channels

We have integrations with other notification channels. Visit PageCrawl.io Integrations to learn more.

What is the difference between Enterprise Support and Standard Support?

2026-05-06T14:16:21+00:00

We aim to respond to all inquiries promptly. However, when the number of support requests increases, requests from Enterprise customers are prioritized over those from Standard customers. Enterprise customers can therefore expect faster response times, as well as a 'higher level' of support if they are unable to set up a page the way they want.

Technical support response times depend on your subscription plan:

Free Forever Plan: Technical support not offered
Standard Plan: Within 72 hours (excluding weekends)
Enterprise Plan: Within 24 hours (excluding weekends)
Ultimate Plan: Within 24 hours (excluding weekends)

Is there any limit to how many websites we can add to monitor?

2026-03-05T10:31:13+00:00

No. Our pricing is based primarily on the number of pages you track, and you can upgrade your plan if you need to track more pages.

Is there a limit to the number of checks in the plan?

2026-05-06T14:16:21+00:00

The Standard plan includes 15,000 checks, the Enterprise plan allows for 100,000 checks, and the Ultimate plan also includes 100,000 checks each month. All paid plans can be purchased in multiples if you require more pages checked or more frequent checks.

How many checks do I need?

It all depends on how many pages you want to track and how frequently. Also, adjusting your schedule may reduce the number of checks needed. You may start with the Standard plan and upgrade if you notice that you need more.

A few rules of thumb:

A page checked daily will require 30 checks each month.
A page checked every hour will require 720 checks each month.
A page checked every 5 minutes will require 8,640 checks each month.
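These rules of thumb follow from a 30-day month. The sketch below reproduces them and estimates how many pages fit within a plan's monthly checks at a given frequency. The plan limits are the ones quoted above; the function names and the 30-day assumption are illustrative.

```python
PLAN_LIMITS = {"Standard": 15_000, "Enterprise": 100_000, "Ultimate": 100_000}


def checks_per_month(interval_minutes: int, days: int = 30) -> int:
    """Checks one page needs in a month when checked every interval_minutes."""
    return days * 24 * 60 // interval_minutes


def max_pages(plan: str, interval_minutes: int) -> int:
    """How many pages a plan's monthly checks cover at the given frequency."""
    return PLAN_LIMITS[plan] // checks_per_month(interval_minutes)


for label, minutes in [("daily", 24 * 60), ("hourly", 60), ("every 5 minutes", 5)]:
    print(label, checks_per_month(minutes), "checks per page per month")

print("Standard plan, hourly checks:", max_pages("Standard", 60), "pages")
```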

Estimating based on current usage

If your estimated number of checks for this period will be over the limit, you will see an alert. You can check your usage statistics to find out your current estimate.

How to Delete My Account

2026-05-06T14:16:21+00:00

Deletion of your account will result in loss of ALL data associated with it.

To delete your account, go to Account Settings, scroll to the bottom of the page, press Permanently delete my account, and follow the instructions.

Send Website Change Detection Notifications to Microsoft Teams channel

2026-03-05T10:31:13+00:00

PageCrawl.io monitors websites for changes and sends instant notifications through your preferred channels. This guide walks you through connecting PageCrawl.io with Microsoft Teams to receive alerts directly in your Teams channels.

What You'll Need

Before starting, ensure you have:

A PageCrawl.io account
→ Sign up here if you don't have one yet
Microsoft 365 For Business subscription
Basic Teams plans don't support external webhooks - you need a Business plan

Setting Up the Integration

Step 1: Create a Teams Webhook

1.1 In your Teams channel, click the Workflows menu

1.2 Select "Post to a channel when a webhook request is received"

1.3 Click Next and name your workflow. Use a descriptive name like "PageCrawl Website Monitoring".

1.4 Copy the generated webhook URL

Step 2: Connect to PageCrawl.io

Choose your notification scope:

Option A: Monitor All Pages
→ Go to Workspace Settings
→ Paste the Teams webhook URL
→ Save changes

Option B: Monitor Specific Pages
→ Open settings for individual pages
→ Add the Teams webhook URL
→ Save changes

Tip: Set a default webhook for all pages, then override for specific ones that need special handling.

Not working? Check that:

The webhook URL was copied correctly
Your Microsoft 365 plan supports webhooks
The monitored page actually changed

More Notification Options

Other supported notification channels

We do have more supported notification channels to suit everyone's preferences.

Be notified about website changes via Telegram
Be notified about website changes via Discord
Be notified about website changes via Slack
Be notified about website changes via Email
Be notified about website changes via Webhook
Be notified about website changes via Zapier

Send Website Change Detection Notifications to Discord channel

2026-05-06T14:16:21+00:00

PageCrawl allows you to track changes in websites and get notified instantly via your preferred method. This article explains how to set up PageCrawl to receive notifications in Discord.

Prerequisites

You need a PageCrawl.io account. This works with both Free and Paid accounts. If you don't already have one, go here to register an account.

Retrieve Discord Webhook URL

Follow the steps below to retrieve a Discord Webhook URL:

1. Go to a server and click "Edit Channel".

2. Click "Integrations" and press the "New Webhook" button.

3. Finally, click "Copy Webhook URL".

Set Webhook URL in PageCrawl.io

If you would like to receive notifications for all tracked pages, simply paste the webhook URL in your user notification preferences.

If you only want Discord notifications for a single page, set this webhook URL on that specific page instead.
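Before pasting the URL into PageCrawl, it can help to confirm that the webhook itself works. The sketch below builds a test request in Discord's webhook message format (a JSON body with a `content` field). The function name, the User-Agent string, and the example URL are hypothetical, and this is only a manual check, not part of PageCrawl's setup.

```python
import json
import urllib.request


def build_discord_test_request(webhook_url: str, message: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a test message."""
    body = json.dumps({"content": message}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Explicit User-Agent; some services reject the default urllib one
        "User-Agent": "webhook-test/1.0",
    }
    return urllib.request.Request(webhook_url, data=body, headers=headers, method="POST")


# Placeholder URL: replace with the one copied from Discord
request = build_discord_test_request(
    "https://discord.com/api/webhooks/123/example-token", "Webhook test"
)
print(request.get_method(), request.full_url)
```

Calling `urllib.request.urlopen(request)` with a real webhook URL sends the message; if the URL is valid, the text should appear in the channel.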

Troubleshooting

What if I can't edit the server? Make sure the server owner has given you permission to edit the channel.

I didn't receive a notification. Please wait for the page to change. We only send a notification when we detect a change.

I receive too many notifications. What can I do? You can set up notification rules to be notified only when, for example, text disappears or a number increases.

Other supported notification channels

We do have more supported notification channels to suit everyone's preferences.

Be notified about website changes via Telegram
Be notified about website changes via Microsoft Teams
Be notified about website changes via Slack
Be notified about website changes via Email
Be notified about website changes via Webhook
Be notified about website changes via Zapier

Send Website Change Detection Notifications to Telegram group or channel

2026-05-06T14:16:21+00:00

PageCrawl.io allows you to track changes in websites and get notified instantly via your preferred method. This article explains how to set up PageCrawl to receive notifications in Telegram.

Prerequisites

You need a PageCrawl.io account. This works with both Free and Paid accounts. If you don't already have one, go here to register an account.

Retrieve Telegram Chat ID

Follow the steps below to retrieve a Telegram Chat ID. You need it to receive notifications in a 1-to-1 chat, a channel, or a group conversation.

Start a 1-to-1 conversation with @PageCrawlBot, invite it to a channel, or add it to a group conversation.

1-to-1 conversation

Simply begin a conversation with @PageCrawlBot and you will receive instructions on how to configure it.

Include in a Channel or Group conversation

Instructions for channels and groups are identical. To include the bot in a channel or group, invite @PageCrawlBot to it. You will likely also need to adjust the bot's permissions so that it can read and send messages. To find out which code to enter in the PageCrawl.io settings, send a /start message to the bot: @PageCrawlBot /start

Keep in mind that channels and group conversations have a negative chat ID, while 1-to-1 conversations always have a positive chat ID.
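A quick way to sanity-check a copied Chat ID is to look at its sign. The helper below is a hypothetical illustration of that rule only; it does not contact Telegram or PageCrawl.

```python
def describe_chat_id(raw: str) -> str:
    """Classify a Telegram chat ID by its sign, as described above."""
    chat_id = int(raw.strip())  # raises ValueError if the text is not a whole number
    if chat_id == 0:
        raise ValueError("a chat ID cannot be zero")
    return "channel or group" if chat_id < 0 else "1-to-1 conversation"


print(describe_chat_id("-1001234567890"))
print(describe_chat_id(" 123456789 "))
```

If the minus sign is lost when copying a channel or group ID, the value will look like a 1-to-1 chat, so it is worth checking that the sign was copied along with the digits.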

Configure in PageCrawl.io

If you would like to receive notifications for all tracked pages, enter the Chat ID you obtained previously in your user notification preferences.

If you only want Telegram notifications for a single page, set this Chat ID on that specific page instead.

Troubleshooting

What if I can't add the bot to a channel or group? Make sure the channel or group owner has given you permission to add it.

I didn't receive a notification. Please wait for the page to change. We only send a notification when we detect a change.

I receive too many notifications. What can I do? You can set up notification rules to be notified only when, for example, text disappears or a number increases.

Other supported notification channels

We do have more supported notification channels to suit everyone's preferences.

Be notified about website changes via Microsoft Teams
Be notified about website changes via Discord
Be notified about website changes via Slack
Be notified about website changes via Email
Be notified about website changes via Webhook
Be notified about website changes via Zapier

Monitoring password-protected pages

2026-03-05T10:31:12+00:00

Yes, you can track pages on websites that require login authentication. Please note that this feature is only available on paid plans.

How It Works

Monitoring password-protected pages is a two-step process:

Configure authentication - Set up your login credentials once
Select when monitoring - Choose the configuration when adding a page to monitor

Step 1: Configure Authentication

Before you can monitor password-protected pages, you need to set up an authentication configuration:

Go to Authentication Settings
Click "Add Authentication Configuration"
Fill in the required details:
- Name - A friendly name to identify this configuration (e.g., "My Company Portal")
- Login URL - The URL of the login page
- Username/Email - Your login credentials
- Password - Your password
- Form fields - CSS selectors for the username field, password field, and submit button
Save the configuration

You can create multiple authentication configurations for different websites.

Step 2: Add a Page to Monitor

Once your authentication is configured:

Go to add a new page to monitor
Enter the URL of the password-protected page you want to track
If an authentication configuration exists for that website's domain, a "Login Authentication" option will appear
Select the appropriate authentication configuration from the dropdown
Complete the rest of the setup as usual

The system automatically detects and shows only authentication configurations that match the domain of the URL you're monitoring. For example, if you're monitoring https://app.example.com/dashboard, it will show authentication configs set up for example.com.
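The matching rule in this example can be approximated as "the page's hostname equals the configured domain or is a subdomain of it". The sketch below shows that rule with Python's standard urllib.parse. The function name is hypothetical, and PageCrawl's actual matching logic may differ.

```python
from urllib.parse import urlsplit


def config_matches(page_url: str, config_domain: str) -> bool:
    """True if the URL's host is config_domain or one of its subdomains."""
    host = (urlsplit(page_url).hostname or "").lower()
    domain = config_domain.lower().lstrip(".")
    return host == domain or host.endswith("." + domain)


print(config_matches("https://app.example.com/dashboard", "example.com"))
print(config_matches("https://notexample.com/", "example.com"))
```

Comparing against `"." + domain` rather than the bare domain avoids treating an unrelated host such as notexample.com as a match.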

Can You Also Track Files Behind Login Authentication?

If you want to track files such as PDFs, Excel spreadsheets, CSVs, or Word documents, you're in luck. These types of files can also be tracked, even if they are behind login authentication. Simply provide the link to the file and select the appropriate authentication configuration.

HTTP Basic Authentication

If the website uses "HTTP Basic Authentication" (the browser popup that asks for credentials), you can enter the credentials under "Advanced Settings" when setting up your monitored page. This is different from form-based login authentication.
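For background, HTTP Basic Authentication sends the credentials in an Authorization request header instead of a login form, which is why no form field selectors are needed. The sketch below shows how that header is constructed according to the HTTP standard (RFC 7617); the function name and sample credentials are illustrative.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization header value for HTTP Basic Authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


header_value = basic_auth_header("Aladdin", "open sesame")
print("Authorization:", header_value)
```

Base64 is an encoding, not encryption, so Basic Authentication should only be used over HTTPS.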

PageCrawl API & Webhooks

2026-05-06T14:16:21+00:00

PageCrawl provides three ways to integrate with external systems: a REST API, webhooks, and RSS feeds.

API and webhooks are available on paid plans.

API

The REST API lets you manage monitors programmatically, including creating pages, retrieving change history, and triggering checks. Find your API key in Settings > API.

See the API & Webhooks guide for endpoints and authentication details. For the full endpoint reference and schemas, see pagecrawl.io/developers.

Webhooks

Webhooks send HTTP POST requests to your endpoint whenever a change is detected or an error occurs. Configure them in Settings > Workspace > Integrations > Webhooks.

See the Webhook Integration guide for setup, payload fields, and example payloads.
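As a rough idea of what a receiving endpoint looks like, here is a minimal sketch using only Python's standard library. It assumes the POST body is a JSON object, which should be confirmed against the Webhook Integration guide; the handler, function names, and port are hypothetical, and no specific payload fields are assumed.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_webhook_body(raw: bytes) -> dict:
    """Decode a request body, assuming (not confirmed) that it is a JSON object."""
    payload = json.loads(raw.decode("utf-8"))
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    return payload


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = parse_webhook_body(self.rfile.read(length))
        except ValueError:
            # Malformed or non-object body: reject the request
            self.send_response(400)
            self.end_headers()
            return
        # Only the field names are printed; their meaning is defined by the guide
        print("Received webhook with fields:", sorted(payload))
        self.send_response(200)
        self.end_headers()


def run(port: int = 8000) -> None:
    """Start a blocking local server; call run() manually to try it out."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

For real use, the endpoint must be reachable from the public Internet and served over HTTPS; this local server is only meant for experimenting.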

RSS Feeds

Access recent changes in Atom RSS format. Generate a public RSS URL for a single page or for all pages in the workspace.

See the RSS Feeds guide for setup instructions.
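If the feed follows the standard Atom format, it can be read with Python's built-in XML parser. The sketch below parses a made-up sample feed; the sample content and function name are illustrative only, and the fields in a real PageCrawl feed may differ.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # standard Atom XML namespace

# Hypothetical sample, not real PageCrawl output
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry>
    <title>Pricing page changed</title>
    <updated>2026-01-01T00:00:00Z</updated>
  </entry>
</feed>"""


def atom_entries(feed_xml: str) -> list:
    """Return the title and updated timestamp of each entry in an Atom feed."""
    root = ET.fromstring(feed_xml)
    return [
        {
            "title": entry.findtext(f"{ATOM}title", default=""),
            "updated": entry.findtext(f"{ATOM}updated", default=""),
        }
        for entry in root.findall(f"{ATOM}entry")
    ]


for item in atom_entries(SAMPLE_FEED):
    print(item["updated"], item["title"])
```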

Monitor Changes in CSV Files

2026-03-05T10:31:12+00:00

PageCrawl can monitor CSV (comma-separated values) files hosted online and notify you when their content changes. It retrieves the file, compares the data against the previous version, and shows exactly what rows or values were added, removed, or modified.

Setup

Click Track New Page
Paste the direct URL to the CSV file
PageCrawl detects the file type and shows the appropriate configuration
Choose your check frequency and notification preferences
Save

Password-Protected Files

CSV files behind login authentication are supported. Configure an authentication setup first, then select it when adding the file.

Excel Spreadsheets - Monitor Excel file changes
Google Docs & Sheets - Monitor Google Sheets and Docs
File Checksum Monitoring - Detect any file modification

Monitor Changes in Excel Spreadsheets (xls, xlsx, ods)

2026-03-05T10:31:12+00:00

PageCrawl can monitor Excel files hosted online and notify you when their content changes. It extracts text and data from the spreadsheet, compares it against the previous version, and shows exactly what was added, removed, or modified.

Supported File Types

xls, xlsx, ods

Setup

Click Track New Page
Paste the direct URL to the Excel file
PageCrawl detects the file type and shows the appropriate configuration
Choose your check frequency and notification preferences
Save

Password-Protected Files

Excel files behind login authentication are supported. Configure an authentication setup first, then select it when adding the file.

CSV Files - Monitor CSV file changes
Google Docs & Sheets - Monitor Google Sheets and Docs
File Checksum Monitoring - Detect any file modification

Monitor Changes in PowerPoint Presentations

2026-03-05T10:31:12+00:00

PageCrawl can monitor PowerPoint presentations hosted online and notify you when their text content changes. It extracts text from the slides, compares it against the previous version, and shows exactly what was added, removed, or modified.

Supported File Types

pptx

Setup

Click Track New Page
Paste the direct URL to the PowerPoint file
PageCrawl detects the file type and shows the appropriate configuration
Choose your check frequency and notification preferences
Save

Password-Protected Files

PowerPoint files behind login authentication are supported. Configure an authentication setup first, then select it when adding the file.

Word Documents - Monitor Word document changes
Excel Spreadsheets - Monitor Excel file changes
File Checksum Monitoring - Detect any file modification

Monitor Changes in Word Documents (doc, docx, odt)

2026-03-05T10:31:12+00:00

PageCrawl can monitor Word documents hosted online and notify you when their text content changes. It extracts text from the document, compares it against the previous version, and shows exactly what was added, removed, or modified.

Supported File Types

doc, docx, odt

Setup

Click Track New Page
Paste the direct URL to the Word document
PageCrawl detects the file type and shows the appropriate configuration
Choose your check frequency and notification preferences
Save

Password-Protected Files

Word files behind login authentication are supported. Configure an authentication setup first, then select it when adding the file.

PDF Changes - Monitor PDF file changes
PowerPoint Files - Monitor PowerPoint presentations
File Checksum Monitoring - Detect any file modification

Send Website Change Detection Notifications to Slack channel

2026-05-06T14:16:21+00:00

Prerequisites

You need a PageCrawl.io account. If you don't already have one, go here to register an account and set up the pages you wish to track.
You need a Slack account.

Create Incoming Webhook Connector

Follow the steps below to create a new Incoming Webhook connector:

1. Install "Incoming Webhooks" integration in your Slack workspace

Visit https://slack.com/apps/A0F7XDUAZ-incoming-webhooks to enable "Incoming WebHooks" for your workspace.

Please note that this is a legacy custom integration, an outdated way for teams to integrate with Slack. You may create a Slack app instead, but the setup procedure for a Slack app is significantly longer, so we suggest using the legacy integration.

2. Click "Add to Slack" to continue

Simply click the "Add to Slack" button. You may be prompted to sign in to your Slack account.

3. Select a channel or create a new one.

Select the Slack channel where messages from the PageCrawl.io bot should be sent, then press "Add Incoming Webhook integration".

4. Copy the "URL".

Finally, you should receive a URL. Copy it and paste it in the notification settings as indicated below.

Set Webhook URL in PageCrawl.io

If you would like to receive notifications for all tracked pages, simply paste the webhook URL in your user notification preferences.

If you only want Slack notifications for a single page, set this webhook URL on that specific page instead.
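To check that the copied URL accepts messages, you can post a test message to it yourself. Slack incoming webhooks take a JSON body with a `text` field; the sketch below builds such a request. The function name and the example URL are placeholders, and this check is optional rather than part of the PageCrawl setup.

```python
import json
import urllib.request


def build_slack_test_request(webhook_url: str, message: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a Slack incoming webhook."""
    body = json.dumps({"text": message}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL: replace with the one Slack generated for your channel
slack_request = build_slack_test_request(
    "https://hooks.slack.com/services/T000/B000/example", "Webhook test"
)
print(slack_request.get_method(), slack_request.full_url)
```

Passing `slack_request` to `urllib.request.urlopen` with a real URL should post the text to the selected channel.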

Troubleshooting

What if I can't install the app? Make sure the Slack workspace owner has given you the necessary permissions.

I didn't receive a notification. Please wait for the page to change. We only send a notification when we detect a change.

I receive too many notifications. What can I do? You can set up notification rules to be notified only when, for example, text disappears or a number increases.

Other supported notification channels

We do have more supported notification channels to suit everyone's preferences.

Be notified about website changes via Telegram
Be notified about website changes via Microsoft Teams
Be notified about website changes via Discord
Be notified about website changes via Email
Be notified about website changes via Webhook
Be notified about website changes via Zapier

Blocking Cookies and Ads in Your Monitored Pages

2026-05-06T14:16:21+00:00

Monitoring tracked pages can sometimes result in frequent false-positive notifications, often stemming from pesky cookie popups. To address this issue and enhance your monitoring experience, we provide the "Block cookie banners & ads" action. This action effectively handles the majority of cookie windows and blocks ads, minimizing unnecessary notifications. Here are some considerations and alternatives to optimize your monitoring experience.

The "Block Cookie Banners & Ads" Action

To mitigate false positives, we highly recommend implementing the "Block cookie banners & ads" action on all tracked pages. This action has proven to be remarkably effective, successfully handling approximately 99% of cookie popups and preventing ad content from triggering notifications.

Alternative approach

In specific cases, if the tracked page is accessed from a location outside of Europe, cookie popups might not be displayed. As an alternative approach, you can opt to perform checks from a different country to avoid encountering cookie-related notifications.

Legacy Version of "Block Cookies and Ads"

Please be aware that a deprecated version of the "Block Cookies and Ads" action exists, which targets a narrower range of cookie popups. For optimal performance and to take advantage of the full feature set, we strongly advise updating to the current version. Keep in mind that existing pages are not updated automatically, to avoid triggering unnecessary notifications.

Excluding Dates in the Monitored Pages

2026-05-06T14:16:21+00:00

Monitored pages often contain text like "updated 1 month ago" or "last changed 1 hour ago" that updates continually. While this information might seem useful, it often leads to false-positive notifications.

The "Remove Dates" action

To address this issue and improve your monitoring experience, we recommend applying the "Remove Dates" action to your tracked page. This action will intelligently detect and replace all date-related text with a standardized [DATE REMOVED] tag.

Supported Date Formats

The "Remove Dates" action is designed to handle a wide range of common date formats, including:

30 min ago
1 day ago
19 August 2022
01-01-2020
Sat Aug 17 2020 18:40:39 GMT+0000 (GMT)
and many more...
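The idea can be illustrated with a small regular-expression sketch that covers three of the formats listed above. This is not PageCrawl's actual detector, which handles many more formats; the pattern and function name are assumptions made for illustration.

```python
import re

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)

DATE_PATTERN = re.compile(
    # Relative dates such as "30 min ago" or "1 day ago"
    r"\b\d+\s+(?:sec|second|min|minute|hour|day|week|month|year)s?\s+ago\b"
    # Long dates such as "19 August 2022"
    r"|\b\d{1,2}\s+(?:" + MONTHS + r")\s+\d{4}\b"
    # Numeric dates such as "01-01-2020"
    r"|\b\d{2}-\d{2}-\d{4}\b",
    re.IGNORECASE,
)


def remove_dates(text: str) -> str:
    """Replace recognized dates with a fixed placeholder."""
    return DATE_PATTERN.sub("[DATE REMOVED]", text)


print(remove_dates("Updated 1 day ago on 19 August 2022 (01-01-2020)"))
```

Because every recognized date becomes the same placeholder, two snapshots that differ only in their dates compare as identical.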

The "Ignore numbers" filter

Instead of replacing dates with [DATE REMOVED] placeholders, you may completely ignore all changes in numbers by adding the "Ignore numbers" filter in the "Conditions/Filters" section. Only use this if you are not interested in numeric changes.

Excluding a Part of the Page from Triggering Notifications

2026-05-06T14:16:21+00:00

In certain situations, you may wish to exclude or remove a specific section on the page to prevent (false positive) notifications, especially when the content changes frequently. For instance, you might want to exclude a sidebar containing new blog posts or a Twitter feed at the bottom of the page.

When your tracked element type is "Full page", you may choose to track Everything on the page or Content only. If you choose Content only, text in the header, sidebar, and footer will not be tracked.

If you would like more control over what is removed, we recommend using the "Remove page element" action to exclude sections that do not interest you. You can either use the visual selector to remove the area or add the selector manually. Below you will find a few suggested selectors you can use.

Commonly Excluded Sections

Frequently, there are areas where tracking changes may not be of interest, including:

Sidebars (commonly placed within <aside> HTML elements)
Footers (commonly placed within <footer> HTML elements)
Navigation menus (commonly placed within <nav> HTML elements)

You can use the following selector (which you can paste into the "CSS/XPath selector") to exclude the mentioned elements: nav,aside,footer,.footer,header
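To illustrate what excluding these elements does to the tracked text, here is a minimal sketch using Python's standard html.parser. It assumes well-formed HTML in which every non-void element has a closing tag, and it only approximates the selector nav,aside,footer,.footer,header rather than implementing full CSS matching. It is not how PageCrawl itself processes pages, and the class and function names are made up.

```python
from html.parser import HTMLParser

EXCLUDED_TAGS = {"nav", "aside", "footer", "header"}
EXCLUDED_CLASSES = {"footer"}
# Elements without closing tags must not affect the nesting depth
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ContentOnlyText(HTMLParser):
    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside an excluded element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        if self.skip_depth or tag in EXCLUDED_TAGS or classes & EXCLUDED_CLASSES:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    """Return the page text with the excluded sections left out."""
    parser = ContentOnlyText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = ('<header>Logo</header><nav><a href="/">Home</a></nav>'
        '<main><p>Price: 10</p></main><div class="footer">Copyright</div>')
print(visible_text(page))
```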

The Selector Didn't Work?

Unfortunately, not all websites adhere to the content sectioning guidelines. In such cases, you may need to use the visual selector to identify the area or manually input the selector.

Using Custom Proxies to Monitor Pages

2026-03-30T14:25:31+00:00

PageCrawl provides built-in proxy locations and supports custom proxy servers for pages that require specific geographic access or have IP-based restrictions.

Built-in Proxy Locations

PageCrawl offers multiple proxy locations across North America, Europe, and the Middle East, plus a residential proxy option. Select a proxy location per page or apply one to multiple pages via Bulk Edit. You can also choose Random to rotate between locations automatically.

Custom Proxy Setup

Use your own proxy servers when the built-in locations do not work for your use case.

Supported formats:

host:port
username:password@host:port
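If you generate or validate proxy lists with a script before pasting them in, the two formats can be parsed as in the sketch below. The Proxy type and parse_proxy function are hypothetical helpers, not part of PageCrawl.

```python
from typing import NamedTuple, Optional


class Proxy(NamedTuple):
    host: str
    port: int
    username: Optional[str] = None
    password: Optional[str] = None


def parse_proxy(line: str) -> Proxy:
    """Parse 'host:port' or 'username:password@host:port'."""
    # Split on the last "@" so that only the final part is treated as the address
    credentials, _, address = line.strip().rpartition("@")
    host, sep, port = address.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {address!r}")
    if not credentials:
        return Proxy(host, int(port))
    username, _, password = credentials.partition(":")
    return Proxy(host, int(port), username, password)


print(parse_proxy("proxy.example.com:8080"))
print(parse_proxy("user:secret@proxy.example.com:8080"))
```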

Configuration options:

Method	How
Single page	Edit the page > Power User settings > Custom Proxy
Multiple pages	Select pages > Bulk Edit > Custom Proxies
Template	Add proxy settings to a template for reuse

You can paste multiple proxy servers (one per line). PageCrawl will randomly select one for each check. If a proxy fails, the system automatically retries with a different proxy from the list.
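The select-and-retry behavior described above can be pictured roughly like this. The sketch is an illustration under stated assumptions, not PageCrawl's implementation: `fetch` stands in for whatever performs the page check, failures are assumed to surface as OSError, and each proxy is tried at most once.

```python
import random
from typing import Callable, Sequence


def check_with_proxies(proxies: Sequence[str], fetch: Callable[[str], str]) -> str:
    """Try proxies in random order until one check succeeds."""
    remaining = list(proxies)
    random.shuffle(remaining)  # random selection for each check
    last_error = None
    for proxy in remaining:
        try:
            return fetch(proxy)
        except OSError as error:  # assumed failure type (timeouts, refused connections)
            last_error = error    # remember the failure and try the next proxy
    raise RuntimeError("all proxies failed") from last_error


def fake_fetch(proxy: str) -> str:
    """Stand-in for a real page check: only one proxy in the list works."""
    if proxy != "good.example.com:8080":
        raise ConnectionError(f"{proxy} is unreachable")
    return f"page content via {proxy}"


result = check_with_proxies(["bad.example.com:8080", "good.example.com:8080"], fake_fetch)
print(result)
```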

Automatic Engine Switching

When a page is blocked (timeout, 403, or 401), PageCrawl automatically switches to Stealth mode in addition to the proxy configuration. This combination resolves most access issues.

Premium Residential Proxies

For pages that require residential proxies, PageCrawl offers Premium Residential Proxies with pay-as-you-go bandwidth starting at $10/GB. Purchase bandwidth in your account settings and select "Premium Residential" as the proxy location on your monitors. See the residential proxies guide for details on pricing, geo-targeting, and setup.

Choosing a Proxy Provider

Most pages work fine without any proxy configuration. You only need a custom proxy if a website is actively blocking bots or restricting access by geographic location. Start without a proxy, and only set one up if you are seeing access errors (403, bot protection blocks, empty pages).

If the built-in proxy locations are not enough for your needs, you can use a third-party proxy provider. Here is what to look for and some popular options.

Understanding bandwidth usage:

Each page check downloads the full page without caching, so bandwidth adds up quickly. An average web page uses 2-3 MB per check. Heavier pages (news sites, e-commerce, image-heavy pages) can use 5-10 MB or more. For example, monitoring 50 pages every 30 minutes at 3 MB each would use roughly 7 GB per day, or around 216 GB per month. Because of this, avoid proxy providers that charge per GB of traffic. Those plans are designed for one-off scraping, not ongoing monitoring.
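The estimate above can be reproduced with a few lines of arithmetic. The sketch assumes a 30-day month and decimal gigabytes (1 GB = 1000 MB); the function names are illustrative.

```python
def daily_bandwidth_gb(pages, interval_minutes, mb_per_check):
    """Estimated traffic per day in GB (1 GB = 1000 MB)."""
    checks_per_day = pages * (24 * 60 / interval_minutes)
    return checks_per_day * mb_per_check / 1000


def monthly_bandwidth_gb(pages, interval_minutes, mb_per_check, days=30):
    """Estimated traffic per month in GB, assuming a 30-day month by default."""
    checks_per_day = pages * (24 * 60 / interval_minutes)
    return checks_per_day * mb_per_check * days / 1000


# 50 pages, checked every 30 minutes, about 3 MB per check
print(daily_bandwidth_gb(50, 30, 3), "GB per day")
print(monthly_bandwidth_gb(50, 30, 3), "GB per month")
```

Plugging in your own page count, check interval, and typical page size gives a quick sense of whether per-GB pricing would be affordable for your monitoring setup.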

What to look for:

Unlimited bandwidth - This is the most important factor. Look for plans priced per proxy/port or as a flat monthly rate, not per GB.
Username/password authentication - PageCrawl connects to proxies dynamically, so IP-based allowlists will not work. Choose a provider that supports username:password@host:port authentication.
Rotating IPs - Providers that rotate IPs automatically reduce the chance of being blocked over time.
Geographic coverage - Pick a provider with servers in the regions your monitored pages target.
HTTP/HTTPS support - PageCrawl requires standard HTTP proxies. SOCKS proxies are not supported.

Datacenter vs. residential proxies:

Datacenter proxies with unlimited bandwidth are the most cost-effective option for monitoring. They work well for most websites. Residential proxies (using real ISP addresses) are only needed for sites with strict bot detection that blocks datacenter IPs. If you need residential proxies, look for providers that offer them with unlimited bandwidth or per-IP pricing rather than per-GB billing.

Popular proxy providers that work with PageCrawl:

Provider	Type	Pricing Model	Notes
Webshare	Datacenter, Residential	Per proxy, unlimited bandwidth	Free tier available, good for testing. Paid datacenter plans include unlimited bandwidth.
IPRoyal	Datacenter, Static residential	Per proxy (datacenter)	Datacenter proxies with unlimited traffic. Static residential proxies available per IP.
Proxy-Cheap	Datacenter, Static residential	Per proxy, unlimited bandwidth	Budget-friendly static residential and datacenter proxies with no traffic limits.
ProxyRack	Datacenter, Residential	Flat monthly rate	Unlimited bandwidth on most plans. Rotating and geo-targeted options.

These are independent providers and not affiliated with PageCrawl. Prices and features may change.

Not every provider works for every website. A proxy that works perfectly for one site may get blocked on another. This depends on the website's bot detection, the proxy provider's IP reputation, and the type of proxies used. Always test a provider against your specific pages before committing to a long-term plan. Most providers offer short trial periods or small starter plans for this purpose.

Country-specific access: Some websites restrict content to visitors from a specific country (geo-blocking). Government portals, local news sites, and region-locked services often require an IP address from that country to load correctly. If you are monitoring pages like these, make sure the proxy provider offers proxies in the required country. Check the provider's location list before purchasing, as coverage varies significantly between providers, especially for smaller countries.

Note: Most providers give you a proxy endpoint in the username:password@host:port format. Paste it directly into the Custom Proxy field in PageCrawl. If the provider offers rotating proxies through a single gateway endpoint, you only need to add one line.
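Before pasting an endpoint, it can help to check that it has the expected shape. The sketch below uses only the Python standard library; the function name `parse_proxy` and the example host are hypothetical, and this is not how PageCrawl itself validates the field.

```python
from urllib.parse import urlsplit


def parse_proxy(proxy):
    """Split a proxy string into its parts and reject obviously unusable values."""
    # Accept both "user:pass@host:port" and "http://user:pass@host:port".
    if "://" not in proxy:
        proxy = "http://" + proxy
    parts = urlsplit(proxy)
    # Only standard HTTP/HTTPS proxies are supported; SOCKS is not.
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"unsupported proxy scheme: {parts.scheme}")
    if not parts.username or not parts.password:
        raise ValueError("username and password are required")
    if not parts.hostname or parts.port is None:
        raise ValueError("host and port are required")
    return {
        "scheme": parts.scheme,
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
    }


# Hypothetical endpoint, for illustration only.
print(parse_proxy("user:secret@proxy.example.com:8080"))
```

Passwords containing characters such as `@` or `:` may need to be percent-encoded before a check like this (or the proxy itself) will parse them correctly.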

Avoiding Free Proxies

Free proxy servers are unreliable, slow, and frequently stop working. They should not be used for monitoring pages where uptime matters. Use the built-in proxy locations, your own paid proxy service, or contact us for residential proxy options.

Real Browser Mode - Engine selection including Stealth mode
Monitoring Pages Behind Bot Protection - Handling bot-protected pages
Bulk Edit - Apply proxy settings to multiple pages at once

Monitoring Pages Protected with CAPTCHA

2026-03-26T05:33:22+00:00

Monitoring Pages Protected with CAPTCHA

Some websites use CAPTCHA challenges to block automated access. PageCrawl integrates with 2Captcha, a CAPTCHA-solving service, to handle these protections automatically.

Available on Enterprise and Ultimate plans.

How It Works

PageCrawl encounters a CAPTCHA when checking a page
The CAPTCHA is sent to 2Captcha for solving
2Captcha returns the solution (using a combination of human workers and AI)
PageCrawl submits the solution and accesses the page content
The page is checked for changes as normal

Setup

Create an account at 2captcha.com and add funds
Copy your 2Captcha API key from the 2Captcha dashboard
In PageCrawl, go to Settings > Workspace > Integrations
Enter your 2Captcha API key and save
Pages that encounter CAPTCHAs will now be solved automatically

Supported CAPTCHA Types

2Captcha handles most common CAPTCHA types including reCAPTCHA v2, reCAPTCHA v3, hCaptcha, and image-based challenges.

Cost

CAPTCHA solving is billed by 2Captcha separately from your PageCrawl subscription. Typical costs are $1-3 per 1,000 CAPTCHAs solved. Check 2captcha.com/pricing for current rates.
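To get a rough idea of the monthly bill, you can multiply the number of checks by the per-1,000 price. The sketch below assumes one CAPTCHA is solved on every check, which is a worst case; the function name and the sample figures are illustrative only.

```python
def monthly_captcha_cost(pages, checks_per_day, price_per_1000, days=30):
    """Estimated cost in dollars, assuming every check triggers exactly one CAPTCHA."""
    solves = pages * checks_per_day * days
    return solves * price_per_1000 / 1000


# 10 pages checked hourly, at the low and high end of the typical price range.
low = monthly_captcha_cost(pages=10, checks_per_day=24, price_per_1000=1.0)
high = monthly_captcha_cost(pages=10, checks_per_day=24, price_per_1000=3.0)
print(f"${low:.2f} to ${high:.2f} per month")  # $7.20 to $21.60 per month
```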

Tips

Not all blocked pages use CAPTCHA. If you see 403 errors or bot protection challenges, try Stealth mode first
CAPTCHA solving adds a few seconds to each check while waiting for the solution
If a page always shows CAPTCHA, consider reducing check frequency to minimize costs

Bot Protection - Handle bot-protected pages
Page Loading Issues - Common loading problems and solutions
Custom Proxies - Use proxy servers to avoid blocks

Common Problems and Solutions for Page Loading Issues

2026-05-06T14:16:21+00:00

Common Problems and Solutions for Page Loading Issues

There may be various reasons why a page fails to open. This guide describes the most common problems and suggests solutions to help you overcome these issues.

Timeout

A timeout occurs when the page takes too long to respond. This may be a temporary issue with the page, or the page may be loading very slowly. Timeout limits vary depending on your plan:

Free plan: 45 seconds
Standard plan: 90 seconds
Enterprise plan: 180 seconds
Ultimate plan: 180 seconds

To avoid timeouts please consider subscribing to a paid plan or upgrading your plan.

Selector not found

This error is shown when the page has changed significantly and the element matching the configured XPath/CSS selector can no longer be found. In this case, review the page and update the selector if needed.

Page blocked

Some pages may use site protection features to block scrapers and website tracking tools like PageCrawl.io. Different pages may use different blocking mechanisms, but here are the most common ones:

Access Restricted to Specific Countries: The page may be configured to only allow visitors from a specific country.
- Solution: Specify a proxy location from a country that is not blocked. If you cannot find an available proxy, consider purchasing a proxy service for a specific country and configuring a custom proxy in PageCrawl.io.
Proxy Location Blocked: The website may block the IP address of the proxy server PageCrawl is using.
- Solution: Use a residential proxy to avoid being blocked. PageCrawl has a built-in Residential Proxy option available on Enterprise and Ultimate plans. Alternatively, you can purchase a third-party proxy service and configure a custom proxy in PageCrawl.io.

401 or 403 Error

This most often indicates that the PageCrawl.io bot was not allowed to access the website. Use "Residential proxy pool" to avoid being blocked.

404 Page Not Found

In most cases, this error indicates that the page is no longer available. Check and update the page URL.

500 Series error

Errors 500, 502, 503, and 504 indicate that the website's server is unresponsive, overloaded, under maintenance, or experiencing other issues. If such an error occurs, our bots will retry the page check later.

Page Unreachable

The page cannot be opened. In most cases, the website is down or is only reachable from a specific country.

Site Protected with CAPTCHA

Pages may use CAPTCHA to protect the website from bots. To get past this, you can use a service like 2Captcha, which solves the CAPTCHA for you using human workers. PageCrawl.io has an integration with 2Captcha (you must be subscribed to an Enterprise or Ultimate plan): sign up for 2Captcha, then configure the API key it generates in PageCrawl.io.

Unknown Error

In some cases, an unexpected error may prevent the PageCrawl bot from checking the page for changes. If the error does not go away after a while, please contact support so that we can prioritize the issue.

How to Easily Find XPath or CSS Selector in Major Browsers

2026-05-06T14:16:21+00:00

How to Easily Find XPath or CSS Selector in Major Browsers

If you encounter a problem with PageCrawl's visual selector and are unable to open the page you are trying to access, there is another option you can try. You can manually copy the selector by opening the desired page in your preferred web browser. This manual method may be more time-consuming, but it can provide a reliable solution if the visual selector is not functioning properly. Additionally, by manually copying the selector, you can have greater control over the elements on the page and the data you want to extract.

This guide will show you how to do it quickly and easily for Chrome, Firefox and Safari browsers.

XPath vs CSS Selector: Which One to Choose for Tracking?

When it comes to web scraping, finding the right element on a webpage can be a challenge. This is where expression languages like XPath and CSS Selector come in handy. These two powerful tools help you locate elements on a webpage, and choosing between them can be difficult.

Understanding XPath and CSS Selector

For those just starting out, CSS Selectors are the recommended choice due to their simplicity and versatility. Most advanced selectors can be written in CSS, making it a good option for web scraping beginners.

Relative vs Absolute Selector

When it comes to CSS and XPath Selectors, there are two ways to generate them: relative and absolute.

Relative selectors are preferred in most cases, as they are less prone to break.

Absolute selectors, on the other hand, are useful when tracking a large number of pages, and you are only interested in specific elements. However, with even a slight change in page layout, the selector will break. If an element is added or removed from a page, the absolute XPath will need to be updated to continue tracking the page contents.

Relative selectors tend to be short, while absolute selectors can be lengthy. Here are some examples of relative and absolute selectors for both CSS and XPath:

Relative XPath selector: //h2[@id='get-started']//span[1]
Relative CSS selector: h2[id='get-started'] span
Absolute XPath selector: //*[@id="root"]/section/section/main/div/main/div/div[5]/div/div/div/div/div[1]/div/table/tbody/tr[20]
Absolute CSS selector: #root > section > section > main > div > main > div > div:nth-child(6) > div > div > div > div > div.ant-table-container > div > table > tbody > tr:nth-child(20)
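The difference in robustness can be demonstrated with a small, self-contained Python sketch. It uses the standard library's ElementTree, which supports only a subset of XPath and requires well-formed markup, so the two simplified pages and both paths below are made up for illustration and are not PageCrawl's own matching engine.

```python
import xml.etree.ElementTree as ET

# The same (hypothetical) page before and after a banner is added at the top.
BEFORE = "<html><body><div><p>Intro</p></div><div><p id='price'>$10</p></div></body></html>"
AFTER = ("<html><body><div>Banner</div><div><p>Intro</p></div>"
         "<div><p id='price'>$12</p></div></body></html>")

ABSOLUTE = "./body/div[2]/p"      # depends on the exact position in the layout
RELATIVE = ".//p[@id='price']"    # depends only on the element's id


def text_at(markup, path):
    """Return the text of the first element matching `path`, or None."""
    node = ET.fromstring(markup).find(path)
    return node.text if node is not None else None


for label, page in (("before", BEFORE), ("after", AFTER)):
    print(label, "absolute:", text_at(page, ABSOLUTE), "| relative:", text_at(page, RELATIVE))
```

After the banner is inserted, the absolute path silently points at a different element ("Intro"), while the relative path still finds the price.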

Generating Selectors with a Browser Extension

There are multiple browser extensions available that can help you copy CSS or XPath Selectors. Two options that we tried and can recommend include "SelectorsHub" and "SelectorGadget".

SelectorsHub is a browser extension available for all browsers that allows you to right-click on an element and copy the "Relative XPath selector" or "Relative CSS selector."
SelectorGadget, on the other hand, is only available for Chrome and offers a visual selector that allows you to click on elements and see the generated selector.

Generating Selectors Without a Browser Extension

If you prefer not to use a browser extension, you can also find CSS or XPath Selectors by inspecting an element. In most cases, you will get an absolute selector, and if the page content changes, you will need to update the selector.

In conclusion, choosing between XPath and CSS Selectors for web scraping comes down to your personal preference and level of experience. Both offer powerful tools for locating elements on a webpage, and with a little practice, you can become an expert in no time!

Steps to Find XPath or CSS Selector in Chrome Browser:

Right-click on the element on the web page you want to select.
Choose the "Inspect" option from the context menu.
The "Elements" tab in the DevTools window will open, displaying the HTML code for the page.
Right-click on the HTML code for the element you want to select and choose "Copy" from the context menu.
Choose "Copy XPath" or "Copy selector" to copy the XPath or CSS selector for that element.
If you choose "Copy full XPath" instead, the absolute XPath is copied (see "Relative vs Absolute Selector" above).
Paste the copied selector into the Tracked Element field in PageCrawl.io.

Steps to Find XPath or CSS Selector in Firefox Browser:

Right-click on the element on the web page you want to select.
Choose the "Inspect Element" option from the context menu.
The "Developer Tools" window will open, displaying the HTML code for the page.
Right-click on the HTML code for the element you want to select and choose "Copy XPath" or "Copy CSS Path" from the context menu.
Paste the copied selector into the Tracked Element field in PageCrawl.io.

Steps to Find XPath or CSS Selector in Safari Browser:

Enable the "Develop" menu in Safari by going to Safari > Settings > Advanced, and checking the "Show Develop menu in menu bar" option (called Preferences in older macOS versions).
Right-click on the element on the web page you want to select.
Choose the "Inspect Element" option from the context menu.
The "Web Inspector" will open, displaying the HTML code for the page.
Right-click on the HTML code for the element you want to select and choose "Copy XPath" or "Copy CSS Path" from the context menu.
Paste the copied selector into the Tracked Element field in PageCrawl.io.

Dealing with Website Language Changes When Monitoring Page for Updates

2026-03-26T05:33:22+00:00

Dealing with Website Language Changes When Monitoring Page for Updates

If you are reading this, you may have experienced the frustration of the language suddenly switching on your monitored page, causing false positive notifications. Unfortunately, the language behavior of a website is determined by the site developers, and there are several approaches they may use. Some websites base their language on the browser or system settings, which is the best option. Others guess the language based on the country information from the IP address, while others use a mixed approach. There are two approaches you can use to prevent the page language from changing.

Set the browser language

One option is to set the browser language to a specific language, such as "Danish", under "Advanced Settings" when editing the tracked page configuration in PageCrawl. However, keep in mind that some bot detection services can detect this, so use this option only if absolutely necessary.

If you are using "Stealth Mode", be aware that setting the browser language may cause issues. Overriding the browser language can be inconsistent with what bot detection services expect, which may trigger blocks.

Use fixed IP address

Another option is to access the website from a fixed IP address by setting Proxy Location to "Fixed IP". This ensures that the same IP is used every time the page is checked for changes. However, if that proxy location gets blocked, PageCrawl may not be able to bypass the block and will display a crawl error.

Monitoring SEO Tags for Changes

2026-05-06T14:16:21+00:00

Monitoring SEO Tags for Changes

Optimizing your website for search engines requires effective monitoring of SEO tags. PageCrawl makes it easy to track changes to title tags, meta descriptions, canonical URLs, robots directives, Open Graph tags, and headings.

One-Click SEO Monitoring

The fastest way to monitor SEO tags is with the built-in SEO Tags tracking mode:

Log in to your PageCrawl account.
Click on Track New Page and enter the page URL.
Select SEO Tags as the tracking type.
Save and start monitoring.

PageCrawl will automatically extract and track:

Title tag
Meta description
Meta keywords (if present)
Canonical URL
Robots directive
H1 heading
Open Graph tags (og:title, og:description, og:image, og:url, og:type)

When any of these fields change, you will see exactly which tag was modified and what the previous and new values are.

If you plan to monitor SEO tags for multiple pages, we recommend creating a Template with the SEO Tags tracking type. This lets you reuse the configuration across many pages without repeating setup.

Advanced: Track Individual SEO Elements

If you only need to monitor specific SEO tags (rather than all of them), you can create individual tracked elements using CSS or XPath selectors.

Use "Text" for the following tracked elements:

SEO

Title: title
Meta description: /html/head/meta[@name="description"]/@content
Meta keywords: /html/head/meta[@name="keywords"]/@content
Meta robots: /html/head/meta[@name="robots"]/@content
Meta viewport: /html/head/meta[@name="viewport"]/@content

Social Media Tags

og:title: /html/head/meta[@property="og:title"]/@content
og:type: /html/head/meta[@property="og:type"]/@content
og:image: /html/head/meta[@property="og:image"]/@content
og:url: /html/head/meta[@property="og:url"]/@content

Use "Text (all matches)" for the following tracked elements:

Headings

h1 tags: h1
h2 tags: h2
h3 tags: h3
h4 tags: h4
h5 tags: h5
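To preview what these tracked elements would capture, you can run a similar extraction locally. The sketch below uses Python's standard `html.parser` rather than XPath, so it approximates the selectors above instead of evaluating them; the class name, helper functions, and sample markup are all hypothetical.

```python
from html.parser import HTMLParser


class SeoTagParser(HTMLParser):
    """Collect <title>, <meta name/property ... content>, and h1-h5 headings."""

    HEADINGS = ("h1", "h2", "h3", "h4", "h5")

    def __init__(self):
        super().__init__()
        self.tags = {}        # e.g. {"title": ..., "description": ..., "og:title": ...}
        self.headings = []    # e.g. [("h1", "Widgets"), ...]
        self._current = None  # tag whose text is currently being collected
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and "content" in attrs:
            key = attrs.get("name") or attrs.get("property")
            if key:
                self.tags[key] = attrs["content"]
        elif tag == "title" or tag in self.HEADINGS:
            self._current, self._buffer = tag, []

    def handle_data(self, data):
        if self._current:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            text = "".join(self._buffer).strip()
            if tag == "title":
                self.tags["title"] = text
            else:
                self.headings.append((tag, text))
            self._current = None


def extract(html):
    parser = SeoTagParser()
    parser.feed(html)
    return parser.tags, parser.headings


def diff_tags(old, new):
    """Map each changed tag to its (previous, new) value."""
    return {k: (old.get(k), new.get(k))
            for k in sorted(old.keys() | new.keys()) if old.get(k) != new.get(k)}


SAMPLE = """<html><head>
<title>Example Shop</title>
<meta name="description" content="Buy widgets online.">
<meta property="og:title" content="Example Shop">
</head><body><h1>Widgets</h1><h2>New arrivals</h2></body></html>"""

tags, headings = extract(SAMPLE)
print(tags)
print(headings)
print(diff_tags(tags, dict(tags, title="Example Shop | Sale")))
```

The last line shows the kind of before/after comparison a change notification is built on: only the title differs, so only the title is reported.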

Tracking (outgoing) links for changes

2026-05-06T14:16:21+00:00

Tracking (outgoing) links for changes

You may also wish to track outgoing links that exist on the page. We suggest using "Text (all matches, sorted)" to capture links to other pages. You may use these selectors to track:

All links on the page

Use the following selector to track all links on a web page:

//a/@href

External Links

To track only external links (those not belonging to a specific website), use this selector:

//@href[not(contains(.,'not-this-website.com'))]

Note: You should substitute 'not-this-website.com' with the website URL.

Links with Specific Keywords in the URL

If you want to track links containing specific keywords in their URLs, use this selector as an example:

//a[contains(@href,'/download/oursoftware_')]/@href

PDF Links

To specifically track links leading to PDF documents, you can use this selector:

//a[contains(@href,'.pdf')]/@href

Links with Text as Anchor Text

//a[contains(text(),'Download')]/@href

Note: This selector is case-sensitive. For example, if the link text is actually "download", it will not be matched.

Links with Specific CSS Classes

If you want to track links with specific CSS classes, use this selector:

//a[contains(@class,'your-class-name')]/@href

Note: You should substitute 'your-class-name' with the class.

Links with Specific Attributes

To track links with specific attributes (other than href), use this selector and replace "attribute-name" with the name of the attribute you're interested in:

//a[@attribute-name='attribute-value']/@href

Note: You should substitute 'attribute-name' and 'attribute-value' with the relevant attribute name and value.
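If you want to check locally which links selectors like these would pick up, the following Python sketch mirrors a few of them with the standard `html.parser` module. It approximates the XPath expressions with plain substring checks; the class name, sample markup, and domains are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> element, similar to //a/@href."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def collect_links(html):
    collector = LinkCollector()
    collector.feed(html)
    return sorted(collector.links)  # sorted, like "Text (all matches, sorted)"


SAMPLE = """
<a href="https://example.com/about">About</a>
<a href="https://partner.org/report.pdf">Report</a>
<a href="/download/oursoftware_2.0.zip">Download</a>
<a name="top">No href here</a>
"""

links = collect_links(SAMPLE)
external = [h for h in links if "example.com" not in h]   # like not(contains(., '...'))
pdfs = [h for h in links if ".pdf" in h]                   # like contains(@href, '.pdf')

print(links)
print(external)
print(pdfs)
```

Note that a pure substring check also keeps relative links such as `/download/...` in the "external" list, because they do not contain the domain name. The same applies to the XPath version, so review the results if the page uses relative URLs.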

Monitoring Pages Behind Bot Protection

2026-04-07T05:09:15+00:00

Monitoring Pages Behind Bot Protection

Over 30% of websites now use bot protection services like Cloudflare, Akamai, and similar tools that block automated access. This means your monitored pages can stop returning data without warning.

PageCrawl provides multiple layers of protection to keep your monitors working, most of which happen automatically.

What Happens Automatically

PageCrawl handles most bot protection automatically. When a check fails, PageCrawl detects the block and adjusts its approach on the next attempt. This includes automatic retries, switching to stealth mode, and rotating through different proxy locations.

For most pages, you do not need to configure anything. The steps below are only needed if automatic handling does not resolve the issue.

How Do I Know If My Page Is Blocked?

PageCrawl will show a warning on the page if it detects a block. You may also notice that the captured content is empty, shows an error code (403, 401), or looks different from what you see when you visit the page yourself.

Troubleshooting Guide

Note: The settings below require Advanced mode. To enable it, click Edit on any page and toggle Advanced at the bottom of the form.

Follow these steps in order. After each step, wait for the check to complete before moving on.

Step 1: Enable Stealth Mode

This is the first thing to try and resolves most blocking issues.

Open the blocked page in PageCrawl
Click Edit
Scroll down and enable Advanced mode
Change Engine from "Default" to Stealth
Click Save - a check will trigger automatically
Wait for the check to complete and review the result

If the content now loads correctly, you are done. Stealth mode will be used for all future checks on this page.

Step 2: Change Proxy Location

If Stealth mode alone does not work, the site may be blocking the specific IP address or region.

Open the page and click Edit
Under Proxy Location, select Random
Click Save - a check will trigger automatically

Random proxy rotation means each check comes from a different IP address, making IP-based blocking ineffective.

You can also try specific locations (London, New York, San Francisco, Toronto, Frankfurt) if you know the site serves content differently by region.

Step 3: Use Residential Proxies

For sites with the strictest protections, residential proxies are the most effective option. These route requests through real consumer internet connections, making them virtually indistinguishable from regular visitors.

Open the page and click Edit
Under Proxy Location, select Residential
Select a country for the residential proxy
Click Save - a check will trigger automatically

Residential proxy traffic is available as an add-on. You can purchase residential proxy traffic directly from your PageCrawl account.

Note: Residential proxies consume traffic from your purchased balance. Each check uses a small amount of traffic depending on the page size.

Step 4: Use a Custom Proxy

If none of the built-in options work, you can use your own proxy server from a third-party provider.

Open the page and click Edit
Enable Advanced mode
Enter your proxy details in the Custom Proxy field (format: http://user:password@host:port)
Click Save and trigger a manual check

This is useful when you need a proxy from a specific country or provider, or when you already have a proxy subscription. See Custom Proxies for more details.

Quick Reference

Solution	How to Enable	When to Use
Stealth mode	Edit > Advanced > Engine: Stealth	First thing to try for any blocked page
Proxy rotation	Edit > Proxy: Random	When a specific IP is blocked
Residential proxy	Edit > Proxy: Residential	For the strictest access controls
Custom proxy	Edit > Advanced > Custom Proxy	When you need a specific provider or location

Still Blocked?

If you have tried all the steps above and the page is still not loading:

Double-check the URL - Make sure the URL is correct and the page is publicly accessible. Try opening it in a private/incognito browser window to confirm.
Purchase residential proxy traffic directly from PageCrawl if you have not already. This is the most effective solution for heavily protected sites.
Try a custom proxy from a third-party provider if you need a specific geographic location or a different proxy type.
Contact support - Email support@pagecrawl.io with the page URL and a description of what you see. We can review the specific page and suggest the best configuration.

Real Browser Mode - Engine selection including Stealth mode
Custom Proxies - Configure proxy servers
Residential Proxies - Purchase residential proxy traffic
Page Loading Issues - Other common loading problems

Monitor Changes in XML Files

2026-05-06T14:16:21+00:00

Monitor Changes in XML Files



    
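<catalog>
    <book>
        <author>Gambardella, Matthew</author>
        <title>XML Developer's Guide</title>
        <genre>Computer</genre>
        <price>44.95</price>
        <publish_date>2000-10-01</publish_date>
        <description>An in-depth look at creating applications with XML.</description>
    </book>
</catalog>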

PageCrawl.io offers an efficient way to monitor and track changes in XML files. Instead of sifting through the whole XML file for changes, which can be overwhelming when the file is updated frequently, you can focus on the specific parts that matter. This helps you avoid being flooded with unnecessary alerts for minor changes such as 'updated at' dates.

This guide will walk you through the process of setting up and utilizing this feature to simplify your tracking experience.

To reduce the number of false positives, you may want to monitor a specific attribute (or multiple attributes) and be notified when it is added, removed, or changed.

Step 1: Getting Started

To begin tracking changes in XML files, follow these steps:

Access PageCrawl: Log in to your PageCrawl account or sign up if you're new to the platform.
Create a Monitored Page: Once logged in, navigate to the dashboard and click on the "Track New Page" button. This will initiate the setup process for monitoring pages for changes.

Step 2: Choosing Attributes to Track

Instead of monitoring the entire XML file, you can narrow down your focus to specific attributes that are relevant to you. For example, you might want to track changes in book names within an XML catalog.

Example XML File

Consider the following example XML file structure:



    
<catalog>
    <book>
        <title>XML Developer's Guide</title>
    </book>
    <book>
        <title>Dummy XML Developer's Guide</title>
    </book>
</catalog>

Step 3: Configuring Tracking Elements

Follow these steps to configure tracking elements for your XML file:

Select Tracked Element: In the PageCrawl setup interface, choose "Text (all matches)" as the tracked element type.
Specify Element to Track: Specify the exact element within the XML that you want to track. For instance, if you are interested in changes to book titles, set the element to title.

In this case, by focusing on the title element, you will receive notifications only when a book title changes or a title is added or removed, filtering out less significant updates.
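The effect of tracking only the title element can be illustrated with a short Python sketch. The catalog below, including the `updated_at` element, is a hypothetical example, and the comparison logic is an illustration of the idea rather than PageCrawl's implementation.

```python
import xml.etree.ElementTree as ET

OLD = """<catalog>
  <book><title>XML Developer's Guide</title><updated_at>2024-01-01</updated_at></book>
  <book><title>Dummy XML Developer's Guide</title><updated_at>2024-01-01</updated_at></book>
</catalog>"""

# Only the timestamps changed: this should NOT count as a change.
TOUCHED = OLD.replace("2024-01-01", "2024-02-01")

# A new book was added: this SHOULD count as a change.
EXTENDED = OLD.replace("</catalog>", "  <book><title>Midnight Rain</title></book>\n</catalog>")


def titles(xml_text):
    """Return the text of every <title> element, like "Text (all matches)" with selector `title`."""
    return [el.text for el in ET.fromstring(xml_text).iter("title")]


def title_changes(old_xml, new_xml):
    old, new = titles(old_xml), titles(new_xml)
    return {
        "added": [t for t in new if t not in old],
        "removed": [t for t in old if t not in new],
    }


print(title_changes(OLD, TOUCHED))   # {'added': [], 'removed': []}
print(title_changes(OLD, EXTENDED))  # {'added': ['Midnight Rain'], 'removed': []}
```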

If you would like to also keep the full history of what has changed in the XML document but only be notified when a specific attribute changes, you can also add "Full Page" as the Tracked Element and then add a condition to be notified when the monitored attribute changes.

Daily, Weekly or Monthly Change Monitoring Reports

2026-05-06T14:16:21+00:00

Daily, Weekly or Monthly Change Monitoring Reports

Note: This feature is available on paid plans only.

With PageCrawl.io, you can group your monitors into scheduled briefings that compile every detected change into a single digest, delivered automatically on the cadence each audience expects. Reports turn raw change notifications into something stakeholders actually open and read: a clean, AI-summarized email with the most important items at the top.

Instant alerts on every change flood inboxes until people mute the channel. Scheduled reports solve that without losing the safety net for genuinely urgent items, which still escalate immediately to your channel of choice.

Why Scheduled Reports

A single workspace often serves several audiences with very different appetites for detail:

Marketing wants Monday morning competitor intel.
Legal wants a monthly compliance roundup.
Sales wants daily price movements across competitors.
Product wants weekly competitor product launches.
Your team wants to be paged the moment something critical lands.

Reports let you serve all of these from the same monitor set. Group monitors by tag, folder, or domain, then deliver a tailored briefing to each audience on their preferred schedule. High-priority changes still escalate instantly; everything else lands in the next digest.

When to Use Reports vs Instant Notifications

Scenario	Recommended
Monitoring a handful of critical pages	Instant notifications
Tracking 50+ competitor pages for pricing	Scheduled report (daily or weekly)
Legal/compliance pages that rarely change	Scheduled report (weekly or monthly)
Stock availability that needs immediate action	Instant notifications with escalation
Executive stakeholder updates	Scheduled report with AI summary
Onboarding a non-PageCrawl recipient (CEO, board, client)	Scheduled report (public share link)

You can mix both approaches. Monitors not assigned to any report continue to send instant notifications as usual. Monitors assigned to a report will only appear in digests, unless priority escalation is configured for urgent changes.

How Reports Work

Each report has four moving parts:

Scope. Which monitors are included. Match by tag, folder, domain, specific monitors, or all monitors in the workspace.
Schedule. When the digest is generated and sent. Daily, weekdays only, weekly, monthly, or on-demand.
Recipients. Who gets the email or notification. Workspace members, additional Cc emails, and channel webhooks.
Content. What goes in the digest: AI summary style, importance threshold, failing pages section, escalation rules, attachments.

Each generated digest is stored as a record in the workspace and gets a unique public share link, so anyone with the URL can view it in their browser without a PageCrawl account.

Setting Up Your First Report

Go to Settings > Workspace > Alerts & Reports.
Select the Scheduled Summary Reports tab.
Click Add Report and give it a name, color, and (optionally) an icon.
Pick the Scope: choose tag, folder, domain, specific monitors, or all monitors.
Pick the Schedule and delivery hour.
Add Recipients: workspace members and additional Cc emails. You can also wire in Slack, Teams, Discord, Telegram, or a custom webhook.
Configure Content: AI summary style, importance threshold, failing pages, attachments.
Save. Use Generate now to preview the next digest immediately.

For step-by-step instructions on each option, see the Scheduled Reports setup guide.

Choosing What to Include (Scope)

The scope determines which monitors feed into the digest. Five options are available:

All monitors — every monitor in the workspace. Useful for a single executive summary.
By tag — monitors carrying a specific tag, e.g. #competitors, #pricing, #legal. Easiest way to slice cross-cutting topics.
By folder — monitors inside a folder (and its sub-folders). Best when monitors are already organized hierarchically.
By domain — monitors whose tracked URL matches one or more domains. Useful when you want a per-vendor view.
Specific monitors — hand-picked list. For very small or very high-stakes reports.

Tag and folder scopes are dynamic: any monitor that picks up the tag or moves into the folder later will start appearing in the next digest, with no report change required.

Available Schedule Options

Schedule	Description
Daily	Receive a digest every day at your chosen hour
Weekdays only	Monday through Friday
Weekly	On a specific day of the week
Monthly	On a specific day of the month
On-demand only	Only generated when you manually click "Generate now"

All times are based on your workspace timezone.

Delivery Channels

A single report can ship to multiple channels at once. Each channel can override the workspace defaults if you want this report to use a different webhook or email list.

Email — primary recipients, plus Cc and Bcc lists. Recipients do not need a PageCrawl account.
Slack — posts a formatted message to a channel webhook.
Microsoft Teams — posts to an incoming webhook URL.
Discord — posts to a server webhook.
Telegram — sends to a chat or group via bot token.
Custom webhook — full JSON payload for your own automations or n8n / Zapier flows.

Channels can be enabled or disabled per report. If a channel is disabled or its webhook is missing, that channel is skipped without affecting the others.

AI Executive Summary

Every digest can include an AI-written summary at the top of the report. The summary is generated each time the digest is built, using the latest changes plus your workspace-level focus prompt for context. You choose the style that fits the audience.

Eight summary styles are available:

Headline — one short sentence (max 20 words) that captures the single most important takeaway. Best for chat notifications and SMS-style alerts.
Patterns — a concise paragraph (2-4 sentences) focused on cross-monitor trends, e.g. "three competitors raised prices". The default. Good for general updates.
Action briefing — leads with what the reader should DO (review, respond, monitor, ignore). Best for sales and ops teams.
Detailed executive summary — a thorough multi-paragraph breakdown with section headings, notable individual changes, affected areas, and likely causes. Best for weekly and monthly executive briefings.
Bullets — a markdown bullet list (5-10 bullets) with bolded category labels. Best when scanning matters more than narrative flow.
Changelog — a chronological log, newest first, formatted like a release-notes file. Best for product and engineering audiences.
Risk assessment — groups changes into High / Medium / Low risk with explanations of why each matters. Best for legal, compliance, and security teams.
Brief — a plain-text summary under 280 characters. Designed for chat notifications where formatting is stripped.

You can change the style per report at any time. The next digest will reflect the new style.

AI-Generated Dynamic Title

In addition to the summary, every digest gets a short, content-aware title generated by AI. Examples:

"5 price movements across tracked SKUs"
"2 new pages, 1 redesign, 1 announcement"
"4 policy updates this month"
"3 high-priority competitor changes this week"

The title appears in three places:

The email subject line, prefixed with the report name: Competitor Intel: 3 high-priority competitor changes this week.
The digest header as a sub-heading on the public web view and PDF.
The email body as a quick visual landmark above the AI summary.

Stakeholders can tell at a glance whether this digest is worth opening, without having to scroll.

Priority Escalation

Reports batch changes by design, but some changes are urgent enough that waiting for the next digest is unacceptable, like a competitor dropping prices 30% or a page being taken down. Priority escalation handles this.

When you enable escalation on a report:

Each change is scored by AI for importance.
If a change exceeds the threshold you set (e.g. score ≥ 90), it triggers an immediate notification through your chosen escalation channel: Slack, Teams, email, Discord, Telegram, or webhook.
The change still appears in the next digest, with a label indicating it was already escalated.

This means stakeholders subscribed to the digest channel are not woken up at 3am, but the people who need to act on critical changes are paged the moment they happen.
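
The escalation flow above can be sketched in a few lines of Python. This is only an illustration, not PageCrawl's implementation: the Change class, the route_changes function, and the 0-100 score scale are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """Hypothetical record for one detected change."""
    page: str
    score: int              # AI importance score, assumed to be 0-100
    escalated: bool = False


def route_changes(changes, threshold=90):
    """Split changes into immediate alerts and the digest batch.

    Every change stays in the digest; changes at or above the
    threshold are also flagged and returned for an immediate alert.
    """
    immediate = []
    for change in changes:
        if change.score >= threshold:
            change.escalated = True   # shown as a label in the digest
            immediate.append(change)
    return immediate, list(changes)


changes = [Change("pricing", 95), Change("blog", 40)]
immediate, digest = route_changes(changes)
print([c.page for c in immediate])
print([(c.page, c.escalated) for c in digest])
```

Only the pricing change triggers an immediate alert here, while both changes remain in the digest.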

Importance Threshold and Content Filters

Each report can filter what makes the cut:

Minimum importance threshold — drop everything scored below a certain priority. Useful for executive reports where only important changes belong.
Collapse to latest — if a monitor changed multiple times during the period, show only the most recent change. Avoids cluttering the digest with intermediate states.
Group by domain — present changes grouped by website host, instead of priority. Best for digests that watch many vendors.
Workspace AI focus prompt — a free-text prompt the AI uses to bias importance scoring and summary generation toward your team's specific concerns ("focus on enterprise pricing changes", "ignore design system updates").
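
To make the filters above concrete, here is a minimal Python sketch. The dictionary keys, function names, and the order in which the filters are applied (threshold first, then collapse) are assumptions for illustration and may not match how PageCrawl processes a digest.

```python
from collections import defaultdict
from urllib.parse import urlparse


def filter_changes(changes, min_score=0, collapse_to_latest=False):
    """Apply a minimum importance threshold, then optionally keep only
    the most recent change per monitor.

    Each change is a dict with hypothetical keys:
    'monitor', 'url', 'score', 'timestamp'.
    """
    kept = [c for c in changes if c["score"] >= min_score]
    if collapse_to_latest:
        latest = {}
        for c in kept:
            prev = latest.get(c["monitor"])
            if prev is None or c["timestamp"] > prev["timestamp"]:
                latest[c["monitor"]] = c
        kept = list(latest.values())
    return kept


def group_by_domain(changes):
    """Group changes by website host instead of by priority."""
    groups = defaultdict(list)
    for c in changes:
        groups[urlparse(c["url"]).netloc].append(c)
    return dict(groups)


changes = [
    {"monitor": "A", "url": "https://vendor-a.example/pricing", "score": 80, "timestamp": 1},
    {"monitor": "A", "url": "https://vendor-a.example/pricing", "score": 85, "timestamp": 2},
    {"monitor": "B", "url": "https://vendor-b.example/terms", "score": 30, "timestamp": 3},
]
important = filter_changes(changes, min_score=50, collapse_to_latest=True)
grouped = group_by_domain(important)
print(grouped)
```

With a threshold of 50 and collapsing enabled, only the latest change for monitor A survives, grouped under its host.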

Pages Currently Failing

Most digests focus on what changed. The "Pages Currently Failing" section flips that and shows what isn't being checked successfully right now: monitors stuck on timeouts, blocked by bot protection, returning server errors, or hitting SSL issues. 404s are not included, since broken pages have their own dedicated section with replacement suggestions.

The list shows the page name, the current status (timeout, blocked, server error, etc.), and how long ago the last attempt happened. It's a single block at the bottom of the digest, off by default for new reports but easy to enable per report.

This is the difference between thinking your monitor is healthy because no email arrived, and knowing it's silently broken.

Comments and Inline Feedback

Every change in the digest carries a thumbs-up / thumbs-down pair, plus a comment field. Recipients (whether they have a PageCrawl account or not) can:

Mark a change as important (thumbs up). The AI uses this signal to bias future scoring on similar changes.
Mark a change as noise (thumbs down). Future similar changes get a lower priority and may be filtered out automatically.
Leave a comment on a specific change or on the digest as a whole. Other recipients see comments inline when they open the digest.

For teams that want a structured workflow, enable review board actions. Each change shows a board selector (To Review / Flagged / Reviewed) so the team can triage changes directly inside the digest without opening the dashboard.

Public Share Links

Every digest gets a unique URL that anyone can open in their browser without signing in. The URL is included in every email, and you can copy it from the digest header for pasting into Slack, a Notion doc, or a board deck.

Sharing options:

Public — anyone with the link can view.
Authenticated — only signed-in workspace members.

Links can be rotated (invalidates the old URL) or revoked at any time. Default expiry is 30 days.

Print, PDF, and Excel Export

Each digest is laid out for print. Open it in your browser, hit print, and you get a clean, paginated PDF for board decks, audit archives, or quarterly reviews.

Beyond print:

Excel export — a spreadsheet with every change, score, monitor, URL, timestamp, and AI summary, plus an overview sheet. Available as an email attachment (toggle per report) or on-demand from the digest page.
CSV — same data in a simpler format for analytics pipelines.
PDF — a one-click download from the digest page.

The Excel attachment is automatically skipped if the file would be too large for SMTP delivery, so the email itself never bounces.

Managing Recipients

Recipients live on the report, not the workspace. You can mix:

Workspace members — picked from a dropdown of users in the workspace.
Additional Cc Emails — any email address, no PageCrawl account required. The address must be verified once before it can receive reports.
Channel routes — Slack channels, Teams channels, Discord servers, Telegram chats, custom webhooks.

Each recipient slot can be set as To, Cc, or Bcc. Use Bcc when you want to send to a long list without exposing the recipient list (e.g., a board distribution).

If a report is misconfigured (no recipients, broken webhook), the workspace owner is notified after the first failure. Subsequent failures don't re-notify, so a broken report can't spam the owner.

Real-World Examples

Marketing — weekly competitor briefing. Scope: tag #competitors. Schedule: weekly, Monday 8am. Recipients: marketing team + CMO. Style: detailed executive summary. Escalation: enabled, threshold 90, channel Slack #competitor-intel.

Sales — daily pricing watch. Scope: tag #pricing. Schedule: daily, weekdays only, 7am. Recipients: VP Sales, RevOps lead. Style: bullets. Importance threshold: 50. Group by domain: on.

Legal — monthly compliance roundup. Scope: folder /vendor-legal. Schedule: monthly, 1st of the month, 9am. Recipients: General Counsel, compliance@. Style: risk assessment. Excel attachment: on.

Product — weekly launch radar. Scope: domains competitor1.com, competitor2.com, competitor3.com. Schedule: weekly, Friday 4pm. Recipients: product team + CEO. Style: action briefing. Priority escalation: on, threshold 80, Slack #product-radar.

Executive — monthly board pack. Scope: all monitors. Schedule: monthly, last Friday. Recipients: board distribution (Bcc). Style: detailed executive summary. Failing pages: on. Public share link: rotated each month.

Best Practices

Start with one report per audience, not per monitor set. Reports are cheap to add. Adding more later is easier than collapsing too-granular ones.
Use tags to slice across folders. Tags don't conflict with your folder hierarchy and can express cross-cutting concerns (#enterprise-watch, #regulatory).
Match the AI summary style to the audience. Headlines for execs, bullets for ops, risk assessment for legal, action briefing for sales.
Always enable priority escalation on the operational reports. Even daily digests miss things that need a same-hour response.
Use Cc for accountability, Bcc for distribution. Cc keeps everyone visible; Bcc protects long lists.
Test with "Generate now" before going live. A dry run catches misconfigured webhooks and unexpected scope before stakeholders see it.
Rotate the public share link if you suspect leakage. Old links stop working immediately.

How Reports Interact with Instant Notifications

When a monitor is assigned to any scheduled report, its instant workspace-level notifications are bypassed. Changes are collected and delivered in the next digest instead.

Exceptions: escalation alerts still fire immediately, and public subscriber notifications are unaffected. If you delete or disable a report, the monitors it covered go back to receiving instant notifications automatically.
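
The routing rules above can be summarized as a small decision function. This is a sketch with hypothetical parameter names, and it assumes that public subscriber notifications are normally sent right away.

```python
def delivery_mode(in_active_report, is_escalation=False, is_public_subscriber=False):
    """Return how a change notification is delivered.

    Parameter names are hypothetical; the rules follow the text above.
    """
    if is_escalation or is_public_subscriber:
        return "instant"      # exceptions are not held back by a report
    if in_active_report:
        return "digest"       # collected for the next scheduled report
    return "instant"          # no active report: normal instant notification


print(delivery_mode(in_active_report=True))
print(delivery_mode(in_active_report=True, is_escalation=True))
print(delivery_mode(in_active_report=False))
```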

Plan Limits

Standard plans include up to 2 reports. Higher-tier plans include unlimited reports plus on-demand generation, the full eight summary styles, custom AI focus prompts per report, the failing-pages section, and Excel attachments.

For exact limits, see the pricing page.

Check Scheduling

2026-05-06T14:16:21+00:00

Check Scheduling

Control when PageCrawl runs checks on your monitored pages by setting active days, hours, and check frequency. This is configured per workspace.

Available on paid plans.

Check Frequency

Set how often each page is checked. Available frequencies depend on your plan:

Plan	Minimum Interval
Free	Every hour
Standard	Every 15 minutes
Enterprise	Every 5 minutes
Ultimate	Every 2 minutes

Full frequency options: every 2 min, 3 min, 5 min, 15 min, 30 min, 45 min, hourly, every 2 hours, 3 hours, 6 hours, twice daily, daily, every 2 days, every 3 days, weekly, every 2 weeks, and monthly.
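
As a rough illustration of how the plan minimum narrows the list of options, the sketch below encodes the table and the frequency list in minutes. It assumes that a plan can pick any option at or above its minimum interval, that "twice daily" means every 12 hours, and that "monthly" means 30 days; the names are made up for the example.

```python
# Minimum check interval per plan, in minutes (from the table above).
PLAN_MIN_INTERVAL = {"Free": 60, "Standard": 15, "Enterprise": 5, "Ultimate": 2}

# All frequency options from the list above, in minutes.
# "Twice daily" is taken as 12 hours and "monthly" as 30 days (assumptions).
FREQUENCIES = [2, 3, 5, 15, 30, 45, 60, 120, 180, 360, 720, 1440,
               2880, 4320, 10080, 20160, 43200]


def allowed_frequencies(plan):
    """Return the intervals (in minutes) a plan can select."""
    minimum = PLAN_MIN_INTERVAL[plan]
    return [f for f in FREQUENCIES if f >= minimum]


print(allowed_frequencies("Standard")[:3])   # [15, 30, 45]
```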

Workspace Schedule

Limit checks to specific days and times for an entire workspace:

Go to Settings > Workspace > Scheduling
Select which days of the week to run checks (Monday through Sunday)
Set the active hours (start and end time)
Hours are automatically converted to UTC based on your workspace timezone

When outside the scheduled hours or on inactive days, PageCrawl pauses checks for all pages in the workspace. Checks resume automatically when the next active period begins.
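
The schedule check can be pictured roughly like this. The sketch compares the current UTC time against local active hours, which is equivalent to converting the hours to UTC first. The function name and parameters are hypothetical, a fixed UTC offset stands in for a real timezone, and active windows that cross midnight are not handled.

```python
from datetime import datetime, time, timedelta, timezone


def is_active(now_utc, utc_offset_hours, active_days, start, end):
    """Return True if checks should run at `now_utc`.

    active_days uses Monday=0 ... Sunday=6; start and end are local
    times in the workspace timezone. A fixed UTC offset keeps the
    sketch dependency-free; real timezones also need daylight-saving
    rules (for example via the standard zoneinfo module).
    """
    local = now_utc.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    return local.weekday() in active_days and start <= local.time() < end


# Workspace at UTC+2, active Monday to Friday, 09:00-17:00 local time.
weekdays = {0, 1, 2, 3, 4}
monday_0730_utc = datetime(2026, 5, 4, 7, 30, tzinfo=timezone.utc)   # 09:30 local
monday_1600_utc = datetime(2026, 5, 4, 16, 0, tzinfo=timezone.utc)   # 18:00 local
print(is_active(monday_0730_utc, 2, weekdays, time(9), time(17)))    # True
print(is_active(monday_1600_utc, 2, weekdays, time(9), time(17)))    # False
```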

Email Digest

Instead of receiving individual notifications for each change, you can configure a daily email digest:

Go to Settings > Workspace > Notifications
Enable the daily digest
Choose the day and time for delivery

The digest summarizes all changes detected since the last digest was sent.

Bulk Edit - Change frequency and schedule settings across multiple pages
Check Limits - Understand plan check quotas
Advanced Configuration - Power User mode and per-page settings

PageCrawl.io + Zapier integration

2026-05-06T14:16:21+00:00

PageCrawl.io + Zapier integration

The integration of PageCrawl.io with Zapier takes web monitoring to the next level by automating tasks and connecting your web monitoring data to countless other applications. In this guide, we'll explore how to set up this powerful integration and unlock a world of possibilities.

Why Integrate PageCrawl.io with Zapier?

Zapier is an automation platform that connects your favorite apps and services, allowing them to work together seamlessly. By integrating PageCrawl.io with Zapier, you can:

Automate Workflow: Create "Zaps" to automate tasks triggered by changes detected by PageCrawl.io.
Extend Integration: Connect PageCrawl.io data to a vast array of other applications, enhancing its usefulness and allowing for more extensive analysis.
Improve Efficiency: Eliminate manual data entry and automate processes, saving time and reducing the risk of errors.

Setting Up PageCrawl.io + Zapier Integration

Here's a step-by-step guide to help you integrate PageCrawl.io with Zapier and enhance your web monitoring capabilities:

Step 1: Sign in to PageCrawl.io

If you're not already a PageCrawl.io user, sign up for an account.
Otherwise, sign in to your existing account.

Step 2: Configure A Page To Monitor

Set up the monitoring settings for the web page you're interested in tracking. Customize the elements you want to monitor and your notification preferences.

Step 3: Enable Zapier Integration

Visit the Integrations page and click Setup on the Zapier integration. In the modal that opens, click Open on Zapier to set up the Zapier + PageCrawl.io integration.

Step 4: Create a Zap in Zapier

Create a new Zap by clicking "Make a Zap".
Search for "PageCrawl.io" and select it as your trigger app.
Choose the trigger event, such as "New Change Detected".

Step 5: Set Up Zap Actions

Define the actions you want to take when a trigger event occurs. This can include sending notifications, updating other apps, or performing custom actions.

Step 6: Activate Your Zap

Once you're satisfied with the setup, activate your Zap, and it will start automating tasks based on changes detected by PageCrawl.io.

n8n Integration - Open-source workflow automation
Webhook Integration - Send change data to any endpoint
API & Webhooks - Programmatic access

Store Website Changes on Google Sheets

2026-05-06T14:16:21+00:00

Store Website Changes on Google Sheets

Managing and tracking changes on websites is essential for various purposes, from monitoring competitors to ensuring your web services are running smoothly. PageCrawl.io simplifies this process by allowing you to effortlessly monitor web page changes and integrate the data directly into Google Sheets. In this guide, we'll explore how to set up this powerful integration to store website change history efficiently.

Why Store Website Change History on Google Sheets?

Google Sheets offers a versatile and collaborative platform for storing and analyzing data. By integrating PageCrawl.io with Google Sheets, you can keep all your web page change history in one place for easy access and analysis.

Setting Up PageCrawl.io Integration with Google Sheets

Here's a step-by-step guide to help you integrate PageCrawl.io with Google Sheets and start storing website change history effortlessly:

Log in to your PageCrawl account.
Navigate to the Settings -> Integrations section.
Click Setup on the Google Sheets integration. In the modal that opens, select your Google account and enter a spreadsheet name, then click Connect.
Once new changes are detected, a new row is automatically created in your Google Sheets document.

Common XPath Selectors to Use For Monitoring Websites Changes

2026-05-06T14:16:21+00:00

Common XPath Selectors to Use For Monitoring Websites Changes

XPath selectors are powerful tools that help you identify and extract specific elements on a web page. In this guide, we'll explore common XPath selectors that you can use when monitoring websites for changes to make your web monitoring efforts more effective.

Why Not CSS Selector?

CSS selectors are favored by many web developers because they are easy to learn if you already know CSS syntax. XPath selectors, on the other hand, offer greater power and flexibility, such as the ability to find elements that contain specific text, but the learning curve can be steeper. If you already know CSS, you should be able to use it for most use cases. If you know neither, we recommend starting with XPath, since it is more flexible.

XPath Cheat sheet

Here, you'll find a convenient 'cheat sheet' that comprehensively covers the most commonly used XPath selectors for your reference. We suggest taking a quick look through this list before proceeding to the Common XPath Selectors for Web Monitoring section below.

HTML Basics

Before we start, you should familiarize yourself with some fundamental concepts to better understand the terminology and functionality. Here are a few key terms:

Attribute: An attribute provides additional information about an HTML element. It is always specified in the start tag of an element and usually comes in name/value pairs like name="value". For example, in <a href="https://example.com">, href is an attribute name and https://example.com is its value.

Element: An HTML element is an individual component of an HTML document or web page. It is written with a start tag, with an optional end tag, and content in between. For example, in <p>This is a paragraph</p>, <p> is the start tag, </p> is the end tag, and This is a paragraph is the content.

ID: The id attribute is used to specify a unique id for an HTML element. You cannot have more than one element with the same id in an HTML document. It is used for identifying and targeting the element with CSS and JavaScript. For example, <div id="header"> defines a division with a unique id of header.

Class: The class attribute is used for specifying a class name for an HTML element. Unlike the id attribute, the same class can be used on multiple elements. This is useful for applying the same styling or behavior to different elements. For example, <span class="highlight"> assigns the highlight class to a span element, which can be targeted with CSS or JavaScript.

How to test the selector?

You might wonder where you can try a selector before pasting it into PageCrawl.io. Open your browser console and use the following commands to test it.

XPath

$x('//a')

CSS

document.querySelectorAll('a')

XPath Selector Basics

//: Selects all matching elements anywhere in the document.
/: Selects from the root element.
element: Selects elements with the specified name.
[@attribute]: Selects elements with the specified attribute.
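
If you prefer to experiment outside the browser, the basics above can also be tried in Python. The sketch below uses the standard library's xml.etree.ElementTree, which supports only a limited subset of XPath and requires well-formed markup, so treat it as a way to build intuition rather than a replacement for testing in the browser console. The sample page is made up.

```python
import xml.etree.ElementTree as ET

# A small, well-formed sample page (made up for illustration).
page = ET.fromstring(
    "<html><body>"
    "<a href='https://example.com'>Home</a>"
    "<a>No link</a>"
    "<div class='content'><a href='/docs'>Docs</a></div>"
    "</body></html>"
)

# ElementTree paths are relative, so '//a' is written './/a'.
all_links = page.findall(".//a")             # every <a> anywhere
with_href = page.findall(".//a[@href]")      # only <a> with an href attribute
top_level = page.findall("./body/a")         # <a> directly under <body>

print(len(all_links), len(with_href), len(top_level))  # 3 2 2
```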

Advanced XPath Selectors

[@attribute='value']: Selects elements with a specific attribute value.
[@attribute!='value']: Selects elements with an attribute value not equal to 'value'.
[starts-with(@attribute,'prefix')]: Selects elements with an attribute starting with 'prefix'.
[substring(@attribute, string-length(@attribute) - string-length('suffix') + 1) = 'suffix']: Selects elements with an attribute ending with 'suffix'. Note: there is no direct ends-with() function in XPath 1.0, so this workaround is needed.
[contains(@attribute,'substring')]: Selects elements with an attribute containing 'substring'.
[@attribute1='value1' and @attribute2='value2']: Selects elements that meet multiple attribute conditions.
[@attribute1='value1' or @attribute2='value2']: Selects elements that meet at least one of the attribute conditions.
not(expression): Negates a condition.

Text and Content Selection

text(): Selects the text content of an element.
contains(text(),'substring'): Selects elements containing specific text.
starts-with(text(),'prefix'): Selects elements with text starting with 'prefix'.
substring(text(), string-length(text()) - string-length('suffix') + 1) = 'suffix': Selects elements with text ending with 'suffix'. Note: ends-with() is an XPath 2.0 function and is NOT supported in browsers (which only support XPath 1.0). Use this substring() workaround instead.
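
The index arithmetic in the substring() workaround is easier to follow when written out. The Python sketch below mimics it step by step; the helper names are made up, and the only XPath fact relied on is that string positions are 1-based.

```python
def xpath_substring(s, start):
    """Mimic XPath 1.0 substring(s, start) for integer start values:
    positions are 1-based, unlike Python's 0-based slices."""
    return s[max(start, 1) - 1:]


def ends_with(s, suffix):
    """The substring() workaround, written out step by step."""
    start = len(s) - len(suffix) + 1      # 1-based position where the suffix would begin
    return xpath_substring(s, start) == suffix


print(ends_with("report-2025.pdf", ".pdf"))   # True
print(ends_with("report-2025.pdf", ".doc"))   # False
```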

Navigation and Hierarchy

/parent::element: Selects the parent of the current element.
/child::element: Selects the children of the current element.
/ancestor::element: Selects ancestors of the current element.
/descendant::element: Selects descendants of the current element.
[position()=1]: Selects the first matching element.
[last()]: Selects the last matching element.
[position()>2]: Selects elements after the first two.

Wildcards and Dynamic Selection

*: Selects all elements.
element[*]: Selects elements with at least one child element.
element[@*]: Selects elements with at least one attribute.
element[contains(@attribute,'value')]: Selects elements with attributes containing 'value'.

Functions

count(expression): Counts the number of matching elements.
sum(expression): Sums numeric values within matching elements.
concat(string1, string2): Combines two strings.
substring(string, start, length): Extracts a substring.
normalize-space(string): Removes leading/trailing spaces and collapses internal spaces.
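
To see what normalize-space() does to messy text, here is an equivalent written in Python. It is a sketch for illustration only; the function name is made up.

```python
import re


def normalize_space(s):
    """Python equivalent of XPath normalize-space(): trim the ends and
    collapse each run of spaces, tabs, and newlines into one space."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


print(repr(normalize_space("  In   stock \n today ")))   # 'In stock today'
```

This is handy when the text you want to match is surrounded by indentation or line breaks in the page source.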

Common XPath Selectors for Web Monitoring

Here are some common XPath selectors that you can employ when monitoring websites for changes. Initially, basic XPath selectors will be covered, and we will then proceed to more advanced examples.

1. Selecting Text

XPath allows you to target specific text elements on a webpage, which is useful for tracking changes in content, headlines, or paragraphs. For example:

//h1       // Selects all h1 headers on the page.
//p        // Selects all paragraph elements.
//div[@class='content'] // Selects text within div elements with a specific class.

2. Tracking Links

XPath selectors help you monitor links, whether you want to track all links on a page, external links, or links with specific text. For instance:

//a[@href]                  // Selects all links with an href attribute.
//@href[not(contains(.,'example.com'))] // Selects href values that do not contain 'example.com', i.e. external links (replace 'example.com' with the target domain).
//a[contains(text(),'Download')]   // Selects links with specific anchor text, case-sensitive.

To view more examples with links, visit Tracking links with text tutorial.

3. Checking Images

To monitor images on a webpage, you can use XPath selectors to identify images by their source (src) attribute or alt text. For example:

//img               // Selects all image elements.
//img/@src          // Selects the src attribute of all images.
//img[contains(@alt,'logo')] // Selects images with specific alt text.

4. Handling Tables

XPath selectors are particularly useful for extracting data from tables, which are commonly used on websites for displaying structured information. For example:

//table                // Selects all tables on the page.
//table//tr             // Selects all table rows.
//table//tr/td[2]       // Selects the second column (td) in all rows.
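
As a quick illustration of the column selector, the Python sketch below runs the same idea against a made-up table using the standard library's xml.etree.ElementTree, which supports positional predicates such as td[2]. Positions are 1-based, so td[2] is the second cell in each row.

```python
import xml.etree.ElementTree as ET

# Sample table (made up for illustration).
table = ET.fromstring(
    "<table>"
    "<tr><td>Basic</td><td>$8</td></tr>"
    "<tr><td>Pro</td><td>$30</td></tr>"
    "</table>"
)

# Equivalent of //table//tr/td[2], evaluated from the <table> element.
second_column = [td.text for td in table.findall(".//tr/td[2]")]
print(second_column)   # ['$8', '$30']
```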

5. Monitoring Specific Elements

You can target elements with specific attributes or attributes containing certain values using XPath selectors. For instance:

//*[@id='specificId'] // Selects elements with a specific ID attribute.
//*[@class='highlight'] // Selects elements with a specific class attribute.

6. Monitoring Elements Whose Class or ID Contains Specific Text

To monitor elements when their class or ID contains a part of text, you can use XPath selectors with the contains() function. For example:

//*[contains(@class, 'partial-text')] // Selects elements with a class containing 'partial-text'.
//*[contains(@id, 'partial-text')]    // Selects elements with an ID containing 'partial-text'.
//input[starts-with(@name, 'user_')] // Selects input elements with names starting with 'user_'.
//input[contains(@id, 'search')]  // Selects input elements with IDs containing 'search'.
//button[contains(@class, 'btn-')] // Selects buttons with class names containing 'btn-'.

This approach is particularly valuable when CSS classes include unpredictable or random text fragments.

For instance, suppose you want to extract the text 'Quality Choice' from an element whose CSS class, such as productTile_urgencyMessaging__V5DTS, includes a suffix like __V5DTS that is prone to change with each website update.

To avoid having to update the selector each time the website updates, you can use the XPath contains() function to select the element.

//*[contains(@class, 'productTile_urgencyMessaging')] // Retrieve 'Quality Choice' text
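
The effect of contains() on a class with a generated suffix can be emulated in Python. Because the standard library's ElementTree does not support the contains() function, the sketch below scans the elements manually; the markup and the helper name are made up for the example.

```python
import xml.etree.ElementTree as ET

# Sample markup (made up) with build-generated class suffixes.
page = ET.fromstring(
    "<div>"
    "<span class='productTile_urgencyMessaging__V5DTS'>Quality Choice</span>"
    "<span class='productTile_price__K2P9X'>$19.99</span>"
    "</div>"
)


def find_by_class_fragment(root, fragment):
    """Emulate //*[contains(@class, fragment)] by scanning every element."""
    return [el for el in root.iter() if fragment in el.get("class", "")]


matches = find_by_class_fragment(page, "productTile_urgencyMessaging")
print([el.text for el in matches])   # ['Quality Choice']
```

The match keeps working if the suffix changes from __V5DTS to something else, because only the stable part of the class name is compared.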

7. Using Logical Operators

XPath supports logical operators for combining conditions. This is particularly useful for complex selections. For example:

//a[@class='external' or @class='external-link'] // Selects links with class 'external' or 'external-link'.
//div[@class='important' and contains(text(),'Alert')] // Selects divs with class 'important' containing 'Alert'.

8. Complex Expressions

You can create complex XPath expressions by combining multiple conditions and functions. This provides immense flexibility in your selections. For example:

//div[@class='content' and (contains(text(),'Important') or contains(text(),'Alert'))]
//table[not(@class='hidden')]/tbody/tr[td[2]='Completed']/td[3]
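
The second expression packs several steps into one line. The Python sketch below walks through the same steps one at a time on a made-up page, with each XPath fragment noted in a comment. It is an emulation for illustration, not an XPath engine.

```python
import xml.etree.ElementTree as ET

# Sample page (made up): one visible table and one hidden table.
page = ET.fromstring(
    "<body>"
    "<table><tbody>"
    "<tr><td>Order 1</td><td>Completed</td><td>2025-02-20</td></tr>"
    "<tr><td>Order 2</td><td>Pending</td><td>2025-02-21</td></tr>"
    "</tbody></table>"
    "<table class='hidden'><tbody>"
    "<tr><td>Order 3</td><td>Completed</td><td>2025-02-22</td></tr>"
    "</tbody></table>"
    "</body>"
)

results = []
for table in page.iter("table"):                 # //table
    if table.get("class") == "hidden":           # [not(@class='hidden')]
        continue
    for row in table.findall("./tbody/tr"):      # /tbody/tr
        cells = row.findall("./td")
        if len(cells) >= 3 and cells[1].text == "Completed":   # [td[2]='Completed']
            results.append(cells[2].text)        # /td[3]

print(results)   # ['2025-02-20']
```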

Using XPath Selectors in PageCrawl.io

To use these XPath selectors for website monitoring in PageCrawl.io:

Log in to your PageCrawl account.
Click on Track New Page, fill in the page URL, then select Tracked Elements to track.
Select "Text" as the tracked element, then specify the XPath selector to track.
Save and start monitoring the page for changes.

Common Problems With Visual Selector

2026-05-06T14:16:21+00:00

Common Problems With Visual Selector

Occasionally, you might encounter challenges when using the Visual Selector tool. This guide outlines some common problems and provides solutions to help you resolve them.

Problem: Page Styles Are Not Displayed Properly

Sometimes a page loads but is missing some or all of its styles or elements.

Solution: To work around this issue, try enabling or disabling JavaScript. If that does not help, you can always copy and paste the selector from your browser window.

Problem: Page Doesn't Load

In some instances, the Visual Selector tool may have difficulty loading certain pages. Our development team is continually working to enhance its compatibility. You may contact support to report a page that is not working.

Solution: If you encounter this issue, you can try pasting the selector directly from your web browser to work around the problem.

Problem: Visual Selector-Generated Selectors Frequently Change

The Visual Selector tool may generate CSS selectors that become obsolete when a website updates. In certain cases, websites intentionally modify CSS selectors or add suffixes to thwart page monitoring tools like PageCrawl.

Solution: For example, a selector like .productTile_urgencyMessaging__V5DTS might include a suffix like __V5DTS that is prone to change. To avoid having to update the selector each time the website changes, you can use the XPath contains() function to match on the stable part of the class name:

//*[contains(@class, 'productTile_urgencyMessaging')]

Visit the XPath tutorial for common selectors for more information on how to write an XPath selector yourself.

Problem: Uncertainty About Selector Method to Choose

We offer four selector generation methods:

CSS Selector: A short and unique CSS selector.
CSS (Other): An alternative CSS selector generation method that may produce different results on some pages.
Relative XPath: A short and unique XPath selector. XPath is more flexible than CSS.
Absolute XPath: A longer XPath that is more likely to break when page contents change significantly.

By default, you can use the CSS selector method. In some cases, generated CSS may be more effective on certain websites, while generated XPath works better on others. If you have expertise in writing CSS or XPath selectors, you have the flexibility to choose your preferred method and optimize it as necessary.

Looking to learn how to write an XPath selector yourself or explore common XPath selectors? Check out our XPath tutorial for common selectors. As a tip, you can also ask ChatGPT to help you create a CSS/XPath selector.

Complete Guide to Reducing False Positive Notifications When Monitoring Websites for Changes

2026-05-06T14:16:21+00:00

Complete Guide to Reducing False Positive Notifications When Monitoring Websites for Changes

False positive notifications can be frustrating when monitoring websites. These alerts signal changes that are either irrelevant or nonexistent, leading to wasted time and reduced efficiency.

When using PageCrawl to monitor website changes, the rate of false-positive alerts is typically low if pages are correctly configured. However, some detected changes may not be relevant to your specific monitoring needs. This comprehensive guide will show you how to effectively reduce unnecessary alerts and ensure you only receive notifications for meaningful changes.

1. Choose the Right Element to Track

Selecting the wrong type of element to monitor is one of the most common causes of false positives. With multiple monitoring options available, it's easy to get overwhelmed, especially if you're new to website monitoring.

Getting Started

Begin by tracking the text of the full page. This approach works best as a starting point for most monitoring scenarios, particularly when you need to monitor a large number of websites. If you notice frequent false positives, you can always revisit your setup and focus on specific page sections instead.

Optimizing Full Page Text Tracking

Monitoring Content Only is the first step to reduce false positives. This option filters out common page elements like headers, navigation menus, sidebars, and footers, focusing only on the main content area of the page. It's an effective way to eliminate noise from less relevant sections while still capturing most important content changes.

Reader mode takes content filtering a step further, similar to the reader mode you may have used on your phone. This mode monitors only the primary article text, using advanced algorithms to identify and extract the core content while filtering out everything else.

Reader mode is more restrictive than "Content Only" and works best for:

News articles and blog posts with clear article structure
Documentation pages with structured content
Research papers and academic content
Press releases and announcements
Tutorial and how-to articles
Terms of service and privacy policy pages
Legal documents and policy updates

However, Reader mode may not work well on:

Landing pages with mixed content types
E-commerce product pages with specifications, reviews, and pricing
Dashboard pages with multiple data sections
Pages with pricing tables, feature lists, or comparison charts
Forum discussions or comment sections
Complex layouts with multiple content blocks

Note: If you find that important content changes are being missed, consider switching back to "Content Only" for broader coverage.

When to Be More Selective

If tracking "Content Only" or "Reader mode" still results in unnecessary notifications, switch to the "Text" tracked element type and use our "Visual Selector" (click on the blue button) to pinpoint the exact area you want to monitor. Be aware that significant page redesigns can cause these selectors to stop working.

Advanced Tips:

AI Suggest feature: You may use "AI Suggest" when adding a new page to monitor. Describe what you want to monitor (e.g., "product price" or "availability status"), and PageCrawl's AI will suggest an optimal monitoring configuration for you.
Manual selectors: For maximum precision, manually create CSS or XPath selectors to track specific sections of the page. This approach works best for users with a technical background, but you can also use tools like ChatGPT to craft selectors by pasting the relevant HTML code.

2. Filter Out Irrelevant Updates

Websites frequently undergo minor updates, such as date changes, without substantial alterations to their content. These small updates can create unnecessary alerts that distract from meaningful changes. Here's how to avoid them.

Ignore Repeatedly Changing Text

In Timeline, when reviewing detected changes, you can select irrelevant text and ignore any line that contains the selected text. For example, if a page has a section with a latest news headline like "Latest News: Bitcoin has reached a new all-time high," you can select "Latest News" and all lines containing this text will be ignored in future change detections. If you monitor multiple pages on the same website, this will be applied to all pages with the same domain name.

Alternatively, you can add an "Ignore Text" condition or create a global filter (update your team settings) to ignore it across all pages. Use % as a wildcard to indicate that any line containing a %specific word% or sentence should be ignored.
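
To illustrate how a % wildcard pattern might be matched against lines of text, here is a small Python sketch. It assumes that % matches any run of characters (as in SQL LIKE) and that the pattern is compared against the whole line; PageCrawl's exact matching rules may differ, and the function names are made up.

```python
import re


def wildcard_to_regex(pattern):
    """Turn a %-wildcard pattern into a compiled regex.
    Assumption: % matches any run of characters, as in SQL LIKE."""
    parts = [re.escape(p) for p in pattern.split("%")]
    return re.compile(".*".join(parts))


def drop_ignored(lines, pattern):
    """Remove lines that fully match the wildcard pattern."""
    rx = wildcard_to_regex(pattern)
    return [line for line in lines if not rx.fullmatch(line)]


lines = [
    "Latest News: Bitcoin has reached a new all-time high",
    "Pricing starts at $8/mo",
]
print(drop_ignored(lines, "%Latest News%"))   # ['Pricing starts at $8/mo']
```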

Remove Specific Page Elements

If a specific page area keeps triggering change detections, add a "Remove page element" action and select an area to suppress it completely.

Remove Dates

Use the "Remove dates" action to replace dates with placeholders like [DATE REMOVED]. This prevents alerts for irrelevant updates like "updated 3 minutes ago" or publication timestamps such as "Updated at: 2025-02-25" that change frequently even when nothing was updated on the page.
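
The idea behind date removal can be sketched as follows. The two regular expressions below are illustrative assumptions covering only ISO dates and simple "N minutes ago" phrases; the formats PageCrawl actually recognizes are not documented here.

```python
import re

# Two illustrative patterns only; real pages use many more date formats.
ISO_DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
RELATIVE = re.compile(r"\b\d+ (?:second|minute|hour|day)s? ago\b")


def remove_dates(text, placeholder="[DATE REMOVED]"):
    """Replace recognised dates so they no longer register as changes."""
    text = ISO_DATE.sub(placeholder, text)
    return RELATIVE.sub(placeholder, text)


before = remove_dates("Updated at: 2025-02-25 (updated 3 minutes ago)")
after = remove_dates("Updated at: 2025-02-26 (updated 9 minutes ago)")
print(before)            # Updated at: [DATE REMOVED] (updated [DATE REMOVED])
print(before == after)   # True
```

Once the dates are replaced, the two snapshots compare as identical, so no change is reported.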

Set a Change Threshold

You can configure a threshold to be alerted only when significant changes occur (e.g., when more than 1% of the page content changes). Before setting the threshold, review historic changes in Timeline to avoid setting it too high and missing important updates.
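
One way to picture a percentage threshold is shown below. The sketch estimates how much of the text differs using Python's difflib; this metric and the function names are assumptions for illustration, and PageCrawl may measure change differently.

```python
import difflib


def percent_changed(old, new):
    """Rough share of the text that differs, as a percentage.
    Uses difflib's similarity ratio; PageCrawl's own metric may differ."""
    # autojunk=False keeps difflib from discarding frequent characters
    # in long, repetitive text, which would distort the ratio.
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    return (1.0 - matcher.ratio()) * 100


def should_alert(old, new, threshold_percent=1.0):
    """Alert only when more than threshold_percent of the text changed."""
    return percent_changed(old, new) > threshold_percent


old = "Plan price: $8 per month. " * 20
small_edit = old.replace("$8", "$9", 1)
print(should_alert(old, small_edit))                 # False: well under 1%
print(should_alert(old, "Service discontinued."))    # True: almost everything changed
```

The first case shows the risk mentioned above: a single price change on a long page can fall below a 1% threshold.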

Ignore Numbers

If numeric changes aren't relevant to you, you can add an "Ignore numbers" condition in the "Conditions & Filters" section to prevent number changes from triggering change detections. This is particularly useful for pages with counters, view counts, or other metrics that change frequently.
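
Conceptually, ignoring numbers means comparing two snapshots after the numbers have been masked. The Python sketch below shows that idea with a simple, assumed number pattern; it is not PageCrawl's implementation.

```python
import re

# Matches integers and numbers with . or , separators, e.g. 1,204 or 3.5
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def without_numbers(text):
    """Replace every number with a fixed token before comparing."""
    return NUMBER.sub("#", text)


def changed_ignoring_numbers(old, new):
    """Report a change only if something other than a number differs."""
    return without_numbers(old) != without_numbers(new)


print(changed_ignoring_numbers("1,204 views", "1,310 views"))    # False
print(changed_ignoring_numbers("1,204 views", "1,204 likes"))    # True
```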

3. Let AI Help Reduce False Positives

PageCrawl uses AI to analyze every detected change and help you focus on what matters most.

How AI Analysis Works

When a change is detected, our AI:

Summarizes the change in plain language so you can quickly understand what happened
Assigns a priority score to indicate how important the change likely is
Sorts your notifications so the most significant changes appear first

Provide Feedback on Changes

Use the feedback buttons to tell us which changes matter to you:

Thumbs up: This change is useful or important
Thumbs down: This change is noise or irrelevant

You can provide feedback:

On the page view when reviewing changes
Directly from email notifications using the quick-action links

4. Handling Dynamic Content

Dynamic websites load or update parts of their content after the initial page load. For example, prices, stock availability, or user-specific recommendations might load dynamically, leading to unnecessary notifications. Here's how to handle these scenarios.

Expand Collapsed Sections and Hidden Content

In "Full-page text" mode, PageCrawl captures only the text that is visible on the page. This can be a problem if the page contains collapsible sections (accordions, panels, etc.) whose content is revealed only when clicked.

To address this, add the "Reveal hidden text" action, which will automatically expand any collapsed sections on the page before capturing content.

Wait Until Page is Fully Loaded

PageCrawl waits until the page is fully loaded. However, in some situations, certain page elements only appear after additional time or after specific actions are executed (clicking, form submission, redirects, etc.).

You can add wait actions to ensure the page is completely ready before capturing content. Multiple "Wait" actions are available:

"Wait for Text to appear": Waits until specific text appears on the page.
"Wait for Text to disappear": Waits until specific text disappears from the page.
"Wait for page element to appear": Waits for a specific page element to become visible.
"Wait for Redirect": Waits for page redirects to complete. This is especially helpful when redirects are not immediate and take longer to process.
"Wait for Seconds": Waits between 1 and 9 seconds (least recommended option).

Note: Each wait action waits up to 15 seconds before continuing (10 seconds for redirect waits). To avoid unnecessarily long wait times, each subscription tier has its own timeout limit: Free (45 seconds), Standard (90 seconds), Enterprise (180 seconds), Ultimate (180 seconds). If loading takes longer than the timeout limit, the check fails with a timeout error.
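
The difference between a condition-based wait and a fixed "Wait for Seconds" can be sketched as a polling loop that returns as soon as the condition holds and gives up after a deadline. The function name, polling interval, and callback are assumptions for this example; only the 15-second default comes from the note above.

```python
import time


def wait_until(condition, timeout_s: float = 15.0, poll_s: float = 0.25) -> bool:
    """Poll condition() until it is true or timeout_s elapses.

    Returns True if the condition was met, False if the wait gave up.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

A condition-based wait finishes early when the page is ready, whereas a fixed delay always costs the full duration, which is one reason the fixed option is the least recommended.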

5. Changes in Headers, Footers, and Sidebars

Frequently updated areas like footers, headers, and sidebars can result in irrelevant notifications. These sections often include changing elements such as timestamps, menus, or recent updates that are unrelated to the main content.

How to Avoid This

Switch to "Content Only": When tracking the full page, this option automatically filters out these less important areas. Change the Element from "Everything on the page" to "Content Only."
Remove Specific Elements: Use the "Remove page element" action with the selector nav,aside,footer,.footer,header to exclude them. This directly alters the page, and these areas will not be visible in screenshots. You may want to use this approach when using a Tracked Element other than "Full page text."
Focus on the Main Section: Track only the main content using the "Text" tracked element and the main selector. If no such element exists (e.g., the website is not semantically structured), you will see a "No selector found" error.

6. Page Errors or Blank Content

Occasionally, a monitored page may fail to load properly, leading to blank content or error messages. While PageCrawl detects these situations in most cases, it can still trigger false positives. This often happens when a website doesn't report errors properly, relies on external data sources that fail to load, or when dynamic content is not displayed correctly.

How to Avoid This

Use the "Mark Check as Failed When" action to flag a page as failed without recording changes. For example:

If a product's price unexpectedly drops to $0 due to an error and a message such as "Not available" is shown, PageCrawl can mark the page as failed instead of notifying you about a false change from $9.99 to $0.00.
- Add "Mark Check as Failed When" with "Text Contains" set to "Not available"

Additionally, customize the "Report Errors" setting to trigger only after a certain number of consecutive failures (e.g., after 10 consecutive failed checks) to avoid being overwhelmed by temporary issues.
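
The interaction between a failure rule and the consecutive-failure setting can be sketched as a small state machine. The class, field names, and return values below are hypothetical and only illustrate the idea; they are not PageCrawl's data model.

```python
from dataclasses import dataclass


@dataclass
class Monitor:
    fail_when_contains: str        # e.g. "Not available"
    report_after: int = 10         # consecutive failures before reporting
    consecutive_failures: int = 0

    def record_check(self, page_text: str) -> str:
        """Classify one check and update the failure streak."""
        if self.fail_when_contains in page_text:
            # The check is marked as failed, so no change is recorded.
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.report_after:
                return "report_error"
            return "failed_silently"
        # Any successful check resets the streak.
        self.consecutive_failures = 0
        return "ok"
```

With this logic, a temporary "Not available" message never produces a false price change from $9.99 to $0.00, and an error is reported only when the failure persists.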

If you check pages frequently, ensure the "Delay when Failed" setting is deactivated (in Advanced preferences) to prevent page failures from reducing the page-checking frequency.

7. Appearing/Disappearing Content

Websites may display varying content based on user sessions, location, or elements that frequently appear and disappear. This can lead to false positive notifications.

Smart Suggestions

Once sufficient sample data is collected, PageCrawl will automatically suggest filters to reduce false triggers. Look for the "Frequently changing content detected" panel on your monitored page.

You can:

Click on text fragments to add them to your ignore list
Click "Ignore all above" to ignore all suggested items at once
Use "Ignore all numbers" if numeric changes aren't relevant

Provide Feedback

For changes that slip through, use the thumbs down button to mark them as noise.

Additional Solutions

Ensure the page is fully loaded: Add a "Wait" action until specific text or elements appear on the page before capturing content.
Consider deactivating "Intelligent Reconnect" if the page content changes depending on the user's location or session (found under Advanced Preferences).

8. Cookie Banners and Overlay Popups (Default Settings)

By default, PageCrawl enables "Block cookie banners and ads" and "Hide website overlays and popups" actions to reduce unnecessary notifications. However, you can disable these settings if not needed.

Cookie Banners

Cookie banners often appear dynamically after the page loads, altering the content and triggering false positives.

Default Setting: Cookie banners are automatically suppressed during monitoring.
Optional: You can disable this feature in your settings if necessary.

Overlay Popups

Overlay popups, such as ads or newsletter subscription prompts, may appear sporadically and interfere with accurate monitoring.

Default Setting: PageCrawl hides overlay popups by default to ensure they don’t trigger false positives.
Optional: This feature can also be turned off if not required.

These default settings simplify the monitoring process but can be adjusted based on your specific needs.

9. Scroll-Triggered Content

Sometimes pages use animations to reveal content sections that only appear as you scroll down the page.

Solutions

Use the "Scroll to Bottom" action to automatically scroll to the bottom of the page before capturing content.
Use the "Disable JavaScript" action, which will likely disable all animations. Note that this may cause issues with loading dynamic content on some websites.

Conclusion

By implementing these strategies, you can significantly reduce false positive notifications when monitoring websites with PageCrawl.

Quick wins for reducing false positives:

Start with "Content Only" or "Reader mode" for text tracking
Use the thumbs down button to mark irrelevant changes
Review and apply suggested filters when they appear
Set up appropriate filters for dates, numbers, and repeated text

Remember:

AI analysis helps prioritize important changes
Regularly review your settings and filters
Use the suggested actions when they appear
Test different approaches to find what works best for your specific use case

With proper configuration and ongoing fine-tuning, you'll achieve efficient and reliable website change monitoring.

If you're still experiencing issues with false positives after trying these solutions, don't hesitate to contact our support team for personalized assistance with your specific monitoring setup.

How to Track All Pages Within a Website

2026-05-06T14:16:21+00:00

How to Track All Pages Within a Website

PageCrawl.io is a website change monitoring tool that helps you keep track of all the pages within your website. One of its standout features is the ability to crawl a website and automatically discover all of its pages, much like Google's indexing process. This article guides you through using PageCrawl.io to track and manage all pages within your website.

Creating a template within PageCrawl.io is the initial step to enable auto-discovery for tracking all pages within a website.

Setting Up Automatic Page Discovery

Create a Template:
- Provide Sample URL: The sample URL helps to automatically set up common parameters such as the Base Discovery URL and filters, and to detect sitemaps within the site.
- Activate Automatic Page Discovery: Enable this feature to automatically uncover new pages as they're added to the site.
- Choose Your Crawling Method:
  - Sitemap only: Perfect if the tracked site has a sitemap.xml file detailing all pages.
  - Homepage links only: Start the crawl from your provided URL, discovering pages through links on the homepage.
  - Follow links 2 levels deep / Follow links 3 levels deep: Opt for an extensive exploration, ensuring maximum page coverage by following links across multiple levels. These options are available on Enterprise/Ultimate plans only.
  - Automatic (recommended): Uses all available methods for page discovery.
Configuration: Fine-tune additional settings like tracked elements to monitor, update frequency, and specific directories for inclusion or exclusion.
Apply and Save: Save your template settings and apply them to the relevant projects within your PageCrawl.io account.
Wait for newly discovered pages to appear in your PageCrawl.io account.
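
For readers curious about what sitemap-based discovery involves, the sketch below extracts page URLs from a sitemap.xml document using Python's standard library. The sample XML and function name are made up for this example, and it handles only a plain urlset (not a sitemap index); it is not a description of PageCrawl's crawler.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def urls_from_sitemap(xml_text: str) -> list[str]:
    """Return every <loc> URL listed in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]


SAMPLE_SITEMAP = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/</loc></url>"
    "<url><loc>https://example.com/pricing</loc></url>"
    "</urlset>"
)
discovered = urls_from_sitemap(SAMPLE_SITEMAP)
```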

Leveraging Automatic Page Discovery for Thorough Tracking

Once your template is in place, PageCrawl.io systematically discovers and indexes all available pages within your website.

Review Discovered Pages: Once page discovery completes, navigate through a detailed list of discovered URLs within the dashboard.
Customized Monitoring: Set up tailored monitoring for specific pages or sections, configuring alerts to notify you of any modifications.
Content Change Insights: Review content changes over time to spot updates, removals, or additions across your monitored pages.
Optimization: Employ the insights gathered to optimize your website, refining user experience, enhancing SEO strategies, and rectifying any issues spotted during the crawl.

In Conclusion

PageCrawl.io's automatic page discovery feature simplifies the process of monitoring all pages within a website. By following these steps, you can efficiently manage, monitor, and stay updated on your website's content.

For further guidance or questions, consult PageCrawl.io's support resources or reach out to our support team.

Happy tracking!

Hiding Popup Overlays When Monitoring Pages for Changes

2026-05-06T14:16:21+00:00

Hiding Popup Overlays When Monitoring Pages for Changes

When you visit a website for the first time, you may sometimes encounter an annoying ad or offer that overlays the content. While this is usually not a problem when monitoring websites for changes, it can still sometimes cause false-positive alerts if screenshots capture the content overlaid with the popup. These popups may only appear once, or for specific visitors or geographic locations.

The "Hide Website Overlays & Popups" Action

To mitigate false positives, we highly recommend using the "Hide website overlays & popups" action on affected pages. Keep in mind that this may not work on all pages.

Alternative Approach

If the "Hide website overlays & popups" action did not work, or if all content on the page becomes invisible, you can manually target the overlay with the "Remove page element" action to exclude it.

Keep an HTML Record of a Page Without Being Notified of Minor Changes

2026-05-06T14:16:21+00:00

Keep an HTML Record of a Page Without Being Notified of Minor Changes

When monitoring web pages, you might find it useful to keep a historical HTML record for future reference. However, minor changes, such as dynamic updates to attributes, styles, or tags, can often trigger unnecessary alerts. These changes, while technically present in the HTML, might not affect the visual representation or the substantive content of the page.

Focus on Text Content: By monitoring the text content of a page rather than its HTML structure, you can significantly reduce the number of false alerts. Text content changes are more likely to represent meaningful updates to the page.

Use Multiple Tracked Elements: You can add several tracked elements to a single page. This lets you record HTML changes for reference while only receiving notifications for the elements that matter most.

Set the "Do Not Trigger" Threshold

The "Do not trigger" option is a threshold setting on individual tracked elements. It records changes for that element without sending any notifications. Here is how to set it up:

Open the page editor and go to the Tracked Elements section.
Make sure you have at least two tracked elements (for example, a Text element and an HTML element). The "Do not trigger" option only appears when more than one tracked element is configured.
On the HTML tracked element, open the Threshold dropdown.
Select "Do not trigger".
Save the page.

With this configuration, PageCrawl will continue recording HTML changes in the timeline for future reference, but only changes on your other tracked elements (such as Text) will trigger notifications.

By carefully adjusting your monitoring settings, you can ensure that you are alerted only to significant changes that impact the content's meaning or visual presentation. This approach helps maintain the effectiveness of your monitoring efforts without the distraction of frequent, unnecessary notifications.

Can I pay by Crypto?

2026-03-05T10:31:13+00:00

Can I pay by Crypto?

Yes, we support cryptocurrency payments for Ultimate plans paid annually.

To arrange payment, please contact support at support@pagecrawl.io.

Automatically Discover New Pages To Track

2026-05-06T14:16:21+00:00

Automatically Discover New Pages To Track

PageCrawl is designed to make website change monitoring and management seamless. The "Discover New Pages" feature takes your change monitoring to the next level by automatically identifying new links, tracking changes, and ensuring your online presence remains up-to-date. In this guide, we'll delve into the capabilities of this feature, including its scanning methods, automated monitoring, and filtering options.

Automated Link Discovery

This feature performs automated scans of your website, identifying new links that have been added. This proactive approach keeps you informed about any changes to your website's link structure and updates.

Choice of Scanning Methods

PageCrawl provides multiple scanning methods to suit your needs. The default mode is Automatic (recommended), which combines methods to find new pages using the best approach for each website:

Automatic (recommended): Combines sitemap and link discovery to find pages using the best method for the website.
Homepage Links Only: Discover new links by following links on the homepage. Available as a daily or weekly check. Useful if you want to focus on pages directly linked from the main page.
Sitemap Only: Discover pages listed in the website's sitemap. Most websites have a sitemap to help search engines find their pages, making this an efficient method for large sites.
Follow Links 2 Levels Deep: Follows links on the homepage, then follows links on those pages too. Available as a weekly check. Note: Only available on Enterprise and Ultimate plans.
Follow Links 3 Levels Deep: Follows links on the homepage, then follows links two more levels deep. Available as a weekly check. Note: Only available on Enterprise and Ultimate plans.
Deep Scan: Conduct a comprehensive analysis by visiting every accessible page on your website. This ensures that no new links go unnoticed, even on deeply nested pages. Note: Only available on Enterprise and Ultimate plans.
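
The difference between "Homepage Links Only" and the "Follow Links N Levels Deep" methods can be illustrated with a depth-limited breadth-first traversal. In this sketch the site is a hypothetical in-memory link map instead of real HTTP requests, and the function name is an assumption for illustration.

```python
from collections import deque


def discover(start: str, links: dict[str, list[str]], max_depth: int) -> set[str]:
    """Collect pages reachable from start within max_depth link hops."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow links beyond the configured depth
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return seen


# Hypothetical site structure: page -> pages it links to.
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/a/1"],
    "/a/1": ["/a/1/x"],
}
```

Here a depth of 1 corresponds to homepage links only, while depths of 2 and 3 reach progressively more deeply nested pages.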

Filtering Options

Include Pages: Specify keywords or patterns that pages must contain to be included in monitoring. Useful for tracking specific types of content.
Exclude Pages: Define keywords or patterns that disqualify a page from monitoring when matched. Ideal for leaving out pages that you are not interested in.
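
How include and exclude filters combine can be sketched as follows. This example assumes plain substring matching against the page URL and treats an empty include list as "include everything"; both are assumptions for illustration, and the function name is hypothetical.

```python
def keep_url(url: str, include: list[str], exclude: list[str]) -> bool:
    """Decide whether a discovered URL passes the filters."""
    if include and not any(keyword in url for keyword in include):
        return False  # include filters exist, but none of them matched
    return not any(keyword in url for keyword in exclude)


candidates = [
    "https://example.com/products/widget",
    "https://example.com/products/archive/old-widget",
    "https://example.com/blog/news",
]
kept = [u for u in candidates if keep_url(u, ["/products/"], ["/archive/"])]
```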

Configuring Automated Monitoring and Tracking

Create a Template

To start monitoring the website and automatically discover all new pages, configure a new Template which will serve as the basis for monitoring new pages.

Under "Sample URL address," enter an example page URL that you wish to track. The rest of the fields will be auto-filled for you.

Configure Tracked Elements

You may choose to monitor all pages on the website or only those with a specific structure (e.g., if you only want to track product pages and not other pages).

If you wish to monitor all pages, for Tracked Element configuration, select "Full-page Text."
To monitor pages with a specific layout, add multiple tracked elements, such as product title, price, and description. If these elements do not exist on a page, that page will simply be skipped.

Enable "Discover New Pages" feature

Activate the "Discover New Pages" feature and customize any settings if needed.
Save the template and watch for newly discovered pages to appear.
If there are too many irrelevant pages discovered, adjust filters and remove irrelevant pages.

File Checksum Monitoring

2026-03-05T10:31:12+00:00

File Checksum Monitoring

File Checksum Monitoring detects when any online file has been modified by comparing its SHA-256 hash. Unlike text-based monitoring, this works with any file type, including zip archives, images, videos, and binary files. When a change is detected, the original file is stored so you can download and compare versions.

What is SHA-256?

SHA-256 is a cryptographic hash function that produces a unique fingerprint for a file. If even a single byte changes, the hash changes completely, making it reliable for detecting modifications.

How It Works

You provide the URL of the file to monitor
PageCrawl downloads the file and calculates its SHA-256 checksum
On each subsequent check, the checksum is recalculated and compared
If the checksum differs, you receive a notification
The previous version of the file is saved for manual comparison
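
The checksum comparison described above can be reproduced locally with Python's hashlib. This is a minimal sketch: the function names are hypothetical, and in-memory byte streams stand in for downloaded files.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size: int = 65536) -> str:
    """Hash a binary stream in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def file_changed(previous_checksum: str, stream) -> bool:
    """Compare a freshly computed checksum with the stored one."""
    return sha256_of_stream(stream) != previous_checksum


# First check stores a baseline; later checks compare against it.
baseline = sha256_of_stream(io.BytesIO(b"report v1"))
```

A changed checksum only says that the file differs; identifying what changed still requires comparing the stored versions.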

Setup

Click Track New Page
Paste the direct URL to the file
PageCrawl detects the file and shows checksum monitoring options
Choose your check frequency and notification preferences
Save

Supported File Types

Any file accessible via URL, including: zip, rar, psd, video, audio, images, and more. Maximum file size is 15 MB. Contact support if you need to monitor larger files.

Checksum vs Text Monitoring

Method	Best For	Shows Exact Changes
File checksum	Any file type (binary, images, archives)	No, only that the file changed
Text monitoring	PDF, Excel, Word, CSV, PowerPoint	Yes, line-by-line diff

If you need to see exactly what text changed in a document, use the dedicated text monitoring for PDF, Excel, Word, CSV, or PowerPoint files instead.

FAQ

How often are files checked? You can set the frequency from every 5 minutes to monthly, depending on your plan.
What if the file is no longer accessible? You will be notified with an error status.
Can I stop monitoring a file? Yes, disable or delete it at any time.

PDF Changes - Monitor PDF text changes
Excel Spreadsheets - Monitor spreadsheet text changes
Word Documents - Monitor Word text changes

Monitor Changes in Google Sheets, Docs, and Drive Files

2026-03-05T10:31:12+00:00

Monitor Changes in Google Sheets, Docs, and Drive Files

PageCrawl can monitor publicly shared Google Sheets, Google Docs, and other Google Drive files for text changes. When content is added, edited, or removed, you receive a notification with a diff showing exactly what changed.

Requirements

The Google file must be accessible via a shareable link. In Google Drive, set the sharing to "Anyone with the link can view" to allow PageCrawl to access the content.

Setup

Click Track New Page
Paste the shareable link to your Google Sheet, Doc, or Drive file
PageCrawl detects the file type and shows the appropriate configuration
Choose your check frequency and notification preferences
Save

Supported File Types

File Type	What Is Tracked
Google Sheets	Cell text content across all sheets
Google Docs	Full document text
Google Drive files	Text content (for supported formats like PDF, DOCX)

SharePoint Documents - Monitor Microsoft SharePoint files
Excel Spreadsheets - Monitor Excel file changes
Google Sheets Sync - Export change data to Google Sheets

Monitor Changes in Microsoft SharePoint Documents

2026-03-05T10:31:12+00:00

Monitor Changes in Microsoft SharePoint Documents

PageCrawl can monitor Microsoft SharePoint pages and documents for text changes. When content is added, edited, or removed, you receive a notification showing what changed.

Requirements

The SharePoint page or document must be accessible via a direct URL.

Setup

Click Track New Page
Paste the URL to the SharePoint page or document
Choose your check frequency and notification preferences
If the page requires login, select your authentication configuration
Save

What Can Be Monitored

Content Type	How It Works
SharePoint pages	Tracks text content changes on the page
Word documents	Extracts and compares text content
Excel files	Extracts and compares cell data
PDF files	Extracts and compares text content

Google Docs & Sheets - Monitor Google Drive files
Password-Protected Pages - Configure login authentication
Word Documents - Monitor Word file changes

Monitoring Changes in PDF Files

2026-03-05T10:31:13+00:00

Monitoring Changes in PDF Files

Monitoring text changes in PDF files can be essential for managing contracts, reports, or any important documents that may be frequently updated. Manually reviewing each document for changes can be time-consuming and prone to error. This is where PageCrawl.io comes in handy, offering an automated solution for tracking text changes in PDF files and notifying you whenever there’s an update.

Why Monitor PDF Files for Text Changes?

PDFs are often used for official or finalized documents, which means any change can be significant. Whether it's contracts, legal documents, or product manuals, keeping an eye on text changes ensures that you're always aware of important updates. Monitoring PDF files helps with:

Keeping track of contract modifications.
Ensuring that no important edits are made without your knowledge.
Detecting unauthorized changes in sensitive documents.

How PageCrawl.io Helps with PDF Monitoring

With PageCrawl.io, you can set up automated tracking for PDF files. It scans the text in your PDF files and alerts you whenever there’s a change, so you don’t have to sift through documents manually.

What If the PDF Does Not Contain Text?

If the PDF you want to monitor does not contain readable text, you can use File Checksum Monitoring instead to check whether the PDF has been modified. The downside of this approach is that you cannot see at a glance what exactly has changed; you will need to review the document page by page.

Setting Up PDF Monitoring with PageCrawl.io

Setting up PDF monitoring is easy with PageCrawl.io. Here’s a quick guide:

Step 1: Sign in to PageCrawl.io

Step 2: Add a New Monitored Page

Navigate to the dashboard and click on the "Track New Page" button. Here, you can paste a link to the PDF file you want to monitor.

Step 3: Set Up Notifications and Check Frequency

Customize how and when you receive notifications. You can choose to be notified immediately when text changes, or you can set up periodic checks if you want less frequent updates.

Tracking PDFs Embedded in Web Pages

Some websites display PDF documents directly within a web page using iframes. This is common for contracts, terms of service, financial reports, and other documents that are embedded alongside regular page content.

PageCrawl automatically detects embedded iframes when you add a page for monitoring. When setting up full-page monitoring on a page that contains iframes, you will see an "Include embedded content" checkbox. Enabling this option tells PageCrawl to extract and track text from the embedded PDF along with the rest of the page content.

This means you can monitor both the surrounding web page and the embedded PDF document in a single monitor, receiving notifications whenever either part changes.

Bulk Edit Pages

2026-05-06T14:16:21+00:00

Bulk Edit Pages

Select multiple monitored pages and change their settings in one operation. Bulk edit is available on paid plans.

How to Bulk Edit

Go to your page list
Select pages using the checkboxes (or select all)
Click Bulk Edit in the toolbar
Choose what to change and apply

Available Bulk Operations

Operation	Description
Enable / Disable	Turn monitoring on or off for selected pages
Delete	Permanently delete selected pages and/or folders
Trigger check	Run an immediate check on all selected pages
Mark as seen	Clear the "changed" indicator on selected pages

Bulk-Editable Settings

Setting	Options
Check frequency	5 min to monthly (depending on plan)
Engine	Default, Stealth, or Fast
Proxy location	London, New York, San Francisco, Toronto, Frankfurt, Tel Aviv, Fixed IP, Random (EU Datacenter), or Random (US Premium). Residential Proxy is available as a separate paid option
Custom proxies	Paste your own proxy list
Notifications	Email, Slack, Telegram, Discord, Teams, or disable
Notification emails	Choose which verified emails receive alerts
Labels	Add or remove labels
Folder	Move pages to a specific folder
Template	Apply a monitoring template
Screenshots	Enable or disable
Intelligent Reconnect	Enable or disable automatic retry on failure
Device	Emulate a specific device viewport
Language	Set browser language
Ignored text	Add or replace text patterns to ignore
Full page selector	Choose between Everything on the page, Content only, or Reader mode
AI summaries	Enable or disable AI-powered change summaries
AI focus	Set custom AI instructions for what matters
AI tier	Basic or Pro (Pro requires Ultimate plan)
Cookie blocking	Add or remove cookie consent blocking
Overlay removal	Add or remove popup overlay hiding
Date exclusion	Add or remove date filtering
Number exclusion	Add or remove number filtering
Archive	Enable web archiving (Ultimate plan only)
Reveal hidden text	Enable or disable extraction of visually hidden text
Monitored keywords	Set keywords to highlight in change reports
Report errors	Enable or disable error reporting for failed checks
Delay when failed	Add a delay before retrying after a failed check
Authentication	Configure login credentials for password-protected pages
AI model	Choose the AI model used for change summaries
Record always	Always save check results, even when no change detected

Adding Pages in Bulk

Beyond editing, you can also add multiple pages at once:

Method	Description
Paste URLs	Paste a list of URLs (one per line) to add them all at once
Upload file	Import URLs from a CSV or Excel file
Website scan	Scan an entire website to discover and add pages automatically

Bulk Export

Select pages and export their data to Excel, including current values, change history, and configuration.

Labels, Folders & Workspaces - Organize your monitored pages
Advanced Configuration - Templates and Power User settings
Page Discovery - Automatically discover new pages to monitor

Organize Monitored Pages with Labels, Folders, and Workspaces

2026-05-06T14:16:21+00:00

Organize Monitored Pages with Labels, Folders, and Workspaces

PageCrawl provides three levels of organization for your monitored pages: labels for tagging, folders for grouping, and workspaces for separating entire environments.

Labels

Labels are color-coded tags you can attach to any monitored page. Each label has a name, optional description, and a color.

Feature	Details
Colors	Each label has a hex color, auto-generated if not specified
Multiple labels per page	Attach as many labels as needed
Filtering	Filter your page list by one or more labels
Bulk tagging	Apply labels to multiple pages at once via Bulk Edit
Workspace-scoped	Labels belong to a workspace and are not shared across workspaces

To manage labels, go to any page list and use the label filter, or manage them when editing a page.

Labels can also be applied automatically by AI. See AI Label Automation for details.

Folders

Folders let you group pages into a nested hierarchy with unlimited depth. Each folder belongs to a workspace and can contain both pages and sub-folders.

Feature	Details
Nested hierarchy	Create sub-folders at any depth
Page counts	Each folder shows the total number of pages, including those in sub-folders
Bulk move	Move multiple pages to a folder via Bulk Edit
URL slugs	Each folder has a unique slug for direct navigation

Workspaces

Workspaces are separate environments within your account. Each workspace has its own pages, folders, labels, notification settings, schedule, and integrations.

Feature	Details
Separate everything	Pages, folders, labels, webhooks, and settings are workspace-scoped
Team access	Invite team members to specific workspaces
Independent settings	Each workspace has its own notification channels, schedule, AI configuration, and integrations
Quick switching	Switch between workspaces from the sidebar

Use workspaces to separate monitoring by team, client, project, or environment (e.g., production vs staging).

Creating and Managing

Action	Where
Create a folder	Click the folder icon in the page list sidebar
Create a label	When editing a page, or via the label filter
Create a workspace	Settings > Team > Workspaces
Switch workspace	Sidebar workspace selector
Bulk assign labels/folders	Select pages > Bulk Edit

Bulk Edit - Apply labels, folders, and settings to multiple pages at once
Advanced Configuration - Templates and workspace settings
Check Scheduling - Configure per-workspace monitoring schedules

Real Browser Monitoring and Engine Selection

2026-05-06T14:16:21+00:00

Real Browser Monitoring and Engine Selection

PageCrawl renders web pages using a real browser, executing JavaScript and loading dynamic content exactly as a visitor would see it. You can choose between three engine modes depending on the page you are monitoring.

Available Engines

Engine	Best For	How It Works
Default	Most websites	Full browser with JavaScript rendering
Stealth	Bot-protected pages	Enhanced mode for reliably accessing protected pages
Fast	Static pages, speed	Optimized for speed when JavaScript rendering is not needed

Default Engine

The default engine loads pages using a real browser. It processes JavaScript, waits for dynamic content, handles cookies, and renders the page as a real user would see it. This works for the majority of websites.

Stealth Mode

Some websites use bot protection services that block automated access. Stealth mode is designed to reliably access these pages.

PageCrawl can automatically switch to Stealth mode in several situations: on the first check of a new monitor, when workspace auto-stealth is enabled, or for price and availability monitors. It also activates when a page is blocked (timeout, 403 Forbidden, or 401 Unauthorized). You can also enable it manually per page.

Fast Mode

Fast mode is optimized for pages that do not need JavaScript rendering, which makes it significantly faster and more resource-efficient. Use it for:

Static HTML pages that do not rely on JavaScript
API responses and JSON endpoints
Pages where you only need text or HTML content
High-frequency monitoring where speed matters

Fast mode supports Full Page, Full Page (iframe), Text, Text (all matches), Text (all matches sorted), Number, Price, Rating, Reviews, HTML, HTML (all matches), Boolean/Text Presence, Availability, Links, Feed, and SEO Tags element types. It does not support Visual comparison, screenshots, or actions (click, scroll, type).

Choosing the Right Engine

Scenario	Recommended Engine
Standard website	Default
JavaScript-heavy SPA	Default
Bot-protected page	Stealth
Page returning 403 or timeouts	Stealth
Static HTML page	Fast
API or JSON endpoint	Fast
Need screenshots or visual diff	Default or Stealth
High-frequency checks (every 5 min)	Fast (if page allows)
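
The table above can be read as a small decision rule. The following is a minimal sketch of that rule, not PageCrawl code: `recommend_engine` and its parameters are hypothetical names chosen for illustration, and the set of Fast-compatible element types is copied from the Fast Mode section.

```python
# Element types that Fast mode supports, as listed in the Fast Mode section.
FAST_ELEMENT_TYPES = {
    "Full Page", "Full Page (iframe)", "Text", "Text (all matches)",
    "Text (all matches sorted)", "Number", "Price", "Rating", "Reviews",
    "HTML", "HTML (all matches)", "Boolean/Text Presence", "Availability",
    "Links", "Feed", "SEO Tags",
}


def recommend_engine(element_types, static_page=False, bot_protected=False,
                     needs_screenshots=False, uses_actions=False):
    """Hypothetical helper: map a page's needs to "Stealth", "Fast", or "Default"."""
    # Bot-protected pages (including pages returning 403 or timing out) go to Stealth.
    if bot_protected:
        return "Stealth"
    # Fast only fits static pages that need no screenshots or actions
    # and that use element types Fast mode supports.
    fast_ok = (
        static_page
        and not needs_screenshots
        and not uses_actions
        and all(t in FAST_ELEMENT_TYPES for t in element_types)
    )
    return "Fast" if fast_ok else "Default"
```

For example, `recommend_engine(["Price"], static_page=True)` picks Fast, while adding a Visual element or a click action falls back to Default.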

Configuration

Set the engine per page in the page editor under Power User settings, or apply it in bulk via Bulk Edit.

Monitoring Pages Behind Bot Protection - Handling bot-protected pages
Custom Proxies - Use your own proxy servers
Advanced Configuration - Power User mode and engine selection

Add to PageCrawl.io bookmark

2026-03-05T10:31:13+00:00

"Add to PageCrawl.io" bookmark

What is This Bookmarklet?

This bookmarklet is a quick tool for adding any webpage to your PageCrawl.io account in one click. By saving and clicking the bookmarklet while browsing, you’ll instantly open the PageCrawl.io "Track New Page" form with the URL and title of the current page already filled in for you.

Why Use This?

If you often add new pages to PageCrawl.io, this bookmarklet can save you time by:

Skipping the need to copy-paste URLs and titles.
Reducing clicks to navigate through PageCrawl.io’s interface.
Allowing you to add new pages directly from the page you’re currently on.

How to Save the Bookmarklet

To save, drag the link below to your bookmarks bar, or right-click it and select "Bookmark This Link."

Add to PageCrawl.io

How to Use the Bookmarklet

When you’re on a page you want to track in PageCrawl.io:

Click the "Add to PageCrawl.io" bookmark in your bookmarks bar.
PageCrawl.io will open with the URL and title of the new page prefilled.
Review or edit the details as needed, then save the page to your account.

Monitor Page Changes via RSS Feeds

2026-05-08T12:17:31+00:00

Monitor Page Changes via RSS Feeds

PageCrawl can generate RSS feeds for your monitored pages, allowing you to follow detected changes from any RSS reader or automation tool.

Looking to monitor an existing RSS, Atom, or sitemap feed instead? See Feed Tracking Mode, which watches a feed URL for new items and notifies you about specific additions, removals, and changes.

How RSS Feeds Work

Each RSS feed has a unique URL that contains an access code. When a change is detected on a monitored page, a new entry is added to the feed. Feeds use the Atom format and can be consumed by any standard RSS reader.

You can create feeds scoped to:

All pages in workspace - Get a combined feed of all changes across the workspace
By tags - Include only pages with specific tags
By folders - Include only pages in specific folders
By website/domain - Include only pages from a specific domain
Specific monitors - Track changes on individually selected monitors

Setting Up an RSS Feed

Go to Settings > RSS Feeds
Click Create Feed
Choose a scope (all pages, by tags, by folders, by website/domain, or specific monitors)
Copy the generated feed URL

The feed URL contains a unique access code, so anyone with the link can view the feed without logging in. Keep feed URLs private if the monitored content is sensitive.

Using Your Feed

Add the feed URL to any RSS-compatible tool:

Tool Type	Examples
RSS readers	Feedly, Inoreader, NewsBlur
Automation platforms	n8n, Zapier, Make
Dashboards	Custom widgets, internal portals
Browser extensions	RSS reader extensions for Chrome or Firefox
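
Because feeds use the Atom format, a custom script can read them with a standard XML parser. The sketch below parses a small hand-written Atom document. The sample entry and its fields are illustrative assumptions, not a captured PageCrawl feed, so check a real feed for the elements it actually contains.

```python
import xml.etree.ElementTree as ET

# Standard Atom namespace, in ElementTree's {namespace}tag notation.
ATOM = "{http://www.w3.org/2005/Atom}"

# Hand-written sample for illustration only (not real PageCrawl output).
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example changes feed</title>
  <entry>
    <title>Example page changed</title>
    <link href="https://example.com/pricing"/>
    <updated>2026-05-08T12:00:00Z</updated>
  </entry>
</feed>"""


def parse_entries(atom_xml):
    """Return title, link, and updated timestamp for each Atom entry."""
    root = ET.fromstring(atom_xml)
    entries = []
    for entry in root.findall(f"{ATOM}entry"):
        link = entry.find(f"{ATOM}link")
        entries.append({
            "title": entry.findtext(f"{ATOM}title", default=""),
            "link": link.get("href") if link is not None else None,
            "updated": entry.findtext(f"{ATOM}updated", default=""),
        })
    return entries
```

To use this against a live feed, download the feed URL you copied from Settings > RSS Feeds and pass the response text to `parse_entries`.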

Managing Feeds

Action	How
List feeds	Go to Settings > RSS Feeds
Create feed	Click Create Feed and select options
Delete feed	Click the delete button next to the feed

Feed Tracking Mode - Monitor an existing RSS, Atom, or sitemap feed for new items
API & Webhooks - Programmatic access and real-time webhooks
Webhook Integration - HTTP POST notifications for changes
Slack Notifications - Get change alerts in Slack

API and Webhooks for Custom Integrations

2026-05-06T14:16:21+00:00

API and Webhooks for Custom Integrations

PageCrawl provides a REST API and webhook system for integrating page monitoring into your own applications and workflows. Use the API to manage monitors programmatically and webhooks to receive real-time notifications when changes are detected.

Available on paid plans.

Authentication

All API requests require a Bearer token. Find your API key in Settings > API.

Include it in the Authorization header:

Authorization: Bearer YOUR_API_KEY

API Endpoints

Method	Endpoint	Description
`GET`	`/api/pages`	List all monitored pages
`POST`	`/api/pages`	Create a new monitored page
`GET`	`/api/pages/{slug}`	Get page details and latest values
`PUT`	`/api/pages/{id}`	Update page settings
`DELETE`	`/api/pages/{id}`	Delete a monitored page
`PUT`	`/api/pages/{id}/check`	Trigger an immediate check
`PUT`	`/api/pages/{id}/status`	Enable or disable a page
`GET`	`/api/pages/{id}/history`	Get check history for a page

For the full API reference with all endpoints, parameters, and response schemas, see pagecrawl.io/developers.
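
As a rough illustration of the Bearer header in practice, here is a minimal Python sketch that lists monitored pages using only the standard library. The base URL `https://pagecrawl.io` and the helper names are assumptions made for this example, and the response is assumed to be JSON. Confirm the details against the full API reference before relying on it.

```python
import json
import urllib.request

# Assumption: the API is served from this host; this article does not state the base URL.
API_BASE = "https://pagecrawl.io"


def build_request(method, path, api_key):
    """Build an authenticated request for an endpoint such as /api/pages."""
    return urllib.request.Request(
        API_BASE + path,
        method=method,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json",
        },
    )


def list_pages(api_key):
    """Call GET /api/pages and return the decoded body (assumed to be JSON)."""
    request = build_request("GET", "/api/pages", api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling `list_pages("YOUR_API_KEY")` performs the request. `build_request` can be reused for the other endpoints in the table by changing the method and path.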

Webhooks

Webhooks send HTTP POST requests with a JSON body to your endpoint whenever a page change is detected or an error occurs. Configure webhooks in Settings > Workspace > Integrations > Webhooks.

Setting	Description
Target URL	The HTTP endpoint that receives the POST request
Event triggers	Change detected, error, or both
Page filter	Limit to a specific page, or fire for all pages in the workspace
Payload fields	Select which fields to include (all by default)

Available payload fields include page ID, title, change summary, diff data, screenshots, AI summary, AI priority score, and more. See the Webhook Integration guide for the full field reference and example payloads.
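
On the receiving side, a webhook endpoint only needs to accept a POST request and decode its JSON body. The following is a minimal sketch using Python's built-in HTTP server. The payload keys `id` and `title` are placeholders chosen for illustration; the real field names are listed in the Webhook Integration guide.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_payload(raw_body):
    """Decode a webhook body and return a one-line summary."""
    payload = json.loads(raw_body)
    # "id" and "title" are placeholder keys, not confirmed PageCrawl field names.
    return f"{payload.get('id', '?')}: {payload.get('title', '(untitled)')}"


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes announced by the sender.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            print(summarize_payload(body))
            self.send_response(204)  # accepted, no response body
        except ValueError:
            self.send_response(400)  # body was not valid JSON
        self.end_headers()


def run_server(port=8000):
    """Listen for webhook deliveries until interrupted."""
    HTTPServer(("0.0.0.0", port), WebhookHandler).serve_forever()
```

Run `run_server()` on a host reachable from the internet and enter its address as the Target URL. A production endpoint would also need HTTPS and some way to verify the sender.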

Common Use Cases

Custom dashboards - Pull change data into your own monitoring dashboard via API
Automation workflows - Trigger actions in n8n, Make, Zapier, or custom scripts via webhooks
Database logging - Store all detected changes in your own database
Alerting systems - Forward high-priority changes to PagerDuty, Opsgenie, or similar

Full API Reference - Interactive OpenAPI reference with every endpoint and schema
Webhook Integration - Detailed webhook setup, payload reference, and testing
Zapier Integration - Connect PageCrawl to 5,000+ apps
n8n Integration - Open-source workflow automation
RSS Feeds - Subscribe to changes via RSS

How to Monitor Pages That Require OS Selection

2026-03-05T10:31:13+00:00

How to Monitor Pages That Require OS Selection

When monitoring pages that adjust their content based on the user's operating system, like those displaying OS-specific downloads or drivers, you might encounter challenges. Some sites perform OS detection and require interaction to display the desired information. Here's how you can effectively monitor such pages using PageCrawl.io.

Two Approaches to Handle OS Detection

There are two main ways to handle pages that require OS selection:

1. Set a Custom User Agent

You can configure PageCrawl to use a specific User Agent string that mimics a Windows browser. This approach is simple and works for most basic OS detection scenarios.

How to set it up:

Navigate to your page's Advanced Preferences

Set the User Agent to a Windows 10/11 browser string, for example:

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.5735.199 Safari/537.36

Advantages:

Quick and easy to implement
Works reliably for basic OS detection
No complex configuration required

Limitations:

Cannot distinguish between Windows 10 and Windows 11
May not work with sophisticated detection methods
Limited control over specific OS version selection
Older User Agent versions may be blocked by security/bot detection tools used by websites
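
To see why the first limitation exists, consider how a site might guess the OS from the User Agent. The sketch below is a simplified, hypothetical detector, not the logic of any particular website. Browsers on Windows 11 generally still report `Windows NT 10.0`, so the string carries nothing that separates the two versions.

```python
import re

# The example User Agent string from the setup steps above.
WINDOWS_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/114.0.5735.199 Safari/537.36"
)


def detect_os(user_agent):
    """Hypothetical server-side check: guess the OS family from a User Agent."""
    if re.search(r"Windows NT \d+\.\d+", user_agent):
        # Windows 10 and Windows 11 both typically send "Windows NT 10.0",
        # so only the OS family can be inferred here.
        return "Windows"
    if "Mac OS X" in user_agent:
        return "macOS"
    if "Linux" in user_agent:
        return "Linux"
    return "Unknown"
```

A site that relies on a check like this will serve Windows content to the example string, but it cannot tell which Windows version to offer.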

2. Use Actions to Interact with OS Selection Forms

For pages with dropdown menus or forms where you need to select a specific OS version, you can use PageCrawl's Actions feature to automate the selection process.

How to set it up:

Navigate to your page's Actions settings
Create click actions on the appropriate selectors
Configure the sequence to:
- Click on the OS dropdown/selector
- Select your specific OS version
- Submit the form if required

Example scenario: If a driver download page has a form with an OS selection dropdown, you can:

Add an action to click the OS dropdown selector
Add an action to click the "Windows 11" option
Add an action to click the submit button

Advantages:

Precise control over OS version selection
Can handle complex multi-step forms
Works with any type of OS selection interface

Limitations:

More complex to set up initially
May need adjustments if the page structure changes
Requires identifying the correct CSS selectors

Which Method Should You Choose?

Use the User Agent method if:
- The site only needs basic OS detection
- You don't need to distinguish between specific OS versions
- You want a quick, maintenance-free solution

Use the Actions method if:
- You need to select a specific OS version (e.g., Windows 11 vs Windows 10)
- The page has a form or dropdown for OS selection
- The User Agent method doesn't work for your specific page

Available Tracked Element Types

2026-05-06T14:16:21+00:00

Available Tracked Element Types

When monitoring changes on a webpage, the tracked element type you select defines what content is tracked and how updates are detected. You can add multiple tracked elements to a monitored page to watch different areas of it. Below is a detailed breakdown of the tracked element types:

Commonly Used Tracked Element Types

1. Full Page Text

Description: Tracks all visible text on the entire webpage.
Use Case: Useful for capturing comprehensive textual content.

2. Text

Description: Monitors text changes in a specified area of a webpage.
Important Note: Only the first element matching the selector is tracked.
Use Case: Ideal for tracking text in specific areas, like headlines or descriptions.

3. Number

Description: Extracts and monitors numeric values in a specific webpage area.
Features: Provides basic statistical analysis and visual graphs.
Use Case: Useful for tracking numbers, such as stock levels or scores.

4. Visual

Description: Monitors and alerts on visual changes in a specified area.
Note: This is a beta feature; report any issues encountered.
Use Case: Ideal for tracking visual changes like layout updates or style changes.

Page Areas

1. Price

Description: Detects and extracts the first price found on the page.
Limitation: May not work well on pages with multiple prices.
Use Case: Monitoring product prices on e-commerce websites.

2. Links

Description: Tracks internal and external links originating from a webpage.
Use Case: Ideal for monitoring link changes on resource-heavy websites.

3. Iframes

Description: Monitors embedded content within iframe elements.
Important Note: May cause issues in some cases if “Hide cookie banners & block ads” is enabled.
Use Case: Useful for monitoring third-party embedded content.

Files

1. PDF File

Description: Tracks text content within PDF files.
Limitation: Use "File Checksum" if text extraction is not possible.
Use Case: Monitoring changes in documents like manuals or policies.

2. Word File

Description: Tracks text content within Word documents.
Use Case: Ideal for tracking updates in editable text documents.

3. Excel and CSV Files

Description: Monitors content within spreadsheets.
Use Case: Useful for tracking data changes in structured formats.

4. File Checksum

Description: Computes and compares SHA-256 checksums to detect file changes.
Limitation: Does not preview specific changes; manual review required.
Use Case: Best for unsupported file formats or non-readable PDFs.
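
The idea behind File Checksum can be sketched in a few lines of Python. This illustrates the general technique of comparing SHA-256 digests, not PageCrawl's internal implementation, and the function names are made up for the example.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 digest of a file's bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def file_changed(previous_checksum: str, current_bytes: bytes) -> bool:
    """Report a change when the new digest differs from the stored one."""
    return sha256_of(current_bytes) != previous_checksum
```

Changing a single byte produces a different digest, which is why the method detects that a file changed but cannot show what changed inside it.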

Multiple Matching Elements

1. Text (All Matches)

Description: Tracks all elements matching the selector (not just the first).
Use Case: Useful for tracking lists, tables, or repeated content blocks.

2. Text (All Matches, Sorted)

Description: Similar to “Text (All Matches)” but sorts results alphabetically.
Use Case: Reduces false positives for frequently reordered elements like product listings.

3. HTML (All Matches)

Description: Tracks all matching HTML elements on the page.
Use Case: Ideal for monitoring multiple dynamic sections.
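
To illustrate why "Text (All Matches, Sorted)" reduces false positives, the sketch below compares two snapshots of matched text with and without sorting. It is a simplified model under the assumption that each check yields a list of strings; the function names are hypothetical.

```python
def changed_unsorted(before, after):
    """Order-sensitive comparison: any reordering counts as a change."""
    return before != after


def changed_sorted(before, after):
    """Order-insensitive comparison: only added, removed, or edited items count."""
    return sorted(before) != sorted(after)


# The same two products, listed in a different order on the next check.
before = ["Widget B", "Widget A"]
after = ["Widget A", "Widget B"]
```

Here `changed_unsorted(before, after)` reports a change while `changed_sorted(before, after)` does not. Adding a third product would be reported by both.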

E-Commerce and Product Tracking

1. Availability

Description: Tracks the availability status of a product on the page.
Use Case: Monitoring whether a product is in stock, out of stock, or on pre-order.

2. Rating

Description: Tracks the product rating displayed on the page.
Use Case: Monitoring changes to product ratings on review sites or e-commerce platforms.

3. Reviews

Description: Tracks the review count displayed on the page.
Use Case: Monitoring how many reviews a product has received over time.

Feeds and Data Sources

1. Feed/List

Description: Tracks entries from RSS or Atom feeds, detecting new, removed, or changed items.
Use Case: Monitoring blog feeds, news feeds, or any structured list for new entries and updates.

2. WHOIS Record

Description: Tracks domain WHOIS registration data including registrar, expiration date, and name servers.
Use Case: Monitoring domain ownership changes, expiration dates, or registrar transfers.

3. SEO Tags

Description: Tracks key SEO-related elements on a page including the title tag, meta description, canonical URL, robots directives, H1 heading, and Open Graph tags.
Use Case: Monitoring competitor SEO changes, ensuring your own pages maintain correct metadata, or detecting unintended SEO regressions.

Advanced Tracked Element Types

1. Text Presence

Description: Searches the full page for specific keywords and returns a simple Yes/No result.
How it Works: Enter comma-separated keywords. Returns "Yes" if ANY keyword is found on the page, "No" otherwise. The search is case-insensitive.
Invert Option: Enable "Invert" to reverse the logic - returns "Yes" when NONE of the keywords are found.
Use Cases:
- Stock Availability: Monitor for "sold out", "out of stock" keywords
- Product Status: Track "discontinued", "pre-order", "coming soon" status
- Content Monitoring: Detect when specific text appears or disappears
- Back in Stock Alerts: Invert "sold out" to detect when product becomes available
- Compliance: Check for required disclaimers or legal text
Best Practice: Combine with other tracked elements (like Price or Text) to get both the status and the content.
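
The Text Presence rules described above (comma-separated keywords, match on ANY keyword, case-insensitive, optional Invert) can be sketched as follows. This is an illustrative model only. It assumes plain substring matching, which may differ from how PageCrawl matches keywords, and `text_presence` is a hypothetical name.

```python
def text_presence(page_text, keywords, invert=False):
    """Return "Yes" or "No" following the Text Presence rules described above."""
    # Split the comma-separated list and ignore empty entries.
    terms = [k.strip().lower() for k in keywords.split(",") if k.strip()]
    haystack = page_text.lower()
    # "Yes" when ANY keyword appears (assumption: simple substring match).
    found = any(term in haystack for term in terms)
    if invert:
        # Invert: "Yes" when NONE of the keywords appear.
        found = not found
    return "Yes" if found else "No"
```

For a back-in-stock alert, `text_presence(page_text, "sold out", invert=True)` switches from "No" to "Yes" once the "sold out" text disappears from the page.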

2. HTML

Description: Monitors changes in the HTML content of a specific section.
Important Note: Focus on narrowly defined areas to avoid false positives.
Use Case: Useful for tracking changes in webpage structure or layout.

3. JavaScript

Description: Executes a JavaScript function to return results.
Skill Level: Requires programming expertise.
Use Case: Ideal for advanced users needing custom tracking logic.

Each tracked element type serves a unique purpose. Understanding these differences helps select the right type for specific monitoring needs, ensuring accuracy and reducing false positives. For more detailed guidance, refer to the tooltips within the interface or contact support for assistance.

AI-Powered Change Detection and Smart Filtering

2026-05-06T14:16:21+00:00

AI-Powered Change Detection and Smart Filtering

PageCrawl.io includes AI-powered analysis for all users. Every plan comes with monthly AI credits that work automatically with zero setup. When a page changes, AI summarizes what happened and scores how important the change is, so you only get notified about what matters.

If you need more, you can bring your own API key (BYOK) for unlimited AI usage and full model control.

AI Credits

Every plan includes monthly AI credits:

Plan	Monthly Credits
Free	15
Standard	200 (scales with quantity)
Enterprise	1,000 (scales with quantity)
Ultimate	10,000 (scales with quantity, includes Pro tier)

Credits are based on page size. Each 4,000-token block costs 1 credit on Basic tier or 10 credits on Pro tier (Ultimate plan only). A typical blog post uses 1-2 credits. Credits reset monthly.

When credits run out, page monitoring continues normally, but AI summaries and importance filtering pause until the next billing cycle. You can also switch to BYOK at any time for unlimited usage.

Getting Started

No setup is required. AI features are enabled by default for all workspaces:

Add pages to monitor as usual
When changes are detected, AI automatically summarizes them and assigns importance scores
View your credit usage in Settings > Workspace > Integrations > AI

Workspace-specific: AI features are configured per workspace. You can have some workspaces with AI enabled and others without.

How AI Features Work

Feature	Process
Summarization	Change detected > Content sent to AI > Human-readable summary generated > Included in notification
Importance Scoring	Change detected > AI analyzes content > Priority score assigned (0-100) > Low-priority changes filtered
Label Automation	Change detected > AI evaluates your label rules > Labels automatically added or removed

Configuration

Available for All Users

Setting	Description
Custom Instructions	Teach AI what matters for your monitoring (max 2,000 chars)
Summary Language	Generate summaries in 19 languages: English, Spanish, French, German, Italian, Portuguese, Dutch, Polish, Ukrainian, Russian, Japanese, Korean, Chinese, Arabic, Hindi, Turkish, Lithuanian, Latvian, and Estonian
Notification Threshold	Set threshold (0-100) for Importance Scoring. Changes scoring below this still get tracked but do not trigger notifications.

Additional BYOK Settings

These settings are available when using your own API key:

Setting	Description
Deep Analysis	Send full page content to AI for better context. Uses more tokens but provides more accurate analysis. When disabled, only the changed text (diff) is sent.
Run on First Check	Get AI analysis on the initial page check, before any changes are detected
AI Requests Per Month	Set a monthly cap to control costs. When the limit is reached, AI features pause until the next month. Leave empty for unlimited.
Per Page Per Day	Limit how many AI analyses a single page can trigger in 24 hours. Prevents noisy pages from consuming your entire budget. Default: 10.
Max Tokens	Limit content size per request. If content exceeds this limit, AI analysis is skipped for that change.

Understanding Tokens

A token is roughly 4 characters or about 3/4 of a word. With included credits, each 4,000-token block counts as 1 credit.

Page Type	Typical Tokens
Simple (blog, article)	~1,000-2,000
Medium (product, news)	~2,000-5,000
Large (documentation)	~5,000-10,000
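
The credit arithmetic can be sketched as a quick estimate. The 4-characters-per-token ratio, the 4,000-token block, and the 1 and 10 credit costs come from this article. Rounding partial blocks up and charging at least one block are assumptions, so treat the result as a rough guide rather than a billing figure.

```python
import math

TOKENS_PER_BLOCK = 4000
CREDITS_PER_BLOCK = {"basic": 1, "pro": 10}  # Pro tier is Ultimate plan only


def estimate_tokens(text):
    """Rough token count using the 4-characters-per-token rule of thumb."""
    return math.ceil(len(text) / 4)


def estimate_credits(tokens, tier="basic"):
    """Estimate credits for one analysis (assumes partial blocks round up)."""
    blocks = max(1, math.ceil(tokens / TOKENS_PER_BLOCK))
    return blocks * CREDITS_PER_BLOCK[tier]
```

Under these assumptions, a 1,500-token blog post costs 1 credit on the Basic tier, and a 5,000-token product page costs 2 credits on Basic or 20 on Pro.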

Using Your Own API Key (BYOK)

If your included credits are not enough, or you want full control over model selection, you can connect your own API key from OpenAI, Google Gemini, Anthropic, or OpenRouter.

Go to Settings > Workspace > Integrations > AI
Select your AI provider and enter your API key
Click Test Connection to verify
Choose your preferred model and save

When using BYOK, AI credits are not consumed and you pay your AI provider directly. See the BYOK Setup Guide for detailed instructions.

Best Practices

Start Small

AI is enabled by default, so monitor your credit usage for the first few weeks
Check usage statistics in Settings > Workspace > Integrations > AI
If you need more credits, upgrade your plan or connect your own API key

Optimize Credit Usage

Use Custom Instructions to help AI focus on what matters
A daily cap of 10 analyses per page prevents noisy pages from consuming your budget
For high-volume monitoring, consider BYOK with a budget model like Gemini Flash-Lite

Choose the Right Mode

Scenario	Recommendation
Getting started	Use included credits (no setup needed)
High-volume pages	Enable Importance Scoring to filter noise
Technical pages	Enable Summarization for readable changes
Need unlimited AI	Connect your own API key (BYOK)
Critical pages	Use BYOK with premium models (GPT-4.1, Claude Sonnet)

AI Label Automation

AI can automatically apply or remove labels on detected changes based on rules you define. Instead of manually categorizing changes, the AI reads each change and decides which labels to add or remove according to your instructions.

How to Set It Up

Go to Settings > Workspace > Labels
Scroll to the AI Label Automation section
Click Add Rule to create a label/instruction pair
For each rule, choose a label name and write a plain-language instruction explaining when the AI should apply it
Click Save Changes

You can configure up to 10 label rules per workspace.

How It Works

Each time a change is detected and AI analysis runs, the AI evaluates the change against your label rules and decides which labels to add or remove. The AI receives the current labels on the page, so it can remove labels that no longer apply (e.g., removing "Out of Stock" when a product is back in stock).

Labels are applied to the change record, making them available for filtering on the Review Board and in your page list.

Example Rules

Label	Instruction
Breaking News	Apply when urgent or breaking news appears
Policy Update	Apply when terms, policies, or legal text changes
New Event	Apply when a new conference or event is announced
Job Posted	Apply when new job listings are added
Content Removed	Apply when significant content is deleted from the page

Important Notes

AI can only manage labels defined in your automation rules. Manually applied labels are never touched.
Label names have a maximum of 50 characters; instructions have a maximum of 500 characters.
Labels are created automatically if they do not already exist in your workspace.
AI Label Automation requires AI to be configured for the workspace (either included credits or BYOK).
Label decisions run as part of the standard AI analysis, so no additional credits are used beyond the normal change analysis.
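
If you maintain label rules in a script or spreadsheet before entering them, the limits above are easy to check ahead of time. The sketch below is a hypothetical validator. Only the limits (10 rules, 50-character label names, 500-character instructions) come from this article.

```python
MAX_RULES = 10
MAX_LABEL_CHARS = 50
MAX_INSTRUCTION_CHARS = 500


def validate_label_rules(rules):
    """Return a list of problems found in (label, instruction) pairs."""
    problems = []
    if len(rules) > MAX_RULES:
        problems.append(f"too many rules: {len(rules)} (max {MAX_RULES})")
    for label, instruction in rules:
        if len(label) > MAX_LABEL_CHARS:
            problems.append(f"label too long: {label[:20]}...")
        if len(instruction) > MAX_INSTRUCTION_CHARS:
            problems.append(f"instruction too long for label: {label[:20]}")
    return problems
```

An empty list means the rules fit within the documented limits.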

Security and Privacy

Aspect	Details
Included credits	Content is processed through PageCrawl's managed AI infrastructure
BYOK mode	Content is sent directly to your chosen AI provider
Storage	AI summaries stored in PageCrawl.io for your reference
Security	All transmission via HTTPS, API keys encrypted at rest
Provider policies	Review your AI provider's data usage and retention policies when using BYOK

AI Integration Setup Guide (BYOK) - Step-by-step guide to configure your own API keys for unlimited AI usage
Choosing the Right AI Model for Website Monitoring - Compare models and pricing for BYOK users

AI Integration Setup Guide - Bring Your Own Key (BYOK)

2026-05-06T14:16:21+00:00

AI Integration Setup Guide - Bring Your Own Key (BYOK)

All PageCrawl.io plans include AI credits that work automatically with no setup required. This guide is for users who want to go beyond their included credits by connecting their own API key for unlimited AI usage, full model choice, and advanced features like Deep Analysis.

When to Use BYOK

Most users won't need BYOK since all plans include AI credits. Consider BYOK if you:

Run out of credits regularly and need unlimited AI analyses
Want to choose a specific AI model for different page types
Need Deep Analysis mode (sends full page content for better context)
Want to use premium models like GPT-5.2 or Claude Opus 4.6 for critical pages
Monitor sensitive content and need a specific provider's data policies

You can switch between credits and BYOK at any time in your settings.

Supported Providers and Models

Provider	Recommended Model	Best For	Get API Key
OpenAI	GPT-5 Mini	Best value for most users	platform.openai.com
Google Gemini	Gemini 3 Flash	Balance of quality and cost	ai.google.dev
Anthropic	Claude Haiku 4.5	Fast and accurate	console.anthropic.com
OpenRouter	Any model	Access 200+ models via single API	openrouter.ai

OpenAI Models

Model	Use Case	Notes
GPT-5 Mini	Most users	Best balance of cost and quality
GPT-5.2	Complex analysis	Most capable, higher cost
GPT-5 Nano	High volume	Fastest and cheapest

Google Gemini Models

Model	Use Case	Notes
Gemini 3 Flash	General use	Good balance, default
Gemini 3.1 Pro	Complex tasks	Premium quality
Gemini 3.1 Flash Lite	Budget monitoring	Most affordable option
Gemini 2.5 Flash	Legacy	Still available, good balance

Anthropic Claude Models

Model	Use Case	Notes
Claude Haiku 4.5	Most users	Fast and cost-effective
Claude Sonnet 4.6	Complex tasks	Better quality, higher cost
Claude Opus 4.6	Critical apps	Highest accuracy
Claude Sonnet 4.5	Legacy	Still available
Claude Opus 4.5	Legacy	Still available

OpenRouter

OpenRouter provides unified access to 200+ AI models from multiple providers through a single API key.

Feature	Description
Unified billing	One account for all models
Automatic fallbacks	Switches models if one is unavailable
Free models	Access to Llama, Mistral, Qwen community models
Pricing	5.5% platform fee on top of base model costs

Recommended models: openai/gpt-5-mini, anthropic/claude-sonnet-4.6, google/gemini-3-flash-preview

Step-by-Step Setup

Step 1: Get Your API Key

Provider	Steps
OpenAI	Visit platform.openai.com > Create account > API Keys > Create new secret key > Add billing
Google Gemini	Visit ai.google.dev > Sign in with Google > Create project > Enable Gemini API > Generate API key
Anthropic	Visit console.anthropic.com > Create account > API Keys > Create new key > Add credits
OpenRouter	Visit openrouter.ai > Create account > Settings > API Key > Add credits

Step 2: Configure in PageCrawl.io

Go to Settings > Workspace > Integrations > AI
Select your AI provider
Paste your API key
Choose your preferred model
Click Test Key to verify
Save your configuration

Your workspace will automatically switch to BYOK mode and AI credits will no longer be consumed.

Step 3: Enable AI Features

Toggle the features you want:

Feature	Description
AI Summarization	Get intelligent summaries of page changes
Importance Scoring	AI scores each change from 0-100, filtering out low-priority noise
Custom Instructions	Add context for better analysis
Deep Analysis	Send full page content for better context (BYOK only)
Run on First Check	Get AI analysis on initial page check (BYOK only)

Switching Back to Credits

If you want to stop using your own key and return to included credits:

Go to Settings > Workspace > Integrations > AI
Click Switch to included credits
Your API key configuration is preserved in case you want to switch back later

Page-Level Configuration

You can customize AI settings at three levels:

Level	Applies To	Best For
Workspace default	All pages	General settings
Template override	Pages using that template	Grouped pages (e.g., all product pages)
Page override	Individual pages	Critical or special pages

Example strategy:

Workspace default: Gemini 3.1 Flash Lite (cheapest)
E-commerce template: GPT-5 Mini (best value)
Legal/ToS template: Claude Haiku 4.5 (high accuracy)
Critical contracts: Claude Sonnet 4.6 (premium)
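
One way to picture how the three levels combine is a simple fallback chain. The sketch below assumes the most specific setting wins (page, then template, then workspace), which is the usual meaning of "override" but is not spelled out in this article. `resolve_model` is a hypothetical name.

```python
def resolve_model(workspace_default, template_override=None, page_override=None):
    """Pick the model for a page (assumes the most specific level takes precedence)."""
    for candidate in (page_override, template_override):
        if candidate:
            return candidate
    return workspace_default
```

With the example strategy above, a page using the Legal/ToS template and no page override resolves to Claude Haiku 4.5. A critical contract page with its own override resolves to Claude Sonnet 4.6.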

Model Selection Guidelines

By Priority

Priority	Recommended Models
Cost optimization	Gemini 3.1 Flash Lite, GPT-5 Nano
Accuracy	GPT-5.2, Claude Sonnet 4.6
Speed	Claude Haiku 4.5, GPT-5 Mini
Complex content	Claude Sonnet 4.6, GPT-5.2

By Page Complexity

For most pages, a general-purpose model provides excellent results at a lower cost:

Model	Provider	Best For
Gemini 3 Flash	Google	General monitoring, good balance of speed and quality
GPT-5 Mini	OpenAI	Reliable all-around performance

For complex pages that require deeper analysis or more reasoning (e.g., dense legal documents, technical specifications, multi-section reports), choose a more powerful model:

Model	Provider	Best For
Gemini 3.1 Pro	Google	Complex documents requiring extended reasoning
GPT-5.2	OpenAI	Nuanced analysis and detailed comparisons
Claude Opus 4.6	Anthropic	Critical documents requiring highest accuracy

Note: Start with a general-purpose model and upgrade to a more powerful one if you notice the AI missing important changes or providing superficial summaries.

By Content Type

Content Type	Budget	Recommended	Premium
Blogs, News	Gemini 3.1 Flash Lite	GPT-5 Mini	-
E-commerce	Gemini 3.1 Flash Lite	GPT-5 Mini	Claude Haiku 4.5
Legal, ToS	Claude Haiku 4.5	Claude Sonnet 4.6	Claude Sonnet 4.6
API Docs	Gemini 3.1 Flash Lite	GPT-5 Mini	-

AI-Powered Change Detection and Smart Filtering - Learn how AI summarization and Importance Scoring work
Choosing the Right AI Model for Website Monitoring - Compare models and pricing to find the best fit

How to Monitor Terms of Service and Privacy Policy Pages for Compliance

2026-03-05T10:31:13+00:00

How to Monitor Terms of Service and Privacy Policy Pages for Compliance

Businesses rely on numerous third-party services, each with their own Terms of Service and Privacy Policy that can change at any time. These changes might affect your compliance status, operational procedures, or legal obligations. PageCrawl.io provides an automated way to track these critical documents, ensuring you're always informed when important updates occur.

This guide will show you how to set up automated monitoring for legal documents using PageCrawl.io's features.

Why Monitor Legal Documents

When vendors update their terms without direct notification, it can impact your business in several ways. Payment processors might change their fee structures, cloud providers could modify data processing agreements, or analytics tools might update their data retention policies. Manual checking of these documents is time-consuming and prone to missing important updates.

Understanding What to Monitor

Legal document monitoring typically focuses on tracking changes in Terms of Service, Privacy Policies, Data Processing Agreements, and Service Level Agreements from your vendors and partners.

Setting Up Compliance Monitoring in PageCrawl.io

The process of setting up monitoring for legal documents is straightforward and can be completed in a few minutes per page.

Step 1: Add the Legal Document Page

Log in to your PageCrawl.io dashboard
Click the "Track New Page" button
Enter the URL of the Terms of Service or Privacy Policy you want to monitor
Provide a descriptive name for the monitoring task (e.g., "Stripe Terms of Service" or "AWS Privacy Policy")

Step 2: Configure Detection Settings

Select "Full page text" as your detection method and enable "Reader mode" - this captures only the main text content, automatically ignoring irrelevant changes in page footers, headers, or sidebar areas
Set how frequently the page should be checked - daily is sufficient for most legal documents, but you can adjust based on your needs (hourly for critical vendors, weekly for stable documents)

Step 3: Set Up Notifications

Choose when to receive notifications: Instantly when changes are detected, or as a daily/weekly digest that summarizes all changes across your monitored pages
Select notification channels: Email, Slack, Discord, Microsoft Teams, Telegram, or Webhooks for system integration
Configure team notifications by adding relevant team members to receive alerts

Practical Implementation Tips

Start by monitoring your most critical vendor agreements first, then gradually expand to include other services. Use clear naming conventions for your monitoring tasks to easily identify which document changed when you receive an alert.

Organizing Your Monitoring Portfolio

Create a structured approach to monitoring by categorizing your tracked pages. Group them into critical vendors (payment processors, infrastructure providers), data processors (analytics tools, CRM systems), and regulatory pages (government compliance guidelines).

Using Tags for Better Organization

Implement a tagging system from the start. Use tags like #vendor, #competitor, #gdpr, or #payment to quickly filter and manage your monitored pages. This becomes especially useful as your monitoring portfolio grows.

Handling Different Types of Changes

Not all changes are equal. Some updates might be minor formatting adjustments, while others could be significant legal modifications. PageCrawl.io helps you distinguish between these by highlighting exactly what changed, showing removed text in red and new text in green.

For each detected change, PageCrawl.io stores:

Screenshots of the page before and after the change
Text differences with clear highlighting of additions and removals
AI summaries explaining what changed in plain language (when enabled)
Historical versions for complete audit trails

This comprehensive record ensures you have all the evidence needed for compliance audits and legal reviews.
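
To make the idea of "removed" versus "added" text concrete, here is a minimal Python sketch built on the standard library's difflib. It is not PageCrawl's implementation; the function name split_changes and the sample clauses are hypothetical and only illustrate how two versions of a policy can be compared line by line.

```python
import difflib


def split_changes(old_text: str, new_text: str):
    """Return (removed_lines, added_lines) between two versions of a document."""
    removed, added = [], []
    for line in difflib.ndiff(old_text.splitlines(), new_text.splitlines()):
        if line.startswith("- "):      # present only in the old version
            removed.append(line[2:])
        elif line.startswith("+ "):    # present only in the new version
            added.append(line[2:])
    return removed, added


# Hypothetical clauses, used only for illustration.
old = "We retain data for 30 days.\nYou may cancel at any time."
new = "We retain data for 90 days.\nYou may cancel at any time."

removed, added = split_changes(old, new)
print("Removed:", removed)
print("Added:", added)
```

A line-level comparison like this is enough to see which clause changed; unchanged lines are ignored.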

Troubleshooting Common Issues

If you're receiving too many alerts about minor changes, check that Reader mode is enabled to filter out navigation and footer updates. For more strategies on reducing false positives, see our guide on reducing false positives when monitoring websites.

If you're missing important changes, verify that the correct URL is being monitored and that the page is accessible.

Next Steps

Once you've set up basic monitoring, consider implementing advanced strategies such as keyword-based alerts for critical terms like "price increase" or "data breach", or comparison monitoring to track how your policies compare to competitors.

Getting Started Today

Begin with your most important vendor agreements. Setup takes just a minute or two per page, or you can save time by importing multiple URLs at once - simply copy and paste a list of URLs or upload an Excel file for bulk import.

PageCrawl.io handles the monitoring automatically once configured. You'll receive clear notifications when changes occur, allowing you to review and respond promptly to maintain compliance.

For businesses monitoring multiple vendors, check our pricing page - monitoring 500 URLs costs just $30/month, making enterprise-wide compliance monitoring affordable and efficient.

Monitoring Numeric Values for Changes to Spot Trends

2026-04-15T07:18:16+00:00

Monitoring Numeric Values for Changes to Spot Trends

You can track numeric values on a page using the "Number" tracked element type. This extracts numbers from a selected area on the page and displays them in a chart so you can quickly see the history of values and spot trends. Instead of manually checking a number every day, PageCrawl monitors it for you and builds a visual record over time.

What You Can Track

Common things to monitor with the Number tracker:

E-commerce: Product prices, discounts, stock quantities available
Finance: Stock prices, cryptocurrency values, exchange rates
Analytics: Page views, visitor counts, conversion rates
Ratings: Product ratings, review scores, customer satisfaction metrics
Inventory: Stock levels, warehouse quantities, supply counts

Set Up on PageCrawl.io

Log in to your PageCrawl.io account
Click Track New Page and enter the URL of the page containing the number you want to monitor
Click Tracked Elements to add what you want to monitor
Select "Number" as the tracked element type
Use the visual selector to click directly on the number on the page, or manually enter an XPath/CSS selector if you prefer

The visual selector is the easiest way - just point and click on the number you want to track. PageCrawl will figure out the selector for you automatically.

Using Selectors Manually

If you prefer to manually write selectors by analyzing HTML source, here are some examples:

For a price like this:

<span class="price">$49.99</span>

Use: //span[@class="price"] or .price

For an inventory count:

<div class="stock">150 items available</div>

Use: //div[@class="stock"] or div.stock

For a specific ID:

<p id="total-views">2,543 views</p>

Use: //p[@id="total-views"] or #total-views

For a rating or score:

<span class="rating">4.5</span>

Use: //span[@class="rating"] or .rating
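
If you want to check a selector before entering it, a small local test can help. The sketch below is an illustration only: it assumes a well-formed HTML snippet (real pages often are not valid XML), uses Python's built-in ElementTree with its limited XPath support, and the helper extract_number is a hypothetical stand-in, not PageCrawl's own number extraction.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical snippet matching the selector examples above.
SAMPLE_HTML = """
<div>
  <span class="price">$49.99</span>
  <div class="stock">150 items available</div>
  <p id="total-views">2,543 views</p>
  <span class="rating">4.5</span>
</div>
""".strip()


def extract_number(text: str) -> float:
    """Return the first number in text, ignoring currency symbols and thousands separators."""
    match = re.search(r"-?\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    return float(match.group().replace(",", ""))


root = ET.fromstring(SAMPLE_HTML)
selectors = [
    ".//span[@class='price']",
    ".//div[@class='stock']",
    ".//p[@id='total-views']",
    ".//span[@class='rating']",
]
values = [extract_number(root.find(selector).text) for selector in selectors]
print(values)
```

Each XPath here mirrors one of the examples above, with a leading "." because ElementTree evaluates paths relative to the root element.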

How It Works

Once you've set up your number tracker, PageCrawl will:

Extract the numeric value each time it checks the page
Store the values over time and build a historical record
Display all values in a chart so you can see trends at a glance
Show you when values go up or down and by how much
Alert you if the number changes by a certain amount (if you configure notification conditions)

The chart displays your complete history, making it easy to spot patterns and see how values change over different time periods. You can see exactly when changes happened and track the progression of any number over days, weeks, or months.

Understanding the Chart

Your number tracking chart shows:

All previous values recorded over time
Exact dates and times when each value was captured
Trends and patterns in how the number changes
Peaks (highest values) and valleys (lowest values)
How much the number changed between each check

This gives you a clear visual picture of what's happening with the metric you're tracking.

Statistics Overview

PageCrawl displays comprehensive statistics about your tracked number:

Data Points: Total number of checks performed and days tracked
Average: The mean of all recorded values over time
Median: The middle value, useful for understanding typical values when outliers exist
First Recorded: The initial value and when tracking began
Current Value: Your most recent reading with:
- 90-day change comparison
- Distance from average (shows if current value is higher or lower than typical)
Highest Value: The maximum value ever recorded and when it occurred
Lowest Value: The minimum value ever recorded and when it occurred
Total Change: How much the value has changed since you started tracking (absolute and percentage)
Trend: Overall direction indicator (📈 Up, 📉 Down, or ➡️ Stable)
Last Changed: When the value actually changed (not just checked)

These statistics are color-coded:

Green indicates increases or positive changes
Red indicates decreases or negative changes
Gray indicates neutral or stable values

This helps you quickly understand the overall behavior of your metric without manually analyzing the chart.
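
As a rough illustration of how figures like these are derived from a series of readings, here is a minimal Python sketch. The function summarize and the sample values are hypothetical, and the trend rule (comparing the first and latest value) is an assumption; PageCrawl may determine the trend differently.

```python
from statistics import mean, median


def summarize(values):
    """Compute summary statistics for a chronological list of readings."""
    first, current = values[0], values[-1]
    total_change = current - first
    percent_change = (total_change / first * 100) if first else 0.0
    if total_change > 0:
        trend = "up"
    elif total_change < 0:
        trend = "down"
    else:
        trend = "stable"
    return {
        "data_points": len(values),
        "average": mean(values),
        "median": median(values),
        "highest": max(values),
        "lowest": min(values),
        "total_change": total_change,
        "percent_change": percent_change,
        "distance_from_average": current - mean(values),
        "trend": trend,
    }


# Hypothetical readings, oldest first.
stats = summarize([120, 125, 118, 130, 150])
print(stats)
```

The median (125 here) sits below the average because the single high reading of 150 pulls the average up, which is why the median is useful when outliers exist.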

Chart

The chart visualization includes powerful interactive features to help you analyze your data:

Date Range Filters:

Use the quick filter buttons to view specific time periods:
- Last 7 Days - Recent short-term trends
- Last 30 Days - Monthly patterns
- Last 90 Days - Quarterly trends
- All Time - Complete history

Chart Controls:

Avg Line: Toggle the average reference line on/off
Moving Avg: Toggle moving average lines on/off to smooth out short-term fluctuations
- Choose between 7-day or 30-day moving averages
- The moving average line appears as a dashed line in the same color as your data
- Helps identify underlying trends by filtering out daily noise
- Hover over any point to see both the actual value and the moving average
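
For readers curious how a moving average smooths a series, here is a minimal sketch. It assumes one reading per day and a trailing window (each point is averaged with the readings before it); the function name moving_average is hypothetical and PageCrawl's exact calculation may differ.

```python
def moving_average(values, window):
    """Trailing moving average over up to `window` most recent readings."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]  # shorter at the start of the series
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical daily readings with one spike on day 4.
daily = [10, 10, 10, 40, 10, 10, 10]
print(moving_average(daily, 7))
```

With a 7-reading window the spike on day 4 is spread across the following points instead of showing as a single jump, which is the "filtering out daily noise" effect described above.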

Visual Annotations:

Average Line: A dashed horizontal line shows the overall average value
Highest Point: Marked with a red dot and label showing the peak value
Lowest Point: Marked with a green dot and label showing the minimum value
Color-Coded Dots: Each data point is colored based on change direction:
- Green dots indicate the value increased from the previous check
- Red dots indicate the value decreased
- Standard color means no change
Zoom Brush: On desktop, use the brush tool at the bottom to zoom into specific date ranges

Legend:

Click on any metric name in the legend to show/hide that line
Disabled lines appear grayed out with a strikethrough
Perfect for focusing on specific metrics when tracking multiple values
Click again to re-enable the line
All reference lines (average, annotations) update based on visible lines

Tooltips: When you hover over any point on the chart, you'll see:

The exact date and time of the check
The current value at that point
Change from the previous check with up/down arrows (▲ ▼)
The moving average value at that point (if enabled)
All values are clearly labeled so you know what each number means

Performance Optimizations: For long tracking periods with thousands of data points:

The chart automatically samples data when viewing "All Time" to maintain smooth performance
You'll see a note indicating how many points are shown (e.g., "Showing 150 of 500 points")
This ensures fast, responsive charts even with years of historical data
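
One simple way to reduce a long series for display is to keep evenly spaced points, always including the first and last. The sketch below shows that idea only; the function sample_points is hypothetical, and the sampling method PageCrawl actually uses is not documented here.

```python
def sample_points(points, max_points):
    """Keep at most max_points evenly spaced items, including the first and last."""
    if max_points < 2:
        raise ValueError("max_points must be at least 2")
    total = len(points)
    if total <= max_points:
        return list(points)
    step = (total - 1) / (max_points - 1)
    return [points[round(i * step)] for i in range(max_points)]


history = list(range(500))          # stand-in for 500 recorded values
sampled = sample_points(history, 150)
print(f"Showing {len(sampled)} of {len(history)} points")
```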

These features make it easy to spot trends, identify when significant changes occurred, and understand your data at a glance.

Tips for Best Results

Use the visual selector: Click directly on the number you want to track rather than writing selectors manually
Check your selector works: Make sure the selector is targeting the right element on the page
Set a reasonable check frequency: Choose how often PageCrawl checks the page based on how quickly you expect the number to change
Use templates for multiple pages: If you're tracking the same metric on different pages (like product prices), create a template and apply it to all pages. If you need to update the monitored pages, you will only need to make one change.

Using Templates

If you need to monitor the same numeric value across multiple pages on a website, you can:

Create a template with your Number tracker configuration
Apply that template to all the pages you want to monitor
Compare how the value changes across different pages

This saves you time and makes it easy to track metrics across your entire site.

Comparing Multiple Monitors on One Chart

If you're tracking the same type of number across different pages (for example, the price of a product on multiple retailers), you can overlay them all on a single chart to compare side by side.

How to set it up:

Open any monitor that has a Number or Price tracked element
Above the chart, you'll see a "Compare with..." dropdown
Click it and search for other monitors you want to add by name or URL
Select the monitors you want to compare. You can add up to 5 monitors on the same chart

PageCrawl will suggest relevant monitors automatically, prioritizing monitors in the same folder, on the same domain, or tracking similar products.

What the combined chart shows:

Each monitor appears as a separate line in a distinct color
All data points are merged onto a shared timeline so you can see how values move relative to each other
The chart legend lists every line. Click any line in the legend to show or hide it
Hovering over the chart shows a tooltip with the values from all monitors at that point in time
Date filters, moving averages, and zoom all apply to every line at once

Reading the comparison:

The Y-axis adjusts automatically to fit all values
The average, highest, and lowest annotations still apply to the primary monitor
Comparison data in tooltips is marked with a bullet (●) so you can tell which values belong to the primary monitor and which are from compared monitors
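
The shared timeline can be pictured as a merge of each monitor's readings by timestamp. The following sketch is an assumption about the general technique, not PageCrawl's code; merge_series, the store names, and the prices are hypothetical.

```python
def merge_series(series):
    """Merge {monitor_name: {iso_timestamp: value}} into rows sorted by time.

    A monitor with no reading at a given timestamp gets None in that row.
    """
    timestamps = sorted({ts for points in series.values() for ts in points})
    return [
        (ts, {name: points.get(ts) for name, points in series.items()})
        for ts in timestamps
    ]


merged = merge_series({
    "Store A": {"2026-01-01": 49.99, "2026-01-02": 44.99},
    "Store B": {"2026-01-02": 47.50},
})
for timestamp, row in merged:
    print(timestamp, row)
```

ISO-formatted timestamps sort correctly as plain strings, which keeps the sketch short.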

This is especially useful for:

Competitive price tracking: See how your price compares to competitors over time on one chart
Cross-retailer monitoring: Track the same product on Amazon, Walmart, and other stores and see price differences instantly
Regional comparisons: Compare the same metric across different regional pages
Benchmarking: Overlay your metric against an industry reference point

Your comparison selections are saved, so the next time you open the monitor the same comparison lines will appear on the chart.

Common Examples

E-commerce Store: Track product prices across listings. When prices drop or go on sale, you'll see it immediately in the chart. Compare pricing across multiple product pages to spot trends.

Real Estate Pricing: Track property prices on listing sites. Monitor how prices change over time, identify when properties go on sale, or track pricing trends in your area of interest.

Competitor Pricing: Monitor competitor product prices, discount percentages, or pricing changes. The chart gives you a clear view of when they adjust their prices.

Job Postings: Track how many open positions a company has posted. The chart shows when they're actively hiring and when positions get filled.

Education Programs: Monitor tuition costs, enrollment numbers for programs, or available spots in courses. Track how these metrics change throughout the year.

Government Fees & Services: Monitor permit costs, license fees, visa application prices, or other government service charges that may be subject to change.

Stock Price Monitoring: Monitor the current price of a stock or cryptocurrency. The chart shows you exactly when the price changed and by how much.

SAML SSO Configuration in PageCrawl

2026-05-06T14:16:21+00:00

SAML SSO Configuration in PageCrawl

This guide covers the PageCrawl side of SSO setup: importing your identity provider's metadata, enabling SSO, and configuring enforcement and user provisioning. For step-by-step instructions on configuring your identity provider (Azure AD, Google Workspace, Okta, etc.), see the Identity Provider Setup Guide.

Single Sign-On (SSO) allows your team members to securely access PageCrawl using your organization's identity provider, such as Azure AD, Google Workspace, Okta, or OneLogin.

Requirements

To use SAML SSO, your team must meet the following requirements:

Enterprise or Ultimate Plan subscription
Corporate email domain - The team owner must use a verified corporate email address (free email providers like Gmail, Yahoo, Outlook, and iCloud are not supported)
Identity Provider that supports SAML 2.0 standard

How to Configure SAML SSO

1. Access SSO Settings

Navigate to Settings → Team → Auth & SSO in your PageCrawl account. You must be a team Owner or Administrator to access these settings.

When you first access the SSO settings page, PageCrawl automatically generates a unique identifier (UUID) and creates an initial SSO configuration for your team. This UUID is immediately available and used to create your Entity ID and Metadata URL.

2. Get Service Provider Information

Before configuring your Identity Provider, copy the Metadata URL displayed in the blue information box at the top of the SSO settings page.

The URL will look like: https://pagecrawl.io/sso/saml/abc-123-def-456/metadata

Important: Copy the actual URL shown in PageCrawl, not this example.

Most Identity Providers can automatically import all necessary configuration (Entity ID, ACS URL, Logout URL, etc.) from this metadata URL.

Note: If your IdP requires manual entry, the individual URLs are also displayed in the same box:

Reply URL (Assertion Consumer Service URL)
Sign on URL
Logout URL
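
If you want to see what an IdP reads from service provider metadata, the sketch below parses a small SAML metadata document with Python's standard library. The sample XML is a hypothetical, trimmed-down document modelled on the example URLs in this guide; the metadata PageCrawl actually serves will contain more fields, and read_sp_metadata is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; copy real values from your own PageCrawl SSO settings page.
SAMPLE_SP_METADATA = """\
<md:EntityDescriptor xmlns:md="urn:oasis:names:tc:SAML:2.0:metadata"
    entityID="https://pagecrawl.io/sso/saml/abc-123-def-456/metadata">
  <md:SPSSODescriptor protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol">
    <md:AssertionConsumerService
        Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-POST"
        Location="https://pagecrawl.io/sso/saml/abc-123-def-456/acs"
        index="0"/>
  </md:SPSSODescriptor>
</md:EntityDescriptor>
"""

NS = {"md": "urn:oasis:names:tc:SAML:2.0:metadata"}


def read_sp_metadata(xml_text):
    """Extract the Entity ID and ACS endpoint from SAML service provider metadata."""
    root = ET.fromstring(xml_text)
    acs = root.find("md:SPSSODescriptor/md:AssertionConsumerService", NS)
    return {
        "entity_id": root.get("entityID"),
        "acs_url": None if acs is None else acs.get("Location"),
        "acs_binding": None if acs is None else acs.get("Binding"),
    }


info = read_sp_metadata(SAMPLE_SP_METADATA)
print(info["entity_id"])
print(info["acs_url"])
```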

3. Configure Your Identity Provider

Follow the instructions in our Identity Provider Setup Guide for your specific IdP (Azure AD, Google Workspace, Okta, etc.).

You'll need to create a SAML application in your IdP and provide the ACS URL and Entity ID from step 2.

4. Import Identity Provider Metadata into PageCrawl

You have three options for importing your IdP's settings into PageCrawl:

Option A: Metadata URL (Recommended)

Enter your IdP's metadata URL
Click "Parse Metadata from URL"
PageCrawl will automatically extract all required settings

Option B: Metadata XML

Copy your IdP's metadata XML
Paste it into the metadata XML field
Click "Parse Metadata XML"

Option C: Manual Entry

Manually enter Entity ID, SSO URL, SLO URL, and X.509 Certificate
This option is useful for custom configurations

5. Enable SSO Features

Configure the following settings based on your needs:

Enable SSO

Turn on SAML authentication for your domain.

Enforce SSO

When enabled, password login will be disabled for users with your email domain. Users must authenticate via your identity provider.

Just-in-Time (JIT) Provisioning

Enable Automatic Account Creation

Enabled: New users logging in via SSO will automatically get accounts created
Disabled: Only existing users can log in via SSO. New users must be manually added first.

When JIT provisioning is enabled, you can configure:

Default Role for New SSO Users

Administrator
Standard User
Viewer

Default Workspaces

Leave empty to assign all workspaces
Select specific workspaces to limit access

Auto-Create Personal Workspace

When enabled, each new SSO user gets a personal workspace
Note: Your account has a workspace limit based on your subscription
If the limit is reached, no personal workspaces will be created

Workspace Limits

Personal workspace creation depends on the workspace limit of your subscription plan.

If you enable "Auto-Create Personal Workspace" and have reached your limit, new SSO users will be assigned to default workspaces instead of creating personal workspaces.

SSO Login Flow

Once configured, users with your email domain will:

Go to the PageCrawl login page
Enter their email address
Be redirected to your identity provider
Authenticate with their corporate credentials
Be redirected back to PageCrawl and logged in automatically

If JIT provisioning is enabled and they're a new user, an account will be created automatically with the configured role and workspace assignments.

Troubleshooting Common Issues

"Team has reached member limit"

Error: "Unable to provision SSO user: Team has reached its member limit."

Solution:

Check your subscription plan in Settings → Team → Subscription
Either upgrade to a plan with more seats or remove inactive members
Once you have available seats, the user can try logging in again

"Automatic account creation is disabled"

Error: "Automatic account creation is disabled. Please ask your team administrator to enable JIT provisioning."

Solution:

Turn on "Enable Automatic Account Creation" in Settings → Team → Auth & SSO
Or manually add the user in Settings → Team → Users before they log in

User Not Assigned in Identity Provider

Symptoms: User gets error after authenticating at IdP.

Solution:

Azure AD: Go to Enterprise Applications → PageCrawl → Users and groups → Add user/group
Google Workspace: Admin Console → PageCrawl app → User access → Enable for user's org unit
Okta: Applications → PageCrawl → Assignments → Assign to People

Certificate Expired or Invalid

Symptoms: "Invalid signature" or authentication fails at final step.

Solution:

In PageCrawl SSO settings, update the metadata:
- Click Parse Metadata from URL to refresh, or
- Download fresh XML from IdP and paste it, then click Parse Metadata XML
Most IdPs rotate certificates every 1-3 years

Metadata Import Errors

Common Issues:

EntitiesDescriptor Format: PageCrawl requires metadata whose root element is EntityDescriptor, not EntitiesDescriptor
Invalid XML: Ensure you copied the entire XML, including the XML declaration
URL Not Accessible: Ensure metadata URL is publicly accessible
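
Before pasting metadata, you can check locally which of these problems applies. This is a minimal sketch using Python's standard library; check_metadata_root and its return strings are hypothetical, and it only inspects the root element rather than validating the full SAML schema.

```python
import xml.etree.ElementTree as ET

MD_NS = "urn:oasis:names:tc:SAML:2.0:metadata"


def check_metadata_root(xml_text):
    """Classify IdP metadata by its root element."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return f"invalid XML: {exc}"
    if root.tag == f"{{{MD_NS}}}EntityDescriptor":
        return "ok"
    if root.tag == f"{{{MD_NS}}}EntitiesDescriptor":
        return "wrapped in EntitiesDescriptor"
    return f"unexpected root element: {root.tag}"


# Hypothetical IdP metadata fragments.
single = f'<EntityDescriptor xmlns="{MD_NS}" entityID="https://idp.example.com/metadata"/>'
wrapped = f'<EntitiesDescriptor xmlns="{MD_NS}">{single}</EntitiesDescriptor>'

print(check_metadata_root(single))
print(check_metadata_root(wrapped))
print(check_metadata_root("<EntityDescriptor"))
```

If the result is "wrapped in EntitiesDescriptor", ask your IdP for metadata that describes only the single entity you need.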


Personal Workspace Not Created
Cause: The team has reached the workspace limit for its subscription plan.
Solution:

Delete unused workspaces in Settings → Team → Workspaces
Or upgrade to a plan with more workspaces
New users will still be assigned to default workspaces

Testing Your SSO Configuration

Use Incognito/Private Window to test fresh user experience
Test with Assigned User who has access in your IdP
Verify Each Step:
- Enter email at PageCrawl login
- Verify redirect to IdP
- Authenticate at IdP
- Verify redirect back to PageCrawl
- Confirm successful login
Test Different Scenarios:
- New user (if JIT enabled)
- Existing user
- User with wrong domain (should fail correctly)

Security Best Practices

Monitor certificate expiration dates and update before they expire
Only assign necessary users in your IdP
Set appropriate default role (usually "Viewer" or "Standard User")
Enable "Enforce SSO" only after thorough testing with all users

Frequently Asked Questions
Q: Can I have multiple identity providers?
A: No, PageCrawl supports one identity provider per team.
Q: What happens to existing users when I enable SSO?
A: Existing users can continue using password login unless you enable "Enforce SSO". With JIT provisioning enabled, their accounts will be automatically linked to SSO on first SSO login.
Q: Can I disable SSO after enabling it?
A: Yes, you can disable SSO anytime in the settings. Users will revert to password-based login.
Q: What if my IdP certificate expires?
A: Users won't be able to log in until you update the certificate. Update metadata in PageCrawl SSO settings as soon as your IdP rotates certificates.
Q: Why can't I use Gmail or other free email providers?
A: SSO requires corporate email domains for security. Free email providers don't provide the organizational control needed for enterprise SSO.
Q: How do I migrate all users to SSO?
A: Enable SSO with JIT provisioning first. Test with a few users. Once confirmed working, enable "Enforce SSO" to require all users to use SSO.
Q: What happens if we reach our member or workspace limit?
A: New SSO users won't be able to log in if the member limit is reached. If the workspace limit is reached, personal workspaces won't be created, but users will still be assigned to default workspaces.
Support
For assistance with SSO configuration or to request early access, contact support@pagecrawl.io.



Set Up Your Identity Provider for SAML SSO
2026-05-06T14:16:21+00:00
Set Up Your Identity Provider for SAML SSO
This guide covers the identity provider (IdP) side of SSO setup with step-by-step instructions for Azure AD, Google Workspace, Okta, OneLogin, and custom SAML providers. For PageCrawl-side settings (enabling SSO, enforcement, JIT provisioning), see the SSO Configuration Guide.
Before you begin, ensure you have:

Access to your identity provider's admin console
PageCrawl Enterprise or Ultimate plan with SSO enabled
Team owner's verified corporate email address

Get Your Service Provider Information
IMPORTANT: Complete this step first before configuring your Identity Provider

Navigate to Settings → Team → Auth & SSO in PageCrawl
Copy the Metadata URL shown in the blue Service Provider information box
- It will look like: https://pagecrawl.io/sso/saml/abc-123-def-456/metadata
- Important: Copy the actual URL from PageCrawl, not this example
Keep this URL handy - most Identity Providers can automatically import all configuration from this metadata URL

Note: If your IdP doesn't support metadata import, copy the individual URLs from PageCrawl (they will also be shown in the same box):

Reply URL (Assertion Consumer Service URL)
Sign on URL
Logout URL

Additional information for reference:

NameID Format: Email Address (urn:oasis:names:tc:SAML:1.1:nameid-format:emailAddress)
Binding: HTTP-POST for ACS, HTTP-Redirect for Single Sign-On

Azure AD / Microsoft Entra ID
Step 1: Create Enterprise Application

Sign in to the Azure Portal
Navigate to Microsoft Entra ID (formerly Azure Active Directory) → Enterprise Applications
Click New application
Click Create your own application
Name it "PageCrawl" and select Integrate any other application you don't find in the gallery (Non-gallery)
Click Create

Step 2: Configure SAML

In your PageCrawl application, click Single sign-on in the left menu
Select SAML as the single sign-on method
In section 1. Basic SAML Configuration, click Edit and enter:
- Identifier (Entity ID): Paste your Entity ID from PageCrawl (e.g., https://pagecrawl.io/sso/saml/abc-123.../metadata)
- Reply URL (ACS URL): Paste your Reply URL from PageCrawl (e.g., https://pagecrawl.io/sso/saml/abc-123.../acs)
Click Save

Step 3: Configure Attributes & Claims
The default Name ID (user.mail) is sufficient. No additional changes needed.
Step 4: Import Metadata into PageCrawl

In section 3. SAML Signing Certificate, copy the App Federation Metadata Url
In PageCrawl SSO settings, paste this URL in the Metadata URL field
Click Parse Metadata from URL

Step 5: Assign Users

Navigate to Users and groups
Click Add user/group
Select users or groups who should have access to PageCrawl
Click Assign


Google Workspace
Step 1: Create Custom SAML Application

Sign in to your Google Admin Console
Go to Apps → Web and mobile apps
Click Add app → Add custom SAML app
Enter "PageCrawl" as the app name
Click Continue

Step 2: Download Google IdP Metadata

On the Google Identity Provider details page, click Download Metadata
Save the XML file
Click Continue

Step 3: Configure Service Provider Details

Enter the following values:
- ACS URL: Paste your Reply URL from PageCrawl (e.g., https://pagecrawl.io/sso/saml/abc-123.../acs)
- Entity ID: Paste your Entity ID from PageCrawl (e.g., https://pagecrawl.io/sso/saml/abc-123.../metadata)
- Start URL: Leave empty
- Name ID format: EMAIL
- Name ID: Basic Information > Primary email
- Signed response: Leave unchecked (PageCrawl requires signed assertions, which is the industry standard default)
Click Continue
Click Finish (skip attribute mapping)

Step 4: Import Metadata to PageCrawl

Open the downloaded metadata XML file
In PageCrawl SSO settings, paste the content into Metadata XML field
Click Parse Metadata XML

Step 5: Turn On the App

In Google Admin, click on your PageCrawl app
Click User access
Select ON for everyone or specific organizational units
Click Save


Okta
Step 1: Add Application

Sign in to your Okta Admin Console
Go to Applications → Applications
Click Create App Integration
Select SAML 2.0 and click Next

Step 2: General Settings

Enter "PageCrawl" as the App name
(Optional) Upload a logo
Click Next

Step 3: Configure SAML

In the SAML Settings section, enter:
- Single sign-on URL: Paste your Reply URL from PageCrawl (e.g., https://pagecrawl.io/sso/saml/abc-123.../acs)
- Audience URI (SP Entity ID): Paste your Entity ID from PageCrawl (e.g., https://pagecrawl.io/sso/saml/abc-123.../metadata)
- Name ID format: EmailAddress
- Application username: Email
Leave other settings as default
Click Next

Step 4: Feedback

Select I'm an Okta customer adding an internal app
Click Finish

Step 5: Get Metadata URL

On the Sign On tab, scroll to SAML Signing Certificates
Click Actions next to the active certificate
Click View IdP metadata
Copy the URL from your browser's address bar
In PageCrawl SSO settings, paste this URL in the Metadata URL field
Click Parse Metadata from URL

Step 6: Assign Users

Go to the Assignments tab
Click Assign and select Assign to People or Assign to Groups
Assign users who should have access to PageCrawl
Click Done


OneLogin
Step 1: Add Application

Sign in to your OneLogin Admin Console
Go to Applications → Applications
Click Add App
Search for "SAML Test Connector (Advanced)" and select it

Step 2: Configure Application

Enter "PageCrawl" as the Display Name
Click Save

Step 3: Configure SAML Settings

Go to the Configuration tab
Enter the following:
- Audience (Entity ID): Paste your Entity ID from PageCrawl (e.g., https://pagecrawl.io/sso/saml/abc-123.../metadata)
- Recipient: Paste your Reply URL from PageCrawl (e.g., https://pagecrawl.io/sso/saml/abc-123.../acs)
- ACS (Consumer) URL Validator: Use regex pattern https://pagecrawl\.io/sso/saml/[^/]+/acs
- ACS (Consumer) URL: Paste your Reply URL from PageCrawl (e.g., https://pagecrawl.io/sso/saml/abc-123.../acs)
Click Save
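
To see which URLs the validator pattern from Step 3 accepts, you can try it locally. The sketch below uses Python's re module and treats the pattern as if it must match the whole URL; how OneLogin itself applies the pattern (anchored or not) is an assumption here, and the test URLs are hypothetical.

```python
import re

# Pattern copied from Step 3; [^/]+ stands for the team's unique identifier.
ACS_VALIDATOR = re.compile(r"https://pagecrawl\.io/sso/saml/[^/]+/acs")


def acs_url_is_accepted(url):
    """Return True when the entire URL matches the validator pattern."""
    return ACS_VALIDATOR.fullmatch(url) is not None


candidates = [
    "https://pagecrawl.io/sso/saml/abc-123-def-456/acs",      # expected shape
    "https://pagecrawl.io/sso/saml/abc-123/extra/acs",        # extra path segment
    "https://pagecrawl.io/sso/saml/abc-123-def-456/acs/evil", # trailing path
    "https://pagecrawlXio/sso/saml/abc-123-def-456/acs",      # the escaped dot must be a literal dot
]
for url in candidates:
    print(acs_url_is_accepted(url), url)
```

The escaped dot and the [^/]+ segment keep the pattern from matching look-alike hosts or deeper paths.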

Step 4: Get Metadata URL

Go to the More Actions menu
Select SAML Metadata
Copy the metadata URL
In PageCrawl SSO settings, paste this URL in the Metadata URL field
Click Parse Metadata from URL

Step 5: Assign Users

Go to the Users tab
Select users who should have access
Click Save


Custom SAML 2.0 Provider
If your identity provider isn't listed above but supports SAML 2.0, you can configure it manually:
Step 1: Configure Your Identity Provider
In your IdP, create a new SAML application with these settings:

Entity ID: Paste your Entity ID from PageCrawl (you copied this in the first section above, e.g., https://pagecrawl.io/sso/saml/abc-123.../metadata)
ACS URL: Paste your Reply URL from PageCrawl (e.g., https://pagecrawl.io/sso/saml/abc-123.../acs)
NameID Format: Email Address
Binding: HTTP-POST for ACS, HTTP-Redirect for SSO

Step 2: Get IdP Information
From your identity provider, collect:

Entity ID (IdP Issuer)
SSO URL (Sign-on URL)
SLO URL (Sign-out URL) - Optional
X.509 Certificate

Step 3: Manual Configuration in PageCrawl

In PageCrawl SSO settings, select the Manual Entry tab
Enter the collected information:
- Entity ID
- SSO URL
- SLO URL (optional)
- X.509 Certificate (paste the full certificate including BEGIN/END markers)
Enable SSO and configure JIT provisioning settings
Click Save Changes


Validation
After configuration, test your SSO:

Open an incognito/private browser window
Go to the PageCrawl login page
Enter a test user's email address with your domain
Verify you're redirected to your IdP
Complete authentication
Verify you're logged into PageCrawl successfully

If you encounter issues, check:

User is assigned to the PageCrawl application in your IdP
Email domain matches your configured domain
Metadata was imported correctly
X.509 certificate is valid and not expired


Notes

Metadata XML Format: PageCrawl does not support the EntitiesDescriptor element. Use EntityDescriptor format.
Multiple IdPs: PageCrawl supports one identity provider per team.
Certificate Rotation: When your IdP certificate expires, update the metadata in PageCrawl SSO settings.

Support
For assistance with your specific identity provider, contact support@pagecrawl.io.


Choosing the Right AI Model for Website Change Monitoring in 2026
2026-05-06T14:16:21+00:00
Choosing the Right AI Model for Website Change Monitoring in 2026
Every PageCrawl.io plan includes AI credits that work automatically with no setup. For most users, the included credits are all you need. This guide is primarily for users who want to bring their own API key (BYOK) and choose a specific model, covering budget options to premium models with cost comparisons based on 2026 pricing.
Using included credits? You don't need to choose a model. PageCrawl automatically uses optimized models on your behalf. Each 4,000-token block costs 1 credit (Basic tier) or 10 credits (Pro tier, Ultimate plan only). See AI-Powered Change Detection for details on how credits work.
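As a rough way to estimate credit usage, the sketch below applies the block pricing described above. It assumes that a partially filled 4,000-token block is charged as a full block, which this article does not state; the function name credits_for is hypothetical.

```python
import math

BLOCK_TOKENS = 4000
CREDITS_PER_BLOCK = {"basic": 1, "pro": 10}  # rates quoted in this article


def credits_for(tokens, tier="basic"):
    """Estimate credits for one AI run, assuming partial blocks round up (an assumption)."""
    blocks = math.ceil(tokens / BLOCK_TOKENS)
    return blocks * CREDITS_PER_BLOCK[tier]


print(credits_for(2000))          # small page, Basic tier
print(credits_for(9000, "pro"))   # large page, Pro tier
```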
Pricing updates frequently. Verify current rates at: OpenAI, Gemini, Anthropic, OpenRouter
Why AI Models Matter
AI models enhance website monitoring by automatically summarizing changes, assigning priority scores, and distinguishing meaningful updates from noise.
PageCrawl.io supports four AI providers:

OpenAI - GPT-5 family, reliable and fast
Google Gemini - Gemini 3 family with competitive pricing
Anthropic Claude - Claude 4.5 and 4.6 series, high accuracy and premium quality
OpenRouter - A marketplace that gives you access to 200+ AI models from different providers, all through a single account and API key

Understanding Tokens and Costs
What is a Token?
A token is roughly 4 characters or about 3/4 of a word. AI providers charge based on tokens processed:

"Hello world" = ~3 tokens
A typical paragraph = ~100 tokens
A blog post (1,000 words) = ~1,300 tokens
A full webpage = ~2,000-10,000 tokens
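These rules of thumb can be turned into a quick estimator. The sketch below is only an approximation based on the ratios above (about 4 characters or 3/4 of a word per token); real tokenizers vary by provider and language, and both function names are hypothetical.

```python
def estimate_tokens_from_text(text):
    """Approximate token count using roughly 4 characters per token."""
    return max(1, round(len(text) / 4))


def estimate_tokens_from_words(word_count):
    """Approximate token count using roughly 3/4 of a word per token."""
    return round(word_count / 0.75)


print(estimate_tokens_from_text("Hello world"))
print(estimate_tokens_from_words(1000))
```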

How PageCrawl Uses Tokens
PageCrawl's AI costs are dominated by input tokens (the page content sent to AI). Output tokens are minimal because summaries are typically just 1-2 paragraphs (~100-200 tokens).
Typical token usage per check:

Simple page (blog post, article): ~1,000-2,000 tokens
Medium page (product page, news): ~2,000-5,000 tokens
Large page (documentation, e-commerce): ~5,000-10,000 tokens

Example cost calculation (Gemini 3 Flash at ~$0.40/M input):

2,000 token page = $0.0008 per check (~1,250 checks per dollar)
5,000 token page = $0.002 per check (~500 checks per dollar)

Example cost calculation (Claude Opus 4.6 at ~$15/M input):

2,000 token page = $0.03 per check (~33 checks per dollar)
5,000 token page = $0.075 per check (~13 checks per dollar)
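The arithmetic behind these examples can be reproduced with a few lines of Python. The sketch uses the approximate input prices quoted above and counts input tokens only; the function names are hypothetical, and Decimal is used so the results are exact rather than subject to floating-point rounding.

```python
from decimal import Decimal


def cost_per_check(tokens, price_per_million):
    """Input cost in dollars for one check; price_per_million is a string such as "0.40"."""
    return Decimal(tokens) * Decimal(price_per_million) / Decimal(1_000_000)


def checks_per_dollar(tokens, price_per_million):
    """Whole number of checks that one dollar of input tokens pays for."""
    return int(Decimal(1) / cost_per_check(tokens, price_per_million))


for label, price in [("Gemini 3 Flash", "0.40"), ("Claude Opus 4.6", "15")]:
    for tokens in (2000, 5000):
        cost = cost_per_check(tokens, price)
        print(f"{label}: {tokens} tokens = ${cost.normalize():f} per check, "
              f"~{checks_per_dollar(tokens, price)} checks per dollar")
```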

Since output is just a short summary (~150 tokens), output costs add less than 10% to the total. Additionally, AI only runs when a meaningful change is detected on the page. PageCrawl's advanced change detection infrastructure filters out tiny, insignificant changes before they ever reach AI, so you only spend tokens on changes that actually matter.
Available Models by Provider
Below are the models currently available in PageCrawl.io. Models marked with a star are the recommended defaults for each provider.
OpenAI Models



Model | Notes
GPT-5 Mini ⭐ | Default. Great balance of speed, quality, and cost.
GPT-5.2 | Most capable OpenAI model. Best for complex pages.
GPT-5 | Full GPT-5 model.
GPT-5 Nano | Fastest and cheapest. Good for simple pages.
O3 | Reasoning model for complex analysis.
O4 Mini | Smaller reasoning model.
GPT-4.1 Mini | Previous generation, still reliable.
GPT-4.1 | Previous generation, good for complex tasks.
GPT-4.1 Nano | Previous generation budget option.



Google Gemini Models



Model | Notes
Gemini 3 Flash ⭐ | Default. Latest generation with great speed and quality.
Gemini 3.1 Pro | Premium model, Google's most capable.
Gemini 3.1 Flash Lite | Budget option in the latest generation.
Gemini 2.5 Flash | Reliable previous generation model.
Gemini 2.5 Flash Lite | Very affordable previous generation option.
Gemini 2.5 Pro | Previous generation premium model.



Anthropic Claude Models



Model | Notes
Claude Haiku 4.5 ⭐ | Default. Fast, affordable, and accurate.
Claude Sonnet 4.6 | Latest generation with excellent accuracy.
Claude Opus 4.6 | Most capable Anthropic model. Premium pricing.
Claude Sonnet 4.5 | Previous generation, strong all-rounder.
Claude Opus 4.5 | Previous generation premium model.
Claude Haiku 3.5 | Older generation budget option.



Recommended Models by Use Case
PageCrawl.io only calls AI when a page actually changes. If you monitor 1,000 pages and only 150 change, you pay for 150 AI requests, not 1,000.
The PageCrawl.io settings page provides three Quick Select tiers to help you choose:
Best / Most Capable
For complex pages where accuracy matters most (legal documents, terms of service, compliance monitoring).



Provider | Model
OpenAI | GPT-5.2
Anthropic | Claude Sonnet 4.6
Google Gemini | Gemini 3.1 Pro
OpenRouter | Claude Sonnet 4.6



These models deliver the most accurate results but cost significantly more. Only use them for pages where precision is critical.
Good Quality (Recommended for Most Users)
Best balance of quality and cost for everyday monitoring.



Provider | Model
OpenAI | GPT-5 Mini
Anthropic | Claude Haiku 4.5
Google Gemini | Gemini 3 Flash
OpenRouter | Gemini 3 Flash



This tier is the sweet spot for most BYOK users. These models handle the vast majority of monitoring tasks reliably and affordably.
Budget
Lowest cost for high-volume monitoring where some accuracy trade-off is acceptable.



Provider | Model
OpenAI | GPT-5 Nano
Anthropic | Claude Haiku 4.5
Google Gemini | Gemini 2.5 Flash Lite
OpenRouter | DeepSeek V3.2



Good for simple page monitoring (blog posts, news, documentation) where you want to keep costs as low as possible.
Best Models by Content Type



Content Type | Budget Option | Recommended | Premium
Blogs, News, Docs | GPT-5 Nano | GPT-5 Mini | -
E-commerce, Pricing | Gemini 2.5 Flash Lite | Gemini 3 Flash | Claude Haiku 4.5
Legal, ToS, Compliance | Claude Haiku 4.5 | Claude Sonnet 4.6 | GPT-5.2
Competitor Monitoring | Gemini 2.5 Flash Lite | GPT-5 Mini | Claude Haiku 4.5
API Docs, Changelogs | GPT-5 Nano | Gemini 3 Flash | -



Real-World Cost Examples
Costs can vary significantly. These are estimates only. Your actual costs depend on:

Page complexity and content length
How often pages change
Deep Analysis setting (on = full page, off = changes only)
Max token settings

Token usage by page type:

Simple pages (blogs, docs): ~500 tokens
Average pages: ~2,000 tokens
Content-heavy pages: ~5,000-10,000 tokens
Complex pages (e-commerce, SPAs): 10,000-25,000+ tokens
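
To turn these ranges into a monthly figure, multiply the number of detected changes by the tokens sent per change and the model's input price. The sketch below is an estimate under stated assumptions: it counts input tokens only, the parameter names are illustrative, and the sample values reuse figures quoted in this guide (150 changed pages, ~2,000 tokens each, ~$0.40 per million input tokens).

```javascript
// Estimated monthly input cost in US dollars.
// Only detected changes trigger an AI request, so unchanged checks are not counted.
function estimateMonthlyCost({ changesPerMonth, avgTokensPerChange, pricePerMillionUsd }) {
  const totalTokens = changesPerMonth * avgTokensPerChange;
  return (totalTokens * pricePerMillionUsd) / 1_000_000;
}

const monthly = estimateMonthlyCost({
  changesPerMonth: 150,       // pages that actually changed
  avgTokensPerChange: 2000,   // an "average page"
  pricePerMillionUsd: 0.40,   // approximate input price
});
console.log(monthly.toFixed(2)); // "0.12"
```

Output tokens and Deep Analysis settings shift the real number, so compare the estimate with the usage shown in your AI statistics.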

Recommendation: Start with budget-friendly models like GPT-5 Nano or Gemini 2.5 Flash Lite and set strict monthly limits to avoid unexpected bills.
Controlling Token Usage
You can reduce token usage in PageCrawl.io settings:

Deep Analysis off: Only send changed text to AI (lower tokens, less context)
Deep Analysis on: Send entire page for better understanding (higher tokens)
Max tokens limit: Set a maximum per request (falls back to diff if exceeded)
Monthly request limits: Set max AI requests per month to cap costs
Per-page daily limit: Prevent noisy pages from consuming all your AI budget
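
How these settings might interact can be sketched as two small decisions: whether an AI request is allowed at all, and what content it would carry. This is an illustration of the behavior described above, not PageCrawl.io's actual implementation; every option and field name here is hypothetical.

```javascript
// Returns true if a new AI request fits within both limits.
function canRunAi({ requestsThisMonth, monthlyLimit, requestsTodayForPage, perPageDailyLimit }) {
  return requestsThisMonth < monthlyLimit && requestsTodayForPage < perPageDailyLimit;
}

// Decides what would be sent to the model for one detected change.
function choosePayload({ deepAnalysis, maxTokens, pageTokens, diffTokens }) {
  if (deepAnalysis && pageTokens <= maxTokens) {
    return { mode: "full-page", tokens: pageTokens };
  }
  // Deep Analysis is off, or the full page exceeds the limit: send only the diff.
  return { mode: "diff", tokens: diffTokens };
}

console.log(choosePayload({ deepAnalysis: true, maxTokens: 8000, pageTokens: 12000, diffTokens: 400 }));
// { mode: 'diff', tokens: 400 }
```

The sketch shows why a max tokens limit acts as a safety net: a very large page falls back to the cheaper diff instead of being sent in full.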

Note: Check your actual token usage in PageCrawl.io's AI statistics to estimate your costs accurately.
OpenRouter: Access 200+ Models
OpenRouter provides unified access to AI models from multiple providers through a single API key. PageCrawl.io recommends OpenRouter as the default BYOK option because of its flexibility.
Benefits: Unified billing, automatic fallbacks, access to models from OpenAI, Anthropic, Google, xAI, DeepSeek, Meta, Mistral, Qwen, and Cohere
Best for: Users who want a single API key with access to many models, easy billing budgets, and the ability to switch models without changing keys
Privacy Mode: When enabled in PageCrawl.io settings, your data is only routed through AI providers that don't use it for training
The model dropdown in PageCrawl.io groups OpenRouter models by price tier (Premium, Recommended, Standard, Budget) and only shows models that support the structured output format required for PageCrawl's AI features.
How to Set Up BYOK in PageCrawl.io
Step 1: Get Your API Key



Provider | Get Key At
OpenRouter | openrouter.ai > Settings > API Key
OpenAI | platform.openai.com > API Keys
Google Gemini | ai.google.dev > Get API Key
Anthropic | console.anthropic.com > API Keys



Step 2: Configure in PageCrawl.io

Go to Settings > Workspace > Integrations
Click Manage or Setup on the AI Features card
Scroll down to the Bring Your Own Key section
Select your AI provider (OpenRouter is pre-selected and recommended)
Choose a model from the dropdown
Click Add API Key, paste your key, and use Test Key to verify it works
Click Save Configuration

Step 3: Use Quick Select for Easy Model Choices
After adding your API key, the settings page shows three Quick Select cards:

Best / Most Capable - Most accurate results, higher cost
Good Quality - Recommended for most users
Budget - Lowest cost, good for simple monitoring

Click any card to instantly switch to that model. You can also choose from the full model dropdown for more options.
Step 4: Optimize with Model Overrides
You can customize AI models at three levels:

Workspace default - applies to all pages
Template override - applies to pages using that template
Page override - applies to individual pages

Example strategy:

Workspace default: Gemini 2.5 Flash Lite (cheapest)
E-commerce template: GPT-5 Mini (good balance)
Legal template: Claude Sonnet 4.6 (high accuracy)
Critical page: GPT-5.2 (most capable)
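
The three levels can be pictured as a simple fallback chain. The sketch below assumes the most specific setting wins (page, then template, then workspace), which is the usual meaning of an override; the function and field names are illustrative.

```javascript
// Picks the model for a page: a page override beats a template override,
// which beats the workspace default. Unset levels are null or undefined.
function resolveModel({ workspaceDefault, templateOverride, pageOverride }) {
  return pageOverride ?? templateOverride ?? workspaceDefault;
}

const workspaceDefault = "Gemini 2.5 Flash Lite";

// A page using the legal template, with no page-level override.
console.log(resolveModel({ workspaceDefault, templateOverride: "Claude Sonnet 4.6" }));
// "Claude Sonnet 4.6"

// A critical page with its own override.
console.log(resolveModel({ workspaceDefault, templateOverride: "Claude Sonnet 4.6", pageOverride: "GPT-5.2" }));
// "GPT-5.2"
```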

Tips for Optimizing Costs

Start with the Good Quality tier - GPT-5 Mini, Gemini 3 Flash, or Claude Haiku 4.5 offer excellent quality at reasonable prices
Use templates - Group similar pages with the same model to optimize costs by content type
Check frequency doesn't affect AI costs - AI only runs when changes occur, not on every check
Set monthly limits - Use the "AI Requests Per Month" setting to cap spending
Monitor token usage - Check the token statistics in your AI settings to understand your actual costs
Use per-page daily limits - Prevent frequently updating pages from consuming all your budget

Privacy and Data Security Considerations



Provider | Data Usage | Best For
OpenAI/Anthropic | API data not used for training | Confidential content, legal docs
Google Gemini | Review Google's data policies | General monitoring
OpenRouter | Varies by underlying model. Enable Privacy Mode to restrict to non-training providers. | Flexible choice



When using included AI credits, content is processed through PageCrawl's managed AI infrastructure. When using BYOK, content is sent directly to your chosen provider.
Data protection policies: OpenAI, Anthropic, Google
Privacy note: Free tier models (including some OpenRouter models) may use your data for training. Use paid tiers for sensitive content.
FAQ
Do I need BYOK to use AI? No. All plans include AI credits that work automatically. BYOK is optional for users who want unlimited usage or specific model control.
What happens when my credits run out? Page monitoring continues normally, but AI summaries pause until credits reset next month. You can also switch to BYOK for unlimited usage.
Can I switch between credits and BYOK? Yes, at any time in Settings > Workspace > Integrations > AI.
Can I switch models after starting? Yes. Changes apply immediately to new checks. Historical data remains intact.
Do I pay for checks that don't find changes? No. AI only runs when pages actually change.
Can I use different models for different pages? Yes, via workspace defaults, template overrides, and page-level overrides.
Why is OpenRouter recommended as the default BYOK provider? OpenRouter gives you access to models from all major providers with a single API key. You can switch models anytime without changing keys, set spending limits in the OpenRouter dashboard, and enable Privacy Mode to control data handling.
Related Articles

AI-Powered Change Detection and Smart Filtering - Learn how AI summarization and Importance Scoring work
AI Integration Setup Guide (BYOK) - Step-by-step guide to configure your API keys



PageCrawl Browser Extension
2026-05-08T12:17:31+00:00
The PageCrawl browser extension lets you instantly add any webpage to your monitoring list with just a few clicks. View recent changes, switch between workspaces, and start monitoring new pages - all without leaving your current tab.
Installation
The PageCrawl extension is available for:

Chrome: Install from Chrome Web Store
Firefox: Install from Firefox Add-ons
Safari, Edge, Brave, or any other browser: Save the PageCrawl bookmarklet to your bookmarks bar instead. Click it on any page and PageCrawl picks up where you are, no extension required.


  

Note: Click the pin icon next to PageCrawl to keep it visible in your browser toolbar for quick access.
Getting Started
1. Connect Your Account
After installing the extension, click the PageCrawl icon in your browser toolbar. You'll see a welcome screen prompting you to log in.

  


Click "Log In to PageCrawl"
You'll be redirected to PageCrawl to authenticate
Once logged in, you'll be automatically connected

If you don't have an account yet, click "Don't have an account? Sign up" to create one.
2. View Recent Changes
Once connected, the extension opens to your Recent Changes timeline. This shows the latest detected changes across all your monitored pages:

  


AI Summaries: If enabled, you'll see AI-generated summaries of what changed
Text Diffs: For text-based monitoring, you'll see the actual text additions (highlighted in green) and deletions (highlighted in red)
Visual Changes: Shows the percentage of visual difference detected
Price/Number Changes: Shows how the value changed (e.g., "increased by 10%")

Click any change to open it directly in your dashboard and see the full details.
3. Start Monitoring a Page
To add a new page to your monitoring:

Navigate to any webpage you want to monitor
Click the PageCrawl extension icon
Click "+ Track New Page"
Choose your monitoring type and options
Click "Start Monitoring"


  

Monitoring Types
Full Page Monitoring
Best for: Blog posts, news articles, documentation pages
Monitors text content on the page. Choose your tracking level:

Everything on page: Monitors all text, including navigation and footers
Content only: Excludes navigation, headers, and footers
Reader mode: Focuses on the main article content only

Keyword Monitoring: Optionally enter keywords (comma-separated) to only be notified when specific words appear or disappear. Leave empty to be notified of all changes.
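
Conceptually, keyword monitoring compares which keywords are present before and after a change. The sketch below illustrates the idea only; it assumes case-insensitive substring matching, which may differ from how PageCrawl matches keywords, and the function names are hypothetical.

```javascript
// Splits a comma-separated keyword field into a clean, lowercase list.
function parseKeywords(input) {
  return input
    .split(",")
    .map((keyword) => keyword.trim().toLowerCase())
    .filter((keyword) => keyword.length > 0);
}

// Reports which keywords appeared or disappeared between two versions of the text.
function keywordChanges(keywordInput, oldText, newText) {
  const before = oldText.toLowerCase();
  const after = newText.toLowerCase();
  const appeared = [];
  const disappeared = [];
  for (const keyword of parseKeywords(keywordInput)) {
    const wasPresent = before.includes(keyword);
    const isPresent = after.includes(keyword);
    if (!wasPresent && isPresent) appeared.push(keyword);
    if (wasPresent && !isPresent) disappeared.push(keyword);
  }
  return { appeared, disappeared };
}

console.log(keywordChanges("sold out, sale", "Item sold out", "Item on sale"));
// { appeared: [ 'sale' ], disappeared: [ 'sold out' ] }
```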
Element Monitoring (Specific Area)
Best for: Prices, stock status, specific data points

Click "Click to Select Element"
Hover over the page and click the element you want to monitor
The selector will be automatically captured
Confirm your selection

You can also manually enter a CSS selector if you prefer.
Track as Number: Enable this to extract numeric values from the element. This allows you to track trends and percentage changes over time.
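
As an illustration of what numeric tracking involves, the sketch below pulls the first number out of an element's text and computes the percentage change between two checks. It assumes commas are thousands separators and is not PageCrawl's own parser; the function names are hypothetical.

```javascript
// Extracts the first number found in a piece of text, or null if there is none.
// Assumes commas are thousands separators (e.g. "$1,299.00").
function extractNumber(text) {
  const match = text.replace(/,/g, "").match(/-?\d+(\.\d+)?/);
  return match ? parseFloat(match[0]) : null;
}

// Percentage change from the old value to the new one.
// Returns null when the old value is 0, because the change is undefined.
function percentChange(oldValue, newValue) {
  if (oldValue === 0) return null;
  return ((newValue - oldValue) * 100) / Math.abs(oldValue);
}

console.log(extractNumber("$1,299.00")); // 1299
console.log(percentChange(100, 110));    // 10
```

Prices written with a decimal comma (for example "1.299,00") would need different handling than this sketch provides.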
Keyword Monitoring: Same as Full Page - enter keywords to filter notifications.
Visual Monitoring
Best for: Charts, images, layouts, design changes

Click "Draw Area on Page"
Click and drag to select the area you want to monitor
Confirm your selection

The extension will capture screenshots of this area and compare them for changes.
Change Threshold: Set how much the area must change before you're notified:

Any change (most sensitive)
Tiny (1%) - Very Minor (3%) - Minor (5%)
Moderate (10%) - Recommended for most cases
Significant (30%) - Very High (50%) - Extremely High (80%)
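
The threshold works like a minimum visual difference that must be reached before a notification goes out. The sketch below mirrors the options listed above; whether the comparison is "greater than or equal" is an assumption, and the key names are illustrative.

```javascript
// Threshold options from the list above, as minimum percentages of visual difference.
const THRESHOLDS = {
  any: 0,
  tiny: 1,
  veryMinor: 3,
  minor: 5,
  moderate: 10,
  significant: 30,
  veryHigh: 50,
  extremelyHigh: 80,
};

// Returns true if a detected visual difference should trigger a notification.
function shouldNotify(diffPercent, thresholdName) {
  const threshold = THRESHOLDS[thresholdName];
  if (threshold === undefined) {
    throw new Error(`Unknown threshold: ${thresholdName}`);
  }
  // "Any change" fires on any non-zero difference.
  if (threshold === 0) return diffPercent > 0;
  return diffPercent >= threshold;
}

console.log(shouldNotify(4, "moderate")); // false
console.log(shouldNotify(12, "moderate")); // true
```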

Price Monitoring
Best for: Product pages, e-commerce sites
PageCrawl will automatically detect and track the main price on the page. This is optimized for common e-commerce platforms and product pages.
Check Frequency
Choose how often PageCrawl should check for changes:

Options depend on your subscription plan
Paid plans offer more frequent checks

Right-Click Menu
You can quickly access PageCrawl from any webpage using the right-click context menu:

Right-click anywhere on a webpage
Select "Open in PageCrawl"


  

What happens next depends on whether the page is already monitored:


If the page is already monitored: You'll be taken directly to the page's dashboard where you can view change history, adjust settings, or check the current status.


If the page is not monitored: You'll be taken to the page creation form with the URL pre-filled, ready to set up monitoring.


Header Actions
The extension header provides quick access to:

PageCrawl Logo: Click to open your main dashboard
Workspace Switcher: Switch between workspaces (if you have multiple)
Help (question mark icon): Open this guide

More Options
For advanced configuration (notifications, proxies, actions, etc.), click "More options →" below the Start Monitoring button. This opens the full page creation form on PageCrawl with your current settings pre-filled.


Add Pages to PageCrawl from iOS Safari
2026-05-06T14:16:21+00:00
What Is This?
Add any webpage to PageCrawl.io monitoring directly from Safari's Share Sheet on your iPhone or iPad. Just tap Share, tap the shortcut, and you're done.
Install the Shortcut
Tap the button below on your iPhone or iPad to install the "Add to PageCrawl" shortcut:
Get the Shortcut
When prompted, tap Get Shortcut to install it.
How to Use It

Open Safari and navigate to any page you want to monitor
Tap the Share button (square with arrow pointing up)
Scroll down and tap Add to PageCrawl
PageCrawl.io opens with the URL pre-filled
Configure your monitoring options and save


  

Works on Mac Too
This shortcut also works on macOS! In Safari on your Mac:

Click the Share button in the toolbar
Select Shortcuts from the menu
Click Add to PageCrawl

Alternatively, for desktop browsers you can use our bookmarklet — just drag it to your bookmarks bar for one-click access.

  

Why iOS Needs a Shortcut
Android phones can install PageCrawl as a Progressive Web App and share pages to it directly from the system share sheet. iOS Safari doesn't support that yet, even after a PWA is added to the Home Screen. The Shortcut above is the fastest way to bridge that gap on iPhone and iPad.
Android Users
On Android, install PageCrawl as an app and share pages straight from any browser. See Add Pages to PageCrawl from Android for the step-by-step. If you'd rather not install, our bookmarklet works in any mobile browser too.
Tips

Pin to top of Share Sheet: Tap "Edit Actions..." at the bottom of the Share Sheet to move "Add to PageCrawl" to your favorites for quicker access.
Works everywhere: This shortcut works in any app that shares URLs — Safari, Chrome, Firefox, News apps, or anywhere with a Share button.
Stay logged in: For the smoothest experience, make sure you're logged into PageCrawl.io in Safari.



What is the difference between Priority Support and Standard Support?
2026-03-05T10:31:13+00:00
We aim to respond to all inquiries promptly. When support volume is high, requests from Enterprise and Ultimate customers are prioritized over those from Standard customers. These customers receive faster responses and more hands-on help if they cannot set up a page the way they want.
For technical support, response times depend on your subscription plan:

Free Forever Plan: Technical support not offered
Standard Plan: Within 72 hours (excluding weekends)
Enterprise Plan: Within 24 hours (excluding weekends)
Ultimate Plan: Within 24 hours (excluding weekends)



PageCrawl.io + n8n integration
2026-03-05T10:31:12+00:00

  

PageCrawl.io provides dedicated n8n community nodes that integrate directly into your n8n instance. With the PageCrawl Trigger and PageCrawl nodes, you can trigger workflows when changes are detected and interact with the PageCrawl.io API to manage pages, retrieve diffs, and download screenshots, all from within n8n's visual workflow editor.
Why integrate PageCrawl.io with n8n?
n8n is a workflow automation tool that you can self-host or run in the cloud. By connecting PageCrawl.io to n8n, you can:

Keep data on your infrastructure: Run workflows on your own servers, keeping sensitive change data within your network.
Build complex workflows visually: Use n8n's visual editor to chain together multiple steps, add conditional logic, and connect to hundreds of services.
Avoid per-task pricing: Unlike hosted automation platforms, self-hosted n8n has no limits on the number of workflow executions.
Connect to developer tools: Integrate directly with databases, APIs, Git repositories, and internal services that hosted platforms may not support.

Available nodes
PageCrawl.io provides two n8n nodes:
PageCrawl Trigger
The trigger node starts your workflow automatically when something happens on a monitored page. Supported events:

Change Detected: Fires when a monitored page's content changes.
Error: Fires when a page check fails (timeout, blocked, etc.).

You can filter triggers by workspace and by specific page, or listen for changes across all pages in a workspace. The node automatically registers and cleans up webhooks with the PageCrawl.io API.
PageCrawl (Action node)
The action node lets you interact with the PageCrawl.io API within your workflows. Available resources and operations:
Page operations

Get: Retrieve details about a monitored page including recent check history.
Quick Create: Add a new page to monitor with just a URL (auto-detects settings).
Create (Advanced): Add a page with full control over elements, actions, conditions, frequency, location, device, and more.
Update: Modify settings on an existing monitored page.
Delete: Remove a page from monitoring.
Run Check Now: Trigger an immediate check on a page.

Check operations

Get History: Retrieve check history for a page with change diffs.
Get Diff Image: Download a visual diff image showing what changed.
Get Diff HTML: Get the change diff as HTML markup.
Get Diff Markdown: Get the change diff as Markdown text.

Screenshot operations

Get Screenshot: Download the latest (or previous) screenshot of a page.
Get Screenshot Diff: Download a side-by-side visual comparison screenshot.

Setting up the integration
Step 1: Install the PageCrawl community node

Open your n8n instance and go to Settings > Community Nodes.
Click Install a community node.
Enter @pagecrawl/n8n-nodes-pagecrawl as the package name.
Click Install and confirm the installation.
Restart n8n if prompted.

Step 2: Add your API credentials

In your PageCrawl.io account, go to Settings > API and copy your API key.
In n8n, go to Credentials and create a new PageCrawl API credential.
Paste your API key and save.

Step 3: Create a workflow with the trigger

Create a new workflow in n8n.
Add the PageCrawl Trigger node.
Select your workspace and (optionally) a specific page to monitor.
Choose which events to listen for: change detected, error, or both.
Click Listen for Test Event to verify the connection. The node will automatically send a test event so you can see the data format.

Step 4: Add workflow actions
With the trigger in place, add any n8n nodes to define what happens when a change is detected. Some examples:

Store changes in a database using the PostgreSQL, MySQL, or MongoDB nodes.
Create a GitHub or GitLab issue for your team to review the change.
Summarize the change with AI using the OpenAI or Anthropic nodes.
Send a notification to Matrix, Mattermost, or any platform with an API.
Trigger an incident in PagerDuty or Opsgenie for critical page changes.

You can also add the PageCrawl action node mid-workflow to fetch additional data, such as downloading a diff image to attach to a notification or retrieving the full page details.
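
Between the trigger and a notification node, a small transformation step is often useful. The sketch below formats an event into a short message. The field names `pageName`, `pageUrl`, and `summary` are hypothetical placeholders, so check the test event produced by Listen for Test Event for the real payload structure before using anything like this.

```javascript
// Builds a short notification from a change event.
// NOTE: pageName, pageUrl, and summary are assumed field names, not a documented schema.
function buildNotification(event) {
  const name = event.pageName ?? "Unknown page";
  const summary = (event.summary ?? "").trim();
  const maxLength = 280;
  // Truncate long summaries so they fit in chat messages.
  const shortSummary =
    summary.length > maxLength ? summary.slice(0, maxLength - 3) + "..." : summary;
  return {
    title: `Change detected: ${name}`,
    body: shortSummary || "No summary available.",
    link: event.pageUrl ?? "",
  };
}

// Inside an n8n Code node ("Run Once for All Items"), this could be applied as:
//   return $input.all().map((item) => ({ json: buildNotification(item.json) }));
console.log(buildNotification({ pageName: "Pricing", pageUrl: "https://example.com/pricing", summary: "Plan price changed." }));
```

Keeping the formatting in one function makes it easy to adjust once you know the actual field names in your events.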
Step 5: Activate
Once your workflow is tested and working, activate it so it runs automatically whenever changes are detected.
Example workflow ideas

Compliance monitoring: When a vendor's terms of service change, use the PageCrawl node to get the diff as Markdown, store it in a database, create a Jira ticket for legal review, and notify the compliance team on Slack.
Competitor intelligence: When a competitor updates their pricing page, get the diff HTML, summarize the key changes with OpenAI, log them in a spreadsheet, and send a summary to your sales channel.
Visual regression tracking: When a page changes, download the screenshot diff image, attach it to a GitHub issue, and alert the design team for review.
Uptime and integrity checks: Listen for error events, trigger a PagerDuty incident, and post an alert to your ops channel when a critical page becomes unreachable.



User Access Roles and Permissions
2026-05-06T14:16:21+00:00
PageCrawl uses role-based access control to manage what each team member can do. There are four roles, each with different permission levels.
Available Roles



Role | Manage Team | Manage Workspaces | Edit Pages | View Pages
Owner | Yes | Yes | Yes | Yes
Administrator | Yes | Yes | Yes | Yes
Standard User | No | No | Yes | Yes
Viewer | No | No | No | Yes



Owner
Each team has exactly one Owner (the account creator). The Owner has full control over all team settings, billing, and member management. Ownership cannot be transferred or removed.
Administrator
Administrators can manage the team on behalf of the Owner:

Invite and remove team members
Change member roles
Assign workspace access to members
Create and delete workspaces
Edit all team and workspace settings (notifications, integrations, AI, etc.)
Full access to all workspaces

Standard User
Standard Users can work within their assigned workspaces:

View and edit monitored pages in assigned workspaces
Create new pages and tracked elements
Review changes and leave feedback
Access all monitoring features within their workspaces

Standard Users cannot invite members, change roles, or access workspaces they haven't been assigned to.
Viewer
Viewers have read-only access to their assigned workspaces:

View monitored pages and detected changes
Browse change history and reports
Cannot create, edit, or delete pages
Cannot modify any settings
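
The permission levels described in this article can be summarized as a lookup table. The sketch below is only a compact way to read the role matrix; the key names are illustrative and it does not model workspace assignment, which further restricts what Standard Users and Viewers can see.

```javascript
// Permission matrix for the four roles, as described in this article.
const PERMISSIONS = {
  owner:         { manageTeam: true,  manageWorkspaces: true,  editPages: true,  viewPages: true },
  administrator: { manageTeam: true,  manageWorkspaces: true,  editPages: true,  viewPages: true },
  standardUser:  { manageTeam: false, manageWorkspaces: false, editPages: true,  viewPages: true },
  viewer:        { manageTeam: false, manageWorkspaces: false, editPages: false, viewPages: true },
};

// Returns true only if the role exists and has the permission.
function can(role, permission) {
  return PERMISSIONS[role]?.[permission] === true;
}

console.log(can("standardUser", "editPages")); // true
console.log(can("viewer", "editPages"));       // false
```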

Managing Team Members
To manage roles and access:

Go to Settings > Team > Users
View the member list showing name, email, workspaces, and role
Click a member's role to change it (Owner and Administrator only)
Click Update in the Workspaces column to assign or revoke workspace access

Inviting New Members

Go to Settings > Team > Users
Click Invite User
Enter their email address and select a role
The invite expires after 2 weeks. You can resend it if needed.

Workspace Access
Members only see workspaces they've been assigned to. Administrators can assign workspace access per user. If all workspace access is removed from a user, they are removed from the team entirely.
This means you can have team members who only see specific projects, clients, or departments without exposure to other workspaces.


Advanced Configuration Options for Power Users
2026-05-06T14:16:21+00:00
PageCrawl offers advanced configuration options for users who need fine-grained control over their monitoring setup. This guide covers the key power-user features.
Power User Mode
When editing a monitored page, you can enable Power User mode using the toggle in the page settings. This reveals additional settings that are hidden by default to keep the interface clean for everyday use.
With Power User mode enabled, you get access to:

Engine selection - Choose between the default browser engine, Stealth Mode (for sites that block bots), or Fast mode (optimized for static pages)
Intelligent Reconnect - Automatically retry failed checks with a different approach
Custom User Agent - Set a specific browser user agent string
Custom Headers - Add custom HTTP headers to requests
Custom JavaScript - Run JavaScript code before or after page load
Device emulation - Emulate specific device viewports

Power User settings are marked with a special icon throughout the edit form so you can easily identify them.
Advanced Mode vs Simple Mode
PageCrawl offers two ways to add and edit monitored pages:
Simple Mode (default) guides you through setup step by step. It auto-detects the best settings, shows a live preview, and covers the most common use cases. Best for getting started quickly.
Advanced Mode gives you full control over every setting in a single form. Use it when you need to:

Track multiple elements on the same page simultaneously
Configure complex action sequences
Set up templates or apply existing ones
Fine-tune notification conditions per element
Work with custom selectors, thresholds, and comparison methods

You can switch to Advanced Mode from the Simple Mode page by clicking the "Advanced setup" link at the bottom. If you prefer to always use Advanced Mode, check the "Always show Advanced Setup" option.
Multiple Tracked Elements
Each monitored page can track multiple elements simultaneously, each with its own comparison method:



Type | What It Tracks
Full Page | Entire page text content
Text | Text content of a specific element (by CSS/XPath selector)
Number | Numeric values with configurable change thresholds
Price | Price values with currency detection
Availability | In-stock/out-of-stock status
Links | All outgoing links on the page
Visual | Visual screenshot comparison with diff percentage
HTML | Raw HTML structure of an element
Boolean | Presence or absence of an element
Feed/List | RSS, Atom, or other feed content
Rating | Star ratings or review scores
Reviews | Customer review text and metadata
JavaScript | Values extracted by running custom JavaScript
SEO Tags | Meta tags, Open Graph data, and structured data
PDF | Text content extracted from PDF files
Word | Text content extracted from Word documents
Excel | Data extracted from Excel spreadsheets
CSV | Data extracted from CSV files
PowerPoint | Text content extracted from PowerPoint presentations



Each tracked element can have its own set of actions and comparison settings.
Templates
Templates let you save a monitoring configuration and apply it to multiple pages automatically. This is especially useful when combined with Page Discovery for auto-monitoring newly discovered pages.
To create a template:

Go to Settings > Workspace > Templates
Enter a sample URL to auto-fill settings
Configure tracked elements, actions, check frequency, and notifications
Save the template

Templates can also define URL filters for page discovery, so new pages matching your criteria are automatically monitored with the template's settings.
Bulk Editing
Edit settings across multiple pages at once:

Select pages from your page list using the checkboxes
Click Bulk Edit in the toolbar
Choose what to change: check frequency, engine, proxy, actions, notifications, tags, or folder
Apply changes to all selected pages

Available on paid plans.
AI Configuration
Configure AI-powered change analysis per workspace:

Go to Settings > Workspace > Integrations > AI
Choose your AI provider (OpenRouter, OpenAI, Google Gemini, or Anthropic)
Select a model
Optionally set focus areas to guide the AI on what changes matter most

Each plan includes monthly AI credits. You can also bring your own API key (BYOK) for unlimited usage. See AI BYOK Setup for details.
Custom Check Scheduling
Control exactly when PageCrawl checks your pages:

Go to Settings > Workspace > Schedule
Set active monitoring hours (e.g., business hours only)
Choose which days of the week to run checks
Set the workspace timezone

This helps reduce unnecessary checks during off-hours and keeps your check quota focused on the times that matter.
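
The effect of a schedule can be pictured as a check that runs before each page check. The sketch below is an illustration under assumptions: the option names (`timeZone`, `activeDays`, `startHour`, `endHour`) are hypothetical, and it relies on the standard `Intl.DateTimeFormat` API to evaluate the time in the workspace timezone.

```javascript
// Returns true if the given moment falls inside the active days and hours,
// evaluated in the workspace timezone.
function isWithinSchedule(date, { timeZone, activeDays, startHour, endHour }) {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    weekday: "short",   // "Mon", "Tue", ...
    hour: "numeric",
    hourCycle: "h23",   // hours 0-23
  }).formatToParts(date);
  const weekday = parts.find((part) => part.type === "weekday").value;
  const hour = Number(parts.find((part) => part.type === "hour").value);
  return activeDays.includes(weekday) && hour >= startHour && hour < endHour;
}

const businessHours = {
  timeZone: "America/New_York",
  activeDays: ["Mon", "Tue", "Wed", "Thu", "Fri"],
  startHour: 9,
  endHour: 17,
};

// Wednesday, 10:00 in New York.
console.log(isWithinSchedule(new Date("2026-05-06T14:00:00Z"), businessHours)); // true
```

Evaluating the time in the workspace timezone rather than UTC is what keeps "business hours only" aligned with the team's working day.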
Global Filters
Apply text filters across all pages in a workspace:

Go to Settings > Workspace > General
Add global ignored text patterns
These patterns are excluded from change detection on every page in the workspace

Useful for filtering out dynamic content like timestamps, ad copy, or session IDs that appear across many pages.
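
The idea behind ignored text patterns is to strip volatile content from both versions of a page before comparing them. The sketch below illustrates that idea with regular expressions; the pattern syntax PageCrawl accepts may differ, and the function names and sample patterns are hypothetical.

```javascript
// Removes every match of each ignored pattern from the text.
// Patterns are expected to be global regular expressions.
function applyIgnoredPatterns(text, patterns) {
  return patterns.reduce((result, pattern) => result.replace(pattern, ""), text);
}

// Two versions differ meaningfully only if they still differ after filtering.
function hasMeaningfulChange(oldText, newText, patterns) {
  return applyIgnoredPatterns(oldText, patterns) !== applyIgnoredPatterns(newText, patterns);
}

const ignored = [/Last updated: \d{2}:\d{2}/g, /sessionid=\w+/g];

console.log(hasMeaningfulChange(
  "Price: $10. Last updated: 10:00",
  "Price: $10. Last updated: 10:05",
  ignored
)); // false

console.log(hasMeaningfulChange(
  "Price: $10. Last updated: 10:00",
  "Price: $12. Last updated: 10:05",
  ignored
)); // true
```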
Proxy Configuration
Choose where PageCrawl checks your pages from:

Default - Automatic server selection
Custom proxy - Use your own proxy server for pages behind firewalls or geo-restrictions
Location-specific - Select from available proxy locations (London, New York, San Francisco, Toronto, Frankfurt, Tel Aviv)
Residential - Use residential IP addresses for pages that block datacenter IPs

Configure per page or apply via bulk edit.


JavaScript Tracked Elements and Custom JavaScript Actions
2026-05-06T14:16:21+00:00
PageCrawl lets you use JavaScript in two powerful ways: as a tracked element to extract and monitor computed values, and as a custom action to manipulate the page before monitoring. Both run JavaScript directly in the browser context with full access to the DOM.
JavaScript Tracked Element
A JavaScript tracked element lets you execute JavaScript code on a page and monitor the return value for changes. This is useful when the data you want to track is not directly accessible via CSS or XPath selectors, for example computed values, data attributes, or content that requires logic to extract.
How to set it up:

Add a new tracked element to your monitored page
Select JavaScript as the element type
Enter your JavaScript code in the code field
The return value of your code becomes the monitored content

How it works: Your JavaScript code runs directly in the browser, giving it full access to the page's DOM, window object, and all standard browser APIs. The return value is captured and compared against the previous check to detect changes.
Examples:
Extract the page title:
document.title
Get text from a specific element:
document.querySelector('.status-badge').innerText
Count the number of items in a list:
document.querySelectorAll('.job-listing').length
Extract a data attribute:
document.querySelector('[data-version]').getAttribute('data-version')
Combine multiple values into one:
Array.from(document.querySelectorAll('.feature-list li')).map(el => el.textContent.trim()).join(', ')
Extract JSON-LD structured data:
JSON.parse(document.querySelector('script[type="application/ld+json"]').textContent).name
Count words on a page:
document.body.innerText.split(/\s+/).filter(w => w.length > 0).length
Advanced Examples
For multi-line logic, wrap your code in an immediately invoked function:
Extract a software version number from a release page:
(() => {
  const text = document.querySelector('.release-header, [class*="version"]')?.textContent || '';
  const match = text.match(/v?(\d+\.\d+\.\d+)/);
  return match ? match[1] : 'Version not found';
})()
Build a summary from a table:
(() => {
  const rows = document.querySelectorAll('table tbody tr');
  return Array.from(rows).map(row => {
    const cells = row.querySelectorAll('td');
    return Array.from(cells).map(c => c.textContent.trim()).join(' | ');
  }).join('\n');
})()
Count job listings by department:
(() => {
  const jobs = document.querySelectorAll('.job-listing');
  const departments = {};
  jobs.forEach(job => {
    const dept = job.querySelector('.department')?.textContent.trim() || 'Other';
    departments[dept] = (departments[dept] || 0) + 1;
  });
  return Object.entries(departments).map(([k, v]) => `${k}: ${v}`).join('\n');
})()
Extract all outbound links from a page:
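(() => {
  const host = window.location.hostname;
  // Compare the parsed hostname instead of substring-matching the whole URL,
  // so a link that merely mentions this host in its path or query still counts as outbound.
  // Hostnames are compared exactly, so subdomains are treated as outbound.
  const links = Array.from(document.querySelectorAll('a[href]'))
    .filter(a => a.protocol.startsWith('http') && a.hostname !== host)
    .map(a => a.href);
  return [...new Set(links)].join('\n');
})()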
Monitor the number of open issues or pull requests:
(() => {
  const text = document.querySelector('[data-tab-item="issues"] .Counter, .issues-count')?.textContent.trim();
  return text ? parseInt(text.replace(/,/g, ''), 10) : 'Not found';
})()
Extract and format event dates from a schedule page:
(() => {
  const events = document.querySelectorAll('.event-item, .schedule-row');
  return Array.from(events).map(ev => {
    // Fall back to a label so a missing node does not print "undefined"
    const date = ev.querySelector('.date, time')?.textContent.trim() || 'No date';
    const title = ev.querySelector('.title, .event-name')?.textContent.trim() || 'Untitled';
    return `${date}: ${title}`;
  }).join('\n');
})()
Important notes:

Your code should return a value (string, number, or any value that can be converted to text)
If the return value is null or undefined, an empty string is stored
Errors in your code will cause the check to fail for that element
JavaScript tracked elements require a real browser engine (not compatible with Fast mode)
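Because a thrown error fails the check and a null result is stored as an empty string, it can help to return an explicit fallback value instead. The following is a minimal sketch of that idea; the helper name safeText and the fallback strings are illustrative assumptions, not part of PageCrawl. The document is passed in as a parameter so the helper can also be exercised outside a browser.

```javascript
// Hypothetical helper (not a PageCrawl API): read the text of the first match
// and return a fallback string instead of throwing or returning null.
function safeText(doc, selector, fallback = 'Not found') {
  try {
    const el = doc.querySelector(selector);
    const text = el && el.textContent ? el.textContent.trim() : '';
    return text || fallback;
  } catch (err) {
    // An invalid selector or a failing lookup ends up here.
    return `Error: ${err.message}`;
  }
}
```

Inside a tracked element, the helper would sit in an immediately invoked function that ends with return safeText(document, '.status-badge'). Note the trade-off: returning an error string records a changed value rather than a failed check, so use it only where that is the behavior you want.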

Custom JavaScript Actions
Custom JavaScript actions let you run JavaScript code on the page as part of the action sequence, before the tracked elements are extracted. Use them for complex interactions that other action types (click, type, wait) cannot handle.
How to set it up:

Open the page settings and go to the Actions section
Add a new action and select Custom JavaScript
Enter your JavaScript code
The code runs during the check, before element extraction

How it works: The JavaScript runs in the browser context, similar to tracked elements. The key difference is that the return value is ignored. JavaScript actions are used for their side effects: modifying the DOM, triggering events, or setting up the page state needed for accurate monitoring.
When to use JavaScript actions: PageCrawl has built-in actions for common tasks like clicking elements, typing text, scrolling, waiting, removing elements, and selecting dropdown options. Use JavaScript actions when you need to do something the built-in actions cannot handle, such as setting browser storage, dispatching custom events, modifying element properties, or running multi-step DOM manipulation.
Examples:
Set localStorage or sessionStorage to change page behavior:
localStorage.setItem('region', 'us-east')
Set a cookie to bypass a language selector or A/B test:
document.cookie = 'lang=en; path=/; max-age=86400'
Replace dynamic content (session IDs, timestamps, random tokens) with static text to reduce false positives:
document.querySelectorAll('[data-session-id], .csrf-token, .nonce').forEach(el => el.textContent = '[REDACTED]')
Trigger a framework event that a regular click action does not fire (e.g., React, Vue, Angular):
(() => {
  const input = document.querySelector('#search-input');
  const nativeInputValueSetter = Object.getOwnPropertyDescriptor(window.HTMLInputElement.prototype, 'value').set;
  nativeInputValueSetter.call(input, 'monitoring keywords');
  input.dispatchEvent(new Event('input', { bubbles: true }));
})()
Toggle a checkbox and dispatch both change and click events to satisfy form validation:
(() => {
  const checkbox = document.querySelector('#agree-terms');
  checkbox.checked = true;
  checkbox.dispatchEvent(new Event('change', { bubbles: true }));
  checkbox.dispatchEvent(new Event('click', { bubbles: true }));
})()
Switch a page to a specific view mode by modifying URL parameters without a full reload:
(() => {
  const url = new URL(window.location);
  url.searchParams.set('view', 'list');
  url.searchParams.set('per_page', '100');
  window.history.replaceState({}, '', url);
  window.dispatchEvent(new PopStateEvent('popstate'));
})()
Expand all collapsed sections at once on a FAQ or documentation page:
document.querySelectorAll('details:not([open])').forEach(el => el.setAttribute('open', ''))
Remove inline styles that hide content behind a paywall or login wall:
(() => {
  document.querySelectorAll('.article-body, .content-area').forEach(el => {
    el.style.maxHeight = 'none';
    el.style.overflow = 'visible';
    el.classList.remove('truncated', 'blurred', 'paywall');
  });
  document.querySelectorAll('.paywall-overlay, .signup-gate').forEach(el => el.remove());
})()
Important notes:

In the default engine, JavaScript action errors are silently caught and the check continues. In Stealth mode, action errors will stop the remaining action sequence by default
Actions run after the page has loaded but before elements are extracted
You can chain multiple JavaScript actions with other action types (click, wait, type)
JavaScript actions require a real browser engine (not compatible with Fast mode)

Difference Between JavaScript Elements and Actions




| | JavaScript Tracked Element | Custom JavaScript Action |
| --- | --- | --- |
| Purpose | Extract and monitor a value | Manipulate the page before extraction |
| Return value | Captured and tracked for changes | Ignored |
| Error handling | Check fails if code errors | Default engine: errors silently ignored. Stealth mode: errors stop the action sequence |
| When it runs | During element extraction | Before element extraction (in action sequence) |
| Use case | "Get me this computed value" | "Set up the page so I can monitor it correctly" |



Common Patterns
Extract then monitor: Use a JavaScript action to set up the page (e.g., click "Load more"), then use a regular Text or Full Page tracked element to capture the content. This is often simpler than writing a JavaScript tracked element.
Normalize before compare: Use a JavaScript action to replace dynamic content (timestamps, session IDs, random values) with static placeholders, then track the normalized page content. This reduces false positives without needing global filters.
Complex extraction: When the value you want to monitor requires logic (math, filtering, combining multiple elements), use a JavaScript tracked element instead of trying to target it with CSS selectors.
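As a sketch of the "Normalize before compare" pattern, the function below replaces two kinds of dynamic values with fixed placeholders. The patterns and placeholder names are illustrative assumptions; adjust them to whatever changes on your page between checks.

```javascript
// Illustrative patterns only; extend or replace them for your page.
const DYNAMIC_PATTERNS = [
  // ISO 8601 timestamps such as 2026-05-06T14:16:21+00:00 or 2026-05-06T14:16:21Z
  [/\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})?/g, '[TIMESTAMP]'],
  // UUID-style identifiers
  [/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi, '[ID]'],
];

// Apply every pattern in turn and return the normalized text.
function normalizeText(text) {
  return DYNAMIC_PATTERNS.reduce(
    (result, [pattern, placeholder]) => result.replace(pattern, placeholder),
    text
  );
}
```

In a Custom JavaScript action, the same function could be applied to the elements that hold dynamic values, for example with el.textContent = normalizeText(el.textContent), before a Text or Full Page tracked element captures the content.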
What JavaScript Has Access To
Your code runs in the browser page context with full access to:

DOM API - document.querySelector(), document.body, document.title, etc.
Window object - window.location, window.innerWidth, window.scrollTo(), etc.
Standard JavaScript - String methods, Array methods, Math, JSON, RegExp, etc.
Browser APIs - localStorage, sessionStorage, fetch(), etc.
Page state - Any JavaScript variables or functions defined by the page itself

Your code does not have access to Node.js APIs or the file system.
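Access to page state means a tracked element can read a variable that the page itself defines. The sketch below walks a dotted path on a window-like object and returns the value as text. The helper name readPageState and the variable __APP_STATE__ used in the note after it are hypothetical; real pages expose different names, if any.

```javascript
// Hypothetical helper: follow a dotted path on a window-like object and
// return the value as text, or a fallback if any step is missing.
function readPageState(win, path, fallback = 'Not found') {
  let value = win;
  for (const key of path.split('.')) {
    if (value === null || value === undefined) return fallback;
    value = value[key];
  }
  if (value === null || value === undefined) return fallback;
  // Objects and arrays are serialized so the stored text is comparable.
  return typeof value === 'object' ? JSON.stringify(value) : String(value);
}
```

In a tracked element, this would be called as readPageState(window, '__APP_STATE__.product.price') from inside an immediately invoked function.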


Monitoring Multiple Elements on a Page
2026-03-05T10:31:12+00:00
PageCrawl lets you track multiple parts of the same page independently. Each tracked element gets its own comparison method, selector, label, and threshold, so you can monitor different sections of a page with the settings that make the most sense for each one.
Why Track Multiple Elements
Different parts of a page often change in different ways. For example, on a product page you might want to:

Track the price using the Price element type so you are alerted when it goes up or down
Track the stock status using the Availability element type so you know when an item is back in stock
Track the product description as text so you catch content updates

Each of these uses a dedicated element type designed for that kind of data, giving you more precise alerts and fewer false positives than tracking the entire page as a single unit.
Supported Element Types
Each tracked element can use one of these comparison types:



| Type | Description |
| --- | --- |
| Full Page | Tracks the entire visible page content |
| Text | Extracts and compares text content from a CSS/XPath selector |
| Number | Extracts a numeric value for threshold-based comparison |
| Price | Specialized number extraction that handles currency symbols and formatting |
| Availability | Detects in-stock/out-of-stock status from common patterns |
| Visual | Compares screenshots of a specific element for visual changes |
| HTML | Compares the raw HTML of a selected element |
| Boolean | Checks whether an element exists or is visible on the page |
| Links | Extracts and compares all links within a selected area |
| JavaScript | Evaluates a custom JavaScript expression and tracks the return value |
| Text (All Matches) | Extracts text from all elements matching a selector |
| Text (All Matches Sorted) | Same as Text (All Matches), but sorted alphabetically for order-independent comparison |
| HTML (All Matches) | Extracts HTML from all elements matching a selector |
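To illustrate why the sorted variant is order-independent, here is a simplified model of the two "All Matches" text types. It is a sketch for intuition only and is not PageCrawl's actual implementation; the function names are made up for this example.

```javascript
// Join extracted values in page order: a reorder changes the result.
function joinInOrder(values) {
  return values.map(v => v.trim()).join('\n');
}

// Sort before joining: a reorder alone leaves the result unchanged.
function joinSorted(values) {
  return values
    .map(v => v.trim())
    .sort((a, b) => a.localeCompare(b))
    .join('\n');
}
```

If a listing page shuffles the same items between checks, the in-order text differs while the sorted text stays identical, so only genuine additions or removals would register as a change.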



How to Add Multiple Elements

Open the page you want to monitor and click Edit
Switch to Advanced Mode using the toggle at the top of the editor
You will see your current tracked element listed
Click Add Element to add another tracked element
Configure each element with its own selector, type, label, and threshold
Save your changes

Simple vs Advanced Mode

Simple Mode tracks a single element on the page. This is the default for new monitors and is the easiest way to get started.
Advanced Mode unlocks the ability to track multiple elements. Switch to Advanced Mode using the toggle in the page editor.

Once you add more than one tracked element, the monitor stays in Advanced Mode. To return to Simple Mode, remove the extra elements first so only one remains.
Per-Element Settings
Each tracked element has its own independent settings:

Label - A descriptive name for the element (e.g., "Product Price", "Stock Status")
Selector - A CSS selector or XPath expression that identifies the element on the page
Type - The comparison method to use (text, number, visual, etc.)
Threshold - How much the value needs to change before triggering a notification
Include hidden text - Whether to include text from elements hidden via CSS

Click-to-Select
You do not need to write CSS selectors or XPath expressions manually. Use the visual selector tool to click on elements directly on the page. PageCrawl generates the appropriate selector for you automatically.
Use Cases
Product page monitoring - Use the Price element type for the product price, the Availability element type for stock status, and a Text element for the product description. Each triggers its own alert so you know exactly what changed.
Content sections and sidebar tracking - Monitor the main article content as text and the sidebar navigation as HTML. Catch content updates without being distracted by layout changes.
Multi-section compliance monitoring - Track terms of service, privacy policy sections, and legal disclaimers as separate elements on the same page. Each section triggers its own alert when updated.
Related Articles

Advanced Configuration
Available Tracked Types
Perform Actions



Perform Actions: Automate Browser Interactions Before Monitoring
2026-05-06T14:16:21+00:00
Actions are tasks that PageCrawl executes in the browser before taking a page snapshot. They let you automate interactions like dismissing cookie banners, clicking tabs, logging in, scrolling to load content, or waiting for dynamic elements to appear.
Actions are configured per tracked element and execute in order from top to bottom.
Where to Configure Actions
Open any monitored page and click Edit. In the page configuration form, find the Actions section. Click Add Action to add a new action, then select the action type from the dropdown.
Available Actions
Error Handling



| Action | What It Does |
| --- | --- |
| Mark as failed | Mark the check as failed when conditions are met (page inaccessible, contains specific text, etc.) |



Block and Hide



| Action | What It Does |
| --- | --- |
| Block cookie banners & ads | Automatically hide cookie consent banners and block ads |
| Hide website overlays & popups | Hide website overlays and popups |
| Remove dates | Replace dates with "[DATE REMOVED]" to prevent false positives |
| Remove element | Remove a specific element by CSS or XPath selector |
| Remove text | Remove elements containing specific text |



Wait



| Action | What It Does |
| --- | --- |
| Wait for text | Wait up to 15 seconds for specific text to appear on the page |
| Wait for text to disappear | Wait up to 15 seconds for specific text to disappear |
| Wait for element | Wait for an element (by XPath or CSS selector) to appear |
| Wait for redirect | Wait for the page to redirect to a new URL |
| Wait | Pause for a specified number of seconds |



Interact



| Action | What It Does |
| --- | --- |
| Click button | Click an element containing specific text |
| Click element | Click any element by CSS or XPath selector |
| Click at coordinates | Click at specific X/Y pixel coordinates |
| Hover | Hover over an element |
| Type text | Type text into an input field |
| Select option | Select an option from a dropdown |
| Submit form | Submit a form |
| Scroll to bottom | Scroll the page to the bottom (useful for lazy-loaded content) |
| Go back | Navigate back in browser history |
| Reveal hidden text | Make hidden text visible. Has two modes: "Expandable Sections Only" (expands collapsible sections and accordions) and "All invisible text" (reveals all hidden text on the page) |



Advanced



| Action | What It Does |
| --- | --- |
| Disable JavaScript | Disable JavaScript before the page loads |
| Set cookie | Set or manage browser cookies |
| Execute JavaScript | Run custom JavaScript code on the page |
| Store Contents for Tracked Element | Store a tracked element's value at this point in the action sequence, useful when the element is only visible after a specific interaction |
| Handle CAPTCHA | Interact with CAPTCHA challenges |



Common Use Cases
Dismiss cookie banners: Add a "Block cookie banners & ads" action to automatically hide consent popups and ads that can trigger false change notifications.
Load lazy content: Add "Scroll to bottom" followed by "Wait" (2-3 seconds) to load content that only appears when scrolling.
Navigate to a tab or section: Add a "Click element" action with the CSS selector of the tab you want to monitor.
Login to a page: Add "Type text" actions for username and password fields, followed by "Click button" to submit the login form.
Wait for dynamic content: Add "Wait for text" with the text that appears after the page finishes loading (e.g., "Showing results").


Review Board: Organize and Track Page Changes
2026-05-06T14:16:21+00:00
The Review Board is a Kanban-style board that helps you organize and track detected changes across your monitored pages. Instead of reviewing changes one by one, you can drag and drop change cards between customizable lanes to manage your review workflow.
Accessing the Review Board
Navigate to the Review tab in the main sidebar to open the board.
How It Works
Each time PageCrawl detects a change on one of your monitored pages, a card appears on the board. Cards show:

Page name and URL
Time since the change was detected
Visual difference percentage
AI priority score and importance tag (if AI is enabled)

Click any card to view the full change details, timeline, and AI summary.
Customizing Lanes
By default, the board includes three lanes: To Review, Reviewed, and Flagged. You can customize these to match your workflow:

Click the + button to add a new lane
Give the lane a name and pick a color
Drag lanes to reorder them
Click the gear icon in the lane header to edit or delete it

Common lane setups:

New / In Review / Done - Simple three-stage workflow
New / Important / Needs Action / Archived - Priority-based workflow
New / Design Team / Dev Team / Resolved - Team-based workflow

Filtering and Sorting
Use the toolbar at the top of the board to filter changes:

Folders - Show changes from a specific folder
Tags - Filter by label
Website - Filter by website domain
Date range - All time, Today, Yesterday, Last 7 days, Last 30 days, Last 90 days, This week, Last week, This month, Last month, This year, Last year, and Custom range
Priority - Filter by AI priority score
Sort - Order cards by most recent, oldest, or priority score

Feedback Auto-Review
When enabled, giving thumbs-up or thumbs-down feedback on a change notification automatically moves the card to your "Reviewed" lane. Enable this from the gear icon menu on the board.
You can configure which lane cards move to after positive or negative feedback.


Sitemap Monitoring: Automatically Detect New Pages on Any Website
2026-05-06T14:16:21+00:00
Most websites maintain an XML sitemap listing every page on the site. They do this for SEO: a sitemap tells Google, Bing, and other search engines exactly which URLs exist, when each one was last modified, and how often it changes. Without a sitemap, search engines have to discover pages by crawling links one by one, which is slow and often misses freshly published or deeply nested content. Because Google rewards indexable content, almost every CMS (WordPress, Shopify, Squarespace, Wix, etc.) generates and publishes a sitemap automatically.
For change monitoring, that same sitemap is a goldmine - it is the website's own up-to-date list of every page that matters, maintained by the site itself. PageCrawl can monitor these sitemaps to detect new pages, removed URLs, and structural changes automatically.
PageCrawl supports two distinct ways to monitor a sitemap, and you should pick the one that fits your goal:

Page Discovery (Scan a Website) - turns each new URL into its own tracked page with full change history, screenshots, content alerts, and AI summaries. Best for deep monitoring of individual pages.
Feed tracking mode - treats the sitemap URL as a single tracked element and emits item-level alerts when URLs are added or removed. Best for lightweight new-URL alerts when you do not need per-page content tracking.

Most teams pick one or the other for a given site depending on whether they need deep per-page tracking or just new-URL alerts.
Approach 1: Page Discovery (Scan a Website)
This is the heavy-duty approach. Each new URL discovered in the sitemap becomes its own tracked page in your workspace, with full change history, screenshots, content alerts, and AI summaries.
How it works

PageCrawl downloads the website's XML sitemap on your configured schedule
New URLs are compared against the previous scan
Newly discovered pages are matched against your filters
You receive a notification listing the new pages
Optionally, matched pages are auto-monitored for content changes

Setting it up

Click Track New Page and select Scan a Website
Enter the website URL (e.g., competitor.com)
PageCrawl automatically detects the sitemap
Set your check frequency and add filters
Enable notifications and optionally enable auto-monitoring

Filtering discovered pages
Large websites may add many pages between checks. Filters help you focus on what matters:

URL filters - Match by path patterns (e.g., /products/, /blog/2026/*)
Exclude filters - Skip irrelevant sections (e.g., /products/accessories/)
Title/content filters - Match against page title or body text after fetching

Exclude filters always take priority over include filters. You can combine multiple filter types.
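The rule that exclude filters win over include filters can be sketched as follows. The matching rules here are assumptions made for illustration: "*" matches any run of characters, and a pattern without "*" is treated as a path prefix. PageCrawl's own pattern syntax may differ.

```javascript
// Convert a simple path pattern into a RegExp anchored at the start of the path.
// "*" matches any run of characters; everything else is matched literally.
function patternToRegExp(pattern) {
  const escaped = pattern
    .split('*')
    .map(part => part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'))
    .join('.*');
  return new RegExp('^' + escaped);
}

// Decide whether a discovered URL passes the filters.
function matchesFilters(url, include, exclude) {
  const path = new URL(url).pathname;
  const hit = patterns => patterns.some(p => patternToRegExp(p).test(path));
  if (hit(exclude)) return false;              // exclude always takes priority
  return include.length === 0 || hit(include); // no include filters: accept everything else
}
```

With include filters /products/ and /blog/2026/* and the exclude filter /products/accessories/, a URL under /products/accessories/ is rejected even though it also matches the /products/ include filter.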
Auto-monitoring
When auto-monitoring is enabled, pages matching your filters are automatically added to your monitoring workspace. For example:

A competitor publishes a new product page on Monday
Sitemap monitoring discovers the URL the same day
From Tuesday onward, PageCrawl tracks that page for price and content changes

No manual setup required. Combined with templates, auto-monitored pages inherit your preferred check frequency, notification channels, and tracking settings.
Beyond sitemaps
Not all websites have complete sitemaps. PageCrawl supplements sitemap monitoring with additional discovery methods:

Base URL Link Discovery - Extracts all links from a specific page
Deep Scan - Follows links multiple levels deep with JavaScript rendering
Automatic Mode - Runs all discovery methods together and deduplicates results

See Page Discovery for full details on all discovery methods.
Plan limits
Sitemap monitoring via Page Discovery is available on all plans:



| Plan | Pages per Website |
| --- | --- |
| Free | Up to 2,000 |
| Standard | Up to 20,000 |
| Enterprise | Up to 100,000 |
| Ultimate | Up to 100,000 |



All plans include filters, notifications, and auto-monitoring.
Approach 2: Feed Tracking Mode
This is the lightweight approach. Instead of creating one tracked page per URL, the entire sitemap becomes a single tracked element. You get an alert when URLs are added or removed, but PageCrawl does not fetch or track the content of each page.
How it works

PageCrawl fetches the sitemap XML on your configured schedule
The XML is parsed into a list of items - one per <url> entry
Each item is identified by its <loc> URL (the stable key)
The new list is compared against the previous check using the keys
You receive a notification listing the URLs that were added or removed

There is only one Change record in your workspace - the sitemap monitor itself - regardless of how many URLs the sitemap contains.
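The key-based comparison described above can be sketched in a few lines. This is an illustration under simplifying assumptions, not PageCrawl's implementation: it extracts <loc> values with a regular expression, where a production system would use a real XML parser, and the function names are made up for the example.

```javascript
// Pull every <loc> value out of a sitemap XML string.
// A regex keeps the sketch short; it does not decode XML entities.
function extractLocs(xml) {
  return Array.from(xml.matchAll(/<loc>\s*([^<]+?)\s*<\/loc>/g), m => m[1]);
}

// Compare two checks by key and report which URLs were added or removed.
function diffUrls(previous, current) {
  const before = new Set(previous);
  const after = new Set(current);
  return {
    added: current.filter(u => !before.has(u)),
    removed: previous.filter(u => !after.has(u)),
  };
}
```

Calling diffUrls(extractLocs(previousXml), extractLocs(currentXml)) yields the two lists that a notification would report.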
Setting it up

Click Track New Page
Paste the sitemap URL directly (e.g., competitor.com/sitemap.xml)
PageCrawl auto-detects it as a sitemap and switches to Feed mode
Confirm the preview shows the URLs you expect
Adjust the Track first N items cap if needed
Choose your notification channels and save

The item limit
Feeds are capped at a per-plan number of items so a 50,000-URL sitemap does not produce 50,000-item JSON blobs on every check:



| Plan | Maximum Items Per Feed |
| --- | --- |
| Free | 10 |
| Standard | 100 |
| Enterprise | 1,000 |
| Ultimate | 10,000 |



Items are returned in document order. For RSS and Atom feeds this is fine because the newest items are conventionally at the top, but sitemaps do not guarantee that. If your sitemap has more URLs than your plan cap, the UI shows a notice and suggests either raising the cap or using Page Discovery instead, which has no per-feed cap (it uses your monitor quota).
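The effect of a cap applied in document order can be shown with a small model. It assumes the cap is applied to the first N items before the comparison, which is a simplification for illustration; the function name is hypothetical.

```javascript
// Keep only the first `cap` items of each check, then report URLs not seen before.
function newItemsWithinCap(previous, current, cap) {
  const seen = new Set(previous.slice(0, cap));
  return current.slice(0, cap).filter(url => !seen.has(url));
}
```

With a cap of 3, a sitemap that appends its newest URL at the end (a, b, c, d becoming a, b, c, d, e) reports nothing new, because the addition falls outside the tracked window. A feed that puts the newest item first (p3, p2, p1 becoming p4, p3, p2) reports p4 as expected.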
For sites with both a sitemap and an RSS or Atom feed, the RSS/Atom feed is usually a better choice for Feed mode because new content is guaranteed to appear at the top. Try /feed, /rss, or /atom.xml on the site.
When to choose Feed mode

You only need new-URL alerts, not per-page change tracking
The site has a small or medium sitemap that fits inside your plan's item cap
You do not want each URL consuming a monitor slot from your plan

For fully-fledged monitoring with per-page change history, screenshots, content alerts, AI summaries, and proper handling of large sitemaps, use Page Discovery (Scan a Website) instead. Feed mode is intentionally minimal - it is a fast way to get new-URL notifications without the overhead of tracking each page, but it cannot replace Page Discovery for serious change monitoring.
Sitemap vs RSS coverage (important)
If you are choosing between monitoring a site's sitemap and its RSS or Atom feed, the two are not equivalent:

A sitemap lists every indexable URL on the site. A WordPress blog with 500 posts will have all 500 in sitemap.xml. New posts appear there as soon as the CMS regenerates the sitemap.
An RSS or Atom feed is typically a rolling window of the most recent 10 to 20 posts. Older entries fall off the end as new ones arrive. The feed is designed for "what is new", not "what exists".

For tracking new content, both work - the RSS feed is usually more reliable because new posts are guaranteed to appear at the top, but you cannot use the RSS feed to discover the site's full back catalog. Use the sitemap when you need complete URL coverage and the RSS feed when you only care about new content.
Related Articles

Feed tracking mode - lightweight alternative that treats the sitemap as a single tracked feed instead of auto-creating per-page monitors
Page Discovery - other discovery methods (Base URL Link Discovery, Deep Scan, Automatic Mode)
Organized page monitoring - templates and folders for keeping auto-monitored pages tidy



Web Archiving with WACZ: Preserve Full Page Snapshots
2026-03-26T05:33:22+00:00
PageCrawl can automatically create a full web archive of your monitored pages every time a change is detected. Archives capture the complete page (HTML, CSS, images, scripts) so you can replay it exactly as it appeared at that moment.
Archives are saved in the WACZ (Web Archive Collection Zipped) format, an open standard for web archiving used by libraries, governments, and legal teams worldwide.
Available on Ultimate plan.
How It Works

PageCrawl detects a change on a monitored page
A full WACZ archive is created capturing the complete page state
The archive is stored securely in the cloud
You can replay the archived page at any time from the change history

If WACZ generation fails (e.g., due to complex page structure), PageCrawl falls back to creating a self-contained HTML archive instead.
How Archives Differ from Screenshots
PageCrawl offers both screenshots and web archives, but they serve different purposes:




| | Screenshot | Web Archive (WACZ) |
| --- | --- | --- |
| What it captures | A flat image of the visible page | The complete page: HTML, CSS, JavaScript, images, fonts |
| Interactivity | None (static image) | Fully interactive: scroll, click links, hover over elements |
| Content below the fold | Only if full-page screenshot is enabled | Always included, the entire page is preserved |
| Dynamic content | Shows one visual state | Preserves interactive elements, dropdowns, tabs |
| File size | Small (typically under 1 MB) | Larger (includes all page assets) |
| Best for | Quick visual reference, visual diff comparison | Legal evidence, compliance records, full preservation |



Screenshots are great for a quick visual snapshot and for visual change detection (highlighting pixel differences). Web archives go further by preserving the entire page so you can interact with it later exactly as it appeared.
How PageCrawl Archives Differ from Archive.org
The Internet Archive (archive.org) and PageCrawl both preserve web pages, but they work very differently:
Archive.org (Wayback Machine):

Public, community-driven project that crawls the open web
Snapshots are taken on their own schedule (often weeks or months apart)
No control over when or how often pages are archived
Pages behind logins, paywalls, or bot protection are usually not captured
Anyone can view the archived pages
No change detection or notifications

PageCrawl Web Archiving:

Private to your account, stored securely in the cloud
Archives are created automatically every time a change is detected
You control the check frequency (every 5 minutes to daily)
Works with pages behind logins using browser actions (click, type, wait)
Works with pages behind bot protection
Archives are paired with change detection, so you know exactly what changed and when
Download WACZ files for offline storage or legal use

In short, archive.org is best for general public web preservation. PageCrawl archiving is designed for active monitoring where you need precise, private, frequent snapshots tied to detected changes.
Viewing Archives
To view an archived page:

Open a monitored page and go to its change history
Click on any check that has an archive (indicated by an archive icon)
The archive viewer opens, showing the page exactly as it appeared
Use the previous/next arrows to browse between archived versions

The viewer uses ReplayWeb.page to render WACZ archives interactively in your browser. You can scroll, click links, and interact with the page as if you were browsing it live at that point in time.
Downloading Archives
You can download any archive file directly:

Open the archive viewer for the check you want
Click the download button to save the WACZ file
Open it with any WACZ-compatible viewer (ReplayWeb.page, Webrecorder, etc.)

Downloaded archives can be used for legal evidence, compliance records, or offline browsing.
Use Cases

Legal and compliance - Preserve evidence of website content at specific dates for disputes, contracts, or regulatory compliance
Competitive intelligence - Keep a historical record of competitor pages, pricing, and product offerings
Content auditing - Track how your own website evolves over time with complete snapshots
Journalism - Archive source pages to preserve evidence that may be modified or removed

Enabling Archives
Archives are enabled at the workspace level. Contact support or check your workspace settings to enable archiving for your monitored pages.


Workspaces: Organize Monitoring by Project or Team
2026-03-05T10:31:12+00:00
Workspaces let you organize your monitored pages into separate environments, each with its own settings, notifications, and team member access. Use workspaces to separate monitoring by project, client, department, or any other grouping that makes sense for your workflow.
What Each Workspace Gets
Every workspace has independent settings for:

Monitored pages - Each workspace contains its own set of tracked pages
Notification preferences - Separate email frequency, Slack/Discord/Teams/Telegram channels
AI configuration - Different AI provider, model, and focus areas per workspace
Check scheduling - Custom active hours and days for monitoring
Timezone - Each workspace can use a different timezone
Labels and tags - Workspace-specific labels for organizing pages
Templates - Page discovery templates tied to each workspace

Creating a Workspace

Go to Settings > Team > Workspaces
Click Add New Workspace
Enter a name for the workspace
Configure the workspace settings

Switching Between Workspaces
Use the workspace selector dropdown in the sidebar to switch between your workspaces. Each workspace shows its own set of pages, changes, and settings.
Managing Access
Administrators can control which team members have access to each workspace:

Go to Settings > Team > Workspaces
Find the workspace in the list
Click Update in the Access column
Add or remove team members

Members only see workspaces they've been assigned to. This lets you give client-facing teams access to client workspaces without exposing internal monitoring.
See User Roles & Permissions for details on what each role can do.
Common Setups
By client: One workspace per client, each with its own notification channels and team access.
By department: Marketing monitors competitor pages, Legal monitors compliance pages, Product monitors feature pages, each in their own workspace.
By priority: A "Critical" workspace with immediate notifications and frequent checks, and a "Background" workspace with weekly reports and less frequent checks.
By region: Separate workspaces for different geographic regions, each with region-specific proxy settings and timezones.


Save Screenshots to Dropbox
2026-05-06T14:16:21+00:00
PageCrawl can automatically save page screenshots to your Dropbox whenever a change is detected. This gives you a visual archive of every change, stored in your own cloud storage for easy access and sharing.
How It Works
When a change is detected on a monitored page and screenshots are enabled, PageCrawl uploads the screenshot to your chosen Dropbox folder. Files are organized by page name and timestamp:
{your-folder}/{page-name}/{datetime}.jpg
This makes it easy to browse through the history of visual changes for any monitored page.
Setting Up Dropbox Sync

Go to Settings > Integrations
Click Setup on the Dropbox integration
In the modal that opens, click Authenticate with Dropbox
Authorize PageCrawl in the Dropbox OAuth window that opens
Select a folder in your Dropbox where screenshots should be stored

Once connected, screenshots will be uploaded automatically whenever a change is detected on any of your monitored pages that have screenshots enabled.
Managing the Connection
After connecting your Dropbox account, you can:

View account info - See which Dropbox account is connected
Change folder - Select a different folder for screenshot storage
Revoke access - Disconnect your Dropbox account to stop automatic uploads

Troubleshooting
If your Dropbox access token expires, the connection is automatically disabled and you will receive a notification. Simply reconnect your Dropbox account at Settings > Integrations to restore screenshot syncing.
Availability
Dropbox screenshot sync is available on all plans.


AI Assistants (MCP Server)
2026-04-14T06:20:28+00:00
AI Assistants (MCP Server)
PageCrawl includes a built-in MCP (Model Context Protocol) server that lets AI assistants manage your page monitors. You can add monitors, check history, trigger checks, and more, all through natural conversation with tools like Claude or ChatGPT.
MCP is an open protocol that standardizes how AI tools connect to external services. Once connected, your AI assistant can directly interact with your PageCrawl account without you needing to use the web interface or API manually.
Available on all plans. Free plan users have read-only access (list monitors, view history, check diffs). Paid plans (Standard and above) have full access including creating monitors, triggering checks, and managing tags.
What You Can Do
The MCP server provides the following tools that your AI assistant can use:



Tool | What It Does
Add page monitor | Create a new monitor with URL, tracking mode, frequency, and notifications
List monitors | Search and view monitors across all workspaces by URL, domain, or name
Get monitor details | See full configuration of a specific monitor including tracked elements and latest values. Supports batch requests
Get monitor history | Retrieve historical checks and detected changes with AI summaries. Supports batch requests
Get latest values | Quickly retrieve just the current values for one or more monitors (e.g., current price). Supports batch requests
Get check diff | View the actual text differences detected in a specific check
Trigger check | Trigger a one-off check on a monitor
Manage tags | List workspace tags, or add and remove tags from monitors
Mark changes seen | Mark detected changes as reviewed on one or all monitors
List templates | View available templates that can be applied when creating monitors
List workspaces | View all your teams and workspaces with their IDs
Update monitor defaults | View or update default settings for new monitors created via MCP



Supported Element Types
When creating monitors through MCP, you can track the following element types:

Full Page - Entire page text content (no selector needed)
Text - Text content of a specific element (CSS selector required)
Number - Numeric values with change thresholds
Price - Price values with currency detection
HTML - Raw HTML structure of an element
JavaScript - Execute JavaScript and track the result
File Hash - Monitor file changes by checksum (no selector needed)
PDF - Track changes in PDF documents (no selector needed)

Setting Up with Claude (Web & Desktop)

Open claude.ai or Claude Desktop and go to Settings
Navigate to the Connectors section in the left sidebar
Click Add custom connector at the bottom of the page
Enter a name (e.g. "PageCrawl") and set the URL to: https://pagecrawl.io/mcp
Click Add. You will be redirected to PageCrawl to authorize access
Log in (if not already) and click Approve
PageCrawl tools are now available in your conversations



Setting Up with Claude Code
Add the following to your .mcp.json file (in your project root or ~/.claude/):
{
  "mcpServers": {
    "pagecrawl": {
      "url": "https://pagecrawl.io/mcp"
    }
  }
}
When Claude Code first tries to use PageCrawl tools, it will open a browser window for you to authorize the connection via OAuth.
Setting Up with ChatGPT
Works with ChatGPT on web, desktop, and mobile. Requires a ChatGPT Plus, Pro, Team, Enterprise, or Edu plan.

Go to chatgpt.com (or open the ChatGPT desktop app)
Navigate to Settings > Connectors > Create
Enter a name (e.g. "PageCrawl"), a short description, and set the URL to: https://pagecrawl.io/mcp
Click Create. You will be redirected to PageCrawl to authorize access
Log in and click Approve
To use in a conversation, click the + button near the message input, select More, and enable PageCrawl

Setting Up with Other MCP Clients (OAuth)
Any MCP-compatible client that supports OAuth can connect to PageCrawl. The server details:

URL: https://pagecrawl.io/mcp
Authentication: OAuth 2.0 (automatic via MCP protocol)
Protocol: MCP over HTTP with JSON-RPC 2.0
OAuth Discovery: https://pagecrawl.io/.well-known/oauth-authorization-server

The client will handle the OAuth flow automatically. No manual token setup is required.
Setting Up with API Token (OpenClaw, Cursor, Cline, Windsurf, and others)
For MCP clients that do not support OAuth, you can connect using a personal API token instead. This works with OpenClaw, Cursor, Cline, Windsurf, VS Code, Claude Code, and any other MCP client that supports custom headers.
Step 1: Generate an API token in PageCrawl:

Go to Settings > API
Click Create Token
Give it a name (e.g. "OpenClaw") and click Create
Copy the token. It will only be shown once.

Step 2: Add the following configuration to your MCP client. The JSON format below works with Cursor (.cursor/mcp.json), Cline, Windsurf (.vscode/mcp.json), Claude Code (.mcp.json), and most other clients:
{
  "mcpServers": {
    "pagecrawl": {
      "url": "https://pagecrawl.io/mcp",
      "headers": {
        "Authorization": "Bearer YOUR_TOKEN_HERE"
      }
    }
  }
}
For OpenClaw, use the CLI:
openclaw mcp set pagecrawl \
  --transport streamable-http \
  --url https://pagecrawl.io/mcp \
  --header "Authorization: Bearer YOUR_TOKEN_HERE"
For Cursor, you can also add via Settings > MCP Servers > Add > Streamable HTTP and enter the URL and authorization header there.
Note: API tokens require a paid plan. Treat your token like a password. You can revoke tokens at any time from Settings > API.
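For readers curious about what a token-based client sends, here is a minimal sketch that builds (but does not send) a JSON-RPC 2.0 request carrying the Authorization header from the configuration above. The helper name build_mcp_request is hypothetical, and tools/list is the generic MCP method for listing tools rather than anything PageCrawl-specific. A real client would first complete the MCP initialize handshake, which your MCP client normally handles for you.

```python
import json
import urllib.request

MCP_URL = "https://pagecrawl.io/mcp"


def build_mcp_request(token: str, method: str, params: dict, request_id: int = 1) -> urllib.request.Request:
    """Build a JSON-RPC 2.0 POST request for the MCP endpoint without sending it."""
    body = {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}
    return urllib.request.Request(
        MCP_URL,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            # Streamable HTTP servers may reply with plain JSON or an event stream.
            "Accept": "application/json, text/event-stream",
        },
    )


request = build_mcp_request("YOUR_TOKEN_HERE", "tools/list", {})
print(request.get_method(), request.full_url)
print(json.loads(request.data))
```

Keeping the token in an environment variable rather than in source code is a sensible default, since the token should be treated like a password.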
Example Conversations
Once connected, you can interact with PageCrawl naturally:
Adding monitors:

"Monitor example.com/pricing every hour and track the full page text"


"Set up price tracking for these 3 product pages: [url1], [url2], [url3]. Check every 15 minutes and notify me on Slack when prices drop."

Checking current values:

"What's the current price on my Amazon product monitor?"


"Compare the prices across all my competitor monitors right now"

Reviewing changes:

"Show me all monitors that changed in the last 24 hours with a summary of what changed"


"Show me the diff for the terms of service page. What exactly was added or removed?"

Analysis and reporting:

"Which of my monitors have had the most changes this month? Are there any patterns?"


"Give me a weekly summary: how many changes were detected across all monitors, which ones had price drops, and which ones had errors?"

Batch operations:

"Tag all monitors tracking amazon.com with 'competitor' and 'ecommerce'"


"Check the latest values for all monitors tagged 'pricing' and tell me which products are currently out of stock"

Troubleshooting:

"Are any of my monitors failing? Show me the ones with errors and what the issue is"


"The pricing page monitor hasn't detected changes in weeks. Trigger a fresh check and show me what it finds"

Setting up workflows:

"Create a monitor for each of these 5 competitor pricing pages. Use the 'competitor-tracking' template and tag them all as 'q2-research'"


"Monitor the SEC EDGAR page for new filings from Tesla. Use content-only mode so it ignores the navigation, check every 30 minutes"

Working with Workspaces
All tools automatically search across every workspace you have access to. You do not need to know which workspace a monitor is in to find or interact with it.

Use List monitors with the search parameter to find monitors by URL, domain, or name
Use List monitors with workspace_id to filter results to a specific workspace
Use List workspaces to see all your teams and workspaces with their IDs
Add page monitor only requires a workspace_id if you have more than one workspace

Limits and Quotas
MCP operations respect your plan's limits:

Monitor creation counts toward your page monitor quota
Triggered checks are rate limited and placed in a deprioritized queue, so they may take a while to complete. Manual triggering is designed for occasional use only (one or two checks at a time) and does not support programmatic or automated triggering; requests that exceed the rate limit are rejected with an error. For recurring checks, configure the check frequency on each monitor and use the scheduling settings to run checks at specific times.
If you exceed your monitor limit, new monitors are created in a disabled state
If you exceed your check limit, manual check triggers will be rejected

See Check Limits and Website Limits for details on plan quotas.


Webhook Integration: Send Change Data to Any External Service
2026-05-06T14:16:21+00:00
Webhook Integration: Send Change Data to Any External Service
Webhooks allow PageCrawl to send HTTP POST requests to any external URL whenever a page change is detected or an error occurs. Use webhooks to connect PageCrawl with custom applications, automation platforms, databases, or any service that accepts HTTP requests.
Setting Up a Webhook

Go to Settings > Webhooks (found under "Other" in the sidebar)
Click Add Webhook
Enter your target URL and configure the options below
Click Save

Configuration Options
Target URL: The HTTP endpoint that will receive the POST request.
Event Triggers: Choose which events fire the webhook:

Change detected - Fires when page content changes
Error - Fires when a check fails (timeout, blocked, 404, etc.)
Or both

Page Filter: Optionally limit which pages trigger the webhook. You can filter by:

All pages in workspace (default)
By tag
By folder
By website/domain
Specific monitors

If no filter is set, the webhook fires for all pages in the workspace.
Active/Inactive Toggle: Disable a webhook without deleting it.
Payload Fields
By default, webhooks send all available fields. You can customize the payload by selecting only the fields you need:



Category | Fields
Basic | id, title, status, event_type, changed_at, visual_diff, difference, human_difference, short_summary
Tracked Elements | content_type, elements (array of tracked element data)
Differences | markdown_difference, html_difference
Images | text_difference_image, page_screenshot_image
Page Info | page metadata, page_elements array
Content | contents, original (for extracted values)
Comparison | previous_check data
JSON | json, json_patch
AI | ai_summary, ai_priority_score



Testing Webhooks
After saving a webhook, click the Test button to send a sample payload to your endpoint. This verifies the connection works before relying on it for real notifications.
Example Payload
{
  "id": 12345,
  "title": "Product Page - Example.com",
  "status": "ok",
  "event_type": "change_detected",
  "content_type": "fullpage",
  "changed_at": "2026-01-15T10:30:00Z",
  "visual_diff": 12.5,
  "difference": 3,
  "human_difference": "3 lines changed",
  "short_summary": "Price updated from $99 to $89",
  "ai_summary": "The product price was reduced by 10%.",
  "ai_priority_score": 85
}
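A receiving endpoint for a payload like the one above could look roughly like the following sketch, which uses only the Python standard library. The names summarize_event, WebhookHandler, and serve are hypothetical, the priority cut-off of 50 is an arbitrary choice, and only the change_detected value of event_type is taken from the example; other event values are passed through as-is.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_event(payload: dict, min_priority: int = 50) -> str:
    """Turn a webhook payload into a one-line log message."""
    title = payload.get("title", "(untitled)")
    if payload.get("event_type") != "change_detected":
        return f"[{payload.get('event_type', 'unknown')}] {title}"
    score = payload.get("ai_priority_score") or 0
    level = "HIGH" if score >= min_priority else "low"
    return f"[change:{level}] {title}: {payload.get('short_summary', '')}"


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            payload = None
        if not isinstance(payload, dict):
            self.send_response(400)  # reject bodies that are not a JSON object
            self.end_headers()
            return
        print(summarize_event(payload))
        self.send_response(204)  # acknowledge without a response body
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Run the receiver locally; call this from your own entry point."""
    HTTPServer(("127.0.0.1", port), WebhookHandler).serve_forever()
```

Calling serve() starts the listener on port 8000. To receive real webhooks, the endpoint has to be reachable from the internet, for example behind a reverse proxy.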
Use Cases

Custom dashboards - Feed change data into your own monitoring dashboard
Database logging - Store all detected changes in your own database
Automation workflows - Trigger actions in tools like n8n, Make, or custom scripts
Alerting systems - Forward high-priority changes to PagerDuty, Opsgenie, or similar tools
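As an illustration of the database logging use case, the sketch below stores incoming payloads in SQLite. The table layout and the function names open_store and store_event are assumptions; the columns simply mirror a few of the Basic payload fields and keep the full JSON alongside them.

```python
import json
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a small SQLite store for webhook payloads."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS changes (
               payload_id INTEGER,
               event_type TEXT,
               changed_at TEXT,
               raw_payload TEXT
           )"""
    )
    return conn


def store_event(conn: sqlite3.Connection, payload: dict) -> None:
    """Insert one payload, keeping the raw JSON for fields not broken out."""
    conn.execute(
        "INSERT INTO changes VALUES (?, ?, ?, ?)",
        (
            payload.get("id"),
            payload.get("event_type"),
            payload.get("changed_at"),
            json.dumps(payload),
        ),
    )
    conn.commit()


conn = open_store()
store_event(conn, {"id": 12345, "event_type": "change_detected", "changed_at": "2026-01-15T10:30:00Z"})
print(conn.execute("SELECT payload_id, event_type FROM changes").fetchall())
```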

Notes

Webhooks send data as HTTP POST with a JSON body
If you need Slack, Discord, or Teams notifications, use the dedicated integrations instead, as they format messages correctly for those platforms
Webhooks are available on all plans

Related Articles

Full API Reference - Interactive OpenAPI reference for the complete webhook payload schema and all related API endpoints
API and Webhooks for Custom Integrations - Authentication and endpoint overview



Email Notifications for Website Change Detection
2026-05-06T14:16:21+00:00
Email Notifications for Website Change Detection
Email is the default notification channel in PageCrawl. It is enabled on all plans and requires no additional setup. As soon as you add a page to monitor, you will receive email notifications whenever changes are detected.
What's Included in Email Notifications
Every change notification email includes:

AI summary - A plain-language explanation of what changed on the page
Priority score - An importance score from 0 to 100 so you can quickly assess relevance
Text diff with highlighting - Changed content is highlighted so you can see exactly what was added, removed, or modified
Keyword matches - If you have keyword rules configured, matching keywords are highlighted in the notification

Email Attachments
Email notifications can include several attachments to give you a complete picture of the change:

Screenshot - A full-page screenshot of the page at the time of the change (enabled by default)
Visual diff screenshot - A side-by-side or overlay comparison showing visual differences
Text diff image - A rendered image of the text diff for easy sharing
Text file - A plain text file containing the diff content

You can configure which attachments are included at Settings > Workspace > Notifications.
Additional Recipients
On paid plans, you can add additional recipients to your change notifications:

CC - Add email addresses to receive a copy of every notification
BCC - Add email addresses to receive a blind copy

This is useful for keeping team members, clients, or stakeholders informed without requiring them to have a PageCrawl account.
Notification Frequency
Email notification frequency is configured at Settings > Workspace > Alerts & Reports. You have two options:

Off - Email notifications are disabled
Send instant notification for every change - You receive an email as soon as a change is detected

Scheduled Summary Reports
If you prefer to receive changes in batches (daily, weekly, or monthly digests), use the Scheduled Summary Reports feature. You can find it under the Digests tab in the same Settings > Workspace > Alerts & Reports area. Scheduled summary reports let you bundle changes across multiple monitors into a single consolidated email delivered on the schedule you choose.
Diff Display Options
You can customize how text differences are displayed in your email notifications:

Highlight mode - Choose between highlighting by lines, by words, or both
Content filter - Show everything, changed content only, added content only, or removed content only

These options let you focus on the type of changes that matter most to you.
Domain-Based Grouping
When you are monitoring 5 or more pages on the same domain, PageCrawl can group notifications by domain. To enable this, turn on the Group emails by domain setting in your workspace notification preferences. Once enabled, changes from the same domain are bundled into a single email, keeping your inbox organized and making it easier to review related changes together.
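The grouping rule can be pictured with the sketch below, which decides how changed pages would be batched into emails. It is only a model of the behavior described above: the function name plan_emails is hypothetical, and treating the full hostname as the domain is an assumption.

```python
from collections import Counter
from urllib.parse import urlsplit


def plan_emails(changed_urls: list, monitored_urls: list, min_pages: int = 5) -> list:
    """Return a list of email batches, each batch being a list of changed URLs."""
    monitored = Counter(urlsplit(url).hostname for url in monitored_urls)
    grouped = {}
    single = []
    for url in changed_urls:
        host = urlsplit(url).hostname
        if monitored[host] >= min_pages:
            grouped.setdefault(host, []).append(url)  # bundle with the same domain
        else:
            single.append([url])  # too few pages on this domain to group
    return list(grouped.values()) + single


monitored = [f"https://shop.example.com/p{i}" for i in range(5)] + ["https://other.example.org/a"]
changed = ["https://shop.example.com/p1", "https://other.example.org/a", "https://shop.example.com/p3"]
print(plan_emails(changed, monitored))
```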
AI Feedback
Each email notification includes feedback links that let you mark a change as Important or Noise. PageCrawl's AI learns from your feedback and uses it to improve future importance scoring, so over time you receive fewer irrelevant notifications.
Other Supported Notification Channels
PageCrawl supports several other notification channels to suit your preferences:

Slack notifications
Discord notifications
Microsoft Teams notifications
Telegram notifications
Webhook integration
Zapier integration



Notification Conditions and Filters
2026-05-06T14:16:21+00:00
Notification Conditions and Filters
Conditions and Filters let you control which changes trigger notifications on a per-page basis. Instead of receiving a notification for every detected change, you can define rules so that only meaningful changes are reported.
When adding a page, the simple setup mode offers common conditions directly for price, number, and selected area tracking (such as price thresholds, percentage change alerts, and keyword monitoring). For the full set of conditions described below, click "More options" to switch to Advanced Mode. When editing an existing page, toggle Advanced Mode on. In both cases, you will find the Conditions & Filters section.
How to Enable Conditions
In the page editor, switch to Advanced Mode (click "More options" when adding a new page, or toggle "Advanced Mode" when editing an existing page). Look for the Conditions & Filters section with the description: "Looking for specific changes or alerts for certain keywords? Customize conditions to minimize unnecessary change alerts."
Toggle the switch on to enable conditions. Once enabled, you can add one or more conditions by clicking the Add Condition button.
AND / OR Logic
When you have multiple conditions, you can choose how they are evaluated using the Match all conditions toggle:

On (AND) - All conditions must be met for the notification to trigger
Off (OR) - Any single condition being met will trigger the notification

This lets you build precise rules. For example, with AND logic you could require that a specific keyword appeared AND a price dropped below a threshold.
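The toggle behaves like all() versus any() over the individual condition results. The following sketch models that under stated assumptions: the function names, the shape of the change dictionary, and the behavior with an empty condition list are all illustrative and not PageCrawl's actual implementation.

```python
from typing import Callable, Iterable

Condition = Callable[[dict], bool]


def should_notify(change: dict, conditions: Iterable[Condition], match_all: bool) -> bool:
    """Combine condition results the way the "Match all conditions" toggle is described."""
    results = [condition(change) for condition in conditions]
    if not results:
        return True  # assumption: with no conditions, nothing is filtered out
    return all(results) if match_all else any(results)


def keyword_appeared(change: dict) -> bool:
    return "Sale" in change["added_text"]


def price_below_100(change: dict) -> bool:
    return change["price"] < 100


change = {"added_text": "Summer Sale", "price": 120}
rules = [keyword_appeared, price_below_100]
print(should_notify(change, rules, match_all=True))   # False: the price condition fails
print(should_notify(change, rules, match_all=False))  # True: the keyword condition passes
```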
Always Record Change Detections
By default, when conditions are not met, the change detection is not recorded and no notification is sent. This means the next check compares against the last version that did meet conditions.
Enable Always record change detections to record every change regardless of whether conditions are met, but only send notifications when conditions match. This is particularly useful with one-directional conditions like "Keyword appeared" or "Keyword disappeared", where skipping unmatched detections could cause the condition to never trigger again.
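To see why this matters for one-directional conditions, consider the small simulation below. It is a simplified model built only from the description above (the baseline advances only when a change is recorded); the function name simulate and the sample page texts are made up for illustration.

```python
def simulate(versions: list, keyword: str, always_record: bool = False) -> list:
    """Return the indexes of checks that would notify for a "Keyword appeared" rule."""
    baseline = versions[0]
    notified = []
    for index, current in enumerate(versions[1:], start=1):
        if current == baseline:
            continue  # no difference relative to the recorded baseline
        appeared = keyword in current and keyword not in baseline
        if appeared:
            notified.append(index)
        if appeared or always_record:
            baseline = current  # the change is recorded, so the baseline advances
    return notified


versions = ["Widget", "Widget Sale", "Widget", "Widget Sale"]
print(simulate(versions, "Sale"))                      # [1]
print(simulate(versions, "Sale", always_record=True))  # [1, 3]
```

In the default run, the removal of "Sale" at check 2 is never recorded, so the baseline still contains the keyword and its return at check 3 looks like no change at all.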
Most Common Condition
Keyword Appeared or Disappeared
The most commonly used condition. It triggers a notification only when a specific keyword is added to or removed from the page.
Enter one or more keywords (each keyword is a separate tag). The condition is met when any of the specified keywords appears in newly added text or in text that was removed.
Match mode options control how keywords are compared against the page text:



Match Mode | Case Sensitive | Whole Word | Example: keyword "assist"
Match any text (default) | No | No | Matches "assist", "Assist", "assistance", "ASSISTANT"
Match any text (case sensitive) | Yes | No | Matches "assist", "assistance" but not "Assist"
Match exact words only | No | Yes | Matches "assist", "ASSIST" but not "assistance"
Match exact words (case sensitive) | Yes | Yes | Matches only "assist" exactly
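The four match modes map naturally onto two regular-expression switches: a case-insensitivity flag and word-boundary anchors. The sketch below reproduces the examples from the table; the function name and the use of \b word boundaries are assumptions about how such matching could be implemented, not a description of PageCrawl's internals.

```python
import re


def keyword_matches(text: str, keyword: str, case_sensitive: bool = False, whole_word: bool = False) -> bool:
    """Check a keyword against text using the two match-mode switches."""
    pattern = re.escape(keyword)
    if whole_word:
        pattern = rf"\b{pattern}\b"  # only match the keyword as a standalone word
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.search(pattern, text, flags) is not None


print(keyword_matches("ASSISTANT", "assist"))                        # True
print(keyword_matches("Assist", "assist", case_sensitive=True))      # False
print(keyword_matches("assistance", "assist", whole_word=True))      # False
print(keyword_matches("ASSIST", "assist", whole_word=True))          # True
```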



Filters
Filters remove noise by excluding certain types of changes from triggering notifications.
Ignore Text
Exclude specific words, sentences, or patterns from change detection. Place each entry on a separate line. This is useful for text that changes frequently but is not relevant, like timestamps, cookie banners, or dynamic counters.
Supported patterns:

Exact text - Enter the exact text to ignore (e.g., This website uses cookies)
Wildcard (%) - Use % to match any text within a line. For example, %Published at% will ignore any line containing "Published at", such as "Published at: 2024-12-24 by John"
Regular expressions - Wrap patterns in forward slashes for regex matching (e.g., /custom-regex-pattern-\d+/). Requires a paid plan.

Note: If the ignored text line is replaced with a new line that is not in the filter, the change detection will still trigger.
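The three pattern types can be approximated by translating each rule into a regular expression, as in the sketch below. It assumes that an exact-text rule must equal the whole line, that % matches any run of characters within a line, and that /.../ rules match anywhere in the line; the function names are hypothetical.

```python
import re


def compile_ignore_rule(rule: str) -> re.Pattern:
    """Translate one Ignore Text entry into a compiled regular expression."""
    rule = rule.strip()
    if len(rule) >= 2 and rule.startswith("/") and rule.endswith("/"):
        return re.compile(rule[1:-1])  # regex rule: matches anywhere in the line
    # Exact text with optional % wildcards: escape literals, join with ".*".
    parts = [re.escape(part) for part in rule.split("%")]
    return re.compile("^" + ".*".join(parts) + "$")


def filter_lines(lines: list, rules: list) -> list:
    """Drop every line that matches at least one ignore rule."""
    compiled = [compile_ignore_rule(rule) for rule in rules if rule.strip()]
    return [line for line in lines if not any(p.search(line.strip()) for p in compiled)]


lines = [
    "Published at: 2024-12-24 by John",
    "This website uses cookies",
    "Price: $89",
    "build custom-regex-pattern-42",
]
rules = ["%Published at%", "This website uses cookies", r"/custom-regex-pattern-\d+/"]
print(filter_lines(lines, rules))
```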
Ignore Numbers
Prevents any numeric changes on the page from triggering change detections. Useful when pages contain counters, view counts, or other dynamic numbers that are not relevant to you.
Text Conditions
These conditions let you control notifications based on specific text content. They are available for text-based tracked elements (not visual elements).
Keyword Appeared
Triggers when a keyword is added to the page. Unlike "Keyword appeared or disappeared", this will not notify you when a keyword is removed.
Important: If "Always record change detections" is not enabled, using this condition alone can cause missed detections. When the keyword is not found, no change is recorded, so the comparison baseline never updates. We recommend using "Keyword appeared or disappeared" instead, or enabling "Always record change detections".
Keyword Disappeared
Triggers when a keyword is removed from the page. The condition compares the current check with the previous one and fires if the keyword was present before but is now gone.
The same warning about "Always record change detections" applies here.
Exact Match
Available for individual tracked elements (not full page monitors). The condition is met when the element's text matches the specified value exactly.
Doesn't Match
Available for individual tracked elements (not full page monitors). The condition is met when the element's text does not match the specified value exactly.
Text Exists
The condition is met when the tracked element's text contains any of the specified keywords. Best used in combination with other conditions, for example: "the page must always contain the text 'Welcome' AND a keyword appeared." If you only need to know when text is added or removed, use "Keyword appeared or disappeared" instead.
Text Doesn't Exist
The condition is met when the tracked element's text does not contain any of the specified keywords. Useful for combined conditions like "the page does not contain 'Website failed to load' AND a change was detected." If you only need to know when text is added or removed, use "Keyword appeared or disappeared" instead.
Number and Price Conditions
These conditions are only available for "Number" and "Price detect" tracked elements. They allow you to set thresholds and track numeric changes with precision.
Comparison Conditions



Condition | Description | Example
Greater than | Triggers when the number exceeds the specified value | Value is 150, triggers when number > 150
Greater than or equals | Triggers when the number is at or above the specified value | Value is 150, triggers when number >= 150
Less than | Triggers when the number drops below the specified value | Value is 50, triggers when number < 50
Less than or equals | Triggers when the number is at or below the specified value | Value is 50, triggers when number <= 50



Change-Based Conditions
These conditions compare the current value against the previous value to detect significant changes.



Condition | Description | Example
Increased or Decreased by at least x percent | Triggers when the number changes in either direction by at least x%. | Value is 10, x is 20%. Triggers when value becomes 12+ or 8 or less.
Increased or Decreased by at least x | Triggers when the number changes in either direction by at least x (absolute). | Value is 10, x is 5. Triggers when value becomes 15+ or 5 or less.
Increased by at least x percent | Triggers only when the number goes up by at least x%. | Value is 10, x is 20%. Triggers when value becomes 12 or more.
Increased by at least x | Triggers only when the number goes up by at least x (absolute). | Value is 10, x is 5. Triggers when value becomes 15 or more.
Decreased by at least x percent | Triggers only when the number goes down by at least x%. | Value is 10, x is 20%. Triggers when value becomes 8 or less.
Decreased by at least x | Triggers only when the number goes down by at least x (absolute). | Value is 10, x is 5. Triggers when value becomes 5 or less.
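The six change-based conditions differ only in direction and in whether x is absolute or a percentage of the previous value. The sketch below expresses them as one function and checks it against the examples in the table. The function name, the direction values, and the handling of a previous value of 0 are assumptions for illustration.

```python
def changed_by_at_least(previous: float, current: float, x: float,
                        percent: bool = False, direction: str = "either") -> bool:
    """Evaluate a change-based condition; direction is "either", "up", or "down"."""
    delta = current - previous
    if direction == "up" and delta <= 0:
        return False
    if direction == "down" and delta >= 0:
        return False
    if percent:
        # Equivalent to |delta| / |previous| * 100 >= x, written without division.
        # Assumption: a percentage change from 0 never triggers.
        return previous != 0 and abs(delta) * 100 >= x * abs(previous)
    return abs(delta) >= x


print(changed_by_at_least(10, 12, 20, percent=True))                  # True
print(changed_by_at_least(10, 11, 20, percent=True))                  # False
print(changed_by_at_least(10, 5, 5, direction="down"))                # True
print(changed_by_at_least(10, 8, 20, percent=True, direction="up"))   # False
```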



Feed Conditions
When a monitor uses the Feed tracking mode (RSS, Atom, or other feed formats), additional feed-specific conditions become available:



Condition | Description
Feed item added | Triggers when a new item appears in the feed
Feed item removed | Triggers when an item is removed from the feed
Feed item changed | Triggers when an existing feed item's content is modified
Feed order changed | Triggers when the order of items in the feed changes
Feed price changed | Triggers when a price value within a feed item changes



These conditions are only shown when the monitor is configured with a Feed tracking mode.
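Conceptually, each feed condition corresponds to a different way two snapshots of the same feed can differ. The sketch below classifies those differences. The item shape (dictionaries with id, content, and price keys), the event names, and the function name are all assumptions for illustration, not PageCrawl's data model.

```python
def feed_events(previous: list, current: list) -> set:
    """Compare two feed snapshots and return the kinds of change found."""
    old = {item["id"]: item for item in previous}
    new = {item["id"]: item for item in current}
    events = set()
    if new.keys() - old.keys():
        events.add("item_added")
    if old.keys() - new.keys():
        events.add("item_removed")
    shared = old.keys() & new.keys()
    for item_id in shared:
        if old[item_id].get("content") != new[item_id].get("content"):
            events.add("item_changed")
        if old[item_id].get("price") != new[item_id].get("price"):
            events.add("price_changed")
    # Compare the relative order of the items present in both snapshots.
    old_order = [item["id"] for item in previous if item["id"] in shared]
    new_order = [item["id"] for item in current if item["id"] in shared]
    if old_order != new_order:
        events.add("order_changed")
    return events


before = [{"id": "a", "content": "x", "price": 10}, {"id": "b", "content": "y", "price": 20}]
after = [{"id": "b", "content": "y", "price": 18}, {"id": "a", "content": "x", "price": 10},
         {"id": "c", "content": "z", "price": 5}]
print(sorted(feed_events(before, after)))
```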
Product Comparison Conditions
When a monitor belongs to a product comparison group, additional comparison conditions become available:



Condition | Description
Cheapest in group | Triggers when this monitor's price becomes the lowest in the comparison group
Most expensive in group | Triggers when this monitor's price becomes the highest in the comparison group
Price spread | Triggers based on the price difference between the cheapest and most expensive items in the group



These conditions are only shown when the monitor is part of a product comparison group.
Practical Examples
Price drop alert: Monitor a product price with a "Number" tracked element. Add a "Less than" condition with your target price. You will only be notified when the price falls below your threshold.
Stock availability: Monitor an "In Stock" label with a "Keyword appeared or disappeared" condition. Set the keyword to "Out of Stock" to get notified the moment availability changes.
Ignore cookie banners: Add an "Ignore text" filter with entries like This website uses cookies and Accept all cookies to prevent cookie consent changes from triggering notifications.
Significant price changes only: Use "Increased or Decreased by at least x percent" with a value of 10 to only be notified when a price changes by 10% or more, filtering out minor fluctuations.
Combined conditions: Monitor a product page with AND logic: "Keyword appeared" for "Sale" combined with "Less than" 100 on the price element. You will only be notified when the product goes on sale AND the price drops below 100.


Web Push Notifications for Instant Website Change Alerts
2026-05-06T14:16:21+00:00
Web Push Notifications for Instant Website Change Alerts
Web push notifications deliver instant alerts directly to your browser when PageCrawl detects a change on your monitored pages. No extra apps, no browser extensions, and no webhook configuration needed.
How It Works
When a monitored page changes, PageCrawl sends a native browser notification to all your subscribed devices. You'll see the notification even when PageCrawl.io isn't open in your browser.
If AI summarization is enabled for the page, the notification includes a brief summary explaining what changed, so you can decide at a glance whether to investigate.
Setting Up Push Notifications

Go to Settings > Account Settings
Click Enable Push Notifications
Accept the browser permission prompt

That's it. Notifications start immediately.
Managing Devices
You can subscribe on multiple devices (desktop, laptop, phone, tablet). Each device receives notifications independently. To manage your subscribed devices:

Go to Settings > Account Settings
View your subscribed devices under Push Notifications
Remove old devices or send a test notification to verify the setup

Supported Browsers



Browser | Desktop | Mobile
Chrome | Yes | Yes (Android)
Firefox | Yes | Yes (Android)
Edge | Yes | -
Safari 16+ | Yes (macOS) | Yes (iOS)



Push Notifications vs. Other Channels



Channel | Setup | Speed | Best For
Web Push | None | Instant | Personal monitoring, time-sensitive changes
Email | None | Minutes | Searchable archive, batch review
Slack | Webhook URL | Instant | Team collaboration
Discord | Webhook URL | Instant | Community monitoring
Teams | Webhook URL | Instant | Enterprise environments
Telegram | Chat ID | Instant | Mobile-first users



Combining Channels
You can use push notifications alongside other channels. A common setup:

Push for urgent, time-sensitive alerts (price drops, restocks)
Email for a searchable archive of all changes
Slack/Teams for changes that need team discussion

Configure different notification channels per page in the page settings.


Compare Product Prices Across Multiple Retailers
2026-05-06T14:16:21+00:00
Compare Product Prices Across Multiple Retailers
PageCrawl can automatically group monitors that track the same product on different websites, giving you a real-time view of how prices compare across retailers. When the competitive landscape shifts, you can get alerts and export comparison spreadsheets.
What You Can Do



Capability | Description
Side-by-side pricing | See all retailer prices for a product in one place via the Matched Pages panel
Comparison alerts | Get notified when a price becomes the cheapest, most expensive, or when the spread exceeds a threshold
Cross-retailer export | Download a spreadsheet with one row per product and columns per retailer
Smart suggestions | When linking monitors, PageCrawl suggests the most relevant candidates
Automatic grouping | Monitors are grouped automatically when product identifiers match
Reference labels | Manually group monitors using labels with a shared prefix
Google Sheets integration | Include comparison data and label-based columns in automated Google Sheets exports



How Products Are Grouped
PageCrawl uses multiple signals to determine whether two monitors on different websites track the same product. When a match is found, the monitors are placed into a comparison group automatically.
Matching happens after each page check and when labels are updated. If the same product is listed on five different retailer websites and each monitor is set up with price tracking, PageCrawl will link all five into a single group.
You can also group monitors manually from the comparison panel on any monitor's detail page, or by applying reference labels (covered below).
Each comparison group can contain up to 20 monitors.
The Matched Pages Panel
When a monitor belongs to a comparison group, its detail page shows a Matched Pages panel. This panel displays:

The name and domain of each grouped monitor
The current tracked value (typically a price) for each
Quick navigation links to each compared monitor

From this panel you can:

Add monitors - Search for other monitors to add to the group
Remove monitors - Detach a specific monitor from the group
View suggestions - See PageCrawl's recommended matches based on product signals

Smart Suggestions
When adding monitors to a comparison group, PageCrawl ranks candidates by relevance. Suggestions consider multiple factors including product identifiers, reference labels, folder grouping, domain similarity, and name overlap.
If the product comparison feature is enabled, suggestions are enhanced with stronger signals from product identifiers and reference labels. Without the feature enabled, suggestions still work but rely on name and structural similarity only.
You can also type in the search box to filter across all monitors in your workspace.
Comparison Alerts
Comparison alerts notify you when a monitor's price changes its competitive position within the group. There are three alert types:



Alert Type | When It Fires | Configuration
Cheapest | This monitor's price is the lowest in the group | No additional configuration needed
Most Expensive | This monitor's price is the highest in the group | No additional configuration needed
Price Spread | The gap between the lowest and highest price in the group exceeds a percentage | Set the spread threshold percentage



How Alerts Work
Alerts are transition-based. You receive a notification when the state changes (e.g., a monitor becomes the cheapest), but not on every subsequent check where it remains the cheapest. When the condition clears, the alert resets and can fire again later.
For example, if Monitor A is tracking a laptop at $999 and becomes the cheapest in a group of five retailers:

You receive a notification: "Laptop X is now the cheapest at $999 (range: $999 - $1,299)"
On subsequent checks, as long as Monitor A remains the cheapest, no new notification is sent
If another retailer drops to $949, Monitor A is no longer the cheapest and the alert clears
If Monitor A drops to $929 and becomes cheapest again, you receive a new notification

Price Spread alerts work similarly. If you set a 20% threshold and the spread increases from 15% to 25%, you receive a notification. The alert clears when the spread drops below 20%.
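The transition-based behavior described above can be sketched in a few lines of Python. This is an illustration only, not PageCrawl's implementation: the `AlertState` class, the `evaluate` function, and the alert names are hypothetical, and the spread formula (gap relative to the lowest price) is an assumption, since the exact formula is not documented here.

```python
from dataclasses import dataclass


@dataclass
class AlertState:
    """Remembers whether each condition held on the previous check."""
    cheapest: bool = False
    spread: bool = False


def evaluate(state, own_price, group_prices, spread_threshold_pct):
    """Return the alerts that fire on this check and update the stored state.

    group_prices includes own_price. The spread formula (gap relative to the
    lowest price) is an assumption made for this sketch.
    """
    lowest, highest = min(group_prices), max(group_prices)
    is_cheapest = own_price == lowest
    spread_pct = (highest - lowest) / lowest * 100
    spread_exceeded = spread_pct > spread_threshold_pct

    fired = []
    # Fire only on a False -> True transition, not while the condition persists.
    if is_cheapest and not state.cheapest:
        fired.append("cheapest")
    if spread_exceeded and not state.spread:
        fired.append("price_spread")

    # Storing the current result is what "clears" an alert once it turns False.
    state.cheapest = is_cheapest
    state.spread = spread_exceeded
    return fired
```

Replaying the laptop example through `evaluate` fires "cheapest" on the first check at $999, stays silent while Monitor A remains cheapest, clears when a competitor drops to $949, and fires again when Monitor A drops to $929.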
Setting Up Comparison Alerts

Open the monitor's settings (edit page)
Scroll to Comparison Alerts
Add a new rule and select one of the comparison alert types
For Price Spread, enter the percentage threshold (e.g., 25 for a 25% spread)
Save your changes

Comparison alerts are evaluated after every page check, using the most recent values from all group members. Alerts are delivered through your configured notification channels (email, Slack, Discord, Teams, Telegram, webhooks).
Cross-Retailer Export
Export a comparison spreadsheet to analyze all your grouped products and their prices in a single file.
How to Export

Select the pages you want to include from your page list
Click Export from the bulk actions toolbar
Choose Comparison as the export type
Download the XLSX spreadsheet

What the Export Contains



| Column | Description |
| --- | --- |
| Product | Product name from page metadata, or monitor name as fallback |
| GTIN | Global Trade Item Number if detected |
| SKU | Stock Keeping Unit if detected |
| Brand | Product brand if detected |
| [retailer domain] | One column per unique retailer domain, containing the current tracked value |



Each row represents one comparison group. If a group has members on amazon.com, bestbuy.com, and walmart.com, the spreadsheet will have three retailer columns.
If the same retailer domain appears more than once in a group (e.g., two product variants on the same site), the column headers are disambiguated with the monitor name.
Only monitors that belong to a comparison group are included in the export. Ungrouped monitors are excluded.
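To make the row and column layout concrete, here is a rough Python sketch of how groups could be pivoted into export rows. The input structure, the function name, and the "domain (monitor name)" header format are all assumptions made for illustration; the actual export may label disambiguated columns differently.

```python
def build_comparison_rows(groups):
    """Turn comparison groups into one row per group with a column per domain.

    `groups` is a hypothetical structure: a list of dicts with a "product"
    name and a "members" list of {"domain", "monitor", "value"} dicts.
    """
    rows = []
    for group in groups:
        domains = [m["domain"] for m in group["members"]]
        row = {"Product": group["product"]}
        for member in group["members"]:
            header = member["domain"]
            # Disambiguate with the monitor name when a domain repeats in the group.
            if domains.count(header) > 1:
                header = f'{header} ({member["monitor"]})'
            row[header] = member["value"]
        rows.append(row)
    return rows
```

Ungrouped monitors never reach this function because only comparison groups are passed in, which mirrors the exclusion rule above.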
Reference Labels
Reference labels provide a way to manually group monitors using a label prefix. This is useful when automatic matching is not sufficient, or when you want to define your own product identifiers.
How Reference Labels Work
Apply a label with a specific prefix to monitors that track the same product. For example:



| Monitor | Label |
| --- | --- |
| Laptop X on Amazon | ref:LAPTOP-X-2024 |
| Laptop X on Best Buy | ref:LAPTOP-X-2024 |
| Laptop X on Walmart | ref:LAPTOP-X-2024 |



All three monitors share the label ref:LAPTOP-X-2024, so PageCrawl groups them together.
The default prefix is ref, but you can change it in your workspace settings.
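As a rough illustration of label-based grouping, the sketch below groups monitor names by the value of their `prefix:value` label. The function name and the input shape (a dict of monitor name to labels) are hypothetical, and the rule that a group needs at least two monitors is an assumption of this sketch.

```python
from collections import defaultdict


def group_by_reference(monitors, prefix="ref"):
    """Group monitor names by the value of their `prefix:value` label."""
    groups = defaultdict(list)
    for name, labels in monitors.items():
        for label in labels:
            label_prefix, sep, value = label.partition(":")
            if sep and label_prefix == prefix and value:
                groups[value].append(name)
    # Keep only values shared by two or more monitors (an assumption of this sketch).
    return {value: names for value, names in groups.items() if len(names) > 1}
```

Passing a different `prefix` argument models a workspace where the default `ref` prefix has been changed.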
Applying Reference Labels
You can apply reference labels in several ways:

Single page: Edit the page and add a label in the format prefix:value
Bulk edit: Select multiple pages, click Bulk Edit, and apply the label to all at once
API: Use the tag management API to programmatically assign labels

When a reference label is added or changed, PageCrawl automatically re-evaluates comparison groups.
Tag Prefix Columns
Tag prefix columns turn label prefixes into structured data columns available in exports and Google Sheets integrations.
Configuration

Go to Settings > Workspace > Tag Prefix Columns
Add the prefixes you want as columns (e.g., sku, brand, ref)
Optionally change the Comparison Prefix (the prefix used for product grouping)
Save




| Setting | Description |
| --- | --- |
| Prefix Columns | List of prefixes to expose as export/Google Sheets columns (max 10) |
| Comparison Prefix | The prefix used for product comparison grouping (default: ref) |



Using Tag Prefix Columns in Exports
Once configured, tag prefix columns appear as available columns in your Excel and Google Sheets export settings alongside the built-in columns (name, URL, current value, etc.).
For example, if you configure prefixes sku and brand:

A monitor with labels sku:WGT-500 and brand:Acme will show WGT-500 in the SKU column and Acme in the Brand column
Columns appear as tag_sku and tag_brand in column configuration
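The mapping from labels to columns can be pictured with a small Python sketch. The function below is hypothetical and only mirrors the example above; in particular, letting the last label win when two labels share a prefix is an assumption.

```python
def tag_columns(labels, prefixes):
    """Map `prefix:value` labels to tag_<prefix> columns for configured prefixes."""
    columns = {f"tag_{prefix}": "" for prefix in prefixes}
    for label in labels:
        prefix, sep, value = label.partition(":")
        # Labels without a colon, or with an unconfigured prefix, are skipped.
        if sep and prefix in prefixes:
            columns[f"tag_{prefix}"] = value
    return columns
```

For a monitor labeled `sku:WGT-500` and `brand:Acme`, with `sku` and `brand` configured, this yields `WGT-500` under `tag_sku` and `Acme` under `tag_brand`.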

Changing the Comparison Prefix
When you change the comparison prefix (e.g., from ref to group), PageCrawl automatically re-evaluates groups for monitors that have labels with the new prefix. Existing groups built from product identifiers are not affected.
Note: Prefix names must be lowercase alphanumeric characters or underscores, with a maximum length of 50 characters.
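If you generate prefixes programmatically, a check along these lines can catch invalid names before you save them. The pattern is a direct reading of the rule above; the function name is hypothetical.

```python
import re

# Lowercase letters, digits, or underscores; 1 to 50 characters.
PREFIX_PATTERN = re.compile(r"[a-z0-9_]{1,50}")


def is_valid_prefix(name):
    """Return True when the whole name matches the documented prefix rule."""
    return PREFIX_PATTERN.fullmatch(name) is not None
```

Names such as `ref` and `sku_2` pass, while `Ref` (uppercase) and `my-prefix` (hyphen) are rejected.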
Discovered Pages and Product Matching
When Page Discovery finds new pages and product comparison is enabled, PageCrawl checks whether the discovered page matches an existing monitored product. If a match is found, the discovered page shows the matched product's name and domain, helping you decide whether to add it to monitoring.
This is particularly useful for automatically finding the same product on newly discovered retailer pages.
Best Practices
Start with Price Tracking
Product comparison works best with monitors using price or number tracking modes, since these produce numeric values that can be compared. Full-page text monitors will appear in groups but cannot generate comparison alerts.
Use Consistent Reference Labels
If you manage a large catalog, establish a naming convention for reference labels. Using the same internal product ID across all retailers (e.g., ref:INTERNAL-SKU-001) ensures consistent grouping.
Combine Automatic and Manual Grouping
Let automatic matching handle the initial grouping, then review and adjust using reference labels for any products that were not matched correctly. Automatic and manual matching work together and complement each other.
Set Up Alerts Selectively
Rather than adding comparison alerts to every monitor, focus on the products where competitive pricing matters most. This keeps your notifications actionable and avoids alert fatigue.
Use Cross-Retailer Export for Reporting
Schedule regular exports to track pricing trends over time. Combined with Google Sheets integration, you can build dashboards that update automatically.
Limits



| Limit | Value |
| --- | --- |
| Max group size | 20 monitors per comparison group |
| Max prefix columns | 10 per workspace |
| Prefix name length | 50 characters |



Requirements
Product comparison is available as a team-level add-on. Contact support or your account manager to enable it for your account.
Related Articles

Bulk Edit - Export and manage multiple pages at once
Labels, Folders & Workspaces - Organize your monitored pages
Page Discovery - Automatically discover new pages to track
AI Change Detection - AI-powered summaries and importance scoring
Advanced Configuration - Templates, tracked elements, and power user settings



Premium Residential Proxies
2026-03-31T08:36:40+00:00
Premium Residential Proxies
Premium residential proxies let you monitor websites that block standard datacenter IP addresses. They use real residential internet connections from 200+ countries, making your monitoring checks appear as regular user traffic.
When Do You Need Residential Proxies?
Most websites work fine with the datacenter proxies already included in every PageCrawl plan. You only need residential proxies if:

A website actively blocks datacenter IPs (you see 403 errors, timeouts, or blank pages after retries)
You need to see content as it appears in a specific country, state, or city
The site uses advanced bot detection that datacenter proxies cannot bypass

Before purchasing, try these free alternatives:

Enable Stealth engine in your monitor settings. Stealth mode uses advanced techniques to bypass bot detection and works on most protected websites
Reduce your check frequency. Many blocks are triggered by frequent requests. Switching from every 15 minutes to hourly or daily often resolves the issue
Switch proxy location in your monitor settings (e.g., try London instead of New York)
Contact support for help diagnosing the issue

How Residential Proxy Bandwidth Works
Residential proxies are priced at $10/GB of data transferred. Every page check consumes bandwidth based on the page size:



| Page Type | Approximate Size Per Check |
| --- | --- |
| Simple text page (blog, news article) | ~0.5 MB |
| Standard e-commerce or listing page | ~2 MB |
| Heavy page with images and scripts | ~5 MB |



Bandwidth never expires. You can purchase 1 GB today and use it over months.
Cost Impact of Check Frequency
Check frequency has a large impact on bandwidth consumption. The same 10 pages can cost very different amounts depending on how often you check:



| Frequency | 10 Pages Monthly Cost |
| --- | --- |
| Daily | ~$10 (0.6 GB) |
| Hourly | ~$150 (14.4 GB) |
| Every 15 minutes | ~$570 (57.6 GB) |



For most monitoring use cases, daily or hourly checks are sufficient. Only use high-frequency residential proxy checks when near real-time monitoring is essential.
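To estimate your own usage, the arithmetic behind the table can be reproduced with a short Python sketch. It assumes roughly 2 MB per check, a 30-day month, and decimal gigabytes (1 GB = 1000 MB), which together match the gigabyte figures in the table; the function name is hypothetical.

```python
PRICE_PER_GB = 10.0   # USD, from the pricing above
MB_PER_GB = 1000      # assumption: decimal gigabytes, which matches the table


def monthly_bandwidth(pages, mb_per_check, checks_per_day, days=30):
    """Return (gigabytes, cost in USD) for a month of residential proxy checks."""
    gigabytes = pages * mb_per_check * checks_per_day * days / MB_PER_GB
    return gigabytes, gigabytes * PRICE_PER_GB


for label, checks_per_day in [("Daily", 1), ("Hourly", 24), ("Every 15 minutes", 96)]:
    gb, cost = monthly_bandwidth(pages=10, mb_per_check=2, checks_per_day=checks_per_day)
    print(f"{label}: {gb:.1f} GB, about ${cost:.0f}")
```

The computed amounts are 0.6 GB ($6), 14.4 GB ($144), and 57.6 GB ($576). The table's dollar figures are rounded, and the daily figure there presumably reflects the 1 GB minimum purchase.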
How to Set Up

Go to Settings > Residential Proxies in your account
Purchase bandwidth (minimum 1 GB)
Open any monitor and change the Proxy Location to Premium Residential
Select a target country for geo-targeted monitoring
Save and trigger a check to verify it works

Geo-Targeting
When using residential proxies, you must select a target country from 200+ supported countries. This is useful for monitoring localized pricing, regional content, or geo-restricted pages.
Monitoring Your Usage

View your bandwidth balance and daily usage in Settings > Residential Proxies
Usage statistics update every 15 minutes
When your bandwidth reaches zero, monitors using residential proxies automatically fall back to datacenter proxies (your monitoring does not stop)

Availability
Premium residential proxy bandwidth is available on Enterprise and Ultimate plans. Contact us if you have questions about upgrading.
Related

Using Custom Proxies for routing checks through your own proxy servers
Cost Calculator for estimating your monthly bandwidth needs
Bulk Edit for applying proxy settings to multiple pages



Feed Tracking Mode: Structured Monitoring for RSS, Atom, and Sitemaps
2026-05-08T12:17:31+00:00
Feed Tracking Mode: Structured Monitoring for RSS, Atom, and Sitemaps
Feed tracking mode treats an RSS feed, Atom feed, or XML sitemap as a list of individual items rather than a single blob of text. Instead of "the page changed", you get "2 new posts added: [titles and links]". This matches how you actually want to consume a feed: item by item.
Looking to publish your monitored page changes as an RSS feed instead? See Monitor Page Changes via RSS Feeds, which generates a feed URL of detected changes you can plug into any RSS reader or automation tool.
When to Use Feed Tracking Mode
Pick Feed mode when the URL you are monitoring is a structured list that updates over time:

RSS and Atom feeds (/feed, /rss.xml, /atom.xml, /feeds/posts/default, /index.xml)
XML sitemaps (/sitemap.xml, /sitemap_index.xml)
GitHub release and commit Atom feeds (github.com/owner/repo/releases.atom)
Reddit subreddit feeds (reddit.com/r/subreddit/.rss)
Podcast feeds
Inventory grids and card-based HTML pages (detected via DOM pattern matching)

PageCrawl auto-detects the feed format when you paste the URL and switches to Feed mode automatically. You can also pick it manually from the tracking mode selector.
What You Get With Feed Mode
Compared to Full Page text tracking, Feed mode gives you:



| Feature | Full Page Text | Feed Mode |
| --- | --- | --- |
| Compares raw content | Yes | No (parses items) |
| Reports which items changed | No | Yes, with titles and links |
| Ignores reordering | No (false alerts) | Yes |
| Deduplicates by stable key | No | Yes (guid, id, link) |
| Caps item count | No | Yes (configurable limit) |
| Runs without a browser | Only if page is plain text | Yes, for XML feeds |
| Handles "No exact matches" fallbacks | No | Yes |



The end result: fewer false alerts, clearer notifications, and lower monitoring cost per check.
Supported Formats
Feed tracking mode parses:

RSS 2.0, including item identifiers such as <guid> and <link>
RSS 1.0 / RDF including rdf:about identifiers
Atom 1.0, including entry <id> identifiers
XML Sitemap (<urlset>) and sitemap index (<sitemapindex>)
Generic repeating XML when an XML file has a list-like structure

For HTML pages like product grids, inventory lists, or news listings, Feed mode falls back to DOM pattern detection, which identifies repeated card-like elements on the page and tracks them as items.
How Detection Works
When you paste a URL into Track New Page, PageCrawl performs a content-based check:

Fetches the URL
Looks at the content type and first few bytes of the body
If it looks like XML, parses it with a namespace-aware XML parser
Identifies the feed format (RSS / Atom / Sitemap / etc.) by root element
Returns the detected format to the interface, which auto-switches to Feed mode

If the detection cannot classify the URL as an XML feed, the tracking mode stays at Full Page and you can switch to Feed manually if you want to use DOM pattern detection on an HTML page.
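The detection steps above can be approximated with Python's standard library. This is a simplified sketch, not PageCrawl's detector: it skips the HTTP fetch and content-type check, the function names and format labels are hypothetical, and it does not inspect the RSS version attribute.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' part ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


# Root element name -> format label (labels are made up for this sketch).
ROOT_TO_FORMAT = {
    "rss": "rss",
    "RDF": "rdf",
    "feed": "atom",
    "urlset": "sitemap",
    "sitemapindex": "sitemap_index",
}


def detect_feed_format(body):
    """Return a format label for an XML feed body (bytes), or None."""
    if not body.lstrip().startswith(b"<"):
        return None  # does not look like XML at all
    try:
        root = ET.fromstring(body)  # namespace-aware parse
    except ET.ParseError:
        return None  # e.g. HTML that is not well-formed XML
    return ROOT_TO_FORMAT.get(local_name(root.tag))
```

A `None` result corresponds to the case where the tracking mode stays at Full Page.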
Item Limit
Every feed tracking element has a Track first N items cap. The default is 10 for new monitors. You can raise it up to your plan's maximum.
The limit exists for three reasons:

Avoid noise from variable-count pages. Some pages show a different number of items between checks (inventory pages, infinite-scroll feeds). Capping at a fixed count prevents fluctuations from triggering false change alerts.
Keep storage manageable. A sitemap with 50,000 URLs would create a 50,000-item JSON blob per check. The cap prevents this.
Focus on fresh content. Most of the time you care about the newest items. Tracking the first 10-20 entries is almost always enough.

How "First N" Is Decided
For RSS and Atom feeds, "first N" means the first N items in document order, since these formats conventionally list the newest items at the top. Reading positions 0 through N-1 gives you the N most recent posts.
XML sitemaps are different. There is no convention requiring sitemaps to list new URLs first. New pages can appear anywhere in the file, including appended at the bottom. To handle this, PageCrawl sorts sitemap entries by their <lastmod> date (newest first) before applying the cap, so the most recently modified URLs always win.
For sitemaps that do not include <lastmod> on every URL, the dated entries are sorted first and the dateless entries fall to the bottom of the sort in their original document order. If you need to track every page on a very large sitemap regardless of modification date, use Page Discovery instead - it auto-monitors new pages as they appear without depending on the position-based cap.
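The two selection rules can be sketched in Python as follows. The function names and the dict-based entry shape are hypothetical, and the sketch assumes every modification date uses the same ISO 8601 form (for example, all date-only values).

```python
from datetime import datetime


def first_n_feed_items(items, n):
    """RSS/Atom: keep the first n items in document order."""
    return items[:n]


def first_n_sitemap_entries(entries, n):
    """Sitemaps: newest lastmod first, undated entries last in document order."""
    dated = [e for e in entries if e.get("lastmod")]
    undated = [e for e in entries if not e.get("lastmod")]
    # list.sort() is stable, so entries with equal dates keep their document order.
    # Assumes all lastmod values share one ISO 8601 form (e.g. all date-only).
    dated.sort(key=lambda e: datetime.fromisoformat(e["lastmod"]), reverse=True)
    return (dated + undated)[:n]
```

With a cap of 3 and four entries of which two are undated, the two dated entries come first (newest leading), followed by the first undated entry in document order.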



| Plan | Maximum Items Per Feed |
| --- | --- |
| Free | 10 |
| Standard | 100 |
| Enterprise | 1,000 |
| Ultimate | 10,000 |



The default is 10 across all plans. You can raise it from the tracking mode panel any time after the monitor is created.
What Triggers a Change Alert
By default, Feed mode notifies you when items are added to the feed. You can also opt into:

Items removed – something disappeared from the feed
Content changed – an item's title or body was edited after publication
Price changed – an item's price updated (for product feeds)
Order changed – items were reordered (off by default since most feeds reorder as new items arrive)

Each item is identified by a stable key in this order: GUID → link → title. That means content changes on the same item are correctly recognized as updates, not as a new item.
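A minimal Python sketch of key-based comparison, assuming each item is a dict with optional `guid`, `link`, and `title` fields (a hypothetical shape), shows why an edited item is reported as a change rather than as a new item and why reordering alone produces no difference:

```python
def item_key(item):
    """Pick the first available identifier: GUID, then link, then title."""
    return item.get("guid") or item.get("link") or item.get("title")


def diff_items(previous, current):
    """Classify items as added, removed, or changed between two checks."""
    old = {item_key(i): i for i in previous}
    new = {item_key(i): i for i in current}
    return {
        "added": [new[k] for k in new if k not in old],
        "removed": [old[k] for k in old if k not in new],
        # Same key, different content: an update rather than a new item.
        "changed": [new[k] for k in new if k in old and new[k] != old[k]],
    }
```

Because both checks are indexed by key, the position of an item in the feed plays no role in the result.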
Monitoring Frequency
Feed mode runs via a lightweight HTTP fetch without a browser, so you can check feeds frequently without burning through plan limits:



| Feed Type | Recommended Frequency |
| --- | --- |
| Security advisories | Every 15 minutes |
| News and competitor blogs | Every 30 to 60 minutes |
| GitHub release feeds | Every 1 to 2 hours |
| Podcast feeds | Every 6 to 12 hours |
| Sitemaps for large sites | Every 1 to 4 hours |
| Low-volume blogs | Daily |



Note: if you set a check interval shorter than 30 minutes on a browser-only feed (an HTML inventory page rather than an XML feed), PageCrawl will use the browser engine for reliability.
Common Examples
GitHub release feed:
https://github.com/owner/repo/releases.atom
WordPress blog:
https://example.com/feed/
Reddit subreddit:
https://www.reddit.com/r/webdev/.rss
Site sitemap:
https://example.com/sitemap.xml
For each of these, paste the URL into Track New Page. PageCrawl detects the format, switches to Feed mode, and shows the first 10 items as a preview before you save.
Related Articles

Monitor RSS feeds and get alerts for new content – broader guide comparing RSS monitoring approaches
Publish change alerts as an RSS feed – the inverse: generate a feed URL of detected changes for RSS readers and automation tools
Sitemap monitoring – automatically discover new pages across a website
Webhook integrations – route feed alerts to Slack, Discord, or custom automations
Reduce false positives – tune your monitors for cleaner alerts



Thumbs Up and Thumbs Down: Giving Feedback on Detected Changes
2026-04-16T09:45:39+00:00
Thumbs Up and Thumbs Down: Giving Feedback on Detected Changes
Every time PageCrawl detects a change, you can give quick feedback with the thumbs up and thumbs down buttons. This feedback helps you organize your review workflow and tells PageCrawl which changes are useful and which ones are noise.
Where to Find the Buttons
The feedback buttons appear in several places:

Page view, next to each detected change in the timeline
Review Board, when opening a change card
Email notifications, as quick-action buttons at the bottom of each change email
Slack, Discord, Microsoft Teams, and Telegram notifications, as inline action buttons next to each detected change
Browser extension, when reviewing changes on the go

You can give feedback directly from any of the notification channels above without logging in. You are taken to a short confirmation page that records the feedback, then returned to the change (or shown a simple confirmation screen if you are not signed in).
What Happens When You Press Thumbs Up
Pressing thumbs up flags the change as important or useful. This tells PageCrawl:

The change is the kind of update you want to be notified about
Similar changes on this page should continue to be surfaced
The change has been reviewed, so it is marked as seen automatically

If your workspace has feedback auto-review enabled, the change card also moves from "To Review" to your chosen destination lane on the Review Board (for example, a "Reviewed" or "Important" lane). You can configure which lane thumbs-up feedback moves cards to from the Review Board settings.
What Happens When You Press Thumbs Down
Pressing thumbs down flags the change as noise or irrelevant. This does several things:

The change is marked as seen so it no longer counts as unread
PageCrawl learns from the feedback and may automatically filter similar irrelevant changes on the same page in the future
You may be offered a suggested action to prevent this type of change from triggering alerts again. Depending on what changed, you might see:
  "Ignore numbers" if the change was only numeric (view counts, stock tiers, price variants that do not matter to you)
  "Remove dates" if the change was a date or timestamp update
  "Ignore this text" if a specific phrase repeatedly appears in the diff
The card moves to your configured "noise" lane on the Review Board, if feedback auto-review is enabled

You can accept or dismiss the suggested action. Accepting it applies the filter so similar changes are automatically filtered on future checks.
Inverse Pattern Warning
If you press thumbs down on a change involving state-toggle text (for example, "in stock" to "out of stock", "available" to "unavailable", "open" to "closed"), PageCrawl shows a warning. This is because telling the system to ignore a change in one direction could cause it to also ignore the reverse change, which is often something you actually want to be alerted about. Read the warning carefully before confirming.
When Should You Press Thumbs Up?
Press thumbs up when:

The detected change is exactly the kind of update you set up this monitor for
You want to confirm that a pricing, availability, or content change was correctly caught
You want to keep a record of meaningful changes in your "Important" or "Reviewed" lane
You want to train PageCrawl to continue surfacing this type of change

Examples:

A competitor dropped their price from $49 to $39
A job listing you were tracking has been posted
A terms-of-service page added a new clause
A product page switched from "Out of stock" to "In stock"

When Should You Press Thumbs Down?
Press thumbs down when:

The change is not relevant to your monitoring goal
The detected text is noise, like a timestamp, view counter, random tagline, or rotating banner
The same type of irrelevant change keeps triggering alerts
You want to train PageCrawl to filter out similar changes on future checks

Examples:

The page says "Last updated 3 minutes ago" and that timestamp keeps changing
A "Users online: 1,234" counter triggered the alert
A rotating testimonial or hero image caption changed
A footer copyright year was updated
A "Trending now" section showed a different product

Press thumbs down even if the change is minor. Over time, consistent feedback makes your monitors much quieter and more precise.
When Should You Not Press Either?
If a change is neutral (neither clearly useful nor clearly noise), you can leave it without feedback and simply mark it as reviewed. Feedback is not mandatory. Only use it when you have a clear opinion, because consistent signals produce better filtering than mixed ones.
Clearing Feedback
If you change your mind, reopen the change and press the same button again to clear the flag, or press the opposite button to overwrite the previous feedback. Clearing feedback does not automatically remove any filters that were added as a result of it. Those filters are managed separately under the page's actions and ignore rules.
Tips for Better Results

Be consistent. The more feedback you give, the faster PageCrawl learns what you care about.
Accept suggested actions when they look right. A single "Ignore numbers" or "Remove dates" action can eliminate most repeat false positives on a page.
Configure auto-review lanes on the Review Board so feedback also organizes your workflow, not just your filtering.
Use feedback from notification channels (email, Slack, Discord, Teams, Telegram) when you are away from the app. They work with no login required.
Review your filters periodically. Feedback-driven filters live alongside the page's other actions and ignore rules, and can be edited or removed any time.

Related

Review Board for organizing changes into lanes based on feedback
Reducing False Positives for a complete guide to quieter monitors
AI-Powered Change Detection for how AI priority scores work alongside your feedback



API Quick Start: Monitor Your First Page in 60 Seconds
2026-05-06T14:16:21+00:00
API Quick Start: Monitor Your First Page in 60 Seconds
This guide walks you through creating your first monitor and webhook using the PageCrawl API. By the end, you will have a page being monitored with change notifications delivered to your endpoint.
API access requires a paid plan (Standard or above). Get your API token from Settings > API.
Step 1: Get Your API Token
Go to Settings > API > API Tokens and click Create Token. Copy the token immediately (it will not be shown again).
All API requests use this token in the Authorization header:
Authorization: Bearer YOUR_API_TOKEN
For the full API reference with all endpoints and schemas, visit pagecrawl.io/developers.
Step 2: Create a Monitor
The simplest way to start monitoring is the /api/track-simple endpoint. It only requires a URL.
curl
curl -X POST "https://pagecrawl.io/api/track-simple" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/pricing",
    "tracking_mode": "fullpage",
    "frequency": 60
  }'
Python
import requests

response = requests.post(
    "https://pagecrawl.io/api/track-simple",
    headers={"Authorization": "Bearer YOUR_API_TOKEN"},
    json={
        "url": "https://example.com/pricing",
        "tracking_mode": "fullpage",
        "frequency": 60,  # minutes between checks
    },
)
response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body

page = response.json()
print(f"Monitoring: {page['name']} (ID: {page['id']})")
Node.js
const response = await fetch("https://pagecrawl.io/api/track-simple", {
  method: "POST",
  headers: {
    "Authorization": "Bearer YOUR_API_TOKEN",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    url: "https://example.com/pricing",
    tracking_mode: "fullpage",
    frequency: 60,
  }),
});

const page = await response.json();
console.log(`Monitoring: ${page.name} (ID: ${page.id})`);
Tracking modes:

fullpage - all visible text (default)
content_only - text without navigation, headers, footers
reader - reader mode content only
price - auto-detect and track prices
specific_text - specific element (requires selector)
specific_number - numeric value from element (requires selector)

Frequency is in minutes. Use 1440 for daily, 60 for hourly, 15 for every 15 minutes (depends on your plan).
Step 3: Set Up a Webhook
Create a webhook to receive notifications when changes are detected.
curl
curl -X POST "https://pagecrawl.io/api/hooks" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "target_url": "https://your-server.com/webhook",
    "match_type": "all",
    "events": ["change_detected"]
  }'
Python
import requests  # already imported if you are continuing from Step 2

response = requests.post(
    "https://pagecrawl.io/api/hooks",
    headers={"Authorization": "Bearer YOUR_API_TOKEN"},
    json={
        "target_url": "https://your-server.com/webhook",
        "match_type": "all",
        "events": ["change_detected"],
    },
)

hook = response.json()
print(f"Webhook created (ID: {hook['id']})")
Match types: all (every page), monitors (specific pages), tags (by tag), folders (by folder), domains (by domain).
Events: change_detected, error, price_change_detected.
Step 4: Handle Webhook Payloads
When a change is detected, PageCrawl POSTs a JSON payload to your webhook URL. Here is an example handler:
Python (Flask)
from flask import Flask, request

app = Flask(__name__)

@app.route("/webhook", methods=["POST"])
def handle_change():
    data = request.json

    print(f"Change detected: {data['title']}")
    print(f"New content: {data['contents']}")
    print(f"Difference: {data['human_difference']}")

    if data.get("ai_summary"):
        print(f"AI Summary: {data['ai_summary']}")

    return "", 200
Node.js (Express)
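const express = require("express");

const app = express();
app.use(express.json()); // parse JSON bodies so req.body is populated

app.post("/webhook", (req, res) => {
  const data = req.body;

  console.log(`Change detected: ${data.title}`);
  console.log(`New content: ${data.contents}`);
  console.log(`Difference: ${data.human_difference}`);

  if (data.ai_summary) {
    console.log(`AI Summary: ${data.ai_summary}`);
  }

  res.sendStatus(200);
});

app.listen(3000); // port 3000 is an arbitrary choice for this example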
Key payload fields:

title - Page name
contents - Current value of the tracked element
difference - Text difference percentage (0-100)
human_difference - Human-readable change description
ai_summary - AI-generated plain-language summary of the change
ai_priority_score - 0-100 importance score
markdown_difference - Change diff in markdown format
page_screenshot_image - Signed URL to the page screenshot

You can customize which fields are included when creating the webhook via the payload_fields parameter.
Other Useful Endpoints



| Endpoint | Description |
| --- | --- |
| GET /api/pages | List all monitored pages |
| GET /api/pages/{id} | Get page details and latest values |
| PUT /api/pages/{id} | Update page settings |
| DELETE /api/pages/{id} | Delete a page |
| PUT /api/pages/{id}/check | Trigger an immediate check |
| PUT /api/pages/{id}/status | Enable or disable monitoring |
| GET /api/pages/{id}/history | Get full check history |
| GET /api/pages/{id}/checks/{checkId}/diff.markdown | Get text diff as markdown |
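As a starting point for calling these endpoints from Python without extra dependencies, the sketch below builds authenticated requests with the standard library. The helper names are hypothetical, the page ID `123` is a placeholder, and no response schema is assumed; consult the API reference for request bodies and response fields.

```python
import json
import urllib.request

BASE_URL = "https://pagecrawl.io/api"


def build_request(method, path, token, payload=None):
    """Build (but do not send) an authenticated request for an endpoint above."""
    data = json.dumps(payload).encode() if payload is not None else None
    request = urllib.request.Request(BASE_URL + path, data=data, method=method)
    request.add_header("Authorization", f"Bearer {token}")
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request


def send(request):
    """Send the request and decode the JSON body (response shape not assumed)."""
    with urllib.request.urlopen(request, timeout=30) as response:
        body = response.read()
    return json.loads(body) if body else None


list_pages = build_request("GET", "/pages", "YOUR_API_TOKEN")
check_now = build_request("PUT", "/pages/123/check", "YOUR_API_TOKEN")  # 123 is a placeholder ID
```

Calling `send(list_pages)` performs the actual request; keeping request construction separate makes it easy to inspect the method and URL before anything goes over the network.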



Download the OpenAPI Spec
The full API specification is available as an OpenAPI 3.0 YAML file. Import it into Postman, Insomnia, or any API client:
https://pagecrawl.io/api/openapi.yaml
Related Articles

API & Webhooks overview
Webhook Integration guide
AI Assistants (MCP Server)
Full API Reference



Scheduled Reports - Bundle Change Notifications Into Digests
2026-05-06T14:16:21+00:00
Scheduled Reports - Bundle Change Notifications Into Digests
Scheduled reports let you group monitors together and receive a single digest summarizing all detected changes on a schedule you choose. Instead of getting an instant notification for every change, you get one consolidated report covering everything that happened since the last digest.
This is especially useful when you monitor many pages and want to review changes in batches rather than reacting to each one individually.
When to Use Reports vs Instant Notifications



| Scenario | Recommended |
| --- | --- |
| Monitoring a handful of critical pages | Instant notifications |
| Tracking 50+ competitor pages for pricing | Scheduled report (daily or weekly) |
| Legal/compliance pages that rarely change | Scheduled report (weekly or monthly) |
| Stock availability that needs immediate action | Instant notifications with escalation |
| Executive stakeholder updates | Scheduled report with AI summary |



You can mix both approaches. Monitors that are not assigned to any report continue to send instant notifications as usual. Monitors assigned to a report will only appear in digests (unless escalation is configured for urgent changes).
Creating a Report
Go to Settings > Workspace > Alerts & Reports and select the Scheduled Summary Reports tab. Click Add Report to configure:
Name - Give your report a descriptive name, such as "Weekly competitor pricing" or "Daily legal page updates."
Include changes from - Choose which monitors to include:

All monitors - Every monitor in the workspace
By tag - Monitors with specific tags (useful for grouping by category, client, or project)
By folder - Monitors in specific folders
By website - Monitors grouped by their website domain
Specific monitors - Hand-pick individual monitors by name or URL

Schedule - How often the digest is generated and sent:

Daily - Every day at your chosen hour
Weekdays only - Monday through Friday
Weekly - On a specific day of the week
Monthly - On a specific day of the month
On-demand only - Only generated when you manually click "Generate now"

All times are based on your workspace timezone, which you can set in Settings > Workspace > General.
Delivery Channels
Each report can be delivered through one or more channels:

Email - Select team members and/or verified email addresses as recipients. You can add CC and BCC recipients for stakeholders who need a copy.
Slack - Enter a webhook URL or leave blank to use your workspace default
Discord - Enter a webhook URL or leave blank to use your workspace default
Microsoft Teams - Enter a webhook URL or leave blank to use your workspace default
Telegram - Enter a chat ID or leave blank to use your workspace default

Content Filters
Control which changes appear in each digest:
Minimum importance - Every change is assigned an importance level based on how significant it is. You can filter each report to only include changes above a certain threshold:

All changes - Everything detected
Medium and up - Skips trivial edits like whitespace or date stamps
Important and up - Only notable changes, such as price drops and content rewrites
Critical only - Only major changes, such as large price swings and availability shifts
Custom - Set your own threshold

Show only most recent change per monitor - When a monitor detects multiple changes between digests, only the latest one is shown. This keeps reports concise.
Group by domain - Groups changes by website domain, useful when monitoring pages across many different sites.
AI Executive Summary
When enabled, each digest includes a short AI-written paragraph at the top summarizing the most important changes across all included monitors. This lets you scan the digest quickly without reading every individual change.
You can choose from several summary styles depending on how much detail you want, ranging from a single headline to a full multi-paragraph briefing. Some advanced styles are available on higher-tier plans.
Priority Escalation
Reports batch notifications by design, but some changes may need immediate attention. Priority escalation lets you bypass the schedule for high-priority changes.
When enabled, any change with a priority score above your escalation threshold is sent immediately through the escalation channels you configure. These can be different from your regular delivery channels. For example, you might receive daily email digests but get Slack alerts immediately when something critical happens.
Scoring is automatic. Larger, more meaningful changes (like significant price drops or availability shifts) score higher than minor edits. You don't need to configure scoring - it works out of the box for all monitor types.
Shareable Digest Links
Every generated digest gets a unique shareable link that works without requiring a PageCrawl account. You can share this link with anyone who needs to see the report.
Share links expire after 30 days by default. From the digest history, you can:

Rotate the link (generates a new URL, invalidating the old one)
Revoke the link (disables access immediately)
Refresh the expiration (extends it another 30 days)

Exporting Digests
Each digest can be exported as:

PDF - Formatted report suitable for printing or archiving
Excel - Spreadsheet with columns for date, group, monitor name, URL, priority, and AI summary
CSV - Same data as Excel in CSV format
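If you post-process the CSV export in a script, something like the following may help. It is only a sketch: the header names and the numeric priority values in the sample are assumptions based on the column list above, so check them against a real export before relying on them.

```python
import csv
import io

# Hypothetical export content; real header names and priority values may differ.
SAMPLE = """date,group,monitor name,URL,priority,AI summary
2026-05-01,Competitors,Acme pricing,https://example.com/pricing,80,Price dropped
2026-05-01,Docs,API changelog,https://example.com/changelog,20,Minor wording edit
"""


def high_priority_rows(csv_text, minimum):
    """Return the rows whose priority column is at or above `minimum`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if int(row["priority"]) >= minimum]


for row in high_priority_rows(SAMPLE, 50):
    print(row["monitor name"], row["URL"])
```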

How Reports Interact with Instant Notifications
When a monitor is assigned to any scheduled report, its instant workspace-level notifications (email, Slack, Discord, etc.) are bypassed. Changes are collected and delivered in the next digest instead.
The exceptions:

Escalation alerts still fire immediately when a change exceeds the escalation threshold
Public subscriber notifications (for publicly shared monitors) are unaffected

If you delete or disable a report, the monitors it covered go back to receiving instant notifications automatically.
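The rules in this section can be summarized as a small decision function. This is an illustrative sketch, not PageCrawl's implementation: the function name, argument names, and return labels are made up for the example, and it does not model public subscriber notifications (which are unaffected) or whether an escalated change also appears in the next digest.

```python
def delivery_mode(in_active_report, priority_score, escalation_threshold=None):
    """Return how a detected change is delivered under the rules described above."""
    if not in_active_report:
        # No report covers the monitor (or the report was deleted or disabled).
        return "instant"
    if escalation_threshold is not None and priority_score > escalation_threshold:
        # Escalation bypasses the schedule and uses the escalation channels.
        return "escalation"
    # Everything else waits for the next scheduled digest.
    return "digest"


print(delivery_mode(False, 10))
print(delivery_mode(True, 95, escalation_threshold=90))
print(delivery_mode(True, 40, escalation_threshold=90))
```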
Plan Limits
Standard plans include up to 2 reports. Higher-tier plans include unlimited reports with additional features like on-demand generation. If you downgrade your plan, excess reports are automatically paused and you receive an email listing which ones were affected.


Add Pages to PageCrawl from Android
2026-05-06T14:16:21+00:00
Add Pages to PageCrawl from Android
What Is This?
Install PageCrawl on your Android phone like any other app, then share any webpage to it directly from Chrome's share menu. The URL pre-fills automatically, so you can start monitoring in two taps.
Step 1: Install PageCrawl on Your Phone
PageCrawl works as a Progressive Web App (PWA) on Android. Once installed, it gets its own icon, opens without a browser bar, and shows up in the Android share sheet alongside apps like WhatsApp, Gmail, and Twitter.
To install:

Open Chrome on your Android phone
Visit PageCrawl.io and sign in
Tap the menu (three dots in the top right)
Tap Install app (or Add to Home screen)
Confirm the install

PageCrawl will now appear on your home screen and in your app drawer.
Step 2: Share a Page to PageCrawl
Open any page you want to monitor, in any app that has a Share button (Chrome, Firefox, news readers, X, Reddit, etc.):

Tap the Share button
Tap PageCrawl in the share sheet
PageCrawl opens with the URL pre-filled
Adjust the monitoring options and tap Save

That's it. No copy-pasting URLs.
Why You Need to Install First
The Android share sheet only lists installed apps. Until you install PageCrawl as a PWA, it won't appear as a share target. Once installed, it works like any native app for sharing.
Sharing Without Installing
If you don't want to install the app, you can still add pages quickly using our bookmarklet. Add it to your Chrome bookmarks, then tap it when you're on a page you want to monitor.
Tips

Pin to favorites: Long-press PageCrawl in the share sheet and pin it so it appears at the top.
Stay signed in: Sign in once and stay signed in so shared pages open straight to the monitor setup screen.
Works from any app: Any Android app that exposes a Share button can send to PageCrawl, not just Chrome.

Using an iPhone Instead?
iOS doesn't support installing apps as system share targets the same way Android does. iPhone users should follow our iOS Safari shortcut guide instead.


Packaging a PageCrawl Audit Trail for a Regulator
2026-05-08T12:17:31+00:00
Packaging a PageCrawl Audit Trail for a Regulator
When a regulator requests evidence of a public-facing webpage at a specific point in time, they expect three things: the archive itself, proof that it existed at that time, and a chain of custody that they can independently verify. PageCrawl produces all three by default on Ultimate plans.
This guide explains how to assemble and hand off a complete evidence bundle.
What's in a PageCrawl evidence bundle
For each tracked change, PageCrawl retains:

The WACZ archive (archive.wacz), a self-contained, replayable archive of the captured page including HTML, screenshots, and linked documents.
An embedded WACZ Auth signature inside the WACZ.
Sidecar proof files from independent providers: archive.wacz.ots (OpenTimestamps), archive.wacz.digicert.tsr (DigiCert AATL), archive.wacz.sectigo.tsr (Sectigo AATL), and on Custom plans, archive.wacz.qtsa.tsr (eIDAS qualified).
The raw underlying WARC (capture.warc) for ingestion into other archival systems.
A manifest hash and per-resource SHA-256 hashes inside the WACZ datapackage.
An access audit log recording every download, view, verify, and export of the archive.

Building an evidence bundle
From the PageCrawl dashboard, on any tracked change:

Select the checks to include in the bundle.
Click "Export evidence bundle".
PageCrawl produces a single zip containing each WACZ, every available sidecar proof, a manifest.json with per-archive integrity fingerprints, and a README.txt with verification instructions.

The bundle is portable. Hand it to the regulator on a USB stick, attach it to a regulatory submission, or share it via your own secure file transfer.
The public verification page
For regulators who prefer to inspect each archive interactively, generate a public verification link from any tracked archive. The link is a signed URL that grants read-only access to a verification page. The recipient does not need a PageCrawl account.
The verification page shows:

The source URL and capture timestamp.
The manifest hash.
Every cryptographic attestation present (embedded signature plus each sidecar provider).
Download buttons for each raw proof file with verification command hints (e.g. ots verify ..., openssl ts -reply -in ...).

Anonymous access is logged in the firm's archive access log so chain of custody is preserved.
Sector-specific guidance
SEC examinations (broker-dealers, 17a-4)
Pair the evidence bundle with the firm's recordkeeping policy and the designated executive officer's attestation. The 2022 amendments to 17a-4(f) explicitly contemplate audit-trail-based tamper evidence as an alternative to WORM storage. PageCrawl's manifest hashes plus multiple independent timestamp providers satisfy the structural tamper-evidence requirement.
For relevant FRE 902(13) / 902(14) framing in a parallel litigation context, see our verification guide.
FDA 21 CFR Part 11 inspections (life sciences)
The validation summary your firm maintains for the PageCrawl system should reference the user requirements specification (URS) describing what records the system retains, the audit-trail mechanism (manifest hashes plus timestamp proofs), the retention period, and the retrieval procedure. The bundle gives the inspector what they need to assess the claim that the firm keeps accurate copies and an audit trail per 21 CFR 11.10.
HIPAA OCR investigations (healthcare)
OCR investigators typically request the version of a Notice of Privacy Practices, breach notice page, or business-associate sub-processor list as it existed on a specific date. The public verification link is the single easiest artefact to share: the investigator clicks through, sees the manifest hash, and verifies the timestamp without needing access to internal systems.
EU DPA inspections (GDPR, DORA)
For data protection authority inspections under GDPR or DORA, the eIDAS-qualified-timestamp layer (Custom plan) provides Article 41(2) statutory legal presumption of accuracy. Even without it, the OpenTimestamps Bitcoin anchor plus DigiCert and Sectigo AATL timestamps give the supervisory authority sufficient evidence of the archive's existence at the recorded time.
Why multi-provider matters
Single-provider proofs are vulnerable to the provider's lifecycle: a TSA can revoke a key, sunset a service, or be compromised. Layering attestations across an open blockchain (OpenTimestamps), AATL providers (DigiCert, Sectigo), and optionally an eIDAS QTSP gives the firm independent backups. If any one layer becomes unverifiable in the future, the others still attest. This redundancy is itself a defensive credential.
Related articles

Verifying a PageCrawl Web Archive
Sharing archives publicly
What is WACZ?



eIDAS Qualified Timestamps (Custom plan add-on)
2026-05-08T12:17:31+00:00
eIDAS Qualified Timestamps
EU Regulation 910/2014 (eIDAS) defines a class of cryptographic timestamp called a Qualified Electronic Timestamp, issued by a Qualified Trust Service Provider (QTSP) under EU member-state supervisory oversight. Under Article 41(2), a qualified timestamp carries statutory legal presumption of accuracy of the date and time it indicates and of the integrity of the data bound to it. It is the strongest evidentiary credential available under EU law for proving that a piece of digital content existed at a specific moment.
PageCrawl supports eIDAS qualified timestamping as a Custom plan add-on. This article explains what the feature provides, when you might need it, and how to enable it.
When you need it
You probably do not need eIDAS qualified timestamps if your archives are primarily for internal compliance documentation, US-court evidentiary preparation under FRE 902(13)/(14), or routine regulatory inspections. The default Ultimate-plan archive (embedded WACZ Auth signature plus OpenTimestamps Bitcoin anchor plus DigiCert and Sectigo AATL certified timestamps) covers those scenarios.
You probably do need eIDAS qualified timestamps if you are an EU regulated entity (financial institution under DORA, life sciences company subject to EMA/national-authority inspection, controller subject to GDPR DPA inspection in a member state where qualified evidence is preferred), and you anticipate that the integrity of the archive will be tested in an EU court, supervisory inspection, or formal regulatory dispute.
What you get
When the add-on is enabled for your account, every WACZ archive PageCrawl produces is also stamped with an RFC 3161 timestamp from the QTSP we have contracted with. The resulting .qtsa.tsr file is retained alongside the WACZ in the same Check directory, downloadable via the API or via the public verification page.
The proof file is a standard DER-encoded RFC 3161 TimeStampResp structure. You can inspect it with openssl ts -reply -in archive.wacz.qtsa.tsr -text and verify it with any commercial PKI verification tool. The proof binds the WACZ's SHA-256 hash to a specific moment in time, signed by the QTSP's qualified seal.
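Because the proof binds the WACZ's SHA-256 hash, a reviewer can recompute that hash locally and compare it with the message imprint shown in the openssl output. A minimal sketch using only the Python standard library; the function name is illustrative:

```python
import hashlib


def sha256_hex(path, chunk_size=1024 * 1024):
    """Return the SHA-256 of a file as lowercase hex, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Example usage, assuming archive.wacz is in the current directory:
# print(sha256_hex("archive.wacz"))
```

Reading in chunks keeps memory use flat even for large archives. If the recomputed value differs from the hash in the timestamp, the file is not the one that was stamped.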
How to enable
eIDAS qualified timestamping is provisioned manually because each customer's setup involves a per-stamp QTSP cost and may require contractual coordination on specific qualified providers depending on jurisdiction. To enable:

Contact our sales team via the contact page on the website.
We discuss your jurisdiction, anticipated stamp volume, and provider preferences.
We provision the eidas_enabled flag on your team and configure the worker fleet to route stamping requests to the chosen QTSP.
From the next detected change onward, every Ultimate-plan WACZ ships with the qualified timestamp attached.

Existing archives produced before enablement are not retroactively stamped. Re-stamping historical archives is possible on request but would incur per-stamp costs and is typically only done when a specific historical period needs upgraded evidentiary status for a known regulatory matter.
Verifying a qualified timestamp
openssl ts -reply -in archive.wacz.qtsa.tsr -text
The output describes the timestamp authority's identity, the time of stamping, and the SHA-256 hash bound to the timestamp. To complete verification, validate the QTSP's signing certificate against the EU Trusted List for the issuing jurisdiction (https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=uriserv:OJ.L_.2015.235.01.0026.01.ENG). The EU Commission and member-state authorities maintain the trust lists; the QTSP's certificate must appear there for the timestamp to qualify under Article 41.
What it does not do

Qualified timestamps prove time, not author. They do not bind the archive to a specific natural or legal person. Identity binding (qualified electronic signatures, qualified electronic seals) is a separate eIDAS service and not currently part of the PageCrawl integration.
Qualified timestamps do not extend to retention obligations. The archive itself must be retained for whatever period your regulatory regime requires, by you. PageCrawl's retention is determined by your plan tier.
Qualified timestamps do not retrofit unverifiable archives. The integrity guarantee applies from the moment of stamping forward.

Related articles

Verifying a PageCrawl Web Archive
Packaging a PageCrawl Audit Trail for a Regulator



Sharing PageCrawl Archives Publicly
2026-05-08T12:17:31+00:00
Sharing PageCrawl Archives Publicly
Some audiences need to verify a PageCrawl archive without a PageCrawl account: a regulator examining your records, opposing counsel reviewing a docket capture, an auditor packaging evidence for a board, a journalist citing a primary source. PageCrawl supports this through a public verification page accessible via a signed URL.
This article explains how the link works, what the recipient sees, and how to revoke a link if needed.
Generating a public verification link
From the PageCrawl interface, on any tracked change with an archive:

Click the link icon (or open the archive information panel).
Click "Copy public verification link".
Paste the link into an email, a docket filing, an audit report, or wherever else.

The link is a Laravel signed URL. Anyone who has the link can open the verification page. Anyone who does not have the link cannot guess it, since the signature is cryptographically derived from your account's signing key.
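To illustrate why such a link cannot be guessed or altered, here is a generic sketch of HMAC-based URL signing with an optional expiry. It follows the general idea behind signed URLs rather than PageCrawl's exact scheme: the parameter names (signature, expires), the example URL, and the key handling are all assumptions for the example.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qsl, urlencode, urlsplit


def sign_url(base_url, key, expires_at=None):
    """Append an HMAC-SHA256 signature (and optional expiry) to a URL."""
    params = {}
    if expires_at is not None:
        params["expires"] = str(int(expires_at))
    unsigned = base_url + ("?" + urlencode(params) if params else "")
    signature = hmac.new(key, unsigned.encode(), hashlib.sha256).hexdigest()
    separator = "&" if params else "?"
    return f"{unsigned}{separator}signature={signature}"


def verify_url(url, key, now=None):
    """Return True if the signature matches and the link has not expired."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query)
    signature = next((v for k, v in query if k == "signature"), "")
    remaining = [(k, v) for k, v in query if k != "signature"]
    unsigned = f"{parts.scheme}://{parts.netloc}{parts.path}"
    if remaining:
        unsigned += "?" + urlencode(remaining)
    expected = hmac.new(key, unsigned.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False
    expires = dict(remaining).get("expires")
    current = time.time() if now is None else now
    return expires is None or current <= int(expires)


key = b"hypothetical-signing-key"
link = sign_url("https://example.com/verify/abc123", key)
print(verify_url(link, key))              # signature matches
print(verify_url(link, b"rotated-key"))   # fails once the key changes
```

In this sketch, changing the key invalidates every previously issued link at once, and any edit to the path or the expiry value changes the expected signature.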
What the recipient sees
The verification page renders without authentication. It shows:

The source URL and capture timestamp.
The archive's manifest hash (SHA-256 of the WACZ datapackage).
Every cryptographic attestation present:
The embedded WACZ Auth signature (with signing-service domain and creation time)
The OpenTimestamps Bitcoin anchor proof
The DigiCert AATL certified timestamp
The Sectigo AATL certified timestamp
The eIDAS qualified timestamp (Custom plan only, when applicable)


A download button for each raw proof file with an inline command-line verification hint (e.g. ots verify ..., openssl ts -reply -in ...).
A link to open the WACZ in ReplayWeb.page for a fully interactive replay of the captured page.

The page does not expose any other archives, settings, or account information from your workspace. Only the specific archive corresponding to the link.
Audit log of public access
Every public verification page view and every public proof download is logged in your archive access audit log with action = public_verify or action = public_download_proof_, the recipient's IP address, the user agent, and the timestamp. The log is queryable from the PageCrawl interface and via the API. Chain of custody is preserved even when the recipient is anonymous.
Link expiry and revocation
By default the signed link does not expire. The link remains valid for as long as the archive is retained and the underlying signing key is unchanged.
To revoke a previously issued link, rotate your account's signing key from the workspace security settings. All previously issued public verification links become invalid; subsequent links generated after the rotation are valid against the new key.
For situations where you want time-bound access (for example, sharing with a vendor for a specific audit window), generate a fresh link from the API with an explicit expires_at parameter. The link will reject access after the expiry timestamp.
Why this matters
A public verification link is the lightest practical way to deliver a tamper-proof archive to an external party. The recipient does not need credentials, does not need to install tooling (although they can, for offline verification), and does not need to take your word for it. The page itself shows them every cryptographic attestation, and the underlying proofs are independently verifiable by any standard tooling.
In an era when AI can fabricate any screenshot or document, the public verification page is how PageCrawl users hand off "this is what the page looked like at this moment, attested by parties we don't control" without friction.
Related articles

Verifying a PageCrawl Web Archive
Packaging a PageCrawl Audit Trail for a Regulator
What is WACZ?



Verifying a PageCrawl Web Archive
2026-05-08T12:17:31+00:00
Verifying a PageCrawl Web Archive
A PageCrawl tamper-proof web archive carries multiple cryptographic attestations that prove the archive existed at a specific moment in time and has not been altered since. Each layer can be verified independently, offline, by anyone with the proof file and standard public tooling. No PageCrawl account is required to verify.
This guide walks through verifying each layer.
What you need

The WACZ archive file (archive.wacz).
One or more sidecar proof files: archive.wacz.ots, archive.wacz.digicert.tsr, archive.wacz.sectigo.tsr, archive.wacz.qtsa.tsr (whichever providers stamped the archive at capture time).
Either the public verification page link, or these command-line tools installed locally:
openssl (most systems have it)
The OpenTimestamps client (pip install opentimestamps-client, exposes the ots command)



Verifying the embedded WACZ Auth signature
Every Ultimate-plan archive includes an embedded signedData block inside datapackage-digest.json, conforming to the WACZ Auth specification. The simplest way to verify it is:

Open the WACZ in ReplayWeb.page.
The viewer renders a verified-signature badge if the embedded signature is intact.

If you prefer to inspect the signature manually, extract datapackage-digest.json from the WACZ zip and read the signedData block. The block contains the signing-service domain, the signature, and an embedded RFC 3161 timestamp.
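The manual inspection can be scripted with the Python standard library, since a WACZ is a zip file. A minimal sketch; the function name is illustrative, and it only reads the block without validating the signature itself:

```python
import json
import zipfile


def read_signed_data(wacz_path):
    """Return the signedData block from datapackage-digest.json, or None if absent."""
    with zipfile.ZipFile(wacz_path) as wacz:
        digest = json.loads(wacz.read("datapackage-digest.json"))
    return digest.get("signedData")


# Example usage, assuming archive.wacz is in the current directory:
# signed = read_signed_data("archive.wacz")
# print(sorted(signed) if signed else "no embedded signature")
```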
Verifying the OpenTimestamps Bitcoin anchor
OpenTimestamps anchors a hash of the archive to the Bitcoin blockchain via a calendar server. The proof is verifiable against the public blockchain without any involvement from PageCrawl; the verifier only needs access to Bitcoin block headers.
ots verify archive.wacz.ots
The output indicates whether the proof is fully Bitcoin-anchored, pending confirmation, or invalid. A pending proof typically becomes anchored within a few hours of capture.
If you do not have the OpenTimestamps client installed, see https://opentimestamps.org for installation instructions.
Verifying the DigiCert and Sectigo certified timestamps
DigiCert and Sectigo both issue RFC 3161 timestamp responses, signed by their AATL-rooted Trust Service Provider keys. Verification uses standard OpenSSL.
openssl ts -reply -in archive.wacz.digicert.tsr -text
This prints the timestamp response in human-readable form: the time of stamping, the signing TSP's identity, and the hash bound to the timestamp. Pair it with a chain-validation step against the issuing TSP's public certificate to confirm the signature.
Repeat the same command with archive.wacz.sectigo.tsr for the Sectigo timestamp.
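The chain-validation step can be run with openssl ts -verify, which checks the response against the original data and a trusted CA certificate. The sketch below wraps that command from Python. It assumes the timestamp was taken over the bytes of archive.wacz and that you have saved the provider's root CA certificate as a PEM file; the file name used for that certificate is hypothetical. Depending on the provider, intermediate certificates may also need to be supplied with -untrusted.

```python
import subprocess


def build_verify_command(tsr_path, data_path, ca_file):
    """Build the openssl command that verifies an RFC 3161 response against a file."""
    return [
        "openssl", "ts", "-verify",
        "-in", tsr_path,      # the timestamp response (.tsr)
        "-data", data_path,   # the file whose hash was stamped
        "-CAfile", ca_file,   # trusted root certificate(s) in PEM format
    ]


def verify_timestamp(tsr_path, data_path, ca_file):
    """Run openssl and return (ok, combined output)."""
    result = subprocess.run(
        build_verify_command(tsr_path, data_path, ca_file),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, (result.stdout + result.stderr).strip()


# Example usage (the CA file name is an assumption):
# ok, output = verify_timestamp("archive.wacz.digicert.tsr", "archive.wacz", "tsa-root.pem")
```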
Verifying an eIDAS qualified timestamp (Custom plans)
Custom-plan archives may also include archive.wacz.qtsa.tsr, an eIDAS qualified timestamp under EU Regulation 910/2014 Article 41(2). Verification uses the same OpenSSL command, against the issuing Qualified Trust Service Provider's certificate chain.
openssl ts -reply -in archive.wacz.qtsa.tsr -text
Under Article 41(2), a successful verification gives the archive statutory legal presumption of accuracy of the date and time and of the integrity of the bound data.
The public verification page
Every Ultimate-plan archive can be shared publicly via a signed link generated from the PageCrawl interface. The recipient sees:

The source URL and capture timestamp.
The manifest hash.
A list of every layer that stamped the archive, with download links for the raw proof files and verification command hints.

The verification page does not require an account. It is intended for sharing with regulators, auditors, opposing counsel, or anyone who needs to independently confirm that the archive is genuine.
Why this matters in 2026
Generative AI can produce a plausible screenshot, HTML page, or PDF on demand. A self-stored archive proves nothing on its own. What AI cannot fabricate is a hash anchored to the Bitcoin blockchain, an RFC 3161 timestamp signed by a Trust Service Provider's private key, or a qualified seal from a regulated QTSP. Multi-provider cryptographic attestation is the only practical standard for evidentiary archives in an AI-saturated world.


What's in a PageCrawl WACZ Archive
2026-05-08T12:17:31+00:00
What's in a PageCrawl WACZ Archive
WACZ (Web Archive Collection Zipped) is an open specification developed by Webrecorder for packaging web archives in a portable, replayable, tamper-evident format. WACZ is used by the Internet Archive, the Library of Congress, and major eDiscovery and digital-preservation platforms. Storing PageCrawl archives in WACZ means they are interoperable with the wider archival ecosystem.
This article explains what's inside a PageCrawl WACZ, what the embedded signature does, and why we ship additional sidecar proofs alongside.
Inside the WACZ zip
A WACZ file is a zip archive with a defined internal structure:

archive/data.warc.gz, the WARC (Web ARChive) file containing the captured HTTP responses (HTML, images, scripts, stylesheets, linked PDFs, etc.) in their original byte form.
pages/pages.jsonl, a list of pages captured, one JSON object per line, with the URL, timestamp, and title.
datapackage.json, a manifest listing every file inside the archive along with its size, mime type, and SHA-256 hash. This is the canonical integrity manifest.
datapackage-digest.json, a SHA-256 hash of datapackage.json itself, plus an optional signedData block (WACZ Auth specification).

The hashes in datapackage.json chain into datapackage-digest.json, which is itself either signed by the WACZ Auth signedData block or simply stored alongside. Modifying any byte of any captured resource invalidates the manifest. The system is structurally tamper-evident.
The embedded signedData block (WACZ Auth)
When you enable WACZ capture on an Ultimate-plan page, the archive includes an embedded canonical signature following the WACZ Auth specification 0.1.0. The signature lives inside datapackage-digest.json and contains:

The cryptographic signature of datapackage.json's hash.
An RFC 3161 timestamp issued by a Trust Service Provider.
The signing service's domain certificate, proving its identity.

This is the WACZ-spec-compliant way to sign a WACZ archive. When a WACZ-aware tool (such as ReplayWeb.page) opens the archive, it reads the signedData block and renders an integrity badge if the signature validates. The badge tells a reviewer that the archive was signed by the named domain at the indicated time.
Sidecar proof files
Alongside the WACZ, PageCrawl retains additional proof files that don't fit inside the WACZ Auth spec:

archive.wacz.ots, OpenTimestamps proof, anchored to the Bitcoin blockchain.
archive.wacz.digicert.tsr, RFC 3161 timestamp from DigiCert (an Adobe Approved Trust List Trust Service Provider).
archive.wacz.sectigo.tsr, RFC 3161 timestamp from Sectigo (also an AATL TSP).
archive.wacz.qtsa.tsr (Custom plans), eIDAS qualified RFC 3161 timestamp from a Qualified Trust Service Provider.

The WACZ Auth spec only supports one embedded signature, so additional providers ship as sidecar files. Sidecars do not violate the WACZ format spec; they live in the same directory and are independent artefacts. Each sidecar is verifiable with public tooling (ots verify, openssl ts -reply -in) without touching the WACZ.
This dual approach gives the best of both worlds: spec-compliant embedded signature for WACZ-aware tooling, plus multi-provider redundancy for evidentiary depth.
How to read a WACZ
The simplest way to inspect a WACZ archive is to drag it into ReplayWeb.page. It renders the captured pages as the user originally saw them, including JavaScript-rendered content where applicable, plus the integrity badge from the embedded signature.
If you want to inspect the WACZ outside ReplayWeb.page, treat it as a regular zip archive. Standard zip tools can list and extract its contents. datapackage.json enumerates the captured resources and their hashes; pages/pages.jsonl enumerates the captured URLs.
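As an example of working with the zip directly, the sketch below lists the captured pages from pages/pages.jsonl. The key names ("url", "ts", "title") follow the WACZ specification as far as we are aware; treat them as assumptions and adjust if your archive differs.

```python
import io
import json
import zipfile


def list_pages(wacz_path):
    """Return (url, timestamp, title) for each page entry in pages/pages.jsonl."""
    pages = []
    with zipfile.ZipFile(wacz_path) as wacz:
        with wacz.open("pages/pages.jsonl") as fh:
            for line in io.TextIOWrapper(fh, encoding="utf-8"):
                if not line.strip():
                    continue
                entry = json.loads(line)
                if "url" in entry:  # skip any header line that carries no URL
                    pages.append((entry["url"], entry.get("ts"), entry.get("title")))
    return pages


# Example usage, assuming archive.wacz is in the current directory:
# for url, ts, title in list_pages("archive.wacz"):
#     print(ts, url, title)
```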
How to download
Each tracked change with an archive shows a download button in the PageCrawl interface. From the archive details panel you can also download the per-provider timestamp proofs and the underlying WARC file for ingestion into other archival systems.
Related articles

Verifying a PageCrawl Web Archive
Sharing archives publicly
Packaging a PageCrawl Audit Trail for a Regulator



Parse: Monitor Any Value with Plain English
2026-05-24T14:48:12+00:00
Parse: Monitor Any Value with Plain English
Parse is a tracked element that uses AI to read the page on every check and return one value you described in plain English. You write a sentence describing what to retrieve, and the AI returns the answer each time the page changes. There are no CSS selectors, no XPath, no manual element picking, and no scripts.
If you have ever wanted to monitor a value that "moves around" between page layouts, sits inside a paragraph of prose, or has to be derived from the content rather than scraped from a fixed spot, Parse is the mode for that job.
What Parse Actually Does
On each check, PageCrawl reads the page and uses AI to return the single value described by your prompt. That value is stored as the current value of the tracked element.
From that point on, Parse behaves like any other tracked element. The value is compared to the previous value, alerts fire when it changes, history is recorded, and the value is available in notifications, webhooks, exports, and reports.
You are not building a chatbot. You are asking for one piece of data, and the result is treated as plain text.
How to Set It Up

Open the page you want to monitor, or create a new monitor.
Add a tracked element and choose the Parse type from the element list.
In the prompt field, describe the value you want, including the exact format. (See the prompt-writing guidance further down.)
Optionally give the element a friendly label so notifications and history are easy to read (for example "Headline price" or "Next earnings date").
Save the monitor. The first check runs immediately and populates the initial value, which becomes the baseline.

That is the whole setup. There is no selector to maintain and no script to debug.
When Parse Is Useful
Parse shines whenever the value you care about is semantic rather than positional. Good examples:

A date hidden inside a paragraph ("Our next investor call is scheduled for March 5, 2026.")
The lowest of several prices on a page, or the price after a discount is applied
A name from an About / team / leadership page
A version number, build identifier, or release tag in release notes
A status word like "Open", "Sold out", "Beta", "Coming soon"
A count or score that appears in different positions depending on the layout
A field on a page that frequently gets redesigned, where a CSS selector would keep breaking
A value that the page expresses in different units or formats, where you want the AI to normalize it

If your prompt could be answered by a person glancing at the page in under five seconds, Parse will usually handle it well.
When Parse Is Not the Right Tool
Parse is the most expensive tracking mode, and it is not always the best choice. Prefer a dedicated tracking type when:

The value is a price you can see clearly. Use the Price tracked element. It auto-detects, normalizes, and handles availability for free.
The value is a clean block of text. Use Text or Full Page Text. They run without AI and don't burn AI credits.
You only want to know whether something exists on the page. Use Boolean or Availability.
You want a number that is visually prominent. Use Number or Rating. They are cheaper and deterministic.
You want to track every item in a feed or grid. Use the Feed tracking mode for structured item lists.

Parse is also the wrong tool when:

The information you want is only revealed by user interaction (clicking a tab, expanding a section). Add an action to expose the content first.
The value changes constantly for cosmetic reasons (timestamps, "as of now" counters, rotating banner text). You will get noisy alerts.

Writing a Good Prompt
The single biggest factor in getting stable, useful results is being precise about format. AI can phrase the same answer in many different ways, and any of those variations will look like a change.
Compare these:



Vague	Explicit
"the price"	"the listed price as a plain number with no currency symbol, e.g. 24.99"
"when's the next earnings"	"the next earnings call date in YYYY-MM-DD format"
"who is the CEO"	"the CEO's full legal name as printed on the page, no titles"
"is it in stock"	"exactly the word Yes if the product is in stock, otherwise the word No"
"the latest version"	"the latest released version number in semver format (e.g. 4.2.0)"



Rules of thumb:

State the unit. "USD", "GBP", "as a percentage", "in days".
State the format. "YYYY-MM-DD", "ISO 8601", "plain number", "uppercase".
State what to do if the value is missing. For example, "If the value is not present on the page, return the word UNKNOWN." This prevents the AI from inventing something.
Keep it to one value. Parse returns a single value per element. If you need three values, add three Parse elements.
Reference visible cues. "The price shown in the largest red text", "the date in the section titled Upcoming Events".

What Not to Do

Don't ask the AI to summarize a page. Parse is for extracting one specific value, not for generating prose.
Don't ask for "anything that changed". The change comparison is automatic; your job is only to describe the value to extract.
Don't request a list, table, or JSON object. Use multiple Parse elements instead.
Don't include private context the AI cannot see ("the price our sales rep quoted yesterday"). It only knows what is on the page right now.
Don't write multi-paragraph prompts. One or two sentences is plenty and usually more reliable.

Cost
Parse is the most expensive tracking mode on every plan. Use it for the values that really benefit from AI extraction, and leave routine price, text, and availability tracking to the cheaper dedicated modes.
Plan Recommendation
Parse is recommended on the Enterprise or Ultimate plan. Free and Standard plans include small AI allowances that are fine for testing the feature, but they are not sized for ongoing monitoring of many Parse elements at frequent intervals.
Troubleshooting

The value flips between two formats and triggers false alerts. Your prompt is not specific enough about format. Add the exact format you want.
The value is sometimes empty. Tell the AI what to return when the value is missing (for example "UNKNOWN") so the result is consistent.
The AI returns something that is not on the page. Tighten the prompt and reference a visible cue ("from the section titled Pricing", "the value labelled Total"). If the value really isn't on the page, Parse cannot find it.
Parse is more expensive than expected. Check whether the page changes frequently for cosmetic reasons. Consider switching to a cheaper tracking mode or reducing the check frequency.

Related Articles

Available tracked monitoring types
AI-powered change detection
Choosing the best AI model for website monitoring
Reduce false positives

Model	Notes
GPT-5 Mini ⭐	Default. Great balance of speed, quality, and cost.
GPT-5.2	Most capable OpenAI model. Best for complex pages.
GPT-5	Full GPT-5 model.
GPT-5 Nano	Fastest and cheapest. Good for simple pages.
O3	Reasoning model for complex analysis.
O4 Mini	Smaller reasoning model.
GPT-4.1 Mini	Previous generation, still reliable.
GPT-4.1	Previous generation, good for complex tasks.
GPT-4.1 Nano	Previous generation budget option.

Model	Notes
Gemini 3 Flash ⭐	Default. Latest generation with great speed and quality.
Gemini 3.1 Pro	Premium model, Google's most capable.
Gemini 3.1 Flash Lite	Budget option in the latest generation.
Gemini 2.5 Flash	Reliable previous generation model.
Gemini 2.5 Flash Lite	Very affordable previous generation option.
Gemini 2.5 Pro	Previous generation premium model.

Model	Notes
Claude Haiku 4.5 ⭐	Default. Fast, affordable, and accurate.
Claude Sonnet 4.6	Latest generation with excellent accuracy.
Claude Opus 4.6	Most capable Anthropic model. Premium pricing.
Claude Sonnet 4.5	Previous generation, strong all-rounder.
Claude Opus 4.5	Previous generation premium model.
Claude Haiku 3.5	Older generation budget option.

Content Type	Budget Option	Recommended	Premium
Blogs, News, Docs	GPT-5 Nano	GPT-5 Mini	-
E-commerce, Pricing	Gemini 2.5 Flash Lite	Gemini 3 Flash	Claude Haiku 4.5
Legal, ToS, Compliance	Claude Haiku 4.5	Claude Sonnet 4.6	GPT-5.2
Competitor Monitoring	Gemini 2.5 Flash Lite	GPT-5 Mini	Claude Haiku 4.5
API Docs, Changelogs	GPT-5 Nano	Gemini 3 Flash	-

Provider	Get Key At
OpenRouter	openrouter.ai > Settings > API Key
OpenAI	platform.openai.com > API Keys
Google Gemini	ai.google.dev > Get API Key
Anthropic	console.anthropic.com > API Keys

Provider	Data Usage	Best For
OpenAI/Anthropic	API data not used for training	Confidential content, legal docs
Google Gemini	Review Google's data policies	General monitoring
OpenRouter	Varies by underlying model. Enable Privacy Mode to restrict to non-training providers.	Flexible choice

Role	Manage Team	Manage Workspaces	Edit Pages	View Pages
Owner	Yes	Yes	Yes	Yes
Administrator	Yes	Yes	Yes	Yes
Standard User	No	No	Yes	Yes
Viewer	No	No	No	Yes

Type	What It Tracks
Full Page	Entire page text content
Text	Text content of a specific element (by CSS/XPath selector)
Number	Numeric values with configurable change thresholds
Price	Price values with currency detection
Availability	In-stock/out-of-stock status
Links	All outgoing links on the page
Visual	Visual screenshot comparison with diff percentage
HTML	Raw HTML structure of an element
Boolean	Presence or absence of an element
Feed/List	RSS, Atom, or other feed content
Rating	Star ratings or review scores
Reviews	Customer review text and metadata
JavaScript	Values extracted by running custom JavaScript
SEO Tags	Meta tags, Open Graph data, and structured data
PDF	Text content extracted from PDF files
Word	Text content extracted from Word documents
Excel	Data extracted from Excel spreadsheets
CSV	Data extracted from CSV files
PowerPoint	Text content extracted from PowerPoint presentations

	JavaScript Tracked Element	Custom JavaScript Action
Purpose	Extract and monitor a value	Manipulate the page before extraction
Return value	Captured and tracked for changes	Ignored
Error handling	Check fails if the code throws an error	Default engine: errors are silently ignored. Stealth mode: errors stop the action sequence
When it runs	During element extraction	Before element extraction (in action sequence)
Use case	"Get me this computed value"	"Set up the page so I can monitor it correctly"

Type	Description
Full Page	Tracks the entire visible page content
Text	Extracts and compares text content from a CSS/XPath selector
Number	Extracts a numeric value for threshold-based comparison
Price	Specialized number extraction that handles currency symbols and formatting
Availability	Detects in-stock/out-of-stock status from common patterns
Visual	Compares screenshots of a specific element for visual changes
HTML	Compares the raw HTML of a selected element
Boolean	Checks whether an element exists or is visible on the page
Links	Extracts and compares all links within a selected area
JavaScript	Evaluates a custom JavaScript expression and tracks the return value
Text (All Matches)	Extracts text from all elements matching a selector
Text (All Matches Sorted)	Same as above, but sorted alphabetically for order-independent comparison
HTML (All Matches)	Extracts HTML from all elements matching a selector
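
The difference between "Text (All Matches)" and "Text (All Matches Sorted)" can be sketched in a few lines of Python. This is only an illustration of order-independent comparison, not PageCrawl's actual implementation; the function name `changed` and the sample values are hypothetical.

```python
def changed(previous: list[str], current: list[str], order_independent: bool) -> bool:
    """Return True if two lists of matched texts should count as different."""
    if order_independent:
        # Sorting both sides first means a pure reordering is not a change.
        return sorted(previous) != sorted(current)
    return previous != current


before = ["Banana", "Apple", "Cherry"]
after = ["Apple", "Cherry", "Banana"]  # same items, different order

print(changed(before, after, order_independent=False))  # True
print(changed(before, after, order_independent=True))   # False
```

Sorting is useful when a page shuffles its items between loads but you only care whether an item was added, removed, or edited.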

Action	What It Does
Block cookie banners & ads	Automatically hide cookie consent banners and block ads
Hide website overlays & popups	Hide website overlays and popups
Remove dates	Replace dates with "[DATE REMOVED]" to prevent false positives
Remove element	Remove a specific element by CSS or XPath selector
Remove text	Remove elements containing specific text
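
To show why the "Remove dates" action prevents false positives, here is a minimal Python sketch. The date patterns below are assumptions chosen for illustration; the formats PageCrawl actually recognizes are not documented here and may differ.

```python
import re

# Hypothetical set of date formats, for illustration only.
DATE_PATTERN = re.compile(
    r"\b\d{4}-\d{2}-\d{2}\b"          # 2026-03-05
    r"|\b\d{1,2}/\d{1,2}/\d{2,4}\b"   # 3/5/2026
    r"|\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.? \d{1,2}, \d{4}\b"  # March 5, 2026
)


def remove_dates(text: str) -> str:
    """Replace every recognized date with a fixed placeholder."""
    return DATE_PATTERN.sub("[DATE REMOVED]", text)


monday = "Prices last updated 2026-03-02. Widget: $10"
tuesday = "Prices last updated 2026-03-03. Widget: $10"

print(monday == tuesday)                              # True difference in raw text: False
print(remove_dates(monday) == remove_dates(tuesday))  # True
```

Because both snapshots collapse to the same text once the date is replaced, a page that only updates its timestamp no longer registers as changed.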

Action	What It Does
Wait for text	Wait up to 15 seconds for specific text to appear on the page
Wait for text to disappear	Wait up to 15 seconds for specific text to disappear
Wait for element	Wait for an element (by XPath or CSS selector) to appear
Wait for redirect	Wait for the page to redirect to a new URL
Wait	Pause for a specified number of seconds

Action	What It Does
Click button	Click an element containing specific text
Click element	Click any element by CSS or XPath selector
Click at coordinates	Click at specific X/Y pixel coordinates
Hover	Hover over an element
Type text	Type text into an input field
Select option	Select an option from a dropdown
Submit form	Submit a form
Scroll to bottom	Scroll the page to the bottom (useful for lazy-loaded content)
Go back	Navigate back in browser history
Reveal hidden text	Make hidden text visible. Has two modes: "Expandable Sections Only" (expands collapsible sections and accordions) and "All invisible text" (reveals all hidden text on the page)

Action	What It Does
Disable JavaScript	Disable JavaScript before the page loads
Set cookie	Set or manage browser cookies
Execute JavaScript	Run custom JavaScript code on the page
Store Contents for Tracked Element	Store a tracked element's value at this point in the action sequence, useful when the element is only visible after a specific interaction
Handle CAPTCHA	Interact with CAPTCHA challenges

Plan	Pages per Website
Free	Up to 2,000
Standard	Up to 20,000
Enterprise	Up to 100,000
Ultimate	Up to 100,000

	Screenshot	Web Archive (WACZ)
What it captures	A flat image of the visible page	The complete page: HTML, CSS, JavaScript, images, fonts
Interactivity	None (static image)	Fully interactive: scroll, click links, hover over elements
Content below the fold	Only if full-page screenshot is enabled	Always included, the entire page is preserved
Dynamic content	Shows one visual state	Preserves interactive elements, dropdowns, tabs
File size	Small (typically under 1 MB)	Larger (includes all page assets)
Best for	Quick visual reference, visual diff comparison	Legal evidence, compliance records, full preservation

Tool	What It Does
Add page monitor	Create a new monitor with URL, tracking mode, frequency, and notifications
List monitors	Search and view monitors across all workspaces by URL, domain, or name
Get monitor details	See full configuration of a specific monitor including tracked elements and latest values. Supports batch requests
Get monitor history	Retrieve historical checks and detected changes with AI summaries. Supports batch requests
Get latest values	Quickly retrieve just the current values for one or more monitors (e.g., current price). Supports batch requests
Get check diff	View the actual text differences detected in a specific check
Trigger check	Trigger a one-off check on a monitor
Manage tags	List workspace tags, or add and remove tags from monitors
Mark changes seen	Mark detected changes as reviewed on one or all monitors
List templates	View available templates that can be applied when creating monitors
List workspaces	View all your teams and workspaces with their IDs
Update monitor defaults	View or update default settings for new monitors created via MCP

Category	Fields
Basic	id, title, status, event_type, changed_at, visual_diff, difference, human_difference, short_summary
Tracked Elements	content_type, elements (array of tracked element data)
Differences	markdown_difference, html_difference
Images	text_difference_image, page_screenshot_image
Page Info	page metadata, page_elements array
Content	contents, original (for extracted values)
Comparison	previous_check data
JSON	json, json_patch
AI	ai_summary, ai_priority_score

Match Mode	Case Sensitive	Whole Word	Example: keyword "assist"
Match any text (default)	No	No	Matches "assist", "Assist", "assistance", "ASSISTANT"
Match any text (case sensitive)	Yes	No	Matches "assist", "assistance" but not "Assist"
Match exact words only	No	Yes	Matches "assist", "ASSIST" but not "assistance"
Match exact words (case sensitive)	Yes	Yes	Matches only "assist" exactly
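
The four match modes are combinations of two switches, case sensitivity and whole-word matching. A rough Python sketch of that behavior, using regular expressions, could look like the following. The function `keyword_matches` is a hypothetical helper, and treating "whole word" as a regex word boundary is an assumption rather than a statement about how PageCrawl implements it.

```python
import re


def keyword_matches(keyword: str, text: str, case_sensitive: bool, whole_word: bool) -> bool:
    """Return True if `keyword` is found in `text` under the given match mode."""
    pattern = re.escape(keyword)
    if whole_word:
        # Assumption: "whole word" means bounded by regex word boundaries.
        pattern = rf"\b{pattern}\b"
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.search(pattern, text, flags) is not None


# Match any text (default): case-insensitive, partial words allowed.
print(keyword_matches("assist", "ASSISTANT", case_sensitive=False, whole_word=False))  # True
# Match exact words only: "assistance" is not the whole word "assist".
print(keyword_matches("assist", "assistance", case_sensitive=False, whole_word=True))  # False
# Match exact words (case sensitive): only "assist" itself matches.
print(keyword_matches("assist", "Assist", case_sensitive=True, whole_word=True))       # False
```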

Condition	Description	Example
Greater than	Triggers when the number exceeds the specified value	Value is 150, triggers when number > 150
Greater than or equals	Triggers when the number is at or above the specified value	Value is 150, triggers when number >= 150
Less than	Triggers when the number drops below the specified value	Value is 50, triggers when number < 50
Less than or equals	Triggers when the number is at or below the specified value	Value is 50, triggers when number <= 50
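
The threshold conditions above map directly onto the standard comparison operators. As a small illustration in Python (the `CONDITIONS` table and `triggers` function are hypothetical names, not part of PageCrawl):

```python
import operator

# Each condition name from the table, paired with the comparison it performs.
CONDITIONS = {
    "Greater than": operator.gt,
    "Greater than or equals": operator.ge,
    "Less than": operator.lt,
    "Less than or equals": operator.le,
}


def triggers(condition: str, number: float, value: float) -> bool:
    """Return True if the tracked `number` satisfies `condition` against `value`."""
    return CONDITIONS[condition](number, value)


print(triggers("Greater than", 150, 150))            # False: 150 is not above 150
print(triggers("Greater than or equals", 150, 150))  # True
print(triggers("Less than", 49.99, 50))              # True
```

The only difference between each strict condition and its "or equals" variant is whether a number exactly equal to the configured value triggers the alert.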

Condition	Description	Example
Increased or Decreased by at least x percent	Triggers when the number changes in either direction by at least x%.	Value is 10, x is 20%. Triggers when value becomes 12 or more, or 8 or less.
Increased or Decreased by at least x	Triggers when the number changes in either direction by at least x (absolute).	Value is 10, x is 5. Triggers when value becomes 15 or more, or 5 or less.
Increased by at least x percent	Triggers only when the number goes up by at least x%.	Value is 10, x is 20%. Triggers when value becomes 12 or more.
Increased by at least x	Triggers only when the number goes up by at least x (absolute).	Value is 10, x is 5. Triggers when value becomes 15 or more.
Decreased by at least x percent	Triggers only when the number goes down by at least x%.	Value is 10, x is 20%. Triggers when value becomes 8 or less.
Decreased by at least x	Triggers only when the number goes down by at least x (absolute).	Value is 10, x is 5. Triggers when value becomes 5 or less.
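
The six change conditions differ along two axes: direction (up, down, or either) and whether x is a percentage or an absolute amount. The following Python sketch reproduces the examples from the table. It assumes that percentages are measured relative to the previous value and that x is positive; the function and parameter names are hypothetical.

```python
def change_triggers(previous: float, current: float, x: float,
                    direction: str = "either", percent: bool = False) -> bool:
    """Return True if the move from `previous` to `current` is at least x.

    direction: "increase", "decrease", or "either".
    percent:   if True, x is a percentage of the previous value (an assumption).
    """
    delta = current - previous
    threshold = abs(previous) * x / 100 if percent else x
    if direction == "increase":
        return delta >= threshold
    if direction == "decrease":
        return -delta >= threshold
    if direction == "either":
        return abs(delta) >= threshold
    raise ValueError(f"unknown direction: {direction!r}")


# Value is 10, x is 20%: triggers at 12 or more, or at 8 or less.
print(change_triggers(10, 12, 20, percent=True))  # True
print(change_triggers(10, 11, 20, percent=True))  # False
# Value is 10, x is 5 (absolute), decrease only: triggers at 5 or less.
print(change_triggers(10, 5, 5, direction="decrease"))   # True
print(change_triggers(10, 15, 5, direction="decrease"))  # False
```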

Condition	Description
Feed item added	Triggers when a new item appears in the feed
Feed item removed	Triggers when an item is removed from the feed
Feed item changed	Triggers when an existing feed item's content is modified
Feed order changed	Triggers when the order of items in the feed changes
Feed price changed	Triggers when a price value within a feed item changes
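
As an illustration of how the first four feed conditions relate to each other, the Python sketch below compares two snapshots of a feed. It assumes every feed item has a stable identifier (such as a GUID or link), which is an assumption for this example rather than a description of how PageCrawl identifies items. The price condition is left out because it depends on how prices are extracted from each item.

```python
def feed_events(previous: list[tuple[str, str]], current: list[tuple[str, str]]) -> set[str]:
    """Compare two feed snapshots given as ordered (item_id, content) pairs."""
    prev, curr = dict(previous), dict(current)
    common = prev.keys() & curr.keys()
    events = set()
    if curr.keys() - prev.keys():
        events.add("Feed item added")
    if prev.keys() - curr.keys():
        events.add("Feed item removed")
    if any(prev[item_id] != curr[item_id] for item_id in common):
        events.add("Feed item changed")
    # Compare the relative order of the items present in both snapshots.
    prev_order = [item_id for item_id, _ in previous if item_id in common]
    curr_order = [item_id for item_id, _ in current if item_id in common]
    if prev_order != curr_order:
        events.add("Feed order changed")
    return events


old = [("a", "Post A"), ("b", "Post B"), ("c", "Post C")]
new = [("c", "Post C"), ("a", "Post A (edited)"), ("d", "Post D")]

print(sorted(feed_events(old, new)))
# ['Feed item added', 'Feed item changed', 'Feed item removed', 'Feed order changed']
```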

Condition	Description
Cheapest in group	Triggers when this monitor's price becomes the lowest in the comparison group
Most expensive in group	Triggers when this monitor's price becomes the highest in the comparison group
Price spread	Triggers based on the price difference between the cheapest and most expensive items in the group

Browser	Desktop	Mobile
Chrome	Yes	Yes (Android)
Firefox	Yes	Yes (Android)
Edge	Yes	-
Safari 16+	Yes (macOS)	Yes (iOS)

Channel	Setup	Speed	Best For
Web Push	None	Instant	Personal monitoring, time-sensitive changes
Email	None	Minutes	Searchable archive, batch review
Slack	Webhook URL	Instant	Team collaboration
Discord	Webhook URL	Instant	Community monitoring
Teams	Webhook URL	Instant	Enterprise environments
Telegram	Chat ID	Instant	Mobile-first users

Capability	Description
Side-by-side pricing	See all retailer prices for a product in one place via the Matched Pages panel
Comparison alerts	Get notified when a price becomes the cheapest, most expensive, or when the spread exceeds a threshold
Cross-retailer export	Download a spreadsheet with one row per product and columns per retailer
Smart suggestions	When linking monitors, PageCrawl suggests the most relevant candidates
Automatic grouping	Monitors are grouped automatically when product identifiers match
Reference labels	Manually group monitors using labels with a shared prefix
Google Sheets integration	Include comparison data and label-based columns in automated Google Sheets exports

Alert Type	When It Fires	Configuration
Cheapest	This monitor's price is the lowest in the group	No additional configuration needed
Most Expensive	This monitor's price is the highest in the group	No additional configuration needed
Price Spread	The gap between the lowest and highest price in the group exceeds a percentage	Set the spread threshold percentage
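
The three comparison alerts can be sketched in Python as follows. The prices are made up for illustration, and the spread formula is an assumption: this sketch measures the gap as a percentage of the lowest price, which may not be how PageCrawl calculates it. Prices are assumed to be positive.

```python
def comparison_alerts(prices: dict[str, float], monitor: str,
                      spread_threshold_pct: float) -> set[str]:
    """Return the alert types that would fire for `monitor` within its group."""
    lowest = min(prices.values())
    highest = max(prices.values())
    alerts = set()
    if prices[monitor] == lowest:
        alerts.add("Cheapest")
    if prices[monitor] == highest:
        alerts.add("Most Expensive")
    # Assumption: spread is the gap relative to the lowest price, in percent.
    spread_pct = (highest - lowest) / lowest * 100
    if spread_pct > spread_threshold_pct:
        alerts.add("Price Spread")
    return alerts


# Hypothetical prices for one product at three retailers.
group = {"Amazon": 900.0, "Best Buy": 1000.0, "Walmart": 950.0}

print(sorted(comparison_alerts(group, "Amazon", spread_threshold_pct=10)))
# ['Cheapest', 'Price Spread']
print(sorted(comparison_alerts(group, "Walmart", spread_threshold_pct=15)))
# []
```

In the first call the spread is about 11.1% (100 / 900), which exceeds the 10% threshold; in the second it stays below 15%, and Walmart is neither the cheapest nor the most expensive.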

Column	Description
Product	Product name from page metadata, or monitor name as fallback
GTIN	Global Trade Item Number if detected
SKU	Stock Keeping Unit if detected
Brand	Product brand if detected
[retailer domain]	One column per unique retailer domain, containing the current tracked value
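
The export layout, one row per product with one column per retailer domain, is essentially a pivot. A minimal Python sketch of that shape is shown below. The input structure, domain names, and values are hypothetical, and the GTIN, SKU, and Brand columns are omitted to keep the example short.

```python
import csv
import io


def export_rows(monitors: list[dict[str, str]]) -> str:
    """Pivot monitors into CSV text: one row per product, one column per domain."""
    domains = sorted({m["domain"] for m in monitors})
    rows: dict[str, dict[str, str]] = {}
    for m in monitors:
        row = rows.setdefault(m["product"], {"Product": m["product"]})
        row[m["domain"]] = m["value"]
    buffer = io.StringIO()
    # restval="" leaves the cell empty when a retailer does not list the product.
    writer = csv.DictWriter(buffer, fieldnames=["Product", *domains], restval="")
    writer.writeheader()
    writer.writerows(rows.values())
    return buffer.getvalue()


# Hypothetical monitors; the domains are placeholders.
monitors = [
    {"product": "Laptop X", "domain": "shop-a.example", "value": "899.00"},
    {"product": "Laptop X", "domain": "shop-b.example", "value": "949.00"},
    {"product": "Phone Y", "domain": "shop-b.example", "value": "499.00"},
]

print(export_rows(monitors))
```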

Monitor	Label
Laptop X on Amazon	`ref:LAPTOP-X-2024`
Laptop X on Best Buy	`ref:LAPTOP-X-2024`
Laptop X on Walmart	`ref:LAPTOP-X-2024`

Setting	Description
Prefix Columns	List of prefixes to expose as export/Google Sheets columns (max 10)
Comparison Prefix	The prefix used for product comparison grouping (default: `ref`)
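
To make label-based grouping concrete, here is a small Python sketch that groups monitors by a shared `ref:` label, using the monitor names from the example table above. The function `group_by_reference` and the input structure are hypothetical, and splitting a label at the first colon is an assumption about the label format.

```python
from collections import defaultdict


def group_by_reference(monitors: dict[str, list[str]], prefix: str = "ref") -> dict[str, list[str]]:
    """Group monitor names by the value of any label that starts with `prefix:`."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, labels in monitors.items():
        for label in labels:
            key, sep, value = label.partition(":")
            if sep and key == prefix:
                groups[value].append(name)
    return dict(groups)


monitors = {
    "Laptop X on Amazon": ["ref:LAPTOP-X-2024"],
    "Laptop X on Best Buy": ["ref:LAPTOP-X-2024"],
    "Laptop X on Walmart": ["ref:LAPTOP-X-2024"],
    "Unrelated page": ["news"],
}

print(group_by_reference(monitors))
# {'LAPTOP-X-2024': ['Laptop X on Amazon', 'Laptop X on Best Buy', 'Laptop X on Walmart']}
```

Monitors without a label under the comparison prefix are simply left out of every group.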