<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="/vendor/feed/atom.xsl" type="text/xsl"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en-US">
                        <id>https://pagecrawl.io/help/feed</id>
                                <link href="https://pagecrawl.io/help/feed" rel="self"></link>
                                <title><![CDATA[PageCrawl.io Help Center]]></title>
                    
                                <subtitle>PageCrawl.io Help Center RSS feed.</subtitle>
                                                    <updated>2026-06-21T12:06:10+00:00</updated>
                        <entry>
            <title><![CDATA[Cancel or Upgrade Account]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/account-settings/article/cancel-or-upgrade-account" />
            <id>https://pagecrawl.io/1</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Cancel or Upgrade Account</h1>
<h3>Changing plan or billing interval</h3>
<p>If you would like to change or upgrade your plan, just go to your <a href="/app/settings/subscription">Subscription settings</a> and choose a plan you want to switch to.</p>
<div class="kb-figure">
  <img src="/images/knowledge/settings-subscription.png" alt="Subscription settings with the Choose Your Plan section and a monthly or yearly billing toggle">
</div>
<p>Upgrades and downgrades are prorated: the unused time on your current plan is applied as a credit toward your next payment.</p>
<p>For example, suppose you subscribed to the $8/mo plan, used it for half a month, and then decided to upgrade to the $30/mo plan. When you upgrade, $4 is credited back, so the remaining half month on the $30/mo plan costs you only $11.</p>
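<p>The arithmetic in this example can be checked with a short sketch. It is a simplified model that assumes both the credit and the new charge are proportional to the unused fraction of the billing period; the function name is hypothetical, and the actual billing calculation may differ in rounding or in how it measures the remaining time.</p>

```python
from fractions import Fraction


def prorated_upgrade(old_monthly, new_monthly, fraction_remaining):
    """Return (credit, amount_due) for a mid-period plan change.

    Simplified model: both the credit for unused time on the old plan and the
    charge for the new plan scale linearly with the unused fraction of the period.
    """
    remaining = Fraction(fraction_remaining)
    credit = Fraction(old_monthly) * remaining      # unused time on the old plan
    new_charge = Fraction(new_monthly) * remaining  # rest of the period on the new plan
    return credit, new_charge - credit


credit, amount_due = prorated_upgrade(8, 30, Fraction(1, 2))
print(f"credit: ${credit}, amount due: ${amount_due}")  # credit: $4, amount due: $11
```

<p>Using <code>Fraction</code> instead of floats keeps the amounts exact, so values such as $4 and $11 do not pick up floating-point rounding errors.</p>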
<h3>Canceling or Suspending your account</h3>
<p>You can cancel your subscription by going to your <a href="/app/settings/subscription">Subscription settings</a> and clicking on the red <strong>"Downgrade to Free"</strong> button. This will open a multi-step confirmation modal where you can optionally provide feedback about why you are canceling. To complete the cancellation, you will need to type "CANCEL" and confirm.</p>
<p>Once confirmed, your subscription will not end immediately. You will retain full access to your paid features until the end of your current billing period (grace period). After that date, your account will automatically downgrade to the Free plan.</p>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[How to Change Email Address]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/account-settings/article/how-to-change-email-address" />
            <id>https://pagecrawl.io/2</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>How to Change Email Address</h1>
<p>Unfortunately, for security and to prevent service abuse, email addresses cannot be changed directly by users.</p>
<div class="kb-figure">
  <img src="/images/knowledge/account-your-account.png" alt="Your Account section in Account Settings showing the read-only email field with a note to contact support to change it">
</div>
<p>To change your email address, please contact support at <a href="mailto:help_me@pagecrawl.io">help_me@pagecrawl.io</a> from your originally registered email address. We will verify the information and get back to you as soon as possible.</p>
<p><em>Email address for 'Free Forever' plan users cannot be changed to prevent service abuse.</em></p>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Can I pay by PayPal?]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/subscription/article/can-i-pay-using-paypal" />
            <id>https://pagecrawl.io/3</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Can I pay by PayPal?</h1>
<p>Unfortunately, it is not yet possible to pay via PayPal.</p>
<p>We support subscription billing by credit/debit card, Apple Pay, and Google Pay for monthly and annual billing intervals.</p>]]>
            </summary>
                                    <updated>2026-05-06T14:16:21+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[How do I get invoices?]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/subscription/article/how-do-i-get-invoices" />
            <id>https://pagecrawl.io/4</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>How do I get invoices?</h1>
<p>You can find all your invoices in your <a href="/app/settings/subscription">Subscription settings</a>.</p>
<p>If you would like to receive invoices by email each month or year, enter your email address in the billing details section.</p>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Is it possible to pay by a bank transfer or purchase order?]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/subscription/article/is-it-possible-to-pay-by-bank-transfer" />
            <id>https://pagecrawl.io/5</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Is it possible to pay by a bank transfer or purchase order?</h1>
<p>We accept all major credit and debit cards for subscriptions.</p>
<p>For <strong>Ultimate plans paid annually</strong>, we also support:</p>
<ul>
<li>Bank transfers (wire/ACH)</li>
<li>Purchase orders (PO)</li>
<li>Invoicing</li>
</ul>
<p>If you would like to arrange an alternative payment method, please contact support at <a href="mailto:support@pagecrawl.io">support@pagecrawl.io</a>.</p>]]>
            </summary>
                                    <updated>2026-03-05T10:31:13+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Why does my card keep getting declined?]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/subscription/article/why-does-my-card-keep-getting-declined" />
            <id>https://pagecrawl.io/6</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Why does my card keep getting declined?</h1>
<p>The most common reasons for a failed transaction are insufficient funds, incorrect card details, and suspected fraud.</p>
<p>If a transaction fails, first check that the card details you entered are correct and that there are enough funds in your account to cover the purchase.</p>
<p>If the transaction keeps getting declined, try another card or contact your card issuer. In most cases, your card issuer can remove the block and allow the transaction to go through.</p>
<p>Common reasons for a payment failure:</p>
<ul>
<li>Insufficient funds</li>
<li>Your card has expired</li>
<li>Incorrectly entered information</li>
<li>Account flagged for fraud</li>
<li>Credit limit has been maxed out</li>
<li>Transaction blocked</li>
<li>Your card doesn't allow international transactions</li>
<li>Wrong billing address</li>
</ul>]]>
            </summary>
                                    <updated>2026-03-05T10:31:13+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Tracking Text Changes in PDF Files]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/file-tracking/article/can-pagecrawl-detect-changes-in-pdf" />
            <id>https://pagecrawl.io/8</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Tracking Text Changes in PDF Files</h1>
<p>PageCrawl can monitor PDF files hosted online and notify you when the text content changes. It extracts text from the PDF, compares it against the previous version, and highlights exactly what was added, removed, or modified.</p>
<h3>How It Works</h3>
<ol>
<li>PageCrawl downloads the PDF file at your configured check frequency</li>
<li>Text is extracted from the PDF</li>
<li>The extracted text is compared against the previous version</li>
<li>If changes are detected, you receive a notification with a diff showing exactly what changed</li>
</ol>
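<p>Steps 3 and 4 above can be illustrated with a minimal Python sketch. It assumes the text has already been extracted from both versions of the PDF (step 2, usually done with a PDF library, is omitted here) and uses the standard library's line-based <code>difflib</code> as a stand-in; it is not PageCrawl's actual diff engine, and the function names are hypothetical.</p>

```python
import difflib


def diff_pdf_text(previous_text: str, current_text: str) -> list[str]:
    """Return unified-diff lines comparing two versions of extracted PDF text."""
    return list(
        difflib.unified_diff(
            previous_text.splitlines(),
            current_text.splitlines(),
            fromfile="previous",
            tofile="current",
            lineterm="",  # lines from splitlines() have no trailing newline
        )
    )


def has_changed(previous_text: str, current_text: str) -> bool:
    # unified_diff yields nothing when the two texts are identical.
    return bool(diff_pdf_text(previous_text, current_text))


old = "Price: $10\nDelivery: 3 days"
new = "Price: $12\nDelivery: 3 days"
for line in diff_pdf_text(old, new):
    print(line)
```

<p>In the printed diff, the removed line is prefixed with <code>-</code>, the added line with <code>+</code>, and the unchanged line is kept as context.</p>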
<h3>Setup</h3>
<ol>
<li>Click <strong>Track New Page</strong></li>
<li>Paste the direct URL to the PDF file</li>
<li>PageCrawl automatically detects it as a PDF and shows the appropriate configuration options</li>
<li>Choose your check frequency and notification preferences</li>
<li>Save</li>
</ol>
<div class="kb-figure">
  <img src="/images/knowledge/create-page.png" alt="Track New Page screen where you paste the direct URL to a PDF file to start monitoring it">
</div>
<h3>Password-Protected PDFs</h3>
<p>PDFs behind login authentication are also supported. Configure an <a href="/help/features/article/can-i-track-password-protected-websites">authentication setup</a> first, then select it when adding the PDF to monitor.</p>
<h3>PDF vs File Checksum</h3>
<table>
<thead>
<tr>
<th>Method</th>
<th>What It Detects</th>
<th>Diff Available</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>PDF text tracking</strong></td>
<td>Text content changes (additions, deletions, edits)</td>
<td>Yes, line-by-line diff</td>
</tr>
<tr>
<td><strong>File checksum</strong></td>
<td>Any modification to the file (including metadata, images)</td>
<td>No, only detects that something changed</td>
</tr>
</tbody>
</table>
<p>Use PDF text tracking when you need to see exactly what text changed. Use <a href="/help/file-tracking/article/file-checksum-hash-monitoring">file checksum monitoring</a> when you need to detect any modification, including non-text changes.</p>
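<p>The difference between the two methods comes from what is compared. A checksum reduces the whole file to a single digest, so any changed byte (text, metadata, or an embedded image) produces a different digest, but the digest alone cannot tell you what changed. A minimal sketch using SHA-256 from Python's standard library, with hypothetical function names:</p>

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """SHA-256 digest of raw file bytes, as a hex string."""
    return hashlib.sha256(data).hexdigest()


def sha256_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def file_changed(previous_digest: str, current_digest: str) -> bool:
    # Any difference in the bytes, text or not, yields a different digest.
    return previous_digest != current_digest


before = sha256_hex(b"Price: $10")
after = sha256_hex(b"Price: $12")
print(file_changed(before, after))  # True: something changed, but not what
```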
<h3>Related Articles</h3>
<ul>
<li><a href="/help/file-tracking/article/file-checksum-hash-monitoring">File Checksum Monitoring</a> - Detect any file modification using SHA-256</li>
<li><a href="/help/tutorials/article/tracking-changes-in-pdf-files">Tracking PDF Files (Tutorial)</a> - Step-by-step PDF monitoring guide</li>
<li><a href="/help/file-tracking/article/track-changes-in-excel-files">Excel Spreadsheets</a> - Monitor Excel file changes</li>
</ul>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Send SMS message when website change is detected]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/notifications/article/does-pagecrawl-have-sms-notifications" />
            <id>https://pagecrawl.io/11</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Send SMS message when website change is detected</h1>
<div class="kb-figure">
  <img src="/images/blog/sms-message.webp" alt="sms message notifications">
</div>
<p>SMS messages can be useful for mission-critical applications, but to keep subscription costs down, we do not include native SMS notifications in our plans.</p>
<p>For personal use, we suggest Telegram Messenger as an alternative to SMS notifications. It is free of charge and only needs an Internet connection on your mobile phone, which you most likely already have and will need anyway to review what changed on your monitored page.</p>
<h3>Send SMS via Zapier Integration</h3>
<p>If you really need to receive change notifications by SMS, you can set up a <a href="https://zapier.com/apps/sms/integrations">Zapier integration</a> to send SMS messages. Zapier lets you connect our application to over 7,000 services, although this comes at an additional cost and there may be a limit on the number of SMS messages you can send each month.</p>
<h3>Other notification channels</h3>
<p>We also integrate with other notification channels; visit <a href="/help/integrations">PageCrawl.io Integrations</a> to learn more.</p>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[What is the difference between Enterprise Support and Standard Support?]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/subscription/article/difference-between-premium-and-standard-suport" />
            <id>https://pagecrawl.io/14</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>What is the difference between Enterprise Support and Standard Support?</h1>
<p>We aim to respond to all inquiries promptly, but when support volume is high, requests from Enterprise customers are prioritized over those from Standard customers. As a result, Enterprise customers get faster responses and can expect more in-depth help if they are not able to set up a page the way they want.</p>
<p>For technical support our response times are prioritized according to your subscription plan:</p>
<ul>
<li>Free Forever Plan: Technical support not offered</li>
<li>Standard Plan: Within 72 hours (excluding weekends)</li>
<li>Enterprise Plan: Within 24 hours (excluding weekends)</li>
<li>Ultimate Plan: Within 24 hours (excluding weekends)</li>
</ul>]]>
            </summary>
                                    <updated>2026-05-06T14:16:21+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Is there any limit to how many websites we can add to monitor?]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/subscription/article/is-there-limit-how-many-websites-i-can-add-to-monitor" />
            <id>https://pagecrawl.io/15</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Is there any limit to how many websites we can add to monitor?</h1>
<p>No. There is <strong>no limit on the number of distinct websites or domains</strong> you can monitor. Your plan limit is based on the number of <strong>pages</strong> you track, not how many sites they belong to. You can spread your pages across as many different websites as you like.</p>
<h3>What counts as "1 page"?</h3>
<p>A page is <strong>one tracked URL</strong> (one monitor). For example:</p>
<ul>
<li><code>example.com/pricing</code> is 1 page.</li>
<li><code>example.com/pricing</code> and <code>example.com/blog</code> are 2 pages, even though they're on the same website.</li>
<li>A product page and a competitor's product page on a different site are 2 pages.</li>
</ul>
<p>Tracking several elements on the same URL (for example a product's price, availability, and rating together) still counts as <strong>1 page</strong> — it's the URL that counts, not the number of tracked elements on it.</p>
<p>If you need to track more pages, you can upgrade your plan at any time.</p>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Is there a limit to the number of checks in the plan?]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/subscription/article/is-there-limit-of-checks-in-standard-plan" />
            <id>https://pagecrawl.io/16</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Is there a limit to the number of checks in the plan?</h1>
<p>You can see exactly how many checks you've used, your tracked-page count, and when your usage resets on the <strong>Usage</strong> page (<strong>Settings → Usage</strong>):</p>
<div class="kb-figure">
  <img src="/images/knowledge/usage-page.png" alt="Usage page showing tracked pages, checks this period, estimated checks, approximate checks per month, notify ratio, and when usage resets">
</div>
<p>The Standard plan includes 15,000 checks per month, and the Enterprise and Ultimate plans each include 100,000 checks per month. All paid plans can be purchased in multiples if you need more pages checked or more frequent checks.</p>
<h3>How many checks do I need?</h3>
<p>It depends on how many pages you want to track and how often you check them. <a href="/help/features/article/page-check-schedule">Adjusting your schedule</a> can also reduce the number of checks you need. You can start with the Standard plan and upgrade if you find that you need more.</p>
<p>A few rules of thumb:</p>
<ol>
<li>A page checked daily will require 30 checks each month.</li>
<li>A page checked every hour will require 720 checks each month.</li>
<li>A page checked every 5 minutes will require 8,640 checks each month.</li>
</ol>
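<p>The rules of thumb above follow from dividing the minutes in a 30-day month by the check interval. A minimal sketch, assuming a fixed 30-day month and a fixed interval per page (schedules that skip some hours or days will need fewer checks); the function names are hypothetical:</p>

```python
MINUTES_PER_DAY = 24 * 60
DAYS_PER_MONTH = 30  # the rules of thumb above assume a 30-day month
STANDARD_PLAN_CHECKS = 15_000


def checks_per_month(interval_minutes: int) -> int:
    """Monthly checks for one page checked every `interval_minutes` minutes."""
    if interval_minutes <= 0:
        raise ValueError("interval must be a positive number of minutes")
    return DAYS_PER_MONTH * MINUTES_PER_DAY // interval_minutes


def total_checks(intervals_minutes: list[int]) -> int:
    """Sum monthly checks across pages, one interval per page."""
    return sum(checks_per_month(m) for m in intervals_minutes)


# Ten pages checked hourly plus two pages checked every 5 minutes.
needed = total_checks([60] * 10 + [5] * 2)
print(needed, needed > STANDARD_PLAN_CHECKS)  # 24480 True
```

<p>In this example, ten hourly pages plus two pages checked every 5 minutes come to 24,480 checks a month, which exceeds the 15,000 checks included in the Standard plan.</p>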
<h3>Estimating based on current usage</h3>
<p>If your estimated number of checks for the current period exceeds your plan limit, you will see an alert. You can check your <a href="/app/settings/team/stats">usage statistics</a> to see your current estimate.</p>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[How to Delete My Account]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/account-settings/article/how-to-delete-my-account" />
            <id>https://pagecrawl.io/17</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>How to Delete My Account</h1>
<p><strong>Deletion of your account will result in loss of ALL data associated with it.</strong></p>
<p>To delete your account, go to <strong>Account Settings</strong>, scroll to the bottom of the page, press <strong>Permanently delete my account</strong>, and follow the instructions.</p>
<div class="kb-figure">
  <img src="/images/knowledge/account-deletion.png" alt="Account Deletion section at the bottom of Account Settings with the Permanently delete my account button">
</div>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Send Website Change Detection Notifications to Microsoft Teams channel]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/notifications/article/send-microsoft-teams-notification-when-changes-detected" />
            <id>https://pagecrawl.io/19</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Send Website Change Detection Notifications to Microsoft Teams channel</h1>
<div class="kb-figure">
  <img src="/images/blog/microsoftteams.jpeg" alt="microsoft teams change detection notifications">
</div>
<p>PageCrawl.io monitors websites for changes and sends instant notifications through your preferred channels. This guide walks you through connecting PageCrawl.io with Microsoft Teams to receive alerts directly in your Teams channels.</p>
<h2>What You'll Need</h2>
<p><strong>Before starting, ensure you have:</strong></p>
<ol>
<li>
<p><strong>A PageCrawl.io account</strong><br />
→ <a href="https://pagecrawl.io/app/auth/register">Sign up here</a> if you don't have one yet</p>
</li>
<li>
<p><strong>Microsoft 365 For Business subscription</strong><br />
Basic Teams plans don't support external webhooks; you need a Business plan.</p>
</li>
</ol>
<h2>Setting Up the Integration</h2>
<h3>Step 1: Create a Teams Webhook</h3>
<p><strong>1.1</strong> In your Teams channel, click the <strong>Workflows</strong> menu</p>
<div class="kb-figure">
  <img src="/images/blog/step-1-teams.png" alt="microsoft teams workflows webhook setup">
</div>
<p><strong>1.2</strong> Select <strong>"Post to a channel when a webhook request is received"</strong></p>
<div class="kb-figure">
  <img src="/images/blog/step-2-teams.png" alt="microsoft teams incoming webhook location">
</div>
<p><strong>1.3</strong> Click <strong>Next</strong> and name your workflow. Use a descriptive name such as "PageCrawl Website Monitoring".</p>
<div class="kb-figure">
  <img src="/images/blog/step-3-teams.png" alt="microsoft teams configure webhook">
</div>
<p><strong>1.4</strong> Copy the generated webhook URL</p>
<div class="kb-figure">
  <img src="/images/blog/step-4-teams.png" alt="Microsoft Teams workflows URL">
</div>
<h3>Step 2: Connect to PageCrawl.io</h3>
<p><strong>Choose your notification scope:</strong></p>
<p><strong>Option A: Monitor All Pages</strong><br />
→ Go to <strong>Settings</strong> &gt; <strong>Integrations</strong> and click <strong>Setup</strong> on the Microsoft Teams card<br />
→ Paste the Teams webhook URL<br />
→ Save changes</p>
<div class="kb-figure">
  <img src="/images/knowledge/notif-teams-setup.png" alt="Connect Microsoft Teams dialog in PageCrawl with the webhook URL field">
</div>
<p><strong>Option B: Monitor Specific Pages</strong><br />
→ Open settings for individual pages<br />
→ Add the Teams webhook URL<br />
→ Save changes</p>
<p><strong>Tip</strong>: You can set a default webhook for all pages and then override it on specific pages that need different handling.</p>
<p><strong>Not working?</strong> Check that:</p>
<ul>
<li>The webhook URL was copied correctly</li>
<li>Your Microsoft 365 plan supports webhooks</li>
<li>The monitored page has actually changed (notifications are only sent when a change is detected)</li>
</ul>
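<p>To rule out a bad copy of the webhook URL, you can post a test message to it yourself, independently of PageCrawl.io. The Python sketch below assumes the workflow was created from the "Post to a channel when a webhook request is received" template, which, to our understanding, expects an Adaptive Card wrapped in a <code>message</code> envelope; a customized workflow may expect a different body. The function names are hypothetical.</p>

```python
import json
import urllib.request


def build_teams_payload(text: str) -> dict:
    """Wrap plain text in an Adaptive Card inside a "message" envelope.

    Assumption: this is the body shape the standard "Post to a channel when a
    webhook request is received" workflow template accepts.
    """
    return {
        "type": "message",
        "attachments": [
            {
                "contentType": "application/vnd.microsoft.card.adaptive",
                "contentUrl": None,
                "content": {
                    "$schema": "http://adaptivecards.io/schemas/adaptive-card.json",
                    "type": "AdaptiveCard",
                    "version": "1.2",
                    "body": [{"type": "TextBlock", "text": text, "wrap": True}],
                },
            }
        ],
    }


def send_test_message(webhook_url: str, text: str) -> int:
    """POST the payload to the webhook URL and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url.strip(),
        data=json.dumps(build_teams_payload(text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

<p>Call <code>send_test_message(url, "Test from PageCrawl setup")</code> with the URL you copied in step 1.4. If the message appears in your channel, the URL is correct and the problem lies elsewhere.</p>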
<h2>More Notification Options</h2>
<h3>Other supported notification channels</h3>
<p>We support several other notification channels to suit different preferences:</p>
<ul>
<li><a href="/help/integrations/article/track-website-changes-integrate-with-telegram-notifications">Be notified about website changes via Telegram</a></li>
<li><a href="/help/integrations/article/track-website-changes-integrate-with-discord-notifications">Be notified about website changes via Discord</a></li>
<li><a href="/help/integrations/article/send-slack-notification-when-changes-detected">Be notified about website changes via Slack</a></li>
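<li><a href="/help/notifications/article/email-notifications">Be notified about website changes via Email</a></li>
<li><a href="/help/integrations/article/webhook-integration">Be notified about website changes via Webhook</a></li>
<li><a href="/help/integrations/article/pagecrawl-zapier-integration">Be notified about website changes via Zapier</a></li>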
</ul>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Send Website Change Detection Notifications to Discord channel]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/notifications/article/track-website-changes-integrate-with-discord-notifications" />
            <id>https://pagecrawl.io/20</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Send Website Change Detection Notifications to Discord channel</h1>
<div class="kb-figure">
  <img src="/images/blog/discord.png" alt="discord change detection notifications">
</div>
<p>PageCrawl allows you to track changes in websites and get notified instantly via your preferred method. In this article, we explain how to set up PageCrawl to receive notifications in Discord.</p>
<h2>Prerequisites</h2>
<p>You need a PageCrawl.io account. This works in both Free and Paid accounts. If you don't already have one, <a href="https://pagecrawl.io/app/auth/register">go here to register an account</a>.</p>
<h2>Retrieve Discord Webhook URL</h2>
<p>Follow the steps below to retrieve a Discord webhook URL.</p>
<h3>1. Go to your server and click "Edit Channel" on the channel you want to use (see below)</h3>
<div class="kb-figure">
  <img src="/images/blog/edit-discord.png" alt="discord edit channel">
</div>
<h3>2. Click on "Integrations" and press "New Webhook" button</h3>
<div class="kb-figure">
  <img src="/images/blog/integrations.png" alt="discord add new webhook">
</div>
<h3>3. Finally, click on "Copy Webhook URL"</h3>
<div class="kb-figure">
  <img src="/images/blog/new-webhook.png" alt="discord copy webhook link">
</div>
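<p>Before pasting the URL into PageCrawl.io, you can confirm that it works by posting a test message to it. A minimal Python sketch, assuming Discord's standard webhook endpoint, which accepts a JSON body with a <code>content</code> field; the function names and the <code>User-Agent</code> value are hypothetical:</p>

```python
import json
import urllib.request

# Prefixes used by Discord webhook URLs (discordapp.com is the older domain).
WEBHOOK_PREFIXES = (
    "https://discord.com/api/webhooks/",
    "https://discordapp.com/api/webhooks/",
)


def looks_like_discord_webhook(url: str) -> bool:
    """Cheap sanity check that a pasted URL has the usual webhook shape."""
    return url.strip().startswith(WEBHOOK_PREFIXES)


def build_payload(text: str) -> dict:
    # The webhook endpoint accepts a JSON body with a "content" field.
    return {"content": text}


def send_test_message(webhook_url: str, text: str) -> int:
    """POST a plain-text message to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url.strip(),
        data=json.dumps(build_payload(text)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # A custom User-Agent; some endpoints reject urllib's default one.
            "User-Agent": "webhook-test/1.0",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

<p>A successful call posts the message to your channel; Discord typically answers with status 204 when no response body is requested.</p>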
<h2>Set Webhook URL in PageCrawl.io</h2>
<p>Open <strong>Settings</strong> &gt; <strong>Integrations</strong>, click <strong>Setup</strong> on the Discord card, and paste the webhook URL.</p>
<div class="kb-figure">
  <img src="/images/knowledge/notif-discord-setup.png" alt="Connect Discord dialog in PageCrawl with the webhook URL field">
</div>
<p>If you would like to receive notifications for all tracked pages, simply paste the webhook URL into your <a href="/app/settings/workspace/notifications">user notification preferences</a>.</p>
<p>If you only want notifications for a single page in Discord, set this webhook URL in that page's settings.</p>
<h2>Troubleshooting</h2>
<p><strong>What if I can't edit the server?</strong>
Make sure the server owner has given you permission to edit the channel.</p>
<p><strong>I didn't receive a notification</strong>
Please wait for the page to change. We only send a notification when we detect a change.</p>
<p><strong>I receive too many notifications. What can I do?</strong>
You can set up notification rules so that you are notified only when, for example, text disappears or a number increases.</p>
<h3>Other supported notification channels</h3>
<p>We support several other notification channels to suit different preferences:</p>
<ul>
<li><a href="/help/integrations/article/track-website-changes-integrate-with-telegram-notifications">Be notified about website changes via Telegram</a></li>
<li><a href="/help/integrations/article/send-microsoft-teams-notification-when-changes-detected">Be notified about website changes via Microsoft Teams</a></li>
<li><a href="/help/integrations/article/send-slack-notification-when-changes-detected">Be notified about website changes via Slack</a></li>
<li><a href="/help/notifications/article/email-notifications">Be notified about website changes via Email</a></li>
<li><a href="/help/integrations/article/webhook-integration">Be notified about website changes via Webhook</a></li>
<li><a href="/help/integrations/article/pagecrawl-zapier-integration">Be notified about website changes via Zapier</a></li>
</ul>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Send Website Change Detection Notifications to Telegram group or channel]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/notifications/article/track-website-changes-integrate-with-telegram-notifications" />
            <id>https://pagecrawl.io/21</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Send Website Change Detection Notifications to Telegram group or channel</h1>
<div class="kb-figure">
  <img src="/images/blog/telegram.png" alt="telegram change detections">
</div>
<p>PageCrawl.io allows you to track changes in websites and get notified instantly via your preferred method. In this article, we explain how to set up PageCrawl to receive notifications in Telegram.</p>
<h2>Prerequisites</h2>
<p>You need a PageCrawl.io account. This works in both Free and Paid accounts. If you don't already have one, <a href="/app/auth/register">go here to register an account</a>.</p>
<h2>Retrieve Telegram Chat ID</h2>
<p>Follow the steps below to retrieve a Telegram Chat ID. You need this ID to receive notifications in a 1-to-1 chat, a channel, or a group conversation.</p>
<h3>Start a 1-to-1 conversation with @PageCrawlBot, invite it to a channel, or add it to a group conversation</h3>
<h4>1-to-1 conversation</h4>
<p>Simply start a conversation with <a href="https://t.me/PageCrawlBot">@PageCrawlBot</a> and you will receive instructions on how to configure it.</p>
<div class="kb-figure">
  <img src="/images/blog/telegram1.jpg" alt="telegram start conversation">
</div>
<h4>Include in a Channel or Group conversation</h4>
<p>Instructions for channels and groups are identical. To include the bot in a channel or group, invite <a href="https://t.me/PageCrawlBot">@PageCrawlBot</a> to it. You will likely also need to adjust the bot's permissions so that it can read and send messages. To get instructions on which code to enter in your PageCrawl.io settings, send a /start message to the bot: <code>@PageCrawlBot /start</code></p>
<div class="kb-figure">
  <img src="/images/blog/telegram2.jpg" alt="telegram bot setup">
</div>
<p>Keep in mind that channels and group conversations have a <strong>negative</strong> chat ID, while 1-to-1 conversations always have a positive chat ID.</p>
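<p>Because the sign of the chat ID tells you which kind of chat it belongs to, a dropped minus sign is an easy mistake to make when copying a group or channel ID. A minimal Python sketch of that check, based only on the sign rule above; the function name is hypothetical:</p>

```python
def chat_kind(raw_chat_id: str) -> str:
    """Classify a pasted Telegram chat ID by its sign."""
    chat_id = int(raw_chat_id.strip())  # ValueError if it is not a whole number
    if chat_id == 0:
        raise ValueError("0 is not a valid chat ID")
    # Channels and groups have negative IDs; 1-to-1 chats have positive IDs.
    return "private chat" if chat_id > 0 else "group or channel"


print(chat_kind("-1001234567890"))  # group or channel
print(chat_kind("123456789"))       # private chat
```

<p>If you meant to notify a group or channel but the check reports a private chat, the minus sign was probably lost when copying the ID.</p>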
<h2>Configure in PageCrawl.io</h2>
<p>Open <strong>Settings</strong> &gt; <strong>Integrations</strong>, click <strong>Setup</strong> on the Telegram card, and enter the Chat ID you obtained.</p>
<div class="kb-figure">
  <img src="/images/knowledge/notif-telegram-setup.png" alt="Connect Telegram dialog in PageCrawl with the Chat ID field">
</div>
<p>If you would like to receive notifications for all tracked pages, enter the Chat ID you obtained earlier in your <a href="/app/settings/workspace/notifications">user notification preferences</a>.</p>
<p>If you only want notifications for a single page in Telegram, set this Chat ID in that page's settings.</p>
<h2>Troubleshooting</h2>
<p><strong>What if the bot doesn't respond?</strong> 
Make sure you have started a chat with the bot and sent it at least one message first. The bot can only message you after you have initiated the conversation.</p>
<p><strong>I didn't receive a notification</strong>
Please wait for the page to change. We only send a notification when we detect a change.</p>
<p><strong>I receive too many notifications. What can I do?</strong>
You can set up notification rules so that you are notified only when, for example, text disappears or a number increases.</p>
<h3>Other supported notification channels</h3>
<p>We support several other notification channels to suit different preferences:</p>
<ul>
<li><a href="/help/integrations/article/send-microsoft-teams-notification-when-changes-detected">Be notified about website changes via Microsoft Teams</a></li>
<li><a href="/help/integrations/article/track-website-changes-integrate-with-discord-notifications">Be notified about website changes via Discord</a></li>
<li><a href="/help/integrations/article/send-slack-notification-when-changes-detected">Be notified about website changes via Slack</a></li>
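<li><a href="/help/notifications/article/email-notifications">Be notified about website changes via Email</a></li>
<li><a href="/help/integrations/article/webhook-integration">Be notified about website changes via Webhook</a></li>
<li><a href="/help/integrations/article/pagecrawl-zapier-integration">Be notified about website changes via Zapier</a></li>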
</ul>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Monitoring password-protected pages]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/features/article/can-i-track-password-protected-websites" />
            <id>https://pagecrawl.io/23</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Monitoring password-protected pages</h1>
<div class="kb-figure">
  <img src="/images/knowledge/website-authentication-setup.png" alt="Website Authentication form showing the login page URL, the username/password/login-button field selectors, and the Credentials section where the username and password are entered">
</div>
<p>You can track pages on websites that require login authentication. Please note that this feature is only available on paid plans.</p>
<h2>How It Works</h2>
<p>Monitoring password-protected pages is a two-step process:</p>
<ol>
<li><strong>Configure authentication</strong> - Set up your login credentials once</li>
<li><strong>Select when monitoring</strong> - Choose the configuration when adding a page to monitor</li>
</ol>
<h2>Step 1: Configure Authentication</h2>
<p>Before you can monitor password-protected pages, you need to set up an authentication configuration:</p>
<ol>
<li>Go to <a href="/app/settings/workspace/authentication">Authentication Settings</a></li>
<li>Click "Add Authentication Configuration"</li>
<li>Fill in the required details:<ul>
<li><strong>Name</strong> - A friendly name to identify this configuration (e.g., "My Company Portal")</li>
<li><strong>Login URL</strong> - The URL of the login page</li>
<li><strong>Username/Email</strong> - Your login credentials</li>
<li><strong>Password</strong> - Your password</li>
<li><strong>Form fields</strong> - CSS selectors for the username field, password field, and submit button</li>
</ul>
</li>
<li>Save the configuration</li>
</ol>
<p>You can create multiple authentication configurations for different websites.</p>
<h2>Step 2: Add a Page to Monitor</h2>
<p>Once your authentication is configured:</p>
<ol>
<li>Go to add a new page to monitor</li>
<li>Enter the URL of the password-protected page you want to track</li>
<li>If an authentication configuration exists for that website's domain, a <strong>"Login Authentication"</strong> option will appear</li>
<li>Select the appropriate authentication configuration from the dropdown</li>
<li>Complete the rest of the setup as usual</li>
</ol>
<p>The system automatically detects and shows only authentication configurations that match the domain of the URL you're monitoring. For example, if you're monitoring <code>https://app.example.com/dashboard</code>, it will show authentication configs set up for <code>example.com</code>.</p>
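<p>The domain matching can be pictured roughly as follows. This minimal Python sketch is based on our own assumptions and is not PageCrawl's implementation. It treats a configuration saved for a domain as matching that host and any of its subdomains. The function name <code>matches_auth_domain</code> is hypothetical.</p>

```python
from urllib.parse import urlparse


def matches_auth_domain(page_url: str, config_domain: str) -> bool:
    """Hypothetical check: does an auth config saved for `config_domain`
    apply to `page_url`? Matches the domain itself and its subdomains."""
    host = (urlparse(page_url).hostname or "").lower()
    domain = config_domain.lower().lstrip(".")
    # "app.example.com" matches "example.com"; "notexample.com" does not.
    return host == domain or host.endswith("." + domain)


print(matches_auth_domain("https://app.example.com/dashboard", "example.com"))  # True
```

<p>The check uses a leading dot (<code>".example.com"</code>), so an unrelated domain that merely ends with the same letters is not matched.</p>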
<h2>Can You Also Track Files Behind Login Authentication?</h2>
<p>If you want to track files such as PDFs, Excel spreadsheets, CSVs, or Word documents, you're in luck. These types of files can also be tracked, even if they are behind login authentication. Simply provide the link to the file and select the appropriate authentication configuration.</p>
<h2>Logins That Require a One-Time Code (2FA)</h2>
<p>If a login asks for a one-time code (a two-factor step) after the password, PageCrawl can complete it automatically. On the authentication configuration, turn on <strong>Reuse login session</strong>, then turn on <strong>Sign in with OTP</strong> and choose a <strong>Code source</strong>:</p>
<ul>
<li><strong>Authenticator app (TOTP):</strong> Paste the authenticator secret from the site's two-factor settings (the "enter this key manually" text shown under the QR code, or the full <code>otpauth://</code> link). PageCrawl generates the current code at sign-in, the same way an authenticator app does.</li>
<li><strong>Emailed code:</strong> Forward the login-code emails to the dedicated address PageCrawl generates for you, or set the site account's email to that address. PageCrawl reads the code and finishes signing in. Set "Only accept codes from" to the sender's email or domain so nothing else is mistaken for a code.</li>
</ul>
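<p>Authenticator-app codes follow the standard TOTP algorithm (RFC 6238). The algorithm computes an HMAC of the current 30-second time step, keyed with the shared secret, and truncates it to a short number. The Python sketch below shows that calculation with the common defaults (SHA-1, 6 digits, 30-second step). It accepts either a bare Base32 secret or an <code>otpauth://</code> link, but ignores the link's optional algorithm, digits, and period parameters. It illustrates the standard algorithm and is not PageCrawl's code. The function names are our own.</p>

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_secret(secret_or_uri: str) -> str:
    """Return the Base32 secret from a bare key or an otpauth:// link."""
    if secret_or_uri.strip().lower().startswith("otpauth://"):
        query = parse_qs(urlparse(secret_or_uri.strip()).query)
        return query["secret"][0]
    return secret_or_uri


def totp_code(secret_or_uri: str, at: Optional[float] = None,
              digits: int = 6, step: int = 30) -> str:
    """RFC 6238 TOTP using HMAC-SHA1, the usual authenticator-app default."""
    # Keys are often shown in groups, in lowercase, and without '=' padding.
    secret = extract_secret(secret_or_uri).replace(" ", "").upper()
    secret += "=" * (-len(secret) % 8)
    key = base64.b32decode(secret)
    # Number of whole time steps since the Unix epoch.
    counter = int((time.time() if at is None else at) // step)
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): 4 bytes starting at the low-nibble offset.
    offset = digest[-1] & 0x0F
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)
```

<p>Because the code depends only on the secret and the clock, it is available instantly. There is no message to wait for.</p>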
<p>Because the session is reused between checks, the code step only runs on the first sign-in or when the session expires, not on every check.</p>
<p><strong>Note:</strong> Emailed codes add a short wait to each sign-in while the forwarded message arrives. The Standard plan has a tighter per-check time budget, so a slow email can cause the check to time out before the code lands. For emailed codes, use the <strong>Enterprise</strong> or <strong>Ultimate</strong> plans, which allow longer checks. Authenticator-app codes are generated instantly with no wait, so they work on any paid plan, including Standard.</p>
<p>Authenticator apps and emailed codes are the supported two-factor methods. SMS text-message codes and physical security keys are not supported.</p>
<h2>HTTP Basic Authentication</h2>
<div class="kb-figure">
  <img src="/images/blog/http-basic.png" alt="http basic authentication setup">
</div>
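<p>HTTP Basic Authentication (RFC 7617) does not use a login form. Instead, the credentials are sent with every request in an <code>Authorization</code> header that holds the Base64-encoded <code>username:password</code> pair. The Python sketch below only illustrates how that header value is formed, using the example credentials from RFC 7617. <code>basic_auth_header</code> is our own name. You do not need to build this header yourself in PageCrawl.</p>

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization header value for HTTP Basic Authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    # Base64 is an encoding, not encryption: Basic auth should only be used over HTTPS.
    return f"Basic {token}"


print(basic_auth_header("Aladdin", "open sesame"))  # Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```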
<p>If the website uses "HTTP Basic Authentication" (the browser popup that asks for credentials), you can enter the credentials under "Advanced Settings" when setting up your monitored page. This is different from form-based login authentication.</p>]]>
            </summary>
                                    <updated>2026-06-21T12:06:10+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[PageCrawl API & Webhooks]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/features/article/does-pagecrawl-support-api" />
            <id>https://pagecrawl.io/24</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>PageCrawl API &amp; Webhooks</h1>
<p>PageCrawl provides three ways to integrate with external systems: a REST API, webhooks, and RSS feeds.</p>
<p><em>API and webhooks are available on paid plans.</em></p>
<h3>API</h3>
<p>The REST API lets you manage monitors programmatically, including creating pages, retrieving change history, and triggering checks. Find your API key in <strong>Settings</strong> &gt; <strong>API</strong>.</p>
<p>See the <a href="/help/features/article/api-webhooks-for-custom-integrations">API &amp; Webhooks guide</a> for endpoints and authentication details. For the full endpoint reference and schemas, see <a href="/developers">pagecrawl.io/developers</a>.</p>
<h3>Webhooks</h3>
<p>Webhooks send HTTP POST requests to your endpoint whenever a change is detected or an error occurs. Configure them in <strong>Settings</strong> &gt; <strong>Workspace</strong> &gt; <strong>Integrations</strong> &gt; <strong>Webhooks</strong>.</p>
<p>See the <a href="/help/integrations/article/webhook-integration">Webhook Integration guide</a> for setup, payload fields, and example payloads.</p>
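<p>On the receiving side, any HTTP server that accepts a POST request will do. The minimal sketch below uses only Python's standard library. It assumes the request body is a JSON object and simply logs which fields arrive. The actual field names are documented in the Webhook Integration guide and are not reproduced here. The port (8080) and all function and class names are placeholders.</p>

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_webhook_body(raw: bytes) -> dict:
    """Decode a JSON webhook body; raises ValueError on malformed input."""
    payload = json.loads(raw.decode("utf-8"))
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    return payload


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = parse_webhook_body(self.rfile.read(length))
        except ValueError:
            # JSONDecodeError and UnicodeDecodeError are both ValueError subclasses.
            self.send_response(400)
            self.end_headers()
            return
        # Replace with your own processing (store, forward, alert, ...).
        print("Webhook received with fields:", sorted(payload))
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Listen for webhook POSTs on the given port until interrupted."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

<p>Call <code>serve()</code> to start listening. In production, expose the endpoint over HTTPS, for example behind a reverse proxy.</p>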
<h3>RSS Feeds</h3>
<p>Access recent changes in Atom RSS format. Generate a public RSS URL for a single page or for all pages in the workspace.</p>
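<p>The feed is standard Atom XML, so any feed reader or XML library can consume it. As a minimal sketch, the Python standard library can list the entries. The sketch reads only the standard Atom <code>title</code>, <code>link</code>, and <code>updated</code> elements. Fetching the XML from your generated feed URL is left out, and <code>list_entries</code> is our own name.</p>

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def list_entries(feed_xml: str) -> List[Tuple[str, str, str]]:
    """Return (title, link, updated) for each entry in an Atom feed."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        link = entry.find("atom:link", ATOM_NS)
        entries.append((
            entry.findtext("atom:title", default="", namespaces=ATOM_NS).strip(),
            link.get("href", "") if link is not None else "",
            entry.findtext("atom:updated", default="", namespaces=ATOM_NS).strip(),
        ))
    return entries
```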
<p>See the <a href="/help/features/article/page-monitoring-rss-feeds">RSS Feeds guide</a> for setup instructions.</p>]]>
            </summary>
                                    <updated>2026-05-06T14:16:21+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Monitor Changes in CSV Files]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/file-tracking/article/track-changes-in-csv-files" />
            <id>https://pagecrawl.io/25</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Monitor Changes in CSV Files</h1>
<p>PageCrawl can monitor CSV (comma-separated values) files hosted online and notify you when their content changes. It retrieves the file, compares the data against the previous version, and shows exactly what rows or values were added, removed, or modified.</p>
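<p>To illustrate a row-level comparison, here is a minimal Python sketch. It treats each CSV row as a whole and reports rows that appear only in the old version or only in the new one. This is a simplification for illustration, not PageCrawl's actual diff. Because it counts rows as a multiset, duplicate rows are handled. However, a modified row shows up as one removal plus one addition.</p>

```python
import csv
import io
from collections import Counter
from typing import Dict, List, Tuple

Row = Tuple[str, ...]


def diff_csv_rows(old_text: str, new_text: str) -> Dict[str, List[Row]]:
    """Compare two CSV documents row by row, ignoring row order."""
    old_rows = Counter(tuple(r) for r in csv.reader(io.StringIO(old_text)))
    new_rows = Counter(tuple(r) for r in csv.reader(io.StringIO(new_text)))
    return {
        # Counter subtraction keeps only rows with a positive remaining count.
        "added": sorted((new_rows - old_rows).elements()),
        "removed": sorted((old_rows - new_rows).elements()),
    }
```

<p>If your file has a key column, such as an ID, you could match rows on that key instead. A changed value would then be reported as a modification rather than a removal plus an addition.</p>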
<h3>Setup</h3>
<ol>
<li>Click <strong>Track New Page</strong></li>
<li>Paste the direct URL to the CSV file</li>
<li>PageCrawl detects the file type and shows the appropriate configuration</li>
<li>Choose your check frequency and notification preferences</li>
<li>Save</li>
</ol>
<div class="kb-figure">
  <img src="/images/knowledge/create-page.png" alt="Track New Page screen where you paste the direct URL to a CSV file to start monitoring it">
</div>
<h3>Password-Protected Files</h3>
<p>CSV files behind login authentication are supported. Configure an <a href="/help/features/article/can-i-track-password-protected-websites">authentication setup</a> first, then select it when adding the file.</p>
<h3>Related Articles</h3>
<ul>
<li><a href="/help/file-tracking/article/track-changes-in-excel-files">Excel Spreadsheets</a> - Monitor Excel file changes</li>
<li><a href="/help/file-tracking/article/monitor-changes-in-google-sheets">Google Docs &amp; Sheets</a> - Monitor Google Sheets and Docs</li>
<li><a href="/help/file-tracking/article/file-checksum-hash-monitoring">File Checksum Monitoring</a> - Detect any file modification</li>
</ul>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Monitor Changes in Excel Spreadsheets (xls, xlsx, ods)]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/file-tracking/article/track-changes-in-excel-files" />
            <id>https://pagecrawl.io/26</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Monitor Changes in Excel Spreadsheets (xls, xlsx, ods)</h1>
<p>PageCrawl can monitor Excel files hosted online and notify you when their content changes. It extracts text and data from the spreadsheet, compares it against the previous version, and shows exactly what was added, removed, or modified.</p>
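<p>Once the values are extracted, the comparison can be thought of as a cell-by-cell diff. The Python sketch below assumes each version has already been read into a dictionary keyed by sheet name and cell reference. That reading step, for example with a spreadsheet library of your choice, is not shown. The sketch then classifies every cell as added, removed, or modified. It is an illustration, not PageCrawl's implementation.</p>

```python
from typing import Dict, Tuple

CellKey = Tuple[str, str]  # (sheet name, cell reference such as "B2")


def diff_cells(old: Dict[CellKey, object], new: Dict[CellKey, object]) -> Dict[str, dict]:
    """Split cell-level differences into added, removed, and modified cells."""
    return {
        "added": {key: new[key] for key in new.keys() - old.keys()},
        "removed": {key: old[key] for key in old.keys() - new.keys()},
        "modified": {
            key: (old[key], new[key])
            for key in set(old).intersection(new)
            if old[key] != new[key]
        },
    }
```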
<h3>Supported File Types</h3>
<p><strong>xls</strong>, <strong>xlsx</strong>, <strong>ods</strong></p>
<h3>Setup</h3>
<ol>
<li>Click <strong>Track New Page</strong></li>
<li>Paste the direct URL to the Excel file</li>
<li>PageCrawl detects the file type and shows the appropriate configuration</li>
<li>Choose your check frequency and notification preferences</li>
<li>Save</li>
</ol>
<div class="kb-figure">
  <img src="/images/knowledge/create-page.png" alt="Track New Page screen where you paste the direct URL to an Excel file to start monitoring it">
</div>
<h3>Password-Protected Files</h3>
<p>Excel files behind login authentication are supported. Configure an <a href="/help/features/article/can-i-track-password-protected-websites">authentication setup</a> first, then select it when adding the file.</p>
<h3>Related Articles</h3>
<ul>
<li><a href="/help/file-tracking/article/track-changes-in-csv-files">CSV Files</a> - Monitor CSV file changes</li>
<li><a href="/help/file-tracking/article/monitor-changes-in-google-sheets">Google Docs &amp; Sheets</a> - Monitor Google Sheets and Docs</li>
<li><a href="/help/file-tracking/article/file-checksum-hash-monitoring">File Checksum Monitoring</a> - Detect any file modification</li>
</ul>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Monitor Changes in PowerPoint Presentations]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/file-tracking/article/track-changes-in-powerpoint-files" />
            <id>https://pagecrawl.io/27</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Monitor Changes in PowerPoint Presentations</h1>
<p>PageCrawl can monitor PowerPoint presentations hosted online and notify you when their text content changes. It extracts text from the slides, compares it against the previous version, and shows exactly what was added, removed, or modified.</p>
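<p>A presentation is a sequence of slides, so it can help to think of changes per slide. The Python sketch below assumes the text of each slide has already been extracted into a list of strings (extraction is not shown). It returns the 1-based numbers of slides whose text was modified, added, or removed. This is a simplified illustration, not how PageCrawl presents its diff.</p>

```python
from itertools import zip_longest
from typing import Dict, List


def changed_slides(old: List[str], new: List[str]) -> Dict[str, List[int]]:
    """Compare slide texts by position; slide numbers are 1-based."""
    result: Dict[str, List[int]] = {"modified": [], "added": [], "removed": []}
    for number, (before, after) in enumerate(zip_longest(old, new), start=1):
        if before is None:
            result["added"].append(number)
        elif after is None:
            result["removed"].append(number)
        elif before.split() != after.split():  # ignore whitespace-only differences
            result["modified"].append(number)
    return result
```

<p>Comparing by position keeps the sketch short, but inserting a slide in the middle would mark every later slide as modified. A sequence diff such as <code>difflib.SequenceMatcher</code> avoids that problem.</p>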
<h3>Supported File Types</h3>
<p><strong>pptx</strong></p>
<h3>Setup</h3>
<ol>
<li>Click <strong>Track New Page</strong></li>
<li>Paste the direct URL to the PowerPoint file</li>
<li>PageCrawl detects the file type and shows the appropriate configuration</li>
<li>Choose your check frequency and notification preferences</li>
<li>Save</li>
</ol>
<div class="kb-figure">
  <img src="/images/knowledge/create-page.png" alt="Track New Page screen where you paste the direct URL to a PowerPoint file to start monitoring it">
</div>
<h3>Password-Protected Files</h3>
<p>PowerPoint files behind login authentication are supported. Configure an <a href="/help/features/article/can-i-track-password-protected-websites">authentication setup</a> first, then select it when adding the file.</p>
<h3>Related Articles</h3>
<ul>
<li><a href="/help/file-tracking/article/track-changes-in-word-files">Word Documents</a> - Monitor Word document changes</li>
<li><a href="/help/file-tracking/article/track-changes-in-excel-files">Excel Spreadsheets</a> - Monitor Excel file changes</li>
<li><a href="/help/file-tracking/article/file-checksum-hash-monitoring">File Checksum Monitoring</a> - Detect any file modification</li>
</ul>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Monitor Changes in Word Documents (doc, docx, odt)]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/file-tracking/article/track-changes-in-word-files" />
            <id>https://pagecrawl.io/28</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Monitor Changes in Word Documents (doc, docx, odt)</h1>
<p>PageCrawl can monitor Word documents hosted online and notify you when their text content changes. It extracts text from the document, compares it against the previous version, and shows exactly what was added, removed, or modified.</p>
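<p>After the text is extracted, the comparison itself is an ordinary line-based text diff. As a minimal sketch, Python's standard <code>difflib</code> module can produce an added/removed listing from two versions of a document's paragraphs. The extraction step is not shown, and this is not PageCrawl's own diff engine.</p>

```python
import difflib
from typing import List, Tuple


def paragraph_changes(old: List[str], new: List[str]) -> Tuple[List[str], List[str]]:
    """Return (added, removed) paragraphs between two versions of a document."""
    added, removed = [], []
    for line in difflib.ndiff(old, new):
        if line.startswith("+ "):
            added.append(line[2:])
        elif line.startswith("- "):
            removed.append(line[2:])
        # Lines starting with "  " (unchanged) or "? " (intraline hints) are skipped.
    return added, removed
```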
<h3>Supported File Types</h3>
<p><strong>doc</strong>, <strong>docx</strong>, <strong>odt</strong></p>
<h3>Setup</h3>
<ol>
<li>Click <strong>Track New Page</strong></li>
<li>Paste the direct URL to the Word document</li>
<li>PageCrawl detects the file type and shows the appropriate configuration</li>
<li>Choose your check frequency and notification preferences</li>
<li>Save</li>
</ol>
<div class="kb-figure">
  <img src="/images/knowledge/create-page.png" alt="Track New Page screen where you paste the direct URL to a Word document to start monitoring it">
</div>
<h3>Password-Protected Files</h3>
<p>Word files behind login authentication are supported. Configure an <a href="/help/features/article/can-i-track-password-protected-websites">authentication setup</a> first, then select it when adding the file.</p>
<h3>Related Articles</h3>
<ul>
<li><a href="/help/file-tracking/article/can-pagecrawl-detect-changes-in-pdf">PDF Changes</a> - Monitor PDF file changes</li>
<li><a href="/help/file-tracking/article/track-changes-in-powerpoint-files">PowerPoint Files</a> - Monitor PowerPoint presentations</li>
<li><a href="/help/file-tracking/article/file-checksum-hash-monitoring">File Checksum Monitoring</a> - Detect any file modification</li>
</ul>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Send Website Change Detection Notifications to Slack channel]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/notifications/article/send-slack-notification-when-changes-detected" />
            <id>https://pagecrawl.io/29</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Send Website Change Detection Notifications to Slack channel</h1>
<div class="kb-figure">
  <img src="/images/blog/slack-features.jpeg" alt="slack web change detection notifications">
</div>
<p>PageCrawl.io allows you to track changes in websites and get notified instantly via your preferred method. In this article, we explain how to set up PageCrawl to send notifications to Slack.</p>
<h2>Prerequisites</h2>
<ul>
<li>You need a PageCrawl.io account. If you don't already have one, <a href="https://pagecrawl.io/app/auth/register">go here to register an account</a> and set up the pages you want to track.</li>
<li>You need a Slack account.</li>
</ul>
<h2>Create Incoming Webhook Connector</h2>
<p>Follow the steps below to create a new Incoming Webhook connector.</p>
<h3>1. Install "Incoming Webhooks" integration in your Slack workspace</h3>
<p>Visit <a href="https://slack.com/apps/A0F7XDUAZ-incoming-webhooks">https://slack.com/apps/A0F7XDUAZ-incoming-webhooks</a> to enable "Incoming WebHooks" for your workspace.</p>
<p>Please note that this is a legacy custom integration, an outdated way for teams to integrate with Slack. You may create a <a href="https://api.slack.com/start">Slack app</a> instead, but setting up a Slack app takes significantly longer, so we suggest using the legacy integration.</p>
<h3>2. Click "Add to Slack" to continue</h3>
<p>Simply click "Add to Slack" button. You may be prompted to sign in to your Slack account.</p>
<div class="kb-figure">
  <img src="/images/blog/slack-incoming-webhook.png" alt="slack add incoming webhook">
</div>
<h3>3. Select a channel or create a new one.</h3>
<p>Select the Slack channel where messages from the PageCrawl.io bot should be sent, then press "Add Incoming Webhook integration".</p>
<div class="kb-figure">
  <img src="/images/blog/slack-post-to-channel.png" alt="slack select channel for incoming webhook">
</div>
<h3>4. Copy the "URL".</h3>
<p>Finally, you will receive a webhook URL. Copy it and paste it into the notification settings as described below.</p>
<div class="kb-figure">
  <img src="/images/blog/slack-final.png" alt="Slack copy incoming webhook url">
</div>
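<p>Optionally, you can confirm the webhook works before pasting it into PageCrawl. Slack incoming webhooks accept an HTTP POST whose JSON body contains a <code>text</code> field. A minimal Python sketch using only the standard library follows. Replace the placeholder URL with the one you copied. Keep the real URL private, since anyone who has it can post to your channel. The function names are our own.</p>

```python
import json
import urllib.request


def build_slack_test_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Prepare a POST to a Slack incoming webhook with a plain-text message."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return Slack's response body ("ok" on success)."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


# Placeholder URL: use the one Slack gave you in step 4.
# print(send(build_slack_test_request("https://hooks.slack.com/services/...", "Test message")))
```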
<h3>Set Webhook URL in PageCrawl.io</h3>
<p>Open <strong>Settings</strong> &gt; <strong>Integrations</strong>, click <strong>Setup</strong> on the Slack card, and paste the webhook URL. You can optionally set a custom sender name and icon.</p>
<div class="kb-figure">
  <img src="/images/knowledge/notif-slack-setup.png" alt="Connect Slack dialog in PageCrawl with the Webhook URL field and custom branding options">
</div>
<p>If you would like to receive notifications for all tracked pages, paste the webhook URL into your <a href="/app/settings/workspace/notifications">user notification preferences</a>.</p>
<p>If you only want Slack notifications for a single page, set this Webhook URL on that specific page.</p>
<h3>Troubleshooting</h3>
<p><strong>What if I can't install the app?</strong>
Make sure you have permission from the Slack workspace owner to install apps.</p>
<p><strong>I didn't receive a notification</strong>
Please wait for the page to change. We only send a notification when we detect a change.</p>
<p><strong>I receive too many notifications. What can I do?</strong>
You can set up notification rules so that you are notified only when, for example, text disappears or a number increases.</p>
<h3>Other supported notification channels</h3>
<p>We support more notification channels to suit everyone's preferences.</p>
<ul>
<li>
<p><a href="/help/integrations/article/track-website-changes-integrate-with-telegram-notifications">Be notified about website changes via Telegram</a></p>
</li>
<li>
<p><a href="/help/integrations/article/send-microsoft-teams-notification-when-changes-detected">Be notified about website changes via Microsoft Teams</a></p>
</li>
<li>
<p><a href="/help/integrations/article/track-website-changes-integrate-with-discord-notifications">Be notified about website changes via Discord</a></p>
</li>
<li>
<p>Be notified about website changes via Email</p>
</li>
<li>
<p>Be notified about website changes via Webhook</p>
</li>
<li>
<p>Be notified about website changes via Zapier</p>
</li>
</ul>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Blocking Cookies and Ads in Your Monitored Pages]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/reduce-false-positives/article/blocking-cookies-and-ads-track-changes" />
            <id>https://pagecrawl.io/30</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Blocking Cookies and Ads in Your Monitored Pages</h1>
<div class="kb-figure">
  <img src="/images/knowledge/settings-actions.png" alt="Actions section of the page editor with the Block cookie banners and ads action added">
</div>
<p>Monitored pages can sometimes trigger frequent false-positive notifications, often caused by cookie popups. To address this, we provide the "Block cookie banners &amp; ads" action, which handles the majority of cookie windows and blocks ads, minimizing unnecessary notifications. Below are some considerations and alternatives for optimizing your monitoring.</p>
<h3>The "Block Cookie Banners &amp; Ads" Action</h3>
<p>To reduce false positives, we highly recommend adding the "Block cookie banners &amp; ads" action to all tracked pages. It has proven remarkably effective, handling approximately 99% of cookie popups and preventing ad content from triggering notifications.</p>
<h3>Alternative approach</h3>
<p>Cookie popups might not be displayed when the tracked page is accessed from a location outside Europe. As an alternative, you can run checks from a different country to avoid cookie-related notifications.</p>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Excluding Dates in the Monitored Pages]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/reduce-false-positives/article/excluding-dates" />
            <id>https://pagecrawl.io/31</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Excluding Dates in the Monitored Pages</h1>
<div class="kb-figure">
  <img src="/images/knowledge/action-types-dropdown.png" alt="Action type dropdown in the page editor showing the Remove dates option under Block and Hide">
</div>
<p>You will often see text like "updated 1 month ago" or "last changed 1 hour ago" that updates continually on monitored pages. While this information might seem useful, it often leads to false-positive notifications.</p>
<h3>The "Remove dates" action</h3>
<p>To address this issue, we recommend applying the "Remove dates" action to your tracked page. This action detects date-related text and replaces it with a standardized [DATE REMOVED] tag.</p>
<h4>Supported Date Formats</h4>
<p>The "Remove dates" action handles a wide range of common date formats, including:</p>
<ul>
<li>30 min ago</li>
<li>1 day ago</li>
<li>19 August 2022</li>
<li>01-01-2020</li>
<li>Sat Aug 17 2020 18:40:39 GMT+0000 (GMT)</li>
<li>and many more...</li>
</ul>
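<p>As a rough illustration of this kind of normalization, the Python sketch below replaces the example formats listed above with <code>[DATE REMOVED]</code> using regular expressions. The patterns are our own and cover only these examples. PageCrawl's action recognizes more formats than this.</p>

```python
import re

MONTHS = (r"(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|"
          r"Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)")

DATE_PATTERNS = [
    # JavaScript-style timestamps: "Sat Aug 17 2020 18:40:39 GMT+0000 (GMT)"
    rf"\b(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun)\s+{MONTHS}\s+\d{{1,2}}\s+\d{{4}}\s+"
    r"\d{2}:\d{2}:\d{2}\s+GMT[+-]\d{4}(?:\s+\([^)]*\))?",
    # Relative times: "30 min ago", "1 day ago", "2 hours ago"
    r"\b\d+\s+(?:sec(?:ond)?|min(?:ute)?|hour|day|week|month|year)s?\s+ago\b",
    # Day, month name, year: "19 August 2022"
    rf"\b\d{{1,2}}\s+{MONTHS}\s+\d{{4}}\b",
    # Numeric dates: "01-01-2020", "1/1/2020", "01.01.2020"
    r"\b\d{1,2}[-/.]\d{1,2}[-/.]\d{4}\b",
]

DATE_RE = re.compile("|".join(DATE_PATTERNS), re.IGNORECASE)


def remove_dates(text: str) -> str:
    """Replace recognized date expressions with a fixed placeholder."""
    return DATE_RE.sub("[DATE REMOVED]", text)


print(remove_dates("Updated 30 min ago"))  # Updated [DATE REMOVED]
```

<p>Because both versions of the page are normalized the same way, a new timestamp no longer counts as a change, while other text (such as a price) is still compared.</p>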
<h3>The "Ignore numbers" filter</h3>
<p>Instead of replacing dates with [DATE REMOVED] placeholders, you can ignore all changes in numbers by adding the "Ignore numbers" filter to the "Conditions/Filters" section. Only use this if you are not interested in numeric changes.</p>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Excluding a Part of the Page from Triggering Notifications]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/reduce-false-positives/article/how-to-exclude-page-section" />
            <id>https://pagecrawl.io/32</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Excluding a Part of the Page from Triggering Notifications</h1>
<div class="kb-figure">
  <img src="/images/knowledge/action-remove-element.png" alt="Remove page element action in the page editor with a CSS or XPath selector field and visual selector button">
</div>
<p>In certain situations, you may want to exclude a specific section of the page to prevent false-positive notifications, especially when its content changes frequently. For instance, you might want to exclude a sidebar listing new blog posts or a Twitter feed at the bottom of the page.</p>
<p>When your tracked element type is "Full page", you can choose to track <strong>Everything on the page</strong> or <strong>Content only</strong>. If you choose <strong>Content only</strong>, text in the header, sidebar, and footer will not be tracked.</p>
<p>For more control over what is removed, we recommend using the "Remove page element" action to exclude sections you are not interested in. You can either use the visual selector to pick the area or add the selector manually. Below are a few suggested selectors you can use.</p>
<h3>Commonly Excluded Sections</h3>
<div class="kb-figure">
  <img src="/images/blog/remove-common.png" alt="Commonly Excluded Sections">
</div>
<p>Frequently, there are areas where tracking changes may not be of interest, including:</p>
<ul>
<li>Sidebars (commonly placed within <code>&lt;aside&gt;</code> HTML elements)</li>
<li>Footers (commonly placed within <code>&lt;footer&gt;</code> HTML elements)</li>
<li>Navigation menus (commonly placed within <code>&lt;nav&gt;</code> HTML elements)</li>
</ul>
<p>To exclude these elements, paste the following selector into the "CSS/XPath selector" field: <code>nav,aside,footer,.footer,header</code></p>
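<p>To get a feel for what the selector <code>nav,aside,footer,.footer,header</code> leaves behind, the Python sketch below approximates it with the standard-library <code>HTMLParser</code>. It collects page text while skipping anything inside <code>nav</code>, <code>aside</code>, <code>footer</code>, and <code>header</code> elements, or inside elements with the <code>footer</code> class. It is a simplified stand-in for a real CSS selector engine, not how PageCrawl applies the action. The class and function names are our own.</p>

```python
from html.parser import HTMLParser
from typing import List

EXCLUDED_TAGS = {"nav", "aside", "footer", "header"}
EXCLUDED_CLASSES = {"footer"}
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ExcludingTextParser(HTMLParser):
    """Collect text nodes, skipping everything inside excluded elements."""

    def __init__(self) -> None:
        super().__init__()
        self.open_elements: List[tuple] = []  # (tag, is_excluded) per open element
        self.text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements never get a closing tag
        classes = set((dict(attrs).get("class") or "").split())
        excluded = tag in EXCLUDED_TAGS or not classes.isdisjoint(EXCLUDED_CLASSES)
        self.open_elements.append((tag, excluded))

    def handle_endtag(self, tag):
        # Pop back to the matching start tag, tolerating unclosed children.
        for index in range(len(self.open_elements) - 1, -1, -1):
            if self.open_elements[index][0] == tag:
                del self.open_elements[index:]
                break

    def handle_data(self, data):
        inside_excluded = any(excluded for _, excluded in self.open_elements)
        if data.strip() and not inside_excluded:
            self.text.append(data.strip())


def visible_text(html: str) -> List[str]:
    """Return the text chunks that remain after excluding the sections."""
    parser = ExcludingTextParser()
    parser.feed(html)
    parser.close()
    return parser.text
```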
<h3>The Selector Didn't Work?</h3>
<p>Unfortunately, not all websites adhere to the content sectioning guidelines. In such cases, you may need to use the visual selector to identify the area or manually input the selector.</p>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Using Custom Proxies to Monitor Pages]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/features/article/custom-proxies" />
            <id>https://pagecrawl.io/33</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Using Custom Proxies to Monitor Pages</h1>
<p>PageCrawl provides built-in proxy locations and supports custom proxy servers for pages that require specific geographic access or have IP-based restrictions.</p>
<h3>Built-in Proxy Locations</h3>
<p>PageCrawl offers multiple proxy locations across North America, Europe, and the Middle East, plus a residential proxy option. Select a proxy location per page from the <strong>Location</strong> dropdown, or apply one to multiple pages via <a href="/help/features/article/bulk-edit-pages">Bulk Edit</a>. You can also choose <strong>Random</strong> to rotate between locations automatically.</p>
<div class="kb-figure">
  <img src="/images/knowledge/settings-crawling-preferences.png" alt="Crawling Preferences with the Location dropdown for choosing a proxy location per page">
</div>
<h3>Proxy Pools (Bring Your Own Proxies)</h3>
<p>When the built-in locations are not enough, bring your own proxies as a <strong>Proxy Pool</strong>: a reusable, named set of proxy servers you manage in one place and reference from any page or template.</p>
<p><strong>1. Create a pool.</strong> Go to <strong>Settings → Proxy Pools → Add New</strong>, give it a name, and paste your proxy servers, one per line:</p>
<pre><code>host:port
username:password@host:port</code></pre>
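<p>Each line follows one of the two formats above. If you keep your proxy list in a file, a quick check before pasting can catch typos. The Python sketch below is our own validator for those two formats only. It assumes plain HTTP proxies, does not contact them, and is not the parser PageCrawl uses.</p>

```python
from typing import NamedTuple, Optional


class ProxyEntry(NamedTuple):
    host: str
    port: int
    username: Optional[str] = None
    password: Optional[str] = None


def parse_proxy_line(line: str) -> ProxyEntry:
    """Parse 'host:port' or 'username:password@host:port'."""
    # rpartition keeps any '@' inside the password on the credentials side.
    credentials, _, address = line.strip().rpartition("@")
    host, sep, port_text = address.rpartition(":")
    if not sep or not host or not port_text.isdigit():
        raise ValueError(f"expected host:port, got {address!r}")
    port = int(port_text)
    if port not in range(1, 65536):
        raise ValueError(f"port out of range: {port}")
    if not credentials:
        return ProxyEntry(host, port)
    username, sep, password = credentials.partition(":")
    if not sep or not username:
        raise ValueError("credentials must be username:password")
    return ProxyEntry(host, port, username, password)
```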
<p><strong>2. Use it.</strong> Open the <strong>Location</strong> dropdown on a page (or a template, or via <a href="/help/features/article/bulk-edit-pages">Bulk Edit</a>) and pick your pool. PageCrawl rotates across the pool on each check and automatically rests proxies that are currently failing, so a few bad proxies do not take down your monitoring.</p>
<p><strong>3. Keep it healthy.</strong> Open a pool to see each proxy's recent success rate and status. You can <strong>disable</strong> or <strong>remove</strong> individual proxies that are failing; removed proxies are kept for reference so you have a record of what you took out. Add proxies at any time and every monitor using that pool picks them up immediately.</p>
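<p>The rotation-and-resting behavior described above can be pictured as a round-robin that skips proxies currently in a cool-down after a failure. The Python sketch below is a hypothetical model of that idea. The five-minute cool-down and the simple lifetime success rate are our own assumptions, not PageCrawl's actual rules.</p>

```python
import time
from typing import Dict, List, Optional


class ProxyRotator:
    """Hypothetical round-robin over a proxy pool that rests failing proxies."""

    def __init__(self, proxies: List[str], rest_seconds: float = 300.0) -> None:
        self.proxies = list(proxies)
        self.rest_seconds = rest_seconds  # assumed cool-down after a failure
        self.resting_until: Dict[str, float] = {}
        # Lifetime counters per proxy: [successes, attempts].
        self.stats: Dict[str, List[int]] = {p: [0, 0] for p in self.proxies}
        self._next = 0

    def pick(self, now: Optional[float] = None) -> Optional[str]:
        """Return the next proxy that is not resting, or None if all are resting."""
        now = time.monotonic() if now is None else now
        for _ in range(len(self.proxies)):
            proxy = self.proxies[self._next]
            self._next = (self._next + 1) % len(self.proxies)
            if now >= self.resting_until.get(proxy, 0.0):
                return proxy
        return None

    def report(self, proxy: str, ok: bool, now: Optional[float] = None) -> None:
        """Record a check result; a failure rests the proxy for rest_seconds."""
        now = time.monotonic() if now is None else now
        self.stats[proxy][1] += 1
        if ok:
            self.stats[proxy][0] += 1
            self.resting_until.pop(proxy, None)
        else:
            self.resting_until[proxy] = now + self.rest_seconds

    def success_rate(self, proxy: str) -> float:
        successes, attempts = self.stats[proxy]
        return successes / attempts if attempts else 0.0
```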
<table>
<thead>
<tr>
<th>Apply to</th>
<th>How</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>A single page</strong></td>
<td>Edit the page → set <strong>Location</strong> to your pool</td>
</tr>
<tr>
<td><strong>Multiple pages</strong></td>
<td>Select pages → <a href="/help/features/article/bulk-edit-pages">Bulk Edit</a> → <strong>Location</strong> → your pool</td>
</tr>
<tr>
<td><strong>A template</strong></td>
<td>Set the template's <strong>Location</strong> to your pool; pages using the template inherit it</td>
</tr>
</tbody>
</table>
<p>If you set the same proxy list on several monitors, PageCrawl links them to a single shared pool automatically, so you maintain the list once.</p>
<h3>Automatic Engine Switching</h3>
<p>When a page is blocked (timeout, 403, or 401), PageCrawl automatically switches to <a href="/help/features/article/what-is-real-browser-page-monitoring">Stealth mode</a> in addition to the proxy configuration. This combination resolves most access issues.</p>
<h3>Premium Residential Proxies</h3>
<p>For pages that require residential proxies, PageCrawl offers <a href="/help/features/article/residential-proxies">Premium Residential Proxies</a> with pay-as-you-go bandwidth starting at $10/GB. Purchase bandwidth in your account settings and select "Premium Residential" as the proxy location on your monitors. See the <a href="/help/features/article/residential-proxies">residential proxies guide</a> for details on pricing, geo-targeting, and setup.</p>
<h3>Choosing a Proxy Provider</h3>
<p>Most pages work fine without any proxy configuration. You only need a custom proxy if a website is actively blocking bots or restricting access by geographic location. Start without a proxy, and only set one up if you are seeing access errors (403, bot protection blocks, empty pages).</p>
<p>If the built-in proxy locations are not enough for your needs, you can use a third-party proxy provider. Here is what to look for and some popular options.</p>
<p><strong>Understanding bandwidth usage:</strong></p>
<p>Each page check downloads the full page without caching, so bandwidth adds up quickly. An average web page uses 2-3 MB per check. Heavier pages (news sites, e-commerce, image-heavy pages) can use 5-10 MB or more. For example, monitoring 50 pages every 30 minutes at 3 MB each would use roughly 7 GB per day, or around 216 GB per month. Because of this, avoid proxy providers that charge per GB of traffic. Those plans are designed for one-off scraping, not ongoing monitoring.</p>
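<p>The estimate above is simple multiplication, so you can redo it for your own setup. The small Python helper below is our own. It uses decimal units (1 GB = 1,000 MB) and a 30-day month, and it reproduces the example of 50 pages checked every 30 minutes at 3 MB per check.</p>

```python
def proxy_bandwidth_gb(pages: int, interval_minutes: float, mb_per_check: float,
                       days: int = 1) -> float:
    """Estimated proxy traffic in GB (decimal units: 1 GB = 1000 MB)."""
    checks_per_day = 24 * 60 / interval_minutes  # e.g. every 30 min = 48 checks/day
    return pages * checks_per_day * mb_per_check * days / 1000


print(proxy_bandwidth_gb(50, 30, 3))           # 7.2 (GB per day)
print(proxy_bandwidth_gb(50, 30, 3, days=30))  # 216.0 (GB per 30-day month)
```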
<p><strong>What to look for:</strong></p>
<ul>
<li><strong>Unlimited bandwidth</strong> - This is the most important factor. Look for plans priced per proxy/port or as a flat monthly rate, not per GB.</li>
<li><strong>Username/password authentication</strong> - PageCrawl connects to proxies dynamically, so IP-based allowlists will not work. Choose a provider that supports <code>username:password@host:port</code> authentication.</li>
<li><strong>Rotating IPs</strong> - Providers that rotate IPs automatically reduce the chance of being blocked over time.</li>
<li><strong>Geographic coverage</strong> - Pick a provider with servers in the regions your monitored pages target.</li>
<li><strong>HTTP/HTTPS support</strong> - PageCrawl requires standard HTTP proxies. SOCKS proxies are not supported.</li>
</ul>
<p><strong>Datacenter vs. residential proxies:</strong></p>
<p>Datacenter proxies with unlimited bandwidth are the most cost-effective option for monitoring. They work well for most websites. Residential proxies (using real ISP addresses) are only needed for sites with strict bot detection that blocks datacenter IPs. If you need residential proxies, look for providers that offer them with unlimited bandwidth or per-IP pricing rather than per-GB billing.</p>
<p><strong>Popular proxy providers that work with PageCrawl:</strong></p>
<table>
<thead>
<tr>
<th>Provider</th>
<th>Type</th>
<th>Pricing Model</th>
<th>Notes</th>
</tr>
</thead>
<tbody>
<tr>
<td><a href="https://www.webshare.io">Webshare</a></td>
<td>Datacenter, Residential</td>
<td>Per proxy, unlimited bandwidth</td>
<td>Free tier available, good for testing. Paid datacenter plans include unlimited bandwidth.</td>
</tr>
<tr>
<td><a href="https://iproyal.com">IPRoyal</a></td>
<td>Datacenter, Static residential</td>
<td>Per proxy (datacenter)</td>
<td>Datacenter proxies with unlimited traffic. Static residential proxies available per IP.</td>
</tr>
<tr>
<td><a href="https://www.proxy-cheap.com">Proxy-Cheap</a></td>
<td>Datacenter, Static residential</td>
<td>Per proxy, unlimited bandwidth</td>
<td>Budget-friendly static residential and datacenter proxies with no traffic limits.</td>
</tr>
<tr>
<td><a href="https://www.proxyrack.com">ProxyRack</a></td>
<td>Datacenter, Residential</td>
<td>Flat monthly rate</td>
<td>Unlimited bandwidth on most plans. Rotating and geo-targeted options.</td>
</tr>
</tbody>
</table>
<p>These are independent providers and not affiliated with PageCrawl. Prices and features may change.</p>
<p><strong>Not every provider works for every website.</strong> A proxy that works perfectly for one site may get blocked on another. This depends on the website's bot detection, the proxy provider's IP reputation, and the type of proxies used. Always test a provider against your specific pages before committing to a long-term plan. Most providers offer short trial periods or small starter plans for this purpose.</p>
<p><strong>Country-specific access:</strong> Some websites restrict content to visitors from a specific country (geo-blocking). Government portals, local news sites, and region-locked services often require an IP address from that country to load correctly. If you are monitoring pages like these, make sure the proxy provider offers proxies in the required country. Check the provider's location list before purchasing, as coverage varies significantly between providers, especially for smaller countries.</p>
<div style="background:#f5f7fa;border:1px solid #e6e9ef;border-radius:8px;padding:14px 18px;margin:20px 0;">
  <strong>Note:</strong> Most providers give you a proxy endpoint in the <code>username:password@host:port</code> format. Paste it into a Proxy Pool under Settings → Proxy Pools. If the provider offers rotating proxies through a single gateway endpoint, you only need to add one line.
</div>
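<p>If you have a long list of proxies to paste, it can help to check the format first. The sketch below splits each <code>username:password@host:port</code> line into its parts and flags lines that are missing credentials, a host, or a numeric port. It uses Python's standard <code>urllib.parse</code>. The function names are hypothetical, and PageCrawl's own validation may differ.</p>

```python
from urllib.parse import urlsplit


def parse_proxy_line(line: str) -> dict:
    """Split one 'username:password@host:port' entry into its parts.

    Raises ValueError if credentials, host, or port are missing or invalid.
    """
    # Prefix a scheme so urlsplit treats the entry as a network location.
    parts = urlsplit("http://" + line.strip())
    if not parts.username or parts.password is None:
        raise ValueError("missing username:password credentials")
    if not parts.hostname:
        raise ValueError("missing host")
    # .port raises ValueError on its own if the port is not a number.
    if parts.port is None:
        raise ValueError("missing port")
    return {"username": parts.username, "password": parts.password,
            "host": parts.hostname, "port": parts.port}


def is_valid_proxy_line(line: str) -> bool:
    """True if the line parses as username:password@host:port."""
    try:
        parse_proxy_line(line)
        return True
    except ValueError:
        return False


def parse_pool(text: str) -> list:
    """Parse a pasted pool (one proxy per line), skipping blank lines."""
    return [parse_proxy_line(line) for line in text.splitlines() if line.strip()]
```

<p>For example, <code>parse_proxy_line("alice:s3cret@proxy.example.com:8080")</code> returns the four fields with the port as the integer 8080. An entry such as <code>proxy.example.com:8080</code> fails because it has no credentials. The host names here are placeholders.</p>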
<h3>Avoiding Free Proxies</h3>
<p>Free proxy servers are unreliable, slow, and frequently stop working. Do not use them for pages where uptime matters. Use the built-in proxy locations, your own paid proxy service, or PageCrawl's <a href="/help/features/article/residential-proxies">Premium Residential Proxies</a> instead.</p>
<h3>Related Articles</h3>
<ul>
<li><a href="/help/features/article/what-is-real-browser-page-monitoring">Real Browser Mode</a> - Engine selection including Stealth mode</li>
<li><a href="/help/features/article/monitoring-pages-behind-cloudflare-bot-protection">Monitoring Pages Behind Bot Protection</a> - Handling bot-protected pages</li>
<li><a href="/help/features/article/bulk-edit-pages">Bulk Edit</a> - Apply proxy settings to multiple pages at once</li>
</ul>]]>
            </summary>
                                    <updated>2026-06-19T09:27:55+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Monitoring Pages That Show CAPTCHA]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/troubleshooting/article/bypass-captcha-tracked-pages" />
            <id>https://pagecrawl.io/34</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Monitoring Pages That Show CAPTCHA</h1>
<p>Some websites use CAPTCHA challenges to block automated access. PageCrawl offers an optional integration with <a href="https://2captcha.com">2Captcha</a>, a third-party CAPTCHA-solving service, that you can connect to handle these challenges on your behalf.</p>
<div style="background:#f5f7fa;border:1px solid #e6e9ef;border-radius:8px;padding:14px 18px;margin:20px 0;">
  <strong>Note:</strong> Available on Enterprise and Ultimate plans.
</div>
<h3>How It Works</h3>
<ol>
<li>PageCrawl encounters a CAPTCHA when checking a page</li>
<li>The challenge is forwarded to your connected 2Captcha account</li>
<li>2Captcha returns a solution</li>
<li>PageCrawl submits the solution and accesses the page content</li>
<li>The page is checked for changes as normal</li>
</ol>
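<p>Steps 2 and 3 involve a short wait while a worker solves the challenge, which is why checks take longer. The sketch below shows that wait-and-poll loop. It is not PageCrawl's implementation. The <code>fetch</code> callable is a hypothetical stand-in for a request to the solving service. Its replies are modeled on 2Captcha's classic API, which answers <code>CAPCHA_NOT_READY</code> (their spelling) until a solution is available. The interval and attempt limit are arbitrary values.</p>

```python
import time
from typing import Callable, Tuple

# Hypothetical fetch: returns (status, request) for a task id, shaped like
# 2Captcha's classic JSON replies: (0, "CAPCHA_NOT_READY") while pending,
# (1, "TOKEN") once solved, or (0, "ERROR_...") on failure.
Fetch = Callable[[str], Tuple[int, str]]


def wait_for_solution(fetch: Fetch, task_id: str, interval: float = 5.0,
                      max_attempts: int = 24, sleep=time.sleep) -> str:
    """Poll until the CAPTCHA is solved, then return the solution token."""
    for _ in range(max_attempts):
        status, request = fetch(task_id)
        if status == 1:
            # Solved: this token is what gets submitted back to the page.
            return request
        if request != "CAPCHA_NOT_READY":
            raise RuntimeError(f"solver error: {request}")
        # Still queued; this wait is the extra time added to each check.
        sleep(interval)
    raise TimeoutError(f"no solution for task {task_id} after {max_attempts} polls")
```

<p>The <code>sleep</code> parameter is injectable so the loop can be exercised without real delays or network access.</p>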
<h3>Setup</h3>
<ol>
<li>Create an account at <a href="https://2captcha.com">2captcha.com</a> and add funds</li>
<li>Copy your 2Captcha API key from the 2Captcha dashboard</li>
<li>In PageCrawl, go to <strong>Settings</strong> &gt; <strong>Workspace</strong> &gt; <strong>Integrations</strong></li>
<li>Enter your 2Captcha API key and save</li>
<li>CAPTCHAs encountered on your monitored pages will now be solved automatically</li>
</ol>
<div class="kb-figure">
  <img src="/images/knowledge/integ-2captcha-setup.png" alt="2Captcha Configuration dialog in PageCrawl with the 2Captcha API Key field">
</div>
<h3>Supported CAPTCHA Types</h3>
<p>2Captcha handles most common CAPTCHA types including reCAPTCHA v2, reCAPTCHA v3, hCaptcha, and image-based challenges.</p>
<h3>Cost</h3>
<p>CAPTCHA solving is billed by 2Captcha separately from your PageCrawl subscription. Typical costs are $1-3 per 1,000 CAPTCHAs solved. Check the pricing information on <a href="https://2captcha.com">2captcha.com</a> for current rates.</p>
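<p>Solving costs grow with check frequency, so a quick estimate can help you choose an interval. The sketch below uses the typical $1-3 per 1,000 range quoted above. It assumes a 30-day month and lets you set the share of checks that actually hit a CAPTCHA. The function name is hypothetical, and actual 2Captcha rates vary by CAPTCHA type.</p>

```python
def monthly_captcha_cost(checks_per_day: float, usd_per_1000: float,
                         captcha_share: float = 1.0, days: int = 30) -> float:
    """Estimate one page's monthly CAPTCHA-solving spend in USD.

    captcha_share is the fraction of checks that actually hit a CAPTCHA
    (1.0 means every check does).
    """
    solves = checks_per_day * days * captcha_share
    return solves * usd_per_1000 / 1000


if __name__ == "__main__":
    # Compare check frequencies at the low and high end of the typical rate.
    for label, per_day in [("every 15 min", 96), ("hourly", 24), ("daily", 1)]:
        low = monthly_captcha_cost(per_day, 1.0)
        high = monthly_captcha_cost(per_day, 3.0)
        print(f"{label}: ${low:.2f} to ${high:.2f} per month")
```

<p>Consider an hourly check that always shows a CAPTCHA. That is 720 solves a month, or roughly $0.72 to $2.16. The same page checked every 15 minutes costs four times as much.</p>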
<h3>Tips</h3>
<ul>
<li>Not all blocked pages use CAPTCHA. If you see 403 errors or bot protection challenges, try <a href="/help/troubleshooting/article/monitoring-pages-behind-cloudflare-bot-protection">Stealth mode</a> first</li>
<li>CAPTCHA solving adds a few seconds to each check while waiting for the solution</li>
<li>If a page always shows CAPTCHA, consider reducing check frequency to minimize costs</li>
</ul>
<h3>Related Articles</h3>
<ul>
<li><a href="/help/troubleshooting/article/monitoring-pages-behind-cloudflare-bot-protection">Bot Protection</a> - Handle bot-protected pages</li>
<li><a href="/help/troubleshooting/article/common-issues-with-page-not-loading">Page Loading Issues</a> - Common loading problems and solutions</li>
<li><a href="/help/features/article/custom-proxies">Custom Proxies</a> - Use proxy servers to avoid blocks</li>
</ul>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Common Problems and Solutions for Page Loading Issues]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/troubleshooting/article/common-issues-with-page-not-loading" />
            <id>https://pagecrawl.io/35</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Common Problems and Solutions for Page Loading Issues</h1>
<div class="kb-figure">
  <img src="/images/knowledge/action-types-dropdown.png" alt="Action type dropdown with Mark Check as Failed for handling pages that fail to load">
</div>
<p>There may be various reasons why a page fails to open. This guide describes the most common problems and suggests solutions to help you overcome these issues.</p>
<h3>Timeout</h3>
<p>A timeout occurs when the page takes too long to respond. This may be a temporary issue with the page, or the page may be loading very slowly. Timeout limits vary depending on your plan:</p>
<ul>
<li>Free plan: 45 seconds</li>
<li>Standard plan: 90 seconds</li>
<li>Enterprise plan: 180 seconds</li>
<li>Ultimate plan: 180 seconds</li>
</ul>
<p>If a page regularly times out, consider subscribing to a paid plan or upgrading to a plan with a longer timeout limit.</p>
<h3>Selector not found</h3>
<p>This error appears when the page has changed significantly and the element matching the configured XPath/CSS selector can no longer be found. Review the page and update the selector if needed.</p>
<h3>Page blocked</h3>
<p>Some pages may use site protection features to block scrapers and website tracking tools like PageCrawl.io. Different pages may use different blocking mechanisms, but here are the most common ones:</p>
<ul>
<li>
<p><strong>Access Restricted to Specific Countries:</strong> The page may be configured to allow visitors only from specific countries.</p>
<ul>
<li><strong>Solution</strong>: Choose a proxy location in a country that is not blocked. If none of the available locations works, consider purchasing a proxy service in the required country and <a href="/help/features/article/custom-proxies">configuring a custom proxy in PageCrawl.io</a>.</li>
</ul>
</li>
<li>
<p><strong>Proxy Location Blocked:</strong> The website may block the IP address of the proxy server PageCrawl is using.</p>
<ul>
<li><strong>Solution</strong>: Use a residential proxy to avoid being blocked. PageCrawl has a built-in Residential Proxy option available on Enterprise and Ultimate plans. Alternatively, you can purchase a third-party proxy service and <a href="/help/features/article/custom-proxies">configure a custom proxy in PageCrawl.io</a>.</li>
</ul>
</li>
</ul>
<h3>401 or 403 Error</h3>
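<p>This most often means the website refused access to the PageCrawl.io bot. Try <a href="/help/troubleshooting/article/monitoring-pages-behind-cloudflare-bot-protection">Stealth mode</a> first. If the page is still blocked, switch to a residential proxy.</p>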
<h3>404 Page Not Found</h3>
<p>In most cases, this error means the page is no longer available. Check the page URL and update it if the page has moved.</p>
<h3>500 Series Errors</h3>
<p>A 500, 502, 503, or 504 status means the website's server is unresponsive, overloaded, under maintenance, or experiencing other problems. When this happens, PageCrawl retries the check later.</p>
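<p>If you process check results in your own scripts, for example from a webhook, the status codes above can be mapped to a suggested next step. The sketch below follows this article's guidance. The function name and message wording are hypothetical and are not part of any PageCrawl API.</p>

```python
def suggested_action(status: int) -> str:
    """Map an HTTP status code to the next step suggested in this guide."""
    if status in (401, 403):
        return "blocked: try Stealth mode, then a residential proxy"
    if status == 404:
        return "not found: check and update the page URL"
    if 500 <= status <= 599:
        return "server error: PageCrawl retries the check later"
    if 200 <= status <= 299:
        return "ok"
    return "unexpected: contact support if it persists"
```

<p>The 5xx branch covers the whole 500 range rather than only the four codes listed, because all of them indicate a server-side problem.</p>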
<h3>Page Unreachable</h3>
<p>The page cannot be opened. In most cases, the website is down or is only reachable from a specific country.</p>
<h3>Site Protected with CAPTCHA</h3>
<p>Some websites use CAPTCHA challenges to keep bots out. To get past them, you can use a service like 2Captcha, which uses human workers to solve the CAPTCHA for you. PageCrawl.io offers an <a href="/help/troubleshooting/article/bypass-captcha-tracked-pages">integration with 2Captcha</a> on Enterprise and Ultimate plans. Sign up at 2Captcha, then add the API key it generates to PageCrawl.</p>
<h3>Unknown Error</h3>
<p>In some cases, an unexpected error may prevent the PageCrawl bot from checking the page for changes. If the error persists, please contact support so we can investigate and prioritize the issue.</p>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[How to Easily Find XPath or CSS Selector in Major Browsers]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/tutorials/article/find-xpath-css-selector-in-chrome" />
            <id>https://pagecrawl.io/36</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>How to Easily Find XPath or CSS Selector in Major Browsers</h1>
<div class="kb-figure">
  <img src="/images/blog/simple-create-specific.png" alt="Quick Setup showing Specific Area monitoring with selector input">
</div>
<p>If you encounter a problem with PageCrawl's visual selector and are unable to open the page you are trying to access, there is another option you can try. You can manually copy the selector by opening the desired page in your preferred web browser. This manual method may be more time-consuming, but it can provide a reliable solution if the visual selector is not functioning properly. Additionally, by manually copying the selector, you can have greater control over the elements on the page and the data you want to extract.</p>
<p>This guide will show you how to do it quickly and easily for Chrome, Firefox and Safari browsers. </p>
<h3>XPath vs CSS Selector: Which One to Choose for Tracking?</h3>
<p>When it comes to web scraping, finding the right element on a webpage can be a challenge. This is where expression languages like XPath and CSS Selector come in handy. These two powerful tools help you locate elements on a webpage, and choosing between them can be difficult.</p>
<h3>Understanding XPath and CSS Selector</h3>
<p>CSS Selectors are favored by many web developers as they are easy to learn if you already know CSS syntax. On the other hand, XPath Selectors offer greater power and flexibility, such as the ability to find elements that contain specific text. However, the learning curve for XPath can be steeper.</p>
<p>For those just starting out, CSS Selectors are the recommended choice due to their simplicity and versatility. Most advanced selectors can be written in CSS, making it a good option for web scraping beginners.</p>
<h3>Relative vs Absolute Selector</h3>
<p>When it comes to CSS and XPath Selectors, there are two ways to generate them: relative and absolute.</p>
<p><strong>Relative selectors are preferred in most cases, as they are less prone to break.</strong>  </p>
<p>Absolute selectors, on the other hand, can be useful when tracking a large number of pages where you are only interested in specific elements. However, even a slight change in page layout can break them. If an element is added to or removed from the page, the absolute XPath must be updated before tracking can continue.</p>
<p>Relative selectors tend to be short, while absolute selectors can be lengthy. Here are some examples of relative and absolute selectors for both CSS and XPath:</p>
<ul>
<li><strong>Relative XPath selector:</strong> <code>//h2[@id='get-started']//span[1]</code></li>
<li><strong>Relative CSS selector:</strong> <code>h2[id='get-started'] span</code></li>
<li><strong>Absolute XPath selector:</strong> <code>//*[@id="root"]/section/section/main/div/main/div/div[5]/div/div/div/div/div[1]/div/table/tbody/tr[20]</code></li>
<li><strong>Absolute CSS selector:</strong> <code>#root &gt; section &gt; section &gt; main &gt; div &gt; main &gt; div &gt; div:nth-child(6) &gt; div &gt; div &gt; div &gt; div &gt; div.ant-table-container &gt; div &gt; table &gt; tbody &gt; tr:nth-child(20)</code></li>
</ul>
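<p>The following sketch shows why position-based paths are fragile. It builds a tiny made-up page twice, once with an extra banner added above the content. It then evaluates a positional path and an id-anchored path against both versions. Python's <code>xml.etree.ElementTree</code> supports only a small subset of XPath, so it stands in for a browser here, and both paths are simplified versions of the examples above.</p>

```python
import xml.etree.ElementTree as ET


def build_page(with_banner: bool) -> ET.Element:
    """Build a tiny page tree; optionally add a banner div above the content."""
    html = ET.Element("html")
    main = ET.SubElement(ET.SubElement(html, "body"), "main")
    if with_banner:
        # Simulates the site adding one element to its layout.
        ET.SubElement(ET.SubElement(main, "div"), "p").text = "Announcement"
    ET.SubElement(ET.SubElement(main, "div"), "p").text = "Intro"
    h2 = ET.SubElement(ET.SubElement(main, "div"), "h2", id="get-started")
    ET.SubElement(h2, "span").text = "Get started"
    return html


# Position-based path, similar to a copied full (absolute) path.
ABSOLUTE = "./body/main/div[2]"
# Path anchored on a stable id attribute (relative selector).
RELATIVE = ".//h2[@id='get-started']//span"


def text_at(root: ET.Element, path: str):
    """Return the text inside the first element matching path, or None."""
    el = root.find(path)
    return None if el is None else "".join(el.itertext())


if __name__ == "__main__":
    for banner in (False, True):
        page = build_page(banner)
        print(banner, text_at(page, ABSOLUTE), "|", text_at(page, RELATIVE))
```

<p>After the banner is added, the absolute path does not fail. It silently points at the "Intro" block instead. The relative path still finds "Get started", which is why relative selectors are the safer default.</p>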
<h3>Let PageCrawl pick the selector for you</h3>
<p>The easiest way is to not write a selector by hand at all. PageCrawl can generate a stable selector for you:</p>
<ul>
<li><strong>Visual selector in the page editor</strong> - when adding or editing a monitor, use the point-and-click visual selector to choose an element directly on a preview of the page. PageCrawl generates the selector for you.</li>
<li><strong><a href="/help/features/article/browser-extension-guide">PageCrawl browser extension</a></strong> - click the element you want to track right on the live page and send it straight into a new monitor, with a sensible selector already filled in.</li>
</ul>
<p>You can also ask an AI assistant like ChatGPT or Claude to write a CSS or XPath selector for a given element, then paste it in and use the <strong>Test</strong> button to confirm it.</p>
<h3>Generating Selectors by Inspecting the Page</h3>
<p>If you prefer to find the selector yourself, you can inspect an element in your browser's DevTools. In most cases, you will get an absolute selector, and if the page content changes, you will need to update the selector.</p>
<p>Whether you use XPath or CSS selectors comes down to personal preference and experience. Both are powerful tools for locating elements on a webpage, and with a little practice, you can become an expert in no time!</p>
<h4>Steps to Find XPath or CSS Selector in Chrome Browser:</h4>
<ol>
<li>Right-click on the element on the web page you want to select.</li>
<li>Choose the "Inspect" option from the context menu.</li>
<li>The "Elements" tab in the DevTools window will open, displaying the HTML code for the page.</li>
<li>Right-click the HTML code for the element you want to select and choose "Copy" from the context menu.</li>
<li>Choose "Copy XPath" or "Copy selector" to copy the XPath or CSS selector for that element.</li>
<li>Chrome also offers "Copy full XPath", which copies the absolute XPath (see Relative vs Absolute Selector above).</li>
<li>Paste the generated selector into the PageCrawl.io Tracked Element field.</li>
</ol>
<h4>Steps to Find XPath or CSS Selector in Firefox Browser:</h4>
<ol>
<li>Right-click on the element on the web page you want to select.</li>
<li>Choose the "Inspect Element" option from the context menu.</li>
<li>The "Developer Tools" window will open, displaying the HTML code for the page.</li>
<li>Right-click the HTML code for the element you want to select, open the "Copy" submenu, and choose "XPath" or "CSS Selector". ("CSS Path" copies the full, absolute path.)</li>
<li>Paste the generated selector into the PageCrawl.io Tracked Element field.</li>
</ol>
<h4>Steps to Find XPath or CSS Selector in Safari Browser:</h4>
<ol>
<li>Enable the "Develop" menu: go to Safari &gt; Settings (called Preferences in older macOS versions) &gt; Advanced and check "Show Develop menu in menu bar" (labeled "Show features for web developers" in newer Safari versions).</li>
<li>Right-click on the element on the web page you want to select.</li>
<li>Choose the "Inspect Element" option from the context menu.</li>
<li>The "Web Inspector" will open, displaying the HTML code for the page.</li>
<li>Right-click on the HTML code for the element you want to select and choose "Copy XPath" or "Copy CSS Path" from the context menu.</li>
<li>Paste the generated selector into the PageCrawl.io Tracked Element field.</li>
</ol>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Dealing with Website Language Changes When Monitoring Page for Updates]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/troubleshooting/article/monitored-page-language-keeps-changing" />
            <id>https://pagecrawl.io/37</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Dealing with Website Language Changes When Monitoring Page for Updates</h1>
<p>If you are reading this, you may have experienced the frustration of the language suddenly switching on your monitored page, causing false positive notifications. Unfortunately, the language behavior of a website is determined by the site developers, and there are several approaches they may use. Some websites base their language on the browser or system settings, which is the best option. Others guess the language based on the country information from the IP address, while others use a mixed approach. There are two approaches you can use to prevent the page language from changing.</p>
<h3>Set the browser language</h3>
<p>To prevent language switching from occurring when monitoring a website for changes, there are a few things you can do. One option is to set the browser language to a specific language, such as "Danish", in "Advanced Settings" by editing the tracked page configuration in PageCrawl. However, keep in mind that some bot detection services can detect this, so use this option only if absolutely necessary.</p>
<p>If you are using "Stealth Mode", be aware that setting the browser language may cause issues. Overriding the browser language can be inconsistent with what bot detection services expect, which may trigger blocks.</p>
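<p>Before changing settings, it can help to check whether a site picks its language from the browser preference or from the visitor's IP address. Browsers send their preference in the <code>Accept-Language</code> request header. The sketch below assumes PageCrawl's browser language setting behaves like a regular browser locale. It requests a page with a fixed header and reports the <code>lang</code> attribute the site served. The function and class names are hypothetical, and not every site sets a <code>lang</code> attribute.</p>

```python
import urllib.request
from html.parser import HTMLParser


class LangSniffer(HTMLParser):
    """Record the lang attribute of the page's root html element, if any."""

    def __init__(self):
        super().__init__()
        self.lang = None

    def handle_starttag(self, tag, attrs):
        if tag == "html" and self.lang is None:
            self.lang = dict(attrs).get("lang")


def fetch_lang(url: str, accept_language: str = "da-DK,da;q=0.9"):
    """Fetch url while asking for a language; return the served lang, or None."""
    req = urllib.request.Request(url, headers={"Accept-Language": accept_language})
    with urllib.request.urlopen(req, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        body = resp.read().decode(charset, "replace")
    sniffer = LangSniffer()
    sniffer.feed(body)
    return sniffer.lang
```

<p>If the result follows the header you send, setting the browser language should be enough. If it stays the same regardless of the header, the site is probably using your IP location, and a fixed IP is the better option.</p>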
<h3>Use fixed IP address</h3>
<p>Another option is to access the website from a fixed IP address by setting Proxy Location to "Fixed IP". This ensures that the same IP is used every time the page is checked for changes. However, if that proxy location gets blocked, PageCrawl may not be able to bypass the block and will display a crawl error.</p>
<div class="kb-figure">
  <img src="/images/knowledge/settings-crawling-preferences.png" alt="Crawling Preferences with the Location dropdown where a fixed proxy location can be selected">
</div>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Monitoring SEO Tags for Changes]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/tutorials/article/tracking-seo-keywords-for-each-website-page" />
            <id>https://pagecrawl.io/38</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Monitoring SEO Tags for Changes</h1>
<div class="kb-figure">
  <img src="/images/knowledge/settings-tracked-elements.png" alt="Tracked Elements section where the SEO Tags tracking type is selected from the TYPE dropdown">
</div>
<p>Optimizing your website for search engines requires effective monitoring of SEO tags. PageCrawl makes it easy to track changes to title tags, meta descriptions, canonical URLs, robots directives, Open Graph tags, and headings.</p>
<h3>One-Click SEO Monitoring</h3>
<p>The fastest way to monitor SEO tags is with the built-in <strong>SEO Tags</strong> tracking mode:</p>
<ol>
<li>Log in to your PageCrawl account.</li>
<li>Click on <strong>Track New Page</strong> and enter the page URL.</li>
<li>Select <strong>SEO Tags</strong> as the tracking type.</li>
<li>Save and start monitoring.</li>
</ol>
<p>PageCrawl will automatically extract and track:</p>
<ul>
<li><strong>Title</strong> tag</li>
<li><strong>Meta description</strong></li>
<li><strong>Meta keywords</strong> (if present)</li>
<li><strong>Canonical URL</strong></li>
<li><strong>Robots</strong> directive</li>
<li><strong>H1</strong> heading</li>
<li><strong>Open Graph</strong> tags (og:title, og:description, og:image, og:url, og:type)</li>
</ul>
<p>When any of these fields change, you will see exactly which tag was modified and what the previous and new values are.</p>
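<p>Conceptually, this report is a field-by-field comparison of two snapshots. The sketch below is an illustration, not PageCrawl's actual code. It lists each tag whose value changed and uses <code>None</code> when a tag is missing on one side. The field names and sample values are made up.</p>

```python
def diff_tags(old: dict, new: dict) -> list:
    """List (tag, previous, current) for every SEO field that changed.

    A field missing from one snapshot is reported with None on that side.
    """
    changes = []
    for tag in sorted(old.keys() | new.keys()):
        before, after = old.get(tag), new.get(tag)
        if before != after:
            changes.append((tag, before, after))
    return changes


if __name__ == "__main__":
    before = {"title": "Pricing", "description": "Plans from 8 USD"}
    after = {"title": "Pricing", "description": "Plans from 10 USD",
             "og:image": "https://example.com/og.png"}
    for tag, old_value, new_value in diff_tags(before, after):
        print(f"{tag}: {old_value!r} -> {new_value!r}")
```
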
<p>If you plan to monitor SEO tags for multiple pages, we recommend creating a <strong>Template</strong> with the SEO Tags tracking type. This lets you reuse the configuration across many pages without repeating setup.</p>
<h3>Advanced: Track Individual SEO Elements</h3>
<p>If you only need to monitor specific SEO tags (rather than all of them), you can create individual tracked elements using CSS or XPath selectors.</p>
<p>Use "Text" for the following tracked elements:</p>
<p><strong>SEO</strong></p>
<ul>
<li>Title: <code>title</code></li>
<li>Meta description: <code>/html/head/meta[@name="description"]/@content</code></li>
<li>Meta keywords: <code>/html/head/meta[@name="keywords"]/@content</code></li>
<li>Meta robots: <code>/html/head/meta[@name="robots"]/@content</code></li>
<li>Meta viewport: <code>/html/head/meta[@name="viewport"]/@content</code></li>
</ul>
<p><strong>Social Media Tags</strong></p>
<ul>
<li>og:title: <code>/html/head/meta[@property="og:title"]/@content</code></li>
<li>og:type: <code>/html/head/meta[@property="og:type"]/@content</code></li>
<li>og:image: <code>/html/head/meta[@property="og:image"]/@content</code></li>
<li>og:url: <code>/html/head/meta[@property="og:url"]/@content</code></li>
</ul>
<p>Use "Text (all matches)" for the following tracked elements:</p>
<p><strong>Headings</strong></p>
<ul>
<li>h1 tags: <code>h1</code></li>
<li>h2 tags: <code>h2</code></li>
<li>h3 tags: <code>h3</code></li>
<li>h4 tags: <code>h4</code></li>
<li>h5 tags: <code>h5</code></li>
</ul>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Tracking (outgoing) links for changes]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/tutorials/article/tracking-link-on-page" />
            <id>https://pagecrawl.io/39</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Tracking (outgoing) links for changes</h1>
<div class="kb-figure">
  <img src="/images/knowledge/settings-tracked-elements.png" alt="Tracked Elements section where you set a Text (all matches) element with an XPath selector to capture links, with a Test button">
</div>
<p>You may also wish to track outgoing links that exist on the page. We suggest using "Text (all matches, sorted)" to capture links to other pages. You may use these selectors to track:</p>
<h3>All links on the page</h3>
<p>Use the following selector to track all links on a web page:</p>
<ul>
<li><code>//a/@href</code></li>
</ul>
<h3>External Links</h3>
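<p>To track only external links (links pointing to other websites), use this selector:</p>
<ul>
<li><code>//a[starts-with(@href,'http') and not(contains(@href,'not-this-website.com'))]/@href</code>
<em>Note: Substitute 'not-this-website.com' with your website's domain. The <code>starts-with(@href,'http')</code> check skips relative links such as "/about", which point to the same website but do not contain its domain.</em></li>
</ul>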
<h3>Links with Specific Keywords in the URL</h3>
<p>If you want to track links containing specific keywords in their URLs, use this selector as an example:</p>
<ul>
<li><code>//a[contains(@href,'/download/oursoftware_')]/@href</code></li>
</ul>
<h3>PDF Links</h3>
<p>To specifically track links leading to PDF documents, you can use this selector:</p>
<ul>
<li><code>//a[contains(@href,'.pdf')]/@href</code></li>
</ul>
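<p>To preview what these selectors would capture, you can apply the same filters to a sample of <code>href</code> values in Python. The sketch below resolves relative links against the page URL, then keeps external or PDF links and sorts them, similar to "Text (all matches, sorted)". The function names and example URLs are hypothetical. Note that it compares exact host names, so a subdomain such as <code>www.</code> counts as external.</p>

```python
from urllib.parse import urljoin, urlsplit


def absolute_links(hrefs, page_url):
    """Resolve relative hrefs (like /about) against the page URL."""
    return [urljoin(page_url, h) for h in hrefs]


def external_links(hrefs, page_url):
    """Sorted http(s) links whose host differs from the page's host."""
    site = urlsplit(page_url).hostname
    return sorted(u for u in absolute_links(hrefs, page_url)
                  if urlsplit(u).scheme in ("http", "https")
                  and urlsplit(u).hostname != site)


def pdf_links(hrefs, page_url):
    """Sorted links containing '.pdf', like contains(@href,'.pdf')."""
    return sorted(u for u in absolute_links(hrefs, page_url) if ".pdf" in u)


if __name__ == "__main__":
    hrefs = ["/about", "https://partner.example.org/x", "/files/report.pdf",
             "mailto:hi@example.com", "https://example.com/blog"]
    print(external_links(hrefs, "https://example.com/"))
    print(pdf_links(hrefs, "https://example.com/"))
```

<p>Sorting makes the captured list independent of the order in which links appear on the page.</p>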
<h3>Links with Text as Anchor Text</h3>
<ul>
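<li><code>//a[contains(text(),'Download')]/@href</code>
<em>Note: This selector is case-sensitive. For example, if the link text is actually "download", it will not be found. It also only matches text placed directly inside the link; use <code>.</code> instead of <code>text()</code> to include text inside nested tags.</em></li>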
</ul>
<h3>Links with Specific CSS Classes</h3>
<p>If you want to track links with specific CSS classes, use this selector:</p>
<ul>
<li><code>//a[contains(@class,'your-class-name')]/@href</code>
<em>Note: Substitute 'your-class-name' with the class name.</em></li>
</ul>
<h3>Links with Specific Attributes</h3>
<p>To track links with specific attributes (other than href), use this selector and replace "attribute-name" with the name of the attribute you're interested in:</p>
<ul>
<li><code>//a[@attribute-name='attribute-value']/@href</code>
<em>Note: You should substitute 'attribute-name' and 'attribute-value' with the relevant attribute values.</em></li>
</ul>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Monitoring Pages Behind Bot Protection]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/troubleshooting/article/monitoring-pages-behind-cloudflare-bot-protection" />
            <id>https://pagecrawl.io/40</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Monitoring Pages Behind Bot Protection</h1>
<div class="kb-figure">
  <img src="/images/knowledge/settings-power-user.png" alt="Engine selection in Power User settings, including Stealth mode for bot-protected sites">
</div>
<p>Over 30% of websites now use bot protection services like Cloudflare, Akamai, and similar tools that block automated access. This means your monitored pages can stop returning data without warning.</p>
<p>PageCrawl provides multiple layers of protection to keep your monitors working, most of which happen automatically.</p>
<h3>What Happens Automatically</h3>
<p>PageCrawl handles most bot protection automatically. When a check fails, PageCrawl detects the block and adjusts its approach on the next attempt, including automatic retries.</p>
<p>For most pages, you do not need to configure anything. The steps below are only needed if automatic handling does not resolve the issue.</p>
<h3>How Do I Know If My Page Is Blocked?</h3>
<p>PageCrawl will show a warning on the page if it detects a block. You may also notice that the captured content is empty, shows an error code (403, 401), or looks different from what you see when you visit the page yourself.</p>
<h3>Troubleshooting Guide</h3>
<p>Note: The settings below require <strong>Advanced</strong> mode. To enable it, click <strong>Edit</strong> on any page and toggle <strong>Advanced</strong> at the bottom of the form.</p>
<p>Follow these steps in order. After each step, wait for the check to complete before moving on.</p>
<h4>Step 1: Enable Stealth Mode</h4>
<p>This is the first thing to try and resolves most blocking issues.</p>
<ol>
<li>Open the blocked page in PageCrawl</li>
<li>Click <strong>Edit</strong></li>
<li>Scroll down and enable <strong>Advanced</strong> mode</li>
<li>Change <strong>Engine</strong> from "Default" to <strong>Stealth</strong></li>
<li>Click <strong>Save</strong> - a check will trigger automatically</li>
<li>Wait for the check to complete and review the result</li>
</ol>
<p>If the content now loads correctly, you are done. Stealth mode will be used for all future checks on this page.</p>
<h4>Step 2: Change Proxy Location</h4>
<p>If Stealth mode alone does not work, the site may be blocking the specific IP address or region.</p>
<ol>
<li>Open the page and click <strong>Edit</strong></li>
<li>Under <strong>Proxy Location</strong>, select <strong>Random</strong></li>
<li>Click <strong>Save</strong> - a check will trigger automatically</li>
</ol>
<p>Random proxy rotation means each check comes from a different IP address, making IP-based blocking ineffective.</p>
<p>You can also try specific locations (London, New York, San Francisco, Toronto, Frankfurt) if you know the site serves content differently by region.</p>
<h4>Step 3: Use Residential Proxies</h4>
<p>For sites with the strictest protections, residential proxies are the most effective option. These route requests through real consumer internet connections, making them virtually indistinguishable from regular visitors.</p>
<ol>
<li>Open the page and click <strong>Edit</strong></li>
<li>Under <strong>Proxy Location</strong>, select <strong>Residential</strong></li>
<li>Select a <strong>country</strong> for the residential proxy</li>
<li>Click <strong>Save</strong> - a check will trigger automatically</li>
</ol>
<p>Residential proxy traffic is available as an add-on. You can <a href="/residential-proxies">purchase residential proxy traffic</a> directly from your PageCrawl account.</p>
<p>Note: Residential proxies consume traffic from your purchased balance. Each check uses a small amount of traffic depending on the page size.</p>
<h4>Step 4: Use Your Own Proxies</h4>
<p>If none of the built-in options work, bring your own proxies as a Proxy Pool.</p>
<ol>
<li>Go to <strong>Settings → Proxy Pools → Add New</strong> and paste your proxy servers (format: <code>user:password@host:port</code>, one per line)</li>
<li>Open the page, click <strong>Edit</strong>, and set the <strong>Location</strong> to your pool</li>
<li>Click <strong>Save</strong> and trigger a manual check</li>
</ol>
<p>This is useful when you need a proxy from a specific country or provider, or when you already have a proxy subscription. See <a href="/help/features/article/custom-proxies">Custom Proxies</a> for more details.</p>
<h3>Quick Reference</h3>
<table>
<thead>
<tr>
<th>Solution</th>
<th>How to Enable</th>
<th>When to Use</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Stealth mode</strong></td>
<td>Edit &gt; Advanced &gt; Engine: Stealth</td>
<td>First thing to try for any blocked page</td>
</tr>
<tr>
<td><strong>Proxy rotation</strong></td>
<td>Edit &gt; Proxy: Random</td>
<td>When a specific IP is blocked</td>
</tr>
<tr>
<td><strong>Residential proxy</strong></td>
<td>Edit &gt; Proxy: Residential</td>
<td>For the strictest access controls</td>
</tr>
<tr>
<td><strong>Your own proxies</strong></td>
<td>Settings &gt; Proxy Pools, then set the page Location to your pool</td>
<td>When you need a specific provider or location</td>
</tr>
</tbody>
</table>
<h3>Still Blocked?</h3>
<p>If you have tried all the steps above and the page is still not loading:</p>
<ul>
<li><strong>Double-check the URL</strong> - Make sure the URL is correct and the page is publicly accessible. Try opening it in a private/incognito browser window to confirm.</li>
<li><strong>Use residential proxies</strong> - If you have not already, <a href="/residential-proxies">purchase residential proxy traffic</a> directly from PageCrawl. This is the most effective solution for heavily protected sites.</li>
<li><strong>Try a custom proxy</strong> - Use a <a href="/help/features/article/custom-proxies">custom proxy</a> from a third-party provider if you need a specific geographic location or a different proxy type.</li>
<li><strong>Contact support</strong> - Email <a href="mailto:support@pagecrawl.io">support@pagecrawl.io</a> with the page URL and a description of what you see. We can review the specific page and suggest the best configuration.</li>
</ul>
<h3>Related Articles</h3>
<ul>
<li><a href="/help/features/article/what-is-real-browser-page-monitoring">Real Browser Mode</a> - Engine selection including Stealth mode</li>
<li><a href="/help/features/article/custom-proxies">Custom Proxies</a> - Configure proxy servers</li>
<li><a href="/residential-proxies">Residential Proxies</a> - Purchase residential proxy traffic</li>
<li><a href="/help/troubleshooting/article/common-issues-with-page-not-loading">Page Loading Issues</a> - Other common loading problems</li>
</ul>]]>
            </summary>
                                    <updated>2026-06-19T09:27:55+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Monitor Changes in XML Files]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/tutorials/article/tracking-changes-in-xml-files" />
            <id>https://pagecrawl.io/41</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Monitor Changes in XML Files</h1>
<pre><code class="language-xml">&lt;?xml version="1.0"?&gt;
&lt;catalog&gt;
    &lt;book id="bk101"&gt;
        &lt;author&gt;Gambardella, Matthew&lt;/author&gt;
        &lt;title&gt;XML Developer's Guide&lt;/title&gt;
        &lt;genre&gt;Computer&lt;/genre&gt;
        &lt;price&gt;44.95&lt;/price&gt;
        &lt;publish_date&gt;2000-10-01&lt;/publish_date&gt;
        &lt;description&gt;An in-depth look at creating applications with XML.&lt;/description&gt;
    &lt;/book&gt;
&lt;/catalog&gt;</code></pre>
<p><a href="https://pagecrawl.io/">pagecrawl.io</a> can monitor XML files for changes. You do not have to track the whole file, which may update frequently. Instead, you can focus on the specific elements that matter to you. This keeps you from being flooded with alerts for minor changes such as 'updated at' timestamps.</p>
<p>This guide walks you through setting up element-level tracking for an XML file.</p>
<div class="kb-figure">
  <img src="/images/knowledge/create-page.png" alt="Track New Page screen where you paste the URL of an XML file to start monitoring it">
</div>
<p>To reduce false positives, you can monitor one or more specific elements and be notified only when they are added, removed, or changed.</p>
<h3>Step 1: Getting Started</h3>
<p>To begin tracking changes in XML files, follow these steps:</p>
<ol>
<li>
<p>Access PageCrawl: Log in to your PageCrawl account or sign up if you're new to the platform.</p>
</li>
<li>
<p>Create a Monitored Page: Once logged in, navigate to the dashboard and click on the "Track New Page" button. This will initiate the setup process for monitoring pages for changes.</p>
</li>
</ol>
<h3>Step 2: Choosing Elements to Track</h3>
<p>Instead of monitoring the entire XML file, you can narrow your focus to the specific elements that are relevant to you. For example, you might want to track changes to book titles within an XML catalog.</p>
<h4>Example XML File</h4>
<p>Consider the structure of the <a href="/downloads/books.xml">following example XML file</a>:</p>
<pre><code class="language-xml">&lt;?xml version="1.0"?&gt;
&lt;catalog&gt;
    &lt;book id="bk101"&gt;
        &lt;title&gt;XML Developer's Guide&lt;/title&gt;
        &lt;!-- Other book details... --&gt;
    &lt;/book&gt;
    &lt;book id="bk102"&gt;
        &lt;title&gt;Dummy XML Developer's Guide&lt;/title&gt;
        &lt;!-- Other book details... --&gt;
    &lt;/book&gt;
    &lt;!-- More book entries... --&gt;
&lt;/catalog&gt;</code></pre>
<h3>Step 3: Configuring Tracking Elements</h3>
<div class="kb-figure">
  <img src="/images/blog/xml-monitoring.png" alt="xml file monitoring">
</div>
<p>Follow these steps to configure tracking elements for your XML file:</p>
<ol>
<li>
<p>Select Tracked Element: Within the PageCrawl setup interface, choose "Text (all matches)" as the tracked element type.</p>
</li>
<li>
<p>Specify Element to Track: In this step, you'll specify the exact element within the XML that you want to track. For instance, if you're interested in changes to book titles, you'll set the element as <code>title</code>.</p>
</li>
</ol>
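<p>Conceptually, tracking the <code>title</code> element with "Text (all matches)" means collecting the text of every <code>title</code> element and comparing that list between checks. The sketch below illustrates the idea with Python's standard <code>xml.etree.ElementTree</code>. It is an illustration only, not PageCrawl's implementation, and <code>extract_titles</code> and <code>diff_titles</code> are hypothetical names. Because titles carry no key, a renamed title appears as one removal plus one addition.</p>

```python
import xml.etree.ElementTree as ET


def extract_titles(xml_text: str) -> list[str]:
    """Hypothetical helper: return the text of every title element, in document order."""
    root = ET.fromstring(xml_text)
    return [(el.text or "").strip() for el in root.iter("title")]


def diff_titles(old_xml: str, new_xml: str) -> dict[str, list[str]]:
    """Compare two snapshots and list the titles that were added or removed."""
    old, new = extract_titles(old_xml), extract_titles(new_xml)
    return {
        "added": [t for t in new if t not in old],
        "removed": [t for t in old if t not in new],
    }


old = """<?xml version="1.0"?>
<catalog>
  <book id="bk101"><title>XML Developer's Guide</title></book>
  <book id="bk102"><title>Dummy XML Developer's Guide</title></book>
</catalog>"""
new = old.replace("Dummy XML Developer's Guide", "Midnight Rain")
print(diff_titles(old, new))
# {'added': ['Midnight Rain'], 'removed': ["Dummy XML Developer's Guide"]}
```

<p>Changes to other elements, such as <code>price</code> or <code>publish_date</code>, never affect the extracted list, which is why they do not trigger alerts in this setup.</p>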
<p>By focusing on the <code>title</code> element, you'll be notified only when a book title changes or when a title is added or removed, filtering out less significant updates.</p>
<p><em>If you would also like to keep the full history of changes to the XML document but only be notified when a specific element changes, add "Full Page" as a Tracked Element as well, then add a condition that notifies you when the monitored element changes.</em></p>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Daily, Weekly or Monthly Change Monitoring Reports]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/features/article/monitoring-reports-tracked-pages" />
            <id>https://pagecrawl.io/42</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Daily, Weekly or Monthly Change Monitoring Reports</h1>
<div style="background:#f5f7fa;border:1px solid #e6e9ef;border-radius:8px;padding:14px 18px;margin:20px 0;">
  <strong>Note:</strong> This feature is available on paid plans only.
</div>
<div class="kb-figure">
  <img src="/images/knowledge/reports-overview.png" alt="Scheduled Summary Reports page with the Add Report button and a gallery of report templates for competitors, pricing, compliance, product launches, and more">
</div>
<p>With <a href="https://pagecrawl.io">pagecrawl.io</a>, you can group your monitors into scheduled briefings that compile every detected change into a single digest, delivered automatically on the cadence each audience expects. Reports turn raw change notifications into something stakeholders actually open and read: a clean, AI-summarized email with the most important items at the top.</p>
<p>Instant alerts on every change flood inboxes until people mute the channel. Scheduled reports solve that without losing the safety net for genuinely urgent items, which still escalate immediately to your channel of choice.</p>
<h3>Why Scheduled Reports</h3>
<p>A single workspace often serves several audiences with very different appetites for detail:</p>
<ul>
<li><strong>Marketing</strong> wants Monday morning competitor intel.</li>
<li><strong>Legal</strong> wants a monthly compliance roundup.</li>
<li><strong>Sales</strong> wants daily price movements across competitors.</li>
<li><strong>Product</strong> wants weekly competitor product launches.</li>
<li><strong>Your team</strong> wants to be paged the moment something critical lands.</li>
</ul>
<p>Reports let you serve all of these from the same monitor set. Group monitors by tag, folder, or domain, then deliver a tailored briefing to each audience on their preferred schedule. High-priority changes still escalate instantly; everything else lands in the next digest.</p>
<h3>When to Use Reports vs Instant Notifications</h3>
<table>
<thead>
<tr>
<th>Scenario</th>
<th>Recommended</th>
</tr>
</thead>
<tbody>
<tr>
<td>Monitoring a handful of critical pages</td>
<td>Instant notifications</td>
</tr>
<tr>
<td>Tracking 50+ competitor pages for pricing</td>
<td>Scheduled report (daily or weekly)</td>
</tr>
<tr>
<td>Legal/compliance pages that rarely change</td>
<td>Scheduled report (weekly or monthly)</td>
</tr>
<tr>
<td>Stock availability that needs immediate action</td>
<td>Instant notifications with escalation</td>
</tr>
<tr>
<td>Executive stakeholder updates</td>
<td>Scheduled report with AI summary</td>
</tr>
<tr>
<td>Onboarding a non-PageCrawl recipient (CEO, board, client)</td>
<td>Scheduled report (public share link)</td>
</tr>
</tbody>
</table>
<p>You can mix both approaches. Monitors not assigned to any report continue to send instant notifications as usual. Monitors assigned to a report will only appear in digests, unless priority escalation is configured for urgent changes.</p>
<h3>How Reports Work</h3>
<p>Each report has four moving parts:</p>
<ol>
<li><strong>Scope.</strong> Which monitors are included. Match by tag, folder, domain, specific monitors, or all monitors in the workspace.</li>
<li><strong>Schedule.</strong> When the digest is generated and sent. Daily, weekdays only, weekly, monthly, or on-demand.</li>
<li><strong>Recipients.</strong> Who gets the email or notification. Workspace members, additional Cc emails, and channel webhooks.</li>
<li><strong>Content.</strong> What goes in the digest: AI summary style, importance threshold, failing pages section, escalation rules, attachments.</li>
</ol>
<p>Each generated digest is stored as a record in the workspace and gets a unique public share link, so anyone with the URL can view it in their browser without a PageCrawl account.</p>
<div class="kb-figure">
  <img src="/images/knowledge/reports-config.png" alt="New report configuration with a name, scope options (All monitors, By tag, By folder, By website, Specific monitors), and the send schedule and time">
</div>
<h3>Setting Up Your First Report</h3>
<ol>
<li>Go to <strong>Settings</strong> &gt; <strong>Workspace</strong> &gt; <strong>Alerts &amp; Reports</strong>.</li>
<li>Select the <strong>Scheduled Summary Reports</strong> tab.</li>
<li>Click <strong>Add Report</strong> and give it a name, color, and (optionally) an icon.</li>
<li>Pick the <strong>Scope</strong>: choose tag, folder, domain, specific monitors, or all monitors.</li>
<li>Pick the <strong>Schedule</strong> and delivery hour.</li>
<li>Add <strong>Recipients</strong>: workspace members and additional Cc emails. You can also wire in Slack, Teams, Discord, Telegram, or a custom webhook.</li>
<li>Configure <strong>Content</strong>: AI summary style, importance threshold, failing pages, attachments.</li>
<li>Save. Use <strong>Generate now</strong> to preview the next digest immediately.</li>
</ol>
<div class="kb-figure">
  <img src="/images/knowledge/settings-scheduled-reports.png" alt="Scheduled Summary Reports page with the Add Report button and prebuilt report templates">
</div>
<p>For step-by-step instructions on each option, see the <a href="/help/notifications/article/scheduled-reports">Scheduled Reports setup guide</a>.</p>
<h3>Choosing What to Include (Scope)</h3>
<p>The scope determines which monitors feed into the digest. Five options are available:</p>
<ul>
<li><strong>All monitors</strong>: every monitor in the workspace. Useful for a single executive summary.</li>
<li><strong>By tag</strong>: monitors carrying a specific tag, e.g. <code>#competitors</code>, <code>#pricing</code>, <code>#legal</code>. Easiest way to slice cross-cutting topics.</li>
<li><strong>By folder</strong>: monitors inside a folder (and its sub-folders). Best when monitors are already organized hierarchically.</li>
<li><strong>By domain</strong>: monitors whose tracked URL matches one or more domains. Useful when you want a per-vendor view.</li>
<li><strong>Specific monitors</strong>: hand-picked list. For very small or very high-stakes reports.</li>
</ul>
<p>Tag and folder scopes are dynamic: any monitor that picks up the tag or moves into the folder later will start appearing in the next digest, with no report change required.</p>
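<p>The five scope types can be expressed as a single membership test that runs each time a digest is built. Running it at build time is what makes tag and folder scopes dynamic. This is a minimal sketch under stated assumptions. The <code>Monitor</code> model and <code>in_scope</code> function are hypothetical. Folder scopes match sub-folders by path prefix. Domain scopes are assumed to also match subdomains such as <code>www.</code>, which may differ from PageCrawl's actual matching.</p>

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class Monitor:
    """Hypothetical monitor record used only for this sketch."""
    id: int
    url: str
    folder: str = "/"  # e.g. "/vendor-legal/eu"
    tags: set[str] = field(default_factory=set)


def in_scope(monitor: Monitor, kind: str, values: set) -> bool:
    """Decide whether a monitor belongs to a report scope (hypothetical model)."""
    if kind == "all":
        return True
    if kind == "tag":
        return bool(monitor.tags.intersection(values))
    if kind == "folder":
        # A folder scope also covers every sub-folder beneath it.
        return any(monitor.folder == f or monitor.folder.startswith(f.rstrip("/") + "/")
                   for f in values)
    if kind == "domain":
        # Assumption: subdomains (www.competitor1.com) match their parent domain.
        host = (urlparse(monitor.url).hostname or "").lower()
        return any(host == d or host.endswith("." + d) for d in values)
    if kind == "specific":
        return monitor.id in values
    raise ValueError(f"unknown scope kind: {kind}")


monitors = [
    Monitor(1, "https://www.competitor1.com/pricing", "/competitors", {"#pricing"}),
    Monitor(2, "https://example.org/terms", "/vendor-legal/eu", {"#legal"}),
]
print([m.id for m in monitors if in_scope(m, "folder", {"/vendor-legal"})])   # [2]
print([m.id for m in monitors if in_scope(m, "domain", {"competitor1.com"})])  # [1]
```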
<h3>Available Schedule Options</h3>
<table>
<thead>
<tr>
<th>Schedule</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Daily</strong></td>
<td>Receive a digest every day at your chosen hour</td>
</tr>
<tr>
<td><strong>Weekdays only</strong></td>
<td>Monday through Friday</td>
</tr>
<tr>
<td><strong>Weekly</strong></td>
<td>On a specific day of the week</td>
</tr>
<tr>
<td><strong>Monthly</strong></td>
<td>On a specific day of the month</td>
</tr>
<tr>
<td><strong>On-demand only</strong></td>
<td>Only generated when you manually click "Generate now"</td>
</tr>
</tbody>
</table>
<p>All times are based on your workspace timezone.</p>
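<p>The schedule options reduce to a simple "is a digest due on this local date?" check. This is a minimal sketch. <code>is_due</code> and the schedule keys are hypothetical, <code>day</code> is assumed to already be the local date in the workspace timezone, and the delivery hour is ignored. PageCrawl's handling of a monthly day that does not exist in a given month (such as the 31st in February) is not documented here. The sketch assumes it falls back to the last day of the month.</p>

```python
import calendar
from datetime import date


def is_due(schedule: str, day: date, weekday: int = 0, day_of_month: int = 1) -> bool:
    """Hypothetical check: should a report with this schedule be generated on `day`?

    `weekday` follows Python's convention (Monday=0 ... Sunday=6).
    """
    if schedule == "daily":
        return True
    if schedule == "weekdays":
        return day.weekday() < 5
    if schedule == "weekly":
        return day.weekday() == weekday
    if schedule == "monthly":
        # Assumption: a day missing from this month falls back to the month's last day.
        last_day = calendar.monthrange(day.year, day.month)[1]
        return day.day == min(day_of_month, last_day)
    if schedule == "on_demand":
        return False  # only generated when someone clicks "Generate now"
    raise ValueError(f"unknown schedule: {schedule}")


friday = date(2026, 6, 19)  # a Friday
print(is_due("weekdays", friday), is_due("weekly", friday, weekday=0))  # True False
print(is_due("monthly", date(2026, 2, 28), day_of_month=31))          # True
```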
<h3>Delivery Channels</h3>
<p>A single report can ship to multiple channels at once. Each channel can override the workspace defaults if you want this report to use a different webhook or email list.</p>
<ul>
<li><strong>Email</strong>: primary recipients, plus Cc and Bcc lists. Recipients do not need a PageCrawl account.</li>
<li><strong>Slack</strong>: posts a formatted message to a channel webhook.</li>
<li><strong>Microsoft Teams</strong>: posts to an incoming webhook URL.</li>
<li><strong>Discord</strong>: posts to a server webhook.</li>
<li><strong>Telegram</strong>: sends to a chat or group via bot token.</li>
<li><strong>Custom webhook</strong>: full JSON payload for your own automations or n8n / Zapier flows.</li>
</ul>
<p>Channels can be enabled or disabled per report. If a channel is disabled or its webhook is missing, that channel is skipped without affecting the others.</p>
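<p>The skip rule above can be sketched as a delivery loop that passes over disabled or unconfigured channels without stopping. The channel dictionary shape and the <code>dispatch</code> function are hypothetical, and the actual sending is stubbed out with a callback.</p>

```python
from typing import Callable, Optional


def dispatch(channels: list[dict], send: Callable[[str, str], None]) -> list[str]:
    """Hypothetical delivery loop: send to each usable channel, return skipped channel names.

    Assumed shape: {"name": "slack", "enabled": True, "target": "<webhook URL or address>"}.
    """
    skipped = []
    for channel in channels:
        target: Optional[str] = channel.get("target")
        if not channel.get("enabled", False) or not target:
            skipped.append(channel["name"])  # disabled or no webhook: skip and keep going
            continue
        send(channel["name"], target)
    return skipped


sent = []
channels = [
    {"name": "email", "enabled": True, "target": "team@example.com"},
    {"name": "slack", "enabled": True, "target": ""},
    {"name": "teams", "enabled": False, "target": "https://example.com/teams-hook"},
]
skipped = dispatch(channels, lambda name, target: sent.append(name))
print(sent, skipped)  # ['email'] ['slack', 'teams']
```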
<h3>AI Executive Summary</h3>
<p>Every digest can include an AI-written summary at the top of the report. The summary is generated each time the digest is built, using the latest changes plus your workspace-level focus prompt for context. You choose the style that fits the audience.</p>
<div class="kb-figure">
  <img src="/images/knowledge/reports-ai-summary.png" alt="AI executive summary settings with a style selector (Patterns), an example summary, and a 'What matters in this report' focus prompt">
</div>
<p><strong>Eight summary styles are available:</strong></p>
<ul>
<li><strong>Headline</strong>: one short sentence (max 20 words) that captures the single most important takeaway. Best for chat notifications and SMS-style alerts.</li>
<li><strong>Patterns</strong>: a concise paragraph (2-4 sentences) focused on cross-monitor trends, e.g. "three competitors raised prices". The default. Good for general updates.</li>
<li><strong>Action briefing</strong>: leads with what the reader should DO (review, respond, monitor, ignore). Best for sales and ops teams.</li>
<li><strong>Detailed executive summary</strong>: a thorough multi-paragraph breakdown with section headings, notable individual changes, affected areas, and likely causes. Best for weekly and monthly executive briefings.</li>
<li><strong>Bullets</strong>: a markdown bullet list (5-10 bullets) with bolded category labels. Best when scanning matters more than narrative flow.</li>
<li><strong>Changelog</strong>: a chronological log, newest first, formatted like a release-notes file. Best for product and engineering audiences.</li>
<li><strong>Risk assessment</strong>: groups changes into High / Medium / Low risk with explanations of why each matters. Best for legal, compliance, and security teams.</li>
<li><strong>Brief</strong>: a plain-text summary under 280 characters. Designed for chat notifications where formatting is stripped.</li>
</ul>
<p>You can change the style per report at any time. The next digest will reflect the new style.</p>
<h3>AI-Generated Dynamic Title</h3>
<p>In addition to the summary, every digest gets a short, content-aware title generated by AI. Examples:</p>
<ul>
<li>"5 price movements across tracked SKUs"</li>
<li>"2 new pages, 1 redesign, 1 announcement"</li>
<li>"4 policy updates this month"</li>
<li>"3 high-priority competitor changes this week"</li>
</ul>
<p>The title appears in three places:</p>
<ol>
<li>The <strong>email subject line</strong>, prefixed with the report name: <code>Competitor Intel: 3 high-priority competitor changes this week</code>.</li>
<li>The <strong>digest header</strong> as a sub-heading on the public web view and PDF.</li>
<li>The <strong>email body</strong> as a quick visual landmark above the AI summary.</li>
</ol>
<p>Stakeholders can tell at a glance whether this digest is worth opening, without having to scroll.</p>
<h3>Priority Escalation</h3>
<p>Reports batch changes by design, but some changes are urgent enough that waiting for the next digest is unacceptable, like a competitor dropping prices 30% or a page being taken down. Priority escalation handles this.</p>
<p>When you enable escalation on a report:</p>
<ul>
<li>Each change is scored by AI for importance.</li>
<li>If a change exceeds the threshold you set (e.g. score ≥ 90), it triggers an immediate notification through your chosen escalation channel: Slack, Teams, email, Discord, Telegram, or webhook.</li>
<li>The change still appears in the next digest, with a label indicating it was already escalated.</li>
</ul>
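<p>The escalation rule can be sketched in a few lines. Changes at or above the threshold are sent immediately, and every change is still queued for the digest with an "escalated" flag. The <code>Change</code> record, the 0-100 score range, and <code>process_change</code> are assumptions for illustration. The notification itself is a stub callback.</p>

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Change:
    """Hypothetical change record; score is assumed to be an AI importance score from 0 to 100."""
    monitor: str
    score: int
    escalated: bool = False


def process_change(change: Change, threshold: int,
                   notify: Callable[[Change], None], digest_queue: list) -> None:
    """Escalate a high-scoring change immediately; every change still goes into the digest."""
    if change.score >= threshold:
        notify(change)           # immediate alert on the escalation channel
        change.escalated = True  # shown with an "already escalated" label in the digest
    digest_queue.append(change)


alerts, digest = [], []
for c in [Change("Competitor pricing", 95), Change("Blog index", 40)]:
    process_change(c, threshold=90, notify=alerts.append, digest_queue=digest)
print([c.monitor for c in alerts])                 # ['Competitor pricing']
print([(c.monitor, c.escalated) for c in digest])  # [('Competitor pricing', True), ('Blog index', False)]
```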
<p>This means stakeholders subscribed to the digest channel are not woken up at 3am, but the people who need to act on critical changes are paged the moment they happen.</p>
<h3>Importance Threshold and Content Filters</h3>
<p>Each report can filter what makes the cut:</p>
<ul>
<li><strong>Minimum importance threshold</strong>: drop everything scored below a certain priority. Useful for executive reports where only important changes belong.</li>
<li><strong>Collapse to latest</strong>: if a monitor changed multiple times during the period, show only the most recent change. Avoids cluttering the digest with intermediate states.</li>
<li><strong>Group by domain</strong>: present changes grouped by website host, instead of priority. Best for digests that watch many vendors.</li>
<li><strong>Workspace AI focus prompt</strong>: a free-text prompt the AI uses to bias importance scoring and summary generation toward your team's specific concerns ("focus on enterprise pricing changes", "ignore design system updates").</li>
</ul>
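<p>The first three filters can be sketched as one pass over a list of changes. The focus prompt is not modeled because it influences the AI scoring itself. This is a sketch under stated assumptions. The change dictionary shape is hypothetical, timestamps are ISO strings so they compare correctly as text, and the order (threshold first, then collapse) is an assumption that PageCrawl may not follow.</p>

```python
from collections import defaultdict
from urllib.parse import urlparse


def apply_filters(changes: list[dict], min_score: int = 0,
                  collapse_to_latest: bool = False, group_by_domain: bool = False):
    """Hypothetical report filters.

    Assumed change shape: {"monitor_id": 1, "url": "https://...", "score": 70, "at": "2026-06-01T09:00"}.
    Returns a list sorted by priority, or a dict of host -> list when grouping by domain.
    """
    kept = [c for c in changes if c["score"] >= min_score]
    if collapse_to_latest:
        latest = {}
        for c in kept:
            prev = latest.get(c["monitor_id"])
            if prev is None or c["at"] > prev["at"]:
                latest[c["monitor_id"]] = c
        kept = list(latest.values())
    kept.sort(key=lambda c: c["score"], reverse=True)  # highest priority first
    if not group_by_domain:
        return kept
    groups = defaultdict(list)
    for c in kept:
        groups[urlparse(c["url"]).hostname].append(c)
    return dict(groups)


changes = [
    {"monitor_id": 1, "url": "https://a.example/pricing", "score": 60, "at": "2026-06-01T09:00"},
    {"monitor_id": 1, "url": "https://a.example/pricing", "score": 80, "at": "2026-06-02T09:00"},
    {"monitor_id": 2, "url": "https://b.example/terms", "score": 30, "at": "2026-06-02T10:00"},
]
result = apply_filters(changes, min_score=50, collapse_to_latest=True)
print([(c["monitor_id"], c["score"]) for c in result])  # [(1, 80)]
```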
<h3>Pages Currently Failing</h3>
<p>Most digests focus on what changed. The "Pages Currently Failing" section flips that and shows what isn't being checked successfully right now: monitors stuck on timeouts, blocked by bot protection, returning server errors, or hitting SSL issues. 404s are not included, since broken pages have their own dedicated section with replacement suggestions.</p>
<p>The list shows the page name, the current status (timeout, blocked, server error, etc.), and how long ago the last attempt happened. It's a single block at the bottom of the digest, off by default for new reports but easy to enable per report.</p>
<p>This is the difference between thinking your monitor is healthy because no email arrived, and knowing it's silently broken.</p>
<h3>Comments and Inline Feedback</h3>
<p>Every change in the digest carries a thumbs-up / thumbs-down pair, plus a comment field. Recipients (whether they have a PageCrawl account or not) can:</p>
<ul>
<li>Mark a change as <strong>important</strong> (thumbs up). The AI uses this signal to bias future scoring on similar changes.</li>
<li>Mark a change as <strong>noise</strong> (thumbs down). Future similar changes get a lower priority and may be filtered out automatically.</li>
<li>Leave a <strong>comment</strong> on a specific change or on the digest as a whole. Other recipients see comments inline when they open the digest.</li>
</ul>
<p>For teams that want a structured workflow, enable <strong>review board actions</strong>. Each change shows a board selector (To Review / Flagged / Reviewed) so the team can triage changes directly inside the digest without opening the dashboard.</p>
<h3>Public Share Links</h3>
<p>Every digest gets a unique URL that anyone can open in their browser without signing in. The URL is included in every email, and you can copy it from the digest header for pasting into Slack, a Notion doc, or a board deck.</p>
<p>Sharing options:</p>
<ul>
<li><strong>Public</strong>: anyone with the link can view.</li>
<li><strong>Authenticated</strong>: only signed-in workspace members.</li>
</ul>
<p>Links can be rotated (invalidates the old URL) or revoked at any time. Default expiry is 30 days.</p>
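<p>Rotation, revocation, and expiry can be modeled with a random token and a timestamp. This hypothetical <code>ShareLink</code> class shows why a rotated URL stops working: the old token no longer matches. It uses Python's standard <code>secrets</code> module. Whether rotation also resets the 30-day expiry in PageCrawl is an assumption made here.</p>

```python
import secrets
from datetime import datetime, timedelta, timezone
from typing import Optional


class ShareLink:
    """Hypothetical model of a digest share link with rotation, revocation, and expiry."""

    def __init__(self, ttl_days: int = 30):
        self.ttl = timedelta(days=ttl_days)
        self.rotate()

    def rotate(self) -> str:
        # A fresh random token replaces the old one, so the previous URL stops matching.
        # Assumption: rotating also restarts the expiry window.
        self.token = secrets.token_urlsafe(16)
        self.expires_at = datetime.now(timezone.utc) + self.ttl
        self.revoked = False
        return self.token

    def revoke(self) -> None:
        self.revoked = True

    def is_valid(self, token: str, now: Optional[datetime] = None) -> bool:
        now = now or datetime.now(timezone.utc)
        return (
            not self.revoked
            and now < self.expires_at
            and secrets.compare_digest(token, self.token)  # constant-time comparison
        )


link = ShareLink()
old = link.token
link.rotate()
print(link.is_valid(old), link.is_valid(link.token))  # False True
```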
<h3>Print, PDF, and Excel Export</h3>
<p>Each digest is laid out for print. Open it in your browser, hit print, and you get a clean, paginated PDF for board decks, audit archives, or quarterly reviews.</p>
<p>Beyond print:</p>
<ul>
<li><strong>Excel export</strong>: a spreadsheet with every change, score, monitor, URL, timestamp, and AI summary, plus an overview sheet. Available as an email attachment (toggle per report) or on-demand from the digest page.</li>
<li><strong>CSV</strong>: same data in a simpler format for analytics pipelines.</li>
<li><strong>PDF</strong>: a one-click download from the digest page.</li>
</ul>
<p>The Excel attachment is automatically skipped if the file would be too large for SMTP delivery, so the email itself never bounces.</p>
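<p>The CSV export and the "skip oversized attachments" rule can be sketched with Python's standard <code>csv</code> module. CSV is used because Excel output would need a third-party library. The column names mirror the fields listed above but are hypothetical. The 10 MB ceiling is an assumed value, since real SMTP limits vary by provider. MIME attachments are base64-encoded, so the check compares the encoded size, which is about 4/3 of the raw size.</p>

```python
import csv
import io
from typing import Optional

# Assumed ceiling for a single attachment; real SMTP limits vary by provider.
MAX_ATTACHMENT_BYTES = 10 * 1024 * 1024


def changes_to_csv(changes: list[dict]) -> bytes:
    """Serialize changes to CSV (hypothetical field names)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["monitor", "url", "score", "timestamp", "summary"])
    writer.writeheader()
    writer.writerows(changes)
    return buffer.getvalue().encode("utf-8")


def attachment_or_none(data: bytes, limit: int = MAX_ATTACHMENT_BYTES) -> Optional[bytes]:
    """Return the attachment, or None so the email is sent without it instead of bouncing."""
    encoded_size = (len(data) + 2) // 3 * 4  # base64 output length, ignoring line breaks
    return data if encoded_size <= limit else None


rows = [{"monitor": "Pricing", "url": "https://a.example/pricing", "score": 80,
         "timestamp": "2026-06-02T09:00", "summary": "Pro plan raised to $30/mo"}]
data = changes_to_csv(rows)
print(data.decode().splitlines()[0])               # monitor,url,score,timestamp,summary
print(attachment_or_none(data, limit=10) is None)  # True
```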
<h3>Managing Recipients</h3>
<p>Recipients live on the report, not the workspace. You can mix:</p>
<ul>
<li><strong>Workspace members</strong>: picked from a dropdown of users in the workspace.</li>
<li><strong>Additional Cc Emails</strong>: any email address, no PageCrawl account required. The address must be verified once before it can receive reports.</li>
<li><strong>Channel routes</strong>: Slack channels, Teams channels, Discord servers, Telegram chats, custom webhooks.</li>
</ul>
<p>Each recipient slot can be set as <strong>To</strong>, <strong>Cc</strong>, or <strong>Bcc</strong>. Use Bcc when you want to send to a long list without exposing the recipient list (e.g., a board distribution).</p>
<p>If a report is misconfigured (no recipients, broken webhook), the workspace owner is notified after the first failure. Subsequent failures don't re-notify, so a broken report can't spam the owner.</p>
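<p>The "notify once" behavior is a small piece of state. In this hypothetical sketch, the owner is alerted on the first failed run, and later failures stay silent. The source does not say whether a successful run re-arms the alert, so that part is an assumption.</p>

```python
from typing import Callable


class ReportHealth:
    """Hypothetical tracker: alert the workspace owner on the first failure only."""

    def __init__(self) -> None:
        self.owner_notified = False

    def record_run(self, ok: bool, notify_owner: Callable[[], None]) -> None:
        if ok:
            # Assumption: a successful run re-arms the alert for a future breakage.
            self.owner_notified = False
        elif not self.owner_notified:
            notify_owner()
            self.owner_notified = True


alerts = []
health = ReportHealth()
for ok in [False, False, False]:
    health.record_run(ok, lambda: alerts.append("report failing"))
print(len(alerts))  # 1
```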
<h3>Real-World Examples</h3>
<p><strong>Marketing, weekly competitor briefing.</strong> Scope: tag <code>#competitors</code>. Schedule: weekly, Monday 8am. Recipients: marketing team + CMO. Style: detailed executive summary. Escalation: enabled, threshold 90, channel Slack <code>#competitor-intel</code>.</p>
<p><strong>Sales, daily pricing watch.</strong> Scope: tag <code>#pricing</code>. Schedule: weekdays only, 7am. Recipients: VP Sales, RevOps lead. Style: bullets. Importance threshold: 50. Group by domain: on.</p>
<p><strong>Legal, monthly compliance roundup.</strong> Scope: folder <code>/vendor-legal</code>. Schedule: monthly, 1st of the month, 9am. Recipients: General Counsel, compliance@. Style: risk assessment. Excel attachment: on.</p>
<p><strong>Product, weekly launch radar.</strong> Scope: domains <code>competitor1.com, competitor2.com, competitor3.com</code>. Schedule: weekly, Friday 4pm. Recipients: product team + CEO. Style: action briefing. Priority escalation: on, threshold 80, Slack <code>#product-radar</code>.</p>
<p><strong>Executive, monthly board pack.</strong> Scope: all monitors. Schedule: monthly, last Friday. Recipients: board distribution (Bcc). Style: detailed executive summary. Failing pages: on. Public share link: rotated each month.</p>
<h3>Best Practices</h3>
<ul>
<li><strong>Start with one report per audience, not per monitor set.</strong> Reports are cheap to add. Adding more later is easier than collapsing too-granular ones.</li>
<li><strong>Use tags to slice across folders.</strong> Tags don't conflict with your folder hierarchy and can express cross-cutting concerns (<code>#enterprise-watch</code>, <code>#regulatory</code>).</li>
<li><strong>Match the AI summary style to the audience.</strong> Headlines for execs, bullets for ops, risk assessment for legal, action briefing for sales.</li>
<li><strong>Always enable priority escalation on the operational reports.</strong> Even daily digests miss things that need a same-hour response.</li>
<li><strong>Use Cc for accountability, Bcc for distribution.</strong> Cc keeps everyone visible; Bcc protects long lists.</li>
<li><strong>Test with "Generate now" before going live.</strong> A dry run catches misconfigured webhooks and unexpected scope before stakeholders see it.</li>
<li><strong>Rotate the public share link if you suspect leakage.</strong> Old links stop working immediately.</li>
</ul>
<h3>How Reports Interact with Instant Notifications</h3>
<p>When a monitor is assigned to any scheduled report, its instant workspace-level notifications are bypassed. Changes are collected and delivered in the next digest instead.</p>
<p>Exceptions: escalation alerts still fire immediately, and public subscriber notifications are unaffected. If you delete or disable a report, the monitors it covered go back to receiving instant notifications automatically.</p>
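<p>These routing rules fit in a small decision function. In this hypothetical sketch, the set of covered monitor IDs is assumed to be recomputed from enabled reports. That is why disabling or deleting a report automatically restores instant notifications. Public subscriber notifications are outside its scope.</p>

```python
def route_change(monitor_id: int, report_monitor_ids: set[int], escalate: bool) -> list[str]:
    """Hypothetical routing for a workspace-level notification.

    report_monitor_ids holds the IDs of monitors covered by at least one enabled report.
    Public subscriber notifications are handled separately and are not modeled here.
    """
    if monitor_id not in report_monitor_ids:
        return ["instant"]
    paths = ["digest"]
    if escalate:
        paths.append("escalation")  # escalation alerts still fire immediately
    return paths


covered = {1, 2}
print(route_change(3, covered, escalate=False))  # ['instant']
print(route_change(1, covered, escalate=True))   # ['digest', 'escalation']
covered.discard(1)  # the report covering monitor 1 was disabled or deleted
print(route_change(1, covered, escalate=False))  # ['instant']
```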
<h3>Plan Limits</h3>
<p>Standard plans include up to 2 reports. Higher-tier plans include unlimited reports plus on-demand generation, the full eight summary styles, custom AI focus prompts per report, the failing-pages section, and Excel attachments.</p>
<p>For exact limits, see the <a href="https://pagecrawl.io/pricing">pricing page</a>.</p>
<h3>Related Articles</h3>
<ul>
<li><a href="/help/notifications/article/scheduled-reports">Scheduled Reports setup guide</a></li>
<li><a href="/help/features/article/ai-powered-change-detection">AI-powered change detection</a></li>
<li><a href="/help/integrations/article/send-slack-notification-when-changes-detected">Send Slack notifications when changes detected</a></li>
<li><a href="/help/integrations/article/send-microsoft-teams-notification-when-changes-detected">Microsoft Teams notifications</a></li>
<li><a href="/help/integrations/article/track-website-changes-integrate-with-discord-notifications">Discord notifications</a></li>
<li><a href="/help/integrations/article/track-website-changes-integrate-with-telegram-notifications">Telegram notifications</a></li>
<li><a href="/help/features/article/api-webhooks-for-custom-integrations">API webhooks for custom integrations</a></li>
<li><a href="/help/features/article/review-board">Review board for changes</a></li>
</ul>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Check Scheduling]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/features/article/page-check-schedule" />
            <id>https://pagecrawl.io/43</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Check Scheduling</h1>
<p>Control when PageCrawl runs checks on your monitored pages. Check frequency is set per page, while active days and hours are configured per workspace.</p>
<div style="background:#f5f7fa;border:1px solid #e6e9ef;border-radius:8px;padding:14px 18px;margin:20px 0;">
  <strong>Note:</strong> Available on paid plans.
</div>
<h3>Check Frequency</h3>
<p>Set how often each page is checked in the <strong>Check Frequency</strong> section of the page editor.</p>
<div class="kb-figure">
  <img src="/images/knowledge/simple-check-frequency.png" alt="Check Frequency selector in the page editor with the daily interval dropdown">
</div>
<p>Available frequencies depend on your plan:</p>
<table>
<thead>
<tr>
<th>Plan</th>
<th>Minimum Interval</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Free</strong></td>
<td>Every hour</td>
</tr>
<tr>
<td><strong>Standard</strong></td>
<td>Every 15 minutes</td>
</tr>
<tr>
<td><strong>Enterprise</strong></td>
<td>Every 5 minutes</td>
</tr>
<tr>
<td><strong>Ultimate</strong></td>
<td>Every 2 minutes</td>
</tr>
</tbody>
</table>
<p>Full frequency options: every 2 min, 3 min, 5 min, 15 min, 30 min, 45 min, hourly, every 2 hours, 3 hours, 6 hours, twice daily, daily, every 2 days, every 3 days, weekly, every 2 weeks, and monthly.</p>
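<p>To see which options a plan allows, and how many checks per day each one implies, you can map every option to minutes and compare it with the plan minimums in the table above. The option keys and helper names are hypothetical. "Twice daily" is taken as 720 minutes and "monthly" is approximated as 30 days.</p>

```python
# Minutes per frequency option listed above (hypothetical keys).
FREQUENCY_MINUTES = {
    "2m": 2, "3m": 3, "5m": 5, "15m": 15, "30m": 30, "45m": 45,
    "1h": 60, "2h": 120, "3h": 180, "6h": 360, "12h": 720,
    "1d": 1440, "2d": 2880, "3d": 4320, "1w": 10080, "2w": 20160, "1mo": 43200,
}

# Minimum interval per plan, in minutes, from the table above.
PLAN_MINIMUM = {"free": 60, "standard": 15, "enterprise": 5, "ultimate": 2}


def allowed_frequencies(plan: str) -> list[str]:
    """Options at or above the plan's minimum interval, fastest first."""
    floor = PLAN_MINIMUM[plan]
    return [key for key, minutes in FREQUENCY_MINUTES.items() if minutes >= floor]


def checks_per_day(key: str) -> float:
    return 1440 / FREQUENCY_MINUTES[key]


print(allowed_frequencies("standard")[:3])  # ['15m', '30m', '45m']
print(checks_per_day("15m"))                # 96.0
```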
<h3>Workspace Schedule</h3>
<p>Limit checks to specific days and times for an entire workspace:</p>
<ol>
<li>Go to <strong>Settings</strong> &gt; <strong>Workspace</strong> &gt; <strong>Scheduling</strong></li>
<li>Select which days of the week to run checks (Monday through Sunday)</li>
<li>Set the active hours (start and end time)</li>
</ol>
<p>Active hours are entered in your workspace timezone and converted to UTC automatically.</p>
<div class="kb-figure">
  <img src="/images/knowledge/settings-schedule.png" alt="Workspace scheduling configuration with quick presets, day selection, an active-hours slider, and a schedule summary">
</div>
<p>When outside the scheduled hours or on inactive days, PageCrawl pauses checks for all pages in the workspace. Checks resume automatically when the next active period begins.</p>
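<p>The pause rule can be sketched as a check on the current UTC time against the workspace's active days and local hours. This is a minimal sketch under stated assumptions. <code>checks_active</code> is hypothetical, and a fixed UTC offset stands in for the workspace timezone, so daylight-saving changes are ignored. The active window is treated as start-inclusive and end-exclusive, and a window that crosses midnight is supported. Use 0 to 24 for all day.</p>

```python
from datetime import datetime, timedelta, timezone


def checks_active(now_utc: datetime, active_days: set[int], start_hour: int, end_hour: int,
                  utc_offset_hours: float) -> bool:
    """Hypothetical check: should PageCrawl-style checks run at now_utc?

    active_days uses Monday=0 ... Sunday=6; hours are local to the workspace and the
    window is [start_hour, end_hour). A fixed offset approximates the workspace timezone.
    """
    local = now_utc.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    if local.weekday() not in active_days:
        return False
    if start_hour <= end_hour:
        return start_hour <= local.hour < end_hour
    # Overnight window such as 22:00-06:00 wraps past midnight.
    return local.hour >= start_hour or local.hour < end_hour


weekdays = {0, 1, 2, 3, 4}
now = datetime(2026, 6, 19, 7, 30, tzinfo=timezone.utc)  # Friday 07:30 UTC
print(checks_active(now, weekdays, 9, 17, -4))  # False (03:30 locally)
print(checks_active(now, weekdays, 9, 17, 3))   # True (10:30 locally)
```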
<h3>Email Digest</h3>
<p>Instead of receiving individual notifications for each change, you can configure a daily email digest:</p>
<ol>
<li>Go to <strong>Settings</strong> &gt; <strong>Workspace</strong> &gt; <strong>Notifications</strong></li>
<li>Enable the daily digest</li>
<li>Choose the day and time for delivery</li>
</ol>
<p>The digest summarizes all changes detected since the last digest was sent.</p>
<h3>Related Articles</h3>
<ul>
<li><a href="/help/features/article/bulk-edit-pages">Bulk Edit</a> - Change frequency and schedule settings across multiple pages</li>
<li><a href="/help/subscription/article/is-there-limit-of-checks-in-standard-plan">Check Limits</a> - Understand plan check quotas</li>
<li><a href="/help/features/article/advanced-configuration">Advanced Configuration</a> - Power User mode and per-page settings</li>
</ul>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[PageCrawl.io + Zapier integration]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/integrations/article/pagecrawl-zapier-integration" />
            <id>https://pagecrawl.io/45</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>PageCrawl.io + Zapier integration</h1>
<div class="kb-figure">
  <img src="/images/knowledge/integrations-overview.png" alt="PageCrawl Integrations settings page showing Zapier and other available connections">
</div>
<p>Integrating PageCrawl.io with Zapier lets you automate tasks when changes are detected and send your web monitoring data to many other applications. This guide shows how to set up the integration.</p>
<h3>Why Integrate PageCrawl.io with Zapier?</h3>
<p>Zapier is an automation platform that connects your favorite apps and services, allowing them to work together seamlessly. By integrating PageCrawl.io with Zapier, you can:</p>
<ol>
<li><strong>Automate Workflow</strong>: Create "Zaps" to automate tasks triggered by changes detected by PageCrawl.io.</li>
<li><strong>Extend Integration</strong>: Connect PageCrawl.io data to a vast array of other applications, enhancing its usefulness and allowing for more extensive analysis.</li>
<li><strong>Improve Efficiency</strong>: Eliminate manual data entry and automate processes, saving time and reducing the risk of errors.</li>
</ol>
<h3>Setting Up PageCrawl.io + Zapier Integration</h3>
<p>Follow these steps to connect PageCrawl.io to Zapier:</p>
<h4>Step 1: Sign in to PageCrawl.io</h4>
<p>Sign in to your PageCrawl.io account. If you don't have one yet, sign up first.</p>
<h4>Step 2: Configure A Page To Monitor</h4>
<p>Set up the monitoring settings for the web page you're interested in tracking. Customize the elements you want to monitor and your notification preferences.</p>
<h4>Step 3: Enable Zapier Integration</h4>
<p>Visit the <a href="/app/settings/workspace/integrations">Integrations page</a> and click <strong>Setup</strong> on the Zapier integration. In the modal that opens, click <strong>Open on Zapier</strong> to set up the Zapier + PageCrawl.io integration.</p>
<div class="kb-figure">
  <img src="/images/knowledge/integ-zapier-setup.png" alt="Zapier integration dialog in PageCrawl with prebuilt workflow templates for change detection">
</div>
<h4>Step 4: Create a Zap in Zapier</h4>
<ol>
<li>Create a new Zap by clicking "Make a Zap".</li>
<li>Search for "PageCrawl.io" and select it as your trigger app.</li>
<li>Choose the trigger event, such as "New Change Detected".</li>
</ol>
<h4>Step 5: Set Up Zap Actions</h4>
<p>Define the actions you want to take when a trigger event occurs. This can include sending notifications, updating other apps, or performing custom actions.</p>
<h4>Step 6: Activate Your Zap</h4>
<p>Once you're satisfied with the setup, activate your Zap, and it will start automating tasks based on changes detected by PageCrawl.io.</p>
<h3>Related Articles</h3>
<ul>
<li><a href="/help/integrations/article/pagecrawl-n8n-integration">n8n Integration</a> - Open-source workflow automation</li>
<li><a href="/help/integrations/article/webhook-integration">Webhook Integration</a> - Send change data to any endpoint</li>
<li><a href="/help/features/article/api-webhooks-for-custom-integrations">API &amp; Webhooks</a> - Programmatic access</li>
</ul>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Store Website Changes on Google Sheets]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/integrations/article/sync--monitored-pages-to-google-sheets" />
            <id>https://pagecrawl.io/46</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Store Website Changes on Google Sheets</h1>
<div class="kb-figure">
  <img src="/images/blog/google-sheets-integration.png" alt="website change detections syncing to Google Sheets">
</div>
<p>Tracking website changes is useful for many purposes, from monitoring competitors to making sure your web services are running smoothly. PageCrawl.io monitors web pages for changes and can send the detected changes straight to Google Sheets. This guide shows how to set up the integration so your website change history is stored in a spreadsheet.</p>
<h3>Why Store Website Change History on Google Sheets?</h3>
<p>Google Sheets offers a versatile and collaborative platform for storing and analyzing data. By integrating PageCrawl.io with Google Sheets, you can keep all your web page change history in one place for easy access and analysis.</p>
<h3>Setting Up PageCrawl.io Integration with Google Sheets</h3>
<p>Follow these steps to connect PageCrawl.io to Google Sheets and start storing your website change history:</p>
<ol>
<li>Log in to your PageCrawl account.</li>
<li>Navigate to the Settings -&gt; <strong><a href="/app/settings/workspace/integrations">Integrations</a></strong> section.</li>
<li>Click <strong>Setup</strong> on the Google Sheets integration. In the modal that opens, select your Google account and enter a spreadsheet name, then click <strong>Connect</strong>.</li>
</ol>
<div class="kb-figure">
  <img src="/images/knowledge/integ-google-sheets-setup.png" alt="Connect Google Sheets dialog in PageCrawl with the Login with Google button">
</div>
<ol start="4">
<li>Each time a change is detected, a new row is automatically added to your Google Sheets spreadsheet.</li>
</ol>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Common XPath Selectors to Use For Monitoring Websites Changes]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/tutorials/article/common-xpath-selectors" />
            <id>https://pagecrawl.io/47</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Common XPath Selectors to Use For Monitoring Websites Changes</h1>
<p>XPath selectors are powerful tools that help you identify and extract specific elements on a web page. In this guide, we'll explore common XPath selectors that you can use when monitoring websites for changes to make your web monitoring efforts more effective.</p>
<div style="background:#f5f7fa;border:1px solid #e6e9ef;border-radius:8px;padding:14px 18px;margin:20px 0;">
  <strong>Note:</strong> You don't have to write selectors by hand. Paste the page's HTML (or its URL) into an AI assistant like ChatGPT or Claude and ask it to "write a CSS or XPath selector for [the element you want]". It will usually produce a working selector you can drop straight into PageCrawl. Use the <strong>Test</strong> button to confirm it captures the right content, then tweak from there.
</div>
<div class="kb-figure">
  <img src="/images/knowledge/settings-tracked-elements.png" alt="Tracked Elements section where you enter a CSS or XPath selector and use Test to verify what it captures">
</div>
<h3>Why Not CSS Selector?</h3>
<p>Many web developers favor CSS selectors because they are easy to learn if you already know CSS syntax. XPath selectors, on the other hand, offer more power and flexibility, such as the ability to find elements that contain specific text, but they have a steeper learning curve. If you already know CSS, you should be able to use it for most use cases. If you know neither, we recommend starting with XPath because it is more flexible.</p>
<h3>XPath Cheat sheet</h3>
<p>Here, you'll find a convenient 'cheat sheet' that comprehensively covers the most commonly used XPath selectors for your reference. We suggest taking a quick look through this list before proceeding to the <a href="#common-xpath-selectors-for-web-monitoring">Common XPath Selectors for Web Monitoring</a> section below.</p>
<h4>HTML Basics</h4>
<p>Before we start, you should familiarize yourself with some fundamental concepts to better understand the terminology and functionality. Here are a few key terms:</p>
<ol>
<li>
<p><strong>Attribute</strong>: An attribute provides additional information about an HTML element. It is always specified in the start tag of an element and usually comes in name/value pairs like <code>name="value"</code>. For example, in <code>&lt;a href="https://example.com"&gt;</code>, <code>href</code> is an attribute name and <code>https://example.com</code> is its value.</p>
</li>
<li>
<p><strong>Element</strong>: An HTML element is an individual component of an HTML document or web page. It is written with a start tag, with an optional end tag, and content in between. For example, <code>&lt;p&gt;This is a paragraph&lt;/p&gt;</code>; here, <code>&lt;p&gt;</code> is the start tag, <code>&lt;/p&gt;</code> is the end tag, and <code>This is a paragraph</code> is the content.</p>
</li>
<li>
<p><strong>ID</strong>: The <code>id</code> attribute is used to specify a unique id for an HTML element. You cannot have more than one element with the same id in an HTML document. It is used for identifying and targeting the element with CSS and JavaScript. For example, <code>&lt;div id="header"&gt;</code> defines a division with a unique id of <code>header</code>.</p>
</li>
<li>
<p><strong>Class</strong>: The <code>class</code> attribute is used for specifying a class name for an HTML element. Unlike the <code>id</code> attribute, the same class can be used on multiple elements. This is useful for applying the same styling or behavior to different elements. For example, <code>&lt;span class="highlight"&gt;</code> assigns the <code>highlight</code> class to a span element, which can be targeted with CSS or JavaScript.</p>
</li>
</ol>
<h4>How to test the selector?</h4>
<div class="kb-figure">
  <img src="/images/blog/console-xpath-test.png" alt="test xpath selector">
</div>
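<p>To try a selector before pasting it into PageCrawl.io, open your browser's developer console and run one of the following commands. <code>$x()</code> is a console helper available in the developer tools of most browsers, including Chrome and Firefox, and returns the elements matched by an XPath expression.</p>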
<p><strong>XPath</strong></p>
<pre><code>$x('//a')</code></pre>
<p><strong>CSS</strong></p>
<pre><code>document.querySelectorAll('a')</code></pre>
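<p>The <code>$x()</code> helper only exists inside the developer console. To run the same check from a bookmarklet or a script on the page, you can call the standard <code>document.evaluate()</code> API yourself. The sketch below wraps that call in a small function. The name <code>xpathAll</code> is our own and is not part of any library.</p>
<pre><code class="language-javascript">// Return every node matched by an XPath expression as a plain array,
// similar to the console-only $x() helper.
function xpathAll(expression, contextNode) {
  // A Document has no ownerDocument, so fall back to the node itself.
  const doc = contextNode.ownerDocument || contextNode;
  const snapshot = doc.evaluate(
    expression,
    contextNode,
    null, // no namespace resolver is needed for ordinary HTML pages
    XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, // matches in document order
    null
  );
  // Copy the snapshot into a regular array so map/forEach can be used.
  return Array.from({ length: snapshot.snapshotLength }, function (_, i) {
    return snapshot.snapshotItem(i);
  });
}</code></pre>
<p>For example, running <code>xpathAll("//a[@href]", document).length</code> on a page should report the same number of links as <code>$x('//a[@href]').length</code>.</p>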
<h4>XPath Selector Basics</h4>
<ul>
<li><code>//</code>: Selects all matching elements anywhere in the document.</li>
<li><code>/</code>: At the start of an expression, starts the path at the document root. Between steps, it selects direct children only.</li>
<li><code>element</code>: Selects elements with the specified name.</li>
<li><code>[@attribute]</code>: Selects elements with the specified attribute.</li>
</ul>
<h4>Advanced XPath Selectors</h4>
<ul>
<li><code>[@attribute='value']</code>: Selects elements with a specific attribute value.</li>
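<li><code>[@attribute!='value']</code>: Selects elements that have the attribute with a value other than 'value'. Elements without the attribute are not selected; use <code>[not(@attribute='value')]</code> to include them.</li>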
<li><code>[starts-with(@attribute,'prefix')]</code>: Selects elements with an attribute starting with 'prefix'.</li>
<li><code>[substring(@attribute, string-length(@attribute) - string-length('suffix') + 1) = 'suffix']</code>: Selects elements with an attribute ending with 'suffix'. Note: there is no direct <code>ends-with()</code> function in XPath 1.0, so this workaround is needed.</li>
<li><code>[contains(@attribute,'substring')]</code>: Selects elements with an attribute containing 'substring'.</li>
<li><code>[@attribute1='value1' and @attribute2='value2']</code>: Selects elements that meet multiple attribute conditions.</li>
<li><code>[@attribute1='value1' or @attribute2='value2']</code>: Selects elements that meet at least one of the attribute conditions.</li>
<li><code>not(expression)</code>: Negates a condition.</li>
</ul>
<h4>Text and Content Selection</h4>
<ul>
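<li><code>text()</code>: Selects the text nodes that are direct children of an element.</li>
<li><code>contains(text(),'substring')</code>: Selects elements whose text contains 'substring'. In XPath 1.0 this only checks the element's first direct text node; use <code>contains(.,'substring')</code> to search all of its text, including text inside child elements.</li>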
<li><code>starts-with(text(),'prefix')</code>: Selects elements with text starting with 'prefix'.</li>
<li><code>substring(text(), string-length(text()) - string-length('suffix') + 1) = 'suffix'</code>: Selects elements with text ending with 'suffix'. Note: <code>ends-with()</code> is an XPath 2.0 function and is NOT supported in browsers (which only support XPath 1.0). Use this <code>substring()</code> workaround instead.</li>
</ul>
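<p>Because XPath 1.0 has no <code>ends-with()</code>, the <code>substring()</code> workaround above is easy to mistype. The sketch below shows one way to generate it. It also mirrors the same arithmetic in plain JavaScript, which shows why the <code>+ 1</code> is needed: XPath string positions start at 1. The function names are illustrative, and the builder assumes the suffix contains no single quote.</p>
<pre><code class="language-javascript">// Build an XPath 1.0 predicate that emulates ends-with(target, suffix).
// target is an attribute such as "@href" or the text test "text()".
function xpathEndsWith(target, suffix) {
  // This sketch does not escape quotes, so reject suffixes containing one.
  if (suffix.includes("'")) {
    throw new Error("Suffix must not contain a single quote");
  }
  return "substring(" + target + ", string-length(" + target + ") - string-length('" +
    suffix + "') + 1) = '" + suffix + "'";
}

// Plain-JavaScript mirror of the same arithmetic. XPath positions start at 1,
// hence the "+ 1"; a start at or before 0 returns the whole string (no match).
function endsWithLikeXPath(value, suffix) {
  const start = value.length - suffix.length + 1;
  return value.slice(Math.max(start - 1, 0)) === suffix;
}</code></pre>
<p>In the console, <code>$x("//a[" + xpathEndsWith("@href", ".pdf") + "]")</code> would then list links whose <code>href</code> ends in <code>.pdf</code>.</p>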
<h4>Navigation and Hierarchy</h4>
<ul>
<li><code>/parent::element</code>: Selects the parent of the current element.</li>
<li><code>/child::element</code>: Selects the children of the current element.</li>
<li><code>/ancestor::element</code>: Selects ancestors of the current element.</li>
<li><code>/descendant::element</code>: Selects descendants of the current element.</li>
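<li><code>[position()=1]</code> (or <code>[1]</code>): Selects the first matching element within each parent. For example, <code>//li[1]</code> returns the first <code>li</code> of every list; use <code>(//li)[1]</code> for the first <code>li</code> in the whole document.</li>
<li><code>[last()]</code>: Selects the last matching element within each parent.</li>
<li><code>[position()&gt;2]</code>: Selects the matching elements after the first two within each parent.</li>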
</ul>
<h4>Wildcards and Dynamic Selection</h4>
<ul>
<li><code>*</code>: Selects all elements.</li>
<li><code>element[*]</code>: Selects elements with at least one child element.</li>
<li><code>element[@*]</code>: Selects elements with at least one attribute.</li>
<li><code>element[contains(@attribute,'value')]</code>: Selects elements with attributes containing 'value'.</li>
</ul>
<h4>Functions</h4>
<ul>
<li><code>count(expression)</code>: Counts the number of matching elements.</li>
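<li><code>sum(expression)</code>: Adds up the numeric values of the matching nodes. If any value is not a plain number (for example "$19.99"), the result is NaN.</li>
<li><code>concat(string1, string2, ...)</code>: Combines two or more strings.</li>
<li><code>substring(string, start, length)</code>: Extracts a substring. Positions start at 1, and <code>length</code> is optional.</li>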
<li><code>normalize-space(string)</code>: Removes leading/trailing spaces and collapses internal spaces.</li>
</ul>
<h3>Common XPath Selectors for Web Monitoring</h3>
<p>Here are some common XPath selectors you can use when monitoring websites for changes. The examples start with basic selectors and progress to more advanced ones.</p>
<h4>1. Selecting Text</h4>
<p>XPath allows you to target specific text elements on a webpage, which is useful for tracking changes in content, headlines, or paragraphs. For example:</p>
<pre><code class="language-xpath">//h1       // Selects all h1 headers on the page.
//p        // Selects all paragraph elements.
//div[@class='content'] // Selects text within div elements with a specific class.</code></pre>
<h4>2. Tracking Links</h4>
<p>XPath selectors help you monitor links, whether you want to track all links on a page, external links, or links with specific text. For instance:</p>
<pre><code class="language-xpath">//a[@href]                  // Selects all links with an href attribute.
//a[starts-with(@href,'http') and not(contains(@href,'example.com'))] // Selects external links: absolute URLs that don't contain the site's domain (replace 'example.com' with the monitored domain).
//a[contains(text(),'Download')]   // Selects links with specific anchor text, case-sensitive.</code></pre>
<p>For more link examples, see the <a href="/help/tutorials/article/tracking-link-on-page">Tracking links with text</a> tutorial.</p>
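<p>An XPath check like <code>not(contains(@href,'example.com'))</code> only looks at the text of the link. As a result, it can misclassify relative links or look-alike domains such as <code>notexample.com</code>. For a more reliable split while exploring a page in the console, the sketch below compares hostnames using the standard <code>URL</code> API. The function <code>classifyLinks</code> is hypothetical. Treating subdomains and a leading <code>www.</code> as internal is an assumption you may want to change.</p>
<pre><code class="language-javascript">// Sort link URLs into internal and external ones by comparing hostnames,
// instead of searching the href text for the domain name.
function classifyLinks(hrefs, pageUrl) {
  // Ignore a leading "www." so www.example.com and example.com count as one site.
  const siteHost = new URL(pageUrl).hostname.replace(/^www\./, "");
  const result = { internal: [], external: [] };
  for (const href of hrefs) {
    let url;
    try {
      // Relative links such as "/pricing" resolve against the page URL.
      url = new URL(href, pageUrl);
    } catch (err) {
      continue; // skip values that are not valid URLs
    }
    if (!url.protocol.startsWith("http")) {
      continue; // skip mailto:, tel:, javascript: and similar links
    }
    // The site itself and its subdomains (e.g. blog.example.com) are internal.
    const isInternal = url.hostname === siteHost || url.hostname.endsWith("." + siteHost);
    (isInternal ? result.internal : result.external).push(href);
  }
  return result;
}</code></pre>
<p>For example, in the console: <code>classifyLinks($x("//a/@href").map(function (attr) { return attr.value; }), location.href).external</code>.</p>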
<h4>3. Checking Images</h4>
<p>To monitor images on a webpage, you can use XPath selectors to identify images by their source (src) attribute or alt text. For example:</p>
<pre><code class="language-xpath">//img               // Selects all image elements.
//img/@src          // Selects the src attribute of all images.
//img[contains(@alt,'logo')] // Selects images with specific alt text.</code></pre>
<h4>4. Handling Tables</h4>
<p>XPath selectors are particularly useful for extracting data from tables, which are commonly used on websites for displaying structured information. For example:</p>
<pre><code class="language-xpath">//table                // Selects all tables on the page.
//table//tr             // Selects all table rows.
//table//tr/td[2]       // Selects the second column (td) in all rows.</code></pre>
<h4>5. Monitoring Specific Elements</h4>
<p>You can target elements with specific attributes or attributes containing certain values using XPath selectors. For instance:</p>
<pre><code class="language-xpath">//*[@id='specificId'] // Selects elements with a specific ID attribute.
//*[@class='highlight'] // Selects elements with a specific class attribute.</code></pre>
<h4>6. Monitoring Elements Whose Class or ID Contains Specific Text</h4>
<div class="kb-figure">
  <img src="/images/blog/randomized_classnames.png" alt="class name example">
</div>
<p>To monitor elements whose class or ID contains a specific piece of text, use XPath selectors with the contains() function. For example:</p>
<pre><code class="language-xpath">//*[contains(@class, 'partial-text')] // Selects elements with a class containing 'partial-text'.
//*[contains(@id, 'partial-text')]    // Selects elements with an ID containing 'partial-text'.
//input[starts-with(@name, 'user_')] // Selects input elements with names starting with 'user_'.
//input[contains(@id, 'search')]  // Selects input elements with IDs containing 'search'.
//button[contains(@class, 'btn-')] // Selects buttons with class names containing 'btn-'.
</code></pre>
<p><strong>These selectors are especially valuable when CSS classes include unpredictable or random text fragments.</strong></p>
<p>For example, suppose you want to extract the text 'Quality Choice' shown in the screenshot above. Its CSS class, <code>productTile_urgencyMessaging__V5DTS</code>, includes a suffix (<code>__V5DTS</code>) that is likely to change with each website update.</p>
<p>To avoid updating the selector every time the website is updated, use the XPath contains() function to match only the stable part of the class:</p>
<pre><code class="language-xpath">//*[contains(@class, 'productTile_urgencyMessaging')] // Retrieve 'Quality Choice' text</code></pre>
<h4>7. Using Logical Operators</h4>
<p>XPath supports logical operators for combining conditions. This is particularly useful for complex selections. For example:</p>
<pre><code class="language-xpath">//a[@class='external' or @class='external-link'] // Selects links with class 'external' or 'external-link'.
//div[@class='important' and contains(text(),'Alert')] // Selects divs with class 'important' containing 'Alert'.
</code></pre>
<h4>8. Complex Expressions</h4>
<p>You can create complex XPath expressions by combining multiple conditions and functions. This provides immense flexibility in your selections. For example:</p>
<pre><code class="language-xpath">//div[@class='content' and (contains(text(),'Important') or contains(text(),'Alert'))]
//table[not(@class='hidden')]/tbody/tr[td[2]='Completed']/td[3]
</code></pre>
<h3>Using XPath Selectors in PageCrawl.io</h3>
<p>To use these XPath selectors for website monitoring in PageCrawl.io:</p>
<ol>
<li>Log in to your PageCrawl account.</li>
<li>Click <strong>Track New Page</strong>, enter the page URL, then choose which <strong>Tracked Elements</strong> to track.</li>
<li>Select "Text" as the tracked element and enter your XPath selector.</li>
<li>Save to start monitoring the page for changes.</li>
</ol>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Common Problems With the Visual Selector]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/troubleshooting/article/common-problems-with-visual-selector" />
            <id>https://pagecrawl.io/48</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Common Problems With the Visual Selector</h1>
<p>The visual selector lets you point and click an element on a preview of the page, and PageCrawl turns it into a selector for you. Occasionally a page or a selector needs a little extra care. This guide covers the most common situations and how to resolve them.</p>
<h3>Problem: The page won't load in the picker</h3>
<p>Some pages are slow, heavily scripted, or actively block automated browsers, so the live preview may fail to load or render incompletely.</p>
<p><strong>Solutions:</strong></p>
<ul>
<li><strong>Switch the engine.</strong> Try <strong>Stealth mode</strong> for sites that block bots, or <strong>Fast mode</strong> for simple static pages. See <a href="/help/features/article/what-is-real-browser-page-monitoring">Real Browser Mode</a>.</li>
<li><strong>Paste a selector instead.</strong> You don't need the picker to load the page. Find the selector in your own browser (<a href="/help/tutorials/article/find-xpath-css-selector-in-chrome">how to find a selector</a>) and paste it straight into the element's selector field, then use <strong>Test</strong> to confirm it captures the right content.</li>
<li><strong>Report it.</strong> If a page consistently fails, contact support so we can improve compatibility.</li>
</ul>
<h3>Problem: The selector breaks when the website changes</h3>
<p>Some sites generate randomized class names or add suffixes that change on every deploy, which makes a selector go stale.</p>
<p><strong>Solution:</strong> Match on the stable part of the class instead of the full name. For example, a class like <code>productTile_urgencyMessaging__V5DTS</code> has a volatile <code>__V5DTS</code> suffix. Use an XPath <code>contains()</code> match on the stable prefix:</p>
<pre><code class="language-xpath">//*[contains(@class, 'productTile_urgencyMessaging')]</code></pre>
<p>See the <a href="/help/tutorials/article/common-xpath-selectors">XPath tutorial for common selectors</a> for more patterns like this.</p>
<h3>Let PageCrawl or an AI assistant write the selector</h3>
<p>If you're unsure which selector to use, you have a few easy options:</p>
<ul>
<li>Use the <strong>visual selector</strong> to point and click the element, and let PageCrawl generate the selector.</li>
<li>Use the <strong><a href="/help/features/article/browser-extension-guide">PageCrawl browser extension</a></strong> to pick an element on the live page and send it into a new monitor.</li>
<li>Paste the page's HTML or URL into an AI assistant like ChatGPT or Claude and ask it to "write a CSS or XPath selector for [the element]".</li>
</ul>
<p>Whichever route you take, always press <strong>Test</strong> to confirm the selector captures exactly what you expect before saving.</p>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Complete Guide to Reducing False Positive Notifications When Monitoring Websites for Changes]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/reduce-false-positives/article/reduce-false-positives-monitoring-website-for-changes" />
            <id>https://pagecrawl.io/49</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Complete Guide to Reducing False Positive Notifications When Monitoring Websites for Changes</h1>
<p>False positive notifications can be frustrating when monitoring websites. These alerts signal changes that are either irrelevant or nonexistent, leading to wasted time and reduced efficiency.</p>
<p>When using PageCrawl to monitor website changes, the rate of false-positive alerts is typically low if pages are correctly configured. However, some detected changes may not be relevant to your specific monitoring needs. This comprehensive guide will show you how to effectively reduce unnecessary alerts and ensure you only receive notifications for meaningful changes.</p>
<h3>1. Choose the Right Element to Track</h3>
<div class="kb-figure">
  <img src="/images/knowledge/simple-what-to-track.png" alt="What to Track panel with element type options and content levels (Everything, Content only, Reader mode)">
</div>
<p>Selecting the wrong type of element to monitor is one of the most common causes of false positives. With multiple monitoring options available, it's easy to get overwhelmed, especially if you're new to website monitoring.</p>
<h4>Getting Started</h4>
<p>Begin by tracking the <strong>text of the full page</strong>. This approach works best as a starting point for most monitoring scenarios, particularly when you need to monitor a large number of websites. If you notice frequent false positives, you can always revisit your setup and focus on specific page sections instead.</p>
<h4>Optimizing Full Page Text Tracking</h4>
<div class="kb-figure">
  <img src="/images/blog/full-text-tracking-options.png" alt="monitor reader mode">
</div>
<p>Monitoring <strong>Content Only</strong> is the first step to reduce false positives. This option filters out common page elements like headers, navigation menus, sidebars, and footers, focusing only on the main content area of the page. It's an effective way to eliminate noise from less relevant sections while still capturing most important content changes.</p>
<p><strong>Reader mode</strong> takes content filtering a step further, similar to the reader mode you <a href="https://support.apple.com/en-gb/guide/iphone/iphdc30e3b86/ios">may have used on your phone</a>. This mode monitors only the primary article text, using advanced algorithms to identify and extract the core content while filtering out everything else.</p>
<p>Reader mode is more restrictive than "Content Only" and works best for:</p>
<ul>
<li><strong>News articles</strong> and blog posts with clear article structure</li>
<li><strong>Documentation pages</strong> with structured content</li>
<li><strong>Research papers</strong> and academic content</li>
<li><strong>Press releases</strong> and announcements</li>
<li><strong>Tutorial and how-to articles</strong></li>
<li><strong>Terms of service</strong> and privacy policy pages</li>
<li><strong>Legal documents</strong> and policy updates</li>
</ul>
<p>However, Reader mode may not work well on:</p>
<ul>
<li><strong>Landing pages</strong> with mixed content types</li>
<li><strong>E-commerce product pages</strong> with specifications, reviews, and pricing</li>
<li><strong>Dashboard pages</strong> with multiple data sections</li>
<li><strong>Pages with pricing tables, feature lists, or comparison charts</strong></li>
<li><strong>Forum discussions</strong> or comment sections</li>
<li><strong>Complex layouts</strong> with multiple content blocks</li>
</ul>
<div style="background:#f5f7fa;border:1px solid #e6e9ef;border-radius:8px;padding:14px 18px;margin:20px 0;">
  <strong>Note:</strong> If you find that important content changes are being missed, consider switching back to "Content Only" for broader coverage.
</div>
<h4>When to Be More Selective</h4>
<div class="kb-figure">
  <img src="/images/blog/text-tracked-element.png" alt="tracking text element">
</div>
<p>If tracking "Content Only" or "Reader mode" still results in unnecessary notifications, switch to the "Text" tracked element type and use our "Visual Selector" (click on the blue button) to pinpoint the exact area you want to monitor. Be aware that significant page redesigns can cause these selectors to stop working.</p>
<p><strong>Advanced Tips:</strong></p>
<ul>
<li><strong>AI Suggest feature:</strong> You may use "AI Suggest" when adding a new page to monitor. Describe what you want to monitor (e.g., "product price" or "availability status"), and PageCrawl's AI will suggest an optimal monitoring configuration for you.</li>
<li><strong>Manual selectors:</strong> For maximum precision, <a href="/help/tutorials/article/common-xpath-selectors">manually create CSS or XPath selectors</a> to track specific sections of the page. This approach works best for users with a technical background, but you can also use tools like ChatGPT to craft selectors by pasting the relevant HTML code.</li>
</ul>
<h3>2. Filter Out Irrelevant Updates</h3>
<div class="kb-figure">
  <img src="/images/blog/date-example.png" alt="monitor footer">
</div>
<p>Websites frequently undergo minor updates, such as date changes, without substantial alterations to their content. These small updates can create unnecessary alerts that distract from meaningful changes. Here's how to avoid them.</p>
<h4>Ignore Repeatedly Changing Text</h4>
<div class="kb-figure">
  <img src="/images/blog/select-timeline-text.png" alt="select text to ignore in timeline">
</div>
<p>In Timeline, when reviewing detected changes, you can select irrelevant text and ignore any line that contains the selected text. For example, if a page has a section with a latest news headline like "Latest News: Bitcoin has reached a new all-time high," you can select "Latest News" and all lines containing this text will be ignored in future change detections. If you monitor multiple pages on the same website, this will be applied to all pages with the same domain name.</p>
<p>Alternatively, add an "Ignore Text" condition to the page, or create a global filter in your team settings to ignore the text across all pages. Use % as a wildcard: for example, %specific word% ignores any line that contains that word or sentence.</p>
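<p>To illustrate how a <code>%</code> pattern is read, here is a minimal sketch that treats <code>%</code> like the wildcard in SQL <code>LIKE</code>: it matches any run of characters, and the pattern must cover the whole line. This is an assumption for illustration, not PageCrawl's actual implementation. Details such as case sensitivity may differ.</p>
<pre><code class="language-javascript">// Hypothetical sketch: turn a %-wildcard pattern into a RegExp.
// Assumes SQL LIKE-style semantics ('%' = any run of characters, and the
// pattern must cover the whole line); PageCrawl's real rules may differ.
function wildcardToRegExp(pattern) {
  const body = pattern
    .split("%")
    .map(function (part) {
      // Escape regex metacharacters so '.', '$', '(' etc. match literally.
      return part.replace(/[.*+?^${}()|[\]\\]/g, function (ch) { return "\\" + ch; });
    })
    .join(".*");
  return new RegExp("^" + body + "$");
}

// Drop every line that matches at least one ignore pattern.
function removeIgnoredLines(text, patterns) {
  const regexes = patterns.map(wildcardToRegExp);
  return text
    .split("\n")
    .filter(function (line) {
      return !regexes.some(function (re) { return re.test(line); });
    })
    .join("\n");
}</code></pre>
<p>Under these assumed rules, <code>%Latest News%</code> removes any line that mentions "Latest News", while <code>Latest News%</code> would only remove lines that start with it.</p>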
<h4>Remove Specific Page Elements</h4>
<div class="kb-figure">
  <img src="/images/blog/remove-element.png" alt="action remove page elements">
</div>
<p>If a specific page area keeps triggering change detections, add a "Remove page element" action and select an area to suppress it completely.</p>
<h4>Remove Dates</h4>
<div class="kb-figure">
  <img src="/images/blog/remove-dates.png" alt="action remove dates">
</div>
<p>Use the <strong>"Remove dates"</strong> action to replace dates with placeholders like [DATE REMOVED]. This prevents alerts for irrelevant updates like "updated 3 minutes ago" or publication timestamps such as "Updated at: 2025-02-25" that change frequently even when nothing was updated on the page.</p>
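<p>Conceptually, a date-removal step normalizes both snapshots before they are compared, so two versions that differ only in a timestamp compare as equal. The sketch below shows the idea with two illustrative patterns: ISO dates and phrases like "3 minutes ago". It is hypothetical. PageCrawl's own date detection is not described here and likely recognizes many more formats.</p>
<pre><code class="language-javascript">// Hypothetical sketch of a "remove dates" step: replace date-like text with a
// placeholder so snapshots that differ only in dates compare as equal.
const DATE_PATTERNS = [
  /\b\d{4}-\d{2}-\d{2}\b/g, // ISO dates such as 2025-02-25
  /\b\d+\s+(?:second|minute|hour|day|week|month|year)s?\s+ago\b/gi // "3 minutes ago"
];

function removeDates(text) {
  // Apply each pattern in turn to the progressively cleaned text.
  return DATE_PATTERNS.reduce(function (result, pattern) {
    return result.replace(pattern, "[DATE REMOVED]");
  }, text);
}</code></pre>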
<h4>Set a Change Threshold</h4>
<div class="kb-figure">
  <img src="/images/blog/change-threshold.png" alt="change detection threshold">
</div>
<p>You can configure a threshold to be alerted only when significant changes occur (e.g., when more than 1% of the page content changes). Before setting the threshold, review historic changes in Timeline to avoid setting it too high and missing important updates.</p>
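<p>The guide does not specify how PageCrawl measures the percentage of changed content. As a hedged illustration of how a threshold works, the sketch below assumes a simple line-based metric: the number of added plus removed lines, divided by the total number of lines in both snapshots. The function names are hypothetical.</p>
<pre><code class="language-javascript">// Hypothetical change metric: the share of lines that were added or removed,
// relative to all lines in both snapshots (a full rewrite scores 100%).
function changedPercent(oldText, newText) {
  const oldLines = oldText.split("\n");
  const newLines = newText.split("\n");
  const oldSet = new Set(oldLines);
  const newSet = new Set(newLines);
  const added = newLines.filter(function (line) { return !oldSet.has(line); }).length;
  const removed = oldLines.filter(function (line) { return !newSet.has(line); }).length;
  return ((added + removed) * 100) / (oldLines.length + newLines.length);
}

// Notify only when the change is larger than the configured threshold.
function shouldNotify(oldText, newText, thresholdPercent) {
  return changedPercent(oldText, newText) > thresholdPercent;
}</code></pre>
<p>Under this metric, editing one line on a 100-line page scores exactly 1%. A 1% threshold would therefore not trigger a notification, while 0.5% would. This is why it helps to review past changes in Timeline before choosing a value.</p>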
<h4>Ignore Numbers</h4>
<div class="kb-figure">
  <img src="/images/blog/ignore-numbers.png" alt="ignore changed numbers">
</div>
<p>If numeric changes aren't relevant to you, you can add an "Ignore numbers" condition in the "Conditions &amp; Filters" section to prevent number changes from triggering change detections. This is particularly useful for pages with counters, view counts, or other metrics that change frequently.</p>
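<p>The effect of ignoring numbers can be sketched as masking every number before the comparison. This is a hypothetical illustration, not PageCrawl's implementation. The pattern below treats a run of digits with optional thousands separators or decimals (such as <code>1,234</code> or <code>19.99</code>) as one number.</p>
<pre><code class="language-javascript">// Hypothetical sketch of "Ignore numbers": mask every number before comparing,
// so counters such as "1,234 views" vs "1,240 views" no longer differ.
function maskNumbers(text) {
  // A run of digits, optionally with thousands separators or decimals (1,234 / 19.99).
  return text.replace(/\d+(?:[.,]\d+)*/g, "#");
}

// True only if the texts still differ after numbers are masked.
function differsIgnoringNumbers(oldText, newText) {
  return maskNumbers(oldText) !== maskNumbers(newText);
}</code></pre>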
<h3>3. Let AI Help Reduce False Positives</h3>
<p>PageCrawl uses AI to analyze every detected change and help you focus on what matters most.</p>
<h4>How AI Analysis Works</h4>
<p>When a change is detected, our AI:</p>
<ul>
<li><strong>Summarizes the change</strong> in plain language so you can quickly understand what happened</li>
<li><strong>Assigns a priority score</strong> to indicate how important the change likely is</li>
<li><strong>Sorts your notifications</strong> so the most significant changes appear first</li>
</ul>
<h4>Provide Feedback on Changes</h4>
<p>Use the feedback buttons to tell us which changes matter to you:</p>
<div style="display: inline-flex; gap: 8px; align-items: center; padding: 8px 12px; background: #fafafa; border-radius: 4px; margin: 12px 0;">
  <span style="cursor: pointer; padding: 2px 4px; color: #bfbfbf; font-size: 14px;" title="Useful change">
    <svg width="14" height="14" viewBox="0 0 512 512" fill="currentColor"><path d="M313.4 32.9c26 5.2 42.9 30.5 37.7 56.5l-2.3 11.4c-5.3 26.7-15.1 52.1-28.8 75.2H464c26.5 0 48 21.5 48 48c0 18.5-10.5 34.6-25.9 42.6C497 275.4 504 288.9 504 304c0 23.4-16.8 42.9-38.9 47.1c4.4 7.3 6.9 15.8 6.9 24.9c0 21.3-13.9 39.4-33.1 45.6c.7 3.3 1.1 6.8 1.1 10.4c0 26.5-21.5 48-48 48H294.5c-19 0-37.5-5.6-53.3-16.1l-38.5-25.7C176 420.4 160 390.4 160 358.3V320 272 247.1c0-29.2 13.3-56.7 36-75l7.4-5.9c26.5-21.2 44.6-51 51.2-84.2l2.3-11.4c5.2-26 30.5-42.9 56.5-37.7zM32 192H96c17.7 0 32 14.3 32 32V448c0 17.7-14.3 32-32 32H32c-17.7 0-32-14.3-32-32V224c0-17.7 14.3-32 32-32z"/></svg>
  </span>
  <span style="cursor: pointer; padding: 2px 4px; color: #bfbfbf; font-size: 14px;" title="Not useful">
    <svg width="14" height="14" viewBox="0 0 512 512" fill="currentColor"><path d="M313.4 479.1c26-5.2 42.9-30.5 37.7-56.5l-2.3-11.4c-5.3-26.7-15.1-52.1-28.8-75.2H464c26.5 0 48-21.5 48-48c0-18.5-10.5-34.6-25.9-42.6C497 236.6 504 223.1 504 208c0-23.4-16.8-42.9-38.9-47.1c4.4-7.3 6.9-15.8 6.9-24.9c0-21.3-13.9-39.4-33.1-45.6c.7-3.3 1.1-6.8 1.1-10.4c0-26.5-21.5-48-48-48H294.5c-19 0-37.5 5.6-53.3 16.1L202.7 73.8C176 91.6 160 121.6 160 153.7V192v48 24.9c0 29.2 13.3 56.7 36 75l7.4 5.9c26.5 21.2 44.6 51 51.2 84.2l2.3 11.4c5.2 26 30.5 42.9 56.5 37.7zM32 384H96c17.7 0 32-14.3 32-32V128c0-17.7-14.3-32-32-32H32c-17.7 0-32 14.3-32 32V352c0 17.7 14.3 32 32 32z"/></svg>
  </span>
</div>
<ul>
<li><strong>Thumbs up</strong>: This change is useful or important</li>
<li><strong>Thumbs down</strong>: This change is noise or irrelevant</li>
</ul>
<p>You can provide feedback:</p>
<ul>
<li>On the <strong>page view</strong> when reviewing changes</li>
<li>Directly from <strong>email notifications</strong> using the quick-action links</li>
</ul>
<h3>4. Handling Dynamic Content</h3>
<p>Dynamic websites load or update parts of their content after the initial page load. For example, prices, stock availability, or user-specific recommendations might load dynamically, leading to unnecessary notifications. Here's how to handle these scenarios.</p>
<h4>Expand Collapsed Sections and Hidden Content</h4>
<div class="kb-figure">
  <img src="/images/blog/reveal-hidden-text.png" alt="reveal hidden text">
  <img src="/images/blog/accordion.png" alt="collapsed sections">
</div>
<p>In "Full-page text" mode, PageCrawl only captures text that is visible on the page. This can be a problem if the page contains collapsible sections (accordions, panels, etc.) whose content is only revealed when clicked.</p>
<p>To address this, add the "Reveal hidden text" action, which will automatically expand any collapsed sections on the page before capturing content.</p>
<h4>Wait Until Page is Fully Loaded</h4>
<div class="kb-figure">
  <img src="/images/blog/wait-until-actions.png" alt="wait until page loaded">
</div>
<p>By default, PageCrawl waits until the page is fully loaded. However, some page elements only appear after additional time, or after specific actions have run (clicking, form submission, redirects, etc.).</p>
<p>You can add wait actions to ensure the page is completely ready before capturing content. Multiple "Wait" actions are available:</p>
<ul>
<li><strong>"Wait for Text to appear":</strong> Waits until specific text appears on the page.</li>
<li><strong>"Wait for Text to disappear":</strong> Waits until specific text disappears from the page.</li>
<li><strong>"Wait for page element to appear":</strong> Waits for a specific page element to become visible.</li>
<li><strong>"Wait for Redirect":</strong> Waits for page redirects to complete. This is especially helpful when redirects are not immediate and take longer to process.</li>
<li><strong>"Wait for Seconds":</strong> Waits for a fixed 1 to 9 seconds (the least recommended option).</li>
</ul>
<div style="background:#f5f7fa;border:1px solid #e6e9ef;border-radius:8px;padding:14px 18px;margin:20px 0;">
  <strong>Note:</strong> Actions wait for the page to settle before continuing. To avoid unnecessarily long waits, each plan has a maximum page-load timeout, with higher plans allowing more time. If a page takes longer than its timeout to load, the check results in a timeout error.
</div>
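<p>Conceptually, the "Wait for Text to appear" action behaves like polling with a timeout. The Python sketch below assumes a hypothetical <code>get_page_text</code> callback that returns the page's current text. It illustrates the technique only and is not how PageCrawl's browser actions are implemented.</p>

```python
import time
from typing import Callable


def wait_for_text(get_page_text: Callable[[], str], needle: str,
                  timeout: float = 10.0, interval: float = 0.5) -> bool:
    """Poll the page text until `needle` appears or `timeout` seconds pass.

    Returns True if the text appeared and False on timeout.
    `get_page_text` is a hypothetical callback returning the current page text.
    """
    deadline = time.monotonic() + timeout
    while True:
        if needle in get_page_text():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # Simulated page whose price appears on the third poll.
    snapshots = iter(["Loading...", "Loading...", "Price: $19.99"])
    state = {"text": ""}

    def fake_page() -> str:
        state["text"] = next(snapshots, state["text"])
        return state["text"]

    print(wait_for_text(fake_page, "Price:", timeout=2, interval=0.01))  # True
```

<p>Polling returns as soon as the text appears. A condition-based wait is therefore usually faster and more reliable than a fixed "Wait for Seconds" delay.</p>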
<h3>5. Changes in Headers, Footers, and Sidebars</h3>
<div class="kb-figure">
  <img src="/images/blog/changing-footer.webp" alt="monitor footer">
</div>
<p>Frequently updated areas like footers, headers, and sidebars can result in irrelevant notifications. These sections often include changing elements such as timestamps, menus, or recent updates that are unrelated to the main content.</p>
<h4>How to Avoid This</h4>
<ol>
<li><strong>Switch to "Content Only":</strong> When tracking the full page, this option automatically filters out these less important areas. Change the Element from "Everything on the page" to "Content Only."</li>
<li><strong>Remove Specific Elements:</strong> Use the "Remove page element" action with the selector <code>nav,aside,footer,.footer,header</code> to exclude these areas. This directly alters the page, so these areas will also be missing from screenshots. This approach is useful when your Tracked Element is something other than "Full-page text."</li>
<li><strong>Focus on the Main Section:</strong> Track only the main content using the "Text" tracked element and the <code>main</code> selector. If no such element exists (e.g., the website is not semantically structured), you will see a "No selector found" error.</li>
</ol>
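<p>You can approximate how removing <code>nav,aside,footer,.footer,header</code> affects the captured text with Python's standard <code>html.parser</code> module. This sketch is not how PageCrawl removes elements. It assumes well-formed HTML in which every non-void element is closed. The class and function names are hypothetical.</p>

```python
from html.parser import HTMLParser

# Mirrors the selector nav,aside,footer,.footer,header.
SKIP_TAGS = {"nav", "aside", "footer", "header"}
SKIP_CLASS = "footer"
# Void elements never have an end tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class MainTextExtractor(HTMLParser):
    """Collects text while skipping header, footer, and sidebar elements."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0  # non-zero while inside a removed element
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.skip_depth or tag in SKIP_TAGS or SKIP_CLASS in classes:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if self.skip_depth and tag not in VOID_TAGS:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.parts.append(data.strip())


def main_text(html: str) -> str:
    """Return the page text with the removed sections left out."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)
```

<p>Given a page with a header, a navigation bar, a main section, and a <code>.footer</code> block, <code>main_text</code> returns only the main section's text.</p>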
<h3>6. Page Errors or Blank Content</h3>
<div class="kb-figure">
  <img src="/images/blog/bot-protection-guard.png" alt="handling monitoring errors">
</div>
<p>Occasionally, a monitored page may fail to load properly, leading to blank content or error messages. PageCrawl detects most of these situations, but some can still trigger false positives. This often happens when a website doesn't report errors properly, relies on external data sources that fail to load, or doesn't display dynamic content correctly.</p>
<h4>How to Avoid This</h4>
<p>Use the <strong>"Mark Check as Failed When"</strong> action to flag a page as failed without recording changes. For example:</p>
<ul>
<li>If a product's price unexpectedly drops to $0 due to an error and a message such as "Not available" is shown, PageCrawl can mark the page as failed instead of notifying you about a false change from $9.99 to $0.00.<ul>
<li>Add "Mark Check as Failed When" with "Text Contains" set to "Not available"</li>
</ul>
</li>
</ul>
<p>Additionally, customize the "Report Errors" setting to trigger only after a certain number of consecutive failures (e.g., after 10 consecutive failed checks) to avoid being overwhelmed by temporary issues.</p>
<p>If you check pages frequently, make sure the "Delay when Failed" setting (under Advanced Preferences) is deactivated. Otherwise, page failures can reduce how often the page is checked.</p>
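<p>The interaction between "Mark Check as Failed When" and a consecutive-failure threshold for "Report Errors" can be sketched as a small state machine. This Python sketch assumes the rule is "Text Contains: Not available" and the threshold is 10. The <code>Monitor</code> class and its return values are hypothetical and only illustrate the behavior described above.</p>

```python
from dataclasses import dataclass

# Hypothetical rule mirroring "Mark Check as Failed When" + "Text Contains".
FAIL_IF_CONTAINS = "Not available"
# Hypothetical "Report Errors" threshold: report after this many failures in a row.
REPORT_AFTER = 10


@dataclass
class Monitor:
    last_text: str = ""  # last successfully recorded content
    consecutive_failures: int = 0

    def process(self, page_text: str) -> str:
        """Classify one check as failed, error_reported, changed or unchanged."""
        if FAIL_IF_CONTAINS in page_text:
            # Failed checks are not recorded, so last_text keeps the good version.
            self.consecutive_failures += 1
            if self.consecutive_failures == REPORT_AFTER:
                return "error_reported"  # report once when the threshold is hit
            return "failed"
        self.consecutive_failures = 0  # any successful check resets the streak
        if page_text != self.last_text:
            self.last_text = page_text
            return "changed"
        return "unchanged"


if __name__ == "__main__":
    monitor = Monitor(last_text="Price: $9.99")
    print(monitor.process("Price: $0.00 Not available"))  # failed
    print(monitor.process("Price: $9.99"))  # unchanged
```

<p>In this sketch, a failed check never updates <code>last_text</code>, so a temporary "$0.00" page cannot produce a false change.</p>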
<h3>7. Appearing/Disappearing Content</h3>
<div class="kb-figure">
  <img src="/images/blog/conditional.png" alt="monitor conditional pages">
</div>
<p>Websites may display varying content based on user sessions, location, or elements that frequently appear and disappear. This can lead to false positive notifications.</p>
<h4>Smart Suggestions</h4>
<p>Once sufficient sample data is collected, PageCrawl will automatically suggest filters to reduce false triggers. Look for the <strong>"Frequently changing content detected"</strong> panel on your monitored page.</p>
<div class="kb-figure">
  <img src="/images/blog/suggested-actions.png" alt="suggested actions">
</div>
<p>You can:</p>
<ul>
<li><strong>Click on text fragments</strong> to add them to your ignore list</li>
<li><strong>Click "Ignore all above"</strong> to ignore all suggested items at once</li>
<li><strong>Use "Ignore all numbers"</strong> if numeric changes aren't relevant</li>
</ul>
<h4>Provide Feedback</h4>
<p>For changes that slip through, use the <strong>thumbs down</strong> button to mark them as noise.</p>
<h4>Additional Solutions</h4>
<ol>
<li><strong>Ensure the page is fully loaded</strong>: Add a "Wait" action that waits for specific text or elements to appear on the page before content is captured.</li>
<li><strong>Consider deactivating "Intelligent Reconnect"</strong> if the page content changes depending on the user's location or session (found under Advanced Preferences).</li>
</ol>
<h3>8. Cookie Banners and Overlay Popups <em>(Default Settings)</em></h3>
<div class="kb-figure">
  <img src="/images/blog/webcookies.jpeg" alt="blocking cookies">
</div>
<p>By default, PageCrawl enables the <strong>"Block cookie banners and ads"</strong> and <strong>"Hide website overlays and popups"</strong> actions to reduce unnecessary notifications. You can disable either action if you don't need it.</p>
<div class="kb-figure">
  <img src="/images/blog/actions-cookies.png" alt="blocking cookies action">
</div>
<h4>Cookie Banners</h4>
<p>Cookie banners often appear dynamically after the page loads, altering the content and triggering false positives.</p>
<ul>
<li><strong>Default Setting</strong>: Cookie banners are automatically suppressed during monitoring.</li>
<li><strong>Optional</strong>: You can disable this feature in your settings if necessary.</li>
</ul>
<h4>Overlay Popups</h4>
<p>Overlay popups, such as ads or newsletter subscription prompts, may appear sporadically and interfere with accurate monitoring.</p>
<ul>
<li><strong>Default Setting</strong>: PageCrawl hides overlay popups by default to ensure they don’t trigger false positives.</li>
<li><strong>Optional</strong>: This feature can also be turned off if not required.</li>
</ul>
<p>These default settings simplify the monitoring process but can be adjusted based on your specific needs.</p>
<h3>9. Scroll-Triggered Content</h3>
<p>Sometimes pages use animations to reveal content sections that only appear as you scroll down the page.</p>
<h4>Solutions</h4>
<ol>
<li><strong>Use the "Scroll to Bottom" action</strong> to automatically scroll to the bottom of the page before capturing content.</li>
<li><strong>Use the "Disable JavaScript" action</strong>, which will likely disable all animations. Note that this may stop dynamic content from loading on some websites.</li>
</ol>
<hr />
<h2>Conclusion</h2>
<p>By implementing these strategies, you can significantly reduce false positive notifications when monitoring websites with PageCrawl.</p>
<p><strong>Quick wins for reducing false positives:</strong></p>
<ol>
<li>Start with "Content Only" or "Reader mode" for text tracking</li>
<li>Use the <strong>thumbs down button</strong> to mark irrelevant changes</li>
<li>Review and apply <strong>suggested filters</strong> when they appear</li>
<li>Set up appropriate filters for dates, numbers, and repeated text</li>
</ol>
<p>Remember:</p>
<ul>
<li>AI analysis helps prioritize important changes</li>
<li>Regularly review your settings and filters</li>
<li>Use the suggested actions when they appear</li>
<li>Test different approaches to find what works best for your specific use case</li>
</ul>
<p>With proper configuration and ongoing fine-tuning, you'll achieve efficient and reliable website change monitoring.</p>
<p>If you're still experiencing issues with false positives after trying these solutions, don't hesitate to contact our support team for personalized assistance with your specific monitoring setup.</p>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[How to Track All Pages Within a Website]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/tutorials/article/track-all-pages-within-website-for-changes" />
            <id>https://pagecrawl.io/50</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>How to Track All Pages Within a Website</h1>
<p>PageCrawl.io is a powerful website change monitoring tool designed to help you effortlessly keep track of all the pages within your website. One of its standout features is the ability to crawl a website and automatically discover all of its pages, much like Google's indexing process. This article guides you through using PageCrawl.io to track and manage all pages within your website.</p>
<p>The first step in enabling automatic discovery of all pages within a website is to create a template in PageCrawl.io.</p>
<h4>Setting Up Automatic Page Discovery</h4>
<div class="kb-figure">
  <img src="/images/knowledge/discovery-add-website.png" alt="Add Website for Discovery dialog with the website URL, what-to-track options, and scan frequency">
</div>
<ol>
<li>
<p><strong>Create a Template:</strong> </p>
<ul>
<li>
<p><strong>Provide Sample URL:</strong> A sample URL lets PageCrawl automatically set up common parameters, such as the Base Discovery URL and filters, and detect any sitemaps on the site.</p>
</li>
<li>
<p><strong>Activate Automatic Page Discovery:</strong> Enable this feature to automatically uncover new pages as they're added to the site.</p>
</li>
<li>
<p><strong>Choose Your Crawling Method:</strong></p>
<ul>
<li>
<p><strong>Sitemap only:</strong> Best if the tracked site has a sitemap.xml file listing all of its pages.</p>
</li>
<li>
<p><strong>Homepage links only:</strong> Start the crawl from your provided URL, discovering pages through links on the homepage.</p>
</li>
<li>
<p><strong>Follow links 2 levels deep</strong> / <strong>Follow links 3 levels deep:</strong> Opt for an extensive exploration, ensuring maximum page coverage by following links across multiple levels. These options are available on Enterprise/Ultimate plans only.</p>
</li>
<li>
<p><strong>Automatic (recommended):</strong> Uses all available methods for page discovery.</p>
</li>
</ul>
</li>
</ul>
</li>
<li>
<p><strong>Configuration:</strong> Fine-tune additional settings like tracked elements to monitor, update frequency, and specific directories for inclusion or exclusion.</p>
</li>
<li>
<p><strong>Apply and Save:</strong> Save your template settings and apply them to the relevant projects within your PageCrawl.io account.</p>
</li>
<li>
<p><strong>Wait</strong> for newly discovered pages to appear in your PageCrawl.io account.</p>
</li>
</ol>
<h4>Leveraging Automatic Page Discovery for Thorough Tracking</h4>
<p>Once your template is in place, PageCrawl.io systematically discovers and indexes all available pages within your website.</p>
<ul>
<li>
<p><strong>Review Discovered Pages:</strong> Once page discovery completes, navigate through a detailed list of discovered URLs within the dashboard.</p>
</li>
<li>
<p><strong>Customized Monitoring:</strong> Set up tailored monitoring for specific pages or sections, configuring alerts to notify you of any modifications.</p>
</li>
<li>
<p><strong>Content Change Insights:</strong> Review content changes over time to spot updates, removals, or additions across your monitored pages.</p>
</li>
<li>
<p><strong>Optimization:</strong> Employ the insights gathered to optimize your website, refining user experience, enhancing SEO strategies, and rectifying any issues spotted during the crawl.</p>
</li>
</ul>
<h4>In Conclusion</h4>
<p>PageCrawl.io's automatic page discovery feature simplifies monitoring all pages within a website. By following these steps, you can efficiently manage, monitor, and stay up to date on your website's content, ensuring an informed approach to website management.</p>
<p>For further guidance or inquiries, consult PageCrawl.io's support resources or reach out to our customer service team.</p>
<p>Happy tracking!</p>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Hiding Popup Overlays When Monitoring Pages for Changes]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/reduce-false-positives/article/automatically-hiding-overlays-to-avoid-popups-from-triggering-notifications" />
            <id>https://pagecrawl.io/51</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Hiding Popup Overlays When Monitoring Pages for Changes</h1>
<div class="kb-figure">
  <img src="/images/blog/overlay-example.png" alt="block cookies">
</div>
<p>When you visit a website for the first time, you may sometimes encounter an annoying ad or offer that overlays the content. While this is usually not a problem when monitoring websites for changes, it can still sometimes cause false-positive alerts if screenshots capture the content overlaid with the popup. These popups may only appear once, or for specific visitors or geographic locations.</p>
<h4>The "Hide Website Overlays &amp; Popups" Action</h4>
<p>To mitigate false positives, we highly recommend using the "Hide website overlays &amp; popups" action on affected pages. Keep in mind that this may not work on all pages.</p>
<div class="kb-figure">
  <img src="/images/knowledge/settings-actions.png" alt="Actions section of the page editor with the Hide website overlays and popups action added">
</div>
<h4>Alternative Approach</h4>
<p>If the "Hide website overlays &amp; popups" action does not work, or if it makes all content on the page invisible, you can manually target the overlay with the <a href="/help/reduce-false-positives/article/how-to-exclude-page-section">"Remove page element" action</a> to exclude it.</p>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Keep an HTML Record of a Page Without Being Notified of Minor Changes]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/reduce-false-positives/article/keep-html-record-but-not-be-notified-of-changes" />
            <id>https://pagecrawl.io/52</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Keep an HTML Record of a Page Without Being Notified of Minor Changes</h1>
<div class="kb-figure">
  <img src="/images/knowledge/settings-tracked-elements.png" alt="Tracked Elements section in Advanced mode showing the TYPE, ELEMENT, and THRESHOLD columns for each element">
</div>
<p>When monitoring web pages, you might find it useful to keep a historical HTML record for future reference. However, minor changes, such as dynamic updates to attributes, styles, or tags, can often trigger unnecessary alerts. These changes, while technically present in the HTML, might not affect the visual representation or the substantive content of the page.</p>
<p><strong>Focus on Text Content</strong>: By monitoring the text content of a page rather than its HTML structure, you can significantly reduce the number of false alerts. Text content changes are more likely to represent meaningful updates to the page.</p>
<p><strong>Use Multiple Tracked Elements</strong>: You can add several tracked elements to a single page. This lets you record HTML changes for reference while only receiving notifications for the elements that matter most.</p>
<h3>Set the "Do Not Trigger" Threshold</h3>
<div class="kb-figure">
  <img src="/images/blog/filters-do-not-trigger.png" alt="monitor html but not trigger notifications">
</div>
<p>The "Do not trigger" option is a threshold setting on individual tracked elements. It records changes for that element without sending any notifications. Here is how to set it up:</p>
<ol>
<li>Open the page editor and go to the <strong>Tracked Elements</strong> section.</li>
<li>Make sure you have <strong>at least two tracked elements</strong> (for example, a Text element and an HTML element). The "Do not trigger" option only appears when more than one tracked element is configured.</li>
<li>On the HTML tracked element, open the <strong>Threshold</strong> dropdown.</li>
<li>Select <strong>"Do not trigger"</strong>.</li>
<li>Save the page.</li>
</ol>
<p>With this configuration, PageCrawl will continue recording HTML changes in the timeline for future reference, but only changes on your other tracked elements (such as Text) will trigger notifications.</p>
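<p>The rule fits in a few lines of Python. This is a hypothetical model, not PageCrawl's code. Each tracked element carries a <code>do_not_trigger</code> flag, and a notification is sent only if at least one changed element does not have it set.</p>

```python
from dataclasses import dataclass


@dataclass
class TrackedElement:
    name: str
    do_not_trigger: bool = False  # models the "Do not trigger" threshold


def should_notify(elements: list[TrackedElement], changed: set[str]) -> bool:
    """Return True if any changed element is allowed to trigger notifications.

    Changes on "Do not trigger" elements would still be recorded in the
    timeline; they just never cause a notification on their own.
    """
    return any(e.name in changed and not e.do_not_trigger for e in elements)


if __name__ == "__main__":
    elements = [TrackedElement("Text"), TrackedElement("HTML", do_not_trigger=True)]
    print(should_notify(elements, {"HTML"}))  # False: recorded, no alert
    print(should_notify(elements, {"HTML", "Text"}))  # True
```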
<p>By carefully adjusting your monitoring settings, you can ensure that you are alerted only to significant changes that impact the content's meaning or visual presentation. This approach helps maintain the effectiveness of your monitoring efforts without the distraction of frequent, unnecessary notifications.</p>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Can I pay by Crypto?]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/subscription/article/can-i-pay-using-crypto" />
            <id>https://pagecrawl.io/53</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Can I pay by Crypto?</h1>
<p>Yes, we support cryptocurrency payments for <strong>Ultimate plans paid annually</strong>.</p>
<p>To arrange payment, please contact support at <a href="mailto:support@pagecrawl.io">support@pagecrawl.io</a>.</p>]]>
            </summary>
                                    <updated>2026-03-05T10:31:13+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Automatically Discover New Pages To Track]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/features/article/page-discovery" />
            <id>https://pagecrawl.io/54</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Automatically Discover New Pages To Track</h1>
<p>PageCrawl is designed to make website change monitoring and management seamless. Page Discovery automatically finds new pages on a website and starts monitoring them for you, so your coverage stays up to date as the site grows. The fastest way to get started is to add a whole website from the <strong>Discovered Pages</strong> page, then fine-tune how it works from the template that gets created for you.</p>
<h3>Add a Website to Discover</h3>
<p>Start from the <a href="/app/discovered-pages">Discovered Pages</a> page and click <strong>Add Website</strong>. This is the quickest way to begin: enter a website and PageCrawl handles the rest.</p>
<div class="kb-figure kb-figure--narrow">
  <img src="/images/knowledge/discovery-add-website-new.png" alt="Add Website for Discovery dialog with the website URL, what to track options, check frequency, and notification settings">
</div>
<p>In the dialog, set:</p>
<ol>
<li><strong>Website URL</strong> - the site you want to monitor (for example, a competitor's store or your own marketing site).</li>
<li><strong>What to track</strong> - choose to track every discovered page, only top-level pages (like <code>/pricing</code> and <code>/about</code>), or review pages yourself before anything is monitored.</li>
<li><strong>Check Frequency</strong> - how often discovered pages are checked for changes.</li>
<li><strong>Notify me via</strong> - the channels that should receive change alerts.</li>
</ol>
<p>Click <strong>Add Website</strong>. PageCrawl scans the site, lists everything it finds on the Discovered Pages page, and creates a <strong>template</strong> with sensible defaults that controls how this website is monitored.</p>
<h3>Adjust Advanced Configuration via the Template</h3>
<p>Adding a website automatically creates a configured <strong>template</strong> for it. The template is where you fine-tune scanning, filters, and what gets tracked. Open it from <a href="/app/settings/workspace/templates">Templates settings</a> (or the "Templates settings" link in the Add Website dialog) and edit the template for your website.</p>
<h4>Choose a Scanning Method</h4>
<p>The template's <strong>Discover New Pages</strong> setting controls how PageCrawl looks for new pages. The default is <strong>Automatic (recommended)</strong>, which combines methods to find pages using the best approach for each website:</p>
<ul>
<li><strong>Automatic (recommended)</strong>: Combines sitemap and link discovery to find pages using the best method for the website. This is the default and recommended setting.</li>
<li><strong>Homepage Links Only</strong>: Discover new pages by following links on the homepage. Available as a daily or weekly check. Useful if you want to focus on pages directly linked from the main page.</li>
<li><strong>Sitemap Only</strong>: Discover pages listed in the website's sitemap. Most websites have a sitemap to help search engines find their pages, making this an efficient method for large sites.</li>
<li><strong>Follow Links 2 Levels Deep</strong>: Follows links on the homepage, then follows links on those pages too. Available as a weekly check. Note: Only available on Enterprise and Ultimate plans.</li>
<li><strong>Follow Links 3 Levels Deep</strong>: Follows links on the homepage, then follows links two more levels deep. Available as a weekly check. Note: Only available on Enterprise and Ultimate plans.</li>
<li><strong>Deep Scan</strong>: Conduct a comprehensive analysis by visiting every accessible page on your website. This ensures that no new links go unnoticed, even on deeply nested pages. Note: Only available on Enterprise and Ultimate plans.</li>
</ul>
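<p>The "Follow Links N Levels Deep" methods are, in essence, a breadth-first crawl with a depth limit. The Python sketch below uses an in-memory <code>links</code> mapping in place of fetching and parsing real pages. It assumes depth 1 corresponds to "Homepage Links Only" and illustrates the technique rather than PageCrawl's crawler.</p>

```python
from collections import deque


def discover(links: dict[str, list[str]], homepage: str, max_depth: int) -> set[str]:
    """Breadth-first link discovery limited to `max_depth` levels.

    `links` maps each URL to the URLs linked from it; a real crawler would
    build this by fetching and parsing each page.
    Depth 1 = links on the homepage, depth 2 = also links on those pages, etc.
    """
    seen = {homepage}
    queue = deque([(homepage, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow links any deeper
        for link in links.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    seen.discard(homepage)  # the homepage is the starting point, not a discovery
    return seen
```

<p>Breadth-first order visits every page at one level before moving to the next. Each extra level can therefore add many more pages to fetch, which is why deeper scans tend to run less often.</p>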
<h4>Apply Include and Exclude Filters</h4>
<p>Use the template's filters to control exactly which discovered pages are monitored, so you avoid tracking irrelevant pages.</p>
<div class="kb-figure">
  <img src="/images/knowledge/discovery-template-filters.png" alt="Discovery filter controls in the template: include and exclude rules, Track All Pages, and a tracked page limit">
</div>
<ul>
<li><strong>Include rules</strong>: Specify keywords or patterns that a page must match to be tracked. Useful for tracking specific types of content, such as <code>/product/</code> pages only.</li>
<li><strong>Exclude rules</strong>: Define keywords or patterns that should be skipped. Ideal for ignoring pages you do not care about, such as <code>/tag/</code> or <code>/archive/</code> URLs.</li>
<li><strong>Track All Pages</strong>: Track every discovered page without filtering.</li>
<li><strong>Tracked Page Limit</strong>: Cap how many pages are auto-tracked to keep usage under control.</li>
</ul>
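<p>Here is a minimal Python sketch of how include rules, exclude rules, and a tracked page limit might combine. The semantics are assumptions, not PageCrawl's documented behavior. Rules use plain substring matching. A page must match at least one include rule (when any are set) and no exclude rules, and pages beyond the limit are skipped. <code>select_pages</code> is a hypothetical name.</p>

```python
from typing import Optional


def select_pages(urls: list[str], include: list[str], exclude: list[str],
                 limit: Optional[int] = None) -> list[str]:
    """Apply include/exclude rules and an optional tracked page limit.

    Assumed semantics: keep a URL if it contains at least one include
    pattern (or no include rules are set) and none of the exclude patterns.
    """
    selected: list[str] = []
    for url in urls:
        if limit is not None and len(selected) >= limit:
            break  # tracked page limit reached
        if include and not any(p in url for p in include):
            continue
        if any(p in url for p in exclude):
            continue  # exclude rules win over include rules
        selected.append(url)
    return selected


if __name__ == "__main__":
    urls = ["https://shop.example/product/1", "https://shop.example/tag/sale",
            "https://shop.example/product/2", "https://shop.example/archive/product/old"]
    print(select_pages(urls, ["/product/"], ["/tag/", "/archive/"]))
```

<p>In this sketch, <code>/archive/product/old</code> is skipped even though it contains <code>/product/</code>, because exclude rules take precedence.</p>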
<h4>Configure Tracked Elements</h4>
<p>The template also defines what is tracked on each discovered page. You can monitor all pages, or only those with a specific structure (for example, only product pages).</p>
<ol>
<li>To monitor whole pages, set the tracked element to <strong>Full-page Text</strong>.</li>
<li>To monitor pages with a specific layout, configure multiple tracked elements, such as product title, price, and description. If these elements do not exist on a discovered page, that page is simply skipped.</li>
</ol>
<p>Save the template, and PageCrawl will keep discovering and monitoring matching pages automatically. If too many irrelevant pages are discovered, tighten the include/exclude filters and remove the pages you do not want.</p>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[File Checksum Monitoring]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/file-tracking/article/file-checksum-hash-monitoring" />
            <id>https://pagecrawl.io/55</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>File Checksum Monitoring</h1>
<p>File Checksum Monitoring detects when any online file has been modified by comparing its SHA-256 hash. Unlike text-based monitoring, this works with any file type, including zip archives, images, videos, and binary files. When a change is detected, the original file is stored so you can download and compare versions.</p>
<h3>What is SHA-256?</h3>
<p>SHA-256 is a cryptographic hash function that produces a fixed-length fingerprint of a file's contents that is, in practice, unique to that content. If even a single byte changes, the hash changes completely, making it reliable for detecting modifications.</p>
<h3>How It Works</h3>
<ol>
<li>You provide the URL of the file to monitor</li>
<li>PageCrawl downloads the file and calculates its SHA-256 checksum</li>
<li>On each subsequent check, the checksum is recalculated and compared</li>
<li>If the checksum differs, you receive a notification</li>
<li>The previous version of the file is saved for manual comparison</li>
</ol>
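<p>The steps above can be sketched with Python's standard <code>hashlib</code> module. This illustrates checksum comparison and is not PageCrawl's code. Downloading the file and storing previous versions are left out, and <code>sha256_file</code> and <code>check_for_change</code> are hypothetical names.</p>

```python
import hashlib
from typing import Optional, Tuple


def sha256_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks (1 MiB by default)."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_for_change(path: str, previous: Optional[str]) -> Tuple[str, bool]:
    """Return (current checksum, changed?). The first check only sets a baseline."""
    current = sha256_file(path)
    changed = previous is not None and current != previous
    return current, changed


if __name__ == "__main__":
    import os
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "release.zip")
        with open(path, "wb") as f:
            f.write(b"version 1")
        baseline, changed = check_for_change(path, None)
        print(changed)  # False: the first check records the baseline
        with open(path, "wb") as f:
            f.write(b"version 2")  # only one byte differs
        _, changed = check_for_change(path, baseline)
        print(changed)  # True
```

<p>Reading the file in chunks keeps memory use constant regardless of file size.</p>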
<h3>Setup</h3>
<ol>
<li>Click <strong>Track New Page</strong></li>
<li>Paste the direct URL to the file</li>
<li>PageCrawl detects the file and shows checksum monitoring options</li>
<li>Choose your check frequency and notification preferences</li>
<li>Save</li>
</ol>
<div class="kb-figure">
  <img src="/images/knowledge/file-checksum-simplecreate.png" alt="When you paste a direct file URL (e.g. a .zip) into PageCrawl, it detects a file and selects File Checksum, since text cannot be extracted">
</div>
<h3>Supported File Types</h3>
<p>Any file accessible via URL, including: zip, rar, psd, video, audio, images, and more. Maximum file size is <strong>15 MB</strong>. Contact support if you need to monitor larger files.</p>
<h3>Checksum vs Text Monitoring</h3>
<table>
<thead>
<tr>
<th>Method</th>
<th>Best For</th>
<th>Shows Exact Changes</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>File checksum</strong></td>
<td>Any file type (binary, images, archives)</td>
<td>No, only that the file changed</td>
</tr>
<tr>
<td><strong>Text monitoring</strong></td>
<td>PDF, Excel, Word, CSV, PowerPoint</td>
<td>Yes, line-by-line diff</td>
</tr>
</tbody>
</table>
<p>If you need to see exactly what text changed in a document, use the dedicated text monitoring for <a href="/help/file-tracking/article/can-pagecrawl-detect-changes-in-pdf">PDF</a>, <a href="/help/file-tracking/article/track-changes-in-excel-files">Excel</a>, <a href="/help/file-tracking/article/track-changes-in-word-files">Word</a>, <a href="/help/file-tracking/article/track-changes-in-csv-files">CSV</a>, or <a href="/help/file-tracking/article/track-changes-in-powerpoint-files">PowerPoint</a> files instead.</p>
<h3>FAQ</h3>
<ul>
<li><strong>How often are files checked?</strong> You can set the frequency from every 5 minutes to monthly, depending on your plan.</li>
<li><strong>What if the file is no longer accessible?</strong> You will be notified with an error status.</li>
<li><strong>Can I stop monitoring a file?</strong> Yes, disable or delete it at any time.</li>
</ul>
<h3>Related Articles</h3>
<ul>
<li><a href="/help/file-tracking/article/can-pagecrawl-detect-changes-in-pdf">PDF Changes</a> - Monitor PDF text changes</li>
<li><a href="/help/file-tracking/article/track-changes-in-excel-files">Excel Spreadsheets</a> - Monitor spreadsheet text changes</li>
<li><a href="/help/file-tracking/article/track-changes-in-word-files">Word Documents</a> - Monitor Word text changes</li>
</ul>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Monitor Changes in Google Sheets, Docs, and Drive Files]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/file-tracking/article/monitor-changes-in-google-sheets" />
            <id>https://pagecrawl.io/56</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Monitor Changes in Google Sheets, Docs, and Drive Files</h1>
<p>PageCrawl can monitor publicly shared Google Sheets, Google Docs, and other Google Drive files for text changes. When content is added, edited, or removed, you receive a notification with a diff showing exactly what changed.</p>
<h3>Requirements</h3>
<p>The Google file must be accessible via a shareable link. In Google Drive, set the sharing to <strong>"Anyone with the link can view"</strong> to allow PageCrawl to access the content.</p>
<h3>Setup</h3>
<ol>
<li>Click <strong>Track New Page</strong></li>
<li>Paste the shareable link to your Google Sheet, Doc, or Drive file</li>
<li>PageCrawl detects the file type and shows the appropriate configuration</li>
<li>Choose your check frequency and notification preferences</li>
<li>Save</li>
</ol>
<div class="kb-figure">
  <img src="/images/knowledge/create-page.png" alt="Track New Page screen where you paste a shareable Google Sheets, Docs, or Drive link to start monitoring it">
</div>
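<p>PageCrawl detects the file type from the link you paste, and its exact rules are not documented here. As a rough illustration only, the sketch below classifies a sharing link by host and path. It assumes the common Google URL shapes <code>docs.google.com/spreadsheets/d/</code>, <code>docs.google.com/document/d/</code>, and <code>drive.google.com/file/d/</code>. The helper name <code>classify_google_link</code> is hypothetical.</p>
<pre><code class="language-python">from urllib.parse import urlparse

# Host and path prefixes of common Google sharing links. This mapping is an
# assumption for illustration; PageCrawl's own detection rules may differ.
GOOGLE_LINK_TYPES = {
    ("docs.google.com", "/spreadsheets/d/"): "Google Sheets",
    ("docs.google.com", "/document/d/"): "Google Docs",
    ("drive.google.com", "/file/d/"): "Google Drive file",
}


def classify_google_link(link):
    """Return a human-readable file type for a Google sharing link, or None."""
    parsed = urlparse(link)
    host = parsed.netloc.lower()
    for (expected_host, path_prefix), label in GOOGLE_LINK_TYPES.items():
        if host == expected_host and parsed.path.startswith(path_prefix):
            return label
    return None


print(classify_google_link("https://docs.google.com/spreadsheets/d/abc123/edit?usp=sharing"))  # prints: Google Sheets
</code></pre>
<p>A link that returns <code>None</code> here is not necessarily unsupported. It just does not match one of the three assumed shapes.</p>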
<h3>Supported File Types</h3>
<table>
<thead>
<tr>
<th>File Type</th>
<th>What Is Tracked</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Google Sheets</strong></td>
<td>Cell text content across all sheets</td>
</tr>
<tr>
<td><strong>Google Docs</strong></td>
<td>Full document text</td>
</tr>
<tr>
<td><strong>Google Drive files</strong></td>
<td>Text content (for supported formats like PDF, DOCX)</td>
</tr>
</tbody>
</table>
<h3>Related Articles</h3>
<ul>
<li><a href="/help/file-tracking/article/monitor-changes-in-sharepoint-documents">SharePoint Documents</a> - Monitor Microsoft SharePoint files</li>
<li><a href="/help/file-tracking/article/track-changes-in-excel-files">Excel Spreadsheets</a> - Monitor Excel file changes</li>
<li><a href="/help/integrations/article/sync--monitored-pages-to-google-sheets">Google Sheets Sync</a> - Export change data to Google Sheets</li>
</ul>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Monitor Changes in Microsoft SharePoint Documents]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/file-tracking/article/monitor-changes-in-sharepoint-documents" />
            <id>https://pagecrawl.io/57</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Monitor Changes in Microsoft SharePoint Documents</h1>
<p>PageCrawl can monitor Microsoft SharePoint pages and documents for text changes. When content is added, edited, or removed, you receive a notification showing what changed.</p>
<h3>Requirements</h3>
<p>The SharePoint page or document must be reachable at a direct URL that PageCrawl can open.</p>
<h3>Sharing and Access</h3>
<p>Most SharePoint content sits behind a Microsoft 365 login, so you have two options:</p>
<ul>
<li><strong>Share with a link.</strong> In SharePoint, open the document or page, click <strong>Share</strong>, and create a link set to <strong>Anyone with the link</strong> (where your organization's policy allows it). Paste that link into PageCrawl. This is the simplest option and needs no credentials.</li>
<li><strong>Use login authentication.</strong> If the file can only be opened after signing in, set up a <a href="/help/features/article/can-i-track-password-protected-websites">login authentication configuration</a> for your SharePoint/Microsoft 365 login, then select it when adding the page. PageCrawl will sign in before each check.</li>
</ul>
<p>Note: If your organization enforces multi-factor authentication or single sign-on on SharePoint, an "Anyone with the link" share is usually the most reliable option.</p>
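<p>You can check whether a SharePoint link really opens without signing in before you add it. A common rough heuristic, which is an assumption here and not PageCrawl's own check, treats a link as not public if an anonymous request ends in a 401 or 403 status, or redirects to Microsoft's sign-in host <code>login.microsoftonline.com</code>. The sketch only inspects a status code and <code>Location</code> header you already have. The helper name <code>needs_microsoft_login</code> is hypothetical.</p>
<pre><code class="language-python">from urllib.parse import urlparse

# Sign-in host that anonymous Microsoft 365 requests are commonly redirected to
# (an assumption for this heuristic).
MICROSOFT_LOGIN_HOSTS = {"login.microsoftonline.com"}
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def needs_microsoft_login(status, location=None):
    """Heuristically decide whether an anonymous response hit a login wall."""
    if status in (401, 403):
        return True
    if status in REDIRECT_STATUSES and location:
        return urlparse(location).netloc.lower() in MICROSOFT_LOGIN_HOSTS
    return False


print(needs_microsoft_login(302, "https://login.microsoftonline.com/common/oauth2/authorize"))  # True
print(needs_microsoft_login(200))  # False
</code></pre>
<p>If the heuristic reports a login wall for your link, create an "Anyone with the link" share or use login authentication as described above.</p>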
<h3>Setup</h3>
<ol>
<li>Click <strong>Track New Page</strong></li>
<li>Paste the URL to the SharePoint page or document</li>
<li>Choose your check frequency and notification preferences</li>
<li>If the page requires login, select your authentication configuration</li>
<li>Save</li>
</ol>
<div class="kb-figure">
  <img src="/images/knowledge/create-page.png" alt="Track New Page screen where you paste the URL to a SharePoint page or document to start monitoring it">
</div>
<h3>What Can Be Monitored</h3>
<table>
<thead>
<tr>
<th>Content Type</th>
<th>How It Works</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>SharePoint pages</strong></td>
<td>Tracks text content changes on the page</td>
</tr>
<tr>
<td><strong>Word documents</strong></td>
<td>Extracts and compares text content</td>
</tr>
<tr>
<td><strong>Excel files</strong></td>
<td>Extracts and compares cell data</td>
</tr>
<tr>
<td><strong>PDF files</strong></td>
<td>Extracts and compares text content</td>
</tr>
</tbody>
</table>
<h3>Related Articles</h3>
<ul>
<li><a href="/help/file-tracking/article/monitor-changes-in-google-sheets">Google Docs &amp; Sheets</a> - Monitor Google Drive files</li>
<li><a href="/help/features/article/can-i-track-password-protected-websites">Password-Protected Pages</a> - Configure login authentication</li>
<li><a href="/help/file-tracking/article/track-changes-in-word-files">Word Documents</a> - Monitor Word file changes</li>
</ul>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Monitoring Changes in PDF Files]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/tutorials/article/tracking-changes-in-pdf-files" />
            <id>https://pagecrawl.io/58</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Monitoring Changes in PDF Files</h1>
<div class="kb-figure">
  <img src="/images/knowledge/create-page.png" alt="Track New Page screen where you paste a direct PDF URL to begin monitoring it">
</div>
<p>Monitoring text changes in PDF files can be essential for managing contracts, reports, or any important documents that may be updated frequently. Manually reviewing each document for changes is time-consuming and error-prone. PageCrawl.io automates this by tracking text changes in PDF files and notifying you whenever there’s an update.</p>
<h3>Why Monitor PDF Files for Text Changes?</h3>
<p>PDFs are often used for official or finalized documents, which means any change can be significant. Whether it's contracts, legal documents, or product manuals, keeping an eye on text changes ensures that you're always aware of important updates. Monitoring PDF files helps with:</p>
<ul>
<li>Keeping track of contract modifications.</li>
<li>Ensuring that no important edits are made without your knowledge.</li>
<li>Detecting unauthorized changes in sensitive documents.</li>
</ul>
<h3>How PageCrawl.io Helps with PDF Monitoring</h3>
<p>With PageCrawl.io, you can set up automated tracking for PDF files. It scans the text in your PDF files and alerts you whenever there’s a change, so you don’t have to sift through documents manually.</p>
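<p>Conceptually, text monitoring compares the text extracted on one check with the text from the previous check and reports the lines that differ. The sketch below illustrates that comparison with Python's standard <code>difflib</code>. It assumes the text has already been extracted from the PDF, and it is not PageCrawl's actual implementation.</p>
<pre><code class="language-python">import difflib


def text_diff(previous_text, current_text):
    """Return unified-diff lines showing how current_text differs from previous_text."""
    return list(
        difflib.unified_diff(
            previous_text.splitlines(),
            current_text.splitlines(),
            fromfile="previous check",
            tofile="current check",
            lineterm="",
        )
    )


old = "Term: 12 months\nFee: 100 EUR\nNotice: 30 days"
new = "Term: 12 months\nFee: 120 EUR\nNotice: 30 days"
for line in text_diff(old, new):
    # Lines starting with "-" were removed, lines starting with "+" were added.
    print(line)
</code></pre>
<p>An empty result means the extracted text is identical, so there is nothing to report.</p>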
<h3>What If the PDF Does Not Contain Text?</h3>
<p>If the PDF you want to monitor does not contain readable text (for example, a scanned document), you can use <a href="/help/file-tracking/article/file-checksum-hash-monitoring">File checksum monitoring</a> instead to detect whether the file has been modified. The downside of this approach is that it only tells you <em>that</em> the file changed, not <em>what</em> changed, so you will need to review the document page by page to find the edits.</p>
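<p>A checksum changes whenever any byte of the file changes. That is why it works for image-only PDFs, and also why it cannot tell you what changed. Below is a minimal sketch using Python's standard <code>hashlib</code>. SHA-256 is chosen here for illustration, because this article does not say which hash PageCrawl uses.</p>
<pre><code class="language-python">import hashlib


def file_checksum(data):
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def has_changed(previous_checksum, data):
    """True if the file's bytes differ from the ones that produced previous_checksum."""
    return file_checksum(data) != previous_checksum


baseline = file_checksum(b"%PDF-1.7 original scan")
print(has_changed(baseline, b"%PDF-1.7 original scan"))  # False
print(has_changed(baseline, b"%PDF-1.7 updated scan"))   # True
</code></pre>
<p>For a file on disk, pass <code>pathlib.Path("contract.pdf").read_bytes()</code> as <code>data</code>.</p>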
<h3>Setting Up PDF Monitoring with PageCrawl.io</h3>
<p>Setting up PDF monitoring is easy with PageCrawl.io. Here’s a quick guide:</p>
<h4>Step 1: Sign in to PageCrawl.io</h4>
<p>Log in to your PageCrawl.io account or sign up if you’re new to the platform.</p>
<h4>Step 2: Add a New Monitored Page</h4>
<p>Navigate to the dashboard and click on the "Track New Page" button. Here, you can paste a link to the PDF file you want to monitor.</p>
<h4>Step 3: Set Up Notifications and Check Frequency</h4>
<p>Choose how you want to be notified and how often PageCrawl.io should check the PDF for changes. Frequent checks alert you to text changes sooner, while less frequent checks suit documents that are rarely updated.</p>
<h3>Tracking PDFs Embedded in Web Pages</h3>
<p>Some websites display PDF documents directly within a web page using iframes. This is common for contracts, terms of service, financial reports, and other documents that are embedded alongside regular page content.</p>
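<p>To see which documents a page embeds, you can list the <code>src</code> attributes of its <code>&lt;iframe&gt;</code> elements. The sketch below uses Python's standard <code>html.parser</code>. It only inspects static HTML, so it will miss iframes that JavaScript inserts later. The helper <code>find_iframe_sources</code> is hypothetical.</p>
<pre><code class="language-python">from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeSourceParser(HTMLParser):
    """Collect the src attribute of every iframe tag in a static HTML document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths such as /files/terms.pdf against the page URL.
                self.sources.append(urljoin(self.base_url, src))


def find_iframe_sources(html, base_url):
    """Return absolute URLs of all iframes found in the given HTML."""
    parser = IframeSourceParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.sources
</code></pre>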
<p>PageCrawl automatically detects embedded iframes when you add a page for monitoring. When you set up Full Page monitoring on a page that contains iframes, you will see an <strong>"Include embedded content"</strong> checkbox. Enabling this option tells PageCrawl to extract and track text from the embedded PDF along with the rest of the page content.</p>
<p>This means you can monitor both the surrounding web page and the embedded PDF document in a single monitor, receiving notifications whenever either part changes.</p>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Bulk Edit Pages]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/features/article/bulk-edit-pages" />
            <id>https://pagecrawl.io/59</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Bulk Edit Pages</h1>
<p>Select multiple monitored pages and change their settings in one operation. Bulk edit is available on paid plans.</p>
<h3>How to Bulk Edit</h3>
<ol>
<li>Go to your page list</li>
<li>Select pages using the checkboxes (or select all)</li>
<li>Click <strong>Bulk actions</strong> in the toolbar</li>
<li>Choose what to change and apply</li>
</ol>
<div class="kb-figure">
  <img src="/images/knowledge/bulk-edit-annotated.png" alt="Tracked pages with rows selected and the Bulk actions menu open, with arrows pointing to the row checkboxes, the Bulk actions button, and the Excel export option">
</div>
<div style="background:#f5f7fa;border:1px solid #e6e9ef;border-radius:8px;padding:14px 18px;margin:20px 0;">
  <strong>Note:</strong> The same <strong>Bulk actions</strong> menu is also where you export your data. Choose <strong>Excel export</strong> to download the selected pages' current values, change history, and configuration (see <a href="#bulk-export">Bulk Export</a> below).
</div>
<h3>Available Bulk Operations</h3>
<table>
<thead>
<tr>
<th>Operation</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Enable / Disable</strong></td>
<td>Turn monitoring on or off for selected pages</td>
</tr>
<tr>
<td><strong>Delete</strong></td>
<td>Permanently delete selected pages and/or folders</td>
</tr>
<tr>
<td><strong>Trigger check</strong></td>
<td>Run an immediate check on all selected pages</td>
</tr>
<tr>
<td><strong>Mark as seen</strong></td>
<td>Clear the "changed" indicator on selected pages</td>
</tr>
</tbody>
</table>
<h3>Bulk-Editable Settings</h3>
<table>
<thead>
<tr>
<th>Setting</th>
<th>Options</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Check frequency</strong></td>
<td>5 min to monthly (depending on plan)</td>
</tr>
<tr>
<td><strong>Engine</strong></td>
<td>Default, Stealth, or Fast</td>
</tr>
<tr>
<td><strong>Proxy location</strong></td>
<td>A built-in region (London, New York, San Francisco, Toronto, Frankfurt, Tel Aviv, Fixed IP, Random), Residential Proxy (paid), or one of your <a href="/help/features/article/custom-proxies">Proxy Pools</a> for using your own proxies</td>
</tr>
<tr>
<td><strong>Notifications</strong></td>
<td>Email, Slack, Telegram, Discord, Teams, or disable</td>
</tr>
<tr>
<td><strong>Notification emails</strong></td>
<td>Choose which Additional Cc Emails receive alerts</td>
</tr>
<tr>
<td><strong>Labels</strong></td>
<td>Add or remove labels</td>
</tr>
<tr>
<td><strong>Folder</strong></td>
<td>Move pages to a specific folder</td>
</tr>
<tr>
<td><strong>Template</strong></td>
<td>Apply a monitoring template</td>
</tr>
<tr>
<td><strong>Screenshots</strong></td>
<td>Enable or disable</td>
</tr>
<tr>
<td><strong>Intelligent Reconnect</strong></td>
<td>Enable or disable automatic retry on failure</td>
</tr>
<tr>
<td><strong>Device</strong></td>
<td>Emulate a specific device viewport</td>
</tr>
<tr>
<td><strong>Language</strong></td>
<td>Set browser language</td>
</tr>
<tr>
<td><strong>Ignored text</strong></td>
<td>Add or replace text patterns to ignore</td>
</tr>
<tr>
<td><strong>Full page selector</strong></td>
<td>Choose between Everything on the page, Content only, or Reader mode</td>
</tr>
<tr>
<td><strong>AI summaries</strong></td>
<td>Enable or disable AI-powered change summaries</td>
</tr>
<tr>
<td><strong>AI focus</strong></td>
<td>Set custom AI instructions for what matters</td>
</tr>
<tr>
<td><strong>AI tier</strong></td>
<td>Basic or Pro (Pro requires Ultimate plan)</td>
</tr>
<tr>
<td><strong>Cookie blocking</strong></td>
<td>Add or remove cookie consent blocking</td>
</tr>
<tr>
<td><strong>Overlay removal</strong></td>
<td>Add or remove popup overlay hiding</td>
</tr>
<tr>
<td><strong>Date exclusion</strong></td>
<td>Add or remove date filtering</td>
</tr>
<tr>
<td><strong>Number exclusion</strong></td>
<td>Add or remove number filtering</td>
</tr>
<tr>
<td><strong>Archive</strong></td>
<td>Enable web archiving (Ultimate plan only)</td>
</tr>
<tr>
<td><strong>Reveal hidden text</strong></td>
<td>Enable or disable extraction of visually hidden text</td>
</tr>
<tr>
<td><strong>Monitored keywords</strong></td>
<td>Set keywords to highlight in change reports</td>
</tr>
<tr>
<td><strong>Report errors</strong></td>
<td>Enable or disable error reporting for failed checks</td>
</tr>
<tr>
<td><strong>Delay when failed</strong></td>
<td>Add a delay before retrying after a failed check</td>
</tr>
<tr>
<td><strong>Authentication</strong></td>
<td>Configure login credentials for password-protected pages</td>
</tr>
<tr>
<td><strong>AI model</strong></td>
<td>Choose the AI model used for change summaries</td>
</tr>
<tr>
<td><strong>Record always</strong></td>
<td>Always save check results, even when no change is detected</td>
</tr>
</tbody>
</table>
<h3>Adding Pages in Bulk</h3>
<p>Beyond editing, you can also add multiple pages at once:</p>
<table>
<thead>
<tr>
<th>Method</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Paste URLs</strong></td>
<td>Paste a list of URLs (one per line) to add them all at once</td>
</tr>
<tr>
<td><strong>Upload file</strong></td>
<td>Import URLs from a CSV or Excel file</td>
</tr>
<tr>
<td><strong>Website scan</strong></td>
<td>Scan an entire website to discover and add pages automatically</td>
</tr>
</tbody>
</table>
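<p>If you are preparing a long list for <strong>Paste URLs</strong>, it can help to clean it up first. Below is a minimal sketch. It assumes you only want <code>http</code>/<code>https</code> links, one per line, with blank lines and exact duplicates removed. The helper name <code>clean_url_list</code> is hypothetical, and PageCrawl may apply its own validation on import.</p>
<pre><code class="language-python">from urllib.parse import urlparse


def clean_url_list(pasted_text):
    """Return unique http(s) URLs from pasted text, one per line, in original order."""
    seen = set()
    urls = []
    for line in pasted_text.splitlines():
        url = line.strip()
        if not url or url in seen:
            continue  # skip blank lines and repeats
        parsed = urlparse(url)
        if parsed.scheme in ("http", "https") and parsed.netloc:
            seen.add(url)
            urls.append(url)
    return urls


pasted = """https://example.com/pricing
https://example.com/pricing

not a url
https://example.com/terms
"""
print("\n".join(clean_url_list(pasted)))
</code></pre>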
<h3>Bulk Export</h3>
<p>Select pages and export their data to Excel, including current values, change history, and configuration.</p>
<h3>Related Articles</h3>
<ul>
<li><a href="/help/features/article/organized-page-monitoring">Labels, Folders &amp; Workspaces</a> - Organize your monitored pages</li>
<li><a href="/help/features/article/advanced-configuration">Advanced Configuration</a> - Templates and Power User settings</li>
<li><a href="/help/features/article/page-discovery">Page Discovery</a> - Automatically discover new pages to monitor</li>
</ul>]]>
            </summary>
                                    <updated>2026-06-19T09:27:55+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Organize Monitored Pages with Labels, Folders, and Workspaces]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/features/article/organized-page-monitoring" />
            <id>https://pagecrawl.io/60</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Organize Monitored Pages with Labels, Folders, and Workspaces</h1>
<div class="kb-figure kb-figure--flush">
  <img src="/images/knowledge/folders-labels-view.png" alt="A tracked-pages list grouped into folders (Competitors, Compliance) with color-coded labels like Pricing, High priority, Product, and Legal on each page">
</div>
<p>PageCrawl provides three levels of organization for your monitored pages: labels for tagging, folders for grouping, and workspaces for separating entire environments.</p>
<h3>Labels</h3>
<p>Labels are color-coded tags you can attach to any monitored page. Each label has a name, optional description, and a color.</p>
<table>
<thead>
<tr>
<th>Feature</th>
<th>Details</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Colors</strong></td>
<td>Each label has a hex color, auto-generated if not specified</td>
</tr>
<tr>
<td><strong>Multiple labels per page</strong></td>
<td>Attach as many labels as needed</td>
</tr>
<tr>
<td><strong>Filtering</strong></td>
<td>Filter your page list by one or more labels</td>
</tr>
<tr>
<td><strong>Bulk tagging</strong></td>
<td>Apply labels to multiple pages at once via <a href="/help/features/article/bulk-edit-pages">Bulk Edit</a></td>
</tr>
<tr>
<td><strong>Workspace-scoped</strong></td>
<td>Labels belong to a workspace and are not shared across workspaces</td>
</tr>
</tbody>
</table>
<p>To manage labels, go to any page list and use the label filter, or manage them when editing a page in the <strong>Organize</strong> section.</p>
<div class="kb-figure">
  <img src="/images/knowledge/settings-organize.png" alt="Organize section of the page editor with Labels and Folder selectors for grouping monitored pages">
</div>
<p>Labels can also be applied automatically by AI. See <a href="/help/features/article/ai-powered-change-detection#ai-label-automation">AI Label Automation</a> for details.</p>
<h3>Folders</h3>
<p>Folders let you group pages into a nested hierarchy with unlimited depth. Each folder belongs to a workspace and can contain both pages and sub-folders.</p>
<table>
<thead>
<tr>
<th>Feature</th>
<th>Details</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Nested hierarchy</strong></td>
<td>Create sub-folders at any depth</td>
</tr>
<tr>
<td><strong>Page counts</strong></td>
<td>Each folder shows the total number of pages, including those in sub-folders</td>
</tr>
<tr>
<td><strong>Bulk move</strong></td>
<td>Move multiple pages to a folder via <a href="/help/features/article/bulk-edit-pages">Bulk Edit</a></td>
</tr>
<tr>
<td><strong>URL slugs</strong></td>
<td>Each folder has a unique slug for direct navigation</td>
</tr>
</tbody>
</table>
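<p>A folder's page count includes the pages in every nested sub-folder. To illustrate that rule, the sketch below totals pages recursively. The <code>Folder</code> structure is hypothetical and is not PageCrawl's internal data model.</p>
<pre><code class="language-python">from dataclasses import dataclass, field


@dataclass
class Folder:
    """A hypothetical folder node: its own pages plus nested sub-folders."""
    name: str
    page_count: int = 0
    subfolders: list = field(default_factory=list)


def total_pages(folder):
    """Count pages in this folder and, recursively, in all of its sub-folders."""
    return folder.page_count + sum(total_pages(child) for child in folder.subfolders)


competitors = Folder("Competitors", 3, [
    Folder("Pricing", 5),
    Folder("Blog", 2, [Folder("Archive", 4)]),
])
print(total_pages(competitors))  # 14
</code></pre>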
<h3>Workspaces</h3>
<p>Workspaces are separate environments within your account. Each workspace has its own pages, folders, labels, notification settings, schedule, and integrations.</p>
<table>
<thead>
<tr>
<th>Feature</th>
<th>Details</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Separate everything</strong></td>
<td>Pages, folders, labels, webhooks, and settings are workspace-scoped</td>
</tr>
<tr>
<td><strong>Team access</strong></td>
<td>Invite team members to specific workspaces</td>
</tr>
<tr>
<td><strong>Independent settings</strong></td>
<td>Each workspace has its own notification channels, schedule, AI configuration, and integrations</td>
</tr>
<tr>
<td><strong>Quick switching</strong></td>
<td>Switch between workspaces from the sidebar</td>
</tr>
</tbody>
</table>
<p>Use workspaces to separate monitoring by team, client, project, or environment (e.g., production vs staging).</p>
<h3>Creating and Managing</h3>
<table>
<thead>
<tr>
<th>Action</th>
<th>Where</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Create a folder</strong></td>
<td>Click the folder icon in the page list sidebar</td>
</tr>
<tr>
<td><strong>Create a label</strong></td>
<td>When editing a page, or via the label filter</td>
</tr>
<tr>
<td><strong>Create a workspace</strong></td>
<td><strong>Settings</strong> &gt; <strong>Team</strong> &gt; <strong>Workspaces</strong></td>
</tr>
<tr>
<td><strong>Switch workspace</strong></td>
<td>Sidebar workspace selector</td>
</tr>
<tr>
<td><strong>Bulk assign labels/folders</strong></td>
<td>Select pages &gt; <strong>Bulk actions</strong></td>
</tr>
</tbody>
</table>
<h3>Related Articles</h3>
<ul>
<li><a href="/help/features/article/bulk-edit-pages">Bulk Edit</a> - Apply labels, folders, and settings to multiple pages at once</li>
<li><a href="/help/features/article/advanced-configuration">Advanced Configuration</a> - Templates and workspace settings</li>
<li><a href="/help/features/article/page-check-schedule">Check Scheduling</a> - Configure per-workspace monitoring schedules</li>
</ul>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Real Browser Monitoring and Engine Selection]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/features/article/what-is-real-browser-page-monitoring" />
            <id>https://pagecrawl.io/61</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Real Browser Monitoring and Engine Selection</h1>
<div class="kb-figure kb-figure--flush">
  <img src="/images/knowledge/engines-comparison.png" alt="Comparison of PageCrawl's three engines (Default, Fast, Stealth) across speed, JavaScript rendering, screenshots, bot-protection bypass, and what each is best for">
</div>
<p>By default, PageCrawl renders web pages in a real browser, executing JavaScript and loading dynamic content exactly as a visitor would see it. You can choose between three engine modes depending on the page you are monitoring.</p>
<h3>Available Engines</h3>
<table>
<thead>
<tr>
<th>Engine</th>
<th>Best For</th>
<th>How It Works</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Default</strong></td>
<td>Most websites</td>
<td>Full browser with JavaScript rendering</td>
</tr>
<tr>
<td><strong>Stealth</strong></td>
<td>Bot-protected pages</td>
<td>Enhanced mode for reliably accessing protected pages</td>
</tr>
<tr>
<td><strong>Fast</strong></td>
<td>Static pages, speed</td>
<td>Optimized for speed when JavaScript rendering is not needed</td>
</tr>
</tbody>
</table>
<h3>Default Engine</h3>
<p>The default engine loads pages using a real browser. It processes JavaScript, waits for dynamic content, handles cookies, and renders the page as a real user would see it. This works for the majority of websites.</p>
<h3>Stealth Mode</h3>
<p>Some websites use bot protection services that block automated access. Stealth mode is designed to reliably access these pages.</p>
<p>PageCrawl can automatically switch to Stealth mode in several situations: on the first check of a new monitor, when workspace auto-stealth is enabled, or for price and availability monitors. It also activates when a page is blocked (timeout, 403 Forbidden, or 401 Unauthorized). You can also enable it manually per page.</p>
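<p>The Stealth triggers above can be summarized as a single rule. The sketch below only restates the conditions listed in this article. The parameter names (<code>first_check</code>, <code>workspace_auto_stealth</code>, <code>monitor_type</code>, <code>status</code>, <code>timed_out</code>, <code>manual_stealth</code>) are hypothetical, and the real decision may weigh other signals.</p>
<pre><code class="language-python"># Status codes this article lists as "blocked" triggers for Stealth mode.
BLOCKED_STATUSES = {401, 403}

# Monitor types this article says may switch to Stealth automatically.
STEALTH_MONITOR_TYPES = {"price", "availability"}


def should_use_stealth(first_check=False, workspace_auto_stealth=False,
                       monitor_type=None, status=None, timed_out=False,
                       manual_stealth=False):
    """Return True if any of the Stealth triggers listed in this article applies."""
    return any([
        manual_stealth,                         # enabled by hand for this page
        first_check,                            # first check of a new monitor
        workspace_auto_stealth,                 # workspace-level auto-stealth
        monitor_type in STEALTH_MONITOR_TYPES,  # price and availability monitors
        timed_out,                              # page timed out
        status in BLOCKED_STATUSES,             # 401 Unauthorized or 403 Forbidden
    ])


print(should_use_stealth(status=403))                      # True
print(should_use_stealth(status=200, monitor_type="text"))  # False
</code></pre>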
<h3>Fast Mode</h3>
<p>Fast mode skips JavaScript rendering, which makes it significantly faster and more resource-efficient than the Default engine. Use this for:</p>
<ul>
<li>Static HTML pages that do not rely on JavaScript</li>
<li>API responses and JSON endpoints</li>
<li>Pages where you only need text or HTML content</li>
<li>High-frequency monitoring where speed matters</li>
<li><strong>Very dynamic, JavaScript-heavy pages that change constantly</strong> (rotating banners, injected ads, live counters, A/B-tested widgets). Because Fast fetches the raw HTML <em>without executing JavaScript</em>, it skips all that script-driven churn and only reports changes in the underlying source. Switching a noisy page from Default to Fast can dramatically cut false positives.</li>
</ul>
<p>Fast mode supports Full Page, Full Page (iframe), Text, Text (all matches), Text (all matches sorted), Number, Price, Rating, Reviews, HTML, HTML (all matches), Boolean/Text Presence, Availability, Links, Feed, and SEO Tags element types. It does not support Visual comparison, screenshots, or actions (click, scroll, type).</p>
<h3>Choosing the Right Engine</h3>
<table>
<thead>
<tr>
<th>Scenario</th>
<th>Recommended Engine</th>
</tr>
</thead>
<tbody>
<tr>
<td>Standard website</td>
<td>Default</td>
</tr>
<tr>
<td>JavaScript-heavy SPA</td>
<td>Default</td>
</tr>
<tr>
<td>Bot-protected page</td>
<td>Stealth</td>
</tr>
<tr>
<td>Page returning 403 or timeouts</td>
<td>Stealth</td>
</tr>
<tr>
<td>Static HTML page</td>
<td>Fast</td>
</tr>
<tr>
<td>Noisy page with constant JavaScript-driven changes</td>
<td>Fast (ignores the JS churn)</td>
</tr>
<tr>
<td>API or JSON endpoint</td>
<td>Fast</td>
</tr>
<tr>
<td>Need screenshots or visual diff</td>
<td>Default or Stealth</td>
</tr>
<tr>
<td>High-frequency checks (every 5 min)</td>
<td>Fast (if page allows)</td>
</tr>
</tbody>
</table>
<h3>Configuration</h3>
<p>Set the engine per page in the page editor under <strong>Power User</strong> settings, or apply it in bulk via <a href="/help/features/article/bulk-edit-pages">Bulk Edit</a>.</p>
<h3>Related Articles</h3>
<ul>
<li><a href="/help/features/article/monitoring-pages-behind-cloudflare-bot-protection">Monitoring Pages Behind Bot Protection</a> - Handling bot-protected pages</li>
<li><a href="/help/features/article/custom-proxies">Custom Proxies</a> - Use your own proxy servers</li>
<li><a href="/help/features/article/advanced-configuration">Advanced Configuration</a> - Power User mode and engine selection</li>
</ul>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Add to PageCrawl.io bookmark]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/tutorials/article/add-to-pagecrawl-bookmarklet" />
            <id>https://pagecrawl.io/62</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>"Add to PageCrawl.io" bookmark</h1>
<div class="kb-figure kb-figure--flush">
  <a href="javascript:(function()%7Bvar%20currentUrl%20%3D%20encodeURIComponent(window.location.href)%3Bvar%20pageTitle%20%3D%20encodeURIComponent(document.title)%3Bwindow.location.href%20%3D%20'https%3A%2F%2Fpagecrawl.io%2Fapp%2Fpages%2Fcreate%3Furl%3D'%20%2B%20currentUrl%20%2B%20'%26title%3D'%20%2B%20pageTitle%3B%7D)()%3B" title="Add to PageCrawl.io" style="cursor:grab;display:inline-block;">
    <img src="/images/knowledge/bookmarklet-drag.png" alt="Dragging the Track in PageCrawl bookmarklet up into the browser's bookmarks bar" style="pointer-events:none;">
  </a>
</div>
<p style="text-align:center;color:#64748b;font-size:14px;margin-top:-8px;">Drag the image above straight into your bookmarks bar to install the bookmarklet.</p>
<h3>What is This Bookmarklet?</h3>
<p>This bookmarklet adds any webpage to your PageCrawl.io account in one click. Once it is saved, clicking it while browsing opens the PageCrawl.io "Track New Page" form with the URL and title of the current page already filled in.</p>
<div class="kb-figure">
  <img src="/images/knowledge/create-page.png" alt="Track New Page form that opens with the URL prefilled, including Bookmarklet among the other ways to add pages">
</div>
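<p>Decoded, the bookmarklet sends your browser to <code>https://pagecrawl.io/app/pages/create</code>. It passes the current page's address and title, each encoded with <code>encodeURIComponent</code>, as the <code>url</code> and <code>title</code> query parameters. The Python sketch below builds the same kind of link, which may be handy if you want to generate "Track New Page" links from your own scripts. The function name is hypothetical.</p>
<pre><code class="language-python">from urllib.parse import quote, urlencode

# Address the bookmarklet's JavaScript opens (decoded from its href).
CREATE_PAGE_URL = "https://pagecrawl.io/app/pages/create"


def track_new_page_link(page_url, page_title):
    """Build a Track New Page link with the URL and title pre-filled."""
    # quote_via=quote encodes spaces as %20 and "/" as %2F, like encodeURIComponent.
    query = urlencode({"url": page_url, "title": page_title}, quote_via=quote)
    return f"{CREATE_PAGE_URL}?{query}"


print(track_new_page_link("https://example.com/pricing?plan=pro", "Pricing | Example"))
</code></pre>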
<h3>Why Use This?</h3>
<p>If you often add new pages to PageCrawl.io, this bookmarklet can save you time by:</p>
<ul>
<li>Skipping the need to copy-paste URLs and titles.</li>
<li>Reducing clicks to navigate through PageCrawl.io’s interface.</li>
<li>Allowing you to add new pages directly from the page you’re currently on.</li>
</ul>
<h3>How to Save the Bookmarklet</h3>
<p>To save it, drag the link below to your bookmarks bar, or right-click it and select "Bookmark This Link."</p>
<p><a href="javascript:(function()%7Bvar%20currentUrl%20%3D%20encodeURIComponent(window.location.href)%3Bvar%20pageTitle%20%3D%20encodeURIComponent(document.title)%3Bwindow.location.href%20%3D%20'https%3A%2F%2Fpagecrawl.io%2Fapp%2Fpages%2Fcreate%3Furl%3D'%20%2B%20currentUrl%20%2B%20'%26title%3D'%20%2B%20pageTitle%3B%7D)()%3B">Add to PageCrawl.io</a></p>
<h3>How to Use the Bookmarklet</h3>
<p>When you’re on a page you want to track in PageCrawl.io:</p>
<ul>
<li>Click the "Add to PageCrawl.io" bookmark in your bookmarks bar.</li>
<li>PageCrawl.io will open with the URL and title of the current page prefilled.</li>
<li>Review or edit the details as needed, then save the page to your account.</li>
</ul>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Monitor Page Changes via RSS Feeds]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/features/article/page-monitoring-rss-feeds" />
            <id>https://pagecrawl.io/63</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Monitor Page Changes via RSS Feeds</h1>
<div class="kb-figure kb-figure--flush">
  <img src="/images/knowledge/rss-feeds-flow.png" alt="Data flow: web pages are monitored by PageCrawl, which publishes detected changes to an RSS/Atom feed that RSS readers and automation tools subscribe to">
</div>
<p>PageCrawl can generate RSS feeds for your monitored pages, allowing you to follow detected changes from any RSS reader or automation tool.</p>
<p><strong>Looking to monitor an existing RSS, Atom, or sitemap feed instead?</strong> See <a href="/help/features/article/feed-tracking-mode">Feed Tracking Mode</a>, which watches a feed URL for new items and notifies you about specific additions, removals, and changes.</p>
<h3>How RSS Feeds Work</h3>
<p>Each RSS feed has a unique URL with an access code. When a monitored page detects a change, the feed is updated with the new entry. Feeds follow the Atom format and can be consumed by any standard RSS reader.</p>
<p>You can create feeds scoped to:</p>
<ul>
<li><strong>All pages in workspace</strong> - Get a combined feed of all changes across the workspace</li>
<li><strong>By tags</strong> - Include only pages with specific tags</li>
<li><strong>By folders</strong> - Include only pages in specific folders</li>
<li><strong>By website/domain</strong> - Include only pages from a specific domain</li>
<li><strong>Specific monitors</strong> - Track changes on individually selected monitors</li>
</ul>
<h3>Setting Up an RSS Feed</h3>
<ol>
<li>Go to <strong>Settings</strong> &gt; <strong>RSS Feeds</strong></li>
<li>Click <strong>New RSS Feed</strong></li>
<li>Choose a scope (all pages, by tags, by folders, by website/domain, or specific monitors)</li>
<li>Copy the generated feed URL</li>
</ol>
<div class="kb-figure">
  <img src="/images/knowledge/settings-rss.png" alt="RSS Feed settings page with the New RSS Feed button for generating a feed of detected changes">
</div>
<p>The feed URL contains a unique access code, so anyone with the link can view the feed without logging in. Keep feed URLs private if the monitored content is sensitive.</p>
<h3>Using Your Feed</h3>
<p>Add the feed URL to any RSS-compatible tool:</p>
<table>
<thead>
<tr>
<th>Tool Type</th>
<th>Examples</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>RSS readers</strong></td>
<td>Feedly, Inoreader, NewsBlur</td>
</tr>
<tr>
<td><strong>Automation platforms</strong></td>
<td>n8n, Zapier, Make</td>
</tr>
<tr>
<td><strong>Dashboards</strong></td>
<td>Custom widgets, internal portals</td>
</tr>
<tr>
<td><strong>Browser extensions</strong></td>
<td>RSS reader extensions for Chrome or Firefox</td>
</tr>
</tbody>
</table>
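<p>Because the feed uses the standard Atom format, any XML parser can read it. Below is a minimal sketch using only Python's standard library. It downloads a feed and lists each entry's title, first link, and updated time using the standard Atom element names. The function names are hypothetical, and this article does not describe exactly what each PageCrawl entry contains.</p>
<pre><code class="language-python">import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def parse_atom_entries(xml_text):
    """Return (title, link, updated) tuples for every entry in an Atom document."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.iter(f"{ATOM}entry"):
        title = (entry.findtext(f"{ATOM}title") or "").strip()
        link_el = entry.find(f"{ATOM}link")  # first link element, if any
        link = link_el.get("href") if link_el is not None else None
        updated = entry.findtext(f"{ATOM}updated")
        entries.append((title, link, updated))
    return entries


def fetch_feed(feed_url):
    """Download a feed (for example, the URL copied from Settings) and parse it."""
    with urllib.request.urlopen(feed_url, timeout=30) as response:
        return parse_atom_entries(response.read())
</code></pre>
<p>Call <code>fetch_feed</code> with your own feed URL. Since that URL includes the access code, keep it out of shared scripts and logs.</p>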
<h3>Managing Feeds</h3>
<table>
<thead>
<tr>
<th>Action</th>
<th>How</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>List feeds</strong></td>
<td>Go to Settings &gt; RSS Feeds</td>
</tr>
<tr>
<td><strong>Create feed</strong></td>
<td>Click New RSS Feed and choose a scope</td>
</tr>
<tr>
<td><strong>Delete feed</strong></td>
<td>Click the delete button next to the feed</td>
</tr>
</tbody>
</table>
<h3>Related Articles</h3>
<ul>
<li><a href="/help/features/article/feed-tracking-mode">Feed Tracking Mode</a> - Monitor an existing RSS, Atom, or sitemap feed for new items</li>
<li><a href="/help/features/article/api-webhooks-for-custom-integrations">API &amp; Webhooks</a> - Programmatic access and real-time webhooks</li>
<li><a href="/help/integrations/article/webhook-integration">Webhook Integration</a> - HTTP POST notifications for changes</li>
<li><a href="/help/integrations/article/send-slack-notification-when-changes-detected">Slack Notifications</a> - Get change alerts in Slack</li>
</ul>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[API and Webhooks for Custom Integrations]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/features/article/api-webhooks-for-custom-integrations" />
            <id>https://pagecrawl.io/64</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>API and Webhooks for Custom Integrations</h1>
<p>PageCrawl provides three ways to integrate page monitoring into your own applications and workflows: a <strong>REST API</strong> to manage monitors programmatically, <strong>webhooks</strong> for real-time change notifications, and <strong>RSS feeds</strong> for lightweight consumption.</p>
<div class="kb-figure">
  <img src="/images/knowledge/developers-api-reference.png" alt="Interactive API reference at pagecrawl.io/developers with endpoints, authentication, and copy-paste code examples">
</div>
<div style="background:#f5f7fa;border:1px solid #e6e9ef;border-radius:8px;padding:14px 18px;margin:20px 0;">
  <strong>Note:</strong> API and webhooks are available on paid plans (Standard or above).
</div>
<h3>Authentication</h3>
<p>All API requests require a Bearer token. Go to <strong>Settings &gt; API &gt; API Tokens</strong> and click <strong>Create Token</strong>, then copy it immediately (it is not shown again).</p>
<div class="kb-figure">
  <img src="/images/knowledge/settings-api-tokens.png" alt="API Tokens card with the tokens table, Create Token button, and the Bearer authorization note">
</div>
<p>Include the token in the <code>Authorization</code> header on every request:</p>
<pre><code>Authorization: Bearer YOUR_API_TOKEN</code></pre>
<p>For the full reference with every endpoint, parameter, and response schema, see <a href="/developers">pagecrawl.io/developers</a>.</p>
<h3>Quick Start</h3>
<h4>Step 1: Create a monitor</h4>
<p>The simplest way to start monitoring is the <code>/api/track-simple</code> endpoint. It only requires a URL.</p>
<pre><code class="language-bash">curl -X POST "https://pagecrawl.io/api/track-simple" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/pricing",
    "tracking_mode": "fullpage",
    "ai_page_focus": "Alert me if the price drops or the item goes out of stock; skip header, footer, and cookie-banner changes"
  }'</code></pre>
<pre><code class="language-python">import requests

response = requests.post(
    "https://pagecrawl.io/api/track-simple",
    headers={"Authorization": "Bearer YOUR_API_TOKEN"},
    json={"url": "https://example.com/pricing", "tracking_mode": "fullpage", "ai_page_focus": "Alert me if the price drops or the item goes out of stock; skip header, footer, and cookie-banner changes"},
    timeout=30,
)
response.raise_for_status()  # stop here on 4xx/5xx (e.g. an invalid token)
page = response.json()
print(f"Monitoring: {page['name']} (ID: {page['id']})")</code></pre>
<pre><code class="language-javascript">const response = await fetch("https://pagecrawl.io/api/track-simple", {
  method: "POST",
  headers: {
    "Authorization": "Bearer YOUR_API_TOKEN",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ url: "https://example.com/pricing", tracking_mode: "fullpage", ai_page_focus: "Alert me if the price drops or the item goes out of stock; skip header, footer, and cookie-banner changes" }),
});
if (!response.ok) throw new Error(`API request failed with status ${response.status}`);
const page = await response.json();
console.log(`Monitoring: ${page.name} (ID: ${page.id})`);</code></pre>
<p><strong>Tracking modes:</strong> <code>fullpage</code> (all visible text, default), <code>content_only</code> (text without navigation/headers/footers), <code>reader</code> (reader-mode content), <code>price</code> (auto-detect prices), <code>specific_text</code> (requires <code>selector</code>), <code>specific_number</code> (requires <code>selector</code>).</p>
<p><strong><code>ai_page_focus</code></strong> (optional) is free text telling the AI what matters most on this page, for example <code>"Alert me if the price drops or the item goes out of stock; skip header, footer, and cookie-banner changes"</code>. It sharpens change summaries and priority scoring.</p>
<p><strong>Frequency:</strong> an optional <code>frequency</code> field (in minutes) controls how often the page is checked: <code>1440</code> for daily, <code>60</code> for hourly, <code>15</code> for every 15 minutes. Which intervals are available depends on your plan. When omitted, it defaults to daily.</p>
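<p>The rules above can be checked on your side before calling the API. The helper below is hypothetical and not part of any PageCrawl SDK. It validates the tracking mode, requires a <code>selector</code> for the two selector-based modes, and includes only the optional fields you actually set. The server still decides which frequencies your plan accepts.</p>
<pre><code class="language-python">TRACKING_MODES = {"fullpage", "content_only", "reader", "price",
                  "specific_text", "specific_number"}
NEEDS_SELECTOR = {"specific_text", "specific_number"}

def build_track_payload(url, tracking_mode="fullpage", selector=None,
                        frequency=None, ai_page_focus=None):
    """Build a JSON body for POST /api/track-simple (hypothetical helper)."""
    if tracking_mode not in TRACKING_MODES:
        raise ValueError(f"unknown tracking_mode: {tracking_mode}")
    if tracking_mode in NEEDS_SELECTOR and not selector:
        raise ValueError(f"{tracking_mode} requires a selector")
    payload = {"url": url, "tracking_mode": tracking_mode}
    optional = {"selector": selector, "frequency": frequency,
                "ai_page_focus": ai_page_focus}
    # Omit unset fields so server-side defaults (such as daily checks) apply.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload</code></pre>
<p>Pass the result as <code>json=</code> to <code>requests.post</code> in the Python example above.</p>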
<h4>Step 2: Set up a webhook</h4>
<pre><code class="language-bash">curl -X POST "https://pagecrawl.io/api/hooks" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "target_url": "https://your-server.com/webhook",
    "match_type": "all",
    "events": ["change_detected"]
  }'</code></pre>
<p><strong>Match types:</strong> <code>all</code> (every page), <code>monitors</code>, <code>tags</code>, <code>folders</code>, <code>domains</code>.</p>
<p><strong>Events:</strong> <code>change_detected</code>, <code>error</code>, <code>price_change_detected</code>.</p>
<h4>Step 3: Handle webhook payloads</h4>
<p>When a change is detected, PageCrawl POSTs a JSON payload to your endpoint.</p>
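<pre><code class="language-python">from flask import Flask, request

app = Flask(__name__)

@app.route("/webhook", methods=["POST"])
def handle_change():
    data = request.get_json(silent=True) or {}
    # Use .get(): individual fields can be excluded in the webhook's payload settings.
    print(f"Change detected: {data.get('title')}")
    print(f"Difference: {data.get('human_difference')}")
    if data.get("ai_summary"):
        print(f"AI Summary: {data['ai_summary']}")
    return "", 200  # any 2xx status acknowledges receipt

if __name__ == "__main__":
    app.run(port=5000)</code></pre>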
<p><strong>Key payload fields:</strong> <code>title</code>, <code>contents</code> (current value), <code>difference</code> (0-100), <code>human_difference</code>, <code>ai_summary</code>, <code>ai_priority_score</code>, <code>markdown_difference</code>, <code>page_screenshot_image</code>.</p>
<h3>API Endpoints</h3>
<table>
<thead>
<tr>
<th>Method</th>
<th>Endpoint</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>GET</code></td>
<td><code>/api/pages</code></td>
<td>List all monitored pages</td>
</tr>
<tr>
<td><code>POST</code></td>
<td><code>/api/pages</code></td>
<td>Create a new monitored page</td>
</tr>
<tr>
<td><code>GET</code></td>
<td><code>/api/pages/{slug}</code></td>
<td>Get page details and latest values</td>
</tr>
<tr>
<td><code>PUT</code></td>
<td><code>/api/pages/{id}</code></td>
<td>Update page settings</td>
</tr>
<tr>
<td><code>DELETE</code></td>
<td><code>/api/pages/{id}</code></td>
<td>Delete a monitored page</td>
</tr>
<tr>
<td><code>PUT</code></td>
<td><code>/api/pages/{id}/check</code></td>
<td>Trigger an immediate check</td>
</tr>
<tr>
<td><code>PUT</code></td>
<td><code>/api/pages/{id}/status</code></td>
<td>Enable or disable a page</td>
</tr>
<tr>
<td><code>GET</code></td>
<td><code>/api/pages/{id}/history</code></td>
<td>Get check history for a page</td>
</tr>
<tr>
<td><code>GET</code></td>
<td><code>/api/pages/{id}/checks/{checkId}/diff.markdown</code></td>
<td>Get a text diff as markdown</td>
</tr>
</tbody>
</table>
<p>For the complete endpoint list with parameters and schemas, see <a href="/developers">pagecrawl.io/developers</a>.</p>
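<p>You can write a thin wrapper around these endpoints with the standard library alone. The sketch below only builds and sends authenticated requests. The helper names are hypothetical, and the path placeholders come from the table above. It assumes nothing about responses beyond JSON, so check the developer reference for exact request bodies.</p>
<pre><code class="language-python">import json
import urllib.request

BASE_URL = "https://pagecrawl.io"

def build_request(method, path, token, body=None):
    """Create an authenticated request for a PageCrawl API path."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {token}")
    req.add_header("Accept", "application/json")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req

def call(method, path, token, body=None):
    """Send the request and decode the JSON response (None if the body is empty)."""
    with urllib.request.urlopen(build_request(method, path, token, body), timeout=30) as resp:
        raw = resp.read()
    return json.loads(raw) if raw else None</code></pre>
<p>For example, <code>call("GET", "/api/pages", "YOUR_API_TOKEN")</code> lists your pages, and <code>call("PUT", "/api/pages/123/check", "YOUR_API_TOKEN")</code> would trigger a check for a page with ID 123. <code>urlopen</code> raises <code>HTTPError</code> for non-2xx responses.</p>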
<h3>Webhooks</h3>
<p>Webhooks send HTTP POST requests with a JSON body to your endpoint whenever a page change is detected or an error occurs. Configure them in <strong>Settings</strong> &gt; <strong>Webhooks</strong>.</p>
<table>
<thead>
<tr>
<th>Setting</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Target URL</strong></td>
<td>The HTTP endpoint that receives the POST request</td>
</tr>
<tr>
<td><strong>Event triggers</strong></td>
<td>Change detected, error, or both</td>
</tr>
<tr>
<td><strong>Page filter</strong></td>
<td>Limit to specific pages, tags, folders, or a domain, or fire for all pages</td>
</tr>
<tr>
<td><strong>Payload fields</strong></td>
<td>Select which fields to include (all by default)</td>
</tr>
</tbody>
</table>
<p>Available payload fields include page ID, title, change summary, diff data (markdown and HTML), screenshots, AI summary, AI priority score, and per-element values. See the <a href="/help/integrations/article/webhook-integration">Webhook Integration guide</a> for the full field reference and example payloads.</p>
<p><strong>Reliable delivery (automatic retries):</strong> If your endpoint is briefly unreachable (for example, your server is offline for a few minutes or returns a temporary error), PageCrawl automatically retries the webhook with a backoff delay instead of dropping the event. A short outage on your side therefore does not lose change data: queued events are delivered once your server is back online. Return a <code>2xx</code> status code to acknowledge receipt; any other response, or a timeout, is treated as a failure and scheduled for retry.</p>
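<p>Your server may already have processed a delivery that timed out, so a retry can bring the same event twice. One defensive pattern is to make the handler idempotent: remember a hash of each raw request body and acknowledge repeats without processing them again. This pattern is an assumption rather than a documented PageCrawl mechanism, and it relies on a retry resending an identical body. In production, keep the seen hashes in a database or cache rather than in process memory.</p>
<pre><code class="language-python">import hashlib

class IdempotentReceiver:
    """Process each distinct webhook body once; acknowledge duplicates."""

    def __init__(self, process):
        self.process = process  # your handler, called with the raw body bytes
        self.seen = set()       # in-memory only; use a shared store in production

    def receive(self, raw_body):
        key = hashlib.sha256(raw_body).hexdigest()
        if key in self.seen:
            return 200          # duplicate delivery: acknowledge, do nothing
        # If process() raises, the exception propagates, your framework answers
        # with a non-2xx status, and the event is retried later.
        self.process(raw_body)
        self.seen.add(key)      # record only after successful processing
        return 200</code></pre>
<p>In a Flask handler you would call <code>receiver.receive(request.get_data())</code> and return its status code.</p>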
<h3>RSS Feeds</h3>
<p>Prefer to consume changes in an RSS reader or automation tool? PageCrawl can generate an Atom feed of detected changes scoped to all pages, tags, folders, a domain, or specific monitors. See the <a href="/help/features/article/page-monitoring-rss-feeds">RSS Feeds guide</a> for setup.</p>
<h3>Download the OpenAPI Spec</h3>
<p>The full specification is available as an OpenAPI 3.0 file you can import into Postman, Insomnia, or any API client:</p>
<pre><code>https://pagecrawl.io/api/openapi.yaml</code></pre>
<h3>Common Use Cases</h3>
<ul>
<li><strong>Custom dashboards</strong> - Pull change data into your own monitoring dashboard via API</li>
<li><strong>Automation workflows</strong> - Trigger actions in n8n, Make, Zapier, or custom scripts via webhooks</li>
<li><strong>Database logging</strong> - Store all detected changes in your own database</li>
<li><strong>Alerting systems</strong> - Forward high-priority changes to PagerDuty, Opsgenie, or similar</li>
</ul>
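<p>For database logging, a webhook handler only needs to insert each payload it receives. This minimal sketch uses SQLite. The table layout is hypothetical, and the code reads only two payload fields from the Quick Start (<code>title</code>, <code>ai_priority_score</code>). It also stores the full JSON, so fields you enable later are not lost.</p>
<pre><code class="language-python">import json
import sqlite3

def open_log(path="pagecrawl_changes.db"):
    """Open (or create) the change log database."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS changes (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               received_at TEXT DEFAULT CURRENT_TIMESTAMP,
               title TEXT,
               ai_priority_score REAL,
               payload TEXT NOT NULL)"""
    )
    return conn

def log_change(conn, data):
    """Store one webhook payload (a dict parsed from the JSON body)."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO changes (title, ai_priority_score, payload) VALUES (?, ?, ?)",
            (data.get("title"), data.get("ai_priority_score"), json.dumps(data)),
        )</code></pre>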
<h3>Related Articles</h3>
<ul>
<li><a href="/developers">Full API Reference</a> - Interactive OpenAPI reference with every endpoint and schema</li>
<li><a href="/help/integrations/article/webhook-integration">Webhook Integration</a> - Detailed webhook setup, payload reference, and testing</li>
<li><a href="/help/tutorials/article/reference-implementations">Advanced Integrations</a> - Copy-paste polling and webhook code in Python, Node.js, and PHP</li>
<li><a href="/help/integrations/article/pagecrawl-zapier-integration">Zapier Integration</a> - Connect PageCrawl to 5,000+ apps</li>
<li><a href="/help/integrations/article/pagecrawl-n8n-integration">n8n Integration</a> - Open-source workflow automation</li>
<li><a href="/help/features/article/page-monitoring-rss-feeds">RSS Feeds</a> - Subscribe to changes via RSS</li>
</ul>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[How to Monitor Pages That Require OS Selection]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/tutorials/article/monitor-pages-with-automatic0os-detection" />
            <id>https://pagecrawl.io/65</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>How to Monitor Pages That Require OS Selection</h1>
<p>Some pages change their content based on the visitor's operating system, for example download or driver pages that list OS-specific files. These sites may detect the OS automatically or ask you to choose one before the relevant information appears. Here's how to monitor such pages with PageCrawl.io.</p>
<h2>Two Approaches to Handle OS Detection</h2>
<p>There are two main ways to handle pages that require OS selection:</p>
<h3>1. Set a Custom User Agent</h3>
<p>You can configure PageCrawl to use a specific User Agent string that mimics a Windows browser. This approach is simple and works for most basic OS detection scenarios.</p>
<div class="kb-figure">
  <img src="/images/blog/user-agent-setting.png" alt="User Agent setting in page advanced preferences">
</div>
<p><strong>How to set it up:</strong></p>
<ul>
<li>Navigate to your page's Advanced Preferences</li>
<li>Set the User Agent to a Windows 10/11 browser string, for example:<pre><code>Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.5735.199 Safari/537.36</code></pre>
</li>
</ul>
<p><strong>Advantages:</strong></p>
<ul>
<li>Quick and easy to implement</li>
<li>Works reliably for basic OS detection</li>
<li>No complex configuration required</li>
</ul>
<p><strong>Limitations:</strong></p>
<ul>
<li>Cannot distinguish between Windows 10 and Windows 11</li>
<li>May not work with sophisticated detection methods</li>
<li>Limited control over specific OS version selection</li>
<li>Older User Agent versions may be blocked by security/bot detection tools used by websites</li>
</ul>
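<p>The first limitation comes from the User Agent string itself. Browsers on Windows 11 still report <code>Windows NT 10.0</code>, so the string contains no Windows 11 marker. The small illustration below extracts the reported version. It only explains the limitation and is not something you configure in PageCrawl.</p>
<pre><code class="language-python">import re

NT_VERSION = re.compile(r"Windows NT (\d+\.\d+)")

def reported_windows_version(user_agent):
    """Return the 'Windows NT x.y' version in a User Agent string, or None."""
    match = NT_VERSION.search(user_agent)
    return match.group(1) if match else None

ua = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/114.0.5735.199 Safari/537.36")
print(reported_windows_version(ua))  # 10.0, on both Windows 10 and Windows 11</code></pre>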
<h3>2. Use Actions to Interact with OS Selection Forms</h3>
<p>For pages with dropdown menus or forms where you need to select a specific OS version, you can use PageCrawl's Actions feature to automate the selection process.</p>
<p><strong>How to set it up:</strong></p>
<ol>
<li>Navigate to your page's Actions settings</li>
<li>Create click actions on the appropriate selectors</li>
<li>Configure the sequence to:<ul>
<li>Click on the OS dropdown/selector</li>
<li>Select your specific OS version</li>
<li>Submit the form if required</li>
</ul>
</li>
</ol>
<p><strong>Example scenario:</strong>
If a driver download page has a form with an OS selection dropdown, you can:</p>
<ol>
<li>Add an action to click on the OS dropdown selector</li>
<li>Add an action to click the "Windows 11" option</li>
<li>Add an action to click the submit button</li>
</ol>
<p><strong>Advantages:</strong></p>
<ul>
<li>Precise control over OS version selection</li>
<li>Can handle complex multi-step forms</li>
<li>Works with any type of OS selection interface</li>
</ul>
<p><strong>Limitations:</strong></p>
<ul>
<li>More complex to set up initially</li>
<li>May need adjustments if the page structure changes</li>
<li>Requires identifying the correct CSS selectors</li>
</ul>
<h2>Which Method Should You Choose?</h2>
<ul>
<li>
<p><strong>Use the User Agent method</strong> if:</p>
<ul>
<li>The site only needs basic OS detection</li>
<li>You don't need to distinguish between specific OS versions</li>
<li>You want a quick, maintenance-free solution</li>
</ul>
</li>
<li>
<p><strong>Use the Actions method</strong> if:</p>
<ul>
<li>You need to select a specific OS version (e.g., Windows 11 vs Windows 10)</li>
<li>The page has a form or dropdown for OS selection</li>
<li>The User Agent method doesn't work for your specific page</li>
</ul>
</li>
</ul>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Available Tracked Element Types]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/features/article/available-tracked-monitoring-types" />
            <id>https://pagecrawl.io/66</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Available Tracked Element Types</h1>
<div class="kb-figure">
  <img src="/images/knowledge/simple-what-to-track.png" alt="What to Track panel with the element type options (Full Page Text, Specific Area, Visual, Price, Feed, Parse) highlighted">
</div>
<p>The tracked element type you select determines what content is tracked on a webpage and how changes are detected. The six standard types below appear in the standard editor under <strong>What to Track</strong>. Everything else is available in <strong>Advanced mode</strong>, where you can also track several elements on the same page.</p>
<div style="background:#f5f7fa;border:1px solid #e6e9ef;border-radius:8px;padding:14px 18px;margin:20px 0;">
  <strong>Note:</strong> You can configure <strong>multiple tracked elements on a single page</strong> and mix types freely. For example, track a product's Price, Availability, and Rating all at once, each with its own threshold, conditions, and notifications. Add as many as you need in <strong>Advanced mode</strong>.
</div>
<h3>Standard Element Types</h3>
<p>These appear in the standard editor under <strong>What to Track</strong> as soon as you add a page.</p>
<h4>Full Page Text</h4>
<ul>
<li><strong>Description:</strong> Tracks all visible text on the entire webpage.</li>
<li><strong>Use Case:</strong> Useful when any change to the page's text matters.</li>
</ul>
<div class="kb-figure kb-figure--flush">
  <img src="/images/knowledge/monitoring-type-fullpage.png" alt="Full Page Text example: a web page on the left, and PageCrawl's text diff showing the changed line on the right">
</div>
<h4>Text (Specific Area)</h4>
<ul>
<li><strong>Description:</strong> Monitors text changes in a specified area of a webpage.</li>
<li><strong>Important Note:</strong> Only the first element matching the selector is tracked.</li>
<li><strong>Use Case:</strong> Ideal for tracking text in specific areas, like headlines or descriptions.</li>
</ul>
<div class="kb-figure kb-figure--flush">
  <img src="/images/knowledge/monitoring-type-text.png" alt="Text example: a status page on the left, and PageCrawl's red/green text diff of the tracked text on the right">
</div>
<h4>Visual</h4>
<ul>
<li><strong>Description:</strong> Monitors and alerts on visual changes in a specified area.</li>
<li><strong>Note:</strong> This is a beta feature; please report any issues you encounter.</li>
<li><strong>Use Case:</strong> Ideal for tracking visual changes like layout updates or style changes.</li>
</ul>
<div class="kb-figure kb-figure--flush">
  <img src="/images/knowledge/monitoring-type-visual.png" alt="Visual example: a web page on the left, and PageCrawl's Visually Compare control with a difference-percentage progress bar on the right">
</div>
<h4>Price</h4>
<ul>
<li><strong>Description:</strong> Detects and extracts the first price found on the page.</li>
<li><strong>Limitation:</strong> May not work well on pages with multiple prices.</li>
<li><strong>Use Case:</strong> Monitoring product prices on e-commerce websites.</li>
</ul>
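<p>A simplified first-match extractor shows why multiple prices cause trouble. This regular expression is only an illustration, not PageCrawl's actual detection logic. It returns whichever price appears first in the text, and on a sale page that is often the old, struck-through price.</p>
<pre><code class="language-python">import re

# Simplified pattern: a currency symbol, optional space, digits with optional
# thousands separators, and optional cents (illustration only).
PRICE = re.compile(r"[$€£]\s?\d[\d,]*(?:\.\d{2})?")

def first_price(text):
    """Return the first price-like string in the text, or None."""
    match = PRICE.search(text)
    return match.group(0) if match else None

print(first_price("Was $1,299.00, now $999.00"))  # $1,299.00 (the old price)</code></pre>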
<div class="kb-figure kb-figure--flush">
  <img src="/images/knowledge/monitoring-type-price.png" alt="Price example: a product page on the left, and PageCrawl showing the old price struck through, a down arrow, the new price, the percent change, and a sparkline">
</div>
<h4>Feed / List</h4>
<ul>
<li><strong>Description:</strong> Tracks entries from RSS or Atom feeds, detecting new, removed, or changed items.</li>
<li><strong>Use Case:</strong> Monitoring blog feeds, news feeds, or any structured list for new entries and updates.</li>
</ul>
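<p>In principle, you find new, removed, and changed items by keying each entry on a stable identifier and comparing two snapshots. The sketch below keys on the item link and compares titles. Both choices are assumptions for illustration, and real feeds may need a different key, such as an entry ID.</p>
<pre><code class="language-python">def diff_feed(old_items, new_items):
    """Compare two {link: title} snapshots of a feed."""
    added = sorted(new_items.keys() - old_items.keys())
    removed = sorted(old_items.keys() - new_items.keys())
    changed = sorted(
        link for link, title in new_items.items()
        if link in old_items and old_items[link] != title
    )
    return {"added": added, "removed": removed, "changed": changed}</code></pre>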
<div class="kb-figure kb-figure--flush">
  <img src="/images/knowledge/monitoring-type-feed.png" alt="Feed example: a blog feed on the left, and PageCrawl listing the newly added feed items on the right">
</div>
<h4>Parse (AI Extract)</h4>
<ul>
<li><strong>Description:</strong> Uses AI to extract the specific information you describe in plain language (for example "the event date" or "the lowest price"), even when there is no clean selector to target.</li>
<li><strong>Use Case:</strong> Pulling a single fact or field out of an unstructured page; the extracted value is then tracked for changes.</li>
</ul>
<div class="kb-figure kb-figure--flush">
  <img src="/images/knowledge/monitoring-type-parse.png" alt="Parse example: an event page on the left, and PageCrawl's AI-extracted event date shown as a diff on the right">
</div>
<h3>Advanced Mode Element Types</h3>
<p>Switch to <strong>Advanced mode</strong> (the link at the top of the page editor) to open the full <strong>TYPE</strong> dropdown, track several elements on one page, and use the additional types below.</p>
<div class="kb-figure">
  <img src="/images/knowledge/settings-tracked-elements.png" alt="Tracked Elements section in Advanced mode with the TYPE, ELEMENT, and THRESHOLD selectors">
</div>
<h4>Number</h4>
<ul>
<li><strong>Description:</strong> Extracts and monitors numeric values in a specific webpage area.</li>
<li><strong>Features:</strong> Provides basic statistical analysis and visual graphs.</li>
<li><strong>Use Case:</strong> Useful for tracking numbers, such as stock levels or scores.</li>
<li><strong>Also in the standard editor:</strong> You can enable Number tracking from <strong>Text</strong> mode in the standard create flow, without switching to Advanced mode.</li>
</ul>
<div class="kb-figure kb-figure--flush">
  <img src="/images/knowledge/monitoring-type-number.png" alt="Number example: a star count on the left, and PageCrawl showing the old value struck through, an up arrow, the new value, the percent change, and a sparkline">
</div>
<h4>Availability</h4>
<ul>
<li><strong>Description:</strong> Tracks the availability status of a product on the page.</li>
<li><strong>Use Case:</strong> Monitoring whether a product is in stock, out of stock, or on pre-order.</li>
<li><strong>Also in the standard editor:</strong> You can enable Availability from <strong>Price</strong> mode in the standard create flow, without switching to Advanced mode.</li>
</ul>
<div class="kb-figure kb-figure--flush">
  <img src="/images/knowledge/monitoring-type-availability.png" alt="Availability example: a product page on the left, and PageCrawl showing 'Out of Stock (was: In Stock)' on the right">
</div>
<h4>Rating</h4>
<ul>
<li><strong>Description:</strong> Tracks the product rating displayed on the page.</li>
<li><strong>Use Case:</strong> Monitoring changes to product ratings on review sites or e-commerce platforms.</li>
</ul>
<div class="kb-figure kb-figure--flush">
  <img src="/images/knowledge/monitoring-type-rating.png" alt="Rating example: a product's star rating on the left, and PageCrawl showing the old rating, a down arrow, the new rating, and the percent change on the right">
</div>
<h4>Reviews</h4>
<ul>
<li><strong>Description:</strong> Tracks the review count displayed on the page.</li>
<li><strong>Use Case:</strong> Monitoring how many reviews a product has received over time.</li>
</ul>
<div class="kb-figure kb-figure--flush">
  <img src="/images/knowledge/monitoring-type-number.png" alt="Reviews example: a review count on the left, and PageCrawl showing the old count, an up arrow, the new count, the percent change, and a sparkline on the right">
</div>
<h4>Links</h4>
<ul>
<li><strong>Description:</strong> Tracks internal and external links originating from a webpage.</li>
<li><strong>Use Case:</strong> Ideal for monitoring link changes on resource-heavy websites.</li>
</ul>
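<p>Here is a rough sketch of the idea: collect every <code>href</code> from two versions of a page and compare the sets. It uses Python's built-in HTML parser, and the helper names are illustrative. It only approximates what the Links element reports: it does not resolve relative URLs or separate internal from external links.</p>
<pre><code class="language-python">from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href of every anchor tag."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.add(href)

def extract_links(html):
    parser = LinkCollector()
    parser.feed(html)
    return parser.links

def diff_links(old_html, new_html):
    """Return (added, removed) links between two versions of a page."""
    old, new = extract_links(old_html), extract_links(new_html)
    return sorted(new - old), sorted(old - new)</code></pre>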
<div class="kb-figure kb-figure--flush">
  <img src="/images/knowledge/monitoring-type-links.png" alt="Links example: a navigation menu on the left, and PageCrawl listing added and removed links on the right">
</div>
<h4>Iframes</h4>
<ul>
<li><strong>Description:</strong> Monitors embedded content within <code>&lt;iframe&gt;</code> elements.</li>
<li><strong>Important Note:</strong> May cause issues in some cases if “Hide cookie banners &amp; block ads” is enabled.</li>
<li><strong>Use Case:</strong> Useful for monitoring third-party embedded content.</li>
</ul>
<div class="kb-figure kb-figure--flush">
  <img src="/images/knowledge/monitoring-type-iframe.png" alt="Iframe example: a page with an embedded widget on the left, and PageCrawl's text diff of the iframe content on the right">
</div>
<h4>HTML</h4>
<ul>
<li><strong>Description:</strong> Monitors changes in the HTML content of a specific section.</li>
<li><strong>Important Note:</strong> Focus on narrowly defined areas to avoid false positives.</li>
<li><strong>Use Case:</strong> Useful for tracking changes in webpage structure or layout.</li>
</ul>
<div class="kb-figure kb-figure--flush">
  <img src="/images/knowledge/monitoring-type-html.png" alt="HTML example: page source on the left, and PageCrawl's text diff of the changed markup on the right">
</div>
<h4>Text (All Matches)</h4>
<ul>
<li><strong>Description:</strong> Tracks all elements matching the selector (not just the first).</li>
<li><strong>Use Case:</strong> Useful for tracking lists, tables, or repeated content blocks.</li>
</ul>
<div class="kb-figure kb-figure--flush">
  <img src="/images/knowledge/monitoring-type-textmulti.png" alt="Text (All Matches) example: a list of roles on the left, and PageCrawl's diff showing an added and a removed entry on the right">
</div>
<h4>Text (All Matches, Sorted)</h4>
<ul>
<li><strong>Description:</strong> Similar to “Text (All Matches)” but sorts results alphabetically.</li>
<li><strong>Use Case:</strong> Reduces false positives for frequently reordered elements like product listings.</li>
</ul>
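<p>A list of matched strings shows how the three text variants differ. This illustrates the comparison behavior described above and is not PageCrawl code. The first-match variant misses a change to any element after the first. The all-matches variant flags a pure reorder. The sorted variant ignores the reorder but still catches the edit.</p>
<pre><code class="language-python">def first_match(matches):
    """Text (Specific Area): only the first matched element counts."""
    return matches[0] if matches else ""

def all_matches(matches):
    """Text (All Matches): every element, in page order."""
    return "\n".join(matches)

def all_matches_sorted(matches):
    """Text (All Matches, Sorted): every element, in alphabetical order."""
    return "\n".join(sorted(matches))

def changed(extract, before, after):
    return extract(before) != extract(after)

before = ["Alpha", "Bravo", "Charlie"]
reordered = ["Alpha", "Charlie", "Bravo"]  # same items, new order
edited = ["Alpha", "Bravo", "Delta"]       # last item replaced

for extract in (first_match, all_matches, all_matches_sorted):
    print(extract.__name__, changed(extract, before, reordered), changed(extract, before, edited))
# first_match False False
# all_matches True True
# all_matches_sorted False True</code></pre>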
<div class="kb-figure kb-figure--flush">
  <img src="/images/knowledge/monitoring-type-textmulti.png" alt="Text (All Matches, Sorted) example: matched entries on the left, and PageCrawl's diff of the sorted set on the right">
</div>
<h4>HTML (All Matches)</h4>
<ul>
<li><strong>Description:</strong> Tracks all matching HTML elements on the page.</li>
<li><strong>Use Case:</strong> Ideal for monitoring multiple dynamic sections.</li>
</ul>
<div class="kb-figure kb-figure--flush">
  <img src="/images/knowledge/monitoring-type-html.png" alt="HTML (All Matches) example: page source on the left, and PageCrawl's text diff of the markup on the right">
</div>
<h4>Text Presence</h4>
<ul>
<li><strong>Description:</strong> Searches the full page for specific keywords and returns a simple Yes/No result.</li>
<li><strong>How it Works:</strong> Enter comma-separated keywords. Returns "Yes" if ANY keyword is found on the page, "No" otherwise. The search is case-insensitive.</li>
<li><strong>Invert Option:</strong> Enable "Invert" to reverse the logic, so the result is "Yes" when NONE of the keywords are found.</li>
<li><strong>Use Cases:</strong><ul>
<li><strong>Stock Availability:</strong> Monitor for "sold out", "out of stock" keywords</li>
<li><strong>Product Status:</strong> Track "discontinued", "pre-order", "coming soon" status</li>
<li><strong>Content Monitoring:</strong> Detect when specific text appears or disappears</li>
<li><strong>Back in Stock Alerts:</strong> Invert "sold out" to detect when product becomes available</li>
<li><strong>Compliance:</strong> Check for required disclaimers or legal text</li>
</ul>
</li>
<li><strong>Best Practice:</strong> Combine with other tracked elements (like Price or Text) to get both the status and the content.</li>
</ul>
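<p>The Yes/No rule above takes only a few lines of code. The function below sketches the documented behavior: comma-separated keywords, case-insensitive matching, "Yes" if any keyword is found, and Invert flipping the result to "Yes" when none are found. Trimming whitespace around each keyword is an assumption.</p>
<pre><code class="language-python">def text_presence(page_text, keywords, invert=False):
    """Return "Yes" or "No" following the Text Presence rules."""
    terms = [k.strip().lower() for k in keywords.split(",") if k.strip()]
    haystack = page_text.lower()  # case-insensitive search
    found = any(term in haystack for term in terms)
    # Without Invert: "Yes" if any keyword appears.
    # With Invert: "Yes" only if none of the keywords appear.
    return "Yes" if found != invert else "No"</code></pre>
<p>For a back-in-stock alert, <code>text_presence(page_text, "sold out", invert=True)</code> turns to "Yes" once "sold out" disappears from the page.</p>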
<div class="kb-figure kb-figure--flush">
  <img src="/images/knowledge/monitoring-type-availability.png" alt="Text Presence example: a page on the left, and PageCrawl showing a Yes/No state with the previous value on the right">
</div>
<h4>PDF File</h4>
<ul>
<li><strong>Description:</strong> Tracks text content within PDF files.</li>
<li><strong>Limitation:</strong> Text cannot be extracted from every PDF (for example, scanned image-only documents); use "File Checksum" for those.</li>
<li><strong>Use Case:</strong> Monitoring changes in documents like manuals or policies.</li>
</ul>
<div class="kb-figure kb-figure--flush">
  <img src="/images/knowledge/monitoring-type-pdf.png" alt="PDF example: a PDF document on the left, and PageCrawl's text diff of the extracted text on the right">
</div>
<h4>Word File</h4>
<ul>
<li><strong>Description:</strong> Tracks text content within Word documents.</li>
<li><strong>Use Case:</strong> Ideal for tracking updates in editable text documents.</li>
</ul>
<div class="kb-figure kb-figure--flush">
  <img src="/images/knowledge/monitoring-type-word.png" alt="Word example: a Word document on the left, and PageCrawl's text diff of the extracted text on the right">
</div>
<h4>Excel and CSV Files</h4>
<ul>
<li><strong>Description:</strong> Monitors content within spreadsheets.</li>
<li><strong>Use Case:</strong> Useful for tracking data changes in structured formats.</li>
</ul>
<div class="kb-figure kb-figure--flush">
  <img src="/images/knowledge/monitoring-type-csv.png" alt="Excel/CSV example: a spreadsheet on the left, and PageCrawl's table diff highlighting the changed cell on the right">
</div>
<h4>File Checksum</h4>
<ul>
<li><strong>Description:</strong> Computes and compares SHA-256 checksums to detect file changes.</li>
<li><strong>Limitation:</strong> Does not preview specific changes; manual review required.</li>
<li><strong>Use Case:</strong> Best for unsupported file formats or non-readable PDFs.</li>
</ul>
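<p>You can reproduce the checksum approach locally, for example to confirm that a download really changed. This sketch streams a file through SHA-256 with Python's <code>hashlib</code> and compares digests. It mirrors the idea described above rather than PageCrawl's internal code.</p>
<pre><code class="language-python">import hashlib

def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def file_changed(path, previous_digest):
    """True when the file's current SHA-256 differs from the stored one."""
    return sha256_of_file(path) != previous_digest</code></pre>
<p>As with the File Checksum element, this tells you <em>that</em> the file changed, not <em>what</em> changed.</p>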
<div class="kb-figure kb-figure--flush">
  <img src="/images/knowledge/monitoring-type-checksum.png" alt="File Checksum example: a download with a SHA-256 hash on the left, and PageCrawl showing the old and new checksum on the right">
</div>
<h4>WHOIS Record</h4>
<ul>
<li><strong>Description:</strong> Tracks domain WHOIS registration data including registrar, expiration date, and name servers.</li>
<li><strong>Use Case:</strong> Monitoring domain ownership changes, expiration dates, or registrar transfers.</li>
</ul>
<div class="kb-figure kb-figure--flush">
  <img src="/images/knowledge/monitoring-type-whois.png" alt="WHOIS example: a WHOIS record on the left, and PageCrawl's text diff of the changed expiry date on the right">
</div>
<h4>SEO Tags</h4>
<ul>
<li><strong>Description:</strong> Tracks key SEO-related elements on a page including the title tag, meta description, canonical URL, robots directives, H1 heading, and Open Graph tags.</li>
<li><strong>Use Case:</strong> Monitoring competitor SEO changes, ensuring your own pages maintain correct metadata, or detecting unintended SEO regressions.</li>
</ul>
<div class="kb-figure kb-figure--flush">
  <img src="/images/knowledge/monitoring-type-seo.png" alt="SEO Tags example: a page's head tags on the left, and PageCrawl's text diff of the changed title tag on the right">
</div>
<h4>JSON Path</h4>
<ul>
<li><strong>Description:</strong> Extracts and monitors a value from a JSON API response using a JSON path, or tracks the whole JSON document.</li>
<li><strong>Use Case:</strong> Monitoring API endpoints, status feeds, or config files. Changes are shown as a structured JSON diff.</li>
</ul>
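<p>The sketch below illustrates the idea by following a simplified dotted path, such as <code>items.1.price</code>, through a parsed JSON document. This is not full JSONPath syntax, and the path format PageCrawl accepts may differ.</p>
<pre><code class="language-python">import json

def get_path(document, path):
    """Follow a dot-separated path; numeric parts index into lists."""
    value = document
    for part in path.split("."):
        if isinstance(value, list):
            value = value[int(part)]
        else:
            value = value[part]
    return value

doc = json.loads('{"status": "ok", "items": [{"price": 9.5}, {"price": 12}]}')
print(get_path(doc, "items.1.price"))  # 12</code></pre>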
<div class="kb-figure kb-figure--flush">
  <img src="/images/knowledge/monitoring-type-json.png" alt="JSON Path example: a JSON response on the left, and PageCrawl's colour-coded JSON diff showing the changed field on the right">
</div>
<h4>HTTP Status</h4>
<ul>
<li><strong>Description:</strong> Monitors the HTTP response status code a URL returns (200, 301, 404, 503, and so on).</li>
<li><strong>Use Case:</strong> Uptime and endpoint health checks. The status is colour-coded by class (2xx, 3xx, 4xx, 5xx) and the previous code is shown when it changes.</li>
</ul>
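<p>The status class is the hundreds digit of the code. Here is a minimal illustration (not PageCrawl code) of how a status change could be labeled, using the standard library's <code>http.HTTPStatus</code> for reason phrases. It raises <code>ValueError</code> for codes that <code>HTTPStatus</code> does not know.</p>
<pre><code class="language-python">from http import HTTPStatus

def status_class(code):
    """Map 503 to '5xx', 200 to '2xx', and so on."""
    return f"{code // 100}xx"

def describe_change(old_code, new_code):
    old, new = HTTPStatus(old_code), HTTPStatus(new_code)
    return (f"{old.value} {old.phrase} ({status_class(old.value)}) to "
            f"{new.value} {new.phrase} ({status_class(new.value)})")

print(describe_change(200, 503))
# 200 OK (2xx) to 503 Service Unavailable (5xx)</code></pre>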
<div class="kb-figure kb-figure--flush">
  <img src="/images/knowledge/monitoring-type-http.png" alt="HTTP Status example: a health-check endpoint on the left, and PageCrawl showing '200 OK → 503 Service Unavailable' on the right">
</div>
<h4>JavaScript</h4>
<ul>
<li><strong>Description:</strong> Runs a JavaScript function you write and tracks the value it returns.</li>
<li><strong>Skill Level:</strong> Requires programming expertise.</li>
<li><strong>Use Case:</strong> Ideal for advanced users needing custom tracking logic. See the <a href="/help/features/article/javascript-tracking-and-actions">JavaScript Tracked Elements guide</a> for examples.</li>
</ul>
<div class="kb-figure kb-figure--flush">
  <img src="/images/knowledge/js-code-editor.png" alt="JavaScript example: selecting the JavaScript element type and entering code in the editor, whose return value is then tracked">
</div>
<hr />
<p>Each tracked element type serves a different purpose. Understanding these differences helps you pick the right type for your monitoring needs, which improves accuracy and reduces false positives. For more guidance, see the tooltips in the interface or contact support.</p>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[AI-Powered Change Detection and Smart Filtering]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/features/article/ai-powered-change-detection" />
            <id>https://pagecrawl.io/69</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>AI-Powered Change Detection and Smart Filtering</h1>
<p>PageCrawl.io includes AI-powered analysis for all users. Every plan comes with monthly AI credits that work automatically with zero setup. When a page changes, AI summarizes what happened and scores how important the change is, so you only get notified about what matters.</p>
<p>For users who need more, you can also bring your own API key (BYOK) for unlimited AI usage and full model control.</p>
<h2>AI Credits</h2>
<p>Every plan includes monthly AI credits, visible in the workspace AI settings:</p>
<div class="kb-figure">
  <img src="/images/knowledge/integ-ai-settings.png" alt="AI Features Configuration with the AI Credits usage meter and a summary of what AI does">
</div>
<p>Monthly credit allowances by plan:</p>
<table>
<thead>
<tr>
<th>Plan</th>
<th>Monthly Credits</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Free</strong></td>
<td>15</td>
</tr>
<tr>
<td><strong>Standard</strong></td>
<td>200 (scales with quantity)</td>
</tr>
<tr>
<td><strong>Enterprise</strong></td>
<td>1,000 (scales with quantity)</td>
</tr>
<tr>
<td><strong>Ultimate</strong></td>
<td>10,000 (scales with quantity, includes Pro tier)</td>
</tr>
</tbody>
</table>
<p>Credits are based on page size. Each 4,000-token block costs 1 credit on the Basic tier or 10 credits on the Pro tier (Ultimate plan only). A typical blog post uses 1-2 credits. Credits reset monthly.</p>
<p>When credits run out, page monitoring continues normally, but AI summaries and importance filtering pause until the next billing cycle. You can also switch to BYOK at any time for unlimited usage.</p>
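<p>Here is a minimal Python sketch of the credit rule above. Two points are assumptions for illustration: a partial block is rounded up to a whole block, and every analysis is charged at least one block. The function name <code>credits_for</code> is hypothetical.</p>
<pre><code class="language-python">import math

BLOCK_TOKENS = 4_000                        # tokens per credit block
CREDITS_PER_BLOCK = {"basic": 1, "pro": 10}


def credits_for(tokens, tier="basic"):
    """Estimate credits for one analysis.

    Assumptions: a partial block is rounded up to a whole block, and every
    analysis costs at least one block.
    """
    blocks = max(1, math.ceil(tokens / BLOCK_TOKENS))
    return blocks * CREDITS_PER_BLOCK[tier]


print(credits_for(1_500))         # 1
print(credits_for(9_000))         # 3
print(credits_for(9_000, "pro"))  # 30
</code></pre>
<p>Under these assumptions, a 1,500-token blog post costs 1 credit. The Free plan's 15 monthly credits would therefore cover roughly 15 analyses of that size.</p>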
<h2>Getting Started</h2>
<p>No setup is required. AI features are enabled by default for all workspaces:</p>
<ol>
<li>Add pages to monitor as usual</li>
<li>When changes are detected, AI automatically summarizes them and assigns importance scores</li>
<li>View your credit usage in <strong>Settings &gt; Workspace &gt; Integrations &gt; AI</strong></li>
</ol>
<p><strong>Workspace-specific</strong>: AI features are configured per workspace. You can have some workspaces with AI enabled and others without.</p>
<div class="kb-figure">
  <img src="/images/knowledge/simple-ai-notify.png" alt="What matters AI panel in the page editor with the AI summarization toggle and focus instructions">
</div>
<h2>How AI Features Work</h2>
<table>
<thead>
<tr>
<th>Feature</th>
<th>Process</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Summarization</strong></td>
<td>Change detected &gt; Content sent to AI &gt; Human-readable summary generated &gt; Included in notification</td>
</tr>
<tr>
<td><strong>Importance Scoring</strong></td>
<td>Change detected &gt; AI analyzes content &gt; Priority score assigned (0-100) &gt; Low-priority changes filtered</td>
</tr>
<tr>
<td><strong><a href="#ai-label-automation">Label Automation</a></strong></td>
<td>Change detected &gt; AI evaluates your label rules &gt; Labels automatically added or removed</td>
</tr>
</tbody>
</table>
<h2>Configuration</h2>
<h3>Available for All Users</h3>
<table>
<thead>
<tr>
<th>Setting</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Custom Instructions</strong></td>
<td>Teach AI what matters for your monitoring (max 2,000 chars)</td>
</tr>
<tr>
<td><strong>Summary Language</strong></td>
<td>Generate summaries in 19 languages: English, Spanish, French, German, Italian, Portuguese, Dutch, Polish, Ukrainian, Russian, Japanese, Korean, Chinese, Arabic, Hindi, Turkish, Lithuanian, Latvian, and Estonian</td>
</tr>
<tr>
<td><strong>Notification Threshold</strong></td>
<td>Minimum importance score (0-100) a change needs to trigger a notification. Changes scoring below the threshold are still tracked but do not trigger notifications.</td>
</tr>
</tbody>
</table>
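<p>The Notification Threshold splits changes into two groups: those that notify and those that are only recorded. A minimal Python sketch of that split follows. Treating a score equal to the threshold as notifying is inferred from the wording above ("below this" does not notify). <code>Change</code> and <code>split_by_threshold</code> are hypothetical names.</p>
<pre><code class="language-python">from dataclasses import dataclass


@dataclass
class Change:
    page: str
    score: int  # AI importance score, 0-100


def split_by_threshold(changes, threshold):
    """Separate changes that notify from those that are only recorded."""
    notify, quiet = [], []
    for change in changes:
        # A score at or above the threshold notifies; lower scores stay quiet
        # but remain in the change history.
        (notify if change.score >= threshold else quiet).append(change)
    return notify, quiet


changes = [Change("Pricing", 85), Change("Blog footer", 12), Change("Terms", 60)]
notify, quiet = split_by_threshold(changes, 60)
print([c.page for c in notify])  # ['Pricing', 'Terms']
print([c.page for c in quiet])   # ['Blog footer']
</code></pre>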
<h3>Additional BYOK Settings</h3>
<p>These settings are available when using your own API key:</p>
<table>
<thead>
<tr>
<th>Setting</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Deep Analysis</strong></td>
<td>Send full page content to AI for better context. Uses more tokens but provides more accurate analysis. When disabled, only the changed text (diff) is sent.</td>
</tr>
<tr>
<td><strong>Run on First Check</strong></td>
<td>Get AI analysis on the initial page check, before any changes are detected</td>
</tr>
<tr>
<td><strong>AI Requests Per Month</strong></td>
<td>Set a monthly cap to control costs. When the limit is reached, AI features pause until the next month. Leave empty for unlimited.</td>
</tr>
<tr>
<td><strong>Per Page Per Day</strong></td>
<td>Limit how many AI analyses a single page can trigger in 24 hours. Prevents noisy pages from consuming your entire budget. Default: 10.</td>
</tr>
<tr>
<td><strong>Max Tokens</strong></td>
<td>Limit content size per request. If content exceeds this limit, AI analysis is skipped for that change.</td>
</tr>
</tbody>
</table>
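<p>The BYOK limits above act as a series of checks before each analysis runs. A minimal Python sketch follows. It assumes the per-page limit counts calendar days (the setting may instead use a rolling 24-hour window), and it leaves out the monthly reset. The <code>AiBudget</code> class is hypothetical.</p>
<pre><code class="language-python">from collections import defaultdict
from datetime import date


class AiBudget:
    """Hypothetical model of the BYOK limits; None means "no limit"."""

    def __init__(self, monthly_cap=None, per_page_per_day=10, max_tokens=None):
        self.monthly_cap = monthly_cap
        self.per_page_per_day = per_page_per_day
        self.max_tokens = max_tokens
        self.used_this_month = 0
        self.used_per_page_day = defaultdict(int)  # (page, day) -> analyses

    def allow(self, page, tokens, day):
        """Return True and record usage if an analysis may run now."""
        if self.max_tokens is not None and tokens > self.max_tokens:
            return False  # content too large: analysis skipped for this change
        if self.monthly_cap is not None and self.used_this_month >= self.monthly_cap:
            return False  # monthly cap reached: AI pauses until next month
        if self.used_per_page_day[(page, day)] >= self.per_page_per_day:
            return False  # this page has used its daily allowance
        self.used_this_month += 1
        self.used_per_page_day[(page, day)] += 1
        return True


budget = AiBudget(monthly_cap=100, max_tokens=8_000)
today = date(2026, 6, 1)
results = [budget.allow("status-page", 2_000, today) for _ in range(12)]
print(results.count(True))                  # 10
print(budget.allow("docs", 20_000, today))  # False
</code></pre>
<p>The oversized check runs first, so a page whose content exceeds Max Tokens does not use up any of the monthly or daily allowance.</p>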
<h3>Understanding Tokens</h3>
<p>A <strong>token</strong> is roughly 4 characters or about 3/4 of a word. With included credits, each 4,000-token block counts as 1 credit on the Basic tier (10 credits on the Pro tier).</p>
<table>
<thead>
<tr>
<th>Page Type</th>
<th>Typical Tokens</th>
</tr>
</thead>
<tbody>
<tr>
<td>Simple (blog, article)</td>
<td>~1,000-2,000</td>
</tr>
<tr>
<td>Medium (product, news)</td>
<td>~2,000-5,000</td>
</tr>
<tr>
<td>Large (documentation)</td>
<td>~5,000-10,000</td>
</tr>
</tbody>
</table>
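<p>The rule of thumb above can be turned into a quick estimate. Here is a minimal Python sketch. Real tokenizers vary by model, so treat the results as approximations. Both function names are hypothetical.</p>
<pre><code class="language-python">def estimate_tokens(text):
    """Approximate tokens from characters: roughly 4 characters per token."""
    return round(len(text) / 4)


def estimate_tokens_from_words(word_count):
    """Same rule of thumb in words: one token is about 3/4 of a word."""
    return round(word_count * 4 / 3)


article = "word " * 1_200  # stand-in for a 1,200-word blog post
print(estimate_tokens(article))           # 1500
print(estimate_tokens_from_words(1_200))  # 1600
</code></pre>
<p>Both estimates land in the 1,000-2,000 token range listed for simple pages.</p>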
<h2>Using Your Own API Key (BYOK)</h2>
<p>If your included credits are not enough, or you want full control over model selection, you can connect your own API key from OpenAI, Google Gemini, Anthropic, or OpenRouter.</p>
<ol>
<li>Go to <strong>Settings &gt; Workspace &gt; Integrations &gt; AI</strong></li>
<li>Select your AI provider and enter your API key</li>
<li>Click <strong>Test Connection</strong> to verify</li>
<li>Choose your preferred model and save</li>
</ol>
<div class="kb-figure">
  <img src="/images/knowledge/ai-byok-config.png" alt="BYOK API Configuration with AI provider, model selection, Quick Select model tiers, and a confirmed API key">
</div>
<p>When using BYOK, AI credits are not consumed and you pay your AI provider directly. See the <a href="/help/integrations/article/ai-byok-setup-guide">BYOK Setup Guide</a> for detailed instructions.</p>
<h2>Best Practices</h2>
<h3>Start Small</h3>
<ul>
<li>AI is enabled by default, so monitor your credit usage for the first few weeks</li>
<li>Check usage statistics in <strong>Settings &gt; Workspace &gt; Integrations &gt; AI</strong></li>
<li>If you need more credits, upgrade your plan or connect your own API key</li>
</ul>
<h3>Optimize Credit Usage</h3>
<ul>
<li>Use Custom Instructions to help AI focus on what matters</li>
<li>A daily cap of 10 analyses per page prevents noisy pages from consuming your budget</li>
<li>For high-volume monitoring, consider BYOK with a budget model like Gemini Flash-Lite</li>
</ul>
<h3>Choose the Right Mode</h3>
<table>
<thead>
<tr>
<th>Scenario</th>
<th>Recommendation</th>
</tr>
</thead>
<tbody>
<tr>
<td>Getting started</td>
<td>Use included credits (no setup needed)</td>
</tr>
<tr>
<td>High-volume pages</td>
<td>Enable Importance Scoring to filter noise</td>
</tr>
<tr>
<td>Technical pages</td>
<td>Enable Summarization for readable changes</td>
</tr>
<tr>
<td>Need unlimited AI</td>
<td>Connect your own API key (BYOK)</td>
</tr>
<tr>
<td>Critical pages</td>
<td>Use BYOK with premium models (GPT-5.5, Claude Sonnet)</td>
</tr>
</tbody>
</table>
<h2>AI Label Automation</h2>
<p>AI can automatically apply or remove labels on detected changes based on rules you define. Instead of manually categorizing changes, the AI reads each change and decides which labels to add or remove according to your instructions.</p>
<h3>How to Set It Up</h3>
<ol>
<li>Go to <strong>Settings &gt; Workspace &gt; Labels</strong></li>
<li>Scroll to the <strong>AI Label Automation</strong> section</li>
<li>Click <strong>Add Rule</strong> to create a label/instruction pair</li>
<li>For each rule, choose a label name and write a plain-language instruction explaining when the AI should apply it</li>
<li>Click <strong>Save Changes</strong></li>
</ol>
<div class="kb-figure">
  <img src="/images/knowledge/ai-label-automation.png" alt="AI Label Automation section with a label/instruction rule pairing the Price Drop label to an instruction, plus the Add Rule button">
</div>
<p>You can configure up to 10 label rules per workspace.</p>
<h3>How It Works</h3>
<p>Each time a change is detected and AI analysis runs, the AI evaluates the change against your label rules and decides which labels to add or remove. The AI receives the current labels on the page, so it can remove labels that no longer apply (e.g., removing "Out of Stock" when a product is back in stock).</p>
<p>Labels are applied to the change record, making them available for filtering on the <a href="/help/features/article/review-board">Review Board</a> and in your page list.</p>
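<p>The add/remove behaviour can be pictured as a set merge. Two rules from this article shape it: AI can only manage labels that have an automation rule, and manually applied labels are never touched. Below is a minimal Python sketch of that merge. The function and its arguments are hypothetical, not PageCrawl's internals.</p>
<pre><code class="language-python">def apply_label_decision(current, managed, add, remove):
    """Merge an AI label decision into a page's labels.

    current: labels already on the page
    managed: label names that have an automation rule
    add, remove: labels the AI decided to add or remove
    """
    # The AI can only act on labels that have a rule ...
    to_add = set(add).intersection(managed)
    to_remove = set(remove).intersection(managed)
    # ... so manually applied labels survive even if the AI lists them.
    return sorted((set(current) - to_remove) | to_add)


labels = apply_label_decision(
    current={"Out of Stock", "Competitor"},  # "Competitor" was added by hand
    managed={"Out of Stock", "Price Drop"},
    add={"Price Drop"},
    remove={"Out of Stock", "Competitor"},
)
print(labels)  # ['Competitor', 'Price Drop']
</code></pre>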
<h3>Example Rules</h3>
<table>
<thead>
<tr>
<th>Label</th>
<th>Instruction</th>
</tr>
</thead>
<tbody>
<tr>
<td>Breaking News</td>
<td>Apply when urgent or breaking news appears</td>
</tr>
<tr>
<td>Policy Update</td>
<td>Apply when terms, policies, or legal text changes</td>
</tr>
<tr>
<td>New Event</td>
<td>Apply when a new conference or event is announced</td>
</tr>
<tr>
<td>Job Posted</td>
<td>Apply when new job listings are added</td>
</tr>
<tr>
<td>Content Removed</td>
<td>Apply when significant content is deleted from the page</td>
</tr>
</tbody>
</table>
<h3>Important Notes</h3>
<ul>
<li>AI can only manage labels defined in your automation rules. Manually applied labels are never touched.</li>
<li>Label names have a maximum of 50 characters; instructions have a maximum of 500 characters.</li>
<li>Labels are created automatically if they do not already exist in your workspace.</li>
<li>AI Label Automation requires AI to be configured for the workspace (either included credits or BYOK).</li>
<li>Label decisions run as part of the standard AI analysis, so no additional credits are used beyond the normal change analysis.</li>
</ul>
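<p>If you draft many rules before entering them (for example, in a spreadsheet), a quick pre-check against the limits listed above can save a round trip. Here is a minimal Python sketch. The function is hypothetical and not part of PageCrawl.</p>
<pre><code class="language-python">MAX_RULES = 10
MAX_LABEL_CHARS = 50
MAX_INSTRUCTION_CHARS = 500


def validate_rules(rules):
    """Check (label, instruction) pairs against the documented limits.

    Returns a list of problems; an empty list means the rules fit.
    """
    problems = []
    if len(rules) > MAX_RULES:
        problems.append(f"{len(rules)} rules; at most {MAX_RULES} are allowed")
    for label, instruction in rules:
        if len(label) > MAX_LABEL_CHARS:
            problems.append(f"label {label!r} is longer than {MAX_LABEL_CHARS} characters")
        if len(instruction) > MAX_INSTRUCTION_CHARS:
            problems.append(f"instruction for {label!r} is longer than {MAX_INSTRUCTION_CHARS} characters")
    return problems


rules = [
    ("Policy Update", "Apply when terms, policies, or legal text changes"),
    ("Job Posted", "Apply when new job listings are added"),
]
print(validate_rules(rules))  # []
</code></pre>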
<h2>Security and Privacy</h2>
<table>
<thead>
<tr>
<th>Aspect</th>
<th>Details</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Included credits</strong></td>
<td>Content is processed through PageCrawl's managed AI infrastructure</td>
</tr>
<tr>
<td><strong>BYOK mode</strong></td>
<td>Content is sent directly to your chosen AI provider</td>
</tr>
<tr>
<td><strong>Storage</strong></td>
<td>AI summaries stored in PageCrawl.io for your reference</td>
</tr>
<tr>
<td><strong>Security</strong></td>
<td>All transmission via HTTPS, API keys encrypted at rest</td>
</tr>
<tr>
<td><strong>Provider policies</strong></td>
<td>Review your AI provider's data usage and retention policies when using BYOK</td>
</tr>
</tbody>
</table>
<h2>Related Articles</h2>
<ul>
<li><a href="/help/features/article/how-pagecrawl-uses-ai">How PageCrawl Uses AI</a> - An overview of where AI summaries, scoring, and labels appear, and how to control them</li>
<li><a href="/help/integrations/article/ai-byok-setup-guide">AI Integration Setup Guide (BYOK)</a> - Step-by-step guide to configure your own API keys for unlimited AI usage</li>
<li><a href="/help/tutorials/article/choosing-best-ai-model-website-monitoring">Choosing the Right AI Model for Website Monitoring</a> - Compare models and pricing for BYOK users</li>
</ul>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[AI Integration Setup Guide - Bring Your Own Key (BYOK)]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/integrations/article/ai-byok-setup-guide" />
            <id>https://pagecrawl.io/70</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>AI Integration Setup Guide - Bring Your Own Key (BYOK)</h1>
<p>All PageCrawl.io plans include AI credits that work automatically with no setup required. This guide is for users who want to go beyond their included credits by connecting their own API key for unlimited AI usage, full model choice, and advanced features like Deep Analysis.</p>
<div class="kb-figure">
  <img src="/images/knowledge/simple-ai-notify.png" alt="AI settings on a monitor where you describe what matters and choose the AI model">
</div>
<h2>When to Use BYOK</h2>
<p>Most users won't need BYOK since all plans include AI credits. Consider BYOK if you:</p>
<ul>
<li>Run out of credits regularly and need unlimited AI analyses</li>
<li>Want to choose a specific AI model for different page types</li>
<li>Need Deep Analysis mode (sends full page content for better context)</li>
<li>Want to use premium models like GPT-5.5 or Claude Opus 4.8 for critical pages</li>
<li>Monitor sensitive content and need a specific provider's data policies</li>
</ul>
<p>You can switch between credits and BYOK at any time in your settings.</p>
<h2>Supported Providers and Models</h2>
<table>
<thead>
<tr>
<th>Provider</th>
<th>Recommended Model</th>
<th>Best For</th>
<th>Get API Key</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>OpenAI</strong></td>
<td>GPT-5 Mini</td>
<td>Best value for most users</td>
<td><a href="https://platform.openai.com/api-keys">platform.openai.com</a></td>
</tr>
<tr>
<td><strong>Google Gemini</strong></td>
<td>Gemini 3 Flash</td>
<td>Balance of quality and cost</td>
<td><a href="https://ai.google.dev">ai.google.dev</a></td>
</tr>
<tr>
<td><strong>Anthropic</strong></td>
<td>Claude Haiku 4.5</td>
<td>Fast and accurate</td>
<td><a href="https://console.anthropic.com">console.anthropic.com</a></td>
</tr>
<tr>
<td><strong>OpenRouter</strong></td>
<td>Any model</td>
<td>Access 200+ models via single API</td>
<td><a href="https://openrouter.ai">openrouter.ai</a></td>
</tr>
</tbody>
</table>
<h3>OpenAI Models</h3>
<table>
<thead>
<tr>
<th>Model</th>
<th>Use Case</th>
<th>Notes</th>
</tr>
</thead>
<tbody>
<tr>
<td>GPT-5 Mini</td>
<td>Most users</td>
<td>Best balance of cost and quality</td>
</tr>
<tr>
<td>GPT-5.5</td>
<td>Complex analysis</td>
<td>Most capable, higher cost</td>
</tr>
<tr>
<td>GPT-5 Nano</td>
<td>High volume</td>
<td>Fastest and cheapest</td>
</tr>
</tbody>
</table>
<h3>Google Gemini Models</h3>
<table>
<thead>
<tr>
<th>Model</th>
<th>Use Case</th>
<th>Notes</th>
</tr>
</thead>
<tbody>
<tr>
<td>Gemini 3 Flash</td>
<td>General use</td>
<td>Good balance, default</td>
</tr>
<tr>
<td>Gemini 3.1 Pro</td>
<td>Complex tasks</td>
<td>Premium quality</td>
</tr>
<tr>
<td>Gemini 3.1 Flash Lite</td>
<td>Budget monitoring</td>
<td>Most affordable option</td>
</tr>
<tr>
<td>Gemini 2.5 Flash</td>
<td>Legacy</td>
<td>Still available, good balance</td>
</tr>
</tbody>
</table>
<h3>Anthropic Claude Models</h3>
<table>
<thead>
<tr>
<th>Model</th>
<th>Use Case</th>
<th>Notes</th>
</tr>
</thead>
<tbody>
<tr>
<td>Claude Haiku 4.5</td>
<td>Most users</td>
<td>Fast and cost-effective</td>
</tr>
<tr>
<td>Claude Sonnet 4.6</td>
<td>Complex tasks</td>
<td>Better quality, higher cost</td>
</tr>
<tr>
<td>Claude Opus 4.8</td>
<td>Critical apps</td>
<td>Highest accuracy</td>
</tr>
<tr>
<td>Claude Opus 4.7</td>
<td>Legacy</td>
<td>Still available</td>
</tr>
<tr>
<td>Claude Sonnet 4.5</td>
<td>Legacy</td>
<td>Still available</td>
</tr>
</tbody>
</table>
<h3>OpenRouter</h3>
<p>OpenRouter provides unified access to 200+ AI models from multiple providers through a single API key.</p>
<table>
<thead>
<tr>
<th>Feature</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Unified billing</strong></td>
<td>One account for all models</td>
</tr>
<tr>
<td><strong>Automatic fallbacks</strong></td>
<td>Switches models if one is unavailable</td>
</tr>
<tr>
<td><strong>Free models</strong></td>
<td>Access to Llama, Mistral, Qwen community models</td>
</tr>
<tr>
<td><strong>Pricing</strong></td>
<td>5.5% platform fee on top of base model costs</td>
</tr>
</tbody>
</table>
<p><strong>Recommended models</strong>: <code>openai/gpt-5-mini</code>, <code>anthropic/claude-sonnet-4.6</code>, <code>google/gemini-2.5-flash</code></p>
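<p>OpenRouter model IDs take the form <code>provider/model</code>, and the fee listed above is added to the model's own price. Here is a minimal Python sketch of both ideas. It assumes the 5.5% rate from the table; check OpenRouter's pricing page for how the fee is currently charged. The helper names are hypothetical.</p>
<pre><code class="language-python">PLATFORM_FEE = 0.055  # 5.5%, as listed in the table above


def split_model_id(model_id):
    """Split an OpenRouter ID such as "openai/gpt-5-mini" into (provider, model)."""
    provider, _, model = model_id.partition("/")
    return provider, model


def cost_with_fee(base_cost_usd, fee=PLATFORM_FEE):
    """Base model cost plus the platform fee, rounded to a tenth of a cent."""
    return round(base_cost_usd * (1 + fee), 3)


print(split_model_id("anthropic/claude-sonnet-4.6"))  # ('anthropic', 'claude-sonnet-4.6')
print(cost_with_fee(20.00))                           # 21.1
</code></pre>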
<h2>Step-by-Step Setup</h2>
<h3>Step 1: Get Your API Key</h3>
<table>
<thead>
<tr>
<th>Provider</th>
<th>Steps</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>OpenAI</strong></td>
<td>Visit <a href="https://platform.openai.com/api-keys">platform.openai.com</a> &gt; Create account &gt; API Keys &gt; Create new secret key &gt; Add billing</td>
</tr>
<tr>
<td><strong>Google Gemini</strong></td>
<td>Visit <a href="https://ai.google.dev">ai.google.dev</a> &gt; Sign in with Google &gt; Create project &gt; Enable Gemini API &gt; Generate API key</td>
</tr>
<tr>
<td><strong>Anthropic</strong></td>
<td>Visit <a href="https://console.anthropic.com">console.anthropic.com</a> &gt; Create account &gt; API Keys &gt; Create new key &gt; Add credits</td>
</tr>
<tr>
<td><strong>OpenRouter</strong></td>
<td>Visit <a href="https://openrouter.ai">openrouter.ai</a> &gt; Create account &gt; Settings &gt; API Key &gt; Add credits</td>
</tr>
</tbody>
</table>
<h3>Step 2: Configure in PageCrawl.io</h3>
<div class="kb-figure">
  <img src="/images/knowledge/integ-ai-settings.png" alt="AI Features Configuration in workspace settings where you manage AI credits or connect your own provider key">
</div>
<ol>
<li>Go to <strong>Settings &gt; Workspace &gt; Integrations &gt; AI</strong></li>
<li>Select your AI provider</li>
<li>Paste your API key</li>
<li>Choose your preferred model</li>
<li>Click <strong>Test Key</strong> to verify</li>
<li>Save your configuration</li>
</ol>
<p>Your workspace will automatically switch to BYOK mode and AI credits will no longer be consumed.</p>
<h3>Step 3: Enable AI Features</h3>
<p>Toggle the features you want:</p>
<table>
<thead>
<tr>
<th>Feature</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>AI Summarization</strong></td>
<td>Get intelligent summaries of page changes</td>
</tr>
<tr>
<td><strong>Importance Scoring</strong></td>
<td>AI scores each change from 0-100, filtering out low-priority noise</td>
</tr>
<tr>
<td><strong>Custom Instructions</strong></td>
<td>Add context for better analysis</td>
</tr>
<tr>
<td><strong>Deep Analysis</strong></td>
<td>Send full page content for better context (BYOK only)</td>
</tr>
<tr>
<td><strong>Run on First Check</strong></td>
<td>Get AI analysis on initial page check (BYOK only)</td>
</tr>
</tbody>
</table>
<h2>Switching Back to Credits</h2>
<p>If you want to stop using your own key and return to included credits:</p>
<ol>
<li>Go to <strong>Settings &gt; Workspace &gt; Integrations &gt; AI</strong></li>
<li>Click <strong>Switch to included credits</strong></li>
</ol>
<p>Your API key configuration is preserved in case you want to switch back later.</p>
<h2>Page-Level Configuration</h2>
<p>You can customize AI settings at three levels:</p>
<table>
<thead>
<tr>
<th>Level</th>
<th>Applies To</th>
<th>Best For</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Workspace default</strong></td>
<td>All pages</td>
<td>General settings</td>
</tr>
<tr>
<td><strong>Template override</strong></td>
<td>Pages using that template</td>
<td>Grouped pages (e.g., all product pages)</td>
</tr>
<tr>
<td><strong>Page override</strong></td>
<td>Individual pages</td>
<td>Critical or special pages</td>
</tr>
</tbody>
</table>
<p><strong>Example strategy</strong> (these are examples, not requirements):</p>
<ul>
<li>Workspace default: a low-cost general-purpose model (for example, a Gemini Flash Lite-class model)</li>
<li>E-commerce template: a balanced cost/quality model (for example, a GPT Mini-class model)</li>
<li>Legal and ToS templates: a fast, accurate model (for example, a Claude Haiku-class model)</li>
<li>Critical contracts: a higher-end model where accuracy matters most (for example, a Claude Sonnet- or Opus-class model)</li>
</ul>
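<p>The three levels behave like layered defaults: the most specific level that sets a value wins. Here is a minimal Python sketch of how a page's effective settings could be resolved. It assumes that a key an override leaves unset falls through to the level below. The function, setting keys, and model identifiers are hypothetical placeholders.</p>
<pre><code class="language-python">def resolve_ai_settings(workspace, template=None, page=None):
    """Layer settings: page overrides template, template overrides workspace.

    Assumption: a key that an override leaves out (or sets to None) falls
    through to the level below.
    """
    effective = dict(workspace)
    for override in (template, page):
        if override:
            effective.update({k: v for k, v in override.items() if v is not None})
    return effective


workspace = {"model": "gemini-flash-lite", "threshold": 40, "deep_analysis": False}
ecommerce_template = {"model": "gpt-mini"}
contract_page = {"model": "claude-opus", "deep_analysis": True}

print(resolve_ai_settings(workspace, ecommerce_template)["model"])  # gpt-mini
settings = resolve_ai_settings(workspace, ecommerce_template, contract_page)
print(settings["model"], settings["threshold"])  # claude-opus 40
</code></pre>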
<h2>Model Selection Guidelines</h2>
<h3>By Priority</h3>
<table>
<thead>
<tr>
<th>Priority</th>
<th>Model tier (examples)</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Cost optimization</strong></td>
<td>Flash Lite-class or Nano-class models (e.g., Gemini Flash Lite, GPT Nano)</td>
</tr>
<tr>
<td><strong>Accuracy</strong></td>
<td>Premium models (e.g., GPT-5.5-class, Claude Sonnet-class)</td>
</tr>
<tr>
<td><strong>Speed</strong></td>
<td>Fast small models (e.g., Claude Haiku-class, GPT Mini-class)</td>
</tr>
<tr>
<td><strong>Complex content</strong></td>
<td>Higher-end reasoning models (e.g., Claude Sonnet-class, GPT-5.5-class)</td>
</tr>
</tbody>
</table>
<h3>By Page Complexity</h3>
<p>For most pages, a <strong>general-purpose model</strong> provides excellent results at a lower cost:</p>
<table>
<thead>
<tr>
<th>Model</th>
<th>Provider</th>
<th>Best For</th>
</tr>
</thead>
<tbody>
<tr>
<td>Gemini 3 Flash</td>
<td>Google</td>
<td>General monitoring, good balance of speed and quality</td>
</tr>
<tr>
<td>GPT-5 Mini</td>
<td>OpenAI</td>
<td>Reliable all-around performance</td>
</tr>
</tbody>
</table>
<p>For <strong>complex pages</strong> that require deeper analysis or more reasoning (e.g., dense legal documents, technical specifications, multi-section reports), choose a more powerful model:</p>
<table>
<thead>
<tr>
<th>Model</th>
<th>Provider</th>
<th>Best For</th>
</tr>
</thead>
<tbody>
<tr>
<td>Gemini 3.1 Pro</td>
<td>Google</td>
<td>Complex documents requiring extended reasoning</td>
</tr>
<tr>
<td>GPT-5.5</td>
<td>OpenAI</td>
<td>Nuanced analysis and detailed comparisons</td>
</tr>
<tr>
<td>Claude Opus 4.8</td>
<td>Anthropic</td>
<td>Critical documents requiring highest accuracy</td>
</tr>
</tbody>
</table>
<p><strong>Note</strong>: Start with a general-purpose model and upgrade to a more powerful one if you notice the AI missing important changes or providing superficial summaries.</p>
<h3>By Content Type</h3>
<p>Examples of model tiers you can try for each content type. These are starting points, not requirements; tune to your own preferences.</p>
<table>
<thead>
<tr>
<th>Content Type</th>
<th>Budget tier</th>
<th>Balanced tier</th>
<th>Premium tier</th>
</tr>
</thead>
<tbody>
<tr>
<td>Blogs, News</td>
<td>Flash Lite-class</td>
<td>GPT Mini-class</td>
<td>-</td>
</tr>
<tr>
<td>E-commerce</td>
<td>Flash Lite-class</td>
<td>GPT Mini-class</td>
<td>Claude Haiku-class</td>
</tr>
<tr>
<td>Legal, ToS</td>
<td>Claude Haiku-class</td>
<td>Claude Sonnet-class</td>
<td>Claude Opus-class</td>
</tr>
<tr>
<td>API Docs</td>
<td>Flash Lite-class</td>
<td>GPT Mini-class</td>
<td>-</td>
</tr>
</tbody>
</table>
<h2>Related Articles</h2>
<ul>
<li><a href="/help/features/article/ai-powered-change-detection">AI-Powered Change Detection and Smart Filtering</a> - Learn how AI summarization and Importance Scoring work</li>
<li><a href="/help/tutorials/article/choosing-best-ai-model-website-monitoring">Choosing the Right AI Model for Website Monitoring</a> - Compare models and pricing to find the best fit</li>
</ul>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[How to Monitor Terms of Service and Privacy Policy Pages for Compliance]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/tutorials/article/monitoring-terms-of-service-privacy-policy-compliance" />
            <id>https://pagecrawl.io/71</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>How to Monitor Terms of Service and Privacy Policy Pages for Compliance</h1>
<div class="kb-figure">
  <img src="/images/knowledge/simple-what-to-track.png" alt="What to Track panel with Reader mode selected to monitor only the main legal document text">
</div>
<p>Businesses rely on numerous third-party services, each with their own Terms of Service and Privacy Policy that can change at any time. These changes might affect your compliance status, operational procedures, or legal obligations. PageCrawl.io provides an automated way to track these critical documents, ensuring you're always informed when important updates occur.</p>
<p>This guide will show you how to set up automated monitoring for legal documents using PageCrawl.io's features.</p>
<h3>Why Monitor Legal Documents</h3>
<p>When vendors update their terms without direct notification, it can impact your business in several ways. Payment processors might change their fee structures, cloud providers could modify data processing agreements, or analytics tools might update their data retention policies. Manual checking of these documents is time-consuming and prone to missing important updates.</p>
<h3>Understanding What to Monitor</h3>
<p>Legal document monitoring typically focuses on tracking changes in Terms of Service, Privacy Policies, Data Processing Agreements, and Service Level Agreements from your vendors and partners.</p>
<h3>Setting Up Compliance Monitoring in PageCrawl.io</h3>
<p>The process of setting up monitoring for legal documents is straightforward and can be completed in a few minutes per page.</p>
<h4>Step 1: Add the Legal Document Page</h4>
<ol>
<li>Log in to your PageCrawl.io dashboard</li>
<li>Click the "Track New Page" button</li>
<li>Enter the URL of the Terms of Service or Privacy Policy you want to monitor</li>
<li>Provide a descriptive name for the monitoring task (e.g., "Stripe Terms of Service" or "AWS Privacy Policy")</li>
</ol>
<h4>Step 2: Configure Detection Settings</h4>
<ol>
<li>Select "Full page text" as your detection method and enable "Reader mode". Reader mode captures only the main text content, so changes in page headers, footers, and sidebars are ignored automatically.</li>
<li>Set how frequently the page should be checked - daily is sufficient for most legal documents, but you can adjust based on your needs (hourly for critical vendors, weekly for stable documents)</li>
</ol>
<h4>Step 3: Set Up Notifications</h4>
<ol>
<li>Choose when to receive notifications: instantly when changes are detected, or as a daily or weekly digest that summarizes all changes across your monitored pages</li>
<li>Select notification channels: Email, Slack, Discord, Microsoft Teams, Telegram, or Webhooks for system integration</li>
<li>Configure team notifications by adding relevant team members to receive alerts</li>
</ol>
<h3>Practical Implementation Tips</h3>
<p>Start by monitoring your most critical vendor agreements first, then gradually expand to include other services. Use clear naming conventions for your monitoring tasks to easily identify which document changed when you receive an alert.</p>
<h3>Organizing Your Monitoring Portfolio</h3>
<p>Create a structured approach to monitoring by categorizing your tracked pages. Group them into critical vendors (payment processors, infrastructure providers), data processors (analytics tools, CRM systems), and regulatory pages (government compliance guidelines).</p>
<h3>Using Tags for Better Organization</h3>
<p>Implement a tagging system from the start. Use tags like #vendor, #competitor, #gdpr, or #payment to quickly filter and manage your monitored pages. This becomes especially useful as your monitoring portfolio grows.</p>
<h3>Handling Different Types of Changes</h3>
<p>Not all changes are equal. Some updates might be minor formatting adjustments, while others could be significant legal modifications. PageCrawl.io helps you distinguish between these by highlighting exactly what changed, showing removed text in red and new text in green.</p>
<p>For each detected change, PageCrawl.io stores:</p>
<ul>
<li><strong>Screenshots</strong> of the page before and after the change</li>
<li><strong>Text differences</strong> with clear highlighting of additions and removals</li>
<li><strong>AI summaries</strong> explaining what changed in plain language (when enabled)</li>
<li><strong>Historical versions</strong> for complete audit trails</li>
</ul>
<p>This comprehensive record ensures you have all the evidence needed for compliance audits and legal reviews.</p>
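<p>PageCrawl computes and highlights these differences for you. To show the underlying idea, here is a minimal Python sketch that uses the standard library's <code>difflib</code> to split a line-level diff into removed and added text. It is not PageCrawl's diff engine, and the sample policy text is invented.</p>
<pre><code class="language-python">import difflib


def text_changes(old_text, new_text):
    """Return (removed, added) lines between two versions of a document."""
    removed, added = [], []
    for line in difflib.ndiff(old_text.splitlines(), new_text.splitlines()):
        if line.startswith("- "):
            removed.append(line[2:])  # would be highlighted in red
        elif line.startswith("+ "):
            added.append(line[2:])    # would be highlighted in green
        # Lines starting with "  " are unchanged; "? " lines are intraline hints.
    return removed, added


old = "Refunds are available within 30 days.\nWe do not sell personal data.\nLast updated: January 2026"
new = "Refunds are available within 14 days.\nWe do not sell personal data.\nLast updated: June 2026"
removed, added = text_changes(old, new)
print(removed)  # ['Refunds are available within 30 days.', 'Last updated: January 2026']
print(added)    # ['Refunds are available within 14 days.', 'Last updated: June 2026']
</code></pre>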
<h3>Troubleshooting Common Issues</h3>
<p>If you're receiving too many alerts about minor changes, check that Reader mode is enabled to filter out navigation and footer updates. For more strategies on reducing false positives, see our guide on <a href="https://pagecrawl.io/help/reduce-false-positives/article/reduce-false-positives-monitoring-website-for-changes">reducing false positives when monitoring websites</a>.</p>
<p>If you're missing important changes, verify that the correct URL is being monitored and that the page is accessible.</p>
<h3>Next Steps</h3>
<p>Once you've set up basic monitoring, consider implementing advanced strategies such as keyword-based alerts for critical terms like "price increase" or "data breach", or comparison monitoring to track how your policies compare to competitors.</p>
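<p>At its core, a keyword-based alert checks newly added text for specific phrases. Here is a minimal Python sketch of that check using the two example terms above. The function is illustrative; how PageCrawl configures keyword conditions is not covered here.</p>
<pre><code class="language-python">import re

CRITICAL_TERMS = ["price increase", "data breach"]


def matched_terms(added_text, terms=CRITICAL_TERMS):
    """Return the critical terms found in newly added text.

    Matching is case-insensitive and on whole words, so "data breaches" does
    not match "data breach".
    """
    found = []
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, added_text, flags=re.IGNORECASE):
            found.append(term)
    return found


added = "Effective July 1, a price increase of 5% applies to all plans."
print(matched_terms(added))  # ['price increase']
</code></pre>
<p>Whole-word matching avoids hits inside unrelated words. The trade-off is that plural or inflected forms need to be listed as separate terms.</p>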
<h3>Getting Started Today</h3>
<p>Begin with your most important vendor agreements. Setup takes a minute or two per page. To save time, you can import multiple URLs at once: paste a list of URLs or upload an Excel file for bulk import.</p>
<p>PageCrawl.io handles the monitoring automatically once configured. You'll receive clear notifications when changes occur, allowing you to review and respond promptly to maintain compliance.</p>
<p>For businesses monitoring multiple vendors, check our <a href="https://pagecrawl.io/pricing">pricing page</a> - monitoring 500 URLs costs just $30/month, making enterprise-wide compliance monitoring affordable and efficient.</p>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Monitoring Numeric Values for Changes to Spot Trends]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/tutorials/article/monitor-numeric-values-with-number-tracker" />
            <id>https://pagecrawl.io/72</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Monitoring Numeric Values for Changes to Spot Trends</h1>
<div class="kb-figure">
  <img src="/images/blog/number-tracker.png" alt="number tracker">
</div>
<p>You can track numeric values on a page using the "Number" tracked element type. This extracts numbers from a selected area on the page and displays them in a chart so you can quickly see the history of values and spot trends. Instead of manually checking a number every day, PageCrawl monitors it for you and builds a visual record over time.</p>
<h3>What You Can Track</h3>
<p>Common things to monitor with the Number tracker:</p>
<ul>
<li><strong>E-commerce</strong>: Product prices, discounts, stock quantities available</li>
<li><strong>Finance</strong>: Stock prices, cryptocurrency values, exchange rates</li>
<li><strong>Analytics</strong>: Page views, visitor counts, conversion rates</li>
<li><strong>Ratings</strong>: Product ratings, review scores, customer satisfaction metrics</li>
<li><strong>Inventory</strong>: Stock levels, warehouse quantities, supply counts</li>
</ul>
<h3>Set Up on PageCrawl.io</h3>
<ol>
<li>Log in to your <a href="https://pagecrawl.io">pagecrawl.io</a> account</li>
<li>Click <strong>Track New Page</strong> and enter the URL of the page containing the number you want to monitor</li>
<li>Click <strong>Tracked Elements</strong> to add what you want to monitor</li>
<li>Select "Number" as the tracked element type</li>
<li>Use the visual selector to click directly on the number on the page, or manually enter an XPath/CSS selector if you prefer</li>
</ol>
<div class="kb-figure">
  <img src="/images/knowledge/settings-tracked-elements.png" alt="Tracked Elements section where the Number element type is selected with its selector and threshold">
</div>
<p>The visual selector is the easiest option: point and click on the number you want to track, and PageCrawl generates the selector for you automatically.</p>
<h3>Using Selectors Manually</h3>
<p>If you prefer to write selectors manually from the page's HTML source, here are some examples:</p>
<p>For a price like this:</p>
<pre><code class="language-html">&lt;span class="price"&gt;$49.99&lt;/span&gt;</code></pre>
<p>Use: <code>//span[@class="price"]</code> or <code>.price</code></p>
<p>For an inventory count:</p>
<pre><code class="language-html">&lt;div class="stock"&gt;150 items available&lt;/div&gt;</code></pre>
<p>Use: <code>//div[@class="stock"]</code> or <code>div.stock</code></p>
<p>For a specific ID:</p>
<pre><code class="language-html">&lt;p id="total-views"&gt;2,543 views&lt;/p&gt;</code></pre>
<p>Use: <code>//p[@id="total-views"]</code> or <code>#total-views</code></p>
<p>For a rating or score:</p>
<pre><code class="language-html">&lt;span class="rating"&gt;4.5&lt;/span&gt;</code></pre>
<p>Use: <code>//span[@class="rating"]</code> or <code>.rating</code></p>
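<p>Each element above mixes the number with other text, such as a currency symbol, a thousands separator, or words like "items available". The Python sketch below shows one way that text can be reduced to a number. It assumes a period decimal separator and a comma thousands separator, and the <code>extract_number</code> helper is hypothetical. It is not PageCrawl's own parser, and formats such as <code>1.234,56</code> would need extra handling.</p>

```python
import re
from typing import Optional

# Optional minus sign, digits with optional comma thousands separators,
# and an optional decimal part, e.g. "2,543", "49.99", "-3".
NUMBER_RE = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_number(text: str) -> Optional[float]:
    """Return the first number found in text, or None if there is none.

    Assumes "." is the decimal separator and "," groups thousands.
    """
    match = NUMBER_RE.search(text)
    if match is None:
        return None
    return float(match.group().replace(",", ""))


# Element texts from the selector examples above.
for sample in ["$49.99", "150 items available", "2,543 views", "4.5"]:
    print(repr(sample), "->", extract_number(sample))
```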
<h3>How It Works</h3>
<p>Once you've set up your number tracker, PageCrawl will:</p>
<ul>
<li>Extract the numeric value each time it checks the page</li>
<li>Store the values over time and build a historical record</li>
<li>Display all values in a chart so you can see trends at a glance</li>
<li>Show you when values go up or down and by how much</li>
<li>Alert you if the number changes by a certain amount (if you configure notification conditions)</li>
</ul>
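<p>The change calculation behind the last two points can be sketched as follows: compare each new value with the previous one, compute the absolute and percentage change, and alert only when the change passes a threshold. The <code>Change</code> class, the <code>should_alert</code> function, and its <code>min_abs</code> and <code>min_pct</code> parameters are hypothetical stand-ins for the notification conditions you configure in PageCrawl, which may work differently.</p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    previous: float
    current: float

    @property
    def absolute(self) -> float:
        # Signed difference: positive for an increase, negative for a decrease.
        return self.current - self.previous

    @property
    def percent(self) -> Optional[float]:
        # Change relative to the previous value; undefined if that was zero.
        if self.previous == 0:
            return None
        return self.absolute * 100 / abs(self.previous)


def should_alert(change: Change, min_abs: float = 0.0, min_pct: float = 0.0) -> bool:
    """Hypothetical notification condition: alert only if the value changed
    by at least min_abs in absolute terms AND by at least min_pct percent."""
    if change.absolute == 0:
        return False
    if abs(change.absolute) < min_abs:
        return False
    pct = change.percent
    if pct is not None and abs(pct) < min_pct:
        return False
    return True


# A price drop from $49.99 to $39.99 is a change of about -20%.
drop = Change(previous=49.99, current=39.99)
print(round(drop.absolute, 2), round(drop.percent, 1))  # -10.0 -20.0
print(should_alert(drop, min_pct=10))  # True
print(should_alert(drop, min_pct=25))  # False
```

<p>Requiring both thresholds is a design choice in this sketch: it suppresses alerts for tiny percentage moves on large numbers as well as large percentage moves on tiny numbers.</p>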
<p>The chart displays your complete history, making it easy to spot patterns and see how values change over different time periods. You can see exactly when changes happened and track the progression of any number over days, weeks, or months.</p>
<h3>Understanding the Chart</h3>
<p>Your number tracking chart shows:</p>
<ul>
<li>All previous values recorded over time</li>
<li>Exact dates and times when each value was captured</li>
<li>Trends and patterns in how the number changes</li>
<li>Peaks (highest values) and valleys (lowest values)</li>
<li>How much the number changed between each check</li>
</ul>
<p>This gives you a clear visual picture of what's happening with the metric you're tracking.</p>
<h3>Statistics Overview</h3>
<p>PageCrawl displays comprehensive statistics about your tracked number:</p>
<ul>
<li><strong>Data Points</strong>: Total number of checks performed and days tracked</li>
<li><strong>Average</strong>: The mean of all recorded values over time</li>
<li><strong>Median</strong>: The middle value, useful for understanding typical values when outliers exist</li>
<li><strong>First Recorded</strong>: The initial value and when tracking began</li>
<li><strong>Current Value</strong>: Your most recent reading with:<ul>
<li>90-day change comparison</li>
<li>Distance from average (shows if current value is higher or lower than typical)</li>
</ul>
</li>
<li><strong>Highest Value</strong>: The maximum value ever recorded and when it occurred</li>
<li><strong>Lowest Value</strong>: The minimum value ever recorded and when it occurred</li>
<li><strong>Total Change</strong>: How much the value has changed since you started tracking (absolute and percentage)</li>
<li><strong>Trend</strong>: Overall direction indicator (📈 Up, 📉 Down, or ➡️ Stable)</li>
<li><strong>Last Changed</strong>: When the value actually changed (not just checked)</li>
</ul>
<p>These statistics are color-coded:</p>
<ul>
<li>Green indicates increases or positive changes</li>
<li>Red indicates decreases or negative changes</li>
<li>Gray indicates neutral or stable values</li>
</ul>
<p>This helps you quickly understand the overall behavior of your metric without manually analyzing the chart.</p>
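<p>Most of these statistics follow directly from the list of recorded values. The sketch below computes them from <code>(timestamp, value)</code> pairs with Python's <code>statistics</code> module. The <code>summarize</code> function, the <code>stable_pct</code> tolerance used to choose between Up, Down, and Stable, and the way days tracked are counted are all assumptions; the 90-day comparison is left out to keep it short.</p>

```python
import statistics
from datetime import datetime


def summarize(readings: list[tuple[datetime, float]], stable_pct: float = 1.0) -> dict:
    """Summary statistics for readings sorted oldest-first (hypothetical helper).

    stable_pct is an assumed tolerance: a total change within +/- stable_pct
    percent of the first value is reported as a Stable trend.
    """
    if not readings:
        raise ValueError("no readings recorded yet")
    values = [value for _, value in readings]
    first_time, first = readings[0]
    last_time, current = readings[-1]
    average = statistics.mean(values)
    total = current - first
    # Percentage change is undefined when the first value was zero.
    total_pct = total * 100 / abs(first) if first else None

    if total == 0 or (total_pct is not None and abs(total_pct) <= stable_pct):
        trend = "Stable"
    else:
        trend = "Up" if total > 0 else "Down"

    # Last time the value actually differed from the previous reading.
    last_changed = None
    for (_, previous), (ts, value) in zip(readings, readings[1:]):
        if value != previous:
            last_changed = ts

    return {
        "data_points": len(values),
        "days_tracked": (last_time - first_time).days,
        "average": average,
        "median": statistics.median(values),
        "first": (first_time, first),
        "current": current,
        "distance_from_average": current - average,
        "highest": max(readings, key=lambda r: r[1]),
        "lowest": min(readings, key=lambda r: r[1]),
        "total_change": total,
        "total_change_pct": total_pct,
        "trend": trend,
        "last_changed": last_changed,
    }


history = [
    (datetime(2026, 1, 1), 50.0),
    (datetime(2026, 1, 2), 50.0),
    (datetime(2026, 1, 3), 45.0),
    (datetime(2026, 1, 4), 55.0),
]
stats = summarize(history)
print(stats["average"], stats["median"], stats["trend"], stats["last_changed"])
```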
<h3>Chart Features</h3>
<p>The chart includes interactive features to help you analyze your data:</p>
<p><strong>Date Range Filters:</strong></p>
<ul>
<li>Use the quick filter buttons to view specific time periods:<ul>
<li>Last 7 Days - Recent short-term trends</li>
<li>Last 30 Days - Monthly patterns</li>
<li>Last 90 Days - Quarterly trends</li>
<li>All Time - Complete history</li>
</ul>
</li>
</ul>
<p><strong>Chart Controls:</strong></p>
<ul>
<li><strong>Avg Line</strong>: Toggle the average reference line on/off</li>
<li><strong>Moving Avg</strong>: Toggle moving average lines on/off to smooth out short-term fluctuations<ul>
<li>Choose between 7-day or 30-day moving averages</li>
<li>The moving average line appears as a dashed line in the same color as your data</li>
<li>Helps identify underlying trends by filtering out daily noise</li>
<li>Hover over any point to see both the actual value and the moving average</li>
</ul>
</li>
</ul>
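<p>A moving average replaces each point with the mean of the readings in the window that ends at that point. The sketch below computes a trailing, time-based moving average. This article does not say whether PageCrawl's 7-day and 30-day windows count calendar days or a fixed number of checks, so the windowing rule and the <code>moving_average</code> helper are assumptions.</p>

```python
from collections import deque
from datetime import datetime, timedelta


def moving_average(
    readings: list[tuple[datetime, float]], days: int = 7
) -> list[tuple[datetime, float]]:
    """Trailing moving average over a time window (hypothetical helper).

    For each reading (sorted oldest-first), average every reading taken within
    the preceding `days` days, including the current one.
    """
    span = timedelta(days=days)
    window = deque()
    result = []
    for ts, value in readings:
        window.append((ts, value))
        # Drop readings that are `days` or more older than the current one.
        while window[0][0] <= ts - span:
            window.popleft()
        result.append((ts, sum(v for _, v in window) / len(window)))
    return result


start = datetime(2026, 1, 1)
daily = [(start + timedelta(days=i), float(v)) for i, v in enumerate([10, 20, 30, 40])]
for ts, avg in moving_average(daily, days=2):
    print(ts.date(), avg)  # averages: 10.0, 15.0, 25.0, 35.0
```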
<p><strong>Visual Annotations:</strong></p>
<ul>
<li><strong>Average Line</strong>: A dashed horizontal line shows the overall average value</li>
<li><strong>Highest Point</strong>: Marked with a red dot and label showing the peak value</li>
<li><strong>Lowest Point</strong>: Marked with a green dot and label showing the minimum value</li>
<li><strong>Color-Coded Dots</strong>: Each data point is colored based on change direction:<ul>
<li>Green dots indicate the value increased from the previous check</li>
<li>Red dots indicate the value decreased</li>
<li>Standard color means no change</li>
</ul>
</li>
<li><strong>Zoom Brush</strong>: On desktop, use the brush tool at the bottom to zoom into specific date ranges</li>
</ul>
<p><strong>Legend:</strong></p>
<ul>
<li>Click on any metric name in the legend to show/hide that line</li>
<li>Disabled lines appear grayed out with a strikethrough</li>
<li>Perfect for focusing on specific metrics when tracking multiple values</li>
<li>Click again to re-enable the line</li>
<li>All reference lines (average, annotations) update based on visible lines</li>
</ul>
<p><strong>Tooltips:</strong>
When you hover over any point on the chart, you'll see:</p>
<ul>
<li>The exact date and time of the check</li>
<li>The current value at that point</li>
<li>Change from the previous check with up/down arrows (▲ ▼)</li>
<li>The moving average value at that point (if enabled)</li>
<li>All values are clearly labeled so you know what each number means</li>
</ul>
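<p>Both the tooltip's change indicator and the dot colors come from comparing each reading with the one before it. A minimal sketch of that comparison follows. The label format and color names in <code>describe_change</code> are assumptions, not PageCrawl's rendering code.</p>

```python
from typing import Optional


def describe_change(previous: Optional[float], current: float) -> tuple[str, str]:
    """Return (tooltip change label, dot color) for one data point.

    The first reading has nothing to compare with, so, like an unchanged
    value, it gets no label and the standard color.
    """
    if previous is None or current == previous:
        return ("", "default")
    delta = current - previous
    if delta > 0:
        return (f"▲ {delta:g}", "green")
    return (f"▼ {abs(delta):g}", "red")


previous = None
for value in [49.99, 49.99, 44.99, 47.5]:
    label, color = describe_change(previous, value)
    print(value, label or "(no change)", color)
    previous = value
```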
<p><strong>Performance Optimizations:</strong>
For long tracking periods with thousands of data points:</p>
<ul>
<li>The chart automatically samples data when viewing "All Time" to maintain smooth performance</li>
<li>You'll see a note indicating how many points are shown (e.g., "Showing 150 of 500 points")</li>
<li>This ensures fast, responsive charts even with years of historical data</li>
</ul>
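<p>One simple way to reduce thousands of readings to a drawable number is evenly spaced sampling that always keeps the first and last readings, plus the highest and lowest values so the peak and valley annotations stay accurate. The sketch below illustrates that idea. This article does not describe PageCrawl's actual sampling method, and the <code>downsample</code> helper is hypothetical.</p>

```python
def downsample(values: list[float], max_points: int) -> list[int]:
    """Return sorted indices of the readings to draw (hypothetical helper).

    Keeps max_points evenly spaced readings, always including the first and
    last, then adds the highest and lowest readings so the peak and valley
    annotations remain visible. The result may hold up to max_points + 2 indices.
    """
    n = len(values)
    if n <= max_points:
        return list(range(n))
    if max_points < 2:
        raise ValueError("max_points must be at least 2")
    step = (n - 1) / (max_points - 1)
    keep = {round(i * step) for i in range(max_points)}
    keep.add(max(range(n), key=values.__getitem__))  # index of the highest value
    keep.add(min(range(n), key=values.__getitem__))  # index of the lowest value
    return sorted(keep)


history = [float(i % 50) for i in range(500)]
shown = downsample(history, 150)
print(f"Showing {len(shown)} of {len(history)} points")
```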
<p>These features make it easy to spot trends, identify when significant changes occurred, and understand your data at a glance.</p>
<h3>Tips for Best Results</h3>
<ul>
<li><strong>Use the visual selector</strong>: Click directly on the number you want to track rather than writing selectors manually</li>
<li><strong>Check your selector works</strong>: Make sure the selector targets exactly the element that contains the number, not a nearby label or a different number on the same page</li>
<li><strong>Set a reasonable check frequency</strong>: Match how often PageCrawl checks the page to how quickly you expect the number to change</li>
<li><strong>Use templates for multiple pages</strong>: If you're tracking the same metric on different pages (like product prices), create a template and apply it to all of them. If you later need to change the tracking setup, you only need to update the template.</li>
</ul>
<h3>Using Templates</h3>
<p>If you need to monitor the same numeric value across multiple pages on a website, you can:</p>
<ol>
<li>Create a template with your Number tracker configuration</li>
<li>Apply that template to all the pages you want to monitor</li>
<li>Compare how the value changes across different pages</li>
</ol>
<p>This saves you time and makes it easy to track metrics across your entire site.</p>
<h3>Comparing Multiple Monitors on One Chart</h3>
<p>If you're tracking the same type of number across different pages (for example, the price of a product on multiple retailers), you can overlay them all on a single chart to compare side by side.</p>
<p><strong>How to set it up:</strong></p>
<ol>
<li>Open any monitor that has a Number or Price tracked element</li>
<li>Above the chart, you'll see a <strong>"Compare with..."</strong> dropdown</li>
<li>Click it and search for other monitors you want to add by name or URL</li>
<li>Select the monitors you want to compare. You can add up to 5 monitors on the same chart</li>
</ol>
<p>PageCrawl will suggest relevant monitors automatically, prioritizing monitors in the same folder, on the same domain, or tracking similar products.</p>
<p><strong>What the combined chart shows:</strong></p>
<ul>
<li>Each monitor appears as a separate line in a distinct color</li>
<li>All data points are merged onto a shared timeline so you can see how values move relative to each other</li>
<li>The chart legend lists every line. Click any line in the legend to show or hide it</li>
<li>Hovering over the chart shows a tooltip with the values from all monitors at that point in time</li>
<li>Date filters, moving averages, and zoom all apply to every line at once</li>
</ul>
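<p>Monitors are usually checked at different times, so their readings must be aligned before they can share one timeline and one tooltip. The sketch below takes the union of all timestamps and, for each monitor, carries its latest value forward, leaving <code>None</code> before its first reading. The carry-forward rule, the <code>merge_series</code> helper, and the store names are assumptions for illustration; PageCrawl may instead leave gaps between a monitor's checks.</p>

```python
from datetime import datetime
from typing import Optional


def merge_series(
    series: dict[str, list[tuple[datetime, float]]],
) -> list[tuple[datetime, dict[str, Optional[float]]]]:
    """Align several monitors' readings on one shared timeline (hypothetical).

    Each list must be sorted oldest-first. For every timestamp seen in any
    series, each monitor's value is its latest reading at or before that
    time, or None if it has not been checked yet.
    """
    timeline = sorted({ts for readings in series.values() for ts, _ in readings})
    positions = {name: 0 for name in series}
    latest: dict[str, Optional[float]] = {name: None for name in series}
    rows = []
    for ts in timeline:
        for name, readings in series.items():
            i = positions[name]
            # Consume every reading recorded at or before this timestamp.
            while i < len(readings) and readings[i][0] <= ts:
                latest[name] = readings[i][1]
                i += 1
            positions[name] = i
        rows.append((ts, dict(latest)))
    return rows


store_a = [(datetime(2026, 3, 1, 9), 19.99), (datetime(2026, 3, 2, 9), 17.99)]
store_b = [(datetime(2026, 3, 1, 12), 21.50)]
for ts, values in merge_series({"Store A": store_a, "Store B": store_b}):
    print(ts, values)
```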
<p><strong>Reading the comparison:</strong></p>
<ul>
<li>The Y-axis adjusts automatically to fit all values</li>
<li>The average, highest, and lowest annotations still apply to the primary monitor</li>
<li>Comparison data in tooltips is marked with a bullet (●) so you can tell which values belong to the primary monitor and which are from compared monitors</li>
</ul>
<p>This is especially useful for:</p>
<ul>
<li><strong>Competitive price tracking</strong>: See how your price compares to competitors over time on one chart</li>
<li><strong>Cross-retailer monitoring</strong>: Track the same product on Amazon, Walmart, and other stores and see price differences instantly</li>
<li><strong>Regional comparisons</strong>: Compare the same metric across different regional pages</li>
<li><strong>Benchmarking</strong>: Overlay your metric against an industry reference point</li>
</ul>
<p>Your comparison selections are saved, so the next time you open the monitor the same comparison lines will appear on the chart.</p>
<h3>Common Examples</h3>
<p><strong>E-commerce Store</strong>: Track product prices across listings. When prices drop or go on sale, you'll see it immediately in the chart. Compare pricing across multiple product pages to spot trends.</p>
<p><strong>Real Estate Pricing</strong>: Track property prices on listing sites. Monitor how prices change over time, identify when properties go on sale, or track pricing trends in your area of interest.</p>
<p><strong>Competitor Pricing</strong>: Monitor competitor product prices, discount percentages, or pricing changes. The chart gives you a clear view of when they adjust their prices.</p>
<p><strong>Job Postings</strong>: Track how many open positions a company has posted. The chart shows when they're actively hiring and when positions get filled.</p>
<p><strong>Education Programs</strong>: Monitor tuition costs, enrollment numbers for programs, or available spots in courses. Track how these metrics change throughout the year.</p>
<p><strong>Government Fees &amp; Services</strong>: Monitor permit costs, license fees, visa application prices, or other government service charges that may be subject to change.</p>
<p><strong>Stock Price Monitoring</strong>: Monitor the current price of a stock or cryptocurrency. The chart shows you exactly when the price changed and by how much.</p>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[SAML SSO Configuration in PageCrawl]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/account-settings/article/saml-sso-configuration" />
            <id>https://pagecrawl.io/73</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>SAML SSO Configuration in PageCrawl</h1>
<p>This guide covers the PageCrawl side of SSO setup: importing your identity provider's metadata, enabling SSO, configuring enforcement and user provisioning. For step-by-step instructions on configuring your identity provider (Azure AD, Google Workspace, Okta, etc.), see the <a href="/help/account-settings/article/set-up-identity-provider-for-saml-sso">Identity Provider Setup Guide</a>.</p>
<p>Single Sign-On (SSO) allows your team members to securely access PageCrawl using your organization's identity provider, such as Azure AD, Google Workspace, Okta, or OneLogin.</p>
<h2>Requirements</h2>
<p>To use SAML SSO, your team must meet the following requirements:</p>
<ul>
<li><strong>Enterprise or Ultimate Plan</strong> subscription</li>
<li><strong>Corporate email domain</strong> - The team owner must use a verified corporate email address (free email providers like Gmail, Yahoo, Outlook, and iCloud are not supported)</li>
<li><strong>Identity Provider</strong> that supports SAML 2.0 standard</li>
</ul>
<h2>How to Configure SAML SSO</h2>
<h3>1. Access SSO Settings</h3>
<p>Navigate to <strong>Settings → Team → Auth &amp; SSO</strong> in your PageCrawl account. You must be a team Owner or Administrator to access these settings.</p>
<div class="kb-figure">
  <img src="/images/knowledge/settings-sso.png" alt="Single Sign-On via SAML 2.0 section in Auth and SSO settings with the Enable SSO toggle">
</div>
<p>When you first access the SSO settings page, PageCrawl automatically generates a unique identifier (UUID) and creates an initial SSO configuration for your team. This UUID is immediately available and used to create your Entity ID and Metadata URL.</p>
<h3>2. Get Service Provider Information</h3>
<p>Before configuring your Identity Provider, copy the <strong>Metadata URL</strong> displayed in the blue information box at the top of the SSO settings page.</p>
<div class="kb-figure">
  <img src="/images/knowledge/sso-sp-info.png" alt="Service Provider Information box showing the Metadata URL, Entity ID, Reply URL (ACS), Sign on URL, and Logout URL">
</div>
<p>The URL will look like: <code>https://pagecrawl.io/sso/saml/abc-123-def-456/metadata</code></p>
<p><strong>Important:</strong> Copy the actual URL shown in PageCrawl, not this example.</p>
<p>Most Identity Providers can automatically import all necessary configuration (Entity ID, ACS URL, Logout URL, etc.) from this metadata URL.</p>
<p><strong>Note:</strong> If your IdP requires manual entry, the individual URLs are also displayed in the same box:</p>
<ul>
<li>Entity ID</li>
<li>Reply URL (Assertion Consumer Service URL)</li>
<li>Sign on URL</li>
<li>Logout URL</li>
</ul>
<h3>3. Configure Your Identity Provider</h3>
<p>Follow the instructions in our <a href="/help/account-settings/article/set-up-identity-provider-for-saml-sso">Identity Provider Setup Guide</a> for your specific IdP (Azure AD, Google Workspace, Okta, etc.).</p>
<p>You'll need to create a SAML application in your IdP and provide the ACS URL and Entity ID from step 2.</p>
<h3>4. Import Identity Provider Metadata into PageCrawl</h3>
<p>You have three ways to provide your IdP's settings to PageCrawl:</p>
<div class="kb-figure">
  <img src="/images/knowledge/sso-metadata-import.png" alt="Metadata import tabs in PageCrawl SSO settings: Metadata URL, Metadata XML, and Manual Entry">
</div>
<p><strong>Option A: Metadata URL</strong> (Recommended)</p>
<ul>
<li>Enter your IdP's metadata URL</li>
<li>Click "Parse Metadata from URL"</li>
<li>PageCrawl will automatically extract all required settings</li>
</ul>
<p><strong>Option B: Metadata XML</strong></p>
<ul>
<li>Copy your IdP's metadata XML</li>
<li>Paste it into the metadata XML field</li>
<li>Click "Parse Metadata XML"</li>
</ul>
<p><strong>Option C: Manual Entry</strong></p>
<ul>
<li>Manually enter your IdP's Entity ID, SSO URL, SLO URL, and X.509 signing certificate</li>
<li>This option is useful for custom configurations</li>
</ul>
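<p>Parsing IdP metadata comes down to reading a few standard elements from a SAML 2.0 <code>EntityDescriptor</code>: the <code>entityID</code> attribute, the <code>SingleSignOnService</code> and <code>SingleLogoutService</code> locations, and the signing certificate. The sketch below does this with Python's built-in <code>xml.etree.ElementTree</code>, which can be handy for checking a metadata file before pasting it in. It is not PageCrawl's parser. The <code>parse_idp_metadata</code> name is hypothetical, the function prefers the HTTP-Redirect binding when several are listed, and it rejects an <code>EntitiesDescriptor</code> root, in line with the requirement described under Metadata Import Errors.</p>

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the SAML 2.0 metadata and XML Signature specifications.
MD_NS = "urn:oasis:names:tc:SAML:2.0:metadata"
DS_NS = "http://www.w3.org/2000/09/xmldsig#"
NS = {"md": MD_NS, "ds": DS_NS}
REDIRECT = "urn:oasis:names:tc:SAML:2.0:bindings:HTTP-Redirect"


def parse_idp_metadata(xml_text: str) -> dict:
    """Extract the IdP settings a service provider needs (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    if root.tag == "{" + MD_NS + "}EntitiesDescriptor":
        raise ValueError("EntitiesDescriptor found; export a single EntityDescriptor instead")
    if root.tag != "{" + MD_NS + "}EntityDescriptor":
        raise ValueError("root element is not a SAML EntityDescriptor")

    idp = root.find("md:IDPSSODescriptor", NS)
    if idp is None:
        raise ValueError("no IDPSSODescriptor: this is not identity provider metadata")

    def location(tag: str):
        # Prefer the HTTP-Redirect binding; otherwise use the first endpoint listed.
        services = idp.findall("md:" + tag, NS)
        for service in services:
            if service.get("Binding") == REDIRECT:
                return service.get("Location")
        return services[0].get("Location") if services else None

    # A KeyDescriptor without a "use" attribute may be used for signing too.
    cert = None
    for key in idp.findall("md:KeyDescriptor", NS):
        if key.get("use") in (None, "signing"):
            node = key.find("ds:KeyInfo/ds:X509Data/ds:X509Certificate", NS)
            if node is not None and node.text:
                cert = "".join(node.text.split())  # drop line breaks and spaces
                break

    return {
        "entity_id": root.get("entityID"),
        "sso_url": location("SingleSignOnService"),
        "slo_url": location("SingleLogoutService"),
        "x509_certificate": cert,
    }
```

<p>Call <code>parse_idp_metadata</code> with the contents of the XML file your IdP provides. The returned entity ID, SSO URL, SLO URL, and certificate correspond to the fields listed under Option C.</p>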
<h3>5. Enable SSO Features</h3>
<p>Configure the following settings based on your needs:</p>
<h4>Enable SSO</h4>
<p>Turn on SAML authentication for your domain.</p>
<h4>Enforce SSO</h4>
<p>When enabled, password login will be disabled for users with your email domain. Users must authenticate via your identity provider.</p>
<h4>Just-in-Time (JIT) Provisioning</h4>
<div class="kb-figure">
  <img src="/images/knowledge/sso-jit-provisioning.png" alt="Just-in-Time provisioning settings: Enable Automatic Account Creation, Default Role, Create Personal Workspace, and Default Workspaces">
</div>
<p><strong>Enable Automatic Account Creation</strong></p>
<ul>
<li><strong>Enabled</strong>: An account is created automatically the first time a new user logs in via SSO</li>
<li><strong>Disabled</strong>: Only existing users can log in via SSO. New users must be manually added first.</li>
</ul>
<p>When JIT provisioning is enabled, you can configure:</p>
<p><strong>Default Role for New SSO Users</strong></p>
<ul>
<li>Administrator</li>
<li>Standard User</li>
<li>Viewer</li>
</ul>
<p><strong>Default Workspaces</strong></p>
<ul>
<li>Leave empty to assign all workspaces</li>
<li>Select specific workspaces to limit access</li>
</ul>
<p><strong>Auto-Create Personal Workspace</strong></p>
<ul>
<li>When enabled, each new SSO user gets a personal workspace</li>
<li>Note: Your account has a workspace limit based on your subscription</li>
<li>If the limit is reached, no personal workspaces will be created</li>
</ul>
<h2>Workspace Limits</h2>
<p>Personal workspaces count toward the workspace limit of your <a href="/pricing">subscription plan</a>.</p>
<p>If you enable "Auto-Create Personal Workspace" and have reached your limit, new SSO users will be assigned to default workspaces instead of creating personal workspaces.</p>
<h2>SSO Login Flow</h2>
<p>Once configured, users with your email domain will:</p>
<ol>
<li>Go to PageCrawl login page</li>
<li>Enter their email address</li>
<li>Be redirected to your identity provider</li>
<li>Authenticate with their corporate credentials</li>
<li>Be redirected back to PageCrawl and logged in automatically</li>
</ol>
<p>If JIT provisioning is enabled and they're a new user, an account will be created automatically with the configured role and workspace assignments.</p>
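<p>The provisioning rules in this guide (existing users, JIT provisioning, member and workspace limits, default role and workspaces) can be summarized as a single decision. The sketch below is a hypothetical model for reasoning about what a given user will experience at login. The classes, field names, and messages are illustrative, not PageCrawl's internal API, and the team's workspace count is approximated by the length of its workspace list.</p>

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SsoSettings:
    # Hypothetical model of the JIT settings described in this guide.
    jit_enabled: bool
    default_role: str = "Viewer"  # Administrator, Standard User, or Viewer
    default_workspaces: list[str] = field(default_factory=list)  # empty = all
    create_personal_workspace: bool = False


@dataclass
class Team:
    all_workspaces: list[str]
    members: int
    member_limit: int
    workspace_limit: int


@dataclass
class Outcome:
    allowed: bool
    message: str
    role: Optional[str] = None
    workspaces: list[str] = field(default_factory=list)
    personal_workspace: bool = False


def provision(is_existing_user: bool, settings: SsoSettings, team: Team) -> Outcome:
    """Decide what happens when a user completes SSO login (hypothetical)."""
    if is_existing_user:
        return Outcome(True, "Logged in")
    if not settings.jit_enabled:
        return Outcome(False, "Automatic account creation is disabled.")
    if team.members >= team.member_limit:
        return Outcome(False, "Team has reached its member limit.")
    # An empty default list means the user gets every workspace.
    workspaces = list(settings.default_workspaces or team.all_workspaces)
    # A personal workspace is only created while the team is under its limit;
    # otherwise the user still gets the default workspaces.
    personal = (
        settings.create_personal_workspace
        and len(team.all_workspaces) < team.workspace_limit
    )
    return Outcome(True, "Account created", settings.default_role, workspaces, personal)


team = Team(all_workspaces=["Marketing", "Pricing"], members=9, member_limit=10, workspace_limit=2)
settings = SsoSettings(jit_enabled=True, create_personal_workspace=True)
print(provision(is_existing_user=False, settings=settings, team=team))
```

<p>In this example the new user is created with the Viewer role and access to both workspaces, but without a personal workspace, because the team is already at its workspace limit.</p>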
<h2>Troubleshooting Common Issues</h2>
<h3>"Team has reached member limit"</h3>
<p><strong>Error:</strong> "Unable to provision SSO user: Team has reached its member limit."</p>
<p><strong>Solution:</strong></p>
<ul>
<li>Check your subscription plan in <strong>Settings → Team → Subscription</strong></li>
<li>Either upgrade to a plan with more seats or remove inactive members</li>
<li>Once you have available seats, the user can try logging in again</li>
</ul>
<h3>"Automatic account creation is disabled"</h3>
<p><strong>Error:</strong> "Automatic account creation is disabled. Please ask your team administrator to enable JIT provisioning."</p>
<p><strong>Solution:</strong></p>
<ul>
<li>Enable <strong>"Enable Automatic Account Creation"</strong> in <strong>Settings → Team → Auth &amp; SSO</strong></li>
<li>Or manually add the user in <strong>Settings → Team → Users</strong> before they log in</li>
</ul>
<h3>User Not Assigned in Identity Provider</h3>
<p><strong>Symptoms:</strong> The user sees an error after authenticating with the IdP.</p>
<p><strong>Solution:</strong></p>
<ul>
<li><strong>Azure AD:</strong> Go to Enterprise Applications → PageCrawl → Users and groups → Add user/group</li>
<li><strong>Google Workspace:</strong> Admin Console → PageCrawl app → User access → Enable for user's org unit</li>
<li><strong>Okta:</strong> Applications → PageCrawl → Assignments → Assign to People</li>
</ul>
<h3>Certificate Expired or Invalid</h3>
<p><strong>Symptoms:</strong> An "Invalid signature" error, or authentication fails at the final step.</p>
<p><strong>Solution:</strong></p>
<ul>
<li>In PageCrawl SSO settings, update the metadata: click <strong>Parse Metadata from URL</strong> to refresh it, or download fresh XML from your IdP, paste it, and click <strong>Parse Metadata XML</strong></li>
<li>Most IdPs rotate certificates every 1-3 years, so repeat this whenever yours does</li>
</ul>
<h3>Metadata Import Errors</h3>
<p><strong>Common Issues:</strong></p>
<ul>
<li><strong>EntitiesDescriptor Format:</strong> PageCrawl requires <code>EntityDescriptor</code> format, not <code>EntitiesDescriptor</code></li>
<li><strong>Invalid XML:</strong> Ensure you copied the entire XML, including the <code>&lt;?xml</code> declaration</li>
<li><strong>URL Not Accessible:</strong> Ensure the metadata URL is publicly accessible so PageCrawl can fetch it</li>
</ul>
<h3>Personal Workspace Not Created</h3>
<p><strong>Cause:</strong> Your team has reached the workspace limit of its subscription plan.</p>
<p><strong>Solution:</strong></p>
<ul>
<li>Delete unused workspaces in <strong>Settings → Team → Workspaces</strong></li>
<li>Or upgrade to a plan with more workspaces</li>
<li>New users will still be assigned to default workspaces</li>
</ul>
<h2>Testing Your SSO Configuration</h2>
<ol>
<li><strong>Use Incognito/Private Window</strong> to test with a fresh session, free of existing logins and cookies</li>
<li><strong>Test with Assigned User</strong> who has access in your IdP</li>
<li><strong>Verify Each Step:</strong><ul>
<li>Enter email at PageCrawl login</li>
<li>Verify redirect to IdP</li>
<li>Authenticate at IdP</li>
<li>Verify redirect back to PageCrawl</li>
<li>Confirm successful login</li>
</ul>
</li>
<li><strong>Test Different Scenarios:</strong><ul>
<li>New user (if JIT enabled)</li>
<li>Existing user</li>
<li>User with an email domain other than your corporate domain (SSO login should be refused)</li>
</ul>
</li>
</ol>
<h2>Security Best Practices</h2>
<ul>
<li>Monitor certificate expiration dates and update before they expire</li>
<li>Only assign necessary users in your IdP</li>
<li>Set appropriate default role (usually "Viewer" or "Standard User")</li>
<li>Enable "Enforce SSO" only after thorough testing with all users</li>
</ul>
<h2>Frequently Asked Questions</h2>
<p><strong>Q: Can I have multiple identity providers?</strong>
A: No, PageCrawl supports one identity provider per team.</p>
<p><strong>Q: What happens to existing users when I enable SSO?</strong>
A: Existing users can continue using password login unless you enable "Enforce SSO". With JIT provisioning enabled, their accounts will be automatically linked to SSO on first SSO login.</p>
<p><strong>Q: Can I disable SSO after enabling it?</strong>
A: Yes, you can disable SSO anytime in the settings. Users will revert to password-based login.</p>
<p><strong>Q: What if my IdP certificate expires?</strong>
A: Users won't be able to log in until you update the certificate. Update metadata in PageCrawl SSO settings as soon as your IdP rotates certificates.</p>
<p><strong>Q: Why can't I use Gmail or other free email providers?</strong>
A: SSO requires corporate email domains for security. Free email providers don't provide the organizational control needed for enterprise SSO.</p>
<p><strong>Q: How do I migrate all users to SSO?</strong>
A: Enable SSO with JIT provisioning first. Test with a few users. Once confirmed working, enable "Enforce SSO" to require all users to use SSO.</p>
<p><strong>Q: What happens if we reach our member or workspace limit?</strong>
A: New SSO users won't be able to log in if member limit is reached. If workspace limit is reached, personal workspaces won't be created, but users will still be assigned to default workspaces.</p>
<h2>Support</h2>
<p>For assistance with SSO configuration or to request early access, contact <a href="mailto:support@pagecrawl.io">support@pagecrawl.io</a>.</p>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Set Up Your Identity Provider for SAML SSO]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/account-settings/article/set-up-identity-provider-for-saml-sso" />
            <id>https://pagecrawl.io/74</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Set Up Your Identity Provider for SAML SSO</h1>
<p>This guide covers the identity provider (IdP) side of SSO setup with step-by-step instructions for Azure AD, Google Workspace, Okta, OneLogin, and custom SAML providers. For PageCrawl-side settings (enabling SSO, enforcement, JIT provisioning), see the <a href="/help/account-settings/article/saml-sso-configuration">SSO Configuration Guide</a>.</p>
<p>Before you begin, ensure you have:</p>
<ul>
<li>Access to your identity provider's admin console</li>
<li>PageCrawl Enterprise or Ultimate plan with SSO enabled</li>
<li>Team owner's verified corporate email address</li>
</ul>
<h2>Get Your Service Provider Information</h2>
<p><strong>IMPORTANT: Complete this step first before configuring your Identity Provider</strong></p>
<ol>
<li>Navigate to <strong>Settings → Team → Auth &amp; SSO</strong> in PageCrawl</li>
<li>Copy the <strong>Metadata URL</strong> shown in the blue Service Provider information box<ul>
<li>It will look like: <code>https://pagecrawl.io/sso/saml/abc-123-def-456/metadata</code></li>
<li><strong>Important:</strong> Copy the actual URL from PageCrawl, not this example</li>
</ul>
</li>
<li>Keep this URL handy - most Identity Providers can automatically import all configuration from this metadata URL</li>
</ol>
<div class="kb-figure">
  <img src="/images/knowledge/settings-sso.png" alt="Auth and SSO settings in PageCrawl with the SAML 2.0 Single Sign-On section">
</div>
<p><strong>Note:</strong> If your IdP doesn't support metadata import, copy the individual URLs from PageCrawl (they will also be shown in the same box):</p>
<ul>
<li>Reply URL (Assertion Consumer Service URL)</li>
<li>Sign on URL</li>
<li>Logout URL</li>
</ul>
<p><strong>Additional information for reference:</strong></p>
<ul>
<li><strong>NameID Format</strong>: Email Address (<code>urn:oasis:names:tc:SAML:1.1:nameid-format:emailAddress</code>)</li>
<li><strong>Binding</strong>: HTTP-POST for ACS, HTTP-Redirect for Single Sign-On</li>
</ul>
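<p>Under the HTTP-Redirect binding defined in the SAML 2.0 Bindings specification, the service provider compresses the <code>AuthnRequest</code> XML with raw DEFLATE, Base64-encodes the result, and sends it to the IdP in a <code>SAMLRequest</code> URL parameter. The sketch below shows that encoding and its reverse, which can help when you inspect the redirect in your browser's developer tools while troubleshooting. The function names and the IdP URL are placeholders, and request signing is left out for brevity.</p>

```python
import base64
import zlib
from urllib.parse import parse_qs, urlencode, urlsplit


def encode_redirect(idp_sso_url: str, authn_request_xml: str, relay_state: str = "") -> str:
    """Build an HTTP-Redirect binding URL: raw DEFLATE, then Base64, then URL-encode."""
    compressor = zlib.compressobj(wbits=-15)  # negative wbits = raw DEFLATE, no zlib header
    deflated = compressor.compress(authn_request_xml.encode("utf-8")) + compressor.flush()
    params = {"SAMLRequest": base64.b64encode(deflated).decode("ascii")}
    if relay_state:
        params["RelayState"] = relay_state
    separator = "&" if "?" in idp_sso_url else "?"
    return idp_sso_url + separator + urlencode(params)


def decode_redirect(url: str) -> str:
    """Reverse the encoding to read the AuthnRequest XML from a redirect URL."""
    query = parse_qs(urlsplit(url).query)
    deflated = base64.b64decode(query["SAMLRequest"][0])
    return zlib.decompress(deflated, wbits=-15).decode("utf-8")
```

<p>For example, passing a redirect URL copied from the browser's network tab to <code>decode_redirect</code> returns the original <code>AuthnRequest</code> XML, so you can check values such as the Issuer that the service provider sent.</p>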
<hr />
<div class="kb-provider-logo">
  <img src="/images/knowledge/providers/azure-ad.svg" alt="Microsoft Entra ID logo">
</div>
<h3>Step 1: Create Enterprise Application</h3>
<ol>
<li>Sign in to the <a href="https://portal.azure.com">Azure Portal</a></li>
<li>Navigate to <strong>Microsoft Entra ID → Enterprise Applications</strong> (Microsoft Entra ID was formerly called Azure Active Directory)</li>
<li>Click <strong>New application</strong></li>
<li>Click <strong>Create your own application</strong></li>
<li>Name it "PageCrawl" and select <strong>Integrate any other application you don't find in the gallery (Non-gallery)</strong></li>
<li>Click <strong>Create</strong></li>
</ol>
<h3>Step 2: Configure SAML</h3>
<ol>
<li>In your PageCrawl application, click <strong>Single sign-on</strong> in the left menu</li>
<li>Select <strong>SAML</strong> as the single sign-on method</li>
<li>In section <strong>1. Basic SAML Configuration</strong>, click <strong>Edit</strong> and enter:<ul>
<li><strong>Identifier (Entity ID)</strong>: Paste your Entity ID from PageCrawl (e.g., <code>https://pagecrawl.io/sso/saml/abc-123.../metadata</code>)</li>
<li><strong>Reply URL (ACS URL)</strong>: Paste your Reply URL from PageCrawl (e.g., <code>https://pagecrawl.io/sso/saml/abc-123.../acs</code>)</li>
</ul>
</li>
<li>Click <strong>Save</strong></li>
</ol>
<h3>Step 3: Configure Attributes &amp; Claims</h3>
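<p>PageCrawl identifies users by the email address in the Name ID. If your users' user principal name (UPN) matches their email address, the default <strong>Unique User Identifier (Name ID)</strong> claim works without changes; otherwise, edit the claim and set its source attribute to <code>user.mail</code>.</p>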
<h3>Step 4: Get Metadata URL</h3>
<ol>
<li>In section <strong>3. SAML Signing Certificate</strong>, copy the <strong>App Federation Metadata Url</strong></li>
<li>In PageCrawl SSO settings, paste this URL in the <strong>Metadata URL</strong> field</li>
<li>Click <strong>Parse Metadata from URL</strong></li>
</ol>
<h3>Step 5: Assign Users</h3>
<ol>
<li>Navigate to <strong>Users and groups</strong></li>
<li>Click <strong>Add user/group</strong></li>
<li>Select users or groups who should have access to PageCrawl</li>
<li>Click <strong>Assign</strong></li>
</ol>
<hr />
<div class="kb-provider-logo">
  <img src="/images/knowledge/providers/google-workspace.svg" alt="Google Workspace logo">
</div>
<h3>Step 1: Create Custom SAML Application</h3>
<ol>
<li>Sign in to your <a href="https://admin.google.com">Google Admin Console</a></li>
<li>Go to <strong>Apps → Web and mobile apps</strong></li>
<li>Click <strong>Add app → Add custom SAML app</strong></li>
<li>Enter "PageCrawl" as the app name</li>
<li>Click <strong>Continue</strong></li>
</ol>
<h3>Step 2: Download Google IdP Metadata</h3>
<ol>
<li>On the <strong>Google Identity Provider details</strong> page, click <strong>Download Metadata</strong></li>
<li>Save the XML file</li>
<li>Click <strong>Continue</strong></li>
</ol>
<h3>Step 3: Configure Service Provider Details</h3>
<ol>
<li>Enter the following values:<ul>
<li><strong>ACS URL</strong>: Paste your Reply URL from PageCrawl (e.g., <code>https://pagecrawl.io/sso/saml/abc-123.../acs</code>)</li>
<li><strong>Entity ID</strong>: Paste your Entity ID from PageCrawl (e.g., <code>https://pagecrawl.io/sso/saml/abc-123.../metadata</code>)</li>
<li><strong>Start URL</strong>: Leave empty</li>
<li><strong>Name ID format</strong>: EMAIL</li>
<li><strong>Name ID</strong>: Basic Information &gt; Primary email</li>
<li><strong>Signed response</strong>: Leave unchecked. Google signs the SAML assertion by default, and a signed assertion is what PageCrawl requires.</li>
</ul>
</li>
<li>Click <strong>Continue</strong></li>
<li>Click <strong>Finish</strong> (skip attribute mapping)</li>
</ol>
<h3>Step 4: Import Metadata to PageCrawl</h3>
<ol>
<li>Open the downloaded metadata XML file in a text editor and copy its contents</li>
<li>In PageCrawl SSO settings, paste the contents into the <strong>Metadata XML</strong> field</li>
<li>Click <strong>Parse Metadata XML</strong></li>
</ol>
<h3>Step 5: Turn On the App</h3>
<ol>
<li>In Google Admin, click on your PageCrawl app</li>
<li>Click <strong>User access</strong></li>
<li>Select <strong>ON for everyone</strong> or specific organizational units</li>
<li>Click <strong>Save</strong></li>
</ol>
<hr />
<div class="kb-provider-logo">
  <img src="/images/knowledge/providers/okta.svg" alt="Okta logo">
</div>
<h3>Step 1: Add Application</h3>
<ol>
<li>Sign in to your <a href="https://admin.okta.com">Okta Admin Console</a></li>
<li>Go to <strong>Applications → Applications</strong></li>
<li>Click <strong>Create App Integration</strong></li>
<li>Select <strong>SAML 2.0</strong> and click <strong>Next</strong></li>
</ol>
<h3>Step 2: General Settings</h3>
<ol>
<li>Enter "PageCrawl" as the <strong>App name</strong></li>
<li>(Optional) Upload a logo</li>
<li>Click <strong>Next</strong></li>
</ol>
<h3>Step 3: Configure SAML</h3>
<ol>
<li>In the <strong>SAML Settings</strong> section, enter:<ul>
<li><strong>Single sign-on URL</strong>: Paste your Reply URL from PageCrawl (e.g., <code>https://pagecrawl.io/sso/saml/abc-123.../acs</code>)</li>
<li><strong>Audience URI (SP Entity ID)</strong>: Paste your Entity ID from PageCrawl (e.g., <code>https://pagecrawl.io/sso/saml/abc-123.../metadata</code>)</li>
<li><strong>Name ID format</strong>: EmailAddress</li>
<li><strong>Application username</strong>: Email</li>
</ul>
</li>
<li>Leave other settings as default</li>
<li>Click <strong>Next</strong></li>
</ol>
<h3>Step 4: Feedback</h3>
<ol>
<li>Select <strong>I'm an Okta customer adding an internal app</strong></li>
<li>Click <strong>Finish</strong></li>
</ol>
<h3>Step 5: Get Metadata URL</h3>
<ol>
<li>On the <strong>Sign On</strong> tab, scroll to <strong>SAML Signing Certificates</strong></li>
<li>Click <strong>Actions</strong> next to the active certificate</li>
<li>Click <strong>View IdP metadata</strong></li>
<li>Copy the URL from your browser's address bar</li>
<li>In PageCrawl SSO settings, paste this URL in the <strong>Metadata URL</strong> field</li>
<li>Click <strong>Parse Metadata from URL</strong></li>
</ol>
<h3>Step 6: Assign Users</h3>
<ol>
<li>Go to the <strong>Assignments</strong> tab</li>
<li>Click <strong>Assign</strong> and select <strong>Assign to People</strong> or <strong>Assign to Groups</strong></li>
<li>Assign users who should have access to PageCrawl</li>
<li>Click <strong>Done</strong></li>
</ol>
<hr />
<div class="kb-provider-logo">
  <img src="/images/knowledge/providers/onelogin.svg" alt="OneLogin logo">
</div>
<h3>Step 1: Add Application</h3>
<ol>
<li>Sign in to your <a href="https://app.onelogin.com/admin">OneLogin Admin Console</a></li>
<li>Go to <strong>Applications → Applications</strong></li>
<li>Click <strong>Add App</strong></li>
<li>Search for "SAML Test Connector (Advanced)" and select it</li>
</ol>
<h3>Step 2: Configure Application</h3>
<ol>
<li>Enter "PageCrawl" as the <strong>Display Name</strong></li>
<li>Click <strong>Save</strong></li>
</ol>
<h3>Step 3: Configure SAML Settings</h3>
<ol>
<li>Go to the <strong>Configuration</strong> tab</li>
<li>Enter the following:<ul>
<li><strong>Audience (Entity ID)</strong>: Paste your Entity ID from PageCrawl (e.g., <code>https://pagecrawl.io/sso/saml/abc-123.../metadata</code>)</li>
<li><strong>Recipient</strong>: Paste your Reply URL from PageCrawl (e.g., <code>https://pagecrawl.io/sso/saml/abc-123.../acs</code>)</li>
<li><strong>ACS (Consumer) URL Validator</strong>: Use regex pattern <code>https://pagecrawl\.io/sso/saml/[^/]+/acs</code></li>
<li><strong>ACS (Consumer) URL</strong>: Paste your Reply URL from PageCrawl (e.g., <code>https://pagecrawl.io/sso/saml/abc-123.../acs</code>)</li>
</ul>
</li>
<li>Click <strong>Save</strong></li>
</ol>
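<p>If OneLogin rejects the ACS URL, you can check locally that your Reply URL satisfies the validator pattern above. This is a minimal Python sketch using the standard <code>re</code> module. <code>abc-123</code> is a placeholder team ID, and <code>re.fullmatch</code> deliberately checks the whole URL, which may be stricter than the way OneLogin applies the pattern.</p>
<pre><code class="language-python">import re

# Validator pattern from the OneLogin step above.
ACS_VALIDATOR = r"https://pagecrawl\.io/sso/saml/[^/]+/acs"


def acs_url_is_valid(reply_url: str) -> bool:
    """Return True if the whole Reply URL matches the validator pattern."""
    # fullmatch requires the entire string to match, so stray trailing
    # characters or the /metadata URL pasted by mistake are caught.
    return re.fullmatch(ACS_VALIDATOR, reply_url.strip()) is not None


# Placeholder team ID, for illustration only.
print(acs_url_is_valid("https://pagecrawl.io/sso/saml/abc-123/acs"))       # True
print(acs_url_is_valid("https://pagecrawl.io/sso/saml/abc-123/metadata"))  # False
</code></pre>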
<h3>Step 4: Get Metadata URL</h3>
<ol>
<li>Go to the <strong>More Actions</strong> menu</li>
<li>Select <strong>SAML Metadata</strong></li>
<li>Copy the metadata URL</li>
<li>In PageCrawl SSO settings, paste this URL in the <strong>Metadata URL</strong> field</li>
<li>Click <strong>Parse Metadata from URL</strong></li>
</ol>
<h3>Step 5: Assign Users</h3>
<ol>
<li>Go to the <strong>Users</strong> tab</li>
<li>Select users who should have access</li>
<li>Click <strong>Save</strong></li>
</ol>
<hr />
<h2>Custom SAML 2.0 Provider</h2>
<p>If your identity provider isn't listed above but supports SAML 2.0, you can configure it manually:</p>
<h3>Step 1: Configure Your Identity Provider</h3>
<p>In your IdP, create a new SAML application with these settings:</p>
<ul>
<li><strong>Entity ID</strong>: Paste your Entity ID from PageCrawl (you copied this in the first section above, e.g., <code>https://pagecrawl.io/sso/saml/abc-123.../metadata</code>)</li>
<li><strong>ACS URL</strong>: Paste your Reply URL from PageCrawl (e.g., <code>https://pagecrawl.io/sso/saml/abc-123.../acs</code>)</li>
<li><strong>NameID Format</strong>: Email Address</li>
<li><strong>Binding</strong>: HTTP-POST for ACS, HTTP-Redirect for SSO</li>
</ul>
<h3>Step 2: Get IdP Information</h3>
<p>From your identity provider, collect:</p>
<ul>
<li><strong>Entity ID</strong> (IdP Issuer)</li>
<li><strong>SSO URL</strong> (Sign-on URL)</li>
<li><strong>SLO URL</strong> (Sign-out URL) - Optional</li>
<li><strong>X.509 Certificate</strong></li>
</ul>
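<p>If your IdP only offers a metadata file, the four values above sit at standard locations in SAML 2.0 metadata: the <code>entityID</code> attribute, the <code>SingleSignOnService</code> and <code>SingleLogoutService</code> locations, and the signing <code>X509Certificate</code>. The sketch below extracts them with Python's standard <code>xml.etree.ElementTree</code>. It assumes a single <code>IDPSSODescriptor</code>, prefers the HTTP-Redirect binding listed in Step 1, and does not verify any XML signature.</p>
<pre><code class="language-python">import textwrap
import xml.etree.ElementTree as ET

MD = "urn:oasis:names:tc:SAML:2.0:metadata"
DS = "http://www.w3.org/2000/09/xmldsig#"
REDIRECT = "urn:oasis:names:tc:SAML:2.0:bindings:HTTP-Redirect"
NS = {"md": MD, "ds": DS}


def to_pem(b64_body: str) -> str:
    """Wrap a bare base64 certificate body in PEM BEGIN/END markers."""
    body = "".join(b64_body.split())  # drop whitespace and line breaks
    lines = textwrap.wrap(body, 64)
    return "\n".join(["-----BEGIN CERTIFICATE-----", *lines, "-----END CERTIFICATE-----"])


def idp_fields(metadata_xml: str) -> dict:
    """Collect the IdP Entity ID, SSO URL, SLO URL and signing certificate."""
    root = ET.fromstring(metadata_xml)
    if root.tag != f"{{{MD}}}EntityDescriptor":
        raise ValueError("expected an EntityDescriptor root element")
    idp = root.find("md:IDPSSODescriptor", NS)
    if idp is None:
        raise ValueError("no IDPSSODescriptor found")

    def location(tag: str):
        services = idp.findall(f"md:{tag}", NS)
        # Prefer the HTTP-Redirect binding, otherwise take the first one listed.
        for svc in services:
            if svc.get("Binding") == REDIRECT:
                return svc.get("Location")
        return services[0].get("Location") if services else None

    cert = None
    for key in idp.findall("md:KeyDescriptor", NS):
        # A KeyDescriptor without a "use" attribute may also be used for signing.
        if key.get("use") in (None, "signing"):
            node = key.find("ds:KeyInfo/ds:X509Data/ds:X509Certificate", NS)
            if node is not None and node.text:
                cert = to_pem(node.text)
                break

    return {
        "entity_id": root.get("entityID"),
        "sso_url": location("SingleSignOnService"),
        "slo_url": location("SingleLogoutService"),
        "certificate": cert,
    }
</code></pre>
<p>The <code>certificate</code> value comes back already wrapped in BEGIN/END markers, which is the form the Manual Entry step asks for.</p>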
<h3>Step 3: Manual Configuration in PageCrawl</h3>
<ol>
<li>In PageCrawl SSO settings, select the <strong>Manual Entry</strong> tab</li>
<li>Enter the collected information:<ul>
<li>Entity ID</li>
<li>SSO URL</li>
<li>SLO URL (optional)</li>
<li>X.509 Certificate (paste the full certificate including BEGIN/END markers)</li>
</ul>
</li>
<li>Enable SSO and configure just-in-time (JIT) provisioning settings</li>
<li>Click <strong>Save Changes</strong></li>
</ol>
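<p>A common mistake is pasting only the base64 body of the certificate. The hypothetical helper below checks a pasted certificate for the BEGIN/END markers and a decodable base64 body before you save. It checks the format only, not whether the certificate is trusted or expired.</p>
<pre><code class="language-python">import base64
import binascii

BEGIN = "-----BEGIN CERTIFICATE-----"
END = "-----END CERTIFICATE-----"


def check_pem_certificate(text: str) -> str:
    """Return a description of the problem, or an empty string if the paste looks complete."""
    pem = text.strip()
    if not pem.startswith(BEGIN):
        return "missing BEGIN CERTIFICATE marker"
    if not pem.endswith(END):
        return "missing END CERTIFICATE marker"
    body = "".join(pem[len(BEGIN):-len(END)].split())
    if not body:
        return "certificate body is empty"
    try:
        # validate=True rejects characters outside the base64 alphabet.
        base64.b64decode(body, validate=True)
    except binascii.Error:
        return "certificate body is not valid base64"
    return ""
</code></pre>
<p>To see the expiry date, you can run <code>openssl x509 -in idp.pem -noout -enddate</code> on the saved certificate (<code>idp.pem</code> is a placeholder file name).</p>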
<hr />
<h2>Validation</h2>
<p>After configuration, test your SSO:</p>
<ol>
<li>Open an incognito/private browser window</li>
<li>Go to the PageCrawl login page</li>
<li>Enter the email address of a test user on your configured domain</li>
<li>Verify that you're redirected to your IdP</li>
<li>Complete authentication</li>
<li>Verify that you're logged in to PageCrawl</li>
</ol>
<p>If you encounter issues, check:</p>
<ul>
<li>User is assigned to the PageCrawl application in your IdP</li>
<li>Email domain matches your configured domain</li>
<li>Metadata was imported correctly</li>
<li>X.509 certificate is valid and not expired</li>
</ul>
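<p>When a test sign-in is not redirected to your IdP, a frequent cause is an email whose domain differs slightly from the configured one. This sketch compares the part after the last <code>@</code> with the configured domain, ignoring case. It assumes an exact domain match, so subdomains such as <code>mail.example.com</code> do not count. PageCrawl's own matching rules may differ.</p>
<pre><code class="language-python">def email_matches_domain(email: str, sso_domain: str) -> bool:
    """True if the email's domain equals the configured SSO domain (case-insensitive)."""
    local, sep, domain = email.strip().rpartition("@")
    if not sep or not local:
        return False  # no "@" or nothing before it: not an email address
    return domain.lower() == sso_domain.strip().lstrip("@").lower()


print(email_matches_domain("Jane@Example.com", "example.com"))       # True
print(email_matches_domain("jane@mail.example.com", "example.com"))  # False
</code></pre>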
<hr />
<h2>Notes</h2>
<ul>
<li><strong>Metadata XML Format</strong>: PageCrawl does not support the <code>EntitiesDescriptor</code> element. Use metadata whose root element is <code>EntityDescriptor</code>.</li>
<li><strong>Multiple IdPs</strong>: PageCrawl supports one identity provider per team.</li>
<li><strong>Certificate Rotation</strong>: When your IdP renews or rotates its signing certificate (for example, before the old one expires), update the metadata in PageCrawl SSO settings.</li>
</ul>
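<p>Some IdPs export metadata wrapped in an <code>EntitiesDescriptor</code>. If the wrapper holds a single <code>EntityDescriptor</code>, you can extract it before pasting. This is a minimal sketch using Python's standard <code>xml.etree.ElementTree</code>. Re-serializing the XML can invalidate a signature on the wrapper, so if your IdP can export a single-entity file directly, use that instead.</p>
<pre><code class="language-python">import xml.etree.ElementTree as ET

MD = "urn:oasis:names:tc:SAML:2.0:metadata"
# Keep the usual prefixes when the XML is written back out.
ET.register_namespace("md", MD)
ET.register_namespace("ds", "http://www.w3.org/2000/09/xmldsig#")


def single_entity_descriptor(metadata_xml: str) -> str:
    """Return metadata whose root is EntityDescriptor, unwrapping EntitiesDescriptor."""
    root = ET.fromstring(metadata_xml)
    if root.tag == f"{{{MD}}}EntityDescriptor":
        return metadata_xml  # already in the supported format
    if root.tag != f"{{{MD}}}EntitiesDescriptor":
        raise ValueError(f"unexpected root element {root.tag}")
    entities = root.findall(f"{{{MD}}}EntityDescriptor")
    if len(entities) != 1:
        raise ValueError(f"found {len(entities)} EntityDescriptor elements; pick one manually")
    return ET.tostring(entities[0], encoding="unicode")
</code></pre>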
<h2>Support</h2>
<p>For assistance with your specific identity provider, contact <a href="mailto:support@pagecrawl.io">support@pagecrawl.io</a>.</p>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Choosing the Right AI Model for Website Change Monitoring in 2026]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/tutorials/article/choosing-best-ai-model-website-monitoring" />
            <id>https://pagecrawl.io/75</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Choosing the Right AI Model for Website Change Monitoring in 2026</h1>
<p>Every PageCrawl.io plan includes AI credits that work automatically with no setup. For most users, the included credits are all you need. This guide is primarily for users who want to bring their own API key (BYOK) and choose a specific model. It covers everything from budget to premium models, with cost comparisons based on 2026 pricing.</p>
<p><strong>Using included credits?</strong> You don't need to choose a model. PageCrawl automatically uses optimized models on your behalf. Each 4,000-token block costs 1 credit (Basic tier) or 10 credits (Pro tier, Ultimate plan only). See <a href="/help/features/article/ai-powered-change-detection">AI-Powered Change Detection</a> for details on how credits work.</p>
<p><strong>Pricing updates frequently.</strong> Verify current rates at: <a href="https://openai.com/api/pricing/">OpenAI</a>, <a href="https://ai.google.dev/pricing">Gemini</a>, <a href="https://www.anthropic.com/pricing">Anthropic</a>, <a href="https://openrouter.ai/models">OpenRouter</a></p>
<div class="kb-figure">
  <img src="/images/knowledge/integ-ai-settings.png" alt="AI Features Configuration in workspace settings where included credits or your own provider key and model are managed">
</div>
<h3>Why AI Models Matter</h3>
<p>AI models enhance website monitoring by automatically summarizing changes, assigning priority scores, and distinguishing meaningful updates from noise.</p>
<div class="kb-figure">
  <img src="/images/knowledge/simple-ai-notify.png" alt="Per-monitor AI settings with the focus instructions field and AI model selector">
</div>
<p>PageCrawl.io supports four AI providers:</p>
<ul>
<li><strong>OpenAI</strong> - GPT-5 family, reliable and fast</li>
<li><strong>Google Gemini</strong> - Gemini 3 family with competitive pricing</li>
<li><strong>Anthropic Claude</strong> - Claude 4.x series, high accuracy and premium quality</li>
<li><strong>OpenRouter</strong> - A marketplace that gives you access to 200+ AI models from different providers, all through a single account and API key</li>
</ul>
<h3>Understanding Tokens and Costs</h3>
<h4>What is a Token?</h4>
<p>A <strong>token</strong> is roughly 4 characters or about 3/4 of a word. AI providers charge based on tokens processed:</p>
<ul>
<li>"Hello world" = ~3 tokens</li>
<li>A typical paragraph = ~100 tokens</li>
<li>A blog post (1,000 words) = ~1,300 tokens</li>
<li>A full webpage = ~2,000-10,000 tokens</li>
</ul>
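<p>The rules of thumb above translate into a quick estimator. This is a rough sketch only. Real tokenizers differ between models, so use your provider's tokenizer (for example, OpenAI's <code>tiktoken</code> library) when you need exact counts.</p>
<pre><code class="language-python">import math


def estimate_tokens(text: str) -> int:
    """Rough token count: about 4 characters per token, rounded up."""
    return math.ceil(len(text) / 4)


def estimate_tokens_from_words(word_count: int) -> int:
    """Rough token count: one token is about 3/4 of a word."""
    return round(word_count * 4 / 3)


print(estimate_tokens("Hello world"))    # 3 (11 characters / 4 = 2.75, rounded up)
print(estimate_tokens_from_words(1000))  # 1333, in line with the ~1,300 above
</code></pre>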
<h4>How PageCrawl Uses Tokens</h4>
<p>PageCrawl's AI costs are dominated by <strong>input tokens</strong> (the page content sent to AI). Output tokens are minimal because summaries are typically just 1-2 paragraphs (~100-200 tokens).</p>
<p><strong>Typical token usage per check:</strong></p>
<ul>
<li>Simple page (blog post, article): ~1,000-2,000 tokens</li>
<li>Medium page (product page, news): ~2,000-5,000 tokens</li>
<li>Large page (documentation, e-commerce): ~5,000-10,000 tokens</li>
</ul>
<p><strong>Example cost calculation (Gemini 3 Flash at ~$0.40/M input):</strong></p>
<ul>
<li>2,000 token page = $0.0008 per check (~1,250 checks per dollar)</li>
<li>5,000 token page = $0.002 per check (~500 checks per dollar)</li>
</ul>
<p><strong>Example cost calculation (Claude Opus 4.8 at ~$5/M input):</strong></p>
<ul>
<li>2,000 token page = $0.01 per check (~100 checks per dollar)</li>
<li>5,000 token page = $0.025 per check (~40 checks per dollar)</li>
</ul>
<p>Output is just a short summary (~150 tokens), so it usually costs less than the page content sent as input. Providers charge more per output token than per input token, though, so check your model's output rate when estimating. Additionally, AI only runs when a meaningful change is detected on the page. PageCrawl's advanced change detection infrastructure filters out tiny, insignificant changes before they ever reach AI, so you only spend tokens on changes that actually matter.</p>
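<p>The per-check arithmetic is the number of input tokens multiplied by the price per million tokens, divided by 1,000,000. The sketch below applies it to the input rates quoted above. It ignores output tokens and assumes those quoted prices, which change often.</p>
<pre><code class="language-python">def cost_per_check(input_tokens: int, price_per_million: float) -> float:
    """Dollar cost of the input tokens for one AI request."""
    return input_tokens * price_per_million / 1_000_000


def checks_per_dollar(input_tokens: int, price_per_million: float) -> float:
    """How many requests of this size one dollar buys."""
    return 1 / cost_per_check(input_tokens, price_per_million)


# Input prices per million tokens, as quoted above (they change often).
PRICES = {"Gemini 3 Flash": 0.40, "Claude Opus 4.8": 5.00}

for model, price in PRICES.items():
    for tokens in (2_000, 5_000):
        print(f"{model}, {tokens:,} tokens: ${cost_per_check(tokens, price):.4f} per check, "
              f"~{checks_per_dollar(tokens, price):,.0f} checks per dollar")
</code></pre>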
<h3>Available Models by Provider</h3>
<p>Below are the models currently available in PageCrawl.io. Models marked with a star are the recommended defaults for each provider.</p>
<h4>OpenAI Models</h4>
<table>
<thead>
<tr>
<th>Model</th>
<th>Notes</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>GPT-5.4 Mini</strong> ⭐</td>
<td>Default. Great balance of speed, quality, and cost.</td>
</tr>
<tr>
<td><strong>GPT-5.5</strong></td>
<td>Most capable OpenAI model. Best for complex pages.</td>
</tr>
<tr>
<td><strong>GPT-5.4</strong></td>
<td>Full-size GPT-5.4 model.</td>
</tr>
<tr>
<td><strong>GPT-5.4 Nano</strong></td>
<td>Fastest and cheapest. Good for simple pages.</td>
</tr>
<tr>
<td><strong>GPT-5.2</strong></td>
<td>Recent generation balanced option.</td>
</tr>
<tr>
<td><strong>GPT-5.1</strong></td>
<td>Recent generation option.</td>
</tr>
<tr>
<td><strong>GPT-5</strong></td>
<td>Full GPT-5 model.</td>
</tr>
<tr>
<td><strong>GPT-5 Mini</strong></td>
<td>Earlier GPT-5 balanced option.</td>
</tr>
<tr>
<td><strong>GPT-5 Nano</strong></td>
<td>Earlier GPT-5 budget option.</td>
</tr>
<tr>
<td><strong>GPT-4.1</strong></td>
<td>Previous generation, good for complex tasks.</td>
</tr>
<tr>
<td><strong>GPT-4.1 Mini</strong></td>
<td>Previous generation, still reliable.</td>
</tr>
<tr>
<td><strong>GPT-4.1 Nano</strong></td>
<td>Previous generation budget option.</td>
</tr>
</tbody>
</table>
<h4>Google Gemini Models</h4>
<table>
<thead>
<tr>
<th>Model</th>
<th>Notes</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Gemini 3.1 Flash</strong> ⭐</td>
<td>Default. Latest generation with great speed and quality.</td>
</tr>
<tr>
<td><strong>Gemini 3.1 Pro</strong></td>
<td>Premium model, Google's most capable.</td>
</tr>
<tr>
<td><strong>Gemini 3.1 Flash Lite</strong></td>
<td>Budget option in the latest generation.</td>
</tr>
<tr>
<td><strong>Gemini 3 Flash</strong></td>
<td>Recent generation balanced option.</td>
</tr>
<tr>
<td><strong>Gemini 2.5 Flash</strong></td>
<td>Reliable previous generation model.</td>
</tr>
<tr>
<td><strong>Gemini 2.5 Flash Lite</strong></td>
<td>Very affordable previous generation option.</td>
</tr>
<tr>
<td><strong>Gemini 2.5 Pro</strong></td>
<td>Previous generation premium model.</td>
</tr>
</tbody>
</table>
<h4>Anthropic Claude Models</h4>
<table>
<thead>
<tr>
<th>Model</th>
<th>Notes</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Claude Haiku 4.5</strong> ⭐</td>
<td>Default. Fast, affordable, and accurate.</td>
</tr>
<tr>
<td><strong>Claude Opus 4.8</strong></td>
<td>Most capable Anthropic model. Premium pricing.</td>
</tr>
<tr>
<td><strong>Claude Opus 4.7</strong></td>
<td>Previous flagship, still excellent.</td>
</tr>
<tr>
<td><strong>Claude Sonnet 4.6</strong></td>
<td>Strong all-rounder with excellent accuracy.</td>
</tr>
<tr>
<td><strong>Claude Opus 4.6</strong></td>
<td>Previous generation premium model.</td>
</tr>
<tr>
<td><strong>Claude Sonnet 4.5</strong></td>
<td>Previous generation, strong all-rounder.</td>
</tr>
</tbody>
</table>
<h3>Recommended Models by Use Case</h3>
<p>PageCrawl.io only calls AI when a page actually changes. If you monitor 1,000 pages and only 150 change, you pay for 150 AI requests, not 1,000.</p>
<p>The PageCrawl.io settings page provides three Quick Select tiers to help you choose:</p>
<h4>Best / Most Capable</h4>
<p>For complex pages where accuracy matters most (legal documents, terms of service, compliance monitoring).</p>
<table>
<thead>
<tr>
<th>Provider</th>
<th>Model</th>
</tr>
</thead>
<tbody>
<tr>
<td>OpenAI</td>
<td>GPT-5.5</td>
</tr>
<tr>
<td>Anthropic</td>
<td>Claude Opus 4.8</td>
</tr>
<tr>
<td>Google Gemini</td>
<td>Gemini 3.1 Pro</td>
</tr>
<tr>
<td>OpenRouter</td>
<td>Claude Opus 4.8</td>
</tr>
</tbody>
</table>
<p>These models deliver the most accurate results but cost significantly more. Only use them for pages where precision is critical.</p>
<h4>Good Quality (Recommended for Most Users)</h4>
<p>Best balance of quality and cost for everyday monitoring.</p>
<table>
<thead>
<tr>
<th>Provider</th>
<th>Model</th>
</tr>
</thead>
<tbody>
<tr>
<td>OpenAI</td>
<td>GPT-5.4 Mini</td>
</tr>
<tr>
<td>Anthropic</td>
<td>Claude Haiku 4.5</td>
</tr>
<tr>
<td>Google Gemini</td>
<td>Gemini 3.1 Flash</td>
</tr>
<tr>
<td>OpenRouter</td>
<td>Gemini 2.5 Flash</td>
</tr>
</tbody>
</table>
<p>This tier is the sweet spot for most BYOK users. These models handle the vast majority of monitoring tasks reliably and affordably.</p>
<h4>Budget</h4>
<p>Lowest cost for high-volume monitoring where some accuracy trade-off is acceptable.</p>
<table>
<thead>
<tr>
<th>Provider</th>
<th>Model</th>
</tr>
</thead>
<tbody>
<tr>
<td>OpenAI</td>
<td>GPT-5.4 Nano</td>
</tr>
<tr>
<td>Anthropic</td>
<td>Claude Haiku 4.5</td>
</tr>
<tr>
<td>Google Gemini</td>
<td>Gemini 2.5 Flash Lite</td>
</tr>
<tr>
<td>OpenRouter</td>
<td>DeepSeek V3.2</td>
</tr>
</tbody>
</table>
<p>Good for simple page monitoring (blog posts, news, documentation) where you want to keep costs as low as possible.</p>
<h3>Best Models by Content Type</h3>
<table>
<thead>
<tr>
<th>Content Type</th>
<th>Budget Option</th>
<th>Recommended</th>
<th>Premium</th>
</tr>
</thead>
<tbody>
<tr>
<td>Blogs, News, Docs</td>
<td>GPT-5.4 Nano</td>
<td>GPT-5.4 Mini</td>
<td>-</td>
</tr>
<tr>
<td>E-commerce, Pricing</td>
<td>Gemini 2.5 Flash Lite</td>
<td>Gemini 3.1 Flash</td>
<td>Claude Haiku 4.5</td>
</tr>
<tr>
<td>Legal, ToS, Compliance</td>
<td>Claude Haiku 4.5</td>
<td>Claude Sonnet 4.6</td>
<td>Claude Opus 4.8</td>
</tr>
<tr>
<td>Competitor Monitoring</td>
<td>Gemini 2.5 Flash Lite</td>
<td>GPT-5.4 Mini</td>
<td>Claude Haiku 4.5</td>
</tr>
<tr>
<td>API Docs, Changelogs</td>
<td>GPT-5.4 Nano</td>
<td>Gemini 3.1 Flash</td>
<td>-</td>
</tr>
</tbody>
</table>
<h3>Real-World Cost Examples</h3>
<p><strong>Costs can vary significantly.</strong> These are estimates only. Your actual costs depend on:</p>
<ul>
<li>Page complexity and content length</li>
<li>How often pages change</li>
<li>Deep Analysis setting (on = full page, off = changes only)</li>
<li>Max token settings</li>
</ul>
<p><strong>Token usage by page type:</strong></p>
<ul>
<li>Simple pages (blogs, docs): ~500 tokens</li>
<li>Average pages: ~2,000 tokens</li>
<li>Content-heavy pages: ~5,000-10,000 tokens</li>
<li>Complex pages (e-commerce, SPAs): 10,000-25,000+ tokens</li>
</ul>
<p><strong>Recommendation</strong>: Start with budget-friendly models like GPT-5.4 Nano or Gemini 2.5 Flash Lite and set strict monthly limits to avoid unexpected bills.</p>
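<p>To turn these token counts into a monthly figure, multiply the number of AI requests (one per detected change) by the tokens per request and the per-million price. The hypothetical helper below also applies a monthly request cap like the "AI Requests Per Month" setting. It counts input tokens only and ignores per-page daily limits and included credits.</p>
<pre><code class="language-python">from typing import Optional


def monthly_estimate(changes_per_month: int, tokens_per_request: int,
                     price_per_million: float,
                     monthly_request_limit: Optional[int] = None) -> tuple[int, float]:
    """Return (AI requests, input-token cost in dollars) for one month."""
    requests = changes_per_month  # AI runs once per detected change, not per check
    if monthly_request_limit is not None:
        requests = min(requests, monthly_request_limit)
    cost = requests * tokens_per_request * price_per_million / 1_000_000
    return requests, cost


print(monthly_estimate(150, 2_000, 0.40))
print(monthly_estimate(150, 25_000, 5.00, monthly_request_limit=100))
</code></pre>
<p>For example, 150 changes in a month on average pages (~2,000 tokens) at Gemini 3 Flash's ~$0.40/M comes to about $0.12. The same 150 changes on complex pages (~25,000 tokens) at Claude Opus 4.8's ~$5/M would be about $18.75, or $12.50 with a 100-request monthly cap.</p>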
<h4>Controlling Token Usage</h4>
<p>You can reduce token usage in PageCrawl.io settings:</p>
<ul>
<li><strong>Deep Analysis off</strong>: Only send changed text to AI (lower tokens, less context)</li>
<li><strong>Deep Analysis on</strong>: Send entire page for better understanding (higher tokens)</li>
<li><strong>Max tokens limit</strong>: Set a maximum per request (if the full page would exceed it, only the changed text is sent)</li>
<li><strong>Monthly request limits</strong>: Set max AI requests per month to cap costs</li>
<li><strong>Per-page daily limit</strong>: Prevent noisy pages from consuming all your AI budget</li>
</ul>
<p><strong>Note</strong>: Check your actual token usage in PageCrawl.io's AI statistics to estimate your costs accurately.</p>
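<p>The way these settings interact can be modeled roughly as follows. This is a simplified sketch, not PageCrawl's implementation. The field names are hypothetical, token counts use the ~4 characters per token rule, and it assumes a request is blocked once a limit has been reached.</p>
<pre><code class="language-python">import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class AiSettings:
    # Hypothetical field names mirroring the settings listed above.
    deep_analysis: bool
    max_tokens: int
    monthly_request_limit: int
    per_page_daily_limit: int


def approx_tokens(text: str) -> int:
    """About 4 characters per token, rounded up."""
    return math.ceil(len(text) / 4)


def plan_request(settings: AiSettings, page_text: str, diff_text: str,
                 requests_this_month: int, page_requests_today: int) -> Optional[str]:
    """Return the text that would be sent to AI, or None if a limit blocks the request."""
    if requests_this_month >= settings.monthly_request_limit:
        return None  # monthly cap reached
    if page_requests_today >= settings.per_page_daily_limit:
        return None  # this page has used its daily allowance
    if settings.deep_analysis and settings.max_tokens >= approx_tokens(page_text):
        return page_text  # Deep Analysis on and the full page fits under the cap
    # Deep Analysis off, or the full page is over the cap: send only the changes.
    return diff_text
</code></pre>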
<h3>OpenRouter: Access 200+ Models</h3>
<p>OpenRouter provides unified access to AI models from multiple providers through a single API key. PageCrawl.io recommends OpenRouter as the default BYOK option because of its flexibility.</p>
<p><strong>Benefits</strong>: Unified billing, automatic fallbacks, access to models from OpenAI, Anthropic, Google, xAI, DeepSeek, Meta, Mistral, Qwen, and Cohere</p>
<p><strong>Best for</strong>: Users who want a single API key with access to many models, easy billing budgets, and the ability to switch models without changing keys</p>
<p><strong>Privacy Mode</strong>: When enabled in PageCrawl.io settings, your data is only routed through AI providers that don't use it for training</p>
<p>The model dropdown in PageCrawl.io groups OpenRouter models by price tier (Premium, Recommended, Standard, Budget) and only shows models that support the structured output format required for PageCrawl's AI features.</p>
<h3>How to Set Up BYOK in PageCrawl.io</h3>
<h4>Step 1: Get Your API Key</h4>
<table>
<thead>
<tr>
<th>Provider</th>
<th>Get Key At</th>
</tr>
</thead>
<tbody>
<tr>
<td>OpenRouter</td>
<td><a href="https://openrouter.ai">openrouter.ai</a> &gt; Settings &gt; API Key</td>
</tr>
<tr>
<td>OpenAI</td>
<td><a href="https://platform.openai.com">platform.openai.com</a> &gt; API Keys</td>
</tr>
<tr>
<td>Google Gemini</td>
<td><a href="https://ai.google.dev">ai.google.dev</a> &gt; Get API Key</td>
</tr>
<tr>
<td>Anthropic</td>
<td><a href="https://console.anthropic.com">console.anthropic.com</a> &gt; API Keys</td>
</tr>
</tbody>
</table>
<h4>Step 2: Configure in PageCrawl.io</h4>
<ol>
<li>Go to <strong>Settings &gt; Workspace &gt; Integrations</strong></li>
<li>Click <strong>Manage</strong> or <strong>Setup</strong> on the AI Features card</li>
<li>Scroll down to the <strong>Bring Your Own Key</strong> section</li>
<li>Select your AI provider (OpenRouter is pre-selected and recommended)</li>
<li>Choose a model from the dropdown</li>
<li>Click <strong>Add API Key</strong>, paste your key, and use <strong>Test Key</strong> to verify it works</li>
<li>Click <strong>Save Configuration</strong></li>
</ol>
<h4>Step 3: Use Quick Select for Easy Model Choices</h4>
<p>After adding your API key, the settings page shows three Quick Select cards:</p>
<ul>
<li><strong>Best / Most Capable</strong> - Most accurate results, higher cost</li>
<li><strong>Good Quality</strong> - Recommended for most users</li>
<li><strong>Budget</strong> - Lowest cost, good for simple monitoring</li>
</ul>
<p>Click any card to instantly switch to that model. You can also choose from the full model dropdown for more options.</p>
<h4>Step 4: Optimize with Model Overrides</h4>
<p>You can customize AI models at three levels:</p>
<ol>
<li><strong>Workspace default</strong> - applies to all pages</li>
<li><strong>Template override</strong> - applies to pages using that template</li>
<li><strong>Page override</strong> - applies to individual pages</li>
</ol>
<p><strong>Example strategy</strong>:</p>
<ul>
<li>Workspace default: Gemini 2.5 Flash Lite (cheapest)</li>
<li>E-commerce template: GPT-5.4 Mini (good balance)</li>
<li>Legal template: Claude Sonnet 4.6 (high accuracy)</li>
<li>Critical page: Claude Opus 4.8 (most capable)</li>
</ul>
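<p>Assuming, as the word "override" suggests, that the most specific setting wins, model selection can be sketched like this (the function and parameter names are hypothetical):</p>
<pre><code class="language-python">from typing import Optional


def resolve_model(workspace_default: str,
                  template_override: Optional[str] = None,
                  page_override: Optional[str] = None) -> str:
    """Most specific setting wins: page, then template, then workspace."""
    for candidate in (page_override, template_override, workspace_default):
        if candidate:
            return candidate
    raise ValueError("no workspace default model configured")


print(resolve_model("Gemini 2.5 Flash Lite", "Claude Sonnet 4.6"))
print(resolve_model("Gemini 2.5 Flash Lite", "Claude Sonnet 4.6", "Claude Opus 4.8"))
</code></pre>
<p>With the example strategy, a legal-template page without a page override resolves to Claude Sonnet 4.6, and the critical page resolves to Claude Opus 4.8.</p>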
<h3>Tips for Optimizing Costs</h3>
<ol>
<li><strong>Start with the Good Quality tier</strong> - GPT-5.4 Mini, Gemini 3.1 Flash, or Claude Haiku 4.5 offer excellent quality at reasonable prices</li>
<li><strong>Use templates</strong> - Group similar pages with the same model to optimize costs by content type</li>
<li><strong>Check frequency doesn't affect AI costs</strong> - AI only runs when changes occur, not on every check</li>
<li><strong>Set monthly limits</strong> - Use the "AI Requests Per Month" setting to cap spending</li>
<li><strong>Monitor token usage</strong> - Check the token statistics in your AI settings to understand your actual costs</li>
<li><strong>Use per-page daily limits</strong> - Prevent frequently updating pages from consuming all your budget</li>
</ol>
<h3>Privacy and Data Security Considerations</h3>
<table>
<thead>
<tr>
<th>Provider</th>
<th>Data Usage</th>
<th>Best For</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>OpenAI/Anthropic</strong></td>
<td>API data not used for training</td>
<td>Confidential content, legal docs</td>
</tr>
<tr>
<td><strong>Google Gemini</strong></td>
<td>Review Google's data policies</td>
<td>General monitoring</td>
</tr>
<tr>
<td><strong>OpenRouter</strong></td>
<td>Varies by underlying model. Enable Privacy Mode to restrict to non-training providers.</td>
<td>Flexible choice</td>
</tr>
</tbody>
</table>
<p>When using included AI credits, content is processed through PageCrawl's managed AI infrastructure. When using BYOK, content is sent directly to your chosen provider.</p>
<p><strong>Data protection policies</strong>: <a href="https://openai.com/policies/api-data-usage-policies">OpenAI</a>, <a href="https://www.anthropic.com/privacy">Anthropic</a>, <a href="https://cloud.google.com/terms/data-processing-addendum">Google</a></p>
<p><strong>Privacy note</strong>: Free tier models (including some OpenRouter models) may use your data for training. Use paid tiers for sensitive content.</p>
<h3>FAQ</h3>
<p><strong>Do I need BYOK to use AI?</strong> No. All plans include AI credits that work automatically. BYOK is optional for users who want unlimited usage or specific model control.</p>
<p><strong>What happens when my credits run out?</strong> Page monitoring continues normally, but AI summaries pause until credits reset next month. You can also switch to BYOK for unlimited usage.</p>
<p><strong>Can I switch between credits and BYOK?</strong> Yes, at any time in Settings &gt; Workspace &gt; Integrations &gt; AI.</p>
<p><strong>Can I switch models after starting?</strong> Yes. Changes apply immediately to new checks. Historical data remains intact.</p>
<p><strong>Do I pay for checks that don't find changes?</strong> No. AI only runs when pages actually change.</p>
<p><strong>Can I use different models for different pages?</strong> Yes, via workspace defaults, template overrides, and page-level overrides.</p>
<p><strong>Why is OpenRouter recommended as the default BYOK provider?</strong> OpenRouter gives you access to models from all major providers with a single API key. You can switch models anytime without changing keys, set spending limits in the OpenRouter dashboard, and enable Privacy Mode to control data handling.</p>
<h3>Related Articles</h3>
<ul>
<li><a href="/help/features/article/ai-powered-change-detection">AI-Powered Change Detection and Smart Filtering</a> - Learn how AI summarization and Importance Scoring work</li>
<li><a href="/help/integrations/article/ai-byok-setup-guide">AI Integration Setup Guide (BYOK)</a> - Step-by-step guide to configure your API keys</li>
</ul>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[PageCrawl Browser Extension]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/features/article/browser-extension-guide" />
            <id>https://pagecrawl.io/76</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>PageCrawl Browser Extension</h1>
<p>The PageCrawl browser extension lets you instantly add any webpage to your monitoring list with just a few clicks. View recent changes, switch between workspaces, and start monitoring new pages - all without leaving your current tab.</p>
<h3>Installation</h3>
<p>The PageCrawl extension is available for:</p>
<ul>
<li><strong>Chrome</strong>: <a href="https://chromewebstore.google.com/detail/pagecrawl-website-change/ofiinglodfpodfghggakcadoloidhpla">Install from Chrome Web Store</a></li>
<li><strong>Firefox</strong>: <a href="https://addons.mozilla.org/en-US/firefox/addon/pagecrawl-web-change-monitor/">Install from Firefox Add-ons</a></li>
<li><strong>Safari, Edge, Brave, or any other browser</strong>: Save the <a href="/bookmarklet">PageCrawl bookmarklet</a> to your bookmarks bar instead. Click it on any page and PageCrawl picks up where you are, no extension required.</li>
</ul>
<div class="kb-figure">
  <img src="/images/blog/browser-extension-extensions-panel.png" alt="PageCrawl in Extensions Panel">
</div>
<div style="background:#f5f7fa;border:1px solid #e6e9ef;border-radius:8px;padding:14px 18px;margin:20px 0;">
  <strong>Note:</strong> Click the pin icon next to PageCrawl to keep it visible in your browser toolbar for quick access.
</div>
<h3>Getting Started</h3>
<h4>1. Connect Your Account</h4>
<p>After installing the extension, click the PageCrawl icon in your browser toolbar. You'll see a welcome screen prompting you to log in.</p>
<div class="kb-figure">
  <img src="/images/blog/browser-extension-login.png" alt="PageCrawl Login Screen">
</div>
<ol>
<li>Click <strong>"Log In to PageCrawl"</strong></li>
<li>You'll be redirected to PageCrawl to authenticate</li>
<li>Once logged in, you'll be automatically connected</li>
</ol>
<p>If you don't have an account yet, click "Don't have an account? Sign up" to create one.</p>
<h4>2. View Recent Changes</h4>
<p>Once connected, the extension opens to your <strong>Recent Changes</strong> timeline. This shows the latest detected changes across all your monitored pages:</p>
<div class="kb-figure">
  <img src="/images/blog/browser-extension-timeline.png" alt="Browser Extension Timeline">
</div>
<ul>
<li><strong>AI Summaries</strong>: If enabled, you'll see AI-generated summaries of what changed</li>
<li><strong>Text Diffs</strong>: For text-based monitoring, you'll see the actual text additions (highlighted in green) and deletions (highlighted in red)</li>
<li><strong>Visual Changes</strong>: Shows the percentage of visual difference detected</li>
<li><strong>Price/Number Changes</strong>: Shows how the value changed (e.g., "increased by 10%")</li>
</ul>
<p>Click any change to open it directly in your dashboard and see the full details.</p>
<h4>3. Start Monitoring a Page</h4>
<p>To add a new page to your monitoring:</p>
<ol>
<li>Navigate to any webpage you want to monitor</li>
<li>Click the PageCrawl extension icon</li>
<li>Click <strong>"+ Track New Page"</strong></li>
<li>Choose your monitoring type and options</li>
<li>Click <strong>"Start Monitoring"</strong></li>
</ol>
<div class="kb-figure">
  <img src="/images/blog/browser-extension-track-page.png" alt="Track New Page Form">
</div>
<h3>Monitoring Types</h3>
<h4>Full Page Monitoring</h4>
<p>Best for: Blog posts, news articles, documentation pages</p>
<p>Monitors text content on the page. Choose your tracking level:</p>
<ul>
<li><strong>Everything on page</strong>: Monitors all text, including navigation and footers</li>
<li><strong>Content only</strong>: Excludes navigation, headers, and footers</li>
<li><strong>Reader mode</strong>: Focuses on the main article content only</li>
</ul>
<p><strong>Keyword Monitoring</strong>: Optionally enter comma-separated keywords to be notified only when specific words appear or disappear. Leave the field empty to be notified of all changes.</p>
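<p>Conceptually, keyword filtering compares the old and new text of a page. The sketch below is a hypothetical illustration that assumes case-insensitive substring matching. PageCrawl's actual matching rules may differ.</p>
<pre><code class="language-python">def parse_keywords(raw: str) -> list[str]:
    """Split the comma-separated keyword field and drop empty entries."""
    return [k.strip() for k in raw.split(",") if k.strip()]


def keyword_events(raw: str, old_text: str, new_text: str) -> dict[str, list[str]]:
    """List keywords that appeared or disappeared between two snapshots."""
    old_lower, new_lower = old_text.lower(), new_text.lower()
    events = {"appeared": [], "disappeared": []}
    for kw in parse_keywords(raw):
        was_there = kw.lower() in old_lower
        is_there = kw.lower() in new_lower
        if is_there and not was_there:
            events["appeared"].append(kw)
        elif was_there and not is_there:
            events["disappeared"].append(kw)
    return events


def keyword_alert(raw: str, old_text: str, new_text: str) -> bool:
    """Empty keyword field: any change alerts. Otherwise only keyword events do."""
    if not parse_keywords(raw):
        return old_text != new_text
    events = keyword_events(raw, old_text, new_text)
    return bool(events["appeared"] or events["disappeared"])


print(keyword_events("in stock, sold out", "Status: Sold out", "Status: In stock"))
</code></pre>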
<h4>Element Monitoring (Specific Area)</h4>
<p>Best for: Prices, stock status, specific data points</p>
<ol>
<li>Click <strong>"Click to Select Element"</strong></li>
<li>Hover over the page and click the element you want to monitor</li>
<li>The selector will be automatically captured</li>
<li>Confirm your selection</li>
</ol>
<p>You can also manually enter a CSS selector if you prefer.</p>
<p><strong>Track as Number</strong>: Enable this to extract numeric values from the element. This allows you to track trends and percentage changes over time.</p>
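<p>Here is a rough sketch of how a numeric value and its percentage change could be derived from element text. The regular expression is an assumption that treats commas as thousands separators, so formats like <code>1.299,00</code> would need a different pattern.</p>
<pre><code class="language-python">import re
from typing import Optional

# First number in the text: optional minus, digits with optional commas, optional decimals.
NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_number(text: str) -> Optional[float]:
    """Pull the first number out of element text such as 'Price: $1,299.00'."""
    match = NUMBER.search(text)
    if match is None:
        return None
    return float(match.group().replace(",", ""))


def percent_change(old: float, new: float) -> float:
    """Relative change in percent, e.g. 50 -> 55 gives 10.0."""
    if old == 0:
        raise ValueError("percentage change is undefined when the old value is 0")
    return (new - old) / abs(old) * 100


print(extract_number("Price: $1,299.00"))  # 1299.0
</code></pre>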
<p><strong>Keyword Monitoring</strong>: Same as Full Page - enter keywords to filter notifications.</p>
<h4>Visual Monitoring</h4>
<p>Best for: Charts, images, layouts, design changes</p>
<ol>
<li>Click <strong>"Draw Area on Page"</strong></li>
<li>Click and drag to select the area you want to monitor</li>
<li>Confirm your selection</li>
</ol>
<p>The extension will capture screenshots of this area and compare them for changes.</p>
<p><strong>Change Threshold</strong>: Set how much the area must change before you're notified:</p>
<ul>
<li>Any change (most sensitive)</li>
<li>Tiny (1%)</li>
<li>Very Minor (3%)</li>
<li>Minor (5%)</li>
<li>Moderate (10%): recommended for most cases</li>
<li>Significant (30%)</li>
<li>Very High (50%)</li>
<li>Extremely High (80%)</li>
</ul>
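<p>The threshold compares the share of the selected area that changed against the level you pick. This hypothetical sketch assumes the difference is measured as a percentage of changed pixels and that reaching the threshold exactly triggers a notification.</p>
<pre><code class="language-python"># Threshold options from the list above, as a percentage of the selected area.
THRESHOLDS = {
    "Any change": 0.0,
    "Tiny": 1.0,
    "Very Minor": 3.0,
    "Minor": 5.0,
    "Moderate": 10.0,
    "Significant": 30.0,
    "Very High": 50.0,
    "Extremely High": 80.0,
}


def changed_percent(changed_pixels: int, total_pixels: int) -> float:
    """Share of the selected area that differs between two screenshots."""
    return changed_pixels * 100 / total_pixels


def visual_alert(percent: float, threshold_name: str) -> bool:
    """Decide whether a visual difference is large enough to notify."""
    threshold = THRESHOLDS[threshold_name]
    if threshold == 0:
        return percent > 0  # "Any change": any difference at all
    return percent >= threshold


print(visual_alert(changed_percent(120, 1000), "Moderate"))  # True (12% changed)
</code></pre>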
<h4>Price Monitoring</h4>
<p>Best for: Product pages, e-commerce sites</p>
<p>PageCrawl will automatically detect and track the main price on the page. This is optimized for common e-commerce platforms and product pages.</p>
<h3>Check Frequency</h3>
<p>Choose how often PageCrawl should check for changes:</p>
<ul>
<li>Options depend on your subscription plan</li>
<li>Paid plans offer more frequent checks</li>
</ul>
<h3>Right-Click Menu</h3>
<p>You can quickly access PageCrawl from any webpage using the right-click context menu:</p>
<ol>
<li>Right-click anywhere on a webpage</li>
<li>Select <strong>"Open in PageCrawl"</strong></li>
</ol>
<div class="kb-figure">
  <img src="/images/blog/browser-extension-context-menu.png" alt="Right-Click Context Menu">
</div>
<p><strong>What happens next depends on whether the page is already monitored:</strong></p>
<ul>
<li>
<p><strong>If the page is already monitored</strong>: You'll be taken directly to the page's dashboard where you can view change history, adjust settings, or check the current status.</p>
</li>
<li>
<p><strong>If the page is not monitored</strong>: You'll be taken to the page creation form with the URL pre-filled, ready to set up monitoring.</p>
</li>
</ul>
<h3>Header Actions</h3>
<p>The extension header provides quick access to:</p>
<ul>
<li><strong>PageCrawl Logo</strong>: Click to open your main dashboard</li>
<li><strong>Workspace Switcher</strong>: Switch between workspaces (if you have multiple)</li>
<li><strong>Help</strong> (question mark icon): Open this guide</li>
</ul>
<h3>More Options</h3>
<p>For advanced configuration (notifications, proxies, actions, etc.), click <strong>"More options →"</strong> below the Start Monitoring button. This opens the full page creation form on PageCrawl with your current settings pre-filled.</p>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Add Pages to PageCrawl from iOS Safari]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/tutorials/article/add-page-from-ios-safari" />
            <id>https://pagecrawl.io/77</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Add Pages to PageCrawl from iOS Safari</h1>
<h3>What Is This?</h3>
<p>Add any webpage to PageCrawl.io monitoring directly from Safari's Share Sheet on your iPhone or iPad. Just tap Share, tap the shortcut, and you're done.</p>
<h3>Install the Shortcut</h3>
<p>Tap the button below on your iPhone or iPad to install the "Add to PageCrawl" shortcut:</p>
<p><a href="https://www.icloud.com/shortcuts/0a26f8166104460e8825872e5d7c3128" class="kb-cta">Get the Shortcut</a></p>
<p>When prompted, tap <strong>Get Shortcut</strong> to install it.</p>
<h3>How to Use It</h3>
<ol>
<li>Open Safari and navigate to any page you want to monitor</li>
<li>Tap the <strong>Share</strong> button (the square with an arrow pointing up)</li>
<li>Scroll down and tap <strong>Add to PageCrawl</strong></li>
<li>PageCrawl.io opens with the URL pre-filled</li>
<li>Configure your monitoring options and save</li>
</ol>
<div class="kb-figure">
  <img src="/images/blog/share-add-to-pagecrawl.png" alt="Using the shortcut from Share Sheet">
</div>
<h3>Works on Mac Too</h3>
<p>This shortcut also works on macOS! In Safari on your Mac:</p>
<ol>
<li>Click the <strong>Share</strong> button in the toolbar</li>
<li>Select <strong>Shortcuts</strong> from the menu</li>
<li>Click <strong>Add to PageCrawl</strong></li>
</ol>
<p>Alternatively, on desktop browsers you can use our <a href="/bookmarklet">bookmarklet</a>: just drag it to your bookmarks bar for one-click access.</p>
<div class="kb-figure">
  <img src="/images/blog/mac-shortcuts.png" alt="Using the shortcut from Mac">
</div>
<h3>Why iOS Needs a Shortcut</h3>
<p>Android phones can install PageCrawl as a Progressive Web App and share pages to it directly from the system share sheet. iOS Safari doesn't support that yet, even after a PWA is added to the Home Screen. The Shortcut above is the fastest way to bridge that gap on iPhone and iPad.</p>
<h3>Android Users</h3>
<p>On Android, install PageCrawl as an app and share pages straight from any browser. See <a href="/help/tutorials/article/add-page-from-android">Add Pages to PageCrawl from Android</a> for the step-by-step. If you'd rather not install, our <a href="/bookmarklet">bookmarklet</a> works in any mobile browser too.</p>
<h3>Tips</h3>
<ul>
<li><strong>Pin to top of Share Sheet</strong>: Tap "Edit Actions..." at the bottom of the Share Sheet to move "Add to PageCrawl" to your favorites for quicker access.</li>
<li><strong>Works everywhere</strong>: This shortcut works in any app that shares URLs through the Share Sheet, such as Safari, Chrome, Firefox, and news apps.</li>
<li><strong>Stay logged in</strong>: For the smoothest experience, make sure you're logged into PageCrawl.io in Safari.</li>
</ul>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[What is the difference between Priority Support and Standard Support?]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/subscription/article/difference-between-ultimate-and-standard-support" />
            <id>https://pagecrawl.io/78</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>What is the difference between Priority Support and Standard Support?</h1>
<p>We aim to respond to every inquiry promptly, but when support volume is high, requests from Enterprise and Ultimate customers are prioritized over those from Standard customers. As a result, Enterprise and Ultimate customers get faster responses and can expect more hands-on help if they are unable to set up a page the way they want.</p>
<p>Technical support response times depend on your subscription plan:</p>
<ul>
<li>Free Forever Plan: Technical support not offered</li>
<li>Standard Plan: Within 72 hours (excluding weekends)</li>
<li>Enterprise Plan: Within 24 hours (excluding weekends)</li>
<li>Ultimate Plan: Within 24 hours (excluding weekends)</li>
</ul>]]>
            </summary>
                                    <updated>2026-03-05T10:31:13+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[PageCrawl.io + n8n integration]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/integrations/article/pagecrawl-n8n-integration" />
            <id>https://pagecrawl.io/79</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>PageCrawl.io + n8n integration</h1>
<div class="kb-figure">
  <img src="/images/knowledge/integ-n8n-setup.png" alt="n8n integration dialog in PageCrawl with community node install steps, available nodes, and workflow examples">
</div>
<p>PageCrawl.io provides dedicated n8n community nodes that integrate directly into your n8n instance. With the <strong>PageCrawl Trigger</strong> and <strong>PageCrawl</strong> nodes, you can trigger workflows when changes are detected and interact with the PageCrawl.io API to manage pages, retrieve diffs, and download screenshots, all from within n8n's visual workflow editor.</p>
<h3>Why integrate PageCrawl.io with n8n?</h3>
<p>n8n is a workflow automation tool that you can self-host or run in the cloud. By connecting PageCrawl.io to n8n, you can:</p>
<ol>
<li><strong>Keep data on your infrastructure</strong>: Run workflows on your own servers, keeping sensitive change data within your network.</li>
<li><strong>Build complex workflows visually</strong>: Use n8n's visual editor to chain together multiple steps, add conditional logic, and connect to hundreds of services.</li>
<li><strong>Avoid per-task pricing</strong>: Unlike hosted automation platforms, self-hosted n8n has no limits on the number of workflow executions.</li>
<li><strong>Connect to developer tools</strong>: Integrate directly with databases, APIs, Git repositories, and internal services that hosted platforms may not support.</li>
</ol>
<h3>Available nodes</h3>
<p>PageCrawl.io provides two n8n nodes:</p>
<h4>PageCrawl Trigger</h4>
<p>The trigger node starts your workflow automatically when something happens on a monitored page. Supported events:</p>
<ul>
<li><strong>Change Detected</strong>: Fires when a monitored page's content changes.</li>
<li><strong>Error</strong>: Fires when a page check fails (timeout, blocked, etc.).</li>
</ul>
<p>You can filter triggers by workspace and by specific page, or listen for changes across all pages in a workspace. The node automatically registers and cleans up webhooks with the PageCrawl.io API.</p>
<h4>PageCrawl (Action node)</h4>
<p>The action node lets you interact with the PageCrawl.io API within your workflows. Available resources and operations:</p>
<p><strong>Page operations</strong></p>
<ul>
<li><strong>Get</strong>: Retrieve details about a monitored page including recent check history.</li>
<li><strong>Quick Create</strong>: Add a new page to monitor with just a URL (auto-detects settings).</li>
<li><strong>Create (Advanced)</strong>: Add a page with full control over elements, actions, conditions, frequency, location, device, and more.</li>
<li><strong>Update</strong>: Modify settings on an existing monitored page.</li>
<li><strong>Delete</strong>: Remove a page from monitoring.</li>
<li><strong>Run Check Now</strong>: Trigger an immediate check on a page.</li>
</ul>
<p><strong>Check operations</strong></p>
<ul>
<li><strong>Get History</strong>: Retrieve check history for a page with change diffs.</li>
<li><strong>Get Diff Image</strong>: Download a visual diff image showing what changed.</li>
<li><strong>Get Diff HTML</strong>: Get the change diff as HTML markup.</li>
<li><strong>Get Diff Markdown</strong>: Get the change diff as Markdown text.</li>
</ul>
<p><strong>Screenshot operations</strong></p>
<ul>
<li><strong>Get Screenshot</strong>: Download the latest (or previous) screenshot of a page.</li>
<li><strong>Get Screenshot Diff</strong>: Download a side-by-side visual comparison screenshot.</li>
</ul>
<h3>Setting up the integration</h3>
<h4>Step 1: Install the PageCrawl community node</h4>
<ol>
<li>Open your n8n instance and go to <strong>Settings</strong> &gt; <strong>Community Nodes</strong>.</li>
<li>Click <strong>Install a community node</strong>.</li>
<li>Enter <code>@pagecrawl/n8n-nodes-pagecrawl</code> as the package name.</li>
<li>Click <strong>Install</strong> and confirm the installation.</li>
<li>Restart n8n if prompted.</li>
</ol>
<h4>Step 2: Add your API credentials</h4>
<ol>
<li>In your <a href="https://pagecrawl.io">PageCrawl.io</a> account, go to <strong>Settings</strong> &gt; <strong>API</strong> and copy your API key.</li>
<li>In n8n, go to <strong>Credentials</strong> and create a new <strong>PageCrawl API</strong> credential.</li>
<li>Paste your API key and save.</li>
</ol>
<h4>Step 3: Create a workflow with the trigger</h4>
<ol>
<li>Create a new workflow in n8n.</li>
<li>Add the <strong>PageCrawl Trigger</strong> node.</li>
<li>Select your workspace and (optionally) a specific page to monitor.</li>
<li>Choose which events to listen for: change detected, error, or both.</li>
<li>Click <strong>Listen for Test Event</strong> to verify the connection. The node will automatically send a test event so you can see the data format.</li>
</ol>
<h4>Step 4: Add workflow actions</h4>
<p>With the trigger in place, add any n8n nodes to define what happens when a change is detected. Some examples:</p>
<ul>
<li><strong>Store changes in a database</strong> using the PostgreSQL, MySQL, or MongoDB nodes.</li>
<li><strong>Create a GitHub or GitLab issue</strong> for your team to review the change.</li>
<li><strong>Summarize the change with AI</strong> using the OpenAI or Anthropic nodes.</li>
<li><strong>Send a notification</strong> to Matrix, Mattermost, or any platform with an API.</li>
<li><strong>Trigger an incident</strong> in PagerDuty or Opsgenie for critical page changes.</li>
</ul>
<p>You can also add the <strong>PageCrawl</strong> action node mid-workflow to fetch additional data, such as downloading a diff image to attach to a notification or retrieving the full page details.</p>
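<p>If you want to shape the trigger output before it reaches a notification node, an n8n <strong>Code</strong> node running JavaScript is one option. The sketch below is a plain function you could adapt inside a Code node. The event field names it reads (<code>event</code>, <code>page.name</code>, <code>page.url</code>, <code>error</code>) and the event values <code>"change_detected"</code> and <code>"error"</code> are hypothetical placeholders. Inspect the data returned by <strong>Listen for Test Event</strong> and substitute the real keys.</p>

```javascript
// Hypothetical formatter for PageCrawl Trigger items. Field names and event
// values are placeholders; replace them with the keys in your own test event.
function toNotification(payload) {
  const pageName = payload.page?.name ?? payload.page?.url ?? "Unknown page";
  if (payload.event === "error") {
    return { severity: "high", text: `Check failed for ${pageName}: ${payload.error ?? "no details"}` };
  }
  return { severity: "info", text: `Change detected on ${pageName} (${payload.page?.url ?? "no URL"})` };
}

// Inside an n8n Code node ("Run Once for All Items"), you could map every item:
//   return $input.all().map(item => ({ json: toNotification(item.json) }));

const sample = { event: "change_detected", page: { name: "Pricing", url: "https://example.com/pricing" } };
console.log(toNotification(sample).text);
// → Change detected on Pricing (https://example.com/pricing)
```

<p>Returning a small <code>{ severity, text }</code> object gives you a field that an n8n <strong>IF</strong> or <strong>Switch</strong> node could use to route errors to an incident tool and changes to a chat channel.</p>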
<h4>Step 5: Activate</h4>
<p>Once your workflow is tested and working, activate it so it runs automatically whenever changes are detected.</p>
<h3>Example workflow ideas</h3>
<ul>
<li><strong>Compliance monitoring</strong>: When a vendor's terms of service change, use the PageCrawl node to get the diff as Markdown, store it in a database, create a Jira ticket for legal review, and notify the compliance team on Slack.</li>
<li><strong>Competitor intelligence</strong>: When a competitor updates their pricing page, get the diff HTML, summarize the key changes with OpenAI, log them in a spreadsheet, and send a summary to your sales channel.</li>
<li><strong>Visual regression tracking</strong>: When a page changes, download the screenshot diff image, attach it to a GitHub issue, and alert the design team for review.</li>
<li><strong>Uptime and integrity checks</strong>: Listen for error events, trigger a PagerDuty incident, and post an alert to your ops channel when a critical page becomes unreachable.</li>
</ul>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[User Access Roles and Permissions]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/account-settings/article/user-access-roles" />
            <id>https://pagecrawl.io/80</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>User Access Roles and Permissions</h1>
<p>PageCrawl uses role-based access control to manage what each team member can do. There are four roles, each with different permission levels. Manage members and their roles under <strong>Settings &gt; Team &gt; Users</strong>.</p>
<div class="kb-figure">
  <img src="/images/knowledge/settings-users.png" alt="Users settings page listing team members with their email, assigned workspaces, and role, plus the Invite User button">
</div>
<h3>Available Roles</h3>
<table>
<thead>
<tr>
<th>Role</th>
<th style="text-align: center;">Manage Team</th>
<th style="text-align: center;">Manage Workspaces</th>
<th style="text-align: center;">Edit Pages</th>
<th style="text-align: center;">View Pages</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Owner</strong></td>
<td style="text-align: center;">Yes</td>
<td style="text-align: center;">Yes</td>
<td style="text-align: center;">Yes</td>
<td style="text-align: center;">Yes</td>
</tr>
<tr>
<td><strong>Administrator</strong></td>
<td style="text-align: center;">Yes</td>
<td style="text-align: center;">Yes</td>
<td style="text-align: center;">Yes</td>
<td style="text-align: center;">Yes</td>
</tr>
<tr>
<td><strong>Standard User</strong></td>
<td style="text-align: center;">No</td>
<td style="text-align: center;">No</td>
<td style="text-align: center;">Yes</td>
<td style="text-align: center;">Yes</td>
</tr>
<tr>
<td><strong>Viewer</strong></td>
<td style="text-align: center;">No</td>
<td style="text-align: center;">No</td>
<td style="text-align: center;">No</td>
<td style="text-align: center;">Yes</td>
</tr>
</tbody>
</table>
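<p>The table can also be expressed as a simple lookup, which may help if you script an audit of who can do what. This is an illustrative model of the table above, not PageCrawl's internal implementation. The role keys and permission names (<code>manageTeam</code>, <code>editPages</code>, and so on) are hypothetical.</p>

```javascript
// Illustrative model of the role/permission table above.
// Keys such as "manageTeam" are hypothetical names, not a PageCrawl API.
const PERMISSIONS = {
  owner:         { manageTeam: true,  manageWorkspaces: true,  editPages: true,  viewPages: true },
  administrator: { manageTeam: true,  manageWorkspaces: true,  editPages: true,  viewPages: true },
  standard:      { manageTeam: false, manageWorkspaces: false, editPages: true,  viewPages: true },
  viewer:        { manageTeam: false, manageWorkspaces: false, editPages: false, viewPages: true },
};

// Returns true only for known roles and permissions; anything else is denied.
function can(role, permission) {
  return PERMISSIONS[role]?.[permission] === true;
}

console.log(can("standard", "editPages")); // true
console.log(can("viewer", "editPages"));   // false
console.log(can("guest", "viewPages"));    // false (unknown role)
```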
<h3>Owner</h3>
<p>Each team has exactly one Owner (the account creator). The Owner has full control over all team settings, billing, and member management. Ownership cannot be transferred or removed.</p>
<h3>Administrator</h3>
<p>Administrators can manage the team on behalf of the Owner:</p>
<ul>
<li>Invite and remove team members</li>
<li>Change member roles</li>
<li>Assign workspace access to members</li>
<li>Create and delete workspaces</li>
<li>Edit all team and workspace settings (notifications, integrations, AI, etc.)</li>
<li>Full access to all workspaces</li>
</ul>
<h3>Standard User</h3>
<p>Standard Users can work within their assigned workspaces:</p>
<ul>
<li>View and edit monitored pages in assigned workspaces</li>
<li>Create new pages and tracked elements</li>
<li>Review changes and leave feedback</li>
<li>Access all monitoring features within their workspaces</li>
</ul>
<p>Standard Users cannot invite members, change roles, or access workspaces they haven't been assigned to.</p>
<h3>Viewer</h3>
<p>Viewers have read-only access to their assigned workspaces:</p>
<ul>
<li>View monitored pages and detected changes</li>
<li>Browse change history and reports</li>
<li>Cannot create, edit, or delete pages</li>
<li>Cannot modify any settings</li>
</ul>
<h3>Managing Team Members</h3>
<p>To manage roles and access:</p>
<ol>
<li>Go to <strong>Settings</strong> &gt; <strong>Team</strong> &gt; <strong>Users</strong></li>
<li>View the member list showing name, email, workspaces, and role</li>
<li>Click a member's role to change it (Owner and Administrator only)</li>
<li>Click <strong>Update</strong> in the Workspaces column to assign or revoke workspace access</li>
</ol>
<h3>Inviting New Members</h3>
<ol>
<li>Go to <strong>Settings</strong> &gt; <strong>Team</strong> &gt; <strong>Users</strong></li>
<li>Click <strong>Invite User</strong></li>
<li>Enter their email address and select a role</li>
</ol>
<p>Invites expire after 2 weeks. You can resend an invite if needed.</p>
<h3>Workspace Access</h3>
<p>Standard Users and Viewers only see the workspaces they've been assigned to, while Owners and Administrators have access to all workspaces. Administrators can assign workspace access per user. If all workspace access is removed from a user, they are removed from the team entirely.</p>
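<p>The visibility and removal rules can be sketched as a pair of functions. This is a hypothetical model for illustration, not PageCrawl's implementation, and it makes three assumptions:</p>
<ul>
<li>A member has the shape <code>{ role, workspaces }</code>.</li>
<li>Owners and Administrators see every workspace, per the Administrator section.</li>
<li>The removal rule applies only to Standard Users and Viewers.</li>
</ul>

```javascript
// Hypothetical member shape: { role: "standard", workspaces: ["Client A"] }.
// Roles: "owner", "administrator", "standard", "viewer".
function visibleWorkspaces(member, allWorkspaces) {
  if (member.role === "owner" || member.role === "administrator") {
    return [...allWorkspaces]; // full access to all workspaces
  }
  return allWorkspaces.filter(ws => member.workspaces.includes(ws));
}

// Revoke one workspace. Returns the updated member, or null when no
// assignments remain (the member is removed from the team).
// Assumption: only Standard Users and Viewers are affected by this rule.
function revokeWorkspace(member, workspace) {
  if (member.role === "owner" || member.role === "administrator") {
    return member;
  }
  const remaining = member.workspaces.filter(ws => ws !== workspace);
  return remaining.length === 0 ? null : { ...member, workspaces: remaining };
}

const viewer = { role: "viewer", workspaces: ["Client A"] };
console.log(visibleWorkspaces(viewer, ["Client A", "Client B"])); // [ 'Client A' ]
console.log(revokeWorkspace(viewer, "Client A"));                  // null
```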
<p>This means you can have team members who only see specific projects, clients, or departments without exposure to other workspaces.</p>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Advanced Configuration Options for Power Users]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/features/article/advanced-configuration" />
            <id>https://pagecrawl.io/81</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Advanced Configuration Options for Power Users</h1>
<p>PageCrawl offers advanced configuration options for users who need fine-grained control over their monitoring setup. This guide covers the key power-user features.</p>
<h3>Power User Mode</h3>
<p>When editing a monitored page, you can enable <strong>Power User</strong> mode using the toggle in the page settings. This reveals additional settings that are hidden by default to keep the interface clean for everyday use.</p>
<p>With Power User mode enabled, you get access to:</p>
<ul>
<li><strong>Engine selection</strong> - Choose between the default browser engine, Stealth Mode (for sites that block bots), or Fast mode (optimized for static pages)</li>
<li><strong>Intelligent Reconnect</strong> - Automatically retry failed checks with a different approach</li>
<li><strong>Custom User Agent</strong> - Set a specific browser user agent string</li>
<li><strong>Custom Headers</strong> - Add custom HTTP headers to requests</li>
<li><strong>Custom JavaScript</strong> - Run JavaScript code before or after page load</li>
<li><strong>Device emulation</strong> - Emulate specific device viewports</li>
</ul>
<p>Power User settings are marked with a special icon throughout the edit form so you can easily identify them.</p>
<div class="kb-figure">
  <img src="/images/knowledge/settings-power-user.png" alt="Power User Settings enabled in the page editor, revealing Engine, Intelligent Reconnect, Device Simulation, User-Agent, Request Headers, and Custom Proxies options">
</div>
<h3>Advanced Mode vs Simple Mode</h3>
<p>PageCrawl offers two ways to add and edit monitored pages:</p>
<p><strong>Simple Mode</strong> (default) guides you through setup step by step. It auto-detects the best settings, shows a live preview, and covers the most common use cases. Best for getting started quickly.</p>
<p><strong>Advanced Mode</strong> gives you full control over every setting in a single form. Use it when you need to:</p>
<ul>
<li>Track multiple elements on the same page simultaneously</li>
<li>Configure complex action sequences</li>
<li>Set up templates or apply existing ones</li>
<li>Fine-tune notification conditions per element</li>
<li>Work with custom selectors, thresholds, and comparison methods</li>
</ul>
<p>You can switch to Advanced Mode from the Simple Mode page by clicking the "Advanced setup" link at the bottom. If you prefer to always use Advanced Mode, check the "Always show Advanced Setup" option.</p>
<h3>Multiple Tracked Elements</h3>
<p>Each monitored page can track multiple elements simultaneously, each with its own comparison method:</p>
<table>
<thead>
<tr>
<th>Type</th>
<th>What It Tracks</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Full Page</strong></td>
<td>Entire page text content</td>
</tr>
<tr>
<td><strong>Text</strong></td>
<td>Text content of a specific element (by CSS/XPath selector)</td>
</tr>
<tr>
<td><strong>Number</strong></td>
<td>Numeric values with configurable change thresholds</td>
</tr>
<tr>
<td><strong>Price</strong></td>
<td>Price values with currency detection</td>
</tr>
<tr>
<td><strong>Availability</strong></td>
<td>In-stock/out-of-stock status</td>
</tr>
<tr>
<td><strong>Links</strong></td>
<td>All outgoing links on the page</td>
</tr>
<tr>
<td><strong>Visual</strong></td>
<td>Visual screenshot comparison with diff percentage</td>
</tr>
<tr>
<td><strong>HTML</strong></td>
<td>Raw HTML structure of an element</td>
</tr>
<tr>
<td><strong>Boolean</strong></td>
<td>Presence or absence of an element</td>
</tr>
<tr>
<td><strong>Feed/List</strong></td>
<td>RSS, Atom, or other feed content</td>
</tr>
<tr>
<td><strong>Rating</strong></td>
<td>Star ratings or review scores</td>
</tr>
<tr>
<td><strong>Reviews</strong></td>
<td>Customer review text and metadata</td>
</tr>
<tr>
<td><strong>JavaScript</strong></td>
<td>Values extracted by running custom JavaScript</td>
</tr>
<tr>
<td><strong>SEO Tags</strong></td>
<td>Meta tags, Open Graph data, and structured data</td>
</tr>
<tr>
<td><strong>PDF</strong></td>
<td>Text content extracted from PDF files</td>
</tr>
<tr>
<td><strong>Word</strong></td>
<td>Text content extracted from Word documents</td>
</tr>
<tr>
<td><strong>Excel</strong></td>
<td>Data extracted from Excel spreadsheets</td>
</tr>
<tr>
<td><strong>CSV</strong></td>
<td>Data extracted from CSV files</td>
</tr>
<tr>
<td><strong>PowerPoint</strong></td>
<td>Text content extracted from PowerPoint presentations</td>
</tr>
</tbody>
</table>
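<p>The <strong>Number</strong> type mentions configurable change thresholds. PageCrawl's exact threshold options are not described here. To illustrate what such a threshold means, this sketch only reports a change when it exceeds both an absolute and a percentage limit. The function name and the <code>absolute</code>/<code>percent</code> options are assumptions, not PageCrawl settings.</p>

```javascript
// Hypothetical illustration of a numeric change threshold. The option names
// "absolute" and "percent" are assumptions, not PageCrawl settings.
function exceedsThreshold(previous, current, { absolute = 0, percent = 0 } = {}) {
  const delta = Math.abs(current - previous);
  // Percentage change relative to the previous value; any move away from 0 counts as infinite.
  const pctChange = previous === 0
    ? (delta === 0 ? 0 : Infinity)
    : (delta / Math.abs(previous)) * 100;
  // Report a change only when it is larger than both thresholds.
  return delta > absolute && pctChange > percent;
}

console.log(exceedsThreshold(100, 104, { percent: 5 })); // false (4% change)
console.log(exceedsThreshold(100, 106, { percent: 5 })); // true (6% change)
```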
<p>Each tracked element can have its own set of <a href="/help/features/article/perform-actions">actions</a> and comparison settings.</p>
<div class="kb-figure">
  <img src="/images/knowledge/settings-tracked-elements.png" alt="Multiple tracked elements configured on a single page, each with its own type and comparison method">
</div>
<h3>Templates</h3>
<p>Templates let you save a monitoring configuration and apply it to multiple pages automatically. This is especially useful when combined with <a href="/help/features/article/page-discovery">Page Discovery</a> for auto-monitoring newly discovered pages.</p>
<p>To create a template:</p>
<ol>
<li>Go to <strong>Settings</strong> &gt; <strong>Workspace</strong> &gt; <strong>Templates</strong></li>
<li>Enter a sample URL to auto-fill settings</li>
<li>Configure tracked elements, actions, check frequency, and notifications</li>
<li>Save the template</li>
</ol>
<div class="kb-figure">
  <img src="/images/knowledge/settings-template-create.png" alt="Template Details form with a template label, sample URL, check frequency, and proxy location">
</div>
<p>Templates can also define URL filters for page discovery, so new pages matching your criteria are automatically monitored with the template's settings.</p>
<h3>Bulk Editing</h3>
<p>Edit settings across multiple pages at once:</p>
<ol>
<li>Select pages from your page list using the checkboxes</li>
<li>Click <strong>Bulk Edit</strong> in the toolbar</li>
<li>Choose what to change: check frequency, engine, proxy, actions, notifications, tags, or folder</li>
<li>Apply changes to all selected pages</li>
</ol>
<div class="kb-figure">
  <img src="/images/knowledge/bulk-edit.png" alt="Tracked pages with rows selected and the Bulk actions menu open">
</div>
<p>Available on paid plans.</p>
<h3>AI Configuration</h3>
<p>Configure AI-powered change analysis per workspace:</p>
<ol>
<li>Go to <strong>Settings</strong> &gt; <strong>Workspace</strong> &gt; <strong>Integrations</strong> &gt; <strong>AI</strong></li>
<li>Choose your AI provider (OpenAI, Gemini, or Anthropic)</li>
<li>Select a model</li>
<li>Optionally set focus areas to guide the AI on what changes matter most</li>
</ol>
<div class="kb-figure">
  <img src="/images/knowledge/integ-ai-settings.png" alt="AI Features Configuration in workspace integrations with provider, model, and credit usage">
</div>
<p>Each plan includes monthly AI credits. You can also bring your own API key (BYOK) for unlimited usage. See <a href="/help/integrations/article/ai-byok-setup-guide">AI BYOK Setup</a> for details.</p>
<h3>Custom Check Scheduling</h3>
<p>Control exactly when PageCrawl checks your pages:</p>
<ol>
<li>Go to <strong>Settings</strong> &gt; <strong>Workspace</strong> &gt; <strong>Schedule</strong></li>
<li>Set active monitoring hours (e.g., business hours only)</li>
<li>Choose which days of the week to run checks</li>
<li>Set the workspace timezone</li>
</ol>
<div class="kb-figure">
  <img src="/images/knowledge/settings-schedule.png" alt="Workspace schedule settings with quick presets, day selection, an active-hours slider, and a schedule summary">
</div>
<p>This helps reduce unnecessary checks during off-hours and keeps your check quota focused on the times that matter.</p>
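<p>To illustrate how active hours, days, and a timezone combine, here is a minimal sketch that decides whether a given moment falls inside a schedule. It is not PageCrawl's scheduler. The schedule shape (<code>days</code>, <code>startHour</code>, <code>endHour</code>, <code>timeZone</code>) is a hypothetical stand-in for the settings above. The sketch uses the standard <code>Intl.DateTimeFormat</code> API to read the weekday and hour in the workspace timezone.</p>

```javascript
// Hypothetical schedule shape: days use three-letter English names,
// hours are 0-23 in the workspace timezone, and endHour is exclusive.
const schedule = {
  days: ["Mon", "Tue", "Wed", "Thu", "Fri"],
  startHour: 9,
  endHour: 17,
  timeZone: "America/New_York",
};

function isWithinSchedule(date, { days, startHour, endHour, timeZone }) {
  // Read the weekday and hour as they appear in the workspace timezone.
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    weekday: "short",
    hour: "numeric",
    hourCycle: "h23",
  }).formatToParts(date);
  const weekday = parts.find(p => p.type === "weekday").value;
  const hour = Number(parts.find(p => p.type === "hour").value);
  return days.includes(weekday) && hour >= startHour && hour < endHour;
}

console.log(isWithinSchedule(new Date("2026-06-19T14:00:00Z"), schedule)); // true (Friday 10:00 in New York)
```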
<h3>Global Filters</h3>
<p>Apply text filters across all pages in a workspace:</p>
<ol>
<li>Go to <strong>Settings</strong> &gt; <strong>Workspace</strong> &gt; <strong>General</strong></li>
<li>Add global ignored text patterns</li>
<li>These patterns are excluded from change detection on every page in the workspace</li>
</ol>
<p>Useful for filtering out dynamic content like timestamps, ad copy, or session IDs that appear across many pages.</p>
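<p>One way to picture an ignored pattern: it is removed from both the old and the new text before they are compared, so a change that only touches ignored content is not reported. The sketch below illustrates that idea with regular expressions. The pattern syntax PageCrawl accepts may differ. The names <code>stripIgnored</code> and <code>hasMeaningfulChange</code>, and the two sample patterns, are hypothetical.</p>

```javascript
// Hypothetical illustration of global ignore patterns. Each pattern is removed
// from both versions before comparison, so edits inside ignored text don't count.
const ignoredPatterns = [
  /Last updated:?\s*[\d\/:\s-]+(AM|PM)?/gi, // timestamps
  /session(?:id)?=[\w-]+/gi,                // session IDs in URLs or text
];

function stripIgnored(text, patterns) {
  let result = text;
  for (const pattern of patterns) {
    result = result.replace(pattern, "");
  }
  // Collapse whitespace left behind so spacing differences are not reported.
  return result.replace(/\s+/g, " ").trim();
}

function hasMeaningfulChange(oldText, newText, patterns) {
  return stripIgnored(oldText, patterns) !== stripIgnored(newText, patterns);
}

console.log(hasMeaningfulChange(
  "Price: $10. Last updated: 2026-06-19 10:15 AM",
  "Price: $10. Last updated: 2026-06-20 08:00 PM",
  ignoredPatterns,
)); // false: only the ignored timestamp changed
```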
<h3>Proxy Configuration</h3>
<p>Choose where PageCrawl checks your pages from:</p>
<ul>
<li><strong>Default</strong> - Automatic server selection</li>
<li><strong>Proxy Pool</strong> - Use your own proxies (managed under <strong>Settings</strong> &gt; <strong>Proxy Pools</strong>) for pages behind firewalls or geo-restrictions</li>
<li><strong>Location-specific</strong> - Select from available proxy locations (London, New York, San Francisco, Toronto, Frankfurt, Tel Aviv)</li>
<li><strong>Residential</strong> - Use residential IP addresses for pages that block datacenter IPs</li>
</ul>
<p>Configure per page or apply via bulk edit.</p>]]>
            </summary>
                                    <updated>2026-06-19T09:27:55+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[JavaScript Tracked Elements and Custom JavaScript Actions]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/features/article/javascript-tracking-and-actions" />
            <id>https://pagecrawl.io/82</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>JavaScript Tracked Elements and Custom JavaScript Actions</h1>
<p>PageCrawl lets you use JavaScript in two powerful ways: as a <strong>tracked element</strong> to extract and monitor computed values, and as a <strong>custom action</strong> to manipulate the page before monitoring. Both run JavaScript directly in the browser context with full access to the DOM.</p>
<h3>JavaScript Tracked Element</h3>
<p>A JavaScript tracked element lets you execute JavaScript code on a page and monitor the return value for changes. This is useful when the data you want to track is not directly accessible via CSS or XPath selectors: for example, computed values, data attributes, or content that requires logic to extract.</p>
<p><strong>How to set it up:</strong></p>
<ol>
<li>Add a new tracked element to your monitored page</li>
<li>Select <strong>JavaScript</strong> as the element type</li>
<li>Enter your JavaScript code in the code field</li>
<li>The return value of your code becomes the monitored content</li>
</ol>
<div class="kb-figure">
  <img src="/images/knowledge/js-code-editor.png" alt="Tracked Elements section in Advanced mode with JavaScript selected as the type and a multi-line code snippet entered in the code editor, plus a Test button">
</div>
<p><strong>How it works:</strong> Your JavaScript code runs directly in the browser, giving it full access to the page's DOM, window object, and all standard browser APIs. The return value is captured and compared against the previous check to detect changes.</p>
<p><strong>Examples:</strong></p>
<p>Extract the page title:</p>
<pre><code class="language-javascript">document.title</code></pre>
<p>Get text from a specific element:</p>
<pre><code class="language-javascript">document.querySelector('.status-badge').innerText</code></pre>
<p>Count the number of items in a list:</p>
<pre><code class="language-javascript">document.querySelectorAll('.job-listing').length</code></pre>
<p>Extract a data attribute:</p>
<pre><code class="language-javascript">document.querySelector('[data-version]').getAttribute('data-version')</code></pre>
<p>Combine multiple values into one:</p>
<pre><code class="language-javascript">Array.from(document.querySelectorAll('.feature-list li')).map(el =&gt; el.textContent.trim()).join(', ')</code></pre>
<p>Extract JSON-LD structured data:</p>
<pre><code class="language-javascript">JSON.parse(document.querySelector('script[type="application/ld+json"]').textContent).name</code></pre>
<p>Count words on a page:</p>
<pre><code class="language-javascript">document.body.innerText.split(/\s+/).filter(w =&gt; w.length &gt; 0).length</code></pre>
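<p>The return value is compared against the previous check, so whitespace-only differences in extracted text can register as changes. A small normalization step can reduce that noise. The sketch below is a plain function you could paste into a tracked element. <code>normalizeText</code> is a hypothetical name, and the <code>.status-badge</code> selector is reused from the example above.</p>

```javascript
// Collapse runs of whitespace (spaces, tabs, newlines, non-breaking spaces)
// into single spaces and trim the ends, so layout-only changes don't register.
function normalizeText(value) {
  return String(value ?? "").replace(/\s+/g, " ").trim();
}

// As a tracked element, define it inside an IIFE and return the normalized value:
// (() => {
//   const normalizeText = v => String(v ?? "").replace(/\s+/g, " ").trim();
//   return normalizeText(document.querySelector('.status-badge')?.innerText);
// })()
```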
<h3>Advanced Examples</h3>
<p>For multi-line logic, wrap your code in an immediately invoked function expression (IIFE) and <code>return</code> the value you want to monitor:</p>
<p>Extract a software version number from a release page:</p>
<pre><code class="language-javascript">(() =&gt; {
  const text = document.querySelector('.release-header, [class*="version"]')?.textContent || '';
  const match = text.match(/v?(\d+\.\d+\.\d+)/);
  return match ? match[1] : 'Version not found';
})()</code></pre>
<p>Build a summary from a table:</p>
<pre><code class="language-javascript">(() =&gt; {
  const rows = document.querySelectorAll('table tbody tr');
  return Array.from(rows).map(row =&gt; {
    const cells = row.querySelectorAll('td');
    return Array.from(cells).map(c =&gt; c.textContent.trim()).join(' | ');
  }).join('\n');
})()</code></pre>
<p>Count job listings by department:</p>
<pre><code class="language-javascript">(() =&gt; {
  const jobs = document.querySelectorAll('.job-listing');
  const departments = {};
  jobs.forEach(job =&gt; {
    const dept = job.querySelector('.department')?.textContent.trim() || 'Other';
    departments[dept] = (departments[dept] || 0) + 1;
  });
  return Object.entries(departments).map(([k, v]) =&gt; `${k}: ${v}`).join('\n');
})()</code></pre>
<p>Extract all outbound links from a page:</p>
<pre><code class="language-javascript">(() =&gt; {
  const host = window.location.hostname;
  const links = Array.from(document.querySelectorAll('a[href]'))
    .map(a =&gt; a.href)
    // a.href is already absolute, so compare parsed hostnames rather than substrings
    .filter(href =&gt; /^https?:/.test(href) &amp;&amp; new URL(href).hostname !== host);
  return [...new Set(links)].join('\n');
})()</code></pre>
<p>Monitor the number of open issues or pull requests:</p>
<pre><code class="language-javascript">(() =&gt; {
  const text = document.querySelector('[data-tab-item="issues"] .Counter, .issues-count')?.textContent.trim();
  return text ? parseInt(text.replace(/,/g, ''), 10) : 'Not found';
})()</code></pre>
<p>Extract and format event dates from a schedule page:</p>
<pre><code class="language-javascript">(() =&gt; {
  const events = document.querySelectorAll('.event-item, .schedule-row');
  return Array.from(events).map(ev =&gt; {
    const date = ev.querySelector('.date, time')?.textContent.trim();
    const title = ev.querySelector('.title, .event-name')?.textContent.trim();
    return `${date}: ${title}`;
  }).join('\n');
})()</code></pre>
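<p>The one-line JSON-LD example above assumes that the first <code>application/ld+json</code> script holds a single object with a top-level <code>name</code>. Many sites publish an array, an <code>@graph</code> list, or several scripts instead. The sketch below is more defensive. It uses a hypothetical helper name, <code>jsonLdName</code>, and takes an optional schema.org type filter.</p>

```javascript
// Return the "name" of the first JSON-LD node (optionally of a given @type),
// handling a single object, an array of objects, or an object with "@graph".
function jsonLdName(jsonText, type) {
  let data;
  try {
    data = JSON.parse(jsonText);
  } catch {
    return null; // malformed or empty JSON-LD
  }
  const nodes = (Array.isArray(data) ? data : [data])
    .flatMap(node => (Array.isArray(node?.["@graph"]) ? node["@graph"] : [node]));
  // "@type" may be a string or an array of strings.
  const match = nodes.find(node =>
    node && node.name && (!type || [].concat(node["@type"]).includes(type)));
  return match ? match.name : null;
}

// Browser usage in a tracked element (define jsonLdName inside the same IIFE):
//   return Array.from(document.querySelectorAll('script[type="application/ld+json"]'))
//     .map(s => jsonLdName(s.textContent, 'Product'))
//     .find(name => name) ?? 'Not found';
```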
<p><strong>Important notes:</strong></p>
<ul>
<li>Your code should return a value (string, number, or any value that can be converted to text)</li>
<li>If the return value is <code>null</code> or <code>undefined</code>, an empty string is stored</li>
<li>Errors in your code will cause the check to fail for that element</li>
<li>A <strong>30-second safety timeout</strong> applies. If the expression does not resolve within 30 seconds, the engine aborts it and the check fails for that element. Keep extraction logic synchronous when you can; if you need awaits or short waits, budget for the cap.</li>
<li>JavaScript tracked elements require a real browser engine (not compatible with Fast mode)</li>
</ul>
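<p>If you return several values at once, consider returning a JSON string rather than an object. The stored text stays readable, and you control exactly how it is serialized. This is a suggested pattern, not a documented PageCrawl requirement, and <code>stableStringify</code> is a hypothetical helper name. The sketch also sorts object keys. As a result, an object built in DOM order, like the <code>departments</code> tally in the department example above, serializes the same way when only the order of listings changes.</p>

```javascript
// Serialize a value to JSON with object keys sorted at every level, so the
// output text doesn't change just because keys were added in a different order.
function stableStringify(value) {
  return JSON.stringify(value, (key, val) => {
    if (val && typeof val === "object" && !Array.isArray(val)) {
      return Object.keys(val).sort().reduce((sorted, k) => {
        sorted[k] = val[k];
        return sorted;
      }, {});
    }
    return val;
  }, 2);
}

console.log(stableStringify({ b: 1, a: 2 }));
// {
//   "a": 2,
//   "b": 1
// }
```

<p>In the department example, you could paste the helper inside the IIFE and replace the final line with <code>return stableStringify(departments);</code>.</p>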
<h3>Custom JavaScript Actions</h3>
<p>Custom JavaScript actions let you run JavaScript code on the page as part of the action sequence, before the tracked elements are extracted. Use them for complex interactions that other action types (click, type, wait) cannot handle.</p>
<p><strong>How to set it up:</strong></p>
<ol>
<li>Open the page settings and go to the <strong>Actions</strong> section</li>
<li>Add a new action and select <strong>Custom JavaScript</strong></li>
<li>Enter your JavaScript code</li>
<li>The code runs during the check, before element extraction</li>
</ol>
<div class="kb-figure">
  <img src="/images/knowledge/action-types-dropdown.png" alt="Action type dropdown in the page editor where Custom JavaScript and other action types are selected">
</div>
<p><strong>How it works:</strong> The JavaScript runs in the browser context, similar to tracked elements. The key difference is that the return value is ignored. JavaScript actions are used for their side effects: modifying the DOM, triggering events, or setting up the page state needed for accurate monitoring.</p>
<p><strong>When to use JavaScript actions:</strong> PageCrawl has built-in actions for common tasks like clicking elements, typing text, scrolling, waiting, removing elements, and selecting dropdown options. Use JavaScript actions when you need to do something the built-in actions cannot handle, such as setting browser storage, dispatching custom events, modifying element properties, or running multi-step DOM manipulation.</p>
<h4>Recommended pattern: async IIFE</h4>
<p>For anything beyond a single statement, write your action as an immediately invoked async arrow function (an async IIFE). This is the recommended shape across the UI, MCP server, and public API because it lets you <code>await</code> between steps and keep a small <code>sleep</code> helper inline. As with any action, the return value is ignored; the function runs purely for its side effects.</p>
<pre><code class="language-javascript">(async () =&gt; {
  const sleep = ms =&gt; new Promise(r =&gt; setTimeout(r, ms));
  document.querySelector('#load-more')?.click();
  await sleep(800);
  document.querySelector('#load-more')?.click();
  await sleep(800);
})();</code></pre>
<p>This action clicks "Load more" twice, waiting briefly between clicks so the page can render the new rows before the next click and before extraction begins. <code>await sleep(800)</code> is the part a sync IIFE cannot do: it lets the next line wait until the previous step has had time to settle.</p>
<div class="kb-figure">
  <img src="/images/knowledge/js-action-editor.png" alt="Actions section in Advanced mode with a Custom JavaScript action selected and an async IIFE entered in the action code editor">
</div>
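<p>Sometimes you do not know in advance how many "Load more" clicks are needed. A loop with an explicit click limit keeps the action bounded by design rather than relying on the 30-second timeout. This is a sketch: <code>clickUntilGone</code>, the 10-click limit, and the 800 ms delay are hypothetical. It also assumes the button disappears or becomes disabled once everything has loaded.</p>
<pre><code class="language-javascript">// Hypothetical helper: click the button matched by `selector` until it is
// gone or disabled, stopping after `maxClicks` so the loop stays bounded.
async function clickUntilGone(selector, maxClicks = 10, delayMs = 800, root = document) {
  const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));
  let clicks = 0;
  for (let remaining = maxClicks; remaining > 0; remaining--) {
    const button = root.querySelector(selector);
    // Stop once the button is removed or disabled: everything is loaded.
    if (!button || button.disabled) break;
    button.click();
    clicks++;
    // Give the page time to render the new rows before the next click.
    await sleep(delayMs);
  }
  return clicks; // Ignored by the action engine; handy when testing.
}</code></pre>
<p>Define the helper inside your async IIFE and finish with <code>await clickUntilGone('#load-more');</code>. With the defaults, the worst case is 10 clicks with an 800 ms wait after each, about 8 seconds in total.</p>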
<div style="background:#f5f7fa;border:1px solid #e6e9ef;border-radius:8px;padding:14px 18px;margin:20px 0;">
  <strong>Note:</strong> Custom JavaScript actions and JavaScript tracked elements run with a 30-second safety timeout. If your code does not finish within 30 seconds, the engine aborts it, so a runaway <code>while (true)</code> or an indefinite poll loop cannot hang the check. For an action, the engine then continues with the next action; for a tracked element, the check fails for that element. Inside that window you can chain multiple awaits, polls, and event dispatches; just keep the total bounded. Separately, the overall check has its own plan-dependent timeout (Free 45 seconds, Standard 90 seconds, Enterprise and Ultimate 180 seconds). On the Free plan, for example, a single action that uses its full 30 seconds leaves only about 15 seconds for the rest of the check.
</div>
<p>The native action types (<code>click</code>, <code>wait</code>, <code>wait_text</code>, <code>type</code>, <code>select</code>) are still preferred for single-step interactions because they're shorter and the engine produces clearer logs. Reach for a custom JavaScript action when you need to sequence multiple steps with delays in between, dispatch framework events, or poll for an element that may take an unpredictable amount of time to appear.</p>
<p><strong>Examples:</strong></p>
<p>The examples below are single statements or synchronous IIFEs because each one performs a single, immediate side effect. Use the async IIFE pattern above whenever your action needs to wait, poll, or sequence multiple steps.</p>
<p>Set localStorage or sessionStorage to change page behavior:</p>
<pre><code class="language-javascript">localStorage.setItem('region', 'us-east')</code></pre>
<p>Set a cookie to bypass a language selector or A/B test:</p>
<pre><code class="language-javascript">document.cookie = 'lang=en; path=/; max-age=86400'</code></pre>
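<p>The cookie string above sets <code>max-age</code> in seconds (86400 = 24 × 60 × 60, one day). If a value can contain spaces, semicolons, or commas, it should be encoded before assignment. Below is a sketch with a hypothetical <code>cookieString</code> helper. It uses <code>encodeURIComponent</code>, which only works if the site decodes the value the same way.</p>
<pre><code class="language-javascript">// Hypothetical helper: build a string to assign to document.cookie.
// max-age is in seconds, so the default 86400 is one day.
function cookieString(name, value, maxAgeSeconds = 86400, path = '/') {
  return encodeURIComponent(name) + '=' + encodeURIComponent(value) +
    '; path=' + path + '; max-age=' + maxAgeSeconds;
}</code></pre>
<p>For example, <code>document.cookie = cookieString('lang', 'en');</code> produces the same assignment as the one-liner above.</p>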
<p>Replace dynamic content (session IDs, timestamps, random tokens) with static text to reduce false positives:</p>
<pre><code class="language-javascript">document.querySelectorAll('[data-session-id], .csrf-token, .nonce').forEach(el =&gt; el.textContent = '[REDACTED]')</code></pre>
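<p>Selector-based redaction works when dynamic values sit in their own elements. When they are embedded in ordinary text, a pattern-based pass over the page's text nodes is an alternative. This sketch is illustrative. The two patterns (ISO 8601 timestamps and UUIDs), the placeholder strings, and the names <code>normalizeDynamicText</code> and <code>normalizeTextNodes</code> are assumptions; adapt them to whatever actually changes on your page.</p>
<pre><code class="language-javascript">// Illustrative patterns for values that change on every load.
const DYNAMIC_PATTERNS = [
  // ISO 8601 timestamps such as 2026-06-15T22:16:51+00:00
  [/\d{4}-\d{2}-\d{2}T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2})?/g, '[TIMESTAMP]'],
  // UUIDs such as 123e4567-e89b-12d3-a456-426614174000
  [/\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b/gi, '[UUID]'],
];

// Replace every dynamic value in a string with its static placeholder.
function normalizeDynamicText(text) {
  return DYNAMIC_PATTERNS.reduce((out, [re, placeholder]) => out.replace(re, placeholder), text);
}

// Apply the normalization to every text node under `root` (browser only).
function normalizeTextNodes(root = document.body) {
  const walker = document.createTreeWalker(root, NodeFilter.SHOW_TEXT);
  for (let node = walker.nextNode(); node; node = walker.nextNode()) {
    node.nodeValue = normalizeDynamicText(node.nodeValue);
  }
}</code></pre>
<p>Call <code>normalizeTextNodes()</code> at the end of a synchronous IIFE. If you only need to handle dates, the built-in <strong>Remove dates</strong> action may be simpler.</p>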
<p>Trigger a framework event that a regular click action does not fire (e.g., React, Vue, Angular):</p>
<pre><code class="language-javascript">(() =&gt; {
  const input = document.querySelector('#search-input');
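  // React tracks the input's value itself; calling the native setter from the
  // prototype makes the 'input' event below register as a real change.
  const nativeInputValueSetter = Object.getOwnPropertyDescriptor(window.HTMLInputElement.prototype, 'value').set;
  nativeInputValueSetter.call(input, 'monitoring keywords');
  input.dispatchEvent(new Event('input', { bubbles: true }));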
})()</code></pre>
<p>Toggle a checkbox and dispatch both change and click events to satisfy form validation:</p>
<pre><code class="language-javascript">(() =&gt; {
  const checkbox = document.querySelector('#agree-terms');
  checkbox.checked = true;
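  checkbox.dispatchEvent(new Event('change', { bubbles: true }));
  // A plain Event (not a MouseEvent) reaches click listeners without
  // toggling the checkbox back off.
  checkbox.dispatchEvent(new Event('click', { bubbles: true }));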
})()</code></pre>
<p>Switch a page to a specific view mode by modifying URL parameters without a full reload:</p>
<pre><code class="language-javascript">(() =&gt; {
  const url = new URL(window.location);
  url.searchParams.set('view', 'list');
  url.searchParams.set('per_page', '100');
  window.history.replaceState({}, '', url);
  window.dispatchEvent(new PopStateEvent('popstate'));
})()</code></pre>
<p>Expand all collapsed sections at once on a FAQ or documentation page:</p>
<pre><code class="language-javascript">document.querySelectorAll('details:not([open])').forEach(el =&gt; el.setAttribute('open', ''))</code></pre>
<p>Remove inline styles that hide content behind a paywall or login wall:</p>
<pre><code class="language-javascript">(() =&gt; {
  document.querySelectorAll('.article-body, .content-area').forEach(el =&gt; {
    el.style.maxHeight = 'none';
    el.style.overflow = 'visible';
    el.classList.remove('truncated', 'blurred', 'paywall');
  });
  document.querySelectorAll('.paywall-overlay, .signup-gate').forEach(el =&gt; el.remove());
})()</code></pre>
<p><strong>Important notes:</strong></p>
<ul>
<li>Each action is subject to the <strong>30-second safety timeout</strong> described in the note above</li>
<li>In the default engine, JavaScript action errors are silently caught and the check continues. In Stealth mode, action errors will stop the remaining action sequence by default</li>
<li>Actions run after the page has loaded but before elements are extracted</li>
<li>You can chain multiple JavaScript actions with other action types (click, wait, type)</li>
<li>JavaScript actions require a real browser engine (not compatible with Fast mode)</li>
</ul>
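<p>In Stealth mode an error stops the remaining actions. An optional step, such as closing a banner that only appears sometimes, can therefore catch its own errors. Below is a sketch with a hypothetical <code>runOptionalStep</code> helper. Where its warning output ends up depends on the engine's logging, which this article does not cover.</p>
<pre><code class="language-javascript">// Hypothetical helper: run an optional step and swallow its errors so one
// missing element does not stop the rest of a Stealth-mode action sequence.
// Returns true if the step succeeded, false if it threw.
async function runOptionalStep(label, step) {
  try {
    await step();
    return true;
  } catch (err) {
    // Where console output is visible depends on the engine (an assumption).
    console.warn('Optional step "' + label + '" failed:', err);
    return false;
  }
}</code></pre>
<p>For example: <code>await runOptionalStep('close banner', () =&gt; document.querySelector('.banner-close').click());</code>, with <code>.banner-close</code> as a placeholder selector. Reserve this for steps that are genuinely optional. If the rest of the sequence depends on a step, letting its error stop the sequence is usually what you want.</p>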
<h3>Difference Between JavaScript Elements and Actions</h3>
<table>
<thead>
<tr>
<th></th>
<th>JavaScript Tracked Element</th>
<th>Custom JavaScript Action</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Purpose</strong></td>
<td>Extract and monitor a value</td>
<td>Manipulate the page before extraction</td>
</tr>
<tr>
<td><strong>Return value</strong></td>
<td>Captured and tracked for changes</td>
<td>Ignored</td>
</tr>
<tr>
<td><strong>Error handling</strong></td>
<td>Check fails if code errors</td>
<td>Default engine: errors silently ignored. Stealth mode: errors stop the action sequence</td>
</tr>
<tr>
<td><strong>When it runs</strong></td>
<td>During element extraction</td>
<td>Before element extraction (in action sequence)</td>
</tr>
<tr>
<td><strong>Use case</strong></td>
<td>"Get me this computed value"</td>
<td>"Set up the page so I can monitor it correctly"</td>
</tr>
</tbody>
</table>
<h3>Common Patterns</h3>
<p><strong>Set up, then monitor:</strong> Use a JavaScript action to prepare the page (e.g., click "Load more"), then use a regular Text or Full Page tracked element to capture the content. This is often simpler than writing a JavaScript tracked element.</p>
<p><strong>Normalize before compare:</strong> Use a JavaScript action to replace dynamic content (timestamps, session IDs, random values) with static placeholders, then track the normalized page content. This reduces false positives without needing global filters.</p>
<p><strong>Complex extraction:</strong> When the value you want to monitor requires logic (math, filtering, combining multiple elements), use a JavaScript tracked element instead of trying to target it with CSS selectors.</p>
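<p>For example, to monitor the cheapest offer among several sellers listed on one page, a tracked element can parse each price and return the minimum. This sketch assumes each element holds a single price written with a dot as the decimal separator and commas as thousands separators (as in <code>$1,299.00</code>). <code>lowestPrice</code> is a hypothetical name.</p>
<pre><code class="language-javascript">// Parse strings like "$1,299.00" into numbers (dot decimal separator
// assumed) and return the smallest; strings without digits are skipped.
function lowestPrice(texts) {
  const prices = texts
    .map(t => parseFloat(String(t).replace(/[^0-9.]/g, '')))
    .filter(n => Number.isFinite(n));
  // null is stored as an empty string when nothing could be parsed.
  return prices.length ? Math.min(...prices) : null;
}</code></pre>
<p>Inside a synchronous IIFE, return <code>lowestPrice([...document.querySelectorAll('.offer .price')].map(el =&gt; el.textContent))</code>, replacing <code>.offer .price</code> with your own selector. Prices written as <code>1.299,00</code> would need a different parsing rule.</p>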
<h3>What JavaScript Has Access To</h3>
<p>Your code runs in the browser page context with full access to:</p>
<ul>
<li><strong>DOM API</strong> - <code>document.querySelector()</code>, <code>document.body</code>, <code>document.title</code>, etc.</li>
<li><strong>Window object</strong> - <code>window.location</code>, <code>window.innerWidth</code>, <code>window.scrollTo()</code>, etc.</li>
<li><strong>Standard JavaScript</strong> - String methods, Array methods, Math, JSON, RegExp, etc.</li>
<li><strong>Browser APIs</strong> - <code>localStorage</code>, <code>sessionStorage</code>, <code>fetch()</code>, etc.</li>
<li><strong>Page state</strong> - Any JavaScript variables or functions defined by the page itself</li>
</ul>
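<p>As an example of DOM access beyond visible text, many product pages describe themselves with schema.org JSON-LD inside <code>&lt;script type="application/ld+json"&gt;</code> tags. The sketch below returns the first Product offer's availability. It assumes a top-level Product object, or an array of objects, each with a plain string <code>@type</code>; data nested under <code>@graph</code> would need an extra step. <code>availabilityFromJsonLd</code> is a hypothetical name.</p>
<pre><code class="language-javascript">// Return the availability of the first schema.org Product offer found in a
// list of JSON-LD strings, e.g. "InStock" or "OutOfStock", or null.
function availabilityFromJsonLd(jsonTexts) {
  for (const text of jsonTexts) {
    let data;
    try {
      data = JSON.parse(text);
    } catch {
      continue; // Skip malformed blocks instead of failing the check.
    }
    for (const item of [].concat(data)) {
      if (!item || item['@type'] !== 'Product') continue;
      const offer = [].concat(item.offers || [])[0];
      if (offer?.availability) {
        // Strip the vocabulary prefix: https://schema.org/InStock -> InStock
        return String(offer.availability).replace(/^https?:\/\/schema\.org\//, '');
      }
    }
  }
  return null;
}</code></pre>
<p>In a synchronous IIFE, return <code>availabilityFromJsonLd([...document.querySelectorAll('script[type="application/ld+json"]')].map(s =&gt; s.textContent))</code>.</p>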
<p>Your code does not have access to Node.js APIs or the file system.</p>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Monitoring Multiple Elements on a Page]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/features/article/monitoring-multiple-elements-on-page" />
            <id>https://pagecrawl.io/83</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Monitoring Multiple Elements on a Page</h1>
<p>PageCrawl lets you track multiple parts of the same page independently. Each tracked element gets its own comparison method, selector, label, and threshold, so you can monitor different sections of a page with the settings that make the most sense for each one.</p>
<h3>Why Track Multiple Elements</h3>
<p>Different parts of a page often change in different ways. For example, on a product page you might want to:</p>
<ul>
<li>Track the <strong>price</strong> using the Price element type so you are alerted when it goes up or down</li>
<li>Track the <strong>stock status</strong> using the Availability element type so you know when an item is back in stock</li>
<li>Track the <strong>product description</strong> as text so you catch content updates</li>
</ul>
<p>Each of these uses a dedicated element type designed for that kind of data, giving you more precise alerts and fewer false positives than tracking the entire page as a single unit.</p>
<h3>Supported Element Types</h3>
<p>Each tracked element can use one of these comparison types:</p>
<table>
<thead>
<tr>
<th>Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Full Page</strong></td>
<td>Tracks the entire visible page content</td>
</tr>
<tr>
<td><strong>Parse</strong></td>
<td>Uses an AI prompt to extract a specific value from the page in the format you ask for</td>
</tr>
<tr>
<td><strong>Text</strong></td>
<td>Extracts and compares text content from a CSS/XPath selector</td>
</tr>
<tr>
<td><strong>Number</strong></td>
<td>Extracts a numeric value for threshold-based comparison</td>
</tr>
<tr>
<td><strong>Price</strong></td>
<td>Specialized number extraction that handles currency symbols and formatting</td>
</tr>
<tr>
<td><strong>Availability</strong></td>
<td>Detects in-stock/out-of-stock status from common patterns</td>
</tr>
<tr>
<td><strong>Visual</strong></td>
<td>Compares screenshots of a specific element for visual changes</td>
</tr>
<tr>
<td><strong>HTML</strong></td>
<td>Compares the raw HTML of a selected element</td>
</tr>
<tr>
<td><strong>Boolean</strong></td>
<td>Checks whether an element exists or is visible on the page</td>
</tr>
<tr>
<td><strong>Links</strong></td>
<td>Extracts and compares all links within a selected area</td>
</tr>
<tr>
<td><strong>JavaScript</strong></td>
<td>Evaluates a custom JavaScript expression and tracks the return value</td>
</tr>
<tr>
<td><strong>Text (All Matches)</strong></td>
<td>Extracts text from all elements matching a selector</td>
</tr>
<tr>
<td><strong>Text (All Matches Sorted)</strong></td>
<td>Same as above, but sorted alphabetically for order-independent comparison</td>
</tr>
<tr>
<td><strong>HTML (All Matches)</strong></td>
<td>Extracts HTML from all elements matching a selector</td>
</tr>
</tbody>
</table>
<h3>How to Add Multiple Elements</h3>
<ol>
<li>Open the page you want to monitor and click <strong>Edit</strong></li>
<li>Switch to <strong>Advanced Mode</strong> using the toggle at the top of the editor</li>
<li>You will see your current tracked element listed</li>
<li>Click <strong>Add Element</strong> to add another tracked element</li>
<li>Configure each element with its own selector, type, label, and threshold</li>
<li>Save your changes</li>
</ol>
<div class="kb-figure">
  <img src="/images/knowledge/multi-elements-configured.png" alt="Tracked Elements section in Advanced Mode with six elements of different types: Full Page Text, Price, Availability, Visual, Number, and Parse, each with its own label, selector or prompt, and threshold">
</div>
<p>The example above shows a single product page configured with six tracked elements, each a different type: the whole page as <strong>Full Page Text</strong>, the <strong>Price</strong> (with availability tracking), the <strong>Availability</strong> state from the Add to Cart button, the <strong>Visual</strong> product gallery, the <strong>Number</strong> of reviews, and a <strong>Parse</strong> element that uses an AI prompt to extract the battery life and connectivity specifications. Each one compares independently and triggers its own alert.</p>
<h3>Simple vs Advanced Mode</h3>
<ul>
<li><strong>Simple Mode</strong> tracks a single element on the page. This is the default for new monitors and is the easiest way to get started.</li>
<li><strong>Advanced Mode</strong> unlocks the ability to track multiple elements. Switch to Advanced Mode using the toggle in the page editor.</li>
</ul>
<p>Once you add more than one tracked element, the monitor stays in Advanced Mode. To return to Simple Mode, remove the extra elements first so only one remains.</p>
<h3>Per-Element Settings</h3>
<p>Each tracked element has its own independent settings:</p>
<ul>
<li><strong>Label</strong> - A descriptive name for the element (e.g., "Product Price", "Stock Status")</li>
<li><strong>Selector</strong> - A CSS selector or XPath expression that identifies the element on the page</li>
<li><strong>Type</strong> - The comparison method to use (text, number, visual, etc.)</li>
<li><strong>Threshold</strong> - How much the value needs to change before triggering a notification</li>
<li><strong>Include hidden text</strong> - Whether to include text from elements hidden via CSS</li>
</ul>
<h3>Click-to-Select</h3>
<p>You do not need to write CSS selectors or XPath expressions manually. Use the visual selector tool to click on elements directly on the page. PageCrawl generates the appropriate selector for you automatically.</p>
<h3>Use Cases</h3>
<p><strong>Product page monitoring</strong> - Use the Price element type for the product price, the Availability element type for stock status, and a Text element for the product description. Each triggers its own alert so you know exactly what changed.</p>
<p><strong>Content sections and sidebar tracking</strong> - Monitor the main article content as text and the sidebar navigation as HTML. Catch content updates without being distracted by layout changes.</p>
<p><strong>Multi-section compliance monitoring</strong> - Track terms of service, privacy policy sections, and legal disclaimers as separate elements on the same page. Each section triggers its own alert when updated.</p>
<h3>Related Articles</h3>
<ul>
<li><a href="/help/features/article/advanced-configuration">Advanced Configuration</a></li>
<li><a href="/help/features/article/available-tracked-monitoring-types">Available Tracked Types</a></li>
<li><a href="/help/features/article/perform-actions">Perform Actions</a></li>
</ul>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Perform Actions: Automate Browser Interactions Before Monitoring]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/features/article/perform-actions" />
            <id>https://pagecrawl.io/84</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Perform Actions: Automate Browser Interactions Before Monitoring</h1>
<p>Actions are tasks that PageCrawl executes in the browser before taking a page snapshot. They let you automate interactions like dismissing cookie banners, clicking tabs, logging in, scrolling to load content, or waiting for dynamic elements to appear.</p>
<p>Actions are configured per tracked element and execute in order from top to bottom.</p>
<h3>Where to Configure Actions</h3>
<p>Open any monitored page and click <strong>Edit</strong>. In the page configuration form, find the <strong>Actions</strong> section. Click <strong>Add Action</strong> to add a new action, then select the action type from the dropdown.</p>
<div class="kb-figure">
  <img src="/images/knowledge/settings-actions.png" alt="Actions section in the page configuration form with Block cookie banners and Hide overlays actions and the Add Action button">
</div>
<h3>Available Actions</h3>
<h4>Error Handling</h4>
<table>
<thead>
<tr>
<th>Action</th>
<th>What It Does</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Mark as failed</strong></td>
<td>Mark the check as failed when conditions are met (page inaccessible, contains specific text, etc.)</td>
</tr>
</tbody>
</table>
<h4>Block and Hide</h4>
<table>
<thead>
<tr>
<th>Action</th>
<th>What It Does</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Block cookie banners &amp; ads</strong></td>
<td>Automatically hide cookie consent banners and block ads</td>
</tr>
<tr>
<td><strong>Hide website overlays &amp; popups</strong></td>
<td>Hide overlays and popups that cover the page content</td>
</tr>
<tr>
<td><strong>Remove dates</strong></td>
<td>Replace dates with "[DATE REMOVED]" to prevent false positives</td>
</tr>
<tr>
<td><strong>Remove element</strong></td>
<td>Remove a specific element by CSS or XPath selector</td>
</tr>
<tr>
<td><strong>Remove text</strong></td>
<td>Remove elements containing specific text</td>
</tr>
</tbody>
</table>
<h4>Wait</h4>
<table>
<thead>
<tr>
<th>Action</th>
<th>What It Does</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Wait for text</strong></td>
<td>Wait up to 15 seconds for specific text to appear on the page</td>
</tr>
<tr>
<td><strong>Wait for text to disappear</strong></td>
<td>Wait up to 15 seconds for specific text to disappear</td>
</tr>
<tr>
<td><strong>Wait for element</strong></td>
<td>Wait for an element (by XPath or CSS selector) to appear</td>
</tr>
<tr>
<td><strong>Wait for redirect</strong></td>
<td>Wait for the page to redirect to a new URL</td>
</tr>
<tr>
<td><strong>Wait</strong></td>
<td>Pause for a specified number of seconds</td>
</tr>
</tbody>
</table>
<h4>Interact</h4>
<table>
<thead>
<tr>
<th>Action</th>
<th>What It Does</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Click button</strong></td>
<td>Click an element containing specific text</td>
</tr>
<tr>
<td><strong>Click element</strong></td>
<td>Click any element by CSS or XPath selector</td>
</tr>
<tr>
<td><strong>Click at coordinates</strong></td>
<td>Click at specific X/Y pixel coordinates</td>
</tr>
<tr>
<td><strong>Hover</strong></td>
<td>Hover over an element</td>
</tr>
<tr>
<td><strong>Type text</strong></td>
<td>Type text into an input field</td>
</tr>
<tr>
<td><strong>Select option</strong></td>
<td>Select an option from a dropdown</td>
</tr>
<tr>
<td><strong>Submit form</strong></td>
<td>Submit a form</td>
</tr>
<tr>
<td><strong>Scroll to bottom</strong></td>
<td>Scroll the page to the bottom (useful for lazy-loaded content)</td>
</tr>
<tr>
<td><strong>Go back</strong></td>
<td>Navigate back in browser history</td>
</tr>
<tr>
<td><strong>Reveal hidden text</strong></td>
<td>Make hidden text visible. Has two modes: "Expandable Sections Only" (expands collapsible sections and accordions) and "All invisible text" (reveals all hidden text on the page)</td>
</tr>
</tbody>
</table>
<h4>Advanced</h4>
<table>
<thead>
<tr>
<th>Action</th>
<th>What It Does</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Disable JavaScript</strong></td>
<td>Disable JavaScript before the page loads</td>
</tr>
<tr>
<td><strong>Set cookie</strong></td>
<td>Set or manage browser cookies</td>
</tr>
<tr>
<td><strong>Execute JavaScript</strong></td>
<td>Run custom JavaScript code on the page</td>
</tr>
<tr>
<td><strong>Store Contents for Tracked Element</strong></td>
<td>Store a tracked element's value at this point in the action sequence, useful when the element is only visible after a specific interaction</td>
</tr>
<tr>
<td><strong>Handle CAPTCHA</strong></td>
<td>Interact with CAPTCHA challenges</td>
</tr>
</tbody>
</table>
<h3>Common Use Cases</h3>
<p><strong>Dismiss cookie banners</strong>: Add a "Block cookie banners &amp; ads" action to automatically hide consent popups and ads that can trigger false change notifications.</p>
<p><strong>Load lazy content</strong>: Add "Scroll to bottom" followed by "Wait" (2-3 seconds) to load content that only appears when scrolling.</p>
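<p>On infinite-scroll pages, one <strong>Scroll to bottom</strong> may load only one more batch. An <strong>Execute JavaScript</strong> action can repeat the scroll until the page height stops growing. In this sketch, <code>scrollUntilStable</code>, the 6-round limit, and the 1.5-second delay are hypothetical. They are chosen so the worst case (about 9 seconds) stays well inside the 30-second limit for custom JavaScript.</p>
<pre><code class="language-javascript">// Hypothetical helper: scroll to the bottom, wait, and repeat until the
// page height stops growing or maxRounds scrolls have been made.
async function scrollUntilStable(maxRounds = 6, delayMs = 1500, page = {
  height: () => document.documentElement.scrollHeight,
  scrollTo: y => window.scrollTo(0, y),
}) {
  const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));
  let previousHeight = -1;
  let scrolls = 0;
  while (maxRounds > scrolls) {
    const height = page.height();
    if (height === previousHeight) break; // Nothing new loaded: stop early.
    previousHeight = height;
    page.scrollTo(height);
    scrolls++;
    await sleep(delayMs); // Let lazy-loaded content arrive.
  }
  return scrolls; // Ignored by the engine; useful for testing.
}</code></pre>
<p>Define it inside an async IIFE and finish with <code>await scrollUntilStable();</code>.</p>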
<p><strong>Navigate to a tab or section</strong>: Add a "Click element" action with the CSS selector of the tab you want to monitor.</p>
<p><strong>Log in to a page</strong>: Add "Type text" actions for username and password fields, followed by "Click button" to submit the login form.</p>
<p><strong>Wait for dynamic content</strong>: Add "Wait for text" with the text that appears after the page finishes loading (e.g., "Showing results").</p>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Review Board: Organize and Track Page Changes]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/features/article/review-board" />
            <id>https://pagecrawl.io/85</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Review Board: Organize and Track Page Changes</h1>
<p>The Review Board is a Kanban-style board that helps you organize and track detected changes across your monitored pages. Instead of reviewing changes one by one, you can drag and drop change cards between customizable lanes to manage your review workflow.</p>
<h3>Accessing the Review Board</h3>
<p>Navigate to the <strong>Review</strong> tab in the main sidebar to open the board.</p>
<div class="kb-figure kb-figure--flush">
  <img src="/images/knowledge/review-board.png" alt="Kanban-style Review Board with To Review, Flagged, and Reviewed lanes holding change cards with AI summaries and priority scores">
</div>
<h3>How It Works</h3>
<p>Each time PageCrawl detects a change on one of your monitored pages, a card appears on the board. Cards show:</p>
<ul>
<li>Page name and URL</li>
<li>Time since the change was detected</li>
<li>Visual difference percentage</li>
<li>AI priority score and importance tag (if AI is enabled)</li>
</ul>
<p>Click any card to view the full change details, timeline, and AI summary.</p>
<h3>Customizing Lanes</h3>
<p>By default, the board includes three lanes: <strong>To Review</strong>, <strong>Reviewed</strong>, and <strong>Flagged</strong>. You can customize these to match your workflow:</p>
<ol>
<li>Click the <strong>+</strong> button to add a new lane</li>
<li>Give the lane a name and pick a color</li>
<li>Drag lanes to reorder them</li>
<li>Click the gear icon in the lane header to edit or delete it</li>
</ol>
<p>Common lane setups:</p>
<ul>
<li><strong>New / In Review / Done</strong> - Simple three-stage workflow</li>
<li><strong>New / Important / Needs Action / Archived</strong> - Priority-based workflow</li>
<li><strong>New / Design Team / Dev Team / Resolved</strong> - Team-based workflow</li>
</ul>
<h3>Filtering and Sorting</h3>
<p>Use the toolbar at the top of the board to filter changes:</p>
<ul>
<li><strong>Folders</strong> - Show changes from a specific folder</li>
<li><strong>Tags</strong> - Filter by label</li>
<li><strong>Website</strong> - Filter by website domain</li>
<li><strong>Date range</strong> - All time, Today, Yesterday, Last 7 days, Last 30 days, Last 90 days, This week, Last week, This month, Last month, This year, Last year, and Custom range</li>
<li><strong>Priority</strong> - Filter by AI priority score</li>
<li><strong>Sort</strong> - Order cards by most recent, oldest, or priority score</li>
</ul>
<h3>Feedback Auto-Review</h3>
<p>When enabled, giving thumbs-up or thumbs-down feedback on a change notification automatically moves the card to your "Reviewed" lane. Enable this from the gear icon menu on the board.</p>
<p>You can configure which lane cards move to after positive or negative feedback.</p>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Sitemap Monitoring: Automatically Detect New Pages on Any Website]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/features/article/sitemap-monitoring" />
            <id>https://pagecrawl.io/86</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Sitemap Monitoring: Automatically Detect New Pages on Any Website</h1>
<p>Most websites maintain an XML sitemap listing every page on the site. They do this for SEO: a sitemap tells Google, Bing, and other search engines exactly which URLs exist, when each one was last modified, and how often it changes. Without a sitemap, search engines have to discover pages by crawling links one by one, which is slow and often misses freshly published or deeply nested content. Because Google rewards indexable content, almost every CMS (WordPress, Shopify, Squarespace, Wix, etc.) generates and publishes a sitemap automatically.</p>
<p>For change monitoring, that same sitemap is a goldmine - it is the website's own up-to-date list of every page that matters, maintained by the site itself. PageCrawl can monitor these sitemaps to detect new pages, removed URLs, and structural changes automatically.</p>
<div class="kb-figure">
  <img src="/images/knowledge/create-page.png" alt="Track New Page screen with the Scan a Website option used to monitor a site's sitemap">
</div>
<p>PageCrawl supports two distinct ways to monitor a sitemap, and you should pick the one that fits your goal:</p>
<ul>
<li><strong><a href="/help/features/article/page-discovery">Page Discovery (Scan a Website)</a></strong>: turns each new URL into its own tracked page with full change history, screenshots, content alerts, and AI summaries. Best for deep monitoring of individual pages.</li>
<li><strong><a href="/help/features/article/feed-tracking-mode">Feed tracking mode</a></strong>: treats the sitemap URL as a single tracked element and emits item-level alerts when URLs are added or removed. Best for lightweight new-URL alerts when you do not need per-page content tracking.</li>
</ul>
<p>Most teams pick one or the other for a given site depending on whether they need deep per-page tracking or just new-URL alerts.</p>
<h3>Approach 1: Page Discovery (Scan a Website)</h3>
<p>This is the heavy-duty approach. Each new URL discovered in the sitemap becomes its own tracked page in your workspace, with full change history, screenshots, content alerts, and AI summaries.</p>
<h4>How it works</h4>
<ol>
<li>PageCrawl downloads the website's XML sitemap on your configured schedule</li>
<li>The URLs found are compared against the previous scan to identify new ones</li>
<li>Newly discovered pages are matched against your filters</li>
<li>You receive a notification listing the new pages</li>
<li>Optionally, matched pages are auto-monitored for content changes</li>
</ol>
<h4>Setting it up</h4>
<ol>
<li>Click <strong>Track New Page</strong> and select <strong>Scan a Website</strong></li>
<li>Enter the website URL (e.g., <code>competitor.com</code>)</li>
<li>PageCrawl automatically detects the sitemap</li>
<li>Set your check frequency and add filters</li>
<li>Enable notifications and optionally enable auto-monitoring</li>
</ol>
<div class="kb-figure">
  <img src="/images/knowledge/discovery-add-website.png" alt="Add Website for Discovery dialog with website URL, what to track options (all pages, top-level only, review first), and scan frequency">
</div>
<h4>Filtering discovered pages</h4>
<p>Large websites may add many pages between checks. Filters help you focus on what matters:</p>
<ul>
<li><strong>URL filters</strong> - Match by path patterns (e.g., <code>/products/</code>, <code>/blog/2026/*</code>)</li>
<li><strong>Exclude filters</strong> - Skip irrelevant sections (e.g., <code>/products/accessories/</code>)</li>
<li><strong>Title/content filters</strong> - Match against page title or body text after fetching</li>
</ul>
<p>Exclude filters always take priority over include filters. You can combine multiple filter types.</p>
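<p>The precedence rule can be sketched as follows. This illustrates the behavior described above rather than PageCrawl's implementation. It assumes that patterns match anywhere in the URL, that <code>*</code> matches any run of characters, and that an empty include list keeps everything not excluded. <code>patternToRegExp</code> and <code>keepDiscoveredUrl</code> are hypothetical names.</p>
<pre><code class="language-javascript">// Turn a pattern such as "/blog/2026/*" into a RegExp: escape regex
// metacharacters, then let each "*" match any run of characters.
function patternToRegExp(pattern) {
  const escapeLiteral = s => s.replace(/[.+?^${}()|[\]\\]/g, c => '\\' + c);
  return new RegExp(pattern.split('*').map(escapeLiteral).join('.*'));
}

// Exclude filters win over include filters; with no includes, everything
// not excluded is kept (an assumption for this sketch).
function keepDiscoveredUrl(url, includes = [], excludes = []) {
  const matches = p => patternToRegExp(p).test(url);
  if (excludes.some(matches)) return false;
  return includes.length === 0 || includes.some(matches);
}</code></pre>
<p>Take the includes <code>/products/</code> and <code>/blog/2026/*</code> with the exclude <code>/products/accessories/</code>. Here <code>competitor.com/products/widget</code> is kept. <code>competitor.com/products/accessories/cable</code> is dropped, even though it also matches an include.</p>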
<h4>Auto-monitoring</h4>
<p>When auto-monitoring is enabled, pages matching your filters are automatically added to your monitoring workspace. For example:</p>
<ol>
<li>A competitor publishes a new product page on Monday</li>
<li>Sitemap monitoring discovers the URL the same day</li>
<li>From Tuesday onward, PageCrawl tracks that page for price and content changes</li>
</ol>
<p>No manual setup required. Combined with <a href="/help/features/article/organized-page-monitoring">templates</a>, auto-monitored pages inherit your preferred check frequency, notification channels, and tracking settings.</p>
<h4>Beyond sitemaps</h4>
<p>Not all websites have complete sitemaps. PageCrawl supplements sitemap monitoring with additional discovery methods:</p>
<ul>
<li><strong>Base URL Link Discovery</strong> - Extracts all links from a specific page</li>
<li><strong>Deep Scan</strong> - Follows links multiple levels deep with JavaScript rendering</li>
<li><strong>Automatic Mode</strong> - Runs all discovery methods together and deduplicates results</li>
</ul>
<p>See <a href="/help/features/article/page-discovery">Page Discovery</a> for full details on all discovery methods.</p>
<h4>Plan limits</h4>
<p>Sitemap monitoring via Page Discovery is available on all plans:</p>
<table>
<thead>
<tr>
<th>Plan</th>
<th>Pages per Website</th>
</tr>
</thead>
<tbody>
<tr>
<td>Free</td>
<td>Up to 2,000</td>
</tr>
<tr>
<td>Standard</td>
<td>Up to 20,000</td>
</tr>
<tr>
<td>Enterprise</td>
<td>Up to 100,000</td>
</tr>
<tr>
<td>Ultimate</td>
<td>Up to 100,000</td>
</tr>
</tbody>
</table>
<p>All plans include filters, notifications, and auto-monitoring.</p>
<h3>Approach 2: Feed Tracking Mode</h3>
<p>This is the lightweight approach. Instead of creating one tracked page per URL, the entire sitemap becomes a single tracked element. You get an alert when URLs are added or removed, but PageCrawl does not fetch or track the content of each page.</p>
<h4>How it works</h4>
<ol>
<li>PageCrawl fetches the sitemap XML on your configured schedule</li>
<li>The XML is parsed into a list of items - one per <code>&lt;url&gt;</code> entry</li>
<li>Each item is identified by its <code>&lt;loc&gt;</code> URL (the stable key)</li>
<li>The new list is compared against the previous check using the keys</li>
<li>You receive a notification listing the URLs that were added or removed</li>
</ol>
<p>Only one monitored page appears in your workspace, the sitemap monitor itself, regardless of how many URLs the sitemap contains.</p>
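<p>Steps 3 to 5 amount to a set difference on the <code>&lt;loc&gt;</code> keys. This sketch assumes the URLs have already been extracted from the XML in document order. <code>diffSitemapItems</code> is a hypothetical name, and its optional <code>cap</code> parameter stands in for the <strong>Track first N items</strong> setting.</p>
<pre><code class="language-javascript">// Compare two checks of a sitemap by their loc URLs (the stable keys).
// Only the first `cap` items of each list are considered, in document order.
function diffSitemapItems(previousLocs, currentLocs, cap = Infinity) {
  const before = new Set(previousLocs.slice(0, cap));
  const after = new Set(currentLocs.slice(0, cap));
  return {
    added: [...after].filter(url => !before.has(url)),
    removed: [...before].filter(url => !after.has(url)),
  };
}</code></pre>
<p>The cap also shows why sitemap order matters. Suppose the cap is 3 and the previous list held three URLs. A new URL appended to the end of the sitemap falls outside the window and is not reported as added.</p>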
<h4>Setting it up</h4>
<ol>
<li>Click <strong>Track New Page</strong></li>
<li>Paste the sitemap URL directly (e.g., <code>competitor.com/sitemap.xml</code>) and click <strong>Load Page</strong></li>
<li>PageCrawl parses the XML, shows an <strong>XML Sitemap detected</strong> badge, and adds a <strong>Feed</strong> option under <em>What to Track</em>. Select <strong>Feed</strong>.</li>
<li>Confirm the detected list shows the URLs you expect</li>
<li>Adjust the <strong>Track first N items</strong> cap and pick what to alert on (items added, removed, content or order changes)</li>
<li>Choose your notification channels and save</li>
</ol>
<div class="kb-figure">
  <img src="/images/knowledge/sitemap-simplecreate.png" alt="Track New Page in Feed mode after pasting a sitemap URL, showing the XML Sitemap detected badge, the parsed list of URLs, the Track first N items cap, and the alert options">
</div>
<p>When a sitemap holds more URLs than your plan's per-feed cap, PageCrawl shows a notice (for example, <em>Monitoring the first 10 of 200 items</em>) and reminds you that sitemaps are not guaranteed to be newest-first. If you need every page tracked, use Page Discovery instead.</p>
<h4>The item limit</h4>
<p>Feeds are capped at a per-plan number of items so a 50,000-URL sitemap does not produce 50,000-item JSON blobs on every check:</p>
<table>
<thead>
<tr>
<th>Plan</th>
<th>Maximum Items Per Feed</th>
</tr>
</thead>
<tbody>
<tr>
<td>Free</td>
<td>10</td>
</tr>
<tr>
<td>Standard</td>
<td>100</td>
</tr>
<tr>
<td>Enterprise</td>
<td>1,000</td>
</tr>
<tr>
<td>Ultimate</td>
<td>10,000</td>
</tr>
</tbody>
</table>
<p>Items are returned in document order. For RSS and Atom feeds this is fine because the newest items are conventionally at the top, but <strong>sitemaps do not guarantee that</strong>. The tracked items are simply the first N entries in the file, so new URLs added further down may not be tracked. If your sitemap has more URLs than your plan cap, the UI shows a notice and suggests either raising the cap or using Page Discovery instead, which has no per-feed cap (it uses your monitor quota).</p>
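<p>The cap and the notice can be approximated as follows. This is a hypothetical sketch, not PageCrawl's implementation. The plan caps come from the table above, the optional <code>requested</code> value stands in for the <strong>Track first N items</strong> setting, and the notice text mirrors the example shown in the UI.</p>
<pre><code class="language-python"># Maximum items per feed, taken from the plan table above
PLAN_ITEM_CAPS = {"Free": 10, "Standard": 100, "Enterprise": 1_000, "Ultimate": 10_000}


def apply_item_cap(items, plan, requested=None):
    """Keep the first N items in document order and build an optional notice.

    `requested` models the "Track first N items" setting (an assumption):
    it can lower the cap but never raise it above the plan maximum.
    """
    cap = PLAN_ITEM_CAPS[plan]
    if requested is not None:
        cap = min(cap, requested)
    kept = items[:cap]
    hidden = max(0, len(items) - cap)
    notice = None
    if hidden:
        notice = f"Monitoring the first {cap} of {len(items)} items"
    return kept, notice</code></pre>
<p>For example, a 200-URL sitemap on the Free plan keeps the first 10 URLs and produces the notice "Monitoring the first 10 of 200 items", whichever 10 happen to come first in the file.</p>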
<p>For sites with both a sitemap and an RSS or Atom feed, the RSS/Atom feed is usually a better choice for Feed mode because new content conventionally appears at the top. To find one, try common paths such as <code>/feed</code>, <code>/rss</code>, or <code>/atom.xml</code> on the site.</p>
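<p>A small helper can check whether a site exposes one of these feeds and identify what kind of XML a URL returns. It is an illustrative sketch, not part of PageCrawl. It only tries the three common paths mentioned above, and it classifies a document by its root element: a sitemap <code>urlset</code> or <code>sitemapindex</code>, an RSS <code>rss</code> element, or an Atom <code>feed</code> element. Fetching the candidate URLs is left to your HTTP client, and the function names are hypothetical.</p>
<pre><code class="language-python">import xml.etree.ElementTree as ET
from urllib.parse import urljoin

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Common feed locations; many sites use other paths
COMMON_FEED_PATHS = ["/feed", "/rss", "/atom.xml"]


def candidate_feed_urls(site_url):
    """Build absolute candidate feed URLs. `site_url` must include a scheme."""
    return [urljoin(site_url, path) for path in COMMON_FEED_PATHS]


def classify_xml(xml_bytes):
    """Return 'sitemap', 'sitemap-index', 'rss', 'atom', or 'unknown'."""
    try:
        tag = ET.fromstring(xml_bytes).tag
    except ET.ParseError:
        return "unknown"
    if tag == SITEMAP_NS + "urlset":
        return "sitemap"
    if tag == SITEMAP_NS + "sitemapindex":
        return "sitemap-index"
    if tag == "rss":
        return "rss"
    if tag == ATOM_NS + "feed":
        return "atom"
    return "unknown"</code></pre>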
<h4>When to choose Feed mode</h4>
<ul>
<li>You only need new-URL alerts, not per-page change tracking</li>
<li>The site has a small or medium sitemap that fits inside your plan's item cap</li>
<li>You do not want each URL consuming a monitor slot from your plan</li>
</ul>
<p>For fully-fledged monitoring with per-page change history, screenshots, content alerts, AI summaries, and proper handling of large sitemaps, use <strong><a href="/help/features/article/page-discovery">Page Discovery (Scan a Website)</a></strong> instead. Feed mode is intentionally minimal - it is a fast way to get new-URL notifications without the overhead of tracking each page, but it cannot replace Page Discovery for serious change monitoring.</p>
<h4>Sitemap vs RSS coverage (important)</h4>
<p>If you are choosing between monitoring a site's sitemap and its RSS or Atom feed, the two are not equivalent:</p>
<ul>
<li><strong>A sitemap is meant to list every indexable URL on the site.</strong> A WordPress blog with 500 posts will typically list all 500 in its sitemap (possibly split across several sitemap files). New posts appear there as soon as the CMS regenerates the sitemap.</li>
<li><strong>An RSS or Atom feed is typically a rolling window of the most recent 10 to 20 posts.</strong> Older entries fall off the end as new ones arrive. The feed is designed for "what is new", not "what exists".</li>
</ul>
<p>For tracking new content, both work, but the RSS feed is usually more reliable because new posts conventionally appear at the top. You cannot, however, use the RSS feed to discover the site's full back catalog. Use the sitemap when you need complete URL coverage and the RSS feed when you only care about new content.</p>
<h3>Related Articles</h3>
<ul>
<li><a href="/help/features/article/feed-tracking-mode">Feed tracking mode</a> - lightweight alternative that treats the sitemap as a single tracked feed instead of auto-creating per-page monitors</li>
<li><a href="/help/features/article/page-discovery">Page Discovery</a> - other discovery methods (Base URL Link Discovery, Deep Scan, Automatic Mode)</li>
<li><a href="/help/features/article/organized-page-monitoring">Organized page monitoring</a> - templates and folders for keeping auto-monitored pages tidy</li>
</ul>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Web Archiving with WACZ: Preserve Full Page Snapshots]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/features/article/web-archiving-wacz" />
            <id>https://pagecrawl.io/87</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Web Archiving with WACZ: Preserve Full Page Snapshots</h1>
<p>PageCrawl can automatically create a full web archive of your monitored pages every time a change is detected. Archives capture the complete page (HTML, CSS, images, scripts) so you can replay it exactly as it appeared at that moment.</p>
<p>Archives are saved in the WACZ (Web Archive Collection Zipped) format, an open standard for web archiving used by libraries, governments, and legal teams worldwide.</p>
<div style="background:#f5f7fa;border:1px solid #e6e9ef;border-radius:8px;padding:14px 18px;margin:20px 0;">
  <strong>Note:</strong> Available on the Ultimate plan.
</div>
<h3>How It Works</h3>
<ol>
<li>PageCrawl detects a change on a monitored page</li>
<li>A full WACZ archive is created capturing the complete page state</li>
<li>The archive is stored securely in the cloud</li>
<li>You can replay the archived page at any time from the change history</li>
</ol>
<p>If WACZ generation fails (e.g., due to complex page structure), PageCrawl falls back to creating a self-contained HTML archive instead.</p>
<p>Enable archiving with the <strong>Web Archive</strong> toggle in the page editor's Crawling Preferences:</p>
<div class="kb-figure">
  <img src="/images/knowledge/settings-crawling-preferences.png" alt="Crawling Preferences with the Web Archive toggle enabled to capture a WACZ archive on every change">
</div>
<h3>How Archives Differ from Screenshots</h3>
<p>PageCrawl offers both screenshots and web archives, but they serve different purposes:</p>
<table>
<thead>
<tr>
<th></th>
<th>Screenshot</th>
<th>Web Archive (WACZ)</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>What it captures</strong></td>
<td>A flat image of the visible page</td>
<td>The complete page: HTML, CSS, JavaScript, images, fonts</td>
</tr>
<tr>
<td><strong>Interactivity</strong></td>
<td>None (static image)</td>
<td>Fully interactive: scroll, click links, hover over elements</td>
</tr>
<tr>
<td><strong>Content below the fold</strong></td>
<td>Only if full-page screenshot is enabled</td>
<td>Always included; the entire page is preserved</td>
</tr>
<tr>
<td><strong>Dynamic content</strong></td>
<td>Shows one visual state</td>
<td>Preserves interactive elements such as dropdowns and tabs</td>
</tr>
<tr>
<td><strong>File size</strong></td>
<td>Small (typically under 1 MB)</td>
<td>Larger (includes all page assets)</td>
</tr>
<tr>
<td><strong>Best for</strong></td>
<td>Quick visual reference, visual diff comparison</td>
<td>Legal evidence, compliance records, full preservation</td>
</tr>
</tbody>
</table>
<p>Screenshots are great for a quick visual snapshot and for visual change detection (highlighting pixel differences). Web archives go further by preserving the entire page so you can interact with it later exactly as it appeared.</p>
<h3>How PageCrawl Archives Differ from Archive.org</h3>
<p>The Internet Archive (archive.org) and PageCrawl both preserve web pages, but they work very differently:</p>
<p><strong>Archive.org (Wayback Machine):</strong></p>
<ul>
<li>Public, community-driven project that crawls the open web</li>
<li>Snapshots are taken on their own schedule (often weeks or months apart)</li>
<li>No control over when or how often pages are archived</li>
<li>Pages behind logins, paywalls, or bot protection are usually not captured</li>
<li>Anyone can view the archived pages</li>
<li>No change detection or notifications</li>
</ul>
<p><strong>PageCrawl Web Archiving:</strong></p>
<ul>
<li>Private to your account, stored securely in the cloud</li>
<li>Archives are created automatically every time a change is detected</li>
<li>You control the check frequency (every 5 minutes to daily)</li>
<li>Works with pages behind logins using <a href="/help/features/article/perform-actions">browser actions</a> (click, type, wait)</li>
<li>Works with pages behind bot protection</li>
<li>Archives are paired with change detection, so you know exactly what changed and when</li>
<li>Download WACZ files for offline storage or legal use</li>
</ul>
<p>In short, archive.org is best for general public web preservation. PageCrawl archiving is designed for active monitoring where you need precise, private, frequent snapshots tied to detected changes.</p>
<h3>Viewing Archives</h3>
<p>To view an archived page:</p>
<ol>
<li>Open a monitored page and go to its change history</li>
<li>Click on any check that has an archive (indicated by an archive icon)</li>
<li>The archive viewer opens, showing the page exactly as it appeared</li>
<li>Use the previous/next arrows to browse between archived versions</li>
</ol>
<p>The viewer uses ReplayWeb.page to render WACZ archives interactively in your browser. You can scroll, click links, and interact with the page as if you were browsing it live at that point in time.</p>
<h3>Downloading Archives</h3>
<p>You can download any archive file directly:</p>
<ol>
<li>Open the archive viewer for the check you want</li>
<li>Click the download button to save the WACZ file</li>
<li>Open it with any WACZ-compatible viewer (ReplayWeb.page, Webrecorder, etc.)</li>
</ol>
<p>Downloaded archives can be used for legal evidence, compliance records, or offline browsing.</p>
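<p>A WACZ file is a ZIP package, so you can inspect a downloaded archive with standard tools. The Python sketch below lists the files in the package and reads the captured page URLs from <code>pages/pages.jsonl</code>. It assumes the layout described in the WACZ specification: a <code>datapackage.json</code> manifest, WARC data under <code>archive/</code>, and a JSON Lines page list whose first line is a header. The function name is hypothetical, and archives produced by other tools may differ.</p>
<pre><code class="language-python">import json
import zipfile


def summarize_wacz(path_or_file):
    """Return (file names, page URLs) for a WACZ archive (path or file object)."""
    with zipfile.ZipFile(path_or_file) as wacz:
        names = wacz.namelist()
        urls = []
        if "pages/pages.jsonl" in names:
            with wacz.open("pages/pages.jsonl") as pages:
                for line in pages:
                    line = line.strip()
                    if not line:
                        continue
                    record = json.loads(line)
                    # The header line has no "url" key, so it is skipped here
                    if "url" in record:
                        urls.append(record["url"])
    return names, urls</code></pre>
<p>For example, <code>names, urls = summarize_wacz("archive.wacz")</code> shows which pages and resources a downloaded file contains before you open it in a replay viewer.</p>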
<h3>Use Cases</h3>
<ul>
<li><strong>Legal and compliance</strong> - Preserve evidence of website content at specific dates for disputes, contracts, or regulatory compliance</li>
<li><strong>Competitive intelligence</strong> - Keep a historical record of competitor pages, pricing, and product offerings</li>
<li><strong>Content auditing</strong> - Track how your own website evolves over time with complete snapshots</li>
<li><strong>Journalism</strong> - Archive source pages to preserve evidence that may be modified or removed</li>
</ul>
<h3>Enabling Archives</h3>
<p>Archiving is available on the Ultimate plan. Once it is enabled for your workspace, turn it on for any monitored page with the <strong>Web Archive</strong> toggle described above. If you do not see the toggle, contact support to enable archiving for your workspace.</p>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Workspaces: Organize Monitoring by Project or Team]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/features/article/workspaces" />
            <id>https://pagecrawl.io/88</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Workspaces: Organize Monitoring by Project or Team</h1>
<p>Workspaces let you organize your monitored pages into separate environments, each with its own settings, notifications, and team member access. Use workspaces to separate monitoring by project, client, department, or any other grouping that makes sense for your workflow.</p>
<h3>What Each Workspace Gets</h3>
<p>Every workspace has independent settings for:</p>
<ul>
<li><strong>Monitored pages</strong> - Each workspace contains its own set of tracked pages</li>
<li><strong>Notification preferences</strong> - Separate email frequency, Slack/Discord/Teams/Telegram channels</li>
<li><strong>AI configuration</strong> - Different AI provider, model, and focus areas per workspace</li>
<li><strong>Check scheduling</strong> - Custom active hours and days for monitoring</li>
<li><strong>Timezone</strong> - Each workspace can use a different timezone</li>
<li><strong>Labels and tags</strong> - Workspace-specific labels for organizing pages</li>
<li><strong>Templates</strong> - Page discovery templates tied to each workspace</li>
</ul>
<h3>Creating a Workspace</h3>
<ol>
<li>Go to <strong>Settings</strong> &gt; <strong>Team</strong> &gt; <strong>Workspaces</strong></li>
<li>Click <strong>Add Workspace</strong></li>
<li>Enter a name for the workspace</li>
<li>Configure the workspace settings</li>
</ol>
<div class="kb-figure">
  <img src="/images/knowledge/settings-workspaces.png" alt="Workspaces settings listing each workspace with its access, tracked pages, and usage, plus the Add Workspace button">
</div>
<h3>Switching Between Workspaces</h3>
<p>Use the workspace selector dropdown in the sidebar to switch between your workspaces. Each workspace shows its own set of pages, changes, and settings.</p>
<h3>Managing Access</h3>
<p>Administrators can control which team members have access to each workspace:</p>
<ol>
<li>Go to <strong>Settings</strong> &gt; <strong>Team</strong> &gt; <strong>Workspaces</strong></li>
<li>Find the workspace in the list</li>
<li>Click <strong>Update</strong> in the Access column</li>
<li>Add or remove team members</li>
</ol>
<p>Members only see workspaces they've been assigned to. This lets you give client-facing teams access to client workspaces without exposing internal monitoring.</p>
<p>See <a href="/help/account-settings/article/user-access-roles">User Roles &amp; Permissions</a> for details on what each role can do.</p>
<h3>Common Setups</h3>
<p><strong>By client</strong>: One workspace per client, each with its own notification channels and team access.</p>
<p><strong>By department</strong>: Marketing monitors competitor pages, Legal monitors compliance pages, Product monitors feature pages, each in their own workspace.</p>
<p><strong>By priority</strong>: A "Critical" workspace with immediate notifications and frequent checks, and a "Background" workspace with weekly reports and less frequent checks.</p>
<p><strong>By region</strong>: Separate workspaces for different geographic regions, each with region-specific proxy settings and timezones.</p>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Save Screenshots to Dropbox]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/integrations/article/dropbox-screenshot-sync" />
            <id>https://pagecrawl.io/89</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Save Screenshots to Dropbox</h1>
<p>PageCrawl can automatically save page screenshots to your Dropbox whenever a change is detected. This gives you a visual archive of every change, stored in your own cloud storage for easy access and sharing.</p>
<h3>How It Works</h3>
<p>When a change is detected on a monitored page and screenshots are enabled, PageCrawl uploads the screenshot to your chosen Dropbox folder. Files are organized by page name and timestamp:</p>
<pre><code>{your-folder}/{page-name}/{datetime}.jpg</code></pre>
<p>This makes it easy to browse through the history of visual changes for any monitored page.</p>
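<p>If you sync the folder locally or list it through the Dropbox API, you can rebuild each page's history from the file paths alone. The sketch below assumes two things. First, paths follow the <code>{your-folder}/{page-name}/{datetime}.jpg</code> pattern shown above. Second, the datetime part sorts chronologically as text, for example an ISO-style timestamp. The exact datetime format PageCrawl uses is not specified here, so check a real file name before relying on the sort order. The function name is hypothetical.</p>
<pre><code class="language-python">from collections import defaultdict
from pathlib import PurePosixPath


def group_screenshots(paths):
    """Map each page name to its screenshot file names, oldest first.

    Assumes {folder}/{page-name}/{datetime}.jpg and text-sortable datetimes.
    """
    history = defaultdict(list)
    for raw in paths:
        path = PurePosixPath(raw)
        if path.suffix.lower() != ".jpg":
            continue  # ignore anything that is not a screenshot
        history[path.parent.name].append(path.name)
    return {page: sorted(files) for page, files in history.items()}</code></pre>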
<h3>Setting Up Dropbox Sync</h3>
<ol>
<li>Go to <strong><a href="/app/settings/workspace/integrations">Settings &gt; Integrations</a></strong></li>
<li>Click <strong>Setup</strong> on the Dropbox integration</li>
<li>In the modal that opens, click <strong>Authenticate with Dropbox</strong></li>
<li>Authorize PageCrawl in the Dropbox OAuth window that opens</li>
<li>Select a folder in your Dropbox where screenshots should be stored</li>
</ol>
<p>Once connected, screenshots will be uploaded automatically whenever a change is detected on any of your monitored pages that have screenshots enabled.</p>
<div class="kb-figure">
  <img src="/images/knowledge/integ-dropbox-setup.png" alt="Dropbox Configuration dialog in PageCrawl for authenticating and choosing a screenshot folder">
</div>
<h3>Managing the Connection</h3>
<p>After connecting your Dropbox account, you can:</p>
<ul>
<li><strong>View account info</strong> - See which Dropbox account is connected</li>
<li><strong>Change folder</strong> - Select a different folder for screenshot storage</li>
<li><strong>Revoke access</strong> - Disconnect your Dropbox account to stop automatic uploads</li>
</ul>
<h3>Troubleshooting</h3>
<p>If your Dropbox access token expires, the connection is automatically disabled and you will receive a notification. Simply reconnect your Dropbox account at <strong>Settings &gt; Integrations</strong> to restore screenshot syncing.</p>
<h3>Availability</h3>
<p>Dropbox screenshot sync is available on all plans.</p>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[AI Assistants (MCP Server)]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/integrations/article/mcp-server-ai-tools" />
            <id>https://pagecrawl.io/90</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>AI Assistants (MCP Server)</h1>
<p>PageCrawl includes a built-in MCP (Model Context Protocol) server that lets AI assistants manage your page monitors. You can add monitors, check history, trigger checks, and more, all through natural conversation with tools like Claude or ChatGPT.</p>
<div class="kb-figure kb-figure--flush">
  <img src="/images/knowledge/mcp-flow.png" alt="How MCP works: you ask your AI assistant in natural language, it calls PageCrawl's MCP tools (add, list, history, check), which act on your PageCrawl account">
</div>
<p>MCP is an open protocol that standardizes how AI tools connect to external services. Once connected, your AI assistant can directly interact with your PageCrawl account without you needing to use the web interface or API manually.</p>
<div style="background:#f5f7fa;border:1px solid #e6e9ef;border-radius:8px;padding:14px 18px;margin:20px 0;">
  <strong>Note:</strong> Available on all plans. Free plan users can create monitors (add page monitor) and use all read tools (list monitors, view history, check diffs, get latest values). Your plan's monitor-count limits still apply. Triggering checks, managing tags, marking changes seen, and updating monitor defaults require a paid plan (Standard or above).
</div>
<h3>What You Can Do</h3>
<p>The MCP server provides the following tools that your AI assistant can use:</p>
<table>
<thead>
<tr>
<th>Tool</th>
<th>What It Does</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Add page monitor</strong></td>
<td>Create a new monitor with URL, tracking mode, frequency, and notifications. Available on all plans, including Free</td>
</tr>
<tr>
<td><strong>List monitors</strong></td>
<td>Search and view monitors across all workspaces by URL, domain, or name</td>
</tr>
<tr>
<td><strong>Get monitor details</strong></td>
<td>See full configuration of a specific monitor including tracked elements and latest values. Supports batch requests</td>
</tr>
<tr>
<td><strong>Get monitor history</strong></td>
<td>Retrieve historical checks and detected changes with AI summaries. Supports batch requests</td>
</tr>
<tr>
<td><strong>Get latest values</strong></td>
<td>Quickly retrieve just the current values for one or more monitors (e.g., current price). Supports batch requests</td>
</tr>
<tr>
<td><strong>Get check diff</strong></td>
<td>View the actual text differences detected in a specific check</td>
</tr>
<tr>
<td><strong>Trigger check</strong></td>
<td>Trigger a one-off check on a monitor. Requires a paid plan</td>
</tr>
<tr>
<td><strong>Manage tags</strong></td>
<td>List workspace tags, or add and remove tags from monitors. Requires a paid plan</td>
</tr>
<tr>
<td><strong>Mark changes seen</strong></td>
<td>Mark detected changes as reviewed on one or all monitors. Requires a paid plan</td>
</tr>
<tr>
<td><strong>List templates</strong></td>
<td>View available templates that can be applied when creating monitors</td>
</tr>
<tr>
<td><strong>List workspaces</strong></td>
<td>View all your teams and workspaces with their IDs</td>
</tr>
<tr>
<td><strong>Update monitor defaults</strong></td>
<td>View or update default settings for new monitors created via MCP. Requires a paid plan</td>
</tr>
</tbody>
</table>
<h3>Supported Element Types</h3>
<p>When creating monitors through MCP, you can track the following element types:</p>
<ul>
<li><strong>Full Page</strong> - Entire page text content (no selector needed)</li>
<li><strong>Text</strong> - Text content of a specific element (CSS selector required)</li>
<li><strong>Number</strong> - Numeric values with change thresholds</li>
<li><strong>Price</strong> - Price values with currency detection</li>
<li><strong>HTML</strong> - Raw HTML structure of an element</li>
<li><strong>JavaScript</strong> - Execute JavaScript and track the result</li>
<li><strong>File Hash</strong> - Monitor file changes by checksum (no selector needed)</li>
<li><strong>PDF</strong> - Track changes in PDF documents (no selector needed)</li>
</ul>
<h3>Setting Up with Claude (Web &amp; Desktop)</h3>
<ol>
<li>Open <a href="https://claude.ai">claude.ai</a> or Claude Desktop and go to <strong>Settings</strong></li>
<li>Navigate to the <strong>Connectors</strong> section in the left sidebar</li>
<li>Click <strong>Add custom connector</strong> at the bottom of the page</li>
<li>Enter a name (e.g. "PageCrawl") and set the URL to: <code>https://pagecrawl.io/mcp</code></li>
<li>Click <strong>Add</strong>. You will be redirected to PageCrawl to authorize access</li>
<li>Log in (if not already) and click <strong>Approve</strong></li>
<li>PageCrawl tools are now available in your conversations</li>
</ol>
<div class="kb-figure">
  <img src="/images/help/claude-connectors-settings.png" alt="Connectors page showing PageCrawl configured">
</div>
<div class="kb-figure">
  <img src="/images/help/claude-add-connector-dialog.png" alt="Add custom connector dialog with PageCrawl name and MCP server URL">
</div>
<h3>Setting Up with Claude Code</h3>
<p>Add the following to your <code>.mcp.json</code> file (in your project root or <code>~/.claude/</code>):</p>
<pre><code class="language-json">{
  "mcpServers": {
    "pagecrawl": {
      "url": "https://pagecrawl.io/mcp"
    }
  }
}</code></pre>
<p>When Claude Code first tries to use PageCrawl tools, it will open a browser window for you to authorize the connection via OAuth.</p>
<h3>Setting Up with ChatGPT</h3>
<p>Works with ChatGPT on web, desktop, and mobile. Requires a ChatGPT Plus, Pro, Team, Enterprise, or Edu plan.</p>
<ol>
<li>Go to <a href="https://chatgpt.com">chatgpt.com</a> (or open the ChatGPT desktop app)</li>
<li>Navigate to <strong>Settings</strong> &gt; <strong>Connectors</strong> &gt; <strong>Create</strong></li>
<li>Enter a name (e.g. "PageCrawl"), a short description, and set the URL to: <code>https://pagecrawl.io/mcp</code></li>
<li>Click <strong>Create</strong>. You will be redirected to PageCrawl to authorize access</li>
<li>Log in and click <strong>Approve</strong></li>
<li>To use in a conversation, click the <strong>+</strong> button near the message input, select <strong>More</strong>, and enable PageCrawl</li>
</ol>
<h3>Setting Up with Other MCP Clients (OAuth)</h3>
<p>Any MCP-compatible client that supports OAuth can connect to PageCrawl. The server details:</p>
<ul>
<li><strong>URL:</strong> <code>https://pagecrawl.io/mcp</code></li>
<li><strong>Authentication:</strong> OAuth 2.0 (automatic via MCP protocol)</li>
<li><strong>Protocol:</strong> MCP over HTTP with JSON-RPC 2.0</li>
<li><strong>OAuth Discovery:</strong> <code>https://pagecrawl.io/.well-known/oauth-authorization-server</code></li>
</ul>
<p>The client will handle the OAuth flow automatically. No manual token setup is required.</p>
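<p>If you are building a client, the discovery URL is expected to return OAuth 2.0 Authorization Server Metadata in the format defined by RFC 8414. The sketch below fetches that document and picks out the endpoints a client typically needs. The field names come from RFC 8414. Whether PageCrawl's document includes every optional field, such as <code>registration_endpoint</code> or <code>code_challenge_methods_supported</code>, is an assumption, so the code treats those fields as optional. The function names are hypothetical.</p>
<pre><code class="language-python">import json
import urllib.request

DISCOVERY_URL = "https://pagecrawl.io/.well-known/oauth-authorization-server"


def parse_oauth_metadata(document):
    """Extract the endpoints an MCP client needs from RFC 8414 metadata."""
    metadata = json.loads(document)
    return {
        "issuer": metadata["issuer"],  # required by RFC 8414
        "authorization_endpoint": metadata.get("authorization_endpoint"),
        "token_endpoint": metadata.get("token_endpoint"),
        # Optional: only present if the server supports dynamic client registration
        "registration_endpoint": metadata.get("registration_endpoint"),
        "code_challenge_methods": metadata.get("code_challenge_methods_supported", []),
    }


def fetch_oauth_metadata(url=DISCOVERY_URL, timeout=10):
    """Download and parse the discovery document (performs a network request)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_oauth_metadata(response.read())</code></pre>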
<h3>Setting Up with API Token (OpenClaw, Cursor, Cline, Windsurf, and others)</h3>
<p>For MCP clients that do not support OAuth, you can connect using a personal API token instead. This works with OpenClaw, Cursor, Cline, Windsurf, VS Code, Claude Code, and any other MCP client that supports custom headers.</p>
<p><strong>Step 1:</strong> Generate an API token in PageCrawl:</p>
<ol>
<li>Go to <strong>Settings</strong> &gt; <strong>API</strong></li>
<li>Click <strong>Create Token</strong></li>
<li>Give it a name (e.g. "OpenClaw") and click <strong>Create</strong></li>
<li>Copy the token. It will only be shown once.</li>
</ol>
<p><strong>Step 2:</strong> Add the following configuration to your MCP client. The JSON format below works with Cursor (<code>.cursor/mcp.json</code>), Claude Code (<code>.mcp.json</code>), Cline, Windsurf, and most other clients, although the configuration file's name and location vary by client, so check your client's documentation:</p>
<pre><code class="language-json">{
  "mcpServers": {
    "pagecrawl": {
      "url": "https://pagecrawl.io/mcp",
      "headers": {
        "Authorization": "Bearer YOUR_TOKEN_HERE"
      }
    }
  }
}</code></pre>
<p>For <strong>OpenClaw</strong>, use the CLI:</p>
<pre><code>openclaw mcp set pagecrawl \
  --transport streamable-http \
  --url https://pagecrawl.io/mcp \
  --header "Authorization: Bearer YOUR_TOKEN_HERE"</code></pre>
<p>For <strong>Cursor</strong>, you can also add via <strong>Settings</strong> &gt; <strong>MCP Servers</strong> &gt; <strong>Add</strong> &gt; <strong>Streamable HTTP</strong> and enter the URL and authorization header there.</p>
<div style="background:#f5f7fa;border:1px solid #e6e9ef;border-radius:8px;padding:14px 18px;margin:20px 0;">
  <strong>Note:</strong> API tokens require a paid plan. Treat your token like a password. You can revoke tokens at any time from <strong>Settings</strong> &gt; <strong>API</strong>.
</div>
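<p>Clients configured this way send JSON-RPC 2.0 messages over HTTP, with the token in the <code>Authorization</code> header. The sketch below shows roughly what such a request looks like, assuming the MCP Streamable HTTP transport. In that transport, the client POSTs JSON to the server URL and accepts either JSON or an event stream in response. After <code>initialize</code>, it also echoes back any <code>Mcp-Session-Id</code> header the server returned. The sketch builds requests without sending them, and your MCP client normally does all of this for you. The protocol version string and client name are assumed examples, and the function names are hypothetical.</p>
<pre><code class="language-python">import itertools
import json
import urllib.request

MCP_URL = "https://pagecrawl.io/mcp"
_next_id = itertools.count(1)


def jsonrpc(method, params=None):
    """Build a JSON-RPC 2.0 request object with a fresh numeric id."""
    message = {"jsonrpc": "2.0", "id": next(_next_id), "method": method}
    if params is not None:
        message["params"] = params
    return message


def mcp_request(message, token, session_id=None):
    """Wrap a JSON-RPC message in an authenticated HTTP POST (not sent)."""
    headers = {
        "Content-Type": "application/json",
        # Streamable HTTP servers may reply with JSON or a text/event-stream
        "Accept": "application/json, text/event-stream",
        "Authorization": "Bearer " + token,
    }
    if session_id:
        headers["Mcp-Session-Id"] = session_id
    body = json.dumps(message).encode("utf-8")
    return urllib.request.Request(MCP_URL, data=body, headers=headers, method="POST")


# The first call a client makes; the protocol version is an assumed example
init = jsonrpc("initialize", {
    "protocolVersion": "2025-03-26",
    "capabilities": {},
    "clientInfo": {"name": "example-client", "version": "0.1"},
})
request = mcp_request(init, "YOUR_TOKEN_HERE")</code></pre>
<p>After the server responds, a client sends a <code>notifications/initialized</code> notification, which carries no <code>id</code>. A <code>tools/list</code> request then returns the tools described in the table above.</p>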
<h3>Example Conversations</h3>
<p>Once connected, you can interact with PageCrawl naturally:</p>
<p><strong>Adding monitors:</strong></p>
<blockquote>
<p>"Monitor example.com/pricing every hour and track the full page text"</p>
</blockquote>
<blockquote>
<p>"Set up price tracking for these 3 product pages: [url1], [url2], [url3]. Check every 15 minutes and notify me on Slack when prices drop."</p>
</blockquote>
<p><strong>Checking current values:</strong></p>
<blockquote>
<p>"What's the current price on my Amazon product monitor?"</p>
</blockquote>
<blockquote>
<p>"Compare the prices across all my competitor monitors right now"</p>
</blockquote>
<p><strong>Reviewing changes:</strong></p>
<blockquote>
<p>"Show me all monitors that changed in the last 24 hours with a summary of what changed"</p>
</blockquote>
<blockquote>
<p>"Show me the diff for the terms of service page. What exactly was added or removed?"</p>
</blockquote>
<p><strong>Analysis and reporting:</strong></p>
<blockquote>
<p>"Which of my monitors have had the most changes this month? Are there any patterns?"</p>
</blockquote>
<blockquote>
<p>"Give me a weekly summary: how many changes were detected across all monitors, which ones had price drops, and which ones had errors?"</p>
</blockquote>
<p><strong>Batch operations:</strong></p>
<blockquote>
<p>"Tag all monitors tracking amazon.com with 'competitor' and 'ecommerce'"</p>
</blockquote>
<blockquote>
<p>"Check the latest values for all monitors tagged 'pricing' and tell me which products are currently out of stock"</p>
</blockquote>
<p><strong>Troubleshooting:</strong></p>
<blockquote>
<p>"Are any of my monitors failing? Show me the ones with errors and what the issue is"</p>
</blockquote>
<blockquote>
<p>"The pricing page monitor hasn't detected changes in weeks. Trigger a fresh check and show me what it finds"</p>
</blockquote>
<p><strong>Setting up workflows:</strong></p>
<blockquote>
<p>"Create a monitor for each of these 5 competitor pricing pages. Use the 'competitor-tracking' template and tag them all as 'q2-research'"</p>
</blockquote>
<blockquote>
<p>"Monitor the SEC EDGAR page for new filings from Tesla. Use content-only mode so it ignores the navigation, check every 30 minutes"</p>
</blockquote>
<h3>Working with Workspaces</h3>
<p>All tools automatically search across every workspace you have access to. You do not need to know which workspace a monitor is in to find or interact with it.</p>
<ul>
<li>Use <strong>List monitors</strong> with the <code>search</code> parameter to find monitors by URL, domain, or name</li>
<li>Use <strong>List monitors</strong> with <code>workspace_id</code> to filter results to a specific workspace</li>
<li>Use <strong>List workspaces</strong> to see all your teams and workspaces with their IDs</li>
<li><strong>Add page monitor</strong> only requires a <code>workspace_id</code> if you have more than one workspace</li>
</ul>
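<p>At the protocol level, each tool invocation is a JSON-RPC <code>tools/call</code> request carrying the tool name and its arguments. The sketch below builds such a call for <strong>List monitors</strong> using the <code>search</code> and <code>workspace_id</code> parameters described above. The tool name <code>list_monitors</code> is hypothetical. Call <code>tools/list</code> to see the exact names and argument schemas the server exposes.</p>
<pre><code class="language-python">import json


def list_monitors_call(request_id, search=None, workspace_id=None):
    """Build a tools/call request for the List monitors tool.

    Only the arguments you pass are included; leaving out workspace_id
    relies on the documented behavior of searching every workspace.
    """
    arguments = {}
    if search is not None:
        arguments["search"] = search
    if workspace_id is not None:
        arguments["workspace_id"] = workspace_id
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        # "list_monitors" is a hypothetical name; check tools/list for the real one
        "params": {"name": "list_monitors", "arguments": arguments},
    }


print(json.dumps(list_monitors_call(7, search="competitor.com"), indent=2))</code></pre>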
<h3>Limits and Quotas</h3>
<p>MCP operations respect your plan's limits:</p>
<ul>
<li><strong>Monitor creation</strong> counts toward your page monitor quota</li>
<li><strong>Triggered checks</strong> are rate limited and run at a lower priority than scheduled checks, so they may take a while to complete. They are intended for occasional manual use (one or two checks at a time), not for programmatic or automated triggering; requests that exceed the rate limit are rejected with an error. For recurring checks, set each monitor's check frequency and use scheduling settings to run checks at specific times instead.</li>
<li>If you exceed your monitor limit, new monitors are created in a disabled state</li>
<li>If you exceed your check limit, manual check triggers will be rejected</li>
</ul>
<p>See <a href="/help/subscription/article/is-there-limit-of-checks-in-standard-plan">Check Limits</a> and <a href="/help/subscription/article/is-there-limit-how-many-websites-i-can-add-to-monitor">Website Limits</a> for details on plan quotas.</p>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Webhook Integration: Send Change Data to Any External Service]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/integrations/article/webhook-integration" />
            <id>https://pagecrawl.io/91</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Webhook Integration: Send Change Data to Any External Service</h1>
<p>Webhooks allow PageCrawl to send HTTP POST requests to any external URL whenever a page change is detected or an error occurs. Use webhooks to connect PageCrawl with custom applications, automation platforms, databases, or any service that accepts HTTP requests.</p>
<h3>Setting Up a Webhook</h3>
<ol>
<li>Go to <strong>Settings</strong> &gt; <strong>Webhooks</strong> (found under "Other" in the sidebar)</li>
<li>Click <strong>New webhook</strong></li>
<li>Enter your target URL and configure the options below</li>
<li>Click <strong>Save</strong></li>
</ol>
<div class="kb-figure">
  <img src="/images/knowledge/settings-webhooks.png" alt="Webhooks settings page with the Configured Webhooks list and the New webhook button">
</div>
<h3>Configuration Options</h3>
<p><strong>Target URL</strong>: The HTTP endpoint that will receive the POST request.</p>
<p><strong>Event Triggers</strong>: Choose which events fire the webhook:</p>
<ul>
<li><strong>Change detected</strong> - Fires when page content changes</li>
<li><strong>Error</strong> - Fires when a check fails (timeout, blocked, 404, etc.)</li>
<li>Or both</li>
</ul>
<p><strong>Page Filter</strong>: Optionally limit which pages trigger the webhook. You can filter by:</p>
<ul>
<li>All pages in workspace (default)</li>
<li>By tag</li>
<li>By folder</li>
<li>By website/domain</li>
<li>Specific monitors</li>
</ul>
<p>If no filter is set, the webhook fires for all pages in the workspace.</p>
<p><strong>Active/Inactive Toggle</strong>: Disable a webhook without deleting it.</p>
<h3>Payload Fields</h3>
<p>By default, webhooks send all available fields. You can customize the payload by selecting only the fields you need:</p>
<table>
<thead>
<tr>
<th>Category</th>
<th>Fields</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Basic</strong></td>
<td>id, title, status, event_type, changed_at, visual_diff, difference, human_difference, short_summary</td>
</tr>
<tr>
<td><strong>Tracked Elements</strong></td>
<td>content_type, elements (array of tracked element data)</td>
</tr>
<tr>
<td><strong>Differences</strong></td>
<td>markdown_difference, html_difference</td>
</tr>
<tr>
<td><strong>Images</strong></td>
<td>text_difference_image, page_screenshot_image</td>
</tr>
<tr>
<td><strong>Page Info</strong></td>
<td>page metadata, page_elements array</td>
</tr>
<tr>
<td><strong>Content</strong></td>
<td>contents, original (for extracted values)</td>
</tr>
<tr>
<td><strong>Comparison</strong></td>
<td>previous_check data</td>
</tr>
<tr>
<td><strong>JSON</strong></td>
<td>json, json_patch</td>
</tr>
<tr>
<td><strong>AI</strong></td>
<td>ai_summary, ai_priority_score</td>
</tr>
</tbody>
</table>
<h3>Testing Webhooks</h3>
<p>After saving a webhook, click the <strong>Test</strong> button to send a sample payload to your endpoint. This verifies the connection works before relying on it for real notifications.</p>
<h3>Example Payload</h3>
<pre><code class="language-json">{
  "id": 12345,
  "title": "Product Page - Example.com",
  "status": "ok",
  "event_type": "change_detected",
  "content_type": "fullpage",
  "changed_at": "2026-01-15T10:30:00Z",
  "visual_diff": 12.5,
  "difference": 3,
  "human_difference": "3 lines changed",
  "short_summary": "Price updated from $99 to $89",
  "ai_summary": "The product price was reduced by 10%.",
  "ai_priority_score": 85
}</code></pre>
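<p>To show how a receiving endpoint might consume this payload, here is a minimal sketch that uses only Python's standard library (<code>http.server</code> and <code>json</code>). It is not an official PageCrawl client. The port, the <code>summarize</code> and <code>serve</code> helpers, and the choice of printed fields are illustrative assumptions. Fields are read with <code>.get()</code> because a customized payload may leave some of them out.</p>
<pre><code class="language-python">import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize(payload: dict) -> str:
    """Build a one-line description from a few documented payload fields."""
    title = payload.get("title", "(untitled)")
    event = payload.get("event_type", "unknown")
    parts = [f"[{event}] {title}"]
    summary = payload.get("short_summary") or payload.get("human_difference")
    if summary:
        parts.append(summary)
    if payload.get("ai_priority_score") is not None:
        parts.append(f"priority {payload['ai_priority_score']}")
    return " | ".join(parts)


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON body; Content-Length says how many bytes to expect.
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:  # covers malformed JSON and undecodable bytes
            self.send_response(400)
            self.end_headers()
            return
        print(summarize(payload))
        # Acknowledge receipt with an empty 200 response.
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Listen on all interfaces; the port number is an arbitrary example."""
    HTTPServer(("0.0.0.0", port), WebhookHandler).serve_forever()</code></pre>
<p>To try it, call <code>serve()</code>, make the port reachable at a public URL, enter that URL as the webhook target, and click <strong>Test</strong>. For the example payload above, the handler would print <code>[change_detected] Product Page - Example.com | Price updated from $99 to $89 | priority 85</code>.</p>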
<h3>The page_elements Array</h3>
<p>When you include the <strong>Page Info</strong> fields, the payload contains a <code>page_elements</code> array with one entry per tracked element on the monitor. Each entry includes two identifiers:</p>
<ul>
<li><strong><code>element_id</code></strong> - the stable id of the tracked element. This id stays the same across every check, so use it to match a value to a specific tracked element in your own system.</li>
<li><strong><code>id</code></strong> - the id of this particular reading (the per-check record). This changes on every check, so do not use it as a stable key.</li>
</ul>
<pre><code class="language-json">{
  "id": 12345,
  "title": "Product Page - Example.com",
  "page_elements": [
    {
      "id": 998877,
      "element_id": 4012,
      "label": "Price",
      "type": "price",
      "contents": "89.00",
      "original": "Price: $89.00",
      "difference": -10,
      "human_difference": "Price dropped 10%",
      "hash": "ab12cd34",
      "changed": true,
      "short_summary": "Price updated from $99 to $89"
    }
  ]
}</code></pre>
<p>The <code>previous_check.page_elements</code> array (when included) uses the same shape, so you can pair an element's current and prior values by <code>element_id</code>.</p>
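<p>The following sketch shows that pairing. It assumes a payload that includes both <code>page_elements</code> and <code>previous_check.page_elements</code>, and the helper name <code>pair_by_element_id</code> is hypothetical. The stable <code>element_id</code> serves as the join key, and the per-check <code>id</code> is ignored.</p>
<pre><code class="language-python">def pair_by_element_id(payload: dict) -> dict:
    """Map each element_id to a (previous contents, current contents) tuple.

    Keys on the stable element_id. The per-check "id" changes on every check,
    so it is deliberately not used. Previous contents are None when the
    previous check is missing or did not include that element.
    """
    previous_check = payload.get("previous_check") or {}
    previous = {el["element_id"]: el for el in previous_check.get("page_elements", [])}
    pairs = {}
    for el in payload.get("page_elements", []):
        prev = previous.get(el["element_id"])
        pairs[el["element_id"]] = (prev.get("contents") if prev else None, el.get("contents"))
    return pairs</code></pre>
<p>For the example above, if the previous check recorded <code>"99.00"</code> for element 4012, the result would be <code>{4012: ("99.00", "89.00")}</code>.</p>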
<h3>Use Cases</h3>
<ul>
<li><strong>Custom dashboards</strong> - Feed change data into your own monitoring dashboard</li>
<li><strong>Database logging</strong> - Store all detected changes in your own database</li>
<li><strong>Automation workflows</strong> - Trigger actions in tools like n8n, Make, or custom scripts</li>
<li><strong>Alerting systems</strong> - Forward high-priority changes to PagerDuty, Opsgenie, or similar tools</li>
</ul>
<h3>Notes</h3>
<ul>
<li>Webhooks send data as HTTP POST with a JSON body</li>
<li>If you need Slack, Discord, or Teams notifications, use the dedicated integrations instead, as they format messages correctly for those platforms</li>
<li>Webhooks are available on all plans</li>
</ul>
<h3>Related Articles</h3>
<ul>
<li><a href="/developers">Full API Reference</a> - Interactive OpenAPI reference for the complete webhook payload schema and all related API endpoints</li>
<li><a href="/help/features/article/api-webhooks-for-custom-integrations">API and Webhooks for Custom Integrations</a> - Authentication and endpoint overview</li>
</ul>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Email Notifications for Website Change Detection]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/notifications/article/email-notifications" />
            <id>https://pagecrawl.io/92</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Email Notifications for Website Change Detection</h1>
<p>Email is the default notification channel in PageCrawl. It is enabled on all plans and requires no additional setup. As soon as you add a page to monitor, you will receive email notifications whenever changes are detected.</p>
<h3>What's Included in Email Notifications</h3>
<p>Every change notification email includes:</p>
<ul>
<li><strong>AI summary</strong> - A plain-language explanation of what changed on the page</li>
<li><strong>Priority score</strong> - An importance score from 0 to 100 so you can quickly assess relevance</li>
<li><strong>Text diff with highlighting</strong> - Changed content is highlighted so you can see exactly what was added, removed, or modified</li>
<li><strong>Keyword matches</strong> - If you have keyword rules configured, matching keywords are highlighted in the notification</li>
</ul>
<h3>Email Attachments</h3>
<p>Email notifications can include several attachments to give you a complete picture of the change:</p>
<ul>
<li><strong>Screenshot</strong> - A full-page screenshot of the page at the time of the change (enabled by default)</li>
<li><strong>Visual diff screenshot</strong> - A side-by-side or overlay comparison showing visual differences</li>
<li><strong>Text diff image</strong> - A rendered image of the text diff for easy sharing</li>
<li><strong>Text file</strong> - A plain text file containing the diff content</li>
</ul>
<p>You can configure which attachments are included at <strong>Settings &gt; Workspace &gt; Notifications</strong>. Email is enabled as a notification channel by default in the page editor.</p>
<div class="kb-figure">
  <img src="/images/knowledge/simple-notify.png" alt="Notify me via section of the page editor with the Email channel enabled">
</div>
<h3>Additional Recipients</h3>
<p>On paid plans, you can add additional recipients to your change notifications:</p>
<ul>
<li><strong>CC</strong> - Add email addresses to receive a copy of every notification</li>
<li><strong>BCC</strong> - Add email addresses to receive a blind copy</li>
</ul>
<p>This is useful for keeping team members, clients, or stakeholders informed without requiring them to have a PageCrawl account.</p>
<h3>Notification Frequency</h3>
<p>Email notification frequency is configured at <strong>Settings &gt; Workspace &gt; Alerts &amp; Reports</strong>. You have two options:</p>
<ul>
<li><strong>Off</strong> - Email notifications are disabled</li>
<li><strong>Send instant notification for every change</strong> - You receive an email as soon as a change is detected</li>
</ul>
<h4>Scheduled Summary Reports</h4>
<p>If you prefer to receive changes in batches (daily, weekly, or monthly digests), use the <strong>Scheduled Summary Reports</strong> feature. You can find it under the <strong>Digests</strong> tab in the same <strong>Settings &gt; Workspace &gt; Alerts &amp; Reports</strong> area. Scheduled summary reports let you bundle changes across multiple monitors into a single consolidated email delivered on the schedule you choose.</p>
<h3>Diff Display Options</h3>
<p>You can customize how text differences are displayed in your email notifications:</p>
<ul>
<li><strong>Highlight mode</strong> - Choose between highlighting by lines, by words, or both</li>
<li><strong>Content filter</strong> - Show everything, changed content only, added content only, or removed content only</li>
</ul>
<p>These options let you focus on the type of changes that matter most to you.</p>
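<p>PageCrawl's own diff engine is not documented here. The sketch below only illustrates what line-level versus word-level highlighting and the added/removed filters mean. It uses Python's standard <code>difflib.SequenceMatcher</code>, and the <code>diff_tokens</code> function and its <code>mode</code> and <code>show</code> parameters are hypothetical.</p>
<pre><code class="language-python">import difflib


def diff_tokens(old: str, new: str, mode: str = "words", show: str = "changed"):
    """Return (tag, text) pairs: "+" added, "-" removed, " " unchanged.

    mode: "lines" compares whole lines; "words" compares whitespace-separated words.
    show: "all", "changed", "added", or "removed".
    """
    split = str.splitlines if mode == "lines" else str.split
    a, b = split(old), split(new)
    result = []
    for op, i1, i2, j1, j2 in difflib.SequenceMatcher(None, a, b).get_opcodes():
        if op == "equal":
            result += [(" ", t) for t in a[i1:i2]]
            continue
        # "replace" yields both a removal and an addition; "delete"/"insert" yield one.
        result += [("-", t) for t in a[i1:i2]]
        result += [("+", t) for t in b[j1:j2]]
    wanted = {"all": " +-", "changed": "+-", "added": "+", "removed": "-"}[show]
    return [(tag, text) for tag, text in result if tag in wanted]</code></pre>
<p>With <code>mode="words"</code>, changing "Price: $99 today" to "Price: $89 today" yields only <code>[("-", "$99"), ("+", "$89")]</code>. With <code>mode="lines"</code>, the whole line is reported as removed and re-added. Highlighting by both can be thought of as marking the changed lines and then the changed words within them.</p>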
<h3>Domain-Based Grouping</h3>
<p>When you are monitoring 5 or more pages on the same domain, PageCrawl can group notifications by domain. To enable this, turn on the <strong>Group emails by domain</strong> setting in your workspace notification preferences. Once enabled, changes from the same domain are bundled into a single email, keeping your inbox organized and making it easier to review related changes together.</p>
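<p>The grouping rule can be pictured with the sketch below. It illustrates the documented threshold only and is not PageCrawl's implementation. It assumes that <strong>Group emails by domain</strong> is on, that the function name <code>plan_emails</code> is hypothetical, and that a "domain" is the URL's exact hostname, so <code>www.example.com</code> and <code>example.com</code> count separately.</p>
<pre><code class="language-python">from collections import defaultdict
from urllib.parse import urlsplit

GROUPING_THRESHOLD = 5  # "5 or more pages on the same domain", per the article


def plan_emails(monitored_urls: list[str], changed_urls: list[str]) -> list[list[str]]:
    """Return one list of changed URLs per email that would be sent."""
    pages_per_domain = defaultdict(int)
    for url in monitored_urls:
        pages_per_domain[urlsplit(url).hostname] += 1

    singles, grouped = [], defaultdict(list)
    for url in changed_urls:
        domain = urlsplit(url).hostname
        if pages_per_domain[domain] >= GROUPING_THRESHOLD:
            grouped[domain].append(url)  # bundled into one email for the domain
        else:
            singles.append([url])  # sent as its own email
    return singles + list(grouped.values())</code></pre>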
<h3>AI Feedback</h3>
<p>Each email notification includes feedback links that let you mark a change as <strong>Important</strong> or <strong>Noise</strong>. PageCrawl's AI learns from your feedback and uses it to improve future importance scoring, so over time you receive fewer irrelevant notifications.</p>
<h3>Other Supported Notification Channels</h3>
<p>PageCrawl supports several other notification channels to suit your preferences:</p>
<ul>
<li><a href="/help/notifications/article/send-slack-notification-when-changes-detected">Slack notifications</a></li>
<li><a href="/help/integrations/article/track-website-changes-integrate-with-discord-notifications">Discord notifications</a></li>
<li><a href="/help/integrations/article/send-microsoft-teams-notification-when-changes-detected">Microsoft Teams notifications</a></li>
<li><a href="/help/integrations/article/track-website-changes-integrate-with-telegram-notifications">Telegram notifications</a></li>
<li><a href="/help/integrations/article/webhook-integration">Webhook integration</a></li>
<li><a href="/help/integrations/article/pagecrawl-zapier-integration">Zapier integration</a></li>
</ul>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Notification Conditions and Filters]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/notifications/article/notification-conditions" />
            <id>https://pagecrawl.io/93</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Notification Conditions and Filters</h1>
<p>Conditions and Filters let you control which changes trigger notifications on a per-page basis. Instead of receiving a notification for every detected change, you can define rules so that only meaningful changes are reported.</p>
<p>When adding a page, the simple setup mode offers common conditions directly for price, number, and selected area tracking (such as price thresholds, percentage change alerts, and keyword monitoring). For the full set of conditions described below, click "More options" to switch to Advanced Mode. When editing an existing page, toggle <strong>Advanced Mode</strong> on. In both cases, you will find the <strong>Conditions &amp; Filters</strong> section.</p>
<h3>How to Enable Conditions</h3>
<p>In the page editor, switch to Advanced Mode (click "More options" when adding a new page, or toggle "Advanced Mode" when editing an existing page). Look for the <strong>Conditions &amp; Filters</strong> section with the description: "Looking for specific changes or alerts for certain keywords? Customize conditions to minimize unnecessary change alerts."</p>
<p>Toggle the switch on to enable conditions. Once enabled, you can add one or more conditions by clicking the <strong>Add Condition</strong> button.</p>
<div class="kb-figure">
  <img src="/images/knowledge/settings-conditions.png" alt="Conditions & Filters section enabled in the page editor with the Add Condition button">
</div>
<h3>AND / OR Logic</h3>
<p>When you have multiple conditions, you can choose how they are evaluated using the <strong>Match all conditions</strong> toggle:</p>
<ul>
<li><strong>On (AND)</strong> - All conditions must be met for the notification to trigger</li>
<li><strong>Off (OR)</strong> - Any single condition being met will trigger the notification</li>
</ul>
<p>This lets you build precise rules. For example, with AND logic you could require that a specific keyword appeared AND a price dropped below a threshold.</p>
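<p>In code, the <strong>Match all conditions</strong> toggle amounts to choosing between <code>all()</code> and <code>any()</code>. The sketch below uses a simplified, hypothetical change record with <code>added_text</code> and <code>price</code> keys rather than PageCrawl's internal data.</p>
<pre><code class="language-python">from typing import Callable

Condition = Callable[[dict], bool]


def should_notify(conditions: list[Condition], change: dict, match_all: bool) -> bool:
    """match_all=True mirrors the toggle ON (AND); False mirrors OFF (OR)."""
    results = (condition(change) for condition in conditions)
    return all(results) if match_all else any(results)


def sale_appeared(change: dict) -> bool:
    return "sale" in change["added_text"].lower()


def price_under_100(change: dict) -> bool:
    return 100 > change["price"]</code></pre>
<p>For a change that adds "Summer Sale" but leaves the price at 120, <code>should_notify([sale_appeared, price_under_100], change, match_all=True)</code> returns <code>False</code>. With <code>match_all=False</code>, it returns <code>True</code>.</p>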
<h3>Always Record Change Detections</h3>
<p>By default, when conditions are not met, the change detection is not recorded and no notification is sent. This means the next check compares against the last version that did meet conditions.</p>
<p>Enable <strong>Always record change detections</strong> to record every change regardless of whether conditions are met, but only send notifications when conditions match. This is particularly useful with one-directional conditions like "Keyword appeared" or "Keyword disappeared", where skipping unmatched detections could cause the condition to never trigger again.</p>
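<p>The baseline effect is easier to see in a small simulation. This sketch runs a sequence of hypothetical page texts through a simplified "Keyword appeared" check that uses plain substring matching. It models the behavior described above and is not PageCrawl's code.</p>
<pre><code class="language-python">def replay(snapshots: list[str], keyword: str, always_record: bool) -> list[int]:
    """Return the indexes of checks that would send a notification.

    The baseline is the version each new check is compared against. A detection
    becomes the new baseline only if it met the condition or always_record is on.
    """
    baseline = snapshots[0]
    notified = []
    for index, text in enumerate(snapshots[1:], start=1):
        if text == baseline:
            continue  # nothing changed relative to the baseline
        appeared = keyword in text and keyword not in baseline
        if appeared:
            notified.append(index)
        if appeared or always_record:
            baseline = text  # record this detection
    return notified</code></pre>
<p>Consider checks that alternate between "In stock" and "Sold out". <code>replay(["In stock", "Sold out", "In stock", "Sold out", "In stock"], "In stock", always_record=False)</code> returns <code>[]</code> because the baseline never leaves "In stock". With <code>always_record=True</code>, the same call returns <code>[2, 4]</code>.</p>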
<h3>Most Common Condition</h3>
<h4>Keyword Appeared or Disappeared</h4>
<p>The most commonly used condition. It triggers a notification only when a specific keyword is added to or removed from the page.</p>
<p>Enter one or more keywords (each keyword is a separate tag). The condition is met when any of the specified keywords appears in newly added text or in removed text.</p>
<p><strong>Match mode options</strong> control how keywords are compared against the page text:</p>
<table>
<thead>
<tr>
<th>Match Mode</th>
<th>Case Sensitive</th>
<th>Whole Word</th>
<th>Example: keyword "assist"</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Match any text</strong> (default)</td>
<td>No</td>
<td>No</td>
<td>Matches "assist", "Assist", "assistance", "ASSISTANT"</td>
</tr>
<tr>
<td><strong>Match any text (case sensitive)</strong></td>
<td>Yes</td>
<td>No</td>
<td>Matches "assist", "assistance" but not "Assist"</td>
</tr>
<tr>
<td><strong>Match exact words only</strong></td>
<td>No</td>
<td>Yes</td>
<td>Matches "assist", "ASSIST" but not "assistance"</td>
</tr>
<tr>
<td><strong>Match exact words (case sensitive)</strong></td>
<td>Yes</td>
<td>Yes</td>
<td>Matches only "assist" exactly</td>
</tr>
</tbody>
</table>
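<p>The four modes in the table can be approximated with Python's <code>re</code> module, using <code>\b</code> word boundaries for whole-word matching. This is an assumption about how the modes behave, not PageCrawl's code. In particular, <code>\b</code> behaves differently for keywords that start or end with punctuation, such as <code>$99</code>.</p>
<pre><code class="language-python">import re


def keyword_matches(keyword: str, text: str, case_sensitive: bool, whole_word: bool) -> bool:
    """Return True if keyword occurs in text under the chosen match mode."""
    pattern = re.escape(keyword)
    if whole_word:
        # Word boundaries on both sides stop "assist" from matching inside "assistance".
        pattern = rf"\b{pattern}\b"
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.search(pattern, text, flags) is not None</code></pre>
<p>For example, <code>keyword_matches("assist", "assistance", case_sensitive=False, whole_word=True)</code> returns <code>False</code>, which matches the third row of the table.</p>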
<h3>Filters</h3>
<p>Filters remove noise by excluding certain types of changes from triggering notifications.</p>
<h4>Ignore Text</h4>
<p>Exclude specific words, sentences, or patterns from change detection. Place each entry on a separate line. This is useful for text that changes frequently but is not relevant, like timestamps, cookie banners, or dynamic counters.</p>
<p><strong>Supported patterns:</strong></p>
<ul>
<li><strong>Exact text</strong> - Enter the exact text to ignore (e.g., <code>This website uses cookies</code>)</li>
<li><strong>Wildcard (%)</strong> - Use <code>%</code> to match any text within a line. For example, <code>%Published at%</code> will ignore any line containing "Published at", such as "Published at: 2024-12-24 by John"</li>
<li><strong>Regular expressions</strong> - Wrap patterns in forward slashes for regex matching (e.g., <code>/custom-regex-pattern-\d+/</code>). Requires a paid plan.</li>
</ul>
<p>Note: If the ignored text line is replaced with a new line that is not in the filter, the change detection will still trigger.</p>
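<p>The sketch below shows one way these entries could be applied to page text line by line. It assumes that exact entries must equal a whole line, that <code>%</code> stands for any run of characters within a line, and that <code>/.../</code> entries are regular expressions searched anywhere in the line. The function names are hypothetical.</p>
<pre><code class="language-python">import re


def ignore_rule(entry: str):
    """Turn one "Ignore text" line into a predicate that tests a single page line."""
    entry = entry.strip()
    if len(entry) > 2 and entry.startswith("/") and entry.endswith("/"):
        regex = re.compile(entry[1:-1])  # /pattern/ is treated as a regular expression
        return lambda line: regex.search(line) is not None
    # Escape the literal parts and join them with ".*" wherever a % wildcard appeared.
    regex = re.compile(".*".join(re.escape(part) for part in entry.split("%")))
    return lambda line: regex.fullmatch(line.strip()) is not None


def filter_lines(text: str, entries: list[str]) -> list[str]:
    """Drop every line matched by any ignore entry; the rest is compared as usual."""
    rules = [ignore_rule(entry) for entry in entries if entry.strip()]
    return [line for line in text.splitlines() if not any(rule(line) for rule in rules)]</code></pre>
<p>Filtering removes only the matching lines, so any line that does not match an entry is still compared. This is why a replaced line can still trigger a detection.</p>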
<h4>Ignore Numbers</h4>
<p>Prevents any numeric changes on the page from triggering change detections. Useful when pages contain counters, view counts, or other dynamic numbers that are not relevant to you.</p>
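<p>One way to picture this filter is to mask every number before comparing two versions of the text. The sketch below illustrates the general idea under that assumption and is not PageCrawl's implementation. Its pattern treats digits joined by commas or periods, such as <code>1,204</code> or <code>4.5</code>, as a single number.</p>
<pre><code class="language-python">import re

# A digit run that may contain "," or "." separators, or a single digit.
NUMBER = re.compile(r"\d[\d,.]*\d|\d")


def changed_ignoring_numbers(old: str, new: str) -> bool:
    """True if the texts still differ once every number is replaced with "#"."""
    return NUMBER.sub("#", old) != NUMBER.sub("#", new)</code></pre>
<p>Treating separators as part of the number means that a counter going from 999 to 1,000 is also ignored, because both values become a single <code>#</code>.</p>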
<h3>Text Conditions</h3>
<p>These conditions let you control notifications based on specific text content. They are available for text-based tracked elements (not visual elements).</p>
<h4>Keyword Appeared</h4>
<p>Triggers when a keyword is added to the page. Unlike "Keyword appeared or disappeared", this will <strong>not</strong> notify you when a keyword is removed.</p>
<p><strong>Important:</strong> If "Always record change detections" is not enabled, using this condition alone can cause missed detections. When the keyword is not found, no change is recorded, so the comparison baseline never updates. We recommend using "Keyword appeared or disappeared" instead, or enabling "Always record change detections".</p>
<h4>Keyword Disappeared</h4>
<p>Triggers when a keyword is removed from the page. The condition compares the current check with the previous one and fires if the keyword was present before but is now gone.</p>
<p>The same warning about "Always record change detections" applies here.</p>
<h4>Exact Match</h4>
<p>Available for individual tracked elements (not full page monitors). The condition is met when the element's text matches the specified value exactly.</p>
<h4>Doesn't Match</h4>
<p>Available for individual tracked elements (not full page monitors). The condition is met when the element's text does not match the specified value exactly.</p>
<h4>Text Exists</h4>
<p>The condition is met when the tracked element's text contains any of the specified keywords. Best used in combination with other conditions, for example: "the page must always contain the text 'Welcome' AND a keyword appeared." If you only need to know when text is added or removed, use "Keyword appeared or disappeared" instead.</p>
<h4>Text Doesn't Exist</h4>
<p>The condition is met when the tracked element's text does not contain any of the specified keywords. Useful for combined conditions like "the page does not contain 'Website failed to load' AND a change was detected." If you only need to know when text is added or removed, use "Keyword appeared or disappeared" instead.</p>
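<p>The text conditions above reduce to simple presence and equality checks. In this sketch, matching is a case-sensitive substring test. "Keyword appeared" is approximated by checking whether the keyword is present in the element's whole text before and after the check, rather than inspecting only the newly added text. The function names are hypothetical.</p>
<pre><code class="language-python">def keyword_appeared(keywords: list[str], previous: str, current: str) -> bool:
    """Met if any keyword is present now but was absent in the previous check."""
    return any(k in current and k not in previous for k in keywords)


def keyword_disappeared(keywords: list[str], previous: str, current: str) -> bool:
    """Met if any keyword was present in the previous check but is gone now."""
    return any(k in previous and k not in current for k in keywords)


def exact_match(value: str, current: str) -> bool:
    return current == value


def doesnt_match(value: str, current: str) -> bool:
    return current != value


def text_exists(keywords: list[str], current: str) -> bool:
    """Met if the element's text contains at least one keyword."""
    return any(k in current for k in keywords)


def text_doesnt_exist(keywords: list[str], current: str) -> bool:
    """Met if the element's text contains none of the keywords."""
    return not text_exists(keywords, current)</code></pre>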
<h3>Number and Price Conditions</h3>
<p>These conditions are only available for "Number" and "Price detect" tracked elements. They allow you to set thresholds and track numeric changes with precision.</p>
<h4>Comparison Conditions</h4>
<table>
<thead>
<tr>
<th>Condition</th>
<th>Description</th>
<th>Example</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Greater than</strong></td>
<td>Triggers when the number exceeds the specified value</td>
<td>Value is 150, triggers when number &gt; 150</td>
</tr>
<tr>
<td><strong>Greater than or equals</strong></td>
<td>Triggers when the number is at or above the specified value</td>
<td>Value is 150, triggers when number &gt;= 150</td>
</tr>
<tr>
<td><strong>Less than</strong></td>
<td>Triggers when the number drops below the specified value</td>
<td>Value is 50, triggers when number &lt; 50</td>
</tr>
<tr>
<td><strong>Less than or equals</strong></td>
<td>Triggers when the number is at or below the specified value</td>
<td>Value is 50, triggers when number &lt;= 50</td>
</tr>
</tbody>
</table>
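<p>These four conditions correspond directly to Python's standard comparison operators. In the sketch below, the condition keys and the <code>comparison_met</code> helper are hypothetical.</p>
<pre><code class="language-python">import operator

# Each comparison condition maps to a binary operator applied as: number OP threshold.
COMPARISONS = {
    "greater_than": operator.gt,
    "greater_than_or_equals": operator.ge,
    "less_than": operator.lt,
    "less_than_or_equals": operator.le,
}


def comparison_met(condition: str, number: float, threshold: float) -> bool:
    return COMPARISONS[condition](number, threshold)</code></pre>
<p>For example, <code>comparison_met("greater_than", 150, 150)</code> returns <code>False</code>, and <code>"greater_than_or_equals"</code> with the same arguments returns <code>True</code>. Whether the threshold value itself triggers is the only difference between the first two rows.</p>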
<h4>Change-Based Conditions</h4>
<p>These conditions compare the current value against the previous value to detect significant changes.</p>
<table>
<thead>
<tr>
<th>Condition</th>
<th>Description</th>
<th>Example</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Increased or Decreased by at least x percent</strong></td>
<td>Triggers when the number changes in either direction by at least x%.</td>
<td>Value is 10, x is 20%. Triggers when value becomes 12 or more, or 8 or less.</td>
</tr>
<tr>
<td><strong>Increased or Decreased by at least x</strong></td>
<td>Triggers when the number changes in either direction by at least x (absolute).</td>
<td>Value is 10, x is 5. Triggers when value becomes 15 or more, or 5 or less.</td>
</tr>
<tr>
<td><strong>Increased by at least x percent</strong></td>
<td>Triggers only when the number goes up by at least x%.</td>
<td>Value is 10, x is 20%. Triggers when value becomes 12 or more.</td>
</tr>
<tr>
<td><strong>Increased by at least x</strong></td>
<td>Triggers only when the number goes up by at least x (absolute).</td>
<td>Value is 10, x is 5. Triggers when value becomes 15 or more.</td>
</tr>
<tr>
<td><strong>Decreased by at least x percent</strong></td>
<td>Triggers only when the number goes down by at least x%.</td>
<td>Value is 10, x is 20%. Triggers when value becomes 8 or less.</td>
</tr>
<tr>
<td><strong>Decreased by at least x</strong></td>
<td>Triggers only when the number goes down by at least x (absolute).</td>
<td>Value is 10, x is 5. Triggers when value becomes 5 or less.</td>
</tr>
</tbody>
</table>
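<p>The arithmetic behind these rows can be sketched as follows. The condition keys are hypothetical. <code>x</code> is a percentage for the <code>_percent</code> variants and an absolute amount otherwise. Returning <code>False</code> when the previous value is 0 is an assumption, since a percentage change from zero is undefined.</p>
<pre><code class="language-python">def change_met(condition: str, previous: float, current: float, x: float) -> bool:
    """Evaluate a change-based condition.

    condition is one of: "either", "either_percent", "increased",
    "increased_percent", "decreased", "decreased_percent".
    """
    delta = current - previous
    if condition.endswith("_percent"):
        if previous == 0:
            return False
        # |delta| / |previous| * 100 >= x, rearranged to avoid division.
        big_enough = abs(delta) * 100 >= x * abs(previous)
    else:
        big_enough = abs(delta) >= x
    if condition.startswith("increased"):
        return big_enough and current > previous
    if condition.startswith("decreased"):
        return big_enough and previous > current
    return big_enough  # "either": any direction counts</code></pre>
<p>Multiplying instead of dividing keeps the boundary cases in the table exact. A change from 10 to 12 with <code>x</code> = 20 counts as "at least 20 percent".</p>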
<h3>Feed Conditions</h3>
<p>When a monitor uses the <strong>Feed</strong> tracking mode (RSS, Atom, or other feed formats), additional feed-specific conditions become available:</p>
<table>
<thead>
<tr>
<th>Condition</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Feed item added</strong></td>
<td>Triggers when a new item appears in the feed</td>
</tr>
<tr>
<td><strong>Feed item removed</strong></td>
<td>Triggers when an item is removed from the feed</td>
</tr>
<tr>
<td><strong>Feed item changed</strong></td>
<td>Triggers when an existing feed item's content is modified</td>
</tr>
<tr>
<td><strong>Feed order changed</strong></td>
<td>Triggers when the order of items in the feed changes</td>
</tr>
<tr>
<td><strong>Feed price changed</strong></td>
<td>Triggers when a price value within a feed item changes</td>
</tr>
</tbody>
</table>
<p>These conditions are only shown when the monitor is configured with a Feed tracking mode.</p>
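<p>How PageCrawl identifies individual feed items is not described here. If each item carries a unique identifier, such as an RSS <code>guid</code> or an Atom <code>id</code>, the five conditions could be derived by comparing two snapshots keyed on that identifier. The item shape and event names below are hypothetical.</p>
<pre><code class="language-python">def feed_changes(previous: list[dict], current: list[dict]) -> set[str]:
    """Compare two feed snapshots whose items are dicts with a unique "id"."""
    prev_by_id = {item["id"]: item for item in previous}
    curr_by_id = {item["id"]: item for item in current}
    events = set()
    if set(curr_by_id) - set(prev_by_id):
        events.add("feed_item_added")
    if set(prev_by_id) - set(curr_by_id):
        events.add("feed_item_removed")
    # Items present in both snapshots, listed in each snapshot's own order.
    kept_now = [i for i in curr_by_id if i in prev_by_id]
    kept_before = [i for i in prev_by_id if i in curr_by_id]
    if kept_now != kept_before:
        events.add("feed_order_changed")
    for item_id in kept_now:
        before, after = prev_by_id[item_id], curr_by_id[item_id]
        if before != after:
            events.add("feed_item_changed")
        if before.get("price") != after.get("price"):
            events.add("feed_price_changed")
    return events</code></pre>
<p>In this sketch, a price change also counts as an item change. Whether PageCrawl reports both in that case is not stated in this article.</p>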
<h3>Comparison Conditions</h3>
<p>When a monitor belongs to a <strong>product comparison group</strong>, additional comparison conditions become available:</p>
<table>
<thead>
<tr>
<th>Condition</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Cheapest in group</strong></td>
<td>Triggers when this monitor's price becomes the lowest in the comparison group</td>
</tr>
<tr>
<td><strong>Most expensive in group</strong></td>
<td>Triggers when this monitor's price becomes the highest in the comparison group</td>
</tr>
<tr>
<td><strong>Price spread</strong></td>
<td>Triggers based on the price difference between the cheapest and most expensive items in the group</td>
</tr>
</tbody>
</table>
<p>These conditions are only shown when the monitor is part of a product comparison group.</p>
<h3>Practical Examples</h3>
<p><strong>Price drop alert:</strong> Monitor a product price with a "Number" tracked element. Add a "Less than" condition with your target price. You will only be notified when the price falls below your threshold.</p>
<p><strong>Stock availability:</strong> Track the product's stock status text and add a "Keyword appeared or disappeared" condition with the keyword "Out of Stock". You will be notified on the first check after that label appears or disappears.</p>
<p><strong>Ignore cookie banners:</strong> Add an "Ignore text" filter with entries like <code>This website uses cookies</code> and <code>Accept all cookies</code> to prevent cookie consent changes from triggering notifications.</p>
<p><strong>Significant price changes only:</strong> Use "Increased or Decreased by at least x percent" with a value of 10 to only be notified when a price changes by 10% or more, filtering out minor fluctuations.</p>
<p><strong>Combined conditions:</strong> Monitor a product page with AND logic: "Keyword appeared" for "Sale" combined with "Less than" 100 on the price element. You will only be notified when the product goes on sale AND the price drops below 100.</p>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Web Push Notifications for Instant Website Change Alerts]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/notifications/article/web-push-notifications" />
            <id>https://pagecrawl.io/94</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Web Push Notifications for Instant Website Change Alerts</h1>
<p>Web push notifications deliver instant alerts directly to your browser when PageCrawl detects a change on your monitored pages. No extra apps, no browser extensions, and no webhook configuration needed.</p>
<h3>How It Works</h3>
<p>When a monitored page changes, PageCrawl sends a native browser notification to all your subscribed devices. You'll see the notification even when PageCrawl.io isn't open in your browser.</p>
<p>If AI summarization is enabled for the page, the notification includes a brief summary explaining what changed, so you can decide at a glance whether to investigate.</p>
<h3>Setting Up Push Notifications</h3>
<ol>
<li>Go to <strong>Settings</strong> &gt; <strong>Account Settings</strong></li>
<li>Click <strong>Enable Push Notifications</strong></li>
<li>Accept the browser permission prompt</li>
</ol>
<p>That's it. Notifications start immediately.</p>
<div class="kb-figure">
  <img src="/images/knowledge/account-push.png" alt="Browser Push Notifications section in Account Settings with the enable-on-this-device control and registered devices list">
</div>
<h3>Managing Devices</h3>
<p>You can subscribe on multiple devices (desktop, laptop, phone, tablet). Each device receives notifications independently. To manage your subscribed devices:</p>
<ol>
<li>Go to <strong>Settings</strong> &gt; <strong>Account Settings</strong></li>
<li>View your subscribed devices under <strong>Push Notifications</strong></li>
<li>Remove old devices or send a test notification to verify the setup</li>
</ol>
<h3>Supported Browsers</h3>
<table>
<thead>
<tr>
<th>Browser</th>
<th>Desktop</th>
<th>Mobile</th>
</tr>
</thead>
<tbody>
<tr>
<td>Chrome</td>
<td>Yes</td>
<td>Yes (Android)</td>
</tr>
<tr>
<td>Firefox</td>
<td>Yes</td>
<td>Yes (Android)</td>
</tr>
<tr>
<td>Edge</td>
<td>Yes</td>
<td>-</td>
</tr>
<tr>
<td>Safari 16+</td>
<td>Yes (macOS)</td>
<td>Yes (iOS 16.4+, added to Home Screen)</td>
</tr>
</tbody>
</table>
<h3>Push Notifications vs. Other Channels</h3>
<table>
<thead>
<tr>
<th>Channel</th>
<th>Setup</th>
<th>Speed</th>
<th>Best For</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Web Push</strong></td>
<td>None</td>
<td>Instant</td>
<td>Personal monitoring, time-sensitive changes</td>
</tr>
<tr>
<td><strong>Email</strong></td>
<td>None</td>
<td>Minutes</td>
<td>Searchable archive, batch review</td>
</tr>
<tr>
<td><strong>Slack</strong></td>
<td>Webhook URL</td>
<td>Instant</td>
<td>Team collaboration</td>
</tr>
<tr>
<td><strong>Discord</strong></td>
<td>Webhook URL</td>
<td>Instant</td>
<td>Community monitoring</td>
</tr>
<tr>
<td><strong>Teams</strong></td>
<td>Webhook URL</td>
<td>Instant</td>
<td>Enterprise environments</td>
</tr>
<tr>
<td><strong>Telegram</strong></td>
<td>Chat ID</td>
<td>Instant</td>
<td>Mobile-first users</td>
</tr>
</tbody>
</table>
<h3>Combining Channels</h3>
<p>You can use push notifications alongside other channels. A common setup:</p>
<ul>
<li><strong>Push</strong> for urgent, time-sensitive alerts (price drops, restocks)</li>
<li><strong>Email</strong> for a searchable archive of all changes</li>
<li><strong>Slack/Teams</strong> for changes that need team discussion</li>
</ul>
<p>Configure different notification channels per page in the page settings.</p>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Compare Product Prices Across Multiple Retailers]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/features/article/product-comparison" />
            <id>https://pagecrawl.io/95</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Compare Product Prices Across Multiple Retailers</h1>
<div style="background:#f5f7fa;border:1px solid #e6e9ef;border-radius:8px;padding:14px 18px;margin:20px 0;">
  <strong>Note:</strong> Product Comparison is available as a team-level add-on. Contact support or your account manager to enable it for your account.
</div>
<p>PageCrawl can automatically group monitors that track the same product on different websites, giving you a real-time view of how prices compare across retailers. When the competitive landscape shifts, you can get alerts and export comparison spreadsheets.</p>
<div class="kb-figure">
  <img src="/images/knowledge/simple-what-to-track.png" alt="What to Track panel with the Price tracking type used on each retailer's product page before grouping them">
</div>
<h3>What You Can Do</h3>
<table>
<thead>
<tr>
<th>Capability</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Side-by-side pricing</strong></td>
<td>See all retailer prices for a product in one place via the Matched Pages panel</td>
</tr>
<tr>
<td><strong>Comparison alerts</strong></td>
<td>Get notified when a price becomes the cheapest, most expensive, or when the spread exceeds a threshold</td>
</tr>
<tr>
<td><strong>Cross-retailer export</strong></td>
<td>Download a spreadsheet with one row per product and columns per retailer</td>
</tr>
<tr>
<td><strong>Smart suggestions</strong></td>
<td>When linking monitors, PageCrawl suggests the most relevant candidates</td>
</tr>
<tr>
<td><strong>Automatic grouping</strong></td>
<td>Monitors are grouped automatically when product identifiers match</td>
</tr>
<tr>
<td><strong>Reference labels</strong></td>
<td>Manually group monitors using labels with a shared prefix</td>
</tr>
<tr>
<td><strong>Google Sheets integration</strong></td>
<td>Include comparison data and label-based columns in automated Google Sheets exports</td>
</tr>
</tbody>
</table>
<h3>How Products Are Grouped</h3>
<p>PageCrawl uses multiple signals to determine whether two monitors on different websites track the same product. When a match is found, the monitors are placed into a comparison group automatically.</p>
<p>Matching happens after each page check and when labels are updated. If the same product is listed on five different retailer websites and each monitor is set up with price tracking, PageCrawl will link all five into a single group.</p>
<p>You can also group monitors manually from the comparison panel on any monitor's detail page, or by applying reference labels (covered below).</p>
<p>Each comparison group can contain up to 20 monitors.</p>
<h3>The Matched Pages Panel</h3>
<p>When a monitor belongs to a comparison group, its detail page shows a <strong>Matched Pages</strong> panel. This panel displays:</p>
<ul>
<li>The name and domain of each grouped monitor</li>
<li>The current tracked value (typically a price) for each</li>
<li>Quick navigation links to each compared monitor</li>
</ul>
<p>From this panel you can:</p>
<ol>
<li><strong>Add monitors</strong> - Search for other monitors to add to the group</li>
<li><strong>Remove monitors</strong> - Detach a specific monitor from the group</li>
<li><strong>View suggestions</strong> - See PageCrawl's recommended matches based on product signals</li>
</ol>
<h3>Smart Suggestions</h3>
<p>When adding monitors to a comparison group, PageCrawl ranks candidates by relevance. Suggestions consider multiple factors including product identifiers, reference labels, folder grouping, domain similarity, and name overlap.</p>
<p>If the product comparison feature is enabled, suggestions are enhanced with stronger signals from product identifiers and reference labels. Without the feature enabled, suggestions still work but rely on name and structural similarity only.</p>
<p>You can also type in the search box to filter across all monitors in your workspace.</p>
<h3>Comparison Alerts</h3>
<p>Comparison alerts notify you when a monitor's price changes its competitive position within the group. There are three alert types:</p>
<table>
<thead>
<tr>
<th>Alert Type</th>
<th>When It Fires</th>
<th>Configuration</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Cheapest</strong></td>
<td>This monitor's price is the lowest in the group</td>
<td>No additional configuration needed</td>
</tr>
<tr>
<td><strong>Most Expensive</strong></td>
<td>This monitor's price is the highest in the group</td>
<td>No additional configuration needed</td>
</tr>
<tr>
<td><strong>Price Spread</strong></td>
<td>The gap between the lowest and highest price in the group exceeds your percentage threshold</td>
<td>Set the spread threshold percentage</td>
</tr>
</tbody>
</table>
<h4>How Alerts Work</h4>
<p>Alerts are <strong>transition-based</strong>. You receive a notification when the state changes (e.g., a monitor becomes the cheapest), but not on every subsequent check where it remains the cheapest. When the condition clears, the alert resets and can fire again later.</p>
<p>For example, if Monitor A is tracking a laptop at $999 and becomes the cheapest in a group of five retailers:</p>
<ol>
<li>You receive a notification: "Laptop X is now the cheapest at $999 (range: $999 - $1,299)"</li>
<li>On subsequent checks, as long as Monitor A remains the cheapest, no new notification is sent</li>
<li>If another retailer drops to $949, Monitor A is no longer the cheapest and the alert clears</li>
<li>If Monitor A drops to $929 and becomes cheapest again, you receive a new notification</li>
</ol>
<p>Price Spread alerts work similarly. If you set a 20% threshold and the spread increases from 15% to 25%, you receive a notification. The alert clears when the spread drops below 20%.</p>
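<p>The transition rule above can be sketched in a few lines of Python. This is a hypothetical illustration, not PageCrawl's implementation. The <code>TransitionAlert</code> class and helper names are made up, ties for the lowest price count as "cheapest", and the spread is assumed to be a percentage of the lowest price, which this article does not specify.</p>

```python
def spread_percent(prices):
    """Gap between the highest and lowest price, as a % of the lowest (assumed base)."""
    low, high = min(prices), max(prices)
    return (high - low) / low * 100


def is_cheapest(monitor, group_prices):
    """True if this monitor has the lowest price in the group (ties count)."""
    return group_prices[monitor] == min(group_prices.values())


class TransitionAlert:
    """Fires only when a condition goes from not met to met."""

    def __init__(self):
        self.active = False

    def update(self, condition_met):
        # Notify only on the first check where the condition holds
        fired = condition_met and not self.active
        self.active = condition_met  # resets once the condition clears
        return fired
```

<p>Replaying the laptop example through one <code>TransitionAlert</code> with <code>is_cheapest("A", prices)</code> yields <code>True</code> at $999, <code>False</code> while Monitor A stays cheapest, <code>False</code> when another retailer drops to $949, and <code>True</code> again when Monitor A drops to $929. A Price Spread rule would pass <code>spread_percent(prices) &gt; 20</code> to its own <code>TransitionAlert</code> instead.</p>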
<h4>Setting Up Comparison Alerts</h4>
<ol>
<li>Open the monitor's settings (edit page)</li>
<li>Scroll to <strong>Comparison Alerts</strong></li>
<li>Add a new rule and select one of the comparison alert types</li>
<li>For Price Spread, enter the percentage threshold (e.g., 25 for a 25% spread)</li>
<li>Save your changes</li>
</ol>
<p>Comparison alerts are evaluated after every page check, using the most recent values from all group members. Alerts are delivered through your configured notification channels (email, Slack, Discord, Teams, Telegram, webhooks).</p>
<h3>Cross-Retailer Export</h3>
<p>Export a comparison spreadsheet to analyze all your grouped products and their prices in a single file.</p>
<h4>How to Export</h4>
<ol>
<li>Select the pages you want to include from your page list</li>
<li>Click <strong>Export</strong> from the bulk actions toolbar</li>
<li>Choose <strong>Comparison</strong> as the export type</li>
<li>Download the XLSX spreadsheet</li>
</ol>
<h4>What the Export Contains</h4>
<table>
<thead>
<tr>
<th>Column</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Product</strong></td>
<td>Product name from page metadata, or monitor name as fallback</td>
</tr>
<tr>
<td><strong>GTIN</strong></td>
<td>Global Trade Item Number if detected</td>
</tr>
<tr>
<td><strong>SKU</strong></td>
<td>Stock Keeping Unit if detected</td>
</tr>
<tr>
<td><strong>Brand</strong></td>
<td>Product brand if detected</td>
</tr>
<tr>
<td><strong>[retailer domain]</strong></td>
<td>One column per unique retailer domain, containing the current tracked value</td>
</tr>
</tbody>
</table>
<p>Each row represents one comparison group. If a group has members on amazon.com, bestbuy.com, and walmart.com, the spreadsheet will have three retailer columns.</p>
<p>If the same retailer domain appears more than once in a group (e.g., two product variants on the same site), the column headers are disambiguated with the monitor name.</p>
<p>Only monitors that belong to a comparison group are included in the export. Ungrouped monitors are excluded.</p>
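<p>The row-and-column layout described above is essentially a pivot. The Python sketch below rebuilds it from a hypothetical list of monitor dicts. The keys <code>name</code>, <code>url</code>, <code>group</code>, and <code>value</code> are illustrative and are not PageCrawl API fields. The sketch also assumes a leading <code>www.</code> is dropped from each domain, and it omits the Product, GTIN, SKU, and Brand columns for brevity.</p>

```python
from urllib.parse import urlparse


def retailer_domain(url):
    """Hostname without a leading "www." (an assumption of this sketch)."""
    return (urlparse(url).hostname or "").removeprefix("www.")


def comparison_rows(monitors):
    """One row per comparison group, one column per retailer domain."""
    groups = {}
    for m in monitors:
        if m.get("group") is None:
            continue  # ungrouped monitors are left out of the export
        groups.setdefault(m["group"], []).append(m)

    rows = []
    for group_id, members in groups.items():
        domains = [retailer_domain(m["url"]) for m in members]
        row = {"group": group_id}
        for m, domain in zip(members, domains):
            # Same domain twice in one group: append the monitor name to the header
            column = domain if domains.count(domain) == 1 else f"{domain} ({m['name']})"
            row[column] = m["value"]
        rows.append(row)
    return rows
```

<p>A spreadsheet writer would use the union of all row keys as the header and leave a cell blank wherever a group has no member on that retailer.</p>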
<h3>Reference Labels</h3>
<p>Reference labels provide a way to manually group monitors using a label prefix. This is useful when automatic matching is not sufficient, or when you want to define your own product identifiers.</p>
<h4>How Reference Labels Work</h4>
<p>Apply a label with a specific prefix to monitors that track the same product. For example:</p>
<table>
<thead>
<tr>
<th>Monitor</th>
<th>Label</th>
</tr>
</thead>
<tbody>
<tr>
<td>Laptop X on Amazon</td>
<td><code>ref:LAPTOP-X-2024</code></td>
</tr>
<tr>
<td>Laptop X on Best Buy</td>
<td><code>ref:LAPTOP-X-2024</code></td>
</tr>
<tr>
<td>Laptop X on Walmart</td>
<td><code>ref:LAPTOP-X-2024</code></td>
</tr>
</tbody>
</table>
<p>All three monitors share the label <code>ref:LAPTOP-X-2024</code>, so PageCrawl groups them together.</p>
<p>The default prefix is <code>ref</code>, but you can change it in your workspace settings.</p>
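<p>As a rough model of label-based grouping, the sketch below collects monitors whose labels share the same <code>prefix:value</code>, with a configurable prefix as described above. The input shape (a monitor name mapped to its list of labels) is hypothetical. Dropping single-member groups is a choice made for this illustration.</p>

```python
def group_by_reference(monitor_labels, prefix="ref"):
    """Group monitor names that share a `prefix:value` label.

    `monitor_labels` maps a monitor name to its labels (a hypothetical shape).
    """
    groups = {}
    for name, labels in monitor_labels.items():
        for label in labels:
            key, sep, value = label.partition(":")
            if sep and key == prefix and value:
                groups.setdefault(value, []).append(name)
    # A group with a single monitor has nothing to compare against
    return {value: names for value, names in groups.items() if len(names) >= 2}
```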
<h4>Applying Reference Labels</h4>
<p>You can apply reference labels in several ways:</p>
<ul>
<li><strong>Single page</strong>: Edit the page and add a label in the format <code>prefix:value</code></li>
<li><strong>Bulk edit</strong>: Select multiple pages, click <strong>Bulk Edit</strong>, and apply the label to all at once</li>
<li><strong>API</strong>: Use the tag management API to programmatically assign labels</li>
</ul>
<p>When a reference label is added or changed, PageCrawl automatically re-evaluates comparison groups.</p>
<h3>Tag Prefix Columns</h3>
<p>Tag prefix columns turn label prefixes into structured data columns available in exports and Google Sheets integrations.</p>
<h4>Configuration</h4>
<ol>
<li>Go to <strong>Settings &gt; Workspace &gt; Tag Prefix Columns</strong></li>
<li>Add the prefixes you want as columns (e.g., <code>sku</code>, <code>brand</code>, <code>ref</code>)</li>
<li>Optionally change the <strong>Comparison Prefix</strong> (the prefix used for product grouping)</li>
<li>Save</li>
</ol>
<table>
<thead>
<tr>
<th>Setting</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Prefix Columns</strong></td>
<td>List of prefixes to expose as export/Google Sheets columns (max 10)</td>
</tr>
<tr>
<td><strong>Comparison Prefix</strong></td>
<td>The prefix used for product comparison grouping (default: <code>ref</code>)</td>
</tr>
</tbody>
</table>
<h4>Using Tag Prefix Columns in Exports</h4>
<p>Once configured, tag prefix columns appear as available columns in your Excel and Google Sheets export settings alongside the built-in columns (name, URL, current value, etc.).</p>
<p>For example, if you configure prefixes <code>sku</code> and <code>brand</code>:</p>
<ul>
<li>A monitor with labels <code>sku:WGT-500</code> and <code>brand:Acme</code> will show <code>WGT-500</code> in the SKU column and <code>Acme</code> in the Brand column</li>
<li>Columns appear as <code>tag_sku</code> and <code>tag_brand</code> in column configuration</li>
</ul>
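<p>The mapping from labels to <code>tag_*</code> columns can be shown with a small, hypothetical Python function. This sketch leaves a cell empty when a monitor has no label for a prefix. When a monitor has several labels with the same prefix, it keeps the last one. Both behaviors are assumptions, because this article does not say how PageCrawl handles those cases.</p>

```python
def tag_prefix_columns(labels, prefixes):
    """Turn labels such as "sku:WGT-500" into export cells such as tag_sku."""
    row = {f"tag_{prefix}": "" for prefix in prefixes}
    for label in labels:
        key, sep, value = label.partition(":")
        if sep and key in prefixes:
            row[f"tag_{key}"] = value  # a later label with the same prefix wins
    return row
```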
<h4>Changing the Comparison Prefix</h4>
<p>When you change the comparison prefix (e.g., from <code>ref</code> to <code>group</code>), PageCrawl automatically re-evaluates groups for monitors that have labels with the new prefix. Existing groups built from product identifiers are not affected.</p>
<p>Note: Prefix names must be lowercase alphanumeric characters or underscores, with a maximum length of 50 characters.</p>
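<p>The naming rule maps directly to a regular expression. The hypothetical check below also rejects empty names, which is an assumption, and enforces the 10-column limit from the table above. It is a client-side sketch and does not reproduce PageCrawl's own validation.</p>

```python
import re

# Lowercase letters, digits, and underscores only, 1 to 50 characters
PREFIX_RE = re.compile(r"[a-z0-9_]{1,50}")
MAX_PREFIX_COLUMNS = 10


def is_valid_prefix(name):
    """True if the whole name matches the documented naming rule."""
    return PREFIX_RE.fullmatch(name) is not None


def validate_prefix_columns(prefixes):
    """Raise ValueError if the list breaks the naming rule or the column limit."""
    if len(prefixes) > MAX_PREFIX_COLUMNS:
        raise ValueError(f"at most {MAX_PREFIX_COLUMNS} prefix columns are allowed")
    invalid = [p for p in prefixes if not is_valid_prefix(p)]
    if invalid:
        raise ValueError(f"invalid prefix names: {invalid}")
```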
<h3>Discovered Pages and Product Matching</h3>
<p>When <a href="/help/features/article/page-discovery">Page Discovery</a> finds new pages and product comparison is enabled, PageCrawl checks whether the discovered page matches an existing monitored product. If a match is found, the discovered page shows the matched product's name and domain, helping you decide whether to add it to monitoring.</p>
<p>This is particularly useful for automatically finding the same product on newly discovered retailer pages.</p>
<h3>Best Practices</h3>
<h4>Start with Price Tracking</h4>
<p>Product comparison works best with monitors using <strong>price</strong> or <strong>number</strong> tracking modes, since these produce numeric values that can be compared. Full-page text monitors will appear in groups but cannot generate comparison alerts.</p>
<h4>Use Consistent Reference Labels</h4>
<p>If you manage a large catalog, establish a naming convention for reference labels. Using the same internal product ID across all retailers (e.g., <code>ref:INTERNAL-SKU-001</code>) ensures consistent grouping.</p>
<h4>Combine Automatic and Manual Grouping</h4>
<p>Let automatic matching handle the initial grouping, then review and adjust using reference labels for any products that were not matched correctly. Automatic and manual grouping complement each other.</p>
<h4>Set Up Alerts Selectively</h4>
<p>Rather than adding comparison alerts to every monitor, focus on the products where competitive pricing matters most. This keeps your notifications actionable and avoids alert fatigue.</p>
<h4>Use Cross-Retailer Export for Reporting</h4>
<p>Schedule regular exports to track pricing trends over time. Combined with Google Sheets integration, you can build dashboards that update automatically.</p>
<h3>Limits</h3>
<table>
<thead>
<tr>
<th>Limit</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Max group size</strong></td>
<td>20 monitors per comparison group</td>
</tr>
<tr>
<td><strong>Max prefix columns</strong></td>
<td>10 per workspace</td>
</tr>
<tr>
<td><strong>Prefix name length</strong></td>
<td>50 characters</td>
</tr>
</tbody>
</table>
<h3>Requirements</h3>
<p>Product comparison is available as a team-level add-on. Contact support or your account manager to enable it for your account.</p>
<h3>Related Articles</h3>
<ul>
<li><a href="/help/features/article/bulk-edit-pages">Bulk Edit</a> - Export and manage multiple pages at once</li>
<li><a href="/help/features/article/organized-page-monitoring">Labels, Folders &amp; Workspaces</a> - Organize your monitored pages</li>
<li><a href="/help/features/article/page-discovery">Page Discovery</a> - Automatically discover new pages to track</li>
<li><a href="/help/features/article/ai-powered-change-detection">AI Change Detection</a> - AI-powered summaries and importance scoring</li>
<li><a href="/help/features/article/advanced-configuration">Advanced Configuration</a> - Templates, tracked elements, and power user settings</li>
</ul>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Premium Residential Proxies]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/features/article/residential-proxies" />
            <id>https://pagecrawl.io/96</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Premium Residential Proxies</h1>
<p>Premium residential proxies let you monitor websites that block standard datacenter IP addresses. They use real residential internet connections from 200+ countries, making your monitoring checks appear as regular user traffic. Select a proxy location, including residential options, from the <strong>Location</strong> dropdown in the page editor.</p>
<div class="kb-figure kb-figure--flush">
  <img src="/images/knowledge/residential-proxy-countries.png" alt="Proxy location set to Residential Proxy, with the country selector open showing Any country plus United States, United Kingdom, Germany, France, Japan, and Canada">
</div>
<h3>When Do You Need Residential Proxies?</h3>
<p>Most websites work fine with the datacenter proxies already included in every PageCrawl plan. You only need residential proxies if:</p>
<ul>
<li>A website actively blocks datacenter IPs (you see 403 errors, timeouts, or blank pages after retries)</li>
<li>You need to see content as it appears in a specific country, state, or city</li>
<li>The site uses advanced bot detection that datacenter proxies cannot bypass</li>
</ul>
<p><strong>Before purchasing, try these free alternatives:</strong></p>
<ol>
<li><strong>Enable Stealth engine</strong> in your monitor settings. Stealth mode is an alternative engine that handles many bot-protected websites.</li>
<li><strong>Reduce your check frequency</strong>. Many blocks are triggered by frequent requests. Switching from every 15 minutes to hourly or daily often resolves the issue.</li>
<li><strong>Switch proxy location</strong> in your monitor settings (e.g., try London instead of New York).</li>
<li><a href="https://pagecrawl.io/contact-us">Contact support</a> for help diagnosing the issue</li>
</ol>
<h3>How Residential Proxy Bandwidth Works</h3>
<p>Residential proxies are priced at <strong>$10/GB</strong> of data transferred. Every page check consumes bandwidth based on the page size:</p>
<table>
<thead>
<tr>
<th>Page Type</th>
<th>Approximate Size Per Check</th>
</tr>
</thead>
<tbody>
<tr>
<td>Simple text page (blog, news article)</td>
<td>~0.5 MB</td>
</tr>
<tr>
<td>Standard e-commerce or listing page</td>
<td>~2 MB</td>
</tr>
<tr>
<td>Heavy page with images and scripts</td>
<td>~5 MB</td>
</tr>
</tbody>
</table>
<p><strong>Bandwidth never expires.</strong> You can purchase 1 GB today and use it over months.</p>
<h3>Cost Impact of Check Frequency</h3>
<p>Check frequency has a large impact on bandwidth consumption. Assuming a standard ~2 MB page and a 30-day month, the same 10 pages cost very different amounts depending on how often you check them:</p>
<table>
<thead>
<tr>
<th>Frequency</th>
<th>Monthly Cost for 10 Pages</th>
</tr>
</thead>
<tbody>
<tr>
<td>Daily</td>
<td>~$6 (0.6 GB)</td>
</tr>
<tr>
<td>Hourly</td>
<td>~$144 (14.4 GB)</td>
</tr>
<tr>
<td>Every 15 minutes</td>
<td>~$576 (57.6 GB)</td>
</tr>
</tbody>
</table>
<p>For most monitoring use cases, daily or hourly checks are sufficient. Only use high-frequency residential proxy checks when near real-time monitoring is essential.</p>
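<p>To estimate your own usage, multiply the number of pages by checks per day, days, and page size, then apply the $10/GB rate. This Python sketch assumes a 30-day month, decimal gigabytes (1 GB = 1,000 MB), and the ~2 MB standard page size from the bandwidth table above. The function name and defaults are illustrative.</p>

```python
def monthly_cost(pages, checks_per_day, mb_per_check=2.0, usd_per_gb=10.0, days=30):
    """Return (GB per month, USD per month) for residential proxy checks.

    Uses decimal units (1 GB = 1,000 MB).
    """
    total_mb = pages * checks_per_day * days * mb_per_check
    gb = total_mb / 1000
    return gb, gb * usd_per_gb


# Checks per day for common frequencies
FREQUENCIES = {"Daily": 1, "Hourly": 24, "Every 15 minutes": 96}
```

<p>For example, <code>monthly_cost(10, FREQUENCIES["Hourly"])</code> returns about 14.4 GB and $144. Pass a larger <code>mb_per_check</code> (for example, 5.0) to estimate image-heavy pages.</p>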
<h3>How to Set Up</h3>
<ol>
<li>Go to <strong>Settings &gt; Residential Proxies</strong> in your account</li>
<li>Purchase bandwidth (minimum 1 GB)</li>
<li>Open any monitor and change the <strong>Proxy Location</strong> to <strong>Residential Proxy</strong></li>
<li>Select a target country for geo-targeted monitoring</li>
<li>Save and trigger a check to verify it works</li>
</ol>
<h3>Geo-Targeting</h3>
<p>When using residential proxies, you can target one of 200+ supported countries or leave the selector set to <strong>Any country</strong>. Country targeting is useful for monitoring localized pricing, regional content, or geo-restricted pages.</p>
<h3>Monitoring Your Usage</h3>
<ul>
<li>View your bandwidth balance and daily usage in <strong>Settings &gt; Residential Proxies</strong></li>
<li>Usage statistics update every 15 minutes</li>
<li>When your bandwidth reaches zero, monitors using residential proxies automatically fall back to datacenter proxies (your monitoring does not stop)</li>
</ul>
<h3>Availability</h3>
<p>Premium residential proxy bandwidth is available on <strong>Enterprise</strong> and <strong>Ultimate</strong> plans. <a href="https://pagecrawl.io/contact-us">Contact us</a> if you have questions about upgrading.</p>
<h3>Related</h3>
<ul>
<li><a href="/help/features/article/custom-proxies">Using Custom Proxies</a> for using your own proxy servers</li>
<li><a href="/residential-proxies">Cost Calculator</a> for estimating your monthly bandwidth needs</li>
<li><a href="/help/features/article/bulk-edit-pages">Bulk Edit</a> for applying proxy settings to multiple pages</li>
</ul>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Feed Tracking Mode: Structured Monitoring for RSS, Atom, and Sitemaps]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/features/article/feed-tracking-mode" />
            <id>https://pagecrawl.io/97</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Feed Tracking Mode: Structured Monitoring for RSS, Atom, and Sitemaps</h1>
<div class="kb-figure kb-figure--flush">
  <img src="/images/knowledge/feed-tracking-featured.png" alt="Feed Tracking Mode: structured monitoring for RSS, Atom, and sitemaps">
</div>
<p>Feed tracking mode treats an RSS feed, Atom feed, or XML sitemap as a list of individual items rather than a single blob of text. Instead of "the page changed", you get "2 new posts added: [titles and links]". This matches how you actually want to consume a feed: item by item.</p>
<div class="kb-figure kb-figure--flush">
  <img src="/images/knowledge/monitoring-type-feed.png" alt="Feed tracking example: a blog feed on the left, and PageCrawl listing the newly added feed items on the right">
</div>
<p><strong>Looking to publish your monitored page changes as an RSS feed instead?</strong> See <a href="/help/features/article/page-monitoring-rss-feeds">Monitor Page Changes via RSS Feeds</a>, which generates a feed URL of detected changes you can plug into any RSS reader or automation tool.</p>
<h3>When to Use Feed Tracking Mode</h3>
<p>Pick Feed mode when the URL you are monitoring is a structured list that updates over time:</p>
<ul>
<li><strong>RSS and Atom feeds</strong> (<code>/feed</code>, <code>/rss.xml</code>, <code>/atom.xml</code>, <code>/feeds/posts/default</code>, <code>/index.xml</code>)</li>
<li><strong>XML sitemaps</strong> (<code>/sitemap.xml</code>, <code>/sitemap_index.xml</code>)</li>
<li><strong>GitHub release and commit Atom feeds</strong> (<code>github.com/owner/repo/releases.atom</code>)</li>
<li><strong>Reddit subreddit feeds</strong> (<code>reddit.com/r/subreddit/.rss</code>)</li>
<li><strong>Podcast feeds</strong></li>
<li><strong>Inventory grids and card-based HTML pages</strong> (detected via DOM pattern matching)</li>
</ul>
<p>PageCrawl auto-detects the feed format when you paste the URL and switches to Feed mode automatically. You can also pick it manually from the tracking mode selector.</p>
<div class="kb-figure">
  <img src="/images/knowledge/simple-what-to-track.png" alt="What to Track panel with the Feed option among the tracking mode tabs">
</div>
<h3>What You Get With Feed Mode</h3>
<p>Compared to Full Page text tracking, Feed mode gives you:</p>
<table>
<thead>
<tr>
<th>Feature</th>
<th>Full Page Text</th>
<th>Feed Mode</th>
</tr>
</thead>
<tbody>
<tr>
<td>Compares raw content</td>
<td>Yes</td>
<td>No (parses items)</td>
</tr>
<tr>
<td>Reports which items changed</td>
<td>No</td>
<td>Yes, with titles and links</td>
</tr>
<tr>
<td>Ignores reordering</td>
<td>No (false alerts)</td>
<td>Yes</td>
</tr>
<tr>
<td>Deduplicates by stable key</td>
<td>No</td>
<td>Yes (guid, id, link)</td>
</tr>
<tr>
<td>Caps item count</td>
<td>No</td>
<td>Yes (configurable limit)</td>
</tr>
<tr>
<td>Runs without a browser</td>
<td>Only if page is plain text</td>
<td>Yes, for XML feeds</td>
</tr>
<tr>
<td>Handles "No exact matches" fallbacks</td>
<td>No</td>
<td>Yes</td>
</tr>
</tbody>
</table>
<p>The end result: fewer false alerts, clearer notifications, and lower monitoring cost per check.</p>
<h3>Supported Formats</h3>
<p>Feed tracking mode parses:</p>
<ul>
<li><strong>RSS 2.0</strong> including <code>&lt;guid&gt;</code>, <code>&lt;enclosure&gt;</code>, <code>&lt;media:content&gt;</code>, and <code>&lt;content:encoded&gt;</code></li>
<li><strong>RSS 1.0 / RDF</strong> including <code>rdf:about</code> identifiers</li>
<li><strong>Atom 1.0</strong> including <code>&lt;link rel="alternate"&gt;</code> and <code>&lt;media:thumbnail&gt;</code></li>
<li><strong>XML Sitemap</strong> (<code>&lt;urlset&gt;</code>) and sitemap index (<code>&lt;sitemapindex&gt;</code>)</li>
<li><strong>Generic repeating XML</strong> when an XML file has a list-like structure</li>
</ul>
<p>For HTML pages like product grids, inventory lists, or news listings, Feed mode falls back to DOM pattern detection, which identifies repeated card-like elements on the page and tracks them as items.</p>
<h3>How Detection Works</h3>
<p>When you paste a URL into Track New Page, PageCrawl performs a content-based check:</p>
<ol>
<li>Fetches the URL</li>
<li>Looks at the content type and first few bytes of the body</li>
<li>If it looks like XML, parses it with a namespace-aware XML parser</li>
<li>Identifies the feed format (RSS / Atom / Sitemap / etc.) by root element</li>
<li>Returns the detected format to the interface, which auto-switches to Feed mode</li>
</ol>
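<p>Steps 3 and 4 can be approximated with Python's standard <code>xml.etree.ElementTree</code> module. The module is namespace-aware and reports namespaced tags as <code>{uri}localname</code>. This is a simplified sketch. It skips the content-type check, treats anything that fails to parse as "not a feed", and uses its own made-up format labels. For untrusted input, a hardened parser such as <code>defusedxml</code> is a safer choice.</p>

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
SITEMAP = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

# Root element (as ElementTree names it) mapped to a label used in this sketch
ROOT_FORMATS = {
    "rss": "rss2",
    RDF + "RDF": "rss1",
    ATOM + "feed": "atom",
    SITEMAP + "urlset": "sitemap",
    SITEMAP + "sitemapindex": "sitemap_index",
}


def detect_feed_format(body):
    """Classify a response body by its XML root element, or return None."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return None  # not well-formed XML: stay on Full Page tracking
    if root.tag in ROOT_FORMATS:
        return ROOT_FORMATS[root.tag]
    if root.tag.rsplit("}", 1)[-1] == "html":
        return None  # an XHTML page, not a feed
    child_tags = [child.tag for child in root]
    if len(child_tags) != len(set(child_tags)):
        return "generic_xml"  # repeated child elements look list-like
    return None
```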
<p>If the detection cannot classify the URL as an XML feed, the tracking mode stays at Full Page and you can switch to Feed manually if you want to use DOM pattern detection on an HTML page.</p>
<h3>Item Limit</h3>
<p>Every feed tracking element has a <strong>Track first N items</strong> cap. The default is 10 for new monitors. You can raise it up to your plan's maximum.</p>
<p>The limit exists for three reasons:</p>
<ol>
<li><strong>Avoid noise from variable-count pages.</strong> Some pages show a different number of items between checks (inventory pages, infinite-scroll feeds). Capping at a fixed count prevents fluctuations from triggering false change alerts.</li>
<li><strong>Keep storage manageable.</strong> A sitemap with 50,000 URLs would create a 50,000-item JSON blob per check. The cap prevents this.</li>
<li><strong>Focus on fresh content.</strong> Most of the time you care about the newest items. Tracking the first 10-20 entries is almost always enough.</li>
</ol>
<p>The maximum you can set depends on your plan:</p>
<table>
<thead>
<tr>
<th>Plan</th>
<th>Maximum Items Per Feed</th>
</tr>
</thead>
<tbody>
<tr>
<td>Free</td>
<td>10</td>
</tr>
<tr>
<td>Standard</td>
<td>100</td>
</tr>
<tr>
<td>Enterprise</td>
<td>1,000</td>
</tr>
<tr>
<td>Ultimate</td>
<td>10,000</td>
</tr>
</tbody>
</table>
<p>The default of 10 applies on every plan. You can raise it from the tracking mode panel at any time after the monitor is created.</p>
<h3>How "First N" Is Decided</h3>
<p>For RSS and Atom feeds, "first N" means the first N items in document order. By convention, these formats list the newest items at the top, so reading positions 0 through N-1 gives you the N most recent posts.</p>
<p>XML sitemaps are different. There is no convention requiring sitemaps to list new URLs first. New pages can appear anywhere in the file, including appended at the bottom. To handle this, PageCrawl sorts sitemap entries by their <code>&lt;lastmod&gt;</code> date (newest first) before applying the cap, so the most recently modified URLs always win.</p>
<p>For sitemaps that do not include <code>&lt;lastmod&gt;</code> on every URL, the dated entries are sorted first and the dateless entries follow in their original document order. If you need to track every page on a very large sitemap regardless of modification date, use <a href="/help/features/article/page-discovery">Page Discovery</a> instead. It auto-monitors new pages as they appear without depending on the position-based cap.</p>
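<p>The sitemap ordering rule can be sketched with the Python standard library. The sketch sorts dated <code>&lt;url&gt;</code> entries newest first, appends dateless ones in document order, and then applies the cap. It is an illustration, not PageCrawl's implementation. It treats unparseable or partial dates (such as a bare year-month) as missing, and it assumes UTC when a date has no time zone. RSS and Atom need no sort, because their cap is a plain slice of the items in document order.</p>

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_lastmod(text):
    """Parse a W3C datetime; return None if missing or unparseable."""
    if not text:
        return None
    try:
        dt = datetime.fromisoformat(text.strip().replace("Z", "+00:00"))
    except ValueError:
        return None
    # Assume UTC for values without an offset so all dates compare cleanly
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def first_n_sitemap_urls(xml_bytes, n=10):
    """Newest-first URLs by lastmod, dateless URLs last, capped at n."""
    root = ET.fromstring(xml_bytes)
    dated, dateless = [], []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = parse_lastmod(url.findtext("sm:lastmod", namespaces=NS))
        if lastmod is None:
            dateless.append(loc)
        else:
            dated.append((lastmod, loc))
    # Stable sort: entries with the same lastmod keep their document order
    dated.sort(key=lambda entry: entry[0], reverse=True)
    return ([loc for _, loc in dated] + dateless)[:n]
```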
<h3>What Triggers a Change Alert</h3>
<p>By default, Feed mode notifies you when items are <strong>added</strong> to the feed. You can also opt into:</p>
<ul>
<li><strong>Items removed</strong> – something disappeared from the feed</li>
<li><strong>Content changed</strong> – an item's title or body was edited after publication</li>
<li><strong>Price changed</strong> – an item's price updated (for product feeds)</li>
<li><strong>Order changed</strong> – items were reordered (off by default since most feeds reorder as new items arrive)</li>
</ul>
<p>Each item is identified by a stable key in this order: GUID → link → title. That means content changes on the same item are correctly recognized as updates, not as a new item.</p>
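<p>Keying items this way lets an edit register as a change rather than as a removal plus an addition. The hypothetical Python sketch below compares two snapshots of parsed items and reports each change type listed above. It assumes each item is a dict with <code>guid</code>, <code>link</code>, <code>title</code>, <code>content</code>, and <code>price</code> keys.</p>

```python
CONTENT_FIELDS = ("title", "content")


def item_key(item):
    """Stable identity: GUID first, then link, then title."""
    return item.get("guid") or item.get("link") or item.get("title")


def diff_items(previous, current):
    """Compare two snapshots of feed items and report each change type."""
    old = {item_key(i): i for i in previous}
    new = {item_key(i): i for i in current}
    shared = [k for k in new if k in old]  # keys in both, in the new order
    return {
        "added": [new[k] for k in new if k not in old],
        "removed": [old[k] for k in old if k not in new],
        "content_changed": [
            new[k] for k in shared
            if any(old[k].get(f) != new[k].get(f) for f in CONTENT_FIELDS)
        ],
        "price_changed": [new[k] for k in shared if old[k].get("price") != new[k].get("price")],
        # Same surviving items, different relative order
        "order_changed": [k for k in old if k in new] != shared,
    }
```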
<h3>Monitoring Frequency</h3>
<p>For XML feeds, Feed mode uses a lightweight HTTP fetch instead of a browser, so you can check them frequently without burning through plan limits:</p>
<table>
<thead>
<tr>
<th>Feed Type</th>
<th>Recommended Frequency</th>
</tr>
</thead>
<tbody>
<tr>
<td>Security advisories</td>
<td>Every 15 minutes</td>
</tr>
<tr>
<td>News and competitor blogs</td>
<td>Every 30 to 60 minutes</td>
</tr>
<tr>
<td>GitHub release feeds</td>
<td>Every 1 to 2 hours</td>
</tr>
<tr>
<td>Podcast feeds</td>
<td>Every 6 to 12 hours</td>
</tr>
<tr>
<td>Sitemaps for large sites</td>
<td>Every 1 to 4 hours</td>
</tr>
<tr>
<td>Low-volume blogs</td>
<td>Daily</td>
</tr>
</tbody>
</table>
<p>Note: if you set a check interval shorter than 30 minutes on an HTML page tracked in Feed mode (such as an inventory page rather than an XML feed), PageCrawl uses the browser engine for reliability.</p>
<h3>Common Examples</h3>
<p><strong>GitHub release feed:</strong></p>
<pre><code>https://github.com/owner/repo/releases.atom</code></pre>
<p><strong>WordPress blog:</strong></p>
<pre><code>https://example.com/feed/</code></pre>
<p><strong>Reddit subreddit:</strong></p>
<pre><code>https://www.reddit.com/r/webdev/.rss</code></pre>
<p><strong>Site sitemap:</strong></p>
<pre><code>https://example.com/sitemap.xml</code></pre>
<p>For each of these, paste the URL into Track New Page. PageCrawl detects the format, switches to Feed mode, and shows the first 10 items as a preview before you save.</p>
<h3>Related Articles</h3>
<ul>
<li><a href="/blog/monitor-rss-feeds">Monitor RSS feeds and get alerts for new content</a> – broader guide comparing RSS monitoring approaches</li>
<li><a href="/help/features/article/page-monitoring-rss-feeds">Publish change alerts as an RSS feed</a> – the inverse: generate a feed URL of detected changes for RSS readers and automation tools</li>
<li><a href="/help/features/article/sitemap-monitoring">Sitemap monitoring</a> – automatically discover new pages across a website</li>
<li><a href="/help/features/article/api-webhooks-for-custom-integrations">Webhook integrations</a> – route feed alerts to Slack, Discord, or custom automations</li>
<li><a href="/help/reduce-false-positives/article/reduce-false-positives-monitoring-website-for-changes">Reduce false positives</a> – tune your monitors for cleaner alerts</li>
</ul>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Thumbs Up and Thumbs Down: Giving Feedback on Detected Changes]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/features/article/thumbs-up-thumbs-down-feedback" />
            <id>https://pagecrawl.io/98</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Thumbs Up and Thumbs Down: Giving Feedback on Detected Changes</h1>
<div class="kb-figure kb-figure--flush">
  <img src="/images/knowledge/feedback-featured.png" alt="A detected change with thumbs up and thumbs down buttons: thumbs up marks it important, thumbs down marks it as noise">
</div>
<p>Every time PageCrawl detects a change, you can give quick feedback with the thumbs up and thumbs down buttons. This feedback helps you organize your review workflow and, over time, helps PageCrawl show you more of the changes that matter and fewer that don't.</p>
<h3>Where to Find the Buttons</h3>
<p>The feedback buttons appear in several places:</p>
<ul>
<li><strong>Page view</strong>, next to each detected change in the timeline</li>
<li><strong>Review Board</strong>, when opening a change card</li>
<li><strong>Email notifications</strong>, as quick-action buttons at the bottom of each change email</li>
<li><strong>Slack, Discord, Microsoft Teams, and Telegram notifications</strong>, as inline action buttons next to each detected change</li>
<li><strong>Browser extension</strong>, when reviewing changes on the go</li>
</ul>
<div class="kb-figure">
  <img src="/images/knowledge/feedback-thumbs.png" alt="A detected change in the page timeline with the thumbs up and thumbs down feedback buttons highlighted on the right">
</div>
<p>You can give feedback directly from any of the notification channels above, no login required. You are taken to a short confirmation page that records the feedback, then returned to the change (or a simple confirmation screen if you are not signed in).</p>
<h3>What Happens When You Press Thumbs Up</h3>
<p>Pressing thumbs up flags the change as <strong>important</strong> or useful. This tells PageCrawl:</p>
<ul>
<li>The change is the kind of update you want to keep being notified about</li>
<li>The change has been reviewed, so it is marked as seen automatically</li>
</ul>
<p>If your workspace has <strong>feedback auto-review</strong> enabled, the change card also moves from "To Review" to your chosen destination lane on the Review Board (for example, a "Reviewed" or "Important" lane). You can configure which lane thumbs-up feedback moves cards to from the Review Board settings.</p>
<h3>What Happens When You Press Thumbs Down</h3>
<p>Pressing thumbs down flags the change as <strong>noise</strong> or irrelevant. This does several things:</p>
<ol>
<li><strong>The change is marked as seen</strong> so it no longer counts as unread</li>
<li><strong>Your monitor gets quieter over time</strong> as PageCrawl uses your feedback to show you fewer low-value alerts like this one</li>
<li><strong>You may be offered a one-tap option</strong> to stop similar alerts from this page. You can accept it or dismiss it</li>
<li><strong>The card moves to your configured "noise" lane</strong> on the Review Board, if feedback auto-review is enabled</li>
</ol>
<p>If a suggested filter could also hide a change you might actually care about, PageCrawl warns you first. Read any such warning before confirming.</p>
<h3>When Should You Press Thumbs Up?</h3>
<p>Press thumbs up when:</p>
<ul>
<li>The detected change is exactly the kind of update you set up this monitor for</li>
<li>You want to confirm that a pricing, availability, or content change was correctly caught</li>
<li>You want to keep a record of meaningful changes in your "Important" or "Reviewed" lane</li>
<li>You want PageCrawl to keep showing you this kind of update</li>
</ul>
<p>Examples:</p>
<ul>
<li>A competitor dropped their price from $49 to $39</li>
<li>A job listing you were tracking has been posted</li>
<li>A terms-of-service page added a new clause</li>
<li>A product page switched from "Out of stock" to "In stock"</li>
</ul>
<h3>When Should You Press Thumbs Down?</h3>
<p>Press thumbs down when:</p>
<ul>
<li>The change is not relevant to your monitoring goal</li>
<li>The detected text is noise, like a timestamp, view counter, random tagline, or rotating banner</li>
<li>The same type of irrelevant change keeps triggering alerts</li>
<li>You want this monitor to get quieter and stop alerting on changes like this</li>
</ul>
<p>Examples:</p>
<ul>
<li>The page says "Last updated 3 minutes ago" and that timestamp keeps changing</li>
<li>A "Users online: 1,234" counter triggered the alert</li>
<li>A rotating testimonial or hero image caption changed</li>
<li>A footer copyright year was updated</li>
<li>A "Trending now" section showed a different product</li>
</ul>
<p>Press thumbs down even if the change is minor. Over time, consistent feedback makes your monitors much quieter and more precise.</p>
<h3>When Should You Not Press Either?</h3>
<p>If a change is neutral (neither clearly useful nor clearly noise), you can leave it without feedback and simply mark it as reviewed. Feedback is not mandatory. Only use it when you have a clear opinion, because consistent signals produce better filtering than mixed ones.</p>
<h3>Clearing Feedback</h3>
<p>If you change your mind, reopen the change and press the same button again to clear the flag, or press the opposite button to overwrite the previous feedback. Clearing feedback does not automatically remove any filter you accepted earlier. Those filters can be reviewed or removed separately in the page's settings.</p>
<h3>Tips for Better Results</h3>
<ul>
<li><strong>Be consistent.</strong> The more consistent feedback you give, the better PageCrawl gets at matching what you care about.</li>
<li><strong>Accept a suggested filter when it looks right.</strong> It can stop most repeat false positives on a page in a single tap.</li>
<li><strong>Configure auto-review lanes</strong> on the Review Board so feedback also organizes your workflow, not just your alerts.</li>
<li><strong>Use feedback from notification channels</strong> (email, Slack, Discord, Teams, Telegram) when you are away from the app. The feedback buttons in these notifications work without logging in.</li>
<li><strong>Review your filters periodically.</strong> Anything you accepted can be edited or removed at any time in the page's settings.</li>
</ul>
<h3>Related</h3>
<ul>
<li><a href="/help/features/article/review-board">Review Board</a> for organizing changes into lanes based on feedback</li>
<li><a href="/help/reduce-false-positives/article/reduce-false-positives-monitoring-website-for-changes">Reducing False Positives</a> for a complete guide to quieter monitors</li>
<li><a href="/help/features/article/ai-powered-change-detection">AI-Powered Change Detection</a> for how AI helps prioritize the changes that matter</li>
</ul>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[API Quick Start: Monitor Your First Page in 60 Seconds]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/features/article/api-developer-quickstart" />
            <id>https://pagecrawl.io/99</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>API Quick Start: Monitor Your First Page in 60 Seconds</h1>
<p>This guide walks you through creating your first monitor and webhook using the PageCrawl API. By the end, you will have a page being monitored with change notifications delivered to your endpoint.</p>
<p><em>API access requires a paid plan (Standard or above). Get your API token from Settings &gt; API.</em></p>
<h3>Step 1: Get Your API Token</h3>
<p>Go to <strong>Settings &gt; API &gt; API Tokens</strong> and click <strong>Create Token</strong>. Copy the token immediately (it will not be shown again).</p>
<p>All API requests use this token in the <code>Authorization</code> header:</p>
<pre><code>Authorization: Bearer YOUR_API_TOKEN</code></pre>
<p>For the full API reference with all endpoints and schemas, visit <a href="/developers">pagecrawl.io/developers</a>.</p>
<h3>Step 2: Create a Monitor</h3>
<p>The simplest way to start monitoring is the <code>/api/track-simple</code> endpoint. It only requires a URL.</p>
<h4>curl</h4>
<pre><code class="language-bash">curl -X POST "https://pagecrawl.io/api/track-simple" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/pricing",
    "tracking_mode": "fullpage",
    "frequency": 60
  }'</code></pre>
<h4>Python</h4>
<pre><code class="language-python">import requests

response = requests.post(
    "https://pagecrawl.io/api/track-simple",
    headers={"Authorization": "Bearer YOUR_API_TOKEN"},
    json={
        "url": "https://example.com/pricing",
        "tracking_mode": "fullpage",
        "frequency": 60,
    },
)

response.raise_for_status()  # Raise an error for 4xx/5xx responses
page = response.json()
print(f"Monitoring: {page['name']} (ID: {page['id']})")</code></pre>
<h4>Node.js</h4>
<pre><code class="language-javascript">const response = await fetch("https://pagecrawl.io/api/track-simple", {
  method: "POST",
  headers: {
    "Authorization": "Bearer YOUR_API_TOKEN",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    url: "https://example.com/pricing",
    tracking_mode: "fullpage",
    frequency: 60,
  }),
});

const page = await response.json();
console.log(`Monitoring: ${page.name} (ID: ${page.id})`);</code></pre>
<p><strong>Tracking modes:</strong></p>
<ul>
<li><code>fullpage</code> - all visible text (default)</li>
<li><code>content_only</code> - text without navigation, headers, footers</li>
<li><code>reader</code> - reader mode content only</li>
<li><code>price</code> - auto-detect and track prices</li>
<li><code>specific_text</code> - specific element (requires <code>selector</code>)</li>
<li><code>specific_number</code> - numeric value from element (requires <code>selector</code>)</li>
</ul>
<p><strong>Frequency</strong> is in minutes. Use <code>1440</code> for daily, <code>60</code> for hourly, or <code>15</code> for every 15 minutes. Which frequencies are available depends on your plan.</p>
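<p>You can check the rules above before calling the API: the tracking mode must be one of the listed values, and the two element-based modes need a <code>selector</code>. The sketch below is a hypothetical helper, not part of an official PageCrawl SDK. It only uses the field names shown in this guide, and the API remains the final authority on validation.</p>
<pre><code class="language-python">import json

# Tracking modes listed in this guide; the element-based modes need a selector
TRACKING_MODES = {"fullpage", "content_only", "reader", "price",
                  "specific_text", "specific_number"}
SELECTOR_MODES = {"specific_text", "specific_number"}


def build_track_payload(url, tracking_mode="fullpage", frequency=60, selector=None):
    """Hypothetical helper: build a JSON body for POST /api/track-simple."""
    if tracking_mode not in TRACKING_MODES:
        raise ValueError(f"unknown tracking_mode: {tracking_mode!r}")
    if tracking_mode in SELECTOR_MODES and not selector:
        raise ValueError(f"{tracking_mode} requires a selector")
    # Frequency is a whole number of minutes (for example 1440 = daily)
    if not isinstance(frequency, int) or isinstance(frequency, bool):
        raise TypeError("frequency must be an integer number of minutes")

    body = {"url": url, "tracking_mode": tracking_mode, "frequency": frequency}
    if selector:
        body["selector"] = selector
    return json.dumps(body)


print(build_track_payload("https://example.com/pricing", frequency=1440))</code></pre>
<p>You can pass the returned string as the request body (for example, <code>data=</code> in <code>requests.post</code>) together with the <code>Content-Type: application/json</code> header.</p>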
<h3>Step 3: Set Up a Webhook</h3>
<p>Create a webhook to receive notifications when changes are detected.</p>
<h4>curl</h4>
<pre><code class="language-bash">curl -X POST "https://pagecrawl.io/api/hooks" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "target_url": "https://your-server.com/webhook",
    "match_type": "all",
    "events": ["change_detected"]
  }'</code></pre>
<h4>Python</h4>
<pre><code class="language-python">import requests

response = requests.post(
    "https://pagecrawl.io/api/hooks",
    headers={"Authorization": "Bearer YOUR_API_TOKEN"},
    json={
        "target_url": "https://your-server.com/webhook",
        "match_type": "all",
        "events": ["change_detected"],
    },
)

response.raise_for_status()  # Raise an error for 4xx/5xx responses
hook = response.json()
print(f"Webhook created (ID: {hook['id']})")</code></pre>
<p><strong>Match types:</strong> <code>all</code> (every page), <code>monitors</code> (specific pages), <code>tags</code> (by tag), <code>folders</code> (by folder), <code>domains</code> (by domain).</p>
<p><strong>Events:</strong> <code>change_detected</code>, <code>error</code>, <code>price_change_detected</code>.</p>
<h3>Step 4: Handle Webhook Payloads</h3>
<p>When a change is detected, PageCrawl POSTs a JSON payload to your webhook URL. Here is an example handler:</p>
<h4>Python (Flask)</h4>
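<pre><code class="language-python">from flask import Flask, request

app = Flask(__name__)

@app.route("/webhook", methods=["POST"])
def handle_change():
    # Parse the JSON body; use an empty dict if it is missing or invalid
    data = request.get_json(silent=True) or {}

    # Read fields with .get(), since payload_fields can exclude any of them
    print(f"Change detected: {data.get('title')}")
    print(f"New content: {data.get('contents')}")
    print(f"Difference: {data.get('human_difference')}")

    if data.get("ai_summary"):
        print(f"AI Summary: {data['ai_summary']}")

    # Acknowledge receipt with an empty 200 response
    return "", 200


if __name__ == "__main__":
    app.run(port=5000)</code></pre>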
<h4>Node.js (Express)</h4>
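<pre><code class="language-javascript">const express = require("express");

const app = express();
// Parse JSON request bodies so req.body is populated
app.use(express.json());

app.post("/webhook", (req, res) =&gt; {
  const data = req.body;

  console.log(`Change detected: ${data.title}`);
  console.log(`New content: ${data.contents}`);
  console.log(`Difference: ${data.human_difference}`);

  if (data.ai_summary) {
    console.log(`AI Summary: ${data.ai_summary}`);
  }

  res.sendStatus(200);
});

app.listen(3000, () =&gt; console.log("Listening on port 3000"));</code></pre>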
<p><strong>Key payload fields:</strong></p>
<ul>
<li><code>title</code> - Page name</li>
<li><code>contents</code> - Current value of the tracked element</li>
<li><code>difference</code> - Text difference percentage (0-100)</li>
<li><code>human_difference</code> - Human-readable change description</li>
<li><code>ai_summary</code> - AI-generated plain-language summary of the change</li>
<li><code>ai_priority_score</code> - 0-100 importance score</li>
<li><code>markdown_difference</code> - Change diff in markdown format</li>
<li><code>page_screenshot_image</code> - Signed URL to the page screenshot</li>
</ul>
<p>You can customize which fields are included when creating the webhook via the <code>payload_fields</code> parameter.</p>
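<p>Because <code>payload_fields</code> can leave out any field, a handler is safer if it treats every field as optional. The hypothetical helper below turns a payload into a one-line message. It prefers <code>ai_summary</code> and falls back to <code>human_difference</code>. The fallback order and message format are illustrative choices, not PageCrawl behavior.</p>
<pre><code class="language-python">def describe_change(payload):
    """Build a short, human-readable line from a PageCrawl webhook payload.

    Every field is read with .get() because payload_fields may exclude it.
    """
    title = payload.get("title") or "Untitled page"
    # Prefer the AI summary, then the human-readable diff, then a generic note
    summary = (payload.get("ai_summary")
               or payload.get("human_difference")
               or "Content changed")
    score = payload.get("ai_priority_score")
    if score is None:
        return f"{title}: {summary}"
    return f"{title} [priority {score}/100]: {summary}"


print(describe_change({"title": "Pricing", "human_difference": "Price changed",
                       "ai_priority_score": 72}))</code></pre>
<p>You could call this from either handler above and forward the result to a chat channel or log.</p>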
<h3>Other Useful Endpoints</h3>
<table>
<thead>
<tr>
<th>Endpoint</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>GET /api/pages</code></td>
<td>List all monitored pages</td>
</tr>
<tr>
<td><code>GET /api/pages/{id}</code></td>
<td>Get page details and latest values</td>
</tr>
<tr>
<td><code>PUT /api/pages/{id}</code></td>
<td>Update page settings</td>
</tr>
<tr>
<td><code>DELETE /api/pages/{id}</code></td>
<td>Delete a page</td>
</tr>
<tr>
<td><code>PUT /api/pages/{id}/check</code></td>
<td>Trigger an immediate check</td>
</tr>
<tr>
<td><code>PUT /api/pages/{id}/status</code></td>
<td>Enable or disable monitoring</td>
</tr>
<tr>
<td><code>GET /api/pages/{id}/history</code></td>
<td>Get full check history</td>
</tr>
<tr>
<td><code>GET /api/pages/{id}/checks/{checkId}/diff.markdown</code></td>
<td>Get text diff as markdown</td>
</tr>
</tbody>
</table>
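<p>You can wrap the table's paths in a few small functions. The sketch below uses only Python's standard library. It builds <code>urllib.request.Request</code> objects without sending them, so you can inspect or reuse them. The function names are hypothetical. The sketch also assumes the check endpoint accepts an empty <code>PUT</code> body; see the full API reference for request bodies and response schemas.</p>
<pre><code class="language-python">from urllib.parse import quote
from urllib.request import Request

BASE_URL = "https://pagecrawl.io"


def api_request(token, method, path):
    """Build (but do not send) an authenticated request for an API path."""
    return Request(BASE_URL + path, method=method,
                   headers={"Authorization": f"Bearer {token}"})


def _page_path(page_id, suffix=""):
    # Percent-encode the ID so it is always a single, safe path segment
    return f"/api/pages/{quote(str(page_id), safe='')}{suffix}"


def list_pages(token):
    return api_request(token, "GET", "/api/pages")


def get_page(token, page_id):
    return api_request(token, "GET", _page_path(page_id))


def trigger_check(token, page_id):
    return api_request(token, "PUT", _page_path(page_id, "/check"))


def page_history(token, page_id):
    return api_request(token, "GET", _page_path(page_id, "/history"))


def check_diff_markdown(token, page_id, check_id):
    suffix = f"/checks/{quote(str(check_id), safe='')}/diff.markdown"
    return api_request(token, "GET", _page_path(page_id, suffix))</code></pre>
<p>To send a request, pass it to <code>urllib.request.urlopen()</code>. For example, <code>urlopen(trigger_check(token, page_id))</code> asks for an immediate check.</p>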
<h3>Download the OpenAPI Spec</h3>
<p>The full API specification is available as an OpenAPI 3.0 YAML file. Import it into Postman, Insomnia, or any API client:</p>
<pre><code>https://pagecrawl.io/api/openapi.yaml</code></pre>
<h3>Related Articles</h3>
<ul>
<li><a href="/help/features/article/api-webhooks-for-custom-integrations">API &amp; Webhooks overview</a></li>
<li><a href="/help/integrations/article/webhook-integration">Webhook Integration guide</a></li>
<li><a href="/help/integrations/article/mcp-server-ai-tools">AI Assistants (MCP Server)</a></li>
<li><a href="/developers">Full API Reference</a></li>
</ul>]]>
            </summary>
                                    <updated>2026-05-06T14:16:21+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Scheduled Reports - Bundle Change Notifications Into Digests]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/notifications/article/scheduled-reports" />
            <id>https://pagecrawl.io/100</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Scheduled Reports - Bundle Change Notifications Into Digests</h1>
<p>Scheduled reports let you group monitors together and receive a single digest summarizing all detected changes on a schedule you choose. Instead of getting an instant notification for every change, you get one consolidated report covering everything that happened since the last digest.</p>
<p>This is especially useful when you monitor many pages and want to review changes in batches rather than reacting to each one individually.</p>
<h3>When to Use Reports vs Instant Notifications</h3>
<table>
<thead>
<tr>
<th>Scenario</th>
<th>Recommended</th>
</tr>
</thead>
<tbody>
<tr>
<td>Monitoring a handful of critical pages</td>
<td>Instant notifications</td>
</tr>
<tr>
<td>Tracking 50+ competitor pages for pricing</td>
<td>Scheduled report (daily or weekly)</td>
</tr>
<tr>
<td>Legal/compliance pages that rarely change</td>
<td>Scheduled report (weekly or monthly)</td>
</tr>
<tr>
<td>Stock availability that needs immediate action</td>
<td>Instant notifications with escalation</td>
</tr>
<tr>
<td>Executive stakeholder updates</td>
<td>Scheduled report with AI summary</td>
</tr>
</tbody>
</table>
<p>You can mix both approaches. Monitors that are not assigned to any report continue to send instant notifications as usual. Monitors assigned to a report will only appear in digests (unless escalation is configured for urgent changes).</p>
<h3>Creating a Report</h3>
<p>Go to <strong>Settings &gt; Workspace &gt; Alerts &amp; Reports</strong> and select the <strong>Scheduled Summary Reports</strong> tab. Click <strong>Add Report</strong>, or start from one of the prebuilt templates, to configure:</p>
<div class="kb-figure">
  <img src="/images/knowledge/settings-scheduled-reports.png" alt="Scheduled Summary Reports page with the Add Report button and prebuilt report templates">
</div>
<p><strong>Name</strong> - Give your report a descriptive name, such as "Weekly competitor pricing" or "Daily legal page updates."</p>
<p><strong>Include changes from</strong> - Choose which monitors to include:</p>
<ul>
<li><strong>All monitors</strong> - Every monitor in the workspace</li>
<li><strong>By tag</strong> - Monitors with specific tags (useful for grouping by category, client, or project)</li>
<li><strong>By folder</strong> - Monitors in specific folders</li>
<li><strong>By website</strong> - Monitors grouped by their website domain</li>
<li><strong>Specific monitors</strong> - Hand-pick individual monitors by name or URL</li>
</ul>
<p><strong>Schedule</strong> - How often the digest is generated and sent:</p>
<ul>
<li><strong>Daily</strong> - Every day at your chosen hour</li>
<li><strong>Weekdays only</strong> - Monday through Friday</li>
<li><strong>Weekly</strong> - On a specific day of the week</li>
<li><strong>Monthly</strong> - On a specific day of the month</li>
<li><strong>On-demand only</strong> - Only generated when you manually click "Generate now"</li>
</ul>
<p>All times are based on your workspace timezone, which you can set in <strong>Settings &gt; Workspace &gt; General</strong>.</p>
<h3>Delivery Channels</h3>
<p>Each report can be delivered through one or more channels:</p>
<ul>
<li><strong>Email</strong> - Select team members as recipients, and use Additional Cc Emails to copy other stakeholders. Both CC and BCC recipients are supported.</li>
<li><strong>Slack</strong> - Enter a webhook URL or leave blank to use your workspace default</li>
<li><strong>Discord</strong> - Enter a webhook URL or leave blank to use your workspace default</li>
<li><strong>Microsoft Teams</strong> - Enter a webhook URL or leave blank to use your workspace default</li>
<li><strong>Telegram</strong> - Enter a chat ID or leave blank to use your workspace default</li>
</ul>
<h3>Content Filters</h3>
<p>Control which changes appear in each digest:</p>
<p><strong>Minimum importance</strong> - Every change is assigned an importance level based on how significant it is. You can filter each report to only include changes above a certain threshold:</p>
<ul>
<li><strong>All changes</strong> - Everything detected</li>
<li><strong>Medium and up</strong> - Skips trivial edits like whitespace or date stamps</li>
<li><strong>Important and up</strong> - Only notable changes, such as price drops or content rewrites</li>
<li><strong>Critical only</strong> - Only major changes, such as large price swings or availability shifts</li>
<li><strong>Custom</strong> - Set your own threshold</li>
</ul>
<p><strong>Show only most recent change per monitor</strong> - When a monitor detects multiple changes between digests, only the latest one is shown. This keeps reports concise.</p>
<p><strong>Group by domain</strong> - Groups changes by website domain, useful when monitoring pages across many different sites.</p>
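<p>The sketch below shows how these filters can combine. It applies them to a list of changes in the order they are described: the importance threshold first, then the latest change per monitor, then grouping by domain. The <code>Change</code> record, the importance level names, and the order of operations are assumptions for illustration. PageCrawl's own digest builder may work differently.</p>
<pre><code class="language-python">from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from urllib.parse import urlsplit

# Hypothetical importance levels, ordered from least to most significant
LEVELS = ["low", "medium", "important", "critical"]


@dataclass
class Change:
    monitor_id: int
    url: str
    importance: str
    detected_at: datetime


def build_digest(changes, minimum="low", latest_only=True, group_by_domain=True):
    # 1. Minimum importance: keep the chosen level and every level above it
    allowed = set(LEVELS[LEVELS.index(minimum):])
    kept = [c for c in changes if c.importance in allowed]

    # 2. Most recent change per monitor: later entries overwrite earlier ones
    if latest_only:
        latest = {}
        for c in sorted(kept, key=lambda c: c.detected_at):
            latest[c.monitor_id] = c
        kept = list(latest.values())

    # 3. Group by website domain, or put everything in a single "all" group
    groups = defaultdict(list)
    for c in sorted(kept, key=lambda c: c.detected_at):
        key = urlsplit(c.url).hostname if group_by_domain else "all"
        groups[key].append(c)
    return dict(groups)</code></pre>
<p>In this sketch, "Medium and up" corresponds to <code>minimum="medium"</code>. The result maps each domain to its changes in chronological order.</p>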
<h3>AI Executive Summary</h3>
<p>When enabled, each digest includes a short AI-written paragraph at the top summarizing the most important changes across all included monitors. This lets you scan the digest quickly without reading every individual change.</p>
<p>You can choose from several summary styles depending on how much detail you want, ranging from a single headline to a full multi-paragraph briefing. Some advanced styles are available on higher-tier plans.</p>
<h3>Priority Escalation</h3>
<p>Reports batch notifications by design, but some changes may need immediate attention. Priority escalation lets you bypass the schedule for high-priority changes.</p>
<p>When enabled, any change with a priority score above your escalation threshold is sent immediately through the escalation channels you configure. These can be different from your regular delivery channels. For example, you might receive daily email digests but get Slack alerts immediately when something critical happens.</p>
<p>Scoring is automatic. Larger, more meaningful changes (like significant price drops or availability shifts) score higher than minor edits. You don't need to configure scoring - it works out of the box for all monitor types.</p>
<h3>Shareable Digest Links</h3>
<p>Every generated digest gets a unique shareable link that works without requiring a PageCrawl account. You can share this link with anyone who needs to see the report.</p>
<p>Share links expire after 30 days by default. From the digest history, you can:</p>
<ul>
<li><strong>Rotate</strong> the link (generates a new URL, invalidating the old one)</li>
<li><strong>Revoke</strong> the link (disables access immediately)</li>
<li><strong>Refresh</strong> the expiration (extends it another 30 days)</li>
</ul>
<h3>Exporting Digests</h3>
<p>Each digest can be exported as:</p>
<ul>
<li><strong>PDF</strong> - Formatted report suitable for printing or archiving</li>
<li><strong>Excel</strong> - Spreadsheet with columns for date, group, monitor name, URL, priority, and AI summary</li>
<li><strong>CSV</strong> - Same data as Excel in CSV format</li>
</ul>
<h3>How Reports Interact with Instant Notifications</h3>
<p>When a monitor is assigned to any scheduled report, its instant workspace-level notifications (email, Slack, Discord, etc.) are bypassed. Changes are collected and delivered in the next digest instead.</p>
<p>The exceptions:</p>
<ul>
<li><strong>Escalation alerts</strong> still fire immediately when a change exceeds the escalation threshold</li>
<li><strong>Public subscriber notifications</strong> (for publicly shared monitors) are unaffected</li>
</ul>
<p>If you delete or disable a report, the monitors it covered go back to receiving instant notifications automatically.</p>
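<p>The routing rules in this section fit in a small decision function. This is a sketch of the documented behavior, not PageCrawl's internal code. The <code>Monitor</code> fields and the return labels are hypothetical. A score escalates only when it is strictly above the threshold, matching the wording under Priority Escalation. Public subscriber notifications are left out because reports do not affect them.</p>
<pre><code class="language-python">from dataclasses import dataclass
from operator import gt
from typing import Optional


@dataclass
class Monitor:
    # True when the monitor is included in at least one active scheduled report
    in_active_report: bool
    # Escalation threshold (0-100) for that report, or None if escalation is off
    escalation_threshold: Optional[int] = None


def route_change(monitor, priority_score):
    """Return where a detected change goes: 'instant', 'escalate', or 'digest'."""
    if not monitor.in_active_report:
        # Unassigned monitors, or monitors whose report was deleted or disabled
        return "instant"
    threshold = monitor.escalation_threshold
    # gt(a, b) is a strict "a is greater than b" comparison
    if threshold is not None and gt(priority_score, threshold):
        # Escalation bypasses the schedule and is sent immediately
        return "escalate"
    # Whether escalated changes also appear in the digest is not stated here,
    # so this sketch returns a single destination per change
    return "digest"</code></pre>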
<h3>Plan Limits</h3>
<p>Standard plans include up to 2 reports. Higher-tier plans include unlimited reports with additional features like on-demand generation. If you downgrade your plan, excess reports are automatically paused and you receive an email listing which ones were affected.</p>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Add Pages to PageCrawl from Android]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/tutorials/article/add-page-from-android" />
            <id>https://pagecrawl.io/101</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Add Pages to PageCrawl from Android</h1>
<h3>What Is This?</h3>
<p>Install PageCrawl on your Android phone like any other app, then share any webpage to it directly from Chrome's share menu. The URL pre-fills automatically, so you can start monitoring in two taps.</p>
<h3>Step 1: Install PageCrawl on Your Phone</h3>
<p>PageCrawl works as a Progressive Web App (PWA) on Android. Once installed, it gets its own icon, opens without a browser bar, and shows up in the Android share sheet alongside apps like WhatsApp, Gmail, and X.</p>
<p>To install:</p>
<ol>
<li>Open Chrome on your Android phone</li>
<li>Visit <a href="https://pagecrawl.io">PageCrawl.io</a> and sign in</li>
<li>Tap the <strong>menu</strong> (three dots in the top right)</li>
<li>Tap <strong>Install app</strong> (or <strong>Add to Home screen</strong>)</li>
<li>Confirm the install</li>
</ol>
<p>PageCrawl will now appear on your home screen and in your app drawer.</p>
<h3>Step 2: Share a Page to PageCrawl</h3>
<p>Open any page you want to monitor, in any app that has a Share button (Chrome, Firefox, news readers, X, Reddit, etc.):</p>
<ol>
<li>Tap the <strong>Share</strong> button</li>
<li>Tap <strong>PageCrawl</strong> in the share sheet</li>
<li>PageCrawl opens with the URL pre-filled</li>
<li>Adjust the monitoring options and tap <strong>Save</strong></li>
</ol>
<p>That's it. No copy-pasting URLs.</p>
<div class="kb-figure kb-figure--narrow">
  <img src="/images/knowledge/android-share-sheet.png" alt="Android Chrome share sheet open on a product page with PageCrawl highlighted as a share target">
</div>
<h3>Example: Track a Product Price from the Chrome App</h3>
<p>Say you are browsing a product on your phone and want to be alerted when the price drops:</p>
<ol>
<li>In Chrome, open the product page (for example, <code>https://competitor-store.com/product/wireless-headphones</code>).</li>
<li>Tap the <strong>Share</strong> button, then tap <strong>PageCrawl</strong> in the share sheet.</li>
<li>PageCrawl opens with the URL already filled in. Pick the <strong>Price</strong> tracking mode so PageCrawl watches the price value rather than the whole page.</li>
<li>Tap <strong>Save</strong>. The next time the price changes, you get a notification on the same phone.</li>
</ol>
<p>The same two-tap flow works from other apps too: share a Reddit thread from the Reddit app to watch for new replies, or share a job listing from LinkedIn to be alerted when it changes.</p>
<h3>Why You Need to Install First</h3>
<p>The Android share sheet only lists installed apps. Until you install PageCrawl as a PWA, it won't appear as a share target. Once installed, it works like any native app for sharing.</p>
<h3>Sharing Without Installing</h3>
<p>If you don't want to install the app, you can still add pages quickly using our <a href="/bookmarklet">bookmarklet</a>. Add it to your Chrome bookmarks, then tap it when you're on a page you want to monitor.</p>
<h3>Tips</h3>
<ul>
<li><strong>Pin to favorites</strong>: Long-press PageCrawl in the share sheet and pin it so it appears at the top.</li>
<li><strong>Stay signed in</strong>: Sign in once and stay signed in so shared pages open straight to the monitor setup screen.</li>
<li><strong>Works from any app</strong>: Any Android app that exposes a Share button can send to PageCrawl, not just Chrome.</li>
</ul>
<h3>Using an iPhone Instead?</h3>
<p>iOS doesn't support installing apps as system share targets the same way Android does. iPhone users should follow our <a href="/help/tutorials/article/add-page-from-ios-safari">iOS Safari shortcut guide</a> instead.</p>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Packaging a PageCrawl Audit Trail for a Regulator]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/web-archives/article/audit-trail-for-regulators" />
            <id>https://pagecrawl.io/102</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Packaging a PageCrawl Audit Trail for a Regulator</h1>
<div class="kb-figure kb-figure--flush">
  <img src="/images/knowledge/web-archiving-featured.png" alt="Web archiving: timestamped snapshots of web pages preserved over time">
</div>
<p>When a regulator requests evidence of a public-facing webpage at a specific point in time, they expect three things: the archive itself, proof that it existed at that time, and a chain of custody that they can independently verify. PageCrawl produces all three by default on Ultimate plans.</p>
<p>This guide explains how to assemble and hand off a complete evidence bundle.</p>
<h3>What's in a PageCrawl evidence bundle</h3>
<p>For each tracked change, PageCrawl retains:</p>
<ul>
<li>The WACZ archive (<code>archive.wacz</code>), a self-contained, replayable archive of the captured page including HTML, screenshots, and linked documents.</li>
<li>An embedded WACZ Auth signature inside the WACZ.</li>
<li>Sidecar proof files from independent providers: <code>archive.wacz.ots</code> (OpenTimestamps Bitcoin anchor), one or more <code>archive.wacz.&lt;provider&gt;.tsr</code> files (RFC 3161 timestamps from commercial Trust Service Providers), and on Custom plans, <code>archive.wacz.qtsa.tsr</code> (qualified electronic timestamp from a QTSP on the EU Trusted List).</li>
<li>The raw underlying WARC (<code>capture.warc</code>) for ingestion into other archival systems.</li>
<li>A manifest hash and per-resource SHA-256 hashes inside the WACZ datapackage.</li>
<li>An access audit log recording every download, view, verify, and export of the archive.</li>
</ul>
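<p>Anyone holding the WACZ can recheck the per-resource SHA-256 hashes independently. The sketch below assumes the <code>datapackage.json</code> layout from the WACZ 1.1.1 specification: a <code>resources</code> list whose entries carry a <code>path</code> and a <code>hash</code> of the form <code>sha256:&lt;hex&gt;</code>. The function name is hypothetical. The sketch checks only per-resource integrity. It does not check the manifest hash, the WACZ Auth signature, or any sidecar timestamp proof.</p>
<pre><code class="language-python">import hashlib
import json
import zipfile


def verify_wacz_resources(wacz_path):
    """Recompute each resource's SHA-256 and compare it with datapackage.json.

    Returns a list of (path, ok) tuples. Non-sha256 hashes are reported as not ok.
    """
    results = []
    with zipfile.ZipFile(wacz_path) as wacz:
        package = json.loads(wacz.read("datapackage.json"))
        for resource in package.get("resources", []):
            algorithm, _, expected = resource.get("hash", "").partition(":")
            if algorithm != "sha256":
                results.append((resource.get("path"), False))
                continue
            digest = hashlib.sha256()
            # Stream the member in 1 MiB chunks so large WARCs are not loaded whole
            with wacz.open(resource["path"]) as member:
                for chunk in iter(lambda: member.read(1024 * 1024), b""):
                    digest.update(chunk)
            results.append((resource["path"], digest.hexdigest() == expected))
    return results</code></pre>
<p>For example, <code>verify_wacz_resources("archive.wacz")</code> returns one <code>(path, True)</code> entry for every resource whose bytes still match the recorded hash.</p>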
<h3>Building an evidence bundle</h3>
<p>From the PageCrawl dashboard, on any tracked change:</p>
<ol>
<li>Select the checks to include in the bundle.</li>
<li>Click "Export evidence bundle".</li>
<li>PageCrawl produces a single zip containing each WACZ, every available sidecar proof, a <code>manifest.json</code> with per-archive integrity fingerprints, and a <code>README.txt</code> with verification instructions.</li>
</ol>
<p>The bundle is portable. Hand it to the regulator on a USB stick, attach it to a regulatory submission, or share it via the customer's own secure file transfer.</p>
<h3>The public verification page</h3>
<p>For regulators who prefer to inspect each archive interactively, generate a public verification link from any tracked archive. The link is a signed URL that grants read-only access to a verification page. The recipient does not need a PageCrawl account.</p>
<p>The verification page shows:</p>
<ul>
<li>The source URL and capture timestamp.</li>
<li>The manifest hash.</li>
<li>Every cryptographic attestation present (embedded signature plus each sidecar provider).</li>
<li>Download buttons for each raw proof file with verification command hints (e.g. <code>ots verify ...</code>, <code>openssl ts -reply -in ...</code>).</li>
</ul>
<p>Anonymous access is logged in the firm's archive access log so chain of custody is preserved.</p>
<h3>Sector-specific guidance</h3>
<h4>SEC examinations (broker-dealers, 17a-4)</h4>
<p>Pair the evidence bundle with the firm's recordkeeping policy and the designated executive officer's attestation. The 2022 amendments to 17a-4(f) explicitly contemplate audit-trail-based tamper evidence as an alternative to WORM storage. PageCrawl's manifest hashes plus multiple independent timestamp providers satisfy the structural tamper-evidence requirement.</p>
<p>For relevant FRE 902(13) / 902(14) framing in a parallel litigation context, see our <a href="/help/web-archives/article/verifying-a-web-archive">verification guide</a>.</p>
<h4>FDA 21 CFR Part 11 inspections (life sciences)</h4>
<p>The validation summary your regulated firm maintains for the PageCrawl system should reference the user requirements specification (URS) describing what records the system retains, the audit-trail mechanism (manifest hashes plus timestamp proofs), the retention period, and the retrieval procedure. The bundle gives the inspector everything they need to validate that the firm can generate accurate and complete copies of records and maintains an audit trail, per 21 CFR 11.10(b) and (e).</p>
<h4>HIPAA OCR investigations (healthcare)</h4>
<p>OCR investigators typically request the version of a Notice of Privacy Practices, breach notice page, or business-associate sub-processor list as it existed on a specific date. The public verification link is the easiest artifact to share: the investigator clicks through, sees the manifest hash, and verifies the timestamp without needing access to internal systems.</p>
<h4>EU supervisory inspections (GDPR, DORA)</h4>
<p>For inspections by data protection authorities under GDPR, or by financial supervisors under DORA, the eIDAS qualified timestamp layer (Custom plan) provides the statutory presumption of accuracy under Article 41(2) of eIDAS. Even without it, the OpenTimestamps Bitcoin anchor plus the RFC 3161 timestamps from commercial Trust Service Providers give the supervisory authority sufficient evidence that the archive existed at the recorded time.</p>
<h3>Why multi-provider matters</h3>
<p>Single-provider proofs are vulnerable to the provider's lifecycle: a TSA can revoke a key, sunset a service, or be compromised. Layering attestations across an open blockchain (OpenTimestamps), one or more commercial Trust Service Providers, and optionally a QTSP on the EU Trusted List gives the firm independent backups. If any one layer becomes unverifiable in the future, the others still attest. This redundancy is itself a defensive credential.</p>
<h3>Related articles</h3>
<ul>
<li><a href="/help/web-archives/article/verifying-a-web-archive">Verifying a PageCrawl Web Archive</a></li>
<li><a href="/help/web-archives/article/share-archives-publicly">Sharing archives publicly</a></li>
<li><a href="/help/web-archives/article/wacz-format-explained">What is WACZ?</a></li>
</ul>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[eIDAS Qualified Timestamps (Custom plan add-on)]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/web-archives/article/eidas-qualified-timestamps" />
            <id>https://pagecrawl.io/103</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>eIDAS Qualified Timestamps</h1>
<div class="kb-figure kb-figure--flush">
  <img src="/images/knowledge/web-archiving-featured.png" alt="Web archiving: timestamped snapshots of web pages preserved over time">
</div>
<p>EU Regulation 910/2014 (eIDAS) defines a class of cryptographic timestamp called a Qualified Electronic Timestamp, issued by a Qualified Trust Service Provider (QTSP) under the oversight of an EU member-state supervisory body. Under Article 41(2), a qualified timestamp enjoys a statutory presumption of the accuracy of the date and time it indicates and of the integrity of the data to which that date and time are bound. It is the strongest evidentiary credential available under EU law for proving that a piece of digital content existed at a specific moment.</p>
<p>PageCrawl supports eIDAS qualified timestamping as a Custom plan add-on. This article explains what the feature provides, when you might need it, and how to enable it.</p>
<h3>When you need it</h3>
<p>You probably do not need eIDAS qualified timestamps if your archives are primarily for internal compliance documentation, US-court evidentiary preparation under FRE 902(13)/(14), or routine regulatory inspections. The default Ultimate-plan archive (a domain-identity signature, an OpenTimestamps Bitcoin anchor, and one or more RFC 3161 timestamps from commercial Trust Service Providers) covers those scenarios.</p>
<p>You probably do need eIDAS qualified timestamps if you are an EU regulated entity (financial institution under DORA, life sciences company subject to EMA/national-authority inspection, controller subject to GDPR DPA inspection in a member state where qualified evidence is preferred), and you anticipate that the integrity of the archive will be tested in an EU court, supervisory inspection, or formal regulatory dispute.</p>
<h3>What you get</h3>
<p>When the add-on is enabled for your account, every WACZ archive PageCrawl produces is also stamped with an RFC 3161 timestamp from the QTSP we have contracted with. The resulting <code>.qtsa.tsr</code> file is retained alongside the WACZ in the same Check directory, downloadable via the API or via the public verification page.</p>
<p>The proof file is a standard DER-encoded RFC 3161 TimeStampResp structure. You can inspect it with <code>openssl ts -reply -in archive.wacz.qtsa.tsr -text</code>, verify it against the archive with <code>openssl ts -verify</code>, or check it with any commercial PKI verification tool. The proof binds the WACZ's SHA-256 hash to a specific moment in time, signed by the QTSP's qualified seal.</p>
<h3>How to enable</h3>
<p>eIDAS qualified timestamping is provisioned manually because each customer's setup involves a per-stamp QTSP cost and may require contractual coordination on specific qualified providers depending on jurisdiction. To enable:</p>
<ol>
<li>Contact our sales team via the contact page on the website.</li>
<li>We discuss your jurisdiction, anticipated stamp volume, and provider preferences.</li>
<li>We enable the feature on your account and configure stamping to route through the chosen QTSP.</li>
<li>From the next detected change onward, every Ultimate-plan WACZ ships with the qualified timestamp attached.</li>
</ol>
<p>Existing archives produced before enablement are not retroactively stamped. Re-stamping historical archives is possible on request but would incur per-stamp costs and is typically only done when a specific historical period needs upgraded evidentiary status for a known regulatory matter.</p>
<h3>Verifying a qualified timestamp</h3>
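<pre><code># Inspect the token: timestamp authority, time of stamping, and bound hash
openssl ts -reply -in archive.wacz.qtsa.tsr -text

# Verify the token against the archive and the QTSP's certificate chain
openssl ts -verify -data archive.wacz -in archive.wacz.qtsa.tsr -CAfile qtsp-chain.pem</code></pre>
<p>The first command describes the timestamp authority's identity, the time of stamping, and the SHA-256 hash bound to the timestamp. The second checks that the token matches <code>archive.wacz</code> and chains to the CA certificates in <code>qtsp-chain.pem</code>. That file name is a placeholder for the QTSP's CA chain, which you obtain separately. If the token does not embed the TSA's own certificate, supply it with <code>-untrusted</code>. To complete verification, confirm that the QTSP's signing certificate appears on the EU Trusted List for the issuing member state. The trusted-list format is specified in Commission Implementing Decision (EU) 2015/1505 (<a href="https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=uriserv:OJ.L_.2015.235.01.0026.01.ENG">https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=uriserv:OJ.L_.2015.235.01.0026.01.ENG</a>). The European Commission and member-state authorities maintain the trusted lists, and the QTSP's certificate must appear there for the timestamp to qualify under Article 41.</p>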
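<p>To confirm that the timestamp covers your copy of the archive, compute the WACZ's SHA-256 yourself. Then compare it with the hash OpenSSL prints as a hex dump in the <code>-text</code> output, labelled <em>Message data</em>. The following is a minimal standard-library sketch. The function name is hypothetical, and <code>archive.wacz</code> is the file name used in this article.</p>
<pre><code class="language-python">import hashlib


def sha256_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 of a file, reading it in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()</code></pre>
<p>Call <code>sha256_file("archive.wacz")</code> and check that the 64 hex characters match the bytes shown under <em>Message data</em>, ignoring the offsets and spacing in OpenSSL's dump.</p>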
<h3>What it does not do</h3>
<ul>
<li>Qualified timestamps prove <strong>time</strong>, not <strong>author</strong>. They do not bind the archive to a specific natural or legal person. Identity binding (qualified electronic signatures, qualified electronic seals) is a separate eIDAS service and not currently part of the PageCrawl integration.</li>
<li>Qualified timestamps do not cover retention obligations. You remain responsible for retaining the archive for whatever period your regulatory regime requires; PageCrawl's own retention period is determined by your plan tier.</li>
<li>Qualified timestamps do not apply retroactively. The integrity guarantee covers an archive only from the moment it is stamped onward.</li>
</ul>
<h3>Related articles</h3>
<ul>
<li><a href="/help/web-archives/article/verifying-a-web-archive">Verifying a PageCrawl Web Archive</a></li>
<li><a href="/help/web-archives/article/audit-trail-for-regulators">Packaging a PageCrawl Audit Trail for a Regulator</a></li>
</ul>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Sharing PageCrawl Archives Publicly]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/web-archives/article/share-archives-publicly" />
            <id>https://pagecrawl.io/104</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Sharing PageCrawl Archives Publicly</h1>
<div class="kb-figure kb-figure--flush">
  <img src="/images/knowledge/web-archiving-featured.png" alt="Web archiving: timestamped snapshots of web pages preserved over time">
</div>
<p>Some audiences need to verify a PageCrawl archive without a PageCrawl account: a regulator examining your records, opposing counsel reviewing a docket capture, an auditor packaging evidence for a board, a journalist citing a primary source. PageCrawl supports this through a public verification page accessible via a signed URL.</p>
<p>This article explains how the link works, what the recipient sees, and how to revoke a link if needed.</p>
<h3>Generating a public verification link</h3>
<p>From the PageCrawl interface, on any tracked change with an archive:</p>
<ol>
<li>Click the link icon (or open the archive information panel).</li>
<li>Click "Copy public verification link".</li>
<li>Paste the link into an email, a docket filing, an audit report, or anywhere else you need it.</li>
</ol>
<p>The link is a cryptographically signed URL. Anyone who has the link can open the verification page, and the link cannot be guessed by anyone who does not have it.</p>
<h3>What the recipient sees</h3>
<p>The verification page renders without authentication. It shows:</p>
<ul>
<li>The source URL and capture timestamp.</li>
<li>The archive's manifest hash (SHA-256 of the WACZ datapackage).</li>
<li>Every cryptographic attestation present:<ul>
<li>The embedded WACZ Auth signature (with signing-service domain and creation time)</li>
<li>The OpenTimestamps Bitcoin anchor proof</li>
<li>The RFC 3161 timestamps from each commercial Trust Service Provider used at capture time</li>
<li>The eIDAS qualified timestamp (Custom plan only, when applicable)</li>
</ul>
</li>
<li>A download button for each raw proof file with an inline command-line verification hint (e.g. <code>ots verify ...</code>, <code>openssl ts -reply -in ...</code>).</li>
<li>A link to open the WACZ in ReplayWeb.page for a fully interactive replay of the captured page.</li>
</ul>
<p>The page does not expose any other archives, settings, or account information from your workspace, only the specific archive the link points to.</p>
<h3>Audit log of public access</h3>
<p>Every public verification page view and every public proof download is logged in your archive access audit log, recording whether it was a verification view or a proof download, the recipient's IP address, the user agent, and the timestamp. The log is queryable from the PageCrawl interface and via the API. Chain of custody is preserved even when the recipient is anonymous.</p>
<h3>Link expiry and revocation</h3>
<p>By default the signed link does not expire. The link remains valid for as long as the archive is retained and you have not revoked it.</p>
<p>To revoke links, use the revoke option in your workspace security settings. Revocation applies to every public verification link issued up to that point; links you generate afterward keep working.</p>
<p>For situations where you want time-bound access (for example, sharing with a vendor for a specific audit window), generate a fresh link from the API with an explicit <code>expires_at</code> parameter. The link will reject access after the expiry timestamp.</p>
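<p>PageCrawl does not document its link-signing scheme, so the following is only an illustration of how signed links with optional expiry and bulk revocation commonly work. It is not PageCrawl's implementation. In this hypothetical Python sketch, the link carries an archive ID, an optional <code>expires_at</code> Unix timestamp, and an HMAC-SHA256 signature over both. Revoking "all previously issued links" corresponds to rotating the workspace secret, which invalidates every old signature while new links signed with the new secret keep working. The base URL, query parameter names, and secret handling are all assumptions.</p>

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical base URL for the public verification page.
BASE_URL = "https://example.com/verify"


def _sign(secret, archive_id, expires_at):
    # Sign the archive ID and the (possibly empty) expiry together, so neither
    # can be altered without invalidating the signature.
    message = f"{archive_id}:{expires_at}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def make_link(secret, archive_id, expires_at=None):
    """Build a signed link; expires_at is an optional Unix timestamp."""
    params = {"archive": archive_id}
    if expires_at is not None:
        params["expires_at"] = str(int(expires_at))
    params["sig"] = _sign(secret, archive_id, params.get("expires_at", ""))
    return f"{BASE_URL}?{urlencode(params)}"


def check_link(secret, url, now=None):
    """Return the archive ID if the link is valid, otherwise None."""
    query = parse_qs(urlparse(url).query)
    archive_id = query.get("archive", [""])[0]
    expires_at = query.get("expires_at", [""])[0]
    sig = query.get("sig", [""])[0]
    expected = _sign(secret, archive_id, expires_at)
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(sig, expected):
        return None
    now = time.time() if now is None else now
    if expires_at and now >= int(expires_at):
        return None
    return archive_id
```

<p>Binding the expiry into the signed message is what prevents a recipient from simply editing the <code>expires_at</code> value to extend access.</p>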
<h3>Why this matters</h3>
<p>A public verification link is the lightest practical way to deliver an attested archive to an external party. The recipient does not need credentials, does not need to install tooling (although they can, for offline verification), and does not need to take your word for it. The page itself shows them every cryptographic attestation, and the underlying proofs are independently verifiable by any standard tooling.</p>
<p>In an era when AI can fabricate any screenshot or document, the public verification page is how PageCrawl users hand off "this is what the page looked like at this moment, attested by parties we don't control" without friction.</p>
<h3>Related articles</h3>
<ul>
<li><a href="/help/web-archives/article/verifying-a-web-archive">Verifying a PageCrawl Web Archive</a></li>
<li><a href="/help/web-archives/article/audit-trail-for-regulators">Packaging a PageCrawl Audit Trail for a Regulator</a></li>
<li><a href="/help/web-archives/article/wacz-format-explained">What is WACZ?</a></li>
</ul>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Verifying a PageCrawl Web Archive]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/web-archives/article/verifying-a-web-archive" />
            <id>https://pagecrawl.io/105</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Verifying a PageCrawl Web Archive</h1>
<div class="kb-figure kb-figure--flush">
  <img src="/images/knowledge/web-archiving-featured.png" alt="Web archiving: timestamped snapshots of web pages preserved over time">
</div>
<p>A PageCrawl web archive carries multiple cryptographic attestations that prove the archive existed at a specific moment in time; any later modification of the archive invalidates them. Each layer can be verified independently of PageCrawl, by anyone with the proof file and standard public tooling. No PageCrawl account is required to verify.</p>
<p>This guide walks through verifying each layer.</p>
<h3>What you need</h3>
<ul>
<li>The WACZ archive file (<code>archive.wacz</code>).</li>
<li>One or more sidecar proof files: <code>archive.wacz.ots</code>, one or more <code>archive.wacz.&lt;provider&gt;.tsr</code> files, and (on Custom plans) <code>archive.wacz.qtsa.tsr</code>. The verification page accompanying each archive lists the exact filenames present for that archive.</li>
<li>Either the public verification page link, or these command-line tools installed locally:<ul>
<li><code>openssl</code> (most systems have it)</li>
<li>The OpenTimestamps client (<code>pip install opentimestamps-client</code>, exposes the <code>ots</code> command)</li>
</ul>
</li>
</ul>
<h3>Verifying the embedded WACZ Auth signature</h3>
<p>Every Ultimate-plan archive includes an embedded <code>signedData</code> block inside <code>datapackage-digest.json</code>, conforming to the WACZ Auth specification. The simplest way to verify it is:</p>
<ol>
<li>Open the WACZ in <a href="https://replayweb.page">ReplayWeb.page</a>.</li>
<li>The viewer renders a verified-signature badge if the embedded signature is intact.</li>
</ol>
<p>If you prefer to inspect the signature manually, extract <code>datapackage-digest.json</code> from the WACZ zip and read the <code>signedData</code> block. The block contains the signing-service domain, the signature, and an embedded RFC 3161 timestamp.</p>
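<p>The extraction step can also be scripted. Here is a minimal sketch that uses only the Python standard library. It opens the WACZ as a zip, parses <code>datapackage-digest.json</code>, and returns the <code>signedData</code> block, or <code>None</code> if the archive is unsigned. It does not validate the signature itself, and it makes no assumptions about the field names inside the block.</p>

```python
import json
import zipfile


def read_signed_data(wacz_path):
    """Return the signedData block from a WACZ's datapackage-digest.json, or None."""
    # wacz_path may be a filesystem path or a binary file-like object.
    with zipfile.ZipFile(wacz_path) as wacz:
        digest = json.loads(wacz.read("datapackage-digest.json"))
    return digest.get("signedData")
```

<p>For example, <code>print(sorted(read_signed_data("archive.wacz")))</code> lists the field names present in a signed archive's block.</p>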
<h3>Verifying the OpenTimestamps Bitcoin anchor</h3>
<p>OpenTimestamps anchors a hash of the archive to the Bitcoin blockchain via a calendar server. The proof can be verified independently of PageCrawl against the public Bitcoin blockchain.</p>
<pre><code>ots verify archive.wacz.ots</code></pre>
<p>The output indicates whether the proof is fully anchored in Bitcoin, still pending confirmation, or invalid. A pending proof typically becomes anchored within a few hours of capture.</p>
<p>If you do not have the OpenTimestamps client installed, see <a href="https://opentimestamps.org">https://opentimestamps.org</a> for installation instructions.</p>
<h3>Verifying the commercial TSP timestamps</h3>
<p>Each commercial Trust Service Provider issues RFC 3161 timestamp responses signed by its TSP keys. Verification uses standard OpenSSL.</p>
<pre><code>openssl ts -reply -in archive.wacz.digicert.tsr -text</code></pre>
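<p>This prints the timestamp response in human-readable form: the time of stamping, the signing TSP's identity, and the hash bound to the timestamp. Printing the response does not check its signature. To do that, run <code>openssl ts -verify -data archive.wacz -in archive.wacz.digicert.tsr -CAfile tsp-chain.pem</code>, where <code>tsp-chain.pem</code> is a placeholder for the issuing TSP's CA certificate chain. If the response does not embed the TSA's signing certificate, also pass that certificate with <code>-untrusted</code>.</p>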
<p>Repeat the same command for any additional <code>archive.wacz.&lt;provider&gt;.tsr</code> files present alongside the archive.</p>
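<p>With several providers, you can generate one verification command per sidecar. The hypothetical Python sketch below finds every <code>archive.wacz.*.tsr</code> file next to the archive and builds an <code>openssl ts -verify</code> command for each. Here, <code>chain.pem</code> stands in for a PEM bundle holding the CA chains of the issuing TSPs. If a response does not embed the TSA's signing certificate, you would also need to add <code>-untrusted</code>.</p>

```python
from pathlib import Path


def tsr_verify_commands(archive_path, ca_file):
    """Build one `openssl ts -verify` argument list per RFC 3161 sidecar.

    Sidecars are assumed to follow the archive.wacz.PROVIDER.tsr naming;
    ca_file is a PEM bundle holding the CA chains of the issuing TSPs.
    """
    archive = Path(archive_path)
    pattern = archive.name + ".*.tsr"
    commands = []
    for tsr in sorted(archive.parent.glob(pattern)):
        commands.append([
            "openssl", "ts", "-verify",
            "-data", str(archive),    # openssl hashes the archive itself
            "-in", str(tsr),          # the timestamp response to check
            "-CAfile", str(ca_file),  # trust anchors for the TSP chains
        ])
    return commands
```

<p>Each returned list can be passed to <code>subprocess.run(cmd, check=False)</code>. On success, openssl prints <code>Verification: OK</code> for that response.</p>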
<h3>Verifying an eIDAS qualified timestamp (Custom plans)</h3>
<p>Custom-plan archives may also include <code>archive.wacz.qtsa.tsr</code>, an eIDAS qualified timestamp under EU Regulation 910/2014 Article 41(2). Verification uses the same OpenSSL command, against the issuing Qualified Trust Service Provider's certificate chain.</p>
<pre><code>openssl ts -reply -in archive.wacz.qtsa.tsr -text</code></pre>
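<p>Under Article 41(2), a qualified electronic timestamp enjoys a legal presumption of the accuracy of the date and time it indicates and of the integrity of the data bound to that date and time. Successful verification establishes that the archive is covered by that presumption.</p>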
<h3>The public verification page</h3>
<p>Every Ultimate-plan archive can be shared publicly via a signed link generated from the PageCrawl interface. The recipient sees:</p>
<ul>
<li>The source URL and capture timestamp.</li>
<li>The manifest hash.</li>
<li>A list of every layer that stamped the archive, with download links for the raw proof files and verification command hints.</li>
</ul>
<p>The verification page does not require an account. It is intended for sharing with regulators, auditors, opposing counsel, or anyone who needs to independently confirm that the archive is genuine.</p>
<h3>Why this matters in 2026</h3>
<p>Generative AI can produce a plausible screenshot, HTML page, or PDF on demand. A self-stored archive proves nothing on its own. What AI cannot fabricate is a hash anchored to the Bitcoin blockchain, an RFC 3161 timestamp signed by a Trust Service Provider's private key, or a qualified seal from a regulated QTSP. Multi-provider cryptographic attestation is the only practical standard for evidentiary archives in an AI-saturated world.</p>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[What's in a PageCrawl WACZ Archive]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/web-archives/article/wacz-format-explained" />
            <id>https://pagecrawl.io/106</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>What's in a PageCrawl WACZ Archive</h1>
<div class="kb-figure kb-figure--flush">
  <img src="/images/knowledge/web-archiving-featured.png" alt="Web archiving: timestamped snapshots of web pages preserved over time">
</div>
<p>WACZ (Web Archive Collection Zipped) is an open specification developed by Webrecorder for packaging web archives in a portable, replayable, tamper-evident format. WACZ is used by the Internet Archive, the Library of Congress, and major eDiscovery and digital-preservation platforms. Storing PageCrawl archives in WACZ means they are interoperable with the wider archival ecosystem.</p>
<p>This article explains what's inside a PageCrawl WACZ, what the embedded signature does, and why we ship additional sidecar proofs alongside.</p>
<h3>Inside the WACZ zip</h3>
<p>A WACZ file is a zip archive with a defined internal structure:</p>
<ul>
<li><code>archive/data.warc.gz</code>, the WARC (Web ARChive) file containing the captured HTTP responses (HTML, images, scripts, stylesheets, linked PDFs, etc.) in their original byte form.</li>
<li><code>pages/pages.jsonl</code>, a list of pages captured, one JSON object per line, with the URL, timestamp, and title.</li>
<li><code>datapackage.json</code>, a manifest listing every file inside the archive along with its size, mime type, and SHA-256 hash. This is the canonical integrity manifest.</li>
<li><code>datapackage-digest.json</code>, a SHA-256 hash of <code>datapackage.json</code> itself, plus an optional <code>signedData</code> block (WACZ Auth specification).</li>
</ul>
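<p>Each resource's hash is recorded in <code>datapackage.json</code>, and the hash of <code>datapackage.json</code> is in turn recorded in <code>datapackage-digest.json</code>, which may additionally carry a WACZ Auth <code>signedData</code> block. Modifying any byte of any captured resource changes its hash, so it no longer matches the manifest. Altering the manifest to compensate breaks the digest, and altering the digest breaks the signature and timestamps. The format is structurally tamper-evident.</p>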
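<p>The hash chain can be checked with the Python standard library. The minimal sketch below assumes the WACZ layout in which <code>datapackage.json</code> lists <code>resources</code>, each with a <code>path</code> and a <code>hash</code> written as <code>sha256:HEX</code>, and <code>datapackage-digest.json</code> holds the hash of <code>datapackage.json</code>. It confirms internal consistency only. It does not validate the <code>signedData</code> signature or any timestamp.</p>

```python
import hashlib
import json
import zipfile


def _hex(value):
    # Hashes are assumed to look like "sha256:HEX"; tolerate bare hex too.
    return value.split(":", 1)[1] if ":" in value else value


def check_wacz_manifest(wacz_path):
    """Return a list of problems found; an empty list means every hash matched."""
    problems = []
    with zipfile.ZipFile(wacz_path) as wacz:
        manifest_bytes = wacz.read("datapackage.json")
        digest = json.loads(wacz.read("datapackage-digest.json"))
        # Link 1: datapackage-digest.json must record the hash of datapackage.json.
        if hashlib.sha256(manifest_bytes).hexdigest() != _hex(digest["hash"]):
            problems.append("datapackage.json does not match datapackage-digest.json")
        # Link 2: every resource listed in datapackage.json must match its bytes.
        for resource in json.loads(manifest_bytes).get("resources", []):
            try:
                data = wacz.read(resource["path"])
            except KeyError:
                problems.append(f"missing from zip: {resource['path']}")
                continue
            if hashlib.sha256(data).hexdigest() != _hex(resource["hash"]):
                problems.append(f"hash mismatch: {resource['path']}")
    return problems
```

<p>An empty list means every link in the chain matched. A tampered resource shows up as a <code>hash mismatch</code> entry.</p>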
<h3>The embedded signedData block (WACZ Auth)</h3>
<p>When you enable WACZ capture on an Ultimate-plan page, the archive includes an embedded canonical signature following the <a href="https://specs.webrecorder.net/wacz-auth/0.1.0/">WACZ Auth specification 0.1.0</a>. The signature lives inside <code>datapackage-digest.json</code> and contains:</p>
<ul>
<li>The cryptographic signature of <code>datapackage.json</code>'s hash.</li>
<li>An RFC 3161 timestamp issued by a Trust Service Provider.</li>
<li>The signing service's domain certificate, proving its identity.</li>
</ul>
<p>This is the WACZ-spec-compliant way to sign a WACZ archive. When a WACZ-aware tool (such as ReplayWeb.page) opens the archive, it reads the <code>signedData</code> block and renders an integrity badge if the signature validates. The badge tells a reviewer that the archive was signed by the named domain at the indicated time.</p>
<h3>Sidecar proof files</h3>
<p>Alongside the WACZ, PageCrawl retains additional proof files that don't fit inside the WACZ Auth spec:</p>
<ul>
<li><code>archive.wacz.ots</code>, OpenTimestamps proof, anchored to the Bitcoin blockchain.</li>
<li><code>archive.wacz.&lt;provider&gt;.tsr</code>, an RFC 3161 timestamp file. Each captured archive carries one or more provider TSR files from commercial Trust Service Providers. The current providers are listed on the verification page accompanying each archive.</li>
<li><code>archive.wacz.qtsa.tsr</code> (Custom plans), qualified electronic timestamp from a QTSP on the EU Trusted List.</li>
</ul>
<p>The WACZ Auth spec only supports one embedded signature, so additional providers ship as sidecar files. Sidecars do not violate the WACZ format spec; they live in the same directory and are independent artefacts. Each sidecar is verifiable with public tooling (<code>ots verify</code>, <code>openssl ts -reply -in</code>) without touching the WACZ.</p>
<p>This dual approach gives the best of both worlds: spec-compliant embedded signature for WACZ-aware tooling, plus multi-provider redundancy for evidentiary depth.</p>
<h3>How to read a WACZ</h3>
<p>The simplest way to inspect a WACZ archive is to drag it into <a href="https://replayweb.page">ReplayWeb.page</a>. It renders the captured pages as the user originally saw them, including JavaScript-rendered content where applicable, plus the integrity badge from the embedded signature.</p>
<p>If you want to inspect the WACZ outside ReplayWeb.page, treat it as a regular zip archive. Standard zip tools can list and extract its contents. <code>datapackage.json</code> enumerates the captured resources and their hashes; <code>pages/pages.jsonl</code> enumerates the captured URLs.</p>
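<p>For example, here is a short Python sketch that lists the captured URLs from <code>pages/pages.jsonl</code>. It assumes each page line carries <code>url</code>, <code>ts</code>, and <code>title</code> fields. It skips records without a <code>url</code>, such as the header line that WACZ page lists usually start with.</p>

```python
import json
import zipfile


def captured_pages(wacz_path):
    """Yield (url, ts, title) for each page record in pages/pages.jsonl."""
    with zipfile.ZipFile(wacz_path) as wacz:
        lines = wacz.read("pages/pages.jsonl").decode("utf-8").splitlines()
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        # The first line is typically a header record without a "url" field.
        if "url" in record:
            yield record["url"], record.get("ts"), record.get("title")
```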
<h3>How to download</h3>
<p>Each tracked change with an archive shows a download button in the PageCrawl interface. From the archive details panel you can also download the per-provider timestamp proofs and the underlying WARC file for ingestion into other archival systems.</p>
<h3>Related articles</h3>
<ul>
<li><a href="/help/web-archives/article/verifying-a-web-archive">Verifying a PageCrawl Web Archive</a></li>
<li><a href="/help/web-archives/article/share-archives-publicly">Sharing archives publicly</a></li>
<li><a href="/help/web-archives/article/audit-trail-for-regulators">Packaging a PageCrawl Audit Trail for a Regulator</a></li>
</ul>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Parse: Monitor Any Value with Plain English]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/features/article/parse-tracked-element" />
            <id>https://pagecrawl.io/107</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Parse: Monitor Any Value with Plain English</h1>
<div class="kb-figure kb-figure--flush">
  <img src="/images/knowledge/monitoring-type-parse.png" alt="Parse example: an event page on the left, and PageCrawl's AI-extracted event date shown as a diff on the right">
</div>
<p>Parse is a tracked element that uses AI to read the page on every check and return one value you described in plain English. You write a sentence describing what to retrieve, and the AI returns the answer each time the page changes. There are no CSS selectors, no XPath, no manual element picking, and no scripts.</p>
<p>If you have ever wanted to monitor a value that "moves around" between page layouts, sits inside a paragraph of prose, or has to be derived from the content rather than scraped from a fixed spot, Parse is the mode for that job.</p>
<h3>What Parse Actually Does</h3>
<p>On each check, PageCrawl reads the page and uses AI to return the single value described by your prompt. That value is stored as the current value of the tracked element.</p>
<p>From that point on, Parse behaves like any other tracked element. The value is compared to the previous value, alerts fire when it changes, history is recorded, and the value is available in notifications, webhooks, exports, and reports.</p>
<p>You are not building a chatbot. You are asking for one piece of data, and the result is treated as plain text.</p>
<h3>How to Set It Up</h3>
<ol>
<li>Open the page you want to monitor, or create a new monitor.</li>
<li>Add a tracked element and choose the <strong>Parse</strong> type from the element list.</li>
<li>In the prompt field, describe the value you want, including the exact format. (See the prompt-writing guidance further down.)</li>
<li>Optionally give the element a friendly label so notifications and history are easy to read (for example "Headline price" or "Next earnings date").</li>
<li>Save the monitor. The first check runs immediately and populates the initial value, which becomes the baseline.</li>
</ol>
<div class="kb-figure">
  <img src="/images/knowledge/simple-what-to-track.png" alt="What to Track panel with the Parse option among the tracking mode tabs">
</div>
<p>That is the whole setup. There is no selector to maintain and no script to debug.</p>
<h3>When Parse Is Useful</h3>
<p>Parse shines whenever the value you care about is <strong>semantic</strong> rather than positional. Good examples:</p>
<ul>
<li>A date hidden inside a paragraph ("Our next investor call is scheduled for March 5, 2026.")</li>
<li>The lowest of several prices on a page, or the price after a discount is applied</li>
<li>A name from an About / team / leadership page</li>
<li>A version number, build identifier, or release tag in release notes</li>
<li>A status word like "Open", "Sold out", "Beta", "Coming soon"</li>
<li>A count or score that appears in different positions depending on the layout</li>
<li>A field on a page that frequently gets redesigned, where a CSS selector would keep breaking</li>
<li>A value that the page expresses in different units or formats, where you want the AI to normalize it</li>
</ul>
<p>If your prompt could be answered by a person glancing at the page in under five seconds, Parse will usually handle it well.</p>
<h3>When Parse Is Not the Right Tool</h3>
<p>Parse is the most expensive tracking mode, and it is not always the best choice. Prefer a dedicated tracking type when:</p>
<ul>
<li><strong>The value is a price you can see clearly.</strong> Use the <strong>Price</strong> tracked element. It auto-detects, normalizes, and handles availability for free.</li>
<li><strong>The value is a clean block of text.</strong> Use <strong>Text</strong> or <strong>Full Page Text</strong>. They run without AI and don't burn AI credits.</li>
<li><strong>You only want to know whether something exists on the page.</strong> Use <strong>Boolean</strong> or <strong>Availability</strong>.</li>
<li><strong>You want a number that is visually prominent.</strong> Use <strong>Number</strong> or <strong>Rating</strong>; they are cheaper and deterministic.</li>
<li><strong>You want to track every item in a feed or grid.</strong> Use the <strong>Feed</strong> tracking mode for structured item lists.</li>
</ul>
<p>Parse is also the wrong tool when:</p>
<ul>
<li>The information you want is only revealed by user interaction (clicking a tab, expanding a section). Add an action to expose the content first.</li>
<li>The value changes constantly for cosmetic reasons (timestamps, "as of now" counters, rotating banner text). You will get noisy alerts.</li>
</ul>
<h3>Writing a Good Prompt</h3>
<p>The single biggest factor in getting stable, useful results is being precise about format. AI can phrase the same answer in many different ways, and any of those variations will look like a change.</p>
<p>Compare these:</p>
<table>
<thead>
<tr>
<th>Vague</th>
<th>Explicit</th>
</tr>
</thead>
<tbody>
<tr>
<td>"the price"</td>
<td>"the listed price as a plain number with no currency symbol, e.g. 24.99"</td>
</tr>
<tr>
<td>"when's the next earnings"</td>
<td>"the next earnings call date in YYYY-MM-DD format"</td>
</tr>
<tr>
<td>"who is the CEO"</td>
<td>"the CEO's full legal name as printed on the page, no titles"</td>
</tr>
<tr>
<td>"is it in stock"</td>
<td>"exactly the word Yes if the product is in stock, otherwise the word No"</td>
</tr>
<tr>
<td>"the latest version"</td>
<td>"the latest released version number in semver format (e.g. 4.2.0)"</td>
</tr>
</tbody>
</table>
<p>Rules of thumb:</p>
<ul>
<li><strong>State the unit.</strong> "USD", "GBP", "as a percentage", "in days".</li>
<li><strong>State the format.</strong> "YYYY-MM-DD", "ISO 8601", "plain number", "uppercase".</li>
<li><strong>State what to do if the value is missing.</strong> For example, "If the value is not present on the page, return the word UNKNOWN." This prevents the AI from inventing something.</li>
<li><strong>Keep it to one value.</strong> Parse returns a single value per element. If you need three values, add three Parse elements.</li>
<li><strong>Reference visible cues.</strong> "The price shown in the largest red text", "the date in the section titled Upcoming Events".</li>
</ul>
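<p>If you consume Parse values downstream (for example from a webhook), you can enforce the same format rules on your side. Below is a hypothetical Python check for the date example in the table above. It accepts either a <code>YYYY-MM-DD</code> date or the fallback word <code>UNKNOWN</code>. The function name and fallback word are assumptions and should mirror whatever your prompt asks for. The check also shows why format matters: "March 5, 2026" and "2026-03-05" are different strings, so a plain text comparison treats them as a change.</p>

```python
import re
from datetime import date

FALLBACK = "UNKNOWN"  # must match the fallback word used in the Parse prompt


def is_valid_parse_date(value):
    """True if value is the fallback word or a real YYYY-MM-DD calendar date."""
    value = value.strip()
    if value == FALLBACK:
        return True
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        date.fromisoformat(value)  # rejects impossible dates such as 2026-02-30
    except ValueError:
        return False
    return True
```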
<h3>What Not to Do</h3>
<ul>
<li>Don't ask the AI to summarize a page. Parse is for extracting one specific value, not for generating prose.</li>
<li>Don't ask for "anything that changed". The change comparison is automatic; your job is only to describe the value to extract.</li>
<li>Don't request a list, table, or JSON object. Use multiple Parse elements instead.</li>
<li>Don't include private context the AI cannot see ("the price our sales rep quoted yesterday"). It only knows what is on the page right now.</li>
<li>Don't write multi-paragraph prompts. One or two sentences is plenty and usually more reliable.</li>
</ul>
<h3>Cost</h3>
<p>Parse is the most expensive tracking mode on every plan. Use it for the values that really benefit from AI extraction, and leave routine price, text, and availability tracking to the cheaper dedicated modes.</p>
<h3>Plan Recommendation</h3>
<p>Parse is recommended on the <strong>Enterprise or Ultimate plan</strong>. Free and Standard plans include small AI allowances that are fine for testing the feature, but they are not sized for ongoing monitoring of many Parse elements at frequent intervals.</p>
<h3>Troubleshooting</h3>
<ul>
<li><strong>The value flips between two formats and triggers false alerts.</strong> Your prompt is not specific enough about format. Add the exact format you want.</li>
<li><strong>The value is sometimes empty.</strong> Tell the AI what to return when the value is missing (for example "UNKNOWN") so the result is consistent.</li>
<li><strong>The AI returns something that is not on the page.</strong> Tighten the prompt and reference a visible cue ("from the section titled Pricing", "the value labelled Total"). If the value really isn't on the page, Parse cannot find it.</li>
<li><strong>Parse is more expensive than expected.</strong> Check whether the page changes frequently for cosmetic reasons. Consider switching to a cheaper tracking mode or reducing the check frequency.</li>
</ul>
<h3>Related Articles</h3>
<ul>
<li><a href="/help/features/article/available-tracked-monitoring-types">Available tracked monitoring types</a></li>
<li><a href="/help/features/article/ai-powered-change-detection">AI-powered change detection</a></li>
<li><a href="/help/tutorials/article/choosing-best-ai-model-website-monitoring">Choosing the best AI model for website monitoring</a></li>
<li><a href="/help/reduce-false-positives/article/reduce-false-positives-monitoring-website-for-changes">Reduce false positives</a></li>
</ul>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Add Pages to PageCrawl from Claude]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/tutorials/article/add-page-from-claude" />
            <id>https://pagecrawl.io/108</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Add Pages to PageCrawl from Claude</h1>
<div class="kb-figure kb-figure--flush">
  <img src="/images/knowledge/add-from-claude-featured.png" alt="Claude logo">
</div>
<h3>What Is This?</h3>
<p>PageCrawl ships a Model Context Protocol (MCP) server that plugs directly into Claude. Once it is connected, you can describe what you want to monitor in plain English and Claude makes the API call for you, picking the right tracking mode, selector, frequency, and notification channel from your sentence. No tab-switching, no dashboard form.</p>
<p>This guide assumes you already have the MCP server connected. If you don't, the <a href="/knowledge/integrations/mcp-server-ai-tools">MCP integration guide</a> walks through setup for Claude.ai web, Claude Code, Cursor, and other MCP-compatible clients in about five minutes.</p>
<h3>Step 1: Any Plan Works, Including Free</h3>
<p>Read tools (<code>list-monitors</code>, <code>get-changes-since</code>, <code>get-check-diff</code>) and creating monitors with <code>add-page-monitor</code> work on every plan, including Free. Your plan's monitor-count limit still applies.</p>
<p>Action tools like <code>trigger-check</code>, <code>manage-tags</code> (add and remove), <code>mark-changes-seen</code>, and <code>update-monitor-defaults</code> require Standard or above. On Free they return a clear error pointing you to the upgrade page, while creating and exploring monitors still works.</p>
<h3>Step 2: Describe What You Want to Monitor</h3>
<p>Open Claude (web or CLI) and tell it what to watch. A few real prompts that work on the first try:</p>
<p><strong>Price tracking:</strong></p>
<blockquote>
<p>Watch <a href="https://www.apple.com/shop/buy-mac/macbook-pro">https://www.apple.com/shop/buy-mac/macbook-pro</a> and tell me when the price changes.</p>
</blockquote>
<p>Claude calls <code>add-page-monitor</code> with <code>tracking_mode: "price"</code>. The monitor tracks both the price and stock availability, runs daily, and notifies through your default channels.</p>
<p><strong>Legal page monitoring:</strong></p>
<blockquote>
<p>Send a weekly Slack message if Anthropic's consumer terms change at <a href="https://www.anthropic.com/legal/consumer-terms">https://www.anthropic.com/legal/consumer-terms</a>.</p>
</blockquote>
<p>Claude picks <code>tracking_mode: "reader"</code> (which strips nav and footer noise), sets <code>frequency: 10080</code> for weekly, and routes notifications to Slack.</p>
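<p>The <code>frequency</code> value in these examples appears to be in minutes: a week is 7 × 24 × 60 = 10080 minutes. Assuming that unit, here is a tiny Python helper for converting an interval:</p>

```python
from datetime import timedelta


def frequency_minutes(interval: timedelta) -> int:
    """Convert an interval to whole minutes (assumed unit of `frequency`)."""
    return int(interval.total_seconds() // 60)
```

<p>For example, <code>frequency_minutes(timedelta(days=1))</code> gives 1440 and <code>frequency_minutes(timedelta(minutes=15))</code> gives 15.</p>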
<p><strong>API monitoring:</strong></p>
<blockquote>
<p>Track the timestamp of the latest CVE published in <a href="https://services.nvd.nist.gov/rest/json/cves/2.0?resultsPerPage=1">https://services.nvd.nist.gov/rest/json/cves/2.0?resultsPerPage=1</a> and ping me hourly.</p>
</blockquote>
<p>Claude picks <code>tracking_mode: "json_path"</code> and writes the JSONPath expression <code>$.vulnerabilities[0].cve.published</code> from your description.</p>
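<p>To show what that expression selects, here is a plain-Python equivalent. It is applied to a heavily trimmed, hypothetical response shaped like NVD API 2.0 output; the CVE ID and date are placeholders, not real data.</p>

```python
# Hypothetical, trimmed response shaped like NVD API 2.0 output.
# Placeholder values, not real NVD data.
sample = {
    "resultsPerPage": 1,
    "vulnerabilities": [
        {"cve": {"id": "CVE-0000-0000", "published": "2026-01-01T00:00:00.000"}}
    ],
}


def published_of_first(response):
    """Plain-Python equivalent of the JSONPath $.vulnerabilities[0].cve.published."""
    return response["vulnerabilities"][0]["cve"]["published"]
```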
<p><strong>Bulk setup:</strong></p>
<blockquote>
<p>Set up daily price-page monitors for these six competitor URLs and tag them all <code>competitor-pricing</code>: [list of URLs]</p>
</blockquote>
<p>Claude calls <code>add-page-monitor</code> six times with consistent settings and tags. Six form submissions in the dashboard become one chat message.</p>
<h3>Step 3: Verify the Monitor</h3>
<p>After Claude creates a monitor, ask:</p>
<blockquote>
<p>Show me the monitor you just created and confirm the first check ran.</p>
</blockquote>
<p>Claude calls <code>get-monitor-details</code> and <code>get-latest-values</code> and reads back the configuration and the first captured value. If something looks off (wrong selector, wrong frequency, the value field is empty), correct it in the same conversation: "actually use <code>ai_extract</code> with the prompt 'find the headline price'" or "drop the frequency to 15 minutes".</p>
<h3>Examples of Follow-Up Prompts</h3>
<p>Once monitors are in place, the read-side prompts are where the integration earns its keep:</p>
<ul>
<li><strong>Daily triage:</strong> "What changed across all monitors in the last 24 hours, sorted by priority?"</li>
<li><strong>Investigate a specific change:</strong> "Show me the exact diff for the monitor on stripe.com pricing."</li>
<li><strong>Force a recheck:</strong> "Trigger an immediate recheck on every monitor tagged <code>competitor-pricing</code>."</li>
<li><strong>Mark as reviewed:</strong> "Mark everything as seen, I am caught up."</li>
</ul>
<p>Each of these maps to one or two MCP tool calls (<code>get-changes-since</code>, <code>get-check-diff</code>, <code>trigger-check</code>, <code>mark-changes-seen</code>) that Claude picks automatically.</p>
<h3>Claude.ai Web vs Claude Code</h3>
<p>Both surfaces talk to the same MCP server with the same tools. The right one depends on how you work:</p>
<ul>
<li><strong>Claude.ai web</strong>: best for one-off setup, conversational triage, and sharing results with a teammate via a chat link. OAuth handshake the first time, then nothing to maintain.</li>
<li><strong>Claude Code</strong>: best when monitor setup is part of a development workflow. Token-based authentication makes it scriptable. Pipe a CSV of URLs into a single Claude Code session and have it create every monitor with consistent tags, or wire the MCP server into a CI job that creates a monitor for every new deployment URL.</li>
</ul>
<h3>What This Doesn't Do</h3>
<p>Some Advanced Setup features still belong in the web dashboard, not in chat:</p>
<ul>
<li><strong>Authentication flows.</strong> If a page needs a login, configure the credentials in <strong>Settings → Authentications</strong> and reference the saved authentication from the dashboard. Sharing credentials over chat is a bad pattern, and the MCP server intentionally does not expose authentication management.</li>
<li><strong>Complex action sequences with custom JavaScript.</strong> Claude can configure clicks, waits, scrolls, and short JavaScript snippets through MCP, but for monitors that need multi-step interaction with framework-aware event dispatching, the dashboard's Advanced Setup editor (with autocomplete and syntax highlighting) is the better surface.</li>
</ul>
<h3>Tips</h3>
<ul>
<li><strong>Lead with the page URL.</strong> "Watch https://… and …" almost always picks the right tracking mode. Putting the URL first gives Claude the strongest signal.</li>
<li><strong>Name the channel explicitly when you want one.</strong> "Ping me on Slack" or "email only" overrides the workspace default.</li>
<li><strong>Use tags for groups you will query later.</strong> "Tag this <code>competitor-pricing</code>" makes future "show me changes in this group" prompts trivial.</li>
<li><strong>Correct in the same turn.</strong> Claude does not need a perfect prompt. "Use <code>ai_extract</code> instead, with the prompt 'find the EU launch date'" works as a follow-up.</li>
</ul>
<h3>Want the Deeper Tour?</h3>
<p>The <a href="/blog/claude-mcp-natural-language-monitor-setup">Setting up PageCrawl monitors by talking to Claude (web and CLI) via MCP</a> blog post walks through five worked examples in detail, including the JavaScript tracked element and the <code>actions</code> array for monitors that need a click before extraction. Each example also shows the equivalent REST API call if you would rather script it from a shell.</p>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[How PageCrawl Uses AI]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/features/article/how-pagecrawl-uses-ai" />
            <id>https://pagecrawl.io/109</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>How PageCrawl Uses AI</h1>
<p>PageCrawl uses AI to make change monitoring easier to act on. When you enable AI for a workspace, it works automatically on detected changes. This page explains what AI produces and how to interpret it.</p>
<h3>Where AI is used</h3>
<ul>
<li><strong>Change summaries</strong> - When a monitored page changes, AI describes what changed in plain language instead of leaving you to read a raw text diff.</li>
<li><strong>Importance scoring</strong> - Each change is scored so you can filter out noise and be notified only about meaningful updates.</li>
<li><strong>Report digests</strong> - Periodic digests can include an AI-written overview of the changes in the period, grouped by importance.</li>
<li><strong>Setup assistance</strong> - When setting up a monitor, AI can suggest what to track on a page.</li>
</ul>
<p>AI is optional and configured per workspace under Settings. You can turn it off, adjust the minimum importance threshold, and add instructions describing what matters to you for more accurate results.</p>
<div class="kb-figure">
  <img src="/images/knowledge/simple-ai-notify.png" alt="What matters AI panel in the page editor with the AI filtering toggle and a focus instructions field">
</div>
<h3>AI-generated content may contain mistakes</h3>
<p>Summaries, importance scores, and digest overviews are generated by AI. They are intended to help you triage changes quickly, not to replace the underlying change record.</p>
<p>AI can occasionally misread a change, summarize it imprecisely, or score it higher or lower than you would. Before acting on a summary or score, review the actual detected change, which PageCrawl always records and highlights for you.</p>
<h3>Your data and AI training</h3>
<p>When AI is enabled, the content needed to produce a result is sent to an AI provider, processed to generate the summary or score, and the result is returned. We keep this deliberately narrow:</p>
<ul>
<li><strong>We do not train AI models on your content.</strong> PageCrawl does not use your monitored pages, change data, or AI settings to train any model.</li>
<li><strong>Managed AI runs under business API terms.</strong> When you use the AI credits included with your plan, content is processed through PageCrawl's managed AI infrastructure, which runs on commercial provider APIs operated under business terms that prohibit using your content to train their models.</li>
<li><strong>Only what is needed is sent.</strong> AI runs only when a change is detected, not on every check, and only the content required for that summary or score is sent. AI is never the system of record. The underlying change, diff, and screenshot are stored in your account and remain readable with AI turned off.</li>
</ul>
<p>For more control over where your content goes:</p>
<ul>
<li><strong>Bring your own key (BYOK).</strong> Connect your own AI provider key and content is sent directly to your provider instead of through PageCrawl's managed infrastructure. See the <a href="/help/integrations/article/ai-byok-setup-guide">BYOK setup guide</a>.</li>
<li><strong>Privacy Mode.</strong> With BYOK via OpenRouter, enabling Privacy Mode restricts routing to providers that do not use your data for training. See <a href="/help/tutorials/article/choosing-best-ai-model-website-monitoring">choosing the best AI model</a> for the per-provider data-usage breakdown.</li>
<li><strong>Turn AI off.</strong> AI is optional per workspace. With it disabled, no page content is sent to any AI provider, and you continue to receive change detection and diffs.</li>
</ul>
<p>If you monitor confidential or regulated content, PageCrawl does not train on it either way. For maximum control you can additionally use BYOK with a provider whose API terms exclude training, or leave AI disabled for that workspace.</p>
<h3>Where AI labels appear</h3>
<p>AI-generated content is labeled where it is shown:</p>
<ul>
<li>In the change detail view, summaries appear under an <strong>AI Summary</strong> / <strong>AI Analysis</strong> heading.</li>
<li>In report digests (web and shared links), an AI-written overview is marked as an <strong>AI Summary</strong>, and a note indicates that summaries and priority scores are AI-generated.</li>
<li>In the workspace AI settings, a note describes what AI produces and that its output may contain mistakes.</li>
</ul>
<p>If you prefer to work only from the raw change data, you can leave AI disabled for the workspace and continue to receive the underlying change detection and diffs.</p>
<h3>Related Articles</h3>
<ul>
<li><a href="/help/features/article/ai-powered-change-detection">AI-Powered Change Detection and Smart Filtering</a> - How AI summarization and importance scoring filter out noise</li>
<li><a href="/help/integrations/article/ai-byok-setup-guide">AI Integration Setup Guide (BYOK)</a> - Connect your own AI provider key</li>
<li><a href="/help/tutorials/article/choosing-best-ai-model-website-monitoring">Choosing the Right AI Model for Website Monitoring</a> - Compare models and pricing for BYOK users</li>
</ul>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Home Assistant Integration]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/integrations/article/home-assistant-integration" />
            <id>https://pagecrawl.io/110</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Home Assistant Integration</h1>
<div class="kb-figure kb-figure--flush">
  <img src="/images/knowledge/home-assistant-flow.png" alt="Data flow: PageCrawl monitors web pages, detects changes, updates a Home Assistant sensor per monitor, which triggers your dashboards and automations">
</div>
<p>The PageCrawl Home Assistant integration turns each of your page monitors into native Home Assistant entities you can view on your dashboard, chart over time, and build automations on. Entities update in real time whenever PageCrawl detects a change.</p>
<div style="background:#f5f7fa;border:1px solid #e6e9ef;border-radius:8px;padding:14px 18px;margin:20px 0;">
  <strong>Note:</strong> Works on free PageCrawl accounts. You connect with OAuth (one click, no API token to create or paste). Your plan's monitor and check limits still apply.
</div>
<h3>Great Use Cases (what <code>rest</code> and <code>scrape</code> sensors can't do)</h3>
<p>Home Assistant already fetches URLs and scrapes CSS selectors on simple, static pages. PageCrawl is for the pages those built-in sensors fall short on:</p>
<ul>
<li><strong>Sites that block ordinary scrapers.</strong> A <code>rest</code> sensor pointed at a concert resale page, an airline fare, or most big retailers comes back with an error page or an "are you human?" challenge instead of the value. PageCrawl reads these reliably, so a ticket price, a flight fare, or a product page keeps working as a sensor.</li>
<li><strong>Pages behind a login, including ones that send a one-time code.</strong> Your energy tariff, broadband usage, council tax balance, or parcel status only shows once you are signed in, and some of those email a one-time passcode (OTP) every time. PageCrawl signs in for you, completes the OTP step, and surfaces the value. A <code>scrape</code> sensor just lands on the login screen.</li>
<li><strong>JavaScript-rendered pages.</strong> A GPU or console restock counter, a live delivery-slot grid, an EV charger's status, or a price that is injected by the page's scripts is simply not in the HTML a plain fetch downloads, so <code>scrape</code> returns nothing. PageCrawl loads the page fully first, then reads the rendered value.</li>
<li><strong>Pages that need steps first.</strong> Your council's next bin-collection date appears only after you type a postcode and submit the form. A GP or DVSA test slot appears only after you pick a location. PageCrawl performs those clicks and form fills once you configure them, and Home Assistant reads the result.</li>
<li><strong>AI extraction instead of brittle selectors.</strong> Ask in plain language for "the next bin collection date" or "is this product in stock", and it keeps working when the page is redesigned and the CSS selector you would have written breaks. No <code>div:nth-child(7) &gt; span</code> to re-find every time the site changes.</li>
<li><strong>Change detection without false positives.</strong> A raw <code>scrape</code> of a planning-application page or a terms-of-service page fires on every rotating banner, view counter, or reordered block. PageCrawl filters that noise out, so you are alerted only when the date, the price, or the actual text changes, with a human-readable summary of what changed.</li>
<li><strong>Visual change detection.</strong> Know when a status page, a webcam still, or a product image actually looks different, backed by screenshots, even when there is no clean text value to scrape at all.</li>
</ul>
<h3>When to Use rest/scrape vs PageCrawl</h3>
<p>Home Assistant's built-in <code>rest</code> and <code>scrape</code> sensors are a great fit for many pages, and they run entirely locally. Reach for PageCrawl only when they fall short:</p>
<table>
<thead>
<tr>
<th>Use a built-in <code>rest</code> / <code>scrape</code> sensor when</th>
<th>Use PageCrawl when</th>
</tr>
</thead>
<tbody>
<tr>
<td>The page is static, public HTML or a JSON API</td>
<td>The page needs JavaScript to render the value</td>
</tr>
<tr>
<td>A stable CSS selector or JSON path exists</td>
<td>No reliable selector, or it breaks when the page changes</td>
</tr>
<tr>
<td>The page has no login and allows automated requests</td>
<td>The page needs a login or blocks ordinary scrapers</td>
</tr>
<tr>
<td>The value is visible on first load</td>
<td>The value only appears after a login, click, or form submission</td>
</tr>
<tr>
<td>You only need the current value</td>
<td>You want change history, diffs, or a human or AI summary of what changed</td>
</tr>
<tr>
<td>Any change to the value is meaningful</td>
<td>The page is noisy (ads, timestamps, reordered blocks) and you only want real changes</td>
</tr>
<tr>
<td>You are happy maintaining the selector yourself</td>
<td>You want AI extraction and no scraping logic to maintain</td>
</tr>
</tbody>
</table>
<p>If a <code>scrape</code> sensor already returns the value you need, keep using it. Bring in PageCrawl for the pages where it comes back empty, gets blocked, or needs constant selector fixes.</p>
<h3>What You Get</h3>
<div class="kb-figure">
  <img src="/images/knowledge/home-assistant-device.png" alt="A PageCrawl monitor as a Home Assistant device, showing its sensors, a Check now control, and diagnostic entities for last checked, last change date, and status">
</div>
<ul>
<li>One Home Assistant device per monitor.</li>
<li>One entity per tracked element on that monitor, typed correctly (numeric, on/off, text, or item counts).</li>
<li>Real-time push updates when a change is detected, with polling as a fallback.</li>
<li>A <strong>Check now</strong> button on every monitor, plus actions to create new monitors.</li>
<li>A <code>pagecrawl_change</code> event you can trigger automations from.</li>
<li>A choice of what to import (everything, selected folders, or selected monitors), with support for multiple workspaces.</li>
</ul>
<h3>Installation (HACS Custom Repository)</h3>
<p>The integration installs through HACS as a custom repository.</p>
<ol>
<li>In Home Assistant, open <strong>HACS</strong>.</li>
<li>Open the menu (top right) and choose <strong>Custom repositories</strong>.</li>
<li>Add the repository URL <code>https://github.com/pagecrawl/hass-pagecrawl</code> with category <strong>Integration</strong>, then add it.</li>
<li>Find <strong>PageCrawl</strong> in HACS, install it, and restart Home Assistant.</li>
</ol>
<p>If you are reading this on the device running Home Assistant, the button below opens the repository in HACS directly:</p>
<p><a href="https://my.home-assistant.io/redirect/hacs_repository/?owner=pagecrawl&amp;repository=hass-pagecrawl&amp;category=integration"><img src="https://my.home-assistant.io/badges/hacs_repository.svg" alt="Open your Home Assistant instance and open a repository inside the Home Assistant Community Store." /></a></p>
<h3>Connecting Your Account</h3>
<ol>
<li>
<p>Go to <strong>Settings &gt; Devices &amp; Services &gt; Add Integration</strong> and search for <strong>PageCrawl</strong>.</p>
<p><a href="https://my.home-assistant.io/redirect/config_flow_start/?domain=pagecrawl"><img src="https://my.home-assistant.io/badges/config_flow_start.svg" alt="Open your Home Assistant instance and start setting up a new integration." /></a></p>
</li>
<li>
<p>You are redirected to PageCrawl to sign in and authorize Home Assistant. There is no token to paste, and a free account is enough.</p>
</li>
<li>
<p>If your account has more than one workspace, pick the one to add. Each workspace becomes its own entry with its own devices and entities. To add another workspace later, run <strong>Add Integration</strong> again and choose a different one.</p>
</li>
</ol>
<h3>Choosing What to Import</h3>
<p>During setup you pick how much of the workspace to bring into Home Assistant:</p>
<ul>
<li><strong>All monitors (default):</strong> every monitor in the workspace becomes a device.</li>
<li><strong>Selected folders:</strong> only monitors in the folders you choose are imported.</li>
<li><strong>Selected monitors:</strong> you hand-pick the exact monitors to import.</li>
</ul>
<p>You can change this later in the integration's <strong>Configure</strong> screen. If you narrow the selection, the devices and entities for the de-selected monitors are removed automatically. Widening it again imports the newly in-scope monitors on the next update.</p>
<h3>Real-Time Updates vs Polling</h3>
<p>The update mode is set in the integration's <strong>Configure</strong> screen.</p>
<ul>
<li><strong>Auto (default):</strong> uses push when Home Assistant has a reachable URL, otherwise falls back to polling. The integration tells you which mode is active.</li>
<li><strong>Push and poll:</strong> forces push, with a slow reconciliation poll to catch any missed deliveries. Needs a reachable URL.</li>
<li><strong>Polling only:</strong> never registers a webhook and checks on the interval you set. Use this for local-only installs that cannot expose an endpoint.</li>
</ul>
<p>Push needs a URL that PageCrawl can reach from the internet. A Home Assistant Cloud (Nabu Casa) cloudhook is the recommended way to get one, and it is configured automatically. If no reachable URL is available, the integration falls back to polling. The poll interval has a 60-second minimum to respect rate limits.</p>
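<p>The three modes reduce to a small decision rule. The Python sketch below is illustrative only and is not the integration's source. The mode strings, the <code>UpdatePlan</code> type, and the choice to raise an error when <strong>Push and poll</strong> has no reachable URL are all assumptions. The sketch encodes only the behavior described above: Auto pushes when a reachable URL exists, Polling only never registers a webhook, and every poll interval is raised to the 60-second minimum.</p>

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical identifiers for the three options in the Configure screen.
AUTO = "auto"
PUSH_AND_POLL = "push_and_poll"
POLLING_ONLY = "polling_only"

MIN_POLL_SECONDS = 60  # documented minimum poll interval


@dataclass
class UpdatePlan:
    push: bool          # register a webhook for real-time updates
    poll_seconds: int   # interval for the fallback or reconciliation poll


def plan_updates(mode: str, reachable_url: Optional[str], poll_seconds: int) -> UpdatePlan:
    """Decide between push and polling following the mode descriptions above."""
    interval = max(poll_seconds, MIN_POLL_SECONDS)
    if mode == POLLING_ONLY:
        return UpdatePlan(push=False, poll_seconds=interval)
    if mode == PUSH_AND_POLL:
        if not reachable_url:
            # Assumption: this sketch treats a missing URL as a configuration error.
            raise ValueError("Push and poll needs a URL reachable from the internet")
        return UpdatePlan(push=True, poll_seconds=interval)
    if mode == AUTO:
        # Push when a reachable URL exists, otherwise fall back to polling.
        # Assumption: the sketch keeps a poll running in every mode.
        return UpdatePlan(push=bool(reachable_url), poll_seconds=interval)
    raise ValueError(f"unknown update mode: {mode}")
```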
<h3>How Monitors Map to Entities</h3>
<p>Each monitor becomes a device, and each tracked element becomes one entity chosen by its type:</p>
<table>
<thead>
<tr>
<th>Element type</th>
<th>Entity</th>
<th>State</th>
</tr>
</thead>
<tbody>
<tr>
<td>Price</td>
<td>sensor (monetary)</td>
<td>numeric value</td>
</tr>
<tr>
<td>Number</td>
<td>sensor (measurement)</td>
<td>numeric value</td>
</tr>
<tr>
<td>Rating</td>
<td>sensor (measurement)</td>
<td>numeric value</td>
</tr>
<tr>
<td>Reviews</td>
<td>sensor (measurement)</td>
<td>numeric value</td>
</tr>
<tr>
<td>HTTP status</td>
<td>sensor</td>
<td>numeric status code</td>
</tr>
<tr>
<td>Boolean</td>
<td>binary sensor</td>
<td>on when truthy</td>
</tr>
<tr>
<td>Availability</td>
<td>binary sensor</td>
<td>on when in stock</td>
</tr>
<tr>
<td>Text, Full Page, HTML, AI Extract, and similar</td>
<td>sensor</td>
<td>text value (full value in an attribute when truncated)</td>
</tr>
<tr>
<td>Links, Feed, Leaderboard, and other lists</td>
<td>sensor</td>
<td>item count (items in an attribute)</td>
</tr>
</tbody>
</table>
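<p>If you build a similar integration yourself, the table reduces to a lookup from element type to a Home Assistant platform and state. This Python sketch is a hypothetical illustration, not the integration's code. The lowercase type keys and the <code>full_value</code> attribute name are assumptions. It does rely on one real constraint: Home Assistant limits entity state strings to 255 characters, which is one reason long text values need truncating, with the full value kept in an attribute.</p>

```python
from typing import Any, Tuple

# Hypothetical element-type keys; the platform and state for each follow the table above.
FLOAT_TYPES = {"price", "number", "rating", "reviews"}
BINARY_TYPES = {"boolean", "availability"}
LIST_TYPES = {"links", "feed", "leaderboard"}
MAX_STATE_LEN = 255  # Home Assistant limits entity state strings to 255 characters


def to_entity(element_type: str, value: Any) -> Tuple[str, Any, dict]:
    """Map one tracked element to (platform, state, extra_attributes)."""
    # Assumption: numeric values arrive as plain numbers or numeric strings.
    if element_type in FLOAT_TYPES:
        return "sensor", float(value), {}
    if element_type == "http_status":
        return "sensor", int(value), {}
    if element_type in BINARY_TYPES:
        # Assumption: the value already arrives as a truthy/falsy flag (True when in stock).
        return "binary_sensor", bool(value), {}
    if element_type in LIST_TYPES:
        items = list(value or [])
        return "sensor", len(items), {"items": items}
    # Text, Full Page, HTML, AI Extract, and unrecognized types become a text sensor.
    text = "" if value is None else str(value)
    if len(text) > MAX_STATE_LEN:
        return "sensor", text[:MAX_STATE_LEN], {"full_value": text}
    return "sensor", text, {}
```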
<p>Every monitor also gets diagnostic entities (status, last checked, last change date, change percent), so a device is never empty even if its element types are unrecognized. Common details such as the URL, status, change percent, and diff and screenshot links are exposed as attributes on the primary sensor.</p>
<p>Each monitor also gets a few per-monitor sensors that describe its latest change:</p>
<ul>
<li><strong>Last change:</strong> a short, human-readable summary of what changed at the last check (the full text is available in an attribute).</li>
<li><strong>AI summary:</strong> the AI summary of the latest change. It appears only when AI analysis is enabled on that monitor.</li>
<li><strong>AI priority:</strong> a diagnostic score for how important the latest change is. It appears only when AI analysis is enabled on that monitor.</li>
</ul>
<h3>Actions</h3>
<p>The integration provides two actions you can call from automations, scripts, or the Developer Tools:</p>
<ul>
<li><strong>Check now</strong> (<code>pagecrawl.check_now</code>): trigger an immediate check of one or more monitors, then refresh their entities. Target any entity or device that belongs to the monitor, or name the monitor directly by <code>slug</code> or <code>monitor_id</code>:</li>
</ul>
<pre><code class="language-yaml">service: pagecrawl.check_now
data:
  slug: openai-about</code></pre>
<ul>
<li><strong>Track a new page</strong> (<code>pagecrawl.track_page</code>): create a new monitor from a URL, name, and tracking mode (for example <code>price</code> or <code>ai_extract</code>). Its device and entities appear after the next refresh. If you have more than one workspace, specify the integration entry (there is one entry per workspace) to choose where the monitor is created.</li>
</ul>
<h3>Automations</h3>
<p>The integration fires a <code>pagecrawl_change</code> event whenever a monitor's latest change advances, so you can react to it in automations.</p>
<pre><code class="language-yaml">alias: Notify on PageCrawl change
trigger:
  - platform: event
    event_type: pagecrawl_change
action:
  - service: notify.notify
    data:
      title: "PageCrawl: {{ trigger.event.data.name }}"
      message: &gt;-
        {{ trigger.event.data.human_difference }}
        {{ trigger.event.data.diff_url }}</code></pre>
<p>Event data includes the monitor name, URL, slug, status, the change contents and difference, a human-readable summary, a diff link, and a timestamp. When AI analysis is enabled on the monitor, the event also carries the AI summary and an AI priority score, so you can filter and route changes straight from the event without looking up a per-monitor sensor.</p>
<h3>Editing and Removing Monitors</h3>
<p>The integration can create monitors and read and check them, but editing and deleting monitors is done in the PageCrawl web app. Changes you make there are reflected in Home Assistant on the next update.</p>
<h3>Related</h3>
<ul>
<li><a href="/help/integrations/article/webhook-integration">Webhook Integration</a> explains the underlying change payloads, which power the real-time push updates.</li>
<li>See the <a href="/developers">developer documentation</a> for the full API and webhook reference.</li>
</ul>]]>
            </summary>
                                    <updated>2026-06-18T04:36:57+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[Advanced Integrations: Build a Custom Integration in Python, Node.js, or PHP]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/tutorials/article/reference-implementations" />
            <id>https://pagecrawl.io/111</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>Advanced Integrations: Build a Custom Integration in Python, Node.js, or PHP</h1>
<p>This guide shows three ways to connect your own application to PageCrawl and provides working code for each in Python, Node.js, and PHP. These are the same patterns the official Home Assistant integration uses, distilled into minimal examples you can adapt.</p>
<div class="kb-figure">
  <img src="/images/knowledge/developers-api-reference.png" alt="Interactive API reference at pagecrawl.io/developers with endpoints and code examples to build a custom integration">
</div>
<p>Pick the pattern that fits your needs:</p>
<ul>
<li><strong>Polling</strong> is the simplest. You read the API on a timer. Best for dashboards and reports that do not need instant updates.</li>
<li><strong>Webhooks (push)</strong> deliver changes to your server the moment they happen. Best for real-time automation and alerting.</li>
<li><strong>Hybrid</strong> combines a webhook for instant updates with a slow reconcile poll that catches anything missed. This is the most robust option and what the Home Assistant integration runs.</li>
</ul>
<h3>Authentication</h3>
<p>All API requests use a bearer token in the <code>Authorization</code> header:</p>
<pre><code>Authorization: Bearer YOUR_TOKEN</code></pre>
<p>You can use an API token (Settings &gt; API) or an OAuth access token. Free accounts can use the API. Treat the token like a password and keep it server-side.</p>
<h3>Rate Limits</h3>
<ul>
<li>Free accounts: 60 requests per minute.</li>
<li>Paid accounts: 300 requests per minute.</li>
</ul>
<p>When you exceed the limit the API responds with HTTP <code>429</code>. Honor the <code>Retry-After</code> response header (seconds to wait) before retrying. Choose a poll interval that stays well under your limit, especially if you paginate across many monitors.</p>
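<p>To pick an interval, count the requests one poll cycle makes (one per page of paginated results). Then size the interval so that polling uses only a fraction of the limit. Below is a minimal Python sketch. The page size and the 20% share are assumptions to adjust, not documented values.</p>

```python
# Requests per minute, from the Rate Limits list above.
RATE_LIMITS = {"free": 60, "paid": 300}


def requests_per_cycle(monitor_count: int, page_size: int) -> int:
    """One GET per page of results; page_size is an assumption, so check your responses."""
    return max(1, -(-monitor_count // page_size))  # ceiling division


def min_poll_interval(cycle_requests: int, plan: str = "free", max_share_percent: int = 20) -> int:
    """Shortest interval in seconds that keeps polling within max_share_percent of the limit.

    interval = 60 * cycle_requests / (limit * share), rounded up with integer math.
    """
    numerator = 60 * 100 * cycle_requests
    denominator = RATE_LIMITS[plan] * max_share_percent
    return -(-numerator // denominator)  # ceiling division
```

<p>For example, 120 monitors at an assumed 50 per page is 3 requests per cycle, so <code>min_poll_interval(3, "free")</code> returns 15 seconds. The 300-second interval in the polling examples below uses far less of the limit than that.</p>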
<h3>Polling</h3>
<p>Poll <code>GET /api/pages?simple=1</code> on an interval. Each page object includes a <code>latest</code> snapshot and a <code>checks</code> array. Read <code>latest.contents</code> for the primary tracked element. Read per-element values from <code>checks[0].elements</code>, keyed by <code>element_id</code>, so that each value maps to a stable tracked element in your own system. If the results span more than one page, follow the <code>links.next</code> URL until it is empty, as the examples below do.</p>
<p><strong>Python</strong></p>
<pre><code class="language-python">import time
import requests

BASE = "https://pagecrawl.io"
TOKEN = "YOUR_TOKEN"
SESSION = requests.Session()
SESSION.headers["Authorization"] = f"Bearer {TOKEN}"


def fetch_pages():
    """Fetch all monitors, following pagination and honoring 429."""
    pages, url = [], f"{BASE}/api/pages?simple=1"
    while url:
        resp = SESSION.get(url, timeout=30)
        if resp.status_code == 429:
            wait = int(resp.headers.get("Retry-After", "5"))
            time.sleep(wait)
            continue
        resp.raise_for_status()
        body = resp.json()
        pages.extend(body.get("data", []))
        url = body.get("links", {}).get("next")
    return pages


def poll_once():
    for page in fetch_pages():
        latest = page.get("latest") or {}
        print(page["id"], page.get("title"), "-&gt;", latest.get("contents"))

        checks = page.get("checks") or []
        elements = checks[0].get("elements", []) if checks else []
        for el in elements:
            # element_id is stable across every check; use it as your key.
            print("  ", el.get("element_id"), el.get("label"), el.get("contents"))


if __name__ == "__main__":
    while True:
        poll_once()
        time.sleep(300)  # stay well under the rate limit</code></pre>
<p><strong>Node.js</strong></p>
<pre><code class="language-js">const BASE = "https://pagecrawl.io";
const TOKEN = "YOUR_TOKEN";
const HEADERS = { Authorization: `Bearer ${TOKEN}` };

const sleep = (ms) =&gt; new Promise((r) =&gt; setTimeout(r, ms));

async function fetchPages() {
  const pages = [];
  let url = `${BASE}/api/pages?simple=1`;
  while (url) {
    const resp = await fetch(url, { headers: HEADERS });
    if (resp.status === 429) {
      const wait = parseInt(resp.headers.get("Retry-After") || "5", 10);
      await sleep(wait * 1000);
      continue;
    }
    if (!resp.ok) throw new Error(`HTTP ${resp.status}`);
    const body = await resp.json();
    pages.push(...(body.data || []));
    url = body.links?.next || null;
  }
  return pages;
}

async function pollOnce() {
  for (const page of await fetchPages()) {
    const latest = page.latest || {};
    console.log(page.id, page.title, "-&gt;", latest.contents);

    const elements = page.checks?.[0]?.elements || [];
    for (const el of elements) {
      // element_id is stable across every check; use it as your key.
      console.log("  ", el.element_id, el.label, el.contents);
    }
  }
}

async function main() {
  while (true) {
    await pollOnce();
    await sleep(300_000); // stay well under the rate limit
  }
}

main();</code></pre>
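<p>Both examples print the current values on every pass. To act only on changes, keep the previous pass's values keyed by <code>element_id</code> and compare them. Below is a minimal Python sketch. It assumes the response shape described above (<code>checks[0].elements</code> with <code>element_id</code> and <code>contents</code>). The helper names are hypothetical.</p>

```python
from typing import Any, Dict, List, Tuple

_MISSING = object()  # sentinel: element not present in the previous snapshot


def snapshot(pages: List[dict]) -> Dict[Any, Any]:
    """Flatten fetch_pages() output into {element_id: contents} from each page's checks[0]."""
    snap: Dict[Any, Any] = {}
    for page in pages:
        checks = page.get("checks") or []
        for el in (checks[0].get("elements", []) if checks else []):
            snap[el["element_id"]] = el.get("contents")
    return snap


def diff_snapshots(old: Dict[Any, Any], new: Dict[Any, Any]) -> List[Tuple[Any, Any, Any]]:
    """Return (element_id, old_value, new_value) for each new or changed element."""
    changed = []
    for element_id, value in new.items():
        previous = old.get(element_id, _MISSING)
        if previous is _MISSING or previous != value:
            changed.append((element_id, None if previous is _MISSING else previous, value))
    return changed
```

<p>In the polling loop:</p>
<ol>
<li>Call <code>snapshot(fetch_pages())</code>.</li>
<li>Pass the result and the previous snapshot to <code>diff_snapshots</code>.</li>
<li>Keep the new snapshot for the next pass.</li>
</ol>
<p>The first pass reports every element as changed, because there is nothing to compare against yet. Elements that disappear between passes are not reported.</p>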
<h3>Webhooks (Push)</h3>
<p>Create a hook so PageCrawl POSTs to your server the instant a change is detected, then verify every delivery.</p>
<p><strong>1. Create the hook</strong></p>
<pre><code>POST /api/hooks
Authorization: Bearer YOUR_TOKEN
Content-Type: application/json

{
  "target_url": "https://your-server.example.com/pagecrawl",
  "match_type": "all",
  "event_type": "change_detected"
}</code></pre>
<p>The response includes a <code>signing_secret</code>. Store it securely. You will use it to verify deliveries. (You can also create hooks in the UI under Settings &gt; API &gt; Webhooks.)</p>
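<p>Here is the same request in Python. It uses only the standard library, so it runs without extra packages. This is a sketch: <code>build_create_hook_request</code> and <code>create_hook</code> are hypothetical helper names. Reading <code>signing_secret</code> from the top level of the response is an assumption to check against a real response.</p>

```python
import json
import urllib.request

BASE = "https://pagecrawl.io"
TOKEN = "YOUR_TOKEN"


def build_create_hook_request(target_url: str) -> urllib.request.Request:
    """Build the POST /api/hooks request shown above without sending it."""
    body = {
        "target_url": target_url,
        "match_type": "all",
        "event_type": "change_detected",
    }
    return urllib.request.Request(
        f"{BASE}/api/hooks",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_hook(target_url: str) -> dict:
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(build_create_hook_request(target_url), timeout=30) as resp:
        return json.load(resp)
```

<p>For example, call <code>secret = create_hook("https://your-server.example.com/pagecrawl").get("signing_secret")</code>, then store <code>secret</code> wherever you keep your other credentials.</p>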
<p><strong>2. Verify each delivery</strong></p>
<p>Every webhook includes two headers:</p>
<ul>
<li><code>X-PageCrawl-Signature: sha256=&lt;hmac&gt;</code></li>
<li><code>X-PageCrawl-Timestamp: &lt;unix&gt;</code></li>
</ul>
<p>The HMAC is <code>HMAC_SHA256(signing_secret, "{timestamp}.{body}")</code> where <code>{body}</code> is the exact raw request body. Compute the same value, compare it in constant time, and reject deliveries whose timestamp is too old (to prevent replay). Always verify against the raw bytes, not a re-serialized object.</p>
<p><strong>Python</strong></p>
<pre><code class="language-python">import hashlib
import hmac
import time

MAX_AGE = 300  # seconds


def verify_signature(secret: str, timestamp: str, raw_body: bytes, header: str) -&gt; bool:
    if not secret or not timestamp or not header:
        return False
    try:
        ts = int(timestamp)
    except (TypeError, ValueError):
        return False
    if abs(time.time() - ts) &gt; MAX_AGE:
        return False  # stale, possible replay

    expected = hmac.new(
        secret.encode("utf-8"),
        f"{timestamp}.".encode("utf-8") + raw_body,
        hashlib.sha256,
    ).hexdigest()

    provided = header[len("sha256="):] if header.startswith("sha256=") else header
    return hmac.compare_digest(expected, provided)</code></pre>
<p>A minimal Flask receiver:</p>
<pre><code class="language-python">from flask import Flask, request, abort

app = Flask(__name__)
SIGNING_SECRET = "YOUR_SIGNING_SECRET"


@app.post("/pagecrawl")
def receive():
    sig = request.headers.get("X-PageCrawl-Signature")
    ts = request.headers.get("X-PageCrawl-Timestamp")
    if not verify_signature(SIGNING_SECRET, ts, request.get_data(), sig):
        abort(401)
    payload = request.get_json()
    print("change on", payload.get("id"), payload.get("short_summary"))
    return "", 204</code></pre>
<p><strong>Node.js</strong></p>
<pre><code class="language-js">const crypto = require("crypto");
const express = require("express");

const SIGNING_SECRET = "YOUR_SIGNING_SECRET";
const MAX_AGE = 300; // seconds

function verifySignature(secret, timestamp, rawBody, header) {
  if (!secret || !timestamp || !header) return false;
  const ts = parseInt(timestamp, 10);
  if (Number.isNaN(ts)) return false;
  if (Math.abs(Date.now() / 1000 - ts) &gt; MAX_AGE) return false; // stale

  const expected = crypto
    .createHmac("sha256", secret)
    .update(`${timestamp}.${rawBody}`)
    .digest("hex");

  const provided = header.startsWith("sha256=") ? header.slice(7) : header;
  const a = Buffer.from(expected);
  const b = Buffer.from(provided);
  return a.length === b.length &amp;&amp; crypto.timingSafeEqual(a, b);
}

const app = express();
// Capture the raw body exactly as received so the HMAC matches.
app.use(express.raw({ type: "*/*" }));

app.post("/pagecrawl", (req, res) =&gt; {
  const sig = req.get("X-PageCrawl-Signature");
  const ts = req.get("X-PageCrawl-Timestamp");
  const raw = req.body.toString("utf8");
  if (!verifySignature(SIGNING_SECRET, ts, raw, sig)) {
    return res.sendStatus(401);
  }
  const payload = JSON.parse(raw);
  console.log("change on", payload.id, payload.short_summary);
  res.sendStatus(204);
});

app.listen(8080);</code></pre>
<p><strong>PHP</strong></p>
<pre><code class="language-php">&lt;?php

function verify_signature(string $secret, ?string $timestamp, string $rawBody, ?string $header): bool
{
    $maxAge = 300; // seconds
    if ($secret === '' || $timestamp === null || $header === null) {
        return false;
    }
    if (! ctype_digit($timestamp)) {
        return false;
    }
    if (abs(time() - (int) $timestamp) &gt; $maxAge) {
        return false; // stale, possible replay
    }

    $expected = hash_hmac('sha256', "{$timestamp}.{$rawBody}", $secret);
    $provided = str_starts_with($header, 'sha256=') ? substr($header, 7) : $header;

    return hash_equals($expected, $provided);
}

$signingSecret = 'YOUR_SIGNING_SECRET';
$rawBody = file_get_contents('php://input');
$sig = $_SERVER['HTTP_X_PAGECRAWL_SIGNATURE'] ?? null;
$ts = $_SERVER['HTTP_X_PAGECRAWL_TIMESTAMP'] ?? null;

if (! verify_signature($signingSecret, $ts, $rawBody, $sig)) {
    http_response_code(401);
    exit;
}

$payload = json_decode($rawBody, true);
error_log('change on '.$payload['id'].' '.($payload['short_summary'] ?? ''));
http_response_code(204);</code></pre>
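<p>To test a receiver before connecting a real webhook, you can sign a sample body yourself. The Python sketch below is a hypothetical helper and is not part of PageCrawl. It reproduces the scheme the receivers above verify: HMAC-SHA256 over <code>{timestamp}.{raw body}</code>, hex-encoded, sent in <code>X-PageCrawl-Signature</code> together with <code>X-PageCrawl-Timestamp</code>. The function names and sample payload are illustrative assumptions. The helper always adds the <code>sha256=</code> prefix, and the receivers above accept the signature with or without it.</p>
<pre><code class="language-python">import hashlib
import hmac
import json
import time
import urllib.request

# Hypothetical test helper, not part of PageCrawl: signs a sample body the same
# way the receivers above verify it, so you can exercise them locally.


def sign_body(secret, raw_body, timestamp=None):
    """Return the headers for raw_body (a str that is sent byte-for-byte)."""
    if timestamp is None:
        timestamp = str(int(time.time()))  # fresh, so the 300 s age check passes
    digest = hmac.new(
        secret.encode("utf-8"),
        f"{timestamp}.{raw_body}".encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()
    return {
        "Content-Type": "application/json",
        "X-PageCrawl-Signature": "sha256=" + digest,  # receivers strip the prefix
        "X-PageCrawl-Timestamp": timestamp,
    }


def build_test_request(url, secret, payload):
    """Build (but do not send) a signed POST to your local receiver."""
    raw_body = json.dumps(payload)  # sign exactly the string you send
    headers = sign_body(secret, raw_body)
    return urllib.request.Request(
        url, data=raw_body.encode("utf-8"), headers=headers, method="POST"
    )</code></pre>
<p>For example, <code>urllib.request.urlopen(build_test_request("http://localhost:8080/pagecrawl", "YOUR_SIGNING_SECRET", {"id": 1, "short_summary": "test"}))</code> should receive a 204 from the Node.js receiver when the secret matches the receiver's. With a wrong secret the receiver answers 401, which <code>urlopen</code> raises as an <code>HTTPError</code>.</p>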
<h3>Hybrid (Push Plus Reconcile)</h3>
<p>The most robust integration combines a webhook for instant updates with a slow background poll that reconciles state. The webhook keeps you current in real time. The reconcile poll catches anything a webhook might have missed (for example, if your server was briefly offline) and refreshes the stored state of monitors that have not changed and therefore have not sent a webhook. This is the model the Home Assistant integration uses: each push updates the in-memory snapshot, and a slow loop re-fetches the full list on a long interval.</p>
<p><strong>Python (sketch)</strong></p>
<pre><code class="language-python">import threading
import time

state = {}  # element_id -&gt; latest value, shared between push and poll
lock = threading.Lock()


def on_webhook(payload):
    """Called from your verified webhook receiver. Instant update."""
    with lock:
        for el in payload.get("page_elements", []):
            state[el["element_id"]] = el.get("contents")


def reconcile_loop():
    """Slow safety net. Re-reads everything on a long interval."""
    while True:
        try:
            for page in fetch_pages():  # from the polling example above
                checks = page.get("checks") or []
                for el in (checks[0].get("elements", []) if checks else []):
                    with lock:
                        state[el["element_id"]] = el.get("contents")
        except Exception as exc:  # a failed poll must not kill the safety net
            print("reconcile failed, retrying next cycle:", exc)
        time.sleep(3600)  # reconcile hourly; the webhook handles real time


threading.Thread(target=reconcile_loop, daemon=True).start()</code></pre>
<p><strong>Node.js (sketch)</strong></p>
<pre><code class="language-js">const state = new Map(); // element_id -&gt; latest value

function onWebhook(payload) {
  // Called from your verified webhook receiver. Instant update.
  for (const el of payload.page_elements || []) {
    state.set(el.element_id, el.contents);
  }
}

async function reconcileLoop() {
  // Slow safety net. Re-reads everything on a long interval.
  while (true) {
    try {
      for (const page of await fetchPages()) {
        // fetchPages from the polling example
        for (const el of page.checks?.[0]?.elements || []) {
          state.set(el.element_id, el.contents);
        }
      }
    } catch (err) {
      // A failed poll must not end the loop; try again next cycle.
      console.error("reconcile failed:", err);
    }
    await new Promise((r) =&gt; setTimeout(r, 3_600_000)); // reconcile hourly
  }
}

reconcileLoop();</code></pre>
<p>Keep the reconcile interval long (hourly or slower) so the webhook does the real-time work and the poll stays comfortably within your rate limit.</p>
<h3>Related Articles</h3>
<ul>
<li><a href="/help/integrations/article/webhook-integration">Webhook Integration</a> - Full webhook payload reference, including the <code>page_elements</code> array and <code>element_id</code></li>
<li><a href="/developers">Full API Reference</a> - Interactive OpenAPI reference for every endpoint</li>
</ul>]]>
            </summary>
                                    <updated>2026-06-15T22:16:51+00:00</updated>
        </entry>
            <entry>
            <title><![CDATA[PageCrawl.io + Make.com integration]]></title>
            <link rel="alternate" href="https://pagecrawl.io/help/integrations/article/pagecrawl-make-integration" />
            <id>https://pagecrawl.io/112</id>
            <author>
                <name><![CDATA[PageCrawl.io]]></name>
            </author>
            <summary type="html">
                <![CDATA[<h1>PageCrawl.io + Make.com integration</h1>
<div class="kb-figure">
  <img src="/images/knowledge/integrations-overview.png" alt="PageCrawl Integrations settings showing available connections">
</div>
<p>The native PageCrawl app for Make.com lets you start a scenario the moment PageCrawl detects a change, and create new monitors from inside any scenario. You connect once with a single click (OAuth, no API token to paste), then build scenarios that route changes into thousands of apps.</p>
<h3>Native app vs custom webhook</h3>
<p>There are two ways to use PageCrawl with Make.com:</p>
<ul>
<li><strong>Native app (recommended):</strong> Connect with OAuth and use the <strong>Watch Changes</strong> instant trigger. PageCrawl registers the webhook for you, so there is no URL to copy or data structure to redetermine. You can also use the <strong>Create Monitor</strong> action.</li>
<li><strong>Custom webhook:</strong> Add a PageCrawl webhook notification by hand and receive it with Make.com's Custom Webhook module. This still works and is covered in the <a href="/blog/make-com-website-monitoring-automation">Make.com automation guide</a>, but the native app is simpler to set up and maintain.</li>
</ul>
<h3>What the app provides</h3>
<ul>
<li><strong>Watch Changes</strong> (instant trigger) - fires the moment a change is detected, for a single monitor or every monitor in the workspace.</li>
<li><strong>Create Monitor</strong> (action) - create a new monitor for a URL, with the track type, frequency, folder, labels, and notifications you choose.</li>
<li><strong>List Monitors</strong> (search) - look up monitors to reference in later modules.</li>
</ul>
<h3>Setting up the integration</h3>
<h4>Step 1: Sign in to PageCrawl.io</h4>
<p>If you are not already a PageCrawl.io user, sign up for an account. The free tier is enough to build and test scenarios.</p>
<h4>Step 2: Configure a page to monitor</h4>
<p>Set up the page you want to track, choosing the elements to watch and your notification preferences.</p>
<h4>Step 3: Add the PageCrawl app in Make.com</h4>
<ol>
<li>In Make.com, create a new scenario and add a module.</li>
<li>Search for <strong>PageCrawl</strong> and add the <strong>Watch Changes</strong> trigger.</li>
<li>Click <strong>Create a connection</strong>. You are redirected to PageCrawl.io to authorize Make.com. There is no token to paste.</li>
<li>Choose the monitor to watch, or leave it empty to watch every monitor in the workspace.</li>
</ol>
<h4>Step 4: Build the scenario</h4>
<p>Add the modules that should run when a change is detected: post to Slack, append a row to Google Sheets, open a Jira issue, or any of the thousands of apps Make.com supports. Reference PageCrawl fields like <strong>AI summary</strong>, <strong>Short summary</strong>, <strong>Page URL</strong>, and <strong>Screenshot URL</strong> directly in those modules.</p>
<h4>Step 5: Activate</h4>
<p>Save the scenario and turn it on. PageCrawl pushes each detected change to Make.com in real time.</p>
<h3>The trigger payload</h3>
<p>The <strong>Watch Changes</strong> trigger delivers the same payload as a PageCrawl webhook. The fields you will use most often:</p>
<ul>
<li><strong><code>title</code></strong> - the monitor name.</li>
<li><strong><code>short_summary</code></strong> and <strong><code>ai_summary</code></strong> - human-readable descriptions of what changed. <code>ai_summary</code> requires AI to be enabled for the workspace.</li>
<li><strong><code>ai_priority_score</code></strong> - importance score you can filter on.</li>
<li><strong><code>page.url</code></strong> and <strong><code>page.link</code></strong> - the monitored URL and the PageCrawl page link.</li>
<li><strong><code>page_screenshot_image</code></strong> - a link to the screenshot for visual verification.</li>
<li><strong><code>markdown_difference</code></strong> - the change as a readable diff.</li>
<li><strong><code>changed_at</code></strong> - ISO 8601 timestamp.</li>
</ul>
<p>For the full field reference, see <a href="/help/features/article/api-webhooks-for-custom-integrations">API &amp; Webhooks</a>.</p>
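<p>If you handle the same payload in your own code (for example, with the custom-webhook approach), the fields above can be turned into a short message roughly as in this Python sketch. The helper name and message layout are illustrative. The sketch assumes that <code>page.url</code> and <code>page.link</code> are nested under a <code>page</code> object and that <code>changed_at</code> may end in <code>Z</code>, so check the full field reference before relying on it. Because <code>ai_summary</code> is only present when AI is enabled, the sketch falls back to <code>short_summary</code>.</p>
<pre><code class="language-python">from datetime import datetime, timezone


def format_change(payload):
    """Build a short chat message from a Watch Changes / webhook payload.

    Assumes page.url and page.link are nested under a "page" object and that
    changed_at is an ISO 8601 string; adjust if your payload differs.
    """
    page = payload.get("page") or {}
    # ai_summary is only present when AI is enabled; fall back to short_summary.
    summary = payload.get("ai_summary") or payload.get("short_summary") or "Change detected"

    when = ""
    changed_at = payload.get("changed_at")
    if changed_at:
        # fromisoformat only accepts a trailing "Z" from Python 3.11 on.
        dt = datetime.fromisoformat(changed_at.replace("Z", "+00:00"))
        if dt.tzinfo is not None:
            dt = dt.astimezone(timezone.utc)  # naive values are treated as UTC
        when = dt.strftime("%Y-%m-%d %H:%M") + " UTC"

    lines = [f"{payload.get('title', 'Monitor')}: {summary}"]
    if when:
        lines.append(f"Changed: {when}")
    links = (
        ("URL", page.get("url")),
        ("PageCrawl", page.get("link")),
        ("Screenshot", payload.get("page_screenshot_image")),
    )
    for label, value in links:
        if value:  # skip fields the payload did not include
            lines.append(f"{label}: {value}")
    return "\n".join(lines)</code></pre>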
<h3>Related Articles</h3>
<ul>
<li><a href="/blog/make-com-website-monitoring-automation">Make.com automation guide</a> - Scenarios, patterns, and the custom-webhook approach</li>
<li><a href="/help/integrations/article/pagecrawl-zapier-integration">Zapier Integration</a> - Automate with Zaps</li>
<li><a href="/help/integrations/article/pagecrawl-n8n-integration">n8n Integration</a> - Open-source workflow automation</li>
<li><a href="/help/features/article/api-webhooks-for-custom-integrations">API &amp; Webhooks</a> - Programmatic access and full payload reference</li>
</ul>]]>
            </summary>
                                    <updated>2026-06-21T12:06:10+00:00</updated>
        </entry>
    </feed>
