Complete Guide to Reducing False Positive Notifications When Monitoring Websites for Changes
False positive notifications can be frustrating when monitoring websites. These alerts signal changes that are either irrelevant or nonexistent, leading to wasted time and reduced efficiency.
When using PageCrawl.io to monitor website changes, the rate of false-positive alerts is typically low if pages are correctly configured. However, some detected changes may not be relevant to your specific monitoring needs. This comprehensive guide will show you how to effectively reduce unnecessary alerts and ensure you only receive notifications for meaningful changes.
1. Choose the Right Element to Track
Selecting the wrong type of element to monitor is one of the most common causes of false positives. With multiple monitoring options available, it's easy to get overwhelmed, especially if you're new to website monitoring.
Getting Started
Begin by tracking the text of the full page. This approach works best as a starting point for most monitoring scenarios, particularly when you need to monitor a large number of websites. If you notice frequent false positives, you can always revisit your setup and focus on specific page sections instead.
Optimizing Full-Page Text Tracking
Monitoring Content Only is the first step to reduce false positives. This option filters out common page elements like headers, navigation menus, sidebars, and footers, focusing only on the main content area of the page. It's an effective way to eliminate noise from less relevant sections while still capturing most important content changes.
Reader mode takes content filtering a step further, similar to the reader mode you may have used on your phone. This mode monitors only the primary article text, using advanced algorithms to identify and extract the core content while filtering out everything else.
Reader mode is more restrictive than "Content Only" and works best for:
- News articles and blog posts with clear article structure
- Documentation pages with structured content
- Research papers and academic content
- Press releases and announcements
- Tutorial and how-to articles
- Terms of service and privacy policy pages
- Legal documents and policy updates
However, Reader mode may not work well on:
- Landing pages with mixed content types
- E-commerce product pages with specifications, reviews, and pricing
- Dashboard pages with multiple data sections
- Pages with pricing tables, feature lists, or comparison charts
- Forum discussions or comment sections
- Complex layouts with multiple content blocks
Note: If you find that important content changes are being missed, consider switching back to "Content Only" for broader coverage.
When to Be More Selective
If tracking "Content Only" or "Reader mode" still produces unnecessary notifications, switch to the "Text" tracked element type and use the "Visual Selector" (click on the blue button) to pinpoint the exact area you want to monitor. Be aware that a significant page redesign can cause these selectors to stop working.
Advanced Tips:
- AI Suggest feature: You may use "AI Suggest" when adding a new page to monitor. Describe what you want to monitor (e.g., "product price" or "availability status"), and PageCrawl.io's AI will suggest an optimal monitoring configuration for you.
- Manual selectors: For maximum precision, manually create CSS or XPath selectors to track specific sections of the page. This approach works best for users with a technical background, but you can also use tools like ChatGPT to craft selectors by pasting the relevant HTML code.
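As a rough illustration of what a manual selector does, the Python sketch below applies an XPath expression to a small, made-up HTML fragment. The markup, the class names, and the helper name select_text are assumptions for illustration only. Python's standard library supports just a subset of XPath and requires well-formed markup, so this is not how PageCrawl.io evaluates selectors.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment standing in for part of a product page.
SNIPPET = """
<div id="product">
  <h1>Widget</h1>
  <span class="price">$9.99</span>
  <span class="stock">In stock</span>
</div>
"""

def select_text(markup: str, xpath: str) -> str:
    """Return the text of the first element matching the XPath, or '' if none."""
    root = ET.fromstring(markup)
    node = root.find(xpath)
    return (node.text or "").strip() if node is not None else ""

# XPath: any <span> with class="price" below the root element.
# A roughly equivalent CSS selector would be: #product span.price
price = select_text(SNIPPET, ".//span[@class='price']")
```

Tracking only the selected node means changes to the heading or stock label in this fragment would not affect the tracked value.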
2. Filter Out Irrelevant Updates
Websites frequently undergo minor updates, such as date changes, without substantial alterations to their content. These small updates can create unnecessary alerts that distract from meaningful changes. Here's how to avoid them.
Ignore Repeatedly Changing Text
In Timeline, when reviewing detected changes, you can select irrelevant text and ignore any line that contains the selected text. For example, if a page has a section with a latest news headline like "Latest News: Bitcoin has reached a new all-time high," you can select "Latest News" and all lines containing this text will be ignored in future change detections. If you monitor multiple pages on the same website, this will be applied to all pages with the same domain name.
Alternatively, you can add an "Ignore Text" condition or create a global filter (update your team settings) to ignore it across all pages. Use % as a wildcard to indicate that any line containing a %specific word% or sentence should be ignored.
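To make the wildcard idea concrete, here is a minimal Python sketch of a line filter. It assumes the pattern is matched against the whole line and that % stands for any run of characters (similar to SQL LIKE); PageCrawl.io's exact matching rules may differ, and the function names are hypothetical.

```python
import re

def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a %-wildcard pattern into an anchored, case-insensitive regex.

    Assumed semantics: % matches any run of characters, everything else is literal.
    """
    literal_parts = [re.escape(part) for part in pattern.split("%")]
    return re.compile("^" + ".*".join(literal_parts) + "$", re.IGNORECASE)

def drop_ignored_lines(text: str, patterns: list[str]) -> str:
    """Remove every line that matches at least one ignore pattern."""
    regexes = [wildcard_to_regex(p) for p in patterns]
    kept = [
        line for line in text.splitlines()
        if not any(rx.match(line.strip()) for rx in regexes)
    ]
    return "\n".join(kept)

snapshot = "Latest News: Bitcoin has reached a new all-time high\nPrice: $9.99"
filtered = drop_ignored_lines(snapshot, ["%Latest News%"])
```

With the pattern "%Latest News%", the headline line is dropped before comparison, so only the price line can trigger a change.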
Remove Specific Page Elements
If a specific page area keeps triggering change detections, add a "Remove page element" action and select an area to suppress it completely.
Remove Dates
Use the "Remove dates" action to replace dates with placeholders like [DATE REMOVED]. This prevents alerts for irrelevant updates like "updated 3 minutes ago" or publication timestamps such as "Updated at: 2025-02-25" that change frequently even when nothing was updated on the page.
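The effect of date removal can be sketched in a few lines of Python. The two patterns below (ISO dates and "N minutes ago" style phrases) are illustrative assumptions and cover far fewer formats than a real implementation would need; the function name is hypothetical.

```python
import re

# Illustrative patterns only; real pages use many more date formats.
DATE_PATTERNS = [
    r"\b\d{4}-\d{2}-\d{2}\b",                         # ISO dates such as 2025-02-25
    r"\b\d+\s+(?:second|minute|hour|day)s?\s+ago\b",  # relative times such as "3 minutes ago"
]

def remove_dates(text: str, placeholder: str = "[DATE REMOVED]") -> str:
    """Replace every recognized date or relative time with a fixed placeholder."""
    for pattern in DATE_PATTERNS:
        text = re.sub(pattern, placeholder, text, flags=re.IGNORECASE)
    return text

# Two snapshots that differ only in their timestamps compare as equal.
same = remove_dates("Updated at: 2025-02-25") == remove_dates("Updated at: 2025-02-26")
```

Because both snapshots normalize to the same text, no change is recorded for the timestamp alone.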
Set a Change Threshold
You can configure a threshold to be alerted only when significant changes occur (e.g., when more than 1% of the page content changes). Before setting the threshold, review historic changes in Timeline to avoid setting it too high and missing important updates.
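One way to reason about a change threshold is to estimate what share of the words differs between two snapshots. The sketch below uses Python's difflib for this; how PageCrawl.io actually measures the percentage is not stated here, so treat the word-level metric and the function names as assumptions.

```python
from difflib import SequenceMatcher

def change_percent(old: str, new: str) -> float:
    """Approximate percentage of words that differ between two snapshots."""
    old_words, new_words = old.split(), new.split()
    # autojunk=False keeps the result stable for long texts with repeated words.
    matcher = SequenceMatcher(None, old_words, new_words, autojunk=False)
    return (1.0 - matcher.ratio()) * 100.0

def exceeds_threshold(old: str, new: str, threshold_percent: float) -> bool:
    """True when the change is larger than the configured threshold."""
    return change_percent(old, new) > threshold_percent

old = "one two three four five six seven eight nine ten"
new = "one two three four five six seven eight nine eleven"
pct = change_percent(old, new)  # one of ten words differs
```

Running this kind of calculation over past snapshots is one way to see which threshold would have suppressed noise without hiding real updates.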
Ignore Numbers
If numeric changes aren't relevant to you, you can add an action to ignore all numbers from triggering change detections. This is particularly useful for pages with counters, view counts, or other metrics that change frequently.
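Conceptually, ignoring numbers amounts to masking them before the comparison. The following Python sketch shows the idea with a simple pattern; the "#" mask and the function name are assumptions, and the pattern deliberately handles only plain digits with "." or "," separators.

```python
import re

# Digits, optionally grouped with "." or "," (e.g. 1,204 or 3.5).
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")

def mask_numbers(text: str) -> str:
    """Replace every number with a fixed mask so numeric changes compare as equal."""
    return NUMBER.sub("#", text)

before = mask_numbers("1,204 views")
after = mask_numbers("1,310 views")
```

Both snapshots reduce to the same masked text, so a changing view counter alone would not count as a change.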
3. Handling Dynamic Content
Dynamic websites load or update parts of their content after the initial page load. For example, prices, stock availability, or user-specific recommendations might load dynamically, leading to unnecessary notifications. Here's how to handle these scenarios.
Expand Collapsed Sections and Hidden Content
PageCrawl.io only captures text that is visible when in "Full-page text" mode. This can be problematic if the page contains collapsible sections (accordions, panels, etc.) that are only revealed when clicked.
To address this, add the "Reveal hidden text" action, which will automatically expand any collapsed sections on the page before capturing content.
Wait Until Page is Fully Loaded
PageCrawl.io waits until the page is fully loaded. However, in some situations, certain page elements only appear after additional time or after specific actions are executed (clicking, form submission, redirects, etc.).
You can add wait actions to ensure the page is completely ready before capturing content. Multiple "Wait" actions are available:
- "Wait for Text to appear": Waits until specific text appears on the page.
- "Wait for Text to disappear": Waits until specific text disappears from the page.
- "Wait for Element to appear": Waits for a specific page element to become visible.
- "Wait for Redirect": Waits for page redirects to complete. This is especially helpful when redirects are not immediate and take longer to process.
- "Wait for Seconds": Waits between 1 and 9 seconds (least recommended option).
Note: Each action waits up to 15 seconds before continuing. To keep total wait times bounded, subscription tiers have different timeout limits: Free (45 seconds), Standard (90 seconds), Enterprise (180 seconds). If loading takes longer than the timeout limit for your tier, the check fails with a timeout error.
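The difference between a condition-based wait and a fixed "Wait for Seconds" can be sketched as a polling loop with a deadline. This is a generic Python illustration, not PageCrawl.io's implementation; the function name, the polling interval, and the simulated page are assumptions, and the 15-second default simply mirrors the note above.

```python
import time
from typing import Callable

def wait_for(condition: Callable[[], bool], timeout: float = 15.0,
             interval: float = 0.25) -> bool:
    """Poll until condition() is true or the timeout elapses.

    Returns True as soon as the condition holds, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)

# Simulated page whose text becomes available on the third poll.
polls = {"count": 0}

def text_has_appeared() -> bool:
    polls["count"] += 1
    return polls["count"] >= 3

appeared = wait_for(text_has_appeared, timeout=2.0, interval=0.01)
```

A condition-based wait returns as soon as the content is ready, whereas a fixed delay always costs the full duration and may still be too short.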
4. Changes in Headers, Footers, and Sidebars
Frequently updated areas like footers, headers, and sidebars can result in irrelevant notifications. These sections often include changing elements such as timestamps, menus, or recent updates that are unrelated to the main content.
How to Avoid This
- Switch to "Content Only": When tracking the full page, this option automatically filters out these less important areas. Change the Element from "Everything on the page" to "Content Only."
- Remove Specific Elements: Use the "Remove Elements" action with the selector "header,nav,aside,footer" to exclude them. This directly alters the page, and these areas will not be visible in screenshots. You may want to use this approach when using a Tracked Element other than "Full page text."
- Focus on the Main Section: Track only the main content using the "Text" tracked element and the "main" selector. If no such element exists (e.g., the website is not semantically structured), you will see a "No selector found" error.
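To show what excluding header, nav, aside, and footer does to the captured text, here is a small Python sketch built on the standard library HTML parser. The class name, function name, and sample markup are hypothetical, and the sketch assumes the skipped elements are properly closed.

```python
from html.parser import HTMLParser

class ContentOnlyText(HTMLParser):
    """Collect text while skipping everything inside header, nav, aside, footer."""

    SKIPPED = {"header", "nav", "aside", "footer"}

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a skipped element
        self.chunks = []      # text fragments that are kept

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())

def text_without_page_chrome(html: str) -> str:
    parser = ContentOnlyText()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)

page = ("<header>Menu</header><main><p>Price: $9.99</p></main>"
        "<footer>Updated 3 minutes ago</footer>")
content = text_without_page_chrome(page)
```

Only the text inside the main element survives, so a changing footer timestamp no longer affects the comparison.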
5. Page Errors or Blank Content
Occasionally, a monitored page may fail to load properly, leading to blank content or error messages. While PageCrawl.io detects these situations in most cases, they can still trigger false positives. This often happens when a website doesn't report errors properly, relies on external data sources that fail to load, or when dynamic content is not displayed correctly.
How to Avoid This
Use the "Mark Check as Failed When" action to flag a page as failed without recording changes. For example:
- If a product's price unexpectedly drops to $0 due to an error and a message such as "Not available" is shown, PageCrawl.io can mark the page as failed instead of notifying you about a false change from $9.99 to $0.00.
- Add "Mark Check as Failed When" with "Text Contains" set to "Not available"
Additionally, customize the "Report Errors" setting to trigger only after a certain number of consecutive failures (e.g., after 10 consecutive failed checks) to avoid being overwhelmed by temporary issues.
If you check pages frequently, ensure the "Delay when Failed" setting is deactivated (in Advanced preferences) to prevent page failures from reducing the page-checking frequency.
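The two ideas in this section, treating a check as failed when certain text is present and reporting only after several consecutive failures, can be sketched as follows. This Python example is a simplified model with hypothetical names; it is not PageCrawl.io's actual logic.

```python
def check_failed(page_text: str, fail_markers: list[str]) -> bool:
    """Treat the check as failed if any marker text appears (case-insensitive)."""
    lowered = page_text.lower()
    return any(marker.lower() in lowered for marker in fail_markers)

class ErrorReporter:
    """Report an error only once a run of consecutive failures reaches a limit."""

    def __init__(self, report_after: int = 10) -> None:
        self.report_after = report_after
        self.consecutive = 0

    def record(self, failed: bool) -> bool:
        """Record one check result; return True when a report should be sent."""
        if not failed:
            self.consecutive = 0   # any success resets the streak
            return False
        self.consecutive += 1
        return self.consecutive == self.report_after

reporter = ErrorReporter(report_after=3)
outcomes = [True, True, False, True, True, True, True]
reports = [reporter.record(failed) for failed in outcomes]
```

In this run the streak is broken once by a successful check, so a report is sent only at the sixth check, when three failures have occurred in a row.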
6. Appearing/Disappearing Content
Websites may display varying content based on user sessions, location, or elements that frequently appear and disappear. This can lead to false positive notifications.
Once sufficient sample data is collected, the monitored page overview may provide suggested settings or filters to help you reduce these false triggers.
Potential Solutions
- Add an "Ignore text" filter under "Conditions/Filters" to filter out text that frequently appears and disappears.
- If text appears and disappears very often, you may see "Text lines that appear and disappear frequently" shown in page details. You can click on sentences to ignore them from triggering notifications.
- Add "Ignore numbers" filter if you are not interested in numeric changes.
- Ensure the page is fully loaded: In some cases, the page may not have loaded completely, causing issues. You can add an "Action" that waits a few more seconds, or until specific text or elements appear on the page, before capturing page contents.
- Consider deactivating "Intelligent Reconnect" if the page content changes depending on the user's location or session (e.g., different regions showing varied layouts). This setting can be found under Advanced Preferences to improve monitoring accuracy.
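Lines that keep appearing and disappearing can be spotted by counting how often each line toggles between consecutive snapshots. The Python sketch below illustrates that idea under the assumption that snapshots are plain text compared line by line; the function name and the min_flips default are hypothetical.

```python
from collections import Counter

def flapping_lines(snapshots: list[str], min_flips: int = 3) -> list[str]:
    """Return lines that appeared or disappeared at least min_flips times."""
    line_sets = [set(snapshot.splitlines()) for snapshot in snapshots]
    flips = Counter()
    for previous, current in zip(line_sets, line_sets[1:]):
        # Symmetric difference: lines present in exactly one of the two snapshots.
        for line in previous ^ current:
            flips[line] += 1
    return sorted(line for line, count in flips.items() if count >= min_flips)

history = [
    "Welcome\nSpring promo",
    "Welcome",
    "Welcome\nSpring promo",
    "Welcome",
    "Welcome\nNew pricing",
]
noisy = flapping_lines(history)
```

Here the promo line toggles three times and is flagged as a candidate for an ignore filter, while the single appearance of "New pricing" is left alone as a potentially meaningful change.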
7. Cookie Banners and Overlay Popups (Default Settings)
By default, PageCrawl.io enables "Block cookie banners and ads" and "Hide website overlays and popups" actions to reduce unnecessary notifications. However, you can disable these settings if not needed.
Cookie Banners
Cookie banners often appear dynamically after the page loads, altering the content and triggering false positives.
- Default Setting: Cookie banners are automatically suppressed during monitoring.
- Optional: You can disable this feature in your settings if necessary.
Overlay Popups
Overlay popups, such as ads or newsletter subscription prompts, may appear sporadically and interfere with accurate monitoring.
- Default Setting: PageCrawl.io hides overlay popups by default to ensure they don’t trigger false positives.
- Optional: This feature can also be turned off if not required.
These default settings simplify the monitoring process but can be adjusted based on your specific needs.
8. Scroll-Triggered Content
Sometimes pages use animations to reveal content sections that only appear as you scroll down the page.
Solutions
- Use the "Scroll to Bottom" action to automatically scroll to the bottom of the page before capturing content.
- Use the "Disable JavaScript" action, which will likely disable all animations. Note that this may prevent dynamic content from loading on some websites.
Conclusion
By implementing these strategies, you can significantly reduce false positive notifications when monitoring websites with PageCrawl.io. These techniques ensure that you receive notifications only for meaningful changes while minimizing wasted time on irrelevant alerts.
Remember to:
- Start with broad monitoring (full page text) and refine as needed
- Regularly review your settings and filters
- Use the suggested actions when they appear
- Test different approaches to find what works best for your specific use case
With proper configuration and ongoing fine-tuning, you'll achieve efficient and reliable website change monitoring.
If you're still experiencing issues with false positives after trying these solutions, don't hesitate to contact our support team for personalized assistance with your specific monitoring setup.
