PageCrawl can automatically create a full web archive of your monitored pages every time a change is detected. Archives capture the complete page (HTML, CSS, images, scripts) so you can replay it exactly as it appeared at that moment.
Archives are saved in the WACZ (Web Archive Collection Zipped) format, an open standard for web archiving used by libraries, governments, and legal teams worldwide.
Available on the Ultimate plan.
How It Works
- PageCrawl detects a change on a monitored page
- A full WACZ archive is created capturing the complete page state
- The archive is stored securely in the cloud
- You can replay the archived page at any time from the change history
If WACZ generation fails (for example, because of a complex page structure), PageCrawl falls back to creating a self-contained HTML archive instead.
How Archives Differ from Screenshots
PageCrawl offers both screenshots and web archives, but they serve different purposes:
| | Screenshot | Web Archive (WACZ) |
|---|---|---|
| What it captures | A flat image of the visible page | The complete page: HTML, CSS, JavaScript, images, fonts |
| Interactivity | None (static image) | Fully interactive: scroll, click links, hover over elements |
| Content below the fold | Only if full-page screenshot is enabled | Always included; the entire page is preserved |
| Dynamic content | Shows one visual state | Preserves interactive elements, dropdowns, tabs |
| File size | Small (typically under 1 MB) | Larger (includes all page assets) |
| Best for | Quick visual reference, visual diff comparison | Legal evidence, compliance records, full preservation |
Screenshots are great for a quick visual snapshot and for visual change detection (highlighting pixel differences). Web archives go further by preserving the entire page so you can interact with it later exactly as it appeared.
How PageCrawl Archives Differ from Archive.org
The Internet Archive (archive.org) and PageCrawl both preserve web pages, but they work very differently:
Archive.org (Wayback Machine):
- Public, community-driven project that crawls the open web
- Snapshots are taken on its own schedule (often weeks or months apart)
- No control over when or how often pages are archived
- Pages behind logins, paywalls, or bot protection are usually not captured
- Anyone can view the archived pages
- No change detection or notifications
PageCrawl Web Archiving:
- Private to your account, stored securely in the cloud
- Archives are created automatically every time a change is detected
- You control the check frequency (every 5 minutes to daily)
- Works with pages behind logins using browser actions (click, type, wait)
- Works with pages behind Cloudflare or other bot protection
- Archives are paired with change detection, so you know exactly what changed and when
- Download WACZ files for offline storage or legal use
In short, archive.org is best for general public web preservation. PageCrawl archiving is designed for active monitoring where you need precise, private, frequent snapshots tied to detected changes.
Viewing Archives
To view an archived page:
- Open a monitored page and go to its change history
- Click on any check that has an archive (indicated by an archive icon)
- The archive viewer opens, showing the page exactly as it appeared
- Use the previous/next arrows to browse between archived versions
The viewer uses ReplayWeb.page to render WACZ archives interactively in your browser. You can scroll, click links, and interact with the page as if you were browsing it live at that point in time.
Downloading Archives
You can download any archive file directly:
- Open the archive viewer for the check you want
- Click the download button to save the WACZ file
- Open it with any WACZ-compatible viewer (ReplayWeb.page, Webrecorder, etc.)
Downloaded archives can be used for legal evidence, compliance records, or offline browsing.
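Because a WACZ file is a ZIP package, you can also inspect a downloaded archive with a short script. The sketch below is illustrative and not an official PageCrawl tool. It assumes the file follows the layout in the WACZ specification: a `datapackage.json` manifest at the ZIP root, whose `resources` entries each carry a `path` and a `hash` such as `sha256:<hex>`. The function name `summarize_wacz` and the example file name are hypothetical.

```python
import hashlib
import json
import zipfile


def summarize_wacz(path):
    """Return (entry names, [(resource path, hash matches manifest)]) for a WACZ file."""
    with zipfile.ZipFile(path) as zf:
        names = zf.namelist()
        # Assumption: the manifest is named datapackage.json and sits at the ZIP root.
        manifest = json.loads(zf.read("datapackage.json"))
        checks = []
        for res in manifest.get("resources", []):
            if "hash" not in res:
                continue  # nothing to compare against
            # Hash values are assumed to look like "sha256:<hex digest>".
            algo, _, expected = res["hash"].partition(":")
            digest = hashlib.new(algo, zf.read(res["path"])).hexdigest()
            checks.append((res["path"], digest == expected))
    return names, checks


# Example with a hypothetical file name:
# names, checks = summarize_wacz("pagecrawl-archive.wacz")
# for resource_path, ok in checks:
#     print(resource_path, "OK" if ok else "HASH MISMATCH")
```

A resource whose digest does not match its manifest entry is reported as `False`, which is a simple sanity check to run before filing an archive as a record.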
Use Cases
- Legal and compliance - Preserve evidence of website content at specific dates for disputes, contracts, or regulatory compliance
- Competitive intelligence - Keep a historical record of competitor pages, pricing, and product offerings
- Content auditing - Track how your own website evolves over time with complete snapshots
- Journalism - Archive source pages to preserve evidence that may be modified or removed
Enabling Archives
Archives are enabled at the workspace level. Contact support or check your workspace settings to enable archiving for your monitored pages.
