Web Archiving with WACZ: Preserve Full Page Snapshots

PageCrawl can automatically create a full web archive of your monitored pages every time a change is detected. Archives capture the complete page (HTML, CSS, images, scripts) so you can replay it exactly as it appeared at that moment.

Archives are saved in the WACZ (Web Archive Collection Zipped) format, an open standard for web archiving used by libraries, governments, and legal teams worldwide.
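
Because a WACZ file is a ZIP package, its contents can be inspected with ordinary tools. The sketch below is an illustration rather than part of PageCrawl: it assumes the layout described in the WACZ specification (a datapackage.json manifest plus archive/, indexes/, and pages/ directories). The names summarize_wacz and build_sample_wacz are hypothetical, and the sample file is a placeholder built in memory, not a real capture.

```python
import io
import json
import zipfile
from collections import Counter


def summarize_wacz(source):
    """Summarize a WACZ file given a path or a binary file-like object."""
    with zipfile.ZipFile(source) as zf:
        names = zf.namelist()
        # Count entries per top-level directory (archive/, indexes/, pages/).
        by_dir = Counter(n.split("/", 1)[0] for n in names if "/" in n)
        has_manifest = "datapackage.json" in names
        manifest = json.loads(zf.read("datapackage.json")) if has_manifest else {}
    return {
        "has_datapackage": has_manifest,
        "warc_files": [n for n in names if n.startswith("archive/")],
        "entries_by_dir": dict(by_dir),
        "title": manifest.get("title"),
    }


def build_sample_wacz():
    """Build a tiny stand-in WACZ in memory (placeholder contents only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("datapackage.json", json.dumps({"profile": "data-package", "title": "Sample"}))
        zf.writestr("archive/data.warc.gz", b"")
        zf.writestr("indexes/index.cdx.gz", b"")
        zf.writestr("pages/pages.jsonl", '{"format": "json-pages-1.0", "id": "pages"}\n')
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(summarize_wacz(build_sample_wacz()))
```

To inspect a real download, pass its path instead, for example summarize_wacz("snapshot.wacz"), where the file name is whatever you saved it as.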

Available on the Ultimate plan.

How It Works

  1. PageCrawl detects a change on a monitored page
  2. A full WACZ archive is created capturing the complete page state
  3. The archive is stored securely in the cloud
  4. You can replay the archived page at any time from the change history

If WACZ generation fails (e.g., due to complex page structure), PageCrawl falls back to creating a self-contained HTML archive instead.
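
The fallback behavior can be pictured as a simple try-then-fall-back pattern. The sketch below is purely illustrative and does not show PageCrawl's actual implementation: ArchiveResult, archive_with_fallback, and the two generator callables are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ArchiveResult:
    format: str  # "wacz" or "html"
    data: bytes


def archive_with_fallback(
    url: str,
    make_wacz: Callable[[str], bytes],
    make_html: Callable[[str], bytes],
) -> ArchiveResult:
    """Try the full WACZ capture first; fall back to self-contained HTML."""
    try:
        return ArchiveResult("wacz", make_wacz(url))
    except Exception:
        # Hypothetical: any failure in WACZ generation triggers the fallback.
        return ArchiveResult("html", make_html(url))


if __name__ == "__main__":
    def failing_wacz(url: str) -> bytes:
        raise RuntimeError("page structure too complex")

    result = archive_with_fallback(
        "https://example.com", failing_wacz, lambda u: b"<html></html>"
    )
    print(result.format)
```

Either way, a snapshot is still stored for the check, even when the richer format cannot be produced.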

How Archives Differ from Screenshots

PageCrawl offers both screenshots and web archives, but they serve different purposes:

Feature | Screenshot | Web Archive (WACZ)
What it captures | A flat image of the visible page | The complete page: HTML, CSS, JavaScript, images, fonts
Interactivity | None (static image) | Fully interactive: scroll, click links, hover over elements
Content below the fold | Only if full-page screenshot is enabled | Always included, the entire page is preserved
Dynamic content | Shows one visual state | Preserves interactive elements, dropdowns, tabs
File size | Small (typically under 1 MB) | Larger (includes all page assets)
Best for | Quick visual reference, visual diff comparison | Legal evidence, compliance records, full preservation

Screenshots are great for a quick visual snapshot and for visual change detection (highlighting pixel differences). Web archives go further by preserving the entire page so you can interact with it later exactly as it appeared.

How PageCrawl Archives Differ from Archive.org

The Internet Archive (archive.org) and PageCrawl both preserve web pages, but they work very differently:

Archive.org (Wayback Machine):

  • Public, community-driven project that crawls the open web
  • Snapshots are taken on their own schedule (often weeks or months apart)
  • No control over when or how often pages are archived
  • Pages behind logins, paywalls, or bot protection are usually not captured
  • Anyone can view the archived pages
  • No change detection or notifications

PageCrawl Web Archiving:

  • Private to your account, stored securely in the cloud
  • Archives are created automatically every time a change is detected
  • You control the check frequency (every 5 minutes to daily)
  • Works with pages behind logins using browser actions (click, type, wait)
  • Works with pages behind Cloudflare or other bot protection
  • Archives are paired with change detection, so you know exactly what changed and when
  • Download WACZ files for offline storage or legal use

In short, archive.org is best for general public web preservation. PageCrawl archiving is designed for active monitoring where you need precise, private, frequent snapshots tied to detected changes.

Viewing Archives

To view an archived page:

  1. Open a monitored page and go to its change history
  2. Click on any check that has an archive (indicated by an archive icon)
  3. The archive viewer opens, showing the page exactly as it appeared
  4. Use the previous/next arrows to browse between archived versions

The viewer uses ReplayWeb.page to render WACZ archives interactively in your browser. You can scroll, click links, and interact with the page as if you were browsing it live at that point in time.

Downloading Archives

You can download any archive file directly:

  1. Open the archive viewer for the check you want
  2. Click the download button to save the WACZ file
  3. Open it with any WACZ-compatible viewer (ReplayWeb.page, Webrecorder, etc.)

Downloaded archives can be used for legal evidence, compliance records, or offline browsing.
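
When a downloaded archive may serve as evidence, a common practice is to record a cryptographic hash of the file at download time and to note which pages it contains. The sketch below is one possible approach, not a PageCrawl feature. It assumes the archive follows the WACZ specification's pages/pages.jsonl layout, where the first line is a header and each later line is a JSON record with a url field. The function names and the in-memory sample are hypothetical.

```python
import hashlib
import io
import json
import zipfile


def sha256_of(fileobj, chunk_size=65536):
    """Return the SHA-256 hex digest of a binary file-like object."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def list_pages(source):
    """Return (url, timestamp) pairs from pages/pages.jsonl in a WACZ file."""
    pages = []
    with zipfile.ZipFile(source) as zf:
        with zf.open("pages/pages.jsonl") as fh:
            for raw in fh:
                line = raw.strip()
                if not line:
                    continue
                record = json.loads(line)
                if "url" in record:  # the header line has no "url" field
                    pages.append((record["url"], record.get("ts")))
    return pages


def build_sample_wacz():
    """Build a placeholder WACZ in memory so the example runs without a download."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(
            "pages/pages.jsonl",
            '{"format": "json-pages-1.0", "id": "pages"}\n'
            '{"url": "https://example.com/", "ts": "2024-01-01T00:00:00Z"}\n',
        )
    buf.seek(0)
    return buf


if __name__ == "__main__":
    sample = build_sample_wacz()
    print(sha256_of(sample))
    sample.seek(0)
    print(list_pages(sample))
```

For a real download, open the saved file in binary mode, for example open("snapshot.wacz", "rb"), and store the digest alongside the file so that later copies can be checked against it.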

Use Cases

  • Legal and compliance - Preserve evidence of website content at specific dates for disputes, contracts, or regulatory compliance
  • Competitive intelligence - Keep a historical record of competitor pages, pricing, and product offerings
  • Content auditing - Track how your own website evolves over time with complete snapshots
  • Journalism - Archive source pages to preserve evidence that may be modified or removed

Enabling Archives

Archives are enabled at the workspace level. Contact support or check your workspace settings to enable archiving for your monitored pages.
