Web Archiving with WACZ: Preserve Full Page Snapshots

Last updated: 30 July, 2026 (5 min read)

PageCrawl can automatically create a full web archive of your monitored pages every time a change is detected. Archives capture the complete page (HTML, CSS, images, scripts) so you can replay it exactly as it appeared at that moment.

Archives are saved in the WACZ (Web Archive Collection Zipped) format, an open standard for web archiving used by libraries, governments, and legal teams worldwide.

Note: Available on the Ultimate plan.

How It Works

PageCrawl detects a change on a monitored page
A full WACZ archive is created capturing the complete page state
The archive is stored securely in the cloud
You can replay the archived page at any time from the change history

If WACZ generation fails (e.g., due to complex page structure), PageCrawl falls back to creating a self-contained HTML archive instead.
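
The fallback behavior amounts to a try-then-fall-back control flow. The following is a minimal sketch of that idea only, not PageCrawl's implementation: `capture_wacz` and `capture_html` are hypothetical stand-ins for the two capture steps, passed in as plain functions.

```python
from typing import Callable, Tuple


def archive_with_fallback(
    url: str,
    capture_wacz: Callable[[str], bytes],  # hypothetical WACZ capture step
    capture_html: Callable[[str], bytes],  # hypothetical self-contained HTML capture step
) -> Tuple[str, bytes]:
    """Return (format, data), preferring WACZ and falling back to HTML."""
    try:
        return "wacz", capture_wacz(url)
    except Exception:
        # Any failure during WACZ generation triggers the simpler fallback.
        return "html", capture_html(url)
```

Returning the format alongside the data lets the caller record which kind of archive was actually stored for a given check.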

Enable archiving with the Web Archive toggle in the page editor's Crawling Preferences:

[Screenshot: Crawling Preferences with the Web Archive toggle enabled to capture a WACZ archive on every change]

How Archives Differ from Screenshots

PageCrawl offers both screenshots and web archives, but they serve different purposes:

Feature	Screenshot	Web Archive (WACZ)
What it captures	A flat image of the visible page	The complete page: HTML, CSS, JavaScript, images, fonts
Interactivity	None (static image)	Fully interactive: scroll, click links, hover over elements
Content below the fold	Only if full-page screenshot is enabled	Always included; the entire page is preserved
Dynamic content	Shows one visual state	Preserves interactive elements, dropdowns, tabs
File size	Small (typically under 1 MB)	Larger (includes all page assets)
Best for	Quick visual reference, visual diff comparison	Legal evidence, compliance records, full preservation

Screenshots are great for a quick visual snapshot and for visual change detection (highlighting pixel differences). Web archives go further by preserving the entire page so you can interact with it later exactly as it appeared.

How Do I Prove an Archive Has Not Been Edited?

Every archive is sealed and date-stamped as it is created, so you can show that the copy you hold is the copy we captured and that nothing has been altered since. Two things happen automatically:

A signature is embedded in the archive. It follows the open WACZ Auth specification, so any compliant tool can verify it, not only PageCrawl.
The archive is date-stamped by independent third parties. More than one of them stamps every archive, and none are connected to PageCrawl, so the date does not rest on our word alone. Each stamp is saved as a separate file alongside the archive and can be checked on its own.

Because the stamps cover a cryptographic fingerprint of the archive, changing even a single byte breaks verification. That is what makes an archive usable as evidence in a way a screenshot is not: a screenshot has nothing to check it against.
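
To make the fingerprint idea concrete, here is a minimal sketch in Python. It assumes the layout described in the WACZ specification, where a `datapackage.json` file at the root of the ZIP lists each resource with a `path` and a `hash` of the form `sha256:<hex>`. It checks only those per-file fingerprints. It does not verify the embedded signature or the third-party date stamps, and it is not PageCrawl's own verification tool.

```python
import hashlib
import json
import zipfile
from typing import List


def find_tampered_resources(wacz_path: str) -> List[str]:
    """Return the paths whose SHA-256 no longer matches datapackage.json."""
    mismatched = []
    with zipfile.ZipFile(wacz_path) as wacz:
        # Assumption: resources are listed as {"path": ..., "hash": "sha256:<hex>"}.
        package = json.loads(wacz.read("datapackage.json"))
        for resource in package.get("resources", []):
            algorithm, _, expected = resource["hash"].partition(":")
            if algorithm != "sha256":
                raise ValueError(f"unsupported hash algorithm: {algorithm}")
            # Recompute the fingerprint from the bytes actually stored in the ZIP.
            actual = hashlib.sha256(wacz.read(resource["path"])).hexdigest()
            if actual != expected:
                mismatched.append(resource["path"])
    return mismatched
```

An empty list means every listed file still matches its recorded fingerprint. Changing a single byte of any listed file makes its path appear in the result.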

Note: If you have eIDAS requirements, archives can additionally be stamped by a qualified trust service provider. Contact support to enable this.

How PageCrawl Archives Differ from Archive.org

The Internet Archive (archive.org) and PageCrawl both preserve web pages, but they work very differently:

Archive.org (Wayback Machine):

Public, community-driven project that crawls the open web
Snapshots are taken on their own schedule (often weeks or months apart)
No control over when or how often pages are archived
Pages behind logins, paywalls, or bot protection are usually not captured
Anyone can view the archived pages
No change detection or notifications

PageCrawl Web Archiving:

Private to your account, stored securely in the cloud
Archives are created automatically every time a change is detected
You control the check frequency (from every 5 minutes to once a day)
Works with pages behind logins using browser actions (click, type, wait)
Works with pages behind bot protection
Archives are paired with change detection, so you know exactly what changed and when
Download WACZ files for offline storage or legal use

In short, archive.org is best for general public web preservation. PageCrawl archiving is designed for active monitoring where you need precise, private, frequent snapshots tied to detected changes.

Viewing Archives

To view an archived page:

Open a monitored page and go to its change history
Click on any check that has an archive (indicated by an archive icon)
The archive viewer opens, showing the page exactly as it appeared
Use the previous/next arrows to browse between archived versions

The viewer uses ReplayWeb.page to render WACZ archives interactively in your browser. You can scroll, click links, and interact with the page as if you were browsing it live at that point in time.

Downloading Archives

You can download any archive file directly:

Open the archive viewer for the check you want
Click the download button to save the WACZ file
Open it with any WACZ-compatible viewer (ReplayWeb.page, Webrecorder, etc.)

Downloaded archives can be used for legal evidence, compliance records, or offline browsing.
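
Because a WACZ file is a ZIP container, a downloaded archive can also be inspected with general-purpose tools. The following is a minimal sketch using Python's standard `zipfile` module. The function name is illustrative, and the exact entries you see depend on how the archive was produced.

```python
import zipfile
from typing import List, Tuple


def list_wacz_entries(wacz_path: str) -> List[Tuple[str, int]]:
    """Return (name, uncompressed size in bytes) for every entry in the archive."""
    with zipfile.ZipFile(wacz_path) as wacz:
        return [(info.filename, info.file_size) for info in wacz.infolist()]
```

For example, calling `list_wacz_entries("snapshot.wacz")` (where `snapshot.wacz` is a placeholder file name) returns one pair per stored file, which is a quick way to confirm that a download is complete before filing it away.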

Use Cases

Legal and compliance - Preserve evidence of website content at specific dates for disputes, contracts, or regulatory compliance
Competitive intelligence - Keep a historical record of competitor pages, pricing, and product offerings
Content auditing - Track how your own website evolves over time with complete snapshots
Journalism - Archive source pages to preserve evidence that may be modified or removed

Enabling Archives

Archiving is available on the Ultimate plan. Once it is enabled for your workspace, turn it on for any monitored page with the Web Archive toggle described above. If you do not see the toggle, contact support to enable archiving for your workspace.
