# Web Archiving with WACZ: Preserve Full Page Snapshots

Source: PageCrawl.io Help Center
URL: https://pagecrawl.io/help/features/article/web-archiving-wacz

---

PageCrawl can automatically create a full web archive of your monitored pages every time a change is detected. Archives capture the complete page (HTML, CSS, images, scripts) so you can replay it exactly as it appeared at that moment.

Archives are saved in the WACZ (Web Archive Collection Zipped) format, an open standard for web archiving used by libraries, governments, and legal teams worldwide.
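Because a WACZ file is a ZIP package, it can be inspected with ordinary tools. The sketch below is a minimal illustration, assuming the layout described in the WACZ specification: a `datapackage.json` manifest at the root, WARC data under `archive/`, indexes under `indexes/`, and a page list under `pages/`. The function name `summarize_wacz` is hypothetical and not part of PageCrawl.

```python
import zipfile
from collections import Counter


def summarize_wacz(path):
    """Count the entries in a WACZ file by top-level directory.

    Assumes the layout from the WACZ specification. Entries at the root
    of the ZIP (such as datapackage.json) are counted under "(root)".
    """
    counts = Counter()
    with zipfile.ZipFile(path) as wacz:
        for name in wacz.namelist():
            if name.endswith("/"):
                continue  # skip explicit directory entries
            top = name.split("/", 1)[0] if "/" in name else "(root)"
            counts[top] += 1
    return dict(counts)
```

Calling `summarize_wacz("snapshot.wacz")` on a downloaded archive (the file name here is a placeholder) returns a dictionary mapping each top-level directory to its number of files.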

**Note:** Available on the Ultimate plan.

### How It Works

1. PageCrawl detects a change on a monitored page
2. A full WACZ archive is created capturing the complete page state
3. The archive is stored securely in the cloud
4. You can replay the archived page at any time from the change history

If WACZ generation fails (for example, because of a complex page structure), PageCrawl falls back to creating a self-contained HTML archive instead.

Enable archiving with the **Web Archive** toggle in the page editor's Crawling Preferences:

  [Image: Crawling Preferences with the Web Archive toggle enabled to capture a WACZ archive on every change]

### How Archives Differ from Screenshots

PageCrawl offers both screenshots and web archives, but they serve different purposes:

| | Screenshot | Web Archive (WACZ) |
|---|---|---|
| **What it captures** | A flat image of the visible page | The complete page: HTML, CSS, JavaScript, images, fonts |
| **Interactivity** | None (static image) | Fully interactive: scroll, click links, hover over elements |
| **Content below the fold** | Only if full-page screenshot is enabled | Always included; the entire page is preserved |
| **Dynamic content** | Shows one visual state | Preserves interactive elements, dropdowns, tabs |
| **File size** | Small (typically under 1 MB) | Larger (includes all page assets) |
| **Best for** | Quick visual reference, visual diff comparison | Legal evidence, compliance records, full preservation |

Screenshots are great for a quick visual snapshot and for visual change detection (highlighting pixel differences). Web archives go further by preserving the entire page so you can interact with it later exactly as it appeared.

### How PageCrawl Archives Differ from Archive.org

The Internet Archive (archive.org) and PageCrawl both preserve web pages, but they work very differently:

**Archive.org (Wayback Machine):**
- Public, community-driven project that crawls the open web
- Snapshots are taken on their own schedule (often weeks or months apart)
- No control over when or how often pages are archived
- Pages behind logins, paywalls, or bot protection are usually not captured
- Anyone can view the archived pages
- No change detection or notifications

**PageCrawl Web Archiving:**
- Private to your account, stored securely in the cloud
- Archives are created automatically every time a change is detected
- You control the check frequency (from every 5 minutes to once a day)
- Works with pages behind logins using [browser actions](/help/features/article/perform-actions.md) (click, type, wait)
- Works with pages behind bot protection
- Archives are paired with change detection, so you know exactly what changed and when
- Download WACZ files for offline storage or legal use

In short, archive.org is best for general public web preservation. PageCrawl archiving is designed for active monitoring where you need precise, private, frequent snapshots tied to detected changes.

### Viewing Archives

To view an archived page:

1. Open a monitored page and go to its change history
2. Click on any check that has an archive (indicated by an archive icon)
3. The archive viewer opens, showing the page exactly as it appeared
4. Use the previous/next arrows to browse between archived versions

The viewer uses ReplayWeb.page to render WACZ archives interactively in your browser. You can scroll, click links, and interact with the page as if you were browsing it live at that point in time.
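Outside the viewer, the list of captured pages can also be read directly from a WACZ file. The following is a rough sketch, assuming the `pages/pages.jsonl` layout from the WACZ specification: a header line followed by one JSON object per page with a `url` field and optional `ts` and `title` fields. The function name `list_archived_pages` is hypothetical, and archives produced by a given tool may include additional fields.

```python
import json
import zipfile


def list_archived_pages(path):
    """Return (url, timestamp, title) tuples from pages/pages.jsonl.

    Assumes the JSON Lines layout from the WACZ specification. Lines
    without a "url" field (such as the header line) are skipped.
    """
    pages = []
    with zipfile.ZipFile(path) as wacz:
        with wacz.open("pages/pages.jsonl") as handle:
            for raw in handle:
                line = raw.decode("utf-8").strip()
                if not line:
                    continue
                record = json.loads(line)
                if "url" in record:
                    pages.append(
                        (record["url"], record.get("ts"), record.get("title"))
                    )
    return pages
```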

### Downloading Archives

You can download any archive file directly:

1. Open the archive viewer for the check you want
2. Click the download button to save the WACZ file
3. Open it with any WACZ-compatible viewer (ReplayWeb.page, Webrecorder, etc.)

Downloaded archives can be used for legal evidence, compliance records, or offline browsing.
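When a downloaded archive is kept as a record, it can be useful to confirm that the file has not been altered or corrupted in storage. The sketch below is one possible check, assuming the manifest format from the WACZ specification: `datapackage.json` lists each resource with a `path` and a `hash` of the form `sha256:<hex digest>`. The function name `verify_wacz` is hypothetical, and the check covers internal consistency only; it does not by itself establish when or by whom the archive was created.

```python
import hashlib
import json
import zipfile


def verify_wacz(path):
    """Return the resource paths whose SHA-256 digest does not match
    the value recorded in datapackage.json.

    Assumes each entry in "resources" has a "path" and a "hash" of the
    form "sha256:<hex digest>". Missing files and hashes that use
    another algorithm are reported as mismatches rather than skipped.
    """
    mismatches = []
    with zipfile.ZipFile(path) as wacz:
        manifest = json.loads(wacz.read("datapackage.json"))
        for resource in manifest.get("resources", []):
            algorithm, _, expected = resource["hash"].partition(":")
            if algorithm != "sha256":
                mismatches.append(resource["path"])
                continue
            digest = hashlib.sha256()
            try:
                with wacz.open(resource["path"]) as member:
                    # Hash in 1 MiB chunks so large WARC files are not
                    # loaded into memory at once.
                    for chunk in iter(lambda: member.read(1 << 20), b""):
                        digest.update(chunk)
            except KeyError:
                mismatches.append(resource["path"])  # listed but not present
                continue
            if digest.hexdigest() != expected.lower():
                mismatches.append(resource["path"])
    return mismatches
```

An empty list means every listed resource matched its recorded digest.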

### Use Cases

- **Legal and compliance** - Preserve evidence of website content at specific dates for disputes, contracts, or regulatory compliance
- **Competitive intelligence** - Keep a historical record of competitor pages, pricing, and product offerings
- **Content auditing** - Track how your own website evolves over time with complete snapshots
- **Journalism** - Archive source pages to preserve evidence that may be modified or removed

### Enabling Archives

Archiving is available on the Ultimate plan. Once it is enabled for your workspace, turn it on for any monitored page with the **Web Archive** toggle described above. If you do not see the toggle, contact support to enable archiving for your workspace.

