# SEC 17a-4 Web Archive: How Broker-Dealers Monitor and Preserve Web Content

Source: PageCrawl.io Blog
URL: https://pagecrawl.io/blog/sec-17a-4-web-archive-monitoring

---

In a 2023 enforcement sweep, the SEC fined twenty-six broker-dealer firms a combined $400 million for failures around recordkeeping. The headline failures involved off-channel communications on personal devices, but several of the orders also cited recordkeeping for public-facing materials, including content posted to firm websites and social media pages. One settlement order described the firm's inability to produce, on demand, the version of a web page as it existed on a specific date. The firm had the page. It did not have the version.

SEC Rule 17a-4 is one of the most prescriptive recordkeeping rules in US financial regulation. It applies to every broker-dealer registered with the SEC and lays out, in technical detail, what records must be retained, in what format, and for how long. The rule was substantially modernized in late 2022 to allow electronic recordkeeping in formats other than write-once-read-many (WORM) media, but the substantive obligation, that broker-dealers must be able to produce the actual record as it existed at any point during the retention period, has not changed.

For most broker-dealers, the records-on-paper portion of the obligation is well-handled by an established document management system. The harder portion, in 2026, is web content. Public-facing webpages, blog posts, marketing pages, fund prospectuses linked from the website, fee schedules, FAQ pages, and disclosures all fall within the universe of "communications with the public" that 17a-4 covers. None of these are static artefacts. Most update continuously.

This guide covers what 17a-4 requires for web content, what to retain, how to operate a compliant web archive in 2026, the SEC's 2022 amendments and what they changed, and how to set up the monitoring and archival tooling to do this without dedicating a full headcount.

### What SEC 17a-4 Requires for Web Content

17a-4 sits inside the broader 17a-3 / 17a-4 / 17a-5 recordkeeping regime. 17a-3 lists the types of records that must be created. 17a-4 dictates how those records are retained, for how long, and in what format. 17a-5 covers reports.

#### Records of communications with the public

Rule 17a-4(b)(4) requires broker-dealers to retain originals of all communications received and copies of all communications sent by the firm "relating to its business as such." This is interpreted broadly to include marketing and advertising materials posted on firm-controlled pages, fee schedules, account agreements, prospectus documents linked from the website, and any other content that a customer or prospective customer might rely on.

The retention period for these records is generally three years, with the first two years in an easily accessible place. FINRA Rule 4511 incorporates the SEC's recordkeeping requirements and extends them in specific contexts.

The practical implication is that every public-facing page on a broker-dealer's website that contains business-related content needs to be retained in a way that the firm can produce, on regulatory request, the page as it appeared on a specific date in the past three years.

#### What "as it appeared" means in practice

A regulator asking for the version of a page on a specific date does not want a description, a screenshot summary, or the current page with a note that the language used to be different. They want the actual record, in a form that demonstrates what was published.

In practice, regulators typically ask for one or more of:

- A web archive (WARC or WACZ format) that can be replayed at a given timestamp
- A timestamped screenshot of the page
- A timestamped HTML capture
- A combination of the above with an audit log showing the capture process

The format doesn't matter as much as the demonstrability. If your archive cannot show that a specific URL contained specific content on a specific date, it does not satisfy the rule.

#### What 17a-4(f) actually says about format (post-2022)

The 2022 amendments to 17a-4(f) replaced the prior WORM-only requirement with a more flexible, technology-neutral approach. A broker-dealer's electronic recordkeeping system now complies if it meets either of two standards:

- WORM (write-once-read-many) characteristics that prevent records from being overwritten or erased, or
- An audit-trail-based alternative that preserves a complete, time-stamped record of all modifications and deletions and permits the original record to be re-created if it is modified or deleted

For web archives specifically, this means a system that captures pages at scheduled intervals and prevents the captured records from being modified satisfies the rule. WACZ (Web Archive Collection Zipped) is well-suited to this requirement because the format includes cryptographic hashes of every captured resource, so any tampering is detectable.
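To see why a hash manifest makes tampering detectable, here is a minimal sketch of the general idea. It does not reproduce the actual WACZ file layout; `build_manifest`, `verify_manifest`, and the sample resources are hypothetical names chosen for illustration.

```python
import hashlib


def build_manifest(resources: dict[str, bytes]) -> dict[str, str]:
    """Map each resource name to the SHA-256 hex digest of its bytes."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in resources.items()}


def verify_manifest(resources: dict[str, bytes], manifest: dict[str, str]) -> list[str]:
    """Return the names whose current bytes no longer match the manifest."""
    mismatched = []
    for name, expected in manifest.items():
        data = resources.get(name)
        if data is None or hashlib.sha256(data).hexdigest() != expected:
            mismatched.append(name)
    return sorted(mismatched)


# Hypothetical captured resources for one page.
captured = {"index.html": b"<h1>Fees</h1>", "fees.pdf": b"%PDF-1.7 example"}
manifest = build_manifest(captured)

# Simulate a later edit to one stored resource.
tampered = {**captured, "index.html": b"<h1>Lower fees</h1>"}

print(verify_manifest(captured, manifest))  # untouched archive
print(verify_manifest(tampered, manifest))  # edited archive
```

A manifest only helps if the manifest itself cannot be quietly regenerated, which is why it is usually paired with an external timestamp or signature.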

The 2022 amendments also clarified the role of designated executive officers and external auditors in attesting that the recordkeeping system meets the requirements.

### What to Monitor and Archive

Not every page on a broker-dealer website is a 17a-4 record. The discipline is to identify which pages contain business-related content covered by the rule and archive those continuously, while leaving non-substantive pages alone.

#### Categories typically in scope

**Fee schedules and pricing pages.** Any page disclosing commissions, fees, expense ratios, advisory fees, or other charges. These typically change quarterly or more often.

**Account agreements and customer agreements.** Any agreement template available on the website, including margin agreements, options trading agreements, IRA agreements, and retail customer relationship summaries (Form CRS).

**Disclosure documents.** Best execution disclosures, payment for order flow disclosures, conflicts of interest disclosures, anti-money-laundering notices, business continuity disclosures.

**Marketing and advertising materials.** Pages making any claim about the firm's services, performance, product features, or comparisons. Includes blog posts, white papers, FAQ pages, and educational content.

**Prospectuses and product documentation.** PDFs and HTML pages for proprietary funds, structured products, and other offerings linked from the firm website.

**Form CRS and Form ADV.** For dual-registered broker-dealers, the publicly posted Form CRS and the publicly available portion of Form ADV.

**Privacy and security policies.** Privacy notices required under Reg S-P, security disclosures, and notices that may relate to identity theft prevention obligations.

#### Categories typically out of scope

Pure marketing pages with no business-related claims, internal employee pages, vendor logos, and similar non-substantive content typically fall outside 17a-4. Where a page straddles the line, the safer practice is to retain.

### Setting Up a Compliant Web Archive

The pattern that scales for most broker-dealer compliance teams is a continuous monitoring and capture pipeline that detects changes to in-scope pages and writes a new archive entry every time content changes. Weekly or monthly batch captures are an inferior approach because they miss intra-period changes and force the firm into "we think this is what it looked like" instead of "here is the captured record."

#### Identify the in-scope URLs

Start with the firm's website. Walk it and identify every page that contains business-related content per the categories above. Tag each URL with its category (fee schedule, disclosure, marketing, agreement, etc.) and its retention obligation (typically three years for 17a-4 records).

Most broker-dealer sites have between 50 and 500 in-scope URLs depending on the breadth of the business. Keep the inventory in a single source of truth, ideally inside the recordkeeping system itself.
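As a rough illustration of what such an inventory could look like in code, the sketch below tags each URL with a category and a retention period and runs two basic sanity checks. The `InScopeUrl` class, the category labels, and the `firm.example` URLs are all assumptions made for this example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class InScopeUrl:
    url: str
    category: str             # e.g. "fee_schedule", "disclosure", "agreement", "marketing"
    retention_years: int = 3  # typical 17a-4 retention, per the text above


# Hypothetical inventory entries.
inventory = [
    InScopeUrl("https://firm.example/disclosures/fees", "fee_schedule"),
    InScopeUrl("https://firm.example/disclosures/order-flow", "disclosure"),
    InScopeUrl("https://firm.example/agreements/margin", "agreement"),
    InScopeUrl("https://firm.example/blog/market-outlook", "marketing"),
]


def duplicate_urls(entries: list[InScopeUrl]) -> list[str]:
    """Return URLs that appear more than once in the inventory."""
    counts = Counter(entry.url for entry in entries)
    return sorted(url for url, n in counts.items() if n > 1)


def count_by_category(entries: list[InScopeUrl]) -> dict[str, int]:
    """Summarize how many in-scope URLs fall into each category."""
    return dict(Counter(entry.category for entry in entries))


print(duplicate_urls(inventory))
print(count_by_category(inventory))
```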

#### Set up continuous monitoring

Add each URL as a monitor with appropriate check frequency. Daily for fee schedules and disclosures, weekly for agreements and marketing, on-demand for time-sensitive disclosures during product launches. The frequency does not need to be aggressive; the point is to catch changes when they happen, not to capture irrelevant intra-day variation.
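The frequency guidance above can be expressed as a small lookup table. This is only a sketch: the category keys and the `is_check_due` helper are hypothetical, and a hosted monitoring tool would normally handle scheduling for you.

```python
from datetime import datetime, timedelta, timezone

# Check intervals per category, following the guidance in the text.
CHECK_INTERVALS = {
    "fee_schedule": timedelta(days=1),
    "disclosure": timedelta(days=1),
    "agreement": timedelta(weeks=1),
    "marketing": timedelta(weeks=1),
}


def is_check_due(category: str, last_checked: datetime, now: datetime) -> bool:
    """True when at least one full interval has passed since the last check."""
    return now - last_checked >= CHECK_INTERVALS[category]


last = datetime(2026, 1, 5, 9, 0, tzinfo=timezone.utc)
now = datetime(2026, 1, 6, 9, 0, tzinfo=timezone.utc)
print(is_check_due("fee_schedule", last, now))  # daily category, one day elapsed
print(is_check_due("agreement", last, now))     # weekly category, one day elapsed
```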

For each detected change, the monitoring system should:
- Capture a new full-page screenshot
- Capture the HTML content
- Capture any linked PDFs
- Record the timestamp of capture
- Record a cryptographic hash of the captured content
- Store the artefacts in tamper-evident storage
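The steps in the list above can be sketched as a single "capture if changed" function. This simplified example handles only the HTML, timestamp, and hash; screenshots, linked PDFs, and tamper-evident storage are left out. `CaptureRecord` and `capture_if_changed` are hypothetical names, not part of any particular product.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class CaptureRecord:
    url: str
    captured_at: datetime
    sha256: str
    html: bytes


def capture_if_changed(
    url: str,
    html: bytes,
    previous: Optional[CaptureRecord],
    now: Optional[datetime] = None,
) -> Optional[CaptureRecord]:
    """Return a new record when the content hash differs from the previous capture."""
    digest = hashlib.sha256(html).hexdigest()
    if previous is not None and previous.sha256 == digest:
        return None  # no change detected, nothing new to archive
    return CaptureRecord(url, now or datetime.now(timezone.utc), digest, html)


url = "https://firm.example/disclosures/fees"  # hypothetical URL
first = capture_if_changed(url, b"<p>Commission: $4.95</p>", previous=None)
same = capture_if_changed(url, b"<p>Commission: $4.95</p>", previous=first)
changed = capture_if_changed(url, b"<p>Commission: $5.95</p>", previous=first)
print(first is not None, same is None, changed is not None)
```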

#### Maintain the audit trail

The recordkeeping system has to log not just what was captured, but when and how. The 2022 amendments specifically reference the need for an audit trail of modifications to records. In a continuous-capture system, this means logging every capture event, every retention extension, every retrieval, and every deletion (deletions only after retention expires).
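One common way to make such a log tamper-evident is to chain entries by hash, so that editing any earlier entry invalidates everything after it. The sketch below illustrates that technique under simple assumptions; the event fields and function names are hypothetical, and this is not a description of how any specific vendor implements its audit trail.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev_hash: str, event: dict) -> str:
    """Hash the previous entry's hash together with a canonical JSON form of the event."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(log: list[dict], event: dict) -> None:
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash, "hash": _entry_hash(prev_hash, event)})


def verify_log(log: list[dict]) -> bool:
    """Recompute every link; any edited, removed, or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _entry_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


audit_log: list[dict] = []
fees_url = "https://firm.example/disclosures/fees"  # hypothetical URL
append_event(audit_log, {"type": "capture", "url": fees_url, "at": "2025-03-01T09:00:00Z"})
append_event(audit_log, {"type": "retrieval", "url": fees_url, "at": "2025-06-10T15:30:00Z"})
print(verify_log(audit_log))
```

The hash of the latest entry can be timestamped or stored externally so that the whole chain cannot be silently rebuilt.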

PageCrawl's monitoring captures the screenshot, the HTML, and a timestamp on every detected change. On Ultimate plans you can enable WACZ archive capture per page, and on enabled pages each detected change produces a WACZ archive carrying multiple independent cryptographic attestations: a domain-identity signature using a Let's Encrypt certificate, an RFC 3161 timestamp from a commercial Trust Service Provider, and a Bitcoin blockchain anchor via OpenTimestamps. The manifest hashes inside the WACZ plus the multi-provider timestamp proofs satisfy the structural tamper-evidence pillar contemplated by the 2022 17a-4(f) amendments. The full change history is exportable to PDF, Excel, or a structured evidence bundle for retention in the firm's records repository.

For litigation use, the multi-layer attestations align with FRE 902(13)/(14) self-authenticating evidence standards. Counsel can hand a regulator or opposing counsel a public verification link rather than a custodian deposition; the recipient sees the manifest hash and every cryptographic attestation, and verifies the proofs against public certificate chains and the Bitcoin blockchain offline.

#### The AI fabrication problem

In 2026 a generative model can produce a plausible screenshot of any web page in seconds. A self-stored archive proves nothing on its own to a regulator, auditor, or court because the firm could have generated it after the fact. What AI cannot fabricate is a hash anchored to the Bitcoin blockchain, an RFC 3161 timestamp signed by a Trust Service Provider's private key, or a qualified seal from a regulated QTSP. PageCrawl attaches several of these in parallel on every detected change, so the archive's existence at a specific moment is attested by parties no single actor can spoof. That is the only practical bar for evidentiary archives in an AI-saturated world.

#### Document the process for the designated executive

Under the 2022 amendments, the firm must designate one or more executive officers responsible for representing to the SEC that the firm's recordkeeping system meets the requirements. The system documentation that the designated officer attests to should describe:

- Which pages are in scope and why
- The capture frequency for each category
- The capture format and tamper-evidence mechanism
- The retention period for each category
- The retrieval procedure
- The audit-trail mechanism
- The deletion procedure for records that have aged out of retention

A monitoring system that produces a clear, exportable specification of all of the above is materially easier to attest to than an ad-hoc screenshot pipeline.
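The deletion procedure in the list above depends on knowing when a record has aged out of retention. The sketch below assumes the retention clock starts on the date a page version was superseded. That trigger date is an assumption made purely for illustration and should be confirmed against the firm's own policy; `retention_expiry` and `may_delete` are hypothetical helpers.

```python
from datetime import date


def retention_expiry(superseded_on: date, retention_years: int = 3) -> date:
    """Date on which the retention period ends (assumed to start at supersession)."""
    target_year = superseded_on.year + retention_years
    try:
        return superseded_on.replace(year=target_year)
    except ValueError:
        # 29 February with a non-leap target year: roll forward to 1 March.
        return date(target_year, 3, 1)


def may_delete(superseded_on: date, today: date, retention_years: int = 3) -> bool:
    """Conservative check: deletion is allowed only strictly after the expiry date."""
    return today > retention_expiry(superseded_on, retention_years)


print(retention_expiry(date(2025, 3, 15)))
print(may_delete(date(2025, 3, 15), today=date(2028, 3, 15)))
print(may_delete(date(2025, 3, 15), today=date(2028, 3, 16)))
```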

### A Worked Example: Quarterly Fee Schedule Update

A common pattern: a broker-dealer publishes its fee schedule at firm.com/disclosures/fees. The schedule changes quarterly. A retail customer alleges in 2027 that a specific fee was charged in March 2025 in violation of the fee schedule that was published at the time. The firm needs to produce the version of the fee schedule that was actually published on the date of the disputed charge.

In a manual screenshot regime, the firm has whatever was captured at the most recent quarterly review. If the relevant version was overwritten without an explicit capture, the firm cannot produce the actual record.

In a continuous monitoring regime, the system captured a new archive entry every time the fee schedule changed. The firm pulls the archive entry dated immediately before the disputed charge, demonstrates the fee schedule as it existed at that time, and resolves the dispute on the record.
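Pulling "the archive entry dated immediately before" a given date is a sorted lookup. Here is a minimal sketch, assuming captures are kept as a list of `(captured_at, label)` tuples sorted by time; the labels and dates are made up for the example.

```python
from bisect import bisect_right
from datetime import datetime, timezone
from typing import Optional


def version_in_effect(
    captures: list[tuple[datetime, str]], as_of: datetime
) -> Optional[tuple[datetime, str]]:
    """Return the latest capture taken at or before `as_of`, or None if there is none."""
    times = [captured_at for captured_at, _ in captures]
    idx = bisect_right(times, as_of)
    return captures[idx - 1] if idx else None


# Hypothetical capture history for the fee schedule, sorted ascending.
captures = [
    (datetime(2025, 1, 2, 14, 0, tzinfo=timezone.utc), "fees-v7"),
    (datetime(2025, 4, 1, 13, 30, tzinfo=timezone.utc), "fees-v8"),
]

disputed = datetime(2025, 3, 14, tzinfo=timezone.utc)
match = version_in_effect(captures, disputed)
print(match[1] if match else "no capture before this date")
```

Note that a capture timestamp records when the change was detected, so the check frequency bounds how precisely the archive can date a change.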

### Common Pitfalls

A few patterns separate teams that keep 17a-4 web archival sustainable from teams that rebuild it after the next exam.

#### Treating screenshots as sufficient

A bare screenshot does not capture the underlying HTML, the linked documents, or the styling. If a fee schedule includes a reference to a PDF disclosure, a screenshot of the fee page does not retain the PDF. The system has to capture the linked document as well, ideally on the same capture event.

#### Capturing only on schedule

Quarterly captures miss changes that happen between captures. The first time this becomes a problem is during litigation or a regulatory inquiry that asks about a specific date that does not align with a capture. Continuous capture (monitor for change, capture when it changes) avoids this entirely.

#### Forgetting the dynamic content

Modern firm websites pull content from content management systems, A/B testing tools, and personalization engines. A page captured at 9am may differ from the same page captured at 11am. The capture system needs to distinguish a meaningful difference (a content change) from a cosmetic one (tracking pixel rotation, an A/B variant) and act accordingly.
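One simple way to approximate this distinction is to fingerprint only the visible text of a page, ignoring scripts, styles, and whitespace. The sketch below uses Python's standard `html.parser` and is deliberately crude; `content_fingerprint` is a hypothetical helper, and real tools use more sophisticated comparison.

```python
import hashlib
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Collect text nodes, skipping anything inside script/style/noscript."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def content_fingerprint(html: str) -> str:
    """SHA-256 of the whitespace-normalized visible text of a page."""
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.parts).split())
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


page_9am = '<body><h1>Fees</h1><p>Commission: $4.95</p><script>track("a1")</script></body>'
page_11am = '<body><h1>Fees</h1>\n<p>Commission:   $4.95</p><script>track("b2")</script><img src="/px?id=9"></body>'
page_new = '<body><h1>Fees</h1><p>Commission: $5.95</p><script>track("a1")</script></body>'

print(content_fingerprint(page_9am) == content_fingerprint(page_11am))  # cosmetic difference only
print(content_fingerprint(page_9am) == content_fingerprint(page_new))   # fee text changed
```

A fingerprint like this should only decide whether to trigger a capture; the archive itself should still store the raw page. It also ignores attribute changes such as a link pointing to a different PDF, so it is not sufficient on its own.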

#### Ignoring linked PDFs

Linked PDFs (account agreements, disclosures, prospectuses) are records in their own right. The capture system has to follow links and archive the documents, not just the host page.
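Following links starts with finding them. As a minimal sketch, the function below extracts PDF links from a captured page using only the standard library. It assumes PDFs can be recognized by a `.pdf` path suffix, which will miss documents served from extension-less URLs; `linked_pdfs` and the sample URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Collect the href value of every anchor tag."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def linked_pdfs(page_url: str, html: str) -> list[str]:
    """Return absolute URLs of linked PDFs, in page order, without duplicates."""
    collector = _LinkCollector()
    collector.feed(html)
    collector.close()
    found: list[str] = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)
        if urlparse(absolute).path.lower().endswith(".pdf") and absolute not in found:
            found.append(absolute)
    return found


page_url = "https://firm.example/disclosures/fees"  # hypothetical page
html = (
    '<a href="/docs/margin-agreement.pdf">Margin agreement</a>'
    '<a href="form-crs.PDF?v=2">Form CRS</a>'
    '<a href="/about">About us</a>'
    '<a href="/docs/margin-agreement.pdf">Margin agreement (again)</a>'
)
for pdf_url in linked_pdfs(page_url, html):
    print(pdf_url)
```

Each URL returned would then be fetched, hashed, and stored alongside the host page in the same capture event.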

#### No audit trail

A capture pipeline that produces archives without logging capture events, retrieval events, and retention status does not meet the audit-trail requirement in the 2022 amendments. The log itself needs to be tamper-evident.

### Choosing your PageCrawl plan

PageCrawl's **Free plan** lets you monitor **6 pages** with **220 checks per month**, which is enough to validate the approach on your most critical pages. Most teams graduate to a paid plan once they see the value.

| Plan | Price | Pages | Checks / month | Frequency |
|------|-------|-------|----------------|-----------|
| Free | $0 | 6 | 220 | every 60 min |
| Standard | $8/mo or $80/yr | 100 | 15,000 | every 15 min |
| Enterprise | $30/mo or $300/yr | 500 | 100,000 | every 5 min |
| Ultimate | $99/mo or $999/yr | 1,000 | 100,000 | every 2 min |

Annual billing saves two months across every paid tier. Enterprise and Ultimate scale up to 100x if you need thousands of pages or multi-team access.

Compliance monitoring is the cheapest insurance you can buy. A single missed regulatory change can trigger fines in the tens or hundreds of thousands, not to mention the audit overhead of proving you did not see it coming. Enterprise at $300/year covers 500 regulatory pages with unlimited history and timestamped screenshots, which is usually exactly what an assessor wants to see. All plans include the **PageCrawl MCP Server**, so your compliance team can ask Claude to summarize every change to a specific regulation over the last quarter and pull the exact diff, turning your monitoring history into a queryable audit trail. AI assistants can create monitors through conversation on every plan, including Free, and paid plans add on-demand checks and monitor management. Standard at $80/year is enough to cover 100 pages across your primary regulatory bodies if your program is smaller.

### Getting Started

Set up a 17a-4 web archive in three steps:

1. **Inventory your in-scope URLs.** Walk the firm site and identify every page that contains business-related content. Tag by category and retention obligation.
2. **Add each URL as a continuous monitor with appropriate frequency.** Daily for fee schedules, weekly for agreements, on-demand for time-sensitive launches. Capture screenshot, HTML, and linked PDFs on every detected change.
3. **Document the system for the designated executive.** Scope, frequency, format, retention, audit trail, retrieval, deletion. Attach the documentation to the firm's recordkeeping policy.

For related reading, see [DORA compliance monitoring](/blog/dora-compliance-monitoring), [SEC filings monitoring on EDGAR](/blog/sec-filings-monitoring-edgar-alerts), and [banking regulatory compliance monitoring](/blog/banking-regulatory-compliance-monitoring).

If your broker-dealer is standing up a 17a-4 web archive, the [Banking Intelligence](/use-cases/banking-intelligence) use case walks through the broader supervisory monitoring program these archives sit inside, with audit-grade timestamps, team routing, and alert escalation for regulatory communications.

