Specification Overview
- Weighted composite structural scoring model
- Deterministic hierarchy validation
- Security and canonical authority enforcement
- Standardized diagnostic registry
I. Core Mathematical Engine
Implementations MUST apply the following weighted composite model to ensure binary-equivalent audit results:
$$S_{total} = \sum_{i \in \{D, C, F, V, SC\}} w_i \cdot s_i$$
Weights: $w_D=0.25, w_C=0.20, w_F=0.30, w_V=0.15, w_{SC}=0.10$
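A minimal Python sketch of the composite sum follows. It assumes each sub-score $s_i$ has already been computed and lies in [0, 1], which the specification does not state explicitly. The function name `composite_score` is hypothetical. `math.fsum` is used to reduce floating-point rounding error in the accumulation, which helps when results must be reproduced exactly across implementations.

```python
import math

# Weights from the specification; they sum to 1.0.
WEIGHTS = {"D": 0.25, "C": 0.20, "F": 0.30, "V": 0.15, "SC": 0.10}
assert math.isclose(math.fsum(WEIGHTS.values()), 1.0)


def composite_score(sub_scores: dict[str, float]) -> float:
    """Return S_total = sum over i of w_i * s_i (hypothetical helper).

    ASSUMPTION: every sub-score is already normalized to [0, 1].
    """
    missing = WEIGHTS.keys() - sub_scores.keys()
    if missing:
        raise KeyError(f"missing sub-scores: {sorted(missing)}")
    # math.fsum tracks partial sums to limit rounding error in the total.
    return math.fsum(WEIGHTS[k] * sub_scores[k] for k in WEIGHTS)


# A page that is perfect except for a capped, non-TLS V-Score of 0.30:
print(round(composite_score({"D": 1.0, "C": 1.0, "F": 1.0, "V": 0.3, "SC": 1.0}), 4))  # 0.895
```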
D-Score (Hierarchy): Fixed penalty of 0.35 if the H1 count $n_{H1} \neq 1$; bonus of 0.98 if the H2 count $n_{H2} \ge 2$.
C-Score (Density): Logarithmic scaling against a base word count of $300$, with a floor of 0.1.
V-Score (Security): TLS is an absolute requirement; the V-Score of a non-secure endpoint is capped at 0.30.
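The C-Score and V-Score rules can be sketched in Python under stated assumptions. The density rule is read here as $s_C = \min(1, \max(0.1, \log_{300} w))$ for a word count $w$, which saturates at exactly 300 words, consistent with the Density Base in Section II; this reading is an assumption. A word count below 1 is treated as the "illegal log argument" case of ERR_MATH_DOMAIN and raised as a hypothetical `MathDomainError`. The specification does not define how the uncapped V-Score is computed, so `raw_security_score` is a hypothetical input. The D-Score is not sketched because the text does not say how the penalty and bonus combine.

```python
import math


class MathDomainError(ValueError):
    """Hypothetical exception corresponding to ERR_MATH_DOMAIN (0x02)."""


def c_score(word_count: int, base: int = 300, floor: float = 0.1) -> float:
    """Density sub-score.

    ASSUMPTION: s_C = clamp(log_base(word_count), floor, 1.0), so a document
    of exactly `base` words reaches full saturation (1.0).
    """
    if word_count < 1:
        # log(0) is undefined; treat counts below 1 as a domain error.
        raise MathDomainError(f"illegal word count: {word_count}")
    raw = math.log(word_count) / math.log(base)
    return min(1.0, max(floor, raw))


def v_score(raw_security_score: float, uses_tls: bool, cap: float = 0.30) -> float:
    """Security sub-score: endpoints without TLS are capped at `cap`.

    `raw_security_score` is a hypothetical input; the spec does not define it.
    """
    return raw_security_score if uses_tls else min(raw_security_score, cap)


print(c_score(300))                   # 1.0
print(c_score(1))                     # 0.1
print(v_score(0.9, uses_tls=False))   # 0.3
```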
II. Canonical DOM Snapshot (CDS)
The Canonical DOM Snapshot ensures structural equivalence without heuristic drift:
1. Normalization: Unicode NFC → Case Folding → Whitespace Collapse.
2. Hierarchy Guard: Heading order MUST be verified as sequential (H1 > H2 > H3), with no skipped levels.
3. Density Base: Minimum 300 tokens required for full saturation.
4. Asset Signals: Mandatory verification of favicon.ico and lang attributes.
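The normalization chain and the hierarchy guard can be sketched with the Python standard library. The normalization follows the stated order exactly (NFC, then case folding, then whitespace collapse). For the guard, "sequential" is assumed to mean: exactly one H1, the H1 comes first, and no heading is more than one level deeper than its predecessor (H1 followed directly by H3 is a non-sequential jump); moving back up any number of levels is allowed. Both function names are hypothetical, and extracting heading levels from the DOM is out of scope here.

```python
import unicodedata


def normalize_text(text: str) -> str:
    """CDS normalization: Unicode NFC -> case folding -> whitespace collapse."""
    nfc = unicodedata.normalize("NFC", text)
    folded = nfc.casefold()
    # str.split() with no argument splits on runs of any whitespace and drops
    # leading/trailing whitespace, so rejoining with one space collapses it.
    return " ".join(folded.split())


def check_hierarchy(levels: list[int]) -> bool:
    """Hierarchy guard over heading levels in document order (1 = H1).

    ASSUMPTION: exactly one H1, it comes first, and no heading deepens by
    more than one level relative to the previous heading.
    """
    if not levels or levels[0] != 1 or levels.count(1) != 1:
        return False
    return all(cur - prev <= 1 for prev, cur in zip(levels, levels[1:]))


print(normalize_text("  Cafe\u0301  MENU\n\tStraße "))  # café menu strasse
print(check_hierarchy([1, 2, 3, 2, 3]))                # True
print(check_hierarchy([1, 3]))                         # False
```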
III. URI Integrity & Authority (F-Score)
- Canonicalization: Mandatory rel="canonical" audit.
- Schema Extraction: Support for application/ld+json with nested @graph resolution.
- Social Signals: Audit for OpenGraph Title, Description and Image parity.
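Schema extraction with nested `@graph` resolution and the OpenGraph check can be sketched as follows. The sketch assumes the body of each `application/ld+json` script has already been pulled out of the page, and that OpenGraph `meta` tags have been collected into a property-to-content mapping; both extraction steps are omitted. Only `@graph` nesting is descended, matching the requirement above; nodes nested inside ordinary properties are not traversed. "Parity" is read here, as an assumption, as all three properties being present and non-empty; comparing them against the page's own title or description would need an extra check. All function names are hypothetical.

```python
import json
from typing import Any, Iterator

REQUIRED_OG = ("og:title", "og:description", "og:image")


def iter_jsonld_nodes(doc: Any) -> Iterator[dict]:
    """Yield node objects from parsed JSON-LD, descending into arrays and
    (possibly nested) @graph containers."""
    if isinstance(doc, list):
        for item in doc:
            yield from iter_jsonld_nodes(item)
    elif isinstance(doc, dict):
        graph = doc.get("@graph")
        # An object holding only @context/@graph is a container, not a node.
        if graph is not None and set(doc) <= {"@context", "@graph"}:
            yield from iter_jsonld_nodes(graph)
        else:
            yield doc
            if graph is not None:
                yield from iter_jsonld_nodes(graph)


def extract_types(script_body: str) -> list[str]:
    """Collect @type values (string or list form) from one script body."""
    types: list[str] = []
    for node in iter_jsonld_nodes(json.loads(script_body)):
        t = node.get("@type")
        if isinstance(t, str):
            types.append(t)
        elif isinstance(t, list):
            types.extend(x for x in t if isinstance(x, str))
    return types


def missing_og(meta: dict[str, str]) -> list[str]:
    """ASSUMPTION: parity = all required OpenGraph properties present and non-empty."""
    return [p for p in REQUIRED_OG if not meta.get(p, "").strip()]


payload = (
    '{"@context": "https://schema.org", "@graph": ['
    '{"@type": "WebSite"},'
    '{"@type": ["Organization", "Brand"], "@graph": [{"@type": "Person"}]}]}'
)
print(extract_types(payload))  # ['WebSite', 'Organization', 'Brand', 'Person']
print(missing_og({"og:title": "Home", "og:image": " "}))  # ['og:description', 'og:image']
```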
IV. Status & Error Registry
| Code | Constant | Diagnostic Description |
|------|----------|------------------------|
| 0x00 | STATUS_OK | Zero-delta success. Structural integrity verified. |
| 0x01 | ERR_SEC_TLS | Insecure endpoint. V-Score capped. |
| 0x02 | ERR_MATH_DOMAIN | Word count underflow or illegal log argument. |
| 0x03 | ERR_DOM_HIERARCHY | H1 count violation or non-sequential jumps. |
| 0x04 | ERR_CANON_MISSING | Lack of rel="canonical" in authoritative documents. |
| 0x05 | ERR_HASH_FAIL | Integrity check failure against Master Core. |
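The registry maps naturally onto an integer enumeration, so codes can be compared numerically and still print with their constant names. A minimal Python sketch, in which the class name `AuditStatus` and the `0xNN NAME` string format are illustrative assumptions rather than part of the registry:

```python
from enum import IntEnum


class AuditStatus(IntEnum):
    """Status & Error Registry codes (class name is hypothetical)."""

    STATUS_OK = 0x00
    ERR_SEC_TLS = 0x01
    ERR_MATH_DOMAIN = 0x02
    ERR_DOM_HIERARCHY = 0x03
    ERR_CANON_MISSING = 0x04
    ERR_HASH_FAIL = 0x05

    def __str__(self) -> str:
        # Render as the hex code followed by the constant name.
        return f"0x{self.value:02X} {self.name}"


print(AuditStatus(0x03))  # 0x03 ERR_DOM_HIERARCHY
```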