The 20-Year File:
A Zero-Dependency Architecture for Digital Paratypes

T. M. Jones, PhD

Abstract

This technical note describes a practical way to publish datasets and visualizations as a single, durable HTML file that runs offline. The implementation uses only native browser standards: Vanilla JavaScript (standard ECMAScript) and CSS3. All visualization code is vendored into the document, and the dataset is stored inline, eliminating fetch calls and external dependencies. Borrowing from taxonomy: if the digital holotype is the raw dataset, this describes a "digital paratype," packaging analysis and figures alongside data in a single archival file. The methodology is demonstrated in the Michigan Cannabis Market Analysis (DOI: 10.5281/zenodo.18065297), a longitudinal economic dataset (74 months at publication).

Unlike a static PDF, the document rewrites itself at load time, regenerating durations, refreshing labels, and updating summary text from embedded data. This keeps the narrative aligned as the dataset grows. The "evergreen" workflow means replacing the monthly JSON payload triggers automatic regeneration without rebuilds. The same approach supports multilingual publication; six languages were added without an API. Performance validation using PageSpeed Insights shows strong interactivity and accessibility on typical devices. Standard growth models suggest multi-century operability is mathematically possible.

Unlike physical specimens that require protection from damage, these digital artifacts survive through distribution. The preservation strategy shifts from custody to replication. At 325KB for seven years of data, the artifact functions as "viral data": distributable via email, thumb drives, or web hosting while remaining completely self-contained.

Availability: The live artifact is accessible at https://tjid3.org/

The Context: Archival Risk

This methodology paper describes a self-modifying HTML architecture demonstrated in the Michigan Cannabis Market Analysis (DOI: 10.5281/zenodo.18065297), a longitudinal economic dataset tracking 74 months of regulatory data. The artifact executes its own code to regenerate narrative sections each time new data is added, maintaining alignment between analysis and evidence without external build processes.

This is not a static webpage, a web application, or self-modifying code in the traditional sense. It is a single HTML file that, when opened, executes code to rewrite its own narrative sections from embedded data, generating current summaries, conclusions, and analytical text based on the dataset's current state. This is a category that does not yet have a name. Borrowing from taxonomic nomenclature, these artifacts are termed digital paratypes.

Longitudinal datasets often retain scientific value for decades. The interactive visualizations and data payloads that accompany them often do not. Many modern web deployments depend on layered toolchains, build pipelines, CDNs, and versioned packages that can disappear or become difficult to reproduce over time. When a visualization is tightly coupled to that tooling stack, the research may remain intact while the interface becomes unusable. This is an archival risk: the record still exists, but the instrument for exploring it cannot be reliably executed in the future.

Physical archives can be lost to fire. Water damage can erase them. War can destroy them. Simple attrition can remove them, quietly. Digital artifacts fail in a different way. They depend on infrastructure that degrades over time. Frameworks churn. CDNs break. APIs are deprecated. Package managers become inaccessible.

This pattern might be termed "complexity theater": sophisticated architectural choices designed for enterprise-scale challenges, applied to research artifacts that require longevity rather than scalability. The infrastructure is impressive. The tooling is elaborate. But for archival purposes, the complexity becomes the vulnerability.

The digital preservation community has long recognized that replication protects against institutional failure. Stanford's LOCKSS program ("Lots of Copies Keep Stuff Safe") pioneered distributed preservation networks for academic journals, demonstrating that geographically dispersed copies under independent administration provide robust defense against content loss. However, these systems require coordinated institutional infrastructure: networked repositories, automated auditing protocols, and persistent organizational commitment.

A zero-dependency architecture inverts this model. Instead of institutions maintaining copies through active curation, the artifact's self-contained portability enables friction-free distribution. The preservation strategy shifts from custody to replication. A single HTML file can be copied indefinitely. Redundant storage is easy. Opening the file requires no installation. A network connection is optional. Physical type specimens are fragile because they cannot be duplicated. Framework-dependent visualizations are fragile because they cannot be duplicated as working objects. Zero-dependency artifacts survive through distribution itself.

There is no single specimen to protect. The file functions as the reference artifact. Every copy remains complete. Distribution is the defense.

This approach applies Fernand Braudel's la longue durée to digital preservation, framing the work in centuries rather than quarterly cycles. By looking past transient frameworks and architectural trends, the focus remains on enduring web standards—the infrastructural layer that survives technological shifts.

For the Michigan Cannabis market dataset, the design constraint was explicit: the complete interactive paper must run as a single HTML file opened locally in a standard browser. The artifact must function offline, without installation steps, and without external requests.

The dataset is structured using a domain-specific ontology optimized for compactness - multiple years of regulatory data totaling 34KB through consistent schema design. Each monthly update triggers the document to rewrite its own narrative sections, regenerating summaries and updating visualizations from the embedded data payload.

The "Evergreen" Architecture

The solution utilizes a pattern I call "Gojira & Gamera." It separates the data payload from the visualization engine, keeping both within the DOM but logically distinct.

1. Gojira (The Data Engine)

Instead of fetching data from an external database or a CSV file, the full dataset is embedded directly into the HTML as a pre-hydrated JSON object. This avoids asynchronous loading states and network latency entirely.

// Example of the "Gojira" embedded data structure
const rawData = [
    ["10/1/2019", "Oct", 2019, "$28,594,402", ...],
    ["11/1/2019", "Nov", 2019, "$26,628,235", ...],
    // ... 60+ months of data
];
Listing 1: Data embedding strategy. The raw array structure injects the complete dataset into the DOM, eliminating fetch requests.
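
Because each row is a positional array rather than a keyed object, field names and types live once in the processing logic instead of being repeated in every record, which keeps the payload compact. The sketch below shows one way such rows might be hydrated into named, typed records; the column order and the hydrateRows helper are illustrative assumptions, not the production schema.

// Illustrative hydration step: map positional rows to named, typed records.
// Column order and field names are assumptions, not the production schema.
const hydrateRows = (rows) =>
    rows.map(([date, monthLabel, year, totalSales]) => ({
        date: new Date(date),                                // "10/1/2019" -> Date
        monthLabel,                                          // "Oct"
        year,                                                // 2019
        totalSales: Number(totalSales.replace(/[$,]/g, ''))  // "$28,594,402" -> 28594402
    }));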

2. Gamera (The Visualization Logic)

The application logic uses an IIFE (Immediately Invoked Function Expression) to encapsulate state, preventing pollution of the global namespace. It utilizes Vanilla JavaScript to manipulate the DOM directly, bypassing the Virtual DOM overhead inherent in complex frameworks.

// Scope-Safe Application Logic
const App = (() => {
    const state = {
        data: [],
        charts: {},
        activeFilter: 'flower'
    };

    const init = () => {
        // Hydrate data directly from memory; Logic and Render are
        // helper modules defined elsewhere in the same file.
        state.data = Logic.process(rawData);
        Render.charts();
    };

    return { init };
})();
Listing 2: Scope encapsulation. The IIFE creates an isolated execution context, keeping application state out of the global namespace.
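
Because the data is already resident in memory, initialization does not wait on any network callback; one minimal wiring, assuming the App object from Listing 2, runs it as soon as the document has been parsed.

// Run initialization once the DOM is parsed.
// No loading state is needed: the data is already embedded in the page.
document.addEventListener('DOMContentLoaded', App.init);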

For the visualization layer, the project relies on Chart.js. To strictly adhere to the zero-dependency constraint, the library was treated not as a remote dependency to be fetched, but as a brick cemented into the wall of the application. Using the practice known as "vendoring," the minified library source was embedded directly into the document, ensuring the visualization engine functions as immutable infrastructure intended to outlast any package manager or CDN.
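
Concretely, vendoring reduces to inlining the library source ahead of the data payload and the application logic inside one HTML document. The skeleton below is a simplified sketch of that layout, with placeholder comments and an illustrative canvas id, not the production markup.

<!-- Simplified single-file layout (illustrative, not the production markup) -->
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Artifact title</title>
  <style>/* all styles inlined here */</style>
</head>
<body>
  <canvas id="salesChart"></canvas>
  <script>/* Chart.js source, minified and pasted verbatim (vendored) */</script>
  <script>/* "Gojira": the embedded rawData payload (Listing 1) */</script>
  <script>/* "Gamera": the App IIFE and rendering logic (Listing 2) */</script>
</body>
</html>

Script order matters: the vendored library is parsed first, then the data, then the logic that consumes both.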

3. Zero-Dependency Styling

The architecture uses CSS Custom Properties for theming. This allows for instant Dark/Light mode switching through a single attribute change, eliminating the need to parse or load external CSS libraries.
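
A minimal sketch of the pattern, with illustrative token names rather than the production stylesheet: theme tokens are declared once on :root, overridden under a data attribute, and every rule reads the tokens.

/* Illustrative theme tokens; names are examples, not the production stylesheet */
:root {
  --bg: #ffffff;
  --fg: #1a1a1a;
}
[data-theme="dark"] {
  --bg: #121212;
  --fg: #e0e0e0;
}
body {
  background: var(--bg);
  color: var(--fg);
}

Switching themes is then a single attribute change, for example document.documentElement.setAttribute('data-theme', 'dark'), with no additional stylesheet to load or parse.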

Performance & "Living" Behavior

The architecture is built entirely on native ECMAScript. When architected correctly, native ECMAScript can outperform complex frameworks for evergreen artifacts whose narrative evolves dynamically as the dataset grows.

Unlike a static report, this document functions as a state engine. Upon loading the JSON payload, the document performs a "self-audit," recalculating the temporal duration (e.g., updating "6-Year Trends" to "7-Year Trends"), refreshing the citation year, and revising summary statistics in the DOM text. Because this logic executes linearly without the overhead of framework hydration or Virtual DOM diffing, the document achieves near-immediate First Contentful Paint (FCP) and near-instant interactivity, regardless of the dataset's growing size.
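
A condensed sketch of such a self-audit, assuming hydrated records like those in the earlier hydration sketch (sorted oldest to newest) and illustrative element IDs:

// Illustrative self-audit: derive durations and labels from the embedded data
// and write them back into the document. Element IDs are assumptions.
const selfAudit = (records) => {
    const first = records[0].date;
    const last = records[records.length - 1].date;
    const months = (last.getFullYear() - first.getFullYear()) * 12
                 + (last.getMonth() - first.getMonth()) + 1;           // e.g. 74
    const yearsSpanned = last.getFullYear() - first.getFullYear() + 1; // e.g. 7

    document.getElementById('trend-heading').textContent = `${yearsSpanned}-Year Trends`;
    document.getElementById('citation-year').textContent = String(last.getFullYear());
    document.getElementById('month-count').textContent = `${months} months`;
};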

Catchpoint Synthetic Monitoring reports provide controlled, repeatable performance measurements from their global test nodes under standardized conditions (100ms latency, 5Mbps throughput). These lab-grade metrics, including start render, first contentful paint, and total blocking time, validate the artifact’s sub-second rendering performance in a consistent testing environment—offering a reproducible benchmark distinct from real-user variability.

Google PageSpeed Insights (PSI) audits were conducted on the production build to evaluate the effectiveness of the optimization strategy. The results confirmed the stability goals while highlighting the trade-offs inherent in a monolithic design.

The artifact is a functional template. To fork it: save the single HTML page, then update the <title> and replace the domain-specific content with your own. This works through direct derivation: each new artifact starts as a copy of this complete and self-contained reference.

Snapshot Date: 2026-01-06 (Production Build)

Start Render    0.5s
FCP             0.8s
LCP             0.8s
Speed Index     0.8s
TBT             0.1s
CLS             0.002
Page Weight     250KB
DC Time         0.8s
DC Bytes        250KB
Total Time      1.0s

Figure 1a: Catchpoint performance metrics for the artifact, captured from the production deployment to provide a repeatable baseline.

Limitations: The 5MB Ceiling

This system prioritizes stability and long-term preservation over the scalability required for large transactional or streaming data applications. It is designed for typical scientific publishing, where datasets are modest in size but require reliable longevity.

The architecture is implemented as a zero-dependency, single HTML file. This design introduces a practical performance ceiling of approximately 5MB, beyond which user experience degrades due to download and parsing latency. Within this constraint, performance remains predictable and manageable.

This architecture is designed for single-author research workflows, not collaborative team development. The monolithic structure prevents the division of labor typical in modern web development - where CSS specialists, JavaScript developers, backend engineers, and content writers work in parallel on separate files merged through build processes. Digital paratypes trade collaborative infrastructure for archival independence.

However, this constraint enables a different form of collaboration: derivative reuse. The artifact can be copied, modified, and forked for new work without institutional permission or infrastructure dependencies. The embedded data can be extracted and repurposed. A digital paratype can be forked to create new independent artifacts - each complete and functional, requiring no coordination with the original author. This differs from collaborative platforms, where data access depends on ongoing institutional support and server infrastructure.
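
As one concrete extraction path, assuming the embedded array keeps the rawData name shown in Listing 1, the full payload can be lifted out of any opened copy from the browser's developer console:

// In the DevTools console of an opened copy: serialize the embedded payload
// and place it on the clipboard for reuse. copy() is a console utility
// provided by the browser's developer tools, not part of the page itself.
copy(JSON.stringify(rawData, null, 2));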

Arithmetic Projections

The data structure is lean and purpose‑built. The dataset grows at approximately 5.3 KB per year. With a 34 KB payload representing seven years of production data and a 5 MB ceiling, the architecture can remain performant for centuries, well beyond the stated 20‑year archival target.

Metric                      Value
Current payload size        34 KB
Annual growth rate          ≈5.3 KB/year
Ceiling (practical limit)   5 MB
Years to ceiling            ≈900 years
Size at 20 years            ≈140 KB
(Derived from Michigan CRA data)

Source: Michigan Digital Paratype archive index (74 months, October 2019–November 2025).
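
The projection reduces to simple arithmetic; a minimal check using the values in the table above:

// Back-of-the-envelope check of the projections above.
const payloadKB = 34;          // current payload
const growthKBPerYear = 5.3;   // observed annual growth
const ceilingKB = 5 * 1024;    // ≈5 MB practical limit

const yearsToCeiling = (ceilingKB - payloadKB) / growthKBPerYear;
// ≈ 960, the same order of magnitude as the ≈900-year figure above
const sizeAt20Years = payloadKB + 20 * growthKBPerYear; // 140 KB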

Conclusion

This work shows that a zero-dependency, single-file HTML publication can support interactive data storage and multi-series visualization while minimizing long-term operational requirements. By embedding the dataset and vendoring the visualization code, the artifact remains executable offline and avoids failure modes associated with external infrastructure.

This work demonstrates what appears to be a novel document architecture: artifacts that regenerate their own analysis from embedded data. While self-contained HTML documents and executable research compendia exist separately, no examples were found combining zero-dependency execution with self-modifying narrative generation. The Michigan Cannabis analysis contains the complete logic to rewrite its narrative sections when new monthly data is inserted. Each version can produce the next version. This is not traditional reproducible research, where external users verify results by re-running code. This is self-modification by design: the document maintains its own coherence through execution.

The implications extend beyond technical novelty. If documents can contain the process of their own regeneration, they become digital paratypes in a deeper sense than taxonomic metaphor suggests. They preserve not just content, but the machinery of authorship itself. Each copy distributed is a complete, functional replica capable of producing future iterations.

The approach is intended for modest-to-medium datasets where durability and reproducibility take precedence over large-scale streaming, complex authentication, or rapid feature turnover. Within the practical file-size ceiling described above, a monolithic HTML artifact can function as a stable, citable interface for longitudinal research and archival distribution. At 325KB for several years of data, this approach demonstrates preservation through viral distribution rather than institutional custody. The artifact survives by being copied, not protected.

For data that must persist, distribution remains the defense. This is a working implementation, not a proposal. The methodology is demonstrated at https://tjid3.org/.