The 20-Year File:
A Zero-Dependency Architecture for Digital Paratypes

T. M. Jones, PhD

TJID3 Technical Note • Published: December 2025

DOI: 10.5281/zenodo.18054789

Abstract

This technical note describes a practical way to publish datasets and visualizations as a single, durable HTML file that runs offline. The implementation uses only native browser standards: vanilla JavaScript (standard ECMAScript) and CSS3. It vendors all visualization code into the document. The dataset is stored inline, eliminating fetch calls and external dependencies. Borrowing from taxonomy: if the digital holotype is the raw dataset, this note describes a "digital paratype," which packages analysis and figures alongside data in a single archival file. The methodology is demonstrated in the Michigan Cannabis Market Analysis (DOI: 10.5281/zenodo.18065297), a longitudinal economic dataset (74 months at publication).

Unlike a static PDF, the document rewrites itself at load time, regenerating durations, refreshing labels, and updating summary text from embedded data. This keeps the narrative aligned as the dataset grows. In the "evergreen" workflow, replacing the monthly JSON payload triggers automatic regeneration without a rebuild. The same approach supports multilingual publication; six languages were added without an API. Performance validation using PageSpeed Insights shows strong interactivity and accessibility on typical devices. A linear growth projection suggests that multi-century operability is arithmetically possible.

Unlike physical specimens that require protection from damage, these digital artifacts survive through distribution. The preservation strategy shifts from custody to replication. At 325KB for seven years of data, the artifact functions as "viral data": it can be distributed via email, thumb drives, or web hosting while remaining completely self-contained.

Availability: The live artifact is accessible at https://tjid3.org/

The Context: Archival Risk

This methodology paper describes a self-modifying HTML architecture demonstrated in the Michigan Cannabis Market Analysis (DOI: 10.5281/zenodo.18065297), a longitudinal economic dataset tracking 74 months of regulatory data. The artifact executes its own code to regenerate narrative sections each time new data is added, maintaining alignment between analysis and evidence without external build processes.

This is neither a static webpage, a web application, nor self-modifying code in the traditional sense. It is a single HTML file that, when opened, executes code to rewrite its own narrative sections from embedded data, generating current summaries, conclusions, and analytical text based on the dataset's current state. This category does not yet have a name. Borrowing from taxonomic nomenclature, these artifacts are termed digital paratypes.

Longitudinal datasets often retain scientific value for decades. The interactive visualizations and data payloads that accompany them often do not. Many modern web deployments depend on layered toolchains, build pipelines, CDNs, and versioned packages that can disappear or become difficult to reproduce over time. When a visualization is tightly coupled to that tooling stack, the research may remain intact while the interface becomes unusable. This is an archival risk: the record still exists, but the instrument for exploring it cannot be reliably executed in the future.

Physical archives can be lost to fire. Water damage can erase them. War can destroy them. Simple attrition can remove them, quietly. Digital artifacts fail in a different way. They depend on infrastructure that degrades over time. Frameworks churn. CDNs break. APIs are deprecated. Package managers become inaccessible.

This pattern might be termed "complexity theater": sophisticated architectural choices designed for enterprise-scale challenges are applied to research artifacts that require longevity over scalability. The infrastructure is impressive. The tooling is elaborate. But for archival purposes, the complexity becomes the vulnerability.

The digital preservation community has long recognized that replication protects against institutional failure. Stanford's LOCKSS program ("Lots of Copies Keep Stuff Safe") pioneered distributed preservation networks for academic journals, demonstrating that geographically dispersed copies under independent administration provide robust defense against content loss. However, these systems require coordinated institutional infrastructure: networked repositories, automated auditing protocols, and persistent organizational commitment.

A zero-dependency architecture inverts this model. Instead of institutions maintaining copies through active curation, the artifact's self-contained portability enables friction-free distribution. The preservation strategy shifts from custody to replication. A single HTML file can be copied indefinitely. Redundant storage is easy. Opening the file requires no installation. A network connection is optional. Physical type specimens are fragile because they cannot be duplicated. Framework-dependent visualizations are fragile because they cannot be duplicated as working objects. Zero-dependency artifacts survive through distribution itself.

There is no single specimen to protect. The file functions as the reference artifact. Every copy remains complete. Distribution is the defense.

This approach applies Fernand Braudel's la longue durée to digital preservation, framing the work in centuries rather than quarterly cycles. By looking past transient frameworks and architectural trends, the focus remains on enduring web standards—the infrastructural layer that survives technological shifts.

For the Michigan Cannabis market dataset, the design constraint was explicit: the complete interactive paper must run as a single HTML file opened locally in a standard browser. The artifact must function offline, without installation steps, and without external requests.

The dataset is structured using a domain-specific ontology optimized for compactness: consistent schema design keeps multiple years of regulatory data to a 34KB payload. Each monthly update triggers the document to rewrite its own narrative sections, regenerating summaries and updating visualizations from the embedded data payload.

The "Evergreen" Architecture

The solution uses a pattern I call "Gojira & Gamera." It separates the data payload from the visualization engine, keeping both within the same document but logically distinct.

1. Gojira (The Data Engine)

Instead of fetching data from an external database or a CSV file, the full dataset is embedded directly into the HTML as a pre-hydrated JSON object. This avoids asynchronous loading states and network latency entirely.

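// Example of the "Gojira" embedded data structure
// Each row: [date, month label, year, dollar amount, ...further columns elided]
const rawData = [
    ["10/1/2019", "Oct", 2019, "$28,594,402" /* , ... */],
    ["11/1/2019", "Nov", 2019, "$26,628,235" /* , ... */],
    // ... remaining months of data (74 at publication)
];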

Listing 1: Data embedding strategy. The raw array places the complete dataset in the document's own script, so it is available in memory at load time without any fetch request.
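Rows in this shape still need to be converted to typed values before charting. The following is a minimal sketch of that step, not the artifact's actual code: the helper names (`parseCurrency`, `parseDate`, `parseRow`) and the record field names are assumptions for illustration, and only the four columns visible in Listing 1 are used.

```javascript
// Sample limited to the four columns visible in Listing 1.
const sampleRows = [
    ["10/1/2019", "Oct", 2019, "$28,594,402"],
    ["11/1/2019", "Nov", 2019, "$26,628,235"]
];

// Convert a currency string such as "$28,594,402" to a number.
const parseCurrency = (text) => Number(text.replace(/[$,]/g, ""));

// Convert "M/D/YYYY" to a UTC Date, avoiding locale-dependent Date parsing.
const parseDate = (text) => {
    const [month, day, year] = text.split("/").map(Number);
    return new Date(Date.UTC(year, month - 1, day));
};

// Hypothetical record shape; the field names are not taken from the artifact.
const parseRow = ([date, monthLabel, year, amount]) => ({
    date: parseDate(date),
    monthLabel,
    year,
    amount: parseCurrency(amount)
});

const records = sampleRows.map(parseRow);
console.log(records[0].amount); // 28594402
```

Parsing the date components by hand rather than passing the string to `new Date()` keeps the result independent of browser locale handling, which matters for a file meant to behave identically decades from now.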

2. Gamera (The Visualization Logic)

The application logic uses an IIFE (Immediately Invoked Function Expression) to encapsulate state, preventing pollution of the global namespace. It utilizes Vanilla JavaScript to manipulate the DOM directly, bypassing the Virtual DOM overhead inherent in complex frameworks.

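// Scope-Safe Application Logic
// Logic and Render are sibling modules that are not shown in this listing.
const App = (() => {
    // Private state: reachable only through this closure
    const state = {
        data: [],
        charts: {},
        activeFilter: 'flower'
    };

    const init = () => {
        // Hydrate data directly from memory
        state.data = Logic.process(rawData);
        Render.charts();
    };

    // Expose only the entry point
    return { init };
})();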

Listing 2: Scope encapsulation. The IIFE keeps application state private to its closure and exposes only init, avoiding collisions in the global namespace.

For the visualization layer, the project relies on Chart.js. To strictly adhere to the zero-dependency constraint, the library was treated not as a remote dependency to be fetched, but as a brick cemented into the wall of the application. Technically known as "vendoring," the source code was minified and embedded directly into the document, ensuring the visualization engine functions as immutable infrastructure that is intended to outlast any package manager or CDN.
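One way to check that vendoring is complete is to scan the finished file for resources that would still be fetched over the network. The sketch below is a heuristic under stated assumptions, not part of the artifact: `findExternalResources` is a hypothetical helper, it inspects only `<script src>`, `<img src>`, and `<link href>` attributes, and it does not detect CSS `url()` references or requests made from script.

```javascript
// Return the URLs of resources the page would load from the network.
// Ordinary hyperlinks (<a href>) are navigation, not dependencies, so they are ignored.
const findExternalResources = (html) => {
    const pattern = /<(?:script|img)\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']|<link\b[^>]*?\bhref\s*=\s*["']([^"']+)["']/gi;
    const found = [];
    for (const match of html.matchAll(pattern)) {
        const url = match[1] || match[2];
        // Absolute (http/https) and protocol-relative (//host) URLs need a network request.
        if (/^(?:https?:)?\/\//i.test(url)) found.push(url);
    }
    return found;
};

// A vendored page: library code is inline, so nothing is reported.
const vendoredPage = '<html><head><script>var Chart = {};</script></head>' +
    '<body><a href="https://tjid3.org/">site</a></body></html>';
console.log(findExternalResources(vendoredPage).length); // 0
```

An empty result is necessary but not sufficient evidence of zero dependencies; opening the file with the network disabled remains the definitive test.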

3. Zero-Dependency Styling

The architecture uses CSS Custom Properties for theming. This allows for instant Dark/Light mode switching through a single attribute change, eliminating the need to parse or load external CSS libraries.
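The single attribute change can be sketched as follows. This is an illustration under assumptions rather than the artifact's code: the attribute name `data-theme`, the values `dark` and `light`, and the button id `theme-toggle` are all hypothetical, and the stylesheet (not shown) is assumed to redefine its custom properties under a selector such as `[data-theme="dark"]`.

```javascript
// Flip the theme attribute on a root element and return the new value.
// Accepts any object with getAttribute/setAttribute, e.g. document.documentElement.
const toggleTheme = (root) => {
    const next = root.getAttribute("data-theme") === "dark" ? "light" : "dark";
    root.setAttribute("data-theme", next);
    return next;
};

// Browser wiring, skipped when no DOM is available.
if (typeof document !== "undefined") {
    const button = document.getElementById("theme-toggle"); // hypothetical id
    if (button) {
        button.addEventListener("click", () => toggleTheme(document.documentElement));
    }
}
```

Because every themed rule reads its colors from custom properties, the browser restyles the page as soon as the attribute changes; no stylesheet is loaded or swapped.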

Performance & "Living" Behavior

The architecture is written entirely in native ECMAScript. When architected carefully, native ECMAScript can outperform complex frameworks for evergreen artifacts, where the narrative evolves dynamically as the dataset grows.

Unlike a static report, this document functions as a state engine. Upon loading the embedded JSON payload, the document performs a "self-audit": it recalculates the temporal duration (e.g., updating "6-Year Trends" to "7-Year Trends"), refreshes the citation year, and revises summary statistics in the DOM text. Because this logic executes linearly without the overhead of framework hydration or Virtual DOM diffing, the document achieves near-immediate First Contentful Paint (FCP) and near-instant interactivity at the payload sizes discussed in this note.
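A self-audit of this kind can be sketched as two small functions: one derives display strings from the rows, the other writes them into placeholder elements. Everything named here is an assumption for illustration: `summarize`, `applySummary`, the `data-bind` attribute, and in particular the rule that the trend label counts distinct calendar years, which the note does not specify.

```javascript
// Sample rows limited to the columns visible in Listing 1.
const sampleRows = [
    ["10/1/2019", "Oct", 2019, "$28,594,402"],
    ["11/1/2019", "Nov", 2019, "$26,628,235"]
];

// Derive display strings from the data instead of hard-coding them.
const summarize = (rows) => {
    const years = new Set(rows.map((row) => row[2])); // third column holds the year
    const first = rows[0];
    const last = rows[rows.length - 1];
    return {
        monthCount: rows.length,
        trendLabel: `${years.size}-Year Trends`, // assumed rule: distinct calendar years
        coverage: `${first[1]} ${first[2]} to ${last[1]} ${last[2]}`
    };
};

// Write each derived string into its placeholder element, if one exists.
const applySummary = (doc, summary) => {
    for (const [key, text] of Object.entries(summary)) {
        const node = doc.querySelector(`[data-bind="${key}"]`); // hypothetical attribute
        if (node) node.textContent = String(text);
    }
};

const summary = summarize(sampleRows);
console.log(summary.trendLabel); // "1-Year Trends" for the two sample rows
if (typeof document !== "undefined") applySummary(document, summary);
```

Appending a row for a new calendar year changes the label on the next load with no edit to the prose, which is the behavior the "evergreen" workflow relies on.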

Catchpoint Synthetic Monitoring reports provide controlled, repeatable performance measurements from their global test nodes under standardized conditions (100ms latency, 5Mbps throughput). These lab-grade metrics, including start render, first contentful paint, and total blocking time, validate the artifact’s sub-second rendering performance in a consistent testing environment—offering a reproducible benchmark distinct from real-user variability.

Google PageSpeed Insights (PSI) audits were conducted on the production build to evaluate the effectiveness of the optimization strategy. The results confirmed the stability goals while highlighting the trade-offs inherent in a monolithic design.

The artifact is a functional template. To fork it: save the single HTML page, then update the <title> and replace the domain-specific content with your own. This works through direct derivation: each new artifact starts as a copy of this complete and self-contained reference.

Snapshot Date: 2026-01-06 (Production Build)

Metric	Value
Start Render	0.5s
FCP	0.8s
LCP	0.8s
Speed Index	0.8s
TBT	0.1s
CLS	0.002
Page Weight	250KB
DC Time	0.8s
DC Bytes	250KB
Total Time	1.0s

Figure 1a: Catchpoint performance metrics for the artifact, captured from the production deployment to provide a repeatable baseline.

Limitations: The 5MB Ceiling

This system prioritizes stability and long-term preservation over the scalability required for large transactional or streaming data applications. It is designed for typical scientific publishing, where datasets are modest in size but require reliable longevity.

The architecture is implemented as a zero-dependency, single HTML file. This design introduces a practical performance ceiling of approximately 5MB, beyond which user experience degrades due to download and parsing latency. Within this constraint, performance remains predictable and manageable.

This architecture is designed for single-author research workflows, not collaborative team development. The monolithic structure prevents the division of labor typical in modern web development - where CSS specialists, JavaScript developers, backend engineers, and content writers work in parallel on separate files merged through build processes. Digital paratypes trade collaborative infrastructure for archival independence.

However, this constraint enables a different form of collaboration: derivative reuse. The artifact can be copied, modified, and forked for new work without institutional permission or infrastructure dependencies. The embedded data can be extracted and repurposed. A digital paratype can be forked to create new independent artifacts - each complete and functional, requiring no coordination with the original author. This differs from collaborative platforms, where data access depends on ongoing institutional support and server infrastructure.
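Extracting the embedded data for reuse can be as simple as serializing the array to CSV. The sketch below assumes the row shape from Listing 1; the helper names and the header labels are hypothetical. Quoting matters here because the dollar amounts contain commas.

```javascript
// Quote a field when it contains a comma, double quote, or line break (RFC 4180 style).
const escapeCsvField = (value) => {
    const text = String(value);
    return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
};

// Join a header row and data rows into CSV text with CRLF line endings.
const toCsv = (header, rows) =>
    [header, ...rows]
        .map((row) => row.map(escapeCsvField).join(","))
        .join("\r\n");

const header = ["date", "month", "year", "amount"]; // hypothetical column names
const rows = [
    ["10/1/2019", "Oct", 2019, "$28,594,402"],
    ["11/1/2019", "Nov", 2019, "$26,628,235"]
];

console.log(toCsv(header, rows));
```

Run from the browser console against the embedded array, a function like this yields a file that opens in any spreadsheet, with no access to the original author's infrastructure.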

Arithmetic Projections

The data structure is lean and purpose‑built. The dataset grows at approximately 5.3 KB per year. With a 34 KB payload representing seven years of production data and a 5 MB ceiling, the architecture can remain performant for centuries, well beyond the stated 20‑year archival target.

Quantity	Value
Current payload size	34 KB
Annual growth rate	≈5.3 KB/year
Ceiling (practical limit)	5 MB
Years to ceiling	≈900 years
Size at 20 years	≈140 KB

Derived from Michigan CRA data

Source: Michigan Digital Paratype archive index (74 months, October 2019–November 2025).
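The projection is plain linear arithmetic and can be reproduced in a few lines. This sketch assumes constant growth at the stated rate and binary units (1 MB = 1024 KB); with decimal units the result is about 2% lower. The variable names are illustrative.

```javascript
const KB_PER_MB = 1024;        // assumption: binary units
const currentKb = 34;          // current payload size
const growthKbPerYear = 5.3;   // stated annual growth
const ceilingKb = 5 * KB_PER_MB;

// Years until the payload alone reaches the ceiling, assuming linear growth.
const yearsToCeiling = (ceilingKb - currentKb) / growthKbPerYear;

// Projected payload size a given number of years from now.
const sizeAt = (years) => currentKb + growthKbPerYear * years;

console.log(Math.round(yearsToCeiling)); // 960
console.log(Math.round(sizeAt(20)));     // 140
```

The unrounded result is a little over 900 years, consistent with the table's conservative ≈900 figure. The calculation covers only the data payload; the vendored library, markup, and styles occupy a roughly fixed share of the file and are not modeled here.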

Conclusion

This work shows that a zero-dependency, single-file HTML publication can support interactive data storage and multi-series visualization while minimizing long-term operational requirements. By embedding the dataset and vendoring the visualization code, the artifact remains executable offline and avoids failure modes associated with external infrastructure.

This work demonstrates what appears to be a novel document architecture: artifacts that regenerate their own analysis from embedded data. While self-contained HTML documents and executable research compendia exist separately, no examples were found that combine zero-dependency execution with self-modifying narrative generation. The Michigan Cannabis analysis contains the complete logic to rewrite its narrative sections when new monthly data is inserted. Each version can produce the next version. This is not traditional reproducible research, where external users verify results by re-running code. This is self-modification by design, in which the document maintains its own coherence through execution.

The implications extend beyond technical novelty. If documents can contain the process of their own regeneration, they become digital paratypes in a deeper sense than taxonomic metaphor suggests. They preserve not just content, but the machinery of authorship itself. Each copy distributed is a complete, functional replica capable of producing future iterations.

The approach is intended for modest-to-medium datasets where durability and reproducibility take precedence over large-scale streaming, complex authentication, or rapid feature turnover. Within the practical file-size ceiling described above, a monolithic HTML artifact can function as a stable, citable interface for longitudinal research and archival distribution. At 325KB for several years of data, this approach demonstrates preservation through viral distribution rather than institutional custody. The artifact survives by being copied, not protected.

For data that must persist, distribution remains the defense. This is a working implementation, not a proposal. The methodology is demonstrated at https://tjid3.org/.

