The 20-Year File: A Self-Contained Format for Durable Scholarly Publication
Abstract
The 20-Year File is a single archival HTML artifact that carries its own data, logic, styling, figures, and narrative. It opens locally in a standard browser, runs without installation, and makes no external requests. The point is not nostalgia. The point is reducing the number of things that can break. This note describes the architecture through the Michigan Cannabis Market Analysis (DOI: 10.5281/zenodo.18065297), a longitudinal economic dataset with 74 months of regulatory data at publication. The raw data are embedded as compact JSON. The browser reads that payload, recalculates the summaries, refreshes labels, and redraws the figures at load time. Borrowing from taxonomy, the raw dataset can be treated as a digital holotype. The working paper, with data, analysis, interface, and explanatory machinery bundled together, functions as a digital paratype. It is not a PDF imitation. It is an executable scholarly object.
A durable research artifact should not depend on a CDN, package manager, server route, framework hydration step, or institutional dashboard to remain legible.
Availability: The live artifact is accessible at:
Live Artifact https://tjid3.org/
The Context: Archival Risk
Longitudinal datasets can remain useful for decades. Their interfaces often do not. A chart may be scientifically valuable while the framework, CDN, build pipeline, or API that rendered it has already disappeared. This is the failure mode addressed here: the record survives, but the instrument for exploring it does not. Physical archives fail through fire, water, or neglect. Digital artifacts fail through dependency decay.
Modern publishing often solves short-term deployment problems by adding layers. For archival work, those layers become liabilities. Frameworks churn. APIs are deprecated. Package managers change. Browsers remain, but the scaffolding around a project may vanish.
The 20-Year File inverts that risk. The artifact collapses the stack into one browser-readable file: no server, no build step, no login. The complete interactive paper runs as one HTML file, locally, in a standard browser, with data and logic embedded inside it. A file that can be emailed, mirrored, forked, and opened offline is easier to preserve because it is easier to distribute. It is a digital postcard.
The Evergreen Architecture
The architecture separates the artifact into three internal parts, but keeps all three inside the same file. The pattern is “Gojira & Gamera”: one block carries the data, another carries the logic, and inline CSS carries the presentation.
The architecture did not begin as a theory. It began as impatience. The goal was a living document that could be updated without rebuilding a website, repairing a dashboard, or changing the same values in six places. The practical answer was to pull the data into the file, let the browser recalculate the summaries, and let the page rewrite its visible state from that embedded payload. Only later did the preservation argument become obvious. Once the data, logic, figures, styling, and narrative were traveling together, the artifact had fewer ways to break. What began as a lazy maintenance trick became an archival design rule: keep the object complete, keep the stack small, and make each copy capable of running on its own.
The file is not a framework, but it can be used like one. A careful reader can save it, strip the dataset, replace the payload, revise the labels, and produce a new working artifact without installing anything. The method travels as an object, not a service.
1. Gojira, the data engine
The dataset is embedded directly into the document as a compact JSON-style array. There is no loading spinner because there is no fetch.
// Example embedded data structure
const rawData = [
  ["10/1/2019", "Oct", 2019, "$28,594,402", ...],
  ["11/1/2019", "Nov", 2019, "$26,628,235", ...]
];
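Because each row stores display-ready strings, the logic layer has to turn them into numbers before doing any arithmetic. The following is a minimal sketch of that step, not the artifact's actual code. It assumes the four visible columns are a date, a month label, a year, and a dollar-formatted amount; the elided columns are left out rather than guessed. The names `sampleRows`, `parseCurrency`, and `parseRow` are hypothetical.

```javascript
// Hypothetical helpers; names and column meanings are assumptions.
// Only the four columns visible in the example rows are handled.
const sampleRows = [
  ["10/1/2019", "Oct", 2019, "$28,594,402"],
  ["11/1/2019", "Nov", 2019, "$26,628,235"]
];

// Strip the dollar sign and thousands separators, then convert to a number.
function parseCurrency(text) {
  const value = Number(String(text).replace(/[$,]/g, ""));
  if (!Number.isFinite(value)) {
    throw new Error(`Unparseable currency value: ${text}`);
  }
  return value;
}

// Turn one positional row into a named record.
function parseRow(row) {
  const [date, monthLabel, year, amount] = row;
  return { date, monthLabel, year, amount: parseCurrency(amount) };
}

const records = sampleRows.map(parseRow);
console.log(records[0].amount); // numeric value, ready for sums and averages
```

Keeping the raw strings in the payload and parsing at load time means the embedded data stay human-readable, at the cost of one small conversion pass.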
2. Gamera, the state engine
The logic runs as a small, scoped JavaScript module. It reads the embedded data, computes the current state, and writes the visible page.
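// Scoped module: internal state stays private, only init is exposed.
const App = (() => {
  const state = { data: [], charts: {}, activeFilter: 'flower' };
  const init = () => {
    // Read the embedded payload, then redraw figures and rewrite summary text.
    state.data = Logic.process(rawData);
    Render.charts();
    Render.narrative();
  };
  return { init };
})();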
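The module above calls `Logic` and `Render` helpers whose bodies are not shown. As an illustration of the flow only, here is a runnable sketch with hypothetical stand-ins for both. It assumes the four visible columns of the example rows, and `Render.narrative` returns a string instead of writing to the page so the sketch can run outside a browser.

```javascript
// Hypothetical stand-ins for the Logic and Render helpers that App calls.
// Their real contents are not shown in the source; this only illustrates the flow.
const rawData = [
  ["10/1/2019", "Oct", 2019, "$28,594,402"],
  ["11/1/2019", "Nov", 2019, "$26,628,235"]
];

const Logic = {
  // Convert positional rows into records with a numeric amount.
  process: (rows) =>
    rows.map(([date, month, year, amount]) => ({
      date,
      month,
      year,
      amount: Number(amount.replace(/[$,]/g, ""))
    }))
};

const Render = {
  // Build the summary sentence; a page would assign it to an element's textContent.
  narrative: (data) => {
    const latest = data[data.length - 1];
    return `${data.length} months of data through ${latest.month} ${latest.year}.`;
  }
};

const App = (() => {
  const state = { data: [], summary: "" };
  const init = () => {
    state.data = Logic.process(rawData);
    state.summary = Render.narrative(state.data);
    return state;
  };
  return { init };
})();

const result = App.init();
console.log(result.summary);
```

The summary sentence is derived entirely from the payload, so appending a row changes the visible text without any manual edit.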
3. Vendored or native visualization
When a library is necessary, it is treated as material, not infrastructure. The code is vendored into the artifact rather than fetched from a CDN. When native SVG is enough, the artifact uses SVG directly.
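To show what "native SVG is enough" can look like, here is a minimal sketch that turns a numeric series into an inline SVG line chart using string building alone. The function name `sparklineSvg` and its default dimensions are hypothetical; the artifact's own drawing code may differ.

```javascript
// Hypothetical helper: draws a series as an inline SVG polyline, with no library.
function sparklineSvg(values, width = 300, height = 80) {
  if (values.length < 2) {
    throw new Error("Need at least two values to draw a line");
  }
  const min = Math.min(...values);
  const max = Math.max(...values);
  const range = max - min || 1; // avoid dividing by zero for a flat series
  const stepX = width / (values.length - 1);
  const points = values
    .map((v, i) => {
      const x = (i * stepX).toFixed(1);
      // SVG y grows downward, so the largest value maps to y = 0.
      const y = (height - ((v - min) / range) * height).toFixed(1);
      return `${x},${y}`;
    })
    .join(" ");
  return (
    `<svg viewBox="0 0 ${width} ${height}">` +
    `<polyline fill="none" stroke="currentColor" points="${points}"/></svg>`
  );
}

const svg = sparklineSvg([10, 20, 15]);
console.log(svg);
```

In a page, the returned string could be assigned to a container's `innerHTML`; nothing is fetched, so the figure works offline.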
The result is a working paper whose parts cannot drift apart. Data, code, interface, and explanation ship together as one object, a digital postcard that still runs.
Performance & Behavior
A zero-dependency artifact should feel fast because the browser is doing less ceremony. The file already contains its data. The logic is local. The interface does not wait for a framework to hydrate.
State engine behavior
On load, Gamera checks the embedded Gojira payload. It recalculates durations, updates labels, refreshes charts, and rewrites summary text from the data already inside the file.
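The duration recalculation can be sketched as follows. This is an assumption-laden illustration, not the artifact's code: it assumes column 0 of each row is an `M/D/YYYY` date string, as in the example rows, and `coverage` and `MONTH_NAMES` are hypothetical names.

```javascript
// Hypothetical helper: derive the coverage label from the first and last rows.
// Assumes column 0 is an M/D/YYYY date string, as in the example rows.
const MONTH_NAMES = [
  "January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December"
];

function coverage(rows) {
  const parse = (text) => {
    const [month, , year] = text.split("/").map(Number);
    return { month, year };
  };
  const first = parse(rows[0][0]);
  const last = parse(rows[rows.length - 1][0]);
  // Inclusive month count between the two endpoints.
  const months = (last.year - first.year) * 12 + (last.month - first.month) + 1;
  const name = (d) => `${MONTH_NAMES[d.month - 1]} ${d.year}`;
  return { months, label: `${months} months, ${name(first)} to ${name(last)}` };
}

const span = coverage([["10/1/2019"], ["11/1/2025"]]);
console.log(span.label);
```

With endpoints of October 2019 and November 2025, the inclusive count is 74 months, which matches the coverage stated for the dataset.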
This is the “evergreen” part of the method. A monthly JSON update does not require rebuilding the page. Replace Gojira, open the file, and the paper reconstitutes itself.
Reproducible baseline
Catchpoint and Google PageSpeed Insights provide a practical performance check. The production build remains small, fast, and stable because it avoids network calls and framework overhead.
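A complementary local check is to scan the file for references that would trigger a network request. The sketch below is a rough audit under stated assumptions: `findExternalReferences` is a hypothetical name, `cdn.example.com` is a placeholder domain, and a regular expression is not a full HTML parser.

```javascript
// Hypothetical audit: list src/href values on resource-loading tags that point
// at another host. A regex scan is a rough check, not an HTML parser; it will
// miss URLs assembled in script or referenced from CSS url(...).
function findExternalReferences(html) {
  const pattern =
    /<(?:script|link|img|iframe|source|audio|video)\b[^>]*?\b(?:src|href)\s*=\s*["']((?:https?:)?\/\/[^"']+)["']/gi;
  return Array.from(html.matchAll(pattern), (match) => match[1]);
}

const sample = `
  <link rel="stylesheet" href="https://cdn.example.com/site.css">
  <script src="//cdn.example.com/chart.js"><\/script>
  <a href="#limitations">Limitations</a>
  <img src="data:image/png;base64,AAAA">
`;

const found = findExternalReferences(sample);
console.log(found); // an empty array is the goal for a self-contained file
```

Ordinary hyperlinks in `<a>` tags are ignored on purpose, since they load nothing until a reader clicks them, and `data:` URIs count as embedded content.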
Forking the artifact
To fork the method, save the HTML file, rename the paper, replace the domain content, and swap in a new embedded dataset. The artifact starts complete. No build chain has to be resurrected.
Limitations: The 5 MB Ceiling
This architecture is not a general substitute for databases, collaborative software, authenticated dashboards, or streaming applications. It is for modest-to-medium research artifacts where longevity matters more than platform scale.
The practical ceiling is file size. Around 5 MB, download and parsing latency begin to matter on ordinary devices and networks. Below that ceiling, behavior remains predictable.
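One way to keep an eye on the ceiling is to measure the serialized payload directly. This is a minimal sketch, with `payloadBytes` and `headroom` as hypothetical names and 5 MB read as 5 × 1024 × 1024 bytes, which is an assumption. It measures only the data payload; the complete file also carries markup, styles, and script.

```javascript
// Hypothetical size check; the 5 MB figure is the practical ceiling named in the text,
// interpreted here in binary units (an assumption).
const CEILING_BYTES = 5 * 1024 * 1024;

// Byte length of the serialized payload when encoded as UTF-8.
function payloadBytes(data) {
  return new TextEncoder().encode(JSON.stringify(data)).length;
}

// Report how much of the ceiling the payload alone consumes.
function headroom(data, ceiling = CEILING_BYTES) {
  const bytes = payloadBytes(data);
  return { bytes, fractionUsed: bytes / ceiling, withinCeiling: bytes <= ceiling };
}

const check = headroom([["10/1/2019", "Oct", 2019, "$28,594,402"]]);
console.log(check);
```

`TextEncoder` is available in current browsers and in Node.js, so the same check runs in the page console or from a script.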
The monolithic structure also changes collaboration. It is poor for teams that need separate CSS, JavaScript, backend, content, and deployment workflows. It is excellent for copying, preserving, teaching, forking, and archiving a finished scholarly object.
The tradeoff is deliberate: fewer moving parts, fewer future failure points.
Arithmetic Projections
The longevity claim is not mystical. It is arithmetic. The current payload is small, and the annual data growth rate is modest.
At roughly 5.3 KB per year, a 34 KB payload reaches only about 140 KB after 20 years. Against a conservative 5 MB practical ceiling, that leaves centuries of headroom for this class of dataset.
| Quantity | Value |
|---|---|
| Current payload size | 34 KB |
| Annual growth rate | ≈5.3 KB/year |
| Ceiling (practical limit) | 5 MB |
| Years to ceiling | ≈900 years |
| Size at 20 years | ≈140 KB |
Source: Michigan Digital Paratype archive index (74 months, October 2019–November 2025).
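The projection can be reproduced in a few lines. This sketch assumes simple linear growth at the stated rate and treats 5 MB as 5,000 KB (decimal units), which is an assumption; the constant and function names are hypothetical.

```javascript
// Figures from the table. Treating 5 MB as 5,000 KB is an assumption (decimal units).
const CURRENT_KB = 34;
const GROWTH_KB_PER_YEAR = 5.3;
const CEILING_KB = 5000;

// Linear growth model: size(t) = current + rate * t
function sizeAfter(years) {
  return CURRENT_KB + GROWTH_KB_PER_YEAR * years;
}

// Years until the linear model reaches the ceiling.
function yearsToCeiling() {
  return (CEILING_KB - CURRENT_KB) / GROWTH_KB_PER_YEAR;
}

const sizeAt20 = Math.round(sizeAfter(20));
const yearsLeft = Math.round(yearsToCeiling());
console.log(`Size at 20 years: about ${sizeAt20} KB`);
console.log(`Years to ceiling: about ${yearsLeft}`);
```

Under these assumptions the model gives 140 KB at 20 years and roughly 937 years to the ceiling, in line with the rounded figures in the table. Using binary units (5,120 KB) moves the second figure to about 960 years, which does not change the conclusion.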
Conclusion
The 20-Year File makes a simple bet: if the data, logic, styling, figures, and explanation travel together, the work has a better chance of surviving. The Michigan Cannabis artifact is the working example. The data sit inside the file. The browser opens it, runs the logic, recalculates the summaries, redraws the figures, and gets on with it. No server has to wake up. No package manager has to behave. No dashboard has to still exist.
That is different from traditional reproducible research, where the reader may have to rebuild an environment before checking the result. Here, the environment is part of the paper. Each copy is the work. This method is not for everything, but for durable, modest-scale datasets, the value is plain: no server dependency, no package dependency, no CDN dependency, no framework hostage situation. For data that must persist, distribution is still the defense. Make the file small enough to move, complete enough to trust, and boring enough to survive.
Working Implementation https://tjid3.org/