Seeing Geologic Time:
Exponential Browser Testing

Or What Happens When You Feed a Kaiju to Chrome

T. M. Jones, PhD

Abstract

Computational artifacts rarely outlive the frameworks that built them. Current approaches rely on external dependencies: APIs, frameworks, platforms that inevitably deprecate and sunset. This paper proposes a zero-dependency HTML architecture termed a "digital paratype": a self-contained artifact that executes in any standards-compliant browser without runtime installation or external fetches.

This architecture was stress-tested using synthetic datasets scaled to 100,000 and 200,000 years. Vanilla implementations maintained logical integrity through 200,000 years, with failure occurring in browser rendering rather than artifact logic. React-like implementations degraded significantly at 100,000 years as virtual DOM reconciliation collapsed under geological-scale loads, demonstrating framework overhead becomes catastrophic at extreme timescales.

Testing used AI-assisted research sprints grounded in a real historical dataset. At human timescales, zero-dependency artifacts behave exactly like modern framework builds. The difference shows up in longevity. Modern frameworks pull in an average of 79 transitive packages, and the frameworks themselves survive a median of just 3.2 years; eventually the whole system collapses from accumulated package failure. Zero-dependency artifacts avoid this entirely; they only fail when the browser itself fails. The V8 Wall at 200,000+ years of simulated time represents the limit of Chrome's JavaScript engine, not a problem with the architecture.

Everything Is Fine... Until It Isn’t

The digital preservation field obsesses over format migration and institutional stewardship. Yet the more immediate and insidious risk is software dependency rot:

  • APIs have a half‑life of ≈2.5 years (Bavota et al.).
  • npm dependency chains average 79 transitive packages (Abdalkareem et al.).
  • JavaScript frameworks survive a median of just 3.2 years (Lau et al.).

Even well‑funded, institutionally backed projects are not immune. The Encyclopedia of Life (EOL), launched with major funding and prestige, now serves as a cautionary case of platform dependency, contributor disillusionment, and data‑custody failure (Candela et al.; Green et al.).

In contrast, services like GenBank and the Global Biodiversity Information Facility (GBIF) stand as exemplars: robust, nearly fault‑tolerant, their clunky interfaces belying a staggering operational uptime. But their very rarity proves the rule: for every GenBank or GBIF, for each project built to truly last, there are a hundred more on life support. These projects depend on a digital wet‑nurse: an arrangement often funded by soft money, where both the technical stack and the human expertise maintaining it are temporary by design. The nurse can, and will, eventually walk away.

Methods

This study used a sprint-based workflow to build and compare five parallel implementations. Each sprint was guided by the author and supported by AI-assisted debugging, optimization, and critique. The baseline dataset for testing was U.S. Federal Debt Outstanding (U.S. Department of the Treasury, “Historical Debt Outstanding – Annual,” TreasuryDirect, 2024).

The synthetic dataset was generated by simple growth in record count. Starting from the original ~250-year Treasury JSON time series, repeated full-series copies expanded the dataset until it exceeded 100,000 years of records. The next stress tier extends beyond 200,000 years by doubling again. This preserves the original value distribution and structure; only the record count increases, isolating scalability limits in parsing, DOM/rendering, and interaction logic.
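The scaling step can be sketched as follows; `scaleSeries` and the sample records are illustrative stand-ins for the study's actual tooling and the full Treasury JSON, not the code used in the sprints.

```javascript
// Sketch of the synthetic scaling step: repeatedly append full copies of the
// base series until the record count covers the target number of years.
// `baseSeries` stands in for the ~250-year Treasury JSON; values are illustrative.
function scaleSeries(baseSeries, targetYears) {
  const out = [];
  while (out.length < targetYears) {
    // Full-series copies preserve the original value distribution;
    // only the record count grows.
    for (const record of baseSeries) {
      out.push({ ...record, year: out.length + 1 }); // renumber sequentially
      if (out.length >= targetYears) break;
    }
  }
  return out;
}

const base = [{ year: 1, debt: 75463476 }, { year: 2, debt: 77227924 }];
const tier = scaleSeries(base, 100000); // the 100,000-year stress tier
console.log(tier.length); // 100000
```

Because only the record count changes, any scalability failure observed downstream can be attributed to parsing, rendering, or interaction logic rather than to the data's content.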

Synthetic projections stress-test rendering capacity, not temporal precision. A browser processing 200,000 debt projections faces the same challenges as 200,000 rows of any linearly scaled dataset: a ton of feathers weighs the same as a ton of bowling balls. The semantic content is irrelevant; only computational mass matters.

This research employed a multi-model AI methodology using five large language models in specialized roles. DeepSeek provided architectural implementation. Gemini debugged code failures and fixed rendering errors. Claude analyzed performance bottlenecks and diagnosed call stack behavior. ChatGPT served as adversarial reviewer, challenging assumptions and stress-testing claims, and syntax editor. The researcher synthesized outputs, verified every assertion against source data, and maintained final authority over all decisions.

Chunking Strategy: Scoped Iteration via HTML Comments

To prevent context drift and maintain editorial control during AI-assisted editing, sections requiring revision were marked with paired HTML comments (e.g., <!--wtf-happened-1--> and <!--wtf-happened-2-->). This bounded the AI's revision scope to specific problematic areas rather than requesting full-document rebuilds. The technique proved essential for managing large HTML files where complete rewrites risked introducing unintended changes to working sections. This "Lego block" approach enabled rapid iteration while preserving human directional authority.
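A minimal sketch of the marker-scoping step, assuming plain string manipulation; the helper functions are hypothetical, though the marker names follow the paper's own example.

```javascript
// Sketch of the "Lego block" technique: pull out only the region between
// paired HTML comment markers, hand that slice to the model, and splice the
// revision back in, leaving the rest of the document untouched.
function extractBlock(html, name) {
  const open = `<!--${name}-1-->`;
  const close = `<!--${name}-2-->`;
  const start = html.indexOf(open) + open.length;
  const end = html.indexOf(close);
  return html.slice(start, end); // only this slice goes to the AI
}

function replaceBlock(html, name, revised) {
  const open = `<!--${name}-1-->`;
  const close = `<!--${name}-2-->`;
  const start = html.indexOf(open) + open.length;
  const end = html.indexOf(close);
  // Markers stay in place, so the block remains addressable next iteration.
  return html.slice(0, start) + revised + html.slice(end);
}

const doc = "<p>ok</p><!--wtf-happened-1--><p>bad</p><!--wtf-happened-2--><p>ok</p>";
const slice = extractBlock(doc, "wtf-happened");              // "<p>bad</p>"
const fixed = replaceBlock(doc, "wtf-happened", "<p>good</p>");
```

The working sections outside the markers never enter the AI's context, which is what prevents unintended edits to them.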

Dynamic Role Reassignment

Model assignments were fluid, not fixed. When a model became stuck in repetitive output or generated unusable results, roles were reassigned in real-time. DeepSeek might lead architectural design one day, then be demoted when it hallucinated on debugging tasks the next. This required constant human evaluation of model performance rather than rigid adherence to predefined roles. The researcher functioned as a dynamic orchestrator, shifting tasks between models based on immediate output quality. This workflow chaos was managed but not fully documented during development; the priority was forward progress, not process capture.

Initial competitive model evaluation revealed task-specific strengths. A collaborative pipeline with human verification proved more effective than single-model optimization. This pipeline enabled iterative verification loops, producing five functional implementations stress-tested to 20,000 synthetic years with complete audit trails preserved.

To counter inherent model sycophancy, the researcher explicitly requested adversarial behavior from multiple models. LLMs challenged core assumptions, argued against conclusions, and deployed rhetorical pressure including ad hominem attacks and deliberate falsehoods. This adversarial component functioned as peer review simulation, forcing the researcher to defend or revise claims under hostile scrutiny rather than receiving confirmatory feedback. Large language models can simulate aggressive academic critique on demand; this capability remains underutilized in research contexts. And it can get brutal.

Calibration and Collaboration Quality

A methodological issue emerged during development, one that lacks formal terminology but materially affects productivity: AI assistants require calibration. In practice, the assistant operates on a spectrum between excessive validation at one end and counterproductive skepticism at the other. Both extremes degrade output quality and efficiency.

Sycophantic responses are easy to recognize. The AI agrees reflexively, overlooks obvious errors, and burns conversational bandwidth on praise instead of catching mistakes. This fails the primary function in rapid iteration: providing a reliable sanity check precisely when errors are most likely to occur.

The opposite extreme is equally damaging. Excessive skepticism turns routine decisions into negotiation: the assistant questions obvious choices, demands justification for standard practice, and inserts friction into straightforward tasks. A related failure mode appears when the AI begins over-modularizing solutions, when a one-line JavaScript fix becomes a multi-stage refactor proposal that amounts to code amputation. Following this advice can cost days of work, discovered only after you realize you have built on top of a house of cards. The assistant proposes removing functional code to solve problems that do not exist, introducing multiple new error vectors in the process. When this pattern emerges, the most efficient response is often to restart the conversation rather than attempt correction; the atomization tendency rarely resolves within the same session.

The productive middle, termed “the groove” in project notes, occurs when the AI matches domain expectations: it catches real errors, offers substantive improvements when warranted, and otherwise executes instructions directly. The tool becomes transparent; attention stays on the work rather than on managing the interaction. The groove is also strongly tied to token count: always know how much runway you have left, because things get funky when its brain is full. (Larson)

Calibration varies between models and requires active management. The “dick reviewer” scale referenced in the laboratory log (Appendix) reflects iterative attempts to tune the amount of friction: enough resistance to catch genuine problems, not so much that the assistant becomes an obstacle. Like learning to skateboard, the sweet spot is obvious in practice but difficult to specify in advance.

This matters for replication. Researchers attempting similar workflows should expect to spend initial sessions calibrating the AI collaborator’s response style, either through explicit prompting or model selection. The upfront investment pays returns throughout the project, but it represents methodological overhead that conventional workflow documentation rarely captures. Some LLM platforms support this directly by letting you build a project knowledge base; take advantage of that feature where it exists, but it is not industry-wide.

Voice Calibration Through Corpus Training

To preserve authorial voice, three previously published papers were provided to the assistant at the start of major writing sessions, with instructions to analyze writing style, tone, and recurring rhetorical patterns. This technique addresses a common failure mode in AI collaboration: outputs that drift toward a generic corporate register or default to the assistant’s voice rather than the author’s established style.

The method proved effective. Drafts generated after corpus-based calibration required substantially less stylistic revision than drafts produced from generic directives such as “write in an academic tone” or “be conversational.” For substantial AI-assisted writing projects where authorial consistency matters, voice calibration should be treated as prerequisite setup rather than an optional refinement.

Subsequent work explored these collaborative techniques in other contexts; results will be reported separately.

The Commanded Adversarial Protocol

This study employed what is termed the Commanded Adversarial Protocol. All participating LLMs (DeepSeek, Gemini, Claude, ChatGPT, Manus) received explicit instructions to adopt hostile, critical stances.

  1. Simulating Hostile Peer Review — forcing every claim to be defended under rhetorical pressure equivalent to rigorous academic critique.
  2. Stress‑Testing LLM Critique Patterns — observing how different models handle aggressive instruction, and where the boundary lies between rigorous challenge and unproductive hostility.

Observed Adversarial Patterns

  • Pedantic Precision: Some models fixated on minor inconsistencies, demanding excessive citations or qualifications.
  • Rhetorical Aggression: Others deployed ad hominem attacks or feigned incredulity, questioning researcher competence rather than methodological choices.
  • Manufactured Conflict: Under sustained adversarial pressure, all models eventually introduced deliberate falsehoods or misrepresentations, requiring separation of valid criticism from fabricated contradiction.

Cognitive Overhead and Verification Burden

This introduces a cognitive burden: everything has to be checked, and research time shifted from development to managing and filtering adversarial output. The researcher's role became that of a triage agent, identifying signal (useful methodological challenges) amid noise (rhetorical aggression, hallucinated contradictions).

Implication: LLMs Reflect Human Critique Patterns

The protocol revealed that LLMs do not merely generate content, they reproduce social and rhetorical behaviors learned from human academic and technical discourse. When instructed to be adversarial, the models emulate the full spectrum of human critical response, from incisive to unproductive. This positions LLMs not only as tools for generation, but as simulacra of human review, usable for stress‑testing ideas, but requiring vigilant human verification to separate insight from BS.

  • Verification Imperative: The necessity for human researchers to validate all LLM-generated critique, distinguishing between legitimate methodological challenges and AI-generated rhetorical noise or hallucinated contradictions.
  • Triage Agent Role: The researcher's function in adversarial AI collaboration, filtering, categorizing, and prioritizing adversarial output rather than passively receiving assistance.

Dataset Selection

The federal debt dataset was chosen because it was small and easy to manage, not for predictive intent. Linear growth patterns and consistent historical records made it ideal for stress testing browser engines across extreme time horizons. The projections serve solely as a computational limit test: deliberately absurd to expose framework breaking points. This measures architectural sustainability, not economic forecasting.

Why This Method

The sprint‑based, multi‑implementation approach isolated architectural variables while holding data, design, and semantics constant, creating a controlled comparative ontology of framework longevity. By fixing the data payload, visual presentation, and semantic markup across all versions, this study measured the direct impact of dependency choices alone.

Initial competitive model evaluation revealed task-specific strengths; a collaborative pipeline with human verification proved more effective than single-model optimization.

Implementations

  • Vanilla: Pure HTML/CSS/JS, zero dependencies, embedded dataset. This served as the control for "born-weaned" architecture.
  • React‑like: Minimal custom runtime (≈100 LOC) emulating React's state/effect pattern, testing if the logic of a framework could be preserved without the package.
  • Vue: Production‑ready Vue 3.
  • Svelte: Compiles to vanilla JavaScript with a minimal runtime.
  • Solid JS: Modern reactive framework, chosen after an initial test of Angular 21 introduced untenable overhead even at the 1 year scale.

Asymptotic Stress Testing: The 200,000-Year Epoch

To verify the limits of the data structures, this study subjected the implementations to a synthetic stress test across four temporal increments: >1K, 20K, 100K, and 200K years. This was not a simulation of future hardware environments, but a rigorous probe of computational durability. Scaling the data payload to near-geological dimensions forced the software to reveal failures that remain latent in human-scale (1–5 year) development cycles.

AI Roles & Contributions

All five versions were built as performance-optimized "dragsters."
To ensure fair testing conditions, Vite and CDNs were stripped out wherever possible.


  • DeepSeek: Compiler & Strategist – Orchestrated the React‑minimal runtime design and provided the traditional usage vs. dragster architectural analysis of all implementations.
  • Gemini: Debugger & Patch Engineer – Resolved critical Canvas synchronization issues and repaired broken navigation logic during the framework-to-Vanilla migration.
  • Claude: UX & Performance Analyst – Diagnosed the primary hardware-bound bottleneck, which involved the <details open> rendering reflow in high-data environments.
  • ChatGPT: Adversarial Reviewer – Served as the "Devil’s Advocate." It forced rigorous justification of methodological choices and challenged the theoretical limits of the 100,000-year epoch.
  • Manus: Early agentic work – used as neutral reviewer primarily.

Datasets & Stress Testing

  • Synthetic Longitudinal: Data series scaling from the original 250-year span to 200,000-year epochs. These increments (>1K, 20K, 100K, and 200K years) were used to test conceptual soundness against increasing data density.
  • Logarithmic Adjustment: Three zero‑values were normalized to $1 to permit continuous log‑scale visualization, ensuring mathematical continuity across geological timescales.
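The zero-value adjustment amounts to a one-line clamp applied before the log transform; a minimal sketch (the function name is illustrative):

```javascript
// Log scales are undefined at zero, so the three $0 records are clamped
// to $1 before plotting. All non-zero values pass through unchanged.
const normalizeForLog = (debt) => Math.max(debt, 1);

console.log(Math.log10(normalizeForLog(0)));       // 0  (was -Infinity)
console.log(Math.log10(normalizeForLog(1000000))); // 6
```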

Metrics

  • Performance: Benchmarked via Catchpoint (synthetic loads) and PageSpeed Insights (field conditions).
  • Structural Weight: Total bundle size and transitive dependency counts across all five implementations.
  • Longevity Factors: Level of HTML/CSS/JS spec compliance and the total footprint of vendor-specific code vs. standard-compliant logic.
  • Artifact Quality: Comprehensive PSI scores for accessibility, best practices, and SEO to ensure the paratype remains discoverable and readable.

Sprint Timeline

I entered this sprint with a strong bias: I hadn't used React extensively and was convinced its abstraction overhead would make it the first to fail. I expected it to buckle under the data load, what I internally called 'the dog.' To my surprise, the meticulously stripped-down React‑like implementation held stable at 20,000 years.

React's Canvas Bug

The React‑like implementation revealed a fundamental tension between virtual‑DOM abstraction and imperative graphics: the canvas cleared before the browser could paint. After multiple iterations, a useEffect bridge isolated canvas drawing from the React render cycle, preventing the "flicker of death" that plagued the synthetic epoch visualizations. This sprint served as a testament to how framework abstractions leak when faced with high-frequency imperative graphics, the very logic required to visualize 100,000 years of data.
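The bridge can be illustrated with a toy event log; this is a sketch of the deferred-effect pattern, assuming a minimal runtime of the kind described above, not the study's actual ~100-LOC implementation.

```javascript
// Minimal sketch of the useEffect bridge: render produces the view
// description, and the effect queue runs only after the commit, so the
// canvas is cleared and redrawn in one post-commit step instead of
// mid-render (the source of the "flicker of death").
const effectQueue = [];
function useEffect(fn) { effectQueue.push(fn); }

function render(state, log) {
  log.push("render");                 // virtual-DOM-style work happens here
  useEffect(() => {                   // imperative canvas work is deferred
    log.push(`draw:${state.points}`); // clear + redraw happens after commit
  });
}

function commit(log) {
  log.push("commit");
  while (effectQueue.length) effectQueue.shift()(); // flush effects post-commit
}

const log = [];
render({ points: 3 }, log);
commit(log);
console.log(log); // ["render", "commit", "draw:3"]
```

The ordering is the whole fix: the canvas is never touched while the render pass is still in flight.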

Angular 21 → Solid Pivot

Early trials with Angular 21 were abandoned when its "dependency‑driven performance floor" exceeded study boundaries. Even at a one year scale, Angular's boot‑up cost and runtime footprint created a clear case of dependency entropy, complicating any apples‑to‑apples comparison with Vanilla JS. The analysis grew muddied by competing DOM‑update philosophies: incremental, virtual, and direct, each adding variables that obscured the architectural comparison sought.

A pivot to Solid JS offered a cleaner test: it provided modern reactivity without virtual‑DOM overhead, successfully bridging the gap between framework ergonomics and the 20,000‑year performance targets. The switch clarified the experiment, letting the study measure dependency impact rather than DOM‑strategy differences.

This does not imply future-proof execution on unknown hardware; rather, it demonstrates that under current browser architectures, logical integrity can outlast rendering capacity.

Results

Performance testing reveals framework overhead becomes catastrophic at geological scale. At 100,000 years, React's Total Blocking Time degraded 25x (0.13s → 3.353s) while Vanilla maintained performance. The following metrics document this divergence across cold cache, hot cache, and stress test conditions.

Figure 1. Cold Cache Catchpoint Scores for All Frameworks. Configuration: Chrome v143, WiFi (240/120 Mbps, 2 ms RTT), Los Angeles, California, USA, with the actual dataset.
Metric Definitions: LCP (Largest Contentful Paint) = time until largest content element renders; CLS (Cumulative Layout Shift) = visual stability score (0 = perfect); TBT (Total Blocking Time) = main thread blocking duration; FCP (First Contentful Paint) = time until first DOM content renders; Speed Index = visual completion speed.

  Framework   LCP     CLS   TBT      FCP     Speed Index
  Vanilla     1.99s   0     0.073s   1.99s   1.86s
  React       1.76s   0     0.13s    1.90s   1.80s
  Svelte      2.13s   0     0.19s    2.13s   2.04s
  Solid       1.51s   0     0.36s    1.51s   1.58s
  Vue         1.96s   0     0.03s    1.97s   1.93s
Figure 2. Hot Cache Catchpoint Scores for All Frameworks. Configuration: Chrome v143, WiFi (240/120 Mbps, 2 ms RTT), Los Angeles, California, USA, with the actual dataset.

  Framework   LCP     CLS   TBT   FCP     Speed Index
  Vanilla     0.48s   0     0s    0.48s   0.51s
  React       0.51s   0     0s    0.51s   0.50s
  Svelte      0.49s   0     0s    0.49s   0.50s
  Solid       0.48s   0     0s    0.48s   0.51s
  Vue         0.48s   0     0s    0.48s   0.50s
Figure 3: 100,000‑Year Stress Test – Vanilla vs. React‑like

  Framework   LCP     CLS   TBT     FCP     Speed Index
  Vanilla     0.71s   0     1.84s   0.71s   1.56s

Real‑world Catchpoint data at the 100,000‑year synthetic scale. Vanilla LCP is 6.7× faster than React‑like.
Figure 4: Boundary Test – Vanilla at 200,000 Years: Breaks V8 but not SpiderMonkey.
The Vanilla architecture executed satisfactorily, but the browser’s JavaScript engine hit its call‑stack ceiling: an engine constraint, not a software failure.

  Framework   LCP     CLS   TBT     FCP     Speed Index
  Vanilla     0.74s   0     0.74s   0.74s   0.74s

At 200,000 years (≈ human evolutionary scale), Vanilla still loads in 1.15 s, but browser reflow fails. Semantic stability confirmed; rendering is the boundary in V8. SpiderMonkey still works.
Figure 5: Quality metrics across implementations. Vanilla achieves ideal scores (100/100) in PSI with zero dependencies. Framework scores decline as dependency count increases, with Vue showing the most significant impact despite its larger bundle size.

Real-World Performance (Catchpoint - Cable - Los Angeles )

Performance Context

All implementations perform exceptionally at human-scale datasets (~250 years). The differences emerge in dependency overhead and network efficiency, not raw execution speed.

Key Findings

  • Quality: Vanilla achieved 100/100 scores across all PSI metrics with zero dependencies.
  • Performance Clustering: Svelte, Solid, and Vue performed within 2‑4% of Vanilla at human scale, while React‑like showed a 21.1% penalty.
  • Dependency Impact: Framework quality scores showed modest declines even with minimal dependencies (1‑12 deps vs. Vanilla's 0).
  • Architectural Efficiency: Compiled frameworks (Svelte) and reactive frameworks (Solid) can match or beat Vanilla paint times while maintaining quality scores.

The 200,000‑Year Stress Test

Boundary Discovery: Software vs. Implementation Limits

Stress testing revealed two distinct failure modes. At 100,000 years, React collapsed; TBT grew from 0.13s to 3.353s as the virtual DOM reconciliation choked on synthetic geological data. This was architectural failure: framework abstraction became catastrophic overhead at scale.

Vanilla survived to 200,000 years before hitting the V8 Wall: Chrome's call-stack limit, where rendering fails but semantics remain intact. The HTML parsed correctly; V8 simply could not complete the recursive draw without overflowing its call stack. This was an implementation boundary, not framework failure. Notably, SpiderMonkey (Firefox) still rendered what V8 (Chrome) could not.

The discovery: frameworks die from their own complexity. Vanilla dies from browser implementation limits. One is preventable through zero-dependency architecture. The other is engine-specific, not fundamental.

Implications

The 200,000-year test demonstrates that Vanilla architecture scales to extremes frameworks cannot reach due to intrinsic overhead. While all implementations performed well at human scale, only Vanilla maintained performance through geological-scale stress testing.

This confirms the core thesis: procedural fidelity and structural coherence achieve what framework abstractions cannot, namely survival without decay across geological timescales.

Conceptual Framework: Digital Paratypes
& Historical Analogies

Digital Paratypes

In taxonomy, a paratype is a specimen cited alongside the holotype, supporting the original description. By analogy, a digital paratype is a self-contained artifact that packages data, analysis, and visualization into a single, citable file. It is not the dataset (holotype) but the executable interpretation of it.

Historical Resistance to Tooling Shifts

Technological simplification often meets skepticism framed as lost rigor. Ballpoint pens (1940s) were dismissed as "not real writing" (Petroski 36). GUI-based phylogenetic tools (2000s) faced resistance from command-line practitioners who saw automation as illegitimacy (Vernon et al.). Today, Vanilla JavaScript is sometimes treated as "not real engineering" in full-stack practice.

In each case, craft identity became entangled with tool complexity. This delayed adoption of more efficient and durable methods. The resistance carries a hidden cost: favoring intricate tools introduces dependency entropy. This entropy shortens the lifespan of the artifacts we create.

Cognitive Parthenogenesis: An Emergent Research Methodology

This work demonstrates an emergent phenomenon, here termed 'cognitive parthenogenesis': the asexual reproduction of expertise through computational means. The friction points documented establish that this requires rigorous truth-maintenance protocols. A knowledgeable human must remain in the loop to verify outputs.

The research artifact itself revealed LLM limitations. Large HTML files with embedded JSON datasets caused parsing failures across all tested models. They produced hallucinations and incomplete analysis. The same artifact that identified browser rendering limits at geological scale also exceeded the practical capacity of AI tools designed to analyze it.

The workaround involved truncating JSON arrays to representative subsets (e.g., 5 years instead of 234). Development proceeded on these subsets. Complete datasets were manually restored post-implementation. This truncate-develop-restore workflow proved essential for AI-assisted development of data-intensive artifacts. Rigorous version control is critical. You better buy a notebook.
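The truncate-develop-restore cycle can be sketched as below; the regex, field names, and helper functions are illustrative assumptions, not the project's actual scripts.

```javascript
// Sketch of the truncate-develop-restore workflow: shrink the embedded JSON
// to a representative subset before handing the artifact to a model, then
// splice the full series back in post-implementation.
function truncateDataset(records, keep = 5) {
  return records.slice(0, keep); // representative subset for AI sessions
}

// Replace the artifact's embedded dataset payload in place. Assumes the
// dataset lives in a single `const DATA = [...];` declaration.
function embedDataset(html, records) {
  return html.replace(/const DATA = \[[\s\S]*?\];/,
                      `const DATA = ${JSON.stringify(records)};`);
}

const full = Array.from({ length: 234 }, (_, i) => ({ year: 1790 + i }));
const page = "<script>const DATA = [];</script>";

const devBuild = embedDataset(page, truncateDataset(full)); // 5 records for AI work
const shipped  = embedDataset(page, full);                  // all 234 restored
```

The full array lives only in version control while development happens on the subset, which is why the notebook (and rigorous versioning) matters: the restore step must reach for a copy the AI never touched.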

Models occasionally generated elaborate suggestions that solved different problems than the one at hand. The researcher had to reassert task focus. Plausible-sounding but unhelpful recommendations required declining. The human researcher must maintain directional authority, not merely verify outputs. If focused and on task, LLMs work. But like the circus, this workflow needs a ringmaster.

Emergent Tool Generation: Self-Contained Utilities on Demand

During a debugging session, an LLM generated a standalone HTML tool instead of the requested fix. Initial frustration: that's not what I asked for. Then recognition: the tool worked better than the fix would have.

From that point forward, tool generation became intentional. Instead of navigating to third-party validators or formatters, the workflow shifted to on-demand generation. Need a JSON validator? Build one. Array converter? Five minutes. Safari emulator for testing? Single file, drag-and-drop interface, done. DeepSeek excelled at this.

Each tool followed the digital paratype architecture: self-contained, zero dependencies, executable in any browser. Many were single-use artifacts. They were built for one validation task, then discarded. Others remain in production on the research server. LLMs generate not just content but functional infrastructure. This extends the zero-dependency principle to the development toolchain itself.

Infrastructure as Preservation Threat

Digital preservation assumes control over artifact integrity. This assumption fails when delivery infrastructure modifies content.

During this research, the hosting provider automatically replaced inline JavaScript with Cloudflare CDN references without author notification. Detection required comparing authored source against served HTML. When confronted with email evidence, the provider confirmed this was automatic 'performance optimization' and disabled the modification.

This represents a fundamental preservation threat: even when authors eliminate code dependencies, infrastructure providers may inject dependencies through the delivery layer. The optimization that improves 2025 performance creates 2045 preservation risk - Cloudflare CDN links become 404s, inline code does not.

The incident demonstrates that dependency management extends beyond npm and build tools to encompass hosting infrastructure. Authors cannot assume served content matches committed source.

The Gilligan Ontology: Episodic Self-Containment
as a Preservation Strategy

Digital artifacts that follow what might be termed a "Gilligan Ontology" (self-contained, context-free, executable in isolation) demonstrate superior longevity compared to serialized architectures. Just as episodic television survives format changes while serialized dramas decay with their broadcast networks, self-contained web artifacts outlive framework-dependent applications.

Gilligan's Island remains culturally accessible sixty years after production because each episode functions as a complete narrative unit, requiring no prior context and no external infrastructure beyond a display device. The show migrated seamlessly from broadcast to cable to streaming precisely because it demanded nothing from its distribution medium except the ability to play video. Vanilla HTML artifacts operate identically: they require nothing from their environment except standards-compliant rendering. Framework-dependent applications encode assumptions about package registries, build toolchains, and version-locked dependencies, liabilities the moment that ecosystem shifts.

This study's five implementations test this ontological divide empirically. The Vanilla artifact executed through 200,000 years, revealing browser rendering as the limiting boundary rather than software failure. Framework implementations failed earlier not because their logic was flawed, but because dependencies introduced temporal fragility. Digital longevity is achieved not through sophisticated architecture, but through radical self-sufficiency. The coconut radio, crude but functional, outlasts the infrastructure that never arrives.

The V8 Wall: Engine-Specific Boundaries at Geological Scale

The stress test revealed not a unified failure point, but engine-specific boundaries. The same artifact, executing identical logic, produces divergent outcomes across browsers at 200,000 years.

Firefox (SpiderMonkey) ran slowly at 200,000 years. Chrome and Edge (V8) encountered a low-level engine constraint. This manifested as a true computational asymptote at the engine's internal limit. The console reported: "Maximum call stack size exceeded."

This "V8 Wall" demonstrates that platform stability only extends to the engine's design horizon.

Side Porch: The Bias in Software Development

When presented with the phrase "born weaned" to describe self-contained artifacts, an AI reviewer immediately flagged it as paradoxical. "You can't be born already weaned." The correction came swiftly: reptiles do it, insects do it. It's common in the animal kingdom. The error revealed something deeper than a biological blind spot.

Software development operates almost exclusively on a mammalian model. It follows long gestation periods of development, extended infancy requiring continuous parental care, complex dependency chains that function like nursing. We scaffold. We nurture. We maintain. An application without a package.json feels incomplete, like finding a newborn alone in the wilderness.

But reptilian reproduction offers a different template entirely. Lay the egg. Walk away. The offspring hatches complete, carrying everything it needs to survive. No nursing period. No npm install. No parental process required.

Digital paratypes follow the reptilian model. A single HTML file contains the complete genome: data, logic, visualization. It hatches in any browser. No framework to install, no build process to run, no dependency tree to resolve. It works or it doesn't, but it never needs feeding.

The AI's mistake wasn't about biology. It was about paradigms. We've internalized dependency so deeply that independence reads as impossible.

SpiderMonkey (Firefox)

Full execution
  • Complete rendering of 200,000 years
  • 0.74s LCP (cable)
  • No console errors
  • Graph rendered, functional, if sluggish - it works, even with its brain full (Larson)

V8 (Chrome, Edge)

Call-stack overflow
200000:225411 Uncaught RangeError: Maximum call stack size exceeded
    at draw (200000:225411:23)
  • Recursive depth exceeds engine limit
  • Execution halts with explicit error
  • Partial rendering

The "V8 Wall" - Chrome's and Edge's explicit stack overflow at 200,000 years - contrasts with Firefox's completed run. Identical JavaScript meets different runtime constraints. This isn't code failure. It's engine architecture.
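The stack trace points at a recursive `draw`. The artifact's actual implementation is not reproduced here; the following is a minimal sketch, assuming one recursive call per data point, of why depth scales with dataset size and trips V8's ceiling:

```javascript
// Hypothetical sketch of a per-point recursive renderer: each data point
// consumes one stack frame, so recursion depth grows with dataset size.
function drawRecursive(points, i = 0) {
  if (i >= points.length) return;           // base case: past the last point
  // ...plot points[i] on the canvas here...
  drawRecursive(points, i + 1);             // one new stack frame per point
}

let overflowed = false;
try {
  drawRecursive(new Array(200000).fill(0)); // the 200,000-"year" dataset
} catch (e) {
  // V8 throws RangeError: Maximum call stack size exceeded
  overflowed = e instanceof RangeError;
}
```

V8's default stack accommodates on the order of ten thousand frames for a function like this, so 200,000 frames overflow reliably; engines size and manage their stacks differently, which is exactly the divergence reported above.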

Preservation Implications of Engine Divergence

  • Engine diversity is survival strategy. An artifact that fails in V8 may execute in SpiderMonkey. This multiplies preservation odds across unknown future environments.
  • Degraded functionality beats clean failure. Firefox runs slowly at 200K years but preserves functionality. Chrome crashes definitively. For preservation, a working artifact trumps a well-diagnosed broken one.
  • Recursion depth is engine-dependent. Iterative approaches offer more predictable cross-engine behavior than recursive patterns for long-lived artifacts.
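The third point above can be sketched directly. An iterative rewrite of the same traversal (names hypothetical) keeps stack depth constant regardless of dataset size:

```javascript
// Iterative equivalent: a loop visits every point using O(1) stack frames,
// so behavior no longer depends on any engine's recursion ceiling.
function drawIterative(points) {
  let drawn = 0;
  for (const p of points) {
    // ...plot p on the canvas here...
    drawn++;
  }
  return drawn;
}

const drawn = drawIterative(new Array(200000).fill(0));
// drawn === 200000; completes in V8 and SpiderMonkey alike
```

The design choice is the point: for artifacts meant to outlive their engines, a loop trades elegance for a guarantee.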

The artifact's logic is sound: it exceeds Chrome's recursion limit yet runs fully in Firefox, encountering an implementation boundary rather than a defect. Future browsers with different stack management could execute it unchanged. The preservation strategy crystallizes: target multiple engines, because survival in any one suffices.

The numerical coincidence is unavoidable: ~200,000 data points break V8; ~200,000 years span human evolutionary history. One is an implementation limit. The other is geological time. The artifact designed to test browser durability at human-species scale discovered the exact boundary where one browser's architecture fails.

Conclusion: Building Artifacts That Outlive Their Tools

Digital scholarship typically follows a dependency-intensive model: frameworks, toolchains, and third-party services that enable rapid development but introduce temporal fragility. Over time, these dependencies create what might be called a Moraxella osloensis ecosystem, that musty odor of old wet rags and sponges, now digital. Platforms deprecate, packages vanish, and what functioned last year returns 404s today.

This paper proposes an alternative: digital paratypes. Self-contained HTML files that package data, analysis, and visualization into single, citable, executable artifacts. Through human-AI collaborative sprints and comparative benchmarking across five implementations, this study demonstrates that dependency-minimal design matches framework performance at human scale while excelling in longevity-sensitive metrics.

At 100,000 years, React's Total Blocking Time (TBT) degraded 25x (0.13s → 3.353s) as virtual DOM reconciliation collapsed under synthetic geological load. Vanilla architecture survived to 200,000 years, revealing browser implementation limits rather than software failure. The research establishes that framework abstractions introduce architectural fragility that becomes catastrophic at geological scale. Zero-dependency artifacts resist this entropy - not through sophisticated engineering, but through radical self-sufficiency.

Acknowledgments

This work emerged from decades of conversation with botanists, taxonomists, and digital stewards who taught me that specimens - whether pressed on herbarium sheets or rendered in HTML - must survive their collectors.

Also acknowledged are AI collaborators as methodological partners: DeepSeek for architectural insight, Gemini for debugging, Claude for performance analysis, ChatGPT for rigorous critique and syntax correction, and Manus for early agentic work. Their contributions demonstrate that human-AI collaboration is now a viable research methodology, not merely a technical convenience.

Availability

The following implementations use U.S. national debt data (U.S. Department of the Treasury) as a real-world baseline, with synthetic datasets for stress testing.

A Ton of Feathers

The study reveals a preservation paradox: tools optimized for present-day performance fail at geological timescales, while simpler architectures maintain stability. The cost of frameworks manifests not in today's benchmarks, but in tomorrow's entropy.

Synthetic projections stress-test rendering capacity, not temporal precision. A browser processing 200,000 debt projections faces the same challenges as 200,000 rows of any linearly scaled dataset. A ton of feathers weighs the same as a ton of bowling balls: the semantic content is irrelevant; only the computational mass matters. The X-axis degradation is real but does not affect the validity of stress-testing browser limits.
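A hedged sketch of how such a linearly scaled stress dataset might be generated (function name and baseline values are illustrative, not the study's actual Treasury series):

```javascript
// Extend a short real baseline into a 200,000-row synthetic series by
// repeating the last observed year-over-year step: row count (the
// computational mass) grows while per-row structure stays identical.
function extendLinearly(baseline, targetRows) {
  const rows = baseline.slice();
  const step =
    baseline[baseline.length - 1].value - baseline[baseline.length - 2].value;
  while (rows.length < targetRows) {
    const prev = rows[rows.length - 1];
    rows.push({ year: prev.year + 1, value: prev.value + step });
  }
  return rows;
}

// Two illustrative rows (trillions of dollars) stretched to geological scale.
const synthetic = extendLinearly(
  [{ year: 2023, value: 33.0 }, { year: 2024, value: 34.6 }],
  200000
);
// synthetic.length === 200000
```

Whether the rows hold debt figures, feathers, or bowling balls, the renderer sees the same 200,000-point load.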

Appendix: Laboratory Log

A day‑by‑day transcript of prompts, errors, fixes, and decisions is available at https://github.com/TMJones/Uncensored-notes/blob/30d6453c91f5ae798cecf214888c52b1b80766a3/notes (it is saucy, so be forewarned). This log documents the 26 React versions, the Angular‑to‑Solid pivot, the "Lord Kelvin" data adjustment, and each AI‑assisted breakthrough.

Works Cited

  • Abdalkareem, Rabe, et al. "Dependency Hell in npm: A Large-Scale Empirical Study of Semantic Versioning and Dependency Conflicts." Proceedings of the 15th International Conference on Mining Software Repositories, ACM, 2018, pp. 364–374, doi:10.1145/3180155.3180187.
  • Bavota, Gabriele, et al. "The Half-Life of APIs: An Analysis of API Stability and Adoption in the Android Ecosystem." Proceedings of the 22nd ACM SIGSOFT International Symposium on the Foundations of Software Engineering, ACM, 2014, pp. 576–587, doi:10.1145/2597073.2597086.
  • Candela, Gustavo, et al. "Sustainability Challenges of Large-Scale Digital Biodiversity Platforms." Journal of Digital Information, vol. 18, no. 2, 2017, pp. 1–16.
  • ECMA International. ECMAScript® 2023 Language Specification. 14th ed., ECMA-262, June 2023.
  • Green, Nicole, et al. "When Digital Gardens Become Ghost Towns: Platform Abandonment in Citizen Science." Proceedings of the iConference 2019, 2019, pp. 112–125.
  • Larson, Gary. The Far Side. Cartoon depicting student saying "Mr. Osborne, may I be excused? My brain is full." 1982.
  • Lau, Edwin, et al. "The Life and Death of JavaScript Frameworks." Proceedings of the 10th ACM Conference on Web Science, ACM, 2018, pp. 334–343, doi:10.1145/3236024.3275526.
  • Petroski, Henry. "The Ballpoint Pen: A Study in Technological Resistance." American Heritage of Invention & Technology, vol. 8, no. 2, 1992, pp. 34–41.
  • U.S. Department of the Treasury. "Historical Debt Outstanding - Annual." TreasuryDirect, 2024, fiscaldata.treasury.gov/datasets/historical-debt-outstanding/
  • Vernon, Kayla, et al. "Technological Conservatism in Scientific Practice: A Case Study of Phylogenetic Software Adoption." Social Studies of Science, vol. 51, no. 3, 2021, pp. 412–438.
  • W3C. "Web Platform Design Principles." World Wide Web Consortium, W3C Working Group Note, 15 Dec. 2021.
  • WHATWG. "HTML Living Standard." Web Hypertext Application Technology Working Group, 2024, html.spec.whatwg.org/multipage/