Code bundle splitting of Elm programs

2024-03-02

Splitting the code bundle is not a topic that comes up often with Elm, because most times we are blessed with small bundles without the headache.
These improvements introduced in 2018 remove unused Elm code before the JavaScript code is generated. So no useless JS code makes it into the bundle when using the --optimize flag and dead code is effectively eliminated.

To summarize the first two links: Elm js bundles are small in comparison.
For instance the Elm implementation of TodoMVC weighs 9kb (if uglified), and the Elm realworld example weighs 29kb. For comparison the React runtime alone weighs 32kb (back in 2018).
From the list of maintained realworld examples, I looked at a few react versions and this one seems to be one of the smallest ones (in bundle size), and generates two files totaling 85kb (2.a9e8cf08.chunk.js 77kb and main.7edcaa81.chunk.js 9kb).

All numbers assume that the JS code was minified (e.g. with uglify-js or terser) and gzipped.
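
To make "minified and gzipped" concrete, here is a minimal sketch of how such a number can be measured with Node.js. The helper name `sizeReport` is made up for illustration, and the default gzip level is an assumption; other tools may report slightly different sizes.

```javascript
const zlib = require('node:zlib');

// Hypothetical helper: raw and gzipped size (in bytes) of a piece of JS source.
function sizeReport(source) {
  const buffer = Buffer.from(source, 'utf8');
  return {
    raw: buffer.length,
    gzip: zlib.gzipSync(buffer).length, // default compression level (assumption)
  };
}

// Example input: very repetitive code compresses extremely well.
const example = 'var a = 1;\n'.repeat(1000);
console.log(sizeReport(example));
```

For a real measurement, the minified bundle would be read from disk first (for instance with `fs.readFileSync(path, 'utf8')`) and passed to the helper.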

There is also a comparison from 2020 looking at bundle size, lighthouse performance score and time-to-interactive, where Elm performs very well.

Nevertheless, assuming that my Elm program bundle is too big and I want to reduce the initial bundle size, I can split my single big Elm program into multiple smaller ones.

To skip ahead, in the end we can load each program from one js file and all common code (like the Elm runtime) is placed in a shared file that is imported by each program and downloaded only once.

But first, how to split our big program into multiple smaller ones?

Splitting into different programs

A good example of how to split the code into multiple programs is given in this post on r/elm: splitting an admin section (behind a login form) into its own Elm program, completely separate from the program for anonymous users.

With this approach, the admin view can still reuse the code that renders what anonymous users see (maybe even extended by buttons to allow for content moderation).

Now the guest users don't have to download all the code needed for the admin pages, which could be a great way to reduce the bundle size (depending on the amount of admin pages).

But if I'm pretty sure that many of the people browsing the public area will also use the second program for admins, it could make sense to share the common code also in the JS bundles and not download that code twice.

Sharing js code between multiple program bundles

The Elm compiler cannot do this (yet?), and it seems like a hard thing to add to the compiler itself (if you want to compile the programs individually).
But it is definitely possible by post-processing the generated JS code after compiling multiple programs into one bundle.

First, I need the Elm compiler to bundle both programs into one file:

elm make --optimize --output=bundle.js src/One.elm src/Two.elm

And then I can run split-elm-bundle and get output like this:

> npx split-elm-bundle examples/from-aide/compiled/BrowserSandbox+BrowserElement.optimize.js
Working in directory examples/from-aide/compiled
Read BrowserSandbox+BrowserElement.optimize.js 115.1KiB (27.0KiB gzip)
Extracting BrowserElement
Extracting BrowserSandbox
Wrote BrowserSandbox+BrowserElement.optimize.BrowserElement.mjs 7.7KiB (1827B gzip)
Wrote BrowserSandbox+BrowserElement.optimize.BrowserSandbox.mjs 3336B (741B gzip)
Wrote BrowserSandbox+BrowserElement.optimize.shared.mjs 52.2KiB (12.8KiB gzip)

And then I can import either of the program bundles (and the shared code will be imported twice but downloaded only once).

<main id="elm"></main>
<script type="module">
  import { BrowserElement } from './BrowserSandbox+BrowserElement.optimize.BrowserElement.mjs';
  const app = BrowserElement.init({ node: document.getElementById('elm') });
</script>

Here are the four general steps I needed to extract the shared code into one bundle:

1. Convert IIFE to ESM

First, I turn the IIFE (immediately invoked function expression) that the Elm compiler generates into a valid ES module (ESM) that exports each program.
With that, I can already import it somewhere else without using a global window.Elm:

import Elm from './bundle.js';
// Or
// import * as Elm from './bundle.js';
// and initialize the program
Elm.One.init({ node: document.body })

// Or import a single program
import { One } from './bundle.js';
// and use it like this
One.init({ node: document.body })
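
One simple way to get such a module is plain string manipulation. The sketch below is not necessarily how split-elm-bundle does it. It assumes that the compiled bundle ends with `}(this));` and that the programs end up on `scope.Elm`. The helper name `iifeToEsm` and the explicit list of program names are made up for illustration, and nested module names (like `Pages.Admin`) are not handled.

```javascript
// Hypothetical helper: wrap compiled Elm output so it can be loaded as an ES module.
// Assumption: the bundle is an IIFE that ends with "}(this));".
function iifeToEsm(source, programNames) {
  const tail = '}(this));';
  const end = source.lastIndexOf(tail);
  if (end === -1) {
    throw new Error('Unexpected bundle shape: closing "}(this));" not found');
  }
  // Top-level `this` is undefined in an ES module, so pass an explicit object instead.
  const body = source.slice(0, end) + '}(scope));' + source.slice(end + tail.length);
  // One named export per program, plus a default export of the whole Elm object.
  const namedExports = programNames
    .map((name) => `export const ${name} = scope.Elm.${name};`)
    .join('\n');
  return `const scope = {};\n${body}\n${namedExports}\nexport default scope.Elm;\n`;
}

// Tiny stand-in for a compiled bundle, only to show the shape of the result.
const fakeBundle =
  "(function(scope){\n'use strict';\nscope['Elm'] = { One: { init: function() { return 'one'; } } };\n}(this));";
console.log(iifeToEsm(fakeBundle, ['One']));
```

The result supports both `import Elm from './bundle.js'` and `import { One } from './bundle.js'`, at the cost of still shipping everything in one file.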

2. Track dependencies of every top-level definition (e.g. functions)

Then I parse the whole bundle (outsourced to tree-sitter) and create a list of all top-level definitions like functions or global state.
For each of them, I also track all dependencies that it needs.

I ignore every name that is already available on window or globalThis (like document or console), because the browser provides those and they never need to be copied.

As an example, the _VirtualDom_init function needs F4, _Debug_crash and _VirtualDom_render.

var _VirtualDom_init = F4(function(virtualNode, flagDecoder, debugMetadata, args)
{
    // NOTE: this function needs _Platform_export available to work

    /**_UNUSED/
    var node = args['node'];
    //*/
    /**/
    var node = args && args['node'] ? args['node'] : _Debug_crash(0);
    //*/

    node.parentNode.replaceChild(
        _VirtualDom_render(virtualNode, function() {}),
        node
    );

    return {};
});

And its dependency F4 only needs the function F.

function F4(fun) {
  return F(4, fun, function(a) { return function(b) { return function(c) {
    return function(d) { return fun(a, b, c, d); }; }; };
  });
}

The resulting dependency entries look, for instance, like this (from dependency-graph.test.mjs):

{ "name": "_VirtualDom_init"
, "needs": [ "F4", "_Debug_crash", "_VirtualDom_render" ]
, "startIndex": 1, "endIndex": 384
}

{ "name": "F4"
, "needs": [ "F" ]
, "startIndex": 1, "endIndex": 160
}

The indices are kindly provided by tree-sitter, and needed for the last step of copying.
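
With entries like these, everything a single definition needs (directly or indirectly) can be collected by walking the "needs" edges. This is a minimal sketch, not the actual implementation. The function name `collectDependencies` is made up, and the entries for F, _Debug_crash and _VirtualDom_render are simplified assumptions for the example.

```javascript
// Collect every definition reachable from `root` by following "needs" edges.
// `definitions` is an array of { name, needs } entries.
function collectDependencies(definitions, root) {
  const byName = new Map(definitions.map((def) => [def.name, def]));
  const seen = new Set();
  const stack = [root];
  while (stack.length > 0) {
    const name = stack.pop();
    // Skip names already visited and names without an entry (browser globals).
    if (seen.has(name) || !byName.has(name)) continue;
    seen.add(name);
    stack.push(...byName.get(name).needs);
  }
  return seen;
}

// Illustrative data; only the first two entries are taken from the text above.
const definitions = [
  { name: '_VirtualDom_init', needs: ['F4', '_Debug_crash', '_VirtualDom_render'] },
  { name: 'F4', needs: ['F'] },
  { name: 'F', needs: [] },
  { name: '_Debug_crash', needs: [] },
  { name: '_VirtualDom_render', needs: ['document'] }, // `document` is a global, so it is ignored
];
console.log([...collectDependencies(definitions, '_VirtualDom_init')]);
```

Using an explicit stack instead of recursion keeps the walk safe for deep dependency chains, and the `seen` set also protects against cycles between definitions.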

3. Find the shared code

After that I can generate a set of dependencies for each exported program, and also a set of shared dependencies.
A few dependencies must be placed in the shared dependencies even if they are only used by one program, but not many.
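
The basic idea can be sketched as follows: a definition is shared as soon as more than one program needs it. The function name `splitShared` and the example names are made up, and the sketch ignores the special cases mentioned above, where a dependency has to be shared although only one program uses it.

```javascript
// `perProgram` maps a program name to the Set of definitions it needs.
// Returns the shared definitions and, per program, the ones only it needs.
function splitShared(perProgram) {
  // Count in how many programs each definition is used.
  const useCount = new Map();
  for (const deps of perProgram.values()) {
    for (const name of deps) {
      useCount.set(name, (useCount.get(name) || 0) + 1);
    }
  }
  const shared = new Set();
  const own = new Map();
  for (const [program, deps] of perProgram) {
    own.set(program, new Set());
    for (const name of deps) {
      const target = useCount.get(name) > 1 ? shared : own.get(program);
      target.add(name);
    }
  }
  return { shared, own };
}

// Illustrative input with made-up dependency names.
const perProgram = new Map([
  ['BrowserSandbox', new Set(['sandboxView', 'F2', '_VirtualDom_render'])],
  ['BrowserElement', new Set(['elementView', '_VirtualDom_render', 'F2'])],
]);
console.log(splitShared(perProgram));
```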

4. Copy the code into destination files

In the end, I only need to copy the chunks of code into the correct destination files.
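
Copying then boils down to slicing the original source with the recorded indices. This is a minimal sketch under two assumptions: endIndex is exclusive, and the chunks should keep their original order. The function name `extractChunks` is made up, and the real script also has to add the import and export statements around the copied code.

```javascript
// Concatenate the source text of the selected definitions, in source order.
// `definitions` holds { name, startIndex, endIndex } entries, `names` is a Set.
function extractChunks(source, definitions, names) {
  return definitions
    .filter((def) => names.has(def.name))
    .sort((a, b) => a.startIndex - b.startIndex)
    .map((def) => source.slice(def.startIndex, def.endIndex))
    .join('\n');
}

// Illustrative source and index entries, computed here instead of by a parser.
const source = 'function F() {}\nvar unused = 1;\nfunction G() { return F; }\n';
const entry = (name, text) => {
  const startIndex = source.indexOf(text);
  return { name, startIndex, endIndex: startIndex + text.length };
};
const chunkDefinitions = [
  entry('G', 'function G() { return F; }'),
  entry('unused', 'var unused = 1;'),
  entry('F', 'function F() {}'),
];
console.log(extractChunks(source, chunkDefinitions, new Set(['G', 'F'])));
```

Sorting by startIndex keeps definitions in the order the compiler emitted them, which matters for `var` definitions that are evaluated at load time and depend on earlier ones.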

Note: In my trivial tests, importing the whole module as a namespace produced smaller files than importing each needed function by name, so I went with that approach and generate:

import * as shared from './bundle.shared.mjs';
[...]
var $author$project$BrowserSandbox$main = $elm$browser$Browser$sandbox(
    {init: $author$project$BrowserSandbox$init, update: $author$project$BrowserSandbox$update, view: $author$project$BrowserSandbox$view});
export const BrowserSandbox = { init: $author$project$BrowserSandbox$main(
    shared.$elm$json$Json$Decode$succeed(shared._Utils_Tuple0))(0) };

instead of code like this:

import { $elm$json$Json$Decode$succeed, _Utils_Tuple0, [...] } from './bundle.shared.mjs';
[...]
var $author$project$BrowserSandbox$main = $elm$browser$Browser$sandbox(
    {init: $author$project$BrowserSandbox$init, update: $author$project$BrowserSandbox$update, view: $author$project$BrowserSandbox$view});
export const BrowserSandbox = { init: $author$project$BrowserSandbox$main(
    $elm$json$Json$Decode$succeed(_Utils_Tuple0))(0) };

Approach

I did not use what I would consider a professional approach to this script (like thorough investigation of the source and research up front).
I just tried to parse and split different programs and added tests to detect and work around specific issues (like the mean elm/markdown with its inlined outdated marked js library that tree-sitter cannot parse and which expects hljs to be globally available 😅).

There might easily be more that I'm not aware of yet. If you find any, please open an issue or (even better) a pull request.

The End?

Nope, but it might take me a while until the next update. The code essentially was finished in January and it took me over a month to actually publish it (even though I dropped many things I wanted to include initially).

Here is a demonstration page with a couple of examples from elm-lang.org.

There are other strategies of splitting bundled Elm programs that I want to investigate.

I would like to see the realworld-app example split into a public and an internal program, just to use the script outside of trivial examples.

I want to look into adding this kind of bundle splitting into the Elm compiler source code. Or maybe propose it for zokka.

I would like to see how tools like elm-pages or elm-land, which already do a lot of code generation, could generate all the ports and JS glue code to seamlessly switch between different programs.
That would allow them to load only what is needed instead of bundling all of the pages into one program.

A nicer experience could also be if some routes in an SPA are loaded on first use, and I think both tools could also generate the necessary types and code for that (but the main issue would be loading those parts independently of each other without using multiple programs).

Maybe I will explore some of those another time, or even better: Someone else does it 😆.