Dr Timothy Corbett‑Clark

CTO, Researcher, Developer

Content approach

This page explains the structure of all the web files, templates, and data to achieve this website. It also contains notes on hosting, validation, security, selecting libraries, rendering maths and code nicely, making the navigation breadcrumb, choosing colours, creating the favicon and app manifest, making the XML sitemap, adding a draft/wip mode, …

Hence this page provides recipes for how to achieve the various features of a website. They are mostly independent of one another, allowing for easy modification, substitution, or omission (and the agnostic nature of AWG means that nothing is left behind).

File structure

Organisational considerations include:

  1. The purpose of each file, the relationship with other files, and especially how close each file is to related files.
  2. When the file is loaded by the templating mechanism ({% include %} or {% extends %}).
  3. When the file is loaded by the browser.
  4. Where browsers expect or need to find special files.

This translates into the following principles:

  1. Use a flat root directory for all files which browsers expect to see there, such as robots.txt, sitemap.xml, the app manifest, favicons, etc.
  2. Use a flat root directory for all the files loaded by the base template (_base.html) on every page. The base template uses absolute paths.
  3. Only extend templates from the same directory or a parent directory, so that the template hierarchy overlays the file system hierarchy.
  4. Otherwise, keep related files together in their own directory. And use relative paths.
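Principle 3 is easy to check mechanically. The following is a minimal sketch (hypothetical helpers, not part of AWG), assuming site-absolute template paths with forward slashes and at most one {% extends %} per template:

```python
import posixpath
import re
from typing import Optional

# Captures the target of {% extends "..." %} (single or double quotes).
EXTENDS_RE = re.compile(r"""\{%-?\s*extends\s+["']([^"']+)["']""")


def extends_target(template_path: str, source: str) -> Optional[str]:
    """Resolve the extended template to a site-absolute path, or None if there is none."""
    match = EXTENDS_RE.search(source)
    if match is None:
        return None
    here = posixpath.dirname(template_path)
    return posixpath.normpath(posixpath.join(here, match.group(1)))


def obeys_hierarchy(template_path: str, source: str) -> bool:
    """True if the template extends one in its own directory or an ancestor directory."""
    target = extends_target(template_path, source)
    if target is None:
        return True  # nothing extended, nothing to check
    target_dir = posixpath.dirname(target)
    here = posixpath.dirname(template_path)
    # The target's directory must be this directory or one of its ancestors.
    return posixpath.commonpath([target_dir, here]) == target_dir


print(obeys_hierarchy("/a/b/_bar.html", '{% extends "../_foo.html" %}'))  # True
print(obeys_hierarchy("/a/x.html", '{% extends "../b/_foo.html" %}'))     # False
```

Running such a check over every template would flag, for example, a page that extends a template in a sibling directory.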

Even though it is traditional, I don’t organise files by type (all javascript in /js/, all css in /css/, etc).

The result is quite a few files in the root directory, but with the benefit that they are all visible. And then each page has its own directory, with child pages in child directories, etc.

The template inheritance hierarchy is simple:

  • The base template is in /_base.html, and includes /_header.html and /_footer.html.
  • The page template is in /_page.html, extends the base template, and adds blocks for the breadcrumb, title, page content, and the “page is draft” additions (see below).
  • Most concrete pages then extend the page template (the welcome page being an exception, as it alters the layout to have a sidebar card).

Indentation management

All HTML files are indented for clarity during editing. On output after templating, they are all formatted properly by AWG, so there is no need to try to generate properly indented HTML within the templates (avoiding fiddly whitespace management with “-” in e.g. {%- ... %}). For example, the indentation below is entirely to aid readability at the template stage (rather than final HTML).

{% extends "../_page.html" %}

{% block breadcrumb %}
    {{ super() }}
    <div class="level-item">
        <span class="tag">
            <a href="index.html"><i class="fa-solid fa-arrow-left"></i> Back to: Building this website</a>
        </span>
    </div>
{% endblock breadcrumb %}

{% block title %} Content approach {% endblock title %}

{% block page %}
    {{ "_approach.md" | markdown() }}
{% endblock page %}

To make editing easy with language-specific editing/formatting/colourising modes, I avoid files containing a mixture of languages. Hence there are separate files for each piece of markdown, no “frontmatter” TOML/YAML within markdown files, javascript is always included from .js files, etc.

Choice of web framework

There are many to choose from, but I selected Bulma because it is CSS only (no javascript), looks good, is well documented, very popular, and actively maintained. I like the responsive layout and support for colour management.

The two closest alternatives were:

  • Bootstrap, which is older, a bit boring-looking now, and slightly heavier weight than I need.
  • UIkit, which looks slick but is perhaps more intended for applications. It also has less support, e.g. for colours.

Maths

For any substantial maths I create PDFs using Typst, but for maths displayed directly in the browser I use the javascript library KaTeX. Other libraries exist, but KaTeX is well maintained, popular, and fast.

The CommonMark parser used by AWG includes the dollarmath_plugin, which produces inline and block maths with the following HTML markup:

<span class="math inline"> ...latex maths... </span>
<div class="math block"> ...latex maths... </div>
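To make that markup concrete, here is a toy Python approximation of the conversion. It is not the dollarmath_plugin (which also handles escaped dollars and other edge cases ignored here); it only shows which elements and classes the KaTeX script then looks for:

```python
import html
import re

# $$ ... $$ may span lines; it is handled first so it is not read as two inline spans.
BLOCK_RE = re.compile(r"\$\$(.+?)\$\$", re.DOTALL)
# $ ... $ must stay on one line and contain no further $.
INLINE_RE = re.compile(r"\$([^$\n]+)\$")


def dollar_math_to_html(text: str) -> str:
    """Toy dollar-maths conversion: wrap LaTeX in the classes KaTeX is pointed at."""
    text = BLOCK_RE.sub(
        lambda m: f'<div class="math block">{html.escape(m.group(1).strip())}</div>', text
    )
    return INLINE_RE.sub(
        lambda m: f'<span class="math inline">{html.escape(m.group(1))}</span>', text
    )


print(dollar_math_to_html("inline $(x+1)^2 - (x-1)^2 = 4x$ maths"))
# inline <span class="math inline">(x+1)^2 - (x-1)^2 = 4x</span> maths
```

The LaTeX is HTML-escaped in the markup; KaTeX later reads it back as plain text via innerText.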

A small amount of javascript makes KaTeX render on the correct elements after the DOM has loaded:

document.addEventListener("DOMContentLoaded", (event) => {
  for (var node of document.getElementsByClassName("math")) {
    katex.render(node.innerText, node, {
      displayMode: node.classList.contains("block"), // otherwise assume it is "inline"
      throwOnError: false,
    });
  }
});

I put this javascript in the file /render-maths.js and load it in the <head> tag of the base template along with the KaTeX library (both javascript and CSS) from the jsDelivr CDN:

<head>
    ...
    <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/katex@0.16.22/dist/katex.min.css" integrity="sha384-5TcZemv2l/9On385z///+d7MSYlvIEw9FuZTIdZ14vJLqWphw7e7ZPuOiCHJcFCP" crossorigin="anonymous">
    ...
    <script defer src="https://cdn.jsdelivr.net/npm/katex@0.16.22/dist/katex.min.js" integrity="sha384-cMkvdD8LoxVzGF/RPUKAcvmm49FQ0oxwDF3BGKtDXcEc+T1b2N+teh/OJfpU0jr6" crossorigin="anonymous"></script>
    ...
    <script defer src="/render-maths.js"></script>
    ...
</head>

The result is for markdown such as

For example, inline maths looks like $(x+1)^2 - (x-1)^2 = 4x$, and block maths like

$$
\sum_{k=1}^n { k! \over (1+k)^2 }
$$

to be displayed as

For example, inline maths looks like (x+1)^2 - (x-1)^2 = 4x, and block maths like

\sum_{k=1}^n { k! \over (1+k)^2 }

Code

Highlighting code is easy with highlight.js. This will colour many different programming languages in any of a number of different themes, expecting HTML markup like

<pre>
    <code class="language-python">
        ...python code...
    </code>
</pre>

The CommonMark standard used by AWG has fenced code blocks, which produce tags with CSS classes exactly like this.
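As an illustration of that mapping, a toy converter might look like this. It is a sketch only, not AWG's markdown parser (which follows the full CommonMark rules for fences, indentation, and info strings):

```python
import html
import re

FENCE = "`" * 3  # spelled out so this example does not close its own fence

# An opening fence with an optional language, the body, then a closing fence.
FENCE_RE = re.compile(
    rf"^{FENCE}([\w+-]*)[ \t]*\n(.*?)^{FENCE}[ \t]*$", re.DOTALL | re.MULTILINE
)


def fences_to_html(markdown: str) -> str:
    """Toy rendering of fenced code blocks into the markup highlight.js expects."""
    def render(m: re.Match) -> str:
        lang, body = m.group(1), html.escape(m.group(2))
        cls = f' class="language-{lang}"' if lang else ""
        return f"<pre><code{cls}>{body}</code></pre>"
    return FENCE_RE.sub(render, markdown)


print(fences_to_html(FENCE + "python\nprint(1 < 2)\n" + FENCE + "\n"))
```

The language tag after the opening fence becomes the language-... class that highlight.js uses to pick a grammar.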

Hence the <head> section of the base template pulls in the highlight.js javascript and the chosen theme CSS (in this case, gruvbox-light-hard) from a CDN, and instructs the browser to highlight code once the page has loaded:

<head>
    ...
    <link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/highlightjs/cdn-release@11.9.0/build/styles/gruvbox-light-hard.min.css">
    ...
    <script defer src="https://cdn.jsdelivr.net/gh/highlightjs/cdn-release@11.9.0/build/highlight.min.js" crossorigin="anonymous"></script>
    ...
    <script defer src="/render-code.js"></script>
    ...
</head>

and the contents of /render-code.js runs the highlighter after all the DOM content is present and correct:

document.addEventListener("DOMContentLoaded", (event) => {
  hljs.highlightAll();
});

The highlight.js theme was chosen to be close to my colour theme, but the background isn’t a perfect match. I fix this with some CSS in /main.css (using CSS custom properties to access Bulma’s derived colours), and also add a fine border:

/* Fix up the style of the code blocks e.g. consistent background colour. */
code.hljs {
    border: 1px solid grey;
    border-radius: 0px;
    background: var(--bulma-primary-95);
}

Of course, the code above is itself an example of the end result of highlighting HTML and javascript.

Navigation breadcrumb

The navigation breadcrumb works by sub-templating, calling {{ super() }} to retain the navigation from above. So the pattern is as follows (ignoring all styling).

<!-- This is the page template: /_page.html -->
<nav class="navigation">
    {% block breadcrumb %}
    {% endblock breadcrumb %}
</nav>


<!-- Then in a subclass template in a sub-directory, e.g. /a/_foo.html -->
{% extends "../_page.html" %}

{% block breadcrumb %}
    {{ super() }}
    <a href="/">Home</a>
{% endblock breadcrumb %}


<!-- And then again, e.g. in /a/b/_bar.html -->
{% extends "../_foo.html" %}

{% block breadcrumb %}
    {{ super() }}
    <a href="../index.html">Back to Recreational Maths</a>
{% endblock breadcrumb %}
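If Jinja's block inheritance is unfamiliar, {{ super() }} behaves much like calling an overridden method in a subclass. This plain-Python analogy (not Jinja itself; the class names simply stand in for the three templates) shows how each level keeps the crumbs from above and appends its own:

```python
class Page:
    """Stands in for /_page.html: the breadcrumb block starts empty."""
    def breadcrumb(self) -> list:
        return []


class Foo(Page):
    """Stands in for /a/_foo.html: keep the parent's crumbs, then add Home."""
    def breadcrumb(self) -> list:
        return super().breadcrumb() + ['<a href="/">Home</a>']


class Bar(Foo):
    """Stands in for /a/b/_bar.html: keep everything above, then add a back link."""
    def breadcrumb(self) -> list:
        return super().breadcrumb() + ['<a href="../index.html">Back to Recreational Maths</a>']


def render_nav(page: Page) -> str:
    """Mimics the <nav> wrapper in the page template."""
    return '<nav class="navigation">' + "".join(page.breadcrumb()) + "</nav>"


print(render_nav(Bar()))
# <nav class="navigation"><a href="/">Home</a><a href="../index.html">Back to Recreational Maths</a></nav>
```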

Manifest and favicons

The Web Application Manifest is a JSON file containing metadata about a web application. Although this site is not a web app as such, it improves user experience to use the manifest to document the location of all the favicons and theme colours.

Favicons appear as the icons in browser URL bars, tabs, and bookmark menus, and also in the “add to home screen” feature of touch-screen devices.

Adding favicons involves:

  • Creating a set of favicons, ensuring the colours are coordinated with the colour theme of the website.
  • Telling browsers where to find all the favicons, noting that some are expected in “standard” locations anyway.

I created a set of favicons using an online favicon generator, using the same primary colours as configured in Bulma. These are all copied into the root (/) directory of the site according to the file structure principles described above.

The manifest then points to these favicons, and is itself put in the root directory as /manifest.json (see here).
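For reference, a manifest along these lines could be generated with a few lines of Python. This is a sketch only: the icon filenames (android-chrome-192x192.png etc.), names, and colours are placeholders rather than this site's actual values, and display is set to "browser" because the site is not an app. The keys themselves (name, short_name, icons, theme_color, background_color, display) are standard Web App Manifest members.

```python
import json


def build_manifest(name, short_name, theme_color, background_color, icons):
    """Return Web App Manifest JSON; `icons` is a list of (src, sizes) pairs of PNGs."""
    manifest = {
        "name": name,
        "short_name": short_name,
        "icons": [{"src": src, "sizes": sizes, "type": "image/png"} for src, sizes in icons],
        "theme_color": theme_color,  # keep in step with the Bulma primary colour
        "background_color": background_color,
        "display": "browser",  # an ordinary website, not a standalone app
    }
    return json.dumps(manifest, indent=2)


print(build_manifest(
    "Example Site", "Example", "#336699", "#ffffff",
    [("/android-chrome-192x192.png", "192x192"), ("/android-chrome-512x512.png", "512x512")],
))
```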

Lastly, the base template (in _base.html) indicates the principal favicons and the location of the manifest:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
    <head>
        ...
        <link rel="apple-touch-icon" sizes="180x180" href="/apple-touch-icon.png">
        <link rel="icon" type="image/png" sizes="32x32" href="/favicon-32x32.png">
        <link rel="icon" type="image/png" sizes="16x16" href="/favicon-16x16.png">
        <link rel="icon" type="image/x-icon" href="/favicon.ico" />
        <link rel="manifest" href="/manifest.json">
        ...
    </head>
    ...
</html>

Icons

I use Font Awesome for icons. I found that loading them from Font Awesome produced rendering lag (the icons flickered as they appeared) and suffered from fragile SRI settings, so I host them myself. To keep things self-contained, the fonts and CSS are all in a top-level /fontawesome/ directory.

The head section in the _base.html template loads the CSS using:

<link rel="stylesheet" type="text/css" href="/fontawesome/css/fontawesome.min.css" integrity="sha384-{{ '/fontawesome/css/fontawesome.min.css' | sha() }}">
<link rel="stylesheet" type="text/css" href="/fontawesome/css/brands.min.css" integrity="sha384-{{ '/fontawesome/css/brands.min.css' | sha() }}">
<link rel="stylesheet" type="text/css" href="/fontawesome/css/regular.min.css" integrity="sha384-{{ '/fontawesome/css/regular.min.css' | sha() }}">
<link rel="stylesheet" type="text/css" href="/fontawesome/css/solid.min.css" integrity="sha384-{{ '/fontawesome/css/solid.min.css' | sha() }}">

These stylesheets fetch the required fonts from /fontawesome/webfonts/.

Colour, styling, and light/dark mode

Colours are both technical and personal. I found these useful to get started:

Then the following helped me experiment with different palettes:

One gotcha I encountered was that there are different variants/standards of RGB.
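A common example of this kind of gotcha (not necessarily the one meant here) is the difference between gamma-encoded sRGB, which is what CSS hex colours are, and linear-light RGB, which calculations such as relative luminance need. A minimal sketch of the standard sRGB decoding:

```python
def srgb_to_linear(channel: int) -> float:
    """Decode one 8-bit sRGB channel (0-255) to linear light (0.0-1.0)."""
    c = channel / 255
    # Piecewise sRGB transfer function: linear near black, power curve elsewhere.
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_colour: str) -> float:
    """Relative luminance of a CSS hex colour such as '#336699'."""
    h = hex_colour.lstrip("#")
    r, g, b = (srgb_to_linear(int(h[i:i + 2], 16)) for i in (0, 2, 4))
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


# A channel value halfway up the 0-255 scale is only about 22% of full linear intensity.
print(round(srgb_to_linear(128), 3))
```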

Bulma has a “customizer” popup on its website which allows colours (and other style aspects) to be tried out before exporting as CSS settings. Because it automatically derives shades, the main task is to decide a Primary colour, a Link colour, and colours for Info, Success, Warning, and Danger.

Bulma also automatically derives and manages the colour variations between light and dark mode. For that to work, one needs to use the “soft” and “bold” colour classes for those elements which should be a function of light/dark mode. For example, I use the has-background-primary-bold-invert and has-text-primary-bold classes for the main page section. See the Bulma docs for details.

Lastly, remember to coordinate the colour choices across the Bulma settings, the manifest, and the favicons.

Draft / wip mode

Given the purpose of this website, many files are constantly being revised and refactored. New content could be excluded from the build until it is completely ready, but a “softer and more organic” approach is to include it with a DRAFT watermark, and to delay linking to such pages or adding them to the sitemap until they are more ready. Such content can then be viewed if you know it is there, but is not readily discovered otherwise; and if it is seen, it is obvious that it is work-in-progress.

To control this watermark, the /_page.html template adds a CSS class if a draft variable is set:

<div class="content {% if draft %}draft-watermark{% endif %}">
  {% block page %}{% endblock page %}
</div>

The corresponding CSS is:

.draft-watermark {
  background-image: url("draft-watermark.svg");
}

and the corresponding draft-watermark.svg contains

<svg xmlns="http://www.w3.org/2000/svg" version="1.1" height="170px" width="210px">
    <text transform="translate(0, 40) rotate(30)" fill="rgba(245,5,5,0.05)" font-size="60px">
        DRAFT
    </text>
</svg>

Hence to mark a page as in-draft/work-in-progress, set the draft variable at the top of the template as follows:

{% set draft = true %}

{% extends "../_page.html" %}

...etc
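Since draft pages should stay out of the sitemap until ready, a small consistency check can catch a draft that has slipped into _sitemap.toml. A sketch, assuming (hypothetically) that each template's source is keyed by the same output name used in the sitemap entries, and that drafts are marked exactly as above:

```python
import re

# Matches {% set draft = true %}, tolerating whitespace-control dashes.
DRAFT_RE = re.compile(r"\{%-?\s*set\s+draft\s*=\s*true\s*-?%\}")


def is_draft(template_source: str) -> bool:
    return DRAFT_RE.search(template_source) is not None


def drafts_in_sitemap(templates: dict, sitemap_names: list) -> list:
    """Sitemap names whose templates are still marked as drafts."""
    listed = set(sitemap_names)
    return sorted(name for name, source in templates.items()
                  if name in listed and is_draft(source))


templates = {  # hypothetical example pages
    "index.html": '{% extends "_page.html" %}',
    "maths/wip.html": '{% set draft = true %}\n{% extends "../_page.html" %}',
}
print(drafts_in_sitemap(templates, ["index.html", "maths/wip.html"]))  # ['maths/wip.html']
```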

XML sitemap and the robots.txt file

To support search engine indexing and SEO, the robots.txt file and related sitemap file (in sitemap.xml) are used to hint to search engines what pages they should index. See Google’s descriptions of robots.txt and sitemaps.

For this site I just use the robots.txt file to point to the sitemap:

Sitemap: {{ SITEURL }}/sitemap.xml

(the SITEURL is set in a template data TOML file).

The sitemap.xml file is also run through Jinja by AWG because it has an .xml extension. Hence it is a template:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    {%- for p in SITEMAP_FILENAMES %}
    <url>
        <loc>{{ SITEURL }}/{{ p.name }}</loc>
        <changefreq>{{ p.change_frequency or "monthly" }}</changefreq>
        {%- if p.last_mod %}
        <lastmod>{{ p.last_mod }}</lastmod>
        {%- endif %}
    </url>
    {%- endfor %}
</urlset>

(Unlike HTML, XML is not automatically tidied by AWG, hence the few “{%-” whitespace control indicators to keep the output neatly indented.)
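To see what the template produces without running AWG, here is an equivalent rendering in plain Python using the standard library's ElementTree. It is a sketch for illustration, not how AWG works; entries take the shape of the SITEMAP_FILENAMES tables in _sitemap.toml (as tomllib would parse them), and the site URL is a placeholder.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(site_url: str, entries: list) -> str:
    """Mirror the sitemap.xml template: one <url> per SITEMAP_FILENAMES entry."""
    ET.register_namespace("", NS)  # emit xmlns="..." rather than an ns0: prefix
    urlset = ET.Element(f"{{{NS}}}urlset")
    for entry in entries:
        url = ET.SubElement(urlset, f"{{{NS}}}url")
        ET.SubElement(url, f"{{{NS}}}loc").text = f"{site_url}/{entry['name']}"
        # Same default as the template: "monthly" when no frequency is given.
        ET.SubElement(url, f"{{{NS}}}changefreq").text = entry.get("change_frequency") or "monthly"
        if entry.get("last_mod"):
            # str() also covers TOML dates, which tomllib parses to datetime.date.
            ET.SubElement(url, f"{{{NS}}}lastmod").text = str(entry["last_mod"])
    ET.indent(urlset, space="    ")  # Python 3.9+
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")


entries = [
    {"name": "index.html", "change_frequency": "monthly"},
    {"name": "welcome/index.html", "change_frequency": "weekly"},
]
print(build_sitemap("https://www.example.com", entries))
```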

The data describing the set of URLs which should be indexed is kept in _sitemap.toml. For example:

[[SITEMAP_FILENAMES]]
name = "index.html"
change_frequency = "monthly"

[[SITEMAP_FILENAMES]]
name = "welcome/index.html"
change_frequency = "weekly"

This isn’t too cumbersome to maintain (e.g. by listing all candidates with ls -1 **/*.html). It is also possible to put the different entries in different .toml files in respective directories.
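The same listing can be turned into starter entries automatically. A sketch (hypothetical helpers, not an AWG feature) that finds .html files, skips underscore-prefixed templates and directories, and prints stanzas to paste into _sitemap.toml and prune by hand:

```python
from pathlib import Path


def sitemap_candidates(site_dir: str) -> list:
    """Relative paths of .html files under site_dir, skipping _-prefixed templates."""
    root = Path(site_dir)
    names = []
    for path in root.rglob("*.html"):
        rel = path.relative_to(root)
        if not any(part.startswith("_") for part in rel.parts):
            names.append(rel.as_posix())
    return sorted(names)


def toml_stanzas(names: list, change_frequency: str = "monthly") -> str:
    """Render [[SITEMAP_FILENAMES]] entries in the format used by _sitemap.toml."""
    return "\n".join(
        f'[[SITEMAP_FILENAMES]]\nname = "{name}"\nchange_frequency = "{change_frequency}"\n'
        for name in names
    )


if Path("docs").is_dir():  # the output directory used in the deployment section
    print(toml_stanzas(sitemap_candidates("docs")))
```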

Validation

Iterating with various (free) validation sites makes it easy to check for correctness and best practice, and to learn about the web world. For example:

Notable issues I could not address include:

  1. The existence of trailing slashes on void elements such as meta and link tags. These cannot be fixed by hand because AWG checks and formats all HTML with HTML Tidy, which produces such trailing slashes. The main concern (see here and here) seems to be that, because HTML5 is not XML and attribute values don’t have to be quoted, there is an ambiguity if an unquoted href is the last attribute and its URL ends with a slash. Compare:
    <link href=https://foo.bar.baz/>
    <link href="https://foo.bar.baz"/>
    <link href="https://foo.bar.baz/">
    
    This isn’t a problem so long as href values are always quoted.

Security

Excellent references on web security can be found on Google’s web.dev and Mozilla’s MDN.

To find security weaknesses and information about how to address them, I follow the findings from MDN’s Observatory tool.

Content Security Policy (CSP)

A Content Security Policy (CSP) instructs browsers to place restrictions on what loaded code can do. This defends against attacks such as cross-site scripting (XSS), in which an attacker finds a way to inject malicious code, and clickjacking.

A related concept is Subresource Integrity (SRI), which makes browsers accept a resource only if it matches the hash contained in the integrity attribute. This attribute is notably available on <script> and <link rel="stylesheet"> tags. So SRI helps to protect against tampering with source files, for example on a compromised CDN.

CSP is normally configured using the Content-Security-Policy HTTP header. Since this is a static site and I don’t control the server’s HTTP headers, I instead use the http-equiv meta tag in every HTML file:

<meta http-equiv="...name of HTTP Header..." content="...HTTP header contents...">

My approach is:

  • To deny everything by default and add specific permissions as needed.
  • To always use SRI, including for local (or 'self') files. Note that AWG provides a Jinja filter to make it easy to generate the hashes (such as sha384) from the source files.
  • To check validity using tools such as CSP Evaluator.
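The hash behind such a filter is simple to compute. Below is a minimal sketch of what it has to do (not AWG's implementation; the function names are made up): take the SHA-384 digest of the file's raw bytes and base64-encode it, giving the part that follows sha384- in both the integrity attribute and the CSP source expression.

```python
import base64
import hashlib
from pathlib import Path


def sri_digest(data: bytes, algorithm: str = "sha384") -> str:
    """Base64-encoded digest: the part of an integrity value after 'sha384-'."""
    return base64.b64encode(hashlib.new(algorithm, data).digest()).decode("ascii")


def sri_digest_file(path: str, algorithm: str = "sha384") -> str:
    """Hash the file's exact bytes, so line endings and trailing newlines matter."""
    return sri_digest(Path(path).read_bytes(), algorithm)


print("sha384-" + sri_digest(b"console.log('hello');\n"))
```

The same value can be obtained on the command line with openssl dgst -sha384 -binary main.css | openssl base64 -A, which is handy for cross-checking.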

In annotated outline, the CSP is as follows:

upgrade-insecure-requests;                   <-- Instruct browser to switch site HTTP urls to HTTPS
default-src 'none';                          <-- Default fallback is deny
require-trusted-types-for 'script';          <-- See link below
base-uri 'self';                             <-- Don't allow the base URL to change from self
img-src 'self';                              <-- Only allow images served up from self
manifest-src 'self';                         <-- Only allow manifest served up from self
script-src-elem
    'strict-dynamic'                         <-- Scripts trusted by hash may load further scripts
    'sha384-kri+HXDJ8qm2+...'                <-- Trust scripts with the following hashes
    ...etc
    ;
connect-src
    'self'                                   <-- Allow connections to self e.g. for websockets (used by hot reloader)
    ;
font-src
    'self'                                   <-- Allow fonts from self, e.g. Fontawesome.
    https://cdn.jsdelivr.net                 <-- Allow fonts (e.g. for Katex) from jsDelivr CDN
    ;
style-src
    'self'                                   <-- Allow loading of CSS files from self
    https://cdn.jsdelivr.net                 <-- Allow CSS files from jsDelivr CDN
    'sha384-vpayKGwduWhgY...'                <-- Permit CSS with following hashes (does not seem to do anything!)
    ...etc
    ;

(Link for require-trusted-types-for.)
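If maintaining the policy as one long string becomes error-prone, it could be generated from structured data instead. A hypothetical helper (not an AWG feature) that joins directives in the same way as the outline above:

```python
def build_csp(directives: dict) -> str:
    """Serialise {directive: [source, ...]} into one policy string."""
    parts = []
    for name, sources in directives.items():
        for source in sources:
            # ';' separates directives and ',' separates policies, so neither may
            # appear inside a source expression.
            if ";" in source or "," in source:
                raise ValueError(f"invalid source expression for {name}: {source!r}")
        parts.append(" ".join([name, *sources]))
    return "; ".join(parts) + ";"


policy = build_csp({
    "upgrade-insecure-requests": [],
    "default-src": ["'none'"],
    "img-src": ["'self'"],
    "script-src-elem": ["'strict-dynamic'", "'sha384-...'"],
})
print(policy)
```

The resulting string would then go in the content attribute of the Content-Security-Policy meta tag.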

I discovered a few helpful things along the way:

  • Safari does not read style-src-elem, but allows it to exist, whereas Chrome does read it. Hence I use style-src.
  • The fallback from, say, style-src-elem to style-src to default-src does not mean “keep trying until one passes”; rather, the browser uses the most specific directive provided. If that directive does not allow the resource, the load is denied.
  • The style-src directive does not do anything with hashes for linked stylesheets: it neither checks the hashes nor complains if they are present. This could be about CSP level 2 vs level 3. See also here. I’ve kept the hashes in because I believe it should work like this, and doing so appears harmless.
  • If script hashes are provided in script-src then SRI must also be used (i.e. the integrity attribute should exist and contain the hash).
  • For both CSS and javascript, if the SRI is present (using the integrity attribute), then it is checked and must pass. Hence independently of CSP, SRI seems uniformly implemented.
  • Test on different browsers, because (a) they may behave differently, and (b) when things don’t work they give different diagnostic information (some more helpful than others).
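The fallback rule above can be expressed in a few lines. This sketch covers only a handful of directives, with chains following CSP Level 3 (e.g. style-src-elem falls back to style-src, then default-src). It models which directive a browser consults, not how sources within it are matched:

```python
from typing import Optional

# Most specific directive first, per CSP Level 3 fallback lists (subset only).
FALLBACKS = {
    "script-src-elem": ["script-src-elem", "script-src", "default-src"],
    "style-src-elem": ["style-src-elem", "style-src", "default-src"],
    "font-src": ["font-src", "default-src"],
    "img-src": ["img-src", "default-src"],
}


def effective_directive(policy: dict, directive: str) -> Optional[str]:
    """Return the one directive the browser consults: the first present in the chain."""
    for candidate in FALLBACKS[directive]:
        if candidate in policy:
            # No further fallback: if this directive denies the load, it stays denied.
            return candidate
    return None  # nothing in the chain is present, so this policy does not restrict it


policy = {"default-src": ["'none'"], "style-src": ["'self'", "https://cdn.jsdelivr.net"]}
print(effective_directive(policy, "style-src-elem"))  # style-src
print(effective_directive(policy, "font-src"))        # default-src
```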

In practice, using template data reduces maintenance overhead and helps document what is going on and where things are from. For example:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
    <head>
        ...
        <meta http-equiv="Content-Security-Policy" content="
            ...
            script-src-elem
                'strict-dynamic'
                '{{ KATEX_JS_SHA }}'
                '{{ HIGHLIGHT_JS_SHA }}'
                'sha384-{{ '/hot-reloader.js' | sha() }}'
                'sha384-{{ '/render-maths.js' | sha() }}'
                'sha384-{{ '/render-code.js' | sha() }}'
                ;
            ...
            style-src
                'self'
                https://cdn.jsdelivr.net
                '{{ BULMA_CSS_SHA }}'
                '{{ KATEX_CSS_SHA }}'
                '{{ GRUVBOX_CSS_SHA }}'
                'sha384-{{ '/main.css' | sha() }}'
                ;
        ">
        ...

        <link rel="stylesheet" type="text/css" href="{{ KATEX_CSS }}" integrity="{{ KATEX_CSS_SHA }}" crossorigin="anonymous">
        ...
        <script defer src="{{ KATEX_JS }}" integrity="{{ KATEX_JS_SHA }}" crossorigin="anonymous"></script>
        ...
        <script defer src="/hot-reloader.js" integrity="sha384-{{ '/hot-reloader.js' | sha() }}"></script>
        ...
    </head>
    ...
</html>

Note:

  • The crossorigin="anonymous" attribute on the <link> and <script> tags makes the browser fetch external resources as CORS requests without sending user credentials such as cookies (SRI checks on cross-origin resources require CORS) - see here.
  • The sha() Jinja filter provided by AWG is used to statically compute hashes of local content. It is used in two places for each file: the CSP header and the SRI integrity attribute.
  • Jinja variables help show meaning and aid re-use. They are kept in a TOML file, e.g. as follows:
# Maths
KATEX_CSS = "https://cdn.jsdelivr.net/npm/katex@0.16.22/dist/katex.min.css"
KATEX_CSS_SHA = "sha384-5TcZemv2l/9On385z///+d7MSYlvIEw9FuZTIdZ14vJLqWphw7e7ZPuOiCHJcFCP"
KATEX_JS = "https://cdn.jsdelivr.net/npm/katex@0.16.22/dist/katex.min.js"
KATEX_JS_SHA = "sha384-cMkvdD8LoxVzGF/RPUKAcvmm49FQ0oxwDF3BGKtDXcEc+T1b2N+teh/OJfpU0jr6"

# highlight.js and its theme
HIGHLIGHT_JS = "https://cdn.jsdelivr.net/gh/highlightjs/cdn-release@11.9.0/build/highlight.min.js"
HIGHLIGHT_JS_SHA = "sha384-F/bZzf7p3Joyp5psL90p/p89AZJsndkSoGwRpXcZhleCWhd8SnRuoYo4d0yirjJp"
GRUVBOX_CSS = "https://cdn.jsdelivr.net/gh/highlightjs/cdn-release@11.9.0/build/styles/base16/gruvbox-light-hard.min.css"
GRUVBOX_CSS_SHA = "sha384-vpayKGwduWhgY00faoPtbmJwz8TjOLnnDuqvy+xWy2DWuIVxIt0dxj0mjrMVPxdd"

# Framework
BULMA_CSS = "https://cdn.jsdelivr.net/npm/bulma@1.0.2/css/bulma.min.css"
BULMA_CSS_SHA = "sha384-tl5h4XuWmVzPeVWU0x8bx0j/5iMwCBduLEgZ+2lH4Wjda+4+q3mpCww74dgAB3OX"
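When bumping a CDN version in this file, it is worth checking that the pinned hash matches the downloaded bytes before trusting it. Below is a simplified sketch of the browser-side check (a hypothetical helper): it follows the SRI rule of comparing only against the strongest algorithm listed, and drops any ?options suffix.

```python
import base64
import hashlib

STRENGTH = {"sha256": 1, "sha384": 2, "sha512": 3}


def matches_integrity(data: bytes, integrity: str) -> bool:
    """Check `data` against an integrity value such as 'sha384-<base64>'."""
    tokens = []
    for token in integrity.split():
        algorithm, _, value = token.partition("-")
        if algorithm in STRENGTH:  # unknown algorithms are ignored
            tokens.append((algorithm, value.split("?")[0]))
    if not tokens:
        return True  # no usable metadata, so SRI does not block the load
    strongest = max(STRENGTH[algorithm] for algorithm, _ in tokens)
    for algorithm, expected in tokens:
        if STRENGTH[algorithm] == strongest:
            actual = base64.b64encode(hashlib.new(algorithm, data).digest()).decode("ascii")
            if actual == expected:
                return True
    return False
```

For example, after downloading the file at KATEX_JS, its bytes can be passed in together with the KATEX_JS_SHA string.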

Strict Transport Security (HSTS)

This site can use HTTPS throughout. To help prevent manipulator-in-the-middle (MiTM) attacks, the Strict-Transport-Security HTTP Header should be set, together with the upgrade-insecure-requests directive in the CSP.

Hence the following are added in the _base.html template to the <head> tag.

<meta http-equiv="Upgrade-Insecure-Requests" content="1" />
<meta
  http-equiv="Strict-Transport-Security"
  content="max-age=63072000; includeSubDomains"
/>

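(max-age is set to the recommended 2 years. Note, however, that RFC 6797 instructs browsers to ignore Strict-Transport-Security when it is delivered in a <meta> tag, so this setting only takes effect if the server sends it as a real HTTP header.)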
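Since max-age is given in seconds, the value works out (taking a year as 365 days) as:

```latex
2 \times 365 \times 24 \times 60 \times 60 = 63\,072\,000 \ \text{seconds}
```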

The upgrade-insecure-requests CSP directive is explained in the Content Security Policy section above.

I’ve also configured GitHub Pages to only serve HTTPS.

NB: The presence of these security settings means that AWG must be run in HTTPS mode.

Deny embedding

Clickjacking relies on embedding one site inside another. Ideally this would be prevented by setting the CSP frame-ancestors directive and the X-Frame-Options header. See here for details. Unfortunately, neither works via http-equiv, and I don’t have control over the server’s HTTP headers.

Referrer policy

To stop leaking information about where outbound links are coming from (see here), I set the referrer policy for the whole document using the referrer <meta> tag, the HTML counterpart of the Referrer-Policy HTTP header:

...
<meta name="referrer" content="no-referrer">
...

MIME types

To instruct browsers not to load scripts and stylesheets unless the server indicates the correct MIME type, I set the X-Content-Type-Options header to nosniff using a <meta> tag, as explained here:

...
<meta http-equiv="X-Content-Type-Options" content="nosniff">
...

Deployment on GitHub pages

It is easy and convenient to host static content on GitHub pages.

One can publish either from the root directory of a chosen branch or from a directory called docs/ on that branch. It would be nice to be able to use a different directory name, but so be it. I just use the docs/ directory on the master branch.

A custom domain can be used by creating a CNAME file containing the full domain (in my case, www.corbettclark.com).

The default GitHub action detects new commits pushed to the publishing branch and deploys them on GitHub’s infrastructure, making the result visible within a couple of minutes (often faster).

As I’m the only person making changes, I mostly dispense with creating a branch and making a pull request to myself (GitHub flow), but instead just make a number of meaningful commits locally. Then when ready to publish, I git push to GitHub. In short, my workflow is:

  • Start up AWG with ./awg.py content/ docs/ --certfile localhost.pem --keyfile localhost-key.pem
  • Repeat until ready to publish:
    • Make changes and check in local browser without leaving my editor (because of hot reload).
    • Commit locally using git.
  • Git push to GitHub.
  • After a minute or so, check that the changes are live.