Preparing WordPress


MakeStaticSite arose out of a specific need to create a portable distribution of a WordPress site. The original intention was simply to use Wget with certain options to generate the site. However, with WordPress’s default settings, Wget saves multiple index pages at the root, each one with a post ID appended as a query string. For example, an early run for this site generated:

Screenshot from Mac Finder showing output from makestaticsite.sh on a site without any preparation: a long list of index.html files with query-string suffixes corresponding to WordPress post IDs.

Furthermore, WordPress typically appends version numbers as query strings to the URLs of CSS and JavaScript files, so the theme’s stylesheet would be saved as, say, style.css?ver=5.9.1.css. This trait arises because WordPress, as a blogging platform, tries to ensure that browsers display up-to-date content by fetching the latest version of each component rather than a stale cached copy. Thus core functions, such as those for including stylesheets, take version numbers as parameters. Such a versioning strategy can be useful for sites that are cached, but the version numbers are unnecessary, and even a hindrance, for MakeStaticSite, whose output is designed for offline usage and doesn’t need to be cached.
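To make the effect concrete, the following is a minimal POSIX shell sketch of renaming such files after a Wget run, assuming they were saved with names of the form name.css?ver=x.y.z.css or name.js?ver=x.y.z.js. The function name strip_ver_suffixes is hypothetical and this is not how MakeStaticSite itself handles the issue; the HTML pages that refer to these files would also need their links rewritten, which is not shown.

```shell
# Hypothetical helper: rename files saved with a version query string,
# e.g. "style.css?ver=5.9.1.css" -> "style.css".
# $1 is the local directory holding the Wget output (an assumption).
strip_ver_suffixes() {
  # "[?]" matches a literal question mark in find's -name patterns.
  find "$1" -type f \( -name '*.css[?]ver=*' -o -name '*.js[?]ver=*' \) \
    -exec sh -c '
      for path do
        base=${path##*/}                  # file name without directory
        clean=${path%/*}/${base%%\?*}     # cut the name at the first "?"
        # Do not overwrite a file that already has the clean name.
        [ -e "$clean" ] || mv -- "$path" "$clean"
      done
    ' sh {} +
}
```

For example, strip_ver_suffixes ./output/example.com would process a mirror in that (hypothetical) directory. Cutting at the first "?" in the base name, rather than in the whole path, keeps any unusual directory names intact.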

These are some of the well-known issues for static site creation, and the preferred approach to dealing with them is to modify the WordPress database. The modifications can be carried out in various ways, including plugins and theme modifications. It’s generally straightforward to prune a site manually this way using mature tools such as WPCode. On the other hand, the scripting context of MakeStaticSite prompts the use of WP-CLI, a command-line interface that can make such modifications without having to effect them manually through the WordPress dashboard. So, to assist with some of the changes, we use WP-CLI to install a suitable plugin, currently Perform.

As these are WordPress-specific changes, they are all contained in a dedicated library file, mod-wp.sh. They are enabled through the configuration option wp_cli, which can be set during setup and changed manually by editing the respective .cfg file.
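As an illustration of how a script might honour this option, here is a small sketch that reads wp_cli from a .cfg file. It assumes the file holds one option=value assignment per line; the function names cfg_option and wp_cli_enabled are hypothetical and not part of MakeStaticSite.

```shell
# Hypothetical helper: print the value of option $2 from .cfg file $1,
# assuming lines of the form "option=value". The last occurrence wins.
cfg_option() {
  sed -n "s/^$2=//p" "$1" | tail -n 1
}

# Succeeds (exit status 0) only when the file sets wp_cli=yes.
wp_cli_enabled() {
  [ "$(cfg_option "$1" wp_cli)" = "yes" ]
}
```

A calling script could then guard the WordPress-specific steps with if wp_cli_enabled mysite.cfg; then ...; fi, where mysite.cfg is a placeholder name.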

We’ll now run through what these modifications are.

Pre-run modifications

There are currently four changes that are considered generally necessary for MakeStaticSite to produce a clean static version. They can all be made through WP-CLI ahead of running Wget provided that the user account that runs wp commands has sufficient permissions to the WordPress installation.

  1. Allow robots to crawl the site: we need WordPress to not restrict robots from browsing the site, nor insert a no-follow option in anchors.
    This is determined in the wp_options table by setting option_value=1 where option_name is blog_public.
    From the dashboard, select Settings: Reading and then make sure that the Search engine visibility option is unchecked.
    Implemented in wp_clean() by a wp-cli command, wp option update blog_public, which carries out a direct update of the WordPress database.
  2. Ensure the permalink structure uses post names (and so prevent the use of shortlinks).
    This is determined in the wp_options table by setting option_value to /%postname%/ (if it doesn’t already contain that string) where option_name is permalink_structure.
    From the dashboard, select Settings: Permalinks and then choose Custom Structure and enter a URL portion that includes /%postname%/, such as /blog/%postname%/.
    Implemented in wp_clean() by a wp-cli command, wp option update permalink_structure, which carries out a direct update of the WordPress database.
  3. Remove shortlinks, which in the WordPress context are URLs that end with ?p=nnn, where nnn is a post ID. They are typically included in the head of a document, e.g.
    <link rel='shortlink' href='https://domain.com?p=123' />
    Not only are they superfluous within a document, but they also lead to the generation of excess files, as illustrated for index pages above. They need to be deleted.
    From the dashboard, a plugin needs to be installed and a specific option configured. For example, after installing Perform, select Settings: Perform: General and check the box by Remove Shortlink.
    Implemented in wp_clean() by a wp-cli command, wp option update perform_common, which carries out an update of the WordPress database and then the plugin makes the modification.
  4. Remove query strings that are appended to the URLs of static (CSS and JavaScript) resources, such as style.css?ver=1.2.3.
    From the dashboard, a plugin needs to be installed and a specific option configured. For example, after installing Perform, select Settings: Perform: General and check the box by Remove Query Strings.
    Implemented in wp_clean() by a wp-cli command, wp option update perform_common, which carries out an update of the WordPress database and then the plugin makes the modification.
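The WP-CLI side of these changes can be sketched as follows. This is a minimal illustration using standard WP-CLI commands, not the actual wp_clean() function in mod-wp.sh. It assumes wp is on the PATH and is run with access to the WordPress installation (for example from its root directory, or with --path). The Perform plugin slug is assumed to be perform, and the settings that mod-wp.sh writes into the perform_common option are deliberately not reproduced.

```shell
# Sketch only (hypothetical function name): apply the pre-run changes
# with standard WP-CLI commands against the current WordPress install.
wp_prepare_sketch() {
  # 1. Allow robots to crawl the site (blog_public=1).
  wp option update blog_public 1

  # 2. Use post names in permalinks unless /%postname%/ is already present.
  current=$(wp option get permalink_structure)
  case $current in
    */%postname%/*) ;;   # already suitable: leave unchanged
    *)
      wp option update permalink_structure '/%postname%/'
      wp rewrite flush     # regenerate rewrite rules for the new structure
      ;;
  esac

  # 3 and 4. Install and activate Perform (slug assumed to be "perform").
  # Its Remove Shortlink / Remove Query Strings settings live in the
  # perform_common option; inspect them with:
  #   wp option get perform_common --format=json
  wp plugin install perform --activate
}
```

Checking the existing structure first means a custom structure such as /blog/%postname%/ is left alone, in line with step 2 above.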

Further changes that are worth considering include the removal of any APIs, particularly XML-RPC.

One change that is not necessary is to remove the meta tag that specifies the WordPress version – in terms of security, it matters very little for a static mirror whether the original was running a new or old version (assuming the original is itself being kept secure).

Options

In addition to the preceding changes, further changes can be made, depending on the requirements.

One specific requirement was to support offline search (enabled with the configuration option add_search=yes). Because of browser security restrictions on access to the local file system, scripts in a page opened from disk cannot in general load and read other files. However, the site itself comprises files, including JavaScript files, and these are loaded and read by the browser whenever a page references them.

Thus, a strategy has been adopted to use a plugin that generates a search index and incorporate that index within an (existing) JavaScript file. It makes the index accessible to scripting through defining a variable that contains a serialisation of that index. It has the major benefit of not requiring the http: protocol to deliver results as it’s only fetching data from the scripts themselves, which get stored in the browser’s cache. Furthermore, it can act as a drop-in replacement for the search facility provided by WordPress.
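The idea of exposing the index through a variable can be illustrated with a short shell sketch that appends a serialised (JSON) index to an existing script file. It only demonstrates the technique; in practice the plugin does this itself, and the variable name searchIndex and the function name index_to_js are hypothetical.

```shell
# Illustration of the technique (hypothetical names): append a JSON
# search index from file $1 to the existing JavaScript file $2 as a
# variable, so any page that loads $2 with <script src=...> can read it,
# even when the page is opened via a file: URL.
index_to_js() {
  {
    printf 'var searchIndex = '
    cat "$1"
    printf ';\n'
  } >> "$2"
}
```

If the JSON were a serialised Lunr index, page scripts could then rebuild it with lunr.Index.load(searchIndex) without any network request.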

The foundation of the solution is the WP Static Search plugin, based on Lunr.js, though in its original form it depends on workers (i.e. requires http:) and has a few bugs. We have taken the repository version and extended it; and the version invoked by MakeStaticSite (and defined in constants.sh) is tweaked a bit more. Its long-term future is currently undetermined, though a new project should probably be created, perhaps with the name WP Offline Search to indicate that it works without needing an internet connection. The only manual intervention required is to build the index from the dashboard.

It’s a rudimentary approach, which means that such files can become large even with relatively small sites. Hence, they are slow to download the first time. However, once downloaded, search works well enough, and it is easy to incorporate in a distribution: just set the configuration option upload_zip=yes.

Secure protocol (force SSL)

MakeStaticSite supports basic access restrictions for web applications in the form of directory privacy and logins with session cookies. For such access, it is expected that the login process is secured with SSL/TLS. Also, all subsequent requests should be under SSL. Not only is this considered general good practice, but technically it helps prevent having to re-authenticate when switching between https and http, which is tricky to manage.

It is strongly recommended that all internal links are either relative or use https. This can be achieved with the aid of a plugin such as Really Simple SSL.
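A quick pre-run check along these lines can be scripted with WP-CLI. The sketch below (hypothetical function name check_https_urls) only inspects the standard home and siteurl options; it does not examine links inside post content.

```shell
# Sketch (not part of MakeStaticSite): warn if WordPress's own site
# addresses, held in the standard "home" and "siteurl" options, are not
# https. Returns non-zero if either one uses another scheme.
check_https_urls() {
  status=0
  for opt in home siteurl; do
    url=$(wp option get "$opt")
    case $url in
      https://*) ;;
      *) echo "warning: $opt is $url (not https)" >&2; status=1 ;;
    esac
  done
  return $status
}
```

Links embedded in content could then be previewed with WP-CLI's search-replace command in dry-run mode, e.g. wp search-replace 'http://example.com' 'https://example.com' --dry-run, with example.com standing in for the real domain.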

What if I don’t have an account on the WordPress site?

It gets more complicated. Nevertheless, the proliferation of index.html files arising from shortlinks can be prevented by instructing Wget not to follow links with query strings appended, using --reject 'index.html?*'. Since version 0.24.1, this has been incorporated as a default setting for the constant wget_extra_options. It seems that WordPress generates the query strings only in subsequent requests for a page, to distinguish each access. Also, MakeStaticSite was extended in version 0.25 to support pruning query strings from JS and CSS URLs.
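Whichever route is taken, it can help to audit the output afterwards. The sketch below (hypothetical, not a MakeStaticSite feature) counts the kinds of leftovers discussed on this page: index pages saved from shortlinks, CSS/JS files still carrying query strings, and pages that still contain a shortlink element. It assumes Wget kept the "?" literally in file names (e.g. index.html?p=123.html), as it does with Unix-style file name restrictions.

```shell
# Hypothetical post-run audit: print counts of leftovers in mirror dir $1.
audit_mirror() {
  # index pages saved from shortlinks, e.g. "index.html?p=123.html"
  n_index=$(find "$1" -type f -name 'index.html[?]*' | wc -l | tr -d ' ')
  # CSS/JS files whose names still carry a query string
  n_ver=$(find "$1" -type f \( -name '*.css[?]*' -o -name '*.js[?]*' \) | wc -l | tr -d ' ')
  # files that still contain a <link rel='shortlink' ...> element
  n_short=$(grep -rl "rel=['\"]shortlink['\"]" "$1" | wc -l | tr -d ' ')
  echo "index_query=$n_index versioned=$n_ver shortlinks=$n_short"
}
```

Non-zero counts suggest that one of the preparation steps above was not applied before the run.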

It is possible, at least in theory, to further extend MakeStaticSite to apply various filters through post-Wget scripting, but it would be inefficient. It’s a consideration for further work.

This page was published on 11 November 2022 and last updated on 21 April 2023.