We summarise each setup phase below.
Basic retrieval settings
The first question asks for the URL of the source (url_base) and offers a stock sample of http://example.com. Once you enter the URL, the setup script checks connectivity. There are three cases, which determine how to answer the next question, ‘Does the site require a login (y/n)?’:
- ‘Connection established OK’ indicates that access is available and no login is required, so answer ‘n’.
- ‘WARNING: redirect detected’ followed by a code 301 or 302 is typically given when a login is required. If a login is indeed required, answer ‘y’.
- ‘WARNING: unauthorised’ followed by code 401 means that you need to enter a valid username and password to access the Web directory (HTTP authentication). This is not related to the web application itself, so answer ‘n’ unless the website also has its own login.
If you’ve indicated that the site requires a login, you’ll be asked to enter a username and password. These credentials are not checked at this stage, so you can enter dummy values and subsequently edit the configuration file with the correct ones.
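The three cases above can be sketched as a small shell function. This is a hypothetical helper for illustration, not code from MakeStaticSite; in particular, treating ‘Connection established OK’ as HTTP status 200 is an assumption.

```shell
# Hypothetical helper (not part of MakeStaticSite): map an HTTP status code
# to the answer suggested above for 'Does the site require a login (y/n)?'.
suggest_login_answer() {
  case "$1" in
    200)     echo "n" ;;        # assumed to correspond to 'Connection established OK'
    301|302) echo "y" ;;        # redirect: typically a login page, but confirm first
    401)     echo "n" ;;        # HTTP authentication on the directory, not a site login
    *)       echo "unknown" ;;  # anything else: inspect the site manually
  esac
}

# Example of obtaining a status code with curl (requires network access):
#   status=$(curl -s -o /dev/null -w '%{http_code}' "http://example.com")
#   suggest_login_answer "$status"
```

The suggestion for 301/302 is only a starting point: a redirect can also come from a change of scheme or hostname, so check what the site actually does before answering ‘y’.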
The next few questions are designed to elicit precise options for Wget, the program that creates the static mirror. Most of these custom options should be specified in wget_extra_options. For an optimal capture, with maximal coverage of content, different sites will require different sets of options, so some experimentation may be needed: examine the output in the respective mirror and run the setup script as many times as necessary. Note that the options specified here make no assumptions about settings in Wget’s startup file; if there is unexpected behaviour, check any wgetrc files.
Further questions ask about post-processing. Wget might not be able to make a complete copy in a single pass, so for a more thorough capture, set wget_extra_urls and wget_post_processing to ‘y’. The final two Wget-related options ask whether to create a fresh timestamped copy on each run (archive) and what folder name to give it (local_sitename).
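To see how wget_extra_options might combine with a base set of mirroring flags, here is a minimal sketch that only prints a command line. The base flags (--mirror, --page-requisites, --convert-links, --adjust-extension) are standard Wget options chosen for illustration; they are an assumption and not necessarily what MakeStaticSite passes. The function name build_wget_command is hypothetical.

```shell
# Sample values: the URL is the stock sample; the extra options are taken
# from the sample configuration file.
url="http://example.com"
wget_extra_options="-X/wp-json --reject wp-admin,xmlrpc* --limit-rate=100k"

# Hypothetical helper: print (do not run) a Wget command line.
# $1 = URL, $2 = extra options. The base flags are illustrative assumptions.
build_wget_command() {
  printf 'wget --mirror --page-requisites --convert-links --adjust-extension %s %s\n' \
    "$2" "$1"
}

build_wget_command "$url" "$wget_extra_options"
```

Printing the command first makes it easy to compare variants while experimenting. If you paste the result into a shell, quote patterns such as xmlrpc* so that the shell does not expand them.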
Preparing the source
Database-driven web sites authored in content management systems tend to have features that add complexity when it comes to creating static versions. So, the next set of questions asks whether to allow MakeStaticSite to modify the source to make it more amenable to mirroring. These changes generally only need to be made once and will apply to all static sites that are generated subsequently (unless the changes are reverted).
Whenever modifications are to be made, care is needed and it is strongly recommended that you carry out a backup of the affected database and files beforehand. If in doubt, then these changes can be carried out manually through the respective CMS.
At present, these modifications are only available for WordPress sites and make use of WP-CLI. Setting these options (explained in more depth elsewhere) will modify the database to avoid generating excess URLs, such as those with query strings corresponding to post IDs. If you are accessing a host remotely, whether you can use this functionality depends on whether WP-CLI is available from your hosting provider. With shared hosting, you may find that it is installed, but that the commands you can run are limited. Again, this may require experimentation and examination of the output.
If you have WordPress and WP-CLI installed, and trust MakeStaticSite to make changes to your database, then set wp_cli=y, i.e., answer ‘y’ to the question, ‘Use WP-CLI to carry out tweaks on WordPress database (y/n)?’ You will be asked whether to run WP-CLI on your machine or remotely (i.e., on the hosting provider). In the latter case, you will need to enter connection details (source_host, source_protocol, source_port, source_user). On the other hand, if you don’t have these components or would rather make adjustments through the WordPress dashboard, then answer ‘n’.
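As an illustration of the local versus remote choice, and of the backup recommended earlier, the sketch below prints a WP-CLI database export command for either case. It is not taken from MakeStaticSite: the helper name wp_backup_command and the file name backup.sql are hypothetical, and it assumes the remote case uses WP-CLI's global --ssh parameter with the connection details entered during setup.

```shell
# Hypothetical helper: print (do not run) a WP-CLI command that exports the
# WordPress database, for a local or a remote installation.
# Arguments: wp_cli_remote (y/n), site_path, source_user, source_host, source_port
wp_backup_command() {
  wp_cli_remote=$1
  site_path=$2
  source_user=${3:-}
  source_host=${4:-}
  source_port=${5:-}
  if [ "$wp_cli_remote" = "y" ]; then
    # --ssh is a WP-CLI global parameter of the form [<user>@]<host>[:<port>][<path>]
    echo "wp --ssh=${source_user}@${source_host}:${source_port}${site_path} db export backup.sql"
  else
    # --path points WP-CLI at the local WordPress directory
    echo "wp --path=${site_path} db export backup.sql"
  fi
}

wp_backup_command n /var/webs/mss.Localweb
wp_backup_command y /var/webs/mss.Localweb username example.com 22
```

On shared hosting, running the printed command by hand is a quick way to find out whether WP-CLI is installed and whether the db commands are permitted.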
One particular feature of setting wp_cli=y is the option to install a static search plugin, a modified version of WP Static Search. This search facility is ideal for distributing zip archives of the site; the site will remain searchable when unpacked, for example, onto a memory stick.
Additional features and deployment
In addition to running and re-running Wget to create a static mirror, there are further options to augment and refine the site. One of these is snippets, which enables you to tweak certain portions of content, perhaps to hide a menu or to provide slightly different behaviour from that experienced in a dynamic site.
As alluded to above, there is an option to create a zip file that can be added to the site. We’ve opted to do so here, so visitors can download their own copy to view offline at their convenience. Also, the option of further post-processing with HTML Tidy can help improve the quality of the generated HTML and make it more standards-compliant.
The final set of options determines how the static site is to be deployed, whether locally (on the same machine as MakeStaticSite) or remotely. For remote deployment, rsync may be used over ssh, for which it is recommended that a public/private key pair be created. Another option is to deploy to a Netlify CDN.
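For the rsync-over-ssh case, a deployment command might look like the one printed by the sketch below. This is an assumption for illustration, not the command MakeStaticSite itself issues; the helper name deploy_command and the local directory mirror/mssweb are hypothetical, while the remaining values come from the deploy_* options in the sample configuration.

```shell
# A key pair can be created and installed on the server beforehand, e.g.:
#   ssh-keygen -t ed25519
#   ssh-copy-id -p 22 username@example.com

# Hypothetical helper: print (do not run) an rsync command for remote deployment.
# Arguments: local mirror directory, deploy_user, deploy_host, deploy_port, deploy_path
deploy_command() {
  mirror_dir=$1
  deploy_user=$2
  deploy_host=$3
  deploy_port=$4
  deploy_path=$5
  # -a preserves file attributes, -z compresses in transit, -e selects the remote shell.
  # The trailing slash on the source copies the directory's contents, not the directory.
  echo "rsync -az -e 'ssh -p ${deploy_port}' ${mirror_dir}/ ${deploy_user}@${deploy_host}:${deploy_path}/"
}

deploy_command "mirror/mssweb" "username" "example.com" 22 "~/mssweb"
```

Adding rsync's --dry-run flag to the printed command lists what would be transferred without changing anything on the server.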
Creation of configuration file
Once you have entered all your configuration preferences, the setup will invite you to write them to a .cfg file. Such a file is needed for the main script to work. It’s possible to compose it manually, but it’s recommended to generate the first version through the setup script and then modify it manually later, as necessary.
Furthermore, you have the option of going on to run the main script straight away, based on the newly entered configuration. There’s no need to do this; you can run makestaticsite.sh at any time once the setup is complete.
MakeStaticSite .cfg file
As a sample, here is a slightly edited copy of the .cfg file used in the early days of this website:
################################################
# Configuration file for makestaticsite.sh
# generated by setup.sh
# Created: 2022-11-24 14:56:36
# Last modified:
################################################
url=https://mss.Localweb
require_login=n
wget_cmd=wget
ssl_checks=n
wget_extra_options=-X/wp-json --reject wp-admin,xmlrpc* --limit-rate=100k
input_urls_file=
wget_extra_urls=y
wget_post_processing=y
archive=y
local_sitename=mssweb
wp_cli=y
wp_cli_remote=n
site_path=/var/webs/mss.Localweb
wp_helper_plugins=y
add_search=y
wp_search_dir=wp-static-search
use_snippets=n
upload_zip=y
zip_filename=makestaticsite_web.zip
zip_uploads_folder=download
deploy=y
deploy_remote=y
deploy_host=example.com
deploy_port=22
deploy_user=username
deploy_path=~/mssweb
deploy_domain=makestaticsite.sh
htmltidy=y
htmltidy_cmd=tidy
htmltidy_options=-m -q -indent --indent-spaces 2 --show-filename yes --tidy-mark no
add_extras=n
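
When editing such a file by hand, it can help to check what value a given option actually holds. The sketch below is a hypothetical helper, not part of MakeStaticSite, and makes no claim about how the main script reads its configuration. It extracts the value with sed instead of sourcing the file, because unquoted values containing spaces, such as wget_extra_options, would not survive being sourced by the shell.

```shell
# Hypothetical helper: print the value of one option from a key=value file.
# $1 = option name, $2 = path to the .cfg file.
# Everything after the first '=' on the matching line is treated as the value.
cfg_get() {
  sed -n "s/^$1=//p" "$2" | head -n 1
}

# Example usage, assuming a file called mysite.cfg in the current directory:
#   cfg_get local_sitename mysite.cfg
#   cfg_get wget_extra_options mysite.cfg
```

Comment lines beginning with # are ignored because they never match an option name at the start of the line.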