Examples


These examples walk through how to set up MakeStaticSite for different use cases. The first concentrates on an ongoing project, illustrating in detail the respective options for run levels 0, 1 and 2, while the second shows the potential for archival, highlighting the support for capturing part of a site. There is also an example of running the setup in non-interactive mode.

This section ends with a brief conclusion.

Example 1: MakeStaticSite project website

That’s the website you are now viewing! The home page provides an animation, showing what happens as the site is built. It is based on the supplied configuration, generated by running the setup script.

All the content authoring is carried out on a database-driven site running the latest version of WordPress. We use MakeStaticSite to convert it to a set of static pages that retains the structure and the look and feel (so that a CMS detector may still think it’s WordPress).

We proceed to work through the three levels, gradually elaborating on particular questions that are raised. Each setup starts with:

./setup.sh

Then we answer 0, 1 or 2 to the first question on runtime level. The explanations given below for our answers vary by level, usually giving successively more detail or rationale.

Level 0: Quick Test

An initial run with minimal options gives a quick impression of what the site can look like.

Please enter the value for url: https://mss.localweb/

The site is built on a personal computer, so it has been assigned a made-up local domain that is distinct from the top-level domains on the Internet.

Connectivity will be checked and if the site is reachable, then that’s it as far as input is concerned! A preview of the generated configuration file will be displayed with a number of auto-generated options.

Do you wish to write this configuration to a file (y/n)? y

The resulting configuration file should be what we need and, if not, we can edit it manually afterwards, so we say yes (the system is quite liberal: we may enter y or yes). For this level, no deployment options are set, so it is really designed just for personal offline use.

Now that the configuration file is stored, we are ready to actually build the site:

Would you like to make the static site now (y/n)? y 

We say yes to allow the script to go ahead and generate an initial output.

On completion of the run, we observe:

  • the basic site appears to be structured faithfully to the original and navigation works fine
  • all the page elements, including the home page JavaScript-powered animation, function correctly, except there’s a small glitch in that the use of mss.localweb in some illustrative text has been changed to the deploy domain.
  • the search box doesn’t work
  • the Wget options have managed to clean up any excess or unwanted files generated by WordPress, though Wget provisionally retrieved the site’s home page numerous times, before deleting all but the original copy (from the Wget options).
  • download files are missing
  • all URLs end in .html

Furthermore, after the initial run of Wget, there were many absolute URLs that had yet to be made relative (because of the WordPress use of shortlinks), which meant MakeStaticSite had to carry out additional processing.

Altogether, running at level 0 has resulted in a satisfactory personal archive, but it’s not ready for production.

Level 1: Intermediate

Running the setup at the next level should get us much closer to production-ready output. It starts, as above:

Please enter the value for url: https://mss.localweb/

As above, but Level 1 then initiates a series of questions to tailor the output. It starts by asking about resources that may be hosted on other domains:

The URL format (which designates the protocol to be used) must be one that Wget supports — scheme or protocol relative URLs that start // are not supported. However, if MakeStaticSite subsequently encounters such URLs in the body of a Web document, it can process them by prefixing with http: or https:, as determined by the constants wget_protocol_relative_urls and wget_protocol_prefix.
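As a rough illustration of the prefixing described above, the following shell sketch rewrites protocol-relative references in a fragment of HTML. It is not MakeStaticSite's own code: the helper name fix_protocol_relative is hypothetical, and the value given to wget_protocol_prefix is assumed for the example.

```shell
# Assumed value; the actual default of this constant may differ.
wget_protocol_prefix="https"

# Hypothetical helper: prefix src/href values that start with // with the chosen scheme.
fix_protocol_relative() {
  sed -E "s#(src|href)=\"//#\\1=\"${wget_protocol_prefix}://#g"
}

echo '<img src="//cdn.example.com/logo.png">' | fix_protocol_relative
```

The sketch only covers src and href attributes written with double quotes; real pages may reference protocol-relative URLs in other places.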

Please enter the value for extra_domains: 

We don’t source any images, JavaScript, or other assets from external sites. We have everything we need locally. So we leave this empty (just hit Return).

Please enter the value for local_sitename: mssweb

The output, referred to as the mirror, will be generated in its own directory (or folder), with its name based on this value, either the same or with a timestamp appended. The system suggested mss_localweb, which was generated automatically based on the URL supplied. We contract that to mssweb for brevity.
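The suggested name can be pictured with a small shell sketch that takes the host part of the URL and replaces dots with underscores. This reproduces the suggestion mss_localweb seen here, but the helper suggest_sitename is hypothetical and the setup script may derive the name differently.

```shell
# Hypothetical helper: derive a directory-friendly name from a URL's host.
suggest_sitename() {
  host=${1#*://}     # drop the scheme, e.g. https://
  host=${host%%/*}   # drop the path
  host=${host%%:*}   # drop any port
  printf '%s\n' "$host" | tr '.' '_'
}

suggest_sitename "https://mss.localweb/"   # prints mss_localweb
```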

Use WP-CLI to carry out tweaks on WordPress database (y/n)? y

Having observed in level 0 the limitations of running without WordPress customisation, we opt to do this. Selecting this option will prepare our WordPress site ahead of capturing the site with Wget and do so automatically using WP-CLI, the WordPress command-line interface. Alternatively, if we have an administrator account on the site, then we can log in and carry out these steps manually.

Please enter the value for site_path: /var/webs/mss.localweb

This should be the full path to your WordPress site; the value given is actually made up. (On Linux, you can use auto-complete to help enter the correct path.)

Try to install WordPress plugins to configure site snapshot (y/n)? y

We agree to this as these plugins carry out the tweaks on WordPress to make the site more amenable to static output. This entails downloading and installing plugins, as required, which depends on network connectivity and WordPress being able to carry out such changes.

Add a static search function (y/n)? y

This is a custom static search facility, a drop-in replacement for the supplied WordPress search. It enables offline searches, without Internet connections, and will be incorporated in the zip file of the website that visitors can download. Just what we want, though it’s somewhat experimental and probably not suitable for large sites.

Use snippets to create page variants (y/n)? y

The snippets feature can be used in a variety of ways. In our case, it arises from illustrations of how this site works! Specifically, due to the rudimentary nature of domain conversion, a reference to the original domain, mss.localweb, ordinarily maps that to the deploy domain, makestaticsite.sh. MakeStaticSite has code to ignore the domain in a <code> block, but we use it instead within a shortcode. So, a snippet is used to reinstate the original.

Snippets will be more commonly used to hide page elements that only make sense in a CMS editing context, such as login links.

Create a zip file for distribution with the static website (y/n)? y

MakeStaticSite can generate a copy of the website for you to aid distribution. Level 0 opted for this automatically, but here we get the choice.

Please enter the value for zip_filename: makestaticsite_web.zip

This is the name of the Zip file that will be distributed.

Please enter the value for zip_download_folder: download

This is the name of the directory on the website where the Zip file will be stored. We go with the default.

Deploy the output on a server (y/n)? y

This option, which is not available in Level 0, is the basis of a series of questions on deployment. We answer yes.

Please enter the value for deploy_domain: makestaticsite.sh

This, makestaticsite.sh, is usually where the public will access the site.

Deploy to a server on a remote host (y/n)? y

We answer yes as we are serving the site from a host provider, not from our personal computer.

MakeStaticSite allows deployment to multiple servers per run. Currently, for Level 1 there is just one option, rsync:

Deploy using rsync over ssh (y/n)? y

We answer yes and proceed to enter details for our hosting provider (deploy_host, deploy_port, deploy_user and deploy_path). We actually intend this server as a fallback; in case there are problems with Netlify, we should be able to modify the DNS settings to this other server.
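To show how the four deployment values fit together, here is a minimal sketch that assembles an rsync-over-ssh command and prints it for review. Every value is a placeholder, the mirror_dir variable and the rsync_command helper are hypothetical, and the rsync options that MakeStaticSite itself passes may well differ.

```shell
# All values below are placeholders, not real hosting details.
deploy_host="host.example.com"
deploy_port="22"
deploy_user="webuser"
deploy_path="/var/www/site/"
mirror_dir="mirror/mssweb/"   # hypothetical location of the generated mirror

# Hypothetical helper: print the command rather than run it.
rsync_command() {
  printf 'rsync -avz -e "ssh -p %s" %s %s@%s:%s\n' \
    "$deploy_port" "$mirror_dir" "$deploy_user" "$deploy_host" "$deploy_path"
}

rsync_command
```

Printing the command first makes it easy to check the target path before anything is copied.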

We are now left with just two more questions …

Clean up mirror output using HTML Tidy (y/n)? y

HTML Tidy is a mature tool that can modify web pages to help ensure standards-compliance and improve readability not just for machines, but also for humans. It generally improves the quality of the HTML code, so we opt to use this, though there is a small risk that the changes will break the page functionality.

And the final configuration option:

Add additional files to the static output (y/n)? y

This is like a survey providing a free text box at the end, “Anything further you would like to say?” Here, it can be used in many ways, particularly to slot in additional files for download or add web pages that are not static, such as contact forms. In our case, we have files to copy over to the download/ directory.

To conclude, as with Level 0, we are shown a summary of all the configuration options, though it’s now a longer list. We say yes to ‘Do you wish to write this configuration to a file (y/n)?’.

The next question, not included in Level 0, asks whether this new configuration file should be the default. (It is offering a convenience.)

Would you like this configuration to be copied to default.cfg file, which is loaded automatically when you run makestaticsite.sh without a parameter (y/n)? n

We would only say yes here if we had settled on this configuration as the most commonly used. But we are just experimenting at this stage, so we say no and proceed to run.

On completion of this run, we observe:

  • as above, the site appears to be well-structured and works well, with all page elements functioning correctly
  • the use of a snippet has fixed the text issue in the animation
  • the search box works
  • the WordPress preparation (via the Perform plugin) tidied up the site ahead of running Wget, so now it only retrieved the site’s home page once.
  • the download files are available
  • with appropriate configuration, the snippet deals with a formatting quirk
  • the output converts links to URLs that end in a slash, /, and it can be deployed on a server using rsync

The site is production-ready, though the setup only offers deployment locally or via rsync. For other hosting solutions, we need to carry this out manually or through additional scripting.

Level 2: Full Details

Level 1 brought us close to the configuration we need, but to complete it we need Level 2, i.e. full customisation.

As before, we start by entering the site’s URL:

Please enter the value for url: https://mss.localweb/

On a technical note, the MakeStaticSite project site was originally developed on a Mac. The use of .local was avoided as a hostname since this is used by Bonjour for networking services.

Next, a new question:

Does the site require a login (y/n)? n

That is, do you need to enter a username and password just to view site content? Normally this is only the case for an intranet or firewalled site. This is useful after converting a public-facing database-driven CMS to one that firewalls that CMS and then offers a static version for the public.

In our case, we run the CMS on a personal computer, so the answer is no.

Validate certificate in encrypted (SSL/TLS) connections (y/n)? n

Do you trust the SSL certificate of the site to which you are connecting? If our site is on the Internet, then we might be cautious and wish to validate it, as asked here, in which case we need to refer to a public certificate issued by a suitable authority (the full question indicates the format). But here the site is on a laptop and so no check is needed. (This is the default for the other levels.)

The next question is as per Level 1:

Please enter the value for extra_domains: 

We again leave this empty (just hit Return).

The next question is the main one for fine-tuning behaviour:

Please enter the value for wget_extra_options: -X/wp-json,/wp-admin -nH --reject xmlrpc*,'index.html?'* --limit-rate=500k

Wget lies at the heart of the system and its precise configuration is needed to generate a faithful snapshot. In addition to several core options used at each run to generate a mirror, the user can provide further options depending on the kind of system being accessed. Some of the suggested defaults are based on experience specifically with WordPress.

  • -X/wp-json,/wp-admin: this excludes /wp-json and /wp-admin folders from the mirror
  • --reject xmlrpc*,'index.html?'*: this tells Wget to reject files whose names start with xmlrpc (such as xmlrpc.php) or consist of index.html followed by a query string.

Other options are more general:

  • -nH: this omits the host directory, mss.localweb, to make the output more compact. Note that this is not actually applied until after Wget has finished its runs and postprocessing has been carried out in phase 4. This is to ensure that page requisites from external domains are handled properly.
  • --limit-rate=500k: network bandwidth is not infinite, so the rate at which data is downloaded is limited. This constraint can, of course, be modified or removed, especially if the site is on your personal computer.
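The effect of the two reject patterns can be approximated with shell case patterns, which follow similar wildcard rules (* for any run of characters, ? for a single character). This is only a sketch for checking which file names would match; the helper would_reject is hypothetical and Wget's own matching has further subtleties.

```shell
# Hypothetical helper: succeed if a file name matches either reject pattern.
would_reject() {
  case "$1" in
    xmlrpc*|index.html?*) return 0 ;;
    *) return 1 ;;
  esac
}

for name in xmlrpc.php 'index.html?p=12' index.html about.html; do
  if would_reject "$name"; then
    echo "$name: rejected"
  else
    echo "$name: kept"
  fi
done
```

The plain index.html is kept because the second pattern requires at least one further character after the name.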

We leave empty the value of input_urls_file.

Then, another key difference with previous levels:

Use Wget to retrieve additional assets from the domain (y/n)? y

Wget has a predefined list of HTML tags, but that list keeps evolving, so it may omit the retrieval of some relevant files. Saying yes here will carry out — albeit bluntly — a further search to retrieve what’s missing. Generally recommended. In the case of MakeStaticSite, it doesn’t actually retrieve anything further, but it does for other sites maintained by the creator.

Further refine the output from the first run of Wget (y/n)? y

Various Wget post-processing is carried out to make the site more portable and standards-compliant in its new static form. It means making changes to the web pages, but even so, saying yes is recommended.

Add the mirror site to an archive (y/n)? y

Answering no means output will always be saved in the same directory. Unless site generation is a one-off or it’s important to save disk space, it’s recommended to select yes, which will append each run with a timestamp.
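The difference between the two answers can be sketched as follows. The helper mirror_dir_name is hypothetical, and the timestamp format and the way it is joined to the site name are assumptions modelled on timestamps seen elsewhere in the output, not a statement of what MakeStaticSite does internally.

```shell
# Hypothetical helper: choose the mirror directory name.
# $1 = local_sitename, $2 = archive (y/n), $3 = timestamp
mirror_dir_name() {
  if [ "$2" = "y" ]; then
    printf '%s%s\n' "$1" "$3"   # a new directory for every run
  else
    printf '%s\n' "$1"          # the same directory, reused each time
  fi
}

timestamp=$(date +%Y%m%d_%H%M%S)   # assumed format
mirror_dir_name "mssweb" "y" "$timestamp"
mirror_dir_name "mssweb" "n" "$timestamp"
```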

The next question is as for Level 1:

Please enter the value for local_sitename: mssweb

Again, we contract mss_localweb, which was automatically suggested by the system.

Use WP-CLI to carry out tweaks on WordPress database (y/n)? y

As with Level 1, we opt to do this to prepare our WordPress site ahead of capturing the site with Wget.

The next question extends Level 1 by offering to support the use of WP-CLI over ssh, which in this scenario secures file transfers:

Is the use of WP-CLI on a remote server (through ssh) (y/n)? n

We answer no as the use of WP-CLI is local, and not on a remote server.

Please enter the value for site_path: /var/webs/mss.localweb

As above.

Try to install WordPress plugins to configure site snapshot (y/n)? y

As above.

Add a static search function (y/n)? y

As above.

Use snippets to create page variants (y/n)? y

As above.

For the next three questions, we choose yes to create a zip file for distribution with the static website and give it the filename, makestaticsite_web.zip, leaving the value of the download folder as download.

Deploy the output on a server (y/n)? y

This question determines a number of system behaviours for the build. As well as leading to further questions to specify the server where the site is deployed, it determines the endings of internal links. If answering no, then it will be assumed that the output will only be browsed offline and not hosted on a server. Hence, following Wget, links will be to files ending .html.

To support the full lifecycle as far as deployment, we answer yes. Selecting yes is recommended if deployment is envisaged at any time in future. Any zip file created will always maintain links to files ending .html, and the mirrored output will do likewise when running makestaticsite.sh up until phase 8, i.e. with q=8.
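The change in link endings for server deployment can be pictured with a small sed sketch that turns links to a directory's index.html into links ending in a slash. The helper trailing_slash_links is hypothetical and covers only this one simple case; MakeStaticSite's actual conversion is not described here and is likely more thorough.

```shell
# Hypothetical helper: rewrite href="dir/index.html" as href="dir/".
trailing_slash_links() {
  sed -E 's#href="([^"]*/)index\.html"#href="\1"#g'
}

echo '<a href="docs/index.html">Docs</a>' | trailing_slash_links
```

Links such as href="about.html" or a bare href="index.html" are left untouched by this sketch.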

Please enter the value for deploy_domain: makestaticsite.sh

This is usually where the public will access the site and accordingly, we answer yes to the next question: Deploy to a server on a remote host (y/n)?

MakeStaticSite allows deployment to multiple servers per run. Currently there are just two options. The first is rsync:

Deploy using rsync over ssh (y/n)? y

We answer yes and proceed to enter details for our first hosting provider, as above.

We actually intend this server as a fallback; in case there are problems with Netlify, we should be able to modify the DNS settings to this other server.

Deploy the output on Netlify (y/n)? y

This is our main live target, so we answer yes and proceed to enter the respective value for deploy_netlify_name.

Clean up mirror output using HTML Tidy (y/n)? y

As above.

And the final configuration option:

Add additional files to the static output (y/n)? y

Again, we answer as above for Level 1.

Having written the configuration to a file, for the next question, about default.cfg, we can say yes, because it is the site we most often build, but when configuring multiple sites, we would usually expect to say no.

Would you like to make the static site now (y/n)?

At this point, we can go ahead and let the setup pass over to the main script to build the site. In any case, our configuration is complete and the script will provide the command that we can run to build it in future.

Example 2: Archiving a Directory

For this second example, we show how MakeStaticSite can neatly archive a URL with a path, i.e. we mirror a directory and all the pages underneath. Furthermore, the default settings do this in a way that cuts directories to shorten the resulting URLs.

These pages are from the World Wide Web Consortium’s (W3C) Web Accessibility Initiative’s guidance on various roles, https://www.w3.org/WAI/roles/. Our usage scenario is to have this guide to hand for personal use, as a reference archive, an index to online resources on accessibility, which is in accordance with the W3C WAI terms of usage.

As we have no control over external sites, the options available are more limited. Furthermore, we are not going to publish this website, though we might consider using it in meetings. In this instance, we may choose to run the setup at level 0.

So, we have only one option to configure:

Please enter the value for url: https://www.w3.org/WAI/roles/

Provided the URL responds to web requests, a preview of the generated configuration file will be displayed with a number of auto-generated options, similar to the following.

################################################
# Configuration file for makestaticsite.sh
# generated by setup.sh
# Created: 2023-05-10 10:35:22
# Last modified: 
################################################

url=https://www.w3.org/WAI/roles/      # URL of website being snapped
require_login=n                        # Does the site require a login (y/n)?
ssl_checks=n                           # Validate certificate in encrypted (SSL/TLS) connections (y/n)?
extra_domains=                         # Additional domains for asset retrieval and offline access
wget_extra_options=-X/wp-json,/wp-admin --reject xmlrpc*,'index.html?'* --limit-rate=500k # Additional command line options for Wget
input_urls_file=                       # Name of Wget input file for custom crawl URLs
wget_extra_urls=y                      # Use Wget to retrieve additional assets from the domain (y/n)?
wget_post_processing=y                 # Further refine the output from the first run of Wget (y/n)?
archive=y                              # Add the mirror site to an archive (y/n)?
local_sitename=www_w3_org              # Directory name for the mirror site and stem of zip file.
wp_cli=n                               # Use WP-CLI to carry out tweaks on WordPress database (y/n)?
use_snippets=n                         # Use snippets to create page variants (y/n)?
upload_zip=y                           # Create a zip file for distribution with the static website (y/n)?
zip_filename=www_w3_org.zip            # Zip filename for static snapshot
zip_download_folder=download           # Storage location for zip download
deploy=n                               # Deploy the output on a server (y/n)?
htmltidy=n                             # Clean up mirror output using HTML Tidy (y/n)?
add_extras=n                           # Add additional files to the static output (y/n)?
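Since the preview uses a simple key=value layout with trailing comments, individual settings are easy to inspect from the shell. The following is a minimal sketch with a hypothetical helper, cfg_value, run against a small sample file; it is not how makestaticsite.sh itself reads its configuration.

```shell
# Write a small sample in the same key=value format as the preview.
cfg_file=$(mktemp)
cat > "$cfg_file" <<'EOF'
url=https://www.w3.org/WAI/roles/      # URL of website being snapped
local_sitename=www_w3_org              # Directory name for the mirror site and stem of zip file.
deploy=n                               # Deploy the output on a server (y/n)?
EOF

# Hypothetical helper: print the value of key $2 in file $1, without the trailing comment.
cfg_value() {
  sed -n "s/^$2=//p" "$1" | sed -E 's/[[:space:]]+#.*$//' | head -n 1
}

cfg_value "$cfg_file" url
cfg_value "$cfg_file" deploy
```

The comment is stripped only where whitespace precedes the # character, so a # inside a value with no space before it is preserved.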

Review the output and then answer:

Do you wish to write this configuration to a file (y/n)? y

We say yes, and give it a name, but no to the offer of default configuration.

Then agree to run makestaticsite.sh on the new configuration, which shouldn’t take much more than a minute.

For those in a hurry (or who wish to automate)

The above can actually be performed without any interaction, by running setup with a couple of options:

./setup.sh -u https://www.w3.org/WAI/roles/

where -u is a flag denoting ‘unattended’ and takes no argument. It is followed by a non-option parameter for the URL. When running a script in unattended mode, assumptions are made on certain yes/no questions to enable the script to continue without intervention. Similarly, once the configuration is complete, it will call makestaticsite.sh with -u for building the site.
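For automation across several sites, the unattended call could be wrapped in a loop. This is a sketch under assumptions: the wrapper run_unattended and the DRY_RUN switch are hypothetical, and the second URL is a placeholder. With DRY_RUN set, the commands are only printed.

```shell
# Hypothetical wrapper: run the setup unattended for each URL given.
# With DRY_RUN set, the commands are printed instead of executed.
run_unattended() {
  for url in "$@"; do
    ${DRY_RUN:+echo} ./setup.sh -u "$url"
  done
}

DRY_RUN=1
run_unattended "https://www.w3.org/WAI/roles/" "https://example.org/docs/"
```

Unsetting DRY_RUN would run the commands for real from the directory that contains setup.sh.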

Review of Output

Wget is run with the --no-parent option, to limit traversal to the current level and below, whilst including resources such as images, CSS and fonts, from higher levels. Wget will do this without changing the relative hierarchy, but in MakeStaticSite setting the runtime option, parent_dirs_mode, to contain, will make the layout more portable by moving such resources into an assets folder at the same level as the captured URL.

Furthermore, when capturing a URL with a path, MakeStaticSite initially generates nested directories corresponding to the path, i.e. the directories WAI/ and WAI/roles/. However, given that the pages are contained under the path, the mirror’s directories are compacted, effectively removing parent folders. This compaction is due to a default setting, mss_cut_dirs=yes, in constants.sh.
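The number of directory levels involved can be worked out from the URL itself. The sketch below counts the directories in a URL's path, giving 2 for the example here. The helper count_path_dirs is hypothetical and says nothing about how mss_cut_dirs is implemented; it also assumes the URL ends in a directory rather than a file name.

```shell
# Hypothetical helper: count the directories in a URL's path.
count_path_dirs() {
  path=${1#*://}                 # drop the scheme
  case "$path" in
    */*) path=${path#*/} ;;      # drop the host
    *)   path="" ;;
  esac
  path=${path%/}                 # drop a trailing slash
  if [ -z "$path" ]; then
    echo 0
    return
  fi
  slashes=$(printf '%s' "$path" | tr -cd '/' | wc -c)
  echo $((slashes + 1))
}

count_path_dirs "https://www.w3.org/WAI/roles/"   # prints 2
count_path_dirs "https://mss.localweb/"           # prints 0
```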

We illustrate the result by a tree view inside the newly created mirror archive directory using the tree command to show nesting three directories deep.

$ tree -L 3
.
└── www.w3.org
    ├── assets20230615_104224
    │   ├── analytics
    │   ├── assets20230615_104224
    │   └── WAI
    ├── designers
    │   └── index.html
    ├── developers
    │   └── index.html
    ├── index.html
    ├── managers
    │   └── index.html
    ├── new
    │   └── index.html
    ├── policy-makers
    │   └── index.html
    ├── robots.txt
    ├── sitemap.xml
    ├── testers
    │   └── index.html
    ├── trainers
    │   └── index.html
    ├── users
    │   └── index.html
    └── writers
        └── index.html

14 directories, 12 files

The top-level folder, www.w3.org, corresponds to the domain of the URL. It can be omitted from the output by specifying -nH as a Wget option (as in the first example).

Immediately underneath is the top-level web content, including the index page and folders. Note that the default name for the assets folder is assets, but as it coincides with an existing folder of the same name, it has a timestamp appended, hence: assets20230615_104224.

Its contents originally resided inside https://www.w3.org/WAI/, for example its main stylesheet is at https://www.w3.org/WAI/assets/css/style.css?1683687637037890127. However, if we retained that structure, then navigation and distribution, e.g., as a zip archive, would need many directories. By incorporating such content, as above, we could instead zip the roles directory.

If the original layout needs to be preserved, then set mss_cut_dirs=no to preserve the nesting of directories.
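The naming of the assets folder noted above, where a timestamp is appended if a folder called assets already exists, can be sketched as a simple existence check. The helper choose_assets_dir is hypothetical, and the timestamp is passed in as a fixed value purely for illustration.

```shell
# Hypothetical helper: choose a name for the assets folder inside directory $1,
# appending timestamp $2 if a folder called assets is already present.
choose_assets_dir() {
  if [ -e "$1/assets" ]; then
    printf 'assets%s\n' "$2"
  else
    printf 'assets\n'
  fi
}

demo_dir=$(mktemp -d)
choose_assets_dir "$demo_dir" "20230615_104224"   # prints assets
mkdir "$demo_dir/assets"
choose_assets_dir "$demo_dir" "20230615_104224"   # prints assets20230615_104224
```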

Conclusion

Moving successively through the levels helps to inform the requirements for site capture and deployment. In general, for retrieving external sites for archival only, Level 0 may well suffice. For sites one maintains, Level 1 should provide almost all the required options, but finer control is available in Level 2.

To review the setup structure, please refer to a summary.

This page was published on 8 May 2023 and last updated on 26 October 2024.