Runtime Options


MakeStaticSite is highly configurable through numerous options, as explained in the options management page. In brief, all options are initially defined in lib/constants.sh. When setup.sh is run, it generates a site-specific configuration covering the most fundamental settings, which is stored as a configuration (.cfg) file.

In addition, MakeStaticSite has a number of runtime options whose global defaults, set in the constants file, can be overridden on a per-site basis by editing the respective .cfg file; simply append the constant definitions to the existing content. It is recommended that you make a backup first.
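
As a minimal sketch, assuming the .cfg file holds plain name=value assignments in the same form as lib/constants.sh, the backup and the appended overrides could be done from the shell as follows. The file name mysite.cfg and the chosen values are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical site configuration file: substitute the .cfg file generated by setup.sh.
cfg_file="${1:-mysite.cfg}"
[ -f "$cfg_file" ] || : > "$cfg_file"   # create an empty file if none exists (demo only)

# Back up the file first, as recommended.
cp -p "$cfg_file" "$cfg_file.bak"

# Append per-site overrides (option names from this page; values are illustrative).
cat >> "$cfg_file" <<'EOF'
wget_threads=4
output_level=normal
run_unattended=yes
EOF
```

Appending the same lines twice simply duplicates them, so it is worth checking the file afterwards.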

As at version 0.30.13.1, the options are as follows (options with localisation support are marked with ‘*’; note that the list is subject to change):

max_redirects
Default: 5
Maximum number of redirects allowed when determining the effective URL being mirrored. If a redirect is followed, the URL originally entered is replaced by the effective URL.
etc_hosts
Default: /etc/hosts
Location of the hosts file. When creating the source website locally, it can be useful for url_base and deploy_domain to have the same domain, particularly to test certain functionality such as reCAPTCHA. In this case, with the aid of the constants ip4re and ip6re, MakeStaticSite will inspect the hosts file for an entry that overrides DNS resolution for the domain and temporarily comment it out when it comes to deployment, so that there’s no interruption to site editing.
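
The hosts-file step might look roughly like the sketch below, which works on a temporary copy rather than /etc/hosts itself. The domain is hypothetical, and the simplified IPv4 pattern is a stand-in for the ip4re constant, whose actual value is not given here; IPv6 entries (ip6re) are ignored.

```bash
#!/usr/bin/env bash
# Work on a copy of the hosts file so that the sketch is safe to run.
hosts_copy=$(mktemp)
printf '%s\n' '127.0.0.1 localhost' '192.168.1.20 example.com' > "$hosts_copy"

domain="example.com"                      # hypothetical shared url_base/deploy_domain host
ip4_pattern='([0-9]{1,3}\.){3}[0-9]{1,3}' # simplified stand-in for ip4re

# Comment out lines that map an IPv4 address to the domain; other lines are left intact.
comment_out_host() {
  local file=$1 dom=${2//./[.]}   # [.] makes each dot match literally
  local tmp
  tmp=$(mktemp)
  sed -E "s/^(${ip4_pattern}[[:space:]]+([^#]*[[:space:]])?${dom}([[:space:]]|\$))/#\1/" "$file" > "$tmp"
  mv "$tmp" "$file"
}

comment_out_host "$hosts_copy" "$domain"
cat "$hosts_copy"
```

Restoring the entry after deployment would be the reverse substitution, removing the leading #.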
mss_file_permissions
Default: 600
Default Unix file permissions for file creation.
mss_dir_permissions
Default: 700
Default Unix permissions for directory creation.
tmp_dir
Default: tmp
Directory where temporary files are to be stored. These are mainly to support Wget, including input files and cookies.
tab
Default: " "
Tab spacing for file outputs, e.g. the site map (XML) file.
host_dir
Default: auto
Host directory mode when creating a site mirror with Wget; empty or ‘no’ corresponds to Wget’s -nH (--no-host-directories), effectively removing one directory level. Otherwise, the host directory is included in the output.
credentials_rc_file
Default: .netrc
‘Run commands’ file for (temporary) storage of credentials — either .wgetrc or .netrc
credentials_cleanup*
Default: yes
Delete references to credentials in temp files and .rc file on completion of run (y/n)?
credentials_manage_cmd
Default: pass
Path to binary for managing (and encrypting) credentials.
credentials_manage_cmd_url
Default: https://www.passwordstore.org/#download
URL where credentials manager may be downloaded.
credentials_storage_namespace
Default: MSS
MakeStaticSite-specific directory for storing credentials (usernames, passwords, tokens, etc.).
credentials_storage_mode
Default: plain
How to store credentials: config to store in the configuration file, as-is; plain to store separately, as-is, in plain text; encrypt to store separately and encrypt.
credentials_extension
Default: gpg
Encryption file type extension.
credentials_home
Default: "$HOME/.password-store"
Password store directory (as used by pass) under which credentials are stored.
wget_cmd
Default: wget
Path to Wget binary. If wget is available in PATH, then simply enter wget. Otherwise, enter its full path.
wget_error_level
Default: 6
The lowest Wget error code tolerated: non-zero exit codes at or above this value are tolerated, while lower ones cause the run to abort (set above 8 for no tolerance).
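
Reading the option as a threshold, the check could be sketched as below. Wget's documented exit codes run from 1 (generic error) to 8 (server issued an error response), so the default of 6 would tolerate authentication failures (6), protocol errors (7) and server error responses such as 404s (8). The function name is hypothetical.

```bash
#!/usr/bin/env bash
wget_error_level=6

# Return 0 if a Wget exit status should be tolerated, 1 if the run should abort.
wget_status_tolerated() {
  local status=$1
  (( status == 0 )) && return 0                 # success
  (( status >= wget_error_level )) && return 0  # tolerated error
  return 1                                      # e.g. 4 (network failure) aborts
}

for code in 0 4 8; do
  if wget_status_tolerated "$code"; then
    echo "exit $code: continue"
  else
    echo "exit $code: abort"
  fi
done
```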
wget_user_agent
Default: mss
The browser user agent to be used by Wget. When set to wget, Wget/version will be submitted; if set to mss, then MakeStaticSite/version (Wget/version; MSS site URL) will be used; if an empty string, then no user agent string will be sent; otherwise enter a custom string, e.g. Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15). If no string is supplied, then access might be refused by the host’s web application firewall.
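
A sketch of how the four cases might translate into a Wget argument is given below. The version numbers and site URL are placeholders (hypothetical variables), not values taken from MakeStaticSite; the empty case relies on Wget's documented behaviour that --user-agent="" suppresses the User-Agent header.

```bash
#!/usr/bin/env bash
# Placeholder values for illustration only.
mss_version="X.Y"
wget_version="1.21"
mss_site_url="https://example.org"

# Print the Wget user-agent argument (if any) for a given wget_user_agent value.
user_agent_arg() {
  case $1 in
    wget) ;;  # no option: Wget sends its default Wget/version string
    mss)  printf '%s\n' "--user-agent=MakeStaticSite/$mss_version (Wget/$wget_version; $mss_site_url)" ;;
    '')   printf '%s\n' '--user-agent=' ;;  # empty value: Wget sends no User-Agent header
    *)    printf '%s\n' "--user-agent=$1" ;;
  esac
}

user_agent_arg mss
user_agent_arg 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15)'
```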
wget_protocol_relative_urls*
Default: yes
Allow protocol-relative URLs to be fetched by Wget by prefixing a protocol (y/n).
wget_protocol_prefix
Default: https
Protocol to prefix protocol-relative URLs.
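
Prefixing is a simple string operation; the sketch below (with a hypothetical function name, and treating any value beginning with y as yes) shows the idea.

```bash
#!/usr/bin/env bash
wget_protocol_relative_urls=yes
wget_protocol_prefix=https

# Prefix protocol-relative URLs (those starting with //) so that Wget can fetch them.
absolutise_url() {
  local url=$1
  if [[ $wget_protocol_relative_urls == y* && $url == //* ]]; then
    url="${wget_protocol_prefix}:${url}"
  fi
  printf '%s\n' "$url"
}

absolutise_url '//cdn.example.com/lib.js'
absolutise_url 'https://example.com/page/'
```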
wget_http_login_field
Default: user
Wget’s user login field for HTTP authentication.
wget_http_password_field
Default: password
Wget’s password field for HTTP authentication.
wget_cookies
Default: cookies.txt
The name of the cookies file used by Wget.
wget_cookies_min_filelength
Default: 5
The minimum number of lines for a valid non-empty Wget cookies file.
wget_cookies_nullify_user_agent*
Default: no
When wget_user_agent (above) is a non-empty string, should it be reset to null for handling cookies (yes/no)?
wget_post
Default: wget_post.txt
The name of the file containing POST data.
wget_inputs_main_stem
Default: wget_inputs_main.txt
The name of the input file for wget, used in the first run. This file comprises URLs that might not be reachable by standard crawls.
wget_inputs_extra_stem
Default: wget_inputs_extra.txt
The name of the input file for wget, used in subsequent runs. This file is auto-generated during a deep search of URLs.
wget_mirror_options
Default: (--recursive --timestamping --level=inf --no-remove-listing)
The standard default settings for Wget to generate a mirror. They can be tweaked; for example, to create only a partial mirror, set --level to a finite number, e.g. --level=2.
wget_core_options
Default: ("${wget_mirror_options[@]}" --convert-links --adjust-extension --page-requisites)
These are the basic options for Wget to crawl a URL and download a static version. They should only be changed if there’s undesirable behaviour. Additional options should be specified per site in the .cfg file.
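
Because these defaults are Bash arrays, a per-site override for a partial mirror could be written as below. One general Bash point (not specific to MakeStaticSite): "${wget_mirror_options[@]}" is expanded when wget_core_options is assigned, so redefining wget_mirror_options afterwards does not change wget_core_options unless that array is reassigned too. Whether this matters depends on the order in which the constants and .cfg files are loaded, which is not described here.

```bash
#!/usr/bin/env bash
# Defaults as listed above.
wget_mirror_options=(--recursive --timestamping --level=inf --no-remove-listing)
wget_core_options=("${wget_mirror_options[@]}" --convert-links --adjust-extension --page-requisites)

# Per-site override: limit recursion depth to create a partial mirror.
wget_mirror_options=(--recursive --timestamping --level=2 --no-remove-listing)
# Rebuild the core options so that they pick up the new mirror options.
wget_core_options=("${wget_mirror_options[@]}" --convert-links --adjust-extension --page-requisites)

# Show the resulting Wget arguments, one per line.
printf '%s\n' "${wget_core_options[@]}"
```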
wget_wayback_core_options
Default: ()
Specify additional recursion options as an array in parentheses. This is experimental, but the kind of options envisaged include something like (--recursive --level=2).
wget_default_page
Default: index.html
The Wget --default-page option, used as the file name for saving directory indices.
wget_adjust_extensions
Default: html,css
List of file extensions that Wget appends (to match the type given in the HTTP response header) when a downloaded file lacks that extension.
wget_no_parent
Default: auto
Should capturing URLs with directories include the --no-parent option? Set to auto or yes to check and add automatically; manual to check and ask during runtime; otherwise no intervention.
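
With auto, the check might amount to asking whether the URL has a path beyond the domain. A sketch under that assumption (the function name is hypothetical) is:

```bash
#!/usr/bin/env bash
# Print --no-parent if the URL points below the site root, e.g. https://example.com/docs/
no_parent_arg() {
  local rest=${1#*://}   # host and path, without the scheme
  local path=""
  [[ $rest == */* ]] && path=${rest#*/}   # everything after the host
  path=${path%%[?#]*}                      # ignore any query string or fragment
  if [[ -n $path ]]; then
    printf '%s\n' '--no-parent'
  fi
}

no_parent_arg 'https://example.com/docs/'
```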
wget_extra_core_options
Default: (-r -l inf -nc --adjust-extension)
Used in phase 3 (augment assets). Like wget_core_options, these are the basic options for Wget to crawl a URL and download a static version. They differ slightly, with -nc (no clobber) instead of --page-requisites, reflecting the context of targeting supporting assets (such as images) to augment an existing site. As the retrieval method is blunt, omitting -nc could make this phase very time-consuming.
wget_progress_indicator
Default: (--show-progress --progress=bar:force:noscroll)
Wget progress bar options, currently used when output_level=quiet (leave empty to omit) when running Wget in both phases 2 and 3. It gives minimal updates per download during site capture, whilst more details may be recorded in the log file.
wget_threads
Default: 1
The number of parallel threads for running Wget (integer). This is a recently-introduced feature and should be regarded as experimental.
wget_extra_urls_depth
Default: 5
The number of times to call wget_extra_urls() to scan for and fetch extra URLs (integer).
feed_html
Default: feed/index.html
Newsfeeds generally use XML formats, whereas Wget typically saves them with a .html extension and updates anchors accordingly, so the URLs of such feeds need replacing. This setting, currently targeted at WordPress (which stores feeds in a number of feed/ folders), specifies the tail of the invalid URLs.
feed_xml
Default: feed/index.xml
This setting specifies the tail of valid replacement feed URLs (ending .xml) for feed_html URLs. To properly support this in deployment, on the web server, add index.xml as the last entry to the DirectoryIndex directive in .htaccess at the site’s root.
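
The substitution could be sketched with sed as below, assuming feed_html holds the .html tail that Wget produces (feed/index.html) and feed_xml the replacement. The sample page is hypothetical, and renaming the saved feed files themselves is not shown. On an Apache server, the DirectoryIndex change mentioned above would read something like DirectoryIndex index.html index.xml.

```bash
#!/usr/bin/env bash
feed_html="feed/index.html"   # tail of invalid feed URLs, as saved by Wget (assumed)
feed_xml="feed/index.xml"     # tail of valid replacement URLs

page=$(mktemp)
printf '%s\n' '<a href="https://example.com/feed/index.html">RSS</a>' > "$page"

# [.] makes each dot match literally in the sed pattern.
pattern=${feed_html//./[.]}
tmp=$(mktemp)
sed "s|/${pattern}|/${feed_xml}|g" "$page" > "$tmp" && mv "$tmp" "$page"
cat "$page"
```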
warc_output
Default: no
Generate WARC (Web ARChives) (y/n)?
warc_header_format
Default: mss
Header format for WARC files: default uses the Wget defaults; MSS generates additional fields (software: MakeStaticSite/version (Wget/version); operator: the $USER environment variable; hostname: $HOSTNAME). Otherwise, enter a non-empty string conforming to the ‘warcinfo’ standard, wrapped in quotes, with fields separated by ‘|’, e.g. "Operator: Fred Blogs Archival Services|software:MakeStaticSite version".
Reference: https://iipc.github.io/warc-specifications/specifications/warc-format/warc-1.1/#warcinfo.
warc_cdx
Default: yes
Write CDX index files (y/n)?
warc_compress
Default: yes
Compress WARC files using gzip (y/n)?
warc_combine_output
Default: yes
Combine enumerated WARC files into one file (y/n)?
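
The WARC settings map naturally onto Wget's own WARC options (--warc-file, --warc-header, --warc-cdx and --no-warc-compression). A sketch of that mapping follows, with a custom header string split on '|' into separate --warc-header arguments. The WARC base name is hypothetical, the case labels default and mss are assumed option values, and combining enumerated files is not shown.

```bash
#!/usr/bin/env bash
warc_output=yes
warc_cdx=yes
warc_compress=no
warc_header_format="Operator: Fred Blogs Archival Services|software:MakeStaticSite version"

warc_args=()
if [[ $warc_output == y* ]]; then
  warc_args+=(--warc-file=mysite)            # hypothetical base name for the WARC
  [[ $warc_cdx == y* ]] && warc_args+=(--warc-cdx)
  [[ $warc_compress == y* ]] || warc_args+=(--no-warc-compression)
  # Custom header: one --warc-header argument per '|'-separated field.
  case $warc_header_format in
    default|mss|'') ;;  # Wget defaults or MSS-generated fields, not handled here
    *)
      IFS='|' read -r -a fields <<< "$warc_header_format"
      for field in "${fields[@]}"; do
        warc_args+=("--warc-header=$field")
      done
      ;;
  esac
fi

printf '%s\n' "${warc_args[@]}"
```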
url_asset_capture_level
Default: 3
For determining the capture level (0 fewest, 5 most) for URL matching of assets to download and localise.
url_wildcard_capture*
Default: no
Use a wildcard for matching URLs in asset processing (y/n)? If set to ‘yes’, when capturing asset URLs on pages, a simple regex capture group will be used instead of the input file of itemised URLs generated in phases 2 and 3.
url_separator_chars
Default: "[,:(]"
Additional separator characters (as a regular expression character class) that may precede URLs to be captured: for example, a comma in data-src attributes and a colon in JSON. Leave empty to omit.
url_grep_search_pattern
Default: "[^\\\"'<) ]"
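URL terminating characters in grep searches (ERE notation): the negated bracket expression lists characters that cannot occur in a URL, so a backslash, double or single quote, <, ) or space ends the match. If URLs may contain ')', this character can be removed from the expression.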
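
To see how the setting behaves, the following sketch extracts absolute URLs with grep -oE, using url_grep_search_pattern as the set of characters a URL may contain. The sample markup is made up, and the https?:// prefix in the expression is an assumption for illustration.

```bash
#!/usr/bin/env bash
url_grep_search_pattern="[^\\\"'<) ]"   # any character except \ " ' < ) and space

html='<link href="https://example.com/a.css?v=1"> <style>@font-face{src:url(https://example.com/f.woff2)}</style>'

# Each match runs from the scheme up to, but not including, a terminating character.
urls=$(grep -oE "https?://${url_grep_search_pattern}+" <<< "$html")
printf '%s\n' "$urls"
```

Here the closing ) of the CSS url() is what stops the second match, which is why removing ')' from the expression would let URLs containing it through.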
web_source_extensions
Default: htm,html,xml,txt,css
List of web document file extensions, intended for assets search.
web_source_exclude_dirs*
Default:
Comma-separated list of directories to exclude (relative to the working mirror directory).
web_element_extensions
Default: js,css,svg,map,ico
Comma-separated list of file extensions for standard Web page components.
font_extensions
Default: cff,ttf,eot,woff,woff2
Comma-separated list of file extensions for Web fonts.
image_extensions
Default: jpeg,jpg,gif,png
Comma-separated list of file extensions for Web images.
audiovideo_extensions
Default: heic,webp,mp3,m4a,ogg,wav,avi,mpg,mp4,mov,ogv,wmv,3gp,3gp2
Comma-separated list of file extensions for audio and video assets.
doc_extensions
Default: pdf,doc,docx,odt,ppt,xls,xlsx
Comma-separated list of file extensions for office documents.
asset_extensions
Default: $web_element_extensions,$image_extensions,$audiovideo_extensions,$doc_extensions,$font_extensions
List of file extensions for assets that may be retrieved by Wget in phase 3 (derived from WordPress.com allowable upload file types). If no extensions are defined, then cURL will be used to remove non-HTML assets, but all other assets will be accepted.
asset_extensions_external
Default: $web_element_extensions,$image_extensions,$font_extensions
List of file extensions for assets from external (third-party) domains; a more limited set than for asset_extensions.
relativise_primarydomain_assets*
Default: yes
Convert absolute links to relative links for primary domain assets (y/n)?
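
Converting an absolute primary-domain link to a relative one depends on how deep the referring page sits in the mirror. A sketch of the idea follows; the domain and function name are hypothetical, and MakeStaticSite's own handling of edge cases may differ.

```bash
#!/usr/bin/env bash
primary_domain="example.com"   # hypothetical primary domain

# Rewrite an absolute URL on the primary domain relative to a page in the mirror.
# $1: page path relative to the mirror root, e.g. blog/post/index.html
# $2: absolute URL found in that page
relativise_url() {
  local page=$1 url=$2 prefix="" dir
  case $url in
    "https://$primary_domain/"*|"http://$primary_domain/"*) ;;
    *) printf '%s\n' "$url"; return ;;   # not on the primary domain: leave unchanged
  esac
  dir=$(dirname "$page")
  # Add one ../ for each directory level between the page and the mirror root.
  while [[ $dir != "." && $dir != "/" ]]; do
    prefix+="../"
    dir=$(dirname "$dir")
  done
  printf '%s\n' "${prefix}${url#*://"$primary_domain"/}"
}

relativise_url blog/post/index.html https://example.com/wp-content/style.css
```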
shorten_longlines*
Default: auto
Break apart long lines to reduce processing time: off to leave all files untouched; auto to decide whether or not to shorten lines on a per-file basis, according to criteria based on file size and the number of lines in the document; on to apply line shortening to all files.
average_linelength_max*
Default: 1000
When shorten_longlines=auto, shorten lines when the average line length exceeds this number of characters.
longest_linelength_max
Default: 100000
When shorten_longlines=auto, shorten lines when the longest line length exceeds this number of characters.
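
The auto decision can be illustrated with awk, computing the average and longest line lengths of a file and comparing them with the two thresholds. This is a sketch only: any further file-size criteria MakeStaticSite applies are left out, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
average_linelength_max=1000
longest_linelength_max=100000

# Succeed (status 0) if the file's lines are long enough to warrant shortening.
needs_shortening() {
  awk -v avg_max="$average_linelength_max" -v long_max="$longest_linelength_max" '
    { total += length($0); if (length($0) > longest) longest = length($0) }
    END {
      if (NR == 0) exit 1                        # empty file: nothing to do
      exit !((total / NR) > avg_max || longest > long_max)
    }' "$1"
}

sample=$(mktemp)
printf '%s\n' 'short line' 'another short line' > "$sample"
if needs_shortening "$sample"; then
  echo "shorten"
else
  echo "leave as is"
fi
```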
newline_inserts
Default: ('<\/script></<\/script>$'"'\n'"'<' '<\/style></<\/style>$'"'\n'"'<' '<\div></<\div>$'"'\n'"'<' '\\\"\,\\\"/\\\"\,$'"'\n'"'\\\"' '\}@media/\}$'"'\n'"'@media' '\}@font-face/\}$'"'\n'"'@font-face')
Replacements to be made for shortening line length (array).
prune_query_strings*
Default: no
Remove query strings appended to paths and URLs in anchors, limited to the file types given in query_prune_list (y/n)?
query_prune_list
Default: js,css,svg,png,$font_extensions
List of file extensions in requests that may have query string appended for versioning or other non-essential purposes that can be pruned without loss of functionality.
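
For illustration, pruning could be done with a sed substitution that drops the query string after any URL ending in one of the listed extensions. The extension list below is the default with font_extensions expanded by hand, and the sample markup is invented; this is a sketch, not MakeStaticSite's actual expression.

```bash
#!/usr/bin/env bash
query_prune_list="js,css,svg,png,cff,ttf,eot,woff,woff2"

page=$(mktemp)
printf '%s\n' '<link href="style.css?ver=6.4.2"><a href="search.php?q=x">Search</a>' > "$page"

# Turn the comma-separated list into an ERE alternation: js|css|svg|...
exts=${query_prune_list//,/|}
tmp=$(mktemp)
# Keep the extension (group 1) and drop the ? and everything up to a quote, space or ).
sed -E "s/(\.(${exts}))\?[^\"' )]*/\1/g" "$page" > "$tmp" && mv "$tmp" "$page"
cat "$page"
```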
extra_assets_allow_query_strings*
Default: yes
Allow Wget to fetch additional URLs with query strings in phase 3 (y/n)?
extra_assets_query_strings_limit
Default: 100000
Only fetch URLs with query strings when the total number of assets is less than this number.
extra_assets_mode
Default: contain
How assets from extra domains should be incorporated: empty or 'off' to keep them in separate directories under the mirror ID; 'contain' to move the directories inside the assets directory (see assets_directory).
assets_directory
Default: webassets
Directory immediately under primary domain directory where extra assets are stored per extra domain (set empty to place assets in root).
imports_directory
Default: imports
Directory immediately under assets_directory for storing assets imported for extra domains.
parent_dirs_mode
Default: contain
For URLs with directories, specify what to do with assets that lie outside the mirrored directory: empty or off to keep assets where they are after the Wget mirror; contain to move the directories inside the assets directory.
external_dir_links
Default:
Specify what to do with links to resources on the same domain but outside the mirrored tree: empty or off to leave them absolute, pointing only to the deployment domain; local to make them relative, pointing to the assets directory.
mss_cut_dirs
Default: yes
Option to cut directories, effectively shortening the URL. Enter yes or on for a MakeStaticSite-specific cut that moves content from the directory path specified in the URL up to the root directory. When this is enabled, there is no need (and it's not recommended) to specify Wget option --cut-dirs. Leave empty or enter no or off to disable (when Wget option --cut-dirs may be used instead).
cors_enable*
Default: yes
Enable cross-origin resources once downloaded (y/n)?
link_rel_canonical
Default: yes
Include a <link rel="canonical"...> tag in the page head (yes or no)? This helps search engines to index the site.
link_href_tail
Default: /
The tail of canonical URLs and internal links, e.g. index.html or a trailing slash, /, which is assumed if left blank.
a_href_tail
Default:
The tail for internal links, e.g. index.html or / (leave blank for /). The value should normally match link_href_tail.
robots_create
Default: yes
Generate and overwrite robots.txt (yes or no)? Whilst a CMS may generate a virtual robots file, it might be unduly restrictive or not be a good fit for the static output. Selecting 'yes' signals the generation of a new robots.txt file.
robots_default_file
Default: robots.txt
File name for the default robots.txt (inside lib/files/). A Sitemap line referencing the site map will subsequently be appended.
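
Appending the site map to the generated robots.txt amounts to adding a Sitemap directive with an absolute URL, as in this sketch; the deployment URL is hypothetical.

```bash
#!/usr/bin/env bash
deploy_url="https://example.com"   # hypothetical deployment base URL
sitemap_file="sitemap.xml"
robots=$(mktemp)

# Stand-in for the default robots.txt copied from lib/files/.
printf '%s\n' 'User-agent: *' 'Disallow:' > "$robots"

# Robots files reference site maps by their full URL.
printf 'Sitemap: %s/%s\n' "$deploy_url" "$sitemap_file" >> "$robots"
cat "$robots"
```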
sitemap_create
Default: yes
Generate and overwrite the site map file (yes or no)? Whilst a CMS may generate a virtual site map, it might not be a good fit for the static output. Selecting 'yes' signals the generation of a new site map, which is currently constructed from a listing of all pages on the site.
sitemap_file
Default: sitemap.xml
Name of sitemap (XML) file.
sitemap_schema
Default: http://www.sitemaps.org/schemas/sitemap/0.9
Site map XML schema URL.
sitemap_file_extensions
Default: htm,html
A comma-separated list of file extensions allowed for inclusion in the sitemap file.
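
The generated site map could be assembled roughly as follows: list the pages whose extensions appear in sitemap_file_extensions and wrap each in a <url><loc> element under the sitemap_schema namespace. The mirror directory, deployment URL and two-space indent (standing in for tab) are assumptions for this sketch, paths are assumed to need no XML escaping, and optional fields such as <lastmod> are omitted.

```bash
#!/usr/bin/env bash
mirror_dir=$(mktemp -d)            # hypothetical mirror location
deploy_url="https://example.com"   # hypothetical deployment base URL
sitemap_schema="http://www.sitemaps.org/schemas/sitemap/0.9"
sitemap_file_extensions="htm,html"
tab="  "

mkdir -p "$mirror_dir/about"
touch "$mirror_dir/index.html" "$mirror_dir/about/index.html" "$mirror_dir/style.css"

# Build a find expression matching any of the allowed extensions.
find_args=()
IFS=',' read -r -a exts <<< "$sitemap_file_extensions"
for ext in "${exts[@]}"; do
  (( ${#find_args[@]} )) && find_args+=(-o)
  find_args+=(-name "*.$ext")
done

{
  printf '<?xml version="1.0" encoding="UTF-8"?>\n'
  printf '<urlset xmlns="%s">\n' "$sitemap_schema"
  (cd "$mirror_dir" && find . -type f \( "${find_args[@]}" \) | sort) | while read -r path; do
    printf '%s<url><loc>%s/%s</loc></url>\n' "$tab" "$deploy_url" "${path#./}"
  done
  printf '</urlset>\n'
} > "$mirror_dir/sitemap.xml"

cat "$mirror_dir/sitemap.xml"
```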
mod_wayback
Default: mod_wayback.sh
Wayback Machine module filename.
wayback_cli*
Default: no
Use a third-party client to download sites from the Wayback Machine (y/n)? If not set to 'yes', then any Wayback sites will be retrieved natively, using Wget (the default).
use_wayback_id*
Default: no
When retrieving natively, capture the original page rather than the Wayback Machine's processed version (y/n)? Whilst this is more faithful to the original format of individual pages, the overall output (links, navigation, etc.) is more likely to be fragmented.
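
The Wayback Machine serves an unmodified capture when id_ is appended to the timestamp in a Memento URL, and this is presumably the mechanism behind the option, though that is an assumption here. The sketch below rewrites a Memento URL accordingly, replacing any existing modifier such as im_; the example URLs and function name are hypothetical.

```bash
#!/usr/bin/env bash
# Memento URL of the form https://web.archive.org/web/<timestamp>[modifier]/<original URL>
memento_re='^(https?://[^/]+/web/)([0-9]{14})[a-z_]*/(.+)$'

# Print the URL for the original (unprocessed) capture by appending id_ to the timestamp.
original_capture_url() {
  if [[ $1 =~ $memento_re ]]; then
    printf '%sid_/%s\n' "${BASH_REMATCH[1]}${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
  else
    printf '%s\n' "$1"    # not a Memento URL: leave unchanged
  fi
}

original_capture_url 'https://web.archive.org/web/20200101000000/https://example.com/about/'
```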
wayback_hosts
Default: web.archive.org,www.webarchive.org.uk
Comma-separated list of domains where a Wayback Machine is hosted.
wayback_memento_check*
Default: no
Perform a dynamic check for a Memento site using the HTTP response headers (y/n)?
wayback_header
Default: Memento-Datetime:
The search string used in the HTTP response headers to identify support for Memento URLs.
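
Memento-Datetime is a response header defined by the Memento protocol (RFC 7089), so a dynamic check could fetch the headers and search them for wayback_header. In the sketch below the header parsing is separated from the network request so that it can be tried offline; the curl call is left commented out, and the sample headers and function name are invented.

```bash
#!/usr/bin/env bash
wayback_header="Memento-Datetime:"

# Succeed if the HTTP response headers on stdin include the Memento header (case-insensitive).
has_memento_header() {
  grep -qi "^${wayback_header}"
}

# Live check (requires network access):
# curl -sIL "https://web.archive.org/web/2020/https://example.com/" | has_memento_header

sample_headers=$'HTTP/2 200\r\ncontent-type: text/html\r\nmemento-datetime: Wed, 01 Jan 2020 00:00:00 GMT\r\n'
if has_memento_header <<< "$sample_headers"; then
  echo "Memento support detected"
fi
```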
wayback_mementos_only*
Default: yes
Only download assets with Memento URLs (y/n)? This resets page_element_domains to be empty, keeping the capture strictly to the Wayback Machine.
wayback_assets_mode
Default: original
How to incorporate assets downloaded during phase 3: off to take no action (no assets are used); original to recreate the original layout as far as possible (timestamps removed); timestamp to leave assets in, and reference them from, the Wayback Machine's timestamped folders.
wayback_timestamp_policy
Default: any
Timestamp policy: any (the default) to apply no timestamp restriction; exact to only download and refer to assets with the exact timestamp; range to download subject to the specified date range (see below).
wayback_date_from_earliest
Default:
Earliest date timestamp (YYYYMMDDhhmmss) for Wayback Machine snapshot files.
wayback_date_to_latest
Default:
Latest date timestamp (YYYYMMDDhhmmss) for Wayback Machine snapshot files.
wayback_snapshot_path_depth
Default: 3
The number of directories to traverse to get to the original domain directory (a magic number, default set for Internet Archive, until a suitable algorithm is determined).
wayback_search_regex
Default: "href[[:space:]]*=[[:space:]]*[\'\"]\?[^#:>\'\"/][^:>]\+[[:space:]]*[\'\"]\?[[:space:]]*>"
Basic regular expression for matching the href attribute in an anchor.
wayback_matchtype
Default: prefix
Wayback Machine CDX server match type: domain returns results from the host domain and all its subdomains; host returns results from the host domain, but no other domains; exact returns results matching the URL exactly; and prefix returns all results under a URL path. Currently, the only options supported are prefix (the default) or exact.
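
For orientation, a CDX server query combining the match type with the date bounds from wayback_date_from_earliest and wayback_date_to_latest might be built as below. The url, matchType, from, to and output parameters belong to the Internet Archive's CDX server API; the target URL and dates are hypothetical, and the request itself is left commented out.

```bash
#!/usr/bin/env bash
wayback_matchtype="prefix"
wayback_date_from_earliest="20190101000000"
wayback_date_to_latest="20201231235959"
target="example.com/blog/"   # hypothetical URL to look up

cdx_query="https://web.archive.org/cdx/search/cdx?url=${target}&matchType=${wayback_matchtype}&output=json"
# Only add date bounds that have been set.
[[ -n $wayback_date_from_earliest ]] && cdx_query+="&from=${wayback_date_from_earliest}"
[[ -n $wayback_date_to_latest ]] && cdx_query+="&to=${wayback_date_to_latest}"

echo "$cdx_query"
# curl -s "$cdx_query"    # requires network access
```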
wayback_domain_original*
Default: yes
Restore original domain folder when generating a mirror of site archived by the Wayback Machine (y/n)? This is derived from the second URL in the Memento URL, which is generally the URL that was originally captured by the Wayback Machine.
wayback_domain_original_sitemap*
Default: yes
Restore original URLs when generating the sitemap for a site archived by the Wayback Machine (y/n)?
wayback_newsfeed_clean*
Default: yes
Delete references to Wayback Machine host for newsfeeds (y/n)?
wayback_code_clean*
Default: yes
Delete (JavaScript) Playback code inserted by Wayback Machine (y/n)? Options: no to keep as is; yes to restore the original link; otherwise convert to a relative link.
wayback_code_re
Default: regular expression
Regular expression to match code inserted by the Wayback Machine.
wayback_folders_clean*
Default: yes
Delete supporting directories created by the Wayback Machine that appear in the mirror (y/n)?
wayback_folders
Default: _static
Comma-separated list of Wayback Machine directory names that may appear in the mirror.
wayback_comments_clean*
Default: yes
Delete comments inserted by Wayback Machine (y/n)?
wayback_comments_re
Default: regular expression
Regular expression to match comments appended by the Wayback Machine.
wayback_links_clean
Default: no
Strip Wayback Machine prefixes from link URLs to restore the original links in web pages (y/n)?
wayback_machine_downloader_url
Default: https://github.com/hartator/wayback-machine-downloader
URL of Hartator's Wayback Machine Downloader GitHub repository.
wayback_machine_downloader_cmd
Default: wayback_machine_downloader
[Path to] binary for the Wayback Machine downloader.
wayback_machine_only
Default:
Restrict downloading to URLs that match this filter (enclose in slashes // to treat as a regex and place in quotes). For example, to include only HTML files with .html extension use: "/.*\.html/"
wayback_machine_excludes
Default:
Skip downloading of URLs that match this filter (enclose in slashes // to treat as a regex and place in quotes). For example, to exclude ASP files use: "/.*\.asp.*/"
wayback_machine_statuscodes
Default:
Accepted status codes. Leave empty for the default, 200 (OK); enter all to also accept 30x (redirections), 40x (not found, forbidden) and 50x (server error).
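
When wayback_cli is enabled, these three settings correspond to options of Hartator's wayback_machine_downloader (--only, --exclude and --all). A sketch of assembling the command line follows; the target URL is hypothetical and the command is printed rather than run.

```bash
#!/usr/bin/env bash
wayback_machine_downloader_cmd="wayback_machine_downloader"
wayback_machine_only="/.*\.html/"
wayback_machine_excludes="/.*\.asp.*/"
wayback_machine_statuscodes="all"
target="https://example.com"   # hypothetical site to retrieve

cmd=("$wayback_machine_downloader_cmd" "$target")
[[ -n $wayback_machine_only ]] && cmd+=(--only "$wayback_machine_only")
[[ -n $wayback_machine_excludes ]] && cmd+=(--exclude "$wayback_machine_excludes")
[[ $wayback_machine_statuscodes == all ]] && cmd+=(--all)

# Print the command with shell quoting; run it with "${cmd[@]}" instead.
printf '%q ' "${cmd[@]}"
echo
```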
wget_reject_clause
Default: *login*,*logout*
For connections that require a login, Wget is run with these patterns as its --reject parameter, so that login and logout pages are skipped and the session is not ended.
mod_wp
Default: mod_wp.sh
Filename of the WordPress module, as stored in the lib/ directory.
wp_cli_install
Default: https://wp-cli.org/#installing
URL of the WP-CLI installation instructions.
wp_permalinks_postname
Default: yes
The permalinks structure has a key bearing on the output. This setting will force it to make use of the post name rather than post ID or dates.
wp_search_plugin
Default: https://makestaticsite.sh/download/contrib/wp-static-search-1-1-1.zip
The URL of a (temporary) version of the WP Static Search plugin tweaked to work offline.
wp_search_dir
Default: wp-static-search
Directory name of the search plugin. Within the standard WordPress layout, a directory of this name will be created under the wp-content/plugins/ directory.
wp_remove_query_strings
Default: yes
Remove query strings from WordPress core URLs.
wp_remove_shortlink
Default: yes
Remove WordPress shortlinks.
wp_disable_embeds
Default: yes
Disable embeds in WordPress.
wp_disable_xmlrpc
Default: yes
Disable support for XML-RPC in WordPress.
wp_remove_wlwmanifest_link
Default: yes
Remove Windows Live Writer <link> tag from header.
wp_remove_rest_api_links
Default: yes
Remove REST API links in WordPress.
wp_remove_rsd_link
Default: yes
Remove Really Simple Discovery (RSD) tag in WordPress.
htmltidy_cmd
Default: tidy
The command to invoke HTML Tidy, which is usually tidy.
htmltidy_options
Default: -m -q -indent --indent-spaces 2 --show-filename yes --tidy-mark no
Command-line options for HTML Tidy. Errors will be collated in a single file in the MakeStaticSite root folder.
htmltidy_errors_file
Default: errors_htmltidy.txt
The error reporting generated by HTML Tidy will be saved in this file.
htmltidy_source_extensions
Default: "htm,html"
List of web document file extensions intended for HTML Tidy.
ink_error
Default: red
Colour for error messages; similarly, ink_warning (amber), ink_ok (green) and ink_info (lime) set the colours of other message types. Standard labels supported on all displays: black, red, green, yellow, blue, magenta, cyan and white. A few additional colours need 256-colour support and use custom labels: amber, lime and paleblue.
clean_query_extensions*
Default: no
Remove query strings from filenames (yes/no).
system_files_cleanup
Default: Thumbs.db,.DS_Store
List of unwanted system files, to be removed from mirror output.
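
Removing the unwanted files could be as simple as one find per listed name, as in this sketch; the temporary directory stands in for a real mirror.

```bash
#!/usr/bin/env bash
system_files_cleanup="Thumbs.db,.DS_Store"
mirror_dir=$(mktemp -d)   # hypothetical mirror location

mkdir -p "$mirror_dir/images"
touch "$mirror_dir/index.html" "$mirror_dir/images/Thumbs.db" "$mirror_dir/.DS_Store"

# Delete every file whose name appears in the comma-separated list.
IFS=',' read -r -a unwanted <<< "$system_files_cleanup"
for name in "${unwanted[@]}"; do
  find "$mirror_dir" -type f -name "$name" -exec rm -f {} +
done

find "$mirror_dir" -type f
```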
web_print_runtime_data*
Default: no
Append MakeStaticSite runtime session data summary to web pages (yes/no)?
timezone
Default: local
Timestamps are used for marking the creation of .cfg files and for mirror directories. There are three options: local (local time), utc (UTC time, with no local adjustment), and utclocal (local time specified in relation to UTC).
output_level
Default: quiet
This determines the level of reporting to the terminal when running makestaticsite.sh. There are four options with increasing levels of output: silent, quiet, normal and verbose. The setting for output_level tends to be quieter than that for logs (see following entry).
log_level
Default: normal
This determines the level of logging to file when running makestaticsite.sh. There are four options with increasing levels of output: silent, quiet, normal and verbose. The setting for log_level tends to be more verbose than that for terminal output (see previous entry).
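
Because the four levels are ordered, deciding whether a message should be shown or logged reduces to comparing ranks. The sketch below assumes each message carries the minimum level at which it should appear; the function names and the fallback for unknown values are hypothetical.

```bash
#!/usr/bin/env bash
output_level=quiet
log_level=normal

# Map a level name to its rank: silent < quiet < normal < verbose.
level_rank() {
  case $1 in
    silent)  echo 0 ;;
    quiet)   echo 1 ;;
    normal)  echo 2 ;;
    verbose) echo 3 ;;
    *)       echo 2 ;;   # treat unknown values as normal (an assumption)
  esac
}

# Succeed if a message needing level $1 should be emitted at configured level $2.
level_allows() {
  (( $(level_rank "$2") >= $(level_rank "$1") ))
}

level_allows normal "$output_level" || echo "not shown on terminal"
level_allows normal "$log_level" && echo "written to log"
```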
log_filename
Default: makestaticsite.log
The file name for logs. A single file stores all logged activity; separate processes (manual or automated) can carry out log rotation, as required.
trap_errors
Default: no
Trap errors with immediate script termination (yes/no). This is intended to support debugging during development: the script stops if any command (including one within a pipeline) fails, if an unset variable is referenced, or if an exit code indicates failure, i.e. is non-zero. It then reports the system error.
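
In Bash terms, the behaviour described corresponds to set -o pipefail, set -u and set -e, usually combined with an ERR trap that reports the failure. Whether MakeStaticSite uses exactly these settings is not stated here, so treat the following as a sketch of the mechanism; the function name is hypothetical.

```bash
#!/usr/bin/env bash
trap_errors=yes

enable_error_trapping() {
  set -o errexit -o nounset -o pipefail   # same as set -euo pipefail
  set -o errtrace                         # functions and subshells inherit the ERR trap
  trap 'echo "Error: command failed with status $? at line $LINENO" >&2' ERR
}

# Demonstrate in a subshell so that the calling shell keeps running.
(
  [[ $trap_errors == yes ]] && enable_error_trapping
  false | true          # pipefail makes this pipeline fail, so errexit stops here
  echo "not reached"
) 2>&1
echo "subshell exit status: $?"
```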
run_unattended
Default: no
In a few instances, makestaticsite.sh may prompt the user with a warning message and then ask whether or not to continue; for example, after encountering an error code on running wget or when it is about to write data to a non-empty directory. If run_unattended is set to yes, it will be generally assumed that the choice is made to always continue, without manual intervention.
extras_dir
Default: extras
This is the name of the directory containing any files — in nested folders relative to the site's web root — that should be added after the mirror has been generated.
force_ssl
Default: yes
Convert anchors that point to the deployment domain to https (yes/no). The name of this constant deliberately echoes its use in WordPress.
force_domains
Default: yes
Automatically replace occurrences of the source domain with the deployment domain (yes/no). If set to 'no', then a prompt will be issued at runtime reporting on the number of matches found.
domain_match_prefix
Default: //
Domain prefix for matches (in sed).
domain_subs_prefix
Default: //
Domain prefix for substitutions (in sed).
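
A sketch of the substitution follows, combining the two prefixes with the force_ssl step. The source and deployment domains are hypothetical, the sed expressions are illustrative rather than MakeStaticSite's own, and dots in the domains are bracketed so that they match literally.

```bash
#!/usr/bin/env bash
domain_match_prefix="//"
domain_subs_prefix="//"
source_domain="dev.example.local"   # hypothetical source domain
deploy_domain="example.com"         # hypothetical deployment domain
force_ssl=yes

page=$(mktemp)
printf '%s\n' '<a href="http://dev.example.local/about/">About</a>' > "$page"

match=${source_domain//./[.]}
tmp=$(mktemp)
# Replace the source domain (with its prefix) by the deployment domain.
sed "s|${domain_match_prefix}${match}|${domain_subs_prefix}${deploy_domain}|g" "$page" > "$tmp" && mv "$tmp" "$page"

if [[ $force_ssl == yes ]]; then
  # Upgrade anchors pointing to the deployment domain to https.
  match=${deploy_domain//./[.]}
  sed "s|http://${match}|https://${deploy_domain}|g" "$page" > "$tmp" && mv "$tmp" "$page"
fi
cat "$page"
```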
rsync_options
Default: (-a -z -h)
Core rsync options (excluding the output level): -a archive mode preserves permissions, ownership, modification times, etc.; -z compresses data during transfer; -h outputs numbers in human-readable format.

It's recommended that other options are left as they are.

This page was published on 1 November 2022 and last updated on 19 December 2024.