MakeStaticSite is highly configurable through numerous options, as explained in the options management page. In brief, all options are initially defined in lib/constants.sh. When setup.sh is run, it generates a site-specific configuration covering the most fundamental settings and stores it as a configuration (.cfg) file.
In addition, MakeStaticSite has a number of runtime options whose global defaults, set in the constants file, can be overridden on a per-site basis by editing the respective .cfg file; simply append the constant definitions to the existing content. It is recommended that you make a backup first.
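For example, overriding two runtime defaults for one site might look like the following minimal sketch. It assumes the .cfg file holds one name=value definition per line, in the style of lib/constants.sh. The helper name append_overrides is hypothetical, and so are the file name and values in the usage line.

```shell
#!/usr/bin/env bash
# Sketch: back up a site's .cfg file, then append runtime overrides.
# Usage: append_overrides FILE NAME=VALUE [NAME=VALUE ...]
# append_overrides is a hypothetical helper, not part of MakeStaticSite.
append_overrides() {
  local cfg_file=$1
  shift
  if [ ! -f "$cfg_file" ]; then
    echo "No such file: $cfg_file" >&2
    return 1
  fi
  # Keep a backup next to the original before editing.
  cp -p "$cfg_file" "$cfg_file.bak" || return 1
  local def
  for def in "$@"; do
    # One constant definition per line, e.g. wget_threads=2.
    printf '%s\n' "$def" >> "$cfg_file"
  done
}
```

For instance, append_overrides mysite.cfg wget_threads=2 output_level=normal would copy mysite.cfg to mysite.cfg.bak and then append both definitions.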
As at version 0.31.1, the options are as follows (those with localisation support are marked with ‘*’; note that the list is subject to change):
- max_redirects
- Default: 5
Maximum number of redirects allowed when determining the effective URL to be mirrored. If a redirect is followed, the URL originally entered is replaced by the effective URL.
- etc_hosts
- Default: /etc/hosts
Location of the hosts file. When creating the source website locally, it can be useful for url_base and deploy_domain to have the same domain, particularly to test certain functionality such as reCAPTCHA. In this case, with the aid of the constants ip4re and ip6re, MakeStaticSite will inspect the hosts file for an entry that anticipates the DNS for the domain and temporarily comment it out when it comes to deployment, so that there’s no interruption to site editing.
- mss_file_permissions
- Default: 600
Default Unix file permissions for file creation.
- mss_dir_permissions
- Default: 700
Default Unix file permissions for directory creation.
- tmp_dir
- Default: tmp
Directory where temporary files are to be stored. These mainly support Wget, including input files and cookies.
- tab
- Default: " "
Tab spacing for file outputs, e.g. the site map (XML) file.
- host_dir
- Default: auto
Host directory mode when creating a site mirror with Wget: empty or ‘no’ corresponds to -nH (--no-host-directories), effectively removing one directory level; otherwise, the host directory is included in the output.
- credentials_rc_file
- Default: .netrc
‘Run commands’ file for (temporary) storage of credentials: either .wgetrc or .netrc.
- credentials_cleanup*
- Default: yes
Delete references to credentials in temp files and the .rc file on completion of the run (y/n)?
- credentials_manage_cmd
- Default: pass
Path to binary for managing (and encrypting) credentials.
- credentials_manage_cmd_url
- Default: https://www.passwordstore.org/#download
URL where the credentials manager may be downloaded.
- credentials_storage_namespace
- Default: MSS
MakeStaticSite-specific directory for storing credentials (usernames, passwords, tokens, etc.).
- credentials_storage_mode
- Default: plain
How to store credentials: config to store in the configuration file, as-is; plain to store separately, as-is, in plain text; encrypt to store separately and encrypt.
- credentials_extension
- Default: gpg
Encryption file type extension.
- credentials_home
- Default: "$HOME/.password-store"
Directory under which credentials are stored (by default, the password store used by pass).
- wget_cmd
- Default: wget
Path to the Wget binary. If wget is available in PATH, simply enter wget; otherwise, enter its full path.
- wget_error_level
- Default: 6
The lowest Wget exit (error) code that is tolerated; lower non-zero codes abort the run (set to more than 8 for no tolerance).
- wget_user_agent
- Default: mss
The browser user agent to be used by Wget. When set to wget, Wget/version will be submitted; if set to mss, then MakeStaticSite/version (Wget/version; MSS site URL) will be used; if an empty string, then no user agent string will be sent; otherwise enter a custom string, e.g. Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15). If no string is supplied, access might be refused by the host’s web application firewall.
- wget_protocol_relative_urls*
- Default: yes
Allow protocol-relative URLs to be fetched by Wget by prefixing a protocol (y/n)?
- wget_protocol_prefix
- Default: https
Protocol with which to prefix protocol-relative URLs.
- wget_http_login_field
- Default: user
Wget’s user login field for HTTP authentication.
- wget_http_password_field
- Default: password
Wget’s password field for HTTP authentication.
- wget_cookies
- Default: cookies.txt
The name of the cookies file used by Wget.
- wget_cookies_min_filelength
- Default: 5
The minimum number of lines for a valid non-empty Wget cookies file.
- wget_cookies_nullify_user_agent*
- Default: no
When wget_user_agent is defined above as a non-empty string, should it be reset to null for handling cookies (yes/no)?
- wget_post
- Default: wget_post.txt
The name of the file containing POST data.
- wget_inputs_main_stem
- Default: wget_inputs_main.txt
The name of the input file for Wget, used in the first run. This file comprises URLs that might not be reachable by standard crawls.
- wget_inputs_extra_stem
- Default: wget_inputs_extra.txt
The name of the input file for Wget, used in subsequent runs. This file is auto-generated during a deep search of URLs.
- wget_mirror_options
- Default: (--recursive --timestamping --level=inf --no-remove-listing)
The standard default settings for Wget to generate a mirror. They can be tweaked: for example, to create only a partial mirror, set --level to a number.
- wget_core_options
- Default: ("${wget_mirror_options[@]}" --convert-links --adjust-extension --page-requisites)
These are the basic options for Wget to crawl a URL and download a static version. They should only be changed if there’s undesirable behaviour. Additional options should be specified per site in the .cfg file.
- wget_wayback_core_options
- Default: ()
Specify additional recursion options in () brackets. This is experimental, but the kind of options envisaged include (--recursive --level=2).
- wget_default_page
- Default: index.html
The Wget --default-page option, used as the file name for saving directory indices.
- wget_adjust_extensions
- Default: html,css
List of file extensions that Wget appends to downloaded file names, to match the Content-Type in the HTTP response header, when the extension is missing.
- prune_filename_extensions_querystrings
- Default: yes
Remove file name extensions added by Wget via the --adjust-extension option (yes/no)?
- wget_no_parent
- Default: auto
Should capturing URLs with directories include the --no-parent option? Set to auto or yes to check and add it automatically; manual to check and ask during runtime; otherwise there is no intervention.
- wget_extra_core_options
- Default: (-r -l inf -nc --adjust-extension)
Used in phase 3 (augment assets). Similar to wget_core_options, these are the basic options for Wget to crawl a URL and download a static version. They differ slightly, with -nc (no clobber) instead of --page-requisites, reflecting the context of targeting supporting assets (such as images) to augment an existing site. As the retrieval method is blunt, leaving this unspecified could make the run very time-consuming.
- wget_progress_indicator
- Default: (--show-progress --progress=bar:force:noscroll)
Wget progress bar, currently used when output_level=quiet (leave empty to omit), when running Wget in both phases 2 and 3. It gives minimal updates per download during site capture, whilst more details may be recorded in the log file.
- wget_threads
- Default: 1
The number of parallel threads for running Wget (integer). This is a recently introduced feature and should be regarded as experimental.
- wget_extra_urls_depth
- Default: 5
The number of times to call wget_extra_urls() to scan for and fetch extra URLs (integer).
- feed_html
- Default: feed/index.html
Newsfeeds generally follow XML standards, whereas Wget typically saves them with a .html extension and updates anchors accordingly, so the URLs of such feeds need replacing. This setting, currently targeted at WordPress (which stores feeds in a number of feed/ folders), specifies the tail of the invalid URLs.
- feed_xml
- Default: feed/index.xml
This setting specifies the tail of valid replacement feed URLs (ending .xml) for feed_html URLs. To support this properly in deployment, add index.xml as the last entry of the DirectoryIndex directive in the .htaccess file at the site’s root on the web server.
- warc_output
- Default: no
Generate WARC (Web ARChive) files (y/n)?
- warc_header_format
- Default: mm
Header format for WARC files: default will use Wget defaults; MSS will generate additional fields, namely software: MakeStaticSite/version (Wget/version), operator: $USER environment variable, and hostname: $HOSTNAME. Otherwise, enter a non-empty string conforming to the ‘warcinfo’ standard, wrapped in quotes, with fields separated by ‘|’, e.g. "Operator: Fred Blogs Archival Services|software:MakeStaticSite version".
Reference: https://iipc.github.io/warc-specifications/specifications/warc-format/warc-1.1/#warcinfo.
- warc_cdx
- Default: yes
Write CDX index files (y/n)?
- warc_compress
- Default: yes
Compress WARC files using gzip (y/n)?
- warc_combine_output
- Default: yes
Combine enumerated WARC files into one file (y/n)?
- url_asset_capture_level
- Default: 3
For determining the capture level (0 fewest, 5 most) for URL matching of assets to download and localise.
- url_wildcard_capture*
- Default: no
Use a wildcard for matching URLs in asset processing (y/n)? If set to ‘yes’, when capturing asset URLs on pages, a simple regex capture group will be used instead of the input file of itemised URLs generated in phases 2 and 3.
- url_separator_chars
- Default: "[,:(]"
Additional class of separator characters (a regular expression bracket expression) delimiting URLs to be captured: for example, commas in data-src and colons in JSON. Leave empty to omit.
- url_grep_search_pattern
- Default: "[^\\\"'<) ]"
URL-terminating characters in grep searches (ERE notation); if link text contains ')', this character can be removed.
- web_source_extensions
- Default: htm,html,xml,txt,css
List of web document file extensions, intended for assets search.
- web_source_exclude_dirs*
- Default:
Comma-separated list of directories to exclude (relative to the working mirror directory).
- web_element_extensions
- Default: js,css,svg,map,ico
Comma-separated list of file extensions for standard Web page components.
- font_extensions
- Default: cff,ttf,eot,woff,woff2
Comma-separated list of file extensions for Web fonts.
- image_extensions
- Default: jpeg,jpg,gif,png
Comma-separated list of file extensions for Web images.
- audiovideo_extensions
- Default: heic,webp,mp3,m4a,ogg,wav,avi,mpg,mp4,mov,ogv,wmv,3gp,3gp2
Comma-separated list of file extensions for audio and video assets.
- doc_extensions
- Default: pdf,doc,docx,odt,ppt,xls,xlsx
Comma-separated list of file extensions for office documents.
- asset_extensions
- Default: $web_element_extensions,$image_extensions,$audiovideo_extensions,$doc_extensions,$font_extensions
List of file extensions for assets that may be retrieved by Wget in phase 3 (derived from the WordPress.com allowable upload file types). If no extensions are defined, then cURL will be used to remove non-HTML assets, but all other assets will be accepted.
- asset_extensions_external
- Default: $web_element_extensions,$image_extensions,$font_extensions
List of file extensions for assets from external (third-party) domains, a more limited set than for asset_extensions.
- relativise_primarydomain_assets*
- Default: yes
Convert absolute links to relative links for primary domain assets (y/n)?
- shorten_longlines*
- Default: auto
Break apart long lines to reduce processing time: set to off to leave all files untouched; auto to decide on a per-file basis whether to shorten lines, using criteria based on file size and the number of lines in the document; on to apply line shortening to all files.
- average_linelength_max*
- Default: 1000
When shorten_longlines=auto, shorten lines when the average line length exceeds this number of characters.
- longest_linelength_max
- Default: 100000
When shorten_longlines=auto, shorten lines when the longest line length exceeds this number of characters.
- newline_inserts
- Default:
('<\/script></<\/script>$'"'\n'"'<'
'<\/style></<\/style>$'"'\n'"'<'
'<\div></<\div>$'"'\n'"'<'
'\\\"\,\\\"/\\\"\,$'"'\n'"'\\\"'
'\}@media/\}$'"'\n'"'@media'
'\}@font-face/\}$'"'\n'"'@font-face')
Replacements to be made for shortening line length (array).
- prune_query_strings*
- Default: no
Remove query strings appended to paths and URLs in anchors, limited to the file types given in query_prune_list (y/n)?
- query_prune_list
- Default: js,css,svg,png,$font_extensions
List of file extensions in requests that may have a query string appended for versioning or other non-essential purposes, which can be pruned without loss of functionality.
- extra_assets_allow_query_strings*
- Default: yes
Allow Wget to fetch additional URLs with query strings in phase 3 (y/n)?
- extra_assets_query_strings_limit
- Default: 100000
Only fetch URLs with query strings when the total number of assets is less than this number.
- extra_assets_mode
- Default: contain
How assets from extra domains should be incorporated: empty or 'off' to keep them in separate directories under the mirror ID; 'contain' to move the directories inside the assets directory (see separate constant).
- assets_directory
- Default: webassets
Directory immediately under the primary domain directory where extra assets are stored per extra domain (set empty to place assets in root).
- imports_directory
- Default: imports
Directory immediately under assets_directory for storing assets imported for extra domains.
- parent_dirs_mode
- Default: contain
For URLs with directories, specify what to do with assets that lie outside the mirrored directory: empty or off to keep assets where they are after the Wget mirror; contain to move the directories inside the assets directory.
- external_dir_links
- Default:
Specify what to do with links to resources on the same domain, but outside the mirrored tree: empty or off to leave them absolute, pointing only to the deployment domain; local to make them relative, to the assets directory.
- mss_cut_dirs
- Default: yes
Option to cut directories, effectively shortening the URL. Enter yes or on for a MakeStaticSite-specific cut that moves content from the directory path specified in the URL up to the root directory. When this is enabled, there is no need (and it is not recommended) to specify the Wget option --cut-dirs. Leave empty or enter no or off to disable it (in which case the Wget option --cut-dirs may be used instead).
- cors_enable*
- Default: yes
Enable cross-origin resources once downloaded (y/n)?
- link_rel_canonical
- Default: yes
Include a <link rel="canonical" ...> tag in the HTML head (yes or no)? This helps search engines to index the site.
- link_href_tail
- Default: /
The tail of canonical URLs and internal links, e.g. index.html or a trailing slash, /, which is assumed if left blank.
- a_href_tail
- Default:
The tail for internal links, e.g. index.html or / (leave blank for /). The value should normally match link_href_tail.
- robots_create
- Default: yes
Generate and overwrite robots.txt (yes or no)? Whilst a CMS may generate a virtual robots file, it might be unduly restrictive or not be a good fit for the static output. Selecting 'yes' signals the generation of a new robots.txt file.
- robots_default_file
- Default: robots.txt
File name for the default robots.txt (inside lib/files/). A sitemap will subsequently be appended.
- sitemap_create
- Default: yes
Generate and overwrite the site map file (yes or no)? Whilst a CMS may generate a virtual site map, it might not be a good fit for the static output. Selecting 'yes' signals the generation of a new site map, which is currently constructed from a listing of all pages on the site.
- sitemap_file
- Default: sitemap.xml
Name of the sitemap (XML) file.
- sitemap_schema
- Default: http://www.sitemaps.org/schemas/sitemap/0.9
Site map XML schema URL.
- sitemap_file_extensions
- Default: htm,html
A comma-separated list of file extensions allowed for inclusion in the sitemap file.
- mod_wayback
- Default: mod_wayback.sh
Wayback Machine module filename.
- wayback_cli*
- Default: no
Use a third-party client to download sites from the Wayback Machine (y/n)? If not set to 'yes', any Wayback sites will be retrieved natively using the default (Wget).
- use_wayback_id*
- Default: no
When retrieving natively, capture the original page rather than the Wayback Machine's processed version (y/n)? Whilst this is more faithful to the original format for individual pages, the overall output (links, navigation, etc.) is more likely to be fragmented.
- wayback_hosts
- Default: web.archive.org,www.webarchive.org.uk
Comma-separated list of domains where a Wayback Machine is hosted.
- wayback_memento_check*
- Default: no
Perform a dynamic check for a Memento site using the HTTP request header (y/n)?
- wayback_header
- Default: Memento-Datetime:
The search string that will be used in the HTTP header request to identify support for Memento URLs.
- wayback_mementos_only*
- Default: yes
Only download assets with Memento URLs (y/n)? This resets page_element_domains to be empty, keeping the capture strictly to the Wayback Machine.
- wayback_assets_mode
- Default: original
How to incorporate assets downloaded during phase 3: off to take no action and use none of them; original to recreate the original layout as far as possible (timestamps removed); timestamp to leave assets in the Wayback Machine's timestamped folders and reference them there.
- wayback_timestamp_policy
- Default: any
Timestamp policy: exact to only download and refer to assets with the exact timestamp; range to download subject to the specified date range (see below).
- wayback_date_from_earliest
- Default:
Earliest date timestamp (YYYYMMDDhhmmss) for Wayback Machine snapshot files.
- wayback_date_to_latest
- Default:
Latest date timestamp (YYYYMMDDhhmmss) for Wayback Machine snapshot files.
- wayback_snapshot_path_depth
- Default: 3
The number of directories to traverse to get to the original domain directory (a magic number, set by default for the Internet Archive, until a suitable algorithm is determined).
- wayback_search_regex
- Default: "href[[:space:]]*=[[:space:]]*[\'\"]\?[^#:>\'\"/][^:>]\+[[:space:]]*[\'\"]\?[[:space:]]*>"
Basic regular expression for matching the href attribute in an anchor.
- wayback_matchtype
- Default: prefix
Wayback Machine CDX server match type: domain returns all results from the host domain and all its subdomains; host returns results from the host domain, but no other domains; exact returns results matching the URL exactly; and prefix returns all results under a URL path. Currently, the only options supported are prefix (the default) and exact.
- wayback_domain_original*
- Default: yes
Restore the original domain folder when generating a mirror of a site archived by the Wayback Machine (y/n)? This is derived from the second URL in the Memento URL, which is generally the URL that was originally captured by the Wayback Machine.
- wayback_domain_original_sitemap*
- Default: yes
Restore original URLs when generating the sitemap for a site archived by the Wayback Machine (y/n)?
- wayback_newsfeed_clean*
- Default: yes
Delete references to the Wayback Machine host in newsfeeds (y/n)?
- wayback_code_clean*
- Default: yes
Delete (JavaScript) playback code inserted by the Wayback Machine (y/n)? Options: no to keep as is; yes to restore the original link; otherwise convert to a relative link.
- wayback_code_re
- Default: regular expression
Regular expression to match code inserted by the Wayback Machine.
- wayback_folders_clean*
- Default: yes
Delete supporting directories created by the Wayback Machine that appear in the mirror (y/n)?
- wayback_folders
- Default: _static
Comma-separated list of Wayback Machine directory names that may appear in the mirror.
- wayback_comments_clean*
- Default: yes
Delete comments inserted by the Wayback Machine (y/n)?
- wayback_comments_re
- Default: regular expression
Regular expression to match comments appended by the Wayback Machine.
- wayback_links_clean
- Default: no
Strip Wayback Machine prefixes from link URLs to restore the original links in web pages (y/n)?
- wayback_machine_downloader_url
- Default: https://github.com/hartator/wayback-machine-downloader
URL of Hartator's Wayback Machine Downloader GitHub repository.
- wayback_machine_downloader_cmd
- Default: wayback_machine_downloader
[Path to] binary for the Wayback Machine downloader.
- wayback_machine_only
- Default:
Restrict downloading to URLs that match this filter (enclose in slashes // to treat it as a regex, and place it in quotes). For example, to include only HTML files with the .html extension, use: "/.*\.html/"
- wayback_machine_excludes
- Default:
Skip downloading of URLs that match this filter (enclose in slashes // to treat it as a regex, and place it in quotes). For example, to exclude ASP files, use: "/.*\.asp.*/"
- wayback_machine_statuscodes
- Default:
Accepted status codes. The default is 200 (OK). Enter all to also accept 30x (redirections), 40x (not found, forbidden) and 50x (server error) responses.
- wget_reject_clause
- Default: *login*,*logout*
For connections that require a login, Wget is run with a --reject parameter to avoid logouts.
- mod_wp
- Default: mod_wp.sh
Filename of the WordPress module, as stored in the lib/ directory.
- wp_cli_install
- Default: https://wp-cli.org/#installing
The URL with instructions for installing WP-CLI.
- wp_permalinks_postname
- Default: yes
The permalinks structure has a key bearing on the output. This setting forces it to use the post name rather than the post ID or dates.
- wp_search_plugin
- Default: https://makestaticsite.sh/download/contrib/wp-static-search-1-1-1.zip
The URL of a (temporary) version of the WP Static Search plugin tweaked to work offline.
- wp_search_dir
- Default: wp-static-search
Directory name of the search plugin. Within the standard WordPress layout, a directory of this name will be created under the wp-content/plugins/ directory.
- wp_remove_query_strings
- Default: yes
Remove query strings from WordPress core URLs.
- wp_remove_shortlink
- Default: yes
Remove WordPress shortlinks.
- wp_disable_embeds
- Default: yes
Disable embeds in WordPress.
- wp_disable_xmlrpc
- Default: yes
Disable support for XML-RPC in WordPress.
- wp_remove_wlwmanifest_link
- Default: yes
Remove the Windows Live Writer <link> tag from the header.
- wp_remove_rest_api_links
- Default: yes
Remove support for the REST API in WordPress.
- wp_remove_rsd_link
- Default: yes
Remove the Really Simple Discovery (RSD) tag in WordPress.
- htmltidy_cmd
- Default: tidy
The command to invoke HTML Tidy, which is usually tidy.
- htmltidy_options
- Default: -m -q -indent --indent-spaces 2 --show-filename yes --tidy-mark no
Command line options for HTML Tidy. Errors will be collated in a single file in the MakeStaticSite root folder.
- htmltidy_errors_file
- Default: errors_htmltidy.txt
The error reporting generated by HTML Tidy will be saved in this file.
- htmltidy_source_extensions
- Default: "htm,html"
List of web document file extensions intended for HTML Tidy.
- ink_error
- Default: red
Similarly: ink_warning (amber), ink_ok (green) and ink_info (lime). Ink colours supported on all displays use the standard labels: black, red, green, yellow, blue, magenta, cyan and white. A few additional colours need 256-colour support and use custom labels: amber, lime and paleblue.
- clean_query_extensions*
- Default: no
Remove query strings from filenames (yes/no).
- system_files_cleanup
- Default: Thumbs.db,.DS_Store
List of unwanted system files, to be removed from the mirror output.
- web_print_runtime_data*
- Default: no
Append a MakeStaticSite runtime session data summary to web pages (yes/no)?
- timezone
- Default: local
Timestamps are used for marking the creation of .cfg files and for mirror directories. There are three options: local (local time), utc (UTC time, with no local adjustment), and utclocal (local time specified in relation to UTC).
- output_level
- Default: quiet
This determines the level of reporting to the terminal when running makestaticsite.sh. There are four options with increasing levels of output: silent, quiet, normal and verbose. The setting for output_level tends to be quieter than that for logs (see following entry).
- log_level
- Default: normal
This determines the level of logging to file when running makestaticsite.sh. There are four options with increasing levels of output: silent, quiet, normal and verbose. The setting for log_level tends to be more verbose than that for terminal output (see previous entry).
- log_filename
- Default: makestaticsite.log
The file name for logs. A single file stores all logged activity; separate processes (manual or automated) can carry out log rotation, as required.
- trap_errors
- Default: no
Trap errors with immediate script termination (yes/no). This is used to support debugging during development. It stops the script if any command (even within a pipeline) fails, if an unset variable is referenced, or if an exit code otherwise indicates failure (i.e. is non-zero). It then reports the system error.
- run_unattended
- Default: no
In a few instances, makestaticsite.sh may prompt the user with a warning message and then ask whether or not to continue; for example, after encountering an error code on running Wget or when it is about to write data to a non-empty directory. If run_unattended is set to yes, it will generally be assumed that the choice is made to always continue, without manual intervention.
- extras_dir
- Default: extras
This is the name of the directory containing any files (in nested folders relative to the site's web root) that should be added after the mirror has been generated.
- force_ssl
- Default: yes
Convert anchors to the deployment domain to https (yes/no). The name of this constant deliberately echoes its use in WordPress.
- force_domains
- Default: yes
Automatically replace occurrences of the source domain with the deployment domain (yes/no). If set to 'no', a prompt will be issued at runtime reporting the number of matches found.
- domain_match_prefix
- Default: //
Domain prefix for matches (in sed).
- domain_subs_prefix
- Default: //
Domain prefix for substitutions (in sed).
- rsync_options
- Default: (-a -z -h)
Core rsync options (excluding the output level): -a (archive mode) preserves permissions, ownership, modification times, etc.; -z compresses data during transfer; -h outputs numbers in human-readable format.
It's recommended that other options are left as they are.
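To illustrate how rsync_options might combine with an output-level flag, here is a minimal sketch. The mapping of silent/quiet to -q and verbose to -v, as well as the source and destination paths, are assumptions made for illustration. The command is printed rather than run.

```shell
#!/usr/bin/env bash
# Sketch: assemble an rsync command from rsync_options plus a verbosity
# flag derived from output_level (the mapping is an assumption).
rsync_options=(-a -z -h)
output_level=quiet

rsync_cmd=(rsync "${rsync_options[@]}")
case $output_level in
  silent|quiet) rsync_cmd+=(-q) ;;
  verbose)      rsync_cmd+=(-v) ;;
esac
# Placeholder paths. A trailing slash on the source copies its contents,
# not the directory itself.
rsync_cmd+=(mirror/example.com/ user@host:/var/www/html/)
printf '%q ' "${rsync_cmd[@]}"; echo
```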
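The etc_hosts behaviour described earlier (commenting out a hosts entry during deployment and restoring it afterwards) can be sketched with sed. This is not MakeStaticSite's code. ip4_simple is a simplified, hypothetical stand-in for the ip4re constant, IPv6 entries (ip6re) are ignored, and the function names are invented for illustration.

```shell
#!/usr/bin/env bash
# Sketch: temporarily comment out a hosts-file entry for a domain, then
# restore it. ip4_simple is a hypothetical stand-in for ip4re; IPv6 is not
# handled. Editing /etc/hosts needs root, so point etc_hosts at a copy.
etc_hosts=${etc_hosts:-/etc/hosts}
ip4_simple='[0-9]{1,3}(\.[0-9]{1,3}){3}'

# ERE for "IP<whitespace>[other names ]DOMAIN[ more names]".
host_line_re() {
  local domain_re
  domain_re=$(printf '%s' "$1" | sed 's/\./\\./g')   # escape dots
  printf '%s' "${ip4_simple}[[:space:]]+(.*[[:space:]])?${domain_re}([[:space:]]|\$)"
}

comment_out_host() {
  sed -E -i.bak "/^$(host_line_re "$1")/s/^/#/" "$etc_hosts"
}

restore_host() {
  sed -E -i.bak "/^#$(host_line_re "$1")/s/^#//" "$etc_hosts"
}
```

Requiring whitespace (or the start of the name list) before the domain means that an entry for www.example.com is left alone when example.com is targeted.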
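Read literally, the wget_error_level rule means that exit code 0 always passes, and a non-zero Wget exit code passes only if it is at least wget_error_level. Under that reading, with the default of 6, codes 6 (authentication failure), 7 (protocol error) and 8 (server error response, such as a 404) are tolerated, while codes 1 to 5 abort. A minimal sketch of the check follows; the function name is hypothetical, and MakeStaticSite's own logic may differ.

```shell
#!/usr/bin/env bash
# Sketch of the wget_error_level rule: 0 is success; a non-zero exit code
# is tolerated only if it is >= wget_error_level (Wget codes run 1-8).
wget_error_level=${wget_error_level:-6}

# wget_exit_tolerated is a hypothetical helper name.
wget_exit_tolerated() {
  local code=$1
  (( code == 0 || code >= wget_error_level ))
}

# Example: Wget exit code 8 (server issued an error response, e.g. 404).
if wget_exit_tolerated 8; then
  echo "continue"
else
  echo "abort"
fi
```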
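The four wget_user_agent cases could be translated into Wget arguments roughly as follows. The version numbers and site URL are placeholders, and user_agent_args is a hypothetical helper. The empty case relies on Wget's documented behaviour that an empty --user-agent suppresses the User-Agent header.

```shell
#!/usr/bin/env bash
# Sketch: turn a wget_user_agent setting into Wget arguments.
# mss_version, wget_version and mss_url are placeholder values.
mss_version=0.31.1
wget_version=1.21
mss_url=https://makestaticsite.sh/

user_agent_args() {
  local ua=$1
  case $ua in
    wget) ;;  # no option: Wget sends its default Wget/version
    mss)  printf '%s\n' "--user-agent=MakeStaticSite/$mss_version (Wget/$wget_version; $mss_url)" ;;
    *)    printf '%s\n' "--user-agent=$ua" ;;  # empty value: no User-Agent header
  esac
}
```

The result can be collected with mapfile -t ua_args < <(user_agent_args mss) and then passed to Wget as "${ua_args[@]}".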
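Since wget_mirror_options and wget_core_options are Bash arrays, a per-site partial mirror might be set up as in this sketch. It assumes that the definitions are evaluated as Bash, and the URL is a placeholder.

```shell
#!/usr/bin/env bash
# Sketch: the defaults listed above, expressed as Bash arrays.
wget_mirror_options=(--recursive --timestamping --level=inf --no-remove-listing)

# Per-site tweak: a partial mirror, two levels deep.
wget_mirror_options=("${wget_mirror_options[@]/--level=inf/--level=2}")

# wget_core_options builds on wget_mirror_options.
wget_core_options=("${wget_mirror_options[@]}" --convert-links --adjust-extension --page-requisites)

# Print the resulting command without running it.
printf '%q ' wget "${wget_core_options[@]}" https://example.com/; echo
```

In plain Bash, wget_core_options copies the elements of wget_mirror_options at the moment it is assigned, so changing wget_mirror_options afterwards would not propagate. Whether this matters depends on the order in which MakeStaticSite reads its files.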
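The WARC settings map naturally onto Wget's own WARC options (--warc-file, --warc-cdx, --no-warc-compression and --warc-header). The sketch below shows one plausible mapping for a custom warc_header_format string, splitting it on '|'. The archive name mirror is a placeholder, and the default and MSS header modes are not handled.

```shell
#!/usr/bin/env bash
# Sketch: map warc_* settings onto Wget's WARC options.
# "mirror" is a placeholder archive name; the header string is the
# example from warc_header_format, with fields separated by '|'.
warc_output=yes
warc_cdx=yes
warc_compress=yes
warc_header_format="Operator: Fred Blogs Archival Services|software:MakeStaticSite version"

warc_args=()
if [ "$warc_output" = yes ]; then
  warc_args+=(--warc-file=mirror)
  [ "$warc_cdx" = yes ] && warc_args+=(--warc-cdx)
  [ "$warc_compress" = yes ] || warc_args+=(--no-warc-compression)
  # Each '|'-separated field becomes its own --warc-header option.
  IFS='|' read -r -a warc_fields <<< "$warc_header_format"
  for field in "${warc_fields[@]}"; do
    warc_args+=(--warc-header="$field")
  done
fi
printf '%s\n' "${warc_args[@]}"
```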
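The asset_extensions default is simply a concatenation of the component lists. As a sketch, the composed list could be passed to Wget's --accept option, which takes comma-separated suffixes. Whether MakeStaticSite uses --accept in exactly this way is an assumption.

```shell
#!/usr/bin/env bash
# Sketch: compose asset_extensions from the component lists above and
# pass the result to Wget's --accept (-A) option (assumed usage).
web_element_extensions=js,css,svg,map,ico
font_extensions=cff,ttf,eot,woff,woff2
image_extensions=jpeg,jpg,gif,png
audiovideo_extensions=heic,webp,mp3,m4a,ogg,wav,avi,mpg,mp4,mov,ogv,wmv,3gp,3gp2
doc_extensions=pdf,doc,docx,odt,ppt,xls,xlsx

asset_extensions="$web_element_extensions,$image_extensions,$audiovideo_extensions,$doc_extensions,$font_extensions"
asset_extensions_external="$web_element_extensions,$image_extensions,$font_extensions"

# Wget accepts a comma-separated list of suffixes for --accept.
accept_args=(--accept "$asset_extensions")
printf '%s\n' "${accept_args[@]}"
```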
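With shorten_longlines=auto, the two thresholds suggest a per-file test along these lines. needs_shortening is a hypothetical name. The sketch uses only the average and longest line lengths, so it does not reproduce MakeStaticSite's actual criteria, which also take file size into account.

```shell
#!/usr/bin/env bash
# Sketch: decide whether a file needs line shortening, using the
# average_linelength_max and longest_linelength_max thresholds.
average_linelength_max=1000
longest_linelength_max=100000

needs_shortening() {
  awk -v avg_max="$average_linelength_max" -v long_max="$longest_linelength_max" '
    BEGIN { total = 0; longest = 0 }
    {
      len = length($0)
      total += len
      if (len > longest) longest = len
    }
    END {
      if (NR == 0) exit 1   # empty file: nothing to shorten
      # Exit status 0 (true) when either threshold is exceeded.
      exit !((total / NR) > avg_max + 0 || longest > long_max + 0)
    }' "$1"
}
```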
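wayback_matchtype corresponds to the matchType parameter of the Internet Archive's CDX server API, which also accepts from and to timestamps. The sketch below builds a query URL under the assumption that the wayback_date_* options feed those parameters. The target URL is a placeholder, and nothing is fetched.

```shell
#!/usr/bin/env bash
# Sketch: build a CDX server query from wayback_matchtype and the
# optional date range (assumed to map to the from/to parameters).
wayback_matchtype=prefix
wayback_date_from_earliest=20200101000000
wayback_date_to_latest=

cdx_query() {
  local url=$1
  local query="https://web.archive.org/cdx/search/cdx?url=${url}&matchType=${wayback_matchtype}"
  [ -n "$wayback_date_from_earliest" ] && query+="&from=${wayback_date_from_earliest}"
  [ -n "$wayback_date_to_latest" ] && query+="&to=${wayback_date_to_latest}"
  printf '%s\n' "$query"
}
```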
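Removing the files listed in system_files_cleanup from a mirror can be sketched with find. The function name and its mirror directory argument are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: delete the unwanted system files listed in system_files_cleanup
# from a mirror directory (cleanup_system_files is a hypothetical name).
system_files_cleanup="Thumbs.db,.DS_Store"

cleanup_system_files() {
  local mirror_dir=$1 name
  local -a names
  IFS=, read -r -a names <<< "$system_files_cleanup"
  for name in "${names[@]}"; do
    # -name matches the exact file name at any depth; -type f skips directories.
    find "$mirror_dir" -type f -name "$name" -exec rm -f {} +
  done
}
```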
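The behaviour described for trap_errors corresponds to what Bash calls strict mode. Here is a minimal sketch, using a hypothetical function name.

```shell
#!/usr/bin/env bash
# Sketch: when trap_errors=yes, enable Bash strict mode and report the
# failing command (apply_trap_errors is a hypothetical name).
trap_errors=${trap_errors:-no}

apply_trap_errors() {
  if [ "$trap_errors" = yes ]; then
    # -e: exit on a failing command; -u: error on unset variables;
    # -o pipefail: a pipeline fails if any command in it fails;
    # -E: functions and subshells inherit the ERR trap.
    set -Eeuo pipefail
    trap 'echo "Error (exit $?) at line $LINENO: $BASH_COMMAND" >&2' ERR
  fi
}
```

Calling apply_trap_errors near the top of a script with trap_errors=yes makes the first failing command print a short report and end the run.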
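Replacing the source domain with the deployment domain could be sketched as follows, along with counting the matching lines that force_domains=no would report. The function names are hypothetical. The sketch also does not guard against the source domain appearing as a prefix of a longer host name.

```shell
#!/usr/bin/env bash
# Sketch: replace the source domain with the deployment domain in one
# file, combining domain_match_prefix/domain_subs_prefix with sed.
domain_match_prefix=//
domain_subs_prefix=//

replace_domain() {
  local src=$1 dst=$2 file=$3 src_re
  # Escape dots so that they match literally in the sed pattern.
  src_re=$(printf '%s' "$src" | sed 's/\./\\./g')
  # '|' is the sed delimiter because the prefixes contain '/'.
  sed -i.bak "s|${domain_match_prefix}${src_re}|${domain_subs_prefix}${dst}|g" "$file"
}

# Count lines still mentioning the source domain (fixed-string match).
count_domain_matches() {
  grep -cF -- "${domain_match_prefix}$1" "$2" || true
}
```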