WWW - Website Writing Workflow
Over the years I’ve tried a bunch of writing apps, and in this article I’m going to talk a bit about my writing workflow. To begin with, I write in Markdown, and have been for a few years now. Using a plain text format makes it easy to use pretty much any application, whether on macOS, Linux or Windows, and I prefer Markdown to writing HTML by hand, but I can write HTML if I need a feature not implemented in Markdown.
My preferred Markdown editor is iA Writer for longer, distraction-free writing; for short notes (like the so.cl section, which is basically micro-blogging) I will probably use Sublime Text, since it’s always open on my computer(s) anyway.
But writing text into a text editor is the easy part. If you’re curious what happens between pressing Save in the editor and an article appearing on this website, keep reading.
TLDR
I have two main workflows: when I’m done writing an article I run the Normal Workflow, and when I need to regenerate the website archive I run the Archive Workflow. The Normal Workflow builds the website and deploys it to Tor and Clearnet (with a bunch of other stuff); the Archive Workflow does the same, plus creates a gzipped tar archive of the website (after a custom Hugo build for it). Each build must go into its own directory, because the Tor mirror requires a specific base URL (the Tor onion address), the Clearnet website requires another (sizeof.cat with an http(s) prefix), and the archive depends on relative URLs.
Basically it’s just a matter of executing ./site.sh full or ./site.sh archive.
Also, keep in mind that there are surely better ways to do what I’m doing here, but this is the way I like it. I have a few more functions inside the bash script (like encrypting articles, pulling JSON data from my terrarium with an Arduino Mega, etc.) but the ones listed below are just the basics.
Let’s begin!
We’ll need to set action to the first parameter that gets passed to the bash script, set dir to the directory where the Hugo website resides, and set today to the current date in ISO-8601 format (2023-08-18).
action=$1
dir="$HOME/Projects/website"
today=$(gdate --iso-8601)
If you’re wondering about the g prefix, it’s simple: macOS ships BSD versions of the standard tools, whose flags differ from GNU’s. So do yourself a favor, install Homebrew and run brew install coreutils, and you will get the GNU coreutils prefixed with the letter g (the script also uses gfind, which comes from the separate findutils package).
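If you want the script to fail fast on a machine that’s missing them, a small guard at the top does the trick; a minimal sketch:
# fail early if the GNU tools used below are not installed
for tool in gdate gfind gstat gnumfmt; do
  command -v "$tool" >/dev/null 2>&1 || { echo "missing $tool; try: brew install coreutils findutils"; exit 1; }
done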
Normal workflow
This workflow is for quickly posting something to the website.
$ ./site.sh full
[ Cleaning up ]
[ Building content for WWW ]
[ Building content for Tor ]
[ Generating statistics ]
[ Minifying files ]
[ Deploying files to WWW ]
[ Deploying files to Tor ]
[ DONE ]
Cleaning up
This stage removes the public folder in case it still exists, creates the required directory structure and removes any potential .DS_Store files created by macOS (yuck).
function begin_clean () {
  echo -e "[ Cleaning up ]"
  rm -rf "$dir/public"
  # brace expansion does not happen inside quotes, so the braces stay outside
  mkdir -p "$dir/public/"{www,tor,local}
  find "$dir" -name ".DS_Store" -type f -delete
}
Building content for WWW
This is the function that uses Hugo to build the WWW website (the Clearnet one). The configuration is taken from the config.toml file inside the config directory.
function build_www () {
  echo -e "[ Building content for WWW ]"
  hugo --quiet -s "$dir" --config "$dir/config/config.toml"
}
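The config.toml file itself isn’t shown in this article, but since deploy_www syncs from public/www, it presumably sets a Clearnet baseURL and a matching publishDir; an assumed minimal excerpt:
baseURL = "https://sizeof.cat/"
publishDir = 'public/www'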
Building content for Tor
The part that builds the Tor mirror website using Hugo. The configuration is taken from the config.toml file inside the config directory and is overridden by the config-tor.toml file from the same directory. This is done so I can override the baseURL (to the Tor onion address) and publishDir (to public/tor) config variables.
function build_tor () {
  echo -e "[ Building content for Tor ]"
  hugo --quiet -s "$dir" --config "$dir/config/config.toml,$dir/config/config-tor.toml"
}
The config-tor.toml file is very simple:
baseURL = "http://sizeofaex6zgovemvemn2g3jfmgujievmxxxbcgnbrnmgcjcjpiiprqd.onion/"
publishDir = 'public/tor'
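Hugo merges comma-separated --config files left to right, with later files overriding earlier ones. If you ever need to double-check what a merged build will use, hugo config prints the final values:
hugo config -s "$dir" --config "$dir/config/config.toml,$dir/config/config-tor.toml" | grep -iE 'baseurl|publishdir'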
Generating statistics
This is where the bash scripting gets hairy. Basically I want to replace some strings inside the statistics and website archive pages, for both languages (English and Catalan).
function stats_full () {
  echo -e "[ Generating statistics ]"
  # count images, HTML pages and total files in the WWW build
  local num_images=$(gfind "$dir/public/www" -mindepth 1 -type f \( -iname \*.png -o -iname \*.jpg -o -iname \*.gif \) -printf x | wc -c)
  local num_html=$(gfind "$dir/public/www" -mindepth 1 -type f -name "*.html" -printf x | wc -c)
  local num_files=$(gfind "$dir/public/www" -mindepth 1 -type f -name "*.*" -printf x | wc -c)
  # extract the date, SHA-256 and human-readable size of the latest archive
  local archive_raw_date=$(echo "$dir"/archive/sizeofcat-website-archive-*.tar.gz | grep -Eo '[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}')
  local archive_date=$(gdate -d "$archive_raw_date" '+%B %d, %Y')
  local archive_sha=$(shasum -a 256 "$dir/archive/sizeofcat-website-archive-$archive_raw_date.tar.gz" | awk '{print $1}')
  local archive_size=$(gstat -c '%s' "$dir/archive/sizeofcat-website-archive-$archive_raw_date.tar.gz" | gnumfmt --to=si --suffix=B)
  sed -i '' -e "s/NUMIMAGES/$num_images/g" -e "s/NUMHTMLS/$num_html/g" -e "s/NUMFILES/$num_files/g" "$dir/public/tor/statistics/index.html" "$dir/public/www/statistics/index.html" "$dir/public/tor/ca/estadístiques/index.html" "$dir/public/www/ca/estadístiques/index.html"
  sed -i '' -e "s/ARCHIVESHA/$archive_sha/g" -e "s/ARCHIVEDATE/$archive_date/g" -e "s/ARCHIVERAWDATE/$archive_raw_date/g" -e "s/ARCHIVESIZE/$archive_size/g" "$dir/public/tor/project/website-archive/index.html" "$dir/public/www/project/website-archive/index.html"
}
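For this to work, the statistics and archive pages carry the placeholder tokens verbatim in their source. A hypothetical line from the statistics page (the actual wording on the live pages may differ):
This website consists of NUMFILES files, of which NUMHTMLS are HTML pages and NUMIMAGES are images.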
Minifying files
The next stage uses Taco de Wolff’s minify tool to recursively minify (but not obfuscate) all HTML, XML, JavaScript and CSS files.
function do_minify () {
  echo -e "[ Minifying files ]"
  # one pass each for HTML/XML, CSS and JavaScript
  minify -o "$dir/public/" --recursive --match=\.*ml "$dir/public/"
  minify -o "$dir/public/" --recursive --match=\.*css "$dir/public/"
  minify -o "$dir/public/" --recursive --match=\.*js "$dir/public/"
}
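Since --match takes a regular expression, the three passes could probably be collapsed into one; an untested sketch:
minify -o "$dir/public/" --recursive --match='\.(html|xml|css|js)$' "$dir/public/"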
Deploying files
Actually, there are two functions here, deploy_www and deploy_tor, but they are identical; the only difference is the source and remote directories. I’m using rsync to remotely upload the files, deleting all files on the destination that don’t exist in the source, with a protect filter for the project/website-archive/files/ directory. Why, you might ask? Simple: when I call the script with the full parameter, the sizeofcat-website-archive-X.tar.gz file won’t exist in the source, and I don’t want it deleted on the destination (because of the rsync --delete flag).
function deploy_www () {
  echo -e "[ Deploying files to WWW ]"
  rsync --stats -amch --delete --delete-after --exclude=.DS_Store --filter='protect project/website-archive/files/*' "$dir/public/www/" user@host:/var/www
}
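And deploy_tor is the same invocation pointed at the Tor build; user@host and the remote path below are placeholders:
function deploy_tor () {
  echo -e "[ Deploying files to Tor ]"
  # identical to deploy_www, only the source and destination directories change
  rsync --stats -amch --delete --delete-after --exclude=.DS_Store --filter='protect project/website-archive/files/*' "$dir/public/tor/" user@host:/var/tor
}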
Done
This is really simple: the function just removes the public directory since we’re done with it.
function end_clean () {
  rm -rf "$dir/public"
  echo -e "[ DONE ]"
}
Summary
The order of the functions is this:
if [ "$action" = "full" ]
then
  begin_clean
  build_www
  build_tor
  stats_full
  do_minify
  deploy_www
  deploy_tor
  end_clean
fi
Archiving workflow
I use the archiving workflow when I need/want to generate a new website archive. In addition to the normal workflow, the website needs to be generated with relative URLs (so it can be browsed offline), which Hugo will do in the public/local directory.
$ ./site.sh archive
[ Cleaning up ]
[ Building local content ]
[ Building content for WWW ]
[ Building content for Tor ]
[ Generating local statistics ]
[ Archiving local site ]
[ Copying site archive ]
[ Generating statistics ]
[ Minifying files ]
[ Deploying files to WWW ]
[ Deploying files to Tor ]
[ DONE ]
I’ll only show you the new functions in this workflow.
Building local content
Similar to the part where Hugo builds the Tor website, the configuration is taken from the config.toml file inside the config directory and is overridden by the config-local.toml file from the same directory. This is done so I can override the relativeURLs (to true) and publishDir (to public/local) config variables.
function build_local () {
  echo -e "[ Building local content ]"
  hugo --quiet -s "$dir" --config "$dir/config/config.toml,$dir/config/config-local.toml"
}
The contents of the config-local.toml file are these:
relativeURLs = true
publishDir = 'public/local'
Generating local statistics
Very similar to stats_full, this function generates statistics for the local files (the ones that will get archived) in both languages.
function stats_local () {
  echo -e "[ Generating local statistics ]"
  # the counts are taken from the Tor build, which has the same files as the local one
  local num_images=$(gfind "$dir/public/tor" -mindepth 1 -type f \( -iname \*.png -o -iname \*.jpg -o -iname \*.gif \) -printf x | wc -c)
  local num_html=$(gfind "$dir/public/tor" -mindepth 1 -type f -name "*.html" -printf x | wc -c)
  local num_files=$(gfind "$dir/public/tor" -mindepth 1 -type f -name "*.*" -printf x | wc -c)
  sed -i '' -e "s/NUMIMAGES/$num_images/g" -e "s/NUMHTMLS/$num_html/g" -e "s/NUMFILES/$num_files/g" "$dir/public/local/statistics/index.html" "$dir/public/local/ca/estadístiques/index.html"
}
Archiving local site
Start by removing the older archive(s), use tar to create the new archive, excluding some extensions (for space-saving reasons). When the processing is complete, remove the public/local directory since we’re done with it.
function archive_local () {
  echo -e "[ Archiving local site ]"
  rm -rf "$dir"/archive/*.*
  tar --exclude='*.pdf' --exclude='*.zip' --exclude='*.tar.gz' -czf "$dir/archive/sizeofcat-website-archive-$today.tar.gz" -C "$dir/public/local" .
  rm -rf "$dir/public/local"
}
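Before it gets deployed, a quick way to make sure the archive actually has the site at its root:
# should list index.html and the top-level directories
tar -tzf "$dir/archive/sizeofcat-website-archive-$today.tar.gz" | head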
Copying site archive
The only thing remaining is to copy the archive into the correct directories (for both Tor and WWW) so rsync can do its magic.
function deploy_archive_local () {
  echo -e "[ Copying site archive ]"
  mkdir -p "$dir/public/www/project/website-archive/files/"
  cp "$dir/archive/sizeofcat-website-archive-$today.tar.gz" "$dir/public/www/project/website-archive/files/sizeofcat-website-archive-$today.tar.gz"
  mkdir -p "$dir/public/tor/project/website-archive/files/"
  cp "$dir/archive/sizeofcat-website-archive-$today.tar.gz" "$dir/public/tor/project/website-archive/files/sizeofcat-website-archive-$today.tar.gz"
}
Summary
The correct order of the functions in the bash script file is this:
if [ "$action" = "archive" ]
then
  begin_clean
  build_local
  build_www
  build_tor
  stats_local
  archive_local
  deploy_archive_local
  stats_full
  do_minify
  deploy_www
  deploy_tor
  end_clean
fi
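Since $action drives everything, a fallback for unknown parameters doesn’t hurt; a minimal sketch:
if [ "$action" != "full" ] && [ "$action" != "archive" ]
then
  echo "Usage: $0 {full|archive}"
  exit 1
fi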
Changes
One can easily modify any of the workflows to add additional deployments, for example to I2P, IPFS or git. You only need a build function and a deploy function.
In the building stage you just instruct Hugo to build the website into a new public/{ipfs,i2p,etc} directory based on a custom configuration file inside config/config-{ipfs,i2p,etc}.toml.
function build_ipfs () {
  echo -e "[ Building content for IPFS ]"
  hugo --quiet -s "$dir" --config "$dir/config/config.toml,$dir/config/config-ipfs.toml"
}
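A config-ipfs.toml isn’t shown here; following the config-local.toml pattern it could be as simple as this (relative URLs, since an IPFS gateway path isn’t a stable base URL):
relativeURLs = true
publishDir = 'public/ipfs'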
An IPFS deploy could look like this:
function deploy_ipfs () {
  echo -e "[ Deploying files to IPFS ]"
  # add the directory recursively; -q prints only hashes, the last one being the root
  local hash=$(ipfs add -r -q "$dir/public/ipfs/" | tail -n 1)
  local pin=$(ipfs pin add "$hash")
  local ipns=$(ipfs name publish "$hash")
  echo -e "[ URL: https://ipfs.io/ipfs/$hash ]"
  echo -e "[ IPNS: $ipns ]"
}
A git deploy would look similar to this:
function deploy_git () {
  echo -e "[ Deploying files to git ]"
  cd "$dir/public/www" || return
  git add -A
  git commit -S -m "Site rebuilt."
  git push origin master
  cd "$dir" || return
}
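Keep in mind that deploy_git runs against $dir/public/www, so in any workflow it has to be called before end_clean removes the public directory.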
As you can see, the whole script is quite extensible.
I won’t provide a full bash script to download; I’ll leave that part as an exercise for the reader.