WWW - Website Writing Workflow

May 30, 2023    Article    1732 words    9 mins read

Over the years I’ve tried a bunch of writing apps, and in this article I’m going to talk a bit about my writing workflow. To begin with, I write in Markdown and have been doing so for a few years now. Using a plain text format makes it easy to use pretty much any application, whether on macOS, Linux or Windows, and I prefer Markdown to writing HTML by hand, although I can drop down to HTML if I need a feature Markdown doesn’t implement.

My preferred Markdown editor is iA Writer for longer, distraction-free writing, and if I’m writing short notes (like the so.cl section, which is basically micro-blogging) I will probably use Sublime Text, since it’s always open on my computer(s) anyway.

But writing text into a text editor is the easy part. If you’re curious about what happens between pressing Save in the editor and an article appearing on this website, keep reading.

TLDR

I’m using a bash script with multiple functions (and a couple of external config files) to build and deploy this website using Hugo. Also, if you’re more interested in the hardware side, make sure you read the uses page.

I have two main workflows: when I’m done writing an article I run the Normal Workflow, and if I need to regenerate the website archive I run the Archive Workflow. The Normal Workflow builds the website and deploys it on Tor and Clearnet (with a bunch of other stuff), while the Archive Workflow does the same thing plus creates a gzipped tar archive of the website (after doing a custom Hugo build for it). Each build must be done separately, into different directories, because Tor requires a specific base URL (the Tor onion address), the Clearnet website requires another base URL (sizeof.cat plus the http(s) prefix), while the archiving part depends on relative URLs.

Basically it’s just a matter of executing ./site.sh full or ./site.sh archive.

Also, keep in mind that there are certainly better ways to do what I’m doing here, but this is the way I like it. I have a few more functions inside the bash script (like encrypting articles, pulling JSON data from my terrarium via an Arduino Mega, etc.) but the ones listed below are just the basic ones.

Let’s begin!

We’ll need to set action to the first parameter passed to the bash script, set dir to the directory where the Hugo website resides, and set today to the current date in ISO-8601 format (e.g. 2023-08-18).

action=$1
dir="$HOME/Projects/website"
today=$(gdate --iso-8601)
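
Everything below depends on $action, so it’s worth bailing out early when the script gets called without a parameter; a minimal sketch:

if [ -z "$action" ]; then
	echo "Usage: ./site.sh {full|archive}"
	exit 1
fi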

If you wonder why the g prefix, it’s simple: macOS ships BSD versions of the core utilities, and they lack many of the GNU flags used below. So do yourself a favor, install Homebrew, run brew install coreutils findutils and you will get the GNU tools prefixed with the letter g (gdate, gstat and gnumfmt from coreutils, gfind from findutils).
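
If you ever want the same script to run unchanged on Linux (where the GNU tools carry no prefix), a small shim can pick the right names; a hypothetical sketch, with the rest of the script then using $DATE, $FIND and $STAT instead of the hardcoded names:

# assumes the g-prefixed Homebrew tools on macOS and plain GNU tools on Linux
if command -v gdate >/dev/null 2>&1; then
	DATE=gdate; FIND=gfind; STAT=gstat
else
	DATE=date; FIND=find; STAT=stat
fi
today=$($DATE --iso-8601)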


Normal workflow

This workflow is for quickly posting something to the website.

$ ./site.sh full
[ Cleaning up ]
[ Building content for WWW ]
[ Building content for Tor ]
[ Generating statistics ]
[ Minifying files ]
[ Deploying files to WWW ]
[ Deploying files to Tor ]
[ DONE ]

Cleaning up

This stage removes the public folder in case it still exists, creates the required directory structure and removes the potential .DS_Store files created by macOS (yuck).

function begin_clean () {
	echo -e "[ Cleaning up ]"
	rm -rf "$dir/public"
	# the braces must stay outside the quotes, otherwise bash won't expand them
	mkdir -p "$dir/public/"{www,tor,local}
	find "$dir" -name ".DS_Store" -type f -delete
}

Building content for WWW

This is the function that uses Hugo to build the WWW website (the Clearnet one). The configuration is taken from the config.toml file inside the config directory.

function build_www () {
	echo -e "[ Building content for WWW ]"
	hugo --quiet -s "$dir" --config "$dir/config/config.toml"
}

Building content for Tor

The part that builds the Tor mirror website using Hugo. The configuration is taken from the config.toml file inside the config directory and is then overridden by the config-tor.toml file from the same directory (with comma-separated config files, Hugo lets the rightmost one win). This is done so I can override the baseURL (to the Tor onion address) and publishDir (to public/tor) config variables.

function build_tor () {
	echo -e "[ Building content for Tor ]"
	hugo --quiet -s "$dir" --config "$dir/config/config.toml,$dir/config/config-tor.toml"
}

The config-tor.toml file is very simple:

baseURL = "http://sizeofaex6zgovemvemn2g3jfmgujievmxxxbcgnbrnmgcjcjpiiprqd.onion/"
publishDir = 'public/tor'

Generating statistics

This is where bash scripting gets hairy. Basically I want to replace some placeholder strings inside the statistics and website archive pages, for both languages (English and Catalan).

function stats_full () {
	echo -e "[ Generating statistics ]"
	local num_images=$(gfind "$dir/public/www" -mindepth 1 -type f \( -iname \*.png -o -iname \*.jpg -o -iname \*.gif \) -printf x | wc -c)
	local num_html=$(gfind "$dir/public/www" -mindepth 1 -type f -name "*.html" -printf x | wc -c)
	local num_files=$(gfind "$dir/public/www" -mindepth 1 -type f -name "*.*" -printf x | wc -c)
	local archive_raw_date=$(echo "$dir"/archive/sizeofcat-website-archive-*.tar.gz | grep -Eo '[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}')
	local archive_date=$(gdate -d "$archive_raw_date" '+%B %d, %Y')
	local archive_sha=$(shasum -a 256 "$dir/archive/sizeofcat-website-archive-$archive_raw_date.tar.gz" | awk '{print $1}')
	local archive_size=$(gstat -c '%s' "$dir/archive/sizeofcat-website-archive-$archive_raw_date.tar.gz" | gnumfmt --to=si --suffix=B)

	# patch the statistics pages on both mirrors, in both languages
	sed -i '' -e "s/NUMIMAGES/$num_images/g" -e "s/NUMHTMLS/$num_html/g" -e "s/NUMFILES/$num_files/g" "$dir/public/tor/statistics/index.html" "$dir/public/www/statistics/index.html" "$dir/public/tor/ca/estadístiques/index.html" "$dir/public/www/ca/estadístiques/index.html"
	# the archive page exists on both mirrors, so patch both copies
	sed -i '' -e "s/ARCHIVESHA/$archive_sha/g" -e "s/ARCHIVEDATE/$archive_date/g" -e "s/ARCHIVERAWDATE/$archive_raw_date/g" -e "s/ARCHIVESIZE/$archive_size/g" "$dir/public/tor/project/website-archive/index.html" "$dir/public/www/project/website-archive/index.html"
}
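
For context, the placeholders are just literal tokens sitting in the page sources; a hypothetical snippet from the statistics page:

<p>This website consists of NUMFILES files, of which NUMHTMLS are HTML pages and NUMIMAGES are images.</p>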

Minifying files

The next stage uses Taco de Wolff’s minify tool to recursively minify (but not obfuscate) all HTML, XML, JavaScript and CSS files.

function do_minify () {
	echo -e "[ Minifying files ]"
	minify -o "$dir/public/" --recursive --match=\.*ml "$dir/public/"
	minify -o "$dir/public/" --recursive --match=\.*css "$dir/public/"
	minify -o "$dir/public/" --recursive --match=\.*js "$dir/public/"
}
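
The three passes could presumably be collapsed into a single call with one regular expression, something like this (untested sketch):

minify -o "$dir/public/" --recursive --match='\.(html|xml|css|js)$' "$dir/public/"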

Deploying files

Actually, there are two functions here, deploy_www and deploy_tor, but they are identical except for the source and remote directories. I’m using rsync to upload the files, deleting all files on the destination that don’t exist in the source and using a protect filter for the project/website-archive/files/ directory. Why, you might ask? Simple: when I call the script with the full parameter, the sizeofcat-website-archive-X.tar.gz file won’t exist in the source, and I don’t want it deleted on the destination (because of the rsync --delete flag).

function deploy_www () {
	echo -e "[ Deploying files to WWW ]"
	rsync --stats -amch --delete --delete-after --exclude=.DS_Store --filter='protect project/website-archive/files/*' "$dir/public/www/" user@host:/var/www
}
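
For completeness, deploy_tor is the same thing with the paths swapped (the remote path below is a placeholder):

function deploy_tor () {
	echo -e "[ Deploying files to Tor ]"
	# same rsync invocation; only the source and destination differ
	rsync --stats -amch --delete --delete-after --exclude=.DS_Store --filter='protect project/website-archive/files/*' "$dir/public/tor/" user@host:/var/tor
}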

Done

This one is really simple: the function just removes the public directory, since we’re done with it.

function end_clean () {
	rm -rf "$dir/public"
	echo -e "[ DONE ]"
}

Summary

The order of the functions is this:

if [ "$action" = "full" ]
then
	begin_clean
	build_www
	build_tor
	stats_full
	do_minify
	deploy_www
	deploy_tor
	end_clean
fi

Archiving workflow

I use the archiving workflow when I need/want to generate a new website archive. In addition to the normal workflow, the website needs to be generated with relative URLs (so it can be browsed offline), which Hugo will do in the public/local directory.

$ ./site.sh archive
[ Cleaning up ]
[ Building local content ]
[ Building content for WWW ]
[ Building content for Tor ]
[ Generating local statistics ]
[ Archiving local site ]
[ Copying site archive ]
[ Generating statistics ]
[ Minifying files ]
[ Deploying files to WWW ]
[ Deploying files to Tor ]
[ DONE ]

I’ll only show you the new functions in this workflow.

Building local content

Similar to the part where Hugo builds the Tor website: the configuration is taken from the config.toml file inside the config directory and is then overridden by the config-local.toml file from the same directory. This is done so I can override the relativeURLs (to true) and publishDir (to public/local) config variables.

function build_local () {
	echo -e "[ Building local content ]"
	hugo --quiet -s "$dir" --config "$dir/config/config.toml,$dir/config/config-local.toml"
}

The contents of the config-local.toml file are:

relativeURLs = true
publishDir = 'public/local'

Generating local statistics

Very similar to stats_full, this function generates statistics for the local files (the ones that will get archived) in both languages.

function stats_local () {
	echo -e "[ Generating local statistics ]"
	# count the files in public/local, i.e. the build that actually gets archived
	local num_images=$(gfind "$dir/public/local" -mindepth 1 -type f \( -iname \*.png -o -iname \*.jpg -o -iname \*.gif \) -printf x | wc -c)
	local num_html=$(gfind "$dir/public/local" -mindepth 1 -type f -name "*.html" -printf x | wc -c)
	local num_files=$(gfind "$dir/public/local" -mindepth 1 -type f -name "*.*" -printf x | wc -c)
	sed -i '' -e "s/NUMIMAGES/$num_images/g" -e "s/NUMHTMLS/$num_html/g" -e "s/NUMFILES/$num_files/g" "$dir/public/local/statistics/index.html" "$dir/public/local/ca/estadístiques/index.html"
}

Archiving local site

Start by removing the older archive(s), then use tar to create the new archive, excluding some extensions from it (for space-saving reasons). When the processing is complete, remove the public/local directory since we’re done with it.

function archive_local () {
	echo -e "[ Archiving local site ]"
	rm -rf "$dir"/archive/*.*
	tar --exclude='*.pdf' --exclude='*.zip' --exclude='*.tar.gz' -czf "$dir/archive/sizeofcat-website-archive-$today.tar.gz" -C "$dir/public/local" .
	rm -rf "$dir/public/local"
}
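
If you want to sanity-check the archive before shipping it, tar can list its contents without extracting anything:

tar -tzf "$dir/archive/sizeofcat-website-archive-$today.tar.gz" | head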

Copying site archive

The only thing remaining is to copy the archive into the correct directories (for both Tor and WWW) so rsync can do its magic.

function deploy_archive_local () {
	echo -e "[ Copying site archive ]"
	mkdir "$dir/public/www/project/website-archive/files/"
	cp "$dir/archive/sizeofcat-website-archive-$today.tar.gz" "$dir/public/www/project/website-archive/files/sizeofcat-website-archive-$today.tar.gz"
	mkdir "$dir/public/tor/project/website-archive/files/"
	cp "$dir/archive/sizeofcat-website-archive-$today.tar.gz" "$dir/public/tor/project/website-archive/files/sizeofcat-website-archive-$today.tar.gz"
}

Summary

The correct order of the functions in the bash script file is this:

if [ "$action" = "archive" ]
then
	begin_clean
	build_local
	build_www
	build_tor
	stats_local
	archive_local
	deploy_archive_local
	stats_full
	do_minify
	deploy_www
	deploy_tor
	end_clean
fi
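
As an aside, the two if blocks (and the argument check) could be folded into a single case dispatch; functionally equivalent, just tidier as more workflows get added:

case "$action" in
	full)
		begin_clean; build_www; build_tor; stats_full
		do_minify; deploy_www; deploy_tor; end_clean
		;;
	archive)
		begin_clean; build_local; build_www; build_tor
		stats_local; archive_local; deploy_archive_local
		stats_full; do_minify; deploy_www; deploy_tor; end_clean
		;;
	*)
		echo "Usage: ./site.sh {full|archive}"
		exit 1
		;;
esac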

Changes

One can easily modify any of the workflows to add additional deployment targets, for example I2P, IPFS or git. You only need a build function and a deploy function.

In the building stage you just instruct Hugo to build the website into a new public/{ipfs,i2p,etc} directory based on a custom configuration file inside config/config-{ipfs,i2p,etc}.toml.

function build_ipfs () {
	echo -e "[ Building content for IPFS ]"
	hugo --quiet -s "$dir" --config "$dir/config/config.toml,$dir/config/config-ipfs.toml"
}
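
A config-ipfs.toml is hypothetical here, but it would presumably mirror config-local.toml, since IPFS gateways serve content from a path and relative URLs are the safe choice:

relativeURLs = true
publishDir = 'public/ipfs'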

An IPFS deploy could look like this:

function deploy_ipfs () {
	echo -e "[ Deploying files to IPFS ]"
	local hash=$(ipfs add -r -q "$dir/public/ipfs/" | tail -n 1)
	local pin=$(ipfs pin add "$hash")
	local ipns=$(ipfs name publish "$hash")
	echo -e "[ URL: https://ipfs.io/ipfs/$hash ]"
	echo -e "[ IPNS: $ipns ]"
}
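
Note that ipfs add already pins the content it adds by default, so the explicit ipfs pin add is mostly a belt-and-braces step.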

A git deploy would look similar to this:

function deploy_git () {
	echo -e "[ Deploying files to git ]"
	cd "$dir/public/www"
	git add -A
	git commit -S -m "Site rebuilt."
	git push origin master
	cd "$dir"
}
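
One caveat with the git variant: begin_clean wipes public/ on every run, so public/www won’t be a persistent git checkout; in practice you’d build into (or rsync to) a separate, persistent clone before committing.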

As you can see, the whole script is quite extensible.

I won’t provide a full bash script to download; I’ll leave that part as an exercise for the reader.