published: Thursday 5 September 2019
modified: Thursday 5 September 2019
author: Hales
markup: html

Minisleep

Minisleep - a tiny wiki engine done right.



Minisleep is genuinely small (<1000 LOC), has an (optional) graphical page editor with drag-and-drop image support, statically compiles pages to HTML files, needs almost zero dependencies, is simple to move from server to server and is secure against external attack.

Minisleep is designed to avoid many of the pitfalls encountered by the author when setting up and using other wiki engines:
  • Lists of perl/php/python dependencies.  A wiki should not need hundreds of MiB of code to display a few pages.  A few KiB is enough.
  • Complex default page templates.  Templates should be easy to change: simple HTML and CSS with no knowledge of the dark arts required.
  • High time & effort demands for maintenance, especially for fixing breakages after updates.  A good wiki should help you save time, not take your time away.  There is a reason you are not using Sharepoint.
  • Complex security models.  For better or for worse: many people do not know how to (or do not have the time to) implement and maintain full and proper security on their wikis, so many end up being spammed.  The simpler the security model is, the easier it is to prevent this.

Intro

Minisleep is written in POSIX shell.  This is the lowest common denominator across most webhosts, so it should "just work" out of the box for most users.

If you want to try Minisleep then see the 5 minute quickstart section.  This software is designed to be run from wherever it is extracted; no install process is necessary.

Minisleep can be used for almost anything:
  • Project documentation
  • Community wikis
  • Corporate front sites (intended to only be edited by select staff)
  • Personal websites & blogs
However, Minisleep does not have:
  • A user signup form to create accounts.  Accounts must instead be made by the administrator.
  • Page namespaces.  Folders may or may not do what you want instead.
  • Conflict-resolution (for when two people edit a page at the same time).  Last user wins.
  • Table creation support in the WYSIWYG editor.  In the works!
  • Any form of warranty (see the GPL).  This is free software from the internet, use at your own risk.

Downloads

2019-08-18: Version 1.10 (latest)
  • Changed the codebase from bash to POSIX sh.  No longer requires bash.
  • Renamed config (to config.ini) and removed bash-specific features.
  • Removed the (unmentioned) hard dependency on flock.  Now it's used only if available.
  • Fixed the dependencies list.  It was missing some coreutils, which may be important for people on non glibc-linux platforms.
  • Heavily reworked documentation.
  • Fixed WYSIWYG editor form losing focus when editor buttons are clicked.
  • Added HTTPAUTH_MANDATORY option to the config, to provide a layer of protection if/when webserver configurations are accidentally changed.  Otherwise such accidental changes could allow anyone to edit your Minisleep website.
  • Moved the bad-cgi-implementation CONTENT_LENGTH workaround into the main minisleep script.  (used to be in minisleep_lessgreey.cgi).
  • Lots of other small fixes & changes.
2019-05-05: Version 1.01
  • Fixed several typos in the documentation
  • Worked around variables unintentionally being substituted in the documentation pages.
2019-05-05: Version 1.00

Dependencies

  1. Linux (likely works on other *nixes, ask for help if you have problems)
  2. A POSIX-y shell (tested: dash, bash).
  3. GNU coreutils or equivalents (head, tr, cat, cut, sed, realpath, touch).   
  4. A HTTP webserver that supports CGI (lighttpd, apache, hiawatha, Yaws, etc).
Optionally: flock (from util-linux) is used if it is available.  This prevents race conditions that can cause page corruption if multiple people submit page edits at the same time.  This is unlikely to be an issue for sites with few editors, and if you use the web editor then page revisions are automatically backed up to a ds_revisions folder regardless.
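
The "use flock only if it is available" idea can be sketched in a few lines of POSIX shell.  This is a hypothetical illustration rather than Minisleep's actual code: the helper name with_lock and the lock file path are made up for the example.

```sh
#!/bin/sh
# Hypothetical sketch: run a command under an exclusive lock when flock
# exists, otherwise run it unlocked.  Not Minisleep's actual code.

with_lock()
{
    lockfile=$1
    shift
    if command -v flock >/dev/null 2>&1
    then
        # flock opens (creating if needed) the lock file, waits for an
        # exclusive lock, runs the command, then releases the lock.
        flock "$lockfile" "$@"
    else
        # No flock installed: fall back to running without mutual exclusion.
        "$@"
    fi
}

# Usage (illustrative paths):
# with_lock /tmp/page.lock sh -c 'printf "hello\n" > page.txt'
```

Two edits that go through the same lock file are then serialised on hosts that have flock, and still work (without protection) on hosts that do not.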

GPLv3 or later, copyright William Hales 2019.  Contact minisleep AT halestrom DOT net.

Features in detail

Several options to write pages
  • Online interface: HTML WYSIWYG with drag-and-drop image support (no separate uploading necessary).  Recommended for new users.
  • Supports any markup to HTML converter of your choice:  markdown, textile, mediawiki, bbcode, reStructuredText, uuencoded RTF, etc.  Adding your own requires adding one line of code to buildpage.sh.
  • Pages can be edited locally (with your favourite editor) and synced to your site using tools like git, rsync or unison. No additional complexity or config required.
  • Optional feature (disabled by default): a page can be an executable script, where any text printed becomes the page.  Useful for generating site indexes, such as front pages for blogs.
Lightweight: Almost zero dependencies
  • Common commandline utilities and a HTTP webserver that supports CGI (eg apache, lighttpd, hiawatha, yaws)
  • Ridiculously easy to set up and to move from host to host.
  • No daemon, no database.
  • Less than 1000 lines of code: designed to be understandable and fixable by non-experts.
Statically compiles pages: fast & resilient to failure
  • All scripts can be disabled or fail and the site will still stay online.  You can repair problems in your own time.
  • Likely faster page delivery than any traditional dynamic wiki software (Mediawiki, Doku, etc).
Dumb and simple security
  • HTTP authentication, handled by your webserver instead of this wiki's scripts.
  • Documentation & examples are provided for several popular webservers.
  • No notable attack surface if you keep your login credentials safe.
No database: pages stored as files and folders
  • Dead simple to administrate and back up.
  • Very easy to migrate page content to and from other wikis and systems, so you don't feel trapped.
Easy to theme
  • Comes with one very short CSS file, rather than one filled with dark magic.
  • One short script acts as the page template, intended to be edited & kept by users across updates.

Security model

Or "how can something using shell be secure?".

Minisleep is divided into two parts:
  1. Normal pages: what most users see.  Static .html files in folders making up the public side of the wiki.
  2. Editor script: one special URL used to edit pages, secured with HTTP authentication.
"HTTP authentication" is an HTTP feature that makes your web browser bring up a dialog asking for a username and password.

These HTTP auth challenges are sent and handled by your HTTP webserver, not Minisleep itself.

This means Minisleep has two notable attack routes:

(1) Unauthorised users/attackers
  • Have full access to the public side of the website (static .html files).
  • Are challenged to provide a username+password (via HTTP auth) if they try to access the editor script.
  • The HTTP server will not forward any attacker requests to Minisleep's scripts until they succeed at this auth.
(2) Authorised users/attackers (ie those with valid usernames + passwords)
  • Can edit pages and insert malicious javascript/images/trackers/ads/etc. This is a problem faced by most wiki engines.
  • Can exploit bugs in the editor script to run arbitrary code on the server.
Several defences have been put in place to mitigate this last problem; however, Minisleep is not guaranteed to be 100% bug free.  Your primary line of defence is to only give accounts to trusted people.

5-minute quickstart

Minisleep comes with working example server configs that you can run in-place.

Option 1: lighttpd

Lighttpd's configuration files tend to be simpler than those of Apache.

1. Install lighttpd on your computer. Eg:
sudo xbps-install lighttpd     # Void
sudo apt-get install lighttpd # Devuan, Debian, Ubuntu, Mint, etc
sudo yum install lighttpd # Fedora
2. Enter the folder 'minisleep/docs/lighttpd'

3. Try to run lighttpd with the provided config:
$ lighttpd -f lighttpd.conf -D
You may need to provide the full path of lighttpd, depending on your distro:
$ /usr/sbin/lighttpd -f lighttpd.conf -D
4. Point your web browser to http://localhost:8080/minisleep/

If you want to edit any pages: the username is 'david' and the password is 'magic'.

If you plan to use Lighttpd yourself then pay attention to:
  • Enabling server.follow-symlink
  • Inserting mod_auth and mod_cgi in the right order to avoid module-loading problems.
  • Configuring page expiry, so that browsers don't keep old copies of pages cached.

Option 2: Yaws

"Yet Another Webserver" also has a nice config file format.

1. Install yaws

2. Enter the folder 'minisleep/docs/yaws'

3. Run yaws with the provided config:
$ yaws --conf yaws.conf
4. Point your web browser to http://localhost:8080/minisleep/

If you want to edit any pages: the username is 'david' and the password is 'magic'.

If you intend to use Yaws yourself then note:
  • Yaws aggressively caches pages by default.  You may have to wait up to 30 seconds before refreshing will show changed page contents.

Option 3: Hiawatha

Hiawatha is another easy to configure webserver with some really nice features and a long history.  It's not in the Debian repos but many other distros package it. 

Unfortunately Hiawatha's future is unclear.  As of 2019 the lead author has locked the forums and wants to scale down the project.

1. Install hiawatha

2. Enter the folder 'minisleep/docs/hiawatha/'

3. Get your current path using the 'pwd' command:
$ pwd
/home/valentine/library/code/minisleep/docs/hiawatha
4. Edit hiawatha.conf to reflect this path:
set START_POINT=/home/valentine/library/code/minisleep/docs/hiawatha
5. Run hiawatha with the provided config:
$ hiawatha -c . -d
6. Point your web browser to http://localhost:8080/minisleep/

If you want to edit any pages: the username is 'david' and the password is 'magic'.

If you want to use Hiawatha yourself then note:
  • MaxRequestSize (for uploading page edits with lots of big images)
  • Enabling FollowSymlinks

Full installation procedure

(1) Obtain a HTTP webserver that supports CGI.   If you are on a shared host then one will probably have already been setup for you, otherwise I recommend you install lighttpd or apache (two of the most popular options). 

Further down this document is the section "Tip: Testing CGI" that will make your life easier.

(2) Choose two URLs for minisleep to use.  One URL for all of the normal static pages to be under and one special URL for the editor's CGI script.  Valid choices include:
http://example.com/minisleep/
http://example.com/minisleep.cgi

http://example.com/
http://example.com/cgi-bin/editor.cgi

http://example.com/bobs_barbarians/
http://example.com/cgi-bin/bruce.cgi

...etc...
Note: Many webservers only allow you to enable HTTP auth for folders, not files.  This means you may have to put the CGI file into its own special folder (eg cgi-bin/).
(3) Download and extract your copy of minisleep somewhere safe.  Do not extract it into anywhere that your HTTP server will serve (as you would with many php websites).  Instead keep it somewhere such as your home directory or /var/newfolder where other people cannot get access to it.

(4) Edit your minisleep 'config'.  Add your chosen URLs:
export URLPUBLIC='/bobs_barbarians'
export URLCGI='/cgi-bin/bobs_barbarians.cgi'
You may also want to disable HTTPAUTH_MANDATORY, just for initial testing.  Don't leave this off for anything that is web-facing.

(5) Update minisleep's pages to reflect the changes made to this config file, otherwise the links on the pages will be broken:
scripts/rebuild_all_pages.sh
(6) Add two symbolic links between your minisleep setup and your web server's WWW directory to reflect your chosen URLs.  Examples include:
# On a shared host
ln -s ~/minisleep/public ~/public_html/bobs_barbarians
ln -s ~/minisleep/scripts/minisleep.cgi ~/public_html/cgi-bin/bobs_barbarians.cgi

# On my own server
ln -s ~/minisleep/public /var/www/html/bobs_barbarians
ln -s ~/minisleep/scripts/minisleep.cgi /var/www/html/cgi-bin/bobs_barbarians.cgi
(7) Configure your webserver to allow following symlinks.  Some disable this by default.

At this point your install of minisleep should be working.  Try it out in your browser.

(8) Enable HTTP auth for the editor URL (so that people need a username+password to edit pages). 

For apache and many shared hosts: you can enable this feature using a .htaccess and a .htpasswd file.  See the documentation of your webserver/host for more details.

Example (working) configurations for several webservers are included in the docs/ directory.

(9) Set up TLS (https) so that you can access and edit your website securely.  If you do not do this then it is possible for attackers to sniff and steal login credentials whenever you use them, especially if you are on an untrusted network (eg open wifi).

Let's Encrypt is a popular free service for obtaining HTTPS certificates, and many shared hosting providers automatically set you up with a free certificate anyway.
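
For step (8) on Apache, the .htaccess file can be as short as the one written by the sketch below.  Treat it as an assumption-laden example: it presumes your host allows authentication overrides in .htaccess files, and the helper name write_htaccess, the realm text "Minisleep editor" and all paths are illustrative.  The working configurations in docs/ remain the better reference.

```sh
#!/bin/sh
# Sketch only: write an Apache .htaccess that demands a login for one CGI
# file.  Assumes the host permits auth directives in .htaccess files.

write_htaccess()
{
    cgi_dir=$1      # folder holding the editor CGI, eg ~/public_html/cgi-bin
    cgi_name=$2     # file name of the editor CGI, eg bobs_barbarians.cgi
    passwd_file=$3  # absolute path to the file made with htpasswd

    # The <Files> block limits the login requirement to the editor script.
    cat > "$cgi_dir/.htaccess" <<EOF
<Files "$cgi_name">
AuthType Basic
AuthName "Minisleep editor"
AuthUserFile $passwd_file
Require valid-user
</Files>
EOF
}

# Usage (illustrative paths):
# write_htaccess ~/public_html/cgi-bin bobs_barbarians.cgi /home/bob/myauthfile.htpasswd
```

Keep the password file itself outside of any folder that the webserver serves.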

Managing HTTP authentication credentials (users)

The most common way of managing HTTP auth credentials is to use the 'htpasswd' utility.  This tool "should" come with your HTTP server, but some distros only bundle a copy with apache.  On Debian based distros it's separated into the apache2-utils package.

Use it like so:
$ htpasswd -c myauthfile.htpasswd bobuser   # First time usage requires '-c' to create the file
$ htpasswd myauthfile.htpasswd another
$ htpasswd myauthfile.htpasswd thirduser
Htpasswd supports some better hash types than its default of apr1 (a variant of MD5), but make sure your webserver actually supports them before you try to use them.  I have found many webservers simply ignore what they don't understand.

Hiawatha comes with its own version of htpasswd called wigwam.

If you have trouble getting a copy of htpasswd then a shell-script imitation is provided in the docs/ directory.  It requires an openssl variant to be installed (generally true for any Linux server these days).

If all else fails: many HTTP servers also support 'plaintext' passwd files like this:
bob:bobs password in the clear
mary:turduckinator 3000
admin:password
If you are on a shared host then this may be unwise, as there's a higher chance of someone finding a way to read your files and find your passwords.  Generally speaking: avoid using plaintext passwords.
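
To show roughly how an openssl-based stand-in for htpasswd can work, here is a minimal sketch.  It is not the script shipped in docs/; the function name make_htpasswd_line is made up, and it assumes your openssl build provides the 'passwd' subcommand with the -apr1 and -stdin options.

```sh
#!/bin/sh
# Sketch only, not the script shipped in docs/.
# Prints one "user:hash" line using the apr1 scheme (htpasswd's default).

make_htpasswd_line()
{
    user=$1
    pass=$2
    # -stdin keeps the password off openssl's command line.
    hash=$(printf '%s\n' "$pass" | openssl passwd -apr1 -stdin)
    printf '%s:%s\n' "$user" "$hash"
}

# Usage: append a user to an existing auth file
# make_htpasswd_line bobuser 'bobs password' >> myauthfile.htpasswd
```

The salt is random, so running it twice with the same password gives two different (but equally valid) lines.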

Things you do not need to do

1. Tell minisleep where it's installed.

The symlinks are enough.  Minisleep works out the rest. 

I wish more website engines did this!  Many instead require you to hardcode their locations into multiple files.

2. Setup an SQL database.

Minisleep keeps pages as files and folders.  There is little to Minisleep that isn't hierarchical, so a relational database is not really beneficial.
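
On the first point: a script that is reached through a symlink can find its real home with realpath (already in the dependencies list).  The sketch below is a hypothetical illustration of that idea, not a copy of Minisleep's code, and script_home is a made-up name.

```sh
#!/bin/sh
# Hypothetical illustration of locating a script through a symlink.

script_home()
{
    # realpath resolves every symlink in the path, so a link placed in the
    # web root still leads back to the folder holding the real script.
    dirname "$(realpath "$1")"
}

# Inside a CGI script you would call:
# script_home "$0"
```

With the symlinks from the installation procedure in place, calling this on the link in your WWW directory would print the scripts/ folder of the extracted copy.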

Tip: Testing CGI

It's worth testing CGI with a simple script before trying to get Minisleep working.

Create a text file with the following contents:
#!/bin/sh
printf 'status: 200\n'
printf 'content-type: text/html\n'
printf '\n'
echo '<b> Moo said the cow </b>'
echo '<p> If you can read this then CGI is working. </p>'
Depending on your host & setup you will need to work out where to save it and under what name.  Examples include:
~/public_html/cgi_bin/mytest.cgi  # Common path on shared hosts
/var/www/html/mytest.cgi  # Your own HTTP server.
If applicable: enable CGI for the relevant URL in your HTTP server's configuration.  Examples for some webservers are in the docs/ directory, otherwise see your particular server's official documentation.

Make the script executable:
$ chmod a+x /var/www/html/mytest.cgi
Now browse to the relevant URL in your web browser.  If everything is working then you will see "Moo said the cow" in bold, followed by the line "If you can read this then CGI is working."

If instead you see the source code of your script, are prompted to download the script, or get an error, then CGI is not yet set up correctly.
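
If the browser shows an error, it can help to rule out the script itself by running it by hand before blaming the webserver.  A rough sketch (check_cgi is a made-up helper, and it only looks for a content-type line rather than validating the whole response):

```sh
#!/bin/sh
# Sketch: run a CGI script by hand and check that it prints a
# content-type header line.

check_cgi()
{
    script=$1
    # Pretend to be a webserver delivering an empty GET request.
    output=$(REQUEST_METHOD=GET QUERY_STRING='' "$script" < /dev/null) || return 1
    printf '%s\n' "$output" | grep -qi '^content-type:'
}

# Usage:
# check_cgi /var/www/html/mytest.cgi && echo 'script output looks sane'
```

If this succeeds but the browser still misbehaves, the problem is more likely in the webserver's CGI configuration or the file's location than in the script.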

Recommendation: Use https

(If you are running minisleep on your home LAN or in another controlled network then you can safely ignore this section).

Every time you login to a website you will want to make sure your connection is encrypted and secured.  If it's not then people can steal your username+password and do all sorts of naughty things to your website.  This isn't a problem unique to Minisleep, which is why the vast majority of websites now support HTTPS.

Every single HTTP server and environment has a different way of setting up TLS/SSL/HTTPS.  You also need to create or get a valid certificate -- as of the time of writing Let's Encrypt is a very popular free service.

If you are on a shared host then they may be able to do this for you (and some do it automatically for free without asking).

Modifying buildpage.sh

Minisleep is split into two main scripts:
  • scripts/buildpage.sh
  • scripts/minisleep.cgi
The buildpage.sh file is intended for users to edit and keep their edits across updates of the wiki.  The minisleep.cgi file on the other hand is intended to remain unedited so that it can be easily replaced with newer versions when updating.

buildpage.sh is not very long.  You can skip all of the initial setup code in it and go right to the page rendering bits.

Adding support for your favourite markup language converter

Anything that can input a text file and output HTML will work.

In scripts/buildpage.sh:
# Any script/program/method that you want can be used to markup your pages into
# HTML. Minisleep comes with several examples included below, however they will
# probably need adjusting to meet your needs.
#
# Tips:
# - There are many different 'markdown' converters out there. If you use
# markdown then make sure to adjust the command line options below to match
# your variant.
# - 'pandoc' supports pretty much every format under the sun and is really
# convenient, but it's often in the form of a single >100MiB executable.
# - Stick to HTML if you're unsure, it requires no setup of external programs.
#
# 'script' is disabled by default, because it is suspected that many people do
# not like interfaces for executing arbitrary code on their server to exist.
case "$markup"
in
html) cp temp_pre temp_post ;;
html_bug) cp temp_pre temp_post ;;
markdown) markdown temp_pre > temp_post ;;
textile) pandoc -f textile -t html temp_pre > temp_post ;;
mediawiki) pandoc -f mediawiki -t html temp_pre > temp_post ;;
plaintext)
echo '<pre>' > temp_post
cat temp_pre | sed 's|<|\&lt;|g ; s|>|\&gt;|g ; s|'\''|\&apos\;|g ; s|"|\&quot\;|g' >> temp_post
echo '</pre>' >> temp_post
;;
#script)
# chmod u+x temp_pre
# . temp_pre > temp_post
# ;;
*) echo "Page generation error (buildpage.sh): unknown markup type '$markup'."
esac
Let's say we want to add support for reStructuredText using a program called 'rst2html' from the Debian package 'docutils-common'.  After installing this tool we can simply add the following line to the mix:
restructuredtext)  rst2html temp_pre > temp_post ;;
Done.  Try it out by specifying 'restructuredtext' as the markup for a page when editing it. 

Note: You may want to choose a more convenient-to-type name (like 'rst') instead.
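
One way to get the short name without losing the long one is to list both in the same case pattern.  The sketch below wraps a cut-down version of the case statement in a function so it can be tried on its own; render_markup is a made-up wrapper (buildpage.sh does this inline) and rst2html is assumed to be installed for the rst branch.

```sh
#!/bin/sh
# Sketch of the same case statement with a short alias added.

render_markup()
{
    markup=$1
    case "$markup"
    in
        html) cp temp_pre temp_post ;;
        # Two names for one converter, separated by '|'
        rst|restructuredtext) rst2html temp_pre > temp_post ;;
        *) echo "unknown markup type '$markup'" ; return 1 ;;
    esac
}

# Usage, in a folder containing temp_pre:
# render_markup rst
```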

Customise page appearance (aka templating) including top links

Minisleep does not use a separate template file; instead code and template are mixed together in buildpage.sh:

# ------------------------------------------------------------------------------
# -- Final Content Render
# ------------------------------------------------------------------------------

exec 1>index.html.temp
echo "
<!DOCTYPE html>
<html>
<head>
<meta http-equiv='Content-Type' content='text/html;charset=UTF-8' />
<title> $title </title>
<link rel='stylesheet' href='${URL_CSS}' type='text/css' />
<link rel='icon' href='${URL_FAVICON}' />
<meta name='expires' content='0' />
<meta http-equiv='pragma' content='no-cache' />
<meta name='viewport' content='width=device-width, initial-scale=1.0'/>
</head>
<body>

<header>
<div class='left'>
<a HREF='${URLPUBLIC}/'>Home</a> |
<a HREF='http://www.autofish.net/'>Somewhere Else</a> |
<a HREF='https://libraryofbabel.info/'>Deeper</a>
</div>
<div class='right'>
<a HREF='ds_revisions/'>Revisions</a> |
<a HREF='${URLCGI}?action=getcontrols&path=${PAGEPATH}'>Edit</a>
</div>
</header>

<main>
<h1> $title </h1>"

cat temp_post

echo "</main></body></html>"

# Shift the now completed page into production
# The 'mv' step is added for atomicity
mv index.html.temp index.html
rm temp_pre temp_post
rm ds_lockfile
That's it.  Nothing more.  Compare that to some default templates provided by other wikis :D
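
The 'mv' trick at the end is worth a closer look, because it is what keeps a half-written page from ever being served.  Here is the pattern on its own as a minimal sketch (publish_page is a made-up name; the real script writes index.html.temp via 'exec 1>' instead):

```sh
#!/bin/sh
# Sketch of the write-then-rename pattern used at the end of buildpage.sh.

publish_page()
{
    dest=$1
    # Build the page under a temporary name in the same folder...
    cat > "$dest.temp" || return 1
    # ...then rename it.  A rename within one filesystem is atomic, so a
    # reader gets either the old page or the new one, never half a page.
    mv "$dest.temp" "$dest"
}

# Usage:
# printf '<p>new content</p>\n' | publish_page index.html
```

The temporary file has to live on the same filesystem as the final page, otherwise mv falls back to copying and the guarantee is lost.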

You will notice that single quotes ( ' ) are used instead of double quotes ( " ) in the HTML.  This is 100% valid HTML and makes it easier to avoid quoting problems in the script; otherwise you have to write backslashes ( \" ) everywhere.

By default a CSS file is kept in public/misc/style.css:

/* -----------------------------------------------------------------------------
* Default HTML constructs
* ---------------------------------------------------------------------------*/
body { margin: 0; font-family: Sans, Sans-Serif; }
h1, h2, h3, h4 { clear: left; }
pre
{
white-space: pre-wrap;
margin-left: 2rem;
}
img
{
height: auto;
margin: 0;
max-width: 100%;
padding: 0;
}

/* -----------------------------------------------------------------------------
* Main page components
* ---------------------------------------------------------------------------*/
main { margin: 1rem; }

header
{
background-color: #8A0000;
color: #AAAAAA;
padding: 0.3rem 0.5rem;
margin: 0;
overflow: hidden;
}
header a { color: white; }
header a:visited { color: white; }


/* -----------------------------------------------------------------------------
* Misc
* ---------------------------------------------------------------------------*/
.left { float: left; }
.right { float: right; }

#content h1:first-of-type {
clear: none;
margin-top: 0;
}

This CSS is intentionally short and bereft of magic. You can either work from it or wipe it and start from scratch, nothing will break.  The editor interface provides its own CSS, so you don't have to worry about harming the editor.


Background and discussion

Why was Minisleep written?

Many years ago I wanted to start my own personal website and I thought a wiki backend would be perfect.  I had spent a lot of time writing for a game project's Mediawiki site and I had grown to love the wiki-style markup & features.

My adventures setting up my own wiki didn't go well.  After trying several I felt some usability problem themes emerging:
  • Second class and limited markup support: a prime example is not being able to have multiple lines of text in a table cell.
  • Complicated image upload procedures.  How many steps does it take to get an image online?  Per namespace or per page?  Many failed the test of "is it actually easier to use an SFTP client like filezilla?".
  • Extremely complex page templates.  IF ELSE spaghetti.
(NB I've only solved some of this in Minisleep)

In particular I found the complexity of wiki projects was their biggest drawback.  I was left feeling that "enterprise suitable" means "lots of space, time and effort required".

Some examples:
  • Mediawiki is a behemoth, with lots to do and go wrong during the setup process.  Great for big projects with spare hands, much more difficult to use for one person's personal site.
  • Tiddlywiki has some cool concepts, but it uses lots of javascript and gets very slow for larger sites.
  • ikiwiki looked absolutely perfect, but when I tried to set it up on my cheap shared host I had to fetch hundreds of megabytes of perl dependencies.  It took me a few attempts to get it right and it broke for me when the host updated.

Counter-depictions of Minisleep are encouraged, and if they're particularly nice I'll feature them here.


Generally speaking: I wanted something with similar features but less effort.  Life is too short to be spent dusting and oiling software.

Several years back I wrote my own backend called Darksleep that I use for my own personal site.  Originally I had to SSH in or remotely sync files to make changes, but eventually I added an online interface for page creation and editing, slowly morphing it into something more like a wiki.

Minisleep is a rewritten version of Darksleep with many fixes and changes based off what I have learned from operating and running Darksleep.  Notably the commenting and submission throttling features have been stripped, but I plan to add these back at some point as optional features.

Why is Minisleep written in shell instead of (eg) python, rust or go?

Shell scripts are one of the lowest common denominators across all webhosts.  In particular: bash is popular enough that you're almost guaranteed to have it already; and if not then it's easy to get without needing special tools or an ore-train of dependencies.

Compiling executables to run on some hosts can be a pain.  My current shared host bails any attempt at running (even static!) executables that I've compiled on other systems (such as Debian stable) and I'm not sure why.  The hosting provider only permits access to their toolchain temporarily and only upon request.

My experiences wrangling dependencies and runtime requirements for website backends drove me mad.  A good tool does a lot with a little, not the other way around.

Why CGI?

Context: CGI lets your scripts talk to a webserver, so site pages can be dynamic (made on the fly).  There are infinite ways of doing this these days, with CGI now considered by many to be old or slow.

Here's a copy of an extensive answer to this question I wrote up on Lobste.rs a while back:

A comment about CGI in general: it’s absolutely beautiful.

When I wrote my own site backend a few years back I had no knowledge about the world of interfacing webservers with code. I discovered that there were many, many methods and protocols that each webserver only seemed to support a subset of. And many people telling me that CGI was old and bad and that I shouldn’t use it.

I had a nightmare getting non-CGI things to work. I had no experience and background here, so not much of it made sense. I had presumed ‘FCGI’ was a “fixed” version of CGI, but didn’t succeed at getting it to work after following a few guides and trying a few different webservers. I gave up.

I decided I should do the opposite of modern advice. I tried CGI. And I was immediately hooked by its simplicity.

  • No dependencies or libraries
  • Works with any programming language that can print and read text
  • Supported by “most” webservers out there (cough nginx)

For those not in the know, a fully working CGI script is as simple as this:

#!/bin/sh
printf 'content-type: text/html\n\n'
printf '<strong>Greetings Traveller</strong>'
printf "<p>The date and time are $(date)</p>"

The webserver itself handles all the difficult bits of the HTTP headers. You just need to provide a content-type and then the page itself. Done.

If you want to provide more (cookies, etc) you can; it’s just one more ‘printf’ line and you’re done. No libraries, no functions, no complexity. You don’t even have to parse strange constructs. Just print.

If you want to look at URL strings (eg for GET) you just need to be able to access the environment variable QUERY_STRING. If you want to access body data (eg for POST) you just need to read input. Just as if someone was sitting there typing into stdin of your program.

It does get ugly for complex or multipart POSTs. That’s where a library or program can help. But you only need to attack that once you get there.

Compare this to every other method of talking to a webserver out there:

  • No dependencies or libraries.
  • Works with any programming language ever that can read text and print text.
  • Zero external config other than telling your webserver to enable CGI on your file

A related story of teaching

A few months back I was helping some students with their website project. They were new to web development and had been recommended to use flask, a python library that acts as a webserver and webserver interface all in one. They were having extreme difficulty wrapping their heads around many concepts. Notably:

  • Templates
  • Serving of static files (like stylesheets)
  • Cookie handling
  • Mapping of URLs to files, functions and directories.

Many of their problems stemmed from them not knowing how HTTP worked in the first place, so I was teaching them this. What made this process horrible was then also trying to find out how and then explain how flask abstracts these concepts into its own processes and functions. I could understand how to beginners like them it seemed completely opaque.

They thought pages were unreadable objects generated by the templating code, and that the templates themselves were sent to the user’s browser along with the page. They thought cookies were handled and stored by the webserver as well as the client. The way flask’s functions worked and the examples they followed suggested this to them.

If I’m ever in the situation again of helping new people learn web technology then I’m going to get or convert them to use CGI right off the bat. It’s easier to teach, it’s easier to understand, easier to get working on most webservers and isn’t locked in to any particular language or framework.

The only downside of CGI that I know about is the fact it starts a new process to handle each user request. Yes that’s a problem in big sites handling hundreds or thousands of visitors per second. But by the time a new student gets to running a big site they will have already encountered many, many other scalability issues in their code and backend/storage. Let alone teaching them database and security concepts. There’s a reason we have quotes like “premature optimisation is the root of all evil”.

I don’t think students new to webdev should be started on anything other than CGI. They can use any language they want. They can actually understand what they’re doing. And they’re not hitting any artificial barriers or limits set by frameworks or libraries.

Final notes

The whole idea that “CGI should be dead” makes little sense from my context and point of view. I run my own site, help maintain a few others and try to assist others in learning and coping with webdev.

I think the “CGI should be dead” idea makes sense only in the context of very high workload sites. Whilst these handle a large percentage of the web’s total traffic, the percentage of people actually running these sites is small. Different units: traffic of visitors vs people running sites. I think we confuse them.

It’s too easy to get caught up in “professional syndrome”, where you look up to the big players and trust in their opinions. But you also need to understand that their opinions are based on their current experiences, which are often a world away from what the rest of us should be worrying about.

If a captain of a battleship says that cannons are his biggest problem then you shouldn’t try to learn about and use cannons to build your first ship. You should then realise only a tiny fraction of ships need them, even the really big ones.

CGI Problem: Inconsistency (and silliness) with stdin

When a user accesses your CGI script they send a HTTP message using their browser.  This message is split into two parts: header and body.

In CGI you:
  • Access the header through environment variables (eg $QUERY_STRING)
  • Access the body by reading standard input (stdin).
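
As a minimal sketch of both halves in one place (handle_request is a made-up name, and 'head -c' is assumed to be available, as it is with GNU coreutils and busybox):

```sh
#!/bin/sh
# Sketch of a CGI handler reading both halves of a request.

handle_request()
{
    printf 'content-type: text/plain\n\n'
    # Header data arrives in environment variables
    printf 'query: %s\n' "${QUERY_STRING:-}"
    # Body data arrives on stdin; read only as many bytes as were promised
    if [ -n "${CONTENT_LENGTH:-}" ]
    then
        printf 'body: '
        head -c "$CONTENT_LENGTH"
        printf '\n'
    fi
}

# A real CGI file would end with a call to:
# handle_request
```

Reading exactly CONTENT_LENGTH bytes, no more and no less, is the polite thing to do.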
Unfortunately different HTTP servers seem to disagree on exactly how to treat over-reading and under-reading of stdin.  There are approximately three categories:

(1) Sensible & forgiving.  Examples: apache, lighttpd

If you read too little stdin: no one minds
If you read too much stdin: no one minds, the read fails.

(2) Pushy.  Examples: (some shared hosts?)

If you read too little stdin: the user is redirected to an error page.

This is annoying, but perhaps understandable.  If all of stdin has not been read then perhaps your script crashed early.  An easy workaround is to add something like "cat > /dev/null" to the end of your script.

(3) Confused.  Examples: hiawatha, yaws

If you read too much stdin: your read calls hang forever.

This does not make sense to me.  Why hang the read?  The server knows there is nothing more to provide, so making the script hang forever seems impolite.  Worst of all the webserver then kills the script for taking too long.  The only way out of this prison is to be a guard!
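The "cat > /dev/null" workaround suggested for category (2) could look something like the sketch below.  The handler is hypothetical and only exists to leave some of stdin unread.  Note the catch: cat reads until end-of-file, so on a category (3) server this same drain would be an over-read and would presumably hang.

```shell
# Drain whatever is left on stdin so a "pushy" server sees the body fully read.
# Caveat: cat reads until EOF, so on a server that never signals EOF this hangs.
drain_stdin() {
    cat > /dev/null
}

# Hypothetical handler that only needs the first line of the request body.
handle_first_line() {
    IFS= read -r first_line || true
    printf 'first: %s\n' "$first_line"
    # Everything after the first line is still unread at this point.
    drain_stdin
}
```

In a longer script the same drain could be registered with trap so that it also runs on an early exit.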

Minisleep's workaround

Minisleep spawns a second copy of itself, but with stdin sent over a controlled and well-behaved pipe.
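# Only requests that carry a body have CONTENT_LENGTH set
if [ ! -z "${CONTENT_LENGTH:-}" ]
then
    temp_conlen=$CONTENT_LENGTH
    # Unset it so the second copy skips this block instead of recursing
    unset CONTENT_LENGTH
    # Pass on exactly the promised number of bytes, then end-of-file
    head --bytes "$temp_conlen" | scripts/minisleep.cgi
    exit $?
fi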
From a technical point of view: this solution is a bit inefficient.  But hey, we're already using shell.

From a social standpoint: the problem itself is stupid.  It turns CGI, an otherwise great interface for learning, into something with dark traps.  Students and learners don't have a hope of finding out what is going wrong here unless they already understand read calls and unix pipes.
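The core of the workaround can be shown in isolation.  This is a sketch, not Minisleep's code: run_with_bounded_stdin is a made-up helper name, and it assumes head supports -c.  Because head exits after the requested number of bytes, the command on the right of the pipe always gets a clean end-of-file, however the real stdin behaves.

```shell
# run_with_bounded_stdin BYTES COMMAND [ARGS...]
# Runs COMMAND with a stdin that ends (EOF) after at most BYTES bytes,
# no matter how the real stdin behaves afterwards.
run_with_bounded_stdin() {
    limit=$1
    shift
    # The pipeline's exit status is that of COMMAND, the last stage.
    head -c "$limit" | "$@"
}
```

For example, "printf 'hello world' | run_with_bounded_stdin 5 cat" prints only "hello".  In Minisleep's version the command is the script itself, with CONTENT_LENGTH unset first so that the second copy does not wrap itself again.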

Sed vs bash's in-built pattern substitution

Minisleep depends on 'sed' to HTML encode pages for the in-built editor (Sidenote: there are some curious and unexpected gotchas around this like UTF-7).  I was curious to see if bash's pattern substitution (not a POSIX feature) could compete with sed, but on a whole page it turned out to be close to two orders of magnitude slower.

Example:

htmlencode()
{
    sed 's|&|&amp\;|g ; s|<|\&lt;|g ; s|>|\&gt;|g ; s|'\''|\&apos\;|g ; s|"|\&quot\;|g '
}

htmlencode2()
{
    a="$(cat)"
    a="${a//&/&amp;}"
    a="${a//</&lt;}"
    a="${a//>/&gt;}"
    a="${a//\'/&apos;}"
    a="${a//\"/&quot;}"
    printf "%s" "$a"
}

$ time htmlencode < public/misc/docs/ds_raw > foo1

real 0m0.024s
user 0m0.014s
sys 0m0.009s

real 0m0.024s
user 0m0.015s
sys 0m0.009s

$ time htmlencode2 < public/misc/docs/ds_raw > foo2

real 0m1.755s
user 0m1.715s
sys 0m0.030s

real 0m1.759s
user 0m1.709s
sys 0m0.040s

$ diff foo*
103c103
<
---
>
\ No newline at end of file

Roughly 70 times slower (about 0.024s vs 1.76s), taking it from almost-imperceptible to annoying :(

In practice the bash version can still be faster, but only if called many times over on small pieces of data, where the cost of starting a new sed process for every call dominates.  This isn't the use case here.
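A side note on the diff output above: the one difference it reports looks consistent with htmlencode2 losing the file's final newline.  Command substitution strips trailing newlines and printf "%s" does not put them back.  The sketch below shows the effect and one common workaround, using made-up function names (slurp_lossy, slurp_exact); it is plain POSIX shell and not part of Minisleep.

```shell
# Loses trailing newlines: $(...) strips them and printf '%s' adds none back.
slurp_lossy() {
    data="$(cat)"
    printf '%s' "$data"
}

# Keeps them: append a sentinel character so the newlines are no longer
# trailing, then remove the sentinel again with ${data%x}.
slurp_exact() {
    data="$(cat; printf x)"
    printf '%s' "${data%x}"
}
```

Feeding "a" followed by a newline through slurp_lossy gives back one byte, while slurp_exact gives back both.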

Why HTTP_AUTH?

It's damned simple compared to cookie-based auth methods and it prevents unauthorised requests ever making it to the site scripts.  The less surface area you have the less you have to worry about.  See the 'Security' section above.

Isn't CGI slow?

Context: many alternative communication interfaces have been made over the years that claim to be faster.  CGI suffers the problem of needing to start a process for every user request.
  1. No.  Webapp latency is a complex topic, server interfaces are only one part of the puzzle.
  2. CGI is only used in Minisleep when you edit pages, not when you view them.
Minisleep is the fastest site engine I have ever used.  This is not because of choice of external technologies, it's because of design.

Sidenote: slowdowns in Minisleep tend to be dominated by uncached disk reads.  Notably on shared hosts the first edit will take a while to load but after that things will be snappier.  Traditional daemon-style website engines work around this by permanently hogging resources.  There are ways of doing the same for Minisleep (if you are interested).

Bugs & Plans

WYSIWYG editor:
  • Forms get destroyed when the editor is enabled.
  • Dragging and dropping some types of files onto the editor (such as videos) embeds them in a broken manner.  
  • No table editing support (beyond copying+pasting them in and editing them from there).  This used to exist in Firefox and was really nice, but it is now gone.
  • No image resizing support.  This used to exist in Firefox and was really nice, but it is now gone.
  • Clicking on a formatting button removes focus from the typing area.
  • Needs key shortcuts.  Ctrl+B for bold, etc.
  • Needs more features (text colour, boxes, floats, tables, etc)
  • 'Unformat' button only removes some types of formatting (not <pre>, <ul>, etc).
  • Makes horribly ugly HTML code.  Optional htmltidy (or similar) support?
  • Editor is not at the same path as the pages themselves, so relative-pathed materials (images, video) do not show in the editor.
General:
  • Documentation: info on cache control & page expiry.
  • Feature: Page deletion (move deleted folders to a safe place).
  • Feature: User management (add, delete, suspend, etc)
  • Username detection will likely fail if you use a HTTP_AUTH type other than basic (eg digest)
  • Feature: Automatic table of contents generation.  ie a simple script pass that looks at <h2>, <h3> etc tags and prepends some extra HTML.
  • Feature: split inline images off into actual files (to make future page edits faster & easier)
  • More elegant solution to the read(stdin) hang problem of some HTTP servers (see minisleep/scripts/minisleep_lessgreedy.cgi).
  • Errant newlines are added to the end of a page every time it is edited (HTTP post quirk)
Debatable:
  • HTTP methods (POST, GET) are probably not used correctly.  PUT and some others might be worth considering.
Limitations (things that probably won't be changed or fixed):
  • Gradated access control (different user roles)
  • Namespaces - ie places where different users have different rights.  (If really needed: run another copy of minisleep?)
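
The automatic table of contents idea listed under General could start out as something like the sketch below.  It is only a guess at what such a pass might look like: make_toc and the toc class names are invented here, and it assumes every heading sits alone on one line as <h2>Title</h2> or <h3>Title</h3> with no attributes.  Real pages (especially WYSIWYG editor output) may not be that tidy.

```shell
# Hypothetical TOC pass: reads a page on stdin, writes a list of its headings.
# Only matches <h2> and <h3> headings that occupy a whole line by themselves.
make_toc() {
    printf '<ul class="toc">\n'
    sed -n 's|^[[:space:]]*<h\([23]\)>\(.*\)</h[23]>[[:space:]]*$|<li class="toc-h\1">\2</li>|p'
    printf '</ul>\n'
}
```

Prepending the result would then be a matter of writing the output of "make_toc < page" followed by the page itself into the compiled file.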

Contact

Please send all of your comments, suggestions, workarounds, stories, bugreports, code and complaints to:  minisleep AT halestrom DOT net.  If you read this far then please feel free to just say Hi.