modified: Sunday 18 August 2019
author: Hales
markup: textile
Meta: New Minisleep, site rebased
Minisleep 1.1 is out:
http://halestrom.net/darksleep/software/minisleep/
There’s now a 60 second video demonstrating some of the wiki’s features. See the changelog for more interesting notes. No easter eggs are in this documentation.
Porting from Bash to POSIX sh
I thought this would be easy :D
Testing isn’t as trivial as you might think, because bugs caused by a shell changeover are often behavioural rather than syntactic. Example: some features of minisleep used to read the contents of files into variables like so:
avariable="$(<afilename)"
This works fine on bash, and it even passes checkbashims fine, but it results in an empty variable when used on dash.
Instead you have to:
avariable="$(cat afilename)"
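A minimal sketch of the portable form wrapped in a helper. The name read_file and the demo file are made up for illustration and are not part of Minisleep. Note that command substitution strips trailing newlines, in both the bash-only and the portable spelling.

```shell
#!/bin/sh
# read_file FILE: print the contents of FILE, or fail if it is unreadable.
# Hypothetical helper name, not taken from Minisleep.
read_file() {
    [ -r "$1" ] || return 1
    cat < "$1"
}

# Demo on a throwaway file.
demo_file="${TMPDIR:-/tmp}/read_file_demo.$$"
printf 'hello\nworld\n' > "$demo_file"
avariable="$(read_file "$demo_file")"
rm -f "$demo_file"
printf '%s\n' "$avariable"
```

Feeding the file to cat via a redirect means a filename beginning with a dash cannot be mistaken for an option.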
‘cat’ tends to be an external binary, not a shell builtin. Sometimes it’s worth putting the effort into avoiding external binaries to make your code go faster, but Minisleep still takes less time to render pages and do its work than most site backends I have tried, so I don’t think it’s too much of an issue. Runtime seems to be dominated by disk IO, especially on my shared host.
Lack of substring operations
This drove me mad! A basic feature of a language should be selecting substrings, but POSIX sh has no way to select them by position (nothing like bash’s ${var:offset:length}). Instead you have to print data out and pipe it into external tools like cut, head or tail. Grrr.
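A rough sketch of the workarounds. POSIX sh does have prefix and suffix stripping by pattern, which covers some of these cases without leaving the shell; picking characters by position still means a trip through cut. The helper name substr and the example path are hypothetical.

```shell
#!/bin/sh
# substr STRING START LENGTH: print LENGTH characters of STRING, beginning
# at the 1-based position START. Hypothetical helper built on cut; assumes
# LENGTH is at least 1 and STRING is a single line.
substr() {
    printf '%s\n' "$1" | cut -c "$2-$(( $2 + $3 - 1 ))"
}

path='/var/www/pages/about.txt'
filename=${path##*/}     # strip the longest prefix ending in "/": about.txt
stem=${filename%.*}      # strip the shortest suffix starting with ".": about
first3=$(substr "$stem" 1 3)
printf '%s %s %s\n' "$filename" "$stem" "$first3"
```

Every call to substr forks a subshell and runs cut, so inside a tight loop the pattern-stripping forms are the cheaper option when they fit.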
Printing untrusted data is more complex than you think
Many people believe that you should not use echo to print untrusted data:
echo "$untrusted_data_from_a_user"
The data may contain options for echo (like ‘-n’ or ‘-e’). I’m not aware of any way that this can be exploited generically to execute commands or crash your scripts, but it introduces controllability for attackers that you don’t want to have to even think about.
Instead it’s generally recommended that people use printf like so:
printf '%s' "$untrusted_data_from_a_user"
It’s important not to provide the untrusted data in the first (format) string for similar reasons: %20s might seem harmless, but there are so many ways it could stuff things up (URLencoded strings anyone?).
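What I didn’t realise is that printf parses options too. GNU coreutils printf (the external version you probably have if you are on Linux) supports “--longoptions” like many other GNU tools, and bash’s builtin printf (which produced the output below) rejects a format string that starts with a dash. Take this command for example: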
$ printf '-- check logs --'
bash: printf: --: invalid option
printf: usage: printf [-v var] format [arguments]
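One way to sidestep this, sketched below, is to never let the data (or any text starting with a dash) be the format string. The helper name say is hypothetical.

```shell
#!/bin/sh
# say ARG...: print each argument verbatim on its own line.
# Hypothetical helper. The format string is a fixed literal that never
# begins with "-", so nothing in the data is parsed as an option or as a
# format directive, and backslashes in the data are left alone.
say() {
    printf '%s\n' "$@"
}

say '-- check logs --'
say '-n' '%20s' 'back\nslash'
```

The same fixed format works in bash and dash, so there is no need to remember which printf accepts which options.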
Brr. Sometimes I wish I was writing in C, with its strict function arguments and strict datatypes. Alas there are some very solid reasons Minisleep is in shell: it’s the lowest common denominator on cheap webhosts, so software written in it can work out of the box. I have not had such luck with:
- C-based backends. My current host has an odd setup where external binaries will not run (odd glibc?), you have to use their toolchain, something that you can only get access to on request and temporarily.
- Perl. Path config typically messes with me.
- Python. Multiple versions and they’re not necessarily installed on your shared host.
PHP is a good contender, but I don’t have any experience with it. From what I have read it’s as evil as shell >:D. It also has multiple common versions out there across various shared webhosts.
Sidenote: bash isn’t completely immune to being different across hosts. I discovered that some have versions that don’t support multi-digit file descriptors. I don’t think many people will care about this feature, but it bit me unexpectedly once when I was using fd 666 for file locking.
Bad line reporting for syntax errors
Messages along the lines of “EOF hit early” or “missing a closing quote” are absolutely evil. 9/10 times I have hit these errors it has been because of mismatched brackets some 100 or so lines of code before the line the error is reported for:
culprit_evil="${printf '%s' "$hibiscus" | tr -d '\n\r' | sed "s|moo|cow|g")"
Squint if you can’t see the error. The only practical way to find the culprit line is by deleting/stubbing chunks of code (binary searching by dissection) until the error message disappears.
Bash sometimes gives better errors than dash, but not always.
Was it worth it?
Performance: no. The difference has been minimal. Some tests I have run show that starting bash is a bit slower than starting dash, but the disk IO penalties of work seem to outweigh everything else.
For an interesting tangent to performance: see my performance comparison of sed versus bash’s inbuilt find-and-replace string feature on minisleep’s webpage. Bash must be doing something very unexpected here.
Interoperability: perhaps. I’m not sure whether BSD, Haiku and other OS users will ever want to use my code, but it’s comforting to know they have a better chance now. Any step to make a project more portable is a good step, especially in terms of code longevity.
Rebasing this website on the latest minisleep
This has closed a few holes and generally made me feel a lot better. Adding comment and throttling support back in (just for this website, not publicly released) took much longer than expected.
Still todo: Email subscription support for commenters, so they can be informed when new comments are added.
What to expect soon
- Lots of lab projects with pretty pictures.
- Boring and drawn out code musings.
- Perhaps some more code/project releases.
Over in the land of feed readers
My new sidequest: a non-crappy feed reader webapp:
I’ve previously found that QuiteRSS is a nice desktop feed app, but it presents me with a few issues:
- It uses webkit. To view untrusted content from the web. Hmm.
- I can’t access it from multiple computers.
RSSguard fixes the first issue (it has a very basic understanding of HTML) and is super-fast to launch, but it is ridiculously slow to update feeds by comparison. During the feed update process many parts of the application are disabled too.
To try and fix all of my issues I started looking at the self-hosted webapp options. This would allow me to use any computer to view my feeds without having to guess at which ones are new/unread.
Unfortunately all of the webapps I have tried suffer from various problems:
- Complex to set up and run. Databases, lists of dependencies, multiple setup attempts before everything is OK.
- Unable to provide both a global list of articles or
- Being designed around ‘background updating’ of feeds. I use my feed reader in “sessions” and I don’t necessarily have a computer always running at home 24/7 for this to be daemonised on.
- Horribly over-padded interface by default. This is often fixable, but it gives me a bad feeling about what the app is targeted toward.
Bastardfeed is my attempt to work around this. Some haphazard notes about it:
- Goals:
- to be the fastest feed reader out there. Both in terms of fetching and processing.
- to be simple and lightweight
- to use a tree filesystem (files and folders) rather than a relational database (SQL)
- to be self-contained (minimal external deps)
- Uses an absolute bastard of a feed processor, written in C.
- Full XML parsing? Pah, who needs that?
- Understands tags about 1-2 layers deep, depending on the situation
- Tuned to work on my (long) list of feeds.
- Surprisingly this angle of attack seems to be working!
- Spends too long creating page indexes (lots of shell calls to external binaries like sed and date). This component is being rewritten into C, soon maybe the whole project will be like this.
- Has troubles with some favicons (notice the blanks in the picture above). Mimetype issues, I’m hoping to avoid needing to use ‘file’ to determine filetypes.
- Haphazard built-in HTTP server, with half-arsed CGI support, in the form of a shell script + socat.
- Doesn’t track article read status, relies on your web-browser history (Firefox sync’d in my case)
- Aggressively fetches feeds in parallel
- Streams the feed-fetch log over HTTP to your browser, so you can sit back with popcorn and watch the bits fly.
- Highlights new articles in feeds (red numbers on sidebar)
- Doesn’t “interleave” articles on the front (mixed feed) page. When a feed gets multiple new entries then they get clumped together, even if this is not the correct method of time-sorting them. This was initially done to simplify things, but I have now grown to like this behaviour — when someone releases a few articles then I tend to want to see them in one place, not intermingled amongst other things.
If I’m insane enough I might release this. But for now I get some satisfaction (and chuckles) every day when I use it.
Comment and throttling support are, of course, exactly what would make minisleep useful to me (and, I'd bet, to other folks as well).
Any plans to release them?
Hmm.
At the moment the comment system is split into a few parts:
(1) New: scripts/comment.cgi. Handles user submissions, re-rendering of pages & emailing the site owner. Gets its own URL (eg /cgi-bin/comment.cgi).
(2) New: scripts/throttle.sh. Places per-IP and global hard limits on x comments per y hours.
(3) More lines in scripts/buildpage.sh, to render the comments and the 'add your own comment' form.
(4) An extra few lines in scripts/minisleep.cgi, to provide an 'Allow comments' checkbox on the page edit form.
This adds a lot of complexity to minisleep -- another ~450 lines of code, taking it past the 1000 line goal. I don't want to ship anything that's hairy, it's (fortunately? unfortunately?) a personal goal after having to deal with some other wiki engines. I also want to make sure it's a 100% sure-fire optional feature (ie you can delete it and nothing breaks).
Throttling introduces some interesting data collection and storage issues. It only stores information (your IP and time of comment submit) if you actually succeed at passing the simple antispam captcha, so arguably you have already somewhat consented. It does however leave you with a folder full of throttle logs that you need to regularly hoover (eg via cronjob) to prevent a useless and permanent accumulation of (potentially personal) data. Spiritually I think that goes against the ideas of a nice, semi-static website; but storing logs is the only way to do it short of a daemon (which could somewhat equivalently log in-memory).
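The hoovering step could be as small as the sketch below. The one-file-per-submission layout and the name prune_throttle_logs are assumptions for illustration, not the code running on this site. It could be run from a cronjob, or called at the start of a CGI script.

```shell
#!/bin/sh
# prune_throttle_logs DIR: delete regular files under DIR that were last
# modified at least a day ago. Hypothetical helper; the log layout is an
# assumption.
prune_throttle_logs() {
    [ -d "$1" ] || return 0
    find "$1" -type f -mtime +0 -exec rm -f {} +
}

# Demo on a throwaway directory.
demo_dir="${TMPDIR:-/tmp}/throttle_demo.$$"
mkdir -p "$demo_dir"
touch -t 200001010000 "$demo_dir/old.log"   # pretend this entry is from 2000
touch "$demo_dir/new.log"                   # a fresh entry that must survive
prune_throttle_logs "$demo_dir"
remaining=$(ls "$demo_dir")
rm -rf "$demo_dir"
printf '%s\n' "$remaining"
```

find only offers whole-day granularity through -mtime in POSIX, so a throttle window measured in hours would keep entries around somewhat longer than strictly needed.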
My comment code does not yet support email notifications for commenters. I know it's often inconvenient to keep revisiting a random blog you have commented on to see if anyone else has replied. Alas there are a number of ways this can go wrong (safe storage of email addresses, need to limit/throttle emails sent out), so I am still contemplating the options. This will definitely have to wait for a later release.
Before releasing the existing code: I want to make some changes so that it's simpler (and therefore easier for other people to deal with). Namely I want to remove per-user comment throttling, instead only throttling comment submissions globally. Throttle code is only hit after people succeed at the captcha anyway, so it's equivalent for limiting most "bad user" scenarios.
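To give a feel for how little a global-only throttle needs, here is a minimal sketch. It is not the existing throttle.sh: the helper name, the single log file of timestamps and the argument order are all assumptions, and it does no locking, so two simultaneous submissions could both slip through.

```shell
#!/bin/sh
# throttle_allow LOG NOW MAX WINDOW
# Hypothetical helper: succeeds and records NOW (seconds since the epoch)
# if fewer than MAX recorded submissions fall within the last WINDOW
# seconds; otherwise fails without recording anything.
throttle_allow() {
    t_log=$1 t_now=$2 t_max=$3 t_window=$4
    [ -f "$t_log" ] || : > "$t_log"
    t_recent=$(awk -v now="$t_now" -v w="$t_window" \
        '($1 + 0) > now - w { n++ } END { print n + 0 }' "$t_log")
    [ "$t_recent" -lt "$t_max" ] || return 1
    printf '%s\n' "$t_now" >> "$t_log"
}

# Demo: at most 2 comments per hour, using fixed timestamps for clarity.
demo_log="${TMPDIR:-/tmp}/throttle_demo_log.$$"
results=""
for stamp in 1000 1001 1002 5000; do
    if throttle_allow "$demo_log" "$stamp" 2 3600; then
        results="${results:+$results }allow"
    else
        results="${results:+$results }deny"
    fi
done
rm -f "$demo_log"
printf '%s\n' "$results"
```

In real use NOW would come from something like date +%s, which is widely available but was not required by older POSIX versions. The log stores only timestamps, so a global throttle like this keeps no IP addresses at all.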
Mike in Boston: what sort of thing would you use Minisleep for? Any particular requirements on the comment support, or qualms with anything I have mentioned? Also, thanks for asking :)
Thanks for getting back to me! But there is no need to thank me for asking, since I did so only out of sheer laziness: if you might eventually release comment and throttling support, then I can procrastinate on writing them myself. :)
I am planning to use minisleep as a lightweight engine with an attack surface much smaller than WordPress for a topical blog hosted on a <a href="https://www.endoffice.com/picolo.html">super cheap Raspberry Pi</a>.
I also dislike needing cron jobs to clear up cruft. One alternative I've used before is the on-demand approach: when a CGI is called, it first does housecleaning (e.g., cleaning up any entries that are too old) before going on to its main job.
I wouldn't expect email notifications for commenters; as you point out, that's a whole can of worms. I am old-school enough to want to add pingback support, though it will be interesting to see whether pingback spam can be avoided.