modified: Sunday 22 April 2018
Meta: Blog comment systems
Ruben Schade is asking for advice about adding a comment system to his blog.
You fool! I'm only too willing to give it.
Here's an example from disqus' own blog:
- It works
- People can auth with their existing accounts with one of the big corporates
- You don't own anything. If disqus wants to turn around and monetise or shut down operations, you've lost all your historical comments.
- Free version is ad-supported
- They collect and share your "personally identifiable information"
None of this ever agreed with me. Ignoring the issues I see as a person running my own site: I've wanted to write comments onto other people's blogs before and actively been stopped due to my issues with disqus. Not all people are the same, but for low read-volume blogs like mine I care about every commenter.
Stories of self hosting
My site is statically generated and has comment support. Many people will look at the comment posting interface and scream internally -- it looks easy to spam and people can impersonate each other. There's only a very, very weak captcha and no login/auth system. It sounds strange and vulnerable, but there's a good story here.
When I originally made the site I wrote a whole user registration system, complete with a million sanity checks and email verification links. To take a corporate view on things it was a HUGE success -- I didn't have one ounce of spam!
It turns out people didn't like having to register just to comment on a blog. They give up halfway through. My logs said so.
I tested my system regularly to make sure registration worked (and it did). Only one comment ever made it, and it was from a friend I forced through the registration process. Even then he wasn't happy he couldn't set his name to start and end with 'X_X'.
- Users want a really simple system where they can just write a comment and hit 'post'
- I want to avoid automated spam and be able to moderate things
My system was a gulag. I had gone full-bore in my own direction without a care for the users.
Of course they didn't want to sign up. It breaks their train of thought (they just want to write a comment!) and puts them into an interrogation chair. What is your mother's maiden address? Where did you hide the [body]?
What could I do?
Well, when in doubt, steal other people's ideas.
Introducing Irrlicht3d.org by Nikolaus Gebhardt. It has a commenting system that looks like this from the outside:
There's a minor bit of anti-bot verification (a single hard-coded word). Two of the four fields at the top are optional. And then you just write your comment, that's it.
Wouldn't this be vulnerable to a mass spam attack? I decided the best thing to do was try it myself here as an experiment. If it failed then I could always pull the plug.
I'm still running that experiment today:
- 2 years
- 64 comments in
- Only one has been bad enough to moderate
- Constant automated spam attacks about viagra and booze, all blocked by the single-word captcha
Aren't you afraid of abuse?
It's not hard for someone to write a script, hardcode my captcha in and try to spam or attack my site. My current system only prevents 'dumb' bots that randomly fill fields on every site.
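A single-word check like this takes only a few lines of shell. The following is a minimal sketch, not necessarily the code running on this site: the names `CAPTCHA_WORD` and `check_captcha` and the default word are made up for illustration.

```shell
# Hypothetical names and default word, for illustration only.
CAPTCHA_WORD="${CAPTCHA_WORD:-sleep}"

# check_captcha ANSWER
# Succeeds (exit status 0) only if the submitted answer matches the
# expected word, ignoring case and any whitespace.
check_captcha() {
    answer=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -d '[:space:]')
    [ "$answer" = "$CAPTCHA_WORD" ]
}
```

A form handler could call `check_captcha "$submitted_word" || exit 0` before doing anything else, so that 'dumb' bots are dropped before any file gets written.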
I've had to take down other sites (such as wiki backends) before because of spam attacks. It's always fun to see CPU usage on hosts pegged at 100% because a new wonder drug is being blogged+about+on+the+frontpage every 30 seconds. It's even better to look back at the site's history and discover this has been happening for weeks or months. Poor server. From what I've seen there are many "forgotten" sites on the web either being spammed or completely exploited/infected.
Experiencing this has made me calmer about the whole situation. I know what it looks like and why people do it.
Below are three strategies. At the moment I'm only using the first one; the others are future routes I'll take if I need to.
1. Option: Throttling
Only a certain number of comments can be posted to my site per day and per week. I track this both by IP and through a global counter, so even a distributed spam attack can only post X comments before any more are blocked.
- Reduces 'computer scale' spam down to human scale. If something happens I can take my time to delete the comments or turn off the system; my site won't be piling up.
- Comment system can be 'taken down' by an attack, preventing legitimate users from commenting until I notice.
- Spam is visible until I remove it.
- Does not tackle bad comments written by humans (attacks and trolling, etc)
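As a rough sketch of how throttling with a per-IP and a global counter could be done using nothing but files: everything here (the `THROTTLE_DIR` layout, the function names and the limits) is an assumption for illustration, and it only covers the per-day window, not the per-week one.

```shell
# Hypothetical layout: one counter file per IP and one global counter,
# grouped into a directory per day. The limits are made-up values.
THROTTLE_DIR="${THROTTLE_DIR:-/tmp/ds_throttle}"
MAX_PER_IP_PER_DAY=5
MAX_GLOBAL_PER_DAY=20

# read_count FILE: print the stored counter, or 0 if the file is missing.
read_count() {
    if [ -f "$1" ]; then cat "$1"; else echo 0; fi
}

# allow_comment IP
# Returns 0 and bumps both counters if the IP and the whole site are
# under today's limits; returns 1 and changes nothing otherwise.
allow_comment() {
    day=$(date +%Y%m%d)
    mkdir -p "$THROTTLE_DIR/$day" || return 1
    # The IP ends up in a file name, so keep only characters that can
    # appear in an IPv4 or IPv6 address.
    ip=$(printf '%s' "$1" | tr -cd '0-9a-fA-F.:')
    ip_file="$THROTTLE_DIR/$day/ip_$ip"
    global_file="$THROTTLE_DIR/$day/global"
    ip_count=$(read_count "$ip_file")
    global_count=$(read_count "$global_file")
    [ "$ip_count" -lt "$MAX_PER_IP_PER_DAY" ] || return 1
    [ "$global_count" -lt "$MAX_GLOBAL_PER_DAY" ] || return 1
    echo $((ip_count + 1)) > "$ip_file"
    echo $((global_count + 1)) > "$global_file"
}
```

There is no locking here, so two requests arriving at the same instant could both slip under a limit. For a low-volume blog that is probably acceptable; old day directories can simply be deleted.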
2. Option: Moderation
Currently all comments get automatically published. If I have to I can change this to a moderated system where I have to give comments a tick before they appear.
- Spam and bad comments written by humans get blocked
- Delays to people's conversations
- Site can't function on its own when I'm busy or away.
- Throttling limits will still get hit
3. Possibility: better captchas
This is a whole other story.
I think there are ways of rolling my own without having to store anything locally on the server through some clever use of one-way hashes. I might actually try writing one of these -- a single .cgi file implementation that does not require any local storage would be amazing.
- Automated spam does not prevent legitimate users from commenting (throttling can be set up differently)
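One way the "no local storage" idea might work, as a sketch only: when the form is rendered, hash the expected answer together with a timestamp and a server-side secret, and put the timestamp and hash into hidden form fields. On submission, recompute the hash from what the user typed. The names `CAPTCHA_SECRET`, `captcha_token` and `captcha_verify` are hypothetical, and the sketch assumes `sha256sum` is installed.

```shell
# Hypothetical secret and lifetime; a real secret must not be guessable.
CAPTCHA_SECRET="${CAPTCHA_SECRET:-change-me}"
CAPTCHA_MAX_AGE="${CAPTCHA_MAX_AGE:-3600}"   # seconds a token stays valid

# captcha_token ANSWER ISSUED
# One-way hash binding the expected answer to the time it was issued.
captcha_token() {
    printf '%s|%s|%s' "$CAPTCHA_SECRET" "$1" "$2" | sha256sum | cut -d' ' -f1
}

# captcha_verify ANSWER ISSUED TOKEN [NOW]
# Returns 0 if TOKEN matches the typed ANSWER and is not too old.
# NOW defaults to the current time and exists mainly for testing.
captcha_verify() {
    answer=$1 issued=$2 token=$3
    now=${4:-$(date +%s)}
    # The timestamp comes back from the client: digits only, no leading zero.
    case $issued in
        ''|0*|*[!0-9]*) return 1 ;;
    esac
    [ "$(captcha_token "$answer" "$issued")" = "$token" ] || return 1
    age=$((now - issued))
    [ "$age" -ge 0 ] && [ "$age" -le "$CAPTCHA_MAX_AGE" ]
}
```

The obvious weakness is replay: a solved token can be reused until it expires, so this would complement throttling rather than replace it. A more careful version would probably use a proper HMAC instead of a plain hash over a secret prefix.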
Curious implementation detail: no database
This site uses no databases other than the filesystem itself. Every comment is a folder, like these ones:
    ~/darksleep/public/blog/010_distrohop_p2 $ ls ds_comments/*
    ds_comments/1476087963_27865:
    author  content  url

    ds_comments/1476251459_29803:
    author  content  url

    ds_comments/1483907474_24753:
    author  content  url

    ds_comments/1484129284_26265:
    author  content  url
    ...
All user-provided data is stored in the files themselves instead of the filenames, to prevent abuse. The folder names are just the current time (seconds since epoch) plus a random number to avoid collisions. A simple sort command gets them in order.
You don't need a database until you're dealing with thousands and thousands of comments, and by then your site would probably be big enough to warrant the hassle. Until then: don't throw databases at problems your filesystem will happily solve.
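A minimal sketch of the folder-per-comment idea follows. The function names are hypothetical and the random suffix is read from /dev/urandom, which is an assumption; the actual script may get its random number some other way.

```shell
# save_comment DIR AUTHOR URL CONTENT
# Stores one comment as its own folder and prints the folder name.
save_comment() {
    dir=$1
    mkdir -p "$dir" || return 1
    id=
    for attempt in 1 2 3 4 5; do
        # Two random bytes as an unsigned number (0-65535).
        rand=$(od -An -N2 -tu2 /dev/urandom | tr -dc '0-9')
        candidate="$(date +%s)_$rand"
        # mkdir fails if the folder already exists, so a collision
        # simply leads to another attempt with a new random number.
        if mkdir "$dir/$candidate" 2>/dev/null; then
            id=$candidate
            break
        fi
    done
    [ -n "$id" ] || return 1
    # User data goes into file contents only, never into file names.
    printf '%s\n' "$2" > "$dir/$id/author"
    printf '%s\n' "$3" > "$dir/$id/url"
    printf '%s\n' "$4" > "$dir/$id/content"
    printf '%s\n' "$id"
}

# list_comments DIR: print the comment folder names, oldest first.
list_comments() {
    ls "$1" | sort -n
}
```

Because the names start with the epoch time, sorting them puts the comments in posting order without storing any index.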
A site that never was
I once had the idea of making the site static with zero on-server scripting. Commenting would be done through emails to a specific address, and a script on my local/desktop would read them. New pages would then get pushed to the real webserver.
This would work on even completely static free hosts (ones that allow no CGI or similar). My old ISP still offered something like 64MB of space just for this. I wonder if anyone has ever done this before.
This site is statically generated. I think it's an absolutely brilliant (and easy) idea. Ignoring the speed benefits, it means the site stays up if I disable the commenting system/script.
In Ruben's case: his site is statically generated and uses a version control system. I'm not sure in which order and how this is set up (he might generate on his personal computer and then push via vcs), so it might be inconvenient for him to go down the road I have.
Suggested solution: write a small .cgi script that handles accepting comments and generating .html files containing nothing but them. Then [iframe] or similar them in to your main pages, so you don't have to modify your main pages (or touch your vcs) when a comment gets added.
I've written my backend in bash, because it's really really easy to handle files in. Admittedly it's a little hard to keep things secure -- for instance you have to use 'printf' instead of 'echo' to echo untrustworthy data -- but it's simple and fast. CGI lets you use any language you want, and I'd recommend giving shell a go.
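To make the printf point concrete, here is a sketch of turning stored comments into an HTML fragment. It assumes each comment is a folder containing `author` and `content` files; the function names and the markup are invented for the example.

```shell
# html_escape: read text on stdin, write it with HTML metacharacters escaped.
html_escape() {
    sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g' \
        -e 's/"/\&quot;/g' -e "s/'/\&#39;/g"
}

# render_comment DIR: print one comment folder as an HTML snippet.
render_comment() {
    author=$(html_escape < "$1/author")
    content=$(html_escape < "$1/content")
    # printf with '%s' prints the data verbatim. echo might treat an
    # author called "-n", or backslashes in the text, as instructions.
    printf '<div class="comment"><b>%s</b><p>%s</p></div>\n' "$author" "$content"
}

# render_all COMMENTS_DIR: print every comment in glob (name) order.
render_all() {
    for c in "$1"/*; do
        if [ -f "$c/author" ]; then
            render_comment "$c"
        fi
    done
}
```

Something like `render_all ds_comments > comments.html` could then be run after each accepted comment, and the resulting file pulled into the main page with an iframe.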
I have the urge to write a portable system anyone can run themselves and embed in their pages, but I don't have the time at the moment. This week has seen four lots of assessments and me getting behind in other work. I chose a good week to try and get my site back together.
Ruben: I'm good at writing long pages and making things seem complicated. Try making your own system.
Hint: html forms with some [input type='text'] fields and then a [textarea] last make for output that is very easy to process (with any language).
I'm happy to help out. Ask me questions about any problem; I probably thought of it at some point too :)
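As an illustration of how little processing that output needs: a form posted with the default encoding arrives as `name=value` pairs joined by `&`, with spaces sent as `+` and special characters as `%XX`. The sketch below pulls one field out of such a string. The function names are hypothetical, and it assumes an awk that treats text as bytes under `LC_ALL=C`.

```shell
# urldecode: read form-encoded text on stdin, write the decoded text.
urldecode() {
    LC_ALL=C awk '
    BEGIN {
        # Lookup table from two hex digits to the byte they stand for.
        # Starting at 1 means %00 (a NUL byte) is silently dropped.
        for (i = 1; i < 256; i++)
            byte[sprintf("%02X", i)] = sprintf("%c", i)
    }
    {
        gsub(/\+/, " ")
        out = ""
        while (match($0, /%[0-9A-Fa-f][0-9A-Fa-f]/)) {
            out = out substr($0, 1, RSTART - 1) \
                      byte[toupper(substr($0, RSTART + 1, 2))]
            $0 = substr($0, RSTART + 3)
        }
        print out $0
    }'
}

# form_field NAME BODY: print the decoded value of the first field NAME.
form_field() {
    printf '%s\n' "$2" | tr '&' '\n' | sed -n "s/^$1=//p" | head -n 1 | urldecode
}
```

In a CGI script the posted body arrives on standard input, with its length in the `CONTENT_LENGTH` environment variable, so a handler could read it into a variable and then call, for example, `form_field author "$body"`.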
I'd also love to hear other people's opinions on this, if you're still reading.