published: Thursday 12 April 2018
modified: Sunday 22 April 2018
author: Hales
markup: textile

Meta: Blog comment systems

Ruben Schade is asking for advice about adding a comment system to his blog.

You fool! I'm only too willing to give it.

Disqus

The beast. Disqus is comments as a service. Sign up, add some JavaScript to your pages and you have comments.

Here's an example from Disqus' own blog:

Pros:

Cons:

None of this ever agreed with me. Ignoring the issues I see as someone running my own site: I've wanted to write comments on other people's blogs before and been actively stopped by my issues with Disqus. Not all people are the same, but for a low read-volume blog like mine I care about every commenter.

Stories of self hosting

My site is statically generated and has comment support. Many people will look at the comment posting interface and scream internally -- it looks easy to spam and people can impersonate each other. There's only a very, very weak captcha and no login/auth system. It sounds strange and vulnerable, but there's a good story here.

When I originally made the site I wrote a whole user registration system, complete with a million sanity checks and email verification links. To take a corporate view on things, it was a HUGE success -- I didn't have one ounce of spam!

It turns out people don't like having to register just to comment on a blog. They give up halfway through. My logs said so.

I tested my system regularly to make sure registration worked (and it did). Only one comment ever made it, and it was from a friend I forced through the registration process. Even then he wasn't happy he couldn't set his name to start and end with 'X_X'.

Stepping back

My system was a gulag. I had gone full-bore in my own direction without a care for the users.

Of course they didn't want to sign up. It breaks their train of thought (they just want to write a comment!) and puts them into an interrogation chair. What is your mother's maiden address? Where did you hide the [body]?

What could I do?

Well, when in doubt, steal other people's ideas.

Introducing Irrlicht3d.org by Nikolaus Gebhardt. It has a commenting system that looks like this from the outside:

There's a minor bit of anti-bot verification (a single hard-coded word). Two of the four fields at the top are optional. And then you just write your comment, that's it.

Wouldn't this be vulnerable to mass spam attack? I decided the best thing to do was try it myself here as an experiment. If it failed then I could always pull the plug.

I'm still running that experiment today:

Status:

Aren't you afraid of abuse?

It's not hard for someone to write a script, hardcode my captcha in and try to spam or attack my site. My current system only prevents 'dumb' bots that randomly fill fields on every site.

I've had to take down other sites (such as wiki backends) before because of spam attacks. It's always fun to see CPU usage on hosts pegged at 100% because a new wonder drug is being blogged+about+on+the+frontpage every 30 seconds. It's even better to look back at the site's history and discover this has been happening for weeks or months. Poor server. From what I've seen there are many "forgotten" sites on the web either being spammed or completely exploited/infected.

Experiencing this has made me calmer about the whole situation. I know what it looks like and why people do it.

Below are three strategies. At the moment I'm only using the first; the others are future routes I'll take if I need to.

1. Throttling

Only a certain number of comments can be posted to my site per day and per week. I track this both per-IP and through a global counter, so even a distributed spam attack can only post X comments before the rest are blocked.
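The counters above can be sketched with nothing but flat files. This is a hedged illustration, not my actual code: the directory, limits and function names are all made up here.

```shell
#!/bin/bash
# Sketch of per-IP plus global daily throttling using flat counter files.
# STATE_DIR and the limits are illustrative, not the real site's values.
STATE_DIR="${STATE_DIR:-$(mktemp -d)}"
MAX_GLOBAL_PER_DAY=50
MAX_PER_IP_PER_DAY=3

allow_comment() {
    local ip="$1" today global_file ip_file count
    today=$(date +%Y%m%d)
    global_file="$STATE_DIR/global_$today"
    ip_file="$STATE_DIR/ip_${ip}_$today"

    # Global limit: one line is appended per accepted comment today
    count=0; [ -f "$global_file" ] && count=$(wc -l < "$global_file")
    [ "$count" -ge "$MAX_GLOBAL_PER_DAY" ] && return 1

    # Per-IP limit works the same way
    count=0; [ -f "$ip_file" ] && count=$(wc -l < "$ip_file")
    [ "$count" -ge "$MAX_PER_IP_PER_DAY" ] && return 1

    # Accepted: record it in both counters
    echo x >> "$global_file"
    echo x >> "$ip_file"
    return 0
}
```

Date-stamped filenames mean old counters expire by themselves; an occasional cleanup cron can delete stale files.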

Pros:

Cons:

2. Option: Moderation

Currently all comments get automatically published. If I have to I can change this to a moderated system where I have to give comments a tick before they appear.

Pros:

Cons:

3. Possibility: better captchas

This is a whole other story.

I think there are ways of rolling my own that don't require storing anything locally on the server, through some clever use of one-way hashes. I might actually try writing one -- a single .cgi file implementation that does not require any local storage would be amazing.
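One way such a storage-free scheme could work (a sketch of the general idea, not anything I've built -- SECRET and all the names are invented): hand the browser a token containing an expiry time plus a keyed hash of answer-and-expiry, then verify by recomputing the hash from the submitted answer. Nothing needs to be remembered server-side.

```shell
#!/bin/bash
# Sketch of a stateless challenge token: the token itself carries everything
# needed to verify it. SECRET and all names here are illustrative. A real
# system should use a proper HMAC (e.g. openssl dgst -hmac) rather than a
# plain prefixed hash.
SECRET="some-long-random-string"

keyed_hash() { printf '%s' "$SECRET:$1" | sha256sum | awk '{print $1}'; }

issue_token() {
    # Emit "expiry:mac" for a hidden form field, sent alongside the question
    local answer="$1" expiry
    expiry=$(( $(date +%s) + 3600 ))
    printf '%s:%s\n' "$expiry" "$(keyed_hash "$answer:$expiry")"
}

check_answer() {
    # Recompute the keyed hash from the submitted answer and compare
    local answer="$1" token="$2"
    local expiry="${token%%:*}" mac="${token#*:}"
    [ "$(date +%s)" -lt "$expiry" ] || return 1   # token expired
    [ "$(keyed_hash "$answer:$expiry")" = "$mac" ]
}
```

The server never learns which challenges are outstanding; it only checks that the answer it receives hashes to the token it once signed.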

Pros:

Curious implementation detail: no database

This site uses no databases other than the filesystem itself. Every comment is a folder, like these ones:

~/darksleep/public/blog/010_distrohop_p2 $ ls ds_comments/*

ds_comments/1476087963_27865:
author  content  url

ds_comments/1476251459_29803:
author  content  url

ds_comments/1483907474_24753:
author  content  url

ds_comments/1484129284_26265:
author  content  url

...

All user-provided data is stored in the files themselves instead of in the folder names, to prevent abuse. The folder names are just the current time (seconds since epoch) plus a random number to avoid collisions. A simple sort command gets them in order.
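Storing a comment this way takes only a few lines. A sketch (the ds_comments layout and the author/content/url file names mirror the listing above; the function names are mine):

```shell
#!/bin/bash
# Sketch of the folder-per-comment store described above. The layout matches
# the listing; store_comment/list_comments are made-up names for illustration.
store_comment() {
    local author="$1" url="$2" content="$3"
    local dir="ds_comments/$(date +%s)_$RANDOM"
    mkdir -p "$dir"
    # printf, not echo: these values are untrusted user input
    printf '%s' "$author"  > "$dir/author"
    printf '%s' "$url"     > "$dir/url"
    printf '%s' "$content" > "$dir/content"
    echo "$dir"
}

# Folder names start with the epoch time, so a plain sort gives posting order
list_comments() { ls ds_comments | sort; }
```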

You don't need a database until you're dealing with thousands and thousands of comments, and by then your site would probably be big enough to warrant the hassle. Until then: don't throw databases at problems your filesystem will happily solve.

A site that never was

I once had the idea of making the site static with zero on-server scripting. Commenting would be done through emails to a specific address, and a script on my desktop machine would read them. New pages would then get pushed to the real webserver.

This would work on even completely static free hosts (ones that allow no CGI or similar). My old ISP still offered something like 64MB of space just for this. I wonder if anyone has ever done this before.

Closing remarks

This site is statically generated. I think it's an absolutely brilliant (and easy) idea. Ignoring the speed benefits, it means the site stays up if I disable the commenting system/script.

In Ruben's case: his site is statically generated and uses a version control system. I'm not sure in which order or how this is set up (he might generate on his personal computer and then push via VCS), so it might be inconvenient for him to go down the road I have.

Suggested solution: write a small .cgi script that handles accepting comments and generating .html files containing nothing but them. Then [iframe] or similar them into your main pages, so you don't have to modify your main pages (or touch your vcs) when a comment gets added.
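The fragment-generation half of that suggestion could look something like this. A sketch only -- paths and names are invented, and the escaping is the part that actually matters:

```shell
#!/bin/bash
# Sketch of regenerating a standalone comments fragment that a static page
# pulls in via an iframe. File layout and names are invented here.

# Minimal HTML escaping so untrusted comment text can't inject markup
escape() { sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'; }

render_fragment() {
    local post_dir="$1" out="$post_dir/comments.html" d
    {
        echo '<!DOCTYPE html><meta charset="utf-8">'
        for d in $(ls "$post_dir/ds_comments" 2>/dev/null | sort); do
            printf '<div class="comment"><b>%s</b><p>%s</p></div>\n' \
                "$(escape < "$post_dir/ds_comments/$d/author")" \
                "$(escape < "$post_dir/ds_comments/$d/content")"
        done
    } > "$out"
}
```

The static pages never change; only the fragment file gets rewritten when a comment arrives.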

I've written my backend in bash, because it's really really easy to handle files in. Admittedly it's a little hard to keep things secure -- for instance you have to use 'printf' instead of 'echo' to print untrustworthy data -- but it's simple and fast. CGI lets you use any language you want, and I'd recommend giving shell a go.
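A concrete example of that printf-vs-echo gotcha:

```shell
#!/bin/bash
# 'echo' interprets some untrusted inputs; 'printf %s' reproduces them exactly.
evil='-n'
echo "$evil"            # bash's echo swallows this, treating it as a flag
printf '%s\n' "$evil"   # prints -n exactly as stored
```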

I have the urge to write a portable system anyone can run themselves and embed in their pages, but I don't have the time at the moment. This week has seen four lots of assessments and me getting behind in other work. I chose a good week to try and get my site back together.

Ruben: I'm good at writing long pages and making things seem complicated. Try making your own system.

Hint: an HTML form with some [input type='text'] fields and a [textarea] last makes for output that's very easy to process (in any language).
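For what it's worth, the urlencoded body such a form POSTs is easy to pick apart even in shell. A sketch (helper names are mine; real input needs more care -- stray '%' signs, repeated fields, size limits):

```shell
#!/bin/bash
# Sketch of parsing application/x-www-form-urlencoded data, as POSTed by a
# plain HTML form. Helper names are invented for illustration.
urldecode() {
    local s="${1//+/ }"        # '+' encodes a space
    printf '%b' "${s//%/\\x}"  # turn %HH into \xHH and let printf expand it
}

get_field() {
    # get_field name "a=1&name=Hales" prints "Hales"
    local want="$1" body="$2" pair
    local IFS='&'
    for pair in $body; do
        if [ "${pair%%=*}" = "$want" ]; then
            urldecode "${pair#*=}"
            return 0
        fi
    done
    return 1
}
```

In a CGI script the body arrives on stdin, CONTENT_LENGTH bytes of it, so `read -r -n "$CONTENT_LENGTH" body` gets you the string to feed into a parser like this.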

I'm happy to help out. Ask me questions about any problem; I probably thought of it at some point too :)

I'd also love to hear other people's opinions on this, if you're still reading.


EdS - Monday 16 April 2018

Just noting that you do have at least one reader! But you probably know that from your logs. With forum signups, there may be bots but there also seem to be humans who are probably paid a pittance to get past the captcha and create accounts for later abuse by spammers. Staying under the radar is a good start.

Hales - (site author) - Monday 16 April 2018

Hey Eds!

I only regularly look at logs specifically for my site's CGI (interactive) components. The normal page visits are spammed to a massive degree by bots of all types so it's hard to comprehend things there.

On that note:

2018-04-16T00:57:20+1000 n (x.x.x.x) main: attempting action 'comment_add'
2018-04-16T00:57:20+1000 n (x.x.x.x) fail user: action_comment_add: authorname too short
2018-04-16T00:57:32+1000 n (x.x.x.x) main: attempting action 'comment_add'
2018-04-16T00:57:32+1000 n (x.x.x.x) action_comment_add: author 'EdS' posted comment to '/blog/030_comment_blog_systems/'

Woops. I think I should relax that restriction. I presume you're Ed, not Eds? :P


> With forum signups, there may be bots but there also seem to be humans who are probably paid a pittance to get past the captcha and create accounts for later abuse by spammers

I remember reading something somewhat related to this once. The idea of putting more trust into the users that register. It was by someone who operated a paste-any-html style site that wanted to combat their site being used for abuse.

They introduced a signup system, and even paid tiers, in the hope of removing or reducing the spammers. They then found out it was the spammers who were most likely to sign up for the accounts :)

> Staying under the radar is a good start.

There's a few ways I can look at this and I'm not sure which one you mean. Avoiding publicity?

I thought long and hard about making this post -- whether or not discussing spam problems and the particular ways you could do it on my site could lead to spam -- but I settled on the belief that it's better to share problems than hide them. I think people should be prepared and understand, rather than be afraid.

Hales - (site author) - Sunday 22 April 2018

Ruben's reply: https://rubenerd.com/feedback-on-static-comments/

I'm glad you had more people than just me sharing ideas with you. From my POV, as a reader of your blog that occasionally sends you an email, I have no clue about how many other people actively do the same. No tumbleweeds, just chasms.

> The main downside there is people may not be enthusiastic about commenting if they either need their own blog to link back to mine,

If you are referring to my comment system: the "URL" field is completely optional. The only mandatory components are the Name, AntispamWord and CommentBody. You don't need a blog of your own to reply here.


> DW: For the love of god, DON’T DO BLOG COMMENTS!
> DW: [..]
> DW: People shitpost you for everything and think they are clever. It’s so tiring. Fuck that.

Definitely be prepared. But don't be afraid. It's your comment system, your universe. If people come to your universe to try and stuff you over, then they obviously don't know you're in control of the laws of physics here.

Chris Siebenmann - website - Wednesday 27 March 2019

The one anti-spam precaution I use that has taken out all of the automated spam targeted at my blog is a hidden text input field in the form (with a label that says 'please do not put anything here', in case people are leaving comments with lynx or some other non-CSS thing). Any comment submission with something in that field fails. It appears that general comment spam bots reliably stuff input into every field they can see, and so they all fail this check. None of my other anti-spam precautions even come close to being as effective as it.

I do periodically get successful spam that seems to be entered by human beings, but I can't do anything about that short of going to moderation and it's not enough so far to make me go that far. Since I watch my new comments, none of it lasts for very long and I generally block repeat instances of it (I have a collection of banned URLs and so on).

I'm still vulnerable to a mass automated spam attack, but it would have to be from something that either automatically recognizes a 'do not fill' field in some way or that was specifically set up to target me. So far, no one has bothered, although if I was a big site I'm sure that some spammers would.

Chris Siebenmann - website - Wednesday 27 March 2019

Also, it appears that your current comment system is not properly recognizing 'https://' URLs when entered in the URL field; they get a 'http://' put on the front when materialized in your HTML, which has some problems. (Firefox interprets them as a weird site name.)

Hales - (site author) - Wednesday 27 March 2019

Hello Chris,

I also have a hidden text field. Although it looks like it alone wouldn't be enough with my current setup. Take this logged spam event from today as an example:

> Posted content (first 100 chars of each):
> - pagepath: '/blog/013_antenna_spiralam/'
> - author: 'Mazuelbaica'
> - email: 'bunkomux[at]yandex.com'
> - url: ''
> - antispam1: ''
> - antispam2: ''

Antispam1 (called something else in the HTML form) requires the word irrlicht. Antispam2 (called something else in the HTML form) needs to be left blank.

If I didn't have both then it looks like I'd be accepting spam comments. It seems some bots are smart enough to leave blank any fields whose names they don't recognise.

Sidenote: I really encourage logging of failed form submissions, they give you insight both into spammer methods and problems actual people have with your site. Make sure you treat them as dangerous if you intend to read them in your browser, and provide some safeguards so that your disk doesn't fill up. Escaping for safe browser viewing is an interesting topic: https://wonko.com/post/html-escaping


> or that was specifically set up to target me. So far, no one has bothered, although if I was a big site I'm sure that some spammers would.

Yep. I hope never to reach that day, but if I do I'll have to push all comments into a 'moderator accepted only' mode. I miss the idea of public comments fields that are not filled with spam or slime, and I'd like to try and keep it open for as long as I can.


> your current comment system is not properly recognising 'https://' URLs when entered in the URL field

Yes, my URL detection and defusing code has a whole pile of problems. On the plus side this bad URL handling code did break a successful spammer's URL link once :D Thank you for reporting the problem, I'll see what I can do.

(On the negative side, it looks like Firefox interprets the links as http.com O.o)

There's a growing list of things I need to fix on this site. I've been putting a lot of effort into a fork of this site's code, intended for general wiki use (selling catchphrases "actually small", "HTTP_AUTH security", "HTML WYSIWYG", "drag and drop image support", "normal pages are static html" and "depends only on bash and a CGI webserver"). I hope to release it soon.

Chris Siebenmann - website - Monday 1 April 2019

You inspired me to look at the exact HTML of my comment form, and it turns out that one reason my hidden field works so well for me may be the name I gave it (compared to the name you gave yours). I specifically call mine 'name' in the HTML form, and based on the difference between our results, I suspect that this functions as bait for spambots that specifically look for text fields with certain sorts of names.

(Your results suggest that some of your field names may also be attracting bots, if they reliably stuff certain fields.)

I do log some data for rejected comment spammers, but I don't currently log all of it; basically I log the rejection reason, often including the field value, instead of the full comment submission. Mostly this is because of my personal views on the volume of my logs. If I was actively hunting comment spammers and figuring out their behaviors I definitely would want to log full information.

Hales - (site author) - Monday 8 April 2019

Sounds valid. A bot would almost never avoid filling a 'name' field, that would prevent it from working on a large chunk (most?) of the websites out there.

> basically I log the rejection reason, often including the field value, instead of the full comment submission.

I tried doing this for a while, but it turned out my summary reasons/methods were wrong. Example: EdS, the first person to comment at the top of this page. The error message I had coded was 'invalid length username: too short'. If he hadn't retried and worked around the problem himself, I would have skimmed right past it, assuming it was spam.

> Mostly this is because of my personal views on the volume of my logs

I can understand that. An ideal setup is one that (1) never requires manual maintenance and (2) only has succinct/exact logs, rather than ones full of hay.

This site is a bit of a hobby for me, so I'm happy to pay that price.


Add your own comment:

Name:
Email (optional):
URL (optional):
Enter the word 'irrlicht' (antispam):
Leave this box blank (antispam):

Comment (plaintext only):

If you provide an email address: it will only be used for the site admin to contact you, it will not be made public.

If you provide a URL: your name will hyperlink to it.