published: Thursday 12 April 2018
modified: Sunday 22 April 2018
author: Hales
markup: textile

Meta: Blog comment systems

Ruben Schade is asking for advice about adding a comment system to his blog.

You fool! I'm only too willing to give it.

Disqus

The beast. Disqus is comments as a service. Sign up, add some JavaScript to your pages and you have comments.

Here's an example from Disqus' own blog:

Pros:

Cons:

None of this ever agreed with me. Ignoring the issues I see as someone running my own site: I've wanted to write comments on other people's blogs before and been actively stopped by my issues with Disqus. Not all people are the same, but for a low read-volume blog like mine I care about every commenter.

Stories of self hosting

My site is statically generated and has comment support. Many people will look at the comment posting interface and scream internally -- it looks easy to spam and people can impersonate each other. There's only a very, very weak captcha and no login/auth system. It sounds strange and vulnerable, but there's a good story here.

When I originally made the site I wrote a whole user registration system, complete with a million sanity checks and email verification links. To take a corporate view on things, it was a HUGE success -- I didn't have one ounce of spam!

It turns out people don't like having to register just to comment on a blog. They give up halfway through. My logs said so.

I tested my system regularly to make sure registration worked (and it did). Only one comment ever made it, and it was from a friend I forced through the registration process. Even then he wasn't happy he couldn't set his name to start and end with 'X_X'.

Stepping back

My system was a gulag. I had gone full-bore in my own direction without a care for the users.

Of course they didn't want to sign up. It breaks their train of thought (they just want to write a comment!) and puts them into an interrogation chair. What is your mother's maiden address? Where did you hide the [body]?

What could I do?

Well, when in doubt, steal other people's ideas.

Introducing Irrlicht3d.org by Nikolaus Gebhardt. It has a commenting system that looks like this from the outside:

There's a minor bit of anti-bot verification (a single hard-coded word). Two of the four fields at the top are optional. And then you just write your comment, that's it.

Wouldn't this be vulnerable to mass spam attack? I decided the best thing to do was try it myself here as an experiment. If it failed then I could always pull the plug.

I'm still running that experiment today:

Status:

Aren't you afraid of abuse?

It's not hard for someone to write a script, hardcode my captcha in and try to spam or attack my site. My current system only prevents 'dumb' bots that randomly fill fields on every site.

I've had to take down other sites (such as wiki backends) before because of spam attacks. It's always fun to see CPU usage on hosts pegged at 100% because a new wonder drug is being blogged+about+on+the+frontpage every 30 seconds. It's even better to look back at the site's history and discover this has been happening for weeks or months. Poor server. From what I've seen there are many "forgotten" sites on the web either being spammed or completely exploited/infected.

Experiencing this has made me calmer about the whole situation. I know what it looks like and why people do it.

Below are three strategies. At the moment I'm only using the first; the others are future routes I'll take if I need to.

1. Throttling

Only a certain number of comments can be posted to my site per day and per week. I track this both per-IP and through a global counter, so even a distributed spam attack can only post X comments before the rest are blocked.
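The counters above can be sketched with nothing but flat files. This is a hedged illustration, not my actual code: the directory, limits and function names are all made up here.

```shell
#!/bin/bash
# Sketch of per-IP plus global daily throttling using flat counter files.
# STATE_DIR and the limits are illustrative, not the real site's values.
STATE_DIR="${STATE_DIR:-$(mktemp -d)}"
MAX_GLOBAL_PER_DAY=50
MAX_PER_IP_PER_DAY=3

allow_comment() {
    local ip="$1" today global_file ip_file count
    today=$(date +%Y%m%d)
    global_file="$STATE_DIR/global_$today"
    ip_file="$STATE_DIR/ip_${ip}_$today"

    # Global limit: one line is appended per accepted comment today
    count=0; [ -f "$global_file" ] && count=$(wc -l < "$global_file")
    [ "$count" -ge "$MAX_GLOBAL_PER_DAY" ] && return 1

    # Per-IP limit works the same way
    count=0; [ -f "$ip_file" ] && count=$(wc -l < "$ip_file")
    [ "$count" -ge "$MAX_PER_IP_PER_DAY" ] && return 1

    # Accepted: record it in both counters
    echo x >> "$global_file"
    echo x >> "$ip_file"
    return 0
}
```

Date-stamped filenames mean old counters expire by themselves; an occasional cleanup cron can delete stale files.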

Pros:

Cons:

2. Option: Moderation

Currently all comments get automatically published. If I have to I can change this to a moderated system where I have to give comments a tick before they appear.

Pros:

Cons:

3. Possibility: better captchas

This is a whole other story.

I think there are ways of rolling my own that don't require storing anything locally on the server, through some clever use of one-way hashes. I might actually try writing one -- a single .cgi file implementation that does not require any local storage would be amazing.
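One way such a storage-free scheme could work (a sketch of the general idea, not anything I've built -- SECRET and all the names are invented): hand the browser a token containing an expiry time plus a keyed hash of answer-and-expiry, then verify by recomputing the hash from the submitted answer. Nothing needs to be remembered server-side.

```shell
#!/bin/bash
# Sketch of a stateless challenge token: the token itself carries everything
# needed to verify it. SECRET and all names here are illustrative. A real
# system should use a proper HMAC (e.g. openssl dgst -hmac) rather than a
# plain prefixed hash.
SECRET="some-long-random-string"

keyed_hash() { printf '%s' "$SECRET:$1" | sha256sum | awk '{print $1}'; }

issue_token() {
    # Emit "expiry:mac" for a hidden form field, sent alongside the question
    local answer="$1" expiry
    expiry=$(( $(date +%s) + 3600 ))
    printf '%s:%s\n' "$expiry" "$(keyed_hash "$answer:$expiry")"
}

check_answer() {
    # Recompute the keyed hash from the submitted answer and compare
    local answer="$1" token="$2"
    local expiry="${token%%:*}" mac="${token#*:}"
    [ "$(date +%s)" -lt "$expiry" ] || return 1   # token expired
    [ "$(keyed_hash "$answer:$expiry")" = "$mac" ]
}
```

The server never learns which challenges are outstanding; it only checks that the answer it receives hashes to the token it once signed.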

Pros:

Curious implementation detail: no database

This site uses no databases other than the filesystem itself. Every comment is a folder, like these ones:

~/darksleep/public/blog/010_distrohop_p2 $ ls ds_comments/*

ds_comments/1476087963_27865:
author  content  url

ds_comments/1476251459_29803:
author  content  url

ds_comments/1483907474_24753:
author  content  url

ds_comments/1484129284_26265:
author  content  url

...

All user-provided data is stored in the files themselves instead of in the folder names, to prevent abuse. The folder names are just the current time (seconds since epoch) plus a random number to avoid collisions. A simple sort command gets them in order.
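Storing a comment this way takes only a few lines. A sketch (the ds_comments layout and the author/content/url file names mirror the listing above; the function names are mine):

```shell
#!/bin/bash
# Sketch of the folder-per-comment store described above. The layout matches
# the listing; store_comment/list_comments are made-up names for illustration.
store_comment() {
    local author="$1" url="$2" content="$3"
    local dir="ds_comments/$(date +%s)_$RANDOM"
    mkdir -p "$dir"
    # printf, not echo: these values are untrusted user input
    printf '%s' "$author"  > "$dir/author"
    printf '%s' "$url"     > "$dir/url"
    printf '%s' "$content" > "$dir/content"
    echo "$dir"
}

# Folder names start with the epoch time, so a plain sort gives posting order
list_comments() { ls ds_comments | sort; }
```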

You don't need a database until you're dealing with thousands and thousands of comments, and by then your site would probably be big enough to warrant the hassle. Until then: don't throw databases at problems your filesystem will happily solve.

A site that never was

I once had the idea of making the site static with zero on-server scripting. Commenting would be done through emails to a specific address, and a script on my desktop machine would read them. New pages would then get pushed to the real webserver.

This would work on even completely static free hosts (ones that allow no CGI or similar). My old ISP still offered something like 64MB of space just for this. I wonder if anyone has ever done this before.

Closing remarks

This site is statically generated. I think it's an absolutely brilliant (and easy) idea. Ignoring the speed benefits, it means the site stays up if I disable the commenting system/script.

In Ruben's case: his site is statically generated and uses a version control system. I'm not sure in which order or how this is set up (he might generate on his personal computer and then push via VCS), so it might be inconvenient for him to go down the road I have.

Suggested solution: write a small .cgi script that handles accepting comments and generating .html files containing nothing but them. Then [iframe] or similar them into your main pages, so you don't have to modify your main pages (or touch your vcs) when a comment gets added.
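The fragment-generation half of that suggestion could look something like this. A sketch only -- paths and names are invented, and the escaping is the part that actually matters:

```shell
#!/bin/bash
# Sketch of regenerating a standalone comments fragment that a static page
# pulls in via an iframe. File layout and names are invented here.

# Minimal HTML escaping so untrusted comment text can't inject markup
escape() { sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'; }

render_fragment() {
    local post_dir="$1" out="$post_dir/comments.html" d
    {
        echo '<!DOCTYPE html><meta charset="utf-8">'
        for d in $(ls "$post_dir/ds_comments" 2>/dev/null | sort); do
            printf '<div class="comment"><b>%s</b><p>%s</p></div>\n' \
                "$(escape < "$post_dir/ds_comments/$d/author")" \
                "$(escape < "$post_dir/ds_comments/$d/content")"
        done
    } > "$out"
}
```

The static pages never change; only the fragment file gets rewritten when a comment arrives.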

I've written my backend in bash, because it's really really easy to handle files in. Admittedly it's a little hard to keep things secure -- for instance you have to use 'printf' instead of 'echo' to print untrustworthy data -- but it's simple and fast. CGI lets you use any language you want, and I'd recommend giving shell a go.
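A concrete example of that printf-vs-echo gotcha:

```shell
#!/bin/bash
# 'echo' interprets some untrusted inputs; 'printf %s' reproduces them exactly.
evil='-n'
echo "$evil"            # bash's echo swallows this, treating it as a flag
printf '%s\n' "$evil"   # prints -n exactly as stored
```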

I have the urge to write a portable system anyone can run themselves and embed in their pages, but I don't have the time at the moment. This week has seen four lots of assessments and me getting behind in other work. I chose a good week to try and get my site back together.

Ruben: I'm good at writing long pages and making things seem complicated. Try making your own system.

Hint: an HTML form with some [input type='text'] fields and a [textarea] last makes for output that's very easy to process (in any language).
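For what it's worth, the urlencoded body such a form POSTs is easy to pick apart even in shell. A sketch (helper names are mine; real input needs more care -- stray '%' signs, repeated fields, size limits):

```shell
#!/bin/bash
# Sketch of parsing application/x-www-form-urlencoded data, as POSTed by a
# plain HTML form. Helper names are invented for illustration.
urldecode() {
    local s="${1//+/ }"        # '+' encodes a space
    printf '%b' "${s//%/\\x}"  # turn %HH into \xHH and let printf expand it
}

get_field() {
    # get_field name "a=1&name=Hales" prints "Hales"
    local want="$1" body="$2" pair
    local IFS='&'
    for pair in $body; do
        if [ "${pair%%=*}" = "$want" ]; then
            urldecode "${pair#*=}"
            return 0
        fi
    done
    return 1
}
```

In a CGI script the body arrives on stdin, CONTENT_LENGTH bytes of it, so `read -r -n "$CONTENT_LENGTH" body` gets you the string to feed into a parser like this.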

I'm happy to help out. Ask me questions about any problem; I probably thought of it at some point too :)

I'd also love to hear other people's opinions on this, if you're still reading.


EdS - Monday 16 April 2018

Just noting that you do have at least one reader! But you probably know that from your logs. With forum signups, there may be bots but there also seem to be humans who are probably paid a pittance to get past the captcha and create accounts for later abuse by spammers. Staying under the radar is a good start.

Hales - (site author) - Monday 16 April 2018

Hey Eds!

I only regularly look at logs specifically for my site's CGI (interactive) components. The normal page visits are spammed to a massive degree by bots of all types so it's hard to comprehend things there.

On that note:

2018-04-16T00:57:20+1000 n (x.x.x.x) main: attempting action 'comment_add'
2018-04-16T00:57:20+1000 n (x.x.x.x) fail user: action_comment_add: authorname too short
2018-04-16T00:57:32+1000 n (x.x.x.x) main: attempting action 'comment_add'
2018-04-16T00:57:32+1000 n (x.x.x.x) action_comment_add: author 'EdS' posted comment to '/blog/030_comment_blog_systems/'

Woops. I think I should relax that restriction. I presume you're Ed, not Eds? :P


> With forum signups, there may be bots but there also seem to be humans who are probably paid a pittance to get past the captcha and create accounts for later abuse by spammers

I remember reading something somewhat related to this once. The idea of putting more trust into the users that register. It was by someone who operated a paste-any-html style site that wanted to combat their site being used for abuse.

They introduced a signup system, and even paid tiers, in the hope of removing or reducing the spammers. They then found out it was the spammers who were most likely to sign up for the accounts :)

> Staying under the radar is a good start.

There's a few ways I can look at this and I'm not sure which one you mean. Avoiding publicity?

I thought long and hard about making this post -- whether or not discussing spam problems and the particular ways you could do it on my site could lead to spam -- but I settled on the belief that it's better to share problems than hide them. I think people should be prepared and understand, rather than be afraid.

Hales - (site author) - Sunday 22 April 2018

Ruben's reply: https://rubenerd.com/feedback-on-static-comments/

I'm glad you had more people than just me sharing ideas with you. From my POV, as a reader of your blog that occasionally sends you an email, I have no clue about how many other people actively do the same. No tumbleweeds, just chasms.

> The main downside there is people may not be enthusiastic about commenting if they either need their own blog to link back to mine,

If you are referring to my comment system: the "URL" field is completely optional. The only mandatory components are the Name, AntispamWord and CommentBody. You don't need a blog of your own to reply here.


> DW: For the love of god, DON’T DO BLOG COMMENTS!
> DW: [..]
> DW: People shitpost you for everything and think they are clever. It’s so tiring. Fuck that.

Definitely be prepared. But don't be afraid. It's your comment system, your universe. If people come to your universe to try and stuff you over, then they obviously don't know you're in control of the laws of physics here.

Chris Siebenmann - website - Wednesday 27 March 2019

The one anti-spam precaution I use that has taken out all of the automated spam targeted at my blog is a hidden text input field in the form (with a label that says 'please do not put anything here', in case people are leaving comments with lynx or some other non-CSS thing). Any comment submission with something in that field fails. It appears that general comment spam bots reliably stuff input into every field they can see, and so they all fail this check. None of my other anti-spam precautions even come close to being as effective as it.

I do periodically get successful spam that seems to be entered by human beings, but I can't do anything about that short of going to moderation and it's not enough so far to make me go that far. Since I watch my new comments, none of it lasts for very long and I generally block repeat instances of it (I have a collection of banned URLs and so on).

I'm still vulnerable to a mass automated spam attack, but it would have to be from something that either automatically recognizes a 'do not fill' field in some way or that was specifically set up to target me. So far, no one has bothered, although if I was a big site I'm sure that some spammers would.

Chris Siebenmann - website - Wednesday 27 March 2019

Also, it appears that your current comment system is not properly recognizing 'https://' URLs when entered in the URL field; they get a 'http://' put on the front when materialized in your HTML, which has some problems. (Firefox interprets them as a weird site name.)

Hales - (site author) - Wednesday 27 March 2019

Hello Chris,

I also have a hidden text field. Although it looks like it alone wouldn't be enough with my current setup. Take this logged spam event from today as an example:

> Posted content (first 100 chars of each):
> - pagepath: '/blog/013_antenna_spiralam/'
> - author: 'Mazuelbaica'
> - email: 'bunkomux[at]yandex.com'
> - url: ''
> - antispam1: ''
> - antispam2: ''

Antispam1 (called something else in the HTML form) requires the word irrlicht. Antispam2 (called something else in the HTML form) needs to be left blank.

If I didn't have both then it looks like I'd be accepting spam comments. It seems some bots are smart enough to leave blank any fields whose names they don't recognise.

Sidenote: I really encourage logging of failed form submissions, they give you insight both into spammer methods and problems actual people have with your site. Make sure you treat them as dangerous if you intend to read them in your browser, and provide some safeguards so that your disk doesn't fill up. Escaping for safe browser viewing is an interesting topic: https://wonko.com/post/html-escaping


> or that was specifically set up to target me. So far, no one has bothered, although if I was a big site I'm sure that some spammers would.

Yep. I hope never to reach that day, but if I do I'll have to push all comments into a 'moderator accepted only' mode. I miss the idea of public comments fields that are not filled with spam or slime, and I'd like to try and keep it open for as long as I can.


> your current comment system is not properly recognising 'https://' URLs when entered in the URL field

Yes, my URL detection and defusing code has a whole pile of problems. On the plus side this bad URL handling code did break a successful spammer's URL link once :D Thank you for reporting the problem, I'll see what I can do.

(On the negative side, it looks like Firefox interprets the links as http.com O.o)

There's a growing list of things I need to fix on this site. I've been putting a lot of effort into a fork of this site's code, intended for general wiki use (selling catchphrases "actually small", "HTTP_AUTH security", "HTML WYSIWYG", "drag and drop image support", "normal pages are static html" and "depends only on bash and a CGI webserver"). I hope to release it soon.

Chris Siebenmann - website - Monday 1 April 2019

You inspired me to look at the exact HTML of my comment form, and it turns out that one reason my hidden field works so well for me may be the name I gave it (compared to the name you gave yours). I specifically call mine 'name' in the HTML form, and based on the difference between our results, I suspect that this functions as bait for spambots that specifically look for text fields with certain sorts of names.

(Your results suggest that some of your field names may also be attracting bots, if they reliably stuff certain fields.)

I do log some data for rejected comment spammers, but I don't currently log all of it; basically I log the rejection reason, often including the field value, instead of the full comment submission. Mostly this is because of my personal views on the volume of my logs. If I was actively hunting comment spammers and figuring out their behaviors I definitely would want to log full information.

Hales - (site author) - Monday 8 April 2019

Sounds valid. A bot would almost never avoid filling a 'name' field, that would prevent it from working on a large chunk (most?) of the websites out there.

> basically I log the rejection reason, often including the field value, instead of the full comment submission.

I tried doing this for a while, but it turned out my summary reasons/methods were wrong. Example: EdS, the first person to comment at the top of this page. The error message I had coded was 'invalid length username: too short'. If he hadn't retried and worked around the problem himself, I would have skimmed right past it, assuming it was spam.

> Mostly this is because of my personal views on the volume of my logs

I can understand that. An ideal setup is one that (1) never requires manual maintenance and (2) only has succinct/exact logs, rather than ones full of hay.

This site is a bit of a hobby for me, so I'm happy to pay that price.


Add your own comment:

Name:
Email (optional):
URL (optional):
Enter the word 'irrlicht' (antispam):
Leave this box blank (antispam):

Comment (plaintext only):

If you provide an email address: it will only be used for the site admin to contact you, it will not be made public.

If you provide a URL: your name will hyperlink to it.