Reducing website vulnerability to hacking

There is no such thing as a hack-proof website, but hand-coded sites are more difficult to crack.  They usually have much simpler content, and their structure is known only to the person who coded them.

Content Management Systems (CMS) are much easier to administer, but they're also easier to penetrate because all CMSs have standard structures, details of which are very easy to find - just by downloading one and examining its contents.

Why these slime-buckets bother to do this escapes me, but we live in a dangerous world inhabited by some very strange people.

The vulnerability comes through use of PHP - a server-side programming language.
If misused, it can do a lot of damage.
Used correctly, PHP is very secure - but there is too much sloppy programming out there...

I run three websites - this one (stevegs.com) is entirely hand-coded.
I run the others on behalf of organisations, which means (among other things) others must be able to make changes to content of these websites - so the only feasible way is to use a CMS.

One of these is for a Theatre: it lists our programme of events and allows punters to buy tickets.  It uses Wordpress and, having been kept up-to-date throughout its life, can run the latest version (V6.1.1 at the time of writing).

The other is an online local news site that uses Joomla.  This has been going since 2007 - it includes a significant amount of historically interesting material.  For some reason best known to Joomla, the format of the database (which all CMSs use to store their user content) changed from V1.5 to V1.6.  Those running it at the time did not have the resources to make the old database compatible with the new system, so it limped on with an outdated version of Joomla.  The best I could do when I inherited it was to get it to the latest version possible, which was V1.5.26.

Wordpress and Joomla are probably the most common CMSs around, so they are the ones most targeted by hackers.  The more up-to-date a CMS is, the less vulnerable it is, but keeping one step ahead is always a pain.

There are several other things you can do to make a hacker's life more difficult - doing so is like putting better locks on your house: if they find it more difficult to break in, they'll go elsewhere.

  1. Use a strong password to get into the back end (management side) of your website.  There are plenty of password-generating websites about, but some of the results they produce are so cryptic that you need to write them down - another vulnerability.  So do something like removing the vowels from a memorable word; inserting capital letters part-way through; spelling it backwards; inserting a memorable number; replacing some letters with numbers [like e → 3 or o → 0 (zero)]; including at least one non-alphanumeric character [eg. /];
  2. The default user name for most CMSs is 'admin'.
    Don't use it - choose something else that isn't related to your website (eg. 'wombat' would not be good for wombat-exterminators.com);
  3. Another way in is via FTP (File Transfer Protocol).  Once in, a hacker could modify or delete your files.
    FTP is too insecure: it sends your password unencrypted.  SFTP (SSH File Transfer Protocol) is better, and full SSH (Secure Shell) access is better still, but not all web hosting providers offer these.
    So, make your user name and password for FTP/SFTP/SSH sufficiently obscure as well.
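
As a rough way of comparing candidate passwords, the sketch below estimates how many guesses a brute-force attack would need, expressed in bits.  It is only an illustration: the function name estimate_password_bits and the character-pool sizes are assumptions, and it ignores dictionary attacks, so treat the result as an upper bound.

```php
<?php
// Crude upper-bound estimate of password strength, in bits.
// Assumes an attacker tries every combination of the character classes used;
// a dictionary attack on a lightly disguised word will do far better than this.
function estimate_password_bits(string $password): float
{
    $pool = 0;
    if (preg_match('/[a-z]/', $password)) { $pool += 26; }        // lower-case letters
    if (preg_match('/[A-Z]/', $password)) { $pool += 26; }        // capitals
    if (preg_match('/[0-9]/', $password)) { $pool += 10; }        // digits
    if (preg_match('/[^a-zA-Z0-9]/', $password)) { $pool += 33; } // printable ASCII symbols, incl. space
    if ($pool === 0) {
        return 0.0;   // empty password
    }
    return strlen($password) * log($pool, 2);
}

// Example: a plain word against a mangled one (both made up for illustration).
foreach (array('wombat', 'tbmW0/62') as $candidate) {
    printf("%-10s about %.0f bits\n", $candidate, estimate_password_bits($candidate));
}
```

Each extra character class widens the pool, and each extra character multiplies the number of combinations, which is why length matters at least as much as mangling.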

However, you need to do other things to combat vulnerabilities in a CMS.  Some fixes for Wordpress and Joomla are listed below:

Wordpress:  Whether a bug or intended as a feature, until at least V4.6 anyone could find the user names of Wordpress administrators.  The default administrator user name in Wordpress is 'admin', and all users have ID numbers, starting at 1.
With those earlier versions of Wordpress, it is possible to find administrator user names on any such site by typing in the URL bar: www.hackablesite.com/?author=1

If the Wordpress default has been kept, the URL bar will change to: www.hackablesite.com/author/blog/admin/
- which is the giveaway.

Even if they used 'wombat' instead of 'admin', it will come up with: www.hackablesite.com/author/blog/wombat/
- but fortunately Wordpress has since addressed this: a hacker should now get an Error 403 page.

So the first step is to create a new administrator account on your site - with a non-obvious user name and strong password.
If there are no other users on your site [eg. editors (who can post blogs but have no administrative rights)], your new user will be number 2.  [Note: your new user cannot have the same email address as the old one - just put anything in for now.]

Now log out and log in as the new administrator.  Go to the User account and delete the old admin account.
You can (and must) now edit your new account to put in a meaningful email address.
So you think you are now secure?  Not really!

Under the old system, they could still try appending ?author=2 and so on to get admin usernames...
But there is another way hackers can get these names: by appending /wp-json/wp/v2/users to the site URL.  This will give a terse listing of all users who have posted to that site in the form:

[{"id":nn,"name":"User's Name","url":"","description":"","link":"https:\/\/hackablesite.com\/author\/user's_login_name", "slug":"user's_login_name","meta":[],"_links":{"self":[{"href":"https:\/\/hackablesite.com\/wp-json\/wp\/v2\/users\/nn"}], "collection":[{"href":"https:\/\/hackablesite.com\/wp-json\/wp\/v2\/users"}]}}]

This divulges the ID (shown as "nn" above), full name and login name of anyone who has posted.  Moreover, there is a host of other things they can see on your site - most of which would be visible anyhow on a normal visit, but some of which you might want to keep private (these might normally be protected by a password).  This is all part of the REST API, which is accessed through the virtual subdirectory /wp-json.  It is an extension to Wordpress that gives developers remote access to a Wordpress site - see Wordpress' documentation for more.

Now they've got your username, all they need to do is blast your site with multiple guesses at your password - which is dead easy to do with readily available software.  The more obscure your password is, the longer it will take, but they're likely to succeed eventually.  This can be mitigated to some extent by using a 'limit login attempts' plugin.  As its name implies, if anyone (yourself included) gets the password wrong a set number of times (say 4), they are locked out for (say) 20 minutes.
If they get locked out more than (say) 4 times, they can be locked out for much longer - preferably at least a week.
However, this is usually achieved by monitoring the hacker's IP address - but if the hacker is smart enough, he will have loads of IP addresses at his disposal and simply cycle through them....
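
The lockout rule described above can be sketched as follows.  This only illustrates the logic and is not the code of any particular plugin: the function names, the array layout and the figures (4 attempts, 20 minutes, 4 lockouts, one week) are assumptions taken from the examples in the text.  State is kept in a plain array here; a real plugin would store it in the database.

```php
<?php
const MAX_ATTEMPTS  = 4;              // wrong passwords allowed before a lockout
const SHORT_LOCKOUT = 20 * 60;        // 20 minutes, in seconds
const MAX_LOCKOUTS  = 4;              // short lockouts allowed before the long one
const LONG_LOCKOUT  = 7 * 24 * 3600;  // one week, in seconds

// Record one failed login from $ip at Unix time $now.
function record_failed_login(array &$state, string $ip, int $now): void
{
    $entry = $state[$ip] ?? array('failures' => 0, 'lockouts' => 0, 'locked_until' => 0);
    $entry['failures']++;
    if ($entry['failures'] >= MAX_ATTEMPTS) {
        $entry['failures'] = 0;
        $entry['lockouts']++;
        // More than MAX_LOCKOUTS lockouts so far earns the long ban.
        $length = ($entry['lockouts'] > MAX_LOCKOUTS) ? LONG_LOCKOUT : SHORT_LOCKOUT;
        $entry['locked_until'] = $now + $length;
    }
    $state[$ip] = $entry;
}

// True if $ip is still locked out at Unix time $now.
function is_locked_out(array $state, string $ip, int $now): bool
{
    return ($state[$ip]['locked_until'] ?? 0) > $now;
}
```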

Very few sites need this 'REST API', which in my view is an overall liability (just as the networking facility in Windows is to a standalone machine).  There are a number of plugins (eg. Wordfence) that prevent unauthorised access, and it's also possible to 'un-add' the REST API with code in your functions.php file.  To be fair to Wordfence, it does other things as well, including limiting login attempts and checking integrity of Wordpress core files.  More about how Wordfence's access restrictions work here.
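
For reference, restricting the REST API from functions.php might look something like the sketch below, which follows the approach suggested in the Wordpress REST API handbook: it hooks the rest_authentication_errors filter and refuses anyone who is not logged in.  The function name chackit_restrict_rest_api is made up for illustration, and the sketch has not been checked against every Wordpress version, so treat it as a starting point.  Logged-in users are let through because the Wordpress editor itself relies on the REST API.

```php
<?php
// Sketch for a theme's functions.php.  add_filter(), is_wp_error(),
// is_user_logged_in() and WP_Error are all supplied by Wordpress.
// The function name is hypothetical - call it what you like.
function chackit_restrict_rest_api($result)
{
    // An earlier authentication check has already decided: leave it alone.
    if ($result === true || is_wp_error($result)) {
        return $result;
    }
    // Nobody has vouched for this request yet: refuse visitors who aren't logged in.
    if (!is_user_logged_in()) {
        return new WP_Error(
            'rest_not_logged_in',
            'You are not currently logged in.',
            array('status' => 401)
        );
    }
    // Logged-in users (eg. the Wordpress editor) carry on as normal.
    return $result;
}

// Only register the filter when running inside Wordpress.
if (function_exists('add_filter')) {
    add_filter('rest_authentication_errors', 'chackit_restrict_rest_api');
}
```

Because the filter acts inside Wordpress, it applies however the REST route is reached, including the /?rest_route=/wp/v2/users query-string form.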

However, these just add extra bloatware.  In my view, it is far simpler to deflect any attempt to access /wp-json or any of its subdirectories to your site's standard error page with a one-liner in your .htaccess file (assuming an Apache server):

RedirectMatch 301 ^/wp-json.* /error404

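This uses a 'Regular Expression' (RegEx).  The ^ anchors the match to the start of the URL path, so it catches any path beginning /wp-json.  The .* matches any characters that follow, so it will also catch (eg.) /wp-json/wp/v2/posts and /wp-json/wp/v2/users/1.
Any match is redirected to /error404.  If you have a page at that address it will be shown; if not, the request for /error404 itself fails and the visitor gets your usual error page.
Note that the rule only looks at the path: a request of the form /?rest_route=/wp/v2/users carries the route in the query string, so this rule will not catch it.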
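
To check which addresses the rule will catch before going live, the same pattern can be tried in PHP, whose preg_match() uses the same style of regular expression as Apache.  This is just a test harness: the function name and the sample paths are made up for the purpose.

```php
<?php
// Apply the pattern from the RedirectMatch rule to a URL path.
// '#' is used as the delimiter so the slashes need no escaping.
function caught_by_wp_json_rule(string $path): bool
{
    return preg_match('#^/wp-json.*#', $path) === 1;
}

$samples = array(
    '/wp-json',
    '/wp-json/wp/v2/users/1',
    '/wp-jsonp',            // also caught: nothing says a slash must follow
    '/blog/wp-json',        // not caught: the ^ insists on the start of the path
    '/',                    // not caught
);
foreach ($samples as $path) {
    echo str_pad($path, 26), caught_by_wp_json_rule($path) ? 'redirected' : 'left alone', "\n";
}
```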


There is another vulnerability if your site has a password-protected members' section.
If anyone types  www.hackablesite.com/?s=wombat , the word 'wombat' will be found in the supposedly protected section if it exists, listing the entire article.  This still happens as of V6.1.

So, if you don't want the Great Unwashed coming to your society's barbecue, use the code below to disable this other Wordpress silly.  If you don't need the inbuilt search engine, you should really disable it (a good start is to modify your theme's 404.php file to remove any statement like  get_search_form()  ).
I can't see why the Wordpress search facility is still there - Google does a better job...
[But Google can still find your private information unless you hide it.
This is best done by using a robots.txt file in your root folder - see the end of this page.]

Joomla!:  A serious vulnerability that has existed ever since Joomla! was! first! released! was! identified! mid-December! 2015!  (No apologies for the exclamation marks - Joomla(!) clearly has the Yahoo(!) disease!) 
This enabled hackers to inject malicious code via the URL bar and also via the browser information that many sites use to serve appropriate pages for a particular browser (eg. a mobile version of the site if the browser says it's an iPhone).  The latter is done by falsifying this information.  All versions of Joomla up to and including 3.4.5 are vulnerable.

Joomla immediately released an update and, because this was so serious, also offered patched versions of the affected file (session.php) for V1.5 and V2.5.  Our news site was obviously a target, so I took it down immediately and updated its session.php - but I felt I had to do more.


Hence the following code, which catches any attempt to inject code this way, and also addresses the Wordpress 'search' vulnerability.  It is all in PHP, and should appear in your site's root folder.
To do this, copy all the following and paste it into Notepad (or similar).
Save the file as (eg.) chackit.php.  [I chose this name from check it for hacking.  'Chackit' is also Scots slang for 'drunk'!]

Note 1: in PHP, anything enclosed within /*....*/ is a comment.  In the following, these say what each part does.  Although the following might seem quite long (it's about 4k bytes), it won't noticeably increase the load time of your pages.

Note 2: I had to be somewhat devious because empty($_GET['s']) is true both if 's' is not specified and if 's' is specified but no value is given.  So the original attempt  if (!empty($_GET['s']))  didn't catch  mysite.com?s
Remedy: use the ?? operator to supply a default only where 's' is not specified - and set it to 'Boo!!' (or whatever you like - it will never appear on your page).
NB: The ?? operator (which gives the value to use if the variable before the ?? isn't set) needs PHP 7 or later.  As of PHP 8, reading a $_GET entry that isn't set without it (or an isset() test) will put up a warning.
Then we can check that if 's' is not 'Boo!!', the potential hacker would need to know the 'wombat' code to get in.
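
The reasoning in Note 2 can be tried out on ordinary arrays standing in for $_GET.  This is a throwaway test, not part of chackit.php (whose listing starts after it); the function name chackit_would_block is made up, and 6264 is just the example code used on this page.

```php
<?php
// $get stands in for $_GET so the logic can be tested without a web server.
function chackit_would_block(array $get, string $code = '6264'): bool
{
    $s      = $get['s'] ?? 'Boo!!';     // 'Boo!!' only when 's' is absent altogether
    $wombat = $get['wombat'] ?? '';
    return (!empty($get['author']) || $s !== 'Boo!!') && $wombat !== $code;
}

$requests = array(
    'no query string'       => array(),
    '?s  (no value)'        => array('s' => ''),   // the case !empty() alone misses
    '?s=wombat'             => array('s' => 'wombat'),
    '?s=wombat&wombat=6264' => array('s' => 'wombat', 'wombat' => '6264'),
    '?author=1'             => array('author' => '1'),
);
foreach ($requests as $label => $get) {
    echo str_pad($label, 24), chackit_would_block($get) ? 'blocked' : 'allowed', "\n";
}
```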

<?php
/**********  Written by Steve Glennie-Smith  31/10/2016. ************
Catch any attempt to inject malware through User Agent or URI strings.
Die if found.
This file should be PHP include(ed) right at the beginning of the root and any
admin index.php files.
Note: It just checks these strings and does NOT destroy their contents.
Also catch the glaring security risks in Wordpress that
  1) fire up the search engine ( /?s=search_text ), which might find something
     we don't want it to, and
  2) divulge user names ( /?author=nn - not a problem since V5(?) but kept for completeness).
Allow an override [eg. if (wombat == 6264)].
Choice of 'death': simulate error 403 (forbidden), 404 (not found) or
whatever you like - or nothing at all (which will display a blank page).
[The following $err403 & $err404 look the same as standard Apache error messages.]
**********************************************************************/

$err403 = '<html><head><title>403 Forbidden</title></head>
	<body><div style="text-align:center"><h1>403 Forbidden</h1><hr><p>nginx</p></div>
	</body></html>';

$err404 = '<html><head><title>404 Not Found</title></head>
	<body><h1>Not Found</h1><h3>The requested document was not found on this server.</h3><hr />
	<p><i>Web Server at themarkettheatre.com</i></p></body></html>';

/*	No need to check for known bad IPs - now done in .htaccess.
	Though add something like the following to check it works... */
//	if ($_SERVER['REMOTE_ADDR'] == '92.28.206.235') die ($err403);

/*	Wordpress only, but does no harm in Joomla.  If they try to get the
	administrator name or search for something in 'private' pages, give them
	an Error 404 - Not Found.  Allow access to this information with a code,
	so you can get at it by typing (eg.):  yoursite.com/?s=exterminate&wombat=6264
	Now if 's' is anything other than 'Boo!!' (including [null]), 'wombat' must be 6264 to use WP's search facility.
	'author' is kept for reference, but later versions of WP return Error 403 anyhow.	*/
$s = $_GET['s'] ?? 'Boo!!';
$wombat = $_GET['wombat'] ?? '';
if ( (!empty($_GET['author']) || ($s !== 'Boo!!')) && $wombat !== '6264')	{
	http_response_code(404);	// send a real 404 status, not just a page that looks like one
	die($err404);
}


/*	Check length of User Agent string: usually 80 to 90 chars long,
	but Crapple phones can be over 130.
	Block if > 256 chars by dying...   zzzzz....  */
$htua = $_SERVER['HTTP_USER_AGENT'] ?? '';	// some bots send no User Agent at all
if (strlen($htua) > 256)
	die('zzzzz....');

//	Similar for URI string...
$rqur = $_SERVER['REQUEST_URI'] ?? '';
if (strlen($rqur) > 256)
	die('zzzzz....');

//	Concatenate them to include a spacer known to us for further manipulation...
$checkit = $htua . '&####;' . $rqur;

//	Remove all % chars trying to escape themselves (%25 --> %)
$count = 0;
while (strpos($checkit, '%25') !== false)	{
	$checkit = str_replace('%25', '%', $checkit);
	$count += 1;
	if ($count > 50) die('Stuck');	// get out if we're stuck in a loop
}

/*	Remove all real and HTML escaped whitespace, incl. non-printing chars
	(Last two entries on first line below are: real tab - char 0x09 and non-breaking space - char 0xA0,
	written as escapes in double quotes so they survive copying and pasting)  */
$whitespace1 = array (' ', '+', '%A0', "\t", "\xA0",
	'%00', '%01', '%02', '%03', '%04', '%05', '%06', '%07',
	'%08', '%09', '%0A', '%0B', '%0C', '%0D', '%0E', '%0F',
	'%10', '%11', '%12', '%13', '%14', '%15', '%16', '%17', '%7F',
	'%18', '%19', '%1A', '%1B', '%1C', '%1D', '%1E', '%1F', '%20', );
$checkit = str_ireplace($whitespace1, '', $checkit);

//	And decode the remainder, which will be printing chars...
$checkit = urldecode($checkit);

//	If they try encoding printing chars in octal, decode them here...
$octs = array ( '\041', '\042', '\043', '\044', '\045', '\046', '\047',
	'\050', '\051', '\052', '\053', '\054', '\055', '\056', '\057',
	'\060', '\061', '\062', '\063', '\064', '\065', '\066', '\067',
	'\070', '\071', '\072', '\073', '\074', '\075', '\076', '\077',
	'\100', '\101', '\102', '\103', '\104', '\105', '\106', '\107',
	'\110', '\111', '\112', '\113', '\114', '\115', '\116', '\117',
	'\120', '\121', '\122', '\123', '\124', '\125', '\126', '\127',
	'\130', '\131', '\132', '\133', '\134', '\135', '\136', '\137',
	'\140', '\141', '\142', '\143', '\144', '\145', '\146', '\147',
	'\150', '\151', '\152', '\153', '\154', '\155', '\156', '\157',
	'\160', '\161', '\162', '\163', '\164', '\165', '\166', '\167',
	'\170', '\171', '\172', '\173', '\174', '\175', '\176', );

$chas = array ( '!', '"', '#', '$', '%', '&', '\'',
	'(', ')', '*', '+', ',', '-', '.', '/',
	'0', '1', '2', '3', '4', '5', '6', '7',
	'8', '9', ':', ';', '<', '=', '>', '?',
	'@', 'A', 'B', 'C', 'D', 'E', 'F', 'G',
	'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O',
	'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W',
	'X', 'Y', 'Z', '[', '\\', ']', '^', '_',
	'`', 'a', 'b', 'c', 'd', 'e', 'f', 'g',
	'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o',
	'p', 'q', 'r', 's', 't', 'u', 'v', 'w',
	'x', 'y', 'z', '{', '|', '}', '~', );

$checkit = str_replace($octs, $chas, $checkit);

/*	Remove all hex and octal escaped whitespace, incl. non-printing chars
	Cannot remove \xH yet where H is a single HEX digit in case they've included
	(eg. ) \x62 (= 'b')  */
$whitespace2 = array(' ', '+', '\xA0', '\240', "\t", "\xA0", '\177',
	'\010', '\011', '\012', '\013', '\014', '\015', '\016', '\017',
	'\020', '\021', '\022', '\023', '\024', '\025', '\026', '\027',
	'\030', '\031', '\032', '\033', '\034', '\035', '\036', '\037', '\040',
	'\00', '\01', '\02', '\03', '\04', '\05', '\06', '\07',
	'\10', '\11', '\12', '\13', '\14', '\15', '\16', '\17',
	'\20', '\21', '\22', '\23', '\24', '\25', '\26', '\27',
	'\30', '\31', '\32', '\33', '\34', '\35', '\36', '\37', '\40',
	'\0', '\1', '\2', '\3', '\4', '\5', '\6', '\7',
	'\x00', '\x01', '\x02', '\x03', '\x04', '\x05', '\x06', '\x07',
	'\x08', '\x09', '\x0A', '\x0B', '\x0C', '\x0D', '\x0E', '\x0F',
	'\x10', '\x11', '\x12', '\x13', '\x14', '\x15', '\x16', '\x17', '\x7F',
	'\x18', '\x19', '\x1A', '\x1B', '\x1C', '\x1D', '\x1E', '\x1F', '\x20',
	'\a', '\e', '\t', '\v', '\f', '<br>', '<br />', '<br/>', '\n', '\r', );
$checkit = str_ireplace($whitespace2, '', $checkit);

/*	In case they slipped in PHP hex escape codes of any printing characters,
	get them back as HTML codes and decode them	*/
$checkit = urldecode(str_ireplace('\x', '%', $checkit));

//	Then convert any remaining % chars back to \x to remove single char \x sequences
$checkit = str_ireplace('%', '\x', $checkit);
$whitespace3 = array(
	'\x0', '\x1', '\x2', '\x3', '\x4', '\x5', '\x6', '\x7',
	'\x8', '\x9', '\xA', '\xB', '\xC', '\xD', '\xE', '\xF', );
$checkit = str_ireplace($whitespace3, '', $checkit);


//	Catch attempts to inject eval(base64_()) scripts through the user agent or URI.
if ( (stripos($checkit,'base64_') !== false) || (stripos($checkit,'chr(') !== false)
  || (stripos($checkit,'eval(') !== false)   || (stripos($checkit,'eval\c') !== false)
  || (stripos($checkit,'sqli') !== false)    || (stripos($checkit,'$_SER') !== false)
  || (stripos($checkit,'strrev') !== false)  || (stripos($checkit,'rot13') !== false)
//	These have been tried before - don't look kosher...
  || (stripos($checkit,'Sqworm') !== false)  || (stripos($checkit,'link114.cn') !== false) )
	{ http_response_code(404); die($err404); }	// send a real 404 status with the page

/******* End of malware checking *************/
?>
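
The trick of unwrapping repeatedly escaped % characters, used near the start of the checks above, can be tried on its own.  The sketch below repackages that loop as a function and returns null instead of dying if it runs out of passes; the name unwrap_percent and the sample string are made up for illustration.

```php
<?php
// Collapse %25 (an escaped %) back to % until none are left.
// Returns null if more than $maxPasses passes are needed.
function unwrap_percent(string $text, int $maxPasses = 50): ?string
{
    $passes = 0;
    while (strpos($text, '%25') !== false) {
        $text = str_replace('%25', '%', $text);
        $passes += 1;
        if ($passes > $maxPasses) {
            return null;
        }
    }
    return $text;
}

// 'eval(' with its first letter escaped as %65, then the % escaped twice over.
$disguised = '%252565val(';
$unwrapped = unwrap_percent($disguised);   // two passes leave '%65val('
echo urldecode($unwrapped), "\n";          // which urldecode() turns into 'eval('
```

Without the unwrapping step, a single urldecode() would only peel off one layer and the later search for 'eval(' would find nothing.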

Now that you have created your chackit.php file and uploaded it to your site's root folder,
you will have to modify your root index.php file thus:-

<?php

/* Joomla or Wordpress introductory blurb */

/* Insert the following before any code in this file */
include 'chackit.php';

/* Leave the rest of the code alone...  */

You should also modify your admin index.php file similarly.
In Wordpress this lives in folder /wp-admin (in Joomla, /administrator):-

<?php

/* Joomla or Wordpress introductory blurb */

/* Insert the following before any code in this file.
NB: Since the file to be include(ed) is in the root, you need to use ../ (up one level) to show where it lives */
include '../chackit.php';
/* Leave the rest of the code alone...  */

NB.  If you update your CMS (which you should, as soon as an update becomes available), your modified  index.php  files will be overwritten (but  chackit.php,  being an extra file that your CMS doesn't know about, will be left alone).  So keep copies of your modified  index.php  files named (say)  index_modified.php  in the appropriate folders, to write back as necessary.
NB2.  The admin and root  index.php  files will almost certainly be different.  Don't overwrite with the wrong one!

Another way to ensure these files are not overwritten by an update is to make them read-only.  Be aware that if you let Wordpress update itself, it will throw an error when it can't overwrite them.  You can get round this by updating manually using (eg.) SFTP.

Done?  Not really....

Some persistent hackers can still gain access by creating a new supervisor account - probably because your hosting provider isn't running a properly secured server.  I had this problem with 123Reg, though they wouldn't admit it!  The problem is, few will offer anything other than the standard FTP transfer protocol.  If you want SSH or SFTP, most will try to sell you a VPS (Virtual Private Server) - and many charge megabucks for it.  I found Ionos wasn't too expensive: if you're hosting several sites, the cost can be spread between them.


A further line of defence is   .htaccess   - a file that is usually present on any Apache server (most, fortunately, are).  Both Wordpress and Joomla need this file (containing some of their own code) to be present in the root folder, but any sub-folder may also contain a .htaccess file.  Just be careful how you modify it, otherwise your entire site could crash!

Here is my Wordpress root folder   .htaccess   file.  Modifying a Joomla .htaccess file is similar - just add the section beyond the Wordpress-specific section.

Note 1: It is not advisable to remove standard Wordpress (or Joomla) files.
It is better just to deny access as below to those you don't want the Great Unwashed to see.

Note 2: Anything following the # sign is a comment but, unlike PHP, the # character must be the first on the line.

############### Start of standard Wordpress entries #############
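# Host-specific: this line selects the PHP version (the handler name varies between hosts).
# NB: chackit.php needs PHP 7 or later for the ?? operator, so pick the handler to suit.
AddType application/x-httpd-php56 .php .php5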

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>

# END WordPress
############### End of standard Wordpress entries #############

### The following standard Wordpress files can provide hackers
### a back door into your site.  Don't allow direct access.
<files wp-config.php>
# Apache 2.4 syntax:
	Require all denied
# Apache 2.2 syntax - use these instead if needed:
#	order deny,allow
#	deny from all
</files>

# This one is used for blogs.
# If you don't use blogs, don't allow access...
<files xmlrpc.php>
# Apache 2.4 syntax:
	Require all denied
# Apache 2.2 syntax - use these instead if needed:
#	order deny,allow
#	deny from all
</files>

<files wp-login.php>
# Only allow login to back end from your users' known 'good' IP addresses.
# This MUST tally with  wp-admin/.htaccess
# (A new file you must create)
# Apache 2.4:

	Require ip 2.96.0.0/13
	Require ip 2.120.0.0/13
	Require ip 2.216.0.0/13
# etc.
# Or Apache 2.2:
# 	order allow,deny
# This means Deny from all except the following:
# (must NOT precede with 'deny from all')
# If you have a fixed IP address and only you have access to WP's
# back end, only allow from that and disregard the following.
# Otherwise, these are common UK addresses - most hackers are from
# outside the UK.  The number following a / gives a range of IP
# addresses as a 'CIDR' code.  Plenty of references to that on Google...
#	allow from 2.16.0.0/12
#	allow from 2.96.0.0/13
#	allow from 2.120.0.0/13
#	allow from 2.216.0.0/13
#	allow from 62.24.128.0/17
# etc.

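# Can have exceptions within any block above.
# (In Apache 2.4 a 'Require not ip' line only works inside a <RequireAll>
# container alongside the rules above; the old syntax below is the
# alternative, and relies on mod_access_compat being loaded):
	deny from 146.0.0.0/9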
</files>	

# Block entire site from these...
# Start with 'foreign' bots:
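# (Comments must go on their own lines - see Note 2.  The patterns match
# anywhere in the User-Agent string because Baidu's and Yandex's bots
# announce themselves as 'Mozilla/5.0 (compatible; ...)'.)
# The main Chinese one:
SetEnvIfNoCase User-Agent "Baidu" bad_bot
# Another Chinese one:
SetEnvIfNoCase User-Agent "Sogou" bad_bot
# The main Russian one:
SetEnvIfNoCase User-Agent "Yandex" bad_bot
# Any Russian, Ukrainian, Belarusian or Chinese domain:
SetEnvIfNoCase Referer "\.ru|\.ua|\.by|\.cn" bad_ref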

<RequireAll>
    Require all granted
    Require not env bad_bot
    Require not env bad_ref
    Require not ip 46.116.0.0/14
# etc.
</RequireAll>

# or if Apache 2.2:
#	order allow,deny
#	allow from all

# This one has hacked my site - it's listed as either in Canada or
# Hong Kong, so no real harm done if a legitimate user from either
# country can't see this UK theatre's website
#	deny from 47.80.0.0/12


### The following will redirect any attempt to look at pages to which
### you don't want the public to have direct access...
# Redirect any attempt made by outsiders to comment
Redirect 301 /wp-comments-post.php /error404
Redirect 301 /wp-trackback.php /error404

# Likewise 'events' or anything directed from it
RedirectMatch 301 ^/event.* /error404
RedirectMatch 301 ^/comment.* /error404
RedirectMatch 301 ^/feed.* /error404
# And close this one, which could divulge admin names
RedirectMatch 301 ^/wp-json.* /error404

# Or to look at files like the WP licence etc....
Redirect 301 /readme.html /error404
Redirect 301 /license.txt /error404
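
To see exactly which addresses a CIDR code such as 2.96.0.0/13 covers before putting it in .htaccess, a few lines of PHP will do.  This is a sketch for IPv4 only, assuming a 64-bit build of PHP; the function names are made up for illustration.

```php
<?php
// Split 'a.b.c.d/n' into its network address (as an integer) and network mask.
// Returns null if the text isn't a valid IPv4 CIDR code.
function parse_cidr(string $cidr): ?array
{
    $parts = explode('/', $cidr, 2);
    if (count($parts) !== 2 || !ctype_digit($parts[1]) || (int) $parts[1] > 32) {
        return null;
    }
    $base = ip2long($parts[0]);
    if ($base === false) {
        return null;
    }
    $bits = (int) $parts[1];
    $mask = (~0 << (32 - $bits)) & 0xFFFFFFFF;   // top $bits bits set
    return array($base & $mask, $mask);
}

// First and last address covered by a CIDR code.
function cidr_range(string $cidr): ?array
{
    $parsed = parse_cidr($cidr);
    if ($parsed === null) {
        return null;
    }
    list($network, $mask) = $parsed;
    return array(long2ip($network), long2ip($network | (~$mask & 0xFFFFFFFF)));
}

// Is $ip inside the range given by $cidr?
function ip_in_cidr(string $ip, string $cidr): bool
{
    $parsed = parse_cidr($cidr);
    $addr   = ip2long($ip);
    if ($parsed === null || $addr === false) {
        return false;
    }
    list($network, $mask) = $parsed;
    return ($addr & $mask) === $network;
}

list($first, $last) = cidr_range('2.96.0.0/13');
echo "2.96.0.0/13 runs from $first to $last\n";
```

The smaller the number after the /, the bigger the block: /13 fixes only the first 13 bits of the address, so it spans eight values of the second number (here 96 to 103).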

Finally - the robots.txt file.  This must live in your site's root folder.  The format is as follows:
(NB. Comments are allowed: anything from a # to the end of the line is ignored.)

User-agent: *
Disallow: /my*/
Disallow: /private_folder/
Disallow: /folder1/something.htm
User-agent: googlebot
Disallow: /folder2/private_files/

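What this does is: The first line tells all search engines to do as directed in the next three lines.
Wildcards (*) are allowed: ie. line 2 says don't crawl any folder starting 'my', eg. /my_photos/ or /my_car/
The fifth line tells Google (only) not to crawl the folder in line 6 (which is two layers down).  Beware: a search engine that finds a section addressed to it by name follows that section only and ignores the * section, so to keep Google out of the other folders as well, repeat their Disallow lines under 'User-agent: googlebot'.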

Note 1. Most search engines comply with this file, so if you don't want your barbecue appearing on Google, 'Disallow' the relevant folder.  However, there is no guarantee all search engines will comply (hence good practice to disallow some foreign bots completely, as in .htaccess as above).

Note 2. When excluding any folder, all its sub-folders are also excluded.
However, since line 4 refers to a specific file, other files in that folder and all sub-folders will be crawled.
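
The matching rules described above can be sketched in PHP to check what a given Disallow line will and won't cover.  This is a simplified illustration (matching from the start of the path, plus the * wildcard only), not the code of any real search engine, and the function name is made up.

```php
<?php
// Does a Disallow rule cover the given URL path?
// A rule matches from the start of the path; * stands for any run of characters.
function disallow_covers(string $rule, string $path): bool
{
    if ($rule === '') {
        return false;   // an empty Disallow line blocks nothing
    }
    // Escape the rule for use in a regular expression, then turn each * into .*
    $pattern = '#^' . str_replace('\*', '.*', preg_quote($rule, '#')) . '#';
    return preg_match($pattern, $path) === 1;
}

$checks = array(
    array('/my*/', '/my_photos/'),
    array('/private_folder/', '/private_folder/sub/page.htm'),
    array('/folder1/something.htm', '/folder1/something.htm'),
    array('/folder1/something.htm', '/folder1/other.htm'),
);
foreach ($checks as $check) {
    list($rule, $path) = $check;
    echo str_pad($rule, 26), str_pad($path, 32),
         disallow_covers($rule, $path) ? 'not crawled' : 'crawled', "\n";
}
```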

####################### ends ########################