.htaccess redirects and rewrites, and robots.txt optimisation – Updated 2015

Robots.txt optimisation

When we first wrote this post back in 2011, we outlined best practice for the robots.txt file: blocking irrelevant parts of the file structure to prevent search engines from indexing certain areas of a site. This was considered best practice until late 2014, when Google released the Fetch and Render tool. Since then, blocking parts of your site such as the theme and plugin folders can have negative effects. Google's guidelines now ask that bots be allowed to crawl any part of the site containing CSS, scripts and anything else that can alter how a page appears to a user.
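In practice, following that guidance usually means adding explicit Allow rules rather than stripping out every Disallow. A sketch for a WordPress site (the paths below assume a default WordPress layout, so adjust them to your install):

```
User-agent: Googlebot
Allow: /wp-content/themes/
Allow: /wp-content/plugins/
Allow: /wp-includes/js/
```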

As a result, we now generally keep robots.txt to the bare basics and adjust it per situation. For reference, here is our base robots.txt, which blocks some resource hogs, scanners and potentially bad bots.

User-agent: *

Disallow: /cgi-bin/

Sitemap: https://www.YOURDOMAIN.co.uk/sitemap_index.xml

User-agent: MJ12bot
Disallow: /

User-agent: Yandex
Disallow: /

User-agent: moget
User-agent: ichiro
Disallow: /

User-agent: NaverBot
User-agent: Yeti
Disallow: /

User-agent: Baiduspider
User-agent: Baiduspider-video
User-agent: Baiduspider-image
Disallow: /

User-agent: sogou spider
Disallow: /

User-agent: YoudaoBot
Disallow: /

User-agent: AhrefsBot
Disallow: /

Redirect non-www to www:

RewriteEngine On

RewriteBase /
RewriteCond %{HTTP_HOST} ^yourdomain\.com$ [NC]
RewriteRule (.*) http://www.yourdomain.com/$1 [R=301,L]

Redirect www to non-www:

RewriteEngine On

RewriteBase /
RewriteCond %{HTTP_HOST} ^www\.yourdomain\.com$ [NC]
RewriteRule ^(.*)$ http://yourdomain.com/$1 [L,R=301]

Protecting your .htaccess file

You really don't want people looking at your .htaccess file, so use this to block them:


<Files .htaccess>
order allow,deny
deny from all
</Files>

Redirect dedicated IP to domain (duplicate content fix)

# IP TO DOMAIN REDIRECT
RewriteCond %{HTTP_HOST} ^211\.122\.10\.10$
RewriteRule ^(.*)$ https://www.yourdomain.co.uk/$1 [L,R=301]

Redirect http to https/SSL

RewriteEngine On
RewriteCond %{HTTPS} !=on
RewriteRule ^(.*)$ https://%{SERVER_NAME}/$1 [R=301,L]
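If you also canonicalise the host name, the HTTPS and www redirects can be combined so visitors get a single 301 rather than a chain of two. A sketch, assuming www.yourdomain.co.uk is the canonical host (swap in your own domain):

```apache
RewriteEngine On
# Redirect if the request is not HTTPS, or is not on the www host
RewriteCond %{HTTPS} !=on [OR]
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^(.*)$ https://www.yourdomain.co.uk/$1 [R=301,L]
```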

Single Page 301 htaccess redirect

Redirect 301 /oldfileorurl.php https://www.domain.co.uk/new-page

Redirect entire directory and child URLs to one page or domain

RedirectMatch 301 ^/old-directory/ https://www.newdomain.co.uk/new-directory
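The rule above collapses every child URL into the single new page. If the new site mirrors the old directory structure, a capture group can preserve the rest of the path instead (hypothetical domains, as above):

```apache
RedirectMatch 301 ^/old-directory/(.*)$ https://www.newdomain.co.uk/new-directory/$1
```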

Enable Gzip Compression


<IfModule mod_deflate.c>
AddOutputFilterByType DEFLATE text/plain
AddOutputFilterByType DEFLATE text/html
AddOutputFilterByType DEFLATE text/xml
AddOutputFilterByType DEFLATE text/css
AddOutputFilterByType DEFLATE application/xml
AddOutputFilterByType DEFLATE application/xhtml+xml
AddOutputFilterByType DEFLATE application/rss+xml
AddOutputFilterByType DEFLATE application/javascript
AddOutputFilterByType DEFLATE application/x-javascript
AddOutputFilterByType DEFLATE application/x-httpd-php
AddOutputFilterByType DEFLATE application/x-httpd-fastphp
AddOutputFilterByType DEFLATE image/svg+xml
</IfModule>

Expires Header caching (Leverage Browser Caching)


<IfModule mod_expires.c>
ExpiresActive On
ExpiresByType image/jpg "access plus 2 weeks"
ExpiresByType image/jpeg "access plus 2 weeks"
ExpiresByType image/gif "access plus 2 weeks"
ExpiresByType image/png "access plus 2 weeks"
ExpiresByType text/css "access plus 2 weeks"
ExpiresByType application/pdf "access plus 2 weeks"
ExpiresByType text/x-javascript "access plus 2 weeks"
ExpiresByType application/x-shockwave-flash "access plus 2 weeks"
ExpiresByType image/x-icon "access plus 2 weeks"
ExpiresDefault "access plus 2 weeks"
</IfModule>

Prevent directory listing / Browsing

IndexIgnore *
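Note that IndexIgnore * only hides entries from an automatically generated listing. Where the host allows Options to be overridden in .htaccess, directory listings can be switched off outright, which we'd generally prefer:

```apache
Options -Indexes
```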

WordPress Hardening

Below are a couple of .htaccess additions that will help harden a WordPress-based website.

Protect wp-config.php


<Files wp-config.php>
order allow,deny
deny from all
</Files>

Secure / harden the WordPress includes folder


<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^wp-admin/includes/ - [F,L]
RewriteRule !^wp-includes/ - [S=3]
RewriteRule ^wp-includes/[^/]+\.php$ - [F,L]
RewriteRule ^wp-includes/js/tinymce/langs/.+\.php - [F,L]
RewriteRule ^wp-includes/theme-compat/ - [F,L]
</IfModule>