For those of you out there who, like me, don’t know a great deal about the techie side of the internet, this explanation is for you.
I have read about the robots.txt file before, but it never really sunk in what it actually does, so I figured the only way I would ever understand it was to do a bit of research and write about it. As you can probably tell by now, I love to research stuff.
So this guide will explain what the robots.txt file is in simple terms (because that’s the only way I can understand it) and I will provide a step-by-step guide to creating your own robots.txt file.
So first of all, what exactly is a robots.txt file? Well, as the file extension indicates, it is a text file. That means a plain text file, not an HTML file.
What does the robots.txt file do?
It tells the search robots not to crawl and index certain web pages, files or directories, so those pages won’t turn up in search results. This is useful if you have pages that need to be on the internet but that you don’t want the rest of the world stumbling across.
Bear in mind, however, that this won’t keep a web page completely private; it just means that the page won’t show up in search engine results. Anyone who has a browser and knows the URL, or can find the URL through other means, will still be able to see it.
Also keep in mind that not all search robots obey the robots.txt file; some will list your page in their search results anyway.
The best way to stop someone from seeing or accessing a page on your website is to password protect it.
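To give you an idea of what one looks like, a robots.txt file is just a short list of plain-text rules. The simplest possible example blocks every robot from the entire site:
User-agent: *
Disallow: /
The User-agent line names the robot the rule applies to (the * means every robot) and the Disallow line names the page or directory it should stay out of (a single / means the whole site).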
How do you create your own robots.txt file?
1. Determine which files or directories you want excluded.
2. Open Notepad (you can get to it by clicking Start > Programs > Accessories > Notepad) and type in the rules you need, depending on what you want to block (see the common rules just after this list).
3. Save this file as robots.txt.
4. Now you need to upload this file to the root directory of your website. I use Filezilla to upload files to the internet. The file needs to be sitting in the same folder as your home page (index.htm). In other words, once uploaded it should be reachable at: www.mydomain.com/robots.txt
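Here are a few common rules to get you started. These use the standard robots.txt format, so just swap in your own file and directory names:
To stop every robot from indexing a single page (this is the example I’ll walk through below):
User-agent: *
Disallow: /mypage.html
To stop just one particular robot (Google’s robot, for example, is called Googlebot):
User-agent: Googlebot
Disallow: /
To let every robot index everything (an empty Disallow line means nothing is blocked):
User-agent: *
Disallow: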
Let’s use an example here, just to be clear about how this is done. I have a page on my website which I want to exclude from the search engines. The page’s address is: www.mydomain.com/mypage.html.
I would now open up Notepad and type in the following text, based on the rules above:
User-agent: *
Disallow: /mypage.html
If the page was sitting in a directory called ‘mydirectory’, i.e. www.mydomain.com/mydirectory/mypage.html, then I would type in this:
User-agent: *
Disallow: /mydirectory/mypage.html
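And if I wanted to block the whole ‘mydirectory’ folder rather than just that one page, I could disallow the directory itself. Note the trailing slash, which covers everything inside the folder:
User-agent: *
Disallow: /mydirectory/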
Once that information is entered into Notepad, I save the file on my computer as robots.txt.
I would then open up Filezilla, which is the file transfer program I use to upload my web pages to the internet, and transfer the robots.txt file over to wherever my website pages are stored (the same folder as my home page).
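Once it’s uploaded, a quick way to check that it worked is to type www.mydomain.com/robots.txt into your browser. If your rules come up on the screen, the search robots will be able to read them too.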
Too easy! …well, hopefully you found it easy to do. You do need some idea of how to upload files to the internet, but if you can do that bit, everything else should be quite straightforward.