The Role of the Robots.txt File in Search Engine Optimization

Sometimes you need to tell a search engine crawler/spider which files it may visit and which it may not. If you do so, the search engine's spider will crawl only those files it has permission to visit. This is done using the robots.txt file.

Robots.txt file and Search Engine Optimization

The robots.txt file simply tells search engines which directories and web pages to visit and include in their search results, and which to avoid.

Search engines find your web pages and files by sending out robots (also called bots, spiders or crawlers) that follow the links found on your site, read the pages they find and store the content in the search engine databases.

Usually when the Googlebot finds a page, it reads all the links on that page and then fetches those pages and indexes them. This is the basic process by which Googlebot “crawls” the web.

But in some cases you may have directories and files you would prefer the search engine robots not to index. One main reason for doing this is duplicate content, i.e. the same content/text already being available on other pages.

How to Create a robots.txt File?

Robots.txt should be a plain ASCII text file. You can create the robots.txt file using Notepad or any other plain-text editor.

Syntax of robots.txt file

Here is the basic syntax of the robots.txt file:

User-Agent: [Spider Name]

Disallow: [File or Directory Name]

Examples

User-agent: *

Disallow: /music/

Disallow: /chat/

Disallow: /

The first line means that the instructions below apply to each and every robot; the asterisk is a wildcard matching all robots.

The second line means that robots may not visit/crawl the music directory.

The third line means that robots may not visit/crawl the chat directory (folder).

The fourth line means that robots may not visit/crawl the entire website; in other words, the whole website is blocked for robots.
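
The [Spider Name] placeholder can also name a specific robot instead of the * wildcard. For example, here is a sketch that applies one rule only to Google's crawler, Googlebot, and a different rule to everyone else (the directory names are only illustrations):

User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /chat/

Here Googlebot is asked to stay out of the /private/ directory, while all other robots are asked to stay out of /chat/.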

If you truly want to prevent search engine spiders from crawling your site, you should make it password protected, because search engines have been known not to respect robots.txt files from time to time.

Blocking a Single File

Disallow: /my_file.php (this file is placed in the root directory, and we are asking the crawler not to visit it)

Disallow: /directory/file.html
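
Note that every Disallow line must belong to a User-agent group. A complete robots.txt that blocks just these two files for all robots might look like this (the file names are only placeholders):

User-agent: *
Disallow: /my_file.php
Disallow: /directory/file.html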

Placement of Robots.txt File

You must place the robots.txt file in your website's root directory. This is more than a recommendation; it must be followed, because crawlers look for the file only at that location.
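
For example, if your site is www.example.com (a placeholder domain), crawlers will request the file only at this address:

https://www.example.com/robots.txt

A robots.txt file placed in a sub-directory will simply be ignored.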

Frequently Asked Questions about the robots.txt File

1) Is the robots.txt file necessary for Search Engine Optimization?

2) Can't we perform Search Engine Optimization without a robots.txt file?

The answer to both questions is that a robots.txt file is not strictly necessary. As described above, the robots.txt file just instructs the crawler which pages it must not crawl. If you want the spider/crawler to crawl all of your files, a robots.txt file is not a necessary element. SEO experts normally use robots.txt to instruct the spider not to visit certain files; when some files are disallowed, the spider/crawler will simply visit the remaining ones.
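
If you want to check how a crawler would interpret your robots.txt rules, you can test them yourself. Below is a minimal sketch using Python's standard urllib.robotparser module; the URL and the rules are hypothetical:

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# Parse the rules directly; a live check would instead use
# rp.set_url("https://www.example.com/robots.txt") followed by rp.read()
rp.parse([
    "User-agent: *",
    "Disallow: /music/",
    "Disallow: /chat/",
])

print(rp.can_fetch("*", "https://www.example.com/music/song.html"))  # False: blocked
print(rp.can_fetch("*", "https://www.example.com/index.html"))       # True: allowed

Well-behaved crawlers perform essentially the same check before fetching any URL on your site.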

Why should spiders/crawlers crawl our websites?

The answer is very simple: if the spider/crawler does not crawl your website, the website will not be in the search engine's database, so it cannot appear in the results for any search query a user performs. The spider/crawler/robot/bot (whatever you call it) feeds information about your website to the search engine, which then puts the website in its database and can display it against relevant searches. However, there is no guarantee that the search engine will show your website on the first page once the crawler has crawled it; you will also have to perform both on-page optimization and off-page optimization to move your website up in the search engine rankings.

Robots.txt File for Domains and Sub-domains
For websites with multiple sub-domains, each sub-domain must have its own robots.txt file. This is required because each sub-domain is treated as a separate site with its own files, so the rules for those files must be listed separately.
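
For example (example.com is a placeholder domain), each of these would need its own file at its own root:

https://www.example.com/robots.txt
https://blog.example.com/robots.txt
https://shop.example.com/robots.txt

The rules in www.example.com/robots.txt have no effect on blog.example.com, and vice versa.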
