robots.txt usage guide

robots.txt is a text file that tells search engine crawlers which parts of a web server they may access.
It must be located in the root directory of the site, and the filename must be all lowercase, that is, “robots.txt”, not “Robots.txt”.

I will give a few examples.
To deny access to all search engines, that is, to keep the site from being indexed, specify in robots.txt:

User-agent: *
Disallow: /

To allow all search engines to index the entire site:

User-agent: *
Disallow:

Suppose we want to tell search bots not to index certain directories, for example ixnfo.com/dir/ and ixnfo.com/dir2/. For this we specify in robots.txt:

User-agent: *
Disallow: /dir/
Disallow: /dir2/

Note that if you specify, for example:

Disallow: /dir

then we block not only the directory ixnfo.com/dir/, but also every other path that begins with /dir, such as the file ixnfo.com/dir.php.
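
This prefix matching can be verified with Python's standard urllib.robotparser module, which follows the same simple prefix rules as the original robots.txt convention (real crawlers may treat some edge cases differently, so treat this as a rough check; "AnyBot" is just a placeholder user agent):

from urllib import robotparser

# The rule without a trailing slash matches every path that starts with /dir.
rules = """
User-agent: *
Disallow: /dir
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("AnyBot", "https://ixnfo.com/dir/"))     # False
print(rp.can_fetch("AnyBot", "https://ixnfo.com/dir.php"))  # False
print(rp.can_fetch("AnyBot", "https://ixnfo.com/dir2/"))    # False (also starts with /dir)
print(rp.can_fetch("AnyBot", "https://ixnfo.com/other/"))   # True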

An example of blocking individual files from indexing:

Disallow: /file1.html
Disallow: /file2.html

In the User-agent line you can specify the name of a particular search bot, which lets you define rules for each one separately. For example, to allow the Googlebot, AdsBot-Google, and Yandex robots to index the site while forbidding all others:

User-agent: Googlebot
User-agent: AdsBot-Google
Disallow:

User-agent: Yandex
Disallow:

User-agent: *
Disallow: /
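
The same standard-library parser can be used to check how these groups resolve (again only an approximation of real crawler behavior; page.html is just an example URL):

from urllib import robotparser

rules = """
User-agent: Googlebot
User-agent: AdsBot-Google
Disallow:

User-agent: Yandex
Disallow:

User-agent: *
Disallow: /
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# The named bots match their own groups; everything else falls under "*".
print(rp.can_fetch("Googlebot", "https://ixnfo.com/page.html"))     # True
print(rp.can_fetch("Yandex", "https://ixnfo.com/page.html"))        # True
print(rp.can_fetch("SomeOtherBot", "https://ixnfo.com/page.html"))  # False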

At the end of robots.txt, you can also specify the path to the sitemap, for example:

Sitemap: https://ixnfo.com/sitemap.xml
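
The Sitemap directive is independent of the User-agent groups, so its position in the file does not matter. On Python 3.8 and newer, the same parser can read it back:

from urllib import robotparser

rules = """
User-agent: *
Disallow:

Sitemap: https://ixnfo.com/sitemap.xml
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)
print(rp.site_maps())  # ['https://ixnfo.com/sitemap.xml']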

See also my article:
Access Control Apache2
