web crawlers and bots

  • le pire
    Senior Member
    • Mar 2001
    • 1113


    I was looking in the server log for my web report and found these two URLs accessing my site:





    Does anyone know about things like this, and how to keep them out?


    étienne
  • Steven Ragatz
    Senior Member
    • Feb 2001
    • 493

    #2
    From the link you posted:

    Q: How can I completely exclude TurnitinBot from my site?
    To exclude TurnitinBot from all or portions of your site, all you have to do is create a file called robots.txt and put it in the topmost directory of your web site.
    Below is an example of a robots.txt file which excludes ONLY our robot from a portion or all of your site.

    #This is an example robots.txt file
    User-agent: TurnitinBot
    Disallow: /hide/ #Will disallow any url starting with /hide/


    #This is an example robots.txt file
    User-agent: TurnitinBot
    Disallow: / #Will disallow all urls on your site

    Another alternative is to contact us with your domain name and all its aliases (for instance www.somewhere.com, www2.somewhere.com, somewhere.com, etc.) and we'll add them to our blacklist of sites to avoid.

    Steve
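A robots.txt file is only a request: the server does not enforce it, and it works only because compliant crawlers read it before fetching pages. As an illustration (not TurnitinBot's actual code), Python's standard urllib.robotparser module can check how a well-behaved crawler would interpret the first example above. The helper name allowed, the agent name SomeOtherBot, and the paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The first example robots.txt from the thread: block TurnitinBot from /hide/ only.
# Inline "#" comments are stripped by the parser before the rules are read.
ROBOTS_TXT = """\
#This is an example robots.txt file
User-agent: TurnitinBot
Disallow: /hide/ #Will disallow any url starting with /hide/
"""


def allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if a robot that honors robots.txt may fetch `path`.

    Hypothetical helper for illustration; a real crawler would download
    /robots.txt from the site instead of receiving it as a string.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    checks = [
        ("TurnitinBot", "/hide/essay.html"),   # matched by Disallow: /hide/
        ("TurnitinBot", "/index.html"),        # outside /hide/
        ("SomeOtherBot", "/hide/essay.html"),  # rules name only TurnitinBot
    ]
    for agent, path in checks:
        print(agent, path, allowed(ROBOTS_TXT, agent, path))
```

Because the rules name only TurnitinBot, every other agent is allowed everywhere, and a bot that never requests robots.txt is not affected by it at all.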


    • le pire
      Senior Member
      • Mar 2001
      • 1113

      #3
      Steve,

      Yeah, I saw that. I was wondering if there is some way to keep out other bots, ones that may not leave a calling card like "turnitin.com" and "nameprotect.com".

      I was also wondering whether anyone knows of other kinds of bots, and whether they pose any kind of threat.


      étienne


      • Jim
        Administrator
        • Dec 2000
        • 1096

        #4
        You can put a robots meta tag in the <head> section of your pages; check out this page:



        This asks all robots not to index your pages (only well-behaved robots obey it), and that includes search engine robots, too. It's up to you.

        Jim
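The link in the post did not survive, but the usual form of this tag is <meta name="robots" content="noindex, nofollow"> placed inside <head>. Like robots.txt, it is honored by the crawler rather than enforced by the server. The sketch below, using Python's standard html.parser module, shows roughly how a compliant crawler might check a page for a noindex directive; the class RobotsMetaParser and the function may_index are hypothetical names.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags.

    Hypothetical helper for illustration, not any particular crawler's code.
    """

    def __init__(self) -> None:
        super().__init__()
        self.directives = set()  # e.g. {"noindex", "nofollow"}

    def handle_starttag(self, tag, attrs):
        # html.parser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for token in attr_map.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def may_index(html: str) -> bool:
    """Return False if the page asks robots not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noindex" not in parser.directives


PAGE = """<html><head>
<meta name="robots" content="noindex, nofollow">
<title>Private page</title>
</head><body>Hello</body></html>"""

if __name__ == "__main__":
    print(may_index(PAGE))
```

A search engine that honors the tag skips the page the same way, which is why the tag hides the site from search engines as well as from other bots.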

