
weidagang2046's column

Knowledge is attained after things are investigated (物格而后知致)

    Anti-Crawler Script

    Description

    An ASP script which can be adapted to keep specific crawlers (robots/spiders) out of your ASP-based website, or whose rules you can apply to a single page. The latest version of the script lets you filter sources by user-agent (either full matching or partial/wildcard matching via regular expressions) or by IP address (again, either full or partial matching is supported).

    Although the script ships with a default set of rules installed to allow it to work reliably without any changes, you can quite easily remove these rules if you don't like them - creating your own powerful set of rules is remarkably easy.

    Why ban a crawler / spider / application from accessing your website? Lots of reasons - sometimes it's obvious they are only using your site to harvest email addresses; other times they might be using excessive amounts of resources or going into areas they are supposed to be excluded from. Banning them keeps them away from your data, and it also reduces the bandwidth they consume - all the more so because if their first request is denied they cannot discover and retrieve the rest of your site.

    What sort of crawlers / spiders / applications are we talking about? Most of the time the elements people want to ban aren't associated with any major search engine, and for the few that do offer a search service I doubt they bring much (if any) traffic to the sites they crawl. The majority of the elements that get banned exist only to further their own goals and won't help your site in the long run.

    Requirements

    • IIS
    • RegExp object

    Single Compressed Download

    Individual Components

    Installation & Setup

    1. Save the source code file into a directory somewhere within your webroot - for any examples we've assumed the resulting file is called denycrawler.asp.
    2. Include the code into either a pre-existing common include file or a single page (e.g. <!-- #INCLUDE VIRTUAL="/myfolder/denycrawler.asp" -->). As the file doesn't include any code which will run automatically placement above or below existing includes shouldn't be an issue.
    3. Finally, call the function DenyCrawler() from within your include or page. In order to function correctly this needs to be called before any headers or page content is written - this ensures that if it needs to deny a request it can respond with a minimal page complete with an explanation.
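    Putting the three steps together, a minimal protected page might look like the sketch below (the include path and the DenyCrawler() name are taken from the examples above; the rest is illustrative):

    ```
    <%@ Language="VBScript" %>
    <!-- #INCLUDE VIRTUAL="/myfolder/denycrawler.asp" -->
    <%
    ' Run the filter before any headers or page content are written,
    ' so a denied request can still receive the minimal explanation page.
    Call DenyCrawler()
    %>
    <html>
    <body>Normal page content goes here.</body>
    </html>
    ```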

    User Guide

    The majority of this script relies on pattern matching, specifically regular expressions which I've gathered over time based on historical traffic for this site. While the default ruleset isn't perfect, it allows most users to use the script immediately - if you feel comfortable writing regular expressions, or just want to stop one specific source of requests, feel free to erase the defaults.

    If you need to test the deny function for yourself on a development system, just add an extra line to BadUA_Test - for example, UA_Add "Mozilla", sUserAgentList will match the majority of browsers, letting you see the deny screen in action. Don't try this on a live system, however, because you'll block all traffic!
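    As a sketch, that development-only rule is a single extra line inside BadUA_Test (sUserAgentList is the list variable named above):

    ```
    ' WARNING: development systems only - "Mozilla" appears in almost
    ' every browser's user-agent string, so this denies nearly everyone.
    UA_Add "Mozilla", sUserAgentList
    ```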

    Equally if you ever want to ban an IP address then it works in much the same way - there are several examples listed inside BadIP_Test should you need to attempt this.

    Now onto the technical part.

    If you ever need to get involved with writing a lot of complex rules, the main thing to remember is that, to save time, the script merges all the unique regular expressions into one large expression, combined with the OR operator, allowing one rule to be evaluated rather than cycling through several different rules. However, coding the data into the script in this form would make it hard to read and even harder to maintain, so UA_Add is used to build these strings on the fly. It has two parameters - your regular expression string, followed by the variable being used to hold the combined string.
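    A minimal sketch of how such a helper could work - the real UA_Add may differ in detail, and the patterns below are hypothetical, not the script's defaults:

    ```
    ' Append a pattern to the combined expression, separated by the
    ' regular-expression OR operator "|". VBScript passes arguments
    ' ByRef by default, so the caller's list variable is updated in place.
    Sub UA_Add(sPattern, sList)
        If Len(sList) > 0 Then sList = sList & "|"
        sList = sList & sPattern
    End Sub

    Dim sUserAgentList
    sUserAgentList = ""
    UA_Add "EmailSiphon", sUserAgentList   ' hypothetical patterns,
    UA_Add "WebCopier", sUserAgentList     ' not the script's defaults
    ' sUserAgentList is now "EmailSiphon|WebCopier"
    ```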

    The starting point for the script is DenyCrawler(), which uses EmptyUA_Test, BadUA_Test and BadIP_Test to determine whether the request should be served; these rules are checked in the order listed above.
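    In outline, the flow looks something like this - a sketch only, and DenyRequest is a hypothetical helper standing in for whatever writes the minimal deny page:

    ```
    Sub DenyCrawler()
        Dim bDeny
        bDeny = EmptyUA_Test()                 ' checked first
        If Not bDeny Then bDeny = BadUA_Test() ' then the user-agent rules
        If Not bDeny Then bDeny = BadIP_Test() ' then the IP rules
        If bDeny Then DenyRequest              ' hypothetical deny-page helper
    End Sub
    ```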

    EmptyUA_Test is just a simple piece of logic which checks whether an empty or single-character user-agent string is being used; if that comes back negative, BadUA_Test is called.

    BadUA_Test checks whether the current user-agent string matches any element in a list of regular expressions; this provides the flexibility to use exact matches, partial matches or any other type of pattern you're capable of creating.
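    The matching itself only needs the RegExp object listed in the requirements - roughly like this (a sketch, not the script's actual code; sUserAgentList is assumed to hold the combined expression):

    ```
    Function BadUA_Test()
        Dim oRE
        Set oRE = New RegExp
        oRE.IgnoreCase = True
        oRE.Pattern = sUserAgentList   ' the combined "a|b|c" expression
        ' True if the visitor's user-agent matches any banned pattern
        BadUA_Test = oRE.Test(Request.ServerVariables("HTTP_USER_AGENT"))
    End Function
    ```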

    BadIP_Test provides an IP filtering element, working in a similar way to BadUA_Test in as much as it takes a series of regular expressions which describe the IP addresses you want to ban. There are no default rules included in this function; it's designed to let a user filter out a portion of their traffic with a high level of accuracy - something that wasn't possible with just the user-agent based tests.
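    For example, hypothetical rules you might add inside BadIP_Test (assuming the same list-building helper and a list variable named sIPList; remember that . must be escaped in a regular expression, and the ^/$ anchors keep a partial match from being too broad):

    ```
    UA_Add "^198\.51\.100\.", sIPList     ' hypothetical: ban everything in 198.51.100.*
    UA_Add "^203\.0\.113\.7$", sIPList    ' hypothetical: ban one exact address
    ```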

    Related Links

    • Defend your e-mail address - an ASP script which uses the same type of ruleset in conjunction with other filters to make it a lot harder for e-mail harvesters to scrape your e-mail address off your webpages.
    Evolved Code - ASP, SQL & VB meet the internet.


    posted on 2006-12-18 21:42 by weidagang2046 · views: 982 · comments: 0 · category: Search Engine
