Dynamic caching: after new content is published, the corresponding static page is not generated in advance. Only when a request for that content arrives and the front-end cache server cannot find a cached copy does it forward the request to the back-end content management server, which then generates the static page for that content. The first visit to a page may therefore be a little slow, but later visits are served straight from the cache.
If you visit foreign sites such as ZDNet, you will notice that the ones built on the Vignette content management system (http://www.vignette.com/) all have page names like 0,22342566,300458.html. The 0,22342566,300458 part is actually several parameters separated by commas. When the page is not found on first access, this amounts to running the query doc_type=0&doc_id=22342566&doc_template=300458 on the server side, and the query result is written out as the cached static page 0,22342566,300458.html.
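The name-to-query mapping and the "render on first miss" behavior described above can be sketched in Python. This is a minimal illustration, not Vignette's implementation. The field order (doc_type, doc_id, doc_template) comes from the example. The `serve` helper, its dict cache and its `render` callback are hypothetical.

```python
from urllib.parse import urlencode

# Field order taken from the example: 0,22342566,300458.html
FIELDS = ("doc_type", "doc_id", "doc_template")

def cache_name_to_query(filename: str) -> str:
    """Turn a comma-separated cache file name back into the query it stands for."""
    stem, dot, ext = filename.rpartition(".")
    if not dot or ext != "html":
        raise ValueError(f"not a cached page name: {filename!r}")
    values = stem.split(",")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} parameters, got {len(values)}")
    return urlencode(list(zip(FIELDS, values)))

def query_to_cache_name(params: dict) -> str:
    """Inverse mapping: the static file name under which a query result is cached."""
    return ",".join(str(params[f]) for f in FIELDS) + ".html"

def serve(filename: str, cache: dict, render) -> str:
    """Return the cached page; on a miss, build it from the query and keep it."""
    if filename not in cache:
        cache[filename] = render(cache_name_to_query(filename))
    return cache[filename]

print(cache_name_to_query("0,22342566,300458.html"))
# doc_type=0&doc_id=22342566&doc_template=300458
```

Only the first request for a given name calls `render`. Later requests are answered from the dictionary, which mirrors the "slow first visit, cached afterwards" behavior.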
Recently I have seen many articles on search-engine-oriented URL design (URI Pretty). They describe various mechanisms that make dynamic page parameters look like static pages.
For example, http://www.chedong.com/phpMan.php?mode=man&parameter=ls can become http://www.chedong.com/phpMan.php/man/ls
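Here is a minimal sketch of the two sides of that mapping. It assumes the parameter order is fixed per script: `mode` then `parameter`, as in the example. `to_pretty_url` and `from_path_info` are hypothetical helper names; phpMan.php itself is not shown in this article.

```python
from urllib.parse import urlsplit, parse_qsl, quote, unquote

def to_pretty_url(url: str, keys: tuple) -> str:
    """Rewrite script.php?k1=v1&k2=v2 as script.php/v1/v2, in the given key order."""
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query))
    # Quote each value so a "/" inside a value cannot create an extra segment.
    segments = "/".join(quote(params[k], safe="") for k in keys)
    return f"{parts.scheme}://{parts.netloc}{parts.path}/{segments}"

def from_path_info(path_info: str, keys: tuple) -> dict:
    """What the script does on the other side: split PATH_INFO back into named parameters."""
    values = [unquote(v) for v in path_info.split("/") if v]
    return dict(zip(keys, values))

KEYS = ("mode", "parameter")
print(to_pretty_url("http://www.chedong.com/phpMan.php?mode=man&parameter=ls", KEYS))
# http://www.chedong.com/phpMan.php/man/ls
print(from_path_info("/man/ls", KEYS))
# {'mode': 'man', 'parameter': 'ls'}
```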
IIS has corresponding rewrite modules too, such as ISAPI REWRITE (http://www.isapirewrite.com/) and IIS REWRITE (http://www.qwerksoft.com/products/iisrewrite/). Their syntax is based on regular expressions, so it is almost identical to Apache's mod_rewrite.
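The sketch below illustrates such a regular-expression rule. It rewrites the static-looking URL back into the dynamic query before the request reaches the script. The rule is a hypothetical example written with Python's `re`. mod_rewrite-style rules write backreferences as `$1`, `$2` rather than `\1`, `\2`.

```python
import re

# Hypothetical rule in the spirit of a RewriteRule:
# /phpMan.php/<mode>/<parameter>  ->  /phpMan.php?mode=<mode>&parameter=<parameter>
RULES = [
    (re.compile(r"^/phpMan\.php/([^/]+)/([^/]+)$"), r"/phpMan.php?mode=\1&parameter=\2"),
]

def rewrite(path: str) -> str:
    """Apply the first matching rule; paths that match no rule pass through unchanged."""
    for pattern, target in RULES:
        if pattern.match(path):
            return pattern.sub(target, path)
    return path

print(rewrite("/phpMan.php/man/ls"))  # /phpMan.php?mode=man&parameter=ls
print(rewrite("/index.html"))         # /index.html
```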
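Cache-friendly page design
For pages to be cached well by a cache server, the web server that produces the content must be configured to add "Last-Modified" and "Expires" declarations to the HTTP headers it returns, for example:
Last-Modified: Wed, 14 May 2003 13:06:17 GMT
Expires: Fri, 13 Jun 2003 13:06:17 GMT
To let a front-end Squid server cache a page: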
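The page must carry a Last-Modified: header. Purely static pages usually have Last-Modified information already. Dynamic pages have to add it explicitly with a function call, for example in PHP:
// always modified now
header("Last-Modified: " . gmdate("D, d M Y H:i:s") . " GMT");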
The page must set its expiry time with an Expires or Cache-Control: max-age header. For static pages, Apache's mod_expires sets the cache lifetime by MIME type. For example, images default to 1 month and HTML pages to 1 day:
<IfModule mod_expires.c>
ExpiresActive on
ExpiresByType image/gif "access plus 1 month"
ExpiresByType text/css "now plus 2 day"
ExpiresDefault "now plus 1 day"
</IfModule>
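For dynamic pages, the expiry can be written directly into the returned HTTP headers. For example, the news home page index.php might be given a lifetime measured in minutes, while an individual news article might expire after 1 day. The following PHP sets the expiry one month ahead:
// Expires one month later
header("Expires: " . gmdate("D, d M Y H:i:s", time() + 3600 * 24 * 30) . " GMT");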
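The header rules above can be sketched in Python. The lifetimes mirror the mod_expires example: 1 month for GIF images, 2 days for CSS, 1 day otherwise. A month is assumed to be 30 days, as in the PHP snippet. `cache_headers` and the lookup table are hypothetical names. The sketch emits Cache-Control: max-age alongside Expires.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# Hypothetical policy mirroring the mod_expires example (a month taken as 30 days).
EXPIRES_BY_TYPE = {
    "image/gif": timedelta(days=30),
    "text/css": timedelta(days=2),
}
EXPIRES_DEFAULT = timedelta(days=1)

def cache_headers(content_type: str, last_modified: datetime, now: datetime) -> dict:
    """Build Last-Modified / Expires / Cache-Control headers; datetimes must be in UTC."""
    lifetime = EXPIRES_BY_TYPE.get(content_type, EXPIRES_DEFAULT)
    return {
        "Last-Modified": format_datetime(last_modified, usegmt=True),
        "Expires": format_datetime(now + lifetime, usegmt=True),
        "Cache-Control": f"max-age={int(lifetime.total_seconds())}",
    }

now = datetime(2003, 5, 14, 13, 6, 17, tzinfo=timezone.utc)
for name, value in cache_headers("image/gif", now, now).items():
    print(f"{name}: {value}")
# Last-Modified: Wed, 14 May 2003 13:06:17 GMT
# Expires: Fri, 13 Jun 2003 13:06:17 GMT
# Cache-Control: max-age=2592000
```

With the 30-day lifetime, the output reproduces the two example headers given at the start of this section.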
Function engWeekDayName(dt)
  Dim Out
  Select Case WeekDay(dt,1)
    Case 1:Out="Sun"
    Case 2:Out="Mon"
    Case 3:Out="Tue"
    Case 4:Out="Wed"
    Case 5:Out="Thu"
    Case 6:Out="Fri"
    Case 7:Out="Sat"
  End Select
  engWeekDayName = Out
End Function

Function engMonthName(dt)
  Dim Out
  Select Case Month(dt)
    Case 1:Out="Jan"
    Case 2:Out="Feb"
    Case 3:Out="Mar"
    Case 4:Out="Apr"
    Case 5:Out="May"
    Case 6:Out="Jun"
    Case 7:Out="Jul"
    Case 8:Out="Aug"
    Case 9:Out="Sep"
    Case 10:Out="Oct"
    Case 11:Out="Nov"
    Case 12:Out="Dec"
  End Select
  engMonthName = Out
End Function
%>
<!--#include file="../include.asp"-->
<%
' set Page Last-Modified Header:
' Converts date (19991022 11:08:38) to http form (Fri, 22 Oct 1999 12:08:38 GMT)
Response.AddHeader "Last-Modified", DateToHTTPDate(Now())

' The Page Expires in Minutes
Response.Expires = 60

' Set cache control to external applications
Response.CacheControl = "public"
%>
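The engWeekDayName/engMonthName functions exist because HTTP dates must use English day and month names whatever locale the server runs in. Below is a Python equivalent of that idea, assuming the input is a Unix timestamp. `http_date` is a hypothetical name. In practice `email.utils.formatdate(ts, usegmt=True)` produces the same format.

```python
import time

# Fixed English names, like engWeekDayName / engMonthName in the ASP include:
# HTTP dates must not follow the server's locale settings.
WEEKDAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")  # tm_wday: Monday == 0
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

def http_date(timestamp: float) -> str:
    """Format a Unix timestamp as an RFC 1123 date in GMT, e.g. for Last-Modified."""
    t = time.gmtime(timestamp)
    return "%s, %02d %s %04d %02d:%02d:%02d GMT" % (
        WEEKDAYS[t.tm_wday], t.tm_mday, MONTHS[t.tm_mon - 1],
        t.tm_year, t.tm_hour, t.tm_min, t.tm_sec)

print(http_date(940594118))  # Fri, 22 Oct 1999 12:08:38 GMT
```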
Configuration:
http_port 80
httpd_accel_host virtual
httpd_accel_port 8000
httpd_accel_uses_host_header on

# accelerate my domain only
acl acceleratedHosts dstdom_regex chedong.com
# accelerate http protocol on port 80
acl acceleratedProtocol protocol HTTP
acl acceleratedPort port 80
# access acl
acl all src 0.0.0.0/0.0.0.0

# Allow requests when they are to the accelerated machine AND to the
# right port with right protocol
http_access allow acceleratedProtocol acceleratedPort acceleratedHosts
http_access deny all
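Squid evaluates http_access lines from top to bottom, and the first line whose ACLs all match decides. If no line matches, the default is the opposite of the last line. The sketch below is a simplified model of that rule for the configuration above, not Squid's code. ACLs are modeled as hypothetical predicates over a dict describing the request. The model also shows why a final `http_access allow all` would admit every request.

```python
import re

# Hypothetical predicates standing in for the ACLs defined above.
ACLS = {
    # dstdom_regex is an unanchored regular expression, like re.search
    "acceleratedHosts": lambda r: re.search(r"chedong.com", r["host"]) is not None,
    "acceleratedProtocol": lambda r: r["protocol"] == "HTTP",
    "acceleratedPort": lambda r: r["port"] == 80,
    "all": lambda r: True,
}
RULES = [
    ("allow", ["acceleratedProtocol", "acceleratedPort", "acceleratedHosts"]),
    ("deny", ["all"]),
]

def check_access(request: dict, acls: dict, rules: list) -> str:
    """First rule whose ACLs all match wins; otherwise the opposite of the last rule."""
    for action, names in rules:
        if all(acls[name](request) for name in names):
            return action
    return "deny" if rules[-1][0] == "allow" else "allow"

print(check_access({"host": "www.chedong.com", "protocol": "HTTP", "port": 80}, ACLS, RULES))  # allow
print(check_access({"host": "www.example.org", "protocol": "HTTP", "port": 80}, ACLS, RULES))  # deny
```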
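phpMan.php is a PHP-based man page server. Every man page requires calling the man command on the back end plus a number of page-formatting tools, so the system load is fairly high. The script provides cache-friendly URLs. Below are performance test results for the same page:
Test environment: Red Hat 8 on a Cyrix 266 with 192 MB of memory
Test program: Apache's ab (ApacheBench)
Test conditions: 50 requests, 5 concurrent connections
Test items: Apache 1.3 directly (port 80) vs. Squid 2.5 (port 8000, accelerating port 80)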
Test 1: dynamic output on port 80 with no cache:
ab -n 100 -c 10 http://www.chedong.com:81/phpMan.php/man/kill/1
This is ApacheBench, Version 1.3d <$Revision: 1.58 $> apache-1.3
Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright (c) 1998-2001 The Apache Group, http://www.apache.org/

Benchmarking localhost (be patient).....done
Server Software: Apache/1.3.23
Server Hostname: localhost
Server Port: 80

Concurrency Level: 5
Time taken for tests: 63.164 seconds
Complete requests: 50
Failed requests: 0
Broken pipe errors: 0
Total transferred: 245900 bytes
HTML transferred: 232750 bytes
Requests per second: 0.79 [#/sec] (mean)
Time per request: 6316.40 [ms] (mean)
Time per request: 1263.28 [ms] (mean, across all concurrent requests)
Transfer rate: 3.89 [Kbytes/sec] received

Connnection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    29  106.1      0   553
Processing:  2942  6016 1845.4   6227 10796
Waiting:     2941  5999 1850.7   6226 10795
Total:       2942  6045 1825.9   6227 10796

Percentage of the requests served within a certain time (ms)
  50%   6227
  66%   7069
  75%   7190
  80%   7474
  90%   8195
  95%   8898
  98%   9721
  99%  10796
 100%  10796 (last request)
Test 2: output cached by Squid:
/home/apache/bin/ab -n50 -c5 "http://localhost:8000/phpMan.php/man/kill/1"
This is ApacheBench, Version 1.3d <$Revision: 1.58 $> apache-1.3
Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright (c) 1998-2001 The Apache Group, http://www.apache.org/

Benchmarking localhost (be patient).....done
Server Software: Apache/1.3.23
Server Hostname: localhost
Server Port: 8000

Concurrency Level: 5
Time taken for tests: 4.265 seconds
Complete requests: 50
Failed requests: 0
Broken pipe errors: 0
Total transferred: 248043 bytes
HTML transferred: 232750 bytes
Requests per second: 11.72 [#/sec] (mean)
Time per request: 426.50 [ms] (mean)
Time per request: 85.30 [ms] (mean, across all concurrent requests)
Transfer rate: 58.16 [Kbytes/sec] received

Connnection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0     1    9.5      0    68
Processing:     7    83  537.4      7  3808
Waiting:        5    81  529.1      6  3748
Total:          7    84  547.0      7  3876

Percentage of the requests served within a certain time (ms)
  50%      7
  66%      7
  75%      7
  80%      7
  90%      7
  95%      7
  98%      8
  99%   3876
 100%   3876 (last request)
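ab's derived figures follow directly from the raw totals, so the comparison is easy to check. The sketch below recomputes them from the numbers in the two reports: 50 requests, concurrency level 5, 63.164 s vs. 4.265 s. `ab_summary` is a hypothetical helper.

```python
def ab_summary(requests: int, concurrency: int, seconds: float) -> dict:
    """Recompute ApacheBench's derived figures from the raw run totals."""
    return {
        "requests_per_second": requests / seconds,
        # "Time per request (mean)": wall time seen by one of the concurrent clients
        "ms_per_request": concurrency * seconds * 1000 / requests,
        # "(mean, across all concurrent requests)"
        "ms_per_request_all": seconds * 1000 / requests,
    }

direct = ab_summary(50, 5, 63.164)  # test 1: Apache alone
cached = ab_summary(50, 5, 4.265)   # test 2: through Squid
speedup = cached["requests_per_second"] / direct["requests_per_second"]
print(f"{direct['requests_per_second']:.2f} vs {cached['requests_per_second']:.2f} req/s, "
      f"about {speedup:.1f}x faster")
# 0.79 vs 11.72 req/s, about 14.8x faster
```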
Squid installation and configuration: one cache in front of several Apache-hosted sites
Compiling Squid:
./configure --enable-useragent-log --enable-referer-log --enable-default-err-language=Simplify_Chinese --enable-err-languages="Simplify_Chinese English" --disable-internal-dns
make
#make install
#cd /usr/local/squid
mkdir cache
chown squid.squid *
vi /usr/local/squid/etc/squid.conf
---------------------cut here----------------------------------
# visible name
visible_hostname cache.example.com

# cache config: space use 1G and memory use 256M
cache_dir ufs /usr/local/squid/cache 1024 16 256
cache_mem 256 MB
cache_effective_user squid
cache_effective_group squid

http_port 80
httpd_accel_host virtual
httpd_accel_single_host off
httpd_accel_port 80
httpd_accel_uses_host_header on
httpd_accel_with_proxy on

# accelerate my domains only
acl acceleratedHostA dstdomain .example1.com
acl acceleratedHostB dstdomain .example2.com
acl acceleratedHostC dstdomain .example3.com
# accelerate http protocol on port 80
acl acceleratedProtocol protocol HTTP
acl acceleratedPort port 80
# access acl
acl all src 0.0.0.0/0.0.0.0

# Allow requests when they are to the accelerated machine AND to the
# right port with right protocol
http_access allow acceleratedProtocol acceleratedPort acceleratedHostA
http_access allow acceleratedProtocol acceleratedPort acceleratedHostB
http_access allow acceleratedProtocol acceleratedPort acceleratedHostC

# logging
emulate_httpd_log on
referer_log /usr/local/squid/var/logs/referer.log
useragent_log /usr/local/squid/var/logs/agent.log
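With several sites behind one cache, the dstdomain match decides whether a request is accelerated. Below is a rough Python model. It assumes Squid's rule that a leading dot matches the domain itself and all of its subdomains, case-insensitively. The helper names are hypothetical, and the protocol check is left out for brevity.

```python
def dstdomain_match(host: str, pattern: str) -> bool:
    """Approximate Squid dstdomain: '.example1.com' matches example1.com and its subdomains;
    a pattern without a leading dot matches only that exact host."""
    host = host.lower().rstrip(".")
    pattern = pattern.lower()
    if pattern.startswith("."):
        return host == pattern[1:] or host.endswith(pattern)
    return host == pattern

ACCELERATED = (".example1.com", ".example2.com", ".example3.com")

def accelerated(host: str, port: int) -> bool:
    """Mirror of the three http_access allow lines: port 80 and one of the hosted domains."""
    return port == 80 and any(dstdomain_match(host, p) for p in ACCELERATED)

print(accelerated("www.example2.com", 80))  # True
print(accelerated("badexample1.com", 80))   # False: no dot before example1.com
```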
After running it this way, you will find that PHP maps PATH_INFO onto a physical path:
Warning: Unknown(C:\CheDong\Downloads\ariadne\www\test.php\path): failed to create stream: No such file or directory in Unknown on line 0
Warning: Unknown(): Failed opening 'C:\CheDong\Downloads\ariadne\www\test.php\path' for inclusion (include_path='.;c:\php4\pear') in Unknown on line 0
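The warnings show the whole string test.php\path being treated as one file name. The server should instead split the request path into the script and the extra PATH_INFO. Here is a minimal sketch, assuming the script is the first path segment ending in `.php`. `split_path_info` is a hypothetical helper, not code from PHP or IIS.

```python
def split_path_info(path: str, script_ext: str = ".php") -> tuple:
    """Split a request path into (SCRIPT_NAME, PATH_INFO) at the first segment ending in script_ext."""
    segments = path.split("/")
    for i, seg in enumerate(segments):
        if seg.endswith(script_ext):
            script = "/".join(segments[: i + 1])
            rest = "/".join(segments[i + 1 :])
            return script, ("/" + rest if rest else "")
    # No script segment: treat the whole path as a plain file.
    return path, ""

print(split_path_info("/test.php/path"))      # ('/test.php', '/path')
print(split_path_info("/phpMan.php/man/ls"))  # ('/phpMan.php', '/man/ls')
```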
Installing the Ariadne patch
==================
Stop the IIS service:
net stop iisadmin
Overwrite the existing c:\php\sapi\php4isapi.dll with ftp://ftp.muze.nl/pub/ariadne/win/iis/php-4.2.3/php4isapi.dll
5. robots.txt references
For more detailed robots.txt settings, see the following links:
· Web Server Administrator's Guide to the Robots Exclusion Protocol
· HTML Author's Guide to the Robots Exclusion Protocol
· The original 1994 protocol description, as currently deployed
· The revised Internet-Draft specification, which is not yet completed or implemented
6. Crawlers of the major search engines
Google: Crawled by Googlebot/2.1 (+http://www.google.com/bot.html)
Baidu: Crawled by Baiduspider+(+http://www.baidu.com/search/spider.htm)
Yahoo: Crawled by Mozilla/5.0 (compatible; Yahoo! Slurp China
MSN: Crawled by msnbot/1.0 (+http://search.msn.com/msnbot.htm)
Sogou: Crawled by sogou spider
Zhongsou: Crawled by User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0) www.best-code.com
Sina: Crawled by Mozilla/4.0(compatible;MSIE 6.0;Windows NT 5.0;.NET CLR 1.1.432)
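When reading access logs, or the agent.log from the Squid configuration, these User-Agent strings can be used to tag crawler traffic. Here is a minimal sketch; the substrings come from the list above, and `crawler_name` is a hypothetical helper. Zhongsou and Sina announce browser-like strings, so they cannot be recognized this way and come back as None.

```python
# Substrings taken from the crawler User-Agent strings listed above.
CRAWLER_TOKENS = (
    ("Googlebot", "Google"),
    ("Baiduspider", "Baidu"),
    ("Yahoo! Slurp", "Yahoo"),
    ("msnbot", "MSN"),
    ("sogou spider", "Sogou"),
)

def crawler_name(user_agent: str):
    """Return the search engine behind a User-Agent, or None if it looks like a browser."""
    ua = user_agent.lower()
    for token, name in CRAWLER_TOKENS:
        if token.lower() in ua:
            return name
    return None

print(crawler_name("Baiduspider+(+http://www.baidu.com/search/spider.htm)"))  # Baidu
print(crawler_name("Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"))      # None
```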
One advantage of Google and other new-generation search engines is that they not only index a huge number of pages but also put the best results at the top of the result list. For the underlying principle, see the article "Google の秘密 - PageRank 徹底解説" (http://www.kusastro.kyoto-u.ac.jp/%7Ebaba/wais/pagerank.html). Put simply, PageRank resembles the citation mechanism of scientific papers: whoever's papers are cited most often is the authority. On the web, PageRank is derived by analyzing the links between pages, which leads to the first key point:
Win by quantity: joining a large site's directory is not the only way to promote a website; any backlink from another site is useful.
The classic way to promote a website is to join the directory of a large portal, such as Yahoo! or dmoz.org (http://dmoz.org/). This is actually a misconception: you do not have to join a large site's directory to promote your site. Search engines no longer just index site directories; they index web pages far more comprehensively. A backlink from anywhere on another site is therefore valuable, even in a news story, a forum, or a mailing-list archive. So when posting to the mailing lists of large sites, be sure to put your own site's address in your signature.
Bloggers ("blog" is short for weblog) perhaps understand the phrase "links are everything" better than anyone. Because blog content links to other blog content so heavily, the most frequently cited blog pages often rank higher in search engines than pages on some large commercial sites.
Win by quality: being cited by high-PageRank sites raises your PageRank faster.
Quantity is only one of the key factors; links from high-PageRank pages raise the PageRank of the linked target faster. Take my personal site as an example. I never joined any directory; I only submitted some articles to ZDNet China (http://www.zdnet.com.cn/). Because those pages carried a link to each article's source, after a while the PageRank of the corresponding pages and of the site as a whole rose considerably. Sometimes which sites cite you matters more than how many times you are cited. I want to give special thanks here to ZDNet China, which at the time was the only site that honored my copyright notice by crediting the article's source and linking back to it.
Understand the search engine's "values":
After my Lucene article was referenced by the lucene project at Jakarta.apache.org (http://jakarta.apache.org/lucene/), it became the page with the highest PageRank among all my pages (6/10 on the Google Toolbar). Google's strong academic culture long made me suspect that they give extra weight to non-commercial sites such as .edu :-). After all, .org and .edu represent the essence of the Internet spirit: sharing knowledge.
A more reasonable explanation, though, is this. Many .org sites are run by developers of open technology platforms, who put links such as "Powered by Apache" or "Powered by FreeBSD" on places like their home pages to acknowledge other open source platforms. That is why open source sites such as Apache, PHP, and FreeBSD all have very high PageRank in Google. On .edu sites, many documents are academic, and citing references with hyperlinks has become a habit, which is exactly the kind of evidence PageRank relies on.
Note: never try to raise your site's ranking through a link farm. Google penalizes sites that actively link to link-farm sites to raise their own ranking, and the pages of such sites will be dropped from the index. If your pages are linked to by some link farm, however, you need not worry: such passive links are not penalized.
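The citation idea behind PageRank can be made concrete with the textbook power-iteration formulation. This is a simplified sketch, not Google's production algorithm. It uses the commonly cited damping factor of 0.85, and pages without outgoing links spread their rank evenly. The link graph in the usage example is made up.

```python
def pagerank(links: dict, damping: float = 0.85, iterations: int = 50) -> dict:
    """Simplified PageRank by power iteration.

    links maps each page to the pages it links to; every target must also be a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page keeps a small base share; the rest flows along links.
        new = {p: (1.0 - damping) / n for p in pages}
        for page, targets in links.items():
            share = targets or pages  # dangling page: distribute to everyone
            for t in share:
                new[t] += damping * rank[page] / len(share)
        rank = new
    return rank

# Made-up graph: three pages cite C, and C cites A.
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # C
```

Page A has only one inbound link but still outranks B and D, because that link comes from the highly ranked C. This matches the "win by quality" point above.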
User analysis based on web logs used to rely mainly on simple data such as access times and source IP addresses. Clearly, statistics based on search-engine keywords yield richer and more intuitive results. The potential commercial value of search services is therefore almost self-evident. This may also be why traditional search sites such as Yahoo! (http://www.yahoo.com/) and AltaVista have turned their attention back to the search market after the portal era. Just look at Google's annual keyword statistics (http://www.google.com/press/zeitgeist2002.html): who on the Internet knows better than a search engine what users are interested in?
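Keyword statistics like these can be pulled from the Referer header, for example from a referer log. Here is a minimal sketch, assuming Google carries the search terms in the `q` parameter and Baidu in `wd`. `search_keywords` is a hypothetical helper. Pages whose search forms use GBK may need `parse_qs(..., encoding="gbk")` to decode Chinese keywords correctly.

```python
from urllib.parse import urlsplit, parse_qs

# Assumed query parameter names for each search engine host.
KEYWORD_PARAMS = {
    "www.google.com": "q",
    "www.baidu.com": "wd",
}

def search_keywords(referer: str):
    """Return the search terms carried in a search-engine Referer URL, or None."""
    parts = urlsplit(referer)
    param = KEYWORD_PARAMS.get(parts.hostname or "")
    if param is None:
        return None
    values = parse_qs(parts.query).get(param)
    return values[0] if values else None

print(search_keywords("http://www.google.com/search?q=squid+cache&hl=en"))  # squid cache
print(search_keywords("http://www.chedong.com/"))                           # None
```

Running every referer line through `search_keywords` and counting the results with `collections.Counter` gives a simple keyword ranking for a site.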
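Jon points out in particular that this method can be applied to large numbers of weblogs to track social trends, which also has great potential for commercial applications. For example, advertisers could quickly spot emerging demand among thousands of personal blogs. As long as blogs cover a wide enough range of topics (which is indeed the trend), the technique will also have practical significance in politics, society, culture, and the economy.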