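Getting either of these two figures is actually not hard: both services ship a Toolbar, so watching the Toolbar's traffic with any sniffer tool shows how the data is requested.
Why fetch them programmatically? Google PageRank is a relatively important factor in Google's ranking. For a batch of site URLs, being able to look up PageRank in bulk gives a quick, rough picture of how many backlinks those sites have. Alexa lists its top 500 sites, but nothing beyond that, so a program that can return the Alexa rank of any domain is also quite useful.
What follows is a short analysis of Google PR and Alexa, and how to fetch each.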
1 Google PageRank
http://toolbarqueries.google.com/search?client=navclient-auto&ch=CHECKSUM&ie=UTF-8&oe=UTF-8&features=Rank:FVN&q=info:http://YOURURL
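In the address above, CHECKSUM is a number computed from the trailing http://YOURURL (the listing below hashes the string info: followed by the URL). It is used to verify that the request came from the Toolbar.
Search the web for the checksum algorithm and you will certainly find it. The most widely circulated version, and also the earliest, is this piece of PHP code.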
<?php
/*
    This code is released unto the public domain
*/
header("Content-Type: text/plain; charset=utf-8");
define('GOOGLE_MAGIC', 0xE6359A60);

// unsigned shift right
function zeroFill($a, $b)
{
    $z = hexdec('80000000');
    if ($z & $a)
    {
        $a = ($a>>1);
        $a &= (~$z);
        $a |= 0x40000000;
        $a = ($a>>($b-1));
    }
    else
    {
        $a = ($a>>$b);
    }
    return $a;
}

function mix($a,$b,$c) {
    $a -= $b; $a -= $c; $a ^= (zeroFill($c,13));
    $b -= $c; $b -= $a; $b ^= ($a<<8);
    $c -= $a; $c -= $b; $c ^= (zeroFill($b,13));
    $a -= $b; $a -= $c; $a ^= (zeroFill($c,12));
    $b -= $c; $b -= $a; $b ^= ($a<<16);
    $c -= $a; $c -= $b; $c ^= (zeroFill($b,5));
    $a -= $b; $a -= $c; $a ^= (zeroFill($c,3));
    $b -= $c; $b -= $a; $b ^= ($a<<10);
    $c -= $a; $c -= $b; $c ^= (zeroFill($b,15));
    return array($a,$b,$c);
}

function GoogleCH($url, $length=null, $init=GOOGLE_MAGIC) {
    if(is_null($length)) {
        $length = sizeof($url);
    }
    $a = $b = 0x9E3779B9;
    $c = $init;
    $k = 0;
    $len = $length;
    // consume the input twelve values at a time
    while($len >= 12) {
        $a += ($url[$k+0] +($url[$k+1]<<8) +($url[$k+2]<<16) +($url[$k+3]<<24));
        $b += ($url[$k+4] +($url[$k+5]<<8) +($url[$k+6]<<16) +($url[$k+7]<<24));
        $c += ($url[$k+8] +($url[$k+9]<<8) +($url[$k+10]<<16)+($url[$k+11]<<24));
        $mix = mix($a,$b,$c);
        $a = $mix[0]; $b = $mix[1]; $c = $mix[2];
        $k += 12;
        $len -= 12;
    }
    $c += $length;
    switch($len) /* all the case statements fall through */
    {
        case 11: $c+=($url[$k+10]<<24);
        case 10: $c+=($url[$k+9]<<16);
        case 9 : $c+=($url[$k+8]<<8);
        /* the first byte of c is reserved for the length */
        case 8 : $b+=($url[$k+7]<<24);
        case 7 : $b+=($url[$k+6]<<16);
        case 6 : $b+=($url[$k+5]<<8);
        case 5 : $b+=($url[$k+4]);
        case 4 : $a+=($url[$k+3]<<24);
        case 3 : $a+=($url[$k+2]<<16);
        case 2 : $a+=($url[$k+1]<<8);
        case 1 : $a+=($url[$k+0]);
        /* case 0: nothing left to add */
    }
    $mix = mix($a,$b,$c);
    /*-------------------------------------------- report the result */
    return $mix[2];
}

//converts a string into an array of integers containing the numeric value of the char
function strord($string) {
    $result = array();
    for($i=0;$i<strlen($string);$i++) {
        // square brackets: the $string{$i} offset syntax was removed in PHP 8
        $result[$i] = ord($string[$i]);
    }
    return $result;
}

// http://www.example.com/ - Checksum: 6540747202
$url = 'info:'.$_GET['url'];
print("url:\t{$_GET['url']}\n");
$ch = GoogleCH(strord($url));
printf("ch:\t6%u\n",$ch);
?>
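The zeroFill helper in the listing emulates an unsigned right shift on signed 32-bit integers, and the subtractions in mix appear to rely on 32-bit wraparound. On a 64-bit PHP build, one way to get comparable behaviour is to mask values to 32 bits explicitly. The sketch below assumes PHP_INT_SIZE is 8; urshift32 and wrap32 are hypothetical names, not part of the original code, and the sketch on its own does not claim to reproduce the Toolbar checksum.

```php
<?php
// Hypothetical helper (not in the original listing): unsigned right shift
// of a 32-bit value, assuming a 64-bit PHP build.
function urshift32(int $a, int $b): int
{
    // Mask to the low 32 bits first so a negative input is read as unsigned.
    return ($a & 0xFFFFFFFF) >> $b;
}

// Hypothetical helper: wrap a value to 32 bits, as C unsigned arithmetic would.
function wrap32(int $a): int
{
    return $a & 0xFFFFFFFF;
}

var_dump(urshift32(-1, 28));        // int(15)
var_dump(wrap32(0xFFFFFFFF + 1));   // int(0)
```

Applying wrap32 after every addition, subtraction and left shift would be one way to adapt the listing; whether that matches the original output for every URL would need to be checked against a known checksum such as the one in the listing's comment.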
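Source code for computing the checksum can also be found in VB and Pascal.
GET that URL and you directly obtain the PageRank of the address in question. Note that the address can be a bare domain or a full URL. That is all it takes to retrieve Google PageRank.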
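The request described above can be sketched as two small helpers: one that fills in the URL template, and one that reads the rank out of the response body. The function names pageRankQueryUrl and parsePageRank are hypothetical. The response format "Rank_1:1:5" (last field is the PageRank) is an assumption, since the text does not show a response, and percent-encoding the q parameter is a choice made here rather than something the template specifies.

```php
<?php
// Hypothetical helper: fill in the Toolbar query template from the text.
// $checksum is the complete ch= value (for example the "6..." string printed
// by the listing); computing it is left to GoogleCH().
function pageRankQueryUrl(string $url, string $checksum): string
{
    return 'http://toolbarqueries.google.com/search?client=navclient-auto'
        . '&ch=' . rawurlencode($checksum)
        . '&ie=UTF-8&oe=UTF-8&features=Rank:FVN'
        . '&q=info:' . rawurlencode($url);
}

// Hypothetical helper: extract the rank from a response body.
// ASSUMPTION: the body looks like "Rank_1:1:5"; returns null otherwise.
function parsePageRank(string $body): ?int
{
    if (preg_match('/^Rank_\d+:\d+:(\d+)\s*$/m', $body, $m)) {
        return (int) $m[1];
    }
    return null;
}

// URL and checksum taken from the comment in the listing.
echo pageRankQueryUrl('http://www.example.com/', '6540747202'), "\n";
var_dump(parsePageRank("Rank_1:1:5\n"));   // int(5)
```

The GET itself could be issued with file_get_contents (which needs allow_url_fopen enabled) or with cURL; it is left out here so the sketch runs without network access.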
2 Alexa ranking data
http://data.alexa.com/data/+wQ411en8000lA?cli=10&dat=snba&ver=7.0&cdt=alx_vw%3D20%26wid%3D12206%26act%3D00000000000%26ss%3D1680x16t%3D0%26ttl%3D35371%26vis%3D1%26rq%3D4&url=spaces.msn.com
Just GET the address above, replacing spaces.msn.com with whatever address your program needs. The call returns a chunk of XML like the following:
<?xml version="1.0" encoding="UTF-8"?>
<ALEXA VER="0.9" URL="spaces.msn.com/" HOME="0" AID="=">
<RLS TITLE="Related Links" PREFIX="http://" more ="389">
<RL HREF="mobile.msn.co.jp/" TYPE="link" SRC="NTrails" TITLE="Msn" CONF="034" />
<RL HREF="cnn.com/" TYPE="link" SRC="Siblinks" TITLE="CNN - Cable News Network" CONF="300" ASIN="B00006B48F"/>
<RL HREF="cbsnews.com/sections/home/main100.shtml" TYPE="link" SRC="Siblinks" TITLE="CBS News" CONF="300" ASIN="B00006DFEQ"/>
<RL HREF="abcnews.go.com/" TYPE="link" SRC="Siblinks" TITLE="ABC News" CONF="300" ASIN="B00006CBMR"/>
<RL HREF="altavista.com/" TYPE="link" SRC="Siblinks" TITLE="Altavista" CONF="300" ASIN="B00006CZ94"/>
<RL HREF="yahoo.com/" TYPE="link" SRC="UserEdit" TITLE="Yahoo!" CONF="300" ASIN="B00006D2TC"/>
<RL HREF="www.hotbot.com/" TYPE="link" SRC="UserEdit" TITLE="HotBot" CONF="300" ASIN="B00006BUYX"/>
<RL HREF="netscape.com/" TYPE="link" SRC="UserEdit" TITLE="Netscape" CONF="300" ASIN="B00006C6KQ"/>
<RL HREF="excite.com/" TYPE="link" SRC="UserEdit" TITLE="My Excite" CONF="300" ASIN="B00006E21K"/>
<RL HREF="aol.com/" TYPE="link" SRC="UserEdit" TITLE="AOL Anywhere" CONF="300" ASIN="B00006ARD3"/>
<RL HREF="www.geocities.com/" TYPE="link" SRC="Usertrails" TITLE="www.geocities.com/" CONF="000"/>
</RLS>
<SD TITLE="Alexa Site Data" FLAGS="DMOZ">
<AMZN ASIN="B000304FNA" URL="spaces.msn.com/"/>
<ADDR STREET="One Microsoft Way" CITY="Redmond" STATE="WA" ZIP="98052" COUNTRY="US"/>
<CREATED DATE="10-Nov-1994" DAY="10" MONTH="11" YEAR="1994"/>
<PHONE NUMBER="unlisted"/>
<OWNER NAME="www.msn.com"/>
<EMAIL ADDR="info@msn.com"/>
<POP RATE="13"/>
<DOS>
<DO DOMAIN="microsoft.com" TITLE="microsoft.com"/>
<DO DOMAIN="passport.com" TITLE="passport.com"/>
<DO DOMAIN="msnbc.com" TITLE="msnbc.com"/>
<DO DOMAIN="windowsmedia.com" TITLE="windowsmedia.com"/>
<DO DOMAIN="iechannelguide.com" TITLE="iechannelguide.com"/>
<DO DOMAIN="cooltravelassistant.com" TITLE="cooltravelassistant.com"/>
<DO DOMAIN="mstrav.com" TITLE="mstrav.com"/>
<DO DOMAIN="msnusers.com" TITLE="msnusers.com"/>
<DO DOMAIN="msimg.com" TITLE="msimg.com"/>
<DO DOMAIN="eshop.com" TITLE="eshop.com"/>
<DO DOMAIN="windowsupdate.com" TITLE="windowsupdate.com"/>
<DO DOMAIN="passportimages.com" TITLE="passportimages.com"/>
<DO DOMAIN="home-publishing.com" TITLE="home-publishing.com"/>
<DO DOMAIN="slate.com" TITLE="slate.com"/>
<DO DOMAIN="windows.com" TITLE="windows.com"/>
<DO DOMAIN="windows95.com" TITLE="windows95.com"/>
<DO DOMAIN="expediamaps.com" TITLE="expediamaps.com"/>
<DO DOMAIN="encarta.com" TITLE="encarta.com"/>
<DO DOMAIN="homeadvisor.com" TITLE="homeadvisor.com"/>
<DO DOMAIN="carpoint.com" TITLE="carpoint.com"/>
<DO DOMAIN="hotmai.com" TITLE="hotmai.com"/>
<DO DOMAIN="msn.net" TITLE="msn.net"/>
<DO DOMAIN="moneycentral.com" TITLE="moneycentral.com"/>
<DO DOMAIN="msretech.com" TITLE="msretech.com"/>
<DO DOMAIN="microsoftfrontpage.com" TITLE="microsoftfrontpage.com"/>
<DO DOMAIN="vworlds.org" TITLE="vworlds.org"/>
<DO DOMAIN="investor.com" TITLE="investor.com"/>
<DO DOMAIN="homail.com" TITLE="homail.com"/>
<DO DOMAIN="crimsonskies.com" TITLE="crimsonskies.com"/>
</DOS>
<TICKER SYMBOL="MSFT"/>
<LANG LEX="en"/>
<LINKSIN NUM="5558"/>
<SPEED TEXT="2537" PCT="30"/>
<REVIEWS AVG="4.0" NUM="21"/>
<POPULARITY URL="msn.com/" TEXT="2"/>
<CHILD SRATING="0"/>
<ASSOCS>
<ASSOC ID="start-buymusiclink"/></ASSOCS>
<REACH RANK="2"/>
</SD>
<KEYWORDS>
</KEYWORDS>
</ALEXA>
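The XML above could be reduced to the few numbers of interest with SimpleXML, roughly as follows. parseAlexaRank is a hypothetical name, and the sketch assumes that the TEXT attribute of POPULARITY carries the traffic rank, which the response itself does not state. The sample string is a trimmed copy of the response shown above.

```php
<?php
// Hypothetical helper: pull rank-related attributes out of an Alexa response.
// Returns null when the document cannot be parsed or has no SD element.
function parseAlexaRank(string $xml): ?array
{
    libxml_use_internal_errors(true);   // collect parse errors instead of warning
    $doc = simplexml_load_string($xml);
    if ($doc === false || !isset($doc->SD)) {
        return null;
    }
    $sd = $doc->SD;
    return array(
        // ASSUMPTION: POPULARITY/@TEXT is the traffic rank.
        'popularity' => isset($sd->POPULARITY['TEXT']) ? (int) $sd->POPULARITY['TEXT'] : null,
        'reach'      => isset($sd->REACH['RANK']) ? (int) $sd->REACH['RANK'] : null,
        'links_in'   => isset($sd->LINKSIN['NUM']) ? (int) $sd->LINKSIN['NUM'] : null,
    );
}

// Trimmed copy of the sample response from the text.
$sample = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<ALEXA VER="0.9" URL="spaces.msn.com/" HOME="0" AID="=">
<SD TITLE="Alexa Site Data" FLAGS="DMOZ">
<LINKSIN NUM="5558"/>
<POPULARITY URL="msn.com/" TEXT="2"/>
<REACH RANK="2"/>
</SD>
</ALEXA>
XML;

print_r(parseAlexaRank($sample));
```

Returning null for a missing SD element keeps callers from mistaking an empty or malformed response for a rank of 0.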
In this way, a program can obtain the Google PR and the Alexa rank of any address.