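Ganglia is monitoring software used mainly to track system performance: CPU, memory, and disk utilization, I/O load, network traffic, and so on. Its graphs make it easy to see the working state of each node, which helps in tuning and allocating system resources sensibly and improving overall system performance. It is accessed through a web browser, but it cannot monitor node hardware indicators. Ganglia is a distributed monitoring system made up of two daemons, the client-side Ganglia Monitoring Daemon (gmond) and the server-side Ganglia Meta Daemon (gmetad), plus the Ganglia PHP Web Frontend (dynamic, web-based access). It is a graphical tool for monitoring system performance on Linux, with an attractive, feature-rich interface. RRDtool is a system for storing and displaying time-series data (network bandwidth, temperature, head counts, server load, and so on), and it produces useful graphs by processing the data to enforce a certain data density.
Ganglia is an open-source monitoring project started at UC Berkeley and designed to scale to thousands of nodes. Each machine runs a daemon named gmond that collects and sends metric data (such as processor speed and memory usage) gathered from the operating system on that host. A host that receives all the metric data can display it and can pass a condensed form of it up a hierarchy. This hierarchical model is what lets Ganglia scale well. gmond adds very little system load, so it can run on every machine in a cluster without affecting user performance.
Install the dependencies with yum; the packages are on the system installation disc: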
yum -y install \
apr-devel \
apr-util \
check-devel \
cairo-devel \
pango-devel \
libxml2-devel \
rpm-build \
glib2-devel \
dbus-devel \
freetype-devel \
fontconfig-devel \
gcc-c++ \
expat-devel \
python-devel \
libXrender-devel
For a 32-bit OS, download:
wget http://download.fedora.redhat.com/pub/epel/5/i386/libconfuse-2.5-4.el5.i386.rpm
wget http://download.fedora.redhat.com/pub/epel/5/i386/libconfuse-devel-2.5-4.el5.i386.rpm
For a 64-bit OS, download:
wget http://download.fedora.redhat.com/pub/epel/5/x86_64/libconfuse-2.5-4.el5.x86_64.rpm
wget http://download.fedora.redhat.com/pub/epel/5/x86_64/libconfuse-devel-2.5-4.el5.x86_64.rpm
Install (32-bit file names shown; on a 64-bit OS use the x86_64 files instead):
rpm -ivh libconfuse-2.5-4.el5.i386.rpm libconfuse-devel-2.5-4.el5.i386.rpm
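The choice between the i386 and x86_64 packages can be scripted. The following is a minimal sketch that assumes the file names from the download step; the helper names epel_arch and libconfuse_rpms are made up for illustration, and the rpm call is left commented out.

```shell
#!/bin/bash
# Hypothetical helpers (not part of Ganglia or EPEL).

# Map a machine name, as printed by `uname -m`, to the EPEL 5 arch directory.
epel_arch() {
    case "$1" in
        x86_64) echo "x86_64" ;;
        i386|i486|i586|i686) echo "i386" ;;
        *) return 1 ;;
    esac
}

# Print the two libconfuse RPM file names for the given arch, one per line.
libconfuse_rpms() {
    echo "libconfuse-2.5-4.el5.$1.rpm"
    echo "libconfuse-devel-2.5-4.el5.$1.rpm"
}

if arch=$(epel_arch "$(uname -m)"); then
    libconfuse_rpms "$arch"
    # rpm -ivh $(libconfuse_rpms "$arch")   # uncomment to install
else
    echo "unsupported architecture: $(uname -m)" >&2
fi
```

Run as-is, the sketch only prints the file names it would install, so it is safe to try before touching the system.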
wget http://oss.oetiker.ch/rrdtool/pub/rrdtool.tar.gz
tar zxvf rrdtool*
cd rrdtool-*
./configure --prefix=/usr
make
make install
Note the following before compiling and installing:
Before installing, check whether rrd.h and librrd.a exist with the following commands:
ll /usr/include/rrd.h
ll /usr/lib/librrd.a
If both files exist, continue with the installation. Otherwise, use find to locate them and create symlinks pointing to them:
find / -name rrd.h
find / -name librrd.a
For example, if you find the files under /usr/local/, then:
ln -s /usr/local/rrd.h /usr/include/rrd.h
ln -s /usr/local/librrd.a /usr/lib/librrd.a
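The check, find, and link steps can be combined into one helper. This is only a sketch: link_if_missing is a hypothetical function name, it links the first match that find returns, and you should confirm that match is the file you actually want.

```shell
#!/bin/bash
# link_if_missing TARGET NAME SEARCH_ROOT  (hypothetical helper)
# If TARGET does not exist, find a regular file called NAME under SEARCH_ROOT
# and symlink TARGET to the first match.
link_if_missing() {
    local target="$1" name="$2" root="$3" found
    if [ -e "$target" ]; then
        echo "ok: $target"
        return 0
    fi
    found=$(find "$root" -type f -name "$name" 2>/dev/null | head -n 1)
    if [ -z "$found" ]; then
        echo "not found: $name under $root" >&2
        return 1
    fi
    ln -s "$found" "$target" && echo "linked: $target -> $found"
}

# Example (run as root):
#   link_if_missing /usr/include/rrd.h rrd.h /usr/local
#   link_if_missing /usr/lib/librrd.a librrd.a /usr/local
```

Limiting the search root to a directory such as /usr/local is usually much faster than searching from /.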
Install Ganglia:
tar -zxvf ganglia-3.1.2.tar.gz
cd ganglia-3.1.2
./configure --with-gmetad
make
make install
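If the build fails with an error reading a library file, check whether that library exists under /usr/lib (32-bit OS) or /usr/lib64. If it does not, locate the file and create a symlink to it in /usr/lib (32-bit OS) or /usr/lib64.
cd /tmp/ganglia-3.1.2/ # assuming your Ganglia source is in this directory
mkdir -p /var/www/html/ganglia/ # create a ganglia folder under the web root, used to access Ganglia
cp -a web/* /var/www/html/ganglia/ # copy the Ganglia web code into that directory
cp gmetad/gmetad.init /etc/rc.d/init.d/gmetad # copy the gmetad init script
cp gmond/gmond.init /etc/rc.d/init.d/gmond # copy the gmond init script
mkdir /etc/ganglia # create the configuration directory
gmond -t | tee /etc/ganglia/gmond.conf # generate the gmond configuration file
cp gmetad/gmetad.conf /etc/ganglia/ # copy the gmetad configuration file
mkdir -p /var/lib/ganglia/rrds # create the directory that stores the rrd files
chown nobody:nobody /var/lib/ganglia/rrds # owner and group are both nobody
chkconfig --add gmetad # put the service under chkconfig management
chkconfig --add gmond # same as above
vi /etc/ganglia/gmond.conf # edit the following fields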
cluster {
name = "cluster name" // the name of the cluster to monitor; usually this is the only field to change, and it can be any value
owner = "unspecified"
latlong = "unspecified"
url = "unspecified"
}
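The same edit can be made without opening an editor. The sketch below assumes GNU sed (for -i) and a gmond.conf laid out as `gmond -t` prints it, with `cluster {` and its closing `}` each starting in column one. set_cluster_name is a hypothetical helper, and the new name must not contain `/`, `&`, or `"`.

```shell
#!/bin/bash
# set_cluster_name CONF NAME  (hypothetical helper)
# Rewrite the name = "..." line inside the cluster { ... } block only,
# leaving name = lines in other blocks untouched.
set_cluster_name() {
    local conf="$1" name="$2"
    sed -i '/^cluster[[:space:]]*{/,/^}/ s/^\([[:space:]]*name[[:space:]]*=[[:space:]]*\)"[^"]*"/\1"'"$name"'"/' "$conf"
}

# Example (run as root):
#   set_cluster_name /etc/ganglia/gmond.conf "web farm"
```

Restricting the substitution to the cluster block matters because the generated file contains other name = lines in its module and metric sections.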
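If your server has two network cards, with eth0 on a public address and eth1 on a LAN address, and you want traffic between the monitoring server and the monitored servers to go over the LAN to reduce load on the public interface, you can use the following command: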
ip route add 239.2.11.71 dev eth1
239.2.11.71 is Ganglia's default multicast channel, so this adds a route that sends it through eth1, the internal interface. You can change the 239.2.11.71 address in /etc/ganglia/gmond.conf.
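To confirm that the route took effect, you can ask the kernel which interface it would use for the multicast address. The sketch below parses the output of `ip route get`; route_dev is a hypothetical helper, and the exact output format of ip may differ between iproute2 versions.

```shell
#!/bin/bash
# route_dev (hypothetical helper): read `ip route get` output on stdin and
# print the interface name that follows the word "dev".
route_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Example:
#   ip route get 239.2.11.71 | route_dev     # eth1 is the expected answer here
```

A route added with `ip route add` does not survive a reboot, so it needs to be added again at boot time if you rely on it.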
service gmond start
service gmetad start
service httpd restart
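Server-side configuration is now complete. The web page at http://You_IP/ganglia should be reachable and should already show data for the monitoring server itself.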
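If the page stays empty, it can help to look at what gmond itself reports. By default gmond answers TCP connections on port 8649 with an XML dump of the cluster state. The sketch below lists the host names found in that dump; host_names is a hypothetical helper, and it assumes NAME is the first attribute of each HOST element.

```shell
#!/bin/bash
# host_names (hypothetical helper): read gmond XML on stdin and print the
# NAME attribute of every <HOST ...> element, one per line.
host_names() {
    grep -o '<HOST NAME="[^"]*"' | sed 's/^<HOST NAME="\(.*\)"$/\1/'
}

# Example (on the monitoring server):
#   nc localhost 8649 | host_names
```

If the port was changed in the tcp_accept_channel section of gmond.conf, use that port instead.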
On the monitoring server, write a script that quickly pushes the service out to the monitored hosts:
vi mynodes
Add your host names or server IPs to this file; only five hosts are listed here:
192.168.10.1
192.168.10.2
192.168.10.3
192.168.10.4
192.168.10.5
vi ganglia.sh
Use the following content. This example assumes a 64-bit OS; on a 32-bit OS, change lib64 below to lib.
for i in `cat mynodes`; do
scp /usr/bin/gmetric $i:/usr/bin
scp /usr/sbin/gmond $i:/usr/sbin/gmond
ssh $i mkdir -p /etc/ganglia/
scp /etc/ganglia/gmond.conf $i:/etc/ganglia/
scp /etc/init.d/gmond $i:/etc/init.d/
scp /usr/lib64/libganglia-3.1.2.so.0 $i:/usr/lib64/
scp /lib64/libexpat.so.0 $i:/lib64/
scp /usr/lib64/libconfuse.so.0 $i:/usr/lib64/
scp /usr/lib64/libapr-1.so.0 $i:/usr/lib64/
scp -r /usr/lib64/ganglia $i:/usr/lib64/
ssh $i service gmond start
done
chmod 755 ganglia.sh
./ganglia.sh
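Before copying files to many machines, it can be useful to preview what the loop would do. The following sketch prints a shortened plan instead of running scp and ssh. All three helper names are made up for illustration, and it assumes the monitored hosts have the same architecture as the monitoring server.

```shell
#!/bin/bash
# read_nodes FILE (hypothetical helper): print one host per line,
# dropping '#' comments, surrounding whitespace, and blank lines.
read_nodes() {
    sed -e 's/#.*//' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d' "$1"
}

# libdir_for MACHINE: lib64 for x86_64, lib otherwise.
libdir_for() {
    case "$1" in
        x86_64) echo "lib64" ;;
        *) echo "lib" ;;
    esac
}

# deploy_plan FILE: print a few of the per-host commands instead of running them.
deploy_plan() {
    local host libdir
    libdir=$(libdir_for "$(uname -m)")
    for host in $(read_nodes "$1"); do
        echo "scp /usr/sbin/gmond $host:/usr/sbin/gmond"
        echo "scp -r /usr/$libdir/ganglia $host:/usr/$libdir/"
        echo "ssh $host service gmond start"
    done
}

# Example:
#   deploy_plan mynodes
```

Filtering the node list this way also lets you comment out a host in mynodes temporarily without deleting its line.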
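Client and server configuration for Ganglia is now complete, and you can monitor your cluster through the web interface.
Before using gmetric, make sure the monitored server has gmetric installed: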
ll /usr/bin/gmetric
If it does not, copy it from the monitoring server with scp:
scp root@192.168.10.100:/usr/bin/gmetric /usr/bin/ # assuming your monitoring server is 192.168.10.100
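A very small custom metric is a quick way to verify that gmetric works on a host. The sketch below sends the 1-minute load average under the made-up metric name custom_load_one. It uses the same gmetric options as the longer memcached script in this article; load_one and report_load are hypothetical helper names, and /proc/loadavg is assumed to exist (Linux).

```shell
#!/bin/bash
GMETRIC=${GMETRIC:-/usr/bin/gmetric}
GMOND_CONF=${GMOND_CONF:-/etc/ganglia/gmond.conf}

# load_one: print the first field of a /proc/loadavg-style line read on stdin.
load_one() {
    awk '{ print $1; exit }'
}

# report_load: send the 1-minute load average as a float metric.
# "custom_load_one" is an illustrative name, not a built-in Ganglia metric.
report_load() {
    local value
    value=$(load_one < /proc/loadavg) || return 1
    "$GMETRIC" -c "$GMOND_CONF" --name="custom_load_one" --value="$value" --type=float --units="load"
}

# Example:
#   report_load
```

If the metric shows up on the host's page in the web frontend, the path from gmetric to gmetad is working.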
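Write your own script, run it, and send the result back to the monitoring server.
You can find many scripts shared by others at http://ganglia.info/gmetric/. The following script monitors a memcached service: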
vi /root/mcd_gmetric.sh
with the following content:
#!/bin/bash
### $Id: mcd_gmetric.sh 16661 2006-11-07 00:56:33Z ben $
### This script queries a memcached server running
### on localhost and reports a few statistics to
### ganglia.
### It reports
### *mcd_curr_items - the number of objects stored
### *mcd_curr_bytes - current bytes used
### *mcd_curr_conns - current number of connections
### *mcd_hit_perc - hits / gets for current time duration
### (current hit percentage)
### For more description on any of these metrics,
### see the protocols.txt file in the MCD docs.
### Copyright Simply Hired, Inc. 2006
### License to use, modify, and distribute under the GPL
### http://www.gnu.org/licenses/gpl.txt
VERSION=1.0
GMETRIC="/usr/bin/gmetric"
GMETRIC_ARGS="-c /etc/ganglia/gmond.conf"
STATEFILE="/var/lib/ganglia/metrics/mcd.stats"
ERROR_NOTROOT="/tmp/mcd_gmetric_notroot"
ERROR_CANT_CONNECT="/tmp/mcd_gmetric_cant_connect"
ERROR_CREATE="/tmp/mcd_gmetric_create_statefile_failed"
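ERROR_GETS_EMPTY="/tmp/mcd_gets_empty"
# The next two flag files are used below but were never defined; the paths are assumed.
ERROR_GMETRIC="/tmp/mcd_gmetric_no_gmetric"
ERROR_TIMEDIFF="/tmp/mcd_gmetric_timediff"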
MCD_CONF="/etc/sysconfig/memcached"
# get system configuration
if [ -e ${MCD_CONF} ]
then
source ${MCD_CONF}
MCD_PORT=${PORT}
fi
MCD_PORT=${MCD_PORT:-11211}
date=`date +%s`
if [ $UID -ne 0 ]
then
if [ -e $ERROR_NOTROOT ] ; then exit 1; fi
echo "Error: this script must be run as root."
touch $ERROR_NOTROOT
exit 1
fi
rm -f $ERROR_NOTROOT
if [ "x$1" == "x-h" ]
then
echo "Usage: mcd_gmetric.sh [--clean]"
echo " --clean delete all tmp files"
exit 0
fi
if [ "x$1" == "x--clean" ]
then
rm -f $STATEFILE $ERROR_NOTROOT $ERROR_CANT_CONNECT $ERROR_CREATE
retval=$?
if [ $retval -ne 0 ]
then
echo "failed to clean up."
exit 1
else
echo "All cleaned up."
exit 0
fi
fi
# if the GMETRIC program isn't installed, complain
if [ ! -e $GMETRIC ]
then
if [ -e $ERROR_GMETRIC ] ; then exit 1; fi
echo ""
echo "Error: GMETRIC doesn't seem to be installed."
echo "$GMETRIC doesn't exist."
echo ""
touch $ERROR_GMETRIC
exit 1
fi
# get current statistics
exec 3>&2 # save STDERR on fd 3
exec 2>/dev/null # turn off STDERR
stats_array=(`echo "stats" | nc localhost $MCD_PORT`)
retval=$?
exec 2>&3 # restore STDERR from fd 3
exec 3>&-
if [ $retval -ne 0 ]
then
if [ -e $ERROR_CANT_CONNECT ] ; then exit 1 ; fi
echo "I can't connect to mcd."
echo "Bummer. "
touch $ERROR_CANT_CONNECT
exit 1
fi
mcd_curr_items=`echo ${stats_array[23]}|tr -c -d [0-9]` #this tr thing is because there's a trailing ^M on the string from netcat that breaks bc.
mcd_curr_bytes=`echo ${stats_array[29]}|tr -c -d [0-9]`
mcd_curr_conns=`echo ${stats_array[32]}|tr -c -d [0-9]`
mcd_total_gets=`echo ${stats_array[41]}|tr -c -d [0-9]`
mcd_total_sets=`echo ${stats_array[44]}|tr -c -d [0-9]`
mcd_total_hits=`echo ${stats_array[47]}|tr -c -d [0-9]`
if [ -z "$mcd_total_gets" ]
then
# this actually happens rather often for some reason, so I'm just going to fail silently.
# if [ -e $ERROR_GETS_EMPTY ] ; then exit 1 ; fi
# echo ""
# echo "ERROR: mcd_total_gets empty."
# echo ""
exit 1
fi
rm -f $ERROR_GETS_EMPTY
# save and turn off /STDERR for the statefile tests
exec 3>&2
exec 2>/dev/null
# if the statefile doesn't exist, we either haven't
# run yet or there's something bigger wrong.
if [ ! -e $STATEFILE ]
then
if [ ! -d `dirname $STATEFILE` ]
then
mkdir -p `dirname $STATEFILE`
fi
echo "$date $mcd_curr_items $mcd_curr_bytes $mcd_curr_conns $mcd_total_gets $mcd_total_sets $mcd_total_hits" > $STATEFILE
if [ ! -e $STATEFILE ]
then
# if it didn't exist and we couldn't create
# it, we should just scream bloody murder and die.
# only scream once though...
if [ -e $ERROR_CREATE ]
then
exit 1
fi
echo ""
echo "ERROR: couldn't create $STATEFILE"
echo ""
touch $ERROR_CREATE
exit 1
fi
echo "Created statefile. Exiting."
exit 0
fi
# restore stderr
exec 2>&3
exec 3>&-
old_stats_array=(`cat $STATEFILE`)
old_date=${old_stats_array[0]}
old_mcd_curr_items=${old_stats_array[1]}
old_mcd_curr_bytes=${old_stats_array[2]}
old_mcd_curr_conns=${old_stats_array[3]}
old_mcd_total_gets=${old_stats_array[4]}
old_mcd_total_sets=${old_stats_array[5]}
old_mcd_total_hits=${old_stats_array[6]}
echo "$date $mcd_curr_items $mcd_curr_bytes $mcd_curr_conns $mcd_total_gets $mcd_total_sets $mcd_total_hits" > $STATEFILE
time_diff=$(($date - $old_date))
mcd_total_gets_diff=$(($mcd_total_gets - $old_mcd_total_gets))
mcd_total_sets_diff=$(($mcd_total_sets - $old_mcd_total_sets))
mcd_total_hits_diff=$(($mcd_total_hits - $old_mcd_total_hits))
if [ $time_diff -eq 0 ]
then
if [ -e $ERROR_TIMEDIFF ] ; then exit 1 ; fi
echo "something is broken."
echo "time_diff is 0."
touch $ERROR_TIMEDIFF
exit 1
fi
# none of these numbers should be less than 1, but if they are, just send back 1.
if [ $mcd_total_gets_diff -le 1 ] ; then mcd_total_gets_diff=1 ; fi
if [ $mcd_total_sets_diff -le 1 ] ; then mcd_total_sets_diff=1 ; fi
if [ $mcd_total_hits_diff -le 1 ] ; then mcd_total_hits_diff=1 ; fi
mcd_gets_per_sec=`echo "scale=3;${mcd_total_gets_diff}/${time_diff}"|bc`
mcd_sets_per_sec=`echo "scale=3;${mcd_total_sets_diff}/${time_diff}"|bc`
mcd_hits_per_sec=`echo "scale=3;${mcd_total_hits_diff}/${time_diff}"|bc`
mcd_hit_perc=`echo "scale=3; ${mcd_total_hits_diff} * 100 / ${mcd_total_gets_diff}" | bc`
$GMETRIC $GMETRIC_ARGS --name="mcd_seconds_measured" --value=${time_diff} --type=uint32 --units="secs"
$GMETRIC $GMETRIC_ARGS --name="mcd_items_cached" --value=${mcd_curr_items} --type=uint32 --units="items"
$GMETRIC $GMETRIC_ARGS --name="mcd_bytes_used" --value=${mcd_curr_bytes} --type=uint32 --units="bytes"
$GMETRIC $GMETRIC_ARGS --name="mcd_conns" --value=${mcd_curr_conns} --type=uint32 --units="connections"
$GMETRIC $GMETRIC_ARGS --name="mcd_gets" --value=${mcd_gets_per_sec} --type=float --units="gps"
$GMETRIC $GMETRIC_ARGS --name="mcd_sets" --value=${mcd_sets_per_sec} --type=float --units="sps"
$GMETRIC $GMETRIC_ARGS --name="mcd_cache_hits" --value=${mcd_hits_per_sec} --type=float --units="hps"
$GMETRIC $GMETRIC_ARGS --name="mcd_cache_hit%" --value=${mcd_hit_perc} --type=float --units="%"
Make the script executable and run it:
chmod 755 mcd_gmetric.sh
./mcd_gmetric.sh
The results have now been sent to the monitoring server through gmetric, and the memcached metrics should appear on that host's page on the server.
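If some memcached values look wrong, one thing to check is how the script picks them out. It reads fields by fixed array position (23, 29, 32, and so on), and the order and number of STAT lines may differ between memcached versions. A more robust approach is to look values up by stat name. The sketch below shows that idea; stat_value and hit_percent are hypothetical helpers, not part of the original script.

```shell
#!/bin/bash
# stat_value NAME: read memcached "stats" output on stdin and print the value
# from the line "STAT NAME VALUE". Carriage returns are stripped first.
stat_value() {
    tr -d '\r' | awk -v key="$1" '$1 == "STAT" && $2 == key { print $3; exit }'
}

# hit_percent HITS GETS: hits as a percentage of gets, 0.0 when GETS is 0.
hit_percent() {
    awk -v h="$1" -v g="$2" 'BEGIN { if (g > 0) printf "%.1f\n", h * 100 / g; else print "0.0" }'
}

# Example:
#   stats=$(echo "stats" | nc localhost 11211)
#   gets=$(echo "$stats" | stat_value cmd_get)
#   hits=$(echo "$stats" | stat_value get_hits)
#   hit_percent "$hits" "$gets"
```

The stat names used in the example (cmd_get, get_hits) come from memcached's stats output; curr_items, bytes, curr_connections, and cmd_set can be read the same way.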
Add the script to crontab:
crontab -e
# send the results to the monitoring server every minute
*/1 * * * * /root/mcd_gmetric.sh
http://www.howtocn.org/ganglia_how_to