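Distributed computing frameworks can be divided, by the nature of the data they process, into two main kinds: data-flow and streaming. Data-flow frameworks take blocks of data as their source; representatives include MR (MapReduce) and Spark, and I call them "big data". Streaming frameworks process the data that arrives within a unit of time and put more weight on real-time behavior; they include Storm, JStorm, and Samza, and I call them "fast data".

In this article I mainly discuss the streaming frameworks.

The first is Storm, a real-time computation system. It assumes the data source is dynamic and processes data the way one would handle flowing water.

Its characteristics are low latency, high performance, distribution, scalability, and fault tolerance.

The architecture is shown in the figure below.

For Storm's core concepts, see http://blog.csdn.net/hljlzc2007/article/details/12976211; I will not cover them in detail here.

Storm is currently the most stable open-source stream processing framework, but in my opinion it has two problems.

1. Although Storm lets you write spout and bolt code in several languages, its core is implemented in Clojure. This is a real inconvenience for people working with big data and open source, because most of them work mainly in mainstream languages such as Java and C++. As a result the internals are hard to control, and it is difficult to understand or modify them in depth.

2. Storm can run on YARN (Hadoop 2.0) and share a Hadoop cluster's resources with other open-source frameworks, but its performance there is poor. This is something Storm still needs to improve.

Regardless, Storm remains the king of open-source stream processing frameworks for now.

The second framework I want to cover is JStorm. It was built by Alibaba and is essentially another implementation of Storm, written in Java. Its characteristics:

1. The client API is basically the same as Storm's, so bolt and spout code does not need to change when migrating from Storm.

2. JStorm is more stable and faster than Storm.

3. It adds some new features.

If you are interested, try it out. Project: https://github.com/alibaba/jstorm

The third is Samza.

Samza is a technology open-sourced by LinkedIn. It is an open-source distributed stream processing system, very similar to Storm. The difference is that it runs on top of Hadoop and uses Kafka, the distributed messaging system that LinkedIn developed itself.

It is a small and beautiful project from LinkedIn. What makes it beautiful?

1. It is only a few thousand lines of code, yet its functionality is comparable to Storm's, although it still has many shortcomings.

2. It is tightly integrated with Kafka, which makes data handling more convenient.

3. It runs on YARN.

A project I worked on previously used Kafka + Storm + ElasticSearch. In the future Storm could be replaced by Samza outright, which would also let the project use the Hadoop cluster's resources for storage and offline analysis. Running both real-time processing and offline analysis on Hadoop keeps the project's complexity from growing and makes it easier to maintain, so I have to say Samza is a great project. As the saying goes, small and beautiful things tend to be more popular.

Architecture:

Samza consists of three layers:

1. Streaming layer --> Kafka
2. Execution layer --> YARN
3. Processing layer --> Samza API

Samza's streaming layer and execution layer are both pluggable, so developers can substitute other frameworks and are not limited to the two technologies above.

Samza ships with a YARN ApplicationMaster and a YARN job runner out of the box. In the figure below, different colors represent different hosts.

The Samza client tells YARN's ResourceManager that it wants to start a Samza job. The YARN RM tells a YARN NodeManager to allocate space for the ApplicationMaster. Once the NM has allocated the space, YARN containers run the Samza Task Runner.

Samza state management

Managing state is hard in stream processing. The data keeps flowing and carries no state of its own, so whenever an application needs state it has to rely on historical data. Samza provides a built-in key-value store, based on LevelDB and kept outside the JVM heap, and uses it to hold that historical data. The benefits are:

1. Lower JVM overhead
2. Much higher throughput, because storage is local
3. Fewer concurrent operations

Samza processing flow

The figure below is an example from the official Samza documentation: page views are grouped by Member ID and counted. Input messages come from Machine1 and Machine2, and the output goes to Machine3. One way to read it: the messages are spread across different messaging systems (Kafka); Samza reads topics from the different Kafka instances, processes them, and sends the results to Machine3. I will not break it down further here; see the official documentation for details.

Project: https://github.com/apache/incubator-samza

Official documentation: http://samza.incubator.apache.org/

All of this leaves plenty of room for speculation. Will Storm keep its lead, or can Samza take its place? Either way, as a developer, with only a few thousand lines of code to get through, I cannot wait to read it.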
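The page-view example from the Samza processing flow (group by Member ID, count views, keep the running counts as state) can be sketched in plain Java. This is only an illustration of the idea, not Samza code: the class name `PageViewCounter` is hypothetical, and an in-memory `HashMap` stands in for Samza's local key-value store.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration: counts page views per member ID. */
final class PageViewCounter {
    // Assumption: an in-memory map stands in for Samza's local key-value store.
    private final Map<String, Integer> store = new HashMap<>();

    /** Handles one page-view event for the given member and returns the updated count. */
    int process(String memberId) {
        return store.merge(memberId, 1, Integer::sum);
    }

    /** Returns the number of page views seen so far for the given member. */
    int count(String memberId) {
        return store.getOrDefault(memberId, 0);
    }

    public static void main(String[] args) {
        PageViewCounter counter = new PageViewCounter();
        // Events from several input machines arrive as one stream keyed by member ID.
        for (String memberId : new String[] {"m1", "m2", "m1"}) {
            counter.process(memberId);
        }
        System.out.println("m1=" + counter.count("m1") + " m2=" + counter.count("m2"));
    }
}
```

Because every event for a given member ID updates the same key, the count stays correct no matter which input machine the event came from, as long as events for one key are routed to the same task.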
2. In-memory Analytics : Apache Spark
3. Search Analytics : Elasticsearch, Apache Solr
4. Log Analytics : ELK Stack, ESK Stack (Elasticsearch, Logstash, Spark Streaming, Kibana)
5. Batch Analytics : Apache Hadoop MapReduce
***** NoSQL DB *****
1. MongoDB
2. HBase
3. Cassandra
***** SOA *****
1. Oracle SOA
2. JBoss SOA
3. TIBCO SOA
4. SOAP, RESTful web services
]]>
Retrieving Storm cluster information through Nimbus
http://www.andys-sundaypink.com/i/retrieve-storm-cluster-statistic-from-nimbus-java-mode/
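// Imports are assumptions: the generated classes live in backtype.storm.generated in older
// Storm releases and org.apache.storm.generated in newer ones; the Thrift package also varies.
import java.util.List;
import backtype.storm.generated.*;
import org.apache.thrift.TException;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransportException;

public class NimbusStats { // class name is illustrative
    public static void main(String[] args) {
        // Assumed Nimbus address: the host is a placeholder, 6627 is the default nimbus.thrift.port.
        TSocket tsocket = new TSocket("localhost", 6627);
        TFramedTransport tTransport = new TFramedTransport(tsocket);
        TBinaryProtocol tBinaryProtocol = new TBinaryProtocol(tTransport);
        Nimbus.Client client = new Nimbus.Client(tBinaryProtocol);
        String topologyId = "test-1-234232567";
        try {
            tTransport.open();
            ClusterSummary clusterSummary = client.getClusterInfo();
            StormTopology stormTopology = client.getTopology(topologyId);
            TopologyInfo topologyInfo = client.getTopologyInfo(topologyId);
            List<ExecutorSummary> executorSummaries = topologyInfo.get_executors();
            List<TopologySummary> topologies = clusterSummary.get_topologies();
            // Print, per executor, the number of entries in its "emitted" stats map.
            for (ExecutorSummary executorSummary : executorSummaries) {
                String id = executorSummary.get_component_id();
                ExecutorInfo executorInfo = executorSummary.get_executor_info();
                ExecutorStats executorStats = executorSummary.get_stats();
                System.out.println("executorSummary :: " + id + " emit size :: " + executorStats.get_emitted_size());
            }
        } catch (TTransportException e) {
            e.printStackTrace();
        } catch (NotAliveException e) { // caught before TException, which may be its superclass
            e.printStackTrace();
        } catch (TException e) {
            e.printStackTrace();
        } finally {
            tTransport.close(); // always release the connection to Nimbus
        }
    }
}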
]]>
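The way messages are handled can be customized in various ways:
So what should you do when a particular kind of message has to be processed?
When a TOPOLOGY is deployed to Storm, Storm looks up the number of WORKERs in the configuration object and starts that many JVMs. It then creates threads for each step according to that step's configured NUMTASKS, and instantiates as many objects as each step is configured for. Next it starts a thread that repeatedly calls the SPOUT's nextTuple() method. Whenever that method emits a result, another thread is started, and in that thread the result is passed as an argument to the execute method of the next object.
If at that point a further BOLT step needs to run, another thread is likewise taken to run that BOLT's method. The number of threads started never exceeds NUMTASKS. Strictly speaking, Storm sizes the thread count from each component's parallelism hint (its executors), and by default runs one task per executor.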
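The threading model described above can be sketched in plain Java. This is a simplified illustration under stated assumptions, not Storm's implementation: `MiniTopology`, `Spout`, and `Bolt` are hypothetical stand-ins, the calling thread plays the spout thread, a fixed-size pool caps the bolt threads (the role NUMTASKS plays in the description), and the spout returns null to end the demo, whereas a real spout is polled indefinitely.

```java
import java.util.Arrays;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a spout thread feeding a bounded pool of bolt threads. */
final class MiniTopology {
    /** Spout stand-in: returns the next tuple, or null when the demo source is exhausted. */
    interface Spout { String nextTuple(); }

    /** Bolt stand-in: processes a single tuple. */
    interface Bolt { void execute(String tuple); }

    /** Polls the spout on the calling thread and runs the bolt on at most maxThreads threads. */
    static void run(Spout spout, Bolt bolt, int maxThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        String tuple;
        while ((tuple = spout.nextTuple()) != null) {
            final String current = tuple;
            // Each emitted tuple is handed to a pool thread that calls the bolt.
            pool.submit(() -> bolt.execute(current));
        }
        pool.shutdown();
        try {
            // Wait for the queued bolt work to finish before returning.
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        Queue<String> source = new ConcurrentLinkedQueue<>(Arrays.asList("a", "b", "c"));
        Queue<String> seen = new ConcurrentLinkedQueue<>();
        run(source::poll, t -> seen.add(t.toUpperCase()), 2);
        System.out.println(seen.size() + " tuples processed");
    }
}
```

The fixed-size pool is the design point: no matter how many tuples the spout emits, the bolt never runs on more than `maxThreads` threads at once.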
]]>
TOPOLOGY_WORKERS (set with setNumWorkers) specifies how many processes you want allocated around the cluster to execute the topology. Each component in the topology executes as one or more threads. The number of threads allocated to a given component is configured through the setBolt and setSpout methods. Those threads exist within worker processes. Each worker process contains some number of threads for some number of components. For instance, you may have 300 threads specified across all your components and 50 worker processes specified in your config. Each worker process will then execute 6 threads, each of which could belong to a different component. You tune the performance of Storm topologies by tweaking the parallelism for each component and the number of worker processes those threads should run within.

TOPOLOGY_DEBUG (set with setDebug), when set to true, tells Storm to log every message emitted by a component. This is useful in local mode when testing topologies, but you probably want to keep it turned off when running topologies on the cluster.

There are many other configurations you can set for the topology. The various configurations are detailed in the Javadoc for Config.
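The 300-thread, 50-worker arithmetic above can be checked with a short sketch. It assumes the threads are dealt out evenly, round-robin, across the workers; that is an illustrative simplification rather than Storm's scheduler code, and `WorkerLayout` is a hypothetical helper name.

```java
/** Hypothetical helper: spreads a topology's threads evenly over its worker processes. */
final class WorkerLayout {
    /** Returns the thread count of each worker when threads are assigned round-robin. */
    static int[] threadsPerWorker(int totalThreads, int numWorkers) {
        if (numWorkers <= 0 || totalThreads < 0) {
            throw new IllegalArgumentException("need numWorkers > 0 and totalThreads >= 0");
        }
        int[] counts = new int[numWorkers];
        for (int thread = 0; thread < totalThreads; thread++) {
            counts[thread % numWorkers]++; // thread i goes to worker i mod numWorkers
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = threadsPerWorker(300, 50);
        System.out.println(counts.length + " workers, " + counts[0] + " threads each");
    }
}
```

When the thread count does not divide evenly, the first few workers simply carry one extra thread, which is why changing either number shifts the load per JVM.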
There are a variety of configurations you can set per topology; the full list is in the Javadoc for Config. The ones prefixed with "TOPOLOGY" can be overridden on a topology-specific basis (the others are cluster configurations and cannot be overridden). Here are some common ones that are set for a topology: