    paulwong

    Reading and Writing Large XML Documents

    I am using JAXB and I have a large set of data which I have to marshal into XML. Since marshalling the whole thing into XML in a single step will use most of the memory, I want to split it into parts and write to the XML file incrementally.

    For example, if my generated output XML should look like this:
    <Employees>
    <employee>......</employee>
    <employee>.....</employee>
    <employee>.....</employee>
    <employee>.....</employee>
    ..
    ...
    ..
    </Employees>

    I would like to write the <employee> sections to the file separately instead of writing the whole thing at once. I am retrieving the employee details from the database and converting them to XML. There are almost 8 lakh (800,000) records, so marshalling the whole thing in a single step will use up my memory. How can I do it?


    Use the StAX API (XMLStreamWriter) as the underlying XML processor:
    write the <Employees> tag with it, then pass the XMLStreamWriter to a
    JAXB Marshaller and marshal employee by employee.
    This is the pattern I use; it works equally well for unmarshalling.
    Not sure whether this is in the FAQ, but it probably should be.
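The streaming pattern described above can be sketched with plain StAX from the JDK. The element names and the loop payload below are placeholders: in the real code each iteration would fetch one employee from the database and call marshaller.marshal(employee, writer) with the marshaller's JAXB_FRAGMENT property set to true (so it does not emit an XML declaration per record); plain StAX writes stand in for that here.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class IncrementalXmlWriter {
    public static void main(String[] args) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer =
            XMLOutputFactory.newInstance().createXMLStreamWriter(out);

        writer.writeStartDocument();
        writer.writeStartElement("Employees"); // root tag written exactly once

        // Placeholder loop: in the real pattern each record comes from the
        // database and is written via marshaller.marshal(employee, writer),
        // so only one record is held in memory at a time.
        for (int i = 1; i <= 3; i++) {
            writer.writeStartElement("employee");
            writer.writeCharacters("record-" + i);
            writer.writeEndElement();
        }

        writer.writeEndElement(); // </Employees>
        writer.writeEndDocument();
        writer.close();

        System.out.println(out.toString());
    }
}
```

Because the writer streams to the underlying output, replacing the StringWriter with a FileWriter keeps memory usage flat regardless of record count.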

    posted @ 2013-04-12 19:18 paulwong | reads (271) | comments (0)

    Passing Command-Line Values into Pig

    http://wiki.apache.org/pig/ParameterSubstitution


    %pig -param input=/user/paul/sample.txt -param output=/user/paul/output/


    Retrieve it inside the Pig script (the parameter is referenced as $input and must be quoted):
    records = LOAD '$input';

    posted @ 2013-04-10 15:32 paulwong | reads (347) | comments (0)

    Per-Group Percentage Calculations in Pig

    http://stackoverflow.com/questions/15318785/pig-calculating-percentage-of-total-for-a-field

    http://stackoverflow.com/questions/13476642/calculating-percentage-in-a-pig-query

    posted @ 2013-04-10 14:13 paulwong | reads (396) | comments (0)

    CombinedLogLoader

    A LOAD function for Pig that applies a regular expression to the records while the data is being loaded.

    /*
     * Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the
     * NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF
     * licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file
     * except in compliance with the License. You may obtain a copy of the License at
     *
     * http://www.apache.org/licenses/LICENSE-2.0
     *
     * Unless required by applicable law or agreed to in writing, software distributed under the License is
     * distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
     * See the License for the specific language governing permissions and limitations under the License.
     */

    package org.apache.pig.piggybank.storage.apachelog;

    import java.util.regex.Pattern;

    import org.apache.pig.piggybank.storage.RegExLoader;

    /**
     * CombinedLogLoader is used to load logs based on Apache's combined log format, based on a format like
     *
     * LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
     *
     * The log filename ends up being access_log from a line like
     *
     * CustomLog logs/combined_log combined
     *
     * Example:
     *
     * raw = LOAD 'combined_log' USING org.apache.pig.piggybank.storage.apachelog.CombinedLogLoader AS
     * (remoteAddr, remoteLogname, user, time, method, uri, proto, status, bytes, referer, userAgent);
     */
    public class CombinedLogLoader extends RegExLoader {
        // 1.2.3.4 - - [30/Sep/2008:15:07:53 -0400] "GET / HTTP/1.1" 200 3190 "-"
        // "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_4; en-us) AppleWebKit/525.18 (KHTML, like Gecko) Version/3.1.2 Safari/525.20.1"
        private final static Pattern combinedLogPattern = Pattern
            .compile("^(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+.(\\S+\\s+\\S+).\\s+\"(\\S+)\\s+(.+?)\\s+(HTTP[^\"]+)\"\\s+(\\S+)\\s+(\\S+)\\s+\"([^\"]*)\"\\s+\"(.*)\"$");

        public Pattern getPattern() {
            return combinedLogPattern;
        }
    }
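To see what the eleven capture groups pull out of a combined-format line, here is a standalone sketch running the same regex against the sample line from the comment above (user-agent shortened for brevity; the group-to-field mapping follows the AS clause in the javadoc example):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CombinedLogPatternDemo {
    public static void main(String[] args) {
        // Same regex as CombinedLogLoader.
        Pattern p = Pattern.compile(
            "^(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+.(\\S+\\s+\\S+).\\s+\"(\\S+)\\s+(.+?)\\s+(HTTP[^\"]+)\"\\s+(\\S+)\\s+(\\S+)\\s+\"([^\"]*)\"\\s+\"(.*)\"$");
        String line = "1.2.3.4 - - [30/Sep/2008:15:07:53 -0400] "
            + "\"GET / HTTP/1.1\" 200 3190 \"-\" \"Mozilla/5.0\"";
        Matcher m = p.matcher(line);
        if (m.matches()) {
            System.out.println("remoteAddr=" + m.group(1)); // 1.2.3.4
            System.out.println("time=" + m.group(4));       // 30/Sep/2008:15:07:53 -0400
            System.out.println("status=" + m.group(8));     // 200
            System.out.println("bytes=" + m.group(9));      // 3190
        }
    }
}
```

Note the single `.` before and after the time group: it matches the literal `[` and `]` brackets around the timestamp.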

    posted @ 2013-04-08 11:28 paulwong | reads (283) | comments (0)

    Analyzing Apache logs with Pig



    Analyzing log files, churning them, and extracting meaningful information is a classic use case for Hadoop. We don't have to write MapReduce programs for these analyses; instead we can use tools like Pig and Hive. Here is a starting point for the analysis, using Pig for Apache log analysis. Pig has built-in libraries that help load Apache log files into Pig and clean up string values from raw log files. These functions live in piggybank.jar, usually found under the pig/contrib/piggybank/java/ directory. As the first step we need to register this jar with our Pig session; only then can we use its functions in our Pig Latin.
    1. Register the PiggyBank jar
    REGISTER /usr/lib/pig/contrib/piggybank/java/piggybank.jar;
    Once the jar is registered we need to define a few functions to use in our Pig Latin. For any basic Apache log analysis we need a loader that loads the log files into Pig in a column-oriented format; we can create an Apache log loader as:
    2. Define a log loader
    DEFINE ApacheCommonLogLoader org.apache.pig.piggybank.storage.apachelog.CommonLogLoader();
    (Piggy Bank has other log loaders as well.)
    In Apache log files the default date format is 'dd/MMM/yyyy:HH:mm:ss Z', but such a date won't help us much in log analysis; we may have to extract the date without the timestamp. For that we use DateExtractor():
    3. Define a date extractor
    DEFINE DayExtractor org.apache.pig.piggybank.evaluation.util.apachelogparser.DateExtractor('yyyy-MM-dd');
    Once we have the required functions we first load the log file into Pig.
    4. Load the Apache log file into Pig
    --load the log files from hdfs into pig using CommonLogLoader
    logs = LOAD '/userdata/bejoys/pig/p01/access.log.2011-01-01' USING ApacheCommonLogLoader AS (ip_address, rfc, userId, dt, request, serverstatus, returnobject, referersite, clientbrowser);
    Now we are ready to dive into the actual log analysis. There is usually more than one piece of information to extract from a log; we'll look at a few common requirements here.
    Note: you need to register the jar, define the classes to be used, and load the log files into Pig before trying any of the Pig Latin below.
    Requirement 1: Find unique hits per day
    Pig Latin
    --Extract the day alone and group records by day
    grpd = GROUP logs BY DayExtractor(dt);
    --Loop through each group to get the number of unique userIds
    cntd = FOREACH grpd
    {
        tempId = logs.userId;
        uniqueUserId = DISTINCT tempId;
        GENERATE group AS day, COUNT(uniqueUserId) AS cnt;
    }
    --Sort the processed records by number of unique user ids, descending
    srtd = ORDER cntd BY cnt DESC;
    --Store the final result into an hdfs directory
    STORE srtd INTO '/userdata/bejoys/pig/ApacheLogResult1';
    Requirement 2: Find unique hits to websites (IPs) per day
    Pig Latin
    --Extract the day alone and group records by day and ip address
    grpd = GROUP logs BY (DayExtractor(dt), ip_address);
    --Loop through each group to get the number of unique userIds
    cntd = FOREACH grpd
    {
        tempId = logs.userId;
        uniqueUserId = DISTINCT tempId;
        GENERATE FLATTEN(group) AS (day, ip), COUNT(uniqueUserId) AS cnt;
    }
    --Sort the processed records by number of unique user ids, descending
    srtd = ORDER cntd BY cnt DESC;
    --Store the final result into an hdfs directory
    STORE srtd INTO '/userdata/bejoys/pig/ApacheLogResult2';
    Note: when you use Pig Latin in the grunt shell, keep a few things in mind:
    1. When you enter a Pig statement in grunt and press Enter, only a semantic check is performed; no execution is triggered.
    2. The Pig statements are executed only after a STORE command is submitted, i.e. the MapReduce jobs are triggered only when STORE is submitted.
    3. You don't have to load the log files into Pig again and again; once loaded, the same relation can be used for all related operations in that session. Once you leave the grunt shell the loaded files are lost, and you'd have to perform the register and load steps all over again.

    posted @ 2013-04-08 02:06 paulwong | reads (356) | comments (0)

    A Short Note on Pig

    What is Pig?
    A dataflow language: you describe how the data should flow, and an engine turns that description into MapReduce jobs that run on Hadoop.
    Pig vs. SQL
    The two are similar in that you execute one or more statements and results come out.
    The difference: SQL requires the data to be loaded into tables first, and it hides the intermediate steps; you send a SQL statement and the result comes back.
    Pig needs no loading into tables, but you must design the intermediate process that leads to the result, step by step.

    posted @ 2013-04-05 21:33 paulwong | reads (359) | comments (0)

    Pig Resources

    Hadoop Pig學(xué)習(xí)筆記(一) 各種SQL在PIG中實(shí)現(xiàn)
    http://guoyunsky.iteye.com/blog/1317084

    http://guoyunsky.iteye.com/category/196632

    Hadoop學(xué)習(xí)筆記(9) Pig簡(jiǎn)介
    http://www.distream.org/?p=385


    [hadoop系列]Pig的安裝和簡(jiǎn)單示例
    http://blog.csdn.net/inkfish/article/details/5205999


    Hadoop and Pig for Large-Scale Web Log Analysis
    http://www.devx.com/Java/Article/48063


    Pig實(shí)戰(zhàn)
    http://www.cnblogs.com/xuqiang/archive/2011/06/06/2073601.html


    [原創(chuàng)]Apache Pig中文教程(進(jìn)階)
    http://www.codelast.com/?p=4249


    基于hadoop平臺(tái)的pig語(yǔ)言對(duì)apache日志系統(tǒng)的分析
    http://goodluck-wgw.iteye.com/blog/1107503


    !!Pig語(yǔ)言
    http://hi.baidu.com/cpuramdisk/item/a2980b78caacfa3d71442318


    Embedding Pig In Java Programs
    http://wiki.apache.org/pig/EmbeddedPig


    一個(gè)pig事例(REGEX_EXTRACT_ALL, DBStorage,結(jié)果存進(jìn)數(shù)據(jù)庫(kù))
    http://www.myexception.cn/database/1256233.html


    Programming Pig
    http://ofps.oreilly.com/titles/9781449302641/index.html


    [原創(chuàng)]Apache Pig的一些基礎(chǔ)概念及用法總結(jié)(1)
    http://www.codelast.com/?p=3621


    !PIG手冊(cè)
    http://pig.apache.org/docs/r0.11.1/func.html#built-in-functions

    posted @ 2013-04-05 18:19 paulwong | reads (374) | comments (0)

    Non-Blocking Sockets with NIO

    With classic server socket programming, the server blocks a thread for each connection it serves, unless you use multiple threads.

    NIO can handle multiple connections with a single thread. It is event-driven: when an event occurs, the registered handler is notified to process it.

    1) Server code
    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.CharBuffer;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.ServerSocketChannel;
    import java.nio.channels.SocketChannel;
    import java.nio.charset.Charset;
    import java.nio.charset.CharsetDecoder;
    import java.nio.charset.CharsetEncoder;
    import java.util.Iterator;

    /**
     * @author Jeff
     */
    public class HelloWorldServer {

        static int BLOCK = 1024;
        static String name = "";
        protected Selector selector;
        protected ByteBuffer clientBuffer = ByteBuffer.allocate(BLOCK);
        protected CharsetDecoder decoder;
        static CharsetEncoder encoder = Charset.forName("GB2312").newEncoder();

        public HelloWorldServer(int port) throws IOException {
            selector = this.getSelector(port);
            Charset charset = Charset.forName("GB2312");
            decoder = charset.newDecoder();
        }

        // Obtain a Selector with the server channel registered for accepts
        protected Selector getSelector(int port) throws IOException {
            ServerSocketChannel server = ServerSocketChannel.open();
            Selector sel = Selector.open();
            server.socket().bind(new InetSocketAddress(port));
            server.configureBlocking(false);
            server.register(sel, SelectionKey.OP_ACCEPT);
            return sel;
        }

        // Listen on the port
        public void listen() {
            try {
                for (;;) {
                    selector.select();
                    Iterator iter = selector.selectedKeys().iterator();
                    while (iter.hasNext()) {
                        SelectionKey key = (SelectionKey) iter.next();
                        iter.remove();
                        process(key);
                    }
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        // Handle an event
        protected void process(SelectionKey key) throws IOException {
            if (key.isAcceptable()) { // accept a connection
                ServerSocketChannel server = (ServerSocketChannel) key.channel();
                SocketChannel channel = server.accept();
                // switch the accepted channel to non-blocking mode
                channel.configureBlocking(false);
                channel.register(selector, SelectionKey.OP_READ);
            } else if (key.isReadable()) { // read a message
                SocketChannel channel = (SocketChannel) key.channel();
                int count = channel.read(clientBuffer);
                if (count > 0) {
                    clientBuffer.flip();
                    CharBuffer charBuffer = decoder.decode(clientBuffer);
                    name = charBuffer.toString();
                    // System.out.println(name);
                    SelectionKey sKey = channel.register(selector,
                            SelectionKey.OP_WRITE);
                    sKey.attach(name);
                } else {
                    channel.close();
                }
                clientBuffer.clear();
            } else if (key.isWritable()) { // write a reply
                SocketChannel channel = (SocketChannel) key.channel();
                String name = (String) key.attachment();
                ByteBuffer block = encoder.encode(CharBuffer
                        .wrap("Hello !" + name));
                channel.write(block);
                //channel.close();
            }
        }

        public static void main(String[] args) {
            int port = 8888;
            try {
                HelloWorldServer server = new HelloWorldServer(port);
                System.out.println("listening on " + port);
                server.listen();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }


    The server reads the message sent by the client and returns a reply.

    2) Client code
    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.CharBuffer;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.SocketChannel;
    import java.nio.charset.Charset;
    import java.nio.charset.CharsetEncoder;
    import java.util.Iterator;

    /**
     * @author Jeff
     */
    public class HelloWorldClient {

        static int SIZE = 10;
        static InetSocketAddress ip = new InetSocketAddress("localhost", 8888);
        static CharsetEncoder encoder = Charset.forName("GB2312").newEncoder();

        static class Message implements Runnable {
            protected String name;
            String msg = "";

            public Message(String index) {
                this.name = index;
            }

            public void run() {
                try {
                    long start = System.currentTimeMillis();
                    // open a socket channel
                    SocketChannel client = SocketChannel.open();
                    // switch to non-blocking mode
                    client.configureBlocking(false);
                    // open a selector
                    Selector selector = Selector.open();
                    // register interest in the connect event
                    client.register(selector, SelectionKey.OP_CONNECT);
                    // connect to the server
                    client.connect(ip);
                    // allocate the read buffer
                    ByteBuffer buffer = ByteBuffer.allocate(8 * 1024);
                    int total = 0;

                    _FOR: for (;;) {
                        selector.select();
                        Iterator iter = selector.selectedKeys().iterator();
                        while (iter.hasNext()) {
                            SelectionKey key = (SelectionKey) iter.next();
                            iter.remove();
                            if (key.isConnectable()) {
                                SocketChannel channel = (SocketChannel) key
                                        .channel();
                                if (channel.isConnectionPending())
                                    channel.finishConnect();
                                channel.write(encoder
                                        .encode(CharBuffer.wrap(name)));
                                channel.register(selector, SelectionKey.OP_READ);
                            } else if (key.isReadable()) {
                                SocketChannel channel = (SocketChannel) key
                                        .channel();
                                int count = channel.read(buffer);
                                if (count > 0) {
                                    total += count;
                                    buffer.flip();
                                    while (buffer.remaining() > 0) {
                                        byte b = buffer.get();
                                        msg += (char) b;
                                    }
                                    buffer.clear();
                                } else {
                                    client.close();
                                    break _FOR;
                                }
                            }
                        }
                    }
                    double last = (System.currentTimeMillis() - start) * 1.0 / 1000;
                    System.out.println(msg + " used time :" + last + "s.");
                    msg = "";
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }

        public static void main(String[] args) throws IOException {
            String names[] = new String[SIZE];
            for (int index = 0; index < SIZE; index++) {
                names[index] = "jeff[" + index + "]";
                new Thread(new Message(names[index])).start();
            }
        }
    }




    posted @ 2013-03-31 13:38 paulwong | reads (357) | comments (0)

    CSS Selectors

    A complete tag is called an element; an element contains attribute names and attribute values.

    A selector works like a WHERE clause: the result is every element that satisfies the condition, possibly more than one.

    .class
    class attribute = "class": elements that have a class attribute whose value is class.

    a
    tag name = a: elements whose tag name is a.

    #id
    id attribute = "id": elements that have an id attribute whose value is id.

    el.class
    tag name = el AND class attribute = "class": elements with tag name el that also have a class attribute whose value is class.

    posted @ 2013-03-31 10:26 paulwong | reads (232) | comments (0)

    HttpClient Cookie Resources

    Get Cookie value and set cookie value
    http://www.java2s.com/Code/Java/Apache-Common/GetCookievalueandsetcookievalue.hm

    How can I get the cookies from HttpClient?
    http://stackoverflow.com/questions/8733758/how-can-i-get-the-cookies-from-httpclient

    HttpClient 4.x how to use cookies?
    http://stackoverflow.com/questions/8795911/httpclient-4-x-how-to-use-cookies

    Apache HttpClient 4.0.3 - how do I set cookie with sessionID for POST request
    http://stackoverflow.com/questions/4166129/apache-httpclient-4-0-3-how-do-i-set-cookie-with-sessionid-for-post-request

    !!HttpClient Cookies
    http://blog.csdn.net/mgoann/article/details/4057064

    Chapter 3. HTTP state management
    http://hc.apache.org/httpcomponents-client-ga/tutorial/html/statemgmt.html

    !!!Library dependencies of contact-list: commons-httpclient
    http://flyerhzm.github.com/2009/08/23/contact-list-library-dependencies-of-commons-httpclient/

    posted @ 2013-03-31 09:18 paulwong | reads (294) | comments (0)

    僅列出標(biāo)題
    共115頁(yè): First 上一頁(yè) 67 68 69 70 71 72 73 74 75 下一頁(yè) Last 
    主站蜘蛛池模板: 任你躁在线精品免费| 2022国内精品免费福利视频| 久久夜色精品国产噜噜亚洲a| 亚洲av午夜精品无码专区| 亚洲精品国产av成拍色拍| 一级毛片a免费播放王色| 黄色网站软件app在线观看免费| 好大好深好猛好爽视频免费| 亚洲日韩中文在线精品第一 | 国产精品永久免费| 女性自慰aⅴ片高清免费| 久久亚洲精品成人无码网站| 国产亚洲精品成人AA片| 人妻无码一区二区三区免费| 免费毛片在线视频| 亚洲av中文无码乱人伦在线咪咕 | 亚洲导航深夜福利| 美女尿口扒开图片免费| 在线看片免费人成视频福利| avtt亚洲天堂| 亚洲精品午夜在线观看| 国产亚洲一卡2卡3卡4卡新区| 99久久免费国产精品热| 欧美好看的免费电影在线观看| 亚洲国产精品成人网址天堂 | 国产精品亚洲va在线观看| 一级毛片**不卡免费播| 免费网站看v片在线香蕉| 亚洲一线产区二线产区精华| 午夜精品射精入后重之免费观看 | 亚洲成av人片一区二区三区| 亚洲精品永久在线观看| 久久免费动漫品精老司机 | 亚洲一区免费在线观看| 99视频精品全部免费观看| 亚洲动漫精品无码av天堂| 亚洲AV成人片无码网站| 日本xxxx色视频在线观看免费| 亚洲国产a∨无码中文777| 99久久精品毛片免费播放| 亚洲午夜久久久久久久久电影网 |