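While building an on-site search feature, I needed to add support for a range of file types so that their contents could be indexed. Parsing the different document types relies on several open-source packages: dom4j-1.6.1.jar, FontBox-0.1.0-dev.jar, htmllexer.jar, htmlparser.jar, PDFBox-0.7.3.jar, poi-3.5-FINAL-20090928.jar and poi-scratchpad-3.5-FINAL-20090928.jar. These packages make it easy to parse many kinds of unstructured text.
Download the jar files here: http://www.ziddu.com/download/7017588/devlib.rar.html
The code is as follows: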
package com.ducklyl;

import java.io.File;
import java.io.FileInputStream;
import java.util.Iterator;
import org.apache.poi.hslf.model.Slide;
import org.apache.poi.hslf.model.TextRun;
import org.apache.poi.hslf.usermodel.SlideShow;
import org.apache.poi.hssf.usermodel.HSSFCell;
import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.Range;
import org.dom4j.Document;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;
import org.htmlparser.Parser;
import org.htmlparser.filters.*;
import org.htmlparser.*;
import org.htmlparser.nodes.TextNode;
import org.htmlparser.util.*;
import org.pdfbox.pdfparser.PDFParser;
import org.pdfbox.pdmodel.PDDocument;
import org.pdfbox.util.PDFTextStripper;

public class HandleFile {

    public static void main(String args[]){
        String str="e:\\test.HTML";
        System.out.println(handleFile(str));
    }

    public static String handleFile(String filename){
        String result="";
        String fileType=filename.substring(filename.lastIndexOf(".")+1, filename.length());
        if(fileType.equalsIgnoreCase("pdf"))
            result=handlePdf(filename);
        else if(fileType.equalsIgnoreCase("xls"))
            result=handleExcel(filename);
        else if(fileType.equalsIgnoreCase("doc"))
            result=handleDoc(filename);
        else if(fileType.equalsIgnoreCase("xml"))
            result=handleXml(filename);
        else if(fileType.equalsIgnoreCase("ppt"))
            result=handlePPT(filename);
        else if(fileType.equalsIgnoreCase("htm")||fileType.equalsIgnoreCase("html"))
            result=handleHtml(filename);
        return result;
    }

    /**
     * Parse an HTML file and return its text content.
     * @param filename path of the HTML file
     * @return the extracted text, or "" if the file is missing or cannot be parsed
     */
    public static String handleHtml(String filename){
        StringBuffer content=new StringBuffer();
        try{
            File file=new File(filename);
            if(!file.exists()) return content.toString();
            Parser parser=new Parser(filename);
            // Assumes the file is UTF-8 encoded.
            parser.setEncoding("UTF-8");
            NodeFilter textFilter=new NodeClassFilter(TextNode.class);
            NodeList nodes=parser.extractAllNodesThatMatch(textFilter);
            for(int i=0;i<nodes.size();i++){
                TextNode textnode=(TextNode)nodes.elementAt(i);
                String line=textnode.toPlainTextString().trim();
                if(line.equals("")) continue;
                // Separate text nodes so adjacent words are not glued together.
                content.append(line).append("\n");
            }
        }catch(Exception e){
            e.printStackTrace();
        }
        return content.toString();
    }

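    /**
     * Parse a PowerPoint (.ppt) file and return its text content.
     * @param filename path of the PPT file
     * @return the extracted text, or "" if the file is missing or cannot be parsed
     */
    public static String handlePPT(String filename){
        StringBuffer content = new StringBuffer("");
        FileInputStream instream=null;
        try{
            File file=new File(filename);
            if(!file.exists()) {
                return content.toString();
            }
            instream=new FileInputStream(file);
            SlideShow ppt = new SlideShow(instream);
            Slide[] slides = ppt.getSlides();
            for(int i=0;i<slides.length;i++){
                // The text runs hold the slide's text, including its title, so
                // getTitle() is not appended separately (it would repeat the
                // title, or add "null" for a slide without one).
                TextRun[] t = slides[i].getTextRuns();
                for(int j=0;j<t.length;j++){
                    content.append(t[j].getText()).append("\n");
                }
            }
        }catch(Exception e){
            e.printStackTrace();
        }finally{
            // Close the stream so that indexing many files does not leak file handles.
            if(instream!=null){
                try{ instream.close(); }catch(Exception e){ e.printStackTrace(); }
            }
        }
        return content.toString();
    }
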
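    /**
     * Parse an XML file and return its text content.
     * @param filename path of the XML file
     * @return the extracted text, or "" if the file is missing or cannot be parsed
     */
    public static String handleXml(String filename){
        StringBuffer content=new StringBuffer();
        try{
            File file=new File(filename);
            if(!file.exists()) {
                return content.toString();
            }
            SAXReader saxReader = new SAXReader();
            Document document = saxReader.read(file);
            Element root = document.getRootElement() ;
            // Walk the root's direct children; getStringValue() also includes
            // the text of any elements nested inside each child.
            Iterator iter=root.elementIterator() ;
            while(iter.hasNext()){
                Element element=(Element)iter.next();
                String value=element.getStringValue();
                if(!value.trim().equals("")) content.append(value).append("\n");
            }
        }catch(Exception e){
            e.printStackTrace();
        }
        return content.toString();
    }
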
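    /**
     * Parse a Word (.doc) file and return its text content.
     * @param filename path of the DOC file
     * @return the extracted text, or "" if the file is missing or cannot be parsed
     */
    public static String handleDoc(String filename){
        StringBuffer content=new StringBuffer();
        FileInputStream instream=null;
        try{
            File file=new File(filename);
            if(!file.exists()) {
                return content.toString();
            }
            instream=new FileInputStream(file);
            HWPFDocument doc=new HWPFDocument(instream);
            Range range=doc.getRange();
            // Collect the text paragraph by paragraph, one paragraph per line.
            for(int i=0;i<range.numParagraphs();i++){
                Paragraph p=range.getParagraph(i);
                content.append(p.text().trim()).append("\n");
            }
        }catch(Exception e){
            e.printStackTrace();
        }finally{
            // Close the stream so that indexing many files does not leak file handles.
            if(instream!=null){
                try{ instream.close(); }catch(Exception e){ e.printStackTrace(); }
            }
        }
        return content.toString();
    }
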
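    /**
     * Parse a PDF file and return its text content.
     * @param filename path of the PDF file
     * @return the extracted text, or "" if the file is missing or cannot be parsed
     */
    public static String handlePdf(String filename){
        String contenttxt="";
        FileInputStream instream=null;
        PDDocument pdfdocument=null;
        try{
            File file=new File(filename);
            if(!file.exists()){
                return contenttxt;
            }
            instream=new FileInputStream(file);
            PDFParser parser=new PDFParser(instream);
            parser.parse();
            pdfdocument=parser.getPDDocument();
            PDFTextStripper pdfstripper=new PDFTextStripper();
            contenttxt=pdfstripper.getText(pdfdocument);
        }catch(Exception e){
            e.printStackTrace();
        }finally{
            // Release the document and the stream even if parsing fails.
            if(pdfdocument!=null){
                try{ pdfdocument.close(); }catch(Exception e){ e.printStackTrace(); }
            }
            if(instream!=null){
                try{ instream.close(); }catch(Exception e){ e.printStackTrace(); }
            }
        }
        return contenttxt;
    }
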
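    /**
     * Parse an Excel (.xls) file and return its text content.
     * @param filename path of the XLS file
     * @return the extracted text, or "" if the file is missing or cannot be parsed
     */
    public static String handleExcel(String filename){
        StringBuffer content=new StringBuffer();
        FileInputStream instream=null;
        try{
            File file=new File(filename);
            if(!file.exists()) {
                return content.toString();
            }
            instream=new FileInputStream(file);
            HSSFWorkbook workbook=new HSSFWorkbook(instream);
            for(int i=0;i<workbook.getNumberOfSheets();i++){
                HSSFSheet sheet=workbook.getSheetAt(i);
                if(sheet!=null){
                    // getLastRowNum() is the zero-based index of the last row, hence <=.
                    for(int m=0;m<=sheet.getLastRowNum();m++){
                        HSSFRow row=sheet.getRow(m);
                        if(row==null) continue; // skip an empty row instead of stopping at the first gap
                        // getLastCellNum() is already the last cell index plus one, hence <.
                        for(int n=0;n<row.getLastCellNum();n++){
                            HSSFCell cell=row.getCell(n);
                            if(cell==null) continue; // skip an empty cell
                            switch(cell.getCellType()){
                            case HSSFCell.CELL_TYPE_NUMERIC:
                                content.append(cell.getNumericCellValue()).append(" ");
                                break;
                            case HSSFCell.CELL_TYPE_STRING:
                                content.append(cell.getStringCellValue()).append(" ");
                                break;
                            default:
                                // blank, formula, boolean and error cells are ignored
                                break;
                            }
                        }
                        content.append("\n");
                    }
                }
                content.append("\n");
            }
        }catch(Exception e){
            e.printStackTrace();
        }finally{
            // Close the stream so that indexing many files does not leak file handles.
            if(instream!=null){
                try{ instream.close(); }catch(Exception e){ e.printStackTrace(); }
            }
        }
        return content.toString();
    }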
}
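
A note on the dispatch in handleFile: it treats everything after the last dot as the file type. For a path such as e:\my.docs\readme that gives "docs\readme", which is harmless here only because no branch matches it. A small helper can make the intent explicit. The following is a minimal sketch using only the standard library; FileTypes and extensionOf are hypothetical names that are not part of the original class.

```java
import java.util.Locale;

// Hypothetical helper, not part of the original HandleFile class.
final class FileTypes {
    private FileTypes() {}

    /** Returns the lower-cased extension of the file name, or "" if there is none. */
    static String extensionOf(String path) {
        // Look only at the last path segment so a dot in a directory name is ignored.
        int sep = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        String name = path.substring(sep + 1);
        int dot = name.lastIndexOf('.');
        // dot <= 0 covers "no dot" and dot-files such as ".profile";
        // a trailing dot means there is no extension either.
        if (dot <= 0 || dot == name.length() - 1) {
            return "";
        }
        return name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }
}
```

Because the result is already lower-cased, the comparisons in handleFile could then use plain equals instead of equalsIgnoreCase.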
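If you would rather not copy and paste, you can download the source code directly: http://www.ziddu.com/download/7017614/src.txt.html
The code above is fairly simple, so I won't explain it further; I hope it helps anyone who needs it. It is only a simple example, of course, so adapt it as needed for a real application. If you have other ideas, you are welcome to share them.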
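
One way such an adaptation might look is to replace the if/else chain with a lookup table, so that supporting a new format only means registering one more handler. This is a minimal sketch, not the original author's design: ExtractorRegistry, register and extract are hypothetical names, and the code assumes Java 8 or later for java.util.function.Function and lambdas.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical registry, not part of the original code:
// maps a file extension to a function that extracts text from a file path.
final class ExtractorRegistry {
    private final Map<String, Function<String, String>> extractors = new HashMap<>();

    /** Registers one extractor for one or more extensions (case-insensitive). */
    void register(Function<String, String> extractor, String... extensions) {
        for (String ext : extensions) {
            extractors.put(ext.toLowerCase(Locale.ROOT), extractor);
        }
    }

    /** Returns the extracted text, or "" when no extractor is registered for the file type. */
    String extract(String filename) {
        int dot = filename.lastIndexOf('.');
        String ext = dot < 0 ? "" : filename.substring(dot + 1).toLowerCase(Locale.ROOT);
        Function<String, String> extractor = extractors.get(ext);
        return extractor == null ? "" : extractor.apply(filename);
    }
}
```

With the HandleFile class on the classpath, the existing static methods fit this shape directly, for example registry.register(HandleFile::handlePdf, "pdf") and registry.register(HandleFile::handleHtml, "htm", "html").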
Please credit the source when reposting.
