    Original source: http://chemhack.com/cn/2008/11/faster-fingerprint-search-with-java-cdk/

    Rich Apodaca wrote a great series of posts named Fast Substructure Search Using Open Source Tools, providing details on substructure search with MySQL. However, MySQL's poor binary data operation functions limit the implementation of similar-structure search, which typically depends on calculating the Tanimoto coefficient. We are going to use Java and CDK to add this feature.
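
For readers unfamiliar with the Tanimoto coefficient: for two bit fingerprints A and B it is the number of bits set in both divided by the number of bits set in either. The following is a minimal sketch of that definition on plain java.util.BitSet objects. The class name TanimotoSketch is hypothetical and is not CDK's implementation, and returning 0 for two empty fingerprints is a choice made only for this sketch.

```java
import java.util.BitSet;

// Hypothetical helper, not part of CDK: Tanimoto coefficient of two bit fingerprints.
final class TanimotoSketch {
    private TanimotoSketch() {}

    /** Returns |A AND B| / |A OR B|; returns 0 when both fingerprints are empty (sketch-only choice). */
    static float tanimoto(BitSet a, BitSet b) {
        BitSet common = (BitSet) a.clone(); // clone so the inputs are not modified
        common.and(b);
        int both = common.cardinality();
        int either = a.cardinality() + b.cardinality() - both;
        return either == 0 ? 0f : (float) both / either;
    }

    public static void main(String[] args) {
        BitSet a = new BitSet();
        a.set(0); a.set(1); a.set(2);
        BitSet b = new BitSet();
        b.set(1); b.set(2); b.set(3);
        // 2 shared bits out of 4 distinct bits
        System.out.println(tanimoto(a, b)); // prints 0.5
    }
}
```

This bit counting is exactly the kind of operation that is awkward to express in SQL and cheap to do in Java.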

    As the default output of the CDK fingerprinter, java.util.BitSet, which implements the Serializable interface, is a perfect data format for fingerprint storage. Java itself provides several collections such as ArrayList, LinkedList and Vector in the package java.util. To provide web access to the search engine, the thread-unsafe ArrayList and LinkedList have to be ruled out. How about Vector? Once all the fingerprint data is prepared, the only collection operation we need for similarity search is iteration. No adds, no deletes. So a lightweight array is enough.
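
The reasoning above can be sketched as follows: fill a list on a single thread while loading, copy it into an array once, and afterwards let any number of request threads iterate over that array without modifying it. All names here (SnapshotSketch, load, countWithBit) and the dummy fingerprints are hypothetical and stand in for the real loading and search code. The sketch assumes nothing writes to the array or its BitSet elements after the worker threads start.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical illustration: build once, then share a read-only array between threads.
final class SnapshotSketch {
    private SnapshotSketch() {}

    // Loading phase (single thread): collect dummy fingerprints, then freeze them into an array.
    static BitSet[] load() {
        List<BitSet> loading = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            BitSet fp = new BitSet(16);
            fp.set(i % 16);
            loading.add(fp);
        }
        return loading.toArray(new BitSet[0]);
    }

    // Search phase: plain iteration, no adds and no deletes.
    static int countWithBit(BitSet[] store, int bit) {
        int n = 0;
        for (BitSet fp : store) {
            if (fp.get(bit)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) throws InterruptedException {
        final BitSet[] store = load();   // fully built before any worker starts
        final int[] counts = new int[4];
        Thread[] workers = new Thread[counts.length];
        for (int t = 0; t < workers.length; t++) {
            final int id = t;
            workers[id] = new Thread(() -> counts[id] = countWithBit(store, 3));
            workers[id].start();
        }
        for (Thread w : workers) {
            w.join();                    // wait so the results are visible here
        }
        for (int c : counts) {
            System.out.println(c);       // every worker reports the same count
        }
    }
}
```

Because the array is never written after loading, no synchronized collection such as Vector is needed for the search itself.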

    Most of the molecule information is stored in a MySQL database, so we map each fingerprint to the corresponding row in the data table. Here is the MolDFData class; it uses a long variable to store the primary key of the corresponding row.

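    import java.io.Serializable;
    import java.util.BitSet;

    // Pairs a fingerprint with the primary key of its row in the MySQL data table.
    public class MolDFData implements Serializable {
        private long id;             // primary key of the corresponding row
        private BitSet fingerprint;  // CDK fingerprint of the molecule
        public MolDFData(long id, BitSet fingerprint) {
            this.id = id;
            this.fingerprint = fingerprint;
        }
        public long getId() {
            return id;
        }
        public void setId(long id) {
            this.id = id;
        }
        public BitSet getFingerprint() {
            return fingerprint;
        }
        public void setFingerprint(BitSet fingerprint) {
            this.fingerprint = fingerprint;
        }
    }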

    This is how we store our fingerprints:

    private MolDFData[] arrayData;
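
Since the data class implements Serializable, one way to avoid recomputing fingerprints on every start is to write the whole array with Java object serialization and read it back later. The sketch below is an assumption about how that could look, not something the original post shows. FingerprintStoreSketch, toBytes and fromBytes are hypothetical names, the MolDFData class is a trimmed copy included only so the block compiles on its own, and an in-memory byte array stands in for a file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.BitSet;

// Trimmed copy of the article's data class so this sketch is self-contained.
class MolDFData implements Serializable {
    private static final long serialVersionUID = 1L;
    private final long id;
    private final BitSet fingerprint;

    MolDFData(long id, BitSet fingerprint) {
        this.id = id;
        this.fingerprint = fingerprint;
    }

    long getId() { return id; }
    BitSet getFingerprint() { return fingerprint; }
}

// Hypothetical helper: serializes the whole array in one call and restores it.
final class FingerprintStoreSketch {
    private FingerprintStoreSketch() {}

    static byte[] toBytes(MolDFData[] data) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(data); // arrays of Serializable elements are serializable
        }
        return buffer.toByteArray();
    }

    static MolDFData[] fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (MolDFData[]) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        BitSet fp = new BitSet();
        fp.set(5);
        MolDFData[] restored = fromBytes(toBytes(new MolDFData[] { new MolDFData(42L, fp) }));
        System.out.println(restored[0].getId() + " " + restored[0].getFingerprint());
    }
}
```

Swapping the byte array streams for FileOutputStream and FileInputStream would persist the same data to disk.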

    The similarity search itself is no big deal. Just calculate the Tanimoto coefficient; if it is greater than the minimum similarity you set, add the entry to the result.

        public List<SearchResultData> searchTanimoto(BitSet bt, float minSimilarity) {
            List<SearchResultData> resultList = new LinkedList<SearchResultData>();
            for (int i = 0; i < arrayData.length; i++) {
                MolDFData aListData = arrayData[i];
                try {
                    float coefficient = Tanimoto.calculate(aListData.getFingerprint(), bt);
                    if (coefficient > minSimilarity) {
                        resultList.add(new SearchResultData(aListData.getId(), coefficient));
                    }
                } catch (CDKException e) {
                    // Skip entries whose fingerprint cannot be compared with the query.
                }
            }
            // Sort once, after all hits are collected, rather than on every iteration.
            Collections.sort(resultList);
            return resultList;
        }

    Pretty ugly code? Maybe. But it really works, at an acceptable speed.
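
The post does not show SearchResultData, but Collections.sort requires it to be comparable. One possible shape is sketched below, under the assumption that hits should be ordered from most to least similar; the field names and the descending order are guesses, not the author's code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical result holder: row id plus its similarity to the query.
class SearchResultData implements Comparable<SearchResultData> {
    private final long id;
    private final float coefficient;

    SearchResultData(long id, float coefficient) {
        this.id = id;
        this.coefficient = coefficient;
    }

    long getId() { return id; }
    float getCoefficient() { return coefficient; }

    // Assumed ordering: higher coefficient sorts first.
    @Override
    public int compareTo(SearchResultData other) {
        return Float.compare(other.coefficient, this.coefficient);
    }

    public static void main(String[] args) {
        List<SearchResultData> hits = new ArrayList<>();
        hits.add(new SearchResultData(1L, 0.82f));
        hits.add(new SearchResultData(2L, 0.97f));
        hits.add(new SearchResultData(3L, 0.90f));
        Collections.sort(hits);
        for (SearchResultData hit : hits) {
            System.out.println(hit.getId() + " " + hit.getCoefficient()); // most similar first
        }
    }
}
```

The ids in the sorted list can then be used to fetch the full molecule records from MySQL.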

    Tests were done using the code below on a MacBook (Intel Core Duo 1.83 GHz, 2 GB RAM).

    long t3 = System.currentTimeMillis();
    List<SearchResultData> listResult = se.searchTanimoto(bs, 0.8f);
    long t4 = System.currentTimeMillis();
    System.out.println("Thread: Search done in " + (t4 - t3) + " ms.");

    On my database of 87,364 commercial compounds, the search takes 335 ms.

    posted on 2009-10-18 14:09 by 周銳, category: ChemistryJavaCDK