- 无标题文档
查看论文信息

中文题名:

 基于改进数据存储结构及编码方法的频繁子图查询算法研究    

姓名:

 贾鑫    

学号:

 1049721201874    

保密级别:

 公开    

论文语种:

 chi    

学科代码:

 070102    

学科名称:

 计算数学    

学生类型:

 硕士    

学位:

 理学硕士    

学校:

 武汉理工大学    

院系:

 理学院    

专业:

 计算数学    

研究方向:

 复杂系统的建模与仿真    

第一导师姓名:

 楚扬杰    

第一导师院系:

 武汉理工大学    

完成日期:

 2014-10-09    

答辩日期:

 2014-12-05    

中文关键词:

 图挖掘 ; 图查询 ; 频繁子图    

中文摘要:

近年来,随着数据库及其管理技术的迅速发展以及信息技术的应用与推广,数据挖据技术应运而生。它是从大量的模糊数据中筛选出有效的、隐含在其中的、可信的、对决策有用的知识和规律的高级处理过程。它内涵丰富、涉及范围广泛,在很多领域都有实际应用,是当前专门的研究人员以及公司的专家和技术人员研究的一个热点。图结构具有丰富的表示形式,直观且容易理解,我们可以利用它的普适性来描述世间万物及其之间的复杂关系。由于其广泛应用于化学信息学、生物信息学、以及无线传感网络等,图挖掘技术更是得到了大量的重视和研究。本文主要研究了频繁子图查询技术,对李先通的GraphGen算法进行了改进,算法主要是从两个方面进行改进的,具体内容如下: 一方面,引入了一种快速获取最小DFS编码的方法。在算法的执行过程中,每一次扩展边都会生成新的频繁子树,要获取其最小DFS编码需要不断扫描图集,在本文中,我们将采用最右扩展的方式来扩展邻接边,使得扩展之后不用扫描图集而直接获取频繁子树的最小DFS编码。另一方面,引入了一种高效的存储结构ADI++存储结构,借助ADI++结构的边表可以快速获取频繁边,并且根据边表及引理3.1和引理3.2,可以避免直接的子图同构的判断,使得算法的效率得到进一步提高。 最后通过合成数据集以及真实数据集上的实验性能分析,对算法的准确性进行了验证,实验表明改进的MyFmin算法的执行效率确实要高于算法GraphGen。

参考文献:

[1] 数据挖掘: 概念与技术. 范明, 孟小峰译. 北京: 机械工业出版社, 2007: 79-165.

[2] 秦亮曦,史忠植. 基于排序FP-树的最大频繁模式挖掘算法[J]. 计算机研究与发展, 2005,42(2): 217-223.

[3] Lam,Winnie W.M, Chan, Keith C.C. A graph mining algorithm for classifying chemical

compounds. Proceedings-IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2008: 321-324.

[4] LB.Holder, DJ.Cook and S.njoko.

Substructure Discovery in the Subdue System. Proceedings of ACM SIGKDD the 1994 International Conference on Knowledge Discovery in Database(KDD’94), July 1994:169-180.

[5] C. Borgelt and M.R. Berthold, “Mining Molecular Fragments: Finding Relevant Substructures of Molecules,”Proc. 2002 IEEE

[6] S.Raghavan and H.Garcia-Molina. Representing web graphs. Proc. of thte IEEE Intl.Conference on Data Mining, 2003

[7] L.Dehaspe,H.Toivonen and R.D. King. Finding Frequent Substructures in Chemical

Compounds. Proceedings of ACM SIGKDD the 1998 International Conference on Knowledge

Discovery in Database(KDD’98), R. Agrawal, P. Stolorz and G. Piatetsky-Shapiro(Eds.),

AAAI Press, 1998:30-36.

[8] A.Inokuehi, T.Washio, T.Okada. An Apriori based algorithm for mining frequent substructures from graph data[C]. Proc. of the PKDD-2000. LNAI 1910, 2000:13一23.

[9] A. Inokuehi, T.K.Nishimura, a H.Motod. A Fast Algorithm for Mining Frequent Connected

Subgraph[R]. Technical Report RT0448. IBM Researeh. Tokyo Research Laboratory, 2002,11(11):20-31.

[10] M. Kuramochi, G. Karypis. Frequent Subgraph Discovery[C]//IEEE International

Conference on Data Mining (ICDM). San Jose, California, USA: IEEE Computer Society Press, 2001:313–326.

[11] M. Kuramochi and G. Karypis. Finding Frequent Pattems in a Large Sparse Graph [C].

Proceedings of SAIM the 2004 International Conference on Data Mining(SDM,04), Toronto,

Canada, 2004:39-53.

[12] N.Vanetik ,E.Gudes and S.E.Shimony. Computing Frequent Graph patterns from

Semistructured Data. Proeeedings of IEEE the 2002 International Conference on Data Mining(ICDM,02), 2002, 458-465.

[13] G. Ehud, S. E. Shimony, V. Natalia. Discovering Frequent Graph Patterns Using Disjoint Paths[J]. IEEE Transactions on Knowledge and Data Engineering, 2006, 18(11):1441-

1456.

[14] E. Gudes, S. E. Shimony, and N. Vanetik. Discovering frequent graph patterns using disjoint

paths. IEEE Trans. Knowl.Data Eng. , 2006, 18(11):1441–1456.

[15] 刘振, 频繁子图挖掘算法的研究与应用[D]. 长沙: 中南大学, 2009.

[16] 张伟, 频繁子图挖掘算法的研究[D]. 秦皇岛: 燕山大学, 2011.

[17] 王文龙, 一种高效的不确定图数据库上频繁子图模式挖掘算法[D]. 哈尔滨: 哈尔滨工

业大学, 2013.

[18] X. Yan and J. Han: gSpan: Graph based Substructure patterns Mining [C].

Proceedings of IEEE the 2002 International Conference on Data Mining(ICDM. 2002), Maebashi, Japan, 2002: 721-724.

[19] X. Yan and J. Han. Close graph: Mining Closed Frequent Graph Patterns[C].

Proceedings of ACM SIGKDD the 2003 International Conference on Data Mining

and Knowledge Discovery(KDD-03), Washin-gton, USA 2003: 286-295.

[20] J. Huan, W. Wang and J. Prins, Efficient Mining of Frequent Subgraphs in the

Presence of Isomorp-hism[C]. Proceedings of IEEE the 2003 International Conference on Data Mining(ICDM, 03), Florida, U-SA, 2003: 549-552.

[21] C. Wang, W. Wang, J. Pei, et al. Scalable Mining of Large Disk-based Graph

Databases [C] // Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). Seattle, Washington, USA: ACM Press, 2004: 316-325.

[22] S. Nijssen and J. N. Kok. A quickstart in frequent structure mining can make a

difference. In KDD, 2004:647–652.

[23] J.Huan, W. Wang, J. Prins, and J.Yang. SPIN: mining maximal frequent subgraphs

from graph datab-ases.Proc. Of KDD,2004.

[24] 邹兆年, 李建中, 高宏, 张硕从不确定图中挖掘频繁子图模式[J]. 软件学报, 2009, 20

(11): 2965-2976.

[25] 邹晓红, 用于图分类的频繁字结构挖掘算法研究[D]. 秦皇岛: 燕山大学, 2011.

[26] Chi Y., Yang Y., Muntz R. HybridTreeMiner: An Efficient Algorithm for Mining Frequent Rooted T-rees and Free Trees Using Canonical Form. Technical Report

NCSC-005, Department of Computer Scien-ce, University of Califormia, Los Angeles,CA,USA, 1996: 1~10.

[27] An Improved Unique Canonical Labeling for Frequent Subgraph Mining.

[28] Agarwal R C, Agarwal C C. A tree projeetion algorithm for generation of frequent itemsets. Journa lof Parallel and Distributed ComPuting, 2001, 61(3): 350-37lP.

[29] M. Kuramochi and G. Karypis. Finding Frequent Patterns in a Large Sparse Graph. Proc.of SAIM the 2004 Intemmional Conference on Data Mining, 2004.

[30] 刘静, 基于多层索引结构的频繁子图挖掘算法研[D]. 西安: 陕西师范大学, 2008.

[31] 李先通, 图数据查询技术的研究[D]. 哈尔滨: 哈尔滨工业大学, 2009.

[32] 李海波, 频繁子结构挖掘算法研究与应用[D]. 武汉: 华中科技大学, 2011.

[33] M. Wo¨rlein. Extension and

Parallelization of a Graph-mining-algorithm[D]. Erlangen-Nurnberg, Germany:Friedrich-Alexander-Universita¨t, 2006:1–76.

[34] Y. Liu, J. Li, H. Gao. Summarizing Graph Patterns [C] // Proceedings of the 24th

International Conference on Data Engineering (ICDE). Cancun, Mexico: IEEE

Computer Society Press, 2008:903–912.

[35] J. Huan, W. Wang, J. Prins, et al. Spin: Mining Maximal Frequent Subgraphs from Graph Databases[C]//Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). Seattle, Washington, USA: ACM Press, 2004:581–586.

[36] L. T. Thomas, S. R. Valluri, K.

Karlapalem. Margin: Maximal Frequent Subgraph

Mining[C]//Proceedings of the 6th IEEE International Conference on Data Mining

(ICDM). Hong Kong, China: IEEE Computer Society Press, 2006:1097–1101.

[37] Thomas L T, Valluri S R and Karlapalem K.Margin: Maximal frequent subgraphs

mining[C]. Proceedings of the 6th

International Conference on Data Mining

(ICDM’06). Hong-Kong, Ching, 2006: 1097-1101.

[38] Yuanyuan Tian, Richard A. Hankins and Jignesh M. Patel. Efficient Aggregation for

Graph Summarization[C]. SIGMOD Conference, Vancouver, BC, Canada, 2008: 433-444.

[39] S. Navlakha, R. Rastogi and N.Shrivastava

. Graph summarization with bounded error[C]. SIGMOD Conference, Vancouver, BC, Canada, 2008: 419-432.

[40] ChenChen, Cinde X. Lin Matt Fredrikson and etc. Mining Graph Patterns Efficiently

via Randomized Summaries[C]. VLDB Conference, Lyon, France, 2009: 253-264.

[41] Zeng Z, Wang J, Zhang J, et al. FOGGER: An algorithm for graph generator discovery[C]//Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology. ACM, 2009: 517-528.

[42] M.Kirsten,S.Wrobel. Relational Distance-based Clustering. Proceedings of the Eighth

International Conference on Inductive Logic Programming, Berlin:Springer, 1998: 261-270.

[43] R. Agrawal and R. Srikant. Fast

algorithms for mining association rules. In VLDB’94, pages 487–499, Sept. 1994.

[44] 21世纪高等学校规划教材·电子商务_电子商务专业导论. 赵守香等. 北京: 清华大学出

版社.

中图分类号:

 TP311.13    

馆藏号:

 TP311.13/1874/2014    

备注:

 403-西院分馆博硕论文库;203-余家头分馆博硕论文库    

无标题文档

   建议浏览器: 谷歌 火狐 360请用极速模式,双核浏览器请用极速模式