A Preliminary Exploration of Domain-Specific Knowledge Graph Construction (特定领域知识图谱构建初探.pdf)

Uploaded by: HR专家 | Document ID: 6042062 | Uploaded: 2019-03-25 | Format: PDF | Pages: 38 | Size: 8.42 MB

Li Juanzi (李涓子)
Knowledge Engineering Laboratory, Department of Computer Science, Tsinghua University

Outline
- Knowledge graph and technologies
- Big scholar knowledge base: AMiner II
- Knowledge graph building over enterprise data
- Conclusion

Vision of the future Web
- Web 1.0: connects information (Web of documents)
- The Social Web (Web 2.0): connects people (Web of people)
- The Semantic Web (Web 3.0): connects knowledge (Web of data)
- The Ubiquitous Web: connects intelligence (Web of agents)
- Connectivity, knowledge and reasoning increase along this progression, toward agent Webs that know, learn and reason as humans do

The Semantic Web
- Bring structure to the meaningful content of Web pages
[Figure: annotated Web pages, an ontology, and agents]
Source: Tim Berners-Lee, James Hendler, and Ora Lassila. The Semantic Web. Scientific American, 2001.

Philosophy of ontology
- Concept triangle (Ogden, Richards, 1923)
[Figure: triangle with vertices Form ("Tank"), Concept and Referent, and edges labeled "stands for", "relates to" and "activates"]
- "Ontology is the philosophical study of the nature of being, becoming, existence, or reality, as well as the basic categories of being and their relations." (Wikipedia)

Some knowledge graphs
[Figure: size statistics of several knowledge graphs. Names that survive in the text: Google KG, Google KB Core, WordNet, OpenIE (ReVerb, OLLIE), NELL. Cs = concepts, Is = instances, Ps = properties, Ts = triples.]
- 250 concepts, 4M instances, 6000 properties, 500 triples
- 350K Cs, 10M Is, 100 Ps, 120M Ts
- 15K Cs, 40M Is, 4000 Ps, 1B Ts
- 850K Cs, 8M Is, 70K Ps
- 15K Cs, 600M Is, 20B Ts
- 50M Ss, 50+ Ls, 262M Ts
- WordNet; 7 Europe Ls; cross-lingual links

Our knowledge graph definition
- C (concepts): a group of objects with the same properties, e.g. cars, students, professors
- I (instances): an object that belongs to a concept, e.g. Peter is a student
- T (ISA, taxonomy knowledge): subConceptOf, instanceOf
- P (properties, factual knowledge): instance-attribute-value pairs (AVP)

Knowledge graph technologies
- Manual KG building: WordNet, Cyc, HowNet
- Taxonomy knowledge learning
  - Learning from Wikipedia
  - Learning beyond Wikipedia
- Factual knowledge learning
  - Learning from Wikipedia
  - Learning beyond Wikipedia

Learning taxonomy knowledge from Wikipedia
- Category system in Wikipedia
  - The category system can be read as a conceptual network, e.g. PHILOSOPHY and BELIEF (deals-with?), PHILOSOPHY and HUMANITIES (isa), PHILOSOPHY and SCIENCE (isa)
- Advantages:
  - Widely recognized concepts in human minds
  - Large scale: millions of concepts and tens of millions of instances
  - Large coverage
- Problems:
  - Noise: categories are created for different purposes
  - Inconsistency: the system is not formally well defined

Learning taxonomy knowledge from Wikipedia (continued)
- Using linguistic features of the isa relationship
  - Syntactic parsing: head matching, modifier matching, singular/plural forms
  - Lexico-syntactic patterns
- Using the structure of Wikipedia
  Deriving a Large Scale Taxonomy from Wikipedia. Ponzetto et al. AAAI 07.
- Using external high-quality isa resources (WordNet, HowNet, Cilin): YAGO (WWW 2007)
- isa relation validation using cross-lingual knowledge links: lore (AAAI 2014)

Learning taxonomy knowledge beyond Wikipedia
- Using Web sources: root concepts, search engine
  - Hearst patterns
  - Bootstrapping
  - Taxonomy induction (structural learning); domain-specific taxonomy building (EMNLP 2010, ACL 2014)
- Large-scale taxonomy building
  - Automatically generated from Web data
  - 1.6 billion web pages
  - Rich hierarchy of millions of concepts
  - Probabilistic knowledge base (SIGMOD 2012)
  - Probase: 2,653,872 concepts, 20,757,545 isa relations
[Figure: Probase example with the concepts politicians, people and presidents, and instances such as George W. Bush]
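The Hearst-pattern and probabilistic-knowledge-base bullets can be illustrated with a small sketch. It assumes a single, heavily simplified "X such as A, B and C" pattern and estimates a score for each instance as its relative frequency under a concept. The regular expression, the function names and the sample sentences are all hypothetical; Probase itself uses a much richer extraction and scoring pipeline that the slides do not detail.

```python
import re
from collections import Counter, defaultdict

# Assumption: an instance is a run of capitalized words (initials allowed),
# and a concept is the single lowercase word right before "such as".
INSTANCE = r"[A-Z][\w.]*(?: [A-Z][\w.]*)*"
HEARST_SUCH_AS = re.compile(
    rf"\b(?P<concept>[a-z]+) such as "
    rf"(?P<instances>{INSTANCE}(?:(?:, and | and |, ){INSTANCE})*)"
)

def extract_isa(sentences):
    """Count (concept, instance) pairs matched by the 'such as' pattern."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for match in HEARST_SUCH_AS.finditer(sentence):
            concept = match.group("concept")
            for inst in re.split(r", and | and |, ", match.group("instances")):
                # Drop a sentence-final period that the pattern may have absorbed.
                counts[concept][inst.rstrip(".")] += 1
    return counts

def typicality(counts, concept):
    """Relative frequency of each instance under a concept (illustrative score)."""
    total = sum(counts[concept].values())
    return {inst: n / total for inst, n in counts[concept].items()}

# Hypothetical sample sentences, not taken from any corpus.
SENTENCES = [
    "We studied presidents such as George W. Bush, Bill Clinton and George H. W. Bush.",
    "Many presidents such as Bill Clinton attended.",
    "He met politicians such as Hillary Clinton and Bill Clinton.",
]
```

Calling `typicality(extract_isa(SENTENCES), "presidents")` ranks the instances of one concept by how often the pattern pairs them with it, which is the general idea behind attaching scores to isa pairs.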

[Figure, continued: Probase instance scores. George W. Bush, 0.0117; Bill Clinton, 0.0106; George H. W. Bush, 0.0063; Hillary Clinton, 0.0054. Second list: Bill Clinton, 0.057; George H. W. Bush, 0.021; George W. Bush, 0.019]

Factual knowledge learning
Approaches are grouped as supervised, semi-supervised or unsupervised.
- From Wikipedia: semantic annotation; Semantifying Wikipedia (Kylin); cross-lingual IE (WikiCiKE)
- Beyond Wikipedia: distant supervision (Stanford); Coupled Semi-Supervised Learning (NELL); KnowItAll: TextRunner, WOE

Automatic semantic annotation
- Rule learning based approach: automatically learn annotation rules from the training data
- Classification based approach: identify the boundaries of tags in instances using classification models
- Sequential labeling based approach: consider the dependencies between tags
- Constrained hierarchical conditional random fields
- And others

Learning factual knowledge beyond Wikipedia: Knowledge Vault
- Motivation: the new approach should automatically leverage already-cataloged knowledge to build prior models of fact correctness
- Framework
  - TXT: distant supervision
  - DOM: DOM tree structure features
  - TBL: table information
  - ANO: annotated tags in HTML pages
  - Priors: path ranking algorithm
  - Priors: neural network method

Big scholar knowledge base: AMiner II
- Researcher profile extraction
- Expert finding

- Social network search
- Topic browser
- Conference analysis
- ArnetApp platform

Person Search
- Basic info, citation statistics, ego network, research interests

Expert Search
- Finding experts for "data mining"
- Demographics: gender, language, location, etc.
- Knowledge about "data mining"; similar authors

Conference Ranking

Reviewer Suggestion
- Interest matching
- COI avoidance
- Load balancing
- Forecasting review quality

AMiner II (ArnetMiner)
- Academic social network analysis and mining system: AMiner (http://aminer.org)
- Online since 2006
- 38 million researcher profiles
- 76 million publications
- 241 million requests
- 12.35 terabytes of data
- 100K IP accesses from 170 countries per month
- 10% increase in visits per month
- Deep analysis, mining, and search

User Distribution
- 7.32 million IPs from 220 countries/regions
- Top 10 countries: 1. USA, 2. China, 3. Germany, 4. India, 5. UK, 6. Canada, 7. Japan, 8. Spain, 9. France, 10. Italy

Ruud Bolle
Office: 1S-D58
Letters: IBM T.J. Watson Research Center, P.O. Box 704, Yorktown Heights, NY 10598 USA
Packages: IBM T.J. Watson Research Center, 19 Skyline Drive, Hawthorne, NY 10532 USA
Email:

Ruud M. Bolle was born in Voorburg, The Netherlands. He received the Bachelor's Degree in Analog Electronics in 1977 and the Master's Degree in Electrical Engineering in 1980, both from Delft University of Technology, Delft, The Netherlands. In 1983 he received the Master's Degree in Applied Mathematics and in 1984 the Ph.D. in Electrical Engineering from Brown University, Providence, Rhode Island. In 1984 he became a Research Staff Member at the IBM Thomas J. Watson Research Center in the Artificial Intelligence Department of the Computer Science Department. In 1988 he became manager of the newly formed Exploratory Computer Vision Group which is part of the Math Sciences Department.

Currently, his research interests are focused on video database indexing, video processing, visual human-computer interaction and biometrics applications.

Ruud M. Bolle is a Fellow of the IEEE and the AIPR. He is Area Editor of Computer Vision and Image Understanding and Associate Editor of Pattern Recognition. Ruud M. Bolle is a Member of the IBM Academy of Technology.

DBLP: Ruud Bolle
2006
- Nalini K. Ratha, Jonathan Connell, Ruud M. Bolle, Sharat Chikkerur: Cancelable Biometrics: A Case Study in Fingerprints. ICPR (4) 2006: 370-373

- Sharat Chikkerur, Sharath Pankanti, Alan Jea, Nalini K. Ratha, Ruud M. Bolle: Fingerprint Representation Using Localized Texture Features. ICPR (4) 2006: 521-524
- Andrew Senior, Arun Hampapur, Ying-li Tian, Lisa Brown, Sharath Pankanti, Ruud M. Bolle: Appearance models for occlusion handling. Image Vision Comput. 24(11): 1233-1243 (2006)
2005
- Ruud M. Bolle, Jonathan H. Connell, Sharath Pankanti, Nalini K. Ratha, Andrew W. Senior: The Relation between the ROC Curve and the CMC. AutoID 2005: 15-20
- Sharat Chikkerur, Venu Govindaraju, Sharath Pankanti, Ruud M. Bolle, Nalini K. Ratha: Novel Approaches for Minutiae Verification in Fingerprint Images. WACV 2005: 111-116
...

Motivating Example
The homepage text is annotated with four blocks: contact information, educational history, academic services, and publications. The profile extracted from it:

Name: Ruud Bolle
Homepage: http:/ ecvg/people/bolle.html
Position: Research Staff
Affiliation: IBM T.J. Watson Research Center
Address: IBM T.J. Watson Research Center, P.O. Box 704, Yorktown Heights, NY 10598 USA
Address: IBM T.J. Watson Research Center, 19 Skyline Drive, Hawthorne, NY 10532 USA
Email:
Phduniv: Brown University
Phdmajor: Electrical Engineering
Phddate: 1984
Msuniv: Delft University of Technology
Msmajor: Electrical Engineering
Msdate: 1980
Msmajor: Applied Mathematics
Bsuniv: Delft University of Technology
Bsmajor: Analog Electronics
Bsdate: 1977
Research_Interest: video database indexing; video processing; visual human-computer interaction; biometrics applications
Photo

Publication 1#
  Title: Cancelable Biometrics: A Case Study in Fingerprints
  Venue: ICPR
  Start_page: 370
  End_page: 373
  Date: 2006

Publication 2#
  Title: Fingerprint Representation Using Localized Texture Features
  Venue: ICPR
  Start_page: 521
  End_page: 524
  Date: 2006

...
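One way to hold an extracted profile like this in code is a small set of record types. The sketch below is only illustrative: the class and field names are hypothetical renderings of the slide's attribute names (Phduniv, Phdmajor, Phddate and so on), and the values are the ones shown for Ruud Bolle in the motivating example.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Degree:
    # Maps to the *univ / *major / *date attribute triple of one degree.
    university: Optional[str] = None
    major: Optional[str] = None
    year: Optional[int] = None

@dataclass
class Publication:
    title: str
    venue: str
    start_page: int
    end_page: int
    year: int

@dataclass
class ResearcherProfile:
    name: str
    position: Optional[str] = None
    affiliation: Optional[str] = None
    addresses: List[str] = field(default_factory=list)
    email: Optional[str] = None  # left empty when the page gives no address
    phd: Degree = field(default_factory=Degree)
    ms: Degree = field(default_factory=Degree)
    bs: Degree = field(default_factory=Degree)
    research_interests: List[str] = field(default_factory=list)
    publications: List[Publication] = field(default_factory=list)

bolle = ResearcherProfile(
    name="Ruud Bolle",
    position="Research Staff",
    affiliation="IBM T.J. Watson Research Center",
    addresses=[
        "P.O. Box 704, Yorktown Heights, NY 10598 USA",
        "19 Skyline Drive, Hawthorne, NY 10532 USA",
    ],
    phd=Degree("Brown University", "Electrical Engineering", 1984),
    ms=Degree("Delft University of Technology", "Electrical Engineering", 1980),
    bs=Degree("Delft University of Technology", "Analog Electronics", 1977),
    research_interests=[
        "video database indexing",
        "video processing",
        "visual human-computer interaction",
        "biometrics applications",
    ],
    publications=[
        Publication("Cancelable Biometrics: A Case Study in Fingerprints",
                    "ICPR", 370, 373, 2006),
        Publication("Fingerprint Representation Using Localized Texture Features",
                    "ICPR", 521, 524, 2006),
    ],
)
```

Grouping each degree into its own record keeps the university, major and date of one degree together, which a flat attribute list does not make explicit.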

[Figure, continued: co-author nodes connected to Ruud Bolle through coauthor links (Publication #3, Publication #5); a co-author node carries affiliation UIUC and position Professor]

Researcher Social Network Extraction
- Researcher attributes: Homepage, Phone, Address, Email, Phduniv, Phddate, Phdmajor, Msuniv, Msmajor, Msdate, Bsuniv, Bsmajor, Bsdate, Affiliation, Position, Fax, Person Photo, Research_Interest, Name
- Publication (linked by Authored): Title, Publication_venue, Start_page, End_page, Date
- Coauthor
- 70.60% of the researchers have at least one homepage or an introducing page
  - 85.6% are from universities, 14.4% from companies
  - 71.9% are homepages, 28.1% are introducing pages
  - 60% are natural language text, 40% are in lists and tables

CRFs
[Figure: linear-chain CRF over the sentence "He is a Professor at UIUC", with candidate tags per token drawn from OTH, POS, AFF and ADR. Green nodes are hidden variables; purple nodes are observations.]

$$p(y \mid x) = \frac{1}{Z(x)} \exp\Big( \sum_{e \in E,\, j} \lambda_j\, t_j(e, y|_e, x) + \sum_{v \in V,\, k} \mu_k\, s_k(v, y|_v, x) \Big)$$

Processing Flow for Profiling
[Figure: tag lattice over the tokens of "Ruud M. Bolle is a Fellow of the IEEE", with labels ALC, FUC, AMC, PRV, RPA, DEL, PSB and AUC]
- Preprocessing: determine tokens (standard word, special word, image token, term, punctuation mark)
- Model learning: learn a CRF model (a unified tagging model) from labeled data and feature definitions
- Tagging: assign tags to the input documents to produce the tagging results
- Example sentences: "He obtained his BS in Computer Science in 1999." "Ruud M. Bolle is a Fellow of the IEEE."

Profiling Results (5-fold cross validation)

Profiling Task  Unified  Unified_NT  SVM    Amilcare
Photo           89.11    88.64       88.86  31.62
Position        69.44    64.70       64.68  56.48
Affiliation     83.52    72.16       73.86  46.65
Phone           91.10    78.72       79.71  83.33
Fax             90.83    64.28       64.17  86.88
Email           80.35    75.47       79.37  78.70
Address         86.34    75.15       77.04  66.24
Bsuniv          67.38    57.56       59.54  47.17
Bsmajor         64.20    59.18       60.75  58.67
Bsdate          53.49    40.59       28.49  52.34
Msuniv          57.55    47.49       49.78  45.00
Msmajor         63.35    61.92       62.10  57.14
Msdate          48.96    41.27       30.07  56.00
Phduniv         63.73    53.11       57.01  59.42
Phdmajor        67.92    59.30       59.67  57.93
Phddate         57.75    42.49       41.44  61.19
Overall         83.37    72.09       73.57  62.30

Knowledge Graph over Enterprise Data
- Motivation
  - Current knowledge graph construction mainly draws on two kinds of sources: the Web and specific domains (slide labels: Science, Gene Ontology, LOD)
  - There is huge demand for knowledge graph building based on the internal data of enterprises
- Building a knowledge graph based on mobile customer care documents
  - Document-parsing-based logical structure extraction
  - Heuristic table extraction
  - Hierarchical concept extraction
  - Iterative instance identification and property extraction
  - Evaluation: Throughout Performance
- Evaluations
  - Document parsing evaluation
  - Table alignment evaluation, via manual evaluation
  - Knowledge graph evaluation: coverage

Conclusion
- Domain data characteristics
- Problem to be solved
- Building pipeline
- Visualization and evaluation
  - On each process
  - Knowledge base evaluation
- Human interaction
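Returning to the CRF tagging model used for researcher profiling: once edge and vertex feature weights are available, tagging a sentence means finding the tag sequence with the highest total score. The sketch below shows Viterbi decoding for a linear chain under loud assumptions. The weights are set by hand rather than learned, the two scoring rules are hypothetical stand-ins for the slide's feature functions, and only the tags OTH, POS and AFF from the slide example are used. The normalizer Z(x) is omitted because it does not change which sequence scores highest.

```python
TAGS = ["OTH", "POS", "AFF"]

# Hypothetical edge (transition) weights; unlisted tag pairs score 0.
TRANS = {("POS", "POS"): 0.5, ("AFF", "AFF"): 0.5}

def state_score(token, tag):
    """Hypothetical vertex feature weights for one token/tag pair."""
    if tag == "POS" and token in {"Professor", "Researcher"}:
        return 2.0
    if tag == "AFF" and token.isupper() and len(token) > 1:
        return 2.0  # all-caps tokens such as "UIUC" look like affiliations
    if tag == "OTH":
        return 0.6  # mild default preference for the "other" tag
    return 0.0

def viterbi(tokens):
    """Return the highest-scoring tag sequence for a non-empty token list."""
    best = [{t: state_score(tokens[0], t) for t in TAGS}]
    backpointers = []
    for token in tokens[1:]:
        scores, pointers = {}, {}
        for tag in TAGS:
            # Best previous tag for reaching `tag` at this position.
            prev = max(TAGS, key=lambda p: best[-1][p] + TRANS.get((p, tag), 0.0))
            scores[tag] = (best[-1][prev] + TRANS.get((prev, tag), 0.0)
                           + state_score(token, tag))
            pointers[tag] = prev
        best.append(scores)
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag.
    tag = max(TAGS, key=lambda t: best[-1][t])
    path = [tag]
    for pointers in reversed(backpointers):
        tag = pointers[tag]
        path.append(tag)
    return path[::-1]
```

With these hand-set weights, `viterbi("He is a Professor at UIUC".split())` labels "Professor" as POS and "UIUC" as AFF, and every other token as OTH. A trained model would replace `state_score` and `TRANS` with learned weighted feature sums.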
