Advanced AI course, Autumn 2016
Ping Luo (罗平)
Mining Frequent and Dominant Patterns, 2-2

Mining Frequent and Dominant Patterns for Recommendation
  KDD 2012, CIKM 2012 (Best Student Paper)
  Incorporating occupancy into frequent pattern mining for high quality pattern recommendation. Linpeng Tang, Lei Zhang, Ping Luo, Min Wang. CIKM, 2012. Best Student Paper Award.
  Harnessing the wisdom of the crowds for accurate web page clipping. Lei Zhang, Linpeng Tang, Ping Luo, Enhong Chen, Limei Jiao, Min Wang, Guiquan Liu. KDD, 2012.

Problem Statement
  Propose a new interestingness measure: occupancy
  The occupancy of BC = [not preserved]

Problem Statement
  Propose a new interestingness measure: weighted occupancy
  The occupancy of BC = [not preserved]

Problem Statement
  Dominant patterns: patterns whose occupancy is bigger than a threshold parameter
  The occupancy of BC = [not preserved]; BC is dominant when the parameter = 0.5
  Notice: how do dominant patterns differ from maximal frequent patterns?

Problem Statement
  Qualified patterns: patterns that are both frequent and dominant
  Quality
  Task: top qualified pattern mining

Motivating Application
  Web page printing: HP Smart Print
  First-round recommendation based on Web page analysis
  Manual adjustment required
  Not wanted!

Motivating Application
  Web page printing: Smart Print
  Tedious selections

Motivating Application
  Goal: one-click solution
  More accurate print-area recommendation
  Solution: leverage the print logs from all the users
  [Figure labels: print log, current page]

Motivating Application
  One piece of print log
  A selected print-area = a clip = an item
  A piece of print log = a set of clips

Motivating Application
  Print log database
  The print log database is treated as a transaction database

Motivating Application
  Print-area recommendation: a pattern mining problem
  Given: the transaction database of print logs, and a query Web page
  Task: identify all the candidate clips inside the query page, denoted by Q, and find a subset of Q for recommendation
  Support
  Occupancy

Motivating Application
  Support
  The more frequently a pattern appears in the database, the more users select this set of clips for printing.

Motivating Application
  Occupancy
  The bigger the occupancy is, the more complete the recommendation is.
  The occupancy of BC = [not preserved]

Motivating Application
  Mining top qualified patterns for recommendation

Challenges
  The property of occupancy: neither monotone nor anti-monotone (slide example: occu(ABC))

Solution
  For a subtree we can give upper bounds on the occupancy and quality values of all the nodes in this subtree.
  [Figure values: quality upper bound 0.6; upper bounds 0.6 and 0.8]

Solution
  Pruning the search process: if the upper bound on quality for a subtree is smaller than the current maximal quality value found in the search, the subtree can be skipped.

Upper Bound Estimation
  Assume that there is an itemset X in the subtree with a suffix length of u and a frequency of v.
  Then, for any itemset X in the subtree: [bound not preserved]

Upper Bound Estimation
  Assume that there is an itemset in the subtree with a suffix length of 3 and a support of 3.

Upper Bound Estimation
  Three different upper bound functions

Experiments
  Effectiveness: does the concept of occupancy help to improve the recommendation performance?
  Efficiency: can our algorithm with the proposed pruning strategy significantly reduce running time?

Experiments
  Evaluation on effectiveness
  Ground truth: 2000 Web pages from 100 print-worthy Websites
  Evaluation method: leave-one-out cross validation for each Web page
  Evaluation measure

Experiments
  Evaluation on effectiveness
  The recommendation performance of our method (two parameters set to 0.05 and 0.1)
  [Chart labels: Best; Frequency only; Frequency + occupancy; increase; decrease; by 14%]

Experiments
  Evaluation on efficiency
  Problem setting: size of transaction database

Business Impact
  HP product: Smart Print
  More than one million downloads and usages
  5 US patents filed

Other Applications
  Occupancy for sequential pattern mining
  Travel package recommendation

Homework
  Design a transaction database and draw its lexicographic subset tree.
  Given thresholds min_sup and min_occu, compute the frequent patterns, dominant patterns, and maximal frequent patterns under these thresholds.
  Requirement: the example must contain a pattern that is a maximal frequent pattern but not a dominant pattern.

Random roll call of five students
  Roll-call numbers: 郭彤蕾, 鹿强, 蔡坤桥, 彭燕, 吴水琴
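
The homework asks for frequent, dominant, and maximal frequent patterns on a self-designed transaction database. The following is a minimal Python sketch of those definitions, not the lecture's algorithm. It makes several assumptions: the occupancy formula is not preserved in these slides, so occupancy is taken here to be the average of |X| / |t| over the transactions t that contain X; support is an absolute count; both thresholds are compared with >=; and the toy database `DB` and all function names are hypothetical. Patterns are enumerated by brute force, so this only suits tiny examples.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy database; each transaction is a set of items (clips).
DB = [frozenset(t) for t in ("AB", "AB", "ABC", "CDEFG", "CDHIJ")]

def covering(pattern, db):
    """Transactions that contain every item of the pattern."""
    return [t for t in db if pattern <= t]

def support(pattern, db):
    """Absolute support: number of transactions containing the pattern."""
    return len(covering(pattern, db))

def occupancy(pattern, db):
    """Assumed definition: mean of |X| / |t| over supporting transactions t."""
    ts = covering(pattern, db)
    if not ts:
        return Fraction(0)
    return sum(Fraction(len(pattern), len(t)) for t in ts) / len(ts)

def all_patterns(db):
    """Every non-empty itemset that occurs in at least one transaction."""
    found = set()
    for t in db:
        items = sorted(t)
        for k in range(1, len(items) + 1):
            found.update(frozenset(c) for c in combinations(items, k))
    return found

def mine(db, min_sup, min_occu):
    """Return (frequent, dominant, maximal frequent) pattern sets."""
    patterns = all_patterns(db)
    frequent = {p for p in patterns if support(p, db) >= min_sup}
    dominant = {p for p in patterns if occupancy(p, db) >= min_occu}
    # A frequent pattern is maximal if no proper superset is also frequent.
    maximal = {p for p in frequent if not any(p < q for q in frequent)}
    return frequent, dominant, maximal

def show(patterns):
    """Render patterns as sorted strings such as 'AB' for printing."""
    return sorted("".join(sorted(p)) for p in patterns)

frequent, dominant, maximal = mine(DB, min_sup=2, min_occu=Fraction(1, 2))
print("frequent:", show(frequent))
print("maximal frequent:", show(maximal))
print("qualified (frequent and dominant):", show(frequent & dominant))
print("maximal frequent but not dominant:", show(maximal - dominant))
```

Under these assumptions, CD is a maximal frequent pattern (support 2, no frequent superset), but it occurs only inside two five-item transactions, so its occupancy is 2/5 and it is not dominant at min_occu = 0.5. AB has occupancy 8/9 and is both frequent and dominant.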