Regular Expression与Java.ppt

Uploaded by: dreamzhangning  Document ID: 2811388  Uploaded: 2018-09-28  Format: PPT  Pages: 50  Size: 2.04MB

Regular Expression and Java
蕭宇程
http://swanky.adsldns.org/

  Introduction to Regular Expressions
  Regular Expression Syntax
  Object Models
  Using regexes in Java

Part 1: Introduction to Regular Expressions

Regular expressions are the key to powerful, flexible, and efficient text processing.

Searching Text Files: egrep

The Filename Analogy
  In DOS/Windows: dir *.txt
  * and ? are file globs (wildcards).
  * matches anything.
  ? matches any one character.

Generalized Pattern Language
  "Regular expressions" names both the powerful, generalized pattern language and the patterns themselves.

The Language Analogy
  Regular expressions are composed of:
    Metacharacters (special characters)
    Literals (normal text characters)
  Literal text acts as the words and metacharacters as the grammar.

Part 2: Regular Expression Syntax
  Regular expression test applet: http:/.tw/blog/archives/ciyawasay/000381.html
  Slides and example files for this talk: http:/.tw/blog/archives/ciyawasay/000394.html

Start and End of the Line
  Start: ^ (caret)
  End: $ (dollar)
  Examples: cat, ^cat, cat$
  These match a position in the line rather than any actual text characters.
  Question: what do ^cat$, ^$, and ^ each mean?

Character Classes
  Matching any one of several characters: [ ]
  Negated character classes: [^ ]
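
The line anchors ^ and $ described above can be tried out directly in Java. The following is a minimal sketch rather than part of the original slides: the class name AnchorDemo, the helper found, and the sample lines are illustrative assumptions. It uses Matcher.find() so that, as with egrep, a pattern succeeds if it matches anywhere in the line.

```java
import java.util.regex.Pattern;

// Sketch: how the line anchors ^ and $ behave with java.util.regex.
// AnchorDemo and found() are illustrative names, not from the slides.
class AnchorDemo {

    // True if the regex matches somewhere in the line (egrep-style search).
    static boolean found(String regex, String line) {
        return Pattern.compile(regex).matcher(line).find();
    }

    public static void main(String[] args) {
        String[] regexes = { "cat", "^cat", "cat$", "^cat$", "^$", "^" };
        String[] lines = { "cat", "catalog", "tomcat", "a cat sat", "" };
        for (String regex : regexes) {
            for (String line : lines) {
                System.out.printf("%-6s on \"%s\": %b%n",
                        regex, line, found(regex, line));
            }
        }
    }
}
```

Running it suggests answers to the question on the slide: ^cat$ only accepts a line consisting of exactly cat, ^$ only accepts an empty line, and ^ alone succeeds on every line because every line has a start.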

Matching any one of several characters: [ ]
  gr[ea]y matches grey and gray.
  [0-9]
  The character-class metacharacter - (dash) denotes a range: [0-9a-fA-F] = [A-Fa-f0-9]
  A dash is a metacharacter only within a character class; otherwise it matches the normal dash character.

Negated character classes: [^ ]
  Matches any character that isn't listed.
  [^1-6] matches a character that's not 1 through 6.
  Question: why doesn't q[^u] match Qantas or Iraq?

Character Class Notes
  A character class, even negated, still requires a character to match.
  Consider character classes as their own mini language. The rules regarding which metacharacters are supported (and what they do) are completely different inside and outside of character classes.

Matching Any Character with Dot
  . (dot, point) matches any character.
  03.19.76   03a1976
  Intended targets: 03/19/76, 03-19-76, 03.19.76
  More precise: 03[-./]19[-./]76
  The dots are not metacharacters within a character class.
  [.-/] would be a mistake, because the dash would then form a range.

Alternation: matching any one of several subexpressions
  | (or, bar)
  Combines multiple expressions into a single expression that matches any of the individual ones.
  gr[ea]y = grey|gray = gr(a|e)y
  gr[a|e]y: wrong! Within a class, the | character is just a normal character.

Question
  Jeffrey|Jeffery
  (Geoff|Jeff)(rey|ery)
  ^(From|Subject|Date):
    Start-of-line, followed by F, r, o, m, followed by :
    Start-of-line, followed by S, u, b, j, e, c, t, followed by :
    Start-of-line, followed by D, a, t, e, followed by :

Character Class & Alternation
  A character class can match just a single character in the target text.
  With alternation, since each alternative can be a full-fledged regular expression in and of itself, each alternative can match an arbitrary amount of text.
  Character classes are almost like their own special mini-language (with their own ideas about metacharacters, for example), while alternation is part of the "main" regular expression language.

Word Boundaries
  \b (in egrep also \< and \>)
  Matches the position at the start and end of a word (word-based versions of ^ and $).
  \bcat\b, \bcat, cat\b

Optional Items
  ? (question mark)
  color|colour = colou?r
  Optional: ? is placed after the character that is allowed to appear at that point in the expression, but whose existence isn't actually required for the match to be considered successful.
  ? can attach to a parenthesized expression: 4th|4 = 4(th)?

Other Quantifiers: Repetition
  + (plus): one or more of the immediately preceding item
  * (asterisk, star): any number, including none, of the item
  Exercise: the size part is optional.

Defined range of matches: intervals
  {min,max} (interval quantifier)
  [a-zA-Z]{1,5}

Parentheses and Backreferences
  Parentheses can "remember" text matched by the subexpression they enclose.
  Backreferencing: match new text that is the same as some text matched earlier in the expression.
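
The Part 2 constructs covered so far can be checked quickly in Java. This is a hedged sketch, not code from the slides: the class name SyntaxSampler, the helper firstMatch, and most of the sample strings are assumptions chosen for illustration. Note that each backslash in a regex is doubled inside a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: sampling the syntax from Part 2 with java.util.regex.
// SyntaxSampler and firstMatch() are illustrative names, not from the slides.
class SyntaxSampler {

    // Returns the first substring matched by regex in text, or null if none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[][] cases = {
            { "gr[ea]y", "grey or gray" },          // character class
            { "q[^u]", "Iraq" },                    // negated class still needs a character
            { "q[^u]", "Qantas" },                  // capital Q is not q
            { "03.19.76", "03a19b76" },             // dot matches any character
            { "03[-./]19[-./]76", "03/19/76" },     // dot is literal inside a class
            { "colou?r", "color" },                 // optional item
            { "4(th)?", "July 4th" },               // ? on a parenthesized expression
            { "\\bcat\\b", "concatenate" },         // word boundaries
            { "[a-zA-Z]{1,5}", "Expression" },      // interval quantifier
            { "([a-z])\\1", "letter" },             // backreference to group 1
        };
        for (String[] c : cases) {
            System.out.println(c[0] + " on \"" + c[1] + "\" -> " + firstMatch(c[0], c[1]));
        }
    }
}
```

The two q[^u] cases print null, which illustrates the earlier question: in Iraq there is no character after the q for the class to consume, and in Qantas there is no lowercase q at all.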

Doubled-word problem: "the the"
  \b([A-Za-z]+) +\1\b
  ([a-z])([0-9])\1\2

The Great Escape
  \ (escape)
  When a metacharacter is escaped, it loses its special meaning and becomes a literal character.
  \(very\)
  \([a-zA-Z]+\)
  Not escapes: \<, \1

Part 3: Object Models

Tasks that need to be done in using a regular expression:
  Setup . . .
    1. Accept a string as a regex; compile to an internal form.
    2. Associate the regex with the target text.
  Actually apply the regex . . .
    3. Initiate a match attempt.
  See the results . . .
    4. Learn whether the match is successful.
    5. Gain access to further details of a successful attempt.
    6. Query those details (what matched, where it matched, etc.).
  You might repeat the steps from 3 onward to find the next match in the target string.

An "all-in-one" model

A "match state" model (Java)
  Pattern: represents a compiled regular expression.
  Matcher: has all of the state associated with applying a Pattern object to a particular string.
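
The six tasks listed above map onto Pattern and Matcher roughly as follows. This is a minimal sketch under stated assumptions: the class name MatchStateDemo, the helper allMatches, the "text@start-end" result format, and the sample input are all illustrative and do not come from the slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: the "match state" model, with the numbered tasks as comments.
// MatchStateDemo and allMatches() are illustrative names, not from the slides.
class MatchStateDemo {

    // Returns one "text@start-end" entry per match found in target.
    static List<String> allMatches(String regex, String target) {
        Pattern pattern = Pattern.compile(regex);   // 1. compile to an internal form
        Matcher matcher = pattern.matcher(target);  // 2. associate regex with target text
        List<String> results = new ArrayList<>();
        while (matcher.find()) {                    // 3. initiate an attempt, 4. learn if it succeeded
            String what = matcher.group();          // 5./6. what matched
            int from = matcher.start();             // 6. where the match starts
            int to = matcher.end();                 // 6. index just past the match
            results.add(what + "@" + from + "-" + to);
        }                                           // looping repeats from step 3
        return results;
    }

    public static void main(String[] args) {
        // prints [506@5-8, 50@15-17]
        System.out.println(allMatches("[0-9]+", "room 506, page 50"));
    }
}
```

The Matcher object carries the match state between calls, which is why the loop can keep calling find() without passing in a position.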

"Match result" model

Part 4: Using regexes in Java

    public final class Pattern {
        // Flags values ('or' together)
        public static final int UNIX_LINES, CASE_INSENSITIVE, COMMENTS,
            MULTILINE, DOTALL, UNICODE_CASE, CANON_EQ;
        // Factory methods (no public constructors)
        public static Pattern compile(String patt);
        public static Pattern compile(String patt, int flags);
        // Method to get a Matcher for this Pattern
        public Matcher matcher(CharSequence input);
        // Information methods
        public String pattern();
        public int flags();
        // Convenience methods
        public static boolean matches(String pattern, CharSequence input);
        public String[] split(CharSequence input);
        public String[] split(CharSequence input, int max);
    }

Java API

    public final class Matcher {
        // Action: find or match methods
        public boolean matches();
        public boolean find();
        public boolean find(int start);
        public boolean lookingAt();
        // "Information about the previous match" methods
        public int start();
        public int start(int whichGroup);
        public int end();
        public int end(int whichGroup);
        public int groupCount();
        public String group();
        public String group(int whichGroup);
    }

    public final class Matcher {
        // Reset methods
        public Matcher reset();
        public Matcher reset(CharSequence newInput);
        // Replacement methods
        public Matcher appendReplacement(StringBuffer where, String newText);
        public StringBuffer appendTail(StringBuffer where);
        public String replaceAll(String newText);
        public String replaceFirst(String newText);
        // Information methods
        public Pattern pattern();
    }

    /* String, showing only the RE-related methods */
    public final class String {
        public boolean matches(String regex);
        public String replaceFirst(String regex, String newStr);
        public String replaceAll(String regex, String newStr);
        public String[] split(String regex);
        public String[] split(String regex, int max);
    }
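
The String convenience methods and the Pattern flags from the listings above can be exercised as follows. This is a sketch, not part of the original deck: the class name ConvenienceDemo, the helpers hasHeader and splitOnDigits, and the sample strings are assumptions for illustration. The header regex reuses ^(From|Subject|Date): from Part 2.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Sketch: String's regex methods and 'or'-ed Pattern flags.
// ConvenienceDemo and its helpers are illustrative names, not from the slides.
class ConvenienceDemo {

    // Flags are combined with bitwise OR. MULTILINE lets ^ match after
    // each line break; CASE_INSENSITIVE ignores letter case.
    static final Pattern HEADER = Pattern.compile(
            "^(From|Subject|Date):", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);

    // True if any line of text starts with one of the three header names.
    static boolean hasHeader(String text) {
        return HEADER.matcher(text).find();
    }

    // Splits on runs of digits, producing at most max pieces.
    static String[] splitOnDigits(String text, int max) {
        return text.split("[0-9]+", max);
    }

    public static void main(String[] args) {
        // String.matches must match the entire string, unlike Matcher.find().
        System.out.println("colour".matches("colou?r"));        // true
        System.out.println("my colour".matches("colou?r"));     // false
        System.out.println("grey, gray".replaceAll("gr[ea]y", "GREY"));   // GREY, GREY
        System.out.println(Arrays.toString("a1b22c".split("[0-9]+")));    // [a, b, c]
        System.out.println(Arrays.toString(splitOnDigits("a1b22c", 2)));  // [a, b22c]
        System.out.println(hasHeader("to: x\nsubject: hi"));    // true
    }
}
```

Each String method compiles its regex on every call, so code that applies the same pattern repeatedly may prefer a precompiled Pattern such as HEADER.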

SimpleRegexText.java

    import java.util.regex.Pattern;
    import java.util.regex.Matcher;

    public class SimpleRegexText {
        public static void main(String[] args) {
            String sampleText = "this is the 1st test string";
            // One or more digits followed by one or more word characters.
            String sampleRegex = "\\d+\\w+";
            Pattern p = Pattern.compile(sampleRegex);
            Matcher m = p.matcher(sampleText);
            if (m.find()) {
                String matchedText = m.group();
                int matchedFrom = m.start();
                int matchedTo = m.end();
                System.out.println("matched " + matchedText + " from "
                        + matchedFrom + " to " + matchedTo + ".");
            } else {
                System.out.println("didn't match");
            }
        }
    }

Output:

    matched 1st from 12 to 15.

Example: extracting English words (from Thinking in Java)

    //: c12:FindDemo.java
    import java.util.regex.*;
    import com.bruceeckel.simpletest.*;  // test helper library that ships with the book
    import java.util.*;

    public class FindDemo {
        private static Test monitor = new Test();
        public static void main(String[] args) {
            Matcher m = Pattern.compile("\\w+")
                .matcher("Evening is full of the linnet's wings");
            // Print each run of word characters in turn.
            while (m.find())
                System.out.println(m.group());
            monitor.expect(new String[] {
                "Evening", "is", "full", "of", "the", "linnet", "s", "wings"
            });
        }
    } ///:~

Split.java

    import java.util.regex.*;

    /** Split a String into a Java Array of Strings divided by an RE */
    public class Split {
        public static void main(String[] args) {
            String[] x = Pattern.compile("ian")
                .split("the darwinian devonian explodianchicken");
            for (int i = 0; i < x.length; i++) {
                System.out.println(i + " \"" + x[i] + "\"");
            }
        }
    }

Output:

    0 "the darwin"
    1 " devon"
    2 " explod"
    3 "chicken"

ReplaceDemo.java

    import java.util.regex.*;

    /**
     * Quick demo of RE substitution: correct "demon" and other
     * spelling variants to the correct, non-satanic "daemon".
     */
    public class ReplaceDemo {
        public static void main(String[] argv) {
            // Make an RE pattern to match almost any form (deamon, demon, etc.).
            String patt = "d[ae]{1,2}mon";  // i.e., 1 or 2 'a' or 'e', any combo
            // A test input.
            String input = "Unix hath demons and deamons in it!";
            System.out.println("Input: " + input);
            // Run it from a RE instance and see that it works.
            Pattern r = Pattern.compile(patt);
            Matcher m = r.matcher(input);
            System.out.println("ReplaceAll: " + m.replaceAll("daemon"));
            // Show the appendReplacement method.
            m.reset();
            StringBuffer sb = new StringBuffer();
            System.out.print("Append methods: ");
            while (m.find()) {
                // Copy to before first match, plus the word "daemon".
                m.appendReplacement(sb, "daemon");
            }
            m.appendTail(sb);  // copy remainder
            System.out.println(sb.toString());
        }
    }

Output:

    Input: Unix hath demons and deamons in it!
    ReplaceAll: Unix hath daemons and daemons in it!
    Append methods: Unix hath daemons and daemons in it!

The End

Thank you, everyone! If you have questions, you are welcome to come to Lab 506 and work through them with me.
