Top 10 Questions for Java Regular Expression
2013-10-31 13:11
This post summarizes the top questions asked about Java regular expressions. As they are most frequently asked, you may find that they are also very useful.
1. How to extract numbers from a string?
A common regular-expression task is to extract all the numbers in a string into a list of integers.
In Java, \d matches any single digit (0-9). Using the predefined character classes whenever possible makes your code easier to read and eliminates errors introduced by malformed character classes. Please refer to Predefined character classes for more details. Note the backslash \ in \d: if you use an escaped construct within a string literal, you must precede the backslash with another backslash for the string to compile. That's why we need to write \\d.
    import java.util.LinkedList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Sample input; the original snippet leaves str undefined.
    String str = "abc 12 def 345";
    List<Integer> numbers = new LinkedList<Integer>();
    // \\d+ matches each maximal run of digits.
    Pattern p = Pattern.compile("\\d+");
    Matcher m = p.matcher(str);
    while (m.find()) {
        numbers.add(Integer.parseInt(m.group()));
    }
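2. How to split Java String by newlines?
There are at least three different ways to represent a line break, depending on the operating system you are working on.
\r represents CR (Carriage Return), which was used in classic Mac OS (before OS X)
\n means LF (Line Feed), used in Unix, Linux and macOS
\r\n means CR + LF, used in Windows
Therefore the most straightforward way to split a string by new lines is
    String[] lines = str.split("\\r?\\n");
But if you don't want empty lines, you can use the following, which is also my favourite way:
    String[] lines = str.split("[\\r\\n]+");
Another way is to split on the line separator of the platform the program runs on, as follows. It adapts to the current system, but it only recognizes that system's separator, so text produced elsewhere may not split. And remember, you will still get empty lines if two newline characters are placed side by side.
    String[] lines = str.split(System.getProperty("line.separator"));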
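The difference between the first two patterns is easiest to see on a string that mixes line endings and contains a blank line. The following is a minimal sketch; the class name, method names and sample text are made up for illustration.

```java
import java.util.Arrays;

class NewlineSplitDemo {
    // Splits on LF or CRLF; a blank line in the middle is kept as an empty string.
    static String[] splitKeepingEmpty(String text) {
        return text.split("\\r?\\n");
    }

    // Splits on any run of CR/LF characters, so blank lines in the middle disappear.
    static String[] splitDroppingEmpty(String text) {
        return text.split("[\\r\\n]+");
    }

    public static void main(String[] args) {
        // One Windows line break, then two Unix line breaks in a row.
        String text = "one\r\ntwo\n\nthree";
        System.out.println(Arrays.toString(splitKeepingEmpty(text)));  // [one, two, , three]
        System.out.println(Arrays.toString(splitDroppingEmpty(text))); // [one, two, three]
    }
}
```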
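3. Importance of Pattern.compile()
A regular expression, specified as a string, must first be compiled into an instance of the Pattern class. The Pattern.compile() method is the only way to create an instance of that class. A typical invocation sequence is thus
    Pattern p = Pattern.compile("a*b");
    Matcher matcher = p.matcher("aaaaab");
    assert matcher.matches() == true;
Essentially, Pattern.compile() transforms a regular expression into a finite state machine (see Compilers: Principles, Techniques, and Tools (2nd Edition)). All of the state involved in performing a match resides in the matcher, so the Pattern p can be reused, and many matchers can share the same pattern.
    Matcher anotherMatcher = p.matcher("aab");
    assert anotherMatcher.matches() == true;
The Pattern.matches() method is defined as a convenience for when a regular expression is used just once. It still calls compile() implicitly to obtain a Pattern instance and then matches the string. Therefore,
    boolean b = Pattern.matches("a*b", "aaaaab");
is equivalent to the first code above, though for repeated matches it is less efficient since it does not allow the compiled pattern to be reused.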
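A common way to get this reuse in practice is to compile the pattern once into a constant and create a new Matcher per input. The sketch below assumes that setup; the class name, constant name and sample inputs are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class PatternReuseDemo {
    // Compiled once. Pattern instances are immutable and can be shared.
    private static final Pattern A_STAR_B = Pattern.compile("a*b");

    // Counts how many inputs match the whole pattern.
    static int countMatches(List<String> inputs) {
        int count = 0;
        for (String input : inputs) {
            // Each call creates a fresh Matcher, which holds the match state.
            if (A_STAR_B.matcher(input).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList("aaaaab", "aab", "b", "abc");
        System.out.println(countMatches(inputs)); // 3 ("abc" does not match)
    }
}
```

Matcher objects, unlike Pattern objects, are not safe to share between threads, so each thread should create its own.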
4. How to escape text for regular expression?
In general, regular expressions use "\" to escape constructs, but it is painful to precede the backslash with another backslash for the Java string to compile. There is another way to pass a string literal, like "$5", to the Pattern. Instead of writing \\$5 or [$]5, we can type
    Pattern.quote("$5");
Pattern.quote() returns a pattern string that matches its argument literally.
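To see what quoting changes, compare a search for "$5" with and without Pattern.quote(). This is a small sketch; the class name, helper name and sample text are made up for illustration.

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // Returns true if text contains the literal string, with no regex interpretation.
    static boolean containsLiteral(String text, String literal) {
        return Pattern.compile(Pattern.quote(literal)).matcher(text).find();
    }

    public static void main(String[] args) {
        // quote() wraps the text in \Q ... \E so metacharacters lose their meaning.
        System.out.println(Pattern.quote("$5")); // \Q$5\E

        System.out.println(containsLiteral("price: $5", "$5")); // true

        // Unquoted, $ is the end-of-input anchor, so "$5" can never be found.
        System.out.println(Pattern.compile("$5").matcher("price: $5").find()); // false
    }
}
```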
5. Why does String.split() need pipe delimiter to be escaped?
String.split() splits a string around matches of the given regular expression. Java regular expressions support special characters that affect the way a pattern is matched, which are called metacharacters. | is one metacharacter, used to match a single regular expression out of several possible regular expressions. For example, A|B means either A or B. Please refer to Alternation with The Vertical Bar or Pipe Symbol for more details. Therefore, to use | as a literal, you need to escape it by adding \ in front of it, like \\|.
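A short sketch of the effect, with a hypothetical class name and sample input: the escaped pattern splits on the pipe character, while the unescaped one is an alternation of two empty patterns and matches between every pair of characters.

```java
import java.util.Arrays;

class PipeSplitDemo {
    // Splits on the literal pipe character.
    static String[] splitOnPipe(String s) {
        return s.split("\\|");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnPipe("a|b|c"))); // [a, b, c]

        // Unescaped: the empty alternation matches everywhere,
        // so the string is cut into single characters instead of three fields.
        System.out.println(Arrays.toString("a|b|c".split("|")));
    }
}
```

Pattern.quote("|") is another way to build the escaped delimiter.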
6. How can we match a^n b^n with Java regex?
This is the language of all non-empty strings consisting of some number of a's followed by an equal number of b's, like ab, aabb, and aaabbb. This language is generated by the context-free grammar S → aSb | ab and is a standard example of a non-regular language.
However, Java regex implementations can recognize more than just regular languages. That is, they are not "regular" by the formal language theory definition. Using lookahead and self-reference matching will achieve it. Here I will give the final regular expression first, then explain it a little bit. For a comprehensive explanation, I would refer you to How can we match a^n b^n with Java regex.
    Pattern p = Pattern.compile("(?x)(?:a(?= a*(\\1?+b)))+\\1");
    // true
    System.out.println(p.matcher("aaabbb").matches());
    // false
    System.out.println(p.matcher("aaaabbb").matches());
    // false
    System.out.println(p.matcher("aaabbbb").matches());
    // false
    System.out.println(p.matcher("caaabbb").matches());
Instead of explaining the syntax of this complex regular expression, I would rather say a little bit about how it works. (The (?x) flag only enables comments mode, so the space inside the pattern is ignored.)
In the first iteration, it stops at the first a, then looks ahead (after skipping the remaining a's by using a*) to check whether there is a b. This is achieved by (?:a(?= a*(\\1?+b))). If it matches, \1, the self-reference, captures the contents of the innermost parentheses, which is one single b in the first iteration.
In the second iteration, the expression stops at the second a, then looks ahead (again skipping a's) to see if there is a b. But this time, \\1?+b is actually equivalent to bb, therefore two b's have to be matched. If so, \1 is changed to bb after the second iteration.
In the nth iteration, the expression stops at the nth a and checks whether there are n b's ahead.
In this way, the expression counts the number of a's, and the trailing \\1 then consumes exactly that many b's, so the match succeeds only if the number of b's that follow the a's is the same.
7. How to replace 2 or more spaces with single space in string and delete leading spaces only?
String.replaceAll() replaces each substring that matches the given regular expression with the given replacement. A run of spaces can be expressed by the regular expression [ ]+ (strictly, "2 or more" is [ ]{2,}, but replacing a single space with a single space changes nothing). Therefore, the following code will work; it uses [\\s]+, which also collapses tabs and newlines. Note that the solution does not remove leading and trailing whitespace entirely: each run is only reduced to one space. If you would like to have them deleted, you can add String.trim() to the pipeline.
    String line = "  aa bbbbb   ccc     d  ";
    // " aa bbbbb ccc d "
    System.out.println(line.replaceAll("[\\s]+", " "));
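The question asks for leading spaces only to be deleted, which trim() does not do because it also removes trailing whitespace. One possible approach, sketched here with a hypothetical class and method name, is a second replaceAll() anchored at the start of the string.

```java
class SpaceCleanupDemo {
    // Collapses runs of 2 or more spaces into one, then removes leading spaces only.
    static String collapseAndStripLeading(String line) {
        return line
                .replaceAll(" {2,}", " ") // "2 or more spaces" -> single space
                .replaceAll("^ +", "");   // ^ anchors the match at the start of the string
    }

    public static void main(String[] args) {
        String line = "  aa bbbbb   ccc     d  ";
        // The trailing space is kept; only the leading one is removed.
        System.out.println("[" + collapseAndStripLeading(line) + "]"); // [aa bbbbb ccc d ]
    }
}
```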
8. How to determine if a number is a prime with regex?
    public static void main(String[] args) {
        // false
        System.out.println(prime(1));
        // true
        System.out.println(prime(2));
        // true
        System.out.println(prime(3));
        // true
        System.out.println(prime(5));
        // false
        System.out.println(prime(8));
        // true
        System.out.println(prime(13));
        // false
        System.out.println(prime(14));
        // false
        System.out.println(prime(15));
    }

    public static boolean prime(int n) {
        // Build a string of n characters and test it against the "not prime" pattern.
        return !new String(new char[n]).matches(".?|(..+?)\\1+");
    }
The function first generates a string of n characters and tries to see if that string matches .?|(..+?)\\1+. If n is prime, the expression returns false and the ! reverses the result.
The first part .? just makes sure that 0 and 1 are not reported as prime. The magic part is the second part, where a backreference is used. (..+?)\\1+ first matches a group of two or more characters, then repeats that group one or more times with \\1+.
By definition, a prime number is a natural number greater than 1 that has no positive divisors other than 1 and itself. That means if a = n*m with n and m both greater than 1, then a is not a prime. n*m can be read as "repeat n, m times", and that is exactly what the regular expression does: it matches n characters by using (..+?), then repeats them with \\1+ until the group appears m times in total. Therefore, if the pattern matches, the number is not prime; otherwise it is. Remember that ! reverses the result.
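One way to watch the backreference at work is to look at what group 1 captured when the pattern matches: its length is a divisor of n. The sketch below is illustrative only; the class name and the capturedDivisor helper are not part of the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrimeRegexDemo {
    // Same expression as above, compiled once. It matches the lengths that are NOT prime.
    private static final Pattern NOT_PRIME = Pattern.compile(".?|(..+?)\\1+");

    // A string of n identical characters (the actual character does not matter).
    private static String unary(int n) {
        return new String(new char[n]);
    }

    static boolean isPrime(int n) {
        return !NOT_PRIME.matcher(unary(n)).matches();
    }

    // Length of the captured group when n is composite, otherwise -1.
    // Because +? is reluctant, shorter groups are tried first.
    static int capturedDivisor(int n) {
        Matcher m = NOT_PRIME.matcher(unary(n));
        if (m.matches() && m.group(1) != null) {
            return m.group(1).length();
        }
        return -1; // prime, or matched by the .? branch (n is 0 or 1)
    }

    public static void main(String[] args) {
        System.out.println(capturedDivisor(15)); // 3, since 15 = 3 * 5
        System.out.println(capturedDivisor(8));  // 2
        System.out.println(capturedDivisor(13)); // -1, 13 is prime
    }
}
```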
9. How to split a comma-separated string but ignoring commas in quotes?
You have reached the point where regular expressions break down. It is better and neater to write a simple splitter and handle special cases as you wish.
Alternatively, you can mimic the operation of a finite state machine by using a switch statement or if-else. Attached is a snippet of code.
    import java.util.ArrayList;
    import java.util.List;

    public static void main(String[] args) {
        String line = "aaa,bbb,\"c,c\",dd;dd,\"e,e";
        List<String> toks = splitComma(line);
        for (String t : toks) {
            System.out.println("> " + t);
        }
    }

    private static List<String> splitComma(String str) {
        int start = 0;
        List<String> toks = new ArrayList<String>();
        // The only state: are we currently inside a quoted section?
        boolean withinQuote = false;
        for (int end = 0; end < str.length(); end++) {
            char c = str.charAt(end);
            switch (c) {
            case ',':
                // A comma ends a token only when it is outside quotes.
                if (!withinQuote) {
                    toks.add(str.substring(start, end));
                    start = end + 1;
                }
                break;
            case '\"':
                withinQuote = !withinQuote;
                break;
            }
        }
        // Add whatever remains after the last separator.
        if (start < str.length()) {
            toks.add(str.substring(start));
        }
        return toks;
    }
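10. How to use backreferences in Java Regular Expressions
Backreferences are another useful feature of Java regular expressions. A backreference such as \1 matches the same text that was most recently matched by the corresponding capturing group.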
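As an illustration that is not taken from the original post, a typical use of a backreference is finding a word that is accidentally repeated. The sketch below assumes a pattern of the form \b(\w+)\s+\1\b; the class and method names are made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackreferenceDemo {
    // (\w+) captures a word; \1 requires the same word again after some whitespace.
    private static final Pattern REPEATED_WORD = Pattern.compile("\\b(\\w+)\\s+\\1\\b");

    // Returns the first word that appears twice in a row, or null if there is none.
    static String firstRepeatedWord(String text) {
        Matcher m = REPEATED_WORD.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    // Replaces each doubled word with a single copy; $1 refers to group 1 in the replacement.
    static String removeRepeats(String text) {
        return REPEATED_WORD.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(firstRepeatedWord("this is is a test")); // is
        System.out.println(removeRepeats("this is is a test"));     // this is a test
    }
}
```

Inside the pattern the reference is written \1 (\\1 in a Java string literal), while in the replacement string of replaceAll() it is written $1.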