- 1) What is Java Regular Expression (Regex)?
- 2) Java Regular Expression(Regex) Proper Syntax Conventions
- 3) Java Regular Expression Implementation
- 4) Java Regex Implementation Examples
- 5) Java Regular Expression Common Examples
- 6) Cheat Sheet
- 7) Java Regular Expression Quiz
- 8) Source Code
- 9) References
Regular expressions (regex) are available in every mainstream programming language, including JavaScript, Python, and PHP. A regular expression, also called a regex or regexp, is a sequence of characters that defines a search pattern.
What is Java Regular Expression (Regex)?
The search pattern can be anything from a simple character, a fixed string or a complex expression containing special characters describing the pattern. The pattern defined by the regex may match one or several times or not at all for a given string. Regular expressions can be used to search, edit and manipulate text.
A regular expression in Java defines a pattern for a string. Regular expressions are not language-specific, but their syntax differs slightly from language to language; Java's flavor is most similar to Perl's. Let's look at how regular expressions work in Java.
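To make the idea of a pattern matching once, several times, or not at all concrete, here is a minimal sketch. The class name MatchCountSketch and the helper countMatches are illustrative names, not part of the original tutorial.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchCountSketch {

    // Counts the non-overlapping occurrences of regex in input.
    static int countMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countMatches("an", "banana")); // 2: matches several times
        System.out.println(countMatches("b", "banana"));  // 1: matches once
        System.out.println(countMatches("x", "banana"));  // 0: does not match at all
    }
}
```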
Java Regular Expression(Regex) Proper Syntax Conventions
Common Matching Symbols, Metacharacters and Quantifiers
Regular Expression | Description | Example |
. | Matches any single character (by default, any character except a line terminator). | P.a matches Pra, P-a and P1a |
^regex | Find regex that must match at the beginning of the line. | ^P will match to PraBhu |
regex$ | Find a regex that must match at the end of the line. | u$ will match to PraBhu |
[abc] | Set definition, can match the letter a or b or c. | [Pp] will match to P or p |
[abc][vz] | Set definition, can match a or b or c followed by either v or z. | [Pp][Rr][Aa] will match to Pra |
[^abc] | When a caret appears as the first character inside the square brackets, it negates the set. This pattern matches any single character except a, b or c. | [^pra] will match any single character except p, r or a |
[a-d1-7] | Range: matches a single letter between a and d or a single digit from 1 to 7, but not the two-character sequence d1. | [a-d1-4] will match any single letter between a and d or any single digit between 1 and 4 |
X|Z | Find X or Z. | P|R matches either P or R |
XZ | Find X directly followed by Z. | PRA matches if exactly “PRA” is found. |
$ | Checks that the line ends at this position. | u$ matches only if u is the last character of the line |
\d | Any digit, short for [0-9] | \d will match to any digit between 0-9 |
\D | A non-digit, short for [^0-9] | \D will match to any character other than 0-9 |
\s | A whitespace character, short for [ \t\n\x0b\r\f] | \s will match to whitespace character |
\S | A non-whitespace character | \S any character other than whitespace |
\w | A word character, short for [a-zA-Z_0-9] | \w will match any uppercase or lowercase letter, digit or underscore |
\W | A non-word character, short for [^\w] | \W will match any character that is not a letter, digit or underscore |
\S+ | One or more non-whitespace characters | \S+ will match a run of consecutive non-whitespace characters, such as a whole word |
\b | Matches a word boundary, where a word character is [a-zA-Z0-9_]. A boundary is a position where a word character is NOT followed or NOT preceded by another word character. | \bPra\b is found in "Pra Bhu" but not in "PraBhu" |
* | Occurs zero or more times, is short for {0,} | X* finds zero or more letters X; .* finds any character sequence |
+ | Occurs one or more times, is short for {1,} | X+ finds one or more letters X |
? | Occurs zero or one time, is short for {0,1} | X? finds zero or one letter X |
{X} | Occurs exactly X times; {} sets the repetition count of the preceding element | \d{3} searches for three digits, .{10} for any character sequence of length 10 |
{X,Y} | Occurs between X and Y times | \d{1,4} means \d must occur at least once and at most four times. 193: true, 12345: false |
*? | A ? after a quantifier makes it a reluctant quantifier. It tries to find the smallest match, so the regular expression stops at the shortest possible match. | In APP, AP*? finds only A, whereas the greedy AP* finds APP |
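The difference between greedy and reluctant quantifiers, and the effect of \b, is easiest to see by printing what a search actually finds. The following is a small sketch under that assumption; QuantifierSketch and firstMatch are hypothetical names introduced only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierSketch {

    // Returns the first substring matched by regex, or null if nothing is found.
    static String firstMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        // Greedy: P* takes as many P characters as it can.
        System.out.println(firstMatch("AP*", "APPP"));      // APPP
        // Reluctant: P*? takes as few as it can, here none.
        System.out.println(firstMatch("AP*?", "APPP"));     // A
        System.out.println(firstMatch("a.+c", "abcabc"));   // abcabc
        System.out.println(firstMatch("a.+?c", "abcabc"));  // abc
        // \b requires a word boundary on both sides of Pra.
        System.out.println(firstMatch("\\bPra\\b", "Pra Bhu")); // Pra
        System.out.println(firstMatch("\\bPra\\b", "PraBhu"));  // null
    }
}
```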
How to deal with Backslash in Java Regular Expression Regex?
The backslash \ is an escape character in Java Strings, which means the backslash has a predefined meaning in Java. It is used to indicate that the next character should NOT be interpreted literally. We have to use double backslash \\ to define a single backslash.
For example, by regex convention the character 'w' by itself is interpreted as "match the character w", whereas '\w' means "match a word character: a letter, a digit or an underscore".
In Java, a regex is written as a String literal, so to define \w we must write \\w. To match a literal backslash we have to type \\\\, because \ is an escape character in regular expressions as well as in Java strings.
In short, to match a digit,
Regular Expression: \d
Java Regular Expression: \\d
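The two levels of escaping can be checked directly. This is a minimal sketch; the class BackslashSketch, the helper toForwardSlashes and the sample path are assumptions made for the illustration.

```java
public class BackslashSketch {

    // The Java literal "\\\\" is the regex \\, which matches one literal backslash.
    static String toForwardSlashes(String path) {
        return path.replaceAll("\\\\", "/");
    }

    public static void main(String[] args) {
        // The Java literal "\\d" holds only two characters: \ and d.
        System.out.println("\\d".length());          // 2
        System.out.println("7".matches("\\d"));      // true

        String path = "C:\\temp\\logs";              // holds C:\temp\logs
        System.out.println(toForwardSlashes(path));  // C:/temp/logs
    }
}
```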
How to Group in Java Regex?
Grouping parts of a regular expression is possible in Java. We enclose the parts of the pattern we want to group in round brackets (). A repetition operator placed after a group applies to the group as a whole.
String pattern = "(\\+\\d{2})(\\s)(\\d{10})"; // Java regex to match +xx xxxxxxxxxx
String testCase = "+91 8308767656";
testCase.matches(pattern); // this will return true
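As a sketch of both uses of a group, the example below applies a quantifier to a whole group and reads the captured parts back through Matcher.group(int). The names GroupSketch and splitPhone are illustrative assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupSketch {

    // Returns {country code, number}, or null when the input has another shape.
    static String[] splitPhone(String input) {
        Matcher matcher = Pattern.compile("(\\+\\d{2})\\s(\\d{10})").matcher(input);
        if (!matcher.matches()) {
            return null;
        }
        return new String[] { matcher.group(1), matcher.group(2) };
    }

    public static void main(String[] args) {
        // {3} applies to the whole group (ab), not just to b.
        System.out.println("ababab".matches("(ab){3}")); // true
        System.out.println("abbb".matches("(ab){3}"));   // false
        System.out.println("abbb".matches("ab{3}"));     // true

        String[] parts = splitPhone("+91 8308767656");
        System.out.println(parts[0] + " / " + parts[1]); // +91 / 8308767656
    }
}
```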
How to create Back Reference in Java Regex?
We can also create back reference to these parts of the regular expressions. A backreference stores the part of the string which matched the regex group. This allows us to use this part in the replacement.
String pattern = "(\\+\\d{2})(\\s)(\\d{10})";
String testCase = "+91 8308767656";
System.out.println(testCase.replaceAll(pattern, "$3")); // this will print 8308767656
System.out.println(testCase.replaceAll(pattern, "$1$3")); // this will print the input without the space in between, i.e. +918308767656
Java Regular Expressions in different mode configurations
We can add the mode modifiers to the start of the regex. To specify multiple modes, simply put them together as in (?ismx).
- (?s) for “single-line mode” makes the dot match all characters, including line breaks.
- (?m) for “multi-line mode” makes the caret and dollar match at the start and end of each line in the subject string.
- (?i) makes the regex case insensitive.
String pattern2 = "(?i)(PraBhu)"; // (?i) makes the entire pattern case insensitive
String testCase2 = "PRABHU";
System.out.println(testCase2.matches(pattern2)); // this will return true
testCase2 = "prabhu";
System.out.println(testCase2.matches(pattern2)); // this will return true
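The single-line and multi-line modes can be demonstrated in the same way. The following sketch assumes a small helper, countMatches, and a class name, ModeSketch, that are not part of the original tutorial.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ModeSketch {

    // Counts the non-overlapping occurrences of regex in input.
    static int countMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // (?s): the dot also matches the line break between a and b.
        System.out.println("a\nb".matches("a.b"));     // false
        System.out.println("a\nb".matches("(?s)a.b")); // true

        String text = "one\ntwo\nthree";
        // Without (?m), ^ and $ anchor only at the start and end of the input.
        System.out.println(countMatches("^\\w+$", text));       // 0
        // With (?m), ^ and $ match at the start and end of every line.
        System.out.println(countMatches("(?m)^\\w+$", text));   // 3
        // Modes can be combined: case-insensitive and multi-line.
        System.out.println(countMatches("(?im)^T\\w+$", text)); // 2
    }
}
```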
Java Regular Expression Implementation
String Class Methods
Four methods of the java.lang.String class support regular expressions.
Method | Description | Example |
s.matches("regex") | Evaluates whether "regex" matches s. Returns true only if the WHOLE string can be matched. | "PraBhu".matches("\\S{6}") returns true; "123456".matches("\\D*") returns false |
s.split("regex") | Creates an array of the substrings of s that lie between occurrences of "regex". The matched text is not included in the result. | "I have 2 dogs".split("\\d") returns a String[] with the values "I have " and " dogs" |
s.replaceFirst("regex", "replacement") | Replaces the first occurrence of "regex" with "replacement". | "Pra1Bhu".replaceFirst("\\d", " ") returns "Pra Bhu" |
s.replaceAll("regex", "replacement") | Replaces all occurrences of "regex" with "replacement". | "PraBhu".replaceAll("\\S", "1") returns "111111" |
String testCase = "This is a sample text with below 100 sample words to serve the purpose.";
// Checks if the String contains "sample"
String pattern1 = ".*(sample).*";
// Checks if the String contains "Sample"
String pattern2 = ".*(Sample).*";
// Checks if the String contains three consecutive digits
String pattern3 = ".*\\d{3}.*";
// Checks if the String starts with 'T' and ends with '.'; the final dot is escaped to match a literal '.'
String pattern4 = "T.*\\.";
// Pattern for three consecutive digits
String pattern5 = "\\d{3}";
// Pattern for a 6-character word which starts with 's' and ends with 'e'
String pattern6 = "s.{4}e";
System.out.println(testCase.matches(pattern1)); // true
System.out.println(testCase.matches(pattern2)); // false
System.out.println(testCase.matches(pattern3)); // true
System.out.println(testCase.matches(pattern4)); // true
String[] output = testCase.split(pattern5);
for (String s : output) {
    System.out.println(s);
}
// This is a sample text with below
//  sample words to serve the purpose.
System.out.println(testCase.replaceFirst(pattern6, "new"));
// This is a new text with below 100 sample words to serve the purpose.
System.out.println(testCase.replaceAll(pattern6, "new"));
// This is a new text with below 100 new words to serve the purpose.
Of these methods, matches succeeds only when the regex matches the entire input string; split, replaceFirst and replaceAll operate on matching substrings anywhere in the input. Internally, these methods use the Pattern and Matcher classes.
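Because the String methods are built on Pattern and Matcher, a regex that is applied many times can be compiled once and reused instead of being passed as a String on every call. The sketch below shows one way to do that; ReusePatternSketch, THREE_DIGITS and hasThreeDigits are hypothetical names.

```java
import java.util.regex.Pattern;

public class ReusePatternSketch {

    // Compiled once and shared; Pattern objects are immutable.
    private static final Pattern THREE_DIGITS = Pattern.compile("\\d{3}");

    // Partial match: true if three consecutive digits occur anywhere in the text.
    static boolean hasThreeDigits(String text) {
        return THREE_DIGITS.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(hasThreeDigits("below 100 sample words")); // true
        System.out.println(hasThreeDigits("only 12 words"));          // false
    }
}
```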
Advanced Java Regex: Pattern, Matcher and PatternSyntaxException
For advanced regular expressions the java.util.regex.Pattern and java.util.regex.Matcher classes are used.
- Pattern: A Pattern object is the compiled version of a regular expression. The Pattern class has no public constructor; instead, we call its public static method compile, passing the regular expression as an argument, to create a Pattern object. This Pattern object then lets you create a Matcher object for a given string.
- Matcher: A Matcher is the regex engine object that matches an input string against a Pattern. The Matcher class has no public constructor either; we obtain a Matcher by calling the matcher method of a Pattern object with the input string as the argument. We can then call the matches method, which returns a boolean indicating whether the input string matches the regex pattern, or use the Matcher's other methods to perform regex operations on the string.
- PatternSyntaxException: PatternSyntaxException object is an unchecked exception that indicates a syntax error in a regular expression pattern.
Pattern Class Method Description
METHOD | DESCRIPTION |
compile(String regex) | It is used to compile the given regular expression into a pattern. |
compile(String regex, int flags) | It is used to compile the given regular expression into a pattern with the given flag. |
flags() | It is used to return this pattern’s matching flags. |
matcher(CharSequence input) | It is used to create a matcher that will match the given input against this pattern. |
matches(String regex, CharSequence input) | It is used to compile the given regular expression and attempts to match the given input against it. |
pattern() | It is used to return the regular expression from which this pattern was compiled. |
quote(String s) | It is used to return a literal pattern String for the specified String. |
split(CharSequence input) | It is used to split the given input sequence around matches of this pattern. |
split(CharSequence input, int limit) | It is used to split the given input sequence around matches of this pattern; a positive limit caps the number of resulting pieces at limit. |
toString() | It is used to return the string representation of this pattern. |
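A few of the Pattern methods listed above are easier to understand with concrete input. This is a small sketch, with the class name PatternUtilitySketch and the helper matchesLiterally introduced only for the illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class PatternUtilitySketch {

    private static final Pattern COMMA = Pattern.compile(",");

    // quote() turns the text into a pattern that matches it character for character.
    static boolean matchesLiterally(String literal, String input) {
        return Pattern.matches(Pattern.quote(literal), input);
    }

    public static void main(String[] args) {
        // As a regex, 1+1 means "one or more 1s followed by 1", so it does not match "1+1".
        System.out.println(Pattern.matches("1+1", "1+1")); // false
        System.out.println(matchesLiterally("1+1", "1+1")); // true

        System.out.println(Arrays.toString(COMMA.split("a,b,c,d")));    // [a, b, c, d]
        // With limit 2 the input is split at most once.
        System.out.println(Arrays.toString(COMMA.split("a,b,c,d", 2))); // [a, b,c,d]

        System.out.println(COMMA.pattern()); // ,
        System.out.println(COMMA.flags());   // 0
    }
}
```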
Matcher Class Method Description
METHOD | DESCRIPTION |
find() | It is used to find the next part of the input that matches the pattern; calling it repeatedly iterates over multiple occurrences in the text. |
find(int start) | It is used for searching occurrences of the regular expressions in the text starting from the given index. |
start() | It is used for getting the start index of a match that is being found using find() method. |
end() | It is used for getting the end index of a match that is being found using find() method. It returns the index of character next to the last matching character. |
groupCount() | It is used to get the number of capturing groups in this matcher's pattern. |
group() | It is used to get the input subsequence matched by the previous match. |
matches() | It is used to test whether the entire input matches the pattern. |
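The Matcher methods above can be combined as in the following sketch. The date pattern, the sample sentence, and the names MatcherMethodsSketch and years are assumptions chosen for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherMethodsSketch {

    private static final Pattern DATE = Pattern.compile("(\\d{2})-(\\d{2})-(\\d{4})");

    // Collects the year (capturing group 3) of every date found in text.
    static List<String> years(String text) {
        List<String> years = new ArrayList<>();
        Matcher matcher = DATE.matcher(text);
        while (matcher.find()) {
            years.add(matcher.group(3));
        }
        return years;
    }

    public static void main(String[] args) {
        String text = "Born 19-05-1894, moved 01-02-1920";
        Matcher matcher = DATE.matcher(text);
        // groupCount() depends only on the pattern, not on a match.
        System.out.println(matcher.groupCount()); // 3
        if (matcher.find()) {
            System.out.println(matcher.group()); // 19-05-1894
            int resumeAt = matcher.end();
            // find(int) resets the matcher and searches from the given index.
            if (matcher.find(resumeAt)) {
                System.out.println(matcher.group(1)); // 01
            }
        }
        System.out.println(years(text)); // [1894, 1920]
    }
}
```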
PatternSyntaxException Method Description
METHOD | DESCRIPTION |
getDescription() | It is used to retrieve the description of the error. |
getIndex() | It is used to retrieve the error-index. |
getMessage() | It is used to return a multi-line string containing the description of the syntax error and its index, the erroneous regular-expression pattern, and a visual indication of the error-index within the pattern. |
getPattern() | It is used to retrieve the erroneous regular-expression pattern. |
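Since PatternSyntaxException is unchecked, it only needs to be caught when the regex comes from an untrusted or user-supplied source. The sketch below shows one possible validation helper; RegexValidationSketch and isValidRegex are hypothetical names, and the exact wording of the error description depends on the JDK.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexValidationSketch {

    // Returns true if the regex compiles; otherwise reports the error and returns false.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            System.out.println("Description: " + e.getDescription());
            System.out.println("Index: " + e.getIndex());
            System.out.println("Pattern: " + e.getPattern());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidRegex("\\d+"));     // true
        System.out.println(isValidRegex("(\\d{2}"));  // false: the group is never closed
        System.out.println(isValidRegex("[a-z"));     // false: the character class is never closed
    }
}
```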
Java Regex Implementation Examples
Similar to the String regex methods, the static Pattern.matches method matches the entire string against the regex. For a partial match, we compile the regex with Pattern.compile and search with a Matcher.
package com.adevguide.java.regex;

import java.util.regex.Pattern;

public class PatternMatcher {
    public static void main(String[] args) {
        // Using the Pattern class static method matches to match a regex against the input string
        // returns true if the string starts with 'T'
        System.out.println(Pattern.matches("T.*", "This is a sample string")); // true
        // returns true if the date is passed in DD-MM-YYYY format
        System.out.println(Pattern.matches("(\\d{2}-){2}\\d{4}", "19-05-1894")); // true
    }
}
The pattern defined by regex is applied on the string from left to right and once a source character is used in a match, it can’t be reused. For example, the regex “121” will match “31212142121” only twice as “_121____121”.
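The "121" example can be reproduced by collecting the start index of every match. This is a minimal sketch; NonOverlapSketch and matchStarts are illustrative names, and the lookahead variant is an aside that shows how overlapping positions could be found if they were needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonOverlapSketch {

    // Returns the start index of every match that find() reports.
    static List<Integer> matchStarts(String regex, String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            starts.add(matcher.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // Characters consumed by one match are not reused by the next one.
        System.out.println(matchStarts("121", "31212142121"));     // [1, 8]
        // A lookahead consumes nothing, so overlapping positions are reported too.
        System.out.println(matchStarts("(?=121)", "31212142121")); // [1, 3, 8]
    }
}
```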
Similar to the String class split method, we have the Pattern class split method.
package com.adevguide.java.simpleproject;

import java.util.regex.Pattern;

public class App {
    public static void main(String[] args) {
        String testCase = "This is yet another \t sample text with \t multiple tabs(2) and multiple lines.\n"
                + "Let me add 50 numbers also so we will have more test cases, \n"
                + "looks like I have added only 1 number. Bummer!!";
        // create a Pattern object for the new line regex
        Pattern pattern = Pattern.compile("\\n");
        // invoke the Pattern class split method and pass the input string
        String[] result = pattern.split(testCase);
        int i = 0;
        // iterate and print the split result with line numbers
        for (String s : result) {
            i++;
            System.out.println("line " + i + ": " + s);
        }
    }
}
Output:
line 1: This is yet another sample text with multiple tabs(2) and multiple lines.
line 2: Let me add 50 numbers also so we will have more test cases,
line 3: looks like I have added only 1 number. Bummer!!
Pattern compile() and Matcher find(), start(), end() methods
package com.adevguide.java.simpleproject;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class App {
    public static void main(String[] args) {
        String testCase = "This is yet another \t sample text with \t multiple tabs(2) and multiple lines.\n"
                + "Let me add 50 numbers also so we will have more test cases, \n"
                + "looks like I have added only 1 number. Bummer!!";
        // create a Pattern object for a digit
        Pattern pattern = Pattern.compile("\\d");
        // generate a Matcher object from the Pattern object and pass the input string
        Matcher matcher = pattern.matcher(testCase);
        int i = 0;
        while (matcher.find()) {
            i++;
            System.out.println("Pattern exist for " + i + " time from index " + matcher.start()
                    + " till index " + matcher.end());
        }
    }
}
Output:
Pattern exist for 1 time from index 55 till index 56
Pattern exist for 2 time from index 89 till index 90
Pattern exist for 3 time from index 90 till index 91
Pattern exist for 4 time from index 168 till index 169
Case-insensitive pattern search
package com.adevguide.java.simpleproject;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class App {
    public static void main(String[] args) {
        String testCase = "This is yet another \t sample text with \t multiple tabs(2) and multiple lines.\n"
                + "Let me add 50 numbers also so we will have more test cases, \n"
                + "looks like I have added only 1 number. Bummer!!";
        // create a Pattern object for a case-insensitive word
        Pattern pattern = Pattern.compile("aNoTHeR", Pattern.CASE_INSENSITIVE);
        // generate a Matcher object from the Pattern object and pass the input string
        Matcher matcher = pattern.matcher(testCase);
        System.out.println(matcher.find()); // true
    }
}
Java Regex Grouping and Backreference
Within the regex itself, we can refer back to a captured group with a backslash followed by the group index; in a Java string literal the backslash is doubled, as in \\1.
System.out.println(Pattern.matches("(Pra)\\1", "PraPra")); // true
System.out.println(Pattern.matches("(Pra)(Bhu)\\2\\1", "PraBhuBhuPra")); // true
System.out.println(Pattern.matches("(Pr)(a\\d)\\2\\1", "Pra1a1Pr")); // true
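One practical use of an in-pattern backreference is detecting accidentally repeated words. The following is a sketch under that assumption; RepeatedWordSketch and collapseRepeats are hypothetical names, and the regex is only one possible way to express the idea.

```java
public class RepeatedWordSketch {

    // \\1 must repeat exactly what group 1 captured; $1 in the replacement keeps one copy.
    static String collapseRepeats(String text) {
        return text.replaceAll("\\b(\\w+)(\\s+\\1\\b)+", "$1");
    }

    public static void main(String[] args) {
        System.out.println(collapseRepeats("this is is a test")); // this is a test
        System.out.println(collapseRepeats("the the the cat"));   // the cat
        System.out.println(collapseRepeats("nothing repeated"));  // nothing repeated
    }
}
```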
Java Regular Expression Common Examples
Email ID Java Regex
package com.adevguide.java.simpleproject;

import java.util.regex.Pattern;

public class App {
    public static void main(String[] args) {
        // Regex for emailId
        String emailIdRegex = "^[a-z0-9_+&*-]+(?:\\." + "[a-z0-9_+&*-]+)*@" + "(?:[a-z0-9-]+\\.)+[a-z]{2,7}$";
        String emailId = "user@example.com"; // placeholder address
        // Case-insensitive search
        System.out.println(Pattern.compile(emailIdRegex, Pattern.CASE_INSENSITIVE).matcher(emailId).matches()); // true
    }
}
Username Java Regex
package com.adevguide.java.simpleproject;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class App {
    public static void main(String[] args) {
        String inputText = "Prabhu12_4";
        Pattern p = Pattern.compile("(?i)^[a-z0-9_-]{3,16}$");
        /*
         * We begin by telling the parser to find the beginning of the string (^), followed by any letter
         * (a-z, made case insensitive by (?i)), number (0-9), an underscore, or a hyphen. Next, {3,16} makes
         * sure that there are at least 3 of those characters, but no more than 16. Finally, we want the end
         * of the string ($).
         */
        Matcher m = p.matcher(inputText);
        System.out.println(m.matches()); // true
        inputText = "prabhu12%4";
        m = p.matcher(inputText);
        System.out.println(m.matches()); // false
    }
}
Password Java Regex
package com.adevguide.java.simpleproject;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class App {
    public static void main(String[] args) {
        String inputText = "Prabhu12@4";
        Pattern p = Pattern.compile("^(?=.*[a-z])(?=.*[A-Z])(?=.*\\d)(?=.*[@$!%*?&])[A-Za-z\\d@$!%*?&]{8,}$");
        /*
         * Minimum eight characters, at least one uppercase letter, one lowercase letter, one number and one
         * special character
         */
        Matcher m = p.matcher(inputText);
        System.out.println(m.matches()); // true
        inputText = "prabhu12%4";
        m = p.matcher(inputText);
        System.out.println(m.matches()); // false
    }
}
Cheat Sheet
Here is a cheat sheet to quickly glance over all the important aspects of java regular expressions.
View/Download Cheat Sheet in HD
Java Regular Expression Quiz
Want to put your regex skills to the test? We have created an MCQ quiz on regular expressions; give it a look.
Source Code
You can find the entire source code used in this tutorial in our GitHub Repository.
References
This tutorial took reference from the below articles: