Java Regular Expression

Regular expressions (regex) are available in every mainstream programming language, be it JavaScript, Python or PHP. A regular expression, regex or regexp is a sequence of characters that defines a search pattern.

What is Java Regular Expression (Regex)?

The search pattern can be anything from a simple character, a fixed string or a complex expression containing special characters describing the pattern. The pattern defined by the regex may match one or several times or not at all for a given string. Regular expressions can be used to search, edit and manipulate text.

A regular expression in Java defines a pattern for a string. Regular expressions are not language-specific, but their syntax differs slightly from language to language. The Java flavor is most similar to Perl's. Let's dive in to see how regular expressions work in Java.

 

Java Regular Expression (Regex) Syntax Conventions

Common Matching Symbols, Metacharacters and Quantifiers

Regular Expression | Description | Example
. | Matches any single character (by default, any character except a line terminator). | P.a matches Pra
^regex | The regex must match at the beginning of the line. | ^P matches in PraBhu
regex$ | The regex must match at the end of the line. | u$ matches in PraBhu
[abc] | Set definition; matches the letter a, b or c. | [Pp] matches P or p
[abc][vz] | Set definition; matches a, b or c followed by either v or z. | [Pp][Rr][Aa] matches Pra
[^abc] | A caret as the first character inside the square brackets negates the set. This pattern matches any character except a, b or c. | [^pra] matches any character except p, r or a
[a-d1-7] | Range: matches a single letter between a and d or a single digit from 1 to 7, but not the sequence d1. | [a-d1-4] matches any single letter from a to d or any digit from 1 to 4
X|Z | Finds X or Z. | P|R matches either P or R
XZ | Finds X directly followed by Z. | PRA matches only if exactly "PRA" is found
$ | Checks whether the line ends at this position. | Matches at the end of the string
\d | Any digit, short for [0-9]. | \d matches any digit from 0 to 9
\D | A non-digit, short for [^0-9]. | \D matches any character other than 0 to 9
\s | A whitespace character, short for [ \t\n\x0b\r\f]. | \s matches a whitespace character
\S | A non-whitespace character. | \S matches any character other than whitespace
\w | A word character, short for [a-zA-Z_0-9]. | \w matches any uppercase or lowercase letter, digit, or underscore
\W | A non-word character, short for [^\w]. | \W matches anything other than a letter, digit, or underscore
\S+ | One or more non-whitespace characters. | \S+ matches PraBhu as a whole
\b | Matches a word boundary, where a word character is [a-zA-Z0-9_]. A boundary is a position where a word character is not followed or not preceded by another word character. | \bcat\b finds cat in "a cat sat" but not in "concatenate"
* | Occurs zero or more times, short for {0,}. | X* finds zero or more letters X; .* finds any character sequence
+ | Occurs one or more times, short for {1,}. | X+ finds one or more letters X
? | Occurs zero or one time, short for {0,1}. | X? finds zero or exactly one letter X
{X} | Occurs exactly X times; {} gives the repetition count of the preceding element. | \d{3} searches for three digits; .{10} for any character sequence of length 10
{X,Y} | Occurs between X and Y times. | \d{1,4} means \d must occur at least once and at most four times. As a full-string match, 193: true, 12345: false
*? | A ? after a quantifier makes it a reluctant quantifier. It tries to find the smallest possible match. | In "APP", a search with AP*? matches only "A", while AP* matches "APP"

How to deal with Backslash in Java Regular Expression (Regex)?

The backslash \ is an escape character in Java Strings, which means the backslash has a predefined meaning in Java. It is used to indicate that the next character should NOT be interpreted literally. We have to use double backslash \\ to define a single backslash.

For example, as per the Regex Convention, the character ‘w’ by itself will be interpreted as ‘match the character w’, but using ‘\w‘ signifies ‘match an alpha-numeric character including underscore’.

In Java, if we want to use \w in a regex, we must write \\w in the string literal. If we want to match a literal backslash, we have to type \\\\, because \ is an escape character in regular expressions as well as in Java strings.

In short, to match a digit,

Regular Expression: \d

Java Regular Expression: \\d
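The escaping rule above can be checked with a small sketch. The class name BackslashDemo and its two helper methods are made up for this illustration; only String.matches comes from the JDK.

```java
public class BackslashDemo {

    // The regex \d is written as "\\d" in a Java string literal.
    static boolean isSingleDigit(String input) {
        return input.matches("\\d");
    }

    // The regex \\ (one literal backslash) is written as "\\\\" in Java source.
    static boolean isSingleBackslash(String input) {
        return input.matches("\\\\");
    }

    public static void main(String[] args) {
        System.out.println(isSingleDigit("7"));      // true
        System.out.println(isSingleDigit("x"));      // false
        System.out.println(isSingleBackslash("\\")); // true: the input is one backslash character
        System.out.println("\\\\".length());         // 2: the regex itself is two characters long
    }
}
```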

 

How to Group in Java Regex?

Parts of a regular expression can be grouped in Java. We enclose the parts of the pattern we want to group in round brackets (). A repetition operator placed after the closing bracket then applies to the complete group, and the text matched by each group can be retrieved later.

 

String pattern="(\\+\\d{2})(\\s)(\\d{10})"; // Java Regex to match +xx xxxxxxxxxx
String testCase="+91 8308767656";
testCase.matches(pattern); // this will return true
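To read the text captured by each group, one option is Matcher.group(int). The following is a minimal sketch that reuses the phone-number pattern from the example above; the class name GroupDemo and the helper groupOf are hypothetical names chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {

    // Group 1 = country code, group 2 = whitespace, group 3 = ten-digit number.
    private static final Pattern PHONE = Pattern.compile("(\\+\\d{2})(\\s)(\\d{10})");

    // Returns the text captured by the given group, or null if the input does not match.
    static String groupOf(String input, int group) {
        Matcher m = PHONE.matcher(input);
        return m.matches() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        String testCase = "+91 8308767656";
        System.out.println(groupOf(testCase, 0)); // +91 8308767656 (group 0 is the whole match)
        System.out.println(groupOf(testCase, 1)); // +91
        System.out.println(groupOf(testCase, 3)); // 8308767656
    }
}
```

Groups are numbered from 1 by the position of their opening bracket, and group 0 always stands for the entire match.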

 

How to create Back Reference in Java Regex?

We can also create back references to these parts of the regular expression. A back reference stores the part of the string which matched the regex group. This allows us to reuse that part in the replacement, where a group is referenced as $ followed by its index.

String pattern="(\\+\\d{2})(\\s)(\\d{10})";
String testCase="+91 8308767656";
System.out.println(testCase.replaceAll(pattern, "$3")); //this will print 8308767656
System.out.println(testCase.replaceAll(pattern, "$1$3")); // this will print pattern string without the space in between i.e +918308767656

 

Java Regular Expressions in different mode configurations

We can add the mode modifiers to the start of the regex. To specify multiple modes, simply put them together as in (?ismx).

  • (?s) for “single-line mode” makes the dot match all characters, including line breaks.
  • (?m) for “multi-line mode” makes the caret and dollar match at the start and end of each line in the subject string.
  • (?i) makes the regex case insensitive.
String pattern2="(?i)(PraBhu)"; // (?i) makes entire pattern case insensitive
String testCase2="PRABHU";
System.out.println(testCase2.matches(pattern2)); // this will return true
testCase2="prabhu";
System.out.println(testCase2.matches(pattern2)); // this will return true
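The (?s) and (?m) modes can be tried out in the same way. This is a small sketch, not part of the original examples; the class ModeDemo and its helper methods are assumed names for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ModeDemo {

    // Counts how often "Line" appears at a line start; (?m) is prepended on request.
    static int countLineStarts(String text, boolean multiLine) {
        String regex = (multiLine ? "(?m)" : "") + "^Line";
        Matcher m = Pattern.compile(regex).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Checks whether "first.*second" matches the whole text; (?s) is prepended on request.
    static boolean spansLines(String text, boolean singleLine) {
        String regex = (singleLine ? "(?s)" : "") + "first.*second";
        return text.matches(regex);
    }

    public static void main(String[] args) {
        String text = "Line one\nLine two";
        System.out.println(countLineStarts(text, false)); // 1: ^ matches only at the start of the input
        System.out.println(countLineStarts(text, true));  // 2: ^ matches at the start of every line

        System.out.println(spansLines("first\nsecond", false)); // false: the dot stops at the line break
        System.out.println(spansLines("first\nsecond", true));  // true: the dot also matches the line break
    }
}
```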

 

Java Regular Expression Implementation

String Class Methods

The java.lang.String class has four methods that support regex.

Method | Description | Example
s.matches("regex") | Evaluates whether "regex" matches s. Returns true only if the WHOLE string can be matched. | "PraBhu".matches("\\S{6}") returns true; "123456".matches("\\D*") returns false
s.split("regex") | Creates an array with the substrings of s divided at each occurrence of "regex". "regex" is not included in the result. | "I have 2 dogs".split("\\d") returns a String[] with the values "I have " and " dogs"
s.replaceFirst("regex", "replacement") | Replaces the first occurrence of "regex" with "replacement". | "Pra1Bhu".replaceFirst("\\d", " ") returns "Pra Bhu"
s.replaceAll("regex", "replacement") | Replaces all occurrences of "regex" with "replacement". | "PraBhu".replaceAll("\\S", "1") returns "111111"

 

 

String testCase = "This is a sample text with below 100 sample words to serve the purpose.";
// Checks if the String contains "sample"
String pattern1 = ".*(sample).*";
// Checks if the String contains "Sample"
String pattern2 = ".*(Sample).*";
// Checks if the String contains three consecutive digits
String pattern3 = ".*\\d{3}.*";
// Checks if the String starts with 'T' and ends with '.'; the final dot is escaped to match a literal '.'
String pattern4 = "T.*\\.";
// Pattern for three consecutive digits
String pattern5 = "\\d{3}";
// Pattern for a six-character sequence which starts with 's' and ends with 'e'
String pattern6 = "s.{4}e";

System.out.println(testCase.matches(pattern1)); // true

System.out.println(testCase.matches(pattern2)); // false

System.out.println(testCase.matches(pattern3)); // true

System.out.println(testCase.matches(pattern4));// true

String[] output = testCase.split(pattern5);
for (String s : output) {
  System.out.println(s);
}
// This is a sample text with below
// sample words to serve the purpose.

System.out.println(testCase.replaceFirst(pattern6, "new")); // This is a new text with below 100 sample words to
                              // serve the purpose.

System.out.println(testCase.replaceAll(pattern6, "new")); // This is a new text with below 100 new words to
                              // serve the purpose.

These methods are convenient for simple, one-off operations. Keep in mind that matches() succeeds only when the entire input string matches the regex. Internally, these methods use the Pattern and Matcher classes.
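The relationship between the String methods and the Pattern and Matcher classes can be sketched as follows. The class StringVsPatternDemo and its helper names are assumptions made for this illustration; the calls themselves are standard JDK methods.

```java
import java.util.regex.Pattern;

public class StringVsPatternDemo {

    // Convenience form on String.
    static boolean viaString(String input, String regex) {
        return input.matches(regex);
    }

    // Equivalent explicit form with Pattern and Matcher.
    static boolean viaPattern(String input, String regex) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Explicit counterpart of input.replaceAll(regex, replacement).
    static String replaceViaPattern(String input, String regex, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String input = "This is a sample text";
        System.out.println(viaString(input, ".*sample.*"));  // true
        System.out.println(viaPattern(input, ".*sample.*")); // true
        System.out.println(viaString(input, "sample"));      // false: the whole string must match
        System.out.println(replaceViaPattern(input, "\\s", "_")); // This_is_a_sample_text
    }
}
```

When the same regex is applied many times, compiling the Pattern once and reusing it avoids repeating the compilation step on every call.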

 

Advanced Java Regex: Pattern, Matcher and PatternSyntaxException

For advanced regular expressions the java.util.regex.Pattern and java.util.regex.Matcher classes are used.

  • Pattern: A Pattern object is the compiled version of a regular expression. The Pattern class has no public constructor; we call its public static method compile, passing the regular expression as an argument, to create a Pattern object. This Pattern object then allows you to create a Matcher object for a given string.

 

  • Matcher: A Matcher is the regex engine object that matches an input string against the Pattern it was created from. The Matcher class has no public constructor either; we obtain a Matcher by calling the matcher method of a Pattern object with the input string as an argument.

We then call the matches method, which returns a boolean indicating whether the input string matches the regex pattern. The Matcher object also provides the other regex operations on the input string.

 

  • PatternSyntaxException: PatternSyntaxException object is an unchecked exception that indicates a syntax error in a regular expression pattern.
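The three classes above typically work together as in the following sketch. The class PatternBasicsDemo and its two helper methods are hypothetical names for illustration, and returning false for an invalid regex is a simplification chosen for brevity.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class PatternBasicsDemo {

    // Reports whether the given string is a syntactically valid regex.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            System.out.println("Invalid regex '" + e.getPattern() + "' at index "
                    + e.getIndex() + ": " + e.getDescription());
            return false;
        }
    }

    // Compiles the regex and reports whether it occurs anywhere in the input.
    static boolean containsMatch(String regex, String input) {
        if (!isValidRegex(regex)) {
            return false;
        }
        Pattern pattern = Pattern.compile(regex);  // step 1: compile the regex
        Matcher matcher = pattern.matcher(input);  // step 2: bind it to an input string
        return matcher.find();                     // step 3: search for a match
    }

    public static void main(String[] args) {
        System.out.println(containsMatch("\\d+", "abc 123"));  // true
        System.out.println(containsMatch("\\d+", "abc"));      // false
        System.out.println(containsMatch("(\\d+", "abc 123")); // false, after reporting the unclosed group
    }
}
```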


 

Pattern Class Method Description

METHOD DESCRIPTION
compile(String regex) It is used to compile the given regular expression into a pattern.
compile(String regex, int flags) It is used to compile the given regular expression into a pattern with the given flag.
flags() It is used to return this pattern’s matching flags.
matcher(CharSequence input) It is used to create a matcher that will match the given input against this pattern.
matches(String regex, CharSequence input) It is used to compile the given regular expression and attempts to match the given input against it.
pattern() It is used to return the regular expression from which this pattern was compiled.
quote(String s) It is used to return a literal pattern String for the specified String.
split(CharSequence input) It is used to split the given input sequence around matches of this pattern.
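split(CharSequence input, int limit) It is used to split the given input sequence around matches of this pattern; limit controls how many times the pattern is applied and therefore the length of the resulting array.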
toString() It is used to return the string representation of this pattern.
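A few of the Pattern methods listed above are shown in this short sketch. The class PatternMethodsDemo and its helpers are illustrative names that do not appear in the original examples.

```java
import java.util.regex.Pattern;

public class PatternMethodsDemo {

    // quote() turns the whole string into a literal, so '+' loses its quantifier meaning.
    static boolean matchesLiterally(String literal, String input) {
        return Pattern.matches(Pattern.quote(literal), input);
    }

    // split(input, limit): a positive limit caps the number of resulting pieces.
    static String[] splitWithLimit(String regex, String input, int limit) {
        return Pattern.compile(regex).split(input, limit);
    }

    public static void main(String[] args) {
        System.out.println(Pattern.matches("1+1=2", "1+1=2"));  // false: "1+" means "one or more 1"
        System.out.println(matchesLiterally("1+1=2", "1+1=2")); // true

        System.out.println(String.join(" | ", splitWithLimit(",", "a,b,c,d", 2))); // a | b,c,d

        Pattern p = Pattern.compile("prabhu", Pattern.CASE_INSENSITIVE);
        System.out.println(p.pattern());                                 // prabhu
        System.out.println((p.flags() & Pattern.CASE_INSENSITIVE) != 0); // true
    }
}
```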

 

Matcher Class Method Description

METHOD DESCRIPTION
find() It is mainly used for searching multiple occurrences of the regular expressions in the text.
find(int start) It is used for searching occurrences of the regular expressions in the text starting from the given index.
start() It is used for getting the start index of a match that is being found using find() method.
end() It is used for getting the end index of a match that is being found using find() method. It returns the index of character next to the last matching character.
groupCount() It is used to return the number of capturing groups in this matcher's pattern.
group() It is used to return the input subsequence matched by the previous match.
matches() It is used to test whether the entire input matches the pattern.
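The Matcher methods listed above can be combined as in this sketch. MatcherMethodsDemo, findAll and capturingGroups are assumed names for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherMethodsDemo {

    // Collects every match of the regex in the input using find() and group().
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    // groupCount() counts the capturing groups of the pattern, not the matches found.
    static int capturingGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(findAll("\\d+", "a1b22c333"));                 // [1, 22, 333]
        System.out.println(capturingGroups("(\\+\\d{2})(\\s)(\\d{10})")); // 3

        Matcher m = Pattern.compile("\\d+").matcher("a1b22c333");
        if (m.find(3)) { // start searching at index 3
            System.out.println(m.group() + " from " + m.start() + " to " + m.end()); // 22 from 3 to 5
        }
    }
}
```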

 

PatternSyntaxException Method Description

METHOD DESCRIPTION
getDescription() It is used to retrieve the description of the error.
getIndex() It is used to retrieve the error-index.
getMessage() It is used to return a multi-line string containing the description of the syntax error and its index, the erroneous regular-expression pattern, and a visual indication of the error-index within the pattern.
getPattern() It is used to retrieve the erroneous regular-expression pattern.

 

Java Regex Implementation Examples

Similar to the String method matches, the static Pattern.matches method matches the entire string against the regex. For a partial match, we use the compile method together with a Matcher and its find method.

package com.adevguide.java.regex;

import java.util.regex.Pattern;

public class PatternMatcher {

  public static void main(String[] args) {

    //Using Pattern class static method matches to match a regex against input string
    //return true if string starts with 'T'
    System.out.println(Pattern.matches("T.*", "This is a sample string")); //true
    //returns true if date is passed in DD-MM-YYYY format
    System.out.println(Pattern.matches("(\\d{2}-){2}\\d{4}", "19-05-1894")); //true
    
        
  }

}

The pattern defined by regex is applied on the string from left to right and once a source character is used in a match, it can’t be reused. For example, the regex “121” will match “31212142121” only twice as “_121____121”.
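The non-overlapping behavior can be verified by counting matches. The sketch below uses a hypothetical helper countMatches; the second call uses a lookahead, which goes beyond the text above and is included only to contrast overlapping start positions with regular matches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OverlapDemo {

    // Counts how many times find() succeeds for the regex on the input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "31212142121";
        // Characters consumed by one match are not reused by the next one.
        System.out.println(countMatches("121", input));     // 2
        // A lookahead consumes no characters, so every start position of "121" is counted.
        System.out.println(countMatches("(?=121)", input)); // 3
    }
}
```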

Similar to the String class split method, we have the Pattern class split method.

package com.adevguide.java.simpleproject;

import java.util.regex.Pattern;

public class App {
  public static void main(String[] args) {

    String testCase = "This is yet another sample text with \t multiple tabs and multiple lines.\n"
        + "Let me add some 5 numbers also so we will have more test cases, \n"
        + "looks like I have added only 1 number. Bummer!!";

    //create Pattern class object for new line regex
    Pattern pattern = Pattern.compile("\\n");
    //invoke pattern class split method and pass input string
    String[] result = pattern.split(testCase);
    int i = 0;
    //iterate and print split result with line number
    for (String s : result) {
      i++;
      System.out.println("line " + i + ": " + s);

    }
  }
}

Output:

line 1: This is yet another sample text with 	 multiple tabs and multiple lines.
line 2: Let me add some 5 numbers also so we will have more test cases,
line 3: looks like I have added only 1 number. Bummer!!

 

Pattern method compile() and Matcher methods find(), start(), end()

package com.adevguide.java.simpleproject;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class App {
  public static void main(String[] args) {

    String testCase = "This is yet another \t sample text with \t multiple tabs(2) and multiple lines.\n"
        + "Let me add 50 numbers also so we will have more test cases, \n"
        + "looks like I have added only 1 number. Bummer!!";

    // create Pattern class object for a digit
    Pattern pattern = Pattern.compile("\\d");
    // generate Matcher class object from pattern object and pass input string
    Matcher matcher = pattern.matcher(testCase);
    int i = 0;
    while (matcher.find()) {
      i++;
      System.out.println("Pattern exist for " + i + " time from index " + matcher.start() + " till index "
          + matcher.end());
    }

  }
}

Output:

Pattern exist for 1 time from index 55 till index 56
Pattern exist for 2 time from index 89 till index 90
Pattern exist for 3 time from index 90 till index 91
Pattern exist for 4 time from index 168 till index 169

Case-insensitive pattern search

package com.adevguide.java.simpleproject;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class App {
  public static void main(String[] args) {

    String testCase = "This is yet another \t sample text with \t multiple tabs(2) and multiple lines.\n"
        + "Let me add 50 numbers also so we will have more test cases, \n"
        + "looks like I have added only 1 number. Bummer!!";

    // create Pattern class object for a case insensitive word
    Pattern pattern = Pattern.compile("aNoTHeR", Pattern.CASE_INSENSITIVE);
    // generate Matcher class object from pattern object and pass input string
    Matcher matcher = pattern.matcher(testCase);

    System.out.println(matcher.find()); //true

  }
}

Java Regex Grouping and Backreference

Inside the regex itself, we can refer back to a group by using a backslash followed by the group index, which is written with a double backslash in a Java string (for example "\\1").

    System.out.println(Pattern.matches("(Pra)\\1", "PraPra")); // true
    System.out.println(Pattern.matches("(Pra)(Bhu)\\2\\1", "PraBhuBhuPra")); // true
    System.out.println(Pattern.matches("(Pr)(a\\d)\\2\\1", "Pra1a1Pr")); // true
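A common use of such back references is spotting a word that was typed twice in a row. The following is a sketch under that assumption; the class RepeatedWordDemo, its pattern, and its helper methods are illustrative and not part of the original examples.

```java
import java.util.regex.Pattern;

public class RepeatedWordDemo {

    // A word, some whitespace, then the same word again (\1 refers to group 1).
    private static final Pattern REPEATED = Pattern.compile("\\b(\\w+)\\s+\\1\\b");

    static boolean hasRepeatedWord(String text) {
        return REPEATED.matcher(text).find();
    }

    // In the replacement string the group is referenced as $1 instead of \1.
    static String removeRepeats(String text) {
        return REPEATED.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(hasRepeatedWord("this is is a test")); // true
        System.out.println(hasRepeatedWord("this is a test"));    // false
        System.out.println(removeRepeats("this is is a test"));   // this is a test
    }
}
```

The word boundaries keep the "is" at the end of "this" from being treated as the first half of a repeated word.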

 

Java Regular Expression Common Examples

Email ID Java Regex

package com.adevguide.java.simpleproject;

import java.util.regex.Pattern;

public class App {
  public static void main(String[] args) {

    // Regex for emailId
    String emailIdRegex = "^[a-z0-9_+&*-]+(?:\\." + "[a-z0-9_+&*-]+)*@" + "(?:[a-z0-9-]+\\.)+[a-z]{2,7}$";

    String emailId = "user@example.com"; // sample address used for illustration
    // Case-insensitive search
    System.out.println(Pattern.compile(emailIdRegex, Pattern.CASE_INSENSITIVE).matcher(emailId).matches()); // true

  }
}

 

Username Java Regex

package com.adevguide.java.simpleproject;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class App {
  public static void main(String[] args) {

    String inputText = "Prabhu12_4";
    /*
     * We begin by telling the parser to find the beginning of the string (^), followed by any letter
     * (a-z, case-insensitive because of (?i)), digit (0-9), underscore, or hyphen. Next, {3,16} makes
     * sure that there are at least 3 of those characters, but no more than 16. Finally, we want the
     * end of the string ($).
     */
    Pattern p = Pattern.compile("(?i)^[a-z0-9_-]{3,16}$");
    Matcher m = p.matcher(inputText);
    System.out.println(m.matches()); // true

    inputText = "prabhu12%4";
    m = p.matcher(inputText);
    System.out.println(m.matches()); // false

  }
}

 

Password Java Regex

package com.adevguide.java.simpleproject;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class App {
    public static void main(String[] args) {

        String inputText = "Prabhu12@4";
        Pattern p = Pattern.compile("^(?=.*[a-z])(?=.*[A-Z])(?=.*\\d)(?=.*[@$!%*?&])[A-Za-z\\d@$!%*?&]{8,}$");
        /*
         * Minimum eight characters, at least one uppercase letter, one lowercase letter, one number and one special
         * character
         */
        Matcher m = p.matcher(inputText);
        System.out.println(m.matches()); // true

        inputText = "prabhu12%4";
        m = p.matcher(inputText);
        System.out.println(m.matches()); // false

    }
}
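Each (?=...) lookahead in the password regex checks one rule without consuming characters. To see which rule a rejected password breaks, the same rules can be tested one by one, as in this sketch. The class PasswordRulesDemo, the method violations, and the message texts are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class PasswordRulesDemo {

    // Applies the rules of the combined password regex separately and lists the broken ones.
    static List<String> violations(String password) {
        List<String> problems = new ArrayList<>();
        if (!password.matches(".*[a-z].*")) {
            problems.add("no lowercase letter");
        }
        if (!password.matches(".*[A-Z].*")) {
            problems.add("no uppercase letter");
        }
        if (!password.matches(".*\\d.*")) {
            problems.add("no digit");
        }
        if (!password.matches(".*[@$!%*?&].*")) {
            problems.add("no special character");
        }
        if (!password.matches("[A-Za-z\\d@$!%*?&]{8,}")) {
            problems.add("too short or contains a disallowed character");
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(violations("Prabhu12@4")); // []
        System.out.println(violations("prabhu12%4")); // [no uppercase letter]
    }
}
```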

 

Cheat Sheet

Here is a cheat sheet to quickly glance over all the important aspects of java regular expressions.

[Image: regular expressions cheat sheet]

Java Regular Expression Quiz

Do you want to put your regex skills to the test? We have created an MCQ quiz to test your regex skills; give it a look.

20 Java Regular Expressions Quiz Regex Questions [MCQ]

Source Code

You can find the entire source code used in this tutorial in our GitHub Repository.
