How to Validated Email Address in Python with Regular Expression

This tutorial will teach us to validate the given email address with the regular expression. A regular expression is an important technique to search the text and replace actions, validations, string splitting, and many other operations. They provide a set of rules to identify the specific characters and digits in such a form that the pattern matches the certain segments of text that we are looking for.

Regular expressions are known for pattern-matching, and various programming languages have different interfaces representing them to match the results.

Let's see how we can write the regular expression for the email validation.

General-Purpose Email Regular Expression

There is no such regular expression that matches every possible valid email address. Although, we can write a regular expression that can match the most valid email addresses. First, we must define the email address format we are looking for. Below is the most common email format is.

Thus we can note down a pattern of the @ symbol dividing the prefix from the domain segment. A user_name is a valid string containing uppercase and lowercase letters, numbers, and some special characters.

The domain_name is a top-level domain divided by a . (dot) symbol. The domain name also can have uppercase, lowercase letters, and - (hyphen) symbols. A top-level domain can be com, in, edu, co, or at least two characters long.

The regular expression could look as below.

(string1)@(string2).(2+characters)

The above expression accurately matches the below email.

The same regular expression can fail for the below type email addresses.

According to the regular expression, the string shouldn't contain special characters. Additionaly, the top-level domain can't be invalid. We can put these rules down into a concrete expression that takes in a few more cases into account than the first representation.

We cannot use the prefix just before the @ symbol, nor can the prefix start with it, so we made sure that there is at least one alphanumeric character before every special character. A top-level domain can be divided with a dot.

The regex is more complicated than the first one, but it covers all of the rules per our defined regular expression. Yet again, it would fail for some edge cases.

Validate Email Addresses with Python

Python provides the re-module that contains classes and methods to represent and work with Regular expressions in Python. We will use this module in our Python script and re.fullmatch(pattern, string, flags). This method returns a match object only if the whole string matches the pattern. If not matched, return none. Let's understand the following Python code.

Example -

Output:

The given mail is valid
The given mail is valid
The given mail is invalid
The given mail is invalid

Explanation -

In the above code, we use the compile() method, which compiles the regex pattern into a regex object. It is mostly used for efficiency reasons when we try to match the pattern more than once. Then we define a function that uses the re.fullmatch() method, which uses regex and email as arguments, and it returns the match object if the given mail completely matches and prints the message.

Robust Email Regular Expression

Earlier, we have written the regular expression that works well for most cases and works with the reasonable application. However, we can write the more robust regular expression. Long expressions tend to get convolved and hard to read, and this expression is no exception.

Now we write the script for the robust email validation. Let's understand the following example.

Example -

Output:

The given mail is valid
The given mail is valid
The given mail is invalid
The given mail is invalid

Conclusion

This tutorial included many ways to validate emails using Regular Expressions, mostly depending on what certain format we are looking for. In relation to that, there is no unique pattern that works for all email formats; we need to define the rules we want the format to follow and construct a pattern accordingly.

Each new rule reduces the degree of freedom on the accepted addresses.






Latest Courses