Today we will work through a Python Regular Expression Example. In this tutorial we will learn what a regular expression is, why and where we use it, and which operations we can perform with it. So let’s move ahead.
So far, the sets of strings accepted by a language have been described in plain English. Regular expressions give us a more precise way to describe them, so the next step is to grasp the concept of a regular expression.
What Are Regular Expressions?
Overview
- A regular expression is a special text string that describes a search pattern.
- Regular expressions represent certain sets of strings in an algebraic fashion.
- Regular expression is commonly shortened to RegEx.
- Most widely used programming languages, such as Python, Ruby, Java, JavaScript and PHP, support RegEx.
- They are very powerful: a short pattern can describe a large set of strings.
Why We Use Regular Expressions?
You may be wondering why we use regular expressions at all. Here are some examples:
- Finding date and time in log files – Suppose you want to extract only the date and time from a log file, and the log format is not easy for everyone to read. You can use a regular expression to describe the date and time pattern and pull those values out of the file.
- Verifying email addresses – Imagine you work in HR and have a long list of email addresses, many of which are badly formed. Every email address follows a particular format, so with RegEx you can check that format and tell which addresses have a valid shape and which do not.
Apart from these two examples, we can use regular expressions for replacing a particular string, verifying phone numbers, finding the country a phone number belongs to, and so on.
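The two use cases above can be sketched as follows. This is only an illustration: the log lines, the `YYYY-MM-DD HH:MM:SS` timestamp layout and the simplified email pattern are assumptions made for this example, not a format taken from any real system. The email pattern checks the shape of an address only; it cannot tell whether the mailbox actually exists.

```python
import re

# Hypothetical log lines; the "YYYY-MM-DD HH:MM:SS" layout is an assumption.
log_lines = [
    "2021-03-14 09:26:53 INFO service started",
    "2021-03-14 09:27:10 ERROR disk quota exceeded",
    "no timestamp on this line",
]

# Four digits, dash, two digits, dash, two digits, a space, then HH:MM:SS.
timestamp_pattern = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")

timestamps = []
for line in log_lines:
    found = timestamp_pattern.search(line)
    if found:
        timestamps.append(found.group())

print(timestamps)

# Deliberately simplified email shape: something@something.something
email_pattern = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

emails = ["alice@example.com", "bob@example", "not an email"]
well_formed = [e for e in emails if email_pattern.fullmatch(e)]

print(well_formed)
```

Real-world email validation is considerably more involved, so treat the second pattern as a first filter rather than a complete check.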
Rules To Remember for RegEx
There are certain rules to remember about regular expressions. They are listed below:
- Every terminal symbol, i.e. every symbol of the alphabet Σ, is a regular expression, and so are ^ (the empty string) and Ø (the empty set).
- The union of two regular expressions is also a regular expression.
- The concatenation of two regular expressions is also a regular expression.
- The iteration (closure) of a regular expression is also a regular expression.
- The regular expressions over Σ are precisely those obtained recursively by applying the above rules once or several times.
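As a rough illustration of the rules above, here is how union, concatenation and closure look in Python's own pattern syntax. The alphabet {a, b, c} and the variable names are chosen just for this sketch. Note that inside a Python pattern the `^` character has a different meaning (it anchors a match or negates a character class), so it is not used for the empty string here.

```python
import re

# Union: "a|b" matches either "a" or "b".
union = re.compile(r"a|b")

# Concatenation: "ab" matches "a" followed by "b".
concatenation = re.compile(r"ab")

# Closure (iteration): "a*" matches zero or more repetitions of "a".
closure = re.compile(r"a*")

print(bool(union.fullmatch("a")), bool(union.fullmatch("b")))
print(bool(concatenation.fullmatch("ab")))
print(bool(closure.fullmatch("")), bool(closure.fullmatch("aaa")))

# Applying the rules several times: any mix of a's and b's, then one c.
combined = re.compile(r"(a|b)*c")
print(bool(combined.fullmatch("abbac")))  # True
print(bool(combined.fullmatch("abca")))   # False
```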
Python Regular Expression Example
So far we have looked at regular expressions in general. Now we will discuss regular expressions in Python. So let’s start.
Regular expressions are very popular among programmers and are available in many programming languages such as Java, JavaScript, PHP, C++ and Ruby. For now we will look at them in Python.
Python provides a module named re for working with regular expressions. For further details, refer to the documentation of the re module.
Importing re module
The syntax for importing the re module is:
```python
import re
```
This way we can import the re module.
Now we will walk through an example of how RegEx works.
How RegEx Works?
To understand how RegEx works, we will take an example. Let’s consider a string:
```python
fruit = '''
Guavas are 20 and Apples are 50
Grapes are 60 and Mangoes are 50
'''
```
From this string we want to extract only the useful data, that is, each fruit’s name and its quantity. How do we do that?
First we identify the patterns, then we use regular expressions to collect the matches and build a dictionary that contains only each fruit’s name and its quantity. The code is as below:
```python
import re

# String
fruit = '''
Guavas are 20 and Apples are 50
Grapes are 60 and Mangoes are 50
'''

# Search patterns: 1 to 3 digits, and a capitalized word
fruit_quantity = re.findall(r'\d{1,3}', fruit)
fruit_name = re.findall(r'[A-Z][a-z]*', fruit)

# Pair each name with the quantity found at the same position
fruitDict = {}
a = 0
for each_fruit_name in fruit_name:
    fruitDict[each_fruit_name] = fruit_quantity[a]
    a += 1
print(fruitDict)
```
What We Did?
- We defined a string that contains some data.
- Then we defined the search patterns. Notice that every name starts with an uppercase letter and every quantity is an integer.
- The pattern \d{1,3} finds numbers of one to three digits, and [A-Z][a-z]* finds fruit names that start with an uppercase letter followed by lowercase letters.
- Then we declared a dictionary and initialized a variable a = 0.
- Then we started a for loop that stores each name in the dictionary together with the quantity found at the same position.
- Finally we print the dictionary.
The output of the above code is {'Guavas': '20', 'Apples': '50', 'Grapes': '60', 'Mangoes': '50'}. Note that the quantities are strings, because findall() returns the matched text.
As you can see, only the names and quantities have been collected. If the syntax is still unclear, don’t worry: the details are explained in the next section. This was just an example, and now we will see the various operations that can be performed with regular expressions.
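One caveat about the fruit example above: it relies on the names and the quantities appearing in the same order and in equal numbers. A possible alternative, shown here only as a sketch, is to capture each name together with its quantity in a single pattern. It assumes every entry has the form "Name are N", as in the sample string; the names `pairs` and `fruit_dict` are introduced for this illustration.

```python
import re

fruit = '''
Guavas are 20 and Apples are 50
Grapes are 60 and Mangoes are 50
'''

# With two capture groups, findall() returns one (name, quantity) tuple per match.
pairs = re.findall(r"([A-Z][a-z]*) are (\d{1,3})", fruit)

# Convert the quantities to integers while building the dictionary.
fruit_dict = {name: int(quantity) for name, quantity in pairs}

print(fruit_dict)
```

Because each name is matched next to its own number, a missing quantity simply drops that entry instead of shifting every later pair.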
Regular Expression Operations
We can perform various operations with RegEx. They are listed below:
- Generate an iterator
- Find a word in a string
- Match a single character
- Replace string
- Match series of range of characters
- Match words with a particular pattern
Now we will discuss them one by one. So let’s begin.
Generate an Iterator
Now we will see how to generate an iterator over the matches in a string, and how to get the starting and ending index of each match. Let’s take a string:
```python
text = "we are learning regular expression in python."
```
Here we search for the word regular, and the iterator gives the starting and ending index of every match it finds. The complete code is:
```python
import re

# Named "text" rather than "str" so the built-in str type is not shadowed
text = "we are learning regular expression in python."

# Generating iterator
for i in re.finditer("regular", text):
    locTuple = i.span()
    print(locTuple)
```
What We Did?
- First of all, import the re module.
- Then define a string.
- The finditer() method generates an iterator over match objects. We pass it the word to search for and the string to search in.
- To get the position as a tuple, we call the span() method on each match object and store the result in a variable. This method returns a 2-tuple: the starting index and the ending index of the match.
- Finally, print the tuple.
The output for this is (16, 23).
As you can see, the starting index is 16 and the ending index is 23. This is how to generate an iterator using finditer().
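The example above has only one match. To make the iterator behaviour more visible, here is a small sketch with a sentence (made up for this illustration) that contains the word twice. It also uses the match object's `group()`, `start()` and `end()` methods, which give the matched text and the same two indices that `span()` returns.

```python
import re

text = "regular expressions are regular tools"

# Each item produced by finditer() is a match object.
for match in re.finditer(r"regular", text):
    print(match.group(), match.start(), match.end())

# The same positions collected into a list of (start, end) tuples.
positions = [m.span() for m in re.finditer(r"regular", text)]
print(positions)
```

Slicing the string with a returned span, as in `text[start:end]`, gives back the matched word.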
Find a Word in a String
Now we will see how to find a particular word in a string. The code is:
```python
import re

# findall() returns a list with every match of the pattern in the string
matches = re.findall("regular", "we are learning regular expression in python, regular expression is basically a special text string for describing the search pattern. ")

for i in matches:
    print(i)
```
What We Did?
- We want to find the word regular in the given string.
- The findall() method returns all the matches from the string as a list.
- We pass the word to search for and the string to search in as parameters to findall().
- Then we start a for loop to print every match.
The output of the above code is the word regular printed on two lines.
You can see regular is printed twice because it occurs two times in the given string.
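One thing to keep in mind is that a plain pattern such as "regular" also matches inside longer words. The following sketch, with a sentence invented for this illustration, compares it with the `\b` word-boundary marker, which restricts the match to the standalone word.

```python
import re

text = "An irregular verb does not follow the regular pattern."

# Plain pattern: also matches the "regular" inside "irregular".
plain = re.findall(r"regular", text)

# \b marks a word boundary, so only the standalone word matches.
whole_words = re.findall(r"\bregular\b", text)

print(len(plain))        # 2
print(len(whole_words))  # 1
```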
Match a Single Character
Now we will match digit characters in a string. For this, write the following code.
```python
import re

string = "12345"

# Raw string (r"...") so that \d reaches the regex engine unchanged
print("Matches:", len(re.findall(r"\d{4}", string)))
```
What We Did?
- First of all, define a string.
- Then print the number of matches.
- \d matches any single digit.
- The number inside the curly braces {} is a repetition count, not a character to match: \d{4} means exactly four digits in a row.
- For example, in "12345" the pattern \d{4} matches 1234. The remaining 5 is too short to form a second match.
So the output will be Matches: 1.
As you can see, one match is found.
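To see the difference between matching one character and repeating it, the sketch below runs a few related patterns on the same string. The variable names are introduced only for this illustration.

```python
import re

string = "12345"

# \d on its own matches one digit at a time.
single_digits = re.findall(r"\d", string)

# {2} and {4} are repetition counts: two digits, or four digits, in a row.
digit_pairs = re.findall(r"\d{2}", string)
four_digits = re.findall(r"\d{4}", string)

# To match the specific digit 4, write the character itself.
only_four = re.findall(r"4", string)

print(single_digits)  # ['1', '2', '3', '4', '5']
print(digit_pairs)    # ['12', '34']
print(four_digits)    # ['1234']
print(only_four)      # ['4']
```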
Replace String
Now we will learn how to replace a particular string. Let’s take a string like “Who are you” and suppose we want to replace the word Who with How. The code is:
```python
import re

text = "Who are you"

# Compile the pattern once; "[W]ho" is equivalent to "Who"
reg = re.compile("[W]ho")
text = reg.sub("How", text)

print(text)
```
What We Did?
- First of all, define a string.
- Then compile the pattern. The re.compile() function compiles a pattern into a pattern object.
- The pattern object provides the sub() method to replace matched text.
- We pass two arguments to sub(): the replacement string and the string in which to substitute.
- Then simply print the text.
On running this code we can clearly see that the string has changed from “Who are you” to “How are you”.
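For a one-off replacement, the module-level `re.sub()` function can be called directly without compiling first. The sketch below, using a sentence made up for this illustration, also shows the optional `count` argument, which limits how many matches are replaced.

```python
import re

text = "Who are you? Who is there?"

# By default every match is replaced.
replace_all = re.sub(r"Who", "How", text)

# count=1 replaces only the first match.
replace_first = re.sub(r"Who", "How", text, count=1)

print(replace_all)    # How are you? How is there?
print(replace_first)  # How are you? Who is there?
```

Compiling the pattern, as in the example above, mainly pays off when the same pattern is reused many times.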
Match series of range of characters
What does this mean? Let’s understand it with an example. Consider the string “sad, mad, bad, dad”. We want to print all the words whose first letter is in the range b-m and which end with ad. For that we write the following code:
```python
import re

text = "sad, mad, bad, dad"

# [b-m] matches one character between b and m, followed by the literal "ad"
allString = re.findall("[b-m]ad", text)

for i in allString:
    print(i)
```
What We Did?
- First of all we defined a string.
- Then we defined a pattern. [b-m]ad means the first letter must be in the range b-m and must be followed by ad.
- Then we created a for loop to go through all the matches and print them.
On running the above code we get mad, bad and dad, each on its own line. The word sad is not printed because s lies outside the range b-m.
If instead we want the words whose first letter is outside the range b-m, we put the ^ symbol at the start of the character class, as follows:
```python
allString = re.findall("[^b-m]ad", text)
```
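Putting the two variants side by side gives a small runnable comparison. The variable names are introduced only for this sketch.

```python
import re

text = "sad, mad, bad, dad"

# First letter inside the range b-m.
inside_range = re.findall(r"[b-m]ad", text)

# ^ at the start of the class negates it: first character NOT in b-m.
outside_range = re.findall(r"[^b-m]ad", text)

print(inside_range)   # ['mad', 'bad', 'dad']
print(outside_range)  # ['sad']
```

Note that a negated class accepts any character outside the range, not only letters, so in other strings it can also match a space or a digit followed by ad.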
Match Words With a Particular Pattern
Now we will see how to match words with a particular pattern. Consider the string “sad, mad, bad, dad” and observe the pattern. You will notice that ad is common to all the words. To match them we write the following code:
```python
import re

text = "sad, mad, bad, dad"

# [smbd] matches exactly one of the letters s, m, b or d
allString = re.findall("[smbd]ad", text)

for i in allString:
    print(i)
```
What We Did?
- Define a string.
- We want to match the words that end with ad. For that, define a variable and call the findall() method, passing it the pattern.
- The pattern is [smbd]ad. [smbd] means the first letter must be s, m, b or d, and ad means it must be followed by ad.
- Start a for loop to go through the matches and print them.
On running this code the output is sad, mad, bad and dad, each on its own line.
With that, all the operations listed above have been covered.
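Listing the allowed first letters works when they are known in advance. If the goal is any word ending in ad, one possible approach is to combine a repeated character class with word boundaries. This is a sketch only: the extended sample string and the restriction to lowercase letters are assumptions made for this illustration.

```python
import re

text = "sad, mad, bad, dad, glad, add"

# \b       word boundary at the start of the word
# [a-z]*   any number of lowercase letters
# ad\b     the letters "ad" right at the end of the word
words_ending_in_ad = re.findall(r"\b[a-z]*ad\b", text)

print(words_ending_in_ad)  # ['sad', 'mad', 'bad', 'dad', 'glad']
```

The word add is skipped because its ad is followed by another letter rather than a word boundary.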
So guys, this was all about the Python Regular Expression Example. I hope you have learned a lot about RegEx in Python. If you have any query, just put it in a comment, and please share this tutorial with your friends and other Python learners. In the upcoming tutorial we will learn various RegEx applications. Till then, stay tuned with Simplified Python. Thanks.
Hi,
I am beginner of Python hence I don’t have much idea about syntax.
So. Request you to please explain re.findall syntax in detail if possible.
Thanks for sharing much useful information.
Regards:
Nirav Thanki
If you follow the post till the end, you will understand the whole thing.