If you are looking for a tutorial to help you learn XML parsing in Python with the ElementTree module, you have landed in the right place. In this XML Parsing In Python tutorial, you will learn how to parse an XML file in Python. Keep reading to the end to get a solid understanding of XML parsing.
XML Parsing In Python – What Is XML And XML Parsing
In this section, you will get a short overview of XML and XML parsing so that you can follow the rest of this tutorial. So let's get started.
What Is XML?
- XML is short for Extensible Markup Language.
- It is similar to HTML (HyperText Markup Language), but the big difference is that HTML describes how data is presented, whereas XML describes what the data is.
- It is designed to store and transport data.
- XML can easily be read by both humans and machines.
What Does XML Look Like?
Now you will see what XML actually looks like. This is the basic structure of an XML file.
```xml
<root>
  <child attributes="...">
    <subchild>......</subchild>
  </child>
</root>
```
An XML tree starts at a root element and branches from the root to child elements. Every element can have sub-elements (child elements) of its own.
Now let's take an example (products.xml):
```xml
<?xml version="1.0" encoding="UTF-8"?>
<productinfo>

  <Software>
    <item name="ABC">PDF Converter</item>
    <price>1000</price>
    <version>1.2</version>
  </Software>

  <Hardware>
    <item name="XYZ">Battery</item>
    <price>2000</price>
    <warranty>1 Year</warranty>
  </Hardware>

</productinfo>
```
What Is XML Parsing?
Parsing means reading the information in a file and splitting it into pieces by identifying the individual parts. For an XML file, those parts are its elements, attributes and text.
XML Parsing In Python – Parsing XML Documents
In this section, we will see how to read an XML file. To parse XML data in Python, we need a parsing module, so let's look at the one used in this tutorial.
ElementTree Module
The ElementTree module represents XML data as a tree, which is the most natural representation of hierarchical data. Its Element data type stores this hierarchical structure in memory.
ElementTree is a class that wraps an element structure and converts it from and to XML. Each Element in the tree has the following properties:
- Tag: a string that identifies the type of data being stored.
- Attributes: a number of attributes, stored in a Python dictionary.
- Text string: a text string holding the information to be displayed.
- Tail string: an optional tail string.
- Child elements: a number of child elements, stored in a Python sequence.
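To make these properties concrete, here is a small sketch. The one-line XML string is a made-up example (it is not products.xml), chosen so that every property has a value, including the tail string that follows a closing tag.

```python
import xml.etree.ElementTree as et

# Hypothetical one-line document, chosen so that every property is non-empty.
root = et.fromstring('<child id="1">hello<subchild>inner</subchild>tail text</child>')
sub = root[0]            # child elements behave like a Python sequence

print(root.tag)          # tag of the element
print(root.attrib)       # attributes as a dictionary
print(root.text)         # text before the first child element
print(len(root))         # number of child elements
print(sub.text)          # text inside <subchild>
print(sub.tail)          # text that follows </subchild>
```

Note that the tail belongs to the sub-element, not to its parent: it is the text that comes right after the sub-element's closing tag.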
Parsing XML
We can parse XML documents in two ways:
Using parse() method
This method parses XML that is stored in a file. Write the following code to use it.
```python
import xml.etree.ElementTree as et

# Parse the XML file
my_tree = et.parse('products.xml')

# Get the root element
my_root = my_tree.getroot()

print(my_root)
```
- The first thing you need to do is import xml.etree.ElementTree. Here I have given it the alias et.
- After this, create a variable (my_tree) and call the parse() method. parse() takes one parameter, the XML file you want to parse.
- Then fetch the root element. The getroot() method returns the root element of the parsed XML file.
- Finally, print my_root. Now run the code to see the output.
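Output
The root element is printed as an Element object, in the form <Element 'productinfo' at 0x...>, where the memory address after "at" differs from run to run.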
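If you do not have products.xml on disk yet, the following sketch can be run as it is. It assumes a shortened copy of the sample file and writes it to a temporary directory first, which is only a convenience for this example and not part of the parsing itself. It also shows that parse() returns an ElementTree object, while getroot() returns an Element.

```python
import os
import tempfile
import xml.etree.ElementTree as et

# Shortened copy of products.xml, used only for this illustration.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<productinfo>
  <Software>
    <item name="ABC">PDF Converter</item>
    <price>1000</price>
  </Software>
</productinfo>
"""

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "products.xml")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(SAMPLE)

    my_tree = et.parse(path)       # an ElementTree wrapping the whole document
    my_root = my_tree.getroot()    # the root Element of that tree

print(type(my_tree).__name__)
print(type(my_root).__name__)
print(my_root.tag)
```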
Using fromstring()
- It parses XML supplied as a string parameter.
Let's see how it works. First, write the following code.
```python
import xml.etree.ElementTree as et

# Define the XML string
xml_data = '''<?xml version="1.0" encoding="UTF-8"?>
<productinfo>

    <product title="Software">
        <name>PDF Converter</name>
        <price>1000</price>
        <version>1.2</version>
    </product>

    <product title="Hardware">
        <name>Battery</name>
        <price>2000</price>
        <warranty>1 Year</warranty>
    </product>

</productinfo>'''

# Parse the XML string
my_root = et.fromstring(xml_data)

# Print the tag of the root element
print(my_root.tag)
```
- Define a variable that stores the XML string you want to parse with the fromstring() method. Put the XML string inside triple quotes.
- Then call the fromstring() method and pass xml_data as a parameter. It returns the root element directly, so there is no need to call getroot().
- Finally, print the tag of the root element and run the code.
Output
The script prints the tag of the root element: productinfo
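fromstring() can only parse well-formed XML. As a hedged sketch, the helper below (parse_xml_string is a name made up for this example) wraps the call and catches the ParseError that ElementTree raises when the string is not well-formed, such as when a tag is never closed.

```python
import xml.etree.ElementTree as et


def parse_xml_string(xml_text):
    """Return the root Element, or None if the text is not well-formed XML."""
    try:
        return et.fromstring(xml_text)
    except et.ParseError as error:
        print("Could not parse XML:", error)
        return None


good_root = parse_xml_string("<productinfo><product title='Software'/></productinfo>")
# In the next string <product> is never closed, so parsing fails.
bad_root = parse_xml_string("<productinfo><product></productinfo>")

print(good_root.tag)
print(good_root[0].attrib)
print(bad_root)
```

Returning None is just one possible design; in a larger program you may prefer to let the exception propagate.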
XML Parsing In Python – Finding Elements From XML Documents
In this section, you will see how to find the elements you are interested in.
You can look up elements and sub-elements through their tag, attrib and text properties, depending on what you need.
Finding Tag Element
To find the tags of elements in XML data, write the following code.
```python
import xml.etree.ElementTree as et

my_tree = et.parse('products.xml')
my_root = my_tree.getroot()

print("Root Element : ", my_root.tag)

# Tag of the first child of the root element
print("First Child Of Root Element : ", my_root[0].tag)

# Print the tags of all sub-elements of the first child
print("\nAll Tags : ")
for a in my_root[0]:
    print(a.tag)
```
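Output
With products.xml from above, the root tag is productinfo, the first child of the root is Software, and the tags inside it are item, price and version.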
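The loop above only visits the sub-elements of the first child. If you want every tag in the document, one option is the iter() method, sketched below on a shortened inline copy of the sample data (the string is an assumption made so the example runs without a file).

```python
import xml.etree.ElementTree as et

# Shortened inline copy of the sample data, used only for this illustration.
my_root = et.fromstring(
    "<productinfo>"
    "<Software><item>PDF Converter</item><price>1000</price></Software>"
    "<Hardware><item>Battery</item><price>2000</price></Hardware>"
    "</productinfo>"
)

# iter() walks the root and all of its descendants in document order
all_tags = [element.tag for element in my_root.iter()]
print(all_tags)
```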
Finding Attribute Element
To find the attributes of elements in XML data, write the following code.
```python
import xml.etree.ElementTree as et

my_tree = et.parse('products.xml')
my_root = my_tree.getroot()

# Print the attributes of all sub-elements of the first child
print("\nAll Attributes : ")
for a in my_root[0]:
    print(a.attrib)
```
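Now run the code and see the output. With products.xml from above, it prints {'name': 'ABC'} for item and an empty dictionary {} for price and version, because those two elements have no attributes.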
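If you only need a single attribute rather than the whole dictionary, the get() method of an element is a handy alternative. The sketch below uses a made-up fragment modelled on the first child of products.xml; the default text passed to get() is just an illustration.

```python
import xml.etree.ElementTree as et

# Hypothetical fragment modelled on the first child of products.xml.
software = et.fromstring(
    '<Software><item name="ABC">PDF Converter</item><price>1000</price></Software>'
)

for child in software:
    # get() returns the attribute value, or the given default when it is missing
    print(child.tag, child.get("name", "no name attribute"))
```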
Finding Text Element
Now, if you want to find the text of elements in XML data, write the following code.
```python
import xml.etree.ElementTree as et

my_tree = et.parse('products.xml')
my_root = my_tree.getroot()

# Print the text of every sub-element of the first and second child
print("\nAll Texts Of First Sub-Element : ")
for a in my_root[0]:
    print(a.text)

print("\nAll Texts Of Second Sub-Element : ")
for a in my_root[1]:
    print(a.text)
```
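Output
With products.xml from above, the first sub-element gives PDF Converter, 1000 and 1.2, and the second gives Battery, 2000 and 1 Year.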
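Indexing with my_root[0] works well when you know the position of an element. As an alternative that this tutorial does not otherwise cover, ElementTree also lets you look elements up by name with findall(), findtext() and get(). The sketch below reuses a shortened version of the XML string from the fromstring() example and collects each product into a dictionary; converting the price to int is an assumption about how you might want to use the data.

```python
import xml.etree.ElementTree as et

xml_data = """<productinfo>
  <product title="Software">
    <name>PDF Converter</name>
    <price>1000</price>
  </product>
  <product title="Hardware">
    <name>Battery</name>
    <price>2000</price>
  </product>
</productinfo>"""

my_root = et.fromstring(xml_data)

products = []
for product in my_root.findall("product"):       # direct children named <product>
    products.append({
        "title": product.get("title"),            # attribute value
        "name": product.findtext("name"),         # text of the first <name> child
        "price": int(product.findtext("price")),  # text converted to a number
    })

for entry in products:
    print(entry)
```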
So guys, that was all about XML Parsing In Python. I hope you found it helpful and learned a lot about XML parsing. If you have any questions about it, feel free to leave a comment.