Hello everyone, this post is about pandas, a very important Python library for data analysis. Welcome to the Python Pandas Tutorial. In this tutorial you will learn the basics of pandas and dataframes, different ways of creating dataframes, reading and writing CSV and Excel files, and more. So let's get started without wasting any time.
Before going any further, we first need to understand the concept of data analysis, so let's take a quick look at it.
Python Pandas Tutorial – What Is Data Analysis?
Introduction
Data analysis is the process of extracting useful, relevant and meaningful information from observations in a systematic manner.
Data analysis is done for the following purposes –
- Parameter Estimation (inferring the unknowns)
- Model Development and Prediction (Forecasting)
- Feature Extraction (identifying patterns) and classification
- Hypothesis testing (Verification of postulates)
- Fault detection (process monitoring)
Types Of Data For Data Analysis
In data analysis, there are mainly two types of data –
- Deterministic (non-random)
- Stochastic (non-deterministic)
Data Life Cycle
In the above figure, you can see that data is stored in different formats. It can be a CSV file, an Excel file, an HTML file or something else. You then have to convert all of this data into a single format and store it somewhere; this is called Data Warehousing.
Once the data is stored, you can perform analysis on it, such as predictive modeling or joining and merging data sets. After the analysis, you can plot the results in a graph; this stage is called Data Visualization.
So, this is a general overview of the data life cycle.
Why Data Analysis?
Now we will see why data analysis is useful, with an example.
Suppose we have a data set that contains country-wise weather information from across the globe for 2015 – 2018, including the percentage of rain in each country.
Now, what if you want only one particular country's data? Let's say you want the percentage of rain in America between 2016 and 2017. To get it, you need to run an analysis on the data set that returns the percentage of rain in America for 2016 – 2017. This is what we call data analysis.
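The query described above can be sketched with pandas, the library introduced in the rest of this tutorial. This is only an illustration: the numbers are invented placeholders, not real weather data, and the column names (Country, Year, RainPercent) are assumptions.

```python
import pandas as pd

# Hypothetical rain percentages; these numbers are made up for illustration.
weather = pd.DataFrame({
    "Country": ["America", "America", "America", "America", "India", "India"],
    "Year": [2015, 2016, 2017, 2018, 2016, 2017],
    "RainPercent": [30.0, 35.0, 32.0, 28.0, 55.0, 60.0],
})

# Keep only the rows for America between 2016 and 2017 (both years included).
mask = (weather["Country"] == "America") & weather["Year"].between(2016, 2017)
america_rain = weather.loc[mask, ["Year", "RainPercent"]]

print(america_rain)
```

The boolean mask selects the rows, and the list of column names selects which columns to keep.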
So this explains what data analysis is and why we use it.
So far we have discussed data analysis in general; now we will see how to do data analysis in Python. So let's move ahead.
Python Pandas Tutorial – Introduction To Pandas
What Is Pandas?
Pandas is an open source Python library that provides high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
- It is a very popular library for data science.
- It runs on top of NumPy.
- The name pandas is derived from "panel data", an econometrics term for multidimensional data sets.
- Wes McKinney is the developer of pandas; he started developing it in 2008.
- The cool thing about pandas is that it takes data (like a CSV or TSV file, or a SQL database) and creates a Python object with rows and columns, called a dataframe, that looks very similar to a table in statistical software (think Excel or SPSS, for example).
Features Of Pandas
Pandas has the following features –
- High-level data structures (data frames)
- More streamlined handling of tabular data and rich time series functionality.
- Data alignment, missing-data friendly statistics, groupby, merge and join methods.
- You can use pandas data structures, and freely draw on NumPy and SciPy functions to manipulate them.
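A few of the features listed above can be seen in a minimal sketch. The fruit data and variable names below are invented for illustration only.

```python
import numpy as np
import pandas as pd

sales = pd.DataFrame({
    "Fruit": ["Mango", "Apple", "Mango", "Apple"],
    "Quantity": [1.0, 4.0, np.nan, 8.0],  # one value is missing
})

# Missing-data friendly statistics: NaN is skipped by default.
mean_quantity = sales["Quantity"].mean()

# groupby: total quantity per fruit (the missing value is ignored).
totals = sales.groupby("Fruit")["Quantity"].sum()

# NumPy functions can be applied directly to pandas objects.
roots = np.sqrt(sales["Quantity"])

# Data alignment: values are matched by index label, not by position.
a = pd.Series([1, 2], index=["x", "y"])
b = pd.Series([10, 20], index=["y", "z"])
aligned = a + b  # only the shared label "y" gets a number; the rest are NaN

print(mean_quantity)
print(totals)
print(roots)
print(aligned)
```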
Pandas Data Types
Pandas is well suited for many kinds of data such as –
- Tabular data with heterogeneously-typed columns
- Arbitrary matrix data with row and column labels
- Ordered and unordered time series data
- Any other form of observational / statistical data sets
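As a rough illustration of the first and third kinds of data in this list, the sketch below builds a table whose columns have different types and a small series indexed by dates. All values are invented.

```python
import pandas as pd

# Tabular data with heterogeneously typed columns (text, integer, float).
table = pd.DataFrame({
    "Name": ["Mango", "Apple"],
    "Quantity": [1, 4],
    "Price": [2.5, 1.75],
})
print(table.dtypes)

# A small time series: three daily values indexed by date.
dates = pd.date_range("2018-01-01", periods=3, freq="D")
series = pd.Series([10, 12, 9], index=dates)
print(series)
```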
Python Pandas Tutorial – Getting Started With Pandas
In this section, we will learn how to use pandas in Python.
Creating New Project
First of all, open your IDE, create a new project, and create a new Python file inside it. In my case, the project looks like this –
Installing Pandas Module
To work with pandas in Python, you have to install the pandas module.
- Go to the terminal and run the following command.

    pip install pandas

The pandas module is now installed and you can start working with it.
Importing Pandas
Now you have to import the pandas module in your project. So write the following code.

    import pandas as pd

- pd is an alias for pandas; it saves you from typing the full name pandas every time.
- By importing pandas, you can use all the classes and methods of pandas module.
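A quick way to confirm that the installation and the import both work is to print the installed version. This is an optional check, not a required step.

```python
import pandas as pd

# If this prints a version string, pandas is installed and importable.
print(pd.__version__)
```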
Implementing Pandas In Python
You have to do the following things to use pandas –
- First of all, import the pandas module so that you can use all of its classes and methods.
- Then you have to create a dataframe.
- The dataframe is the main object in pandas. It is a data structure used to represent tabular data such as Excel files, CSV files etc.
- There are many ways to create a dataframe, and I will discuss them later.
- Now write the following code snippet.

    import pandas as pd

    # Create a dictionary which contains information about fruits
    fruit_dict = {"Name": ['Mango', 'Apple', 'Banana'],
                  "Quantity": [1, 4, 8]}

    # Create a dataframe
    df = pd.DataFrame(fruit_dict)

    print(df)  # print the dataframe

- Here I created a dictionary which contains some information about fruits.
- Then I created a dataframe from that dictionary.
Now let’s see the result.
- Here you can see in this dataframe, you have columns and rows.
- Name and Quantity are column headers.
- 0, 1, 2 are the default index values assigned to the rows, as generated by range(n), where n is the number of rows (3 here).
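The column headers and the default index can also be inspected directly. The sketch below rebuilds the fruit dataframe from the snippet above and prints its parts.

```python
import pandas as pd

fruit_dict = {"Name": ["Mango", "Apple", "Banana"], "Quantity": [1, 4, 8]}
df = pd.DataFrame(fruit_dict)

print(list(df.columns))  # the column headers
print(list(df.index))    # the default row labels
print(df.shape)          # (number of rows, number of columns)
```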
So now you have learnt how to work with pandas. Next we will discuss the different ways of creating dataframes. So let's start.
Python Pandas Tutorial – Different Ways Of Creating Dataframes
We can create pandas dataframes in the following ways, so let's discuss them one by one.
- From Python dictionaries
- From list of tuples
- From list of dictionaries
- Using CSV files
- Using Excel files
From Python Dictionaries
Creating dataframes from Python dictionaries is very easy.
- First of all, you need to create a dictionary.
- Then pass this dictionary as an argument to the DataFrame() constructor.
- Then simply print the dataframe.
So write the following code snippet to create a dataframe from a Python dictionary.

    import pandas as pd

    # Create a dictionary which contains book details
    book_dict = {'Name': ['Python for Beginners', 'Aab-e-Hayat', 'Super Economies', 'True Colors'],
                 'Author': ['Hilary', 'Umera', 'Raghav Bahal', 'Adam Gilchrist'],
                 'Price': [900, 600, 400, 500],
                 'Quantity': [2, 8, 90, 70]}

    print("\n\t\tDataframe From Python Dictionaries\n")

    # Create dataframe from python dictionary
    df = pd.DataFrame(book_dict)

    print(df)

- If you pass an index, its length must be equal to the length of the arrays.
- If you don't pass an index, the default index will be range(n), where n is the array length.
- Here I have not passed any index.
Result
- Here you can see that 0, 1, 2, 3 are the default index values, assigned to the rows as generated by range(n).
If you want to create an indexed dataframe, pass the index parameter while creating the dataframe. So write the following code to do this.

    print("\n\t\tIndexed Dataframe From Python Dictionaries\n")

    # Reuses pd and book_dict from the previous snippet
    df = pd.DataFrame(book_dict, index=['Sr.1', 'Sr.2', 'Sr.3', 'Sr.4'])

    print(df)
Result
And now you can see that the index parameter assigns an index to each row.
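Custom index labels can be used to look up rows, and an index of the wrong length is rejected. The sketch below uses a shortened, two-book version of the dictionary to show both points.

```python
import pandas as pd

# Shortened version of the book dictionary, for illustration.
book_dict = {"Name": ["Python for Beginners", "Aab-e-Hayat"],
             "Price": [900, 600]}

df = pd.DataFrame(book_dict, index=["Sr.1", "Sr.2"])

# Look up a row by its index label.
second_row = df.loc["Sr.2"]
print(second_row)

# An index whose length differs from the arrays raises a ValueError.
try:
    pd.DataFrame(book_dict, index=["Sr.1"])
    mismatch_raised = False
except ValueError as error:
    mismatch_raised = True
    print("ValueError:", error)
```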
From List Of Tuples
To create a pandas dataframe from a list of tuples, you need to do the following –
- Create a list in which each element is a tuple.
- Each tuple is one row of your dataframe.
- Then pass the columns parameter to the DataFrame() constructor to specify the column names.
- So write the following code to implement it.

    import pandas as pd

    # List of tuples means each element in the list is a tuple
    book_data = [('Python for Beginners', 'Hilary', 900, 2),
                 ('Aab-e-Hayat', 'Umera', 600, 8),
                 ('Super Economies', 'Raghav Bahal', 400, 90),
                 ('True Colors', 'Adam Gilchrist', 500, 70)]

    print("\n\t\tDataframe From List of Tuples\n")

    # Dataframe from list of tuples
    df = pd.DataFrame(book_data, columns=['Name', 'Author', 'Price', 'Quantity'])

    print(df)
Result
Once again a dataframe was created successfully; this is the second way of creating pandas dataframes. Let's move forward and look at the other ways.
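To see why the columns parameter matters for a list of tuples, here is a small sketch with two of the books. The variable names unnamed and default_labels are chosen for this example only.

```python
import pandas as pd

book_data = [("Python for Beginners", "Hilary", 900, 2),
             ("Aab-e-Hayat", "Umera", 600, 8)]

# Without the columns parameter, pandas labels the columns 0, 1, 2, ...
unnamed = pd.DataFrame(book_data)
default_labels = list(unnamed.columns)
print(default_labels)

# Column names can also be assigned after the dataframe is created.
unnamed.columns = ["Name", "Author", "Price", "Quantity"]
print(unnamed)
```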
From List Of Dictionaries
You can also create a pandas dataframe from a list of dictionaries.
The difference between creating a dataframe from a dictionary and from a list of dictionaries is that –
- When creating a dataframe from a dictionary, each key is a column name and its list holds that column's values. When creating a dataframe from a list of dictionaries, each element of the list represents one row, together with its column names.
- You can see that each record in the list is made of column – value pairs.

    import pandas as pd

    # List of dictionaries
    book_data = [{'Name': 'Python for Beginners', 'Author': 'Hilary', 'Price': 900, 'Quantity': 2},
                 {'Name': 'Aab-e-Hayat', 'Author': 'Umera', 'Price': 600, 'Quantity': 8},
                 {'Name': 'Super Economies', 'Author': 'Raghav Bahal', 'Price': 400, 'Quantity': 90},
                 {'Name': 'True Colors', 'Author': 'Adam Gilchrist', 'Price': 500, 'Quantity': 70}]

    print("\n\t\tDataframe From List of Dictionaries\n")
    print('---------------------------------------------------------')

    # Pass book_data as an argument
    df = pd.DataFrame(book_data)

    print(df)

- In this example, I have created a list of dictionaries that contains book data.
- Then I passed this list as an argument to the DataFrame() constructor.
Result
So the result of creating dataframe from list of dictionaries is here.
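Because each dictionary carries its own column names, the records do not all need the same keys. In the sketch below, where the first record deliberately has no Quantity, pandas fills the gap with NaN.

```python
import pandas as pd

# The first record has no "Quantity" key (left out on purpose).
book_data = [
    {"Name": "Python for Beginners", "Price": 900},
    {"Name": "Aab-e-Hayat", "Price": 600, "Quantity": 8},
]

df = pd.DataFrame(book_data)
print(df)
```

Note that the Quantity column becomes a float column, because NaN is a floating-point value.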
Using CSV Files
You can also create dataframes from CSV files. Let’s discuss how to do that.
Write the following code snippets for creating dataframes using CSV file.

    import pandas as pd

    # Reading CSV file
    books_data = pd.read_csv("Books.csv")

    print("\n\t\tDataframe Using CSV Files \n")
    print('---------------------------------------------------------')

    # read_csv() already returns a dataframe, so this step is optional
    df = pd.DataFrame(books_data)

    print(df)

- To create a dataframe from a CSV file, you first have to read the CSV file; for more details check Python CSV Reader Tutorial – Reading CSV Files with Python.
- read_csv() is a function that reads the CSV file into a dataframe.
- If your CSV file is not in the same folder as your program file, you have to provide the proper path to the CSV file; otherwise just pass the file name as the argument to read_csv().
- Now run the code and see the result.
Result
So this is our dataframe that is created by using CSV file.
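If you do not have a Books.csv file at hand, a self-contained sketch like the following writes a small CSV file to a temporary folder with to_csv() and reads it back with read_csv(). The two-book data set and the temporary path are assumptions made for this example.

```python
import os
import tempfile

import pandas as pd

books = pd.DataFrame({"Name": ["Python for Beginners", "Aab-e-Hayat"],
                      "Price": [900, 600]})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "Books.csv")

    # index=False keeps the row labels out of the file.
    books.to_csv(csv_path, index=False)

    # read_csv() returns a dataframe directly.
    loaded = pd.read_csv(csv_path)

print(type(loaded).__name__)
print(loaded)
```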
Using Excel Files
Creating dataframes using excel files is pretty much similar to using csv files.
So write the following code.

    import pandas as pd

    # Reading Excel file (reading .xls files needs the xlrd package)
    books_data = pd.read_excel("Books.xls", sheet_name="Sheet1")

    print("\n\t\tDataframe Using Excel Files \n")
    print('---------------------------------------------------------')

    # read_excel() already returns a dataframe, so this step is optional
    df = pd.DataFrame(books_data)

    print(df)

- The read_excel() function is used to read an Excel file into a dataframe.
- Here you have to pass one extra argument, the sheet name, because an Excel file can contain several sheets.
- Now run the code and check the result.
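The same round trip can be sketched for Excel files. This example assumes the optional openpyxl package, which pandas uses for .xlsx files, and skips the file step when it is not installed. The data and the temporary path are again made up for illustration.

```python
import importlib.util
import os
import tempfile

import pandas as pd

books = pd.DataFrame({"Name": ["Python for Beginners", "Aab-e-Hayat"],
                      "Price": [900, 600]})

if importlib.util.find_spec("openpyxl") is None:
    # Without openpyxl, pandas cannot write or read .xlsx files.
    loaded = None
    print("openpyxl is not installed; skipping the Excel round trip")
else:
    with tempfile.TemporaryDirectory() as tmp_dir:
        excel_path = os.path.join(tmp_dir, "Books.xlsx")

        # Write the dataframe to a named sheet, then read that sheet back.
        books.to_excel(excel_path, sheet_name="Sheet1", index=False)
        loaded = pd.read_excel(excel_path, sheet_name="Sheet1")

    print(loaded)
```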
These are the different ways of creating a pandas dataframe; some other ways also exist. If you want to explore them, follow this documentation.
We have now completed the Python pandas tutorial and learned a lot along the way.
So here I am wrapping up the Python Pandas Tutorial. In the next tutorial, you will learn about pandas operations, that is, the kinds of operations you can perform with pandas. Till then, stay tuned to Simplified Python. If you have any doubts about this tutorial, just leave a comment. Happy Coding 🙂