OVERVIEW: In this post we will learn the basics of Pandas by working through five stages, abbreviated here as CGPAP. CGPAP is not a conventional term; it is just shorthand for the following stages.

Create Data – We will create our own data set, so there is nothing to download and no external dependency.

Get Data – After creating the data, we will learn how to read it back from the file.

Prepare Data – Here we will check the data for anomalies and clean it by removing inconsistent, missing, or disordered data.

Analyze Data – This is the most important section. Here we will analyze the data to extract the information we need.

Present Data – In this section we will learn how to present the extracted information.

MISSION – Along the way we will create a data set of names and the number of births recorded for each name in the 1880s, and then extract the name with the highest number of births.

Step 1: Import the libraries and functions needed for this post.

The Pandas library is used for the data analysis part and the matplotlib library for the presentation section.

(If your IPython notebook is not configured for matplotlib, try starting it with 'ipython notebook --matplotlib=inline', without the quotes.)
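A minimal sketch of the imports this step refers to. The aliases pd and plt are common conventions rather than anything this post requires.

```python
# Pandas handles the data analysis; matplotlib handles the plotting.
import pandas as pd
import matplotlib.pyplot as plt

# Printing the version is a quick way to confirm the import worked.
print("pandas version:", pd.__version__)
```

In current Jupyter notebooks, running the %matplotlib inline magic in a cell serves the same purpose as the command-line flag mentioned above.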


Step 2 : Create Data

The data set consists of five baby names and the number of births recorded for each name in that period (the 1880s).
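One way this step could look is sketched below. The five names, the birth counts, the column labels "Names" and "Births", and the file name births1880.csv are all illustrative assumptions, not values taken from a real data set.

```python
import pandas as pd

# Illustrative values only: these names and counts are made up for the sketch.
names = ["Bob", "Jessica", "Mary", "John", "Mel"]
births = [968, 155, 77, 578, 973]

# Pair each name with its birth count and build a two-column DataFrame.
baby_data = list(zip(names, births))
df = pd.DataFrame(baby_data, columns=["Names", "Births"])

# Save the data to a CSV file ("births1880.csv" is a hypothetical file name).
# index=False keeps the row numbers out of the file.
df.to_csv("births1880.csv", index=False)

print(df)
```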


Step 3 : Get Data

To read the file we will use Pandas’ read_csv function.
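A short sketch of reading the file back. So that the snippet runs on its own, it first writes a small CSV; the file name births1880.csv, the column labels, and the values are assumptions made for illustration.

```python
import pandas as pd

# Recreate a small example file so this snippet is self-contained.
# The values and the file name are illustrative assumptions.
pd.DataFrame(
    {
        "Names": ["Bob", "Jessica", "Mary", "John", "Mel"],
        "Births": [968, 155, 77, 578, 973],
    }
).to_csv("births1880.csv", index=False)

# read_csv treats the first line of the file as the column header by default.
df = pd.read_csv("births1880.csv")

# head() shows the first rows, a quick check that the file was parsed as expected.
print(df.head())
```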



Step 4 : Prepare Data

Now inspect the data. We know the name column should contain alphanumeric strings and the Births column should contain integers. There are no missing values in this data set, so the only irregularity is that the Births column is not in order. If the data did contain anomalies, this is the step where we would clean them.
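The checks described here could be sketched as follows, assuming a DataFrame with hypothetical columns "Names" and "Births" holding illustrative values.

```python
import pandas as pd

# Illustrative data; the column labels and values are assumptions.
df = pd.DataFrame(
    {
        "Names": ["Bob", "Jessica", "Mary", "John", "Mel"],
        "Births": [968, 155, 77, 578, 973],
    }
)

# Inspect the column types: Names should hold strings, Births integers.
print(df.dtypes)

# Count missing values per column; all zeros means nothing is missing.
missing = df.isnull().sum()
print(missing)

# Confirm that the Births column really is an integer column.
births_are_integers = pd.api.types.is_integer_dtype(df["Births"])
print("Births column is integer:", births_are_integers)
```

If either check failed, dropna() or astype() would be typical starting points for cleaning, depending on what went wrong.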


Step 5 : Analyze Data

We will now extract the information. To get the name with the highest number of births in the 1880s, we can either sort the data and take the first row, or use max() on the births and select the matching name.
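Both approaches are sketched below on illustrative data. The column labels "Names" and "Births" and the values are assumptions.

```python
import pandas as pd

# Illustrative data; the column labels and values are assumptions.
df = pd.DataFrame(
    {
        "Names": ["Bob", "Jessica", "Mary", "John", "Mel"],
        "Births": [968, 155, 77, 578, 973],
    }
)

# Option 1: sort by Births in descending order and take the first row.
sorted_df = df.sort_values("Births", ascending=False)
top_by_sort = sorted_df["Names"].iloc[0]

# Option 2: find the maximum birth count, then select the matching name.
max_births = df["Births"].max()
top_by_max = df.loc[df["Births"] == max_births, "Names"].iloc[0]

print(top_by_sort, top_by_max, max_births)
```

The second option avoids sorting the whole table, which matters little for five rows but is the lighter operation on larger data.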


Step 6 : Present Data

We will show a plot and a table that present the extracted information, i.e. the name with the highest number of births in the 1880s.

There are many more ways to present data, which we will see in future posts.
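A possible sketch of the plot and the table, again on illustrative data. The column labels, the values, and the choice of a bar chart are assumptions, not necessarily the presentation originally intended.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Illustrative data; the column labels and values are assumptions.
df = pd.DataFrame(
    {
        "Names": ["Bob", "Jessica", "Mary", "John", "Mel"],
        "Births": [968, 155, 77, 578, 973],
    }
)

# Find the name with the highest number of births.
max_births = df["Births"].max()
top_name = df.loc[df["Births"] == max_births, "Names"].iloc[0]

# Plot: one bar per name, with the top name called out in the title.
ax = df.plot.bar(x="Names", y="Births", legend=False)
ax.set_ylabel("Births")
ax.set_title(f"Most births: {top_name} ({max_births})")

# Table: the row (or rows) holding the maximum birth count.
top_row = df[df["Births"] == max_births]
print(top_row)
```

In a notebook with inline plotting the chart appears under the cell; in a plain script, calling plt.show() afterwards displays it.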



PS: We will learn more about Pandas in future posts.