dataorchid

"Data! Data! Data! We can't make bricks without clay!"

Pandas In Python-Part3 — June 9, 2015

Pandas In Python – Part 2 — June 7, 2015
Rattle package in R (GUI) — June 3, 2015

Introduction

Rattle (the R Analytical Tool To Learn Easily) is a freely available, open-source graphical user interface for data mining with R. It makes use of over 100 R packages that provide the algorithms most needed by a data scientist.

Rattle provides a Gnome (RGtk2) based interface to R functionality for data mining. The aim is to provide a simple and intuitive interface that allows end users to quickly load data from CSV files (or even via ODBC), transform and explore the data, and evaluate models in their own way. Rattle also allows users to export models as PMML (Predictive Model Markup Language).

A very important feature of Rattle is that every operation performed through the graphical user interface is captured as a structured R script, which can be run independently of Rattle whenever required.

RATTLE_PLOT2

RATTLE_EG2_PLOT

Installation

The latest beta version is always available and can be installed from within the R console:

> install.packages("rattle", repos="http://rattle.togaware.com", type="source")

Here I describe how to install the Rattle package from RStudio.
1. Open RStudio.

2. Go to the View menu on the menu bar and select Show Packages.

Rstudio_View

3. Go to the Packages tab and click the Install button.

Rstudio_package

4. Clicking Install opens a small window. Fill it in as follows:

(i) Install from: Repository (CRAN)

(ii) Packages: rattle

(iii) Install to Library: (leave the default setting)

(iv) Tick Install dependencies and click Install.

Rattle_Install_packges

5. Go to Packages again, search for rattle, and tick the checkbox in front of it.

rattle_tick

6. Type rattle() in the R console and press Enter.

The R console will now list any dependencies that are still needed; install them accordingly.

Example: the RGtk2 package

After performing the above steps you will see the following Rattle window asking to download some dependencies:

rattle

Rattle will suggest installing some dependencies through the dialogue box, for example:

cairoDevice

XML

Download all the dependencies to get the full functionality of Rattle. If a download through the dialogue box fails, install the dependency from the command line instead. This procedure applies to every dependency; here I show it for the XML package.

Use the following command to install a dependency that the dialogue box could not install:

> install.packages("XML")

If an error such as the following appears:
ERROR:
configuration failed for package ‘XML’
* removing
‘/home/farheen/R/x86_64-pc-linux-gnu-library/3.1/XML’
Warning in install.packages :
installation of package ‘XML’ had non-zero exit status
The downloaded source packages are in
‘/tmp/Rtmpw8S1UE/downloaded_packages’
Error in loadNamespace(name) : there is no package called ‘XML’

This indicates that the system libraries needed to build the XML package are missing, so first run these commands in a terminal:

sudo apt-get install libcurl4-openssl-dev

sudo apt-get install libxml2-dev

Then go back to the R console and run the command again:

> install.packages("XML")

Type rattle() in the R console and the following window will appear.

rattle

We have finally installed Rattle with all its dependencies.


 

 
 
Pandas In Python-Part1 — June 2, 2015

How to install Pandas on your system?

  1. Windows
  2. Linux

OVERVIEW: In this post we are going to learn the basics of Pandas by working through CGPAP. CGPAP is a short name coined here (it is not a conventional term) for the following learning steps:

Create Data – We will create our own data for learning, so that there is no dependency on downloading anything.

Get Data – After creating our own data, we will learn how to read the file.

Prepare Data – Here we will check for any anomalies present in the data. We will clean the data by removing inconsistent, missing, or disordered data.

Analyze Data – This is the most important section. Here we will analyze the data to extract the needed information.

Present Data – In this section we will learn how to present the extracted information.

MISSION – We will create a data set of names and the number of births recorded for each name in the 1880s, and then extract the name with the highest number of births.

Step 1: Importing all the libraries and functions necessary for this post.

The Pandas library is used for the data analysis part and the matplotlib library for the presentation section.

(If your IPython notebook is not configured with the matplotlib library, try starting it with ‘ipython notebook --matplotlib=inline’ (without the quotes).)
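The imports for this step might look like the sketch below. The aliases pd and plt are common conventions and are assumed here; they are not taken from the post's screenshot. The version printout is only a quick sanity check.

```python
import sys

import matplotlib
import matplotlib.pyplot as plt  # plotting, used in the presentation step
import pandas as pd              # data analysis

# Print the versions in use; handy when comparing results across machines.
print("Python version:", sys.version.split()[0])
print("pandas version:", pd.__version__)
print("matplotlib version:", matplotlib.__version__)
```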

ps1

Step 2 : Create Data

The data set consists of five baby names and the number of births recorded for each name in the 1880s.
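A minimal sketch of this step is shown below. The names, the birth counts, the column labels, and the file name births.csv are all illustrative assumptions, since the original values are not reproduced in the text. The file is written to the system temporary directory so the example leaves the working directory untouched.

```python
import os
import tempfile

import pandas as pd

# Illustrative values (assumptions, not the post's original data).
names = ["Bob", "Jessica", "Mary", "John", "Mel"]
births = [968, 155, 77, 578, 973]

# Pair each name with its birth count and build a two-column DataFrame.
baby_data = list(zip(names, births))
df = pd.DataFrame(data=baby_data, columns=["Names", "Births"])

# Save the data as a CSV file without the index and without a header row.
csv_path = os.path.join(tempfile.gettempdir(), "births.csv")
df.to_csv(csv_path, index=False, header=False)
print(df)
```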

ps2

Step 3 : Get Data

To read the file we will use Pandas’ read_csv function.
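As a rough sketch, reading a header-less CSV could look like this. An in-memory buffer stands in for the file on disk so the example runs on its own; the sample rows and the column names Names and Births are assumptions. With a real file, pass its path to read_csv instead of the buffer.

```python
import io

import pandas as pd

# Sample CSV content standing in for the file created earlier (assumed values).
csv_text = "Bob,968\nJessica,155\nMary,77\nJohn,578\nMel,973\n"

# header=None tells pandas the first row is data, not column labels;
# names= supplies the column labels ourselves.
df = pd.read_csv(io.StringIO(csv_text), header=None, names=["Names", "Births"])
print(df)
```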

ps4

ps4

Step 4 : Prepare Data

Now inspect the data. We know the name column should hold alphanumeric strings and the Births column should hold integers. There is no missing data here, so the only possible abnormality would be disordered data in the Births column. If there are anomalies in the data, we will clean them.
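These checks can be sketched as follows, again with assumed sample values and column names. The idea is to confirm that Births is stored as an integer type and that no cell is missing before moving on to the analysis.

```python
import pandas as pd

# Assumed sample data, standing in for the file read in the previous step.
df = pd.DataFrame({
    "Names": ["Bob", "Jessica", "Mary", "John", "Mel"],
    "Births": [968, 155, 77, 578, 973],
})

# Column types: Births should be an integer type.
print(df.dtypes)
births_is_integer = pd.api.types.is_integer_dtype(df["Births"])

# Count of missing values per column: every count should be zero.
missing = df.isnull().sum()
print(missing)
```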

ps5

Step 5 : Analyze Data

We will apply techniques to extract information. To get the name with the highest number of births in the 1880s, we can either sort the data and take the first name, or use max().
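Both approaches can be sketched as below, using assumed sample values and column names. The sort puts the largest count in the first row; max() returns the largest count directly, and a boolean filter then recovers the matching name.

```python
import pandas as pd

# Assumed sample data.
df = pd.DataFrame({
    "Names": ["Bob", "Jessica", "Mary", "John", "Mel"],
    "Births": [968, 155, 77, 578, 973],
})

# Approach 1: sort by Births in descending order and take the first row.
sorted_df = df.sort_values(by="Births", ascending=False)
top_name_sorted = sorted_df["Names"].iloc[0]

# Approach 2: find the maximum, then select the name(s) with that count.
max_births = df["Births"].max()
top_name_max = df.loc[df["Births"] == max_births, "Names"].iloc[0]

print(top_name_sorted, top_name_max, max_births)
```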

ps6

Step 6 : Present Data

We will show a plot and a table to present the extracted information, i.e. the name with the highest number of births in the 1880s.

There are many more ways to present data, which we will see in future posts.
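One possible way to produce such a table and plot is sketched below. The sample data, the bar chart, the title text, and the output file name births_plot.png are assumptions made for illustration. The Agg backend is chosen only so the sketch runs without a display; in a notebook with inline plotting, that line and the savefig call can be dropped.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt
import pandas as pd

# Assumed sample data.
df = pd.DataFrame({
    "Names": ["Bob", "Jessica", "Mary", "John", "Mel"],
    "Births": [968, 155, 77, 578, 973],
})

max_births = df["Births"].max()
top_name = df.loc[df["Births"] == max_births, "Names"].iloc[0]

# Table: the row(s) holding the highest birth count.
top_rows = df[df["Births"] == max_births]
print(top_rows)

# Plot: one bar per name, with the most popular name stated in the title.
ax = df.plot(x="Names", y="Births", kind="bar", legend=False)
ax.set_ylabel("Births")
ax.set_title(f"Most popular name: {top_name} ({max_births} births)")

# Save the figure to the temporary directory and release it.
plot_path = os.path.join(tempfile.gettempdir(), "births_plot.png")
ax.figure.savefig(plot_path)
plt.close(ax.figure)
```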

ps7

ps8

PS: We will learn more about Pandas in future posts.

Hello world! — May 7, 2015