Rattle (the R Analytical Tool To Learn Easily) is a free and open source graphical user interface for data mining using R. It makes use of over 100 R packages that provide most of the algorithms a data scientist needs.
Rattle provides a GNOME (RGtk2) based interface to R's data mining functionality. The aim is to provide a simple and intuitive interface that lets end users quickly load data from CSV files (or even via ODBC), transform and explore the data, and evaluate models. Rattle also allows users to export models as PMML (Predictive Model Markup Language).
A very important feature of Rattle is that everything done through the graphical user interface is captured as a structured R script, which can be run independently of Rattle whenever required.
The latest beta versions are always available from within the R console:
>install.packages("rattle", repos="http://rattle.togaware.com", type="source")
Here I will detail how to install the Rattle package from RStudio:
1. Open RStudio.
2. Go to the View menu on the menu bar and select Show Packages.
3. In the Packages pane, click the Install button.
4. Clicking Install opens a small window. Fill it in as follows:
   (i) Packages: rattle
   (ii) Install to Library: leave the default setting
   (iii) Tick Install dependencies and click Install.
5. Go back to the Packages pane, search for rattle and tick the checkbox next to it.
6. Type rattle() in the R console and press Enter.
The R console will now list any missing dependencies; install them accordingly (for example, the RGtk2 package).
After performing the above steps, you will see a Rattle window asking you to download some dependencies. Rattle suggests installing them through this dialog box.
Download all the dependencies to get full use of Rattle. If an error occurs while downloading through the dialog box, install the dependency from the R console instead; this procedure works for every dependency. Here is how to install the XML package from the command line:
>install.packages("XML")
If you get an error such as the following:
ERROR: configuration failed for package ‘XML’
* removing ‘/home/farheen/R/x86_64-pc-linux-gnu-library/3.1/XML’
Warning in install.packages :
  installation of package ‘XML’ had non-zero exit status
The downloaded source packages are in
  ‘/tmp/Rtmpw8S1UE/downloaded_packages’
Error in loadNamespace(name) : there is no package called ‘XML’
This indicates that your system is missing the XML development libraries needed to build the package. First run these commands in a terminal (on Ubuntu/Debian):
$ sudo apt-get install libcurl4-openssl-dev
$ sudo apt-get install libxml2-dev
Then go back to the R console and run the install.packages command again.
Type rattle() in the R console and the Rattle main window will appear.
We have finally installed Rattle with all its dependencies.
How to install Pandas on your system
OVERVIEW: In this post we are going to learn the basics of Pandas. Everything we do follows CGPAP, a short (unconventional) name I use here for the following steps:
Create Data – We will create our own data for learning, so there is no need to download anything.
Get Data – After creating our own data, we will learn how to read the file.
Prepare Data – Here we will check the data for anomalies and clean it by removing inconsistent, missing or disordered data.
Analyze Data – This is the most important section. Here we will analyze the data to extract the information we need.
Present Data – In this section we will learn how to present the extracted information.
MISSION – We will create a data set of names and the number of births for each name in the 1880s, and then extract the name with the highest number of births.
Step 1: Import all the libraries and functions needed for this post.
The Pandas library is used for the data analysis part and the matplotlib library for the presentation section.
(If your IPython notebook is not configured with the matplotlib library, try starting it with ‘ipython notebook --matplotlib=inline’; the quotes are not part of the command.)
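Here is a minimal sketch of Step 1, assuming pandas and matplotlib are installed. The aliases pd and plt are the usual community conventions, not something taken from the original post.

```python
# Pandas handles the data analysis part of this post.
import pandas as pd

# matplotlib handles the presentation (plotting) part.
import matplotlib.pyplot as plt

# Print version/backend info to confirm both imports worked.
print("pandas version:", pd.__version__)
print("matplotlib backend:", plt.get_backend())
```

In a recent Jupyter notebook, running the %matplotlib inline magic in a cell has the same effect as the --matplotlib=inline startup flag.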
Step 2: Create Data
The data set consists of five baby names and the number of births recorded for each name in the 1880s.
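Here is one way to build such a data set with pandas, as a sketch. The five names and birth counts below are made-up illustrative values, not the author's data, and the file name births1880.csv is hypothetical.

```python
import pandas as pd

# Hypothetical sample data: five baby names and made-up birth counts.
names = ["Bob", "Jessica", "Mary", "John", "Mel"]
births = [968, 155, 77, 578, 973]

# zip() pairs each name with its count; each pair becomes one row.
baby_data = pd.DataFrame(list(zip(names, births)), columns=["Names", "Births"])
print(baby_data)

# Save the data set as a CSV file (hypothetical file name) so there is a
# file to read in Step 3. index=False keeps row numbers out of the file.
baby_data.to_csv("births1880.csv", index=False)
```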
Step 3: Get Data
To read the file we will use the Pandas read_csv function.
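Here is a minimal sketch of reading the data with pandas.read_csv. To keep the example self-contained, it reads the same illustrative values from an in-memory string through io.StringIO. With a file on disk you would pass the path instead, e.g. pd.read_csv("births1880.csv") (hypothetical file name).

```python
import io
import pandas as pd

# Illustrative CSV content (made-up values); the first line is the header row.
csv_text = """Names,Births
Bob,968
Jessica,155
Mary,77
John,578
Mel,973
"""

# read_csv accepts a file path or any file-like object such as StringIO.
df = pd.read_csv(io.StringIO(csv_text))

print(df)
print(df.dtypes)  # Names holds strings, Births holds integers
```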
Step 4: Prepare Data
Now inspect the data. We expect the Names column to hold alphanumeric strings and the Births column to hold integers. Since we created the data ourselves, there is no missing data; the only irregularity is that the Births column is not in order. If there were any anomalies, this is where we would clean them.
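The checks described above can be sketched as follows, again on the illustrative values. The dropna() and astype() calls are shown as one possible clean-up. On this data they change nothing, because there is nothing to clean.

```python
import io
import pandas as pd

csv_text = """Names,Births
Bob,968
Jessica,155
Mary,77
John,578
Mel,973
"""
df = pd.read_csv(io.StringIO(csv_text))

# 1. Column types: Births should be an integer column.
births_is_int = pd.api.types.is_integer_dtype(df["Births"])

# 2. Missing values per column (all zeros means nothing is missing).
missing = df.isnull().sum()

# 3. Is Births sorted in ascending order? False here, which is the
#    "disordered data" mentioned above.
births_sorted = df["Births"].is_monotonic_increasing

# 4. Possible clean-up: drop rows with missing values, force Births to int.
clean = df.dropna().astype({"Births": "int64"})

print(births_is_int, missing.to_dict(), births_sorted, len(clean))
```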
Step 5: Analyze Data
We will now apply a few techniques to extract information. To get the name with the highest number of births in the 1880s, we can either sort the data and take the first name, or use max() to find the highest count and look up its name.
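Both approaches are sketched below on the same illustrative values. sort_values() orders the rows by Births, while max() finds the largest count and a boolean mask picks out the matching name.

```python
import io
import pandas as pd

csv_text = """Names,Births
Bob,968
Jessica,155
Mary,77
John,578
Mel,973
"""
df = pd.read_csv(io.StringIO(csv_text))

# Method 1: sort by Births in descending order and take the first row.
sorted_df = df.sort_values("Births", ascending=False)
top_by_sort = sorted_df["Names"].iloc[0]

# Method 2: find the maximum count, then select the name with that count.
max_births = df["Births"].max()
top_by_max = df.loc[df["Births"] == max_births, "Names"].iloc[0]

print(top_by_sort, top_by_max, max_births)  # Mel Mel 973
```

If two names tie for the highest count, the mask in Method 2 keeps all of them (before .iloc[0] picks one), so it is the easier one to adapt when ties matter.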
Step 6: Present Data
We will show a plot and a table to present the extracted information, i.e. the name with the highest number of births in the 1880s.
There are many more ways to present data, which we will see in future posts.
PS: We will learn more about Pandas in further posts.
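Here is a sketch of Step 6 on the illustrative values. The table is printed with pandas, and the plot is a bar chart drawn through DataFrame.plot, which uses matplotlib. The Agg backend and the file name births_plot.png are assumptions, made so the script runs without a display. In a notebook with inline plotting you can drop the matplotlib.use line and the savefig call.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders to a file, no display needed
import matplotlib.pyplot as plt
import pandas as pd

csv_text = """Names,Births
Bob,968
Jessica,155
Mary,77
John,578
Mel,973
"""
df = pd.read_csv(io.StringIO(csv_text))

# Find the row with the highest number of births.
top_idx = df["Births"].idxmax()
top_name = df.loc[top_idx, "Names"]
max_births = df.loc[top_idx, "Births"]

# Table: the whole data set sorted by births, then the winning row alone.
print(df.sort_values("Births", ascending=False).to_string(index=False))
print(df.loc[[top_idx]])

# Plot: one bar per name, with the winner named in the title.
ax = df.plot(kind="bar", x="Names", y="Births", legend=False)
ax.set_ylabel("Births")
ax.set_title(f"Most popular name: {top_name} ({max_births} births)")
plt.tight_layout()
plt.savefig("births_plot.png")  # hypothetical output file name
```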
Hello! This is my first blog post.
This summer I started learning Data Science while interning at DecisionStats.org. I know that a lot of my friends also want to do the same, so I started posting the work I am doing to help them.