Monday, November 16, 2015

Objects in R & Python

R and Python are among the most preferred languages for data science. Both support object-oriented programming, and in both of them almost everything you work with is an object. Before looking at each language, let us talk about objects themselves. An object is an entity that stores values, and every value belongs to a class. The most common classes are numeric (integer and float), string, logical, and complex.
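
As a rough illustration of the classes mentioned above, the short Python sketch below checks the class of a few values with the built-in type function. The values are made up for illustration only. In R, the class function plays a similar role.

```python
# Example values, one per common class (the values are illustrative only).
samples = {
    "integer": 42,
    "float": 3.14,
    "string": "The Alchemist",
    "logical": True,
    "complex": 2 + 3j,
}

# type() reports the class of each value; __name__ gives its short name.
class_names = {label: type(value).__name__ for label, value in samples.items()}

for label, name in class_names.items():
    print(f"{label}: {name}")
```

Note that Python calls the logical class bool and the string class str.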

Objects in R:

  1. Vector: An object that stores values of the same class in one dimension.
  2. List: An object that can store values of different classes.
  3. Matrix: An object that stores values of the same class in two dimensions.
  4. Data Frame: An object that stores values in two dimensions, where each column can have a different class.

Objects in Python:

  1. List: An object that can store values of different classes.
  2. Dictionary: An object that can store values of different classes, where each value is identified by a key.
  3. Data Frame (Pandas Library): An object that stores values in two dimensions, where each column can have a different class.
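
To make the first two Python objects a little more concrete, here is a minimal sketch of a list and a dictionary holding the same values. The id 101 and the available flag are hypothetical values that I added only for illustration.

```python
# A list keeps values of different classes in order; access is by position.
record = [101, "The Alchemist", True]  # hypothetical id, a title, a flag
first_item = record[0]

# A dictionary stores the same values, each one identified by a key.
record_by_key = {
    "book_id": 101,
    "book_name": "The Alchemist",
    "available": True,
}
title = record_by_key["book_name"]

print(first_item)
print(title)
```

The list is convenient when the order of the values matters, while the dictionary is easier to read because every value carries its own name.
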

Let us take an example to understand this in a bit more depth. I will take the simple example of a library catalogue that holds information about books. The question is how best to store this information in objects so that we can use it in data analysis.

Assume the available data is book id, book name, author name, and release date. Let us take each object one by one and try to store the information:
Vector: It can store values of the same class in one dimension. So we have to store the information in four vectors, one each for book id, book name, author name, and release date. E.g. book_name = c("The Power of Now", "The Alchemist"...)
List: A list can contain values of different classes, and it can contain vectors as well, so it can hold all four vectors. E.g.
cat = list(book_id, book_name, author_name, release_date) (in R)
cat = [book_id, book_name, author_name, release_date] (in Python)
Matrix: The catalogue does not fit well in a matrix because the variables are of different classes. R would convert every value to a single common class (here, character).
Data Frame: This is the ideal way to store the data, and it is my favourite as well. You can think of it as a table with four columns, one for each variable. E.g.
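cat = data.frame(book_id, book_name, author_name, release_date) #Built from the four vectors above (in R).
cat = read.csv(file) #Alternative, assuming file holds the path to a CSV file of the catalogue (in R).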
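
On the Python side, the table idea can be sketched with the standard library alone. This is only an approximation of a data frame, not the pandas object itself. Only the two book names come from the example above. The ids, authors, and years are sample values that I filled in for illustration, and the release date is simplified to a year.

```python
# One list per variable, mirroring the four vectors described above.
# Only the two book names come from the article; the ids, authors, and
# years are sample values added for illustration.
book_id = [1, 2]
book_name = ["The Power of Now", "The Alchemist"]
author_name = ["Eckhart Tolle", "Paulo Coelho"]
release_date = [1997, 1988]  # simplified to a year

# Column view: a dictionary that maps each column name to its values.
columns = {
    "book_id": book_id,
    "book_name": book_name,
    "author_name": author_name,
    "release_date": release_date,
}

# Row view: zip the columns together so that each row describes one book.
rows = [dict(zip(columns, values)) for values in zip(*columns.values())]

for row in rows:
    print(row)
```

If pandas is installed, passing the same columns dictionary to pandas.DataFrame should give the data frame described in the list of Python objects.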

To store values in objects, we use functions such as c, list, and data.frame, and we pass the values to these functions as arguments. The result of the function is then assigned to the object. You can think of it like the functions you studied in 11th class, where y = f(x): y is the object, f is the function, and x is the argument passed to the function.
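
The same y = f(x) pattern can be sketched in Python, where built-in functions such as list, dict, and len play the role of f. The book id used here is a hypothetical value.

```python
# y = f(x): the function on the right builds the object, and the name on
# the left is assigned to it.

# f = list, x = a tuple of titles
book_name = list(("The Power of Now", "The Alchemist"))

# f = dict, x = keyword arguments (the id 2 is a hypothetical value)
book = dict(book_name="The Alchemist", book_id=2)

# f = len, x = the list created above
n_books = len(book_name)

print(book_name)
print(book)
print(n_books)
```

In everyday Python the literal forms [ ] and { } are more common than calling list and dict, but the function calls make the comparison with y = f(x) easier to see.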

Note: This is my first article, so I have left a lot of things undefined and kept the discussion general so far. I will update it soon with more specific information.