R programming - Data interfaces
A program for statistical analysis - Part 10
R can read from and write to various file formats such as CSV, Excel, XML, and JSON. R also lets users work with the system's directories through built-in functions such as setwd(), which takes the path of a directory as its argument, and getwd(), which returns the path of the current working directory.
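As a minimal sketch of the directory functions mentioned above, the snippet below reports the current working directory, switches to another one, lists its files, and switches back. The use of tempdir() as the target is only an assumption made to keep the example safe to run anywhere.

```r
# Report the directory R is currently working in.
original_dir <- getwd()
print(original_dir)

# Switch to another directory; tempdir() is used here only as a safe example path.
setwd(tempdir())
print(getwd())

# List the files in the new working directory.
print(list.files())

# Return to the directory we started in.
setwd(original_dir)
```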
CSV File :
A Comma-Separated Values (CSV) file is a plain text file containing a list of data. These files are often used to exchange data between different applications. For example, databases and contact managers commonly support CSV files.
These files are sometimes called character-separated values or comma-delimited files. They usually use the comma character to separate data, but occasionally use other characters such as semicolons. The idea is that we can export complex data from one application to a CSV file, and then import the data in that CSV file into another application.
Creation
A text file in which commas separate the values in each row is known as a CSV file. Let's start by creating a CSV file named File.csv: type the data into Notepad and save it with the .csv extension, choosing the "All Files (*.*)" type in the Save As dialog.
Reading
R provides the read.csv() function, which reads a CSV file. A bare file name is looked up in the current working directory. The function takes the file name as input and returns all the records present in the file.
read.csv(file, header = TRUE, sep = ",")
Let's use our File.csv file to read records from it using the read.csv() function.
data <- read.csv("File.csv")
By default, the read.csv() function returns its output as a data frame. This can be checked with is.data.frame(), and the number of columns and rows can be checked with ncol() and nrow().
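The create, read, and check steps can be sketched end to end as follows. The sample records are hypothetical (they are not the article's original data), and the file is written to a temporary directory rather than the working directory so the example leaves nothing behind.

```r
# Hypothetical sample records; the column names and values are illustrative only.
employees <- data.frame(
  id = 1:4,
  name = c("Rick", "Dan", "Michelle", "Ryan"),
  salary = c(623.30, 515.20, 611.00, 729.00)
)

# Write the records to a CSV file in a temporary directory.
csv_path <- file.path(tempdir(), "File.csv")
write.csv(employees, csv_path, row.names = FALSE)

# Read the file back; the header row supplies the column names.
data <- read.csv(csv_path)

# Confirm the result is a data frame and inspect its dimensions.
print(is.data.frame(data))
print(ncol(data))
print(nrow(data))
```

For a semicolon-delimited file, the same call can be used with sep = ";".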
Excel File :
The .xlsx extension belongs to a spreadsheet file format created by Microsoft for use with Microsoft Excel. Microsoft Excel is a widely used spreadsheet program that stores data in the .xls or .xlsx format. R can read data directly from these files with the help of Excel-specific packages.
As with a CSV file, we can read data from an Excel file. The xlsx package provides the read.xlsx() function, which takes two main arguments as input: the file name and the index of the sheet. This function returns the Excel data as a data frame in the R environment. The basic syntax of the read.xlsx() function is:
read.xlsx(file, sheetIndex)
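A minimal sketch of reading a sheet with read.xlsx() is shown below. It assumes the xlsx package (and the Java runtime it depends on) is already installed, so the code is guarded and simply does nothing otherwise. The sample data and the file name sample.xlsx are hypothetical.

```r
# Assumption: the 'xlsx' package is installed, e.g. via install.packages("xlsx").
if (requireNamespace("xlsx", quietly = TRUE)) {
  # Hypothetical sample data written to a temporary workbook so there is something to read.
  sample_data <- data.frame(id = 1:3, name = c("Rick", "Dan", "Michelle"))
  xlsx_path <- file.path(tempdir(), "sample.xlsx")
  xlsx::write.xlsx(sample_data, xlsx_path, sheetName = "Sheet1", row.names = FALSE)

  # Read the first sheet of the workbook into a data frame.
  excel_data <- xlsx::read.xlsx(xlsx_path, sheetIndex = 1)
  print(excel_data)
}
```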
R - Binary Files :
A binary file is a file that stores information purely as bits and bytes (0s and 1s). Such files are not human-readable, because their bytes map to characters and symbols that include many non-printable characters. Opening a binary file in a text editor will show characters like Ø and ð.
Sometimes data generated by other programs has to be processed by R as a binary file. R may also need to create binary files that can be shared with other programs.
R has two functions, writeBin() and readBin(), to create and read binary files.
writeBin(object, con)
readBin(con, what, n)
Following is the description of the parameters used:
con is the connection object used to read or write the binary file.
object is the R object (a vector) to be written to the binary file.
what is the mode, such as character or integer, that describes how the bytes being read should be interpreted.
n is the maximum number of records (elements of type what) to read from the binary file.
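The parameters described above can be seen together in a short round trip: a vector is written to a binary file and then read back. This is a sketch only; the file name values.bin and the sample vector are hypothetical.

```r
# Hypothetical integer vector to store in binary form.
values <- c(10L, 20L, 30L, 40L)
bin_path <- file.path(tempdir(), "values.bin")

# Open a connection in binary write mode ("wb") and write the vector.
out_con <- file(bin_path, "wb")
writeBin(values, out_con)
close(out_con)

# Open a connection in binary read mode ("rb") and read the same number of integers.
in_con <- file(bin_path, "rb")
read_back <- readBin(in_con, what = integer(), n = length(values))
close(in_con)

print(read_back)
```

Note that n counts elements, not bytes: asking for more elements than the file holds simply returns the ones that are available.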
R - XML Files :
XML stands for Extensible Markup Language. It is a plain-text file format used to share both the structure and the content of data on the World Wide Web, intranets, and elsewhere. Like HTML, it contains markup tags. But unlike HTML, where the markup tags describe the structure of the page, in XML the markup tags describe the meaning of the data contained in the file.
You can read an XML file in R using the "XML" package. This package can be installed using the following command.
install.packages("XML")
Reading XML File
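The XML file is read by R using the function xmlParse(). The result is a parsed XML document object, which can be converted to a list with xmlToList() or to a data frame with xmlToDataFrame().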
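The following sketch parses a small XML file and converts it to a data frame. It assumes the XML package is installed and is skipped otherwise. The file name records.xml and its contents are hypothetical sample data.

```r
# Assumption: the 'XML' package is installed.
if (requireNamespace("XML", quietly = TRUE)) {
  # Write a small hypothetical XML file to a temporary directory.
  xml_path <- file.path(tempdir(), "records.xml")
  writeLines(c(
    "<RECORDS>",
    "  <EMPLOYEE><ID>1</ID><NAME>Rick</NAME></EMPLOYEE>",
    "  <EMPLOYEE><ID>2</ID><NAME>Dan</NAME></EMPLOYEE>",
    "</RECORDS>"
  ), xml_path)

  # Parse the file into an XML document object.
  doc <- XML::xmlParse(xml_path)

  # The root node gives access to the individual records.
  root <- XML::xmlRoot(doc)
  print(XML::xmlName(root))

  # Convert the flat records into a data frame, one row per EMPLOYEE node.
  records <- XML::xmlToDataFrame(doc)
  print(records)
}
```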
R - JSON Files :
Install rjson Package
In the R console, you can issue the following command to install the rjson package.
install.packages("rjson")
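Once the package is installed, a JSON file can be read with fromJSON(). The sketch below assumes rjson is available and is skipped otherwise. The file name input.json and its contents are hypothetical.

```r
# Assumption: the 'rjson' package is installed.
if (requireNamespace("rjson", quietly = TRUE)) {
  # Write a small hypothetical JSON file to a temporary directory.
  json_path <- file.path(tempdir(), "input.json")
  writeLines('{"ID": [1, 2, 3], "Name": ["Rick", "Dan", "Michelle"]}', json_path)

  # Read the file; the result is an R list with one element per JSON key.
  result <- rjson::fromJSON(file = json_path)
  print(names(result))

  # Equal-length elements can be combined into a data frame.
  json_data_frame <- as.data.frame(result)
  print(json_data_frame)
}
```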
R - Web Data :
Many websites publish data as downloadable files. Using R programs, we can programmatically extract specific data from such websites. Some packages in R that are used to scrape data from the web are "RCurl", "XML", and "stringr". They are used to connect to the URLs, identify the required links for the files, and download them to the local environment.
We will use the function getHTMLLinks() to gather the URLs of the files. Then we will use the function download.file() to save the files to the local system. As we will be applying the same code again and again for multiple files, we will create a function to be called multiple times. The file names are passed as parameters in the form of an R list object to this function.
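The approach described above might look like the following sketch. To avoid depending on a live website, the link-gathering step runs on a small in-memory HTML page. The example.com URLs and the helper name download_files are hypothetical, and the download call is left commented out because those URLs do not point to real files. The XML package is assumed to be installed.

```r
# Assumption: the 'XML' package is installed.
if (requireNamespace("XML", quietly = TRUE)) {
  # Hypothetical page content standing in for a real website.
  page_html <- paste0(
    "<html><body>",
    "<a href='http://example.com/data/report_2014.csv'>2014</a>",
    "<a href='http://example.com/data/report_2015.csv'>2015</a>",
    "<a href='http://example.com/about.html'>About</a>",
    "</body></html>"
  )

  # Parse the page and gather every link it contains.
  page <- XML::htmlParse(page_html, asText = TRUE)
  links <- XML::getHTMLLinks(page)

  # Keep only the links that point to CSV files.
  csv_links <- links[grepl("\\.csv$", links)]
  print(csv_links)
}

# Hypothetical helper: download each URL in a list into dest_dir,
# naming each local file after the last component of its URL.
download_files <- function(urls, dest_dir) {
  for (url in urls) {
    destination <- file.path(dest_dir, basename(url))
    download.file(url, destination, mode = "wb")
  }
}

# Example call (not run here, since the URLs above are placeholders):
# download_files(as.list(csv_links), tempdir())
```

Using mode = "wb" keeps binary files intact on Windows. For real pages, getHTMLLinks() can also be given a URL or file name directly instead of a parsed document.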
This is about R programming. We will look into projects based on R in future blogs.