R is a programming language and open-source software environment that is widely used for statistical computing, data analysis, and graphics. It was developed by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, in the early 1990s; its source code was released as free software under the GNU General Public License in 1995, and version 1.0.0 followed in 2000. R provides a comprehensive set of tools for manipulating, visualizing, and modelling data, making it a favourite among statisticians, data scientists, researchers, and analysts.
Key Features of R:
1. Data Manipulation: R offers powerful data manipulation capabilities, allowing you to clean, transform, and preprocess data easily. Packages like `dplyr` and `tidyr` provide functions for efficient data wrangling.
2. Statistical Analysis: R provides an extensive range of statistical functions and libraries for performing various types of analyses, including regression, hypothesis testing, ANOVA, and more.
3. Visualization: R is known for its exceptional visualization capabilities. The `ggplot2` package is widely used for creating high-quality, customizable graphs and plots.
4. Machine Learning: R has a growing ecosystem of machine learning libraries, such as `caret`, `randomForest`, and `xgboost`, enabling you to build and train predictive models.
5. Packages and Libraries: R's strength lies in its vast collection of packages and libraries contributed by the R community. These packages cover a wide range of domains, from bioinformatics to finance.
6. Data Import/Export: R can handle various data formats, including CSV, Excel, JSON, and databases. It also provides functions for reading and writing data.
7. Reproducibility: R promotes reproducible research by allowing users to create scripts that document and automate their data analysis and visualization processes.
8. Interactive Environment: R provides an interactive environment through the R Console or integrated development environments (IDEs) like RStudio, making it user-friendly and efficient for data exploration.
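As a small illustration of several of the features above, the following sketch uses only base R and the built-in `mtcars` dataset, so no extra packages are needed. The variable names (`heavy`, `mpg_by_cyl`, `fit`) are chosen for this example only; packages such as `dplyr` or `ggplot2` offer alternative ways to do the same things.

```r
# Built-in dataset shipped with R: 32 cars, 11 numeric columns
data(mtcars)

# Data manipulation: keep cars heavier than 3000 lbs (wt is in units of 1000 lbs)
heavy <- subset(mtcars, wt > 3, select = c(mpg, wt, cyl))

# Grouped summary: mean miles per gallon for each cylinder count
mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
print(mpg_by_cyl)

# Statistical analysis: simple linear regression of mpg on weight
fit <- lm(mpg ~ wt, data = mtcars)
print(coef(fit))

# Visualization: scatter plot with the fitted regression line
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
abline(fit, col = "red")
```

The negative `wt` coefficient printed by `coef(fit)` reflects that heavier cars in this dataset tend to have lower fuel economy.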
Where and When to Use R:
1. Data Analysis and Exploration: R is ideal for exploring and analyzing data to gain insights, identify trends, and understand underlying patterns.
2. Statistical Modeling: When you need to perform complex statistical analyses, hypothesis testing, and create statistical models, R is a suitable choice.
3. Data Visualization: If you want to create publication-quality graphs and visualizations, R's `ggplot2` package allows you to customize and control every aspect of your visualizations.
4. Academic and Research Projects: R is extensively used in academia and research for conducting experiments, analyzing data, and presenting findings.
5. Machine Learning and Predictive Modeling: R's machine learning libraries enable you to build predictive models for classification, regression, clustering, and more.
6. Biostatistics and Healthcare: R is popular in the medical and healthcare fields for analyzing clinical trial data, epidemiology, and bioinformatics.
7. Financial Analysis: R is used for quantitative analysis, risk management, and portfolio optimization in the finance industry.
8. Data Science: R plays a significant role in data science projects, where it's used for data preprocessing, feature engineering, modeling, and visualization.
In summary, R is a versatile and powerful tool for data analysis, statistical computing, and visualization. Its wide range of packages and strong statistical capabilities make it a popular choice for individuals and organizations working with data.
To install R and RStudio on your computer, follow these steps:
Installing R:
1. Windows:
- Visit the CRAN (Comprehensive R Archive Network) website for Windows: https://cran.r-project.org/bin/windows/base/
- Click on the "Download R for Windows" link.
- Choose a CRAN mirror location (usually the first option).
- Download the executable installer file (e.g., R-4.1.1-win.exe).
- Run the installer and follow the installation prompts.
2. macOS:
- Visit the CRAN website for macOS: https://cran.r-project.org/bin/macosx/
- Download the latest version of R for macOS.
- Open the downloaded installer package (`.pkg` file).
- Follow the installation prompts.
3. Linux:
- Depending on your Linux distribution, you can usually install R using your package manager. For example, on Ubuntu, you can open the terminal and run:
```
sudo apt-get update
sudo apt-get install r-base
```
Installing RStudio:
1. Windows, macOS, Linux:
- Visit the RStudio download page: https://www.rstudio.com/products/rstudio/download/
- Scroll down to the "Installers for Supported Platforms" section.
- Choose the appropriate installer for your operating system (RStudio Desktop Open Source Edition).
- Download and run the installer.
- Follow the installation prompts.
Running R and RStudio:
1. R:
- After installing R, you can run it by opening the R console. On Windows, this is typically called "R GUI" or "Rterm" in the Start Menu. On macOS and Linux, you can open the terminal and type `R` to start the R console.
2. RStudio:
- After installing RStudio, you can run it by searching for "RStudio" in your application launcher (Windows) or using Spotlight search (macOS). On Linux, you can use the terminal to launch RStudio by typing `rstudio`.
Once you have both R and RStudio installed, you can start using R for data analysis, statistical modeling, and more. RStudio provides a user-friendly interface and enhanced features for working with R scripts, projects, and visualizations.
Remember to periodically check for updates to both R and RStudio to ensure you're using the latest versions with the most up-to-date features and bug fixes.
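To see which version of R is currently installed, a quick check from the R console might look like the sketch below. The `"4.0.0"` threshold is an arbitrary value chosen for illustration, not a recommendation.

```r
# Version string of the running R installation
print(R.version.string)

# Compare the running version against a minimum you care about
# ("4.0.0" is an arbitrary threshold used only for this example)
is_recent <- getRversion() >= "4.0.0"
print(is_recent)

# Directories in which installed packages are looked up
print(.libPaths())
```

Installed packages can then be brought up to date with `update.packages()`, which needs network access to a CRAN mirror.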
Some example code:
print("R")
x <- 5
y <- 10
total <- x + y
plot(1:total)
var1 = "machine"
var2 = "learning"
cat(var1, var2, "\n")  # cat() already separates its arguments with a space
# data types
num = 20
intnum = 20L
print(num)
class(num)
print(intnum)
class(intnum)
logic <- TRUE
class(logic)
char <- "a"
class(char)
# converting data types
a <- 25L
class(a)
num1 <- as.numeric(a)
class(num1)
print(num1)
b <- '25'
num2 <- as.numeric(b)
class(num2)
print(num2)
num3 <- as.numeric(TRUE)
num3
num4 <- as.numeric(FALSE)
num4
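num5 <- as.integer(45.564)  # truncates toward zero: 45
num5
num6 <- as.integer("a")     # not a number: NA, with a coercion warning
num6
log1 <- as.logical(56L)     # any non-zero number becomes TRUE
log1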
com1 <- as.complex(234.09)
com1
chr1 <- as.character(345)
chr1
x
if (x %% 2 == 0) {
  print("The number is even")
} else {
  print("The number is odd")
}
# operators
a = 5
b = 10
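print(a + b)    # addition
print(a - b)    # subtraction
print(a * b)    # multiplication
print(b / a)    # division
print(b %% a)   # modulo (remainder)
print(b %/% a)  # integer division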
c1 <- c(10, 20, 30)
c1
x <- 2
while (x < 6) {
  print(x)
  x <- x + 1
}
repeat {
  print(x)
  x <- x + 1
  if (x > 10) {
    break
  }
}
# next and break
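A minimal sketch of the two statements named above: `next` skips the rest of the current iteration, while `break` leaves the loop altogether. The vector name `odds` and the cut-off value 8 are chosen for this example only.

```r
# Collect the odd numbers below 8
odds <- c()
for (i in 1:10) {
  if (i %% 2 == 0) {
    next   # even number: skip to the next iteration
  }
  if (i > 8) {
    break  # stop the loop entirely once i exceeds 8
  }
  odds <- c(odds, i)
  print(i)
}
odds
```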
# reading from user input
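# readline() works only in an interactive session and always returns a character string
age <- readline("What is your age? ")   # type a value, e.g. 49, at the prompt
age
nam <- readline("What is your name? ")
nam
print(paste("Hello, my name is", nam, "and my age is", age))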
# function
new_func <- function() {
  for (i in 1:5) {
    print(i + 2)
  }
}
new_func()
func2 <- function(x, y) {
  res <- x * y
  # paste() already inserts a space between its arguments
  print(paste("x:", x, "y:", y))
  print(paste("res:", res))
}
func2(5, 8)
# substring
a <- "dhiraj patra"
res <- substr(a, 3, 7) # all inclusive
print(toupper(res))
# regular expression
s2 <- c("abc", "abcde", "abcdef")
pat <- '^abc'
print(grep(pat, s2))
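`grep()` returns the positions of the matching elements rather than the elements themselves. The sketch below contrasts it with a few related base R functions; the vector `words` and the other variable names are made up for this example, and one element is chosen so that it does not match.

```r
words <- c("abc", "abcde", "xabc")   # the last element does not start with "abc"
pattern_start <- "^abc"

idx  <- grep(pattern_start, words)                 # positions of matches
vals <- grep(pattern_start, words, value = TRUE)   # the matching strings
hits <- grepl(pattern_start, words)                # one TRUE/FALSE per element
replaced <- sub(pattern_start, "ABC", words)       # replace the first match in each string

print(idx)
print(vals)
print(hits)
print(replaced)
```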
# list in memory
ls()
# with pattern
ls(pattern = "n")
ls.str()
n <- 0.5
n1 <- 10
n2 <- 100
nam <- "Carmen"
# dataframe
M <- data.frame(n1, n2, n)
ls.str(pattern = "M")  # spell the argument out instead of relying on partial matching
getwd()
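# Adjust this path to a directory that exists on your own machine
setwd("/Users/Admin/Desktop/personal/courses/EICT_academy_IIT_Guhati/R")
# read.csv() looks for the file in the working directory
my_data <- read.csv("sales_Data.csv")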
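Because the CSV file above exists only on the author's machine, here is a self-contained sketch of the same round trip that writes a small data frame to a temporary file and reads it back. The data values and the names `sales`, `csv_path` and `sales_back` are invented for this example.

```r
# Hypothetical sales data, invented for this example
sales <- data.frame(
  region = c("North", "South", "East"),
  units  = c(120L, 85L, 140L)
)

# Write to a temporary file so nothing in the working directory is touched
csv_path <- tempfile(fileext = ".csv")
write.csv(sales, csv_path, row.names = FALSE)

# Read it back; read.csv() returns a data frame
sales_back <- read.csv(csv_path)
str(sales_back)
print(sum(sales_back$units))

unlink(csv_path)  # remove the temporary file
```

Setting `row.names = FALSE` keeps `write.csv()` from adding an extra column of row numbers to the file.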
x <- 1:30
print(x[1:30])
# sequence
y <- seq(1, 5, 0.5)
y
# take the data from keyboard
z <- scan()
z
# Gaussian (normal) random sample
g <- rnorm(n1, mean = 0, sd = 1)  # n1 is 10; n is 0.5 here and would yield an empty vector
g
# factors
factor(1:3)
factor(1:10, exclude = 5)
# matrix
matrix(1:6, 2, 3)
matrix(1:6, 2, 3, byrow = TRUE)
# dimension
x <- 1:15
x
dim(x)              # NULL: a plain vector has no dimensions
dim(x) <- c(5, 3)   # reshape into a 5 x 3 matrix, filled column by column
x
x <- 1:4; n <- 10; M <- c(10, 35); y <- 2:4
data.frame(x, n)
data.frame(x, M)
# list
L1 <- list(x, y); L2 <- list(A=x, B=y)
L1
L2
# expression
x <- 3; y <- 2.5; z <- 1
exp1 <- expression(x / (y + exp(z)))
exp1
eval(expr = exp1)
# factor
fac2 <- factor(c("Male", "Female"))
fac2
as.numeric(fac2)
fac <- factor(c(1, 10))
fac
as.numeric(fac)
as.numeric(as.character(fac))
# logical comparison
x <- 0.5
0 < x & x < 1  # R does not allow chained comparisons such as 0 < x < 1
x <- 1:3; y <- 1:3
x == y
x <- 1:10
x[x >= 5] <- 20
x
# object
names(x) <- letters[1:10]  # one name per element; a shorter vector would leave NA names
x
# predefined functions
sum(x)
prod(x)
max(x)
min(x)
mean(x)
median(x)
# graphics
layout(matrix(1:4, 2, 2))
mat <- matrix(1:4, 2, 2)
mat
layout(mat)
layout.show(4)
m <- matrix(c(1:3, 3), 2, 2)
layout(m)
layout.show(3)
plot(x)
boxplot(x)
pie(x)
hist(x)
barplot(x)
x <- rnorm(10)
y <- rnorm(10)
plot(x, y)
# loop
for (i in seq_along(x)) {
  y[i] <- i * 2
}
y
i <- 1                    # R vectors are indexed from 1
while (i <= length(y)) {
  print(y[i])
  i <- i + 1              # reassign i; a bare `i + 1` would loop forever
}
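Loops like the ones above can often be replaced by vectorized expressions, which operate on a whole vector at once. A minimal sketch, with variable names chosen for this example:

```r
values <- rnorm(10)                # ten random draws, as in the plotting example
doubled <- seq_along(values) * 2   # 2, 4, ..., 20 without an explicit loop
print(doubled)

# Iterating over the elements directly avoids index bookkeeping
for (d in doubled) {
  print(d)
}
```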