
Sunday

R programming language introduction


 

R is a programming language and open-source software environment that is widely used for statistical computing, data analysis, and graphics. It was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand. It first appeared in 1993 and was released as free software under the GNU General Public License in 1995. R provides a comprehensive set of tools for manipulating, visualizing, and modelling data, making it a favourite among statisticians, data scientists, researchers, and analysts.

Key Features of R:

1. Data Manipulation: R offers powerful data manipulation capabilities, allowing you to clean, transform, and preprocess data easily. Packages like `dplyr` and `tidyr` provide functions for efficient data wrangling.

2. Statistical Analysis: R provides an extensive range of statistical functions and libraries for performing various types of analyses, including regression, hypothesis testing, ANOVA, and more.

3. Visualization: R is known for its exceptional visualization capabilities. The `ggplot2` package is widely used for creating high-quality, customizable graphs and plots.

4. Machine Learning: R has a growing ecosystem of machine learning libraries, such as `caret`, `randomForest`, and `xgboost`, enabling you to build and train predictive models.

5. Packages and Libraries: R's strength lies in its vast collection of packages and libraries contributed by the R community. These packages cover a wide range of domains, from bioinformatics to finance.

6. Data Import/Export: R can handle various data formats, including CSV, Excel, JSON, and databases. It also provides functions for reading and writing data.

7. Reproducibility: R promotes reproducible research by allowing users to create scripts that document and automate their data analysis and visualization processes.

8. Interactive Environment: R provides an interactive environment through the R Console or integrated development environments (IDEs) like RStudio, making it user-friendly and efficient for data exploration.
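
As a rough illustration of the data manipulation and summary features listed above, the sketch below uses only base R and the built-in `mtcars` dataset, so nothing needs to be installed. The variable names are illustrative choices, and packages such as `dplyr` offer their own functions for the same steps.

```r
# Built-in dataset: fuel economy figures for 32 cars
data(mtcars)

# Data manipulation: keep only cars with 4 or 6 cylinders
small_engines <- subset(mtcars, cyl %in% c(4, 6))

# Summarise: mean miles-per-gallon for each cylinder count
mpg_by_cyl <- aggregate(mpg ~ cyl, data = small_engines, FUN = mean)
print(mpg_by_cyl)

# Quick descriptive statistics for one column
summary(small_engines$mpg)
```

Typing these lines one at a time in the R console is a good way to see what each step returns.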

Where and When to Use R:

1. Data Analysis and Exploration: R is ideal for exploring and analyzing data to gain insights, identify trends, and understand underlying patterns.

2. Statistical Modeling: When you need to perform complex statistical analyses, hypothesis testing, and create statistical models, R is a suitable choice.

3. Data Visualization: If you want to create publication-quality graphs and visualizations, R's `ggplot2` package allows you to customize and control every aspect of your visualizations.

4. Academic and Research Projects: R is extensively used in academia and research for conducting experiments, analyzing data, and presenting findings.

5. Machine Learning and Predictive Modeling: R's machine learning libraries enable you to build predictive models for classification, regression, clustering, and more.

6. Biostatistics and Healthcare: R is popular in the medical and healthcare fields for analyzing clinical trial data, epidemiology, and bioinformatics.

7. Financial Analysis: R is used for quantitative analysis, risk management, and portfolio optimization in the finance industry.

8. Data Science: R plays a significant role in data science projects, where it's used for data preprocessing, feature engineering, modeling, and visualization.
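
To give a feel for the statistical modeling use case, here is a minimal sketch of a linear regression on the built-in `mtcars` dataset. The choice of variables (`mpg` explained by `wt`) and the 3,000 lb car are assumptions made purely for illustration.

```r
# Fit a simple linear regression: fuel economy as a function of weight
fit <- lm(mpg ~ wt, data = mtcars)

# Estimated intercept and slope
coef(fit)

# Full model summary: coefficients, standard errors, R-squared
summary(fit)

# Predict mpg for a hypothetical car weighing 3,000 lbs
# (wt is recorded in thousands of pounds)
predicted <- predict(fit, newdata = data.frame(wt = 3))
print(predicted)
```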

In summary, R is a versatile and powerful tool for data analysis, statistical computing, and visualization. Its wide range of packages and strong statistical capabilities make it a popular choice for individuals and organizations working with data.

To install R and RStudio on your computer, follow these steps:


Installing R:

1. Windows:

   - Visit the CRAN (Comprehensive R Archive Network) website for Windows: https://cran.r-project.org/bin/windows/base/

   - Click on the "Download R for Windows" link.

   - If you are asked to choose a CRAN mirror, pick one near you (the first option, the automatic cloud mirror, is usually fine).

   - Download the executable installer file (e.g., R-4.1.1-win.exe).

   - Run the installer and follow the installation prompts.

2. macOS:

   - Visit the CRAN website for macOS: https://cran.r-project.org/bin/macosx/

   - Download the latest version of R for macOS.

   - Open the downloaded installer package (`.pkg` file).

   - Follow the installation prompts.

3. Linux:

   - Depending on your Linux distribution, you can usually install R using your package manager. For example, on Ubuntu, you can open the terminal and run:

     ```
     sudo apt-get update
     sudo apt-get install r-base
     ```

Installing RStudio:

1. Windows, macOS, Linux:

   - Visit the RStudio download page: https://www.rstudio.com/products/rstudio/download/

   - Scroll down to the "Installers for Supported Platforms" section.

   - Choose the appropriate installer for your operating system (RStudio Desktop Open Source Edition).

   - Download and run the installer.

   - Follow the installation prompts.

Running R and RStudio:

1. R:

   - After installing R, you can run it by opening the R console. On Windows, this is typically called "R GUI" or "Rterm" in the Start Menu. On macOS and Linux, you can open the terminal and type `R` to start the R console.

2. RStudio:

   - After installing RStudio, you can run it by searching for "RStudio" in your application launcher (Windows) or using Spotlight search (macOS). On Linux, you can use the terminal to launch RStudio by typing `rstudio`.

Once you have both R and RStudio installed, you can start using R for data analysis, statistical modeling, and more. RStudio provides a user-friendly interface and enhanced features for working with R scripts, projects, and visualizations.

Remember to periodically check for updates to both R and RStudio to ensure you're using the latest versions with the most up-to-date features and bug fixes.
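
Once R starts, a quick way to confirm the installation and see which version you are running is to ask R itself. This is a small sketch using base R functions only; `ggplot2` appears merely as an example package name.

```r
# Print the version string of the running R installation
print(R.version.string)

# List the packages that are currently installed (first few names only)
installed <- rownames(installed.packages())
head(installed)

# Check whether a package is available before trying to load it
if (requireNamespace("ggplot2", quietly = TRUE)) {
  message("ggplot2 is installed")
} else {
  message('ggplot2 is not installed; install.packages("ggplot2") would fetch it from CRAN')
}
```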


Some example code:

print("R")

x <- 5

y <- 10

total <- x + y

plot(1:total)


var1 <- "machine"

var2 <- "learning"

cat(var1, var2)  # cat() already separates its arguments with a space


# data types

num = 20

intnum = 20L

print(num)

class(num)

print(intnum)

class(intnum)


logic <- TRUE

class(logic)


char <- "a"

class(char)


# converting data types

a <- 25L

class(a)


num1 <- as.numeric(a)

class(num1)

print(num1)


b <- '25'

num2 <- as.numeric(b)

class(num2)

print(num2)


num3 <- as.numeric(TRUE)

num3

num4 <- as.numeric(FALSE)

num4


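num5 <- as.integer(45.564)  # the fractional part is dropped: 45

num5


num6 <- as.integer("a")  # not a number: returns NA with a warning

num6


log1 <- as.logical(56L)  # any non-zero number becomes TRUE

log1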


com1 <- as.complex(234.09)

com1


chr1 <- as.character(345)

chr1
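
A conversion that cannot succeed, such as turning the text "a" into a number, does not stop the script: R returns `NA` and emits a warning. A minimal sketch of checking for that case follows; the helper `to_number` is hypothetical and introduced only for illustration.

```r
# Hypothetical helper: convert text to numeric and report failures
to_number <- function(txt) {
  # suppressWarnings() hides the "NAs introduced by coercion" warning
  value <- suppressWarnings(as.numeric(txt))
  # A failure is an NA in the result where the input itself was not NA
  if (any(is.na(value) & !is.na(txt))) {
    message("some values could not be converted")
  }
  value
}

ok <- to_number("25")               # a valid number
mixed <- to_number(c("3.5", "abc")) # second element becomes NA

print(ok)
print(mixed)
```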


# conditional: if / else

x  # x still holds 5 from the first example

if (x %% 2 == 0) {

  print("The number is even")

} else {

  print("The number is odd")

}


# operators

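a <- 5

b <- 10

print(a + b)    # addition

print(a - b)    # subtraction

print(a * b)    # multiplication

print(b / a)    # division

print(b %% a)   # remainder (modulo)

print(b %/% a)  # integer division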


c1 <- c(10, 20, 30)

c1


x <- 2

while(x < 6) {

  print(x)

  x <- x + 1

}


repeat {

  print(x)

  x <- x + 1

  if(x > 10) {

    break

  }

}


# next and break
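
`next` skips the rest of the current iteration, while `break` leaves the loop altogether. A minimal sketch (the numbers are arbitrary):

```r
# Print the odd numbers from 1 to 10, stopping once a value exceeds 7
for (i in 1:10) {
  if (i %% 2 == 0) {
    next   # skip even numbers and continue with the next iteration
  }
  if (i > 7) {
    break  # leave the loop entirely
  }
  print(i)
}
```

The loop prints 1, 3, 5 and 7, then stops when `i` reaches 9.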





# reading from user input

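# readline() waits for keyboard input, so run these lines in an interactive session

age <- readline("What is your age? ")  # type a value, e.g. 49, and press Enter

age  # the answer is stored as text, not as a number

nam <- readline("What is your name? ")

nam


print(paste("Hello, my name is", nam, "and my age is", age))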


# function

new_func <- function() {

  for(i in 1:5) {

    print(i + 2)

  }

}


new_func()


func2 <- function(x, y) {

  res <- x * y

  print(paste("x:", x, "y:", y))  # paste() inserts a space between its arguments

  print(paste("res:", res))

}


func2(5, 8)
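
The functions above report their results with `print()`. To reuse a result in later calculations, a function normally returns it instead. A short sketch, where `rect_area` and its arguments are made-up names for illustration:

```r
# A function with a default argument that returns its result
rect_area <- function(width, height = 1) {
  area <- width * height
  return(area)  # the last evaluated expression would also be returned
}

a1 <- rect_area(5, 8)                   # both arguments given by position
a2 <- rect_area(5)                      # height falls back to its default of 1
a3 <- rect_area(height = 2, width = 3)  # arguments matched by name

print(c(a1, a2, a3))
```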


# substring

a <- "dhiraj patra"

res <- substr(a, 3, 7) # characters 3 to 7, both ends inclusive

print(toupper(res))


# regular expression

s2 <- c("abc", "abcde", "abcdef")

pat <- '^abc'

print(grep(pat, s2))
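
`grep()` has a few close relatives that are handy to know. The sketch below uses a slightly different vector (called `words` here, an illustrative name) that includes one non-matching element, so the difference between the functions is visible.

```r
words <- c("abc", "abcde", "xabc")

# grep() returns the positions of the matching elements
positions <- grep("^abc", words)

# value = TRUE returns the matching strings themselves
matches <- grep("^abc", words, value = TRUE)

# grepl() returns one TRUE/FALSE per element
flags <- grepl("^abc", words)

# sub() replaces the first match in each element
replaced <- sub("^abc", "ABC", words)

print(positions)
print(matches)
print(flags)
print(replaced)
```

Because `^` anchors the pattern to the start of the string, "xabc" is not matched and is left unchanged by `sub()`.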


# list the objects in the current workspace

ls()

# only objects whose names contain "n"

ls(pattern = "n")

ls.str()

n <- 0.5

n1 <- 10

n2 <- 100

nam <- "Carmen"


# dataframe

M <- data.frame(n1, n2, n)

ls.str(pattern = "M")  # show the structure of objects whose names contain "M"


getwd()

setwd("/Users/Admin/Desktop/personal/courses/EICT_academy_IIT_Guhati/R")  # change this to a folder on your own machine

my_data <- read.csv("sales_Data.csv")
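
`read.csv()` above expects a file named `sales_Data.csv` in the working directory. As a self-contained sketch that does not depend on that file, the example below writes a small made-up data frame to a temporary CSV file and reads it back. The column names and values are invented for illustration.

```r
# Made-up example data (not the contents of sales_Data.csv)
sales <- data.frame(
  region = c("north", "south", "east"),
  units  = c(12L, 7L, 30L)
)

# Write to a temporary file so that no real file is overwritten
csv_path <- tempfile(fileext = ".csv")
write.csv(sales, csv_path, row.names = FALSE)

# Read it back and inspect the structure
sales_in <- read.csv(csv_path)
str(sales_in)
head(sales_in)

# Clean up the temporary file
unlink(csv_path)
```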


x <- 1:30

print(x[1:30])


# sequence 

y <- seq(1, 5, 0.5)

y


# take the data from keyboard

z <- scan()

z


# Gaussian (normally distributed) random numbers

g <- rnorm(n1, mean = 0, sd = 1)  # draw n1 = 10 values; the count must be a whole number

g


# factors

factor(1:3)

factor(1:10, exclude = 5)


# matrix

matrix(1:6, 2, 3)

matrix(1:6, 2, 3, byrow = TRUE)
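
Once a matrix exists, elements are addressed as `[row, column]`. A brief sketch of indexing and two common operations, using the same 2 x 3 matrix as above stored under the illustrative name `m`:

```r
m <- matrix(1:6, nrow = 2, ncol = 3)  # filled column by column

dim(m)                   # number of rows and columns
corner <- m[2, 3]        # element in row 2, column 3
second_col <- m[, 2]     # entire second column
m_t <- t(m)              # transpose: 3 rows, 2 columns
gram <- m %*% m_t        # matrix product, a 2 x 2 result

print(corner)
print(second_col)
print(gram)
```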


# dimension

x <- 1:15

x

dim(x)  # NULL: a plain vector has no dimensions

dim(x) <- c(5, 3)  # reshape into 5 rows and 3 columns

x


x <- 1:4; n <- 10; M <- c(10, 35); y <- 2:4

data.frame(x, n)  # n (length 1) is recycled to fill 4 rows

data.frame(x, M)  # M (length 2) is recycled twice


# list

L1 <- list(x, y); L2 <- list(A=x, B=y)

L1

L2


# expression

x <- 3; y <- 2.5; z <- 1

exp1 <- expression(x / (y + exp(z)))

exp1

eval(expr = exp1)


# factor

fac2 <- factor(c("Male", "Female"))

fac2

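as.numeric(fac2)  # 2 1: the internal level codes ("Female" = 1, "Male" = 2)


fac <- factor(c(1, 10))

fac

as.numeric(fac)  # 1 2: level codes, not the values 1 and 10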

as.numeric(as.character(fac))
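
The factor examples above show a common pitfall: `as.numeric()` on a factor returns the internal level codes, not the original values. A short sketch with made-up readings:

```r
readings <- factor(c("10", "2.5", "10", "7"))

# Levels are sorted as text, so "10" comes before "2.5"
levels(readings)

# as.numeric() returns the position of each value in levels()
codes <- as.numeric(readings)

# Converting through character recovers the original numbers
values <- as.numeric(as.character(readings))

# An equivalent idiom that converts each level only once
values2 <- as.numeric(levels(readings))[readings]

print(codes)
print(values)
```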


# logical comparison

x <- 0.5

0 < x & x < 1  # comparisons cannot be chained, so combine two tests with &

x <- 1:3; y <- 1:3

x == y

x <- 1:10

x[x >= 5] <- 20

x


# naming the elements of a vector

names(x) <- c("a", "b", "c")  # x has 10 elements, so the remaining 7 names are NA

x


# predefined functions

sum(x)

prod(x)

max(x)

min(x)

mean(x)

median(x)


# graphics

layout(matrix(1:4, 2, 2))

mat <- matrix(1:4, 2, 2)

mat

layout(mat)

layout.show(4)


m <- matrix(c(1:3, 3), 2, 2)

layout(m)

layout.show(3)


plot(x)

boxplot(x)

pie(x)

hist(x)

barplot(x)


x <- rnorm(10)

y <- rnorm(10)

plot(x, y)


# loop

for (i in seq_along(x)) {

  y[i] <- i * 2

}

y
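
The `for` loop above fills `y` one element at a time. In R the same result can often be computed without an explicit loop, because arithmetic works on whole vectors. A minimal sketch follows; `empty` and `squares` are illustrative names.

```r
x <- rnorm(10)

# Vectorized: double every index in one step, no explicit loop
y <- seq_along(x) * 2

# seq_along() is a safe way to loop over positions: for an empty vector
# it yields an empty sequence, whereas 1:length(x) would give c(1, 0)
empty <- numeric(0)
for (i in seq_along(empty)) {
  print("never reached")
}

# sapply() applies a function to each element and collects the results
squares <- sapply(y, function(v) v^2)

print(y)
print(squares)
```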


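i <- 1  # R vectors are indexed from 1

while (i <= 10) {

  print(y[i])

  i <- i + 1  # the result must be assigned back, otherwise the loop never ends

}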