R-Basic Part V: Control Structures, Scoping in R, Dates and Times
Highlights:
- If-else expression
- For-loop, While-loop, and Repeat-loop
- Writing a Function and Return Value
- Lexical vs. Dynamic Scoping
- Dates and Times in R
Control Structures in R
Control structures in R allow us to control the flow of an R program. Below is a list of some common basic constructs and a brief description of their jobs.
- if, else: testing a logical condition
- for: executing a loop a certain number of times
- while: executing a loop as long as a condition remains true
- repeat: executing an infinite loop
- break: breaking the execution of a loop
- next: skipping an iteration of a loop
- return: exit a function
If-else
The if statement combined with else allows programmers to test a logical condition and let R do something based on whether the condition is true or false. The else part is optional; we use it only if we want R to do something else when the condition is not met. There are three kinds of clauses in this construct: if, else if, and else. The if clause comes at the beginning and else always at the end (if we decide to use one). We can have as many else if clauses as we need. There are a couple of different ways we can formulate an if-else statement. For example:
# Example 1
a <- 21 # Variable /a/ has a value of 21
if(a < 20){ #If a is less than 20
b <- 0 #then b is 0
} else{ # otherwise
b <- print("👁️You Hit the Bulls Eye!") # good job
}
[1] "👁️You Hit the Bulls Eye!"
# Example 2
c <- 15
d <- if(c <= 16){
print("Can you go up?")
} else{
print("🤩 You are my man!")
}
[1] "Can you go up?"
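If-else statements are not reusable functions, though: we cannot pass a new input value to them the way we call a function. To test a different value, we have to reassign the variable and run the whole statement again.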
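Because an if-else statement runs once for whatever value its input currently holds, one way to reuse the same test on new values is to wrap it in a function. The sketch below assumes a hypothetical function name, check_score, and a hypothetical cutoff argument; neither comes from the examples above.

```r
# check_score and cutoff are hypothetical names used only for illustration
check_score <- function(score, cutoff = 16) {
  # if-else is an expression, so its value becomes the function's return value
  if (score <= cutoff) {
    "Can you go up?"
  } else {
    "You are my man!"
  }
}

low_result  <- check_score(15)  # 15 <= 16, so the first branch is used
high_result <- check_score(20)  # 20 > 16, so the else branch is used
print(low_result)
print(high_result)
```

Wrapping the test this way keeps the condition in one place, so a new input only needs a new function call.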
If…else Ladder
The if-else statement can be extended further to accommodate other conditions by using else if in between. Remember that the statement still starts with if and ends with else. We can have one or more else if expressions in a single statement. The if...else ladder takes the following general structure:
if(test_condition1){
action_1
} else if(test_condition2){
action_2
} else if(test_condition3){
action_3
} else if(test_condition4){
action_4
} else {
action_5
}
Let’s look at an example with one else if expression:
x <- 4 # the value of x is 4
if (x > 4){ # if x is higher than 4
print("Too high dear!") # then, print this
} else if(x < 4){ #if x is less than 4,
print("You are a loser!") # then, print this
} else # otherwise,
print("x is a square number!") #print this
[1] "x is a square number!"
We assigned the value of 4 to x, thus, the outcome was “x is a square number!”. Now, let’s check a statement that has more than one else if condition.
x <- c("This","is","where","salmons","come","to","breed",".")
if("tuna" %in% x){
print("Oh, yeah!! We are learning about tuna fish!")
} else if("Alaska" %in% x){
print("Alaska!! My god, it's cold in there!")
} else if("breed" %in% x){
print("Oh, well!! Elon Musk says human population is declining!")
} else if("Ukraine" %in% x){
print("I am writing this code when the Ukraine and Russia turf war is at an all-time high!")
} else
print("I am the God!!")
[1] "Oh, well!! Elon Musk says human population is declining!"
If-else ladders are very useful in exploratory data analysis and even in data visualization, where we need to set conditions and then calculate a series of values or draw plots and charts for each group.
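As a rough sketch of group-wise analysis, the code below splits the Temp column of the built-in airquality data into groups and then summarizes another column by group. The if statement tests only a single TRUE or FALSE value, so for a whole column the vectorized ifelse() function is used instead. The cutoffs of 65 and 85 and the group labels are assumptions chosen only for illustration.

```r
temp <- airquality$Temp

# Vectorized ladder: every element of temp is tested in turn
temp_group <- ifelse(temp > 85, "hot",
              ifelse(temp < 65, "cool", "mild"))

# Number of days in each group
group_counts <- table(temp_group)
print(group_counts)

# Mean wind speed within each temperature group
wind_by_group <- tapply(airquality$Wind, temp_group, mean)
print(wind_by_group)
```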
For-loop
This is the most common type of loop in R. It takes a loop index (iterator variable) and assigns it successive values from a sequence or vector, running the loop body once for each value. Let’s take an example:
# A loop that runs i from 1 through 10 and prints i + (i-1), i.e., 1 + (1-1) = 1 all the way to 10 + (10-1) = 19.
for (i in 1:10){
print(i + (i-1))
}
[1] 1
[1] 3
[1] 5
[1] 7
[1] 9
[1] 11
[1] 13
[1] 15
[1] 17
[1] 19
# A loop that goes through letters 5 (that is: E) to 15 (i.e., O) and prints them
b <- LETTERS[5:15]
for(i in seq_along(b)){
print(b[i])
}
[1] "E"
[1] "F"
[1] "G"
[1] "H"
[1] "I"
[1] "J"
[1] "K"
[1] "L"
[1] "M"
[1] "N"
[1] "O"
# A loop that prints the elements of the vector 'C' in the order in which they appear.
C <- c(1,9,3,2,4,5,7,8,6,4,5,3,4,1,0,0,0,5,3)
for(i in seq_along(C)){
print(C[i])
}
[1] 1
[1] 9
[1] 3
[1] 2
[1] 4
[1] 5
[1] 7
[1] 8
[1] 6
[1] 4
[1] 5
[1] 3
[1] 4
[1] 1
[1] 0
[1] 0
[1] 0
[1] 5
[1] 3
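The seq_along() call used in the loop above is generally a safer index than 1:length(x), because 1:0 counts down and produces the values 1 and 0 when the vector is empty. A minimal sketch, using hypothetical counter names:

```r
empty <- numeric(0)

# 1:length(empty) is 1:0, i.e. c(1, 0), so the body runs twice by mistake
count_unsafe <- 0
for (i in 1:length(empty)) {
  count_unsafe <- count_unsafe + 1
}

# seq_along(empty) is an empty sequence, so the body never runs
count_safe <- 0
for (i in seq_along(empty)) {
  count_safe <- count_safe + 1
}

print(count_unsafe)
print(count_safe)
```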
# A function that calculates the mean of a vector; if the vector has a missing value, it issues a warning and returns a message instead.
mean_func <- function(x){
my_mean <- mean(x)
if(any(is.na(x))){
warning("This variable has missing values!")
return("Fix the missing values")
} else{
return(my_mean)
}
}
mean_func(iris$Sepal.Width)
[1] 3.057333
mean_func(cars$dist)
[1] 42.98
mean_func(airquality$Ozone)
Warning in mean_func(airquality$Ozone): This variable has missing values!
[1] "Fix the missing values"
I know that the cars and iris datasets don’t have any missing values, at least in the noted variables, so I got the means of those variables. The airquality dataset, though, has some missing values. My mean_func function detected them and produced the warning message and the result that I specified in the function.
It is a convention to enclose the body of a for loop in curly braces. However, when the body is a single expression we can write the loop without them. Please note that for loops without curly braces are more error prone, and they cannot accommodate compound statements (i.e., bodies made of more than one expression). Here’s an example of a simple for loop without a pair of curly braces.
sentence <- c("Salmon","returns","to","its","birthplace","to","breed", "and", "die", ".")
# for loop without curly braces
for(words in sentence)
print(words)
[1] "Salmon"
[1] "returns"
[1] "to"
[1] "its"
[1] "birthplace"
[1] "to"
[1] "breed"
[1] "and"
[1] "die"
[1] "."
Nested for-loops
Sometimes we may want to embed a for loop within another for loop. We have to be careful while nesting, because more than 2 or 3 levels can be very hard to read and understand. Most of the time there are ways to avoid writing nested for loops, but sometimes they are necessary. Example:
a <- matrix(1:15, 5,3) # A matrix of 5 rows and 3 columns
# Creating a nested loop
for(i in seq_len(nrow(a))){ # for each row index i
for(j in seq_len(ncol(a))){ # for each column index j
print(a[i,j]) # print the element in row i, column j
}
}
[1] 1
[1] 6
[1] 11
[1] 2
[1] 7
[1] 12
[1] 3
[1] 8
[1] 13
[1] 4
[1] 9
[1] 14
[1] 5
[1] 10
[1] 15
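As one possible way to avoid the nested loop, the sketch below reproduces the same row-by-row order with a transpose and then uses apply() for per-row and per-column work. The variable names are hypothetical.

```r
a <- matrix(1:15, 5, 3)

# t(a) swaps rows and columns; flattening it lists the elements of a
# row by row, the same order the nested loop prints them in
row_by_row <- as.vector(t(a))
print(row_by_row)

# apply() runs a function over rows (margin 1) or columns (margin 2)
row_totals <- apply(a, 1, sum)
col_totals <- apply(a, 2, sum)
print(row_totals)
print(col_totals)
```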
While loop
While is another looping construct in R. It first tests the given condition; if the condition is TRUE, the loop body is executed and the condition is tested again, and this repeats until the condition becomes FALSE. While loops can potentially result in infinite loops if not written properly.
my_number <- 0 #Starts with the number of zero
while(my_number < 10){ #keep looping as long as my_number is less than 10
my_number <- my_number + 1 # and add 1 to the number
print(my_number)#Print the number
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
We can also test multiple conditions in a while loop by combining them with logical operators such as &&, like the one below:
b <- 8 #The assigned value of b
while(b >= 2 && b <= 10){ #keep looping while b is between 2 and 10
print(b) #print the current value
flip_a_coin <- rbinom(1, 1, 0.5) #then flip a fair coin once,
if(flip_a_coin == 1){ # if the coin turns up heads (or 1)
b <- b + 2 # add 2 to the value of b
}else{ # otherwise
b <- b - 1 #subtract 1 from b
}
}
[1] 8
[1] 10
[1] 9
[1] 8
[1] 10
[1] 9
Here, I was not sure when the while loop was going to end, because no random seed was set and the path of b depends on the coin flips. We can see that the value goes up and down based on the outcome of each flip.
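A small variation on the coin-flip loop, sketched under two assumptions that are not part of the original example: a fixed seed (42) so the run can be repeated, and an iteration cap (max_steps) as a guard against a loop that never ends.

```r
set.seed(42)        # assumed seed, only to make the run repeatable
b <- 8
steps <- 0
max_steps <- 1000   # assumed safety cap on the number of iterations

while (b >= 2 && b <= 10 && steps < max_steps) {
  coin <- rbinom(1, 1, 0.5)             # 1 = heads, 0 = tails
  b <- if (coin == 1) b + 2 else b - 1  # move up on heads, down on tails
  steps <- steps + 1
}

print(b)      # first value outside the range 2 to 10 (or the value at the cap)
print(steps)  # number of coin flips that were needed
```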
Repeat, Next, & Break
- Repeat is a construct that initiates an infinite loop. Repeat loops are not commonly used in statistics. The only way to exit a repeat loop is to call break.
min_value <- 1
max_value <- 500
repeat{
estimates <- computeEstimate() # computeEstimate() is a placeholder, not a built-in R function
if(abs(estimates - min_value) < max_value){
break
} else{
min_value <- estimates
}
}
I tried the above code, but it did not run: R complained that it could not find the function computeEstimate(), which is only a placeholder for an estimation routine that we would have to write ourselves.
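For a repeat loop that does run, here is a minimal sketch in which a hypothetical helper, compute_estimate(), stands in for the missing computeEstimate(). It is assumed to take one Babylonian (Newton) step toward the square root of a number; the tolerance and the iteration cap are also assumptions made for illustration.

```r
# compute_estimate is a hypothetical stand-in for computeEstimate():
# one Babylonian (Newton) step toward the square root of target
compute_estimate <- function(current, target) {
  0.5 * (current + target / current)
}

target <- 2
old_value <- 1
tolerance <- 1e-8
max_iterations <- 100   # guard so the loop cannot run forever
iterations <- 0

repeat {
  new_value <- compute_estimate(old_value, target)
  iterations <- iterations + 1
  # stop when two successive estimates agree, or when the cap is reached
  if (abs(new_value - old_value) < tolerance || iterations >= max_iterations) {
    break
  }
  old_value <- new_value
}

print(new_value)   # approximation of sqrt(2)
print(iterations)
```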
- Next is used in any type of looping construct when we want to skip an iteration. I am going to create a for loop that runs for 15 iterations, and use a next statement to bypass certain cases:
for(i in 1:15){
if(i <= 11){ # I want to skip first 11 iterations
next # skip ahead; printing starts at the 12th iteration
}
print(i-1)# and print i-1
}
[1] 11
[1] 12
[1] 13
[1] 14
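The break statement can be used in a for loop as well, and it combines naturally with next. In the sketch below (variable names are hypothetical), next skips the even numbers and break ends the loop as soon as a square exceeds 100.

```r
squares <- numeric(0)

for (i in 1:15) {
  if (i %% 2 == 0) {
    next             # skip even numbers and move to the next iteration
  }
  if (i^2 > 100) {
    break            # leave the loop entirely at the first square above 100
  }
  squares <- c(squares, i^2)
}

print(squares)
```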
Writing functions in R and using them in our analyses
Creating a function that calculates the means of all the columns in a data table.
column_wise_mean <- function(m){
number_of_column <- ncol(m) #counting the number of columns in the data frame
calculated_means <- numeric(number_of_column) #creating an empty numeric vector to store the calculated means
for(i in seq_len(number_of_column)){ #for 1 through the number of columns
calculated_means[i] <- mean(m[,i]) #calculates mean by columns in m
}
calculated_means #returns the calculated_mean
}
# Checking the function on the airquality data set
column_wise_mean(airquality)
[1] NA NA 9.957516 77.882353 6.993464 15.803922
The result shows the means of the last four columns and returns NA for the first two columns. A further look at these two columns shows that they contain multiple missing values.
Calculating columnwise Standard Deviation after getting rid of NAs
column_wise_sd <- function(n, removeNA = TRUE){
number_of_column <- ncol(n)
calculated_sd <- numeric(number_of_column)
for(i in 1:number_of_column){
calculated_sd[i] <- sd(n[,i], na.rm = removeNA)
}
calculated_sd
}
# Checking the function on the airquality data set
column_wise_sd(airquality)
[1] 32.987885 90.058422 3.523001 9.465270 1.416522 8.864520
Once I omitted the missing values from the calculations, I was able to calculate the standard deviation for all the columns in the airquality dataset.
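The same column-wise summaries can also be written without an explicit loop. A short sketch using sapply(), which applies a function to every column of a data frame and passes extra arguments such as na.rm along; the result names are hypothetical.

```r
# Column-wise mean and standard deviation, ignoring missing values
col_means <- sapply(airquality, mean, na.rm = TRUE)
col_sds   <- sapply(airquality, sd, na.rm = TRUE)

print(col_means)
print(col_sds)
```

Unlike the loop versions, the results keep the column names, which makes the output easier to read.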
The “…” Argument in R
The … argument indicates a variable number of arguments that are usually passed on to other functions. It is often used when extending another function and we don’t want to copy the entire argument list of the original function. For example:
plot_line <- function(calculated_mean, calculated_sd, type = "l",...){
plot(calculated_mean, calculated_sd, type = type, ...)
}
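Since plot_line draws a chart, its result is hard to inspect in the console. The sketch below shows the same forwarding idea with a hypothetical helper, describe_column, that hands everything in ... to mean() and sd().

```r
# describe_column is a hypothetical helper used only for illustration
describe_column <- function(x, digits = 2, ...) {
  # Everything captured by ... is handed unchanged to mean() and sd(),
  # so it must be an argument both functions accept (such as na.rm)
  center <- mean(x, ...)
  spread <- sd(x, ...)
  round(c(mean = center, sd = spread), digits)
}

with_na    <- describe_column(airquality$Ozone)                # NA for both
without_na <- describe_column(airquality$Ozone, na.rm = TRUE)  # NAs dropped
print(with_na)
print(without_na)
```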
#Search order in R: if we accidentally create our own function with the same name as an existing function (e.g., mean in the base package), R finds our version first, because it searches the global environment before the attached packages
search()
[1] ".GlobalEnv" "package:stats" "package:graphics"
[4] "package:grDevices" "package:utils" "package:datasets"
[7] "package:methods" "Autoloads" "package:base"
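A small sketch of what that search order means in practice: a user-defined function called mean masks the one in the base package until it is removed. The replacement function here is deliberately silly and exists only for illustration.

```r
# Our own mean() is stored in the current workspace, which R searches first
mean <- function(x) "my own mean"
masked_result <- mean(c(1, 2, 3))        # finds our version

# The package version is still reachable with the :: operator
base_result <- base::mean(c(1, 2, 3))

# Removing our version restores the usual behaviour
rm(mean)
restored_result <- mean(c(1, 2, 3))

print(masked_result)
print(base_result)
print(restored_result)
```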
Scoping in R
The scope of a variable in any programming language is the part of the code in which that variable can be looked up and referenced when needed. There are two basic approaches to scoping, i.e., lexical (aka static) and dynamic. R uses lexical scoping: the value of a free variable (one that is neither an argument nor defined inside the function) is searched for in the environment where the function was defined, then in its parent environments, down to the global environment and the attached packages. For example, if we call log() to take the natural logarithm (ln) of a number, R walks along this search path until it finds log in the base package. Dynamic scoping, on the other hand, looks up the value in the environment from which the function was called.
# Function within a function
raised_power <- function(x){ #takes the value of x
p_wer <- function(y){ #takes the value of y
y^x #raises y to the power of x
}
p_wer
}
# Creating other functions from raised_power
square_function <- raised_power(2)
cube_function <- raised_power(3)
quad_function <- raised_power(4)
# Testing my work with a base of 2
square_function(2)
[1] 4
cube_function(2)
[1] 8
quad_function(2)
[1] 16
# Testing my work with a base of 10
square_function(10)
[1] 100
cube_function(10)
[1] 1000
quad_function(10)
[1] 10000
Checking the Function Environment (Function Dissection)
Now, let’s check how the above functions work:
# The Square Function
ls(environment(square_function))
[1] "p_wer" "x"
get("x", environment(square_function))
[1] 2
# The Cube Function
ls(environment(cube_function))
[1] "p_wer" "x"
get("x", environment(cube_function))
[1] 3
# The Quad Function
ls(environment(quad_function))
[1] "p_wer" "x"
get("x", environment(quad_function))
[1] 4
The readily available R functions live in packages that are attached to the search path, thus, they are easily available in the user workspace. Under lexical scoping, R looks up the value of a free variable in the environment where the function was defined; under dynamic scoping, it would look up the value in the environment from which the function was called, which is sometimes called the calling environment, aka the parent frame. The example below shows the difference. The first value of b, i.e., 41, is defined in the global environment, while the second value, 21, is defined in the local environment of func. Because g was defined in the global environment, it uses b = 41 even though it is called from inside func.
b <- 41 #the value of b is 41
func <- function(t){ # Creating a function named func which takes an argument t
b <- 21 # Then it assigns b the value 21
b^2 + g(t)#It, then squares b, and adds g of t
}
g <- function(t){# Defining the g function
t * b # it multiplies t with b
}
func(5)# 5 is the value of t
[1] 646
The value of func(5) is 646. How? Here’s how:
- func(t) = b^2 + g(t) = 21 * 21 + g(t) = 441 + g(t)
- g(t) = t * b = 5 * 41 = 205, because g looks up b in the global environment, where g was defined
- so, the value of func(5) = 441 + 205 = 646
Likewise, let’s hand calculate the value of func(1)
- func(t) = b^2 + g(t) = 21 * 21 + g(t) = 441 + g(t)
- g(t) = t * b = 1 * 41 = 41
- so, the value of func(1) = 441 + 41 = 482
Let’s check if that’s the case
func(1)
[1] 482
Exactly.
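To see what dynamic scoping would have produced, the sketch below imitates it by explicitly reading b from the calling environment with parent.frame(). The names g_lexical, g_dynamic, and func_demo are hypothetical, and the dynamic version is only an emulation, since R itself always uses lexical scoping.

```r
b <- 41

# Lexical scoping (R's rule): b is looked up where g_lexical was defined
g_lexical <- function(t) {
  t * b
}

# Emulated dynamic scoping: b is read from the environment of the caller
g_dynamic <- function(t) {
  caller <- parent.frame()
  t * get("b", envir = caller)
}

func_demo <- function(t) {
  b <- 21
  c(lexical = b^2 + g_lexical(t), dynamic = b^2 + g_dynamic(t))
}

result <- func_demo(5)
print(result)   # lexical: 441 + 5 * 41 = 646; dynamic: 441 + 5 * 21 = 546
```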
Optimization
As data scientists, we all write regular functions that manipulate data or do some calculations. One especially useful application of the scoping rules combined with functions is optimization.
- Optimization routines in R like optim, nlm, and optimize require us to pass a function whose argument is a vector of parameters (e.g., a log-likelihood)
- However, an objective function might depend on many other things besides its parameters (like data)
- When writing software which does optimization, it may be desirable to allow the user to hold certain parameters fixed.
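The points above can be sketched with a constructor function that returns the objective function. The example assumes normally distributed data with a standard deviation held fixed at 2, simulated values, and a hypothetical constructor name, make_neg_loglik; it is an illustration of the idea rather than a general-purpose routine.

```r
# make_neg_loglik is a hypothetical constructor used only for illustration
make_neg_loglik <- function(data, fixed_sd) {
  # The returned function has a single argument (the parameter to optimize);
  # data and fixed_sd are found through lexical scoping in the environment
  # created by this call
  function(mu) {
    -sum(dnorm(data, mean = mu, sd = fixed_sd, log = TRUE))
  }
}

set.seed(1)                           # assumed seed for repeatable data
x <- rnorm(100, mean = 1, sd = 2)     # simulated data

neg_loglik <- make_neg_loglik(x, fixed_sd = 2)

# optimize() searches the interval for the mu that minimizes the function
fit <- optimize(neg_loglik, interval = c(-5, 5))
print(fit$minimum)                    # estimated mean
print(mean(x))                        # sample mean, for comparison

# The data and the fixed parameter travel with the function
print(ls(environment(neg_loglik)))
```

With the standard deviation fixed, the estimate should land very close to the sample mean, and the optimizer never needs to know about the data.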
Coding Standards for R (Dr. Peng)
- Always use text files/text editor
- Indent the code
- Limit the width of the code (80 columns?)
- Indenting improves readability
- Fixing line length (80 columns) prevents lots of nesting and very long functions
- Suggested: Indents of 4 spaces at minimum; 8 spaces ideal
- Limit the lengths of functions
Date and Times in R
Dates and times are regarded as separate kinds of data in R. R has a special representation of dates and times:
- Dates are represented by the **Date** class
- Times are represented by the **POSIXct** or the **POSIXlt** class
- Dates are stored internally as the number of days since 1970-01-01
- Times are stored internally as the number of seconds since 1970-01-01
There are a number of functions that work on dates and times. For example:
- weekdays: give the day of the week
- months: give the month name
- quarters: give the quarter number(“Q1”, “Q2”, “Q3”, “Q4”)
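A quick sketch of these functions on a fixed date, created here with as.Date(). The names returned by weekdays() and months() depend on the locale of the R session, so the text in the comments assumes an English locale.

```r
d <- as.Date("2022-02-17")

print(weekdays(d))     # "Thursday" in an English locale
print(months(d))       # "February" in an English locale
print(quarters(d))     # "Q1"

# Internal representation: number of days since 1970-01-01
days_since_epoch <- as.numeric(d)
print(days_since_epoch)   # 19040
```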
Dates are represented by the Date class and can be coerced from a character string using the as.Date() function. Times can be coerced from a character string using the as.POSIXct or as.POSIXlt functions.
tme <- Sys.time()#Generates the system time
tme
[1] "2022-02-17 21:20:39 CST"
# Using as.POSIXlt
tme_1 <- as.POSIXlt(tme)
tme_1
[1] "2022-02-17 21:20:39 CST"
# Checking the names in tme_1
names(unclass(tme_1))
[1] "sec" "min" "hour" "mday" "mon" "year" "wday" "yday"
[9] "isdst" "zone" "gmtoff"
# Accessing only the second values
tme_1$sec
[1] 39.03534
# Accessing only the values in hour
tme_1$hour
[1] 21
A POSIXct object does not have these list components, because it stores the time as a single number. Thus, when we try to do the above things on a POSIXct object, we get either an error message or some non-useful information. For example:
present_time <- Sys.time()
present_time
[1] "2022-02-17 21:20:39 CST"
#Using the unclass function
unclass(present_time)
[1] 1645154439
# Accessing only the second values: present_time$sec (gave the error message)
# Accessing only the values in hour: present_time$hour (gave the error message)
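To get at the individual components of a POSIXct value, we can either format it or convert it to POSIXlt first. A minimal sketch, assuming a fixed time in the UTC time zone so that the result does not depend on the local settings:

```r
present_time <- as.POSIXct("2022-02-17 21:20:39", tz = "UTC")

# Option 1: format() returns the component as text
hour_text <- format(present_time, "%H")

# Option 2: convert to POSIXlt, whose list components can be read with $
as_list <- as.POSIXlt(present_time)
hour_number <- as_list$hour
second_number <- as_list$sec

print(hour_text)
print(hour_number)
print(second_number)
```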
strptime Function
The strptime function converts character strings that contain dates and times in different formats into date-time (POSIXlt) objects, given a format string that describes how the text is laid out. Example:
string_date <- c("May 5, 2021 14:04", "August 01, 2020 11:51")
new_date <- strptime(string_date, "%B %d, %Y %H:%M")
new_date
[1] "2021-05-05 14:04:00 CDT" "2020-08-01 11:51:00 CDT"
class(new_date)
[1] "POSIXlt" "POSIXt"
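The format string has to match the layout of the text; when it does not, strptime returns NA instead of raising an error. A short sketch with purely numeric formats (so that it does not depend on the locale's month names) and an assumed UTC time zone:

```r
# %d = day, %m = month number, %Y = four-digit year, %H:%M = hours:minutes
parsed <- strptime("01/08/2020 11:51", format = "%d/%m/%Y %H:%M", tz = "UTC")

# format() goes the other way, from a date-time back to text
iso_text <- format(parsed, "%Y-%m-%d %H:%M")
print(iso_text)

# A layout that does not match the format string gives NA
mismatch <- strptime("2020-08-01", format = "%d/%m/%Y", tz = "UTC")
print(is.na(mismatch))
```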
Operations on Dates and Times: We can use arithmetic operations (addition and subtraction) and comparisons (==, <=, etc.) on dates and times.
#Number of days between the two dates above
total_days <- new_date[1] - new_date[2]
total_days
Time difference of 277.0924 days
#Creating new dates
dat1 <- as.Date("2021-05-05")
dat1 <- as.POSIXlt(dat1)
dat2 <- strptime("26 July 2020 5:26:26", "%d %b %Y %H:%M:%S")
# Subtract
dat1 - dat2
Time difference of 282.565 days
These operations are helpful because they keep track of leap years, daylight saving time, and even time zones for us.
a <- as.Date("2020-09-16")
b <- as.Date("2021-05-02")
a-b
Time difference of -228 days
#Calculating Time Difference between These Two Dates
a <- as.POSIXct("2020-09-16 01:00:00")
b <- as.POSIXct("2021-05-02 06:00:00", tz = "GMT")
b-a
Time difference of 228 days
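When the unit of the difference matters, difftime() lets us ask for it explicitly instead of relying on the unit R picks. The sketch below assumes both times are in UTC and uses hypothetical variable names; it also shows a comparison and a date sequence that crosses a leap day.

```r
start_time <- as.POSIXct("2020-09-16 01:00:00", tz = "UTC")
end_time   <- as.POSIXct("2021-05-02 06:00:00", tz = "UTC")

# Same difference expressed in two different units
gap_hours <- difftime(end_time, start_time, units = "hours")
gap_days  <- difftime(end_time, start_time, units = "days")
print(gap_hours)
print(gap_days)

# Comparison operators work on date-times as well
is_later <- end_time > start_time
print(is_later)

# Date sequences handle the leap day in February 2020 automatically
leap_days <- seq(as.Date("2020-02-27"), by = "day", length.out = 4)
print(leap_days)
```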