§2024-12-04

In R, data structures are essential for organizing, storing, and manipulating data. R provides a variety of data structures that cater to different needs, such as vectors, matrices, arrays, data frames, and lists. Below is an explanation of these key data structures in R:

  1. Vector
# Creation: You can create a vector using the c() function.
# All elements of a vector share the same type.
    numeric_vector <- c(1, 2, 3, 4)
    char_vector <- c("apple", "banana", "cherry")
vec <- c(10, 20, 30, 40, 50)
#  index  1,  2,  3,  4,  5

# 1. Numeric Indexing:
vec[3]  # Access the third element (30)

# 2. Multiple Indices:
vec[c(1, 3, 5)]  # Access the 1st, 3rd, and 5th elements

# 3. Negative Indexing:
# A negative index returns a copy of the vector without the element(s)
# at the specified position(s); the original vector is unchanged.
vec[-2]  # Everything except the second element (10, 30, 40, 50)

# 4. Logical Indexing
# You can access vector elements based on a logical condition. 

vec[vec > 25]  # Selects elements greater than 25 (30, 40, 50)
# [1] 30 40 50

# 5. Named Indices
# If the vector has named elements (e.g., after assigning names to 
# the vector elements), you can access elements by name.
vec <- c(a = 10, b = 20, c = 30)
vec["b"]  # Access element named "b"

# 6. Using which()
# You can use the which() function to return the indices of 
# elements that meet a certain condition.
vec <- c(10, 20, 30, 40, 50)
which(vec > 30)  # Returns the indices where elements are greater than 30 (4 and 5)

# 7. Accessing Slices (Range of Elements)
# To access a subset of a vector, you can specify a range of indices:
vec[2:4]  # Access elements from index 2 to 4 (20, 30, 40)
  2. Matrix
    m <- matrix(1:6, nrow = 2, ncol = 3)
    # Creates a 2x3 matrix with values from 1 to 6
# Create a 3x3 matrix (values are filled column by column)
m <- matrix(1:9, nrow = 3, ncol = 3)
m
#      [,1] [,2] [,3]
# [1,]    1    4    7
# [2,]    2    5    8
# [3,]    3    6    9

# 1. To access a specific element in a matrix, use the format matrix[row, column].
m[2, 3] # [1] 8

# 2. Accessing a specific row
m[1, ]   # First row: 1 4 7

# 3. Accessing a specific column
m[, 1]   # First column: 1 2 3

# 4. Accessing a submatrix (subset of rows and columns)
# Access the submatrix from rows 1 to 2 and columns 2 to 3
m[1:2, 2:3]
#     [,1] [,2]
# [1,]    4    7
# [2,]    5    8

# 5. Using logical indexing:
# You can also use logical conditions to subset a matrix.
# Access elements greater than 5
m[m > 5]
# [1] 6 7 8 9

# 6. Changing a specific element:
# To change a specific element, assign a new value using the same matrix[row, column] format.
m[1, 1] <- 10
m
#      [,1] [,2] [,3]
# [1,]   10    4    7
# [2,]    2    5    8
# [3,]    3    6    9
  3. Array
# Create a 3x3x3 array
my_array <- array(1:27, dim = c(3, 3, 3))
my_array
# , , 1
# 
#      [,1] [,2] [,3]
# [1,]    1    4    7
# [2,]    2    5    8
# [3,]    3    6    9

# , , 2

#      [,1] [,2] [,3]
# [1,]   10   13   16
# [2,]   11   14   17
# [3,]   12   15   18

# , , 3

#      [,1] [,2] [,3]
# [1,]   19   22   25
# [2,]   20   23   26
# [3,]   21   24   27

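# my_array[row, column, layer]
my_array[1, 1, 1]
# [1] 1
# First row across all columns and layers
# (result rows are the array's columns, result columns are its layers)
my_array[1, , ]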
#      [,1] [,2] [,3]
# [1,]    1   10   19
# [2,]    4   13   22
# [3,]    7   16   25

# Access the first two rows, all columns, and the first layer
my_array[1:2, , 1]
#      [,1] [,2] [,3]
# [1,]    1    4    7
# [2,]    2    5    8

my_array <- array(1:27, dim = c(3, 3, 3))

# Assign dimension names
dimnames(my_array) <- list(c("A", "B", "C"), c("X", "Y", "Z"), c("L1", "L2", "L3"))
my_array
# , , L1
#
#    X  Y  Z
# A  1  4  7
# B  2  5  8
# C  3  6  9
#
# , , L2
#
#    X  Y  Z
# A 10 13 16
# B 11 14 17
# C 12 15 18
#
# , , L3
#
#    X  Y  Z
# A 19 22 25
# B 20 23 26
# C 21 24 27

# Access the element in the second row, first column, and third layer by name
my_array["B", "X", "L3"]
# [1] 20

# Accessing with the apply() function
# Sum elements along the first dimension (rows)
apply(my_array, 1, sum)
#   A   B   C
# 117 126 135

# Sum elements along the second dimension (columns)
apply(my_array, 2, sum)
#   X   Y   Z
#  99 126 153

# Get the multi-dimensional indices for the 10th element
arrayInd(10, dim(my_array))
#      [,1] [,2] [,3]
# [1,]    1    1    2

4. List

    Description: A list is a versatile data structure that can store elements of different types and even different data structures (vectors, matrices, other lists, etc.).
    Creation: Use the list() function to create a list.

    my_list <- list(name = "John", age = 25, scores = c(85, 90, 88))

    Characteristics: Lists are useful for holding mixed data types. You can access a single element with [[ ]] (by position or by name) or with $ (by name); single brackets [ ] return a sub-list instead of the element itself.
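
    Example: The access rules above can be sketched as follows. This is a minimal illustration reusing my_list from the Creation step; the extra element name "passed" is a hypothetical addition, not part of the original example.

```r
# Build the same list as in the Creation step
my_list <- list(name = "John", age = 25, scores = c(85, 90, 88))

# [[ ]] and $ return the element itself
first_by_index <- my_list[[1]]        # "John"
age_by_name    <- my_list$age         # 25
second_score   <- my_list$scores[2]   # 90

# [ ] returns a sub-list, not the element
sub_list <- my_list["name"]
class(sub_list)                       # "list"

# Add a new element by assigning to a new name (hypothetical element)
my_list$passed <- TRUE
names(my_list)                        # "name" "age" "scores" "passed"
```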

5. Data Frame

    Description: A data frame is a two-dimensional table-like structure that is similar to a matrix but can hold different types of data (columns can be numeric, character, logical, etc.).
    Creation: You can create a data frame using the data.frame() function.

    df <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30), Score = c(85, 90))

    Characteristics: Data frames are the most commonly used data structure in R for statistical analysis and data manipulation. They are very similar to tables or spreadsheets in other software like Excel.
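
    Example: A short sketch of common data frame access patterns, reusing df from the Creation step. The derived column "Passed" and the thresholds (26 and 88) are assumptions chosen only for illustration.

```r
df <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30), Score = c(85, 90))

ages      <- df$Age              # one column as a vector: 25 30
bob_score <- df[2, "Score"]      # row 2, column "Score": 90
older     <- df[df$Age > 26, ]   # rows where Age > 26 (Bob only)

# Add a derived logical column (hypothetical name and threshold)
df$Passed <- df$Score >= 88

str(df)  # compact overview of the column types
```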

6. Factor

    Description: A factor is a data structure for categorical data (e.g., gender: male, female; or grades: A, B, C). It stores the values as integer codes together with the set of levels those codes refer to.
    Creation: You can create a factor using the factor() function.

    gender <- factor(c("Male", "Female", "Female", "Male"))

    Characteristics: Factors are particularly useful for handling categorical variables in statistical analysis, because they keep an explicit, fixed set of levels and so keep the categories consistent across the data.
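
    Example: A minimal sketch of how levels and codes behave, reusing gender from the Creation step. The grades factor and its level order are assumptions added for illustration.

```r
gender <- factor(c("Male", "Female", "Female", "Male"))

levels(gender)       # "Female" "Male" (sorted alphabetically by default)
as.integer(gender)   # 2 1 1 2 (integer codes pointing into the levels)
table(gender)        # counts per level: Female 2, Male 2

# Set the level order explicitly; ordered = TRUE allows comparisons
grades <- factor(c("B", "A", "C", "A"), levels = c("C", "B", "A"), ordered = TRUE)
b_beats_c <- grades[1] > grades[3]   # TRUE: B ranks above C in this ordering
```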

7. Matrix vs. Data Frame

    Matrix: All elements are of the same data type (numeric, character, etc.).
    Data Frame: Columns can contain different types of data (numeric, character, logical, etc.).
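
    Example: The difference can be seen by putting the same mixed values into both structures. This is a small sketch; the object and column names are hypothetical, and stringsAsFactors = FALSE is set so the result does not depend on the R version.

```r
# Mixing types in a matrix coerces everything to one type (here: character)
mixed_matrix <- cbind(c(1, 2), c("a", "b"))
typeof(mixed_matrix)        # "character"

# A data frame keeps each column's own type
mixed_df <- data.frame(num = c(1, 2), chr = c("a", "b"), stringsAsFactors = FALSE)
col_classes <- sapply(mixed_df, class)
col_classes                 # num: "numeric", chr: "character"
```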

Summary Table

Data Structure   Dimensions          Data Types      Example Usage
Vector           1D                  Homogeneous     Storing numeric or character data
Matrix           2D                  Homogeneous     Mathematical operations (e.g., matrix multiplication)
Array            Multi-dimensional   Homogeneous     Representing multi-dimensional data
List             1D                  Heterogeneous   Storing mixed data types
Data Frame       2D                  Heterogeneous   Storing tabular data (e.g., datasets)
Factor           1D                  Categorical     Storing categorical data like gender or grades

Key Operations in R

    Indexing: Accessing elements in vectors, matrices, data frames, and lists using indexing (e.g., vector[1], matrix[1,2], df$column_name).
    Manipulation: You can modify or manipulate these structures using functions like cbind(), rbind(), subset(), append(), etc.
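
    Example: A brief sketch of the manipulation functions named above. The vectors a and b and the filter threshold are assumptions made up for this illustration.

```r
a <- c(1, 2, 3)
b <- c(4, 5, 6)

by_col <- cbind(a, b)                # 3 x 2 matrix: a and b become columns
by_row <- rbind(a, b)                # 2 x 3 matrix: a and b become rows
longer <- append(a, 99, after = 1)   # 1 99 2 3

df <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30), Score = c(85, 90))
# Keep rows with Score > 85 and only the Name and Score columns
high <- subset(df, Score > 85, select = c(Name, Score))
```

    Note that none of these functions modify their inputs; each returns a new object, so the result has to be assigned to keep it.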

In conclusion, R provides a diverse set of data structures, each suited for specific tasks and types of data, making it flexible and powerful for data analysis.

