Tables and Control Flow in R

2018-03-29

data.frame

Warm-up Exercise

year1 <- 87:91
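# Annual electricity consumption (kWh) of the social services sector, ROC years 87 to 91
power1 <- c(6097059332, 6425887925, 6982579022, 7323992602.53436, 7954239517)
# Annual electricity consumption (kWh) of the manufacturing sector, ROC years 87 to 91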
power2 <- c(59090445718, 61981666330, 67378329131, 66127460204.6482, 69696372914.6949)

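Select the years (year1) in which the social services sector consumed more than 7e9 kWh.
Next, compute the mean electricity consumption of the social services sector for ROC years 87 to 91 (1998 to 2002).
Compute the standard deviation of the social services sector's consumption for ROC years 87 to 91.
Compute the standard scores of the social services sector's consumption for ROC years 87 to 91.
Compute the mean, standard deviation, and standard scores of the manufacturing sector's consumption for ROC years 87 to 91.
Finally, year by year, compare the social services sector's consumption with one tenth of the manufacturing sector's consumption, and list the years in which the former is higher.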

Reference Answer

See the in-class demonstration, or the notes from R語言翻轉教室 (the R flipped classroom): 01-RBasic-02-Data-Structure-Vectors
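
The following is one possible sketch of the warm-up using plain vectors only. It is not the official answer from the class; the helper names `high.years`, `power1.mean`, `power1.sd`, `power1.z`, and `higher.years` are made up for this illustration.

```r
year1 <- 87:91
power1 <- c(6097059332, 6425887925, 6982579022, 7323992602.53436, 7954239517)
power2 <- c(59090445718, 61981666330, 67378329131, 66127460204.6482, 69696372914.6949)

# A logical vector used as an index keeps only the TRUE positions
high.years <- year1[power1 > 7e9]

# Mean, standard deviation, and standard scores (z-scores) of power1
power1.mean <- mean(power1)
power1.sd <- sd(power1)
power1.z <- (power1 - power1.mean) / power1.sd

# Years in which social services exceed one tenth of manufacturing
higher.years <- year1[power1 > power2 * 0.1]

print(high.years)
print(power1.z)
print(higher.years)
```

The same three steps applied to `power2` cover the manufacturing part of the exercise.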

Links Between Data

year1[1] v.s. power1[1] v.s. power2[1]
year1[2] v.s. power1[2] v.s. power2[2]
year1[3] v.s. power1[3] v.s. power2[3]
…

Tabular Data

df <- data.frame(
  year1 = 87:91,
  power1 = c(6097059332, 6425887925, 6982579022, 7323992602.53436, 7954239517),
  power2 = c(59090445718, 61981666330, 67378329131, 66127460204.6482, 69696372914.6949)
)

year1	power1	power2
87	6097059332	59090445718
88	6425887925	61981666330
89	6982579022	67378329131
90	7323992603	66127460205
91	7954239517	69696372915

data.frame

Columns describe the attributes of the data
Each row represents one object or event
- So the values in the same row are linked to each other
Structured data
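
As a small illustration of the row idea, the sketch below filters the electricity table by one column and shows that the other columns of the matching rows come along. The name `busy` is invented for this example.

```r
df <- data.frame(
  year1 = 87:91,
  power1 = c(6097059332, 6425887925, 6982579022, 7323992602.53436, 7954239517),
  power2 = c(59090445718, 61981666330, 67378329131, 66127460204.6482, 69696372914.6949)
)

# Each column is an attribute; str() lists them with their types
str(df)

# Selecting rows by a condition on power1 keeps year1 and power2
# of the same years together, because a row is one unit
busy <- df[df$power1 > 7e9, ]
print(busy)
```

With three separate vectors, the same selection would have to be repeated for each vector by hand.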

Example: iris

# Load the built-in dataset
data(iris)
# Show the first 6 rows
head(iris)

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa

Example: cars

# Load the built-in dataset
data(cars)
# Show the first 6 rows
head(cars)

##   speed dist
## 1     4    2
## 2     4   10
## 3     7    4
## 4     7   22
## 5     8   16
## 6     9   10

Example: CO2

# Load the built-in dataset
data(CO2)
# Show the first 6 rows
head(CO2)

##   Plant   Type  Treatment conc uptake
## 1   Qn1 Quebec nonchilled   95   16.0
## 2   Qn1 Quebec nonchilled  175   30.4
## 3   Qn1 Quebec nonchilled  250   34.8
## 4   Qn1 Quebec nonchilled  350   37.2
## 5   Qn1 Quebec nonchilled  500   35.3
## 6   Qn1 Quebec nonchilled  675   39.2

# ?CO2

CRUD for data.frame

CREATE

data(iris)
head(iris)

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa

CREATE

data.frame(a = 1:3, b = 4:6, c = c("a", "b", "c"))

##   a b c
## 1 1 4 a
## 2 2 5 b
## 3 3 6 c

CREATE

iris <- read.table(
  "https://gist.githubusercontent.com/netj/8836201/raw/6f9306ad21398ea43cba4f7d537619d0e07d5ae3/iris.csv", 
  header = TRUE, quote = '"', sep = ",")
head(iris)

##   sepal.length sepal.width petal.length petal.width variety
## 1          5.1         3.5          1.4         0.2  Setosa
## 2          4.9         3.0          1.4         0.2  Setosa
## 3          4.7         3.2          1.3         0.2  Setosa
## 4          4.6         3.1          1.5         0.2  Setosa
## 5          5.0         3.6          1.4         0.2  Setosa
## 6          5.4         3.9          1.7         0.4  Setosa

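Importing Data into R

Excel: library(readxl)
Google Sheets: library(googlesheets)
Other analysis tools: library(foreign)

By default the results are data.frame objects (readxl returns a tibble, which is a subclass of data.frame)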
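
To see the import step without installing any of the packages above, here is a minimal round trip with base R only. It writes a few rows of `cars` to a temporary CSV file and reads them back; `csv.path` and `cars.head` are names chosen for this sketch.

```r
# Write the first rows of a built-in dataset to a temporary CSV file
csv.path <- tempfile(fileext = ".csv")
write.csv(head(cars), csv.path, row.names = FALSE)

# read.csv is read.table with defaults suited to comma-separated files
cars.head <- read.csv(csv.path)

# The imported object is a data.frame
print(class(cars.head))
str(cars.head)
```

The package readers listed above follow the same pattern: a file path or URL goes in, a table comes out.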

READ

Same as a list

Returns the same type
- [
Extracts an object from the container
- [[
- $

READ Example

data(iris)
iris.head <- head(iris)
iris.head[1]

##   Sepal.Length
## 1          5.1
## 2          4.9
## 3          4.7
## 4          4.6
## 5          5.0
## 6          5.4

READ Example

iris.head[[1]]

## [1] 5.1 4.9 4.7 4.6 5.0 5.4

READ Example

iris.head$Sepal.Length

## [1] 5.1 4.9 4.7 4.6 5.0 5.4
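
The three READ examples can be compared side by side by checking the class of each result. This is only a sketch; the names `by.single`, `by.double`, and `by.dollar` are invented here.

```r
iris.head <- head(iris)

# `[` keeps the container: the result is still a data.frame
by.single <- iris.head[1]
# `[[` and `$` take the column out of the container: a numeric vector
by.double <- iris.head[[1]]
by.dollar <- iris.head$Sepal.Length

print(class(by.single))
print(class(by.double))
print(identical(by.double, by.dollar))
```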

Matrix-like Access

[ with two arguments (row, column)

iris.head[1,1]

## [1] 5.1

# [row,col]
iris.head[1,2]

## [1] 3.5

iris.head[2,1]

## [1] 4.9

Sepal.Length	Sepal.Width	Petal.Length	Petal.Width	Species
5.1	3.5	1.4	0.2	setosa
4.9	3.0	1.4	0.2	setosa
4.7	3.2	1.3	0.2	setosa
4.6	3.1	1.5	0.2	setosa
5.0	3.6	1.4	0.2	setosa
5.4	3.9	1.7	0.4	setosa

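Mini Challenge

Suppose the signature of the [ function is

function(x, i, j)

The x argument corresponds to the object on the left of [
From the point of view of function arguments, what is the difference between the following two expressions?

iris.head[c(1,2)]
iris.head[1,2]

More concretely: what are the arguments x, i, and j in each case?
Arguments are separated by ,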
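
One way to explore the challenge is to call `[` in ordinary function-call form, which makes the argument positions visible. This is a sketch of that idea, with `two.columns` and `one.cell` as illustrative names.

```r
iris.head <- head(iris)

# One index argument: x = iris.head, i = c(1, 2), and j is not supplied
two.columns <- `[`(iris.head, c(1, 2))

# Two index arguments: x = iris.head, i = 1, j = 2
one.cell <- `[`(iris.head, 1, 2)

print(two.columns)
print(one.cell)
```

The comma inside `c(1,2)` separates the arguments of `c`, not the arguments of `[`, which is why the two expressions behave differently.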

UPDATE

iris.head$NEW <- 1:6
iris.head[["NEW2"]] <- 7:12
iris.head

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species NEW NEW2
## 1          5.1         3.5          1.4         0.2  setosa   1    7
## 2          4.9         3.0          1.4         0.2  setosa   2    8
## 3          4.7         3.2          1.3         0.2  setosa   3    9
## 4          4.6         3.1          1.5         0.2  setosa   4   10
## 5          5.0         3.6          1.4         0.2  setosa   5   11
## 6          5.4         3.9          1.7         0.4  setosa   6   12

UPDATE: READ + <-

DELETE

<- + READ
<- + NULL

# Assign NULL to remove the NEW column
iris.head$NEW <- NULL
# Alternative with the same result: iris.head <- iris.head[-6]
iris.head

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species NEW2
## 1          5.1         3.5          1.4         0.2  setosa    7
## 2          4.9         3.0          1.4         0.2  setosa    8
## 3          4.7         3.2          1.3         0.2  setosa    9
## 4          4.6         3.1          1.5         0.2  setosa   10
## 5          5.0         3.6          1.4         0.2  setosa   11
## 6          5.4         3.9          1.7         0.4  setosa   12

data.frame v.s. list

A data.frame is a kind of list, so many of the CRUD operations are very similar

class(iris.head)

## [1] "data.frame"

class(iris.head) <- NULL
class(iris.head)

## [1] "list"

iris.head

## $Sepal.Length
## [1] 5.1 4.9 4.7 4.6 5.0 5.4
## 
## $Sepal.Width
## [1] 3.5 3.0 3.2 3.1 3.6 3.9
## 
## $Petal.Length
## [1] 1.4 1.4 1.3 1.5 1.4 1.7
## 
## $Petal.Width
## [1] 0.2 0.2 0.2 0.2 0.2 0.4
## 
## $Species
## [1] setosa setosa setosa setosa setosa setosa
## Levels: setosa versicolor virginica
## 
## $NEW2
## [1]  7  8  9 10 11 12
## 
## attr(,"row.names")
## [1] 1 2 3 4 5 6

data.frame v.s. list

A data.frame extends list, so it has properties that a plain list does not
- The data in a data.frame are linked row by row
list: unstructured data
data.frame: structured data
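
A quick way to see the extra structure is to try building both containers from elements of different lengths. The sketch below assumes nothing beyond base R; `loose` and `result` are illustrative names.

```r
# A list accepts elements of any length
loose <- list(a = 1:3, b = 1:2)
print(lengths(loose))

# A data.frame needs columns that fit the same number of rows,
# so this call signals an error, which tryCatch captures here
result <- tryCatch(
  data.frame(a = 1:3, b = 1:2),
  error = function(e) e
)
print(conditionMessage(result))
```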

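In-class Exercise

df <- data.frame(
  year1 = 87:91,
  # Annual electricity consumption (kWh) of the social services sector, ROC years 87 to 91
  power1 = c(6097059332, 6425887925, 6982579022, 7323992602.53436, 7954239517),
  # Annual electricity consumption (kWh) of the manufacturing sector, ROC years 87 to 91
  power2 = c(59090445718, 61981666330, 67378329131, 66127460204.6482, 69696372914.6949)
)

Select the years (year1) in which the social services sector consumed more than 7e9 kWh.
Next, compute the mean electricity consumption of the social services sector for ROC years 87 to 91.
Compute the standard deviation of the social services sector's consumption for ROC years 87 to 91.
Compute the standard scores of the social services sector's consumption for ROC years 87 to 91.
Compute the mean, standard deviation, and standard scores of the manufacturing sector's consumption for ROC years 87 to 91.
Finally, year by year, compare the social services sector's consumption with one tenth of the manufacturing sector's consumption, and list the years in which the former is higher.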

Expression Review

Types of Expression

Basic Forms

R Object
- 1, "1"
- Variable: a
Function Call
- 1 + 1
- install.packages('sos')

Types of Expression

Nested Expression

Function: function(a, b) expr
Assignment: <variable name> <- <expression>, e.g. a <- 1, a <- a + 1, a <- mean(a)
Control Flow
- if (cond) expr
- if (cond) cons.expr else alt.expr
- for(var in seq) expr

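Question

According to R's syntax definitions, for example Control Flow: if (cond) expr
- What if I want to perform several actions (several expressions), but the definition only allows a single expression?

Braces `{` and Combining Expressions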

expression

Each line is one expression

a <- 1
a

## [1] 1

a + 1

## [1] 2

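Expressions Can Be Combined in R

The expressions between braces are combined into a single expression
- The combined expressions are called sub expressions
All sub expressions are evaluated in order, but R only returns the value of the last sub expression.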

{
  a <- 1
  a
  a + 1
}

## [1] 2
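
Because a braced block counts as one expression, it can be used anywhere a single expression is expected, including on the right of an assignment or as the body of `if`. The sketch below illustrates both; `value` and `b` are names chosen for this example.

```r
# The whole block is one expression whose value is the last sub expression
value <- {
  a <- 1
  a + 1
}
print(value)

# `if (cond) expr` allows only one expr, so braces group several actions
if (value > 1) {
  b <- value * 10
  print(b)
}
```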

Nested Functions

One Action per Line

After loading the iris dataset, output the length of the Species column

data(iris)
x <- iris$Species
length(x)

## [1] 150

Several Actions per Line

data(iris)
#x <- iris$Species
#length(x)
length(iris$Species)

## [1] 150

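The variable x is no longer needed

One Action per Line

year1 <- 87:91
# Annual electricity consumption (kWh) of the social services sector, ROC years 87 to 91
power1 <- c(6097059332, 6425887925, 6982579022, 7323992602.53436, 7954239517)
# Annual electricity consumption (kWh) of the manufacturing sector, ROC years 87 to 91
power2 <- c(59090445718, 61981666330, 67378329131, 66127460204.6482, 69696372914.6949)
# Year by year, compare social services consumption with one tenth of manufacturing consumption, and list the years in which the former is higher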
power2.0.1 <- power2 * 0.1
power1.v.s.power2.0.1 <- power1 > power2.0.1
year1[power1.v.s.power2.0.1]

## [1] 87 88 89 90 91

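Several Actions per Line

year1 <- 87:91
# Annual electricity consumption (kWh) of the social services sector, ROC years 87 to 91
power1 <- c(6097059332, 6425887925, 6982579022, 7323992602.53436, 7954239517)
# Annual electricity consumption (kWh) of the manufacturing sector, ROC years 87 to 91
power2 <- c(59090445718, 61981666330, 67378329131, 66127460204.6482, 69696372914.6949)
# Year by year, compare social services consumption with one tenth of manufacturing consumption, and list the years in which the former is higher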
power2.0.1 <- power2 * 0.1
year1[power1 > power2.0.1]

## [1] 87 88 89 90 91

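Several Actions per Line

year1 <- 87:91
# Annual electricity consumption (kWh) of the social services sector, ROC years 87 to 91
power1 <- c(6097059332, 6425887925, 6982579022, 7323992602.53436, 7954239517)
# Annual electricity consumption (kWh) of the manufacturing sector, ROC years 87 to 91
power2 <- c(59090445718, 61981666330, 67378329131, 66127460204.6482, 69696372914.6949)
# Year by year, compare social services consumption with one tenth of manufacturing consumption, and list the years in which the former is higher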
year1[power1 > power2 * 0.1]

## [1] 87 88 89 90 91

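Nested Functions

In R every action is a function, e.g. mean, [, >
Passing the output of one function as the input of another function is called nesting functions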

year1[power1 > power2 * 0.1]

## [1] 87 88 89 90 91

# In R, variables whose names start with `.` are hidden by default (they do not show up in RStudio's Environment pane)
.i <- power1 > power2 * 0.1
year1[.i]

## [1] 87 88 89 90 91

In the expressions above, the result (output) of power1 > power2 * 0.1 is the input of [
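
To make the nesting explicit, the same computation can be written with every operator called as an ordinary function. This is only an illustrative rewrite; `nested` and `prefix` are names invented for the comparison.

```r
year1 <- 87:91
power1 <- c(6097059332, 6425887925, 6982579022, 7323992602.53436, 7954239517)
power2 <- c(59090445718, 61981666330, 67378329131, 66127460204.6482, 69696372914.6949)

# The usual operator notation
nested <- year1[power1 > power2 * 0.1]

# The same calls in function form: `*` feeds `>`, which feeds `[`
prefix <- `[`(year1, `>`(power1, `*`(power2, 0.1)))

print(identical(nested, prefix))
```

Reading from the innermost call outward gives the order in which R evaluates the steps.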

Nested Functions

Very common in R

args(utils::install.packages)

## function (pkgs, lib, repos = getOption("repos"), contriburl = contrib.url(repos, 
##     type), method, available = NULL, destdir = NULL, dependencies = NA, 
##     type = getOption("pkgType"), configure.args = getOption("configure.args"), 
##     configure.vars = getOption("configure.vars"), clean = FALSE, 
##     Ncpus = getOption("Ncpus", 1L), verbose = getOption("verbose"), 
##     libs_only = FALSE, INSTALL_opts, quiet = FALSE, keep_outputs = FALSE, 
##     ...) 
## NULL

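When installing a package, many default argument values are themselves nested function calls (involving the getOption function)
Data processing inherently involves long chains of consecutive actions -> nested functions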

Parse R Expressions to Learn on Your Own

Example: R for Beginners, Section 4.5 A practical example

x <- rnorm(10)
y <- rnorm(10)
plot(x, y)
plot(
  x, y, xlab="Ten random values", ylab="Ten other values", 
  xlim=c(-2, 2), ylim=c(-2, 2), pch=22, col="red", bg="yellow", 
  bty="l", tcl=0.4,
  main="How to customize a plot with R", las=1, cex=1.5)

Control Flow

The Ultimate Trick

?Control

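Once you have learned control flow
- In R, treat this help page as the authoritative reference
- For other programming languages, find the equivalent official documentation that explains the control flow syntax
Learning a programming language =
- Learning the language's data structures
- Learning the language's control flow
- (*) Learning the language's other features (e.g. object orientation)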

`if`

library(diagram)
# creates an empty plot
openplotmat()
# create the coordinates
pos <- coordinates(c(1,3))
# Set the shape size
rx <- 0.15
ry <- 0.15
# arrow of `TRUE`/`FALSE`
ar <- straightarrow(from = pos[1,], to = pos[2,])
text(ar[1] - rx/2, ar[2] + ry, "TRUE", cex = 1.5)
ar <- straightarrow(from = pos[1,], to = pos[4,])
text(ar[1] + rx/2, ar[2] + ry, "FALSE", cex = 1.5)
# cells
textdiamond(mid = pos[1,], radx = rx, rady = ry, lab = "cond", cex = 2)
textrect(mid = pos[2,], radx = rx, rady = ry, lab = "cons.expr", cex = 2)
textrect(mid = pos[4,], radx = rx, rady = ry, lab = "alt.expr", cex = 2)

`if`

if (cond) cons.expr else alt.expr

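Example

Use one and the same expression to convert the student IDs provided by ceiba into account names

id <- "ntnu_m1234567"
# gsub replaces parts of a string; see `?gsub` for details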
if (nchar(id) == 9) paste("ntu", id, sep = "") else gsub("_", "", id)

## [1] "ntnum1234567"

id <- "d01234567"
if (nchar(id) == 9) paste("ntu", id, sep = "") else gsub("_", "", id)

## [1] "ntud01234567"

`for`

for(var in seq) expr

# `print` prints the value of a variable to the Console
for(i in 1:10) print(i)

## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10

In-class Exercise

Print the odd numbers between 1 and 100
Print the numbers between 1 and 100 that are multiples of neither 2 nor 3 (for + if)
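
One possible sketch of the second exercise combines `for`, `if`, and braces. Instead of printing inside the loop it collects the matches in a vector named `kept` (a name chosen for this example) so the result can be inspected afterwards.

```r
kept <- integer(0)
for (i in 1:100) {
  # %% is the remainder operator; keep i when neither remainder is zero
  if (i %% 2 != 0 && i %% 3 != 0) {
    kept <- c(kept, i)
  }
}
print(kept)
```

The first exercise follows the same shape with the single condition `i %% 2 == 1`.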

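User-defined Functions

function(<arguments>) expr
- The function returns the result of expr

f <- function(a) print(a)
a <- 1
f(2)

## [1] 2

# The argument a inside f does not change the a defined outside
a

## [1] 1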

User-defined Functions

function(<arguments>) expr

f <- function(b) print(a)
a <- 1
f(2)

## [1] 1

# The argument b exists only inside f
b

## Error in eval(expr, envir, enclos): object 'b' not found
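
Combining a function definition with braces gives a multi-step function whose return value is the last sub expression. The sketch below wraps the standard score computation from the warm-up; `z.score` and `centered` are hypothetical names, not part of the original slides.

```r
z.score <- function(x) {
  # centered is a local variable, visible only inside the function
  centered <- x - mean(x)
  # The last sub expression is the value returned by the function
  centered / sd(x)
}

z <- z.score(c(2, 4, 6))
print(z)
```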

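Example

Determine whether an integer is a prime number

is.prime <- function(p) {
  # Only integers greater than 1 can be prime
  stopifnot(p > 1)
  r <- TRUE
  # A composite p always has a divisor no larger than sqrt(p)
  test.i.upper <- floor(sqrt(p))
  if (test.i.upper >= 2) for(i in seq(2, test.i.upper, by = 1)) {
    if (p %% i == 0) r <- FALSE
  }
  r
}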

Challenge

Print the prime numbers between 2 and 100
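
One possible approach to the challenge reuses the is.prime function from the example inside a `for` loop. The sketch repeats the definition so that it runs on its own, and collects the results in a vector named `primes` (an illustrative name) before printing.

```r
is.prime <- function(p) {
  stopifnot(p > 1)
  r <- TRUE
  test.i.upper <- floor(sqrt(p))
  if (test.i.upper >= 2) for (i in seq(2, test.i.upper, by = 1)) {
    if (p %% i == 0) r <- FALSE
  }
  r
}

primes <- integer(0)
for (n in 2:100) {
  # Keep n only when the test function reports it as prime
  if (is.prime(n)) primes <- c(primes, n)
}
print(primes)
```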
