在R中使用with()和by() | Trefoil

在R中使用with()和by()

刚刚上手R的同学,写R代码时,很难割舍速度很慢的for循环。with()by()命令,提供了一些情况下更加高效的解决办法。

本文的思路主要来源于Quick-R网站上的Using with( ) and by( )。Quick-R网站上的内容我觉得归类得很好,可以提供许多思路,美中不足的是代码无法直接运行。R的帮助文件里倒是有可运行的代码,但是知识密度实在太大了,需要反复咀嚼才能弄懂,而许多时候我们只需要搞清楚一个函数最基本的用法就够了。这一点真的远远不如MATLAB知识点层层递进的帮助文档。不过对于自由软件来说,真的不能奢求太多咯。

下面是我阅读上述资料后的笔记。

With

R的帮助文件Description里的介绍:

Evaluate an R expression in an environment constructed from data, possibly modifying the original data.

这就是说,我们在一个从data构建出的环境中运行R表达式(这样,在表达式里我们就不需要再注明用到的变量来自于data了)。

注意:

Note that assignments within expr take place in the constructed environment and not in the user’s workspace.

语法:

with(data, expr, …)

Quick-R上提到,With的功能和SAS中的DATA=相似。

提供两个可以运行的例子,其意自见。

例一:

with(warpbreaks, table(wool, tension))

这个例子的输出:(为了让大家熟悉这个数据集,这里提供了head()和tail()的输出)

> head(warpbreaks)
  breaks wool tension
1     26    A       L
2     30    A       L
3     54    A       L
4     25    A       L
5     70    A       L
6     52    A       L
> tail(warpbreaks)
   breaks wool tension
49     17    B       H
50     13    B       H
51     15    B       H
52     15    B       H
53     16    B       H
54     28    B       H
> with(warpbreaks, table(wool, tension))
    tension
wool L M H
   A 9 9 9
   B 9 9 9

例二:

with(data.frame(u = c(5,10,15,20,30,40,60,80,100),
                lot1 = c(118,58,42,35,27,25,21,19,18),
                lot2 = c(69,35,26,21,18,16,13,12,12)),
    list(summary(glm(lot1 ~ log(u), family = Gamma)),
         summary(glm(lot2 ~ log(u), family = Gamma))))

输出这里就略去了。

By

用法:

by(data, INDICES, FUN, ..., simplify = TRUE)

至于By的效果,还是R帮助的Details里面写得清楚:

A data frame is split by row into data frames subsetted by the values of one or more factors, and function FUN is applied to each subset in turn.

这就是说,我们把data这个data frame按照INDICES的factor拆分成若干块小的data frames,在每块小的data frame上运行函数FUN

在后面的两个例子中,由于作为INDICESwarpbreaks[,"tension"]有三个水平,因此拆分出了三块进行后面FUN的分析。

例一:

by(warpbreaks[, 1:2], warpbreaks[,"tension"], summary)

输出:

> by(warpbreaks[, 1:2], warpbreaks[,"tension"], summary)
warpbreaks[, "tension"]: L
     breaks      wool 
 Min.   :14.00   A:9  
 1st Qu.:26.00   B:9  
 Median :29.50        
 Mean   :36.39        
 3rd Qu.:49.25        
 Max.   :70.00        
---------------------------------------------------- 
warpbreaks[, "tension"]: M
     breaks      wool 
 Min.   :12.00   A:9  
 1st Qu.:18.25   B:9  
 Median :27.00        
 Mean   :26.39        
 3rd Qu.:33.75        
 Max.   :42.00        
---------------------------------------------------- 
warpbreaks[, "tension"]: H
     breaks      wool 
 Min.   :10.00   A:9  
 1st Qu.:15.25   B:9  
 Median :20.50        
 Mean   :21.67        
 3rd Qu.:25.50        
 Max.   :43.00        

例二:

by(warpbreaks, warpbreaks[,"tension"],
   function(x) lm(breaks ~ wool, data = x))

输出:

> by(warpbreaks, warpbreaks[,"tension"],
+    function(x) lm(breaks ~ wool, data = x))
warpbreaks[, "tension"]: L

Call:
lm(formula = breaks ~ wool, data = x)

Coefficients:
(Intercept)        woolB  
      44.56       -16.33  

---------------------------------------------------- 
warpbreaks[, "tension"]: M

Call:
lm(formula = breaks ~ wool, data = x)

Coefficients:
(Intercept)        woolB  
     24.000        4.778  

---------------------------------------------------- 
warpbreaks[, "tension"]: H

Call:
lm(formula = breaks ~ wool, data = x)

Coefficients:
(Intercept)        woolB  
     24.556       -5.778  

-- EOF --

声明: 本文采用 BY-NC-SA 协议进行授权. 转载请注明转自: 在R中使用with()和by()

comments powered by Disqus