First Steps with R

Posted on 2019-09-30 Edited on 2020-07-16

R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS. To download R, please choose your preferred CRAN mirror.

Preparation

Installing packages

Customizing the CRAN and Bioconductor download mirrors

# options() gets and sets options that apply while R is running
options()$repos 
options()$BioC_mirror

options(BioC_mirror="https://mirrors.ustc.edu.cn/bioc/")
# The USTC (University of Science and Technology of China) mirror
options("repos" = c(CRAN="https://mirrors.tuna.tsinghua.edu.cn/CRAN/"))
# The Tsinghua University mirror
options()$repos 
options()$BioC_mirror

Installing Bioconductor packages (safer than the old approach of calling source() and then biocLite())

if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("KEGG.db", ask = FALSE, update = FALSE)

Installing CRAN packages

install.packages('WGCNA')

CRAN: install.packages()
Bioconductor: BiocManager::install()
GitHub: devtools::install_github()
Load an installed package: library()

Package installation directory

.libPaths()

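Mirror configuration

Add the following to R's configuration file, .Rprofile:

options(BioC_mirror="https://mirrors.ustc.edu.cn/bioc/")
# The USTC mirror
options("repos" = c(CRAN="https://mirrors.tuna.tsinghua.edu.cn/CRAN/"))
# The Tsinghua University mirror

After restarting R, the mirrors no longer need to be configured every time. To open the file for editing:

file.edit('~/.Rprofile')

That said, I now think it is better not to write things into .Rprofile by hand, because you may run into an error like this one, which R raises when the last line of the file does not end with a newline:

Error: Failed to install 'ggradar' from GitHub: (converted from warning) incomplete final line found on 'C:\Users\Lenovo\Documents\.Rprofile'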

Connecting R to Jupyter Notebook

This is entirely optional, just something to play with.

install.packages(c('repr', 'IRdisplay', 'evaluate', 'crayon', 'pbdZMQ', 'devtools', 'uuid', 'digest'))
devtools::install_github('IRkernel/IRkernel')

# Install for the current user only
#IRkernel::installspec()
# Or install system-wide
IRkernel::installspec(user = FALSE)

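The first script

Use an Rproject; the working directory is then the directory that contains the project.
Create the Rproject first, then create an R script inside it.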

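Data types

The as family of functions converts between data types
as.numeric() converts other data types to numeric
as.logical() converts other data types to logical
as.character() converts other data types to character

The is family of functions tests a type and returns TRUE or FALSE
is.numeric() is the data numeric?
is.logical() is the data logical?
is.character() is the data character?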
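
To make the two families concrete, here is a minimal sketch. The values and the variable names (`txt`, `num`, `bad`) are made up for illustration and do not come from the exercises in this post.

```r
# Illustrative values only (not from the original exercises)
txt <- "3.14"
is.character(txt)        # TRUE: txt is a character string
num <- as.numeric(txt)   # convert character to numeric
is.numeric(num)          # TRUE

as.logical("TRUE")       # TRUE
as.logical(0)            # FALSE: zero converts to FALSE
as.character(42)         # "42"

# A conversion that cannot succeed yields NA (normally with a warning)
bad <- suppressWarnings(as.numeric("abc"))
is.na(bad)               # TRUE
```

Note that a failed conversion does not stop the script; it silently leaves an NA behind, so it is worth checking with is.na() afterwards.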

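Data structures: vectors, data frames, matrices, lists

A data frame is roughly a "table".
A vector is a single column taken out of a data frame and treated as one unit.
A vector can hold only one data type, and it may contain repeated values.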
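
The "one type per vector" rule is easiest to see by mixing types on purpose. The sketch below uses hypothetical names (`mixed`, `toy_df`, `score_col`) and made-up values.

```r
# Mixing types in c() coerces everything to the most general type
mixed <- c("a", TRUE, 3)
class(mixed)             # "character"
mixed                    # "a" "TRUE" "3"

# Logical and numeric together become numeric (TRUE turns into 1)
class(c(TRUE, 3))        # "numeric"

# A single column pulled out of a data frame is an ordinary vector
toy_df <- data.frame(gene = c("g1", "g2", "g2"), score = c(1.5, 2, 2))
score_col <- toy_df$score
is.vector(score_col)     # TRUE
class(score_col)         # "numeric"
```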

Statistical functions

sort() # sort the values
length() # number of elements
unique() # remove duplicates
table() # count how often each value occurs
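
A quick sketch of the four functions on one small vector. The vector `genes` is a made-up example that reuses a few gene names from the exercises.

```r
# Hypothetical input with repeated values
genes <- c("BCL2", "ANLN", "BCL2", "BAG1", "ANLN", "BCL2")

sort(genes)      # "ANLN" "ANLN" "BAG1" "BCL2" "BCL2" "BCL2"
length(genes)    # 6
unique(genes)    # "BCL2" "ANLN" "BAG1" (order of first appearance)
counts <- table(genes)
counts[["BCL2"]] # 3
```

unique() keeps the order in which values first appear, while table() reports its counts in sorted order of the values.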

practice1 Checking data types

class("a")
class(TRUE)
class(3)

Printing the type of each element in a loop

Lst <- list("a", TRUE, 3, c(4,7,9))
# seq_along() also behaves correctly when the list is empty
for (index in seq_along(Lst)) {
  print(class(Lst[[index]]))
}

practice2 Generating vectors

c("a", TRUE, 3, c(4,7,9))
seq(from = 4,to = 30,by = 4)
paste0(rep("sample",times=7),seq(from = 4,to = 30,by = 4))
# times = length(seq(from = 4,to = 30,by = 4))

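practice3 Subsetting vectors

# 1. Combine the gene names "ACTR3B","ANLN","BAG1","BCL2","BIRC5","RAB","ABCT","ANF","BAD","BCF","BARC7","BALV" into a vector and assign it to x
x=c("ACTR3B","ANLN","BAG1","BCL2","BIRC5","RAB","ABCT","ANF","BAD","BCF","BARC7","BALV")
# 2. Use a function to compute the length of the vector
length(x)
# 3. Use vector subsetting to select the 1st, 3rd, 5th, 7th, 9th and 11th gene names
x[seq(1, 11, 2)]
# 4. Use vector subsetting to select every gene name except the second to last
x[-(length(x)-1)]
# 5. Use vector subsetting to select the gene names that also appear in c("ANLN", "BCL2","TP53")
# Hint: %in%
# (x %in% c("ANLN", "BCL2","TP53")) | (x %in% c("BIRC5"))
x[x %in% c("ANLN", "BCL2","TP53")]
# 6. Change the 6th gene name to "a" and check that it worked
x[6]="a" 
x
# 7. Generate 100 random numbers: rnorm(n=100,mean=0,sd=18)
rnum=rnorm(n=100, mean=0, sd=18)
# Set every value below -2 to -2 and every value above 2 to 2
rnum[rnum<(-2)]=-2
rnum[rnum>2]=2
rnum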

Q: What is the difference between '<-' and '='?

The operators <- and = assign into the environment in which they are evaluated. The operator <- can be used anywhere, whereas the operator = is only allowed at the top level (e.g., in the complete expression typed at the command prompt) or as one of the subexpressions in a braced list of expressions.

In a function call you can’t assign an object with = because = means assigning arguments there.
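
The function-call case can be checked directly. This is a minimal sketch; `assign_demo` is a hypothetical helper written for this post, and it looks only at its own local environment so that variables already in your session do not affect the result.

```r
# Hypothetical helper: compares = and <- inside a function call
assign_demo <- function() {
  m1 <- mean(x = c(1, 2, 3))     # = only names the argument x
  before <- exists("x", envir = environment(), inherits = FALSE)
  m2 <- mean(x <- c(1, 2, 3))    # <- creates x here, then passes its value
  after <- exists("x", envir = environment(), inherits = FALSE)
  c(before = before, after = after)
}

assign_demo()
# before  after
#  FALSE   TRUE
```

Both calls return the same mean; the difference is only whether a variable named x is left behind afterwards.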

Q: How do I select by more than one condition?
Combine the conditions with '|' (logical OR):

(x %in% c("ANLN", "BCL2","TP53")) | (x %in% c("BIRC5"))

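practice4 Working with data frames

# 1. Create this data frame
# (Hint: the last three columns come from rnorm())
df <- data.frame(gene  = paste0("gene",1:15),
                 s1 = rnorm(15), s2 = rnorm(15), s3 = rnorm(15))
# 2. Extract the row names and column names
rownames(df)
colnames(df)
# 3. Extract the second row
df[2,]
# 4. Extract the value in row 3, column 4
df[3,4]
# 5. Find the minimum and maximum of the second column
range(df[,2])
# 6. Extract columns s1 and s3 by name
df[,c("s1","s3")]
# 7. Rename the last three columns to "sample1", "sample2", "sample3"
colnames(df)[c(2,3,4)]<-paste0("sample", 1:3)
colnames(df)[(ncol(df)-2):ncol(df)]
# 8. Keep the rows where the sample3 column is greater than 0
df[df$sample3>0,]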

Rearranging data frames

Ctrl+L clears the console.
Click the broom icon or run rm(list = ls()) to clear all variables.

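Reading data

read.csv(): usually used to read CSV files
read.table(): usually used to read txt files

sep: the field separator: comma, \t (tab), or space
header: whether the first line is used as the column names
row.names: use the first column as the row names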
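
The three arguments above can be tried on a throwaway file. The sketch below writes a small tab-separated file to a temporary path; the file contents and the names `tmp` and `tab` are made up for illustration.

```r
# Write a tiny tab-separated file to a temporary location (made-up data)
tmp <- tempfile(fileext = ".txt")
writeLines(c("gene\ts1\ts2",
             "gene1\t0.5\t1.2",
             "gene2\t-0.3\t0.8"), tmp)

# sep = "\t"     fields are separated by tabs
# header = TRUE  the first line holds the column names
# row.names = 1  the first column becomes the row names
tab <- read.table(tmp, sep = "\t", header = TRUE, row.names = 1)

dim(tab)        # 2 2
rownames(tab)   # "gene1" "gene2"
colnames(tab)   # "s1" "s2"

unlink(tmp)     # remove the temporary file
```

Because the gene column was consumed as row names, only the two numeric columns remain in the data frame.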

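R's own data format: Rdata

Rdata is a storage format specific to R and cannot be opened by other software. It stores variables rather than a table file. Save with save() and load with load().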
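
Since an Rdata file stores variables, one file can hold several objects, and load() restores them under their original names. A minimal sketch with hypothetical objects (`toy_vec`, `toy_tab`) and a temporary file:

```r
# Two made-up objects to store
toy_vec <- 1:3
toy_tab <- data.frame(id = c("x", "y"), value = c(10, 20))

rdata_file <- tempfile(fileext = ".Rdata")
save(toy_vec, toy_tab, file = rdata_file)  # several objects in one file

rm(toy_vec, toy_tab)                       # remove them from the session
loaded <- load(rdata_file)                 # load() returns the restored names
loaded

toy_vec                                    # 1 2 3, back under its old name
unlink(rdata_file)                         # remove the temporary file
```

This is why load() is called without assigning its result to a data variable: the objects reappear by themselves, and the return value is only their names.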

practice5 Reading data

# 1. Read complete_set.txt (already saved in the working directory)
ex <- read.table('complete_set.txt', header = TRUE)
# 2. Check how many rows and columns it has
dim(ex)
# 3. Get the column names and row names
colnames(ex)
rownames(ex)
# 4. Export it as CSV, then read it back in
write.csv(ex,file = "ex.csv", row.names = FALSE)
ex1 <- read.csv('ex.csv')
# 5. Save it as Rdata, then load it again
save(ex1,file = "ex1.Rdata")
rm(ex1)
load(file ='ex1.Rdata')

Advanced guide to reading data