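Exercises and Homework 1: dplyr practice
Statistics with the mouse.tibble variable
- For each chromosome, compute the number, mean length, maximum length and minimum length of each gene type, and pick out the longest and shortest genes
- Remove chromosomes with 500 or fewer genes, then sort by chromosome and by gene count (high -> low)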
```{r}
## Write your code here and run it;
library(tidyverse)
mouse.tibble <- read_delim("../data/talk04/mouse_genes_biomart_sep2018.txt",
                           delim = "\t", quote = "")
# Keep and rename the needed columns; sort by length (long -> short) within each
# chromosome so that first()/last() below return the longest/shortest gene
mouse_sel <- mouse.tibble %>%
  select(CHR = `Chromosome/scaffold name`, TYPE = `Transcript type`,
         GENE_ID = `Gene stable ID`,
         GENE_LEN = `Transcript length (including UTRs and CDS)`) %>%
  arrange(CHR, desc(GENE_LEN))
# Task 1: per chromosome and type: count, mean/min/max length, longest/shortest gene
work1_1 <- mouse_sel %>%
  group_by(CHR, TYPE) %>%
  summarise(gene_num = n_distinct(GENE_ID), gene_mean_len = mean(GENE_LEN),
            gene_min_len = min(GENE_LEN), gene_max_len = max(GENE_LEN),
            maxgeneID = first(GENE_ID), mingeneID = last(GENE_ID),
            .groups = "drop")
work1_1
# Task 2: per chromosome; drop chromosomes with 500 or fewer genes, then sort
work1_2 <- mouse_sel %>%
  group_by(CHR) %>%
  summarise(gene_total_num = n_distinct(GENE_ID), gene_mean_len = mean(GENE_LEN),
            gene_min_len = min(GENE_LEN), gene_max_len = max(GENE_LEN),
            maxgeneID = first(GENE_ID), mingeneID = last(GENE_ID)) %>%
  filter(gene_total_num > 500) %>%
  arrange(CHR, desc(gene_total_num))
work1_2
```
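The longest/shortest-gene columns rely on sorting before grouping: once rows are ordered by `CHR` and then by decreasing length, the first row of each group is the longest entry and the last row is the shortest. The base-R sketch below checks that idea on a small hypothetical table (`toy` and the IDs `g1` to `g5` are made up, not BioMart records).

```r
# Hypothetical toy data standing in for mouse.tibble (not real BioMart records)
toy <- data.frame(
  CHR      = c("1", "1", "1", "2", "2"),
  GENE_ID  = c("g1", "g2", "g3", "g4", "g5"),
  GENE_LEN = c(500, 1500, 900, 300, 2000),
  stringsAsFactors = FALSE
)
# Sort by chromosome, then by length from long to short
toy <- toy[order(toy$CHR, -toy$GENE_LEN), ]
# split() keeps the sorted row order inside each chromosome
by_chr   <- split(toy$GENE_ID, toy$CHR)
longest  <- vapply(by_chr, function(ids) ids[1], character(1))
shortest <- vapply(by_chr, function(ids) ids[length(ids)], character(1))
# chromosome 1: g2 longest, g1 shortest; chromosome 2: g5 longest, g4 shortest
print(longest)
print(shortest)
```

The same ordering trick is what makes `first()` and `last()` meaningful inside `summarise()`; without the preceding sort they would simply return whichever rows happen to come first and last in the file.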
Statistics with the grades variable
First, generate the grades variable with the following command:
grades <- tibble( "Name" = c("Weihua Chen", "Mm Hu", "John Doe", "Jane Doe",
"Warren Buffet", "Elon Musk", "Jack Ma"),
"Occupation" = c("Teacher", "Student", "Teacher", "Student",
rep( "Entrepreneur", 3 ) ),
"English" = sample( 60:100, 7 ),
"ComputerScience" = sample(80:90, 7),
"Biology" = sample( 50:100, 7),
"Bioinformatics" = sample( 40:90, 7)
);
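Because the grades are drawn with `sample()`, every run produces different numbers. A minimal sketch of how fixing the random seed with `set.seed()` makes the draws reproducible (the seed value 42 is arbitrary and not part of the exercise):

```r
# sample() draws pseudo-random numbers, so results change between runs
# unless the seed is fixed first (42 is an arbitrary choice)
set.seed(42)
run1 <- sample(60:100, 7)
set.seed(42)
run2 <- sample(60:100, 7)
# Same seed, same draws
print(identical(run1, run2))
# Without replacement, no score repeats within a single draw
print(anyDuplicated(run1) == 0)
```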
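Then compute:
- What is each person's worst subject, and what grade did they get in it?
- Which occupation has the best average grade?
- What is the best subject of each occupation (ranked by average grade)?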
```{r}
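## Write your code here and run it;
set.seed(123)  # fix the (arbitrary) seed so the sampled grades are reproducible
grades <- tibble( "Name" = c("Weihua Chen", "Mm Hu", "John Doe", "Jane Doe",
                             "Warren Buffet", "Elon Musk", "Jack Ma"),
                  "Occupation" = c("Teacher", "Student", "Teacher", "Student",
                                   rep( "Entrepreneur", 3 ) ),
                  "English" = sample( 60:100, 7 ),
                  "ComputerScience" = sample(80:90, 7),
                  "Biology" = sample( 50:100, 7),
                  "Bioinformatics" = sample( 40:90, 7)
)
grades
# Reshape to long format: one row per person and course
grades_melted <- grades %>%
  pivot_longer(-c(Name, Occupation), names_to = "course", values_to = "grade")
# 1) Worst subject per person: sort grades low -> high, take each person's first row
grade_worst <- grades_melted %>%
  arrange(Name, grade) %>%
  group_by(Name) %>%
  summarise(worstsubject = first(course), worstgrade = first(grade))
grade_worst
# 2) Occupation with the highest mean grade over all courses
occupation_average <- grades_melted %>%
  group_by(Occupation) %>%
  summarise(mean_grade = mean(grade)) %>%
  arrange(desc(mean_grade)) %>%
  summarise(best = first(Occupation))
occupation_average
# 3) Best subject of each occupation (highest subject mean), ranked by that mean
occupation_best_subject <- grades_melted %>%
  group_by(Occupation, course) %>%
  summarise(subject_average = mean(grade), .groups = "drop_last") %>%
  arrange(Occupation, desc(subject_average)) %>%
  summarise(bestsubject = first(course), best_average = first(subject_average)) %>%
  arrange(desc(best_average))
occupation_best_subject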
```
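Reshaping to long format is one way to find each person's worst subject; on the wide table, `which.min()` applied row by row gives the same answer and can serve as a cross-check. A base-R sketch on hypothetical scores (`scores` and the people "A" and "B" are invented, not the random grades above):

```r
# Hypothetical scores (not the randomly generated grades)
scores <- data.frame(
  Name            = c("A", "B"),
  English         = c(70, 95),
  ComputerScience = c(85, 82),
  Biology         = c(60, 90),
  Bioinformatics  = c(88, 45),
  stringsAsFactors = FALSE
)
subject_cols <- setdiff(names(scores), "Name")
m <- as.matrix(scores[, subject_cols])
# which.min() returns the column index of the first minimum in each row
worst_idx <- apply(m, 1, which.min)
worst <- data.frame(
  Name         = scores$Name,
  worstsubject = subject_cols[worst_idx],
  worstgrade   = m[cbind(seq_len(nrow(m)), worst_idx)],
  stringsAsFactors = FALSE
)
print(worst)
```

On ties, `which.min()` keeps the first matching column, so it may pick a different subject than the sort-and-take-first approach if two grades are equal.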
Calculations with the starwars variable
- Compute each character's BMI;
- Select the obese (BMI >= 30) humans and show only their `name`, `sex` and `homeworld`;
```{r}
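## Write your code here and run it;
# BMI = mass (kg) / height (m)^2; starwars stores height in cm
stats <- starwars %>%
  select(name, height, mass, sex, homeworld, ends_with("color"), species) %>%
  mutate(bmi = mass / ((height / 100)^2))
# Obese humans (BMI >= 30); filter() drops rows where the condition is NA
stats %>%
  filter(bmi >= 30, species == "Human") %>%
  select(name, sex, homeworld)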
```
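The BMI formula needs height in metres, while `starwars` records it in centimetres, hence the division by 100. A short base-R sketch on hypothetical characters (`calc_bmi` and `P1` to `P3` are made up for illustration) also shows that a missing mass yields an NA BMI, which the filter silently drops:

```r
# BMI = mass (kg) / height (m)^2; height is given in cm, hence height / 100
calc_bmi <- function(mass_kg, height_cm) mass_kg / (height_cm / 100)^2
# Hypothetical characters, not rows of starwars
people <- data.frame(name   = c("P1", "P2", "P3"),
                     height = c(180, 160, 170),
                     mass   = c(100, 50, NA),
                     stringsAsFactors = FALSE)
people$bmi <- calc_bmi(people$mass, people$height)
print(people)
# subset(), like dplyr::filter(), treats an NA condition as FALSE, so P3 is dropped
obese <- subset(people, bmi >= 30, select = c(name, bmi))
print(obese)
```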
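- Select the blond-haired, blue-eyed humans;
- Split them into three BMI groups (<18, 18~25, >25), count the people in each group, and show the counts with a barplot; note: in the plot, order the three groups by BMI from low to high;
- Change the ordering so that the groups are sorted by head count from low to high;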
```{r}
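## Write your code here and run it;
# Blond-haired, blue-eyed humans, from highest to lowest BMI
work3 <- stats %>%
  filter(hair_color == "blond", eye_color == "blue", species == "Human") %>%
  arrange(desc(bmi)) %>%
  select(name, bmi)
work3
# Bin BMI instead of hard-coding the counts; right = FALSE gives the bins
# [-Inf, 18), [18, 25), [25, Inf); rows with a missing BMI are dropped first
work3_1 <- work3 %>%
  filter(!is.na(bmi)) %>%
  mutate(group = cut(bmi, breaks = c(-Inf, 18, 25, Inf),
                     labels = c("BMI<18", "18~25", "BMI>25"), right = FALSE)) %>%
  count(group, .drop = FALSE)  # keep empty groups so they appear with 0
# Barplot ordered by BMI (low -> high): the factor levels already follow this order
barplot(work3_1$n, names.arg = as.character(work3_1$group), las = 2)
# Barplot ordered by head count (low -> high)
work3_2 <- arrange(work3_1, n)
barplot(work3_2$n, names.arg = as.character(work3_2$group), las = 2)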
```
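Where a value exactly on a boundary lands depends on `cut()`'s `right` argument, and an empty group only shows up in the plot if its level is kept. A base-R sketch on hypothetical BMI values (not computed from `starwars`) shows both points, plus reordering the groups by count with `sort()`:

```r
# Hypothetical BMI values (not computed from starwars)
bmi <- c(18, 22, 24.9, 25, NA)
# right = FALSE: 18 falls in "18~25" and 25 falls in "BMI>25"
groups <- cut(bmi, breaks = c(-Inf, 18, 25, Inf),
              labels = c("BMI<18", "18~25", "BMI>25"), right = FALSE)
# table() drops NA by default but keeps empty levels such as "BMI<18"
counts <- table(groups)
print(counts)
# Reorder the bars by count, smallest first
by_count <- sort(counts)
print(by_count)
barplot(by_count, las = 2)
```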
COMMENTS | NOTHING