[R] Talk05 Exercises and Homework

Published 2022-01-05, 192 views


Exercises and Homework 1: dplyr practice


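Statistics with the mouse.tibble variable

  • For each gene type on each chromosome, compute the gene count, the mean length, and the maximum and minimum lengths, and pick out the longest and shortest genes
  • Drop chromosomes that contain 500 or fewer genes, then sort by chromosome and by count, high -> low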
```{r}
## Write your code here and run it
library(tidyverse)
mouse.tibble <- read_delim("../data/talk04/mouse_genes_biomart_sep2018.txt",
                           delim = "\t", quote = "")
## Per chromosome and transcript type: count, mean/min/max length, longest and shortest gene.
## Rows are sorted longest-first within each chromosome, so first()/last() pick the extremes.
work1_1 <- mouse.tibble %>%
  select( CHR = `Chromosome/scaffold name`, TYPE = `Transcript type`,
          GENE_ID = `Gene stable ID`,
          GENE_LEN = `Transcript length (including UTRs and CDS)` ) %>%
  arrange(CHR, desc(GENE_LEN)) %>%
  group_by(CHR, TYPE) %>%
  summarise(gene_num = n_distinct(GENE_ID), gene_mean_len = mean(GENE_LEN),
            gene_min_len = min(GENE_LEN), gene_max_len = max(GENE_LEN),
            maxgeneID = first(GENE_ID), mingeneID = last(GENE_ID),
            .groups = "drop")
work1_1

## Per chromosome: keep chromosomes with more than 500 genes,
## then sort by chromosome and by gene count (high to low).
work1_2 <- mouse.tibble %>%
  select( CHR = `Chromosome/scaffold name`, TYPE = `Transcript type`,
          GENE_ID = `Gene stable ID`,
          GENE_LEN = `Transcript length (including UTRs and CDS)` ) %>%
  arrange(CHR, desc(GENE_LEN)) %>%
  group_by(CHR) %>%
  summarise(gene_total_num = n_distinct(GENE_ID), gene_mean_len = mean(GENE_LEN),
            gene_min_len = min(GENE_LEN), gene_max_len = max(GENE_LEN),
            maxgeneID = first(GENE_ID), mingeneID = last(GENE_ID)) %>%
  filter(gene_total_num > 500) %>%
  arrange(CHR, desc(gene_total_num))
work1_2
```
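
The answer above finds the longest and shortest gene by sorting first and then taking `first()` and `last()`. As a rough alternative sketch, the same lookup can be done without relying on row order by indexing with `which.max()` and `which.min()`. The `toy_genes` table and its values below are made up for illustration and only mimic the renamed columns of `mouse.tibble`.

```r
library(dplyr)

# Hypothetical toy data standing in for mouse.tibble (all values are invented)
toy_genes <- tibble(
  CHR      = c("1", "1", "1", "2", "2"),
  TYPE     = c("protein_coding", "protein_coding", "lincRNA",
               "protein_coding", "protein_coding"),
  GENE_ID  = c("g1", "g2", "g3", "g4", "g5"),
  GENE_LEN = c(1200, 300, 800, 500, 2500)
)

toy_summary <- toy_genes %>%
  group_by(CHR, TYPE) %>%
  summarise(
    gene_num      = n_distinct(GENE_ID),
    gene_mean_len = mean(GENE_LEN),
    # Index the ID vector by the position of the extreme length,
    # so the result does not depend on how the rows were sorted
    maxgeneID     = GENE_ID[which.max(GENE_LEN)],
    mingeneID     = GENE_ID[which.min(GENE_LEN)],
    .groups = "drop"
  )
toy_summary
```

With this form the `arrange()` step becomes optional, which may be easier to maintain if the pipeline is later reordered.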

Statistics with the grades variable

First, generate the grades variable with the following command:

grades <- tibble( "Name" = c("Weihua Chen", "Mm Hu", "John Doe", "Jane Doe",
                             "Warren Buffet", "Elon Musk", "Jack Ma"),
                  "Occupation" = c("Teacher", "Student", "Teacher", "Student", 
                                   rep( "Entrepreneur", 3 ) ),
                  "English" = sample( 60:100, 7 ),
                  "ComputerScience" = sample(80:90, 7),
                  "Biology" = sample( 50:100, 7),
                  "Bioinformatics" = sample( 40:90, 7)
                  );

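Then compute:

  1. What are each person's worst subject and the corresponding grade?
  2. Which occupation has the best average grade?
  3. What is the best subject for each occupation (ranked by average score)?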
```{r}
## Write your code here and run it
grades <- tibble( "Name" = c("Weihua Chen", "Mm Hu", "John Doe", "Jane Doe",
                             "Warren Buffet", "Elon Musk", "Jack Ma"),
                  "Occupation" = c("Teacher", "Student", "Teacher", "Student", 
                                   rep( "Entrepreneur", 3 ) ),
                  "English" = sample( 60:100, 7 ),
                  "ComputerScience" = sample(80:90, 7),
                  "Biology" = sample( 50:100, 7),
                  "Bioinformatics" = sample( 40:90, 7)
                  );
grades
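## Reshape to long format: one row per person and course (pivot_longer supersedes gather)
grades_melted <- grades %>%
  pivot_longer(c(-Occupation, -Name), names_to = "course", values_to = "grade")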
grade_worst <- grades_melted %>% arrange(Name,grade)%>%
  group_by(Name)%>%
  summarise(worstsubject=first(course),worstgrade=first(grade))
grade_worst
occupation_average <- grades_melted %>% group_by(Occupation) %>%
  summarise(mean_grade=mean(grade))%>%
  arrange(-mean_grade)%>%
  summarise(best=first(Occupation))
occupation_average
occupation_best_subject <- grades_melted %>% group_by(Occupation,course) %>%
  summarise(subject_average=mean(grade))%>%
  arrange(Occupation,-subject_average)%>%
  summarise(bestsubject=first(course))
occupation_best_subject
```
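
The "worst subject per person" step above sorts the long table and takes the first row of each group. A minimal sketch of the same idea using `slice_min()` is shown below. The `toy_grades` table is a made-up, fixed-value stand-in for `grades` so that the result is reproducible, and it assumes dplyr 1.0.0 or later, where `slice_min()` is available.

```r
library(dplyr)
library(tidyr)

# Hypothetical fixed data (the real grades tibble uses random sample() values)
toy_grades <- tibble(
  Name       = c("A", "B"),
  Occupation = c("Teacher", "Student"),
  English    = c(90, 70),
  Biology    = c(60, 85)
)

toy_worst <- toy_grades %>%
  # Long format: one row per person and course
  pivot_longer(c(English, Biology), names_to = "course", values_to = "grade") %>%
  group_by(Name) %>%
  # Keep the single lowest-grade row per person; ties are broken arbitrarily
  slice_min(grade, n = 1, with_ties = FALSE) %>%
  ungroup()
toy_worst
```

Swapping `slice_min()` for `slice_max()` gives the best subject instead, and grouping by `Occupation` after averaging would answer the per-occupation question in the same way.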

Calculations with the starwars variable

  1. Compute each character's BMI;
  2. Select the obese (BMI >= 30) humans and show only their name, sex, and homeworld
```{r}
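## Write your code here and run it
## BMI = mass (kg) / height (m)^2; starwars stores height in cm, hence the division by 100
stats <- starwars %>%
  select(name, height, mass, sex, homeworld, ends_with("color"), species) %>%
  mutate(bmi = mass / ((height / 100)^2))
## Obese humans, showing only name, sex, and homeworld
stats %>%
  filter(bmi >= 30, species == "Human") %>%
  select(name, sex, homeworld)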
```
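
As a quick sanity check on the BMI formula used above, here is a small worked example. The helper name `bmi_from_cm()` and the input values are hypothetical and chosen only for illustration.

```r
# Hypothetical helper: BMI from mass in kg and height in cm
bmi_from_cm <- function(mass_kg, height_cm) {
  height_m <- height_cm / 100   # convert cm to m
  mass_kg / height_m^2
}

# 77 kg at 172 cm: 77 / 1.72^2 = 77 / 2.9584, roughly 26.03
example_bmi <- bmi_from_cm(77, 172)
example_bmi
```

Forgetting the cm-to-m conversion would shrink the result by a factor of 10,000, so a value far below 1 is a good hint that the units are off.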
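  1. Select the humans with blond hair and blue eyes;
  2. Split them into three groups by BMI (<18, 18~25, >25), count the members of each group, and display the counts with barplot; note: show the three groups ordered by BMI from low to high;
  3. Change the ordering so that the groups are sorted by member count from small to large;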
```{r}
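## Write your code here and run it
work3 <- stats %>%
  filter(hair_color == "blond", eye_color == "blue", species == "Human") %>%
  arrange(desc(bmi)) %>%
  select(name, bmi)
## Assign each person to a BMI group and count the groups (empty groups are kept as 0);
## rows with unknown BMI are dropped so they are not counted in any group
bmi_levels <- c("BMI<18", "18~25", "BMI>25")
work3_1 <- work3 %>%
  filter(!is.na(bmi)) %>%
  mutate(group = factor(case_when(bmi < 18 ~ "BMI<18",
                                  bmi <= 25 ~ "18~25",
                                  TRUE ~ "BMI>25"), levels = bmi_levels)) %>%
  count(group, name = "num", .drop = FALSE)
## Bar plot ordered by BMI group, low to high
barplot(work3_1$num, names.arg = as.character(work3_1$group), las = 2)
## Bar plot reordered by group size, small to large
work3_2 <- work3_1 %>% arrange(num)
barplot(work3_2$num, names.arg = as.character(work3_2$group), las = 2)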
```
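
One way to derive the three BMI groups from data, rather than typing the counts by hand, is base R's `cut()` followed by `table()`. The sketch below uses a made-up vector `bmi_values`. With `right = FALSE` the intervals are closed on the left, so a BMI of exactly 25 would land in the top group; that boundary choice is an assumption, since the exercise does not say where 18 and 25 belong.

```r
# Hypothetical BMI values, invented for illustration
bmi_values <- c(16.5, 21.0, 23.8, 26.0, 31.2)

# Bin into [-Inf, 18), [18, 25), [25, Inf); labels double as ordered factor levels
bmi_group <- cut(bmi_values,
                 breaks = c(-Inf, 18, 25, Inf),
                 labels = c("BMI<18", "18~25", "BMI>25"),
                 right  = FALSE)

# Count members per group; table() keeps the factor-level order
group_counts <- table(bmi_group)

barplot(group_counts, las = 2)        # bars ordered by BMI group, low to high
barplot(sort(group_counts), las = 2)  # bars ordered by count, small to large
```

Because the bar order follows the factor levels in the first plot and the sorted counts in the second, no manual relabeling of levels is needed to switch between the two orderings.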

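Articles on this site are shared under the CC BY-NC-SA 4.0 international license.
Unless otherwise noted, all articles on this site are original; please credit the source when reposting.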