R tutorials

Ty Taylor’s R tutorials overview

R opens up new worlds for data analysis. When I think about data manipulation and analysis, I don’t think about what is available in a program. I dream up what I would ideally like to do, and then I translate the dream into R.

These tutorials are unique in that they teach R for beginners and then walk you all the way up through advanced programming with custom functions. The tutorials are full of detailed explanations and practice examples, and they are structured in a way that accustoms you to using neatly organized R scripts. I have developed and refined these tutorials over three semesters of teaching the graduate-level course Programming for Data Analysis in R at the University of Arizona.

Tutorial instructions

All of the tutorials are in the form of an R script. They are most easily read within an R script editor like RStudio.

For each tutorial there are two scripts (.R files), one with comments+code, and one with comments only (with BLANK in the title). Your job is to fill in the BLANK version. They are designed this way to get people in the practice of using neatly formatted, legible scripts, as opposed to typing code directly into the console.

The scripts progressively walk you through the fundamentals of coding in R. The comments teach you about each subject, and typing in the corresponding example code gives you practice with it.

To work on a tutorial, open both scripts in RStudio as different script tabs (see the first tutorial—R and RStudio Primer—if RStudio is unfamiliar to you). Also open the PDF version of the completed script on a separate screen, or printed out. Your job is to type the code from the completed version into the corresponding place in the BLANK version. Each line in the completed script that does not have a # symbol in front of it is a line of code for you to enter into the BLANK version. Execute each block of code after entering it (some objects created from the code will be called upon later).

I highly recommend actually typing in the code. The idea here is to improve at scripting by actually doing it.

Alternatively, if you’re relatively familiar with a tutorial’s subject already, just reading through the completed version to discover some new tricks or explanations might be useful to you.

Tutorial versions: Versions are labeled by date in sortable format yymmdd, e.g., 150422 for April 22, 2015.
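
Because yymmdd labels sort chronologically even when treated as plain text, they are also easy to handle in R. The following is a minimal sketch, using version labels taken from file names on this page; the object names are only illustrative.

```r
# Version labels in yymmdd format, taken from file names on this page
versions <- c("160421", "150118", "150422")

# Sorting the labels as text also sorts them chronologically
sorted_versions <- sort(versions)

# Convert the labels to Date objects; %y%m%d means 2-digit year, month, day
version_dates <- as.Date(sorted_versions, format = "%y%m%d")

# Most recent version label
latest <- sorted_versions[length(sorted_versions)]
```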

Downloading tutorial files

All file links go to Dropbox files and folders. I am not sure if you need a Dropbox account to download them. All access is read-only. Please let me know if you have any trouble with access: tytaylor@email.arizona.edu.

Tutorial datasets

All tutorial datasets are available here: R Tutorial datasets.

Dataset 1: glopnet.csv. The primary dataset for the tutorials is this publicly available plant traits dataset.

Dataset 2: LiDAR data. This is a set of files containing leaf-area-density estimates by LiDAR along forest transects. They are used for practicing with large matrices and lists, and for looping processes over multiple files and folders. The data have been partly altered and simulated and are not usable for publication.

Dataset 3: glopnet.tnrs.csv. This file contains taxonomic data to merge() into the glopnet data. It is used from tutorial 3 onward.
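
For readers who have not used merge() before, here is a minimal sketch of the idea. It uses two small invented data frames rather than the real files, and the column names (species, leaf_mass_area, family) are assumptions made for illustration, not the actual column names in glopnet.csv or glopnet.tnrs.csv.

```r
# Toy stand-ins for the two files; all column names and values are invented
traits <- data.frame(
  species = c("Acer rubrum", "Quercus alba", "Pinus taeda"),
  leaf_mass_area = c(70, 95, 210)
)
taxonomy <- data.frame(
  species = c("Acer rubrum", "Quercus alba", "Betula nigra"),
  family = c("Sapindaceae", "Fagaceae", "Betulaceae")
)

# Keep every row of traits; species with no taxonomy match get NA for family
merged <- merge(traits, taxonomy, by = "species", all.x = TRUE)
```

With all.x = TRUE the trait rows are never dropped, which makes unmatched species easy to spot afterwards.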

Dataset 4: simulated_big_data_131121.csv. This is a simulated large dataset of plant inventory and trait data. It is used from tutorial 5 onward to practice with large datasets that require speedy code. This data is not usable for publication.

Dataset 5: birdsdiet.csv. Dataset from brianmcgill.org, used in Class Exercise 2 for practice examining linear model assumptions, making data alterations, and creating statistical summary tables.

Dataset 6: sla.csv. Specific-leaf-area data used for practicing custom functions, looping, and apply functions with a custom bootstrap power analysis in Class Exercises 3 and 4.

Tutorial 1: R and RStudio Primer

Tutorials: R Primer_150428.pdf; R Primer_150428.R; R Primer_150428_BLANK.R

Datasets: glopnet.csv.

–       This tutorial is meant to get you on your feet in R.

–       Part 1: How easy are ANOVAs, regressions, and plots?

–       Part 2: Introducing RStudio and basic R functionality.

–       Part 3: Objects, object classes, text

–       Part 4: Vectors and vector indexing

–       Part 5: Built-in functions, and R Help

–       Part 6: Matrices – creating, indexing, and manipulating

–       Part 7: Data frames – R’s most common data format

–       Part 8: Importing data as data frames

–       Part 9: Getting info from bigger data frames

–       Part 10: Packages

–       Part 11: Basic troubleshooting

–       Drills: Tutorial 1 Drills_150118.R; Tutorial 1 Drills_150118_BLANK.R
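
To give a flavor of Parts 7 to 9, the sketch below imports a CSV file as a data frame and pulls some basic information from it. It writes its own tiny file to a temporary location so that it runs anywhere; the column names and values are invented and are not taken from glopnet.csv.

```r
# Write a tiny CSV to a temporary file so the example is self-contained;
# the column names and values are invented
csv_path <- tempfile(fileext = ".csv")
writeLines(c("species,height_m,evergreen",
             "A,12.5,TRUE",
             "B,3.2,FALSE",
             "C,20.1,TRUE"), csv_path)

# Import the file as a data frame
plants <- read.csv(csv_path)

# Inspect it
str(plants)               # column classes and a preview of the values
dim(plants)               # number of rows and columns
summary(plants$height_m)  # quick numeric summary of one column

# A column of a data frame is a vector, so it can be indexed like one
tall <- plants$species[plants$height_m > 10]
```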

Tutorial 2: Indexing—pointing at things

Tutorials: R_Indexing_150315.pdf; R_Indexing_150315.R; R_Indexing_150315_BLANK.R

Datasets: glopnet.csv; LiDAR data.

–       Part 1: Indexing in vectors

–       Part 2: Indexing with logicals and objects

–       Part 3: Indexing in matrices

–       Part 4: Indexing in data frames

–       Part 5: Indexing in lists

–       Part 6: Advanced indexing – techniques, nuances, and nuisances
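
The parts listed above can be previewed with a short sketch that indexes each of R's main data structures. All object names and values here are invented for illustration and do not come from the tutorial script.

```r
# Vector indexing by position, by name, and with a logical test
heights <- c(a = 2, b = 9, c = 5, d = 12)
heights[2]                    # by position
heights[c("a", "d")]          # by name
big <- heights[heights > 4]   # a logical index keeps the TRUE elements

# Matrix indexing uses [row, column]; a blank means "all"
m <- matrix(1:6, nrow = 2)    # filled column by column
m[2, 3]                       # single element
first_row <- m[1, ]

# Data frame indexing: rows by logical test, columns by name
d <- data.frame(id = 1:4, height = unname(heights))
tall_ids <- d[d$height > 4, "id"]

# List indexing: [ ] returns a sub-list, [[ ]] returns the element itself
lst <- list(nums = 1:3, label = "x")
sub_list <- lst["nums"]
element <- lst[["nums"]]
```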

Tutorial 3: For Loops—iteration of analyses

Tutorials: R_for_loops_150422.pdf; R_for_loops_150422.R; R_for_loops_150422_BLANK.R

Datasets: glopnet.csv; glopnet.tnrs.csv.

–       Part 1: Basic principles of iterations using for loops

–       Part 2: Simple examples

–       Part 3: Growing vectors; multiple references with ‘i’

–       Part 4: Nested for loops

–       Part 5: Flow control with if, else, else if, and next
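
As a rough preview of these topics, the sketch below combines a for loop with if, else if, else, and next, and then fills a matrix with a nested loop. The data are invented, and pre-allocating the result vector is a choice made for this example; the tutorial itself may introduce things in a different order.

```r
values <- c(4, -1, 9, 0, 16)

# Pre-allocate the result vector rather than growing it inside the loop
roots <- rep(NA_real_, length(values))

for (i in seq_along(values)) {
  if (values[i] < 0) {
    next                        # skip negative values; roots[i] stays NA
  } else if (values[i] == 0) {
    roots[i] <- 0
  } else {
    roots[i] <- sqrt(values[i])
  }
}

# Nested loops: fill a multiplication table one cell at a time
tab <- matrix(NA, nrow = 3, ncol = 3)
for (r in 1:3) {
  for (k in 1:3) {
    tab[r, k] <- r * k
  }
}
```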

Tutorial 4: Custom functions—gateway to dynamic automation

Tutorials: R_Custom_Functions_150422.pdf; R_Custom_Functions_150422.R; R_Custom_Functions_150422_BLANK.R

Datasets: glopnet.csv; glopnet.tnrs.csv.

Other files: wrap_text.R

–       Part 1: Custom functions basic structure

–       Part 2: Default arguments and flow control

–       Part 3: Return()ing objects; custom errors and warnings; source()ing your own library

–       Part 4: Developing from blank screen to complete function

–       Part 5: Dynamic graph titles that track function options

–       Part 6: Nested custom functions

–       Part 7: ggplot2 in custom functions – its weird behavior and how to overcome it
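
To illustrate the ideas in Parts 1 to 3, here is a minimal sketch of a custom function with default arguments, flow control, a custom error, a custom warning, and an explicit return(). The function safe_mean() is a hypothetical helper written for this page; it is not part of the tutorial script or of wrap_text.R.

```r
# Hypothetical helper: rounded mean of x, optionally ignoring NA values
safe_mean <- function(x, na.rm = TRUE, digits = 2) {
  # Custom error: stop execution if the input is unusable
  if (!is.numeric(x)) {
    stop("x must be numeric")
  }
  # Custom warning: flag a condition but keep going
  if (na.rm && anyNA(x)) {
    warning("NA values were removed before averaging")
  }
  result <- round(mean(x, na.rm = na.rm), digits)
  return(result)
}

safe_mean(c(1, 2, 4))                    # uses the default arguments
safe_mean(c(1.234, 2.345), digits = 1)   # overrides one default
```

Saving a function like this in its own .R file and loading it with source() is one way to start a personal function library.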

Tutorial 5: Apply functions—fast iteration

Tutorials: R_apply_functions_160421.pdf; R_apply_functions_160421.R; R_apply_functions_160421_BLANK.R

Datasets: glopnet.csv; glopnet.tnrs.csv; LiDAR data; simulated_big_data_131121.csv.

–       Part 1: sapply(), iterate and simplify

–       Part 2: apply(), iterate across rows or columns

–       Part 3: Using apply() to guide functions across analysis templates

–       Part 4: aggregate() and tapply(), iterate across groups by categorical variables

–       Part 6: lapply(), iterate across lists or vectors and return a list 
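
The functions named above can be compared side by side in a short sketch. The data frame below is invented for illustration; only base R functions are used.

```r
# Toy data frame; names and values are invented
d <- data.frame(
  group = c("a", "a", "b", "b"),
  x = c(1, 3, 5, 7),
  y = c(2, 4, 6, 8)
)

# sapply(): iterate over columns and simplify the result to a vector
col_means <- sapply(d[, c("x", "y")], mean)

# apply(): iterate across rows (MARGIN = 1) or columns (MARGIN = 2) of a matrix
row_sums <- apply(as.matrix(d[, c("x", "y")]), 1, sum)

# tapply(): apply a function to a vector split by a grouping variable
group_means <- tapply(d$x, d$group, mean)

# aggregate(): same idea, but the result is a data frame
group_df <- aggregate(x ~ group, data = d, FUN = mean)

# lapply(): iterate over a list or vector and always return a list
squares <- lapply(1:3, function(i) i^2)
```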

Class exercises

These are challenging problem sets that can be performed together in class or assigned as homework. Answer sheets are available upon request to: tytaylor@email.arizona.edu

–       Class Exercise 1 – Data prep and basic stats: CE1_data prep_basic stats_150308.R, CE1_stats.pptx

–       Class Exercise 2 – Practicing with for loops and lists with LiDAR data: CE2_lidar data for loops_150305.R

–       Class Exercise 3 – Custom t-test function and a bootstrap power analysis: CE3 Bootstrap Power Analysis_160324.R, CE3_sig_t function guide.pdf, CE3_boot_power function guide.pdf
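
For anyone unfamiliar with the idea behind Class Exercise 3, the sketch below shows one generic way a bootstrap power analysis can be set up in base R. It is an assumption-laden illustration, not the course's solution: the helpers sig_test() and boot_power_sketch() are hypothetical names invented here, the data are simulated rather than taken from sla.csv, and the function guides for the exercise may specify a different design.

```r
set.seed(1)  # make the resampling reproducible

# Simulated pilot data for two groups; values are invented for illustration
group_a <- rnorm(30, mean = 10, sd = 2)
group_b <- rnorm(30, mean = 12, sd = 2)

# Hypothetical helper: TRUE if a Welch t-test on x and y gives p < alpha
sig_test <- function(x, y, alpha = 0.05) {
  t.test(x, y)$p.value < alpha
}

# Hypothetical helper: resample n values per group with replacement, n_boot
# times, and return the share of resamples showing a significant difference
boot_power_sketch <- function(x, y, n, n_boot = 200) {
  hits <- sapply(seq_len(n_boot), function(i) {
    sig_test(sample(x, n, replace = TRUE), sample(y, n, replace = TRUE))
  })
  mean(hits)
}

# Estimated power at a few candidate sample sizes
sizes <- c(5, 10, 20)
power <- sapply(sizes, function(n) boot_power_sketch(group_a, group_b, n))
```

Each element of power is the fraction of resamples, at that sample size, in which the t-test detected a difference.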
