This tutorial walks you through
the steps of creating a reproducible project with the worcs
package. The learning goals are:
Open RStudio. Load the worcs package, and run the
installation check:
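A minimal sketch of this step, assuming worcs is already installed (for example from CRAN) and that your installed version exports `check_worcs_installation()`:

```r
# Attach the worcs package (assumed to be installed already)
library(worcs)

# Run the installation check; it reports on the dependencies worcs relies on
check_worcs_installation()
```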
You should see all green checkmarks, and optionally some “information” messages. If any tests fail, instructions on how to remedy the issue should be printed; please follow them. If you are in a “worcshop” with a live instructor, ask for help after you’ve tried to remedy the issues.
worcs Project

In RStudio, click
File > New Project > New directory > WORCS Project Template
Type an appropriate name for the remote Repository in its textbox. This name will be used to create a new GitHub repository on your account. For example, you could name it “demo_worcs_project”.
Keep the checkbox for renv checked if you want to use
dependency management (recommended).
For this tutorial, select “none” in the preregistration template
dropdown menu. You can always add a preregistration later using
add_preregistration().
For this tutorial, select the manuscript template “github_document”, which has few dependencies. Optionally, you can choose a different template.
Select a license for your project (we recommend a CC-BY license, which allows free use of the licensed material as long as the creator is credited).
When you click “Create Project”, the new project should open in RStudio (either in a new session or in the current one).
Verify that you see a README.md file, which is the
welcoming page for users of your repository. Edit this template to
explain how users should interact with the project.
prepare_data.R

The data preparation script should turn your source data into an
analysis-ready data.frame (or other data object, but you
will need to specify custom functions for reading and loading the data
in that case).
Two important steps usually occur before data is added to a repository:
You can use your own data. If you don’t have your own data, you can use some demo data:
Below is a minimal prepare_data.R script. Adapt it for
your own data (which will require you to copy the data file to your
worcs project directory and load it into memory).
# Inside prepare_data.R:
library(worcs)
# Example methods of loading a data file
# df <- readxl::read_xlsx("penguins.xlsx", 1)
# df <- foreign::read.spss("penguins.sav", to.data.frame = TRUE)
# df <- read.csv("penguins.csv", stringsAsFactors = FALSE)
# Example data
df <- iris
# Remove a column containing "potentially identifying information"
df[["Species"]] <- NULL
# Inspect the prepared data
descriptives(df)

End the file prepare_data.R with the following command
to save the prepared dataset and publish it on GitHub. If you do not
want to publish your data on GitHub, use closed_data()
instead. This tutorial assumes you use open_data().
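As a sketch of how the end of prepare_data.R might look, assuming the prepared object is still called `df` and that worcs is attached; the data preparation lines are repeated only to keep the example self-contained, and the script is assumed to run from the project directory:

```r
library(worcs)

# Prepared data from the script above (repeated so the sketch stands alone)
df <- iris
df[["Species"]] <- NULL

# Final line of prepare_data.R: save the dataset and register it with the project
open_data(df)
```

To keep the data off GitHub, the last line would call `closed_data(df)` instead.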
To confirm that the project now knows how to load the dataset, remove
df from the environment, then run load_data()
in the console:
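A hedged sketch of that check, assuming the console's working directory is the worcs project and that `load_data()` restores the dataset under its original name, `df`:

```r
library(worcs)

# Drop the in-memory copy, if present, so the data must come from the saved file.
# inherits = FALSE avoids matching the unrelated function stats::df.
if (exists("df", inherits = FALSE)) rm(df)

# Reload the dataset that was registered for this project
load_data()

# Inspect the first rows of the reloaded object
head(df)
```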
Open your manuscript.Rmd file. There, edit an existing
code chunk, or remove the existing chunks and create a new one. First, load
worcs and the data we just created:
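Inside the manuscript, that chunk could look roughly like this; the sketch assumes the document is knitted from the project directory, so that `load_data()` can find the project's data:

```r
# Contents of a code chunk in manuscript.Rmd
library(worcs)

# Load the dataset registered by open_data() in prepare_data.R
load_data()
```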
Now, add some mock analyses.
You can insert your own analysis code, or play around with the following functions:
# Descriptive statistics
res_desc <- descriptives(df)
write.csv(res_desc, "res_desc.csv", row.names = FALSE)
# Simple model
mod <- lm(Sepal.Length ~ Sepal.Width, data = df)
res_mod <- summary(mod)
write.csv(res_mod$coefficients, "res_coef.csv", row.names = FALSE)

Note that, in this code, we write the results to spreadsheet files. You can also print them in the document, for example using:
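One possible way to print results in the rendered document, sketched with base R only. The object name `coefs` and the `term` column are illustrative choices, not part of the tutorial, and the model is refitted here so the example stands alone:

```r
# Refit the simple model from the analysis above (iris without Species)
df <- iris
df[["Species"]] <- NULL
mod <- lm(Sepal.Length ~ Sepal.Width, data = df)

# Convert the coefficient matrix to a data.frame, keeping term names as a column
coefs <- as.data.frame(summary(mod)$coefficients)
coefs <- cbind(term = rownames(coefs), coefs)
rownames(coefs) <- NULL

# Printing an object inside a code chunk shows it in the rendered manuscript
print(coefs, digits = 3)
```

If knitr is installed, `knitr::kable(coefs)` renders the same data.frame as a formatted table instead of plain console output.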