---
title: "Creating a New worcs Project"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Creating a New worcs Project}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

This tutorial walks you through the steps of creating a reproducible project with the `worcs` package. The learning goals are:

- [ ] Installing WORCS using the [setup tutorial](https://cjvanlissa.github.io/worcs/articles/setup.html) (to be completed before starting the tutorial);
- [ ] Checking the installation;
- [ ] Creating a new WORCS project;
- [ ] Preparing a dataset in `prepare_data.R`;
- [ ] Adding the dataset to the repository with `open_data()`;
- [ ] Adding some basic analyses;
- [ ] Registering endpoints for a reproducibility check;
- [ ] Reproducing the project.

# Checking the Installation

Open RStudio. Load the `worcs` package, and run the installation check:

```{r eval = FALSE}
library(worcs)
check_worcs_installation()
```

You should see all green checkmarks, possibly along with some "information" messages. If any tests fail, instructions on how to remedy the issue should be printed; please follow these instructions. If you are in a "worcshop" with a live instructor, ask for help after you have tried to remedy the issues.

# Creating a New `worcs` Project

In RStudio, click `File > New Project > New directory > WORCS Project Template`.

Type an appropriate name for the remote repository in its textbox. This name will be used to create a new GitHub repository on your account. For example, you could name it "demo_worcs_project".

Keep the checkbox for `renv` checked if you want to use dependency management (recommended).

For this tutorial, select "none" in the preregistration template dropdown menu. You can always add a preregistration later using `add_preregistration()`.

For this tutorial, select the manuscript template "github_document", which has few dependencies.
Optionally, you can choose a different template.

Select a license for your project (we recommend a CC-BY license, which allows free use of the licensed material as long as the creator is credited).

When you click "Create Project", the new project should open in RStudio (either in a new session or in the current one). Verify that you see a `README.md` file, which is the welcome page for users of your repository. Edit this template to explain how users should interact with the project.

# Prepare a Dataset Using `prepare_data.R`

The data preparation script should turn your source data into an analysis-ready `data.frame` (or another data object, but in that case you will need to specify custom functions for saving and loading the data). Two important steps usually occur before data is added to a repository:

* Removing **any and all potentially identifying information** in the case of sensitive data;
* Minimal data cleaning required to store the data to a file. The remainder of the data cleaning will be done reproducibly.

You can use your own data. If you don't have your own data, you can use some demo data:

* Allison Horst's Penguin data, available in different file types at
* [Data on the effect of different methods of drying the hands on hang time in rock climbing](https://github.com/cjvanlissa/demo_metarep/blob/1366250e87b9b3efd164e1f6d2a1d5e6158c0602/df.csv).

Below is a minimal `prepare_data.R` script. Adapt it for your own data (which will require you to copy the data file to your `worcs` project directory and load it into memory).
```{r eval = FALSE}
# Inside prepare_data.R:
library(worcs)

# Example methods of loading a data file
# df <- readxl::read_xlsx("penguins.xlsx", 1)
# df <- foreign::read.spss("penguins.sav", to.data.frame = TRUE)
# df <- read.csv("penguins.csv", stringsAsFactors = FALSE)

# Example data
df <- iris

# Remove a column containing "potentially identifying information"
df[["Species"]] <- NULL

# Inspect the prepared data
descriptives(df)
```

# Add the Dataset to the Repository

End the file `prepare_data.R` with the following command to save the prepared dataset and publish it on GitHub. If you do not want to publish your data on GitHub, use `closed_data()` instead. This tutorial assumes you use `open_data()`.

```{r eval = FALSE}
open_data(df)
```

To confirm that the project now knows how to load the dataset, remove `df` from the environment, then run `load_data()` in the console:

```{r eval = FALSE}
rm(df)
load_data()
```

# Add Some Demo Analyses

Open your `manuscript.Rmd` file. There, edit one of the existing code chunks, or remove them and create a new code chunk. First, load `worcs` and the data we just created:

```{r eval = FALSE}
# Inside manuscript.Rmd:
library(worcs)
load_data()
```

Now, add some mock analyses. You can insert your own analysis code, or play around with the following functions:

```{r eval = FALSE}
# Descriptive statistics
res_desc <- descriptives(df)
write.csv(res_desc, "res_desc.csv", row.names = FALSE)

# Simple model
mod <- lm(Sepal.Length ~ Sepal.Width, data = df)
res_mod <- summary(mod)
# Keep the row names: they identify the model terms
write.csv(res_mod$coefficients, "res_coef.csv")
```

Note that, in this code, we write the results to spreadsheet files.
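Because `summary()` stores the model terms (`(Intercept)` and `Sepal.Width`) as the row names of the coefficient matrix, a coefficient spreadsheet is only interpretable if those row names are written too. The sketch below assumes only base R and writes to a temporary file rather than to your project directory. It writes the coefficients and reads them back with the term names intact:

```r
# Prepare the data as in prepare_data.R above
df <- iris
df[["Species"]] <- NULL

# Fit the demo model and extract the coefficient matrix
mod <- lm(Sepal.Length ~ Sepal.Width, data = df)
res_mod <- summary(mod)

# Write to a temporary file; the default row.names = TRUE keeps the term names
coef_file <- tempfile(fileext = ".csv")
write.csv(res_mod$coefficients, coef_file)

# Read it back, using the first column as row names and keeping the
# original column names such as "Std. Error" and "Pr(>|t|)"
res_coef <- read.csv(coef_file, row.names = 1, check.names = FALSE)
print(res_coef)
```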
You can also print them in the document, for example using:

```{r eval = FALSE}
# Inline `r ...` code is not evaluated inside a code chunk, so the caption
# is built with paste0(); report() formats the number for the manuscript
knitr::kable(res_mod$coefficients,
             caption = paste0("My regression model coefficients, for a model with $R^2 ",
                              report(res_mod[["r.squared"]]), "$."))
```

# Reproduce the Project

Run the following code in the R console:

```{r eval = FALSE}
reproduce()
```

This should render the manuscript that you've prepared.
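One of the learning goals is registering endpoints for a reproducibility check. Endpoints are result files, such as `res_desc.csv` and `res_coef.csv` above, whose contents should be identical when the project is reproduced. Depending on your `worcs` version, functions such as `add_endpoint()` may handle this for you, so consult the package documentation. The underlying idea can be sketched in base R: record a checksum of each endpoint file, then compare it after the analysis has been re-run. This is a hypothetical illustration of the concept, not how `worcs` implements it. The file name and contents below are placeholders written to a temporary directory.

```r
# Hypothetical illustration of an endpoint check using MD5 checksums;
# this is NOT the worcs implementation, only the underlying idea.

# Create a placeholder result file in a temporary directory
res_file <- file.path(tempdir(), "res_desc.csv")
write.csv(data.frame(x = 1:3), res_file, row.names = FALSE)

# Snapshot: record the checksum of each endpoint file
endpoints <- res_file
snapshot <- unname(tools::md5sum(endpoints))

# Here you would re-run the analysis (e.g., with worcs::reproduce());
# this sketch simulates an identical re-run by writing the same content
write.csv(data.frame(x = 1:3), res_file, row.names = FALSE)
reproduced <- identical(unname(tools::md5sum(endpoints)), snapshot)
print(reproduced)

# A re-run that produces different results fails the check
write.csv(data.frame(x = 1:4), res_file, row.names = FALSE)
changed_ok <- identical(unname(tools::md5sum(endpoints)), snapshot)
print(changed_ok)
```

Comparing checksums instead of parsed values makes the check strict: any change in the written file, including formatting, counts as a failure to reproduce.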