New Zealand Research Software Engineering Conference 2023
The New Zealand Institute for Plant and Food Research Limited
18 September 2023
What I learned at uni:
Statistics
Data analysis
Programming in R and Python
What I wish I had learned:
Version control
Managing your computational environment
Documentation
Unit testing
Workflow management
Sustainability
Open science
Access to the history of the project
Easy collaboration on code
Standardised way of sharing and publishing code
Use dependency management tools like renv (R) or conda (Python) in your work
1. Record dependencies
2. Isolate project library
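The two steps above can be imitated with the Python standard library alone. This is a minimal sketch, not how renv or conda work internally: `record_dependencies` and `create_project_env` are hypothetical helper names, and the pins are written in the common `name==version` form.

```python
import venv
from importlib import metadata


def record_dependencies() -> list[str]:
    """Step 1: return 'name==version' pins for every installed distribution."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata has no name
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


def create_project_env(env_dir: str) -> None:
    """Step 2: create an isolated environment (with its own pip) in env_dir."""
    venv.create(env_dir, with_pip=True)


if __name__ == "__main__":
    # Print the pins so they can be saved next to the analysis code.
    for pin in record_dependencies():
        print(pin)
```

In practice a dedicated tool does both jobs and also restores the recorded versions on another machine, which this sketch does not attempt.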
READMEs:
Codebooks/data dictionaries:
Concept: mix code and plain text
Record reasoning, notes and interpretation alongside source code and results
Reduce copy-pasting for reports and presentations
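As a rough illustration of the idea, the sketch below computes summary statistics and drops them straight into a sentence, so the reported numbers cannot drift from the analysis. The measurements, the `REPORT` template and the `render_report` function are all made up for this example; notebook-style tools apply the same principle to whole documents.

```python
from statistics import mean, stdev
from string import Template

# Hypothetical measurements, invented for illustration only.
yields_kg = [4.1, 3.8, 4.4, 4.0, 3.7]

# Plain text with placeholders that are filled in from the computed results.
REPORT = Template("Mean yield was $mean kg (SD $sd kg, n = $n).\n")


def render_report(values):
    """Return the report sentence with statistics computed from `values`."""
    return REPORT.substitute(
        mean=f"{mean(values):.2f}",
        sd=f"{stdev(values):.2f}",
        n=len(values),
    )


if __name__ == "__main__":
    print(render_report(yields_kg), end="")
```

If the data change, rerunning the script updates the sentence, with no copy-pasting involved.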
Write formal tests for:
data (dimensions, range of values, etc.)
code (correct output type, handles errors)
analysis steps (sensible results, returns known answer)
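The three kinds of test listed above could look something like this. It is a minimal sketch with invented data: the `ROWS` dataset, the `validate_data` and `group_means` functions and the allowed ranges are all assumptions made for the example, and a test runner such as pytest would normally collect the `test_` functions.

```python
from collections import defaultdict

# Hypothetical dataset: (treatment, plant height in cm).
ROWS = [("control", 10.0), ("control", 12.0), ("treated", 15.0), ("treated", 17.0)]


def validate_data(rows):
    """Check dimensions and value ranges of the dataset."""
    assert len(rows) > 0, "dataset is empty"
    for row in rows:
        assert len(row) == 2, "each row should be (treatment, height_cm)"
        treatment, height_cm = row
        assert treatment in {"control", "treated"}, f"unknown treatment: {treatment}"
        assert 0 < height_cm < 500, f"implausible height: {height_cm}"


def group_means(rows):
    """Return the mean height per treatment as a dict."""
    if not rows:
        raise ValueError("no rows to summarise")
    groups = defaultdict(list)
    for treatment, height_cm in rows:
        groups[treatment].append(height_cm)
    return {name: sum(values) / len(values) for name, values in groups.items()}


def test_data():
    validate_data(ROWS)


def test_code():
    # Correct output type.
    assert isinstance(group_means(ROWS), dict)
    # Handles errors: empty input is rejected explicitly.
    try:
        group_means([])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for empty input")


def test_analysis():
    # Returns the known answer for a dataset small enough to check by hand.
    result = group_means(ROWS)
    assert abs(result["control"] - 11.0) < 1e-9
    assert abs(result["treated"] - 16.0) < 1e-9
```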
Test models and algorithms with simulated data
Is the model appropriate to answer the research question?
How do assumption violations affect the results?
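One way to approach both questions is to simulate data where the truth is known, fit the model, and compare. The sketch below assumes a simple straight-line model fitted by ordinary least squares; the `simulate` and `fit_line` functions, the parameter values and the `curvature` argument used to break the linearity assumption are all choices made for this example.

```python
import random


def simulate(n, intercept, slope, noise_sd, seed, curvature=0.0):
    """Simulate y = intercept + slope*x + curvature*x^2 + Gaussian noise."""
    rng = random.Random(seed)  # fixed seed keeps the test reproducible
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [
        intercept + slope * x + curvature * x**2 + rng.gauss(0, noise_sd)
        for x in xs
    ]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares for a straight line; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def test_recovers_known_parameters():
    # Assumptions hold: the fit should recover the true values.
    xs, ys = simulate(n=2000, intercept=1.0, slope=2.0, noise_sd=1.0, seed=42)
    intercept, slope = fit_line(xs, ys)
    assert abs(slope - 2.0) < 0.1
    assert abs(intercept - 1.0) < 0.3


def test_violation_biases_slope():
    # Linearity is violated: the straight-line slope no longer matches 2.0.
    xs, ys = simulate(
        n=2000, intercept=1.0, slope=2.0, noise_sd=1.0, seed=42, curvature=0.2
    )
    _, slope = fit_line(xs, ys)
    assert abs(slope - 2.0) > 1.0
```

The second test documents how badly the estimate degrades under a specific violation, which is often more informative than knowing only that the model works when its assumptions hold.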
Multiple scripts, folders, datasets
Interdependent analyses
What to do when a part changes?
Turn your analysis into a pipeline, i.e. a series of steps linked through their inputs and outputs and executed in the correct order
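The pipeline idea can be sketched with the standard library's `graphlib` module (Python 3.9+): each step declares which earlier steps it depends on, and a topological sort decides the execution order. The step names, the toy data and the `run_pipeline` function are hypothetical. Dedicated workflow tools typically also cache results and rerun only the steps affected by a change, which this sketch does not do.

```python
from graphlib import TopologicalSorter


def load_data(inputs):
    return [3.0, 5.0, 4.0, -1.0]  # made-up values; -1.0 stands in for a bad record


def clean_data(inputs):
    return [value for value in inputs["load_data"] if value > 0]


def fit_model(inputs):
    data = inputs["clean_data"]
    return sum(data) / len(data)


def make_report(inputs):
    return f"mean = {inputs['fit_model']:.1f} (n = {len(inputs['clean_data'])})"


# Each step maps to (function, names of the steps whose output it needs).
# The dict order is deliberately scrambled: dependencies decide the run order.
STEPS = {
    "make_report": (make_report, ["fit_model", "clean_data"]),
    "fit_model": (fit_model, ["clean_data"]),
    "load_data": (load_data, []),
    "clean_data": (clean_data, ["load_data"]),
}


def run_pipeline(steps):
    """Run every step after its dependencies; return (results, execution order)."""
    graph = {name: set(deps) for name, (_, deps) in steps.items()}
    results, order = {}, []
    for name in TopologicalSorter(graph).static_order():
        func, deps = steps[name]
        results[name] = func({dep: results[dep] for dep in deps})
        order.append(name)
    return results, order


if __name__ == "__main__":
    results, order = run_pipeline(STEPS)
    print(" -> ".join(order))
    print(results["make_report"])
```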
Statisticians and data scientists need more than statistical skills
Software engineering practices are crucial for work sustainability and for collaboration
These skills should be taught in statistics degrees (but there are lots of resources out there!)
Richard McElreath’s talk about Science as Amateur Software Development
The Turing Way handbook to reproducible, ethical and collaborative data science
Bruno Rodrigues’ book on Building reproducible analytical pipelines with R
The Good Research Code Handbook by Patrick Mineault