A Data science projects
This appendix sketches the types, criteria, and some ideas and resources for interesting and engaging data science (DS) projects. Whereas different project types (Section A.1) correspond to different courses and curricula, all types of DS projects share basic desiderata and requirements (Section A.2).
A.1 Project types
We distinguish between three types of DS projects. Whereas basic DS projects (Section A.1.1) are suitable for an introductory course (e.g., i2ds 1: Basics), applied DS projects (Section A.1.3) are more advanced and suitable for a follow-up course (e.g., i2ds 2: Applications). By contrast, a data visualization project (Section A.1.2) meets the goals and requirements of a course on data visualization.
A.1.1 Basic DS project
Basic DS projects should cover the contents of Part 1: Foundations and Part 4: Wrangling data, as well as selected elements of Part 2: Programming and Part 3: Visualizing data. Such projects should primarily find and explore an interesting real-world dataset (as in Chapter 15: Exploring data). Suitable datasets are non-trivial (i.e., contain at least a few dozen or hundreds of observations and a mix of character/factor and numeric variables) and should raise interesting research questions. Ideally, the project should explain and document all steps and answer the question(s) that motivated it.
Key steps
More specifically, proceed as follows to conduct a basic DS project:
- Find some non-trivial data that may be shared and raises interesting questions (see below for links and suggestions).
- Conduct an exploratory data analysis (EDA, as in Chapter 15: Exploring data) in a reproducible Rmd-file:
  - Formulate some questions to be answered by the data
  - Read in the data (e.g., from a CSV-file)
  - Describe the data (its dimensions, observations, variable types, etc.)
  - Tidy and transform the data
  - Visualize key aspects of the data
  - Answer your questions (or explain why they remain unanswered)
- Document and explain all steps (in text and commented code) and note which R packages and functions are being used to solve which tasks.
- Include links and references to data sources, R packages, and all other sources used.
In short, a basic DS project should locate some real-world data that is non-trivial, may be shared, and raises interesting questions. Then explore the data to answer the questions in a well-documented and reproducible fashion.
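The key steps of an exploratory data analysis could be sketched in base R as follows. This is only a minimal illustration under stated assumptions: the data frame `raw`, its variables (`id`, `group`, `height`), and the temporary CSV-file are hypothetical stand-ins for a real dataset, and a real project would embed such code in the chunks of an Rmd-file.

```r
# Hypothetical example data (NOT real survey data), written to a
# temporary CSV-file so that this sketch is self-contained:
raw <- data.frame(
  id     = 1:8,
  group  = c("a", "b", "a", "b", "a", "b", "a", "b"),
  height = c(171, 165, 180, NA, 158, 176, 169, 183)
)
csv_file <- tempfile(fileext = ".csv")
write.csv(raw, csv_file, row.names = FALSE)

# Question: Do the two groups differ in their mean height?

# 1. Read in the data:
dat <- read.csv(csv_file, stringsAsFactors = FALSE)

# 2. Describe the data (dimensions, variable types, summaries):
dim(dat)
str(dat)
summary(dat)

# 3. Tidy and transform the data:
dat$group <- factor(dat$group)        # character to factor
dat_ok <- dat[!is.na(dat$height), ]   # drop cases with missing height

# 4. Visualize key aspects of the data:
boxplot(height ~ group, data = dat_ok,
        main = "Height by group (hypothetical data)",
        xlab = "Group", ylab = "Height")

# 5. Answer the question (descriptively):
mean_by_group <- tapply(dat_ok$height, dat_ok$group, mean)
mean_by_group
```

In an actual project, each step would be accompanied by text explaining what is being done and why.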
See Appendices B3 of the ds4psy textbook (Neth, 2026a) for links to potential data sources.
Using the i2ds survey data
An example of a suitable dataset is provided by the i2ds_survey data included in the ds4psy package (Neth, 2026b).
When using this data, select an appropriate subset of the data and complete the following tasks:
- Describe the (chosen subset of the) sample
- Formulate some (non-trivial) hypotheses
- Evaluate each hypothesis (descriptively and/or visually)
In general, completing a basic DS project uses a mix of text and code to:
- describe the data (i.e., the variables used to address some hypothesis)
- explain what you are doing, why you are doing it, and how you are doing it
- draw conclusions (but also mention limitations)
- provide references (to background material, scientific theories, but also R packages)
A non-trivial hypothesis involves two or more variables and is justified by some narrative or theory. Examples of such hypotheses are:
- Is some preference (e.g., nutritional, political) correlated with another?
- Does a person’s zodiac sign correspond to some habit or personality trait?
- Are participants’ expressed art preferences consistent or circular (with respect to art styles)?
Hypotheses can be addressed descriptively or visually (i.e., statistical tests are not required in this course).
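As an illustration, a hypothesis relating two categorical preferences could be evaluated descriptively and visually along the following lines. The data frame `prefs` and its variables `diet` and `drink` are hypothetical and do not correspond to actual variables of the i2ds survey data.

```r
# Hypothetical preference data (invented for illustration only):
prefs <- data.frame(
  diet  = c("vegan", "omnivore", "vegetarian", "omnivore",
            "vegan", "omnivore", "vegetarian", "omnivore"),
  drink = c("tea", "coffee", "tea", "coffee",
            "tea", "tea", "coffee", "coffee")
)

# Descriptive evaluation: cross-tabulate both variables
tab <- table(diet = prefs$diet, drink = prefs$drink)
tab

# Row-wise proportions (drink preference within each diet):
prop <- prop.table(tab, margin = 1)
round(prop, 2)

# Visual evaluation: a mosaic plot of the contingency table
mosaicplot(tab, main = "Drink preference by diet (hypothetical data)",
           color = TRUE)
```

Whether any pattern in such a table supports the hypothesis should be discussed in the text, together with the limitations of a small sample.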
As basic DS projects focus on data transformation and visualization skills, top grades are reserved for using more difficult data and variables. For instance, to show off your data-wrangling skills, you could
- use the raw data (rather than the pre-processed i2ds_survey data included in the ds4psy package)
- use more difficult variables that require some pre-processing (e.g., by turning text into numbers, categorizing values into factors)
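The two kinds of pre-processing mentioned above (turning text into numbers and categorizing values into factors) might look as follows in base R. The variable `age_txt`, its values, and the chosen age categories are hypothetical and only serve to illustrate the idea.

```r
# Hypothetical raw responses, entered as free text:
raw <- data.frame(
  age_txt = c("21", "34 years", "19", "n/a", "45"),
  stringsAsFactors = FALSE
)

# Turning text into numbers: keep only digits, then convert
digits <- gsub("[^0-9]", "", raw$age_txt)
digits[digits == ""] <- NA        # responses without digits become missing
raw$age <- as.numeric(digits)

# Categorizing values into a factor (intervals are closed on the right):
raw$age_group <- cut(raw$age,
                     breaks = c(0, 20, 30, Inf),
                     labels = c("up to 20", "21-30", "over 30"))

raw
```

Stripping all non-digit characters is a crude rule that would mishandle responses like "21.5" or "20-25", so a real project should inspect the raw values and document how each special case is handled.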
A.1.2 Data visualization project
A data visualization project essentially consists in visually exploring a set of data. This type of DS project fits the goals and topics of a course on data visualization (i.e., focusing on Part 3: Visualizing data).
Data choice
All tasks of a visualization project must be based on a single set of data:
- Find a suitable dataset (e.g., the i2ds_survey data from the ds4psy package)
- Describe the data (its source, cases, and variables) and explain your choice
Main tasks
The following 3 tasks are required for passing this course:
- Create at least 5 different types of visualizations (and explain what they show and why they are showing it as they do)
- Create at least 2 advanced visualizations in ggplot2 (combining multiple geoms, using faceting, etc.)
- Create at least 1 visualization that uses a ggplot2 extension
Aspects to consider for all visualization tasks:
- Select required cases and variables and transform data in appropriate ways before using visualization functions
- Explain what is being shown, how it is shown, and why it is shown in this way
- Describe all graphical elements (e.g., provide labels for axes, legends, etc.)
- Use suitable and consistent colors and color palette(s)
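To illustrate what an advanced ggplot2 visualization that respects these aspects could look like, the following sketch combines two geoms with faceting, explicit labels, and a single color palette. It assumes that the ggplot2 package is installed and uses the `mtcars` data that ships with R merely as a stand-in for a project's own dataset.

```r
library(ggplot2)

# Transform the data before plotting (here: label a 0/1 variable):
dat <- mtcars
dat$am <- factor(dat$am, levels = c(0, 1),
                 labels = c("automatic", "manual"))

p <- ggplot(dat, aes(x = wt, y = mpg)) +
  geom_point(aes(color = factor(cyl)), size = 2) +      # 1st geom: raw cases
  geom_smooth(method = "lm", formula = y ~ x,
              se = FALSE, color = "grey40") +           # 2nd geom: linear trend
  facet_wrap(~ am) +                                    # one panel per transmission type
  scale_color_brewer(palette = "Dark2", name = "Cylinders") +
  labs(title = "Fuel efficiency by weight and transmission",
       x = "Weight (in 1000 lbs)", y = "Miles per gallon") +
  theme_minimal()

p
```

The accompanying text should then explain what is shown (the relation between weight and fuel efficiency), how it is shown (points plus a linear trend per panel), and why this form was chosen.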
Bonus tasks
The following tasks are optional, but can further boost your final grade:
- Create at least 2 types of visualization in both base R and with the ggplot2 package (and explain the challenges you encountered and how you solved them)
- Create a custom visualization function by wrapping base R or ggplot2 code with appropriate arguments (e.g., for data, layouts, or aesthetic parameters)
- Create a custom visualization function for a box-and-arrow diagram (e.g., a flowchart or tree diagram, see the FFTrees or riskyr packages for examples)
- Create both a good and bad version of some visualization (and justify your evaluations)
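A custom visualization function that wraps base R code could be sketched as follows. The function name `plot_bars()` and its arguments are hypothetical choices for this illustration, and the `mtcars` data merely serves as an example.

```r
# A hypothetical wrapper around base R's barplot():
# - data: a data frame
# - var:  name of a (categorical) variable in data, as a string
# - col, main: aesthetic parameters with defaults
# - ...:  further arguments passed on to barplot()
plot_bars <- function(data, var, col = "steelblue", main = NULL, ...) {

  # Check inputs:
  stopifnot(is.data.frame(data), var %in% names(data))

  # Transform data: count the cases per category
  counts <- table(data[[var]])

  # Provide a default title:
  if (is.null(main)) {
    main <- paste("Counts of", var)
  }

  barplot(counts, col = col, main = main, ylab = "Frequency", ...)

  invisible(counts)  # return the counts invisibly for further use
}

# Usage:
counts <- plot_bars(mtcars, "cyl", xlab = "Number of cylinders")
counts
```

Returning the plotted values invisibly is a design choice that makes the function easier to check and to reuse in later steps.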
A.1.3 Applied DS project
Applied DS projects are more advanced insofar as they integrate various chapters and topics and go beyond existing examples. Ideally, an advanced DS project should be based on the contents of Part 6: Applications and should have or imply real-world applications. Such projects could create new models or simulations, contribute to existing or create new R packages, or use existing R functions in interactive applications (e.g., using Shiny).
Ideas for applied DS projects
The following ideas for applied DS projects are based on chapters of Part 6: Applications:
- Comparing strategies in games (e.g., heuristic vs. learning agents)
- Performing a social network analysis
- Creating a mate search simulation
- Creating a foraging model (e.g., comparing heuristic or RL approaches in single vs. multi-agent simulations)
- Predicting the stock market (and evaluating portfolio performance)
- Plotting text (see Section 24.3)
- Conducting a sentiment analysis
- Creating artistic visualizations (see Section 24.4.2)
Projects relating to R packages
Projects can also collect or provide new data and revise or extend functions from existing R packages. Related ideas for advanced DS projects include:
- Collecting new data (and providing them as an R package)
- Creating new data processing or visualization functions
- Contributing to existing R packages (e.g., see the R packages ds4psy, unikn, unicol, FFTrees, or riskyr)
- Creating an interactive application for existing R functions (e.g., using R Shiny, see Chapter 25)
- Creating an R package (see Appendix B: Developing R packages)
A.2 Desiderata and requirements
All types of DS projects share a common set of desiderata and requirements. The success of any DS project will be evaluated on its content and formal characteristics, as well as the details of its timely submission.
A.2.1 Content
Key ingredients of a successful DS project include:
- Ask an interesting question that can be answered within a course project
- Sketch the analysis, method or model that is suited to answer the question
- Find or generate suitable data
- Implement the analysis, method, or model (with comments)
- Consider including data summaries and visualizations
- Interpret your results to answer your original hypotheses or questions
- Conclude by mentioning limitations and/or possible next steps
A.2.2 Form
Formal requirements for all types of projects include:
- Use R, an RStudio project, and RMarkdown to implement your project
- Use a reproducible Rmd input file that presents your procedure in a transparent fashion (in text and R code)
- Begin by loading all required R packages and data files or packages
- Document your methodology, intermediate steps, and conclusions (in both text and code)
- Explain which R functions and packages you have been using for solving which tasks
- Include links and references to all data sources, R packages, and other sources
- Generate a self-sufficient output file (in HTML or PDF format)
A.2.3 Submission
Your project should be implemented as an RStudio project and store all required files in a project directory.
To submit your project, you should create a ZIP-archive of your project directory that includes all files and sub-directories.
The name of your archive should indicate your name, the type of DS project, and some descriptive elements (e.g., a keyword and date).
Specifically, this implies the following steps:
- Include all text and code in an RStudio project and a single RMarkdown file:
  - Store your input file, output file, and all other required files (e.g., data or images) in a single project directory.
  - Ensure that your Rmd-input file successfully reads in your data and compiles into an output file (using only R packages loaded in it).
  - Compile your Rmd-input file into a single and self-contained output file in HTML or PDF format.
- Create a ZIP-archive that contains all project files (e.g., data, input/output files, and image files).
- Name your ZIP-archive so that it indicates
  - your name (in LastName-FirstName format),
  - your course (e.g., i2ds_1, i2ds_2, or vis4psy),
  - some descriptive keywords indicating the title or topic of your DS project, and
  - the current date (e.g., LastName-FirstName_i2ds_2_sim-patch-foraging_yymmdd.zip).
- Email your ZIP-archive to the course instructor (with the subject line indicating your course) prior to the expiration of the submission deadline.
The deadline for submitting your archived project (in the current semester) is Friday, July 31, 2026 (at 23:59).
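The naming scheme for the archive can also be generated in R. The following sketch uses a hypothetical helper function `make_archive_name()` (not part of any package) that assembles the required elements and formats the date as yymmdd:

```r
# Hypothetical helper for composing an archive name of the form
# LastName-FirstName_course_keywords_yymmdd.zip:
make_archive_name <- function(last, first, course, keywords,
                              date = Sys.Date()) {
  paste0(last, "-", first, "_", course, "_", keywords, "_",
         format(date, "%y%m%d"), ".zip")
}

# Usage (with placeholder names and a fixed date):
archive <- make_archive_name("LastName", "FirstName", "i2ds_2",
                             "sim-patch-foraging",
                             date = as.Date("2026-07-31"))
archive
# "LastName-FirstName_i2ds_2_sim-patch-foraging_260731.zip"

# The archive itself could then be created from within the project
# directory, e.g., with utils::zip() (which relies on an external
# zip program being available on the system):
# zip(zipfile = archive, files = list.files(recursive = TRUE))
```

Creating the archive with the file manager of your operating system works just as well. Either way, the archive should contain all files and sub-directories of the project.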
A.3 Advice
The best advice for a successful data science project is to find and do something that you are really interested in. Beyond that, start early, document what you are doing, explain why you are doing it in this way, and — most importantly — have fun!
A.4 Resources
This section provides pointers to additional resources:
- See the appendices of the Data Science for Psychologists (ds4psy) textbook (Neth, 2026a):
  - Appendix C: Data science project provides general advice for successful data science projects
  - Appendix B.3.3 links to potential data sources. Please make sure that your data may be shared and include appropriate references to credit its creators or providers.
- Inspirations for models and simulations:
  - Page (2018) contains dozens of models that could be implemented in simulations
  - The Learning Machines blog provides many inspirations that can be developed into projects
- Tools for creating R packages and interactive applications:
  - Using devtools for creating R packages (Wickham, 2015; Wickham, Hester, Chang, et al., 2026; Wickham & Bryan, in progress); see Appendix B
  - Using R Shiny for creating interactive dashboards (Chang et al., 2026; Wickham, 2021)


