SIB Training Needs Survey Analysis

Summary

This report analyses responses to the SIB training needs survey. The survey collected responses from researchers across career stages and countries, asking about training needs across five topic areas (omics, computational methods & AI, data management, biomedicine & pathogens, and biodiversity & ecology), as well as preferred learning formats and perceived barriers to participation in SIB courses.

Analysis steps:

Response timing: distribution of survey completion times, showing when responses were collected.
Respondent overview: frequency breakdowns of career stage, country, Swiss canton, and organisation type, characterising the respondent pool.
Training needs (all topics): diverging Likert bar charts comparing response distributions across all five topic areas, broken down by Switzerland vs. other countries, career stage, time since last SIB course attendance, and selected Swiss cantons.
Participation barriers: the same breakdowns applied to perceived barriers to attending SIB courses.
Preferred learning formats: the same breakdowns applied to format preferences.
Respondent clustering: graph-based clustering of respondents by course-topic preferences, with each cluster profiled by training needs, composition, participation barriers, and learning-format preferences.

Response timing

The histogram below shows when respondents completed the survey, based on the recorded completion timestamp. This helps identify whether responses arrived in a single burst (e.g. following a specific outreach effort) or were spread over time.

Figure 1: Distribution of survey submission times (completion time).

Respondent overview

The four charts below describe the composition of the respondent pool across career stage, country of work, Swiss canton (Swiss respondents only), and organisation type. Countries with fewer than 3 respondents are omitted from the country chart.

Figure 3: Top countries of respondents (countries with ≥ 3 respondents shown).

Figure 5: Time since last SIB course attendance across four respondent groupings. (A) Switzerland vs. other countries. (B) Canton (cantons with > 10 respondents only). (C) Career stage (Bachelor’s and Master’s students grouped as ‘Other’). (D) Organisation type.

Figure 6: Organisation type of respondents.

Likert-type questions

Each Likert question consists of a set of sub-questions (topics, barriers, or formats) rated on a 5-point scale. Results are displayed as diverging bar charts, with negative responses extending left and positive responses extending right of the centre line. The neutral/middle category is split evenly between both sides. Sub-questions on the y-axis are ordered by their mean scale score, so the topic or item with the most positive overall rating appears at the top.
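The layout and ordering described above can be sketched in a few lines of Python. This is a minimal illustration, not the report's plotting code: it assumes answers are already coded 1 to 5 (1 = most negative), and the helper names and example data are hypothetical. `diverging_segments` returns the share of a sub-question's responses drawn left and right of the centre line, with the neutral category split evenly; `order_by_mean` sorts sub-questions so the highest mean score comes first.

```python
from statistics import mean


def diverging_segments(responses, levels=5):
    """Return (left, right) shares of responses for one sub-question.

    Assumes an odd number of levels coded 1..levels, with 1 the most
    negative answer. Half of the neutral (middle) category is added to
    each side, so left + right == 1.
    """
    n = len(responses)
    counts = [responses.count(k) for k in range(1, levels + 1)]
    mid = levels // 2  # zero-based index of the neutral category
    left = sum(counts[:mid]) + counts[mid] / 2
    right = sum(counts[mid + 1:]) + counts[mid] / 2
    return left / n, right / n


def order_by_mean(items):
    """Sort sub-question names by mean score, highest first (top of the chart)."""
    return sorted(items, key=lambda name: mean(items[name]), reverse=True)


# Hypothetical coded answers for three sub-questions (not survey data).
items = {
    "Proteomics": [1, 2, 3, 4, 5],
    "Single-cell": [4, 5, 5, 3, 4],
    "Phylogenetics": [2, 2, 3, 1, 4],
}
for name in order_by_mean(items):
    left, right = diverging_segments(items[name])
    print(f"{name:15s} left={left:.2f} right={right:.2f}")
```

Splitting the neutral category evenly keeps each bar centred on the midpoint of the scale, so items with mostly neutral answers sit symmetrically around the centre line instead of being pushed to one side.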

Each section is broken down by three grouping variables: Switzerland vs. other countries, career stage, and time since last SIB course attendance, followed by a comparison of selected Swiss cantons. All charts show all respondents by default. The career stage and SIB course charts include a Swiss only toggle button that restricts the view to Swiss respondents, since SIB courses primarily target the Swiss research community.

Training needs: all topics

The following charts pool all five training topic areas (omics, computational methods & AI, data management, biomedicine & pathogens, biodiversity & ecology) into a single plot. Each sub-question corresponds to a specific topic within one of these areas.

Switzerland vs. non-Switzerland

Training need ratings across all topic areas, comparing Swiss respondents with those working elsewhere. Use Show percentages to toggle between raw counts and percentages.

By career stage

Training need ratings split by career stage. Bachelor’s and Master’s students are merged into ‘Other’. Use Show percentages to switch between raw counts and percentages, and Swiss only to restrict the view to Swiss respondents.

By time since last SIB course

Training need ratings split by time since last SIB course attendance. Use Show percentages to switch between raw counts and percentages, and Swiss only to restrict the view to Swiss respondents.

By canton (selected)

Training need ratings comparing the six Swiss cantons with the most respondents: Zurich, Bern, Vaud, Geneva, Fribourg, and Basel-Stadt. Use Show percentages to switch between raw counts and percentages, and Positive only to show only the top two response categories.
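As an illustration of what the Show percentages and Positive only options compute, the sketch below converts per-group counts to percentages and sums the top two response categories. It assumes answers coded 1 to 5 and uses made-up rows; `group_summary` is a hypothetical helper, not part of the report's code. Percentages matter here because groups differ in size, so raw counts from a large group and a small one are not directly comparable.

```python
from collections import Counter


def group_summary(rows, levels=5, positive=(4, 5)):
    """Summarise coded Likert answers per group.

    rows: iterable of (group, score) pairs, scores coded 1..levels
    (a hypothetical input format). Returns, per group, the raw counts,
    the percentage for each level, and the combined percentage of the
    `positive` levels (the top two categories by default).
    """
    counts_by_group = {}
    for group, score in rows:
        counts_by_group.setdefault(group, Counter())[score] += 1

    summary = {}
    for group, counts in counts_by_group.items():
        n = sum(counts.values())
        pct = {k: 100 * counts[k] / n for k in range(1, levels + 1)}
        summary[group] = {
            "n": n,
            "counts": {k: counts[k] for k in range(1, levels + 1)},
            "pct": pct,
            "top_two_pct": sum(pct[k] for k in positive),
        }
    return summary


# Invented example rows, not survey results.
rows = [("Zurich", 5), ("Zurich", 4), ("Zurich", 2), ("Zurich", 3),
        ("Bern", 1), ("Bern", 2), ("Bern", 3), ("Bern", 4)]
for group, s in group_summary(rows).items():
    print(group, s["n"], s["top_two_pct"])
```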

Participation barriers

Respondents rated how much each of a set of factors would be a barrier to attending a SIB training course, on a scale from “No barrier” to “Critical barrier”. The charts below show the distribution of ratings grouped by region, career stage, SIB course history, and selected Swiss cantons.

Switzerland vs. non-Switzerland

Barrier ratings comparing Swiss respondents with those working elsewhere. Use Show percentages to toggle between raw counts and percentages.

By career stage

Barrier ratings split by career stage. Bachelor’s and Master’s students are merged into ‘Other’. Use Show percentages to switch between raw counts and percentages, and Swiss only to restrict the view to Swiss respondents.

By time since last SIB course

Barrier ratings split by time since last SIB course attendance. Use Show percentages to switch between raw counts and percentages, and Swiss only to restrict the view to Swiss respondents.

By canton (selected)

Barrier ratings comparing the six Swiss cantons with the most respondents. Use Show percentages to toggle between raw counts and percentages.

Preferred learning formats

Respondents indicated how much they like or dislike each learning format on a scale from “Strongly dislike” to “Strongly like”. Formats are not mutually exclusive: respondents could express a preference for multiple formats. The charts below compare preferences across region, career stage, SIB course history, and selected Swiss cantons.

Switzerland vs. non-Switzerland

Format preference ratings comparing Swiss respondents with those working elsewhere. Use Show percentages to toggle between raw counts and percentages.

By career stage

Format preference ratings split by career stage. Bachelor’s and Master’s students are merged into ‘Other’. Use Show percentages to switch between raw counts and percentages, and Swiss only to restrict the view to Swiss respondents.

By time since last SIB course

Format preference ratings split by time since last SIB course attendance. Use Show percentages to switch between raw counts and percentages, and Swiss only to restrict the view to Swiss respondents.

By canton (selected)

Format preference ratings comparing the six Swiss cantons with the most respondents. Use Show percentages to toggle between raw counts and percentages.

Clustering respondents by course preferences

To group respondents by training-course interests, we use a graph-based approach:

Encode each respondent’s Likert answers for all course-topic items as numeric values.
Build one feature vector per respondent (one dimension per topic item).
Compute respondent-to-respondent correlations across these features.
Build a weighted graph where nodes are respondents and edges connect respondents with high positive correlation.
Detect communities with Louvain clustering on the weighted graph.
Profile each cluster using training-need ratings, respondent composition, participation barriers, and learning-format preferences.
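Steps 1 to 4 can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the report's implementation: Pearson correlation is assumed (the report does not say which coefficient it uses), every respondent is assumed to have answered every topic item, and the Likert labels and respondents are hypothetical. The threshold of 0.322 is the value reported in the network summary; how that value was chosen is not stated.

```python
from itertools import combinations
from statistics import mean


def encode(answers, levels):
    """Step 1: map Likert labels to 1..len(levels) by their position in `levels`."""
    return [levels.index(a) + 1 for a in answers]


def pearson(x, y):
    """Pearson correlation between two equal-length score vectors."""
    mx, my = mean(x), mean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        # Undefined when a respondent gives the same answer to every item;
        # treated here as "no correlation" (an assumption).
        return 0.0
    return sum(a * b for a, b in zip(dx, dy)) / (sxx * syy) ** 0.5


def build_graph(vectors, threshold):
    """Steps 3-4: keep an edge only where the correlation exceeds `threshold`.

    vectors: {respondent_id: [score per topic item]} (step 2).
    Returns {(id_a, id_b): correlation} for the retained edges.
    """
    edges = {}
    for a, b in combinations(sorted(vectors), 2):
        r = pearson(vectors[a], vectors[b])
        if r > threshold:
            edges[(a, b)] = r
    return edges


levels = ["Not at all", "Slightly", "Moderately", "Very", "Extremely"]  # hypothetical labels
vectors = {
    "r1": encode(["Extremely", "Very", "Not at all", "Slightly"], levels),
    "r2": encode(["Very", "Extremely", "Not at all", "Not at all"], levels),
    "r3": encode(["Not at all", "Slightly", "Extremely", "Very"], levels),
}
print(build_graph(vectors, threshold=0.322))
```

Correlating respondents (rather than measuring distances between raw scores) links people whose relative ranking of topics agrees, even if one of them rates everything higher overall.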

Running this workflow on the survey responses gives the following network summary:

metric                        value
nodes                         258
edges                         6581
edge_correlation_threshold    0.322
communities                   5

Figure 7: Respondent network from course-topic preference correlations. Wider edges indicate stronger positive correlation; colors indicate Louvain communities.
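The Louvain step itself is not shown in the report. The plain-Python sketch below illustrates only the first, local-moving phase of the Louvain method: starting from singleton communities, each node is repeatedly moved to the neighbouring community with the largest modularity gain until no move improves modularity. Full Louvain then collapses each community into a single node and repeats; in practice a library implementation such as networkx's `louvain_communities` or igraph's `cluster_louvain` would normally be used. The function name, the `resolution` default, and the toy graph are illustrative assumptions.

```python
import random
from collections import defaultdict


def louvain_local_moving(nodes, edges, resolution=1.0, seed=0):
    """First (local-moving) phase of Louvain on a weighted undirected graph.

    nodes: list of node ids; edges: {(a, b): weight}, no self-loops.
    Returns a list of communities (sets of node ids).
    """
    adj = defaultdict(dict)
    for (a, b), w in edges.items():
        adj[a][b] = w
        adj[b][a] = w
    m = sum(edges.values())  # total edge weight
    if m == 0:
        return [{n} for n in nodes]

    degree = {n: sum(adj[n].values()) for n in nodes}
    community = {n: i for i, n in enumerate(nodes)}  # start from singletons
    tot = {i: degree[n] for i, n in enumerate(nodes)}  # summed degree per community

    rng = random.Random(seed)
    order = list(nodes)
    moved = True
    while moved:
        moved = False
        rng.shuffle(order)
        for n in order:
            k = degree[n]
            current = community[n]
            # Edge weight from n to each neighbouring community.
            links = defaultdict(float)
            for nbr, w in adj[n].items():
                links[community[nbr]] += w
            tot[current] -= k  # take n out of its community
            # Modularity gain of inserting n into community c:
            # k_in / m - resolution * tot(c) * k / (2 m^2)
            candidates = set(links) | {current}
            gains = {
                c: links[c] / m - resolution * tot[c] * k / (2 * m * m)
                for c in candidates
            }
            best = current
            for c in candidates:
                if gains[c] > gains[best] + 1e-12:  # move only on a strict gain
                    best = c
            tot[best] += k
            if best != current:
                community[n] = best
                moved = True

    groups = defaultdict(set)
    for n, c in community.items():
        groups[c].add(n)
    return list(groups.values())


# Toy graph: two triangles joined by one weak edge.
nodes = ["a", "b", "c", "d", "e", "f"]
edges = {("a", "b"): 1.0, ("a", "c"): 1.0, ("b", "c"): 1.0,
         ("d", "e"): 1.0, ("d", "f"): 1.0, ("e", "f"): 1.0,
         ("c", "d"): 0.1}
print(sorted(sorted(c) for c in louvain_local_moving(nodes, edges)))
# [['a', 'b', 'c'], ['d', 'e', 'f']]
```

Because a node only moves when modularity strictly increases, the loop is guaranteed to terminate; the random visiting order can change which equivalent partition is found on less clear-cut graphs.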

Cluster characterization by training needs

The following charts show training-need ratings for each cluster (clusters with > 5 respondents only). Sub-questions on the y-axis are ordered by mean score within each cluster, so the highest-rated topic appears at the top.

Figure 8: Training needs per cluster (clusters with > 5 respondents only).

Figure 9: Training needs per cluster (clusters with > 5 respondents only).

Figure 10: Training needs per cluster (clusters with > 5 respondents only).

Figure 11: Training needs per cluster (clusters with > 5 respondents only).

Cluster descriptions

Each cluster can be characterized by the course topics its respondents rate most highly. These characterizations are used to assign a descriptive label to each cluster.
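Ranking topics within each cluster can be sketched as below, assuming coded scores and that every respondent rated every topic. `top_topics` is a hypothetical helper; the scores and cluster memberships in the example are invented, and the descriptive labels themselves are still assigned by reading the ranked lists.

```python
from collections import defaultdict
from statistics import mean


def top_topics(scores, cluster_of, k=3, more_than=5):
    """Rank course topics within each cluster by mean score.

    scores: {respondent: {topic: coded score}}
    cluster_of: {respondent: cluster id}
    Only clusters with more than `more_than` members are returned.
    Ties are broken alphabetically by topic name.
    """
    members = defaultdict(list)
    for respondent, cluster in cluster_of.items():
        members[cluster].append(respondent)

    ranked = {}
    for cluster, respondents in members.items():
        if len(respondents) <= more_than:
            continue
        topics = scores[respondents[0]]  # assumes every respondent rated every topic
        means = {t: mean(scores[r][t] for r in respondents) for t in topics}
        ranked[cluster] = sorted(means, key=lambda t: (-means[t], t))[:k]
    return ranked


# Invented example: six members in cluster 1, two in cluster 2.
scores = {f"r{i}": {"AI-assisted coding": 5, "FAIR data": 4, "Proteomics": 2} for i in range(6)}
scores.update({f"s{i}": {"AI-assisted coding": 1, "FAIR data": 2, "Proteomics": 5} for i in range(2)})
cluster_of = {r: (1 if r.startswith("r") else 2) for r in scores}
print(top_topics(scores, cluster_of, k=2))
```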

Cluster 1: AI and Data Infrastructure

Interests: This group is primarily interested in artificial intelligence and data engineering. Their top interests include AI-assisted coding, advanced AI techniques such as retrieval-augmented generation (RAG) and agentic AI, and knowledge extraction using LLMs. They also show a preference for operational topics such as GitHub Actions (CI/CD), cloud computing, FAIR data principles, and reproducible workflows.
Likely Domain: They likely work as Bioinformaticians, Data Engineers, or Computational Scientists who focus on building platforms, infrastructure, and AI-driven tools for life science research.

Cluster 2: Infectious Disease Genomics

Interests: This group is primarily interested in infectious disease research. Their top interests include phylogenetics, antimicrobial resistance (AMR), viral bioinformatics, and pathogen data analysis. Although they are also interested in workflow reproducibility, their focus remains on applying these tools to tracking and understanding microbes and viruses.
Likely Domain: They likely work as Microbiologists, Epidemiologists, or Pathogen Genomicists in public health, clinical diagnostics, or environmental microbiology (eDNA).

Cluster 3: Single-cell & Molecular Mechanisms

Interests: This group is primarily interested in high-resolution molecular biology. Their top interests include single-cell multiomics (RNA/ATAC-seq), epigenetics, non-coding RNA, and CRISPR-based screens. They are interested in functional and molecular biology and how different omics techniques integrate.
Likely Domain: They likely work as Molecular Biologists, Geneticists, or Cell Biologists in basic research, developmental biology, or functional genomics.

Cluster 4: Proteomics & AI Imaging

Interests: This group is primarily interested in proteins and metabolites. Their top interests include proteomics and mass spectrometry, with a distinct secondary interest in combining these data with AI and medical image analysis. Of all clusters, they show the strongest interest in using AI for clinical and biomedical applications, particularly those involving multi-modal data.
Likely Domain: They likely work as Proteomics Specialists, Biochemists, or Clinical Researchers who use mass spectrometry and imaging to find biomarkers or understand disease states.

Cluster composition pie charts

The following pie charts show category composition within each cluster (> 5 members only). Answer options come from the metadata JSON, with an explicit “Other” category always allowed. For high-cardinality categories, the top 5 most frequent non-“Other” answers are shown and all remaining answers are merged into “Other”.
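The top-5 + “Other” rule can be sketched as follows. Treating answers that are not listed in the metadata as “Other” is an assumption, as are the helper name and the toy answers (letters stand in for real answer options).

```python
from collections import Counter


def collapse_categories(answers, allowed, top_n=5, other="Other"):
    """Count answers, keeping only the top_n most frequent non-Other options.

    answers: raw answers for one question.
    allowed: answer options from the metadata (assumption: anything not
    listed is treated as `other`). All remaining answers are merged into
    `other`. Ties are broken by first occurrence, as Counter.most_common does.
    """
    cleaned = [a if a in allowed else other for a in answers]
    counts = Counter(a for a in cleaned if a != other)
    keep = {a for a, _ in counts.most_common(top_n)}
    return Counter(a if a in keep else other for a in cleaned)


answers = ["A"] * 6 + ["B"] * 5 + ["C"] * 4 + ["D"] * 3 + ["E"] * 2 + ["F", "G", "not listed"]
print(collapse_categories(answers, allowed=set("ABCDEFG")))
```

For questions with five or fewer distinct answers nothing is merged, so the same rule can be applied to every category without special-casing low-cardinality ones.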

Figure 12: Career stage composition per cluster.

Figure 13: Organisation type composition per cluster.

Figure 14: Swiss vs other composition per cluster.

Figure 15: Canton composition per cluster (top 5 + Other).

Participation barriers and learning-format preferences by cluster

After respondents are clustered by course-topic interests, we can inspect whether each cluster faces distinct participation barriers and has a distinct preferred training format.

Likert plots by cluster (> 5 members only)

The following charts are shown separately for each detected respondent cluster, including only clusters with more than 5 respondents. Plotting each cluster separately means that sub-question ordering is computed within each cluster.

Figure 16: Participation barriers per cluster (clusters with > 5 respondents only).

Figure 17: Participation barriers per cluster (clusters with > 5 respondents only).

Figure 18: Participation barriers per cluster (clusters with > 5 respondents only).

Figure 19: Participation barriers per cluster (clusters with > 5 respondents only).

Figure 20: Learning format preferences per cluster (clusters with > 5 respondents only).

Figure 21: Learning format preferences per cluster (clusters with > 5 respondents only).