SIB Training Needs Survey Analysis
Summary
This report analyses responses to the SIB training needs survey. The survey collected responses from researchers across career stages and countries, asking about training needs across five topic areas (omics, computational methods & AI, data management, biomedicine & pathogens, and biodiversity & ecology), as well as preferred learning formats and perceived barriers to participation in SIB courses.
Analysis steps:
- Response timing — Distribution of survey completion times to understand when responses were collected.
- Respondent overview — Frequency breakdowns of career stage, country, Swiss canton, and organisation type to characterise the respondent pool.
- Training needs (all topics) — Diverging Likert bar charts comparing response distributions across all five topic areas, broken down by Switzerland vs. other, career stage, and time since last SIB course attendance.
- Participation barriers — Same breakdown structure applied to perceived barriers to attending SIB courses.
- Preferred learning formats — Same breakdown structure applied to format preferences.
Response timing
The histogram below shows when respondents completed the survey, based on the recorded completion timestamp. This helps identify whether responses were collected in a single burst (e.g. following a specific outreach) or spread over time.
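As a rough illustration of the binning behind such a histogram, the sketch below counts completions per calendar day. It assumes the timestamps are ISO 8601 strings; the function name `daily_counts` and the example values are made up and are not taken from the survey export.

```python
from collections import Counter
from datetime import datetime


def daily_counts(timestamps):
    """Count completed responses per calendar day, sorted by date."""
    days = [datetime.fromisoformat(ts).date() for ts in timestamps]
    return dict(sorted(Counter(days).items()))


# Made-up timestamps for illustration only.
example = [
    "2025-03-01T09:15:00",
    "2025-03-01T17:40:00",
    "2025-03-02T08:05:00",
    "2025-03-10T12:30:00",
]

counts = daily_counts(example)
for day, n in counts.items():
    # One '#' per response gives a quick text histogram.
    print(day.isoformat(), "#" * n)
```

A single tall bar followed by a long tail would suggest a burst after one outreach, while evenly sized bars would suggest responses spread over time.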
Respondent overview
The four charts below describe the composition of the respondent pool across career stage, country of work, Swiss canton (Swiss respondents only), and organisation type. Countries with fewer than 3 respondents are omitted from the country chart.
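The country cutoff can be sketched as a simple frequency filter. This is a minimal sketch, assuming one country value per respondent; the helper name `countries_to_plot` and the example data are hypothetical, and only the threshold of 3 comes from the text.

```python
from collections import Counter

MIN_RESPONDENTS = 3  # cutoff stated in the text


def countries_to_plot(countries, min_n=MIN_RESPONDENTS):
    """Keep countries with at least min_n respondents, most frequent first."""
    counts = Counter(countries)
    kept = {c: n for c, n in counts.items() if n >= min_n}
    # Sort by descending count, then alphabetically for stable ties.
    return dict(sorted(kept.items(), key=lambda kv: (-kv[1], kv[0])))


# Made-up respondent pool for illustration only.
example = ["Switzerland"] * 5 + ["Germany"] * 3 + ["France"] * 2 + ["Italy"]
shown = countries_to_plot(example)
print(shown)
```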
Likert-type questions
Each Likert question consists of a set of sub-questions (topics, barriers, or formats) rated on a 5-point scale. Results are displayed as diverging bar charts, with negative responses extending left and positive responses extending right of the centre line. The neutral/middle category is split evenly between both sides. Sub-questions on the y-axis are ordered by their mean scale score, so the topic or item with the most positive overall rating appears at the top.
Each section is broken down by three grouping variables: Switzerland vs. other countries, career stage, and time since last SIB course attendance. A fourth set of charts compares selected Swiss cantons. All charts show all respondents by default. The career stage and SIB course charts include a Swiss only toggle button to restrict the view to Swiss respondents, because SIB courses primarily target the Swiss research community.
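The diverging layout and the ordering rule described above can be sketched as follows. This is an illustrative sketch rather than the plotting code used for the report: it assumes counts per category are given from most negative to most positive, that categories are scored 1 to 5, and that every item has at least one response. The function names and example counts are made up.

```python
def diverging_segments(counts):
    """Return (start, end) x-positions for the five stacked bar segments.

    The middle category straddles zero, with half on each side, so negative
    categories extend left and positive categories extend right.
    """
    if len(counts) != 5:
        raise ValueError("expected counts for a 5-point scale")
    left_extent = counts[0] + counts[1] + counts[2] / 2
    segments = []
    x = -left_extent
    for c in counts:
        segments.append((x, x + c))
        x += c
    return segments


def mean_score(counts):
    """Mean scale score with categories scored 1 (most negative) to 5."""
    total = sum(counts)
    return sum(score * c for score, c in enumerate(counts, start=1)) / total


# Made-up counts for two hypothetical sub-questions.
items = {
    "Item B": [6, 5, 4, 3, 2],
    "Item A": [2, 3, 4, 6, 5],
}

# Highest mean score first, which corresponds to the top of the y-axis.
ordered = sorted(items, key=lambda name: mean_score(items[name]), reverse=True)
for name in ordered:
    print(name, round(mean_score(items[name]), 2), diverging_segments(items[name]))
```

Dividing each count by the item total before calling `diverging_segments` would give the percentage view instead of raw counts.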
Training needs: all topics
The following charts pool all five training topic areas (omics, computational methods & AI, data management, biomedicine & pathogens, biodiversity & ecology) into a single plot. Each sub-question corresponds to a specific topic within one of these areas.
Switzerland vs. non-Switzerland
Training need ratings across all topic areas, comparing Swiss respondents with those working elsewhere. Use Show percentages to toggle between raw counts and percentages.
By career stage
Training need ratings split by career stage. Bachelor’s and Master’s students are merged into ‘Other’. Use Show percentages to switch between raw counts and percentages, and Swiss only to restrict the view to Swiss respondents.
By time since last SIB course
Training need ratings split by time since last SIB course attendance. Use Show percentages to switch between raw counts and percentages, and Swiss only to restrict the view to Swiss respondents.
By canton (selected)
Training need ratings comparing the six largest Swiss cantons: Zurich, Bern, Vaud, Geneva, Fribourg, and Basel-Stadt. Use Show percentages to switch between raw counts and percentages, and Positive only to restrict the view to the top two response categories.
Participation barriers
Respondents rated how much each of a set of factors would be a barrier to attending a SIB training course, on a scale from “No barrier” to “Critical barrier”. The charts below show the distribution of ratings grouped by region, career stage, SIB course history, and selected Swiss cantons.
Switzerland vs. non-Switzerland
Barrier ratings comparing Swiss respondents with those working elsewhere. Use Show percentages to toggle between raw counts and percentages.
By career stage
Barrier ratings split by career stage. Bachelor’s and Master’s students are merged into ‘Other’. Use Show percentages to switch between raw counts and percentages, and Swiss only to restrict the view to Swiss respondents.
By time since last SIB course
Barrier ratings split by time since last SIB course attendance. Use Show percentages to switch between raw counts and percentages, and Swiss only to restrict the view to Swiss respondents.
By canton (selected)
Barrier ratings comparing the six largest Swiss cantons. Use Show percentages to toggle between raw counts and percentages.
Preferred learning formats
Respondents indicated how much they like or dislike each learning format on a scale from “Strongly dislike” to “Strongly like”. Note that formats are not mutually exclusive: respondents could express a preference for multiple formats. The charts below compare preferences across region, career stage, SIB course history, and selected Swiss cantons.
Switzerland vs. non-Switzerland
Format preference ratings comparing Swiss respondents with those working elsewhere. Use Show percentages to toggle between raw counts and percentages.
By career stage
Format preference ratings split by career stage. Bachelor’s and Master’s students are merged into ‘Other’. Use Show percentages to switch between raw counts and percentages, and Swiss only to restrict the view to Swiss respondents.
By time since last SIB course
Format preference ratings split by time since last SIB course attendance. Use Show percentages to switch between raw counts and percentages, and Swiss only to restrict the view to Swiss respondents.
By canton (selected)
Format preference ratings comparing the six largest Swiss cantons. Use Show percentages to toggle between raw counts and percentages.
Clustering respondents by course preferences
To group respondents by training-course interests, we use a graph-based approach:
- Encode each respondent’s Likert answers for all course-topic items as numeric values.
- Build one feature vector per respondent (one dimension per topic item).
- Compute respondent-to-respondent correlations across these features.
- Build a weighted graph where nodes are respondents and edges connect respondents with high positive correlation.
- Detect communities with Louvain clustering on the weighted graph.
- Profile each cluster using learning-format preferences.
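The table below summarizes the resulting graph: the number of nodes (respondents), the number of edges, the correlation threshold used to define an edge, and the number of detected communities.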
# A tibble: 4 × 2
  metric                        value
  <chr>                         <dbl>
1 nodes                       258
2 edges                      6581
3 edge_correlation_threshold    0.322
4 communities                   5
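The graph-construction steps of this workflow (numeric encoding, one feature vector per respondent, pairwise correlation, thresholded weighted edges) can be sketched in plain Python as follows. This is an illustrative sketch, not the code behind the figures reported above: the respondent IDs, the ratings, and the threshold of 0.3 are made up, Pearson correlation is assumed, and treating constant answer vectors as uncorrelated is also an assumption. Community detection is not included.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Assumption: a constant vector has no defined correlation,
        # so it is treated as uncorrelated and gets no edges.
        return 0.0
    return sxy / sqrt(sxx * syy)


def build_edges(vectors, threshold):
    """Return {(id_a, id_b): correlation} for pairs above the threshold."""
    edges = {}
    for (id_a, vec_a), (id_b, vec_b) in combinations(vectors.items(), 2):
        r = pearson(vec_a, vec_b)
        if r > threshold:
            edges[(id_a, id_b)] = r  # edge weight = correlation
    return edges


# Made-up Likert answers already encoded as 1-5, one dimension per topic item.
respondents = {
    "r1": [5, 4, 1, 1],
    "r2": [4, 5, 2, 1],
    "r3": [1, 2, 5, 4],
    "r4": [1, 1, 4, 5],
}

edges = build_edges(respondents, threshold=0.3)  # hypothetical threshold
for pair, weight in edges.items():
    print(pair, round(weight, 3))
```

In this toy example only the pairs with similar rating profiles are connected, giving two separate groups. The weighted edges could then be passed to a Louvain implementation, for example `louvain_communities` in NetworkX, to detect communities.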
Cluster characterization by learning-format preferences
The following charts show learning-format preferences for each cluster (clusters with > 5 respondents only). Sub-questions on the y-axis are ordered by mean preference score, so the most liked format appears at the top.
Cluster descriptions
Each cluster is characterized by its top-ranked course-topic preferences. These characterizations are used to assign a descriptive label to each cluster.
Cluster 1: AI and Data Infrastructure
- Interests: This group is primarily interested in artificial intelligence and data engineering. Their top interests include AI-assisted coding, advanced AI techniques (RAG, agentic AI), and the extraction of knowledge using LLMs. They also show a preference for operational aspects, such as GitHub Actions (CI/CD), cloud computing, FAIR data principles, and reproducible workflows.
- Likely Domain: They likely work as Bioinformaticians, Data Engineers, or Computational Scientists who focus on building platforms, infrastructure, and AI-driven tools for life science research.
Cluster 2: Infectious Disease Genomics
- Interests: This group is primarily interested in infectious disease research. Their top interests include phylogenetics, antimicrobial resistance (AMR), viral bioinformatics, and pathogen data analysis. While they have an interest in workflow reproducibility, their focus remains on how these tools apply to tracking and understanding microbes and viruses.
- Likely Domain: They likely work as Microbiologists, Epidemiologists, or Pathogen Genomicists in public health, clinical diagnostics, or environmental microbiology (eDNA).
Cluster 3: Single-cell & Molecular Mechanisms
- Interests: This group is primarily interested in high-resolution molecular biology. Their top interests include single-cell multiomics (RNA/ATAC-seq), epigenetics, non-coding RNA, and CRISPR-based screens. They are interested in functional and molecular biology and how different omics techniques integrate.
- Likely Domain: They likely work as Molecular Biologists, Geneticists, or Cell Biologists in basic research, developmental biology, or functional genomics.
Cluster 4: Proteomics & AI Imaging
- Interests: This group is primarily interested in proteins and metabolites. Their top interests include proteomics and mass spectrometry, but they have a distinct secondary interest in combining these data with AI and medical image analysis. They are the group most interested in using AI for clinical and biomedical applications, specifically those involving multi-modal data.
- Likely Domain: They likely work as Proteomics Specialists, Biochemists, or Clinical Researchers who use mass spectrometry and imaging to find biomarkers or understand disease states.
Cluster composition pie charts
The following pie charts show category composition within each cluster (> 5 members only). Answer options come from the metadata JSON, with an explicit “Other” category always allowed. For high-cardinality categories, the top 5 most frequent non-“Other” answers are shown and all remaining answers are merged into “Other”.
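The "top 5 plus Other" rule for high-cardinality categories can be sketched as below. This is a minimal sketch under assumptions: the helper name `lump_categories`, the alphabetical tie-breaking, and the example answers are hypothetical, and the report's own handling of ties is not stated.

```python
from collections import Counter


def lump_categories(answers, top_n=5, other_label="Other"):
    """Keep the top_n most frequent non-Other answers; merge the rest into Other."""
    counts = Counter(answers)
    # Explicit "Other" answers never compete for a top_n slot.
    other = counts.pop(other_label, 0)
    # Assumption: ties are broken alphabetically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    result = dict(ranked[:top_n])
    other += sum(n for _, n in ranked[top_n:])
    if other:
        result[other_label] = other
    return result


# Made-up answers for one cluster, for illustration only.
answers = (
    ["A"] * 6 + ["B"] * 5 + ["C"] * 4 + ["D"] * 3
    + ["E"] * 2 + ["F"] + ["G"] + ["Other"] * 2
)
slices = lump_categories(answers)
print(slices)
```

Here the two rare answers are folded into the explicit "Other" responses, so the pie chart has at most six slices.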
Learning-format preferences by cluster
After respondents are clustered by course-topic interests, we can inspect whether each cluster has a distinct preferred training format.
Likert plots by cluster (> 5 members only)
The following charts are shown separately for each detected respondent cluster and include only clusters with more than 5 respondents. Because each cluster has its own chart, sub-question ordering is computed within each cluster.