Data Processing Scripts on Statistics Denmark's server
At the Agency for Digital Government, one of my team’s main tasks was to analyze Danish citizens’ trust in the digital public sector. Each year, we conducted a nationwide survey and published a report with the findings.
Previously, we had to pay Statistics Denmark (DST) to extract or prepare data for analysis. I designed and built a set of Python scripts that automated the full processing pipeline, enabling us to access the data ourselves. This gave our team direct control of the data and reduced the time and cost of producing the analysis-ready datasets used for the reports.
Overview & My Role
I designed the technical setup that enabled our team to efficiently prepare data for analysis. This included planning the workflow, structuring the scripts, and building checks to ensure outputs could be reused reliably year over year. The scripts automated the core steps: merging survey data with DST registers, recoding variables, and producing export tables for analysis and reporting.
- My role: workflow planning, script development, QA checks
- Impact: faster data preparation, consistent outputs across survey years, reduced manual workload
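As an illustration of the QA checks mentioned above, a minimal sketch of a pre-export validation step might look like the following. The column names (`respondent_id`, `survey_year`, `trust_score`) are hypothetical; the real variables on the DST server differ.

```python
import pandas as pd

def validate_export(df: pd.DataFrame, required_cols: set) -> None:
    """Illustrative consistency checks run before a dataset is exported."""
    # All expected columns must be present.
    missing = required_cols - set(df.columns)
    assert not missing, f"Missing expected columns: {missing}"
    # Each respondent should appear exactly once.
    assert df["respondent_id"].is_unique, "Duplicate respondent IDs found"
    # Guard against obviously invalid survey years.
    assert df["survey_year"].between(2015, 2030).all(), "Survey year out of range"

# Toy example data standing in for a processed survey extract.
survey = pd.DataFrame({
    "respondent_id": [1, 2, 3],
    "survey_year": [2023, 2023, 2023],
    "trust_score": [4, 5, 3],
})
validate_export(survey, {"respondent_id", "survey_year", "trust_score"})
```

Checks like these are what make the outputs reusable across survey years: a broken extract fails loudly instead of silently producing a skewed table.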
Tech & Tools
- Language: Python (pandas, NumPy)
- Environment: Jupyter Notebooks on DST Research Server
- Outputs: Clean datasets & structured crosstabs for reporting
Process Overview
Process Data
Combine survey results with DST register data and prepare a validated dataset for analysis.
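A simplified sketch of this step, using made-up data and hypothetical variable names (`pnr`, `age`, `region`) rather than the actual DST register layout:

```python
import pandas as pd

# Toy survey responses keyed by a person identifier.
survey = pd.DataFrame({
    "pnr": ["a1", "a2", "a3"],
    "trust_score": [4, 2, 5],
})

# Toy register extract with background variables.
register = pd.DataFrame({
    "pnr": ["a1", "a2", "a3", "a4"],
    "age": [34, 51, 27, 62],
    "region": ["84", "82", "85", "81"],
})

# Left-join register variables onto the survey responses.
# validate="one_to_one" raises if either table has duplicate keys,
# which is the kind of check that keeps the merge trustworthy.
merged = survey.merge(register, on="pnr", how="left", validate="one_to_one")

# Recode a numeric variable into the categories used for reporting.
merged["age_group"] = pd.cut(
    merged["age"], bins=[0, 35, 55, 120], labels=["18-35", "36-55", "56+"]
)
```

The `validate` argument on the merge is a cheap safeguard: it turns a subtle data error (duplicated identifiers) into an immediate failure.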
Export Datasets
Produce descriptive tables (crosstabs, distributions) and structured datasets in CSV.
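A minimal sketch of a crosstab export, again with hypothetical variable names and toy data in place of the real survey fields:

```python
import pandas as pd

# Toy processed dataset standing in for the merged survey extract.
df = pd.DataFrame({
    "region": ["Hovedstaden", "Hovedstaden", "Midtjylland", "Midtjylland"],
    "trust_level": ["High", "Low", "High", "High"],
})

# Percentage distribution of trust level within each region.
crosstab = pd.crosstab(df["region"], df["trust_level"], normalize="index") * 100

# Write the rounded table to CSV for the reporting step.
crosstab.round(1).to_csv("trust_by_region.csv")
```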
Create Reports
Analyze survey results using the prepared datasets and publish the official trust report.