Transformer Boilerplate Generator
As of now, the Open Data Hub does not offer a wide collection of pre-configured "Transformer Blueprints" in the same way as some Data Collectors. Instead, we provide a boilerplate generator that sets up a minimal, ready-to-extend Go-based transformer service. This allows you to quickly get started with a functional transformer and then customize its core transformation logic to fit your specific data requirements.
This boilerplate generator is a command-line script that automates the creation of the necessary project structure, Go module setup, and initial configuration files for a new transformer.
Location of the Boilerplate Generator
The boilerplate generator script is located within the opendatahub-collectors
monorepo at:
https://github.com/noi-techpark/opendatahub-collectors/tree/main/transformers/boilerplate
How to Use the Boilerplate Generator
To create a new transformer using the boilerplate, follow these steps:
-
Clone the
opendatahub-collectors
Repository: If you haven't already, clone the monorepo to your local machine:git clone https://github.com/noi-techpark/opendatahub-collectors.git
cd opendatahub-collectors -
Navigate to the Boilerplate Directory: Change your current directory to where the
setup_go.sh
script is located:cd transformers/boilerplate
-
Run the Setup Script: Execute the
setup_go.sh
script. The script will prompt you for several pieces of information to customize your new transformer../setup_go.sh
You will be asked for the following:
- Project name: This will be the name of your transformer's root folder, used in CI/CD pipelines, and as part of its Kubernetes service name.
- Example:
parking-valgardena
- Example:
- First part of provider tuple: This is the first segment of the
PROVIDER
environment variable that your transformer will use. It typically identifies the data source category or the collector.- Example:
parking-offstreet
- Example:
- Second part of provider tuple: This is the second segment of the
PROVIDER
environment variable, often identifying the specific data source or sub-category.- Example:
skidata
- Example:
- Origin: This identifies the original source or lineage of the data. It's used in the BDP provenance.
- Example:
valgardena
- Example:
After providing the details, the script will ask for confirmation. Type
y
(orY
) and press Enter to proceed.Example Interaction:
This wizard will set up the boilerplate for a new golang transformer
Project name. This will determine root folder, cicd and k8s service e.g. 'parking-valgardena': my-new-transformer
First part of provider tuple: my-data-source
Second part of provider tuple: my-dataset
Origin: my-organization
Are you sure these are correct (y/n)? y
ok, proceeding...
All setup! - Project name: This will be the name of your transformer's root folder, used in CI/CD pipelines, and as part of its Kubernetes service name.
Next Steps After Generation
After the boilerplate is set up, you will have a functional, albeit minimal, transformer. Your primary task will be to:
- Implement Transformation Logic: Modify the
src/main.go
and potentiallysrc/dto.go
files in your new transformer's directory to implement the specific data parsing, validation, and mapping logic required to convert your raw data into the Open Data Hub's BDP format. - Configure Environment Variables: Populate the
.env.example
file with actual credentials and specific configurations for your local development. For deployment, ensure your Helm values (e.g., ininfrastructure/helm/<ORIGIN>.yaml
) are correctly configured, especially for sensitive data using Kubernetes secrets. - Test Your Transformer: Utilize the testing patterns (e.g.,
main_test.go
withbdpmock
) to ensure your transformation logic is correct and robust. - Deploy: Use the generated Dockerfile and Helm charts to build and deploy your transformer to your development, testing, and production environments.
This boilerplate significantly reduces the initial setup time, allowing you to focus immediately on the unique aspects of your data transformation.