Secret and SSH key management in the Machine Learning Workbench

Hello Celonis and others,

I am trying out the Machine Learning Workbench, and in the examples I see the API secrets out in the open. I would prefer to keep the secrets in a secret management system. Does the Machine Learning Workbench support something like that, and if so, how can I set one up?

For inspiration, this is what I consider good secret management:
https://docs.databricks.com/security/secrets/index.html

For managing SSH keys, inspiration can be found here:
https://help.github.com/en/github/authenticating-to-github/adding-a-new-ssh-key-to-your-github-account


Hello Paul!
Thanks for bringing this up. While you can use API keys to connect to the data, we strongly advise using an Application Key. Have you had a look at those? https://help.celonis.cloud/help/display/CIBC/Application+Keys

In terms of governance, this allows you to grant permissions to the Application Key instead of to a regular user. The keys can be managed by an Admin in the Team Settings.
The Application Key can then be included while creating a Workbench and is saved inside the Workbench itself.

Does this already solve your pain point, or do you need something more? For example, is this also about storing secrets for other external services?

Thank you
Nicolas

This would be about storing secrets for other external services as well.
For example, we would like to put our analyses under Git version control; for inspiration, see:
https://python.celonis.cloud/docs/pycelonis/en/latest/notebooks/99_Use_Case_Version_control.html#Create-or-clean-analyses-backup-folder
Another use case is connecting to external APIs that also require an API key.

Hey Paul,
Thanks for your input. We will soon integrate ssh-agent so that you can authenticate via SSH.
For other secrets, I suggest keeping them in a file outside the notebook that, when sourced, exports them as environment variables you can then read in your code.
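Until that integration is available, one possible interim approach (an assumption, not a documented Workbench feature) is to tell git which private key to use through git's standard `GIT_SSH_COMMAND` environment variable, keeping the key file itself outside the notebook. The key path, repository folder, and helper names below are hypothetical; this is a sketch, not a definitive setup.

```python
from __future__ import annotations

import os
import shlex
import subprocess
from pathlib import Path


def git_env_with_ssh_key(
    key_path: str | Path, known_hosts: str | Path | None = None
) -> dict[str, str]:
    """Hypothetical helper: build an environment that makes git use one SSH key.

    GIT_SSH_COMMAND is read by git; `-i` selects the private key and
    `IdentitiesOnly=yes` makes OpenSSH offer only that key.
    """
    key = Path(key_path).expanduser()
    parts = ["ssh", "-i", str(key), "-o", "IdentitiesOnly=yes"]
    if known_hosts is not None:
        # A known_hosts file lets host-key verification succeed non-interactively.
        parts += ["-o", f"UserKnownHostsFile={Path(known_hosts).expanduser()}"]
    env = os.environ.copy()
    # git runs this value through a shell, so quote each argument safely.
    env["GIT_SSH_COMMAND"] = shlex.join(parts)
    return env


def git(*args: str, repo: str | Path, env: dict[str, str]) -> str:
    """Hypothetical helper: run a git command in `repo` and return its stdout."""
    result = subprocess.run(
        ["git", *args], cwd=repo, env=env, check=True, capture_output=True, text=True
    )
    return result.stdout
```

For example, `git("push", "origin", "main", repo="analyses-backup", env=git_env_with_ssh_key("~/.ssh/id_ed25519_celonis"))` would push a hypothetical backup repository with that key; the matching public key must first be registered with the Git host, as in the GitHub link above.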
To make this simple in Python, you can install the following library inside your Workbench:

This way you will have the secrets present in your execution environment, but not inside your source code.
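A standard-library-only sketch of the same idea, assuming a hypothetical dotenv-style file such as `~/.secrets/external_api.env` containing lines like `EXTERNAL_API_KEY=...`. The file name, variable name, and helper functions are illustrative assumptions, not part of the Workbench or pycelonis.

```python
from __future__ import annotations

import os
from pathlib import Path


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines (dotenv-style) into a dict.

    Blank lines and full-line '#' comments are skipped, an optional leading
    'export ' is accepted (so the same file can be sourced from a shell),
    and one pair of matching quotes around a value is stripped.
    """
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export "):].strip()
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            continue  # skip malformed lines instead of guessing
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env_file(path: str | Path, override: bool = False) -> dict[str, str]:
    """Hypothetical helper: read a secrets file and export it to os.environ."""
    values = parse_env_text(Path(path).expanduser().read_text(encoding="utf-8"))
    for key, value in values.items():
        # Keep variables that are already set unless override is requested.
        if override or key not in os.environ:
            os.environ[key] = value
    return values


def require_secret(name: str) -> str:
    """Hypothetical helper: return a secret or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise KeyError(f"Secret {name!r} is not set; was the secrets file loaded?")
    return value
```

In a notebook you would call `load_env_file("~/.secrets/external_api.env")` once and then `require_secret("EXTERNAL_API_KEY")` wherever the key is needed, for example in a request header for an external API. Keep the file outside any Git-versioned folder (or list it in `.gitignore`) so it never ends up in a commit.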
Note that you can also control who has access to a specific Workbench.

Does that help? If there is anything else we can help you with, I am happy to discuss your use cases over a quick call.

Best
Nicolas
(Product Management ML Infrastructure)

This helps and works as a solution. However, a secret management system would be a good addition, since it can be hard to share the env files and to control which people can access them.

I am looking forward to seeing the SSH agent implementation :).