
Jobs overhead: why?

Krthk
New Contributor

Hi, I have a Python notebook that I want to run in an automated way. One approach I found was to attach it to a job/task and trigger it via the API from my local machine. However, this seems to add significant overhead: even if my code is just one line that should take milliseconds, the run takes around a minute. I'm fairly new to the platform. Can someone explain why this happens, and if this isn't the right approach, what are my options?
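
For context, the call I'm making from my local machine looks roughly like this (the host, token, and job ID are placeholders):

# Trigger an existing Databricks job from a local machine via the Jobs API.
# DATABRICKS_HOST, DATABRICKS_TOKEN, and the job_id are placeholder values.
import os
import requests

host = os.environ["DATABRICKS_HOST"]    # e.g. https://<workspace>.cloud.databricks.com
token = os.environ["DATABRICKS_TOKEN"]  # personal access token

resp = requests.post(
    f"{host}/api/2.1/jobs/run-now",
    headers={"Authorization": f"Bearer {token}"},
    json={"job_id": 123},               # placeholder job ID
)
resp.raise_for_status()
print(resp.json())                      # contains the run_id of the triggered run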

1 ACCEPTED SOLUTION


Isi
Contributor III

Hey @Krthk 

If you want to orchestrate a notebook, the easiest way is to go to File > Schedule directly from the notebook. My recommendation is to use cron syntax to define when it should run, and attach it to a predefined cluster or configure a new job cluster.
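
If you prefer to set the same schedule up programmatically, here is a rough sketch using the Databricks Python SDK; the notebook path, cluster ID, and cron expression are just example values:

# Rough sketch: create a scheduled notebook job with the Databricks Python SDK.
# Notebook path, cluster ID, and cron expression are example values only.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()  # picks up host/token from env vars or ~/.databrickscfg

w.jobs.create(
    name="nightly-notebook-run",
    tasks=[
        jobs.Task(
            task_key="run_notebook",
            notebook_task=jobs.NotebookTask(notebook_path="/Users/me/my_notebook"),
            existing_cluster_id="0101-123456-abcde123",  # or pass new_cluster=... instead
        )
    ],
    schedule=jobs.CronSchedule(
        quartz_cron_expression="0 0 6 * * ?",  # every day at 06:00
        timezone_id="UTC",
    ),
)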

Keep in mind that if you're using a new job cluster, you'll need to wait for the cluster to spin up, install dependencies, and execute the code. If you configure the cluster with the same specs (instance type, number of workers, etc.) as the one you used during development, the actual code execution time should be similar.
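
For example, a job cluster spec that mirrors a small development cluster might look roughly like this; the runtime version and instance type here are assumptions, so match them to whatever you developed on:

# Example new_cluster spec for a job, mirroring a small development cluster.
# spark_version and node_type_id are assumptions; use your dev cluster's values.
new_cluster = {
    "spark_version": "15.4.x-scala2.12",  # same runtime you develop on
    "node_type_id": "m5.xlarge",          # same instance type as the dev cluster (AWS example)
    "num_workers": 1,                     # same worker count as the dev cluster
}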

If you're using a pre-existing interactive cluster, you'll only need to wait for it to wake up (if it's currently stopped). After the job finishes, you can check the "View details" section on the right panel of the job run, and look into the Event log to see how much time was spent in each phase: cluster creation, init script execution, and actual job execution.
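
If you'd rather pull those timings programmatically, the Jobs API also reports setup, execution, and cleanup durations for a run (at least for single-task runs); something along these lines, with the run ID as a placeholder:

# Fetch the timing breakdown for a job run via the Jobs API (run_id is a placeholder).
import os
import requests

host = os.environ["DATABRICKS_HOST"]
token = os.environ["DATABRICKS_TOKEN"]

run = requests.get(
    f"{host}/api/2.1/jobs/runs/get",
    headers={"Authorization": f"Bearer {token}"},
    params={"run_id": 456},
).json()

# Durations are reported in milliseconds.
print("setup:", run.get("setup_duration"))
print("execution:", run.get("execution_duration"))
print("cleanup:", run.get("cleanup_duration"))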

Hope this helps 🙂

Isi


