
Jobs overhead: why?

Krthk
New Contributor

Hi, I have a Python notebook that I want to run in an automated way. One approach I found was to attach it to a job/task and trigger it via the API from my local machine. However, this seems to add significant overhead: even if my code is just one line that should take milliseconds, the run takes around a minute. I'm fairly new to the platform. Can someone explain why this happens, and if this isn't the right approach, what are my options?
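
For context, the call I'm making from my local machine looks roughly like this (the host, token, and job ID are placeholders):

# Trigger an existing Databricks job from a local machine via the Jobs API.
# DATABRICKS_HOST, DATABRICKS_TOKEN, and the job_id are placeholder values.
import os
import requests

host = os.environ["DATABRICKS_HOST"]    # e.g. https://<workspace>.cloud.databricks.com
token = os.environ["DATABRICKS_TOKEN"]  # personal access token

resp = requests.post(
    f"{host}/api/2.1/jobs/run-now",
    headers={"Authorization": f"Bearer {token}"},
    json={"job_id": 123},               # placeholder job ID
)
resp.raise_for_status()
print(resp.json())                      # contains the run_id of the triggered run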

1 ACCEPTED SOLUTION


Isi
Contributor III

Hey @Krthk 

If you want to orchestrate a notebook, the easiest way is to go to File > Schedule directly from the notebook. My recommendation is to use cron syntax to define when it should run, and attach it to a predefined cluster or configure a new job cluster.
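
If you prefer to set the same schedule up programmatically, here is a rough sketch using the Databricks Python SDK; the notebook path, cluster ID, and cron expression are just example values:

# Rough sketch: create a scheduled notebook job with the Databricks Python SDK.
# Notebook path, cluster ID, and cron expression are example values only.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()  # picks up host/token from env vars or ~/.databrickscfg

w.jobs.create(
    name="nightly-notebook-run",
    tasks=[
        jobs.Task(
            task_key="run_notebook",
            notebook_task=jobs.NotebookTask(notebook_path="/Users/me/my_notebook"),
            existing_cluster_id="0101-123456-abcde123",  # or pass new_cluster=... instead
        )
    ],
    schedule=jobs.CronSchedule(
        quartz_cron_expression="0 0 6 * * ?",  # every day at 06:00
        timezone_id="UTC",
    ),
)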

Keep in mind that if you're using a new job cluster, you'll need to wait for the cluster to spin up, install dependencies, and execute the code. If you configure the cluster with the same specs (instance type, number of workers, etc.) as the one you used during development, the actual code execution time should be similar.
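
For example, a job cluster spec that mirrors a small development cluster might look roughly like this; the runtime version and instance type here are assumptions, so match them to whatever you developed on:

# Example new_cluster spec for a job, mirroring a small development cluster.
# spark_version and node_type_id are assumptions; use your dev cluster's values.
new_cluster = {
    "spark_version": "15.4.x-scala2.12",  # same runtime you develop on
    "node_type_id": "m5.xlarge",          # same instance type as the dev cluster (AWS example)
    "num_workers": 1,                     # same worker count as the dev cluster
}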

If you're using a pre-existing interactive cluster, you'll only need to wait for it to wake up (if it's currently stopped). After the job finishes, you can check the "View details" section on the right panel of the job run, and look into the Event log to see how much time was spent in each phase: cluster creation, init script execution, and actual job execution.
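
If you'd rather pull those timings programmatically, the Jobs API also reports setup, execution, and cleanup durations for a run (at least for single-task runs); something along these lines, with the run ID as a placeholder:

# Fetch the timing breakdown for a job run via the Jobs API (run_id is a placeholder).
import os
import requests

host = os.environ["DATABRICKS_HOST"]
token = os.environ["DATABRICKS_TOKEN"]

run = requests.get(
    f"{host}/api/2.1/jobs/runs/get",
    headers={"Authorization": f"Bearer {token}"},
    params={"run_id": 456},
).json()

# Durations are reported in milliseconds.
print("setup:", run.get("setup_duration"))
print("execution:", run.get("execution_duration"))
print("cleanup:", run.get("cleanup_duration"))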

Hope this helps 🙂

Isi


