In today's rapidly evolving energy landscape, petroleum engineers face unprecedented challenges that extend far beyond traditional reservoir management. While many engineers know precisely the type of analysis they want to conduct, they often lack the programming confidence to implement it effectively. Many experienced engineers lack programming knowledge and the vocabulary of data analysis, leading to what is termed “data-driven anxiety” within the workforce. This skills gap can hinder engineers’ ability to leverage big data for actionable insights, especially as their demanding jobs leave little time for upskilling. Compounding this issue is the industry's shortage of specialized data science talent, creating bottlenecks in decision-making processes. Databricks Assistant addresses these challenges by empowering engineers with AI-driven tools that simplify complex data tasks without requiring extensive coding expertise. By leveraging natural language and automated code generation, engineers can independently perform advanced analyses, such as decline curve automation or anomaly detection, while focusing on domain-specific problem-solving. This democratization of technical resources accelerates the industry's digital transformation, enabling petroleum engineers to seamlessly integrate their expertise with cutting-edge data science capabilities.
Databricks Assistant is an AI-powered collaborator deeply integrated into the Databricks environment. Unlike other AI coding assistants, it is specifically designed to understand data contexts through the Databricks Data Intelligence Platform, making it uniquely valuable for data workflows. The Assistant leverages Unity Catalog to understand your tables, columns, descriptions, and relationships between data assets across your organization. This contextual awareness makes interactions more relevant and productive than those with a generic AI assistant. For instance, when datasets combine historical and forecasted oil production values, the underlying column descriptions and other metadata tell the Assistant exactly how to merge them, turning complex joins into a thing of the past.
The Assistant is accessible throughout the Databricks interface: in notebooks, SQL queries, dashboards, and even during job error diagnosis, providing consistent support regardless of where you're working.
While the potential applications of Databricks Assistant in petroleum engineering workflows are virtually limitless, this section highlights several key areas that are particularly relevant for engineers embarking on their analytics journey. These foundational use cases demonstrate how the Assistant can transform daily tasks, from data exploration to advanced analytics, empowering engineers to focus on their domain expertise rather than grappling with complex coding challenges.
To begin maximizing the utility of a coding assistant, we first need to understand the best ways of communicating with it. Sometimes AI assistants seem to read your mind; other times they do not quite produce the desired results. Below are a few key prompting tips to maximize the Assistant's capabilities:
Be specific and provide context: Instead of asking "Write a query showing production deviation from the forecast," specify "Write a SQL query to identify wells with oil production declining more than 15% compared to their 30-day average in the North Field area".
Use step-by-step instructions: Break down complex tasks into sequential steps, especially for multi-stage analyses common in engineering workflows.
Leverage few-shot prompting: Provide examples of desired outputs when asking for code or queries. For instance, when requesting a custom function, include one example of how parameters should be structured and the format in which results should be returned, as in the example below.
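For example, a few-shot prompt for a small helper function might look like the following (the function name and conversion example here are purely illustrative):

Prompt:
Write a Python function convert_rate(value, from_unit, to_unit) that converts production rates between units. Example: convert_rate(1000, "BOPD", "m3/d") should return 158.99. Results should be returned as floats rounded to two decimal places.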
One of the most powerful capabilities of Databricks Assistant is generating complex SQL queries from simple natural language requests. For petroleum engineers who may be unfamiliar with SQL nuances such as data types and window functions, this removes a significant barrier to data analysis and ultimately eliminates the dependency on having an analyst write these queries for them.
Consider working with multiple production-related datasets, perhaps with the goal of combining daily production with forecasted production values to better monitor performance for onshore oil wells. Engineers now simply describe the dataset they would like returned, and through the power of the Data Intelligence Platform, the AI Assistant efficiently writes the SQL without the user needing to know any syntax at all.
Prompt:
Using _production_df and @type_curve_df, write a SQL query that finds the wells with the largest under-performance deviation in total production volumes between forecasts and actuals over the last 28 days.
Output:
Due to the power of Unity Catalog and the Data Intelligence Platform, the Assistant will examine the metadata of both tables, identify the likely join keys (even if named differently), and generate the appropriate SQL query. It will even suggest potential data type conversions or formatting adjustments needed for successful joins.
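The exact query varies from run to run, but a representative sketch of what the Assistant generates, run from a notebook cell, looks like the following. It assumes both datasets are available as temp views and share a WELL_ID key and a DATE column; those names are illustrative only, since the Assistant infers the real keys from Unity Catalog metadata.

# A representative sketch of the generated query; WELL_ID and DATE
# are assumed column names for illustration only.
result = spark.sql("""
    SELECT p.WELL_ID,
           SUM(t.FORECASTED_BOPD) - SUM(p.ACTUALS_BOPD) AS underperformance_bopd
    FROM _production_df p
    JOIN type_curve_df t
      ON p.WELL_ID = t.WELL_ID AND p.DATE = t.DATE
    WHERE p.DATE >= date_sub(current_date(), 28)
    GROUP BY p.WELL_ID
    HAVING SUM(p.ACTUALS_BOPD) < SUM(t.FORECASTED_BOPD)
    ORDER BY underperformance_bopd DESC
""")
display(result)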
Translating data into visual insights is critical for decision-making. Many engineers default to tools that might be limited in their visualization capabilities.
Instead of scrambling to prepare visualizations for an operations presentation, engineers can now simply describe the visualization they would like to see in natural language and watch as Databricks Assistant generates the code in their charting library of choice.
Prompt:
Please create a scatter plot visualization using the Plotly framework.
First, for each well in the _production_df table I want to find the cumulative sum of ACTUALS_BOPD and compare it to the cumulative sum of FORECASTED_BOPD from the @type_curve_df.
Second, for the chart, there should be one point per well with forecasts and actuals summed per point. The X axis should be FORECAST and the Y axis should be ACTUALS. Color the points by PRODUCING_FORMATION from the @dim_all_wells table. To get a better idea of what is over or under performing, draw a diagonal 45 degree line to separate wells that are underperforming and lightly shade the area under this line in a transparent red.
Output:
The Assistant translates this request into properly configured visualization code in the plotting framework of choice, selecting appropriate axes, data aggregations, and visual elements.
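A condensed sketch of the generated Plotly code follows; it assumes pandas DataFrames named production_df, type_curve_df, and dim_all_wells sharing a WELL_ID column (the DataFrame names and join key are assumptions; call .toPandas() first if you are starting from Spark DataFrames).

import pandas as pd
import plotly.graph_objects as go

# Total actuals and forecasts per well
actuals = production_df.groupby("WELL_ID")["ACTUALS_BOPD"].sum()
forecasts = type_curve_df.groupby("WELL_ID")["FORECASTED_BOPD"].sum()
wells = (
    pd.concat([actuals, forecasts], axis=1)
    .dropna()
    .join(dim_all_wells.set_index("WELL_ID")["PRODUCING_FORMATION"])
    .reset_index()
)

fig = go.Figure()
# One trace per formation so points are colored by PRODUCING_FORMATION
for formation, grp in wells.groupby("PRODUCING_FORMATION"):
    fig.add_trace(go.Scatter(
        x=grp["FORECASTED_BOPD"], y=grp["ACTUALS_BOPD"],
        mode="markers", name=formation, text=grp["WELL_ID"],
    ))

# 45-degree parity line; wells below it are underperforming their forecast
max_val = max(wells["FORECASTED_BOPD"].max(), wells["ACTUALS_BOPD"].max())
fig.add_trace(go.Scatter(
    x=[0, max_val], y=[0, max_val], mode="lines",
    line=dict(dash="dash", color="gray"), name="Forecast = Actuals",
))
# Lightly shade the underperforming region below the parity line
fig.add_trace(go.Scatter(
    x=[0, max_val, max_val], y=[0, max_val, 0], fill="toself",
    fillcolor="rgba(255, 0, 0, 0.1)", line=dict(width=0),
    hoverinfo="skip", showlegend=False,
))
fig.update_layout(xaxis_title="FORECAST", yaxis_title="ACTUALS")
fig.show()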
Decline curve analysis (DCA) is fundamental to production forecasting and is ultimately the method used to forecast how wells are expected to return a company's investment, so forecasts that come up short carry major financial implications. Traditionally, DCA requires specialized software or complex coding. Databricks Assistant transforms this process by generating sophisticated curve-fitting code from simple descriptions.
For example, when prompted to create an automated decline curve analysis function, the Assistant can generate Python code leveraging SciPy's curve_fit function to automatically fit production data to standard ARPS decline models:
Prompt:
Wells in the "Bone Spring" formation are not performing to the current forecast. I want to re-forecast these wells. We used the ARPS decline curve calculation of "q_i / pow(1 + b * d * day , 1/b)" when forecasting these wells initially where q_i is the BOPD on day 1 of production, day is DAYS_FROM_FIRST_PRODUCTION.
Write a function that solves this for the Bone Spring wells. You can just average production volumes for each day from first production. You should only need to use the _production_df and @dim_all_wells tables. Once the new curve parameters have been calculated, show the average actuals vs the new forecasted volumes in a Plotly chart.
This prompt produces the code to generate the following fitted parameters alongside the requested visualization.
Output:
Fitted parameters for Bone Spring formation:
q_i = 5708.8096382592, b = 0.7453919466040444, d = 0.009918253305723445
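Under the hood, the generated code centers on SciPy's curve_fit. A condensed sketch follows, assuming the Bone Spring production rows have already been filtered into a pandas DataFrame named bone_spring_df (the DataFrame name and the parameter bounds are assumptions):

import numpy as np
from scipy.optimize import curve_fit

def arps_hyperbolic(day, q_i, b, d):
    # ARPS hyperbolic decline from the prompt: q_i / (1 + b*d*day)^(1/b)
    return q_i / np.power(1.0 + b * d * day, 1.0 / b)

# Average production across all Bone Spring wells for each day on production
avg = (
    bone_spring_df
    .groupby("DAYS_FROM_FIRST_PRODUCTION")["ACTUALS_BOPD"]
    .mean()
    .reset_index()
)

# Fit q_i, b, and d; the bounds keep the parameters physically plausible
params, _ = curve_fit(
    arps_hyperbolic,
    avg["DAYS_FROM_FIRST_PRODUCTION"], avg["ACTUALS_BOPD"],
    p0=[avg["ACTUALS_BOPD"].iloc[0], 0.5, 0.01],
    bounds=([0, 0.01, 1e-6], [np.inf, 2.0, 1.0]),
)
q_i, b, d = params
print(f"q_i = {q_i}, b = {b}, d = {d}")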
While this example dives into DCA, the possibilities for custom calculations and functions are limitless. Describe the transformation or calculation needed and watch Databricks Assistant get to work.
This capability transforms what would traditionally be days of specialized development or custom software into minutes of conversation with an AI assistant, democratizing advanced analytical techniques for all engineers and ultimately saving companies time and money through efficiency gains.
Many petroleum engineering departments rely on legacy VBA scripts embedded in workbooks for critical analyses. While functional, these solutions are difficult to maintain, lack version control, and don't integrate well with modern data platforms. Databricks Assistant excels at modernizing this legacy code.
By copying and pasting legacy VBA code into the Assistant interface, engineers can quickly convert it to Python that runs on top of their Data Lakehouse tables.
Prompt:
I have a legacy Excel workbook with some VBA code that follows. Convert this to a Python function and give me an example of how to use it:
Sub ArpsForecast()
' ARPS decline curve calculator
Dim qi As Double: qi = Cells(2, "B").Value ' Day 1 production
Dim Di As Double: Di = 0.15 ' Decline rate
Dim b As Double: b = 0.7 ' Exponent
Dim days As Integer: days = [A1048576].End(xlUp).Row - 1
For i = 1 To days
    Cells(i + 1, "C").Value = qi / ((1 + b * Di * i) ^ (1 / b))
Next i
End Sub
Output:
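The exact conversion varies from run to run, but the returned Python typically resembles this sketch:

import numpy as np
import pandas as pd

def arps_forecast(q_i: float, days: int, d: float = 0.15, b: float = 0.7) -> pd.Series:
    # ARPS hyperbolic decline forecast, mirroring the VBA logic:
    # q_i is day-1 production (BOPD), d the decline rate, b the exponent
    t = np.arange(1, days + 1)
    return pd.Series(q_i / np.power(1 + b * d * t, 1 / b), index=t, name="FORECAST_BOPD")

# Example usage: a 365-day forecast for a well starting at 5,000 BOPD
forecast = arps_forecast(q_i=5000, days=365)
print(forecast.head())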
The transition from VBA to Python brings significant benefits beyond modernization alone. Python implementations of petroleum engineering workflows typically run dramatically faster than their VBA equivalents while enabling integration with powerful data science libraries unavailable in workbook-based ecosystems.
The final and potentially most impactful way petroleum engineers might leverage the AI Assistant is when errors inevitably occur during development. While petroleum engineers have a deep understanding of subsurface physics, the same might not be said of their ability to diagnose code syntax issues. Luckily, the /fix command provides instant troubleshooting. When you encounter an error in your code, perhaps a simple Python syntax error in a complex script, clicking the "Diagnose Error" button activates the Assistant to analyze the failure and suggest fixes.
Output:
The Assistant will present a “diff” view showing proposed corrections alongside explanations of the underlying issues. This transforms error messages from frustrating roadblocks and hours wasted in online forums, into valuable learning opportunities, helping petroleum engineers build data science skills through practical problem-solving.
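As a purely hypothetical illustration, a one-character mistake that would otherwise halt a notebook run is flagged and corrected in place (the table name here is invented):

# Before: an unclosed parenthesis raises a SyntaxError
# daily = spark.table("field_ops.daily_production"
# After: the correction proposed in the Assistant's diff view
daily = spark.table("field_ops.daily_production")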
The integration of Databricks Assistant into oil and gas engineering workflows represents a paradigm shift in how engineers interact with data. By removing technical barriers, engineers can focus on solving domain-specific problems rather than wrestling with code syntax. A recent survey shows that users leveraging the Assistant complete data analysis tasks at least 30% faster than with traditional manual coding approaches, transforming days of development into hours of conversation.
The journey for petroleum engineers to gain data science skills doesn't happen overnight, but Databricks Assistant dramatically accelerates the transition. Engineers who embrace this AI-powered collaboration find themselves gradually building data science capabilities while simultaneously delivering immediate value through enhanced data analysis.