We are a non-profit committed to educating the Dallas Metroplex in all things Java.


Big Data and AI Architecture: Apache Iceberg via Spark and LLMs

Please RSVP to help us make the meeting a better experience. While not required to attend, RSVPing is a prerequisite to enter the drawing.

October 08, 2025


Abstract

This presentation explores integrating LLMs with Apache Spark and Apache Iceberg as the foundation of a Big Data to AI architecture. The goal is a real-world AI architecture that works with your own data.

We'll build an AI application that lets users query massive datasets and extract insights from them using natural language. We'll start by understanding the structure and architecture of a large dataset. Then we'll look at options for querying the dataset using Apache Spark and Trino. Finally, we'll use an LLM to query the dataset using natural language. We'll also look at other uses of LLMs as part of an overall solution, and compare how different LLMs behave.
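As a rough illustration of the natural-language step described above, here is a minimal sketch in Java, not the presenter's actual code. It builds a text-to-SQL prompt from a table schema and wraps it in a request body for the generate endpoint of a locally running Ollama server. The table name demo.home_sales, its columns, the model name llama3, and all class and method names are hypothetical choices made for this example. The sketch assumes Ollama is listening on its default local port, and it leaves out running the returned SQL against Spark.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.LinkedHashMap;
import java.util.Map;

final class NlQuerySketch {

    // Builds a text-to-SQL prompt from a table name, its columns, and the user's question.
    public static String buildPrompt(String table, Map<String, String> columns, String question) {
        StringBuilder sb = new StringBuilder();
        sb.append("You translate questions into Spark SQL.\n");
        sb.append("Table: ").append(table).append("\n");
        sb.append("Columns:\n");
        for (Map.Entry<String, String> column : columns.entrySet()) {
            sb.append("- ").append(column.getKey())
              .append(" (").append(column.getValue()).append(")\n");
        }
        sb.append("Question: ").append(question).append("\n");
        sb.append("Answer with a single SQL statement and nothing else.");
        return sb.toString();
    }

    // Escapes a string so it can be embedded in a JSON string literal.
    public static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Builds the JSON body for Ollama's /api/generate endpoint, with streaming turned off.
    public static String toOllamaRequest(String model, String prompt) {
        return "{\"model\":\"" + jsonEscape(model)
                + "\",\"prompt\":\"" + jsonEscape(prompt)
                + "\",\"stream\":false}";
    }

    // Sends the request to a local Ollama server (assumed to be on the default port 11434)
    // and returns the raw JSON response; the generated text is in its "response" field.
    public static String askOllama(String body) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:11434/api/generate"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) {
        // Hypothetical schema for a home sales table; adjust to the real dataset.
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("sale_date", "DATE");
        columns.put("city", "STRING");
        columns.put("sale_price", "DOUBLE");

        String prompt = buildPrompt("demo.home_sales", columns,
                "What was the average sale price per city in 2020?");
        System.out.println(prompt);
        // Printed only; call askOllama(body) when an Ollama server is running locally.
        System.out.println(toOllamaRequest("llama3", prompt));
    }
}
```

Including the schema in the prompt gives the model the column names it needs to produce SQL that matches the table. Any SQL the model returns should be reviewed or validated before it is run against real data.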

We’ll also discuss where event streaming (Kafka and Flink) fits into this architecture. The architecture is designed to be flexible, so your dev team can choose different technologies for processing and querying. I’ll leave you with a concrete example that you can run on your laptop to explore the possibilities. This will be an example of a real-world application: the dataset is home sales data from the last 15 years.

We will use these technologies:

  • Apache Iceberg
  • Apache Spark
  • Ollama for running GenAI models locally

Presented by Pratik Patel

Pratik Patel is a Java Champion and developer advocate at Azul Systems. He wrote the first book on 'enterprise Java' in 1996, "Java Database Programming with JDBC." He is an all-around software and hardware enthusiast with experience in the healthcare, telecom, financial services, and startup sectors. He helps organize the Atlanta Java User Group, speaks frequently at tech events, and is a master builder of nachos.





Location and Time


On the second Wednesday of each month, we meet as a group to discuss the latest and greatest Java-related methodologies, technologies, and tools. Our meeting space is provided by Improving and is located at 5445 Legacy Dr, Suite 100, Plano, TX 75024.

Social time starts at 6:30 PM, with announcements and sponsorship information at 7:00 PM, followed by the presentation, which ends by 9:00 PM. Our sponsors provide free food and drink during the social hour. After the presentation, we hold a drawing for prizes, which are also made possible by our sponsors. We look forward to seeing you there!
