We are a non-profit committed to educating the Dallas Metroplex in all things Java.


The State of AI in Large Scale Automated Refactoring

Please RSVP to help us make the meeting a better experience. An RSVP is not required to attend, but it is required to enter the drawing.

November 13, 2024


Abstract

LLMs are data hungry, and when it comes to source code, the text alone isn't enough to make large-scale inferences about a codebase. Code has a unique structure and strict grammar, as well as dependencies and type information that must be deterministically resolved by a compiler. This information could be incredibly useful for AI, but it is not visible in the text of the source code.

For example, if you try to answer even a simple question, such as where Guava is used or where a particular logging library is used, you'll find that a use can occur in the code while the code-as-text contains no reference to the library you are looking for. Imagine a logger instance inherited as a protected field from a base class that is defined in a binary dependency. The import statement that identifies which logging library that logger comes from is in the binary dependency, not in the text of the call site. A human reading only that text would do no better in this situation.
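The inherited-logger scenario can be sketched in a few lines of Java. This is only an illustration under stated assumptions: `BaseService` and `OrderService` are hypothetical names, `java.util.logging` stands in for whichever logging library is in play, and reflection stands in for the type attribution a compiler performs. It is not OpenRewrite's API and says nothing about how the LST is implemented.

```java
import java.lang.reflect.Field;
import java.util.logging.Logger;

public class InheritedLoggerDemo {

    // Hypothetical stand-in for a class shipped in a binary dependency.
    // In the real scenario, only this class imports the logging library.
    public static class BaseService {
        protected final Logger logger = Logger.getLogger("base");
    }

    // The call site, using the logger it inherits from BaseService.
    public static class OrderService extends BaseService {
        void placeOrder() {
            logger.info("order placed");
        }
    }

    // The same call site as a text-based tool would see it:
    // no import and no type name for the logger.
    public static final String CALL_SITE_SOURCE = String.join("\n",
        "class OrderService extends BaseService {",
        "    void placeOrder() {",
        "        logger.info(\"order placed\");",
        "    }",
        "}");

    // Text-only check: does the source text mention the library at all?
    public static boolean textMentions(String source, String libraryPackage) {
        return source.contains(libraryPackage);
    }

    // Type-aware check: walk up the class hierarchy until the field is
    // found, then report its declared type. Returns null if no class in
    // the hierarchy declares the field.
    public static String resolveFieldType(Class<?> type, String fieldName) {
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            try {
                Field field = c.getDeclaredField(fieldName);
                return field.getType().getName();
            } catch (NoSuchFieldException e) {
                // Not declared on this class; keep walking up.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // prints "false": the text of the call site never names the library
        System.out.println(textMentions(CALL_SITE_SOURCE, "java.util.logging"));
        // prints "java.util.logging.Logger": the resolved type does
        System.out.println(resolveFieldType(OrderService.class, "logger"));
    }
}
```

The text search finds nothing because the answer lives in the type of the inherited field, which only becomes available once the class hierarchy and its dependencies have been resolved.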

This keynote addresses how to improve AI accuracy for large-scale code refactoring by improving the data source. We’ll explore a state-of-the-art code data model called the Lossless Semantic Tree (LST) that’s part of the open source OpenRewrite auto-refactoring project. We’re finding that the LST and recipes are amazingly easy tools to equip LLMs with the data they need to make accurate decisions.

The common excuse for inaccurate or incomplete output is that LLMs will get better, but I think the models are already good enough. What they too often lack is the data to make inferences. We'll show why, when evaluating LLMs for large-scale automated refactoring:

  • If it's based on text, you don't want it
  • If it's based on AST, you don't want it

Presented by Jonathan Schneider

Jonathan is co-founder and CEO at Moderne, the pioneer of mass-scale auto-refactoring and analysis of codebases. He founded OpenRewrite, an auto-refactoring tool, at Netflix and went on to found the Micrometer project as a member of the Spring Team. He is also the author of “SRE with Java Microservices” (O’Reilly) and “Automated Code Remediation” (O’Reilly). He is an Army veteran and two-time Bronze Star recipient.





Location and Time


On the second Wednesday of each month, we meet as a group to discuss the latest and greatest Java-related methodologies, technologies, and tools. Our meeting space is provided by Improving and is located at 5445 Legacy Dr, Suite 100, Plano, TX 75024.

Social time starts at 6:30 PM, with announcements and sponsorship information at 7:00 PM, followed by the presentation, which ends by 9:00 PM. Our sponsors provide free food and drink during the social hour. After the presentation, we hold a drawing where we give away prizes that are also made possible by our sponsors. We look forward to seeing you there!
