🔸 Issue #22: Structured Generation
Plus: Midship startup, Paul Graham's latest essay and Greg's tips on how to find and validate a startup idea
🗒️ IN TODAY’S ISSUE
🔸 “Structured Generation” from the paper “Let me speak freely? A study on the impact of format restrictions on performance of LLMs”
👨🏻💻 “Midship” - Extract info from docs and input it into your spreadsheets
🧠 “The right kind of stubborn” - Paul Graham's essay
📱 How to generate and validate startup ideas, Greg Isenberg
🔸 Extract: Structured Generation
from the paper “Let me speak freely? A study on the impact of format restrictions on performance of LLMs” by Zhi Rui Tam, Cheng-Kuang Wu, Yi-Lin Tsai, Chieh-Yen Lin, Hung-yi Lee, Yun-Nung Chen
The paper explores how putting constraints on the output format of LLMs can affect their performance across various tasks. The researchers focused on structured generation, which is the process of producing content in standardized formats.
This is widely used in real-world applications to extract key information from the models' outputs easily. Here's the surprising part – the paper found that LLMs show a significant decline in their reasoning abilities when they have to stick to strict format restrictions.
The researchers tested the models on reasoning tasks like GSM8K (a collection of math problems set in natural language) and last-letter concatenation (where the model concatenates the last letters of a sequence of words). They discovered that the stricter the format constraints, the worse the models performed on these reasoning-focused tasks.
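To make the last-letter concatenation task concrete, here is a minimal sketch of the target function a model has to reproduce (the function name is ours, not the paper's):

```python
def last_letter_concat(words):
    """Concatenate the last letter of each word (the paper's last-letter task)."""
    return "".join(word[-1] for word in words)

# The expected answer for ["talent", "piano"] is "to".
print(last_letter_concat(["talent", "piano"]))  # -> "to"
```

The task is trivial for code but requires multi-step symbolic reasoning from an LLM, which is what makes it a useful probe.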

They also looked at classification tasks, such as DDXPlus (a medical diagnosis dataset) and Sports Understanding (a task that checks if a sentence about sports is plausible). Interestingly, they found that format restrictions can sometimes improve performance in these types of tasks.
How? By reducing the number of possible answers, it leads to fewer parsing errors. However, in tasks that require more reasoning, overly restrictive formats hold the models back from using their full potential.
So, what's the takeaway? The paper suggests that when using LLMs in real-world applications, it's all about finding the right balance between format adherence and preserving the models' reasoning capabilities. It depends on the specific task at hand.
What is Structured Generation?
Structured generation refers to the process of producing content in standardized formats, such as JSON, XML, and YAML. This approach is particularly significant in the context of LLMs due to its implications for data extraction, parsing, and overall output reliability in real-world applications.
Importance of Structured Generation
Standardization: Structured generation allows for consistent output formats, which is crucial for integrating LLMs into industrial applications. It simplifies the parsing workflows and enhances the reliability of the generated content.
Facilitating Data Extraction: By adhering to structured formats, LLMs can produce outputs that are easier to extract and evaluate, making it simpler for downstream systems to utilize the generated data.
Enhancing Performance: The structured output can potentially improve the performance of LLMs in specific tasks, particularly classification tasks, by reducing errors associated with free-form responses.
Methodologies for Structured Generation
The paper outlines several methodologies for implementing structured generation:
Constrained Decoding (JSON-mode): This technique restricts the output of LLMs by enforcing predefined token spaces during the generation process. JSON mode is widely used in industrial settings, ensuring that the output is valid JSON. This method is implemented as a hyperparameter in various LLM APIs.
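As an illustration, many chat-completion APIs expose JSON-mode as a request parameter. The sketch below shows an OpenAI-style request payload; the exact field names vary by provider, and the model name here is only an example:

```python
# Sketch of an OpenAI-style request enabling JSON-mode.
# Field names and model name are assumptions; check your provider's API reference.
request = {
    "model": "gpt-4o-mini",
    "messages": [
        {"role": "system",
         "content": "Answer with a JSON object containing 'reasoning' and 'answer'."},
        {"role": "user", "content": "What is 2 + 3?"},
    ],
    # Constrained decoding: the API restricts sampled tokens so the output is valid JSON.
    "response_format": {"type": "json_object"},
}
print(request["response_format"])
```

The key point is that the constraint lives in the decoding loop, not in the prompt: the server masks out any token that would make the partial output invalid JSON.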
Format-Restricting Instructions (FRI): These instructions direct the LLM to generate responses in standardized formats while adhering to specified schemas. This approach ensures that the generated output follows a structured format, facilitating extraction and evaluation.
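Unlike JSON-mode, an FRI is just text in the prompt. A minimal sketch (the wording and schema below are our own example, not taken from the paper):

```python
# A format-restricting instruction (FRI): the schema is stated in the prompt itself,
# rather than enforced during decoding. The model may still disobey it.
FRI_SUFFIX = (
    "Reply strictly in JSON with the schema "
    '{"step_by_step_reasoning": <string>, "answer": <number>}. '
    "Do not output anything outside the JSON object."
)

def with_fri(question):
    """Append the format-restricting instruction to a task prompt."""
    return question + "\n\n" + FRI_SUFFIX

prompt = with_fri("Eliza works 45 hours this week. How much does she earn?")
print(prompt)
```

Because nothing enforces the schema at decode time, an FRI leaves the model more freedom than JSON-mode, which is exactly the trade-off the paper measures.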
NL-to-Format: This two-step process first instructs the LLM to generate a response in natural language and then convert that response into the desired structured format. This method aims to maintain the performance of unrestricted natural language responses while still providing structured output.
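A sketch of the two-step idea, with the first model call stubbed out and the second "formatting" call approximated by a regex instead of a real LLM:

```python
import json
import re

def answer_freely(question):
    """Step 1 (stubbed here): the model answers in unrestricted natural language."""
    return "She earns $400 for regular hours plus $60 overtime, so the answer is 460."

def to_format(natural_answer):
    """Step 2: convert the free-form answer into the target schema.
    In the paper this is a second model call; here a regex grabs the final number."""
    number = re.findall(r"-?\d+(?:\.\d+)?", natural_answer)[-1]
    return json.dumps({"reasoning": natural_answer, "answer": float(number)})

structured = to_format(answer_freely("How much does Eliza earn this week?"))
print(structured)
```

The reasoning happens without format pressure in step 1, and only the already-finished answer gets squeezed into JSON in step 2.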
Impact of Format Restrictions
The study investigates how different levels of format restrictions affect LLM performance, particularly in reasoning tasks. Key findings include:
Performance Degradation: Stricter format constraints generally lead to greater performance degradation in reasoning tasks. For example, LLMs showed a significant decline in reasoning abilities when required to adhere to strict formats compared to generating free-form responses.
Task-Dependent Performance: The impact of structured generation varies by task. While stringent formats may hinder reasoning-intensive tasks, they can enhance accuracy in classification tasks by reducing the space of possible answers, leading to fewer parsing errors.
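Why do fewer possible answers mean fewer parsing errors? A small sketch with a closed label set (the labels and helper are our own illustration for a Sports-Understanding-style task):

```python
# Sketch: restricting a classifier's answer space, as format constraints effectively do.
# A free-form reply like "Well, it depends..." fails to parse; with a closed label set,
# any parsed answer is guaranteed to be one of the allowed labels.
ALLOWED = {"plausible", "implausible"}

def parse_label(model_output):
    """Return the first allowed label mentioned in the output, or None on a parse failure."""
    for token in model_output.lower().split():
        token = token.strip(".,!?\"'")
        if token in ALLOWED:
            return token
    return None

print(parse_label("The sentence is plausible."))  # -> "plausible"
print(parse_label("Hard to say."))                # -> None
```

For classification, collapsing the output space like this removes a whole class of evaluation failures; for multi-step reasoning, the same collapse removes the room the model needs to think.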
Experimental Findings
The experiments conducted in the study reveal several insights:
JSON-mode vs. FRI: The JSON-mode often performed worse than FRI in reasoning tasks, as it forced a rigid structure that limited the model's ability to reason through complex problems.
Looser Format Restrictions: The study suggests that using looser format restrictions can improve performance in reasoning tasks. By removing strict schema requirements, LLMs can maintain their inherent reasoning capabilities while still providing structured outputs.
Comparison Across Formats: The effectiveness of structured generation also varies by format. While JSON may be more consistent, XML and YAML formats have their strengths and weaknesses depending on the task and model used.
Example: Imagine you have a friend named Eliza who works 45 hours a week. She gets paid $10 an hour for the first 40 hours, but if she works overtime (which is any hour beyond 40), she earns a bit more—$12 an hour for those extra hours.
Now, if you want to figure out how much Eliza made this week, you could write it out in a straightforward way. Here’s how you might do that in a step-by-step format:
Step-by-Step Calculation
Regular Pay: For the first 40 hours, she earns:
40 hours × $10/hour = $400
Overtime Pay: For the 5 hours of overtime, she earns:
5 hours × $12/hour = $60
Total Earnings: Now, add both amounts to get her total earnings for the week:
$400 + $60 = $460
So, Eliza's total earnings for the week would be $460.
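The same calculation in a few lines of code (the rates are the ones from the example above):

```python
REGULAR_RATE = 10   # dollars per hour for the first 40 hours
OVERTIME_RATE = 12  # dollars per hour beyond 40 hours

def weekly_pay(hours):
    """Total weekly pay: regular hours at $10/h plus overtime hours at $12/h."""
    regular = min(hours, 40) * REGULAR_RATE
    overtime = max(hours - 40, 0) * OVERTIME_RATE
    return regular + overtime

print(weekly_pay(45))  # -> 460
```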
JSON Format Example
If you wanted to put this information into a structured format like JSON, which is often used in programming and data handling, it would look something like this:
{
"step_by_step_reasoning": "1. Calculate the earnings for the first 40 hours at $10 per hour. 2. Calculate the earnings for the additional 5 hours at $12 per hour. 3. Add both amounts to find the total earnings for the week.",
"answer": 460
}
Using a structured format like JSON can be super handy, especially if you want to share this data with a computer program or another system that needs to read it easily. It keeps everything organized and clear!
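Once the answer is in JSON, a downstream program can read it with a single call, using Python's standard json module:

```python
import json

# The model's JSON output from the example above, as a raw string.
model_output = '''{
  "step_by_step_reasoning": "1. 40 hours at $10/h. 2. 5 hours at $12/h. 3. Add both amounts.",
  "answer": 460
}'''

result = json.loads(model_output)
print(result["answer"])  # -> 460
```

This is the practical payoff of structured generation: no regexes, no guessing where the answer ends.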
👨🏻💻 AI Startup
Midship, a startup from Y Combinator's Summer 2024 batch, is on a mission to revolutionize the way enterprise professionals handle data.
Imagine this: you’re stuck spending hours sifting through complex documents, extracting data, and manually inputting it into spreadsheets. It’s tedious, time-consuming, and often leads to inefficiencies in decision-making. That’s where they step in.
The core idea behind it is simple yet powerful. Users can upload their source documents—be it loan applications, legal contracts, or financial reports—along with a target spreadsheet. Their AI then takes over, extracting the necessary data and mapping it directly into the spreadsheet template. In just minutes, you have a fully populated spreadsheet, ready for analysis. This not only saves time but also allows professionals to focus on what matters: making informed decisions based on accurate data.
Founded by Aahel Iyer, Max Maio, and Kieran Taylor, Midship is based in San Francisco and aims to streamline workflows across various industries. By automating the data entry process, this startup is not just improving efficiency; it’s empowering professionals to work smarter, not harder.
🧠 Article
“The right kind of stubborn” argues that persistence and obstinacy may seem similar on the surface, as both involve not giving up. However, they are fundamentally different behaviors. Persistent people are driven by energy, imagination, resilience, good judgment, and a focus on a goal.
They are willing to change their ideas and methods as needed to achieve their objectives. Obstinate people, on the other hand, have a reflexive resistance to changing their ideas, even in the face of contrary evidence. They are more likely to be attached to their initial ideas, which are often the least informed by experience working on the problem.
While a small amount of obstinacy can be useful in preventing panic, being too far toward the obstinate end of the spectrum reduces the likelihood of successfully solving complex problems. The article delves into the nuances that distinguish these two mindsets and their implications for success.
📱 Social
Greg Isenberg is well known for sharing startup ideas in almost every industry. This time, he shared his method for finding and validating great startup ideas.
Until next week,
Nino.