Comparing LLMs for a simple SQL task
For my weekend project, I decided to explore how various language models would perform on a straightforward SQL task that I recently encountered at work. The task involved writing a query to determine if a snapshot table contained any gaps, something that would typically take me just a couple of minutes to complete. Curious about the current state of LLM performance, I initially ran the task through GPT-3.5, but to my surprise, the generated query was incorrect. Even GPT-4 failed to produce an accurate solution. This piqued my interest, and I set out to investigate how different LLMs would handle this seemingly simple SQL challenge.
The Prompt
Create an SQL statement in PostgreSQL to identify any gaps (non-consecutive days) in the `dates` table, which has a `date` column. The statement should:
- Return the date of any gaps.
- Order the results in ascending order of date.
Setup
I whipped up this Python script to pass the prompt through several different models, including Llama, Opus, Gemma, Gemini, and GPT.
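The script itself boils down to a small loop over model names. Here is a minimal sketch of that kind of harness; `query_model` and `run_comparison` are hypothetical names standing in for whatever client call each provider requires, not the author's actual script.

```python
# Sketch of a prompt-comparison harness. `query_model` is a stand-in
# (hypothetical) for the per-provider API call; the real script would
# dispatch to each vendor's client library.
PROMPT = (
    "Create an SQL statement in PostgreSQL to identify any gaps "
    "(non-consecutive days) in the dates table, which has a date column."
)

def run_comparison(models, query_model):
    """Send the same prompt to every model and collect the raw SQL replies."""
    return {name: query_model(name, PROMPT) for name in models}

# Usage with a stub in place of real API calls:
fake = lambda name, prompt: f"-- SQL from {name}"
print(run_comparison(["gemma-7b-it", "gpt-4"], fake))
```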
I used this PostgreSQL script, loaded into DB Fiddle, to create the `dates` table with 10 days:
CREATE TABLE dates (
date DATE PRIMARY KEY
);
INSERT INTO dates (date)
SELECT GENERATE_SERIES(
'2024-01-01'::DATE,
'2024-01-10'::DATE,
'1 day'::INTERVAL
) AS date;
DELETE FROM dates WHERE date IN ('2024-01-03', '2024-01-06', '2024-01-07');
The `DELETE` statement creates gaps in the data and serves as our test case for the LLM-generated queries.
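Before looking at the model output, it helps to pin down the expected answer. A small Python sketch mirroring the setup script above confirms that a correct query should return exactly the three deleted days:

```python
from datetime import date, timedelta

# Generate 2024-01-01 .. 2024-01-10, mirroring GENERATE_SERIES above.
all_days = [date(2024, 1, 1) + timedelta(days=i) for i in range(10)]

# Remove the same three days the DELETE statement removes.
deleted = {date(2024, 1, 3), date(2024, 1, 6), date(2024, 1, 7)}
dates_table = [d for d in all_days if d not in deleted]

# The gaps are exactly the deleted days, in ascending order -- this is
# the output a correct query should produce.
expected_gaps = sorted(set(all_days) - set(dates_table))
print([d.isoformat() for d in expected_gaps])
# ['2024-01-03', '2024-01-06', '2024-01-07']
```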
Results
gemma-7b-it
SELECT d.date
FROM generate_series(
(SELECT MIN(date) FROM dates),
(SELECT MAX(date) FROM dates) -- ❗ whoops, missing a parameter
) AS d
LEFT JOIN dates ON d.date = dates.date
WHERE dates.date IS NULL
ORDER BY d.date;
⚠ This query has a bug: `generate_series` needs a third parameter, `'1 day'::INTERVAL`, to be valid. With that fix, however, the query works!
gemini-1.5-pro-latest
WITH DateSeries AS (
SELECT date, LAG(date) OVER (ORDER BY date) AS previous_date
FROM dates
),
Gaps AS (
SELECT date
FROM DateSeries
WHERE date <> previous_date + INTERVAL '1 day'
AND previous_date IS NOT NULL
)
SELECT date
FROM Gaps
ORDER BY date;
❌ This query runs successfully without any modification; however, it returns the wrong days: '2024-01-04' and '2024-01-08'. Can you tell why?
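The answer becomes clear when you replay the query's LAG-based comparison in plain Python over the seven surviving rows: it flags the first day *after* each gap, not the missing days themselves.

```python
from datetime import date, timedelta

# The rows remaining in the table after the DELETE (01-03, 01-06, 01-07 removed).
rows = [date(2024, 1, d) for d in (1, 2, 4, 5, 8, 9, 10)]

# Mirror Gemini's LAG logic: keep each row whose predecessor is not
# exactly one day earlier.
flagged = [cur for prev, cur in zip(rows, rows[1:])
           if cur != prev + timedelta(days=1)]
print([d.isoformat() for d in flagged])
# ['2024-01-04', '2024-01-08'] -- the surviving days after each gap,
# not the missing days, and the two-day gap collapses into one row
```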
claude-3-opus-20240229
SELECT date::date + 1 AS missing_date
FROM (
SELECT date::date, lead(date::date) OVER (ORDER BY date) AS next_date
FROM dates
) t
WHERE next_date - date::date > 1
ORDER BY missing_date;
❌ Again, this query runs, but it returns only 2 of the 3 correct dates: '2024-01-03' and '2024-01-06'. It emits just the first missing day of each gap, so '2024-01-07' is skipped.
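Replaying Claude's LEAD-based logic in Python shows the failure mode: `date + 1` yields only the first missing day of each gap, so the two-day gap on the 6th and 7th produces a single row.

```python
from datetime import date, timedelta

# The rows remaining in the table after the DELETE.
rows = [date(2024, 1, d) for d in (1, 2, 4, 5, 8, 9, 10)]

# Mirror Claude's LEAD logic: emit date + 1 day wherever the next row
# is more than one day away.
missing = [cur + timedelta(days=1) for cur, nxt in zip(rows, rows[1:])
           if (nxt - cur).days > 1]
print([d.isoformat() for d in missing])
# ['2024-01-03', '2024-01-06'] -- '2024-01-07' is never generated
```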
llama3-70b-8192
WITH consecutive_days AS (
SELECT date, LAG(date) OVER (ORDER BY date) AS prev_date
FROM dates
)
SELECT date
FROM consecutive_days
WHERE date - prev_date > INTERVAL '1 day'; -- ❗ can't compare integer with interval
❌ This query returns the error `Query Error: error: operator does not exist: integer > interval`, and after replacing `INTERVAL '1 day'` with `1`, the returned values are off: '2024-01-04', '2024-01-08' (the same day-after-the-gap mistake Gemini made).
gpt-4
SELECT
current_date + INTERVAL '1 day' as start_gap,
next_date - INTERVAL '1 day' as end_gap
FROM
(
SELECT
date as current_date,
LEAD(date) OVER (ORDER BY date) as next_date
FROM
dates
) subquery
WHERE
next_date - current_date > INTERVAL '1 day' -- ❗ can't compare integer with interval
ORDER BY
current_date ASC;
❌ Similarly, this query returns the error `Query Error: error: operator does not exist: integer > interval`. Even after fixing this error, the query doesn't follow the prompt: instead of a single column of dates, it returns two columns with the gap start/end dates. Interesting that Llama-3 and GPT-4 both generate the same mistake.
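A Python replay of GPT-4's logic (with the integer comparison fixed) shows what it actually produces: start/end ranges rather than the single column of dates the prompt asked for.

```python
from datetime import date, timedelta

# The rows remaining in the table after the DELETE.
rows = [date(2024, 1, d) for d in (1, 2, 4, 5, 8, 9, 10)]

# Mirror GPT-4's fixed logic: for each gap, emit the range
# (current + 1 day, next - 1 day) as a start/end pair.
one = timedelta(days=1)
ranges = [(cur + one, nxt - one) for cur, nxt in zip(rows, rows[1:])
          if (nxt - cur).days > 1]
print([(a.isoformat(), b.isoformat()) for a, b in ranges])
# [('2024-01-03', '2024-01-03'), ('2024-01-06', '2024-01-07')]
```

The ranges do cover all three missing days, so this is arguably a prompt-compliance failure rather than a logic failure.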
mixtral-8x7b-32768
WITH date_series AS (
SELECT
generate_series(
(SELECT MIN(date) FROM dates),
(SELECT MAX(date) FROM dates),
'1 day'
)::date as series_date
)
SELECT
ds.series_date
FROM date_series ds
LEFT JOIN dates d ON ds.series_date = d.date
WHERE d.date IS NULL
ORDER BY ds.series_date;
✅ This model works perfectly: no syntax errors, and it returns the 3 correct days!
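Mixtral's winning approach is a classic calendar anti-join: generate the full series between MIN and MAX, then keep only the days with no matching row. The same idea in Python:

```python
from datetime import date, timedelta

# The rows remaining in the table after the DELETE.
rows = [date(2024, 1, d) for d in (1, 2, 4, 5, 8, 9, 10)]

# Mirror Mixtral's anti-join: build the full MIN..MAX series, then keep
# only the days absent from the table.
lo, hi = min(rows), max(rows)
series = [lo + timedelta(days=i) for i in range((hi - lo).days + 1)]
existing = set(rows)
gaps = [d for d in series if d not in existing]
print([d.isoformat() for d in gaps])
# ['2024-01-03', '2024-01-06', '2024-01-07']
```

Because the anti-join enumerates every candidate day, it handles multi-day gaps naturally, which is exactly where the LAG/LEAD approaches above fell short.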
Conclusion
Surprisingly, the smaller, faster models outperformed the large, expensive ones. Mixtral-8x7b was the clear winner, with Gemma-7b a close second. The other four models were simply inadequate. Next time I need SQL help from AI, I’ll consider all model sizes – bigger doesn’t always mean better!
| Model | Syntax | Accuracy |
|---|---|---|
| Gemma-7b | ❌ | ✅ |
| Gemini-1.5 | ✅ | ❌ |
| Claude3-Opus | ✅ | ⚠️ |
| Llama3-70b | ❌ | ❌ |
| GPT-4 | ❌ | ❌ |
| 🏆 Mixtral-8x7b | ✅ | ✅ |