tag-chat by wierdvanderhaar · Pull Request #902 · crate/cratedb-examples

wierdvanderhaar · 2025-04-16T11:48:29Z

Summary of the changes / Why this is an improvement

Simple demo on how to use TAG on Time-Series data

Checklist

Link to issue this PR refers to (if applicable): Fixes #???

coderabbitai · 2025-04-16T11:48:36Z

Caution

Review failed

The pull request is closed.

Walkthrough

This update introduces a complete pipeline for a Timeseries QA Assistant focused on industrial motor monitoring and anomaly detection using CrateDB and OpenAI's GPT-3.5-turbo. It adds scripts for generating synthetic timeseries sensor data and fictional machine manuals, storing both in CrateDB. Two chatbot applications are included: one as a Streamlit web app and another as a command-line assistant, both capable of translating natural language queries into SQL, detecting anomalies, and providing manual-based guidance. Documentation and environment setup files are also provided to guide installation and usage.

Changes

File(s)	Change Summary
`DataGenerator.py`, `Generate-Manuals.py`	Added scripts for generating synthetic timeseries data and fictional machine manuals, storing both in CrateDB via environment-configured connections. Functions for data generation, anomaly injection, and manual content creation are included.
`app_chat.py`, `tag-motor-chat.py`	Added two chatbot applications: a Streamlit web app and a command-line assistant. Both interface with CrateDB and OpenAI GPT-3.5-turbo, translate user queries to SQL, execute queries, detect anomalies, and augment responses with manual guidance. They include utilities for schema fetching, SQL cleaning, intent detection, and result formatting.
`README.md`	Added comprehensive documentation detailing the system architecture, setup instructions, example queries, and the flow from data generation through chat-based querying and manual guidance.
`requirements.txt`	Introduced a requirements file specifying dependencies: `python-dotenv`, `openai`, `crate`, `pandas`, and `ipywidgets` for consistent environment setup.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant ChatbotApp
    participant OpenAI
    participant CrateDB

    User->>ChatbotApp: Enter natural language query
    ChatbotApp->>CrateDB: Fetch table schema & manuals (if needed)
    ChatbotApp->>OpenAI: Send prompt (schema, manuals, user query)
    OpenAI-->>ChatbotApp: Return SQL query and/or analysis
    ChatbotApp->>CrateDB: Execute SQL (if present)
    CrateDB-->>ChatbotApp: Return query results
    ChatbotApp->>User: Present results, analysis, and manual guidance

Poem

In a warren of wires, the motors spin round,
With data and manuals, new insights are found.
The bots now converse, with SQL in tow,
Detecting anomalies wherever they go.
Manuals at paw, and queries in ear,
This rabbit-built system brings answers near!
🐇💡

Tip

⚡💬 Agentic Chat (Pro Plan, General Availability)

We're introducing multi-step agentic chat in review comments and issue comments, within and outside of PR's. This feature enhances review and issue discussions with the CodeRabbit agentic chat by enabling advanced interactions, including the ability to create pull requests directly from comments and add commits to existing pull requests.

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1d4b4bf and d3b48fe.

📒 Files selected for processing (6)

topic/chatbot/table-augmented-generation/DataGenerator.py (1 hunks)
topic/chatbot/table-augmented-generation/Generate-Manuals.py (1 hunks)
topic/chatbot/table-augmented-generation/README.md (1 hunks)
topic/chatbot/table-augmented-generation/app_chat.py (1 hunks)
topic/chatbot/table-augmented-generation/requirements.txt (1 hunks)
topic/chatbot/table-augmented-generation/tag-motor-chat.py (1 hunks)

✨ Finishing Touches

📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

amotl

Excellent. Thanks!

I've acknowledged your patch to get it merged early, and may accompany it by software tests later.
Feel free to follow the guidelines of CodeRabbit at your disposal.

amotl · 2025-04-16T11:51:43Z

topic/chatbots/table-augmented-generation/DataGenerator.py

There is already a directory about chatbots, written in always preferred singular ¹ topic/chatbot. Can you slot in yours into the same folder?

-- https://github.com/crate/cratedb-examples/tree/main/topic/chatbot

Footnotes

Stay with SINGULAR.

↩

coderabbitai

Actionable comments posted: 8

🔭 Outside diff range comments (1)

topic/chatbots/table-augmented-generation/app_chat.py (1)

162-174: ⚠️ Potential issue

Validate and constrain LLM-generated SQL code.

Currently, the code executes any SQL block found in the LLM’s response if the user query references SQL—including potential destructive statements. Consider restricting queries to read-only (SELECT) commands, or parsing and validating the SQL to prevent malicious or unintended write operations.

Example approach:

Strictly enforce that the SQL block begins with "SELECT".

Parse or inspect the SQL statement to confirm it doesn’t contain "DROP", "DELETE", or other critical commands.

🧹 Nitpick comments (8)

topic/chatbots/table-augmented-generation/requirements.txt (1)
1-6: Consider pinning specific versions for better reproducibility.

While you've correctly specified a minimum version for OpenAI, it's a good practice to pin all dependencies to specific versions to ensure consistent behavior across different environments.
-python-dotenv
-openai>=1.0.0
-crate
-pandas
-ipywidgets
+python-dotenv==1.0.0
+openai==1.9.0
+crate==0.31.3
+pandas==2.0.3
+ipywidgets==8.1.1
Also, the AI summary mentions a Streamlit web app (app_chat.py), but streamlit is missing from your requirements.
+streamlit==1.29.0
topic/chatbots/table-augmented-generation/DataGenerator.py (1)
89-97: Add option to clean existing data before insertion.

Currently, the script always appends new data to the existing table. Consider adding a parameter to optionally clean the table before inserting new data, which would be useful for testing and development.
# ----- MAIN -----
if __name__ == "__main__":
+    import argparse
+    parser = argparse.ArgumentParser(description='Generate and store timeseries data for motors.')
+    parser.add_argument('--clean', action='store_true', help='Clean existing data before insertion')
+    args = parser.parse_args()
+    
+    if args.clean:
+        try:
+            connection = client.connect(
+                f"https://{CRATEDB_HOST}:{CRATEDB_PORT}",
+                username=CRATEDB_USER,
+                password=CRATEDB_PASSWORD
+            )
+            cursor = connection.cursor()
+            cursor.execute(f"DELETE FROM {CRATEDB_SCHEMA}.motor_readings")
+            print(f"✅ Cleaned existing data from {CRATEDB_SCHEMA}.motor_readings")
+            connection.close()
+        except Exception as e:
+            print(f"❌ Error cleaning data: {e}")
+    
     df = generate_timeseries_data()
     print(df.head())

     store_in_cratedb(df)

     print(f"Total generated readings: {len(df)}")
topic/chatbots/table-augmented-generation/README.md (2)
49-50: Fix Markdown formatting issues.

There are several Markdown formatting issues in the file:

Hard tabs used in lists instead of spaces (lines 84-87, 131-134, 137-140)

Inconsistent code block style (lines 137-140, 144-158)

Multiple spaces after hash on heading on line 136

Here's an example fix for lines 84-87:
-	•	Operational limits
-	•	Anomaly detection triggers
-	•	Emergency protocols
-	•	Maintenance schedules
+- Operational limits
+- Anomaly detection triggers
+- Emergency protocols
+- Maintenance schedules
Apply similar fixes to the other bullet lists and ensure code blocks use consistent formatting with fenced style (```).

Also applies to: 84-87, 131-140

112-127: Consider providing more realistic/detailed example data output.

The example output in the "Example output" section differs significantly from the actual data that would be generated by the DataGenerator.py script (showing a much smaller number of readings than what would be generated). Consider updating this section to show more realistic output or clarifying that this is just a sample.
-Data Overview:
-- Total readings: 1000
-- Time range: 2025-04-07 11:29:35 to 2025-04-08 12:14:35
-- Machines: [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
-- Vibration range: 0.76 to 1.11
-- Temperature range: 40.69°C to 50.27°C
-- Rotations range: 1402 RPM to 1492 RPM
-- Anomalies (vibration > 1.5): 0
+Data Overview:
+- Total readings: 28800 (10 machines × 96 readings/day × 30 days)
+- Time range: 2025-03-09 10:29:35 to 2025-04-08 10:14:35
+- Machines: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
+- Vibration range: ~0.5 to ~2.0 (with anomalies)
+- Temperature range: ~40°C to ~70°C
+- Rotations range: ~1400 RPM to ~1600 RPM
+- Anomalies: varies based on random injection
topic/chatbots/table-augmented-generation/Generate-Manuals.py (1)
107-109: Add command-line options for flexibility.

Consider adding command-line arguments to provide more flexibility, such as the ability to regenerate only specific machine manuals or to force regeneration of all manuals.
# === Main Execution ===
if __name__ == "__main__":
-    store_manuals_in_cratedb()
+    import argparse
+    
+    parser = argparse.ArgumentParser(description='Generate and store fictional machine manuals.')
+    parser.add_argument('--force-all', action='store_true', 
+                        help='Force regeneration of manuals for all machines')
+    parser.add_argument('--machine-ids', type=int, nargs='+', 
+                        help='Generate manuals only for specific machine IDs')
+    args = parser.parse_args()
+    
+    if args.machine_ids:
+        print(f"Generating manuals for specific machines: {args.machine_ids}")
+        # Implementation for specific machines...
+    elif args.force_all:
+        print("Forcing regeneration of all manuals")
+        # Implementation for force all...
+    else:
+        store_manuals_in_cratedb()
This would allow users to run commands like:

python Generate-Manuals.py --machine-ids 1 3 5 to regenerate manuals for specific machines

python Generate-Manuals.py --force-all to force regeneration of all manuals
topic/chatbots/table-augmented-generation/app_chat.py (2)
7-9: Remove unused imports.

Lines 7 (import datetime) and 9 (from dateutil import parser as date_parser) appear unused. Removing them clarifies dependencies and resolves static analysis alerts.

Apply this diff to remove the unused imports:
-import datetime
-from dateutil import parser as date_parser
🧰 Tools

🪛 Ruff (0.8.2)

7-7: datetime imported but unused

Remove unused import: datetime

(F401)

9-9: dateutil.parser imported but unused

Remove unused import: dateutil.parser

(F401)

28-48: Consider a shared connection strategy or connection pooling.

Each function creates and closes a database connection, which is simple but may be less efficient in high-use scenarios. You might explore using a single shared connection or a connection pool for improved performance.
topic/chatbots/table-augmented-generation/tag-motor-chat.py (1)

210-371: Split chat() into smaller functions for maintainability.

The chat() function spans many responsibilities—handling user input, extracting date filters, detecting manual context, injecting schema, executing queries, formatting results, etc. Extracting these into smaller functions would improve testability and readability.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eeeebcf and 1d4b4bf.

📒 Files selected for processing (6)

topic/chatbots/table-augmented-generation/DataGenerator.py (1 hunks)
topic/chatbots/table-augmented-generation/Generate-Manuals.py (1 hunks)
topic/chatbots/table-augmented-generation/README.md (1 hunks)
topic/chatbots/table-augmented-generation/app_chat.py (1 hunks)
topic/chatbots/table-augmented-generation/requirements.txt (1 hunks)
topic/chatbots/table-augmented-generation/tag-motor-chat.py (1 hunks)

🧰 Additional context used

🧬 Code Graph Analysis (2)

topic/chatbots/table-augmented-generation/app_chat.py (1)

topic/chatbots/table-augmented-generation/tag-motor-chat.py (5)

fetch_table_schema (24-43)

fetch_machine_manual (68-81)

extract_anomaly_info (85-90)

execute_sql (120-133)

chat (210-367)

topic/chatbots/table-augmented-generation/tag-motor-chat.py (1)

topic/chatbots/table-augmented-generation/app_chat.py (4)

fetch_table_schema (28-47)

fetch_machine_manual (50-61)

extract_anomaly_info (64-69)

execute_sql (84-99)

🪛 markdownlint-cli2 (0.17.2)

topic/chatbots/table-augmented-generation/README.md

84-84: Hard tabs
Column: 1

(MD010, no-hard-tabs)

84-84: Hard tabs
Column: 3

(MD010, no-hard-tabs)

85-85: Hard tabs
Column: 1

(MD010, no-hard-tabs)

85-85: Hard tabs
Column: 3

(MD010, no-hard-tabs)

86-86: Hard tabs
Column: 1

(MD010, no-hard-tabs)

86-86: Hard tabs
Column: 3

(MD010, no-hard-tabs)

87-87: Hard tabs
Column: 1

(MD010, no-hard-tabs)

87-87: Hard tabs
Column: 3

(MD010, no-hard-tabs)

131-131: Hard tabs
Column: 1

(MD010, no-hard-tabs)

131-131: Hard tabs
Column: 3

(MD010, no-hard-tabs)

132-132: Hard tabs
Column: 1

(MD010, no-hard-tabs)

132-132: Hard tabs
Column: 3

(MD010, no-hard-tabs)

133-133: Hard tabs
Column: 1

(MD010, no-hard-tabs)

133-133: Hard tabs
Column: 3

(MD010, no-hard-tabs)

134-134: Hard tabs
Column: 1

(MD010, no-hard-tabs)

134-134: Hard tabs
Column: 3

(MD010, no-hard-tabs)

136-136: Multiple spaces after hash on atx style heading
null

(MD019, no-multiple-space-atx)

137-137: Hard tabs
Column: 1

(MD010, no-hard-tabs)

137-137: Hard tabs
Column: 3

(MD010, no-hard-tabs)

137-137: Code block style
Expected: fenced; Actual: indented

(MD046, code-block-style)

138-138: Hard tabs
Column: 1

(MD010, no-hard-tabs)

138-138: Hard tabs
Column: 3

(MD010, no-hard-tabs)

139-139: Hard tabs
Column: 1

(MD010, no-hard-tabs)

139-139: Hard tabs
Column: 3

(MD010, no-hard-tabs)

140-140: Hard tabs
Column: 1

(MD010, no-hard-tabs)

140-140: Hard tabs
Column: 3

(MD010, no-hard-tabs)

144-144: Code block style
Expected: fenced; Actual: indented

(MD046, code-block-style)

🪛 Ruff (0.8.2)

topic/chatbots/table-augmented-generation/app_chat.py

7-7: datetime imported but unused

Remove unused import: datetime

(F401)

9-9: dateutil.parser imported but unused

Remove unused import: dateutil.parser

(F401)

coderabbitai · 2025-04-16T11:58:21Z

topic/chatbots/table-augmented-generation/DataGenerator.py

+CRATEDB_HOST = os.getenv("CRATEDB_HOST")
+CRATEDB_PORT = os.getenv("CRATEDB_PORT")
+CRATEDB_USER = os.getenv("CRATEDB_USER")
+CRATEDB_PASSWORD = os.getenv("CRATEDB_PASSWORD")
+CRATEDB_SCHEMA = os.getenv("CRATEDB_SCHEMA")


🛠️ Refactor suggestion

Add error handling for missing environment variables.

Your script loads environment variables but doesn't verify if they're actually set. This could lead to cryptic errors when trying to connect to CrateDB if the environment isn't properly configured.

CRATEDB_HOST = os.getenv("CRATEDB_HOST") CRATEDB_PORT = os.getenv("CRATEDB_PORT") CRATEDB_USER = os.getenv("CRATEDB_USER") CRATEDB_PASSWORD = os.getenv("CRATEDB_PASSWORD") CRATEDB_SCHEMA = os.getenv("CRATEDB_SCHEMA") + +# Validate required environment variables +required_vars = ["CRATEDB_HOST", "CRATEDB_PORT"] +missing_vars = [var for var in required_vars if not os.getenv(var)] +if missing_vars: + raise EnvironmentError(f"Missing required environment variables: {', '.join(missing_vars)}") + +# Use default schema if not provided +if not CRATEDB_SCHEMA: + CRATEDB_SCHEMA = "doc" + print(f"No schema specified, using default: {CRATEDB_SCHEMA}")

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

CRATEDB_HOST = os.getenv("CRATEDB_HOST")

CRATEDB_PORT = os.getenv("CRATEDB_PORT")

CRATEDB_USER = os.getenv("CRATEDB_USER")

CRATEDB_PASSWORD = os.getenv("CRATEDB_PASSWORD")

CRATEDB_SCHEMA = os.getenv("CRATEDB_SCHEMA")

CRATEDB_HOST = os.getenv("CRATEDB_HOST")

CRATEDB_PORT = os.getenv("CRATEDB_PORT")

CRATEDB_USER = os.getenv("CRATEDB_USER")

CRATEDB_PASSWORD = os.getenv("CRATEDB_PASSWORD")

CRATEDB_SCHEMA = os.getenv("CRATEDB_SCHEMA")

# Validate required environment variables

required_vars = ["CRATEDB_HOST", "CRATEDB_PORT"]

missing_vars = [var for var in required_vars if not os.getenv(var)]

if missing_vars:

raise EnvironmentError(f"Missing required environment variables: {', '.join(missing_vars)}")

# Use default schema if not provided

if not CRATEDB_SCHEMA:

CRATEDB_SCHEMA = "doc"

print(f"No schema specified, using default: {CRATEDB_SCHEMA}")

coderabbitai · 2025-04-16T11:58:21Z

topic/chatbots/table-augmented-generation/DataGenerator.py

+# ----- STORE IN CRATEDB -----
+def store_in_cratedb(df):
+    connection = client.connect(
+        f"https://{CRATEDB_HOST}:{CRATEDB_PORT}",
+        username=CRATEDB_USER,
+        password=CRATEDB_PASSWORD
+    )
+    cursor = connection.cursor()
+
+    # Create table if not exists
+    cursor.execute(f"""
+    CREATE TABLE IF NOT EXISTS {CRATEDB_SCHEMA}.motor_readings (
+        machine_id INTEGER,
+        timestamp TIMESTAMP,
+        vibration DOUBLE,
+        temperature DOUBLE,
+        rotations DOUBLE
+    )
+    """)
+
+    # Insert data
+    data = [(int(row.machine_id), row.timestamp.to_pydatetime(), float(row.vibration), float(row.temperature), float(row.rotations)) for _, row in df.iterrows()]
+    cursor.executemany(f"""
+        INSERT INTO {CRATEDB_SCHEMA}.motor_readings 
+        (machine_id, timestamp, vibration, temperature, rotations) 
+        VALUES (?, ?, ?, ?, ?)
+        """, data)
+    connection.close()
+    print(f"✅ Stored {len(df)} rows in CrateDB.")


🛠️ Refactor suggestion

Add error handling for database operations.

The database connection and operations lack proper error handling. If the connection fails or an error occurs during data insertion, the script will crash without a clear message.

# ----- STORE IN CRATEDB ----- def store_in_cratedb(df): - connection = client.connect( - f"https://{CRATEDB_HOST}:{CRATEDB_PORT}", - username=CRATEDB_USER, - password=CRATEDB_PASSWORD - ) - cursor = connection.cursor() + try: + connection = client.connect( + f"https://{CRATEDB_HOST}:{CRATEDB_PORT}", + username=CRATEDB_USER, + password=CRATEDB_PASSWORD + ) + cursor = connection.cursor() - # Create table if not exists - cursor.execute(f""" - CREATE TABLE IF NOT EXISTS {CRATEDB_SCHEMA}.motor_readings ( - machine_id INTEGER, - timestamp TIMESTAMP, - vibration DOUBLE, - temperature DOUBLE, - rotations DOUBLE - ) - """) + # Create table if not exists + cursor.execute(f""" + CREATE TABLE IF NOT EXISTS {CRATEDB_SCHEMA}.motor_readings ( + machine_id INTEGER, + timestamp TIMESTAMP, + vibration DOUBLE, + temperature DOUBLE, + rotations DOUBLE + ) + """) - # Insert data - data = [(int(row.machine_id), row.timestamp.to_pydatetime(), float(row.vibration), float(row.temperature), float(row.rotations)) for _, row in df.iterrows()] - cursor.executemany(f""" - INSERT INTO {CRATEDB_SCHEMA}.motor_readings - (machine_id, timestamp, vibration, temperature, rotations) - VALUES (?, ?, ?, ?, ?) - """, data) - connection.close() - print(f"✅ Stored {len(df)} rows in CrateDB.") + # Insert data - convert to list of tuples for better performance than iterrows + data = df.apply( + lambda row: ( + int(row.machine_id), + row.timestamp.to_pydatetime(), + float(row.vibration), + float(row.temperature), + float(row.rotations) + ), + axis=1 + ).tolist() + + cursor.executemany(f""" + INSERT INTO {CRATEDB_SCHEMA}.motor_readings + (machine_id, timestamp, vibration, temperature, rotations) + VALUES (?, ?, ?, ?, ?) + """, data) + print(f"✅ Stored {len(df)} rows in CrateDB.") + except Exception as e: + print(f"❌ Error storing data in CrateDB: {e}") + raise + finally: + if 'connection' in locals() and connection: + connection.close()

Additionally, iterrows() can be inefficient for large DataFrames. The suggested solution uses apply() which is generally more performant.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

# ----- STORE IN CRATEDB -----

def store_in_cratedb(df):

connection = client.connect(

f"https://{CRATEDB_HOST}:{CRATEDB_PORT}",

username=CRATEDB_USER,

password=CRATEDB_PASSWORD

)

cursor = connection.cursor()

# Create table if not exists

cursor.execute(f"""

CREATE TABLE IF NOT EXISTS {CRATEDB_SCHEMA}.motor_readings (

machine_id INTEGER,

timestamp TIMESTAMP,

vibration DOUBLE,

temperature DOUBLE,

rotations DOUBLE

)

""")

# Insert data

data = [(int(row.machine_id), row.timestamp.to_pydatetime(), float(row.vibration), float(row.temperature), float(row.rotations)) for _, row in df.iterrows()]

cursor.executemany(f"""

INSERT INTO {CRATEDB_SCHEMA}.motor_readings

(machine_id, timestamp, vibration, temperature, rotations)

VALUES (?, ?, ?, ?, ?)

""", data)

connection.close()

print(f"✅ Stored {len(df)} rows in CrateDB.")

# ----- STORE IN CRATEDB -----

def store_in_cratedb(df):

try:

connection = client.connect(

f"https://{CRATEDB_HOST}:{CRATEDB_PORT}",

username=CRATEDB_USER,

password=CRATEDB_PASSWORD

)

cursor = connection.cursor()

# Create table if not exists

cursor.execute(f"""

CREATE TABLE IF NOT EXISTS {CRATEDB_SCHEMA}.motor_readings (

machine_id INTEGER,

timestamp TIMESTAMP,

vibration DOUBLE,

temperature DOUBLE,

rotations DOUBLE

)

""")

# Insert data - convert to list of tuples for better performance than iterrows

data = df.apply(

lambda row: (

int(row.machine_id),

row.timestamp.to_pydatetime(),

float(row.vibration),

float(row.temperature),

float(row.rotations)

),

axis=1

).tolist()

cursor.executemany(f"""

INSERT INTO {CRATEDB_SCHEMA}.motor_readings

(machine_id, timestamp, vibration, temperature, rotations)

VALUES (?, ?, ?, ?, ?)

""", data)

print(f"✅ Stored {len(df)} rows in CrateDB.")

except Exception as e:

print(f"❌ Error storing data in CrateDB: {e}")

raise

finally:

if 'connection' in locals() and connection:

connection.close()

coderabbitai · 2025-04-16T11:58:21Z

topic/chatbots/table-augmented-generation/README.md

+            | rag-motor-chat.py        |
+            | - OpenAI Q&A             |


⚠️ Potential issue

Inconsistent component naming in architecture diagram.

The architecture diagram refers to a script named rag-motor-chat.py, but the documentation and example commands mention tag-motor-chat.py. This inconsistency could confuse users.

+--------------------------+ - | rag-motor-chat.py | + | tag-motor-chat.py | | - OpenAI Q&A | | - Manual-based Guidance | +--------------------------+

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

| rag-motor-chat.py |

| - OpenAI Q&A |

+--------------------------+

| tag-motor-chat.py |

| - OpenAI Q&A |

| - Manual-based Guidance |

+--------------------------+

coderabbitai · 2025-04-16T11:58:22Z

topic/chatbots/table-augmented-generation/README.md

+      restart_policy:
+        condition: on-failure%
+```
+
+Run docker-compose:
+``` bash
+docker-compose pull   
+docker-compose up -d
+```


⚠️ Potential issue

Fix syntax error in docker-compose restart policy.

There's a stray '%' character at the end of the 'condition: on-failure' line in your docker-compose example.

deploy: replicas: 1 restart_policy: - condition: on-failure% + condition: on-failure

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

restart_policy:

condition: on-failure%

```

Run docker-compose:

``` bash

docker-compose pull

docker-compose up -d

```

deploy:

replicas: 1

restart_policy:

condition: on-failure

coderabbitai · 2025-04-16T11:58:22Z

topic/chatbots/table-augmented-generation/Generate-Manuals.py

+def store_manuals_in_cratedb():
+    connection = client.connect(
+        f"https://{CRATEDB_HOST}:{CRATEDB_PORT}",
+        username=CRATEDB_USER,
+        password=CRATEDB_PASSWORD
+    )
+    cursor = connection.cursor()
+
+    # Create table if not exists
+    cursor.execute(f"""
+    CREATE TABLE IF NOT EXISTS {CRATEDB_SCHEMA}.machine_manuals (
+        machine_id INTEGER PRIMARY KEY,
+        manual TEXT
+    )
+    """)
+
+    # Get unique machine IDs from motor_readings
+    cursor.execute(f"""
+    SELECT DISTINCT machine_id FROM {CRATEDB_SCHEMA}.motor_readings ORDER BY machine_id
+    """)
+    machine_ids = [row[0] for row in cursor.fetchall()]
+
+    print(f"🔍 Found {len(machine_ids)} unique machine IDs.")
+
+    # Upsert manuals
+    for machine_id in machine_ids:
+        manual = generate_manual(machine_id)
+        cursor.execute(f"""
+        INSERT INTO {CRATEDB_SCHEMA}.machine_manuals (machine_id, manual)
+        VALUES (?, ?)
+        ON CONFLICT (machine_id) DO UPDATE SET manual = ?
+        """, (machine_id, manual, manual))
+
+    connection.close()
+    print("✅ Fictional machine manuals stored (or updated) in CrateDB.")


🛠️ Refactor suggestion

Add error handling for database operations.

Similar to the DataGenerator.py script, this function lacks proper error handling for database operations. If the connection fails or an error occurs during queries or insertions, the script will crash without a clear message. Also, there's no verification that the motor_readings table exists before querying it.

def store_manuals_in_cratedb(): - connection = client.connect( - f"https://{CRATEDB_HOST}:{CRATEDB_PORT}", - username=CRATEDB_USER, - password=CRATEDB_PASSWORD - ) - cursor = connection.cursor() + try: + connection = client.connect( + f"https://{CRATEDB_HOST}:{CRATEDB_PORT}", + username=CRATEDB_USER, + password=CRATEDB_PASSWORD + ) + cursor = connection.cursor() - # Create table if not exists - cursor.execute(f""" - CREATE TABLE IF NOT EXISTS {CRATEDB_SCHEMA}.machine_manuals ( - machine_id INTEGER PRIMARY KEY, - manual TEXT - ) - """) + # Create table if not exists + cursor.execute(f""" + CREATE TABLE IF NOT EXISTS {CRATEDB_SCHEMA}.machine_manuals ( + machine_id INTEGER PRIMARY KEY, + manual TEXT + ) + """) - # Get unique machine IDs from motor_readings - cursor.execute(f""" - SELECT DISTINCT machine_id FROM {CRATEDB_SCHEMA}.motor_readings ORDER BY machine_id - """) - machine_ids = [row[0] for row in cursor.fetchall()] + # Check if motor_readings table exists + cursor.execute(f""" + SELECT table_name FROM information_schema.tables + WHERE table_schema = ? AND table_name = 'motor_readings' + """, (CRATEDB_SCHEMA,)) + + if not cursor.fetchone(): + print(f"❌ Table {CRATEDB_SCHEMA}.motor_readings does not exist. Please run DataGenerator.py first.") + return + + # Get unique machine IDs from motor_readings + cursor.execute(f""" + SELECT DISTINCT machine_id FROM {CRATEDB_SCHEMA}.motor_readings ORDER BY machine_id + """) + machine_ids = [row[0] for row in cursor.fetchall()] - print(f"🔍 Found {len(machine_ids)} unique machine IDs.") + if not machine_ids: + print("⚠️ No machine IDs found in motor_readings table.") + return + + print(f"🔍 Found {len(machine_ids)} unique machine IDs.") - # Upsert manuals - for machine_id in machine_ids: - manual = generate_manual(machine_id) - cursor.execute(f""" - INSERT INTO {CRATEDB_SCHEMA}.machine_manuals (machine_id, manual) - VALUES (?, ?) - ON CONFLICT (machine_id) DO UPDATE SET manual = ? - """, (machine_id, manual, manual)) + # Upsert manuals + for machine_id in machine_ids: + manual = generate_manual(machine_id) + cursor.execute(f""" + INSERT INTO {CRATEDB_SCHEMA}.machine_manuals (machine_id, manual) + VALUES (?, ?) + ON CONFLICT (machine_id) DO UPDATE SET manual = ? + """, (machine_id, manual, manual)) - connection.close() - print("✅ Fictional machine manuals stored (or updated) in CrateDB.") + print("✅ Fictional machine manuals stored (or updated) in CrateDB.") + except Exception as e: + print(f"❌ Error working with CrateDB: {e}") + raise + finally: + if 'connection' in locals() and connection: + connection.close()

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def store_manuals_in_cratedb():

connection = client.connect(

f"https://{CRATEDB_HOST}:{CRATEDB_PORT}",

username=CRATEDB_USER,

password=CRATEDB_PASSWORD

)

cursor = connection.cursor()

# Create table if not exists

cursor.execute(f"""

CREATE TABLE IF NOT EXISTS {CRATEDB_SCHEMA}.machine_manuals (

machine_id INTEGER PRIMARY KEY,

manual TEXT

)

""")

# Get unique machine IDs from motor_readings

cursor.execute(f"""

SELECT DISTINCT machine_id FROM {CRATEDB_SCHEMA}.motor_readings ORDER BY machine_id

""")

machine_ids = [row[0] for row in cursor.fetchall()]

print(f"🔍 Found {len(machine_ids)} unique machine IDs.")

# Upsert manuals

for machine_id in machine_ids:

manual = generate_manual(machine_id)

cursor.execute(f"""

INSERT INTO {CRATEDB_SCHEMA}.machine_manuals (machine_id, manual)

VALUES (?, ?)

ON CONFLICT (machine_id) DO UPDATE SET manual = ?

""", (machine_id, manual, manual))

connection.close()

print("✅ Fictional machine manuals stored (or updated) in CrateDB.")

def store_manuals_in_cratedb():

try:

connection = client.connect(

f"https://{CRATEDB_HOST}:{CRATEDB_PORT}",

username=CRATEDB_USER,

password=CRATEDB_PASSWORD

)

cursor = connection.cursor()

# Create table if not exists

cursor.execute(f"""

CREATE TABLE IF NOT EXISTS {CRATEDB_SCHEMA}.machine_manuals (

machine_id INTEGER PRIMARY KEY,

manual TEXT

)

""")

# Check if motor_readings table exists

cursor.execute(f"""

SELECT table_name FROM information_schema.tables

WHERE table_schema = ? AND table_name = 'motor_readings'

""", (CRATEDB_SCHEMA,))

if not cursor.fetchone():

print(f"❌ Table {CRATEDB_SCHEMA}.motor_readings does not exist. Please run DataGenerator.py first.")

return

# Get unique machine IDs from motor_readings

cursor.execute(f"""

SELECT DISTINCT machine_id FROM {CRATEDB_SCHEMA}.motor_readings ORDER BY machine_id

""")

machine_ids = [row[0] for row in cursor.fetchall()]

if not machine_ids:

print("⚠️ No machine IDs found in motor_readings table.")

return

print(f"🔍 Found {len(machine_ids)} unique machine IDs.")

# Upsert manuals

for machine_id in machine_ids:

manual = generate_manual(machine_id)

cursor.execute(f"""

INSERT INTO {CRATEDB_SCHEMA}.machine_manuals (machine_id, manual)

VALUES (?, ?)

ON CONFLICT (machine_id) DO UPDATE SET manual = ?

""", (machine_id, manual, manual))

print("✅ Fictional machine manuals stored (or updated) in CrateDB.")

except Exception as e:

print(f"❌ Error working with CrateDB: {e}")

raise

finally:

if 'connection' in locals() and connection:

connection.close()

coderabbitai · 2025-04-16T11:58:22Z

topic/chatbots/table-augmented-generation/tag-motor-chat.py

+                    for machine_id in affected_ids:
+                        if manual_context_added:
+                            continue
+                            manual = fetch_machine_manual(machine_id)
+                            if manual:
+                                anomaly_text, emergency_text = extract_anomaly_info(manual)
+                                print(f"\n📘 Manual Guidance for Machine {machine_id}:")
+                                print(f"--- Anomaly Detection ---\n{anomaly_text}\n--- Emergency Protocol ---\n{emergency_text}")
+                            else:
+                                print(f"\n📘 No manual guidance found for Machine {machine_id}.")
+


⚠️ Potential issue

Remove or revise unreachable code block.

The if manual_context_added: continue statement makes lines 337-344 and 350-356 unreachable, disabling further manual guidance logic for newly found anomalies. This appears unintended.

Apply this diff to remove or correct the continue statement:

--- a/topic/chatbots/table-augmented-generation/tag-motor-chat.py +++ b/topic/chatbots/table-augmented-generation/tag-motor-chat.py @@ -334,2 +334,2 @@ if manual_context_added: - continue + pass # or remove this check if you always want to show manual guidance - manual = fetch_machine_manual(machine_id)

Also applies to: 346-356

coderabbitai · 2025-04-16T11:58:22Z

topic/chatbots/table-augmented-generation/tag-motor-chat.py

+        if "SELECT" in reply.upper():
+            try:
+                sql_query = reply.strip().split("```")[1].replace("sql", "").strip()
+                sql_query = clean_llm_sql(sql_query)
+                sql_query = inject_schema(sql_query, CRATEDB_SCHEMA)
+
+                if start_epoch and end_epoch:
+                    condition = f"timestamp >= {start_epoch} AND timestamp < {end_epoch}"
+                    if "WHERE" in sql_query.upper():
+                        sql_query = re.sub(r"(WHERE\s+)", f"WHERE {condition} AND ", sql_query, flags=re.IGNORECASE)
+                    else:
+                        sql_query += f" WHERE {condition}"
+
+                print(f"\n Running SQL:\n{sql_query}")
+                result = execute_sql(sql_query)
+
+                if isinstance(result, list):
+                    result = format_timestamps(result)
+                    print(f"\n Formatted Result:\n{result}")
+                else:
+                    print(f"\n Query Result:\n{result}")
+
+                # Add result to conversation
+                conversation.append({"role": "user", "content": f"Query Result: {result}"})
+
+                # Check for anomaly timestamps per machine
+                if (
+                    isinstance(result, list)
+                    and all(isinstance(row, list) and len(row) == 2 for row in result)
+                    and all(isinstance(row[0], int) and isinstance(row[1], str) for row in result)
+                ):
+                    print("\n📌 Summary: Most recent anomaly per machine")
+                    for row in result:
+                        print(f"- Machine {row[0]}: {row[1]}")
+
+                # Check for top vibration events
+                elif (
+                    isinstance(result, list)
+                    and len(result) > 0
+                    and isinstance(result[0], list)
+                    and len(result[0]) >= 3
+                    and "vibration" in sql_query.lower()
+                    and any(isinstance(val, float) and val > ANOMALY_THRESHOLD for row in result for val in row if isinstance(val, (float, int)))
+                ):
+                    print("\n📌 Summary: High vibration events")
+                    for row in result[:5]:
+                        try:
+                            machine_id = row[0]
+                            ts = row[1]
+                            vib = row[2]
+                            print(f"- Machine {machine_id}: {vib:.2f} at {ts}")
+                        except IndexError:
+                            print(f"- Result (incomplete): {row}")
+
+                    # Show manual if applicable
+                    affected_ids = list({row[0] for row in result if len(row) > 2 and isinstance(row[0], int) and row[2] > ANOMALY_THRESHOLD})
+                    for machine_id in affected_ids:
+                        if manual_context_added:
+                            continue
+                            manual = fetch_machine_manual(machine_id)
+                            if manual:
+                                anomaly_text, emergency_text = extract_anomaly_info(manual)
+                                print(f"\n📘 Manual Guidance for Machine {machine_id}:")
+                                print(f"--- Anomaly Detection ---\n{anomaly_text}\n--- Emergency Protocol ---\n{emergency_text}")
+                            else:
+                                print(f"\n📘 No manual guidance found for Machine {machine_id}.")
+
+                    # 📘 Add manual advice for affected machines
+                    affected_ids = list({row[0] for row in result if row[2] > ANOMALY_THRESHOLD})
+                    for machine_id in affected_ids:
+                        if manual_context_added:
+                            continue
+                            manual = fetch_machine_manual(machine_id)
+                            if manual:
+                                anomaly_text, emergency_text = extract_anomaly_info(manual)
+                                print(f"\n📘 Manual Guidance for Machine {machine_id}:")
+                                print(f"--- Anomaly Detection ---\n{anomaly_text}\n--- Emergency Protocol ---\n{emergency_text}")
+                            else:
+                                print(f"\n📘 No manual guidance found for Machine {machine_id}.")
+
+                # Default to LLM explanation
+                else:
+                    explanation = client_ai.chat.completions.create(
+                        model="gpt-3.5-turbo",
+                        messages=conversation
+                    )
+                    print(f"\n💡 Explanation:\n{explanation.choices[0].message.content}")
+


⚠️ Potential issue

Restrict queries to prevent write or destructive statements.

This block executes any SQL returned by the model if the text contains "SELECT", but there’s still a risk of dangerous commands appended in the same query. You may want to parse the SQL or permit only explicit SELECT operations.

coderabbitai · 2025-04-16T11:58:22Z

topic/chatbots/table-augmented-generation/tag-motor-chat.py

+# === Fetch Table Schema ===
+def fetch_table_schema(table_name):
+    connection = client.connect(
+        f"https://{CRATEDB_HOST}:{CRATEDB_PORT}",
+        username=CRATEDB_USER,
+        password=CRATEDB_PASSWORD
+    )
+    cursor = connection.cursor()
+    query = f"""
+    SELECT column_name, data_type
+    FROM information_schema.columns
+    WHERE table_schema = '{CRATEDB_SCHEMA}' AND table_name = 'motor_readings'
+    ORDER BY ordinal_position
+    """
+    cursor.execute(query)
+    columns = cursor.fetchall()
+    connection.close()
+    schema_text = f"Table: {table_name}\nColumns:\n"
+    for col_name, data_type in columns:
+        schema_text += f"- {col_name} ({data_type})\n"
+    return schema_text
+


🛠️ Refactor suggestion

Avoid duplicating data access utilities across modules.

Repeatedly defining fetch_table_schema, fetch_machine_manual, extract_anomaly_info, and execute_sql in both files can lead to inconsistent maintenance if one is updated differently. Consider centralizing these utilities in a shared module and reusing them.

Also applies to: 68-82, 85-91, 119-134

tag-chat

1d4b4bf

amotl approved these changes Apr 16, 2025

View reviewed changes

Moved to exsisting dir

d3b48fe

wierdvanderhaar merged commit 2c487c2 into main Apr 16, 2025

wierdvanderhaar deleted the chatbot-tag branch April 16, 2025 11:55

coderabbitai bot reviewed Apr 16, 2025

View reviewed changes

coderabbitai bot mentioned this pull request Jun 17, 2025

Chatbot assistant on AWS #989

Merged

Conversation

wierdvanderhaar commented Apr 16, 2025

Summary of the changes / Why this is an improvement

Checklist

Uh oh!

coderabbitai bot commented Apr 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Sequence Diagram(s)

Poem

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

Uh oh!

amotl left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

amotl Apr 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Footnotes

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 16, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 16, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 16, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 16, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 16, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 16, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 16, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 16, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

coderabbitai bot commented Apr 16, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)

amotl left a comment •

edited

Loading

amotl Apr 16, 2025 •

edited

Loading