Conversation
|
Caution Review failedThe pull request is closed. WalkthroughThis update introduces a complete pipeline for a Timeseries QA Assistant focused on industrial motor monitoring and anomaly detection using CrateDB and OpenAI's GPT-3.5-turbo. It adds scripts for generating synthetic timeseries sensor data and fictional machine manuals, storing both in CrateDB. Two chatbot applications are included: one as a Streamlit web app and another as a command-line assistant, both capable of translating natural language queries into SQL, detecting anomalies, and providing manual-based guidance. Documentation and environment setup files are also provided to guide installation and usage. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant ChatbotApp
participant OpenAI
participant CrateDB
User->>ChatbotApp: Enter natural language query
ChatbotApp->>CrateDB: Fetch table schema & manuals (if needed)
ChatbotApp->>OpenAI: Send prompt (schema, manuals, user query)
OpenAI-->>ChatbotApp: Return SQL query and/or analysis
ChatbotApp->>CrateDB: Execute SQL (if present)
CrateDB-->>ChatbotApp: Return query results
ChatbotApp->>User: Present results, analysis, and manual guidance
Poem
Tip ⚡💬 Agentic Chat (Pro Plan, General Availability)
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (6)
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
There is already a directory about chatbots, written in always preferred singular 1 topic/chatbot. Can you slot in yours into the same folder?
-- https://github.com/crate/cratedb-examples/tree/main/topic/chatbot
Footnotes
There was a problem hiding this comment.
Actionable comments posted: 8
🔭 Outside diff range comments (1)
topic/chatbots/table-augmented-generation/app_chat.py (1)
162-174:⚠️ Potential issueValidate and constrain LLM-generated SQL code.
Currently, the code executes any SQL block found in the LLM’s response if the user query references SQL—including potential destructive statements. Consider restricting queries to read-only (SELECT) commands, or parsing and validating the SQL to prevent malicious or unintended write operations.
Example approach:
- Strictly enforce that the SQL block begins with "SELECT".
- Parse or inspect the SQL statement to confirm it doesn’t contain "DROP", "DELETE", or other critical commands.
🧹 Nitpick comments (8)
topic/chatbots/table-augmented-generation/requirements.txt (1)
1-6: Consider pinning specific versions for better reproducibility.While you've correctly specified a minimum version for OpenAI, it's a good practice to pin all dependencies to specific versions to ensure consistent behavior across different environments.
-python-dotenv -openai>=1.0.0 -crate -pandas -ipywidgets +python-dotenv==1.0.0 +openai==1.9.0 +crate==0.31.3 +pandas==2.0.3 +ipywidgets==8.1.1Also, the AI summary mentions a Streamlit web app (
app_chat.py), butstreamlitis missing from your requirements.+streamlit==1.29.0topic/chatbots/table-augmented-generation/DataGenerator.py (1)
89-97: Add option to clean existing data before insertion.Currently, the script always appends new data to the existing table. Consider adding a parameter to optionally clean the table before inserting new data, which would be useful for testing and development.
# ----- MAIN ----- if __name__ == "__main__": + import argparse + parser = argparse.ArgumentParser(description='Generate and store timeseries data for motors.') + parser.add_argument('--clean', action='store_true', help='Clean existing data before insertion') + args = parser.parse_args() + + if args.clean: + try: + connection = client.connect( + f"https://{CRATEDB_HOST}:{CRATEDB_PORT}", + username=CRATEDB_USER, + password=CRATEDB_PASSWORD + ) + cursor = connection.cursor() + cursor.execute(f"DELETE FROM {CRATEDB_SCHEMA}.motor_readings") + print(f"✅ Cleaned existing data from {CRATEDB_SCHEMA}.motor_readings") + connection.close() + except Exception as e: + print(f"❌ Error cleaning data: {e}") + df = generate_timeseries_data() print(df.head()) store_in_cratedb(df) print(f"Total generated readings: {len(df)}")topic/chatbots/table-augmented-generation/README.md (2)
49-50: Fix Markdown formatting issues.There are several Markdown formatting issues in the file:
- Hard tabs used in lists instead of spaces (lines 84-87, 131-134, 137-140)
- Inconsistent code block style (lines 137-140, 144-158)
- Multiple spaces after hash on heading on line 136
Here's an example fix for lines 84-87:
- • Operational limits - • Anomaly detection triggers - • Emergency protocols - • Maintenance schedules +- Operational limits +- Anomaly detection triggers +- Emergency protocols +- Maintenance schedulesApply similar fixes to the other bullet lists and ensure code blocks use consistent formatting with fenced style (```).
Also applies to: 84-87, 131-140
112-127: Consider providing more realistic/detailed example data output.The example output in the "Example output" section differs significantly from the actual data that would be generated by the
DataGenerator.pyscript (showing a much smaller number of readings than what would be generated). Consider updating this section to show more realistic output or clarifying that this is just a sample.-Data Overview: -- Total readings: 1000 -- Time range: 2025-04-07 11:29:35 to 2025-04-08 12:14:35 -- Machines: [9, 8, 7, 6, 5, 4, 3, 2, 1, 0] -- Vibration range: 0.76 to 1.11 -- Temperature range: 40.69°C to 50.27°C -- Rotations range: 1402 RPM to 1492 RPM -- Anomalies (vibration > 1.5): 0 +Data Overview: +- Total readings: 28800 (10 machines × 96 readings/day × 30 days) +- Time range: 2025-03-09 10:29:35 to 2025-04-08 10:14:35 +- Machines: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] +- Vibration range: ~0.5 to ~2.0 (with anomalies) +- Temperature range: ~40°C to ~70°C +- Rotations range: ~1400 RPM to ~1600 RPM +- Anomalies: varies based on random injectiontopic/chatbots/table-augmented-generation/Generate-Manuals.py (1)
107-109: Add command-line options for flexibility.Consider adding command-line arguments to provide more flexibility, such as the ability to regenerate only specific machine manuals or to force regeneration of all manuals.
# === Main Execution === if __name__ == "__main__": - store_manuals_in_cratedb() + import argparse + + parser = argparse.ArgumentParser(description='Generate and store fictional machine manuals.') + parser.add_argument('--force-all', action='store_true', + help='Force regeneration of manuals for all machines') + parser.add_argument('--machine-ids', type=int, nargs='+', + help='Generate manuals only for specific machine IDs') + args = parser.parse_args() + + if args.machine_ids: + print(f"Generating manuals for specific machines: {args.machine_ids}") + # Implementation for specific machines... + elif args.force_all: + print("Forcing regeneration of all manuals") + # Implementation for force all... + else: + store_manuals_in_cratedb()This would allow users to run commands like:
python Generate-Manuals.py --machine-ids 1 3 5to regenerate manuals for specific machinespython Generate-Manuals.py --force-allto force regeneration of all manualstopic/chatbots/table-augmented-generation/app_chat.py (2)
7-9: Remove unused imports.Lines 7 (
import datetime) and 9 (from dateutil import parser as date_parser) appear unused. Removing them clarifies dependencies and resolves static analysis alerts.Apply this diff to remove the unused imports:
-import datetime -from dateutil import parser as date_parser🧰 Tools
🪛 Ruff (0.8.2)
7-7:
datetimeimported but unusedRemove unused import:
datetime(F401)
9-9:
dateutil.parserimported but unusedRemove unused import:
dateutil.parser(F401)
28-48: Consider a shared connection strategy or connection pooling.Each function creates and closes a database connection, which is simple but may be less efficient in high-use scenarios. You might explore using a single shared connection or a connection pool for improved performance.
topic/chatbots/table-augmented-generation/tag-motor-chat.py (1)
210-371: Splitchat()into smaller functions for maintainability.The
chat()function spans many responsibilities—handling user input, extracting date filters, detecting manual context, injecting schema, executing queries, formatting results, etc. Extracting these into smaller functions would improve testability and readability.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
topic/chatbots/table-augmented-generation/DataGenerator.py(1 hunks)topic/chatbots/table-augmented-generation/Generate-Manuals.py(1 hunks)topic/chatbots/table-augmented-generation/README.md(1 hunks)topic/chatbots/table-augmented-generation/app_chat.py(1 hunks)topic/chatbots/table-augmented-generation/requirements.txt(1 hunks)topic/chatbots/table-augmented-generation/tag-motor-chat.py(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
topic/chatbots/table-augmented-generation/app_chat.py (1)
topic/chatbots/table-augmented-generation/tag-motor-chat.py (5)
fetch_table_schema(24-43)fetch_machine_manual(68-81)extract_anomaly_info(85-90)execute_sql(120-133)chat(210-367)
topic/chatbots/table-augmented-generation/tag-motor-chat.py (1)
topic/chatbots/table-augmented-generation/app_chat.py (4)
fetch_table_schema(28-47)fetch_machine_manual(50-61)extract_anomaly_info(64-69)execute_sql(84-99)
🪛 markdownlint-cli2 (0.17.2)
topic/chatbots/table-augmented-generation/README.md
84-84: Hard tabs
Column: 1
(MD010, no-hard-tabs)
84-84: Hard tabs
Column: 3
(MD010, no-hard-tabs)
85-85: Hard tabs
Column: 1
(MD010, no-hard-tabs)
85-85: Hard tabs
Column: 3
(MD010, no-hard-tabs)
86-86: Hard tabs
Column: 1
(MD010, no-hard-tabs)
86-86: Hard tabs
Column: 3
(MD010, no-hard-tabs)
87-87: Hard tabs
Column: 1
(MD010, no-hard-tabs)
87-87: Hard tabs
Column: 3
(MD010, no-hard-tabs)
131-131: Hard tabs
Column: 1
(MD010, no-hard-tabs)
131-131: Hard tabs
Column: 3
(MD010, no-hard-tabs)
132-132: Hard tabs
Column: 1
(MD010, no-hard-tabs)
132-132: Hard tabs
Column: 3
(MD010, no-hard-tabs)
133-133: Hard tabs
Column: 1
(MD010, no-hard-tabs)
133-133: Hard tabs
Column: 3
(MD010, no-hard-tabs)
134-134: Hard tabs
Column: 1
(MD010, no-hard-tabs)
134-134: Hard tabs
Column: 3
(MD010, no-hard-tabs)
136-136: Multiple spaces after hash on atx style heading
null
(MD019, no-multiple-space-atx)
137-137: Hard tabs
Column: 1
(MD010, no-hard-tabs)
137-137: Hard tabs
Column: 3
(MD010, no-hard-tabs)
137-137: Code block style
Expected: fenced; Actual: indented
(MD046, code-block-style)
138-138: Hard tabs
Column: 1
(MD010, no-hard-tabs)
138-138: Hard tabs
Column: 3
(MD010, no-hard-tabs)
139-139: Hard tabs
Column: 1
(MD010, no-hard-tabs)
139-139: Hard tabs
Column: 3
(MD010, no-hard-tabs)
140-140: Hard tabs
Column: 1
(MD010, no-hard-tabs)
140-140: Hard tabs
Column: 3
(MD010, no-hard-tabs)
144-144: Code block style
Expected: fenced; Actual: indented
(MD046, code-block-style)
🪛 Ruff (0.8.2)
topic/chatbots/table-augmented-generation/app_chat.py
7-7: datetime imported but unused
Remove unused import: datetime
(F401)
9-9: dateutil.parser imported but unused
Remove unused import: dateutil.parser
(F401)
| CRATEDB_HOST = os.getenv("CRATEDB_HOST") | ||
| CRATEDB_PORT = os.getenv("CRATEDB_PORT") | ||
| CRATEDB_USER = os.getenv("CRATEDB_USER") | ||
| CRATEDB_PASSWORD = os.getenv("CRATEDB_PASSWORD") | ||
| CRATEDB_SCHEMA = os.getenv("CRATEDB_SCHEMA") |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Add error handling for missing environment variables.
Your script loads environment variables but doesn't verify if they're actually set. This could lead to cryptic errors when trying to connect to CrateDB if the environment isn't properly configured.
CRATEDB_HOST = os.getenv("CRATEDB_HOST")
CRATEDB_PORT = os.getenv("CRATEDB_PORT")
CRATEDB_USER = os.getenv("CRATEDB_USER")
CRATEDB_PASSWORD = os.getenv("CRATEDB_PASSWORD")
CRATEDB_SCHEMA = os.getenv("CRATEDB_SCHEMA")
+
+# Validate required environment variables
+required_vars = ["CRATEDB_HOST", "CRATEDB_PORT"]
+missing_vars = [var for var in required_vars if not os.getenv(var)]
+if missing_vars:
+ raise EnvironmentError(f"Missing required environment variables: {', '.join(missing_vars)}")
+
+# Use default schema if not provided
+if not CRATEDB_SCHEMA:
+ CRATEDB_SCHEMA = "doc"
+ print(f"No schema specified, using default: {CRATEDB_SCHEMA}")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| CRATEDB_HOST = os.getenv("CRATEDB_HOST") | |
| CRATEDB_PORT = os.getenv("CRATEDB_PORT") | |
| CRATEDB_USER = os.getenv("CRATEDB_USER") | |
| CRATEDB_PASSWORD = os.getenv("CRATEDB_PASSWORD") | |
| CRATEDB_SCHEMA = os.getenv("CRATEDB_SCHEMA") | |
| CRATEDB_HOST = os.getenv("CRATEDB_HOST") | |
| CRATEDB_PORT = os.getenv("CRATEDB_PORT") | |
| CRATEDB_USER = os.getenv("CRATEDB_USER") | |
| CRATEDB_PASSWORD = os.getenv("CRATEDB_PASSWORD") | |
| CRATEDB_SCHEMA = os.getenv("CRATEDB_SCHEMA") | |
| # Validate required environment variables | |
| required_vars = ["CRATEDB_HOST", "CRATEDB_PORT"] | |
| missing_vars = [var for var in required_vars if not os.getenv(var)] | |
| if missing_vars: | |
| raise EnvironmentError(f"Missing required environment variables: {', '.join(missing_vars)}") | |
| # Use default schema if not provided | |
| if not CRATEDB_SCHEMA: | |
| CRATEDB_SCHEMA = "doc" | |
| print(f"No schema specified, using default: {CRATEDB_SCHEMA}") |
| # ----- STORE IN CRATEDB ----- | ||
| def store_in_cratedb(df): | ||
| connection = client.connect( | ||
| f"https://{CRATEDB_HOST}:{CRATEDB_PORT}", | ||
| username=CRATEDB_USER, | ||
| password=CRATEDB_PASSWORD | ||
| ) | ||
| cursor = connection.cursor() | ||
|
|
||
| # Create table if not exists | ||
| cursor.execute(f""" | ||
| CREATE TABLE IF NOT EXISTS {CRATEDB_SCHEMA}.motor_readings ( | ||
| machine_id INTEGER, | ||
| timestamp TIMESTAMP, | ||
| vibration DOUBLE, | ||
| temperature DOUBLE, | ||
| rotations DOUBLE | ||
| ) | ||
| """) | ||
|
|
||
| # Insert data | ||
| data = [(int(row.machine_id), row.timestamp.to_pydatetime(), float(row.vibration), float(row.temperature), float(row.rotations)) for _, row in df.iterrows()] | ||
| cursor.executemany(f""" | ||
| INSERT INTO {CRATEDB_SCHEMA}.motor_readings | ||
| (machine_id, timestamp, vibration, temperature, rotations) | ||
| VALUES (?, ?, ?, ?, ?) | ||
| """, data) | ||
| connection.close() | ||
| print(f"✅ Stored {len(df)} rows in CrateDB.") |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Add error handling for database operations.
The database connection and operations lack proper error handling. If the connection fails or an error occurs during data insertion, the script will crash without a clear message.
# ----- STORE IN CRATEDB -----
def store_in_cratedb(df):
- connection = client.connect(
- f"https://{CRATEDB_HOST}:{CRATEDB_PORT}",
- username=CRATEDB_USER,
- password=CRATEDB_PASSWORD
- )
- cursor = connection.cursor()
+ try:
+ connection = client.connect(
+ f"https://{CRATEDB_HOST}:{CRATEDB_PORT}",
+ username=CRATEDB_USER,
+ password=CRATEDB_PASSWORD
+ )
+ cursor = connection.cursor()
- # Create table if not exists
- cursor.execute(f"""
- CREATE TABLE IF NOT EXISTS {CRATEDB_SCHEMA}.motor_readings (
- machine_id INTEGER,
- timestamp TIMESTAMP,
- vibration DOUBLE,
- temperature DOUBLE,
- rotations DOUBLE
- )
- """)
+ # Create table if not exists
+ cursor.execute(f"""
+ CREATE TABLE IF NOT EXISTS {CRATEDB_SCHEMA}.motor_readings (
+ machine_id INTEGER,
+ timestamp TIMESTAMP,
+ vibration DOUBLE,
+ temperature DOUBLE,
+ rotations DOUBLE
+ )
+ """)
- # Insert data
- data = [(int(row.machine_id), row.timestamp.to_pydatetime(), float(row.vibration), float(row.temperature), float(row.rotations)) for _, row in df.iterrows()]
- cursor.executemany(f"""
- INSERT INTO {CRATEDB_SCHEMA}.motor_readings
- (machine_id, timestamp, vibration, temperature, rotations)
- VALUES (?, ?, ?, ?, ?)
- """, data)
- connection.close()
- print(f"✅ Stored {len(df)} rows in CrateDB.")
+ # Insert data - convert to list of tuples for better performance than iterrows
+ data = df.apply(
+ lambda row: (
+ int(row.machine_id),
+ row.timestamp.to_pydatetime(),
+ float(row.vibration),
+ float(row.temperature),
+ float(row.rotations)
+ ),
+ axis=1
+ ).tolist()
+
+ cursor.executemany(f"""
+ INSERT INTO {CRATEDB_SCHEMA}.motor_readings
+ (machine_id, timestamp, vibration, temperature, rotations)
+ VALUES (?, ?, ?, ?, ?)
+ """, data)
+ print(f"✅ Stored {len(df)} rows in CrateDB.")
+ except Exception as e:
+ print(f"❌ Error storing data in CrateDB: {e}")
+ raise
+ finally:
+ if 'connection' in locals() and connection:
+ connection.close()Additionally, iterrows() can be inefficient for large DataFrames. The suggested solution uses apply() which is generally more performant.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # ----- STORE IN CRATEDB ----- | |
| def store_in_cratedb(df): | |
| connection = client.connect( | |
| f"https://{CRATEDB_HOST}:{CRATEDB_PORT}", | |
| username=CRATEDB_USER, | |
| password=CRATEDB_PASSWORD | |
| ) | |
| cursor = connection.cursor() | |
| # Create table if not exists | |
| cursor.execute(f""" | |
| CREATE TABLE IF NOT EXISTS {CRATEDB_SCHEMA}.motor_readings ( | |
| machine_id INTEGER, | |
| timestamp TIMESTAMP, | |
| vibration DOUBLE, | |
| temperature DOUBLE, | |
| rotations DOUBLE | |
| ) | |
| """) | |
| # Insert data | |
| data = [(int(row.machine_id), row.timestamp.to_pydatetime(), float(row.vibration), float(row.temperature), float(row.rotations)) for _, row in df.iterrows()] | |
| cursor.executemany(f""" | |
| INSERT INTO {CRATEDB_SCHEMA}.motor_readings | |
| (machine_id, timestamp, vibration, temperature, rotations) | |
| VALUES (?, ?, ?, ?, ?) | |
| """, data) | |
| connection.close() | |
| print(f"✅ Stored {len(df)} rows in CrateDB.") | |
| # ----- STORE IN CRATEDB ----- | |
| def store_in_cratedb(df): | |
| try: | |
| connection = client.connect( | |
| f"https://{CRATEDB_HOST}:{CRATEDB_PORT}", | |
| username=CRATEDB_USER, | |
| password=CRATEDB_PASSWORD | |
| ) | |
| cursor = connection.cursor() | |
| # Create table if not exists | |
| cursor.execute(f""" | |
| CREATE TABLE IF NOT EXISTS {CRATEDB_SCHEMA}.motor_readings ( | |
| machine_id INTEGER, | |
| timestamp TIMESTAMP, | |
| vibration DOUBLE, | |
| temperature DOUBLE, | |
| rotations DOUBLE | |
| ) | |
| """) | |
| # Insert data - convert to list of tuples for better performance than iterrows | |
| data = df.apply( | |
| lambda row: ( | |
| int(row.machine_id), | |
| row.timestamp.to_pydatetime(), | |
| float(row.vibration), | |
| float(row.temperature), | |
| float(row.rotations) | |
| ), | |
| axis=1 | |
| ).tolist() | |
| cursor.executemany(f""" | |
| INSERT INTO {CRATEDB_SCHEMA}.motor_readings | |
| (machine_id, timestamp, vibration, temperature, rotations) | |
| VALUES (?, ?, ?, ?, ?) | |
| """, data) | |
| print(f"✅ Stored {len(df)} rows in CrateDB.") | |
| except Exception as e: | |
| print(f"❌ Error storing data in CrateDB: {e}") | |
| raise | |
| finally: | |
| if 'connection' in locals() and connection: | |
| connection.close() |
| | rag-motor-chat.py | | ||
| | - OpenAI Q&A | |
There was a problem hiding this comment.
Inconsistent component naming in architecture diagram.
The architecture diagram refers to a script named rag-motor-chat.py, but the documentation and example commands mention tag-motor-chat.py. This inconsistency could confuse users.
+--------------------------+
- | rag-motor-chat.py |
+ | tag-motor-chat.py |
| - OpenAI Q&A |
| - Manual-based Guidance |
+--------------------------+📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| | rag-motor-chat.py | | |
| | - OpenAI Q&A | | |
| +--------------------------+ | |
| | tag-motor-chat.py | | |
| | - OpenAI Q&A | | |
| | - Manual-based Guidance | | |
| +--------------------------+ |
| restart_policy: | ||
| condition: on-failure% | ||
| ``` | ||
|
|
||
| Run docker-compose: | ||
| ``` bash | ||
| docker-compose pull | ||
| docker-compose up -d | ||
| ``` |
There was a problem hiding this comment.
Fix syntax error in docker-compose restart policy.
There's a stray '%' character at the end of the 'condition: on-failure' line in your docker-compose example.
deploy:
replicas: 1
restart_policy:
- condition: on-failure%
+ condition: on-failure📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| restart_policy: | |
| condition: on-failure% | |
| ``` | |
| Run docker-compose: | |
| ``` bash | |
| docker-compose pull | |
| docker-compose up -d | |
| ``` | |
| deploy: | |
| replicas: 1 | |
| restart_policy: | |
| condition: on-failure |
| def store_manuals_in_cratedb(): | ||
| connection = client.connect( | ||
| f"https://{CRATEDB_HOST}:{CRATEDB_PORT}", | ||
| username=CRATEDB_USER, | ||
| password=CRATEDB_PASSWORD | ||
| ) | ||
| cursor = connection.cursor() | ||
|
|
||
| # Create table if not exists | ||
| cursor.execute(f""" | ||
| CREATE TABLE IF NOT EXISTS {CRATEDB_SCHEMA}.machine_manuals ( | ||
| machine_id INTEGER PRIMARY KEY, | ||
| manual TEXT | ||
| ) | ||
| """) | ||
|
|
||
| # Get unique machine IDs from motor_readings | ||
| cursor.execute(f""" | ||
| SELECT DISTINCT machine_id FROM {CRATEDB_SCHEMA}.motor_readings ORDER BY machine_id | ||
| """) | ||
| machine_ids = [row[0] for row in cursor.fetchall()] | ||
|
|
||
| print(f"🔍 Found {len(machine_ids)} unique machine IDs.") | ||
|
|
||
| # Upsert manuals | ||
| for machine_id in machine_ids: | ||
| manual = generate_manual(machine_id) | ||
| cursor.execute(f""" | ||
| INSERT INTO {CRATEDB_SCHEMA}.machine_manuals (machine_id, manual) | ||
| VALUES (?, ?) | ||
| ON CONFLICT (machine_id) DO UPDATE SET manual = ? | ||
| """, (machine_id, manual, manual)) | ||
|
|
||
| connection.close() | ||
| print("✅ Fictional machine manuals stored (or updated) in CrateDB.") |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Add error handling for database operations.
Similar to the DataGenerator.py script, this function lacks proper error handling for database operations. If the connection fails or an error occurs during queries or insertions, the script will crash without a clear message. Also, there's no verification that the motor_readings table exists before querying it.
def store_manuals_in_cratedb():
- connection = client.connect(
- f"https://{CRATEDB_HOST}:{CRATEDB_PORT}",
- username=CRATEDB_USER,
- password=CRATEDB_PASSWORD
- )
- cursor = connection.cursor()
+ try:
+ connection = client.connect(
+ f"https://{CRATEDB_HOST}:{CRATEDB_PORT}",
+ username=CRATEDB_USER,
+ password=CRATEDB_PASSWORD
+ )
+ cursor = connection.cursor()
- # Create table if not exists
- cursor.execute(f"""
- CREATE TABLE IF NOT EXISTS {CRATEDB_SCHEMA}.machine_manuals (
- machine_id INTEGER PRIMARY KEY,
- manual TEXT
- )
- """)
+ # Create table if not exists
+ cursor.execute(f"""
+ CREATE TABLE IF NOT EXISTS {CRATEDB_SCHEMA}.machine_manuals (
+ machine_id INTEGER PRIMARY KEY,
+ manual TEXT
+ )
+ """)
- # Get unique machine IDs from motor_readings
- cursor.execute(f"""
- SELECT DISTINCT machine_id FROM {CRATEDB_SCHEMA}.motor_readings ORDER BY machine_id
- """)
- machine_ids = [row[0] for row in cursor.fetchall()]
+ # Check if motor_readings table exists
+ cursor.execute(f"""
+ SELECT table_name FROM information_schema.tables
+ WHERE table_schema = ? AND table_name = 'motor_readings'
+ """, (CRATEDB_SCHEMA,))
+
+ if not cursor.fetchone():
+ print(f"❌ Table {CRATEDB_SCHEMA}.motor_readings does not exist. Please run DataGenerator.py first.")
+ return
+
+ # Get unique machine IDs from motor_readings
+ cursor.execute(f"""
+ SELECT DISTINCT machine_id FROM {CRATEDB_SCHEMA}.motor_readings ORDER BY machine_id
+ """)
+ machine_ids = [row[0] for row in cursor.fetchall()]
- print(f"🔍 Found {len(machine_ids)} unique machine IDs.")
+ if not machine_ids:
+ print("⚠️ No machine IDs found in motor_readings table.")
+ return
+
+ print(f"🔍 Found {len(machine_ids)} unique machine IDs.")
- # Upsert manuals
- for machine_id in machine_ids:
- manual = generate_manual(machine_id)
- cursor.execute(f"""
- INSERT INTO {CRATEDB_SCHEMA}.machine_manuals (machine_id, manual)
- VALUES (?, ?)
- ON CONFLICT (machine_id) DO UPDATE SET manual = ?
- """, (machine_id, manual, manual))
+ # Upsert manuals
+ for machine_id in machine_ids:
+ manual = generate_manual(machine_id)
+ cursor.execute(f"""
+ INSERT INTO {CRATEDB_SCHEMA}.machine_manuals (machine_id, manual)
+ VALUES (?, ?)
+ ON CONFLICT (machine_id) DO UPDATE SET manual = ?
+ """, (machine_id, manual, manual))
- connection.close()
- print("✅ Fictional machine manuals stored (or updated) in CrateDB.")
+ print("✅ Fictional machine manuals stored (or updated) in CrateDB.")
+ except Exception as e:
+ print(f"❌ Error working with CrateDB: {e}")
+ raise
+ finally:
+ if 'connection' in locals() and connection:
+ connection.close()📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def store_manuals_in_cratedb(): | |
| connection = client.connect( | |
| f"https://{CRATEDB_HOST}:{CRATEDB_PORT}", | |
| username=CRATEDB_USER, | |
| password=CRATEDB_PASSWORD | |
| ) | |
| cursor = connection.cursor() | |
| # Create table if not exists | |
| cursor.execute(f""" | |
| CREATE TABLE IF NOT EXISTS {CRATEDB_SCHEMA}.machine_manuals ( | |
| machine_id INTEGER PRIMARY KEY, | |
| manual TEXT | |
| ) | |
| """) | |
| # Get unique machine IDs from motor_readings | |
| cursor.execute(f""" | |
| SELECT DISTINCT machine_id FROM {CRATEDB_SCHEMA}.motor_readings ORDER BY machine_id | |
| """) | |
| machine_ids = [row[0] for row in cursor.fetchall()] | |
| print(f"🔍 Found {len(machine_ids)} unique machine IDs.") | |
| # Upsert manuals | |
| for machine_id in machine_ids: | |
| manual = generate_manual(machine_id) | |
| cursor.execute(f""" | |
| INSERT INTO {CRATEDB_SCHEMA}.machine_manuals (machine_id, manual) | |
| VALUES (?, ?) | |
| ON CONFLICT (machine_id) DO UPDATE SET manual = ? | |
| """, (machine_id, manual, manual)) | |
| connection.close() | |
| print("✅ Fictional machine manuals stored (or updated) in CrateDB.") | |
| def store_manuals_in_cratedb(): | |
| try: | |
| connection = client.connect( | |
| f"https://{CRATEDB_HOST}:{CRATEDB_PORT}", | |
| username=CRATEDB_USER, | |
| password=CRATEDB_PASSWORD | |
| ) | |
| cursor = connection.cursor() | |
| # Create table if not exists | |
| cursor.execute(f""" | |
| CREATE TABLE IF NOT EXISTS {CRATEDB_SCHEMA}.machine_manuals ( | |
| machine_id INTEGER PRIMARY KEY, | |
| manual TEXT | |
| ) | |
| """) | |
| # Check if motor_readings table exists | |
| cursor.execute(f""" | |
| SELECT table_name FROM information_schema.tables | |
| WHERE table_schema = ? AND table_name = 'motor_readings' | |
| """, (CRATEDB_SCHEMA,)) | |
| if not cursor.fetchone(): | |
| print(f"❌ Table {CRATEDB_SCHEMA}.motor_readings does not exist. Please run DataGenerator.py first.") | |
| return | |
| # Get unique machine IDs from motor_readings | |
| cursor.execute(f""" | |
| SELECT DISTINCT machine_id FROM {CRATEDB_SCHEMA}.motor_readings ORDER BY machine_id | |
| """) | |
| machine_ids = [row[0] for row in cursor.fetchall()] | |
| if not machine_ids: | |
| print("⚠️ No machine IDs found in motor_readings table.") | |
| return | |
| print(f"🔍 Found {len(machine_ids)} unique machine IDs.") | |
| # Upsert manuals | |
| for machine_id in machine_ids: | |
| manual = generate_manual(machine_id) | |
| cursor.execute(f""" | |
| INSERT INTO {CRATEDB_SCHEMA}.machine_manuals (machine_id, manual) | |
| VALUES (?, ?) | |
| ON CONFLICT (machine_id) DO UPDATE SET manual = ? | |
| """, (machine_id, manual, manual)) | |
| print("✅ Fictional machine manuals stored (or updated) in CrateDB.") | |
| except Exception as e: | |
| print(f"❌ Error working with CrateDB: {e}") | |
| raise | |
| finally: | |
| if 'connection' in locals() and connection: | |
| connection.close() |
| for machine_id in affected_ids: | ||
| if manual_context_added: | ||
| continue | ||
| manual = fetch_machine_manual(machine_id) | ||
| if manual: | ||
| anomaly_text, emergency_text = extract_anomaly_info(manual) | ||
| print(f"\n📘 Manual Guidance for Machine {machine_id}:") | ||
| print(f"--- Anomaly Detection ---\n{anomaly_text}\n--- Emergency Protocol ---\n{emergency_text}") | ||
| else: | ||
| print(f"\n📘 No manual guidance found for Machine {machine_id}.") | ||
|
|
There was a problem hiding this comment.
Remove or revise unreachable code block.
The if manual_context_added: continue statement makes lines 337-344 and 350-356 unreachable, disabling further manual guidance logic for newly found anomalies. This appears unintended.
Apply this diff to remove or correct the continue statement:
--- a/topic/chatbots/table-augmented-generation/tag-motor-chat.py
+++ b/topic/chatbots/table-augmented-generation/tag-motor-chat.py
@@ -334,2 +334,2 @@
if manual_context_added:
- continue
+ pass # or remove this check if you always want to show manual guidance
- manual = fetch_machine_manual(machine_id)Also applies to: 346-356
| if "SELECT" in reply.upper(): | ||
| try: | ||
| sql_query = reply.strip().split("```")[1].replace("sql", "").strip() | ||
| sql_query = clean_llm_sql(sql_query) | ||
| sql_query = inject_schema(sql_query, CRATEDB_SCHEMA) | ||
|
|
||
| if start_epoch and end_epoch: | ||
| condition = f"timestamp >= {start_epoch} AND timestamp < {end_epoch}" | ||
| if "WHERE" in sql_query.upper(): | ||
| sql_query = re.sub(r"(WHERE\s+)", f"WHERE {condition} AND ", sql_query, flags=re.IGNORECASE) | ||
| else: | ||
| sql_query += f" WHERE {condition}" | ||
|
|
||
| print(f"\n Running SQL:\n{sql_query}") | ||
| result = execute_sql(sql_query) | ||
|
|
||
| if isinstance(result, list): | ||
| result = format_timestamps(result) | ||
| print(f"\n Formatted Result:\n{result}") | ||
| else: | ||
| print(f"\n Query Result:\n{result}") | ||
|
|
||
| # Add result to conversation | ||
| conversation.append({"role": "user", "content": f"Query Result: {result}"}) | ||
|
|
||
| # Check for anomaly timestamps per machine | ||
| if ( | ||
| isinstance(result, list) | ||
| and all(isinstance(row, list) and len(row) == 2 for row in result) | ||
| and all(isinstance(row[0], int) and isinstance(row[1], str) for row in result) | ||
| ): | ||
| print("\n📌 Summary: Most recent anomaly per machine") | ||
| for row in result: | ||
| print(f"- Machine {row[0]}: {row[1]}") | ||
|
|
||
| # Check for top vibration events | ||
| elif ( | ||
| isinstance(result, list) | ||
| and len(result) > 0 | ||
| and isinstance(result[0], list) | ||
| and len(result[0]) >= 3 | ||
| and "vibration" in sql_query.lower() | ||
| and any(isinstance(val, float) and val > ANOMALY_THRESHOLD for row in result for val in row if isinstance(val, (float, int))) | ||
| ): | ||
| print("\n📌 Summary: High vibration events") | ||
| for row in result[:5]: | ||
| try: | ||
| machine_id = row[0] | ||
| ts = row[1] | ||
| vib = row[2] | ||
| print(f"- Machine {machine_id}: {vib:.2f} at {ts}") | ||
| except IndexError: | ||
| print(f"- Result (incomplete): {row}") | ||
|
|
||
| # Show manual if applicable | ||
| affected_ids = list({row[0] for row in result if len(row) > 2 and isinstance(row[0], int) and row[2] > ANOMALY_THRESHOLD}) | ||
| for machine_id in affected_ids: | ||
| if manual_context_added: | ||
| continue | ||
| manual = fetch_machine_manual(machine_id) | ||
| if manual: | ||
| anomaly_text, emergency_text = extract_anomaly_info(manual) | ||
| print(f"\n📘 Manual Guidance for Machine {machine_id}:") | ||
| print(f"--- Anomaly Detection ---\n{anomaly_text}\n--- Emergency Protocol ---\n{emergency_text}") | ||
| else: | ||
| print(f"\n📘 No manual guidance found for Machine {machine_id}.") | ||
|
|
||
| # 📘 Add manual advice for affected machines | ||
| affected_ids = list({row[0] for row in result if row[2] > ANOMALY_THRESHOLD}) | ||
| for machine_id in affected_ids: | ||
| if manual_context_added: | ||
| continue | ||
| manual = fetch_machine_manual(machine_id) | ||
| if manual: | ||
| anomaly_text, emergency_text = extract_anomaly_info(manual) | ||
| print(f"\n📘 Manual Guidance for Machine {machine_id}:") | ||
| print(f"--- Anomaly Detection ---\n{anomaly_text}\n--- Emergency Protocol ---\n{emergency_text}") | ||
| else: | ||
| print(f"\n📘 No manual guidance found for Machine {machine_id}.") | ||
|
|
||
| # Default to LLM explanation | ||
| else: | ||
| explanation = client_ai.chat.completions.create( | ||
| model="gpt-3.5-turbo", | ||
| messages=conversation | ||
| ) | ||
| print(f"\n💡 Explanation:\n{explanation.choices[0].message.content}") | ||
|
|
There was a problem hiding this comment.
Restrict queries to prevent write or destructive statements.
This block executes any SQL returned by the model if the text contains "SELECT", but there’s still a risk of dangerous commands appended in the same query. You may want to parse the SQL or permit only explicit SELECT operations.
| # === Fetch Table Schema === | ||
| def fetch_table_schema(table_name): | ||
| connection = client.connect( | ||
| f"https://{CRATEDB_HOST}:{CRATEDB_PORT}", | ||
| username=CRATEDB_USER, | ||
| password=CRATEDB_PASSWORD | ||
| ) | ||
| cursor = connection.cursor() | ||
| query = f""" | ||
| SELECT column_name, data_type | ||
| FROM information_schema.columns | ||
| WHERE table_schema = '{CRATEDB_SCHEMA}' AND table_name = 'motor_readings' | ||
| ORDER BY ordinal_position | ||
| """ | ||
| cursor.execute(query) | ||
| columns = cursor.fetchall() | ||
| connection.close() | ||
| schema_text = f"Table: {table_name}\nColumns:\n" | ||
| for col_name, data_type in columns: | ||
| schema_text += f"- {col_name} ({data_type})\n" | ||
| return schema_text | ||
|
|
There was a problem hiding this comment.
🛠️ Refactor suggestion
Avoid duplicating data access utilities across modules.
Repeatedly defining fetch_table_schema, fetch_machine_manual, extract_anomaly_info, and execute_sql in both files can lead to inconsistent maintenance if one is updated differently. Consider centralizing these utilities in a shared module and reusing them.
Also applies to: 68-82, 85-91, 119-134
Summary of the changes / Why this is an improvement
Simple demo on how to use TAG on Time-Series data
Checklist