I had a teachable moment, and when I exited the chat I hit an error inside the teachable agent.
NOTE:
pyautogen 0.1.14
I've yet to make the 0.2 switch due to system requirements.
Another note: the fix is at the bottom.
The lecture notes "Optimization for Machine Learning" by Elad Hazan discuss the concept of generalization in the context of machine learning optimization. Specifically, the notes address the relationship between regret minimization in online learning and generalization in statistical learning. This relationship is crucial for understanding how well a model trained on a given dataset can perform on unseen data.
Generalization refers to the ability of a machine learning model to perform well on new, previously unseen data, and it is a fundamental aspect of a model's performance. The notes likely cover how optimization techniques can influence the generalization capabilities of a model. For instance, the use of regularization is introduced as a technique to prevent overfitting to the training data, which in turn can improve the model's generalization to new data.
Regularization methods, such as L1 and L2 regularization, are commonly used to impose constraints on the model complexity, thereby encouraging simpler models that are less likely to overfit. The notes also mention adaptive regularization, including algorithms like AdaGrad, which adapt the learning rate during training to improve performance and potentially enhance generalization.
In summary, "Optimization for Machine Learning" by Elad Hazan discusses the importance of optimization techniques in achieving good generalization in machine learning models, with a focus on the role of regularization and the connection between online learning and statistical learning theory.
--------------------------------------------------------------------------------
Provide feedback to ResearcherAgent. Press enter to skip and use auto-reply, or type 'exit' to end the conversation: exit
REVIEWING CHAT FOR USER TEACHINGS TO REMEMBER
Traceback (most recent call last):
  File "/home/codeninja/autogen/.venv/lib/python3.10/site-packages/openai/openai_object.py", line 59, in __getattr__
    return self[k]
KeyError: 'lower'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/codeninja/autogen/src/pipelines/research.py", line 67, in <module>
    researcher.agent.learn_from_user_feedback()
  File "/home/codeninja/autogen/.venv/lib/python3.10/site-packages/autogen/agentchat/contrib/teachable_agent.py", line 143, in learn_from_user_feedback
    self.consider_memo_storage(comment)
  File "/home/codeninja/autogen/.venv/lib/python3.10/site-packages/autogen/agentchat/contrib/teachable_agent.py", line 159, in consider_memo_storage
    if "none" not in advice.lower():
  File "/home/codeninja/autogen/.venv/lib/python3.10/site-packages/openai/openai_object.py", line 61, in __getattr__
    raise AttributeError(*err.args)
AttributeError: lower
(.venv) 412) codeninja[~/autogen](research-v2)$
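For anyone trying to read the traceback above: the chained KeyError / AttributeError pair is what you get when `.lower()` is called on a dict-like object that routes attribute access to its keys. Below is a minimal sketch of that mechanism. `FakeResponseObject` is a hypothetical stand-in written for illustration, not the real class from the openai package, and `contains_none` only mimics the shape of the failing check.

```python
class FakeResponseObject(dict):
    """Hypothetical stand-in for a dict-like API response (not the real openai class)."""

    def __getattr__(self, name):
        # Unknown attributes are looked up as dictionary keys, so a missing
        # key surfaces as an AttributeError that carries only the key name.
        try:
            return self[name]
        except KeyError as err:
            raise AttributeError(*err.args)


def contains_none(advice):
    # Mimics the failing check in the traceback: assumes advice is a str.
    return "none" in advice.lower()


# A reply that is an object rather than plain text.
advice = FakeResponseObject(role="assistant", content=None)

try:
    contains_none(advice)
    error_message = ""
except AttributeError as err:
    error_message = str(err)

print(error_message)
```

The point of the sketch is only that the object reaching the `.lower()` call was not a string; it does not show why the analyzer returned such an object.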
Here is the code for the main agent class.
This agent uses the executor to search a site and download some data, then uses the analyzer with a custom prompt to generate structured notes on the content and remember what was said about the paper.
The goal is to create a research agent that holds a diverse set of knowledge about a subject, drawn from multiple sources.
# autogen agent responsible for creating structured notes about a subject or corpus of text.
# is able to read the text from a file or from a url. (fetch, fetch_file)
# is able to be provided the text directly by the user
# is able to iterate over the text in chunks, creating notes about each chunk which get compiled into a single note.
# final note is stored in the /learnings chroma database.
from autogen.agentchat.contrib.teachable_agent import TeachableAgent
from .researchers.research_executor import ResearcherExecutor
from .researchers import arxiv_executor
class ResearcherAgent:
    function_definitions = [
        {
            "name": "search",
            "description": "search arxiv for a topic",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "The query to search for.",
                    },
                    "max_results": {
                        "type": "integer",
                        "description": "The maximum number of results to return. Default 20, Max 300.",
                    },
                    "start_date": {
                        "type": "string",
                        "description": "The start date to search from: format YYYY-MM-DD.",
                    },
                    "end_date": {
                        "type": "string",
                        "description": "The end date to search to: format YYYY-MM-DD.",
                    },
                },
                "required": ["query"],
            },
        },
        {
            "name": "get",
            "description": "get downloadable content from a URL. This could be a PDF, or a video, or an audio file, or text transcript.",
            "parameters": {
                "type": "object",
                "properties": {
                    "url": {
                        "type": "string",
                        "description": "The url to download the content from.",
                    }
                },
                "required": ["url"],
            },
        },
    ]

    def __init__(
        self,
        name="ResearcherAgent",
    ):
        # use query_function_default to define the default query function and use it in the query_function_map if no query_function is provided.
        self.research_executor_map = {
            "arxiv": arxiv_executor.ArxivExecutor(),
            # "youtube": youtube_executor.YoutubeExecutor(),
        }
        self.function_map = {"search": self.search, "get": self.get}
        self.agent = TeachableAgent(
            name=name,
            llm_config={
                "model": "gpt-4-1106-preview",
                "temperature": 0,
                "max_tokens": 2048,
                "functions": self.function_definitions,
                "timeout": 600,
            },
            system_message="""You are an expert researcher and you interact with the User, Business, Product, and Development teams.
You have 3 main responsibilities:
* Search for information using executor agents and remember the information in the form of structured notes.
* Answer questions about the information you have stored in your structured notes.
* Analyze the information you have stored in your structured notes and make recommendations to the User, Business, Product, and Development teams.
When you need to perform a search for new information, use the search() method.
When you are asked about a topic, you first search for any internal memos, and then if nothing is found you may issue a search() request for the researchers to perform.
When you are asked for recent or trending content, always perform a search() request.
You can interact with the user to perform tasks related to research and data analysis. Please ask clarifying questions if you get confused.""",
            default_auto_reply="Researcher Agent: I am a researcher agent. I can interact with the research user proxy agent to perform tasks related to research and data analysis.",
            teach_config={
                "verbosity": 0,
                "reset_db": False,
                "path_to_db_dir": ".cache/research",
                "recall_threshold": 1.5,
                "timeout": 600,
            },
        )

    def get_agent(self):
        return self.agent

    def search(self, **kwargs):
        # results = self.research_executor.search(*args, **kwargs)
        # iterate over results and get() them, then analyze() them, then consider_memo_storage.
        # for each item in the research_executor_map call the search function and then get all results and then analyze them and then consider_memo_storage.
        print("searching from researcher agent", kwargs)
        results = []
        print("self.research_executor_map", self.research_executor_map)
        for name, executor in self.research_executor_map.items():
            search_results = executor.search(**kwargs)
            search_results = [
                executor.get(result["download"]) for result in search_results
            ]
            # print first 100 characters of the first result
            print("downloaded_results", search_results)
            analyzed_results = [self.analyze(result) for result in search_results]
            stored_results = [
                self.agent.consider_memo_storage(result) for result in analyzed_results
            ]
        return analyzed_results

    def get(self, url):
        pass

    def internal_memo_search(self, query):
        pass

    def analyze(self, data):
        instructions = """You are an expert in the field of research and software development.
You are to create structured notes on this content. You must capture source and identifying information.
You must remember all relevant data that would be necessary to know during software planning sessions for future features or product development.
This includes but is not limited to:
- the source of the information
- the date of the information
- the author of the information
- A short summary of the information
- Detailed notes about the information and how it relates to the work we are doing.
- Potential applications of the information
- Potential future research topics related to the information"""
        analysis = self.agent.analyze(data, instructions)
        print("analysis", analysis)
        return analysis
The arXiv executor, which does the searching and retrieval:
from .research_executor import ResearcherExecutor
import feedparser
import requests
import os
import re
import autogen.retrieve_utils as retrieve_utils

class ArxivExecutor(ResearcherExecutor):
    def __init__(self):
        super().__init__()

    def search(
        self,
        **kwargs,
    ):
        query = kwargs.get("query")
        max_results = kwargs.get("max_results", 20)
        start_date = kwargs.get("start_date")
        end_date = kwargs.get("end_date")
        if not query:
            print("No query was provided")
            return
        print("Searching arxiv for: ", query)
        base_url = "http://export.arxiv.org/api/query?"
        search_query = f"search_query=all:{query}"
        if start_date and end_date:
            search_query += f"+AND+submittedDate:[{start_date}+TO+{end_date}]"
        start = 0
        max_results = f"max_results={max_results}"
        url = f"{base_url}{search_query}&start={start}&{max_results}"
        response = requests.get(url)
        feed = feedparser.parse(response.content)
        print("feed", feed.entries[0])
        papers = [
            {
                "source": "arxiv",
                "title": entry.title,
                "link": entry.link,
                # search links to find the link with the title pdf from the links list
                # links ={'links': [{'href': 'http://arxiv.org/abs/1909.03550v1', 'rel': 'alternate', 'type': 'text/html'}, {'title': 'pdf', 'href': 'http://arxiv.org/pdf/1909.03550v1', 'rel': 'related', 'type': 'application/pdf'}]}
                "download": [
                    link["href"]
                    for link in entry.links
                    if link["type"] == "application/pdf"
                ][0],
                "summary": entry.summary,
                "date": entry.published,
                "category": entry.arxiv_primary_category["term"]
                if "arxiv_primary_category" in entry
                else entry.tags[0]["term"],
            }
            for entry in feed.entries
        ]
        print(papers)
        return papers

    def get(self, url: str) -> str:
        """
        Download a pdf from a url and save it in a topic categorized folder.
        :param url: The url to download the pdf from.
        :return: The path to the downloaded pdf.
        """
        # Sanitize the topic string to create a valid directory name
        filename = url.split("/")[-1]
        # Create the directory path for the topic
        storage_dir = os.path.join(".cache", "research", "arxiv")
        os.makedirs(storage_dir, exist_ok=True)
        # Sanitize the filename string to create a valid filename make sure to include the .pdf extension
        sanitized_filename = filename.strip().replace(" ", "_").replace("/", "")
        # Create the full path for the pdf
        pdf_path = os.path.join(storage_dir, sanitized_filename)
        # Download and save the pdf
        if os.path.exists(pdf_path):
            result = self.read_pdf(pdf_path)
            return result
        else:
            response = requests.get(url)
            with open(pdf_path, "wb") as f:
                f.write(response.content)
            result = self.read_pdf(pdf_path)
            return result

    def read_pdf(self, filename: str) -> str:
        # Read the PDF and generate structured notes
        extracted_text = "No file was found at path: " + filename
        if os.path.exists(filename):
            extracted_text = retrieve_utils.extract_text_from_pdf(filename)
        return extracted_text
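Unrelated to the error itself, but two spots in the executor above look fragile: the query string is assembled by hand (so a query containing spaces or special characters is not encoded), and taking `[0]` of the PDF link list raises IndexError when an entry has no PDF link. A minimal sketch of a more defensive version follows, using only the standard library. The helper names `build_arxiv_url` and `first_pdf_link` are hypothetical, and the date range is passed through in whatever format the caller supplies.

```python
from urllib.parse import urlencode

ARXIV_API_URL = "http://export.arxiv.org/api/query"


def build_arxiv_url(query, max_results=20, start=0, start_date=None, end_date=None):
    """Build an arXiv API URL, percent-encoding the query text."""
    search_query = f"all:{query}"
    if start_date and end_date:
        # Same filter as the executor above, left unencoded here so that
        # urlencode handles the spaces and brackets in one place.
        search_query += f" AND submittedDate:[{start_date} TO {end_date}]"
    params = {
        "search_query": search_query,
        "start": start,
        "max_results": max_results,
    }
    return f"{ARXIV_API_URL}?{urlencode(params)}"


def first_pdf_link(links):
    """Return the href of the first application/pdf link, or None if there is none."""
    for link in links:
        if link.get("type") == "application/pdf":
            return link.get("href")
    return None


example_links = [
    {"href": "http://arxiv.org/abs/1909.03550v1", "rel": "alternate", "type": "text/html"},
    {"title": "pdf", "href": "http://arxiv.org/pdf/1909.03550v1", "rel": "related", "type": "application/pdf"},
]
print(build_arxiv_url("machine learning", max_results=5))
print(first_pdf_link(example_links))
```

Returning None for a missing PDF link lets the caller skip that entry instead of failing the whole search.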
My pipeline executor
# Import the autogen library, necessary agent modules, and other required libraries
import autogen
from autogen.agentchat import GroupChat, GroupChatManager
from src.agents.research.arxiv.arxiv_agent import ArxivAgent
from src.agents.managers.vanilla_boss import VanillaBoss
from src.agents.research.research.researcher_agent import ResearcherAgent
from src.agents.research.research.researchers.arxiv_executor import ArxivExecutor

researcher = ResearcherAgent(
    name="ResearcherAgent",
)
researcherAgent = researcher.get_agent()

# Set up user proxy agents for interaction with the system
# (Assuming UserProxyAgent class exists and is imported correctly)
user_proxy_agent = userProxy = autogen.UserProxyAgent(
    name="User",
    human_input_mode="ALWAYS",
    code_execution_config={"work_dir": "arxiv"},
    function_map=researcher.function_map,
    default_auto_reply="Search for the first machine learning article",
)

if __name__ == "__main__":
    # Start chatting with the user proxy agent
    researcherAgent.initiate_chat(
        user_proxy_agent,
        message="What can we help you with today?",
    )
    researcher.agent.learn_from_user_feedback()
FIX
The error comes from the assumption that analyze() always returns a string. It can in fact return Any | str, which causes the problem when we call advice.lower(). This is because the implementation of analyze takes a different path depending on whether verbosity is set to 2 or higher:
def analyze(self, text_to_analyze, analysis_instructions):
    ...
    if self.verbosity >= 2:
        return str(self.last_message(self.analyzer)["content"])
    else:
        return self.analyzer.analyze_text(text_to_analyze, analysis_instructions)
In this implementation the returned content could be Any | str.
The solution is to cast the result returned from analyze to a string.
def analyze(self, text_to_analyze, analysis_instructions):
    """Asks TextAnalyzerAgent to analyze the given text according to specific instructions."""
    if self.verbosity >= 2:
        # Use the messaging mechanism so that the analyzer's messages are included in the printed chat.
        self.analyzer.reset()  # Clear the analyzer's list of messages.
        self.send(
            recipient=self.analyzer, message=text_to_analyze, request_reply=False
        )  # Put the message in the analyzer's list.
        self.send(recipient=self.analyzer, message=analysis_instructions, request_reply=True)  # Request the reply.
        return str(self.last_message(self.analyzer)["content"])
    else:
        # Use the analyzer's method directly, to leave analyzer message out of the printed chat.
        return str(self.analyzer.analyze_text(text_to_analyze, analysis_instructions))
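A plain str() cast stops the crash, but it also turns a non-text reply into its printed representation. As a possible alternative, here is a minimal sketch of normalizing the reply before the "none" check. The helpers `to_text` and `should_store` are hypothetical and not part of autogen, and the sketch assumes that a dict-like reply keeps its text under a "content" key that may be None.

```python
def to_text(value):
    """Return a plain string for a reply that may be a str, a dict-like message, or None."""
    if value is None:
        return ""
    if isinstance(value, str):
        return value
    if isinstance(value, dict):
        # Assumption: dict-like replies keep their text under "content",
        # which may be None when the reply carries no text at all.
        content = value.get("content")
        return content if isinstance(content, str) else ""
    # Fall back to the same cast used in the fix above.
    return str(value)


def should_store(advice):
    """Apply the "none" check from consider_memo_storage to normalized text."""
    text = to_text(advice).strip().lower()
    # Design choice for this sketch: an empty reply is treated as nothing to store.
    return bool(text) and "none" not in text


print(should_store("Remember that the user prefers short summaries."))
print(should_store({"content": None}))
```

Whether an empty reply should be skipped or stored is a judgment call; the sketch skips it so that a reply without text never ends up in the memo store.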