Restaurant DeepResearch

This project is open-source! Check out the code, contribute, or use it for your own projects.

# project_inspiration.py

def project_inspiration():
    """Project Inspiration"""
    
    # The story behind Restaurant DeepResearch

The idea for this project came from a painfully familiar situation: trying to decide where to eat. I'd open Google Maps with high hopes, only to be buried in a sea of restaurants that all kind of looked the same. The filters? Not helpful. My cravings? Vague at best. Every choice felt like a gamble. Sometimes I'd text a friend for help, and while the conversation would often start with indecision, it usually turned into something more helpful. We'd share tips, recommend places, and even discover new spots through our back-and-forth. In the end, we almost always found somewhere that worked. That's when it hit me—what if I had a smart assistant who actually understood what I wanted, could think through the options, and maybe even talk it out with me like a foodie friend who never gets tired?

That led to a core question: how can we build an AI system that mimics how humans decide where to eat—through discussion, reasoning, and using external tools like Google Maps? Most existing AI planning systems rely on rigid workflows, yet real-life decision-making is often spontaneous and conversational. This project explores how a multi-agent system can simulate this natural, collaborative process.

To enable this, I leverage an MCP (Model Context Protocol) server to unify various tool-calling APIs and use the CAMEL-AI framework to build agents capable of reasoning, communicating, and acting in a shared environment. The goal is to replicate the casual yet effective way people choose restaurants: search, compare, ask, and refine.

# project_inspiration_sources.py

def inspiration_sources():
    """Key Ideas That Inspired This Project"""

This project is inspired by several ideas:

MCP Servers' Growing Popularity: Nowadays, MCP servers have become very popular, significantly streamlining the process for developers creating tool-calling functions for large language models (LLMs). For example, I use the Google Maps MCP server in this project, combining seven useful tools—search_nearby, get_place_details, maps_geocode, maps_reverse_geocode, maps_distance_matrix, maps_directions, and maps_elevation—into a single JSON file, which is remarkably efficient.
Mimicking Real-life Decision-making: When searching for places to eat, I typically use Google Maps and then discuss options with my friends, eventually reaching a satisfying decision. So why not allow AI agents to replicate this same process?
Encouraging Open-ended Exploration: I don't believe we necessarily need a rigid workflow for this type of task. Our successful decisions often result from discussions and interactions with external tools like Google. Creating an environment where agents can autonomously explore ideas is crucial. That's why I chose to use CAMEL-AI's framework instead of alternatives like LangChain or AutoGen.

Based on these ideas, my approach was to break down the task into manageable components or, more accurately, recreate my daily decision-making environment for AI agents.

# implementation_approach.py

def implementation_approach():
    """Step-by-step Approach"""

Firstly, my project is triggered by user prompts. From a product design perspective, it's unrealistic to expect users to write prompts in markdown or clearly specify every need. Therefore, we require a robust reasoning model to thoroughly understand user intentions and needs.

Secondly, beyond understanding users, models also need to understand each other. Thus, we applied prompt engineering techniques to ensure the output from the first reasoning model is structured clearly in markdown format. This step significantly enhances the task environment, setting the stage effectively for subsequent role-playing sessions.

Thirdly, the project's most crucial component is the role-playing session itself. Since this is my first significant project, I chose a straightforward implementation—directly using CAMEL's pre-defined OwlRolePlaying class. This class involves two roles: a user and an assistant. To conceptualize this, imagine chatting with a friend who happens to be an interactive toolbox. When you share ideas, it executes relevant tools, provides feedback, and reviews its findings—truly an impressive capability. Creating such an environment is indeed essential. I'll demonstrate some examples below. For the assistant role, I highly recommend using a powerful tool-calling model.

I also recommend that anyone using this framework take time to carefully read through the conversations between agents. Occasionally, you'll encounter a genuine "Aha!" moment—where the agents demonstrate surprisingly intuitive reasoning or creative insight. These are the sparks that show why this kind of environment matters, and why reinforcement learning could play a key role in making it even better. I'll continue updating this point as the project evolves.

Finally, the sessions are constrained to 10 rounds, ensuring the agents maintain high-quality dialogue within a limited context. The assistant's final answer is the output you receive, clearly explaining why a specific restaurant was chosen while offering engaging, meaningful interactions.

# evaluation_process.py

def evaluation_methodology():
    """Evaluation Methodology"""
    
    # Note: Due to API quota limitations, I was unable to complete all tests.
    # Will update with full test results once the API limits are reset.

To evaluate the performance of different tools, I tested the same task across four platforms: Google DeepResearch, Perplexity DeepResearch, OpenAI DeepResearch, and Manus (yes—I recently got access). While I don't have a formal benchmark, this comparison offers a practical view of each tool's behavior under the same conditions.

To ensure fairness, I used nearly identical prompts across all tools. Since Manus includes a human-in-the-loop design, I first submitted the original prompt to Manus, then incorporated the additional information it requested into a revised version. This slightly modified version was then used as the standard prompt for the other tools, including my own system. Due to current access limitations, I was only able to use Gemini 2.5 Pro Experimental, which unfortunately hit its quota during testing. I'll update my own results as soon as I'm able to rerun the full test.

Here's the prompt we used for evaluation:

We are a family of three from China visiting Tokyo for the very first time, and we are planning our trip for early October to enjoy the mild autumn weather and seasonal Japanese specialties. Our family includes two adults and one enthusiastic 6-year-old child.

We are seeking an extraordinary dinner experience that embodies the authentic essence of Japanese cuisine and culture. Specifically, we are very interested in a dining concept that offers either a sushi omakase or kaiseki course menu—or even a blend of both—highlighting the season's freshest ingredients and the culinary artistry behind each dish. An interactive element, such as an open kitchen or chef's table setting, would be a delightful bonus, allowing us and our child to observe and appreciate the meticulous preparation of the meals.

In terms of ambiance, we envision a venue that is both elegant and family-friendly, combining traditional Japanese decor (think tatami seating, ambient lighting, and subtle cultural details) with a modern and comfortable atmosphere suitable for a relaxed evening. Additionally, to make our dining experience more accessible and enjoyable, we prefer restaurants that offer menus or service in English or Chinese.

Our budget is approximately ¥8,000 to ¥15,000 per person, and we are looking for a location in a culturally rich, historic area of Tokyo—such as Asakusa or a similarly charming neighborhood—that offers a seamless blend of traditional heritage and modern convenience. We also value the ease of making reservations, so a restaurant that accepts online bookings or where assistance with direct reservations is available would be ideal.

Overall, we aim for a dinner experience that not only delights our taste buds with authentic Japanese flavors and seasonal specialties but also immerses us in the cultural traditions and warm hospitality of Japan, making our first visit to Tokyo truly unforgettable.

# challenges_and_improvements.py

def challenges_encountered():
    """Challenges Encountered During Development"""

Initially, upon choosing CAMEL's framework, I instinctively wanted to design explicit workflows (decomposing large tasks into smaller subtasks, assigning individual agents, and finally synthesizing the results). Being stuck on this approach delayed the project by 2-3 days.
Through A/B testing, I assessed whether foundational models significantly impacted results. Starting with Gemini 2.0 Flash and then upgrading to Gemini 2.5 Pro Experimental, I observed considerable improvement in both the quality of results and the role-playing interactions. Interestingly, Gemini 1.5 Pro frequently attempted to invoke external tools despite clear prompt instructions, whereas Gemini 2.5 Pro resolved this issue entirely. Overall, Gemini 2.5 Pro proved extremely effective.
I attempted adding a critic role to the session, but due to possible misconfiguration, this resulted in infinite loops.
As a beginner in programming, uploading my project to GitHub was challenging. Despite using Docker to streamline the process, I encountered persistent issues, particularly with errors stating my Google Maps API key was invalid.

def areas_for_improvement():
    """Areas for Future Improvement"""

While Google Maps MCP server is robust, integrating additional tools could enhance results further.
During role-playing sessions, agents occasionally expressed uncertainty regarding user requirements, proceeding based on assumptions. Thus, incorporating human-in-the-loop interactions—where agents actively seek additional details from humans—could greatly enhance research depth. During tests comparing my project with other deep research tools such as Manus, OpenAI, Google, and Perplexity, we found that both Manus and OpenAI have automatic human-in-the-loop processes. They can proactively ask the user for missing information instead of relying on the user to modify their plan. This ability is clearly valuable and highlights the importance of such a process.
Although I haven't precisely measured the cost per search, based on observation, it appears expensive. Improving efficiency remains crucial.
Addressing points 2 and 3, developing a customized Roleplay class is essential, and this work is underway.

# conclusions.py

def final_thoughts():
    """Final Thoughts"""

AI Agents will significantly reshape the AI market landscape. However, we shouldn't impose human-centric thinking when designing AI systems. While LLMs exhibit human-like qualities, they follow fundamentally different rules. Thus, it's important to provide environments tailored specifically to AI agents, allowing them to reach their full potential. Rigid workflows may no longer be optimal; instead, we should leverage flexible environments to foster positive outcomes. My next project will explore this philosophy further.

Also, with Google's recent release of A2A, we are witnessing even greater potential for a sophisticated Multi-Agent web ecosystem. Let's keep pushing the boundaries.

if __name__ == "__main__":
    project_inspiration()
    inspiration_sources()
    implementation_approach()
    evaluation_methodology()
    challenges_encountered()
    areas_for_improvement()
    final_thoughts()
    
    # Thank you for reading!

Restaurant DeepResearch: Creating a Multi-Agent System Inspired by Real-Life Decision-Making via Using MCP Server