20.05. RAG Chatbot Pt. 2 📚

📍 Download notebook and session files

In today’s lab, we will complete our RAG chatbot and use the data we preprocessed last time to inject our custom knowledge into the LLM.

Our plan for today:

Prerequisites

To start with the tutorial, complete the steps Prerequisites, Environment Setup, and Getting API Key from the LLM Inference Guide.

Today we have more packages, so we’ll use the requirements file to install the dependencies:

pip install -r requirements.txt

We will also reproduce the basic chatbot we implemented earlier as the base for the future RAG chatbot. The only difference is that we now need a simpler state that only keeps track of the message history (the other fields were there for demonstration purposes only).

from langchain_core.messages import SystemMessage, HumanMessage
from langchain_nvidia_ai_endpoints import ChatNVIDIA
from langchain_core.rate_limiters import InMemoryRateLimiter
# read system variables
import os
import dotenv

dotenv.load_dotenv()    # that loads the .env file variables into os.environ
True
# choose any model, catalogue is available under https://build.nvidia.com/models
MODEL_NAME = "meta/llama-3.3-70b-instruct"

# this rate limiter will ensure we do not exceed the rate limit
# of 40 RPM given by NVIDIA
rate_limiter = InMemoryRateLimiter(
    requests_per_second=30 / 60,  # 30 requests per minute to be sure
    check_every_n_seconds=0.1,  # wake up every 100 ms to check whether allowed to make a request,
    max_bucket_size=4,  # controls the maximum burst size
)

llm = ChatNVIDIA(
    model=MODEL_NAME,
    api_key=os.getenv("NVIDIA_API_KEY"), 
    temperature=0,   # ensure reproducibility
    rate_limiter=rate_limiter  # bind the rate limiter
)
from typing import Annotated, List
from typing_extensions import TypedDict
from langchain_core.messages import BaseMessage
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from langchain_core.runnables.graph import MermaidDrawMethod
import nest_asyncio
nest_asyncio.apply()  # this is needed to draw the PNG in Jupyter
class SimpleState(TypedDict):
    # `messages` is a list of messages of any kind. The `add_messages` function
    # in the annotation defines how this state key should be updated
    # (in this case, it appends messages to the list, rather than overwriting them)
    messages: Annotated[List[BaseMessage], add_messages]
class Chatbot:

    _graph_path = "./graph.png"
    
    def __init__(self, llm):
        self.llm = llm
        self._build()
        self._display_graph()

    def _build(self):
        # graph builder
        self._graph_builder = StateGraph(SimpleState)
        # add the nodes
        self._graph_builder.add_node("input", self._input_node)
        self._graph_builder.add_node("respond", self._respond_node)
        # define edges
        self._graph_builder.add_edge(START, "input")
        self._graph_builder.add_conditional_edges("input", self._is_quitting_node, {False: "respond", True: END})
        self._graph_builder.add_edge("respond", "input")
        # compile the graph
        self._compile()

    def _compile(self):
        self.chatbot = self._graph_builder.compile()

    def _input_node(self, state: SimpleState) -> dict:
        user_query = input("Your message: ")
        human_message = HumanMessage(content=user_query)
        # add the input to the messages
        return {
            "messages": human_message   # this will append the input to the messages
        }
    
    def _respond_node(self, state: SimpleState) -> dict:
        messages = state["messages"]    # will already contain the user query
        response = self.llm.invoke(messages)
        # add the response to the messages
        return {
            "messages": response   # this will append the response to the messages
        }
    
    def _is_quitting_node(self, state: SimpleState) -> bool:
        # check if the user wants to quit
        user_message = state["messages"][-1].content
        return user_message.lower() == "quit"
    
    def _display_graph(self):
        # unstable
        try:
            self.chatbot.get_graph().draw_mermaid_png(
                draw_method=MermaidDrawMethod.PYPPETEER,
                output_file_path=self._graph_path
            )
        except Exception:
            pass    # drawing may fail depending on the environment; the chatbot works without the image

    # add the run method
    def run(self):
        initial_state = {
            "messages": [
                SystemMessage(
                    content="You are a helpful and honest assistant." # role
                )
            ]
        }
        for event in self.chatbot.stream(initial_state, stream_mode="values"):   # alternatively: stream_mode="updates"
            for key, value in event.items():
                print(f"{key}:\t{value}")
            print("\n")

1. Recap: Data Preprocessing 📕

We will now go over data preprocessing again to recap the workflow and to recreate the data collection (we used an in-memory index, so the index we created last time was deleted once the notebook kernel was shut down).

Data preprocessing includes:

  1. Loading: load the source (document, website etc.) as a text.

  2. Chunking: split the loaded text into smaller pieces.

  3. Converting to embeddings: embed the chunks into dense vectors for further similarity search.

  4. Indexing: put the embeddings into a so-called index – a special database for efficient storage and search of vectors.

Loading

We will take a PDF version of the Topic Overview for this course. No LLM will know its contents, especially highly specific facts such as dates or key points.

One way to load a PDF is to use PyPDFLoader, which loads simple textual PDFs along with their metadata. In this tutorial, we focus on the simpler case in which the PDF contains no multimodal data. You can find out more about advanced loading in the How to load PDFs tutorial from LangChain.

from langchain_community.document_loaders import PyPDFLoader
file_path = "./topic_overview.pdf"
loader = PyPDFLoader(file_path)
pages = []
async for page in loader.alazy_load():
    pages.append(page)
Ignoring wrong pointing object 10 0 (offset 0)
Ignoring wrong pointing object 31 0 (offset 0)

The loader yields a list of Document objects, each containing the text of a page and its metadata such as title, page number, creation date etc.

pages
[Document(metadata={'producer': 'macOS Version 12.7.6 (Build 21H1320) Quartz PDFContext', 'creator': 'Safari', 'creationdate': "D:20250512152829Z00'00'", 'title': 'Topics Overview - LLM-based Assistants', 'moddate': "D:20250512152829Z00'00'", 'source': './topic_overview.pdf', 'total_pages': 12, 'page': 0, 'page_label': '1'}, page_content='12.05.25, 17:28Topics Overview - LLM-based Assistants\nPage 1 of 12https://maxschmaltz.github.io/Course-LLM-based-Assistants/infos/topic_overview.html\nTo p i c s  O v e r v i e wThe schedule is preliminary and subject to changes!\nThe reading for each lecture is given as references to the sources the respective lectures base on. Youare not obliged to read anything. However, you are strongly encouraged to read references marked bypin emojis \n: those are comprehensive overviews on the topics or important works that are beneficialfor a better understanding of the key concepts. For the pinned papers, I also specify the pages span foryou to focus on the most important fragments. Some of the sources are also marked with a popcornemoji \n: that is misc material you might want to take a look at: blog posts, GitHub repos, leaderboardsetc. (also a couple of LLM-based games). For each of the sources, I also leave my subjectiveestimation of how important this work is for this specific topic: from yellow \n ‘partially useful’ thoughorange \n ‘useful’ to red \n ‘crucial findings / thoughts’.  T h e s e  e s t i m a t i o n s  w i l l  b e  c o n t i n u o u s l yupdated as I revise the materials.\nFor the labs, you are provided with practical tutorials that respective lab tasks will mostly derive from.The core tutorials are marked with a writing emoji \n; you are asked to inspect them in advance(better yet: try them out). On lab sessions, we will only briefly recap them so it is up to you to preparein advance to keep up with the lab.\nDisclaimer: the reading entries are no proper citations; the bibtex references as well as detailed infosabout the authors, publish date etc. can be found under the entry links.\nBlock 1: IntroWeek 122.04. Lecture: LLMs as a Form of Intelligence vs LLMs as Statistical MachinesThat is an introductory lecture, in which I will briefly introduce the course and we’ll have a warming updiscussion about different perspectives on LLMs’ nature. We will focus on two prominent outlooks: LLMis a form of intelligence and LLM is a complex statistical machine. We’ll discuss differences of LLMswith human intelligence and the degree to which LLMs exhibit (self-)awareness.\nKey points:\nCourse introduction\nDifferent perspectives on the nature of LLMs\nSimilarities and differences between human and artificial intelligence\nLLMs’ (self-)awareness\nCore Reading:\n The Debate Over Understanding in AI’s Large Language Models (pages 1-7), Santa Fe\nInstitute \nMeaning without reference in large language models, UC Berkeley & DeepMind \nDissociating language and thought in large language models (intro [right after the abstract, seemore on the sectioning in this paper at the bottom of page 2], sections 1, 2.3 [LLMs are predictive…], 3-5), The University of Texas at Austin et al. \nAdditional Reading:\nLLM-basedAssistants\nINFOS AND STUFF\nBLOCK 1: INTRO\nBLOCK 2: CORE TOPICS | PART 1:BUSINESS APPLICATIONS\nBLOCK 2: CORE TOPICS | PART 2:APPLICATIONS IN SCIENCE\nBLOCK 3: WRAP-UP\nTopics Overview\nDebates\nPitches\nLLM Inference Guide\n22.04. LLMs as a Form ofIntelligence vs LLMs asStatistical Machines\n24.04. LLM & Agent Basics\n29.04. 
Intro to LangChain \n!\n"\n06.05. Virtual Assistants Pt. 1:Chatbots\n08.05. Basic LLM-basedChatbot \n#\nUnder development\nUnder development\nSearch'),
 Document(metadata={'producer': 'macOS Version 12.7.6 (Build 21H1320) Quartz PDFContext', 'creator': 'Safari', 'creationdate': "D:20250512152829Z00'00'", 'title': 'Topics Overview - LLM-based Assistants', 'moddate': "D:20250512152829Z00'00'", 'source': './topic_overview.pdf', 'total_pages': 12, 'page': 1, 'page_label': '2'}, page_content='12.05.25, 17:28Topics Overview - LLM-based Assistants\nPage 2 of 12https://maxschmaltz.github.io/Course-LLM-based-Assistants/infos/topic_overview.html\nDo Large Language Models Understand Us?, Google Research \nSparks of Artificial General Intelligence: Early experiments with GPT-4 (chapters 1-8 & 10),Microsoft Research \nOn the Dangers of Stochastic Parrots: Can Language Models Be Too Big? \n  (paragraphs 1, 5, 6.1),University of Washington et al. \nLarge Language Models: The Need for Nuance in Current Debates and a Pragmatic Perspective onUnderstanding, Leiden Institute of Advanced Computer Science & Leiden University Medical\nCentre \n24.04. Lecture: LLM & Agent BasicsIn this lecture, we’ll recap some basics about LLMs and LLM-based agents to make sure we’re on thesame page.\nKey points:\nLLM recap\nPrompting\nStructured output\nTool calling\nPiping & Planning\nCore Reading:\nA Survey of Large Language Models, (sections 1, 2.1, 4.1, 4.2.1, 4.2.3-4.2.4, 4.3, 5.1.1-5.1.3, 5.2.1-5.2.4, 5.3.1, 6) Renmin University of China et al. \nEmergent Abilities of Large Language Models, Google Research, Stanford, UNC Chapel Hill,\nDeepMind\n“We Need Structured Output”: Towards User-centered Constraints on Large Language ModelOutput, Google Research & Google\n Agent Instructs Large Language Models to be General Zero-Shot Reasoners (pages 1-9),Washington University & UC Berkeley\nAdditional Reading:\nLanguage Models are Few-Shot Learners, OpenAI\nChain-of-Thought Prompting Elicits Reasoning in Large Language Models, Google Research\nThe Llama 3 Herd of Models, Meta AI\nIntroducing Structured Outputs in the API, OpenAI\nTool Learning with Large Language Models: A Survey, Renmin University of China et al.\nToolACE: Winning the Points of LLM Function Calling, Huawei Noah’s Ark Lab et al.\nToolformer: Language Models Can Teach Themselves to Use Tools, Meta AI\nGranite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning ofGranular Tasks, IBM Research\n Berkeley Function-Calling Leaderboard, UC Berkeley (leaderboard)\nA Survey on Multimodal Large Language Models, University of Science and Technology of China\n& Tencent YouTu Lab\nWeek 229.04. Lab: Intro to LangChainThe final introductory session will guide you through the most basic concepts of LangChain for thefurther practical sessions.'),
 Document(metadata={'producer': 'macOS Version 12.7.6 (Build 21H1320) Quartz PDFContext', 'creator': 'Safari', 'creationdate': "D:20250512152829Z00'00'", 'title': 'Topics Overview - LLM-based Assistants', 'moddate': "D:20250512152829Z00'00'", 'source': './topic_overview.pdf', 'total_pages': 12, 'page': 2, 'page_label': '3'}, page_content='12.05.25, 17:28Topics Overview - LLM-based Assistants\nPage 3 of 12https://maxschmaltz.github.io/Course-LLM-based-Assistants/infos/topic_overview.html\nReading:\nRunnable interface, LangChain\nLangChain Expression Language (LCEL), LangChain\nMessages, LangChain\nChat models, LangChain\nStructured outputs, LangChain\nTools, LangChain\nTool calling, LangChain\n01.05.Ausfalltermin\nBlock 2: Core T opics\nPart 1: Business ApplicationsWeek 306.05. Lecture: Virtual Assistants Pt. 1: ChatbotsThe first core topic concerns chatbots. We’ll discuss how chatbots are built, how they (should) handleharmful requests and you can tune it for your use case.\nKey points:\nLLMs alignment\nMemory\nPrompting & automated prompt generation\nEvaluation\nCore Reading:\n Aligning Large Language Models with Human: A Survey (pages 1-14), Huawei Noah’s Ark Lab\nSelf-Instruct: Aligning Language Models with Self-Generated Instructions, University of\nWashington et al.\nA Systematic Survey of Prompt Engineering in Large Language Models: Techniques andApplications, Indian Institute of Technology Patna, Stanford & Amazon AI\nAdditional Reading:\nTraining language models to follow instructions with human feedback, OpenAI\nTraining a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback,Anthropic\nA Survey on the Memory Mechanism of Large Language Model based Agents, Renmin University\nof China & Huawei Noah’s Ark Lab\nAugmenting Language Models with Long-Term Memory, UC Santa Barbara & Microsoft Research\nFrom LLM to Conversational Agent: A Memory Enhanced Architecture with Fine-Tuning of LargeLanguage Models, Beike Inc.\nAutomatic Prompt Selection for Large Language Models, Cinnamon AI, Hung Yen University of\nTechnology and Education & Deakin University\nPromptGen: Automatically Generate Prompts using Generative Models, Baidu Research\nEvaluating Large Language Models. A Comprehensive Survey, Tianjin University'),
 Document(metadata={'producer': 'macOS Version 12.7.6 (Build 21H1320) Quartz PDFContext', 'creator': 'Safari', 'creationdate': "D:20250512152829Z00'00'", 'title': 'Topics Overview - LLM-based Assistants', 'moddate': "D:20250512152829Z00'00'", 'source': './topic_overview.pdf', 'total_pages': 12, 'page': 3, 'page_label': '4'}, page_content='12.05.25, 17:28Topics Overview - LLM-based Assistants\nPage 4 of 12https://maxschmaltz.github.io/Course-LLM-based-Assistants/infos/topic_overview.html\n08.05. Lab: Basic LLM-based Chatbot\nOn material of session 06.05\nIn this lab, we’ll build a chatbot and try different prompts and settings to see how it affects the output.\nReading:\n Build a Chatbot, LangChain\n LangGraph Quickstart: Build a Basic Chatbot (parts 1, 3), LangGraph\n How to add summary of the conversation history, LangGraph\nPrompt Templates, LangChain\nFew-shot prompting, LangChain\nWeek 413.05. Lecture: Virtual Assistants Pt. 2: RAGContinuing the first part, the second part will expand scope of chatbot functionality and will teach it torefer to custom knowledge base to retrieve and use user-specific information. Finally, the most widelyused deployment methods will be briefly introduced.\nKey points:\nGeneral knowledge vs context\nKnowledge indexing, retrieval & ranking\nRetrieval tools\nAgentic RAG\nCore Reading:\n Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and HybridApproach (pages 1-7), Google DeepMind & University of Michigan \nA Survey on Retrieval-Augmented Text Generation for Large Language Models (sections 1-7), York\nUniversity \nAdditional Reading:\nDon’t Do RAG: When Cache-Augmented Generation is All You Need for Knowledge Tasks, National\nChengchi University & Academia Sinica \nSelf-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection, University of\nWashington, Allen Institute for AI & IBM Research AI\nAdaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through QuestionComplexity, Korea Advanced Institute of Science and Technology\nAuto-RAG: Autonomous Retrieval-Augmented Generation for Large Language Models, Chinese\nAcademy of Sciences\nQuerying Databases with Function Calling, Weaviate, Contextual AI & Morningstar\n15.05. Lab: RAG Chatbot\nOn material of session 13.05\nIn this lab, we’ll expand the functionality of the chatbot built at the last lab to connect it to user-specificinformation.'),
 Document(metadata={'producer': 'macOS Version 12.7.6 (Build 21H1320) Quartz PDFContext', 'creator': 'Safari', 'creationdate': "D:20250512152829Z00'00'", 'title': 'Topics Overview - LLM-based Assistants', 'moddate': "D:20250512152829Z00'00'", 'source': './topic_overview.pdf', 'total_pages': 12, 'page': 4, 'page_label': '5'}, page_content='12.05.25, 17:28Topics Overview - LLM-based Assistants\nPage 5 of 12https://maxschmaltz.github.io/Course-LLM-based-Assistants/infos/topic_overview.html\nReading:\nHow to load PDFs, LangChain\nText splitters, LangChain\nEmbedding models, LangChain\nVector stores, LangChain\nRetrievers, LangChain\n Retrieval augmented generation (RAG), LangChain\n LangGraph Quickstart: Build a Basic Chatbot (part 2), LangGraph\n Agentic RAG, LangGraph\nAdaptive RAG, LangGraph\nMultimodality, LangChain\nWeek 520.05. Lecture: Virtual Assistants Pt. 3: Multi-agent EnvironmentThis lectures concludes the Virtual Assistants cycle and directs its attention to automating everyday /business operations in a multi-agent environment. We’ll look at how agents communicate with eachother, how their communication can be guided (both with and without involvement of a human), andthis all is used in real applications.\nKey points:\nMulti-agent environment\nHuman in the loop\nLLMs as evaluators\nExamples of pipelines for business operations\nCore Reading:\n LLM-based Multi-Agent Systems: Techniques and Business Perspectives (pages 1-8), Shanghai\nJiao Tong University & OPPO Research Institute\nGenerative Agents: Interactive Simulacra of Human Behavior, Stanford, Google Research &\nDeepMind\nAdditional Reading:\nImproving Factuality and Reasoning in Language Models through Multiagent Debate, MIT & Google\nBrain\nExploring Collaboration Mechanisms for LLM Agents: A Social Psychology View, Zhejiang\nUniversity, National University of Singapore & DeepMind\nAutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation, Microsoft Research\net al.\n How real-world businesses are transforming with AI — with more than 140 new stories,Microsoft (blog post)\n Built with LangGraph, LangGraph (website page)\nPlan-Then-Execute: An Empirical Study of User Trust and Team Performance When Using LLMAgents As A Daily Assistant, Delft University of Technology & The University of Queensland\n22.05. Lab: Multi-agent Environment\nOn material of session 20.05'),
 Document(metadata={'producer': 'macOS Version 12.7.6 (Build 21H1320) Quartz PDFContext', 'creator': 'Safari', 'creationdate': "D:20250512152829Z00'00'", 'title': 'Topics Overview - LLM-based Assistants', 'moddate': "D:20250512152829Z00'00'", 'source': './topic_overview.pdf', 'total_pages': 12, 'page': 5, 'page_label': '6'}, page_content='12.05.25, 17:28Topics Overview - LLM-based Assistants\nPage 6 of 12https://maxschmaltz.github.io/Course-LLM-based-Assistants/infos/topic_overview.html\nThis lab will introduce a short walkthrough to creation of a multi-agent environment for automatedmeeting scheduling and preparation. We will see how the coordinator agent will communicate with twoauxiliary agents to check time availability and prepare an agenda for the meeting.\nReading:\n Multi-agent network, LangGraph\n Human-in-the-loop, LangGraph\nPlan-and-Execute, LangGraph\nReflection, LangGraph\n Multi-agent supervisor, LangGraph\nQuick Start, AutoGen\nWeek 627.05. Lecture: Software Development Pt. 1: Code Generation, Evaluation &TestingThis lectures opens a new lecture mini-cycle dedicated to software development. The first lectureoverviews how LLMs are used to generate reliable code and how generated code is tested andimproved to deal with the errors.\nKey points:\nCode generation & refining\nAutomated testing\nGenerated code evaluation\nCore Reading:\nLarge Language Model-Based Agents for Software Engineering: A Survey, Fudan University,\nNanyang Technological University & University of Illinois at Urbana-Champaign\n CodeRL: Mastering Code Generation through Pretrained Models and Deep ReinforcementLearning (pages 1-20), Salesforce Research\nThe ART of LLM Refinement: Ask, Refine, and Trust, ETH Zurich & Meta AI\nAdditional Reading:\nPlanning with Large Language Models for Code Generation, MIT-IBM Watson AI Lab et al.\nCode Repair with LLMs gives an Exploration-Exploitation Tradeoff, Cornell, Shanghai Jiao Tong\nUniversity & University of Toronto\nChatUniTest: A Framework for LLM-Based Test Generation, Zhejiang University & Hangzhou City\nUniversity\nTestART: Improving LLM-based Unit Testing via Co-evolution of Automated Generation and RepairIteration, Nanjing University & Huawei Cloud Computing Technologies\nEvaluating Large Language Models Trained on Code, `OpenAI\n Code Generation on HumanEval, OpenAI (leaderboard)\nCodeJudge: Evaluating Code Generation with Large Language Models, Huazhong University of\nScience and Technology & Purdue University\n29.05.Ausfalltermin'),
 Document(metadata={'producer': 'macOS Version 12.7.6 (Build 21H1320) Quartz PDFContext', 'creator': 'Safari', 'creationdate': "D:20250512152829Z00'00'", 'title': 'Topics Overview - LLM-based Assistants', 'moddate': "D:20250512152829Z00'00'", 'source': './topic_overview.pdf', 'total_pages': 12, 'page': 6, 'page_label': '7'}, page_content='12.05.25, 17:28Topics Overview - LLM-based Assistants\nPage 7 of 12https://maxschmaltz.github.io/Course-LLM-based-Assistants/infos/topic_overview.html\nWeek 703.06. Lecture: Software Development Pt. 2: Copilots, LLM-powered WebsitesThe second and the last lecture of the software development cycle focuses on practical application ofLLM code generation, in particular, on widely-used copilots (real-time code generation assistants) andLLM-supported web development.\nKey points:\nCopilots & real-time hints\nLLM-powered websites\nLLM-supported deployment\nFurther considerations: reliability, sustainability etc.\nCore Reading:\n LLMs in Web Development: Evaluating LLM-Generated PHP Code Unveiling Vulnerabilities andLimitations (pages 1-11), University of Oslo\nA Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis,Google DeepMind & The University of Tokyo\nCan ChatGPT replace StackOverflow? A Study on Robustness and Reliability of Large LanguageModel Code Generation, UC San Diego\nAdditional Reading:\nDesign and evaluation of AI copilots – case studies of retail copilot templates, Microsoft\n Your AI Companion, Microsoft (blog post)\nGitHub Copilot, GitHub (product page)\n Research: quantifying GitHub Copilot’s impact on developer productivity and happiness, GitHub\n(blog post)\n Cursor: The AI Code Editor, Cursor (product page)\nAutomated Unit Test Improvement using Large Language Models at Meta, Meta\nHuman-In-the-Loop Software Development Agents, Monash University, The University of\nMelbourne & Atlassian\nAn LLM-based Agent for Reliable Docker Environment Configuration, Harbin Institute of\nTechnology & ByteDance\nLearn to Code Sustainably: An Empirical Study on LLM-based Green Code Generation, TWT GmbH\nScience & Innovation et al.\nEnhancing Large Language Models for Secure Code Generation: A Dataset-driven Study onVulnerability Mitigation, South China University of Technology & University of Innsbruck\n05.06 Lab: LLM-powered Website\nOn material of session 03.06\nIn this lab, we’ll have the LLM make a website for us: it will both generate the contents of the websiteand generate all the code required for rendering, styling and navigation.\nReading:\nsee session 22.05\n HTML: Creating the content, MDN\n Getting started with CSS, MDN'),
 Document(metadata={'producer': 'macOS Version 12.7.6 (Build 21H1320) Quartz PDFContext', 'creator': 'Safari', 'creationdate': "D:20250512152829Z00'00'", 'title': 'Topics Overview - LLM-based Assistants', 'moddate': "D:20250512152829Z00'00'", 'source': './topic_overview.pdf', 'total_pages': 12, 'page': 7, 'page_label': '8'}, page_content='12.05.25, 17:28Topics Overview - LLM-based Assistants\nPage 8 of 12https://maxschmaltz.github.io/Course-LLM-based-Assistants/infos/topic_overview.html\nWeek 8: Having Some Rest10.06.Ausfalltermin\n12.06.Ausfalltermin\nWeek 917.06. Pitch: RAG Chatbot\nOn material of session 06.05 and session 13.05\nThe first pitch will be dedicated to a custom RAG chatbot that the contractors (the presentingstudents, see the infos about Pitches) will have prepared to present. The RAG chatbot will have to beable to retrieve specific information from the given documents (not from the general knowledge!) anduse it in its responses. Specific requirements will be released on 22.05.\nReading: see session 06.05, session 08.05, session 13.05, and session 15.05\n19.06.Ausfalltermin\nWeek 1024.06. Pitch: Handling Customer Requests in a Multi-agent Environment\nOn material of session 20.05\nIn the second pitch, the contractors will present their solution to automated handling of customerrequests. The solution will have to introduce a multi-agent environment to take off working load froman imagined support team. The solution will have to read and categorize tickets, generate replies and(in case of need) notify the human that their interference is required. Specific requirements will bereleased on 27.05.\nReading: see session 20.05 and session 22.05\n26.06. Lecture: Other Business Applications: Game Design, Financial Analysisetc.This lecture will serve a small break and will briefly go over other business scenarios that the LLMs areused in.\nKey points:\nGame design & narrative games\nFinancial applications\nContent creation\nAdditional Reading:'),
 Document(metadata={'producer': 'macOS Version 12.7.6 (Build 21H1320) Quartz PDFContext', 'creator': 'Safari', 'creationdate': "D:20250512152829Z00'00'", 'title': 'Topics Overview - LLM-based Assistants', 'moddate': "D:20250512152829Z00'00'", 'source': './topic_overview.pdf', 'total_pages': 12, 'page': 8, 'page_label': '9'}, page_content='12.05.25, 17:28Topics Overview - LLM-based Assistants\nPage 9 of 12https://maxschmaltz.github.io/Course-LLM-based-Assistants/infos/topic_overview.html\nPlayer-Driven Emergence in LLM-Driven Game Narrative, Microsoft Research\nGenerating Converging Narratives for Games with Large Language Models, U.S. Army Research\nLaboratory\nGame Agent Driven by Free-Form Text Command: Using LLM-based Code Generation and BehaviorBranch, University of Tokyo\n AI Dungeon Games, AI Dungeon (game catalogue)\n AI Town, Andreessen Horowitz & Convex (game)\nIntroducing NPC-Playground, a 3D playground to interact with LLM-powered NPCs, HuggingFace\n(blog post)\nBlip, bliporg (GitHub repo)\ngigax, GigaxGames (GitHub repo)\nLarge Language Models in Finance: A Survey, Columbia & New York University\nFinLlama: Financial Sentiment Classification for Algorithmic Trading Applications, Imperial College\nLondon & MIT\nEquipping Language Models with Tool Use Capability for Tabular Data Analysis in Finance, Monash\nUniversity\nLLM4EDA: Emerging Progress in Large Language Models for Electronic Design Automation,Shanghai Jiao Tong University et al.\nAssisting in Writing Wikipedia-like Articles From Scratch with Large Language Models, Stanford\nLarge Language Models Can Solve Real-World Planning Rigorously with Formal Verification Tools,MIT, Harvard University & MIT-IBM Watson AI Lab\nPart 2: Applications in ScienceWeek 1101.07. Lecture: LLMs in Research: Experiment Planning & HypothesisGenerationThe first lecture dedicated to scientific applications shows how LLMs are used to plan experiments andgenerate hypothesis to accelerate research.\nKey points:\nExperiment planning\nHypothesis generation\nPredicting possible results\nCore Reading:\n Hypothesis Generation with Large Language Models (pages 1-9), University of Chicago &\nToyota Technological Institute at Chicago\n LLMs for Science: Usage for Code Generation and Data Analysis (pages 1-6), TUM\nEmergent autonomous scientific research capabilities of large language models, Carnegie Mellon\nUniversity\nAdditional Reading:'),
 Document(metadata={'producer': 'macOS Version 12.7.6 (Build 21H1320) Quartz PDFContext', 'creator': 'Safari', 'creationdate': "D:20250512152829Z00'00'", 'title': 'Topics Overview - LLM-based Assistants', 'moddate': "D:20250512152829Z00'00'", 'source': './topic_overview.pdf', 'total_pages': 12, 'page': 9, 'page_label': '10'}, page_content='12.05.25, 17:28Topics Overview - LLM-based Assistants\nPage 10 of 12https://maxschmaltz.github.io/Course-LLM-based-Assistants/infos/topic_overview.html\nImproving Scientific Hypothesis Generation with Knowledge Grounded Large Language Models,University of Virginia\nPaper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance,University of Illinois at Urbana-Champaign, Carnegie Mellon University & Carleton College\nSciLitLLM: How to Adapt LLMs for Scientific Literature Understanding, University of Science and\nTechnology of China & DP Technology\nMapping the Increasing Use of LLMs in Scientific Papers, Stanford\n03.07: Lab: Experiment Planning & Hypothesis Generation\nOn material of session 01.07\nIn this lab, we’ll practice in facilitating researcher’s work with LLMs on the example of a toy scientificresearch.\nReading: see session 22.05\nWeek 1208.07: Pitch: Agent for Code Generation\nOn material of session 27.05\nThis pitch will revolve around the contractors’ implementation of a self-improving code generator. Thecode generator will have to generate both scripts and test cases for a problem given in the inputprompt, run the tests and refine the code if needed. Specific requirements will be released on 17.06.\nReading: see session 27.05 and session 05.06\n10.07. Lecture: Other Applications in Science: Drug Discovery, Math etc. &Scientific ReliabilityThe final core topic will mention other scientific applications of LLMs that were not covered in theprevious lectures and address the question of reliability of the results obtained with LLMs.\nKey points:\nDrug discovery, math & other applications\nScientific confidence & reliability\nCore Reading:\n Can LLMs replace Neil deGrasse Tyson? Evaluating the Reliability of LLMs as ScienceCommunicators (pages 1-9), Indian Institute of Technology\nAdditional Reading:'),
 Document(metadata={'producer': 'macOS Version 12.7.6 (Build 21H1320) Quartz PDFContext', 'creator': 'Safari', 'creationdate': "D:20250512152829Z00'00'", 'title': 'Topics Overview - LLM-based Assistants', 'moddate': "D:20250512152829Z00'00'", 'source': './topic_overview.pdf', 'total_pages': 12, 'page': 10, 'page_label': '11'}, page_content='12.05.25, 17:28Topics Overview - LLM-based Assistants\nPage 11 of 12https://maxschmaltz.github.io/Course-LLM-based-Assistants/infos/topic_overview.html\nA Comprehensive Survey of Scientific Large Language Models and Their Applications in ScientificDiscovery, University of Illinois at Urbana-Champaign et al.\nLarge Language Models in Drug Discovery and Development: From Disease Mechanisms to ClinicalTrials, Department of Data Science and AI, Monash University et al.\nLLM-SR: Scientific Equation Discovery via Programming with Large Language Models, Virginia\nTech et al.\n Awesome Scientific Language Models, yuzhimanhua (GitHub repo)\nCURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning, Google\net al.\nMultiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-ConfidentEven When They Are Wrong, Nanjing University of Aeronautics and Astronautics et al.\nBlock 3: Wrap-upWeek 1315.07. Pitch: Agent for Web Development\nOn material of session 03.06\nThe contractors will present their agent that will have to generate full (minimalistic) websites by aprompt. For each website, the agent will have to generate its own style and a simple menu with workingnavigation as well as the contents. Specific requirements will be released on 24.06.\nReading: see session 03.06 and session 05.06\n17.07. Lecture: Role of AI in Recent YearsThe last lecture of the course will turn to societal considerations regarding LLMs and AI in general andwill investigate its role and influence on the humanity nowadays.\nKey points:\nStudies on influence of AI in the recent years\nStudies on AI integration rate\nEthical, legal & environmental aspects\nCore Reading:\n Protecting Human Cognition in the Age of AI (pages 1-5), The University of Texas at Austin et al.\n Artificial intelligence governance: Ethical considerations and implications for social responsibility(pages 1-12), University of Malta\nAdditional Reading:'),
 Document(metadata={'producer': 'macOS Version 12.7.6 (Build 21H1320) Quartz PDFContext', 'creator': 'Safari', 'creationdate': "D:20250512152829Z00'00'", 'title': 'Topics Overview - LLM-based Assistants', 'moddate': "D:20250512152829Z00'00'", 'source': './topic_overview.pdf', 'total_pages': 12, 'page': 11, 'page_label': '12'}, page_content="12.05.25, 17:28Topics Overview - LLM-based Assistants\nPage 12 of 12https://maxschmaltz.github.io/Course-LLM-based-Assistants/infos/topic_overview.html\nAugmenting Minds or Automating Skills: The Differential Role of Human Capital in Generative AI’sImpact on Creative Tasks, Tsinghua University & Wuhan University of Technology\nHuman Creativity in the Age of LLMs: Randomized Experiments on Divergent and ConvergentThinking, University of Toronto\nEmpirical evidence of Large Language Model’s influence on human spoken communication, Max-\nPlanck Institute for Human Development\n The 2025 AI Index Report: Top Takeaways, Stanford\nGrowing Up: Navigating Generative AI’s Early Years – AI Adoption Report: Executive Summary, AI at\nWharton\nEthical Implications of AI in Data Collection: Balancing Innovation with Privacy, AI Data Chronicles\nLegal and ethical implications of AI-based crowd analysis: the AI Act and beyond, Vrije\nUniversiteit\nA Survey of Sustainability in Large Language Models: Applications, Economics, and Challenges,Cleveland State University et al.\nWeek 1422.07. Pitch: LLM-based Research Assistant\nOn material of session 01.07\nThe last pitch will introduce an agent that will have to plan the research, generate hypotheses, find theliterature etc. for a given scientific problem. It will then have to introduce its results in form of a TODOor a guide for the researcher to start off of. Specific requirements will be released on 01.07.\nReading: see session 01.07 and session 03.07\n24.07. Debate: Role of AI in Recent Years + Wrap-up\nOn material of session 17.07\nThe course will be concluded by the final debates, after which a short Q&A session will be held.\nDebate topics:\nLLM Behavior: Evidence of Awareness or Illusion of Understanding?\nShould We Limit the Usage of AI?\nReading: see session 17.07\nCopyright © 2025, Maksim ShmaltsMade with Sphinx and @pradyunsg's Furo")]
print(pages[0].page_content)
12.05.25, 17:28Topics Overview - LLM-based Assistants
Page 1 of 12https://maxschmaltz.github.io/Course-LLM-based-Assistants/infos/topic_overview.html
To p i c s  O v e r v i e wThe schedule is preliminary and subject to changes!
The reading for each lecture is given as references to the sources the respective lectures base on. Youare not obliged to read anything. However, you are strongly encouraged to read references marked bypin emojis 
: those are comprehensive overviews on the topics or important works that are beneficialfor a better understanding of the key concepts. For the pinned papers, I also specify the pages span foryou to focus on the most important fragments. Some of the sources are also marked with a popcornemoji 
: that is misc material you might want to take a look at: blog posts, GitHub repos, leaderboardsetc. (also a couple of LLM-based games). For each of the sources, I also leave my subjectiveestimation of how important this work is for this specific topic: from yellow 
 ‘partially useful’ thoughorange 
 ‘useful’ to red 
 ‘crucial findings / thoughts’.  T h e s e  e s t i m a t i o n s  w i l l  b e  c o n t i n u o u s l yupdated as I revise the materials.
For the labs, you are provided with practical tutorials that respective lab tasks will mostly derive from.The core tutorials are marked with a writing emoji 
; you are asked to inspect them in advance(better yet: try them out). On lab sessions, we will only briefly recap them so it is up to you to preparein advance to keep up with the lab.
Disclaimer: the reading entries are no proper citations; the bibtex references as well as detailed infosabout the authors, publish date etc. can be found under the entry links.
Block 1: IntroWeek 122.04. Lecture: LLMs as a Form of Intelligence vs LLMs as Statistical MachinesThat is an introductory lecture, in which I will briefly introduce the course and we’ll have a warming updiscussion about different perspectives on LLMs’ nature. We will focus on two prominent outlooks: LLMis a form of intelligence and LLM is a complex statistical machine. We’ll discuss differences of LLMswith human intelligence and the degree to which LLMs exhibit (self-)awareness.
Key points:
Course introduction
Different perspectives on the nature of LLMs
Similarities and differences between human and artificial intelligence
LLMs’ (self-)awareness
Core Reading:
 The Debate Over Understanding in AI’s Large Language Models (pages 1-7), Santa Fe
Institute 
Meaning without reference in large language models, UC Berkeley & DeepMind 
Dissociating language and thought in large language models (intro [right after the abstract, seemore on the sectioning in this paper at the bottom of page 2], sections 1, 2.3 [LLMs are predictive…], 3-5), The University of Texas at Austin et al. 
Additional Reading:
LLM-basedAssistants
INFOS AND STUFF
BLOCK 1: INTRO
BLOCK 2: CORE TOPICS | PART 1:BUSINESS APPLICATIONS
BLOCK 2: CORE TOPICS | PART 2:APPLICATIONS IN SCIENCE
BLOCK 3: WRAP-UP
Topics Overview
Debates
Pitches
LLM Inference Guide
22.04. LLMs as a Form ofIntelligence vs LLMs asStatistical Machines
24.04. LLM & Agent Basics
29.04. Intro to LangChain 
!
"
06.05. Virtual Assistants Pt. 1:Chatbots
08.05. Basic LLM-basedChatbot 
#
Under development
Under development
Search

As you can see, the result is not satisfactory because the PDF has a more complex structure than plain one-paragraph text. To handle its layout, we could use UnstructuredLoader, which returns a Document not for the whole page but for each structural element; for simplicity, let’s go with PyPDF for now.
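
For reference, here is a rough sketch of how such layout-aware loading could look. This is an illustrative assumption: it requires the langchain-unstructured package and the unstructured PDF extras, which are not part of our requirements.

# hypothetical alternative loader, not installed by default
from langchain_unstructured import UnstructuredLoader

unstructured_loader = UnstructuredLoader(file_path)
elements = unstructured_loader.load()   # one Document per structural element, not per page
print(elements[0].metadata.get("category"), elements[0].page_content[:100])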

Chunking

During RAG, relevant documents are usually retrieved by semantic similarity calculated between the search query and each document in the index. However, if we compute vectors for entire PDF pages, we risk not capturing any meaning in the embedding because the context is just too long. That is why loaded text is usually chunked in a RAG application: embeddings of smaller pieces of text are more discriminative, so the relevant context can be retrieved more reliably. Furthermore, chunking ensures a consistent process when working with documents of varying sizes, and it is simply more computationally efficient.

Different approaches to chunking are described in the Text splitters tutorial from LangChain. We’ll use RecursiveCharacterTextSplitter – a good simplicity-to-quality trade-off for straightforward cases. This splitter tries to keep text structures (paragraphs, sentences) together and thus maintains text coherence within chunks.

from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.documents import Document
from typing import List
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=512, # maximum number of characters in a chunk
    chunk_overlap=50 # number of characters to overlap between chunks
)

def split_page(page: Document) -> List[Document]:
    chunks = text_splitter.split_text(page.page_content)
    return [
        Document(
            page_content=chunk,
            metadata=page.metadata,
        ) 
        for chunk in chunks
    ]
docs = []
for page in pages:
    docs += split_page(page)

print(f"Converted {len(pages)} pages into {len(docs)} chunks.")
Converted 12 pages into 66 chunks.
print(docs[3].page_content)
For the labs, you are provided with practical tutorials that respective lab tasks will mostly derive from.The core tutorials are marked with a writing emoji 
; you are asked to inspect them in advance(better yet: try them out). On lab sessions, we will only briefly recap them so it is up to you to preparein advance to keep up with the lab.

Convert to Embeddings

As discussed, retrieval usually relies on vector similarity, and the index contains not the actual texts but their vector representations. Vector representations are created by embedding models – models usually built specifically for this objective, trained to produce similar vectors for similar sentences and to push dissimilar sentences apart in the vector space.

In the last session, we used the nv-embedqa-e5-v5 model – a model from NVIDIA pretrained for English QA. However, it didn’t work very stably, so in this session we’ll replace it with HF Sentence Transformers Embeddings: an open-source, lightweight alternative that runs locally. Note that the choice of model heavily depends on the use case; for example, the model we will be using – all-MiniLM-L6-v2 – truncates input text longer than 256 word pieces by default, which is fine for our short passages but may be critical in other applications.

from langchain_huggingface import HuggingFaceEmbeddings
EMBEDDING_NAME = "all-MiniLM-L6-v2"

embeddings = HuggingFaceEmbeddings(model_name=EMBEDDING_NAME)
/Users/maxschmaltz/Documents/Course-LLM-based-Assistants/llm-based-assistants/sessions/block2_core_topics/pt1_business/2005/.venv/lib/python3.13/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
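
As a quick side note on the truncation limit mentioned above: the underlying sentence-transformers model exposes it, so you can check it directly. A small sketch, assuming the sentence-transformers package (used by HuggingFaceEmbeddings under the hood) is available:

from sentence_transformers import SentenceTransformer

st_model = SentenceTransformer("all-MiniLM-L6-v2")
print(st_model.max_seq_length)   # 256 word pieces by default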

An embedding model receives an input text and returns a dense vector that is believed to capture its semantic properties.

test_embedding = embeddings.embed_query("retrieval augmented generation")
test_embedding
[-0.08718939125537872,
 -0.017552487552165985,
 -0.02729196660220623,
 0.05240696668624878,
 -0.021445486694574356,
 0.07236161828041077,
 0.06576938927173615,
 -0.05251311510801315,
 -0.010767153464257717,
 -0.03447244316339493,
 0.06100146472454071,
 -0.011422315612435341,
 0.07560884952545166,
 -0.03255048021674156,
 -0.021511150524020195,
 0.008553139865398407,
 0.034805942326784134,
 0.03402310237288475,
 -0.03876182436943054,
 -0.06949298828840256,
 -0.015863697975873947,
 0.040087372064590454,
 0.019959909841418266,
 -0.03369267284870148,
 0.003124572103843093,
 -0.01715978793799877,
 -0.006432410795241594,
 -0.03536583110690117,
 0.141240656375885,
 -0.02228039875626564,
 0.04633574187755585,
 0.10714968293905258,
 -0.07780978083610535,
 0.04519134759902954,
 -0.03231765329837799,
 0.05535123497247696,
 -0.10309430956840515,
 0.10207292437553406,
 0.009038606658577919,
 -0.03525232896208763,
 -0.049930084496736526,
 0.031047439202666283,
 0.03019150160253048,
 -0.02269360050559044,
 0.06476514786481857,
 -0.06673639267683029,
 -0.09085537493228912,
 -0.0030079304706305265,
 0.0018228460103273392,
 0.011870376765727997,
 -0.04707968235015869,
 -0.004527844022959471,
 -0.025533059611916542,
 -0.016370372846722603,
 0.011221242137253284,
 0.06357301026582718,
 0.0033866423182189465,
 -0.0373876690864563,
 -0.08252184092998505,
 -0.06394127756357193,
 -0.014947792515158653,
 -0.1283581256866455,
 -0.07762394100427628,
 -0.0534483939409256,
 0.0475839339196682,
 -0.02412143163383007,
 0.06741201132535934,
 -0.0056900521740317345,
 0.011296355165541172,
 0.019414132460951805,
 -0.06329578906297684,
 -0.010496418923139572,
 -0.025765469297766685,
 0.08242304623126984,
 0.04078071564435959,
 0.1073637381196022,
 0.012940718792378902,
 -0.0813097432255745,
 0.03622240945696831,
 0.004342259839177132,
 0.006024372298270464,
 -0.03921079635620117,
 0.06007775664329529,
 0.00166932528372854,
 0.034063201397657394,
 0.0020538237877190113,
 0.03759211301803589,
 -0.04690760374069214,
 0.016682617366313934,
 0.027662847191095352,
 -0.02605142630636692,
 -0.03171469271183014,
 0.03400687873363495,
 -0.013206538744270802,
 -0.026792503893375397,
 -0.0156557559967041,
 0.09320804476737976,
 -0.06534723192453384,
 0.016737347468733788,
 0.09889230132102966,
 0.07293327897787094,
 0.0716785416007042,
 0.02019762620329857,
 -0.014685732312500477,
 -0.044228456914424896,
 -0.027015535160899162,
 0.044873982667922974,
 0.02416594699025154,
 -0.01541962567716837,
 -0.07742788642644882,
 -0.014204404316842556,
 -0.012951254844665527,
 0.02990981936454773,
 -0.0670965239405632,
 -0.023469531908631325,
 0.019714754074811935,
 -0.006615940947085619,
 -0.02472003735601902,
 0.07622068375349045,
 -0.10430707782506943,
 0.012092015706002712,
 0.0103624127805233,
 0.0072660925798118114,
 0.03822939842939377,
 -0.05302713066339493,
 -0.05786251276731491,
 0.02362321875989437,
 -3.1146500760242126e-33,
 0.02739580161869526,
 0.007799218408763409,
 0.030314011499285698,
 0.09257230907678604,
 0.009595884941518307,
 0.0165739506483078,
 0.0775405615568161,
 0.04281589388847351,
 -0.04223857447504997,
 -0.107220359146595,
 -0.04938485473394394,
 0.08949682861566544,
 -0.018288100138306618,
 0.08887151628732681,
 0.06218535080552101,
 -0.021059619262814522,
 -0.022731175646185875,
 0.09228453040122986,
 0.03685983642935753,
 0.016744032502174377,
 0.0068116458132863045,
 0.014910407364368439,
 0.021146360784769058,
 -0.06683419644832611,
 0.014837664552032948,
 0.00986749492585659,
 -0.013698508962988853,
 -0.04000495746731758,
 -0.06458384543657303,
 0.000607188674621284,
 -0.05840948224067688,
 0.042031846940517426,
 -0.01182975247502327,
 0.024098893627524376,
 -0.02406744472682476,
 0.04856685549020767,
 0.048356227576732635,
 -0.00706762308254838,
 0.0214801374822855,
 -0.007435947190970182,
 0.04717991128563881,
 0.03317352756857872,
 0.06760194897651672,
 -0.0725097581744194,
 -0.08834613114595413,
 0.004588903859257698,
 0.0333319753408432,
 0.04611990600824356,
 -0.06713993847370148,
 0.0068238903768360615,
 0.034443460404872894,
 0.07798873633146286,
 -0.12097905576229095,
 -0.0878448337316513,
 0.0739184096455574,
 -0.004626526031643152,
 -0.049848511815071106,
 0.0845608115196228,
 0.056117650121450424,
 0.05702828988432884,
 0.04476482793688774,
 0.053383275866508484,
 0.09401398152112961,
 0.056368619203567505,
 -0.064379021525383,
 0.04319624975323677,
 0.045496802777051926,
 -0.04076174646615982,
 0.17445245385169983,
 0.016478080302476883,
 -0.03775310888886452,
 -0.036436427384614944,
 0.013020608574151993,
 -0.1050802543759346,
 0.08973690122365952,
 -0.05857545882463455,
 -0.027984371408820152,
 -0.038374338299036026,
 0.010340191423892975,
 -0.04257754981517792,
 -0.06980421394109726,
 -0.022713419049978256,
 -0.034653082489967346,
 0.002139865653589368,
 0.019379043951630592,
 -0.07190203666687012,
 0.04081670939922333,
 -0.09128168970346451,
 -0.007772673387080431,
 -0.069792740046978,
 0.011165888048708439,
 0.04168428108096123,
 -0.02773478999733925,
 0.027545265853405,
 0.043000251054763794,
 1.9090292743582884e-33,
 0.023852724581956863,
 -0.05112120881676674,
 -0.006110853049904108,
 -0.018936337903141975,
 0.012887736782431602,
 -0.040601737797260284,
 -0.007608255371451378,
 -0.064246766269207,
 -0.05957970395684242,
 -0.028532736003398895,
 -0.049880608916282654,
 -0.0249363761395216,
 0.014040950685739517,
 -0.02930120751261711,
 -0.02681984193623066,
 -0.001526402309536934,
 -0.03989081457257271,
 -0.06108027324080467,
 -0.025966564193367958,
 0.07759884744882584,
 0.016837269067764282,
 0.024788782000541687,
 -0.04913561791181564,
 -0.011247153393924236,
 0.031221885234117508,
 0.02929314598441124,
 0.01990882307291031,
 -0.046744197607040405,
 -0.015650060027837753,
 -0.0033993797842413187,
 0.01988779380917549,
 -0.05016341432929039,
 0.03616257384419441,
 -0.009688613004982471,
 -0.053137846291065216,
 0.015548999421298504,
 0.08536969870328903,
 0.00046374136582016945,
 -0.10475442558526993,
 0.0959901511669159,
 -0.010590538382530212,
 0.0005332615692168474,
 -0.0315580889582634,
 0.06814399361610413,
 0.024714365601539612,
 -0.01642696000635624,
 -0.044301100075244904,
 0.05846698209643364,
 0.0305433738976717,
 0.0351850762963295,
 0.03364138677716255,
 0.03364896401762962,
 -0.05115457624197006,
 -0.07231186330318451,
 -0.04751227796077728,
 -0.070430226624012,
 -0.03169975429773331,
 -0.056794293224811554,
 0.069991834461689,
 0.023399094119668007,
 -0.057767603546381,
 -0.058912817388772964,
 -0.02279546484351158,
 0.0005012541078031063,
 0.05812565237283707,
 -0.0464903861284256,
 -0.017210209742188454,
 0.005402414593845606,
 -0.08622312545776367,
 -0.000911232375074178,
 0.021159576252102852,
 -0.002533447928726673,
 0.0549214668571949,
 -0.04015940800309181,
 0.11574247479438782,
 0.031063484027981758,
 0.04054126515984535,
 0.05624070018529892,
 0.05388813093304634,
 -0.07476453483104706,
 -0.12748748064041138,
 0.07535384595394135,
 0.04001101851463318,
 0.1253139078617096,
 -0.08527559787034988,
 -0.036197226494550705,
 0.039345644414424896,
 0.05518045648932457,
 -0.021991057321429253,
 -0.01585155911743641,
 -0.017070360481739044,
 -0.007107829209417105,
 -0.016361547634005547,
 0.016446873545646667,
 0.04554374888539314,
 -1.115378278626622e-08,
 -0.08995316177606583,
 0.0313422828912735,
 -0.03209814056754112,
 0.03675302863121033,
 0.054376643151044846,
 0.03777965530753136,
 -0.05917801335453987,
 0.06558234244585037,
 -0.04349064826965332,
 -0.069773368537426,
 -0.010083134286105633,
 -0.005738626234233379,
 -0.01205505058169365,
 0.015104155987501144,
 0.040337566286325455,
 -0.008147886022925377,
 0.014851447194814682,
 0.03131204470992088,
 -0.032955992966890335,
 0.00409349799156189,
 0.03588413819670677,
 0.07718852907419205,
 0.04611629992723465,
 0.01334471721202135,
 0.023300277069211006,
 0.02821965515613556,
 -0.0066739823669195175,
 -0.0029166792519390583,
 0.1073087826371193,
 0.0037622086238116026,
 0.049567416310310364,
 -0.009718682616949081,
 0.04146944731473923,
 0.01478524599224329,
 0.06772028654813766,
 0.007248429581522942,
 -0.04934094473719597,
 -0.0584770031273365,
 -0.04953351989388466,
 -0.06965198367834091,
 0.03667891025543213,
 0.02164360135793686,
 -0.04110253229737282,
 -0.024316079914569855,
 0.038025762885808945,
 -0.02508619800209999,
 0.03187211602926254,
 -0.0794006884098053,
 -0.013028672896325588,
 -0.03650461882352829,
 -0.11604329198598862,
 -0.13706286251544952,
 0.04139665886759758,
 0.011682813055813313,
 0.1062411218881607,
 0.013992193154990673,
 0.029270553961396217,
 -0.015916042029857635,
 0.06916730105876923,
 0.0022920502815395594,
 0.1077147051692009,
 0.055848877876996994,
 -0.06970211863517761,
 0.0435895211994648]
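
To illustrate the intuition that similar sentences end up with closer vectors, here is a small sketch comparing the query embedding above with a related and an unrelated sentence (the example sentences are made up):

import numpy as np

def cosine_similarity(a, b):
    a, b = np.array(a), np.array(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

related = embeddings.embed_query("fetching relevant documents to ground an LLM response")
unrelated = embeddings.embed_query("my cat likes to sleep on the sofa")

print(cosine_similarity(test_embedding, related))     # noticeably higher
print(cosine_similarity(test_embedding, unrelated))   # noticeably lower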

Indexing

Now that we have split our data and initialized the embeddings, we can start indexing. There are a lot of different index implementations; you can take a look at the available options in Vector stores. One of the popular choices is Qdrant, which provides simple data management and can be deployed locally, on a remote machine, or in the cloud.

Qdrant supports persisting your vector storage, i.e. storing it on the working machine, but for simplicity, we will use it in in-memory mode, so that the storage exists only as long as the notebook does.
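
If you wanted the index to survive kernel restarts, Qdrant also offers a local on-disk mode; a minimal sketch (the storage path is arbitrary):

from qdrant_client import QdrantClient

# store the index in a local folder instead of keeping it in memory
persistent_client = QdrantClient(path="./qdrant_storage")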

from langchain_qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
from qdrant_client.http.models import Distance, VectorParams
from uuid import uuid4

First things first, we need to create a client – a Qdrant instance that will be the entry point for all the actions we perform on the data.

qd_client = QdrantClient(":memory:")    # in-memory Qdrant client
len(test_embedding)
384

Then, since we use an in-memory client that does not keep the index between notebook sessions, we need to initialize a collection. Alternatively, if we were persisting the data, we would check whether the collection exists and then either create or load it (a sketch of this check is shown below, after the collection is created).

For Qdrant to initialize the structure of the index correctly, we need to provide the dimensionality of the embedding we will be using as well as the distance metric.

collection_name = "2005"

qd_client.create_collection(
    collection_name=collection_name,
    # embedding params here
    vectors_config=VectorParams(
        size=len(test_embedding),   # embedding dimensionality (384 for all-MiniLM-L6-v2)
        distance=Distance.COSINE    # cosine distance
    )
)
True
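
With a persistent setup, we would create the collection only if it does not exist yet. A sketch using the same client and parameters (recent versions of qdrant-client expose collection_exists):

if not qd_client.collection_exists(collection_name):
    qd_client.create_collection(
        collection_name=collection_name,
        vectors_config=VectorParams(
            size=len(test_embedding),
            distance=Distance.COSINE
        )
    )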

Finally, we use a LangChain wrapper to connect to the index to unify the workflow.

vector_store = QdrantVectorStore(
    client=qd_client,
    collection_name=collection_name,
    embedding=embeddings
)

Now we are ready to add our chunks to the vector storage. As we add the chunks, the index will take care of converting our passages into embeddings.

In order to be able to delete or modify the chunks afterwards, we assign them unique ids that we generate dynamically.

ids = [str(uuid4()) for _ in range(len(docs))]
vector_store.add_documents(
    docs,
    ids=ids
)
['94ad0d3e-d41f-4fa6-833c-6a6c4afeaa45',
 'cd8a6fd5-8024-46a2-9b74-e890379f40bb',
 '1a8a5f6c-1672-4293-896d-86a5e6bc189c',
 'f925722c-424c-4356-89ef-4d2ad767f4d5',
 '6513f0d5-258f-42ef-8d22-0dd3ce0954a3',
 'c2a4adac-17a7-4c34-836e-11096649aeb5',
 'd0c1cd52-bbe1-4a3d-a125-e3aa8acdaeec',
 '1c5e520d-12c3-4495-ad71-2641612906ad',
 '0cc8904d-8ea9-4695-9bdf-3078177df420',
 'beae1abc-e3f9-49b4-b86e-b1490eb4e330',
 'a4753493-4ff4-4b85-ae34-f898076d0029',
 'ed48ce57-885f-491c-9fc8-197e3a9481fc',
 'b932bf6d-5f37-4841-a8d7-8680df916167',
 '9b894a91-d821-4909-8828-53ba77d14c4a',
 'fec59831-30d9-45b3-a062-64b0af932838',
 'b4dd4e44-5083-439f-b4e8-44452c0705b9',
 'e3620974-4fbc-4f2e-be60-bdbc002d7740',
 '72b1b0ec-ade1-4c8f-9c43-016365e3b487',
 'a3c866b0-b216-4906-8a5b-5c3c5bc11e28',
 '6cd4afdc-8741-41bc-bd01-464274d4c4fd',
 'cdbdbfff-b43c-4e11-b63b-a74a6147316b',
 'a6660fde-2c01-46d6-9a95-056512134dae',
 '656c7c62-a746-4ee3-80a9-b5ff36210ffe',
 'b0070e4a-8bac-4d7e-a311-7daed4774bb1',
 'f5aef846-c175-4281-9dfa-76487e125130',
 '82b3f0d8-3ed9-46dc-aa2c-8e176e08ade3',
 'e4c4ab49-5c0d-441b-88f3-94675e81a567',
 '06ae5d03-3b6e-4e80-b86d-a7b3f67cbb8d',
 '99d08640-94ff-453e-90c0-7e4941d00d56',
 'f230f890-1c31-44d2-81aa-bd504fe7539b',
 '17245b54-6142-4875-af33-2251e691c684',
 '8f677e73-462e-4f50-9ef5-55d11d7b00e0',
 '965381f6-922c-4a0c-87d7-1bb66790fd53',
 'ff6ea092-d77d-4206-95ed-43f9c4346b03',
 '9b0391ae-8fff-4544-a240-e82be44ccf24',
 '98037bee-8d70-4e3b-bd38-58ea5453eeea',
 'adf2d1b5-65d6-4504-9421-f3304133a4a8',
 '79608c97-df83-44a1-93e2-447032afa715',
 'b0614f9f-a8ca-447f-bc2b-e64209f84b09',
 '804a1a29-94f2-4dea-b895-5ab54ec000be',
 'd641eacd-7f61-49d5-afed-ad68c8f32d11',
 '670a00f0-e061-4a8e-af11-ff02493ad162',
 'edad71fe-b07a-4bd6-aa89-1229a949956c',
 '0ef16281-6afb-48a5-8aab-b0ec0db79683',
 'f211358d-5d5d-4217-94e1-fa5296bfebd0',
 'd158c4d9-ce85-42b7-9ba5-87780f413d1f',
 '2d7e4e85-c65a-496a-ae5b-9ead0484fb1a',
 'e719c032-a1ee-4616-84ce-e0a382e5b85e',
 '0f6f9a90-b0d2-4711-9292-6e7d5445e760',
 '39e8dd3a-8a74-4f0a-97d3-365b271bc9d3',
 'f11c78f5-f17f-4705-a557-3dd27ef8d92e',
 'ffbe8c5a-395f-4fdb-a7d1-206b3b5eb59b',
 '4e92b582-84af-40ac-a1cd-08ab66972325',
 '17c583b3-e480-4367-9616-48f500dfc2a5',
 'eddaa35d-d185-4ca2-b92c-7e769ee5e96e',
 '6f86e05c-0a56-4188-985d-9391ae506624',
 '317d9449-575d-4290-b1e0-6aafbd90df42',
 '5d9754bc-dcf5-403b-8c44-1991bf22f22f',
 '2dbd8454-4515-4918-b4cd-ac7627878c38',
 '1e0b5a80-1746-4000-bb98-b38735f3cfc5',
 '81aa5b84-9a40-4e57-8fc8-c6ec0b682be8',
 '7fbaacdb-1ad9-4e26-9736-9cb79f2ab2cf',
 '1d3dba43-222f-4f13-8b82-3b97c1f1e4b4',
 'c1537d53-64c9-4477-913c-7878e0b2c448',
 '6c3beac6-3274-4267-8312-8638c49015bd',
 '36bf6a1a-c1ac-4cac-923f-d11049855c4e']
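
These ids are what later lets us remove or overwrite individual chunks. A minimal sketch of how that could look:

# sketch: delete a chunk by its id and re-add an (edited) version under the same id
vector_store.delete(ids=[ids[0]])
vector_store.add_documents([docs[0]], ids=[ids[0]])
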
vector_store.similarity_search("retrieval augmented generation", k=3)
[Document(metadata={'producer': 'macOS Version 12.7.6 (Build 21H1320) Quartz PDFContext', 'creator': 'Safari', 'creationdate': "D:20250512152829Z00'00'", 'title': 'Topics Overview - LLM-based Assistants', 'moddate': "D:20250512152829Z00'00'", 'source': './topic_overview.pdf', 'total_pages': 12, 'page': 3, 'page_label': '4', '_id': '656c7c62-a746-4ee3-80a9-b5ff36210ffe', '_collection_name': '2005'}, page_content='Retrieval tools\nAgentic RAG\nCore Reading:\n Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and HybridApproach (pages 1-7), Google DeepMind & University of Michigan \nA Survey on Retrieval-Augmented Text Generation for Large Language Models (sections 1-7), York\nUniversity \nAdditional Reading:\nDon’t Do RAG: When Cache-Augmented Generation is All You Need for Knowledge Tasks, National\nChengchi University & Academia Sinica'),
 Document(metadata={'producer': 'macOS Version 12.7.6 (Build 21H1320) Quartz PDFContext', 'creator': 'Safari', 'creationdate': "D:20250512152829Z00'00'", 'title': 'Topics Overview - LLM-based Assistants', 'moddate': "D:20250512152829Z00'00'", 'source': './topic_overview.pdf', 'total_pages': 12, 'page': 3, 'page_label': '4', '_id': 'b0070e4a-8bac-4d7e-a311-7daed4774bb1', '_collection_name': '2005'}, page_content='Chengchi University & Academia Sinica \nSelf-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection, University of\nWashington, Allen Institute for AI & IBM Research AI\nAdaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through QuestionComplexity, Korea Advanced Institute of Science and Technology\nAuto-RAG: Autonomous Retrieval-Augmented Generation for Large Language Models, Chinese\nAcademy of Sciences'),
 Document(metadata={'producer': 'macOS Version 12.7.6 (Build 21H1320) Quartz PDFContext', 'creator': 'Safari', 'creationdate': "D:20250512152829Z00'00'", 'title': 'Topics Overview - LLM-based Assistants', 'moddate': "D:20250512152829Z00'00'", 'source': './topic_overview.pdf', 'total_pages': 12, 'page': 5, 'page_label': '6', '_id': '965381f6-922c-4a0c-87d7-1bb66790fd53', '_collection_name': '2005'}, page_content='Code generation & refining\nAutomated testing\nGenerated code evaluation\nCore Reading:\nLarge Language Model-Based Agents for Software Engineering: A Survey, Fudan University,\nNanyang Technological University & University of Illinois at Urbana-Champaign\n CodeRL: Mastering Code Generation through Pretrained Models and Deep ReinforcementLearning (pages 1-20), Salesforce Research\nThe ART of LLM Refinement: Ask, Refine, and Trust, ETH Zurich & Meta AI\nAdditional Reading:')]
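
If we also want to see how close each hit is, the LangChain wrapper offers a scored variant of the search; a quick sketch:

# sketch: also return the similarity scores of the top hits
for doc, score in vector_store.similarity_search_with_score("retrieval augmented generation", k=3):
    print(round(score, 3), doc.metadata.get("page_label"), doc.page_content[:60])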

2. Simple RAG 💉

The basic RAG workflow is pretty straightforward: we just retrieve the k most relevant documents and then insert them into the prompt as part of the context.

For that, we will combine the skills we have obtained so far to build a LangGraph agent that receives the input, checks if the user wants to quit, and, if not, does the retrieval and generates a context-aware response. We will build on the basic version of our first chatbot; to add the RAG functionality, we need to add a retrieval node and modify the generation prompt to inject the retrieved documents.

First, we need to define the prompts.

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.tools.retriever import create_retriever_tool
# role: restrict the model from relying on its parametric knowledge
basic_rag_system_prompt = """\
You are an assistant that has access to a knowledge base. \
You should use the knowledge base to answer the user's questions.
"""


# this will add the context to the input
context_injection_prompt = """\
The user is asking a question. \
You should answer using the following context:

==========================
{context}
==========================


The user question is:
{input}
"""


# finally, gather the system message, the previous messages,
# and the input with the context
basic_rag_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", basic_rag_system_prompt),   # system message
        MessagesPlaceholder(variable_name="messages"),  # previous messages
        ("user", context_injection_prompt)  # user message
    ]
)
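
To double-check how the pieces fit together, we can render the prompt with made-up values; a small sketch (the context and question below are invented):

# sketch: render the prompt with invented values to inspect the final message list
rendered = basic_rag_prompt.invoke(
    {
        "messages": [],   # no previous conversation yet
        "context": "Week 4, 13.05: Virtual Assistants Pt. 2: RAG",   # invented context
        "input": "When is the RAG lecture?"   # invented question
    }
)
for message in rendered.to_messages():
    print(message.type, ":", message.content[:80])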

LangChain provides a pre-built helper, create_retriever_tool, to conveniently wrap a retriever into a tool. As this is basic RAG, we don't generate queries for the retriever for now and just use the user input as the query.

class BasicRAGChatbot(Chatbot):

    _graph_path = "./graph_basic_rag.png"
    
    def __init__(self, llm, k=5):
        super().__init__(llm)
        self.basic_rag_prompt = basic_rag_prompt
        self.retriever = vector_store.as_retriever(search_kwargs={"k": k})    # retrieve k documents (5 by default)
        self.retriever_tool = create_retriever_tool(    # and this is the tool
            self.retriever,
            "retrieve_internal_data",  # name
            "Search relevant information in internal documents.",   # description
        )

    def _build(self):
        # graph builder
        self._graph_builder = StateGraph(SimpleState)
        # add the nodes
        self._graph_builder.add_node("input", self._input_node)
        self._graph_builder.add_node("retrieve", self._retrieve_node)
        self._graph_builder.add_node("respond", self._respond_node)
        # define edges
        self._graph_builder.add_edge(START, "input")
        # basic rag: no planning, just always retrieve
        self._graph_builder.add_conditional_edges("input", self._is_quitting_node, {False: "retrieve", True: END})
        self._graph_builder.add_edge("retrieve", "respond")
        self._graph_builder.add_edge("respond", "input")
        # compile the graph
        self._compile()
    
    def _retrieve_node(self, state: SimpleState) -> dict:
        # retrieve the context
        user_query = state["messages"][-1].content  # use the last message as the query
        context = self.retriever_tool.invoke({"query": user_query})
        # add the context to the messages
        return {
            "messages": context
        }
    
    def _respond_node(self, state: SimpleState) -> dict:
        # the workflow is designed so that the context is always the last message
        # and the user query is the second to last message;
        # finally, we will be combining the context and the user query
        # into a single message so we remove those two from the messages
        context = state["messages"].pop(-1).content
        user_query = state["messages"].pop(-1).content
        prompt = self.basic_rag_prompt.invoke(
            {
                "messages": state["messages"],  # this goes to the message placeholder
                "context": context,  # this goes to the user message
                "input": user_query    # this goes to the user message
            }
        )
        response = self.llm.invoke(prompt)
        # add the response to the messages
        return {
            "messages": response
        }

    def run(self):
        input = {"messages": []}
        for event in self.chatbot.stream(input, stream_mode="values"):
            if event["messages"]:
                event["messages"][-1].pretty_print()
                print("\n")
basic_rag_chatbot = BasicRAGChatbot(llm)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
basic_rag_chatbot.run()
================================ Human Message =================================

what sessions do I have about virtual assistants?


Adaptive RAG, LangGraph
Multimodality, LangChain
Week 520.05. Lecture: Virtual Assistants Pt. 3: Multi-agent EnvironmentThis lectures concludes the Virtual Assistants cycle and directs its attention to automating everyday /business operations in a multi-agent environment. We’ll look at how agents communicate with eachother, how their communication can be guided (both with and without involvement of a human), andthis all is used in real applications.
Key points:
Multi-agent environment
Human in the loop

Prompt Templates, LangChain
Few-shot prompting, LangChain
Week 413.05. Lecture: Virtual Assistants Pt. 2: RAGContinuing the first part, the second part will expand scope of chatbot functionality and will teach it torefer to custom knowledge base to retrieve and use user-specific information. Finally, the most widelyused deployment methods will be briefly introduced.
Key points:
General knowledge vs context
Knowledge indexing, retrieval & ranking
Retrieval tools
Agentic RAG
Core Reading:

01.05.Ausfalltermin
Block 2: Core T opics
Part 1: Business ApplicationsWeek 306.05. Lecture: Virtual Assistants Pt. 1: ChatbotsThe first core topic concerns chatbots. We’ll discuss how chatbots are built, how they (should) handleharmful requests and you can tune it for your use case.
Key points:
LLMs alignment
Memory
Prompting & automated prompt generation
Evaluation
Core Reading:
 Aligning Large Language Models with Human: A Survey (pages 1-14), Huawei Noah’s Ark Lab

12.05.25, 17:28Topics Overview - LLM-based Assistants
Page 8 of 12https://maxschmaltz.github.io/Course-LLM-based-Assistants/infos/topic_overview.html
Week 8: Having Some Rest10.06.Ausfalltermin
12.06.Ausfalltermin
Week 917.06. Pitch: RAG Chatbot
On material of session 06.05 and session 13.05

Built with LangGraph, LangGraph (website page)
Plan-Then-Execute: An Empirical Study of User Trust and Team Performance When Using LLMAgents As A Daily Assistant, Delft University of Technology & The University of Queensland
22.05. Lab: Multi-agent Environment
On material of session 20.05
================================ Human Message =================================

Adaptive RAG, LangGraph
Multimodality, LangChain
Week 520.05. Lecture: Virtual Assistants Pt. 3: Multi-agent EnvironmentThis lectures concludes the Virtual Assistants cycle and directs its attention to automating everyday /business operations in a multi-agent environment. We’ll look at how agents communicate with eachother, how their communication can be guided (both with and without involvement of a human), andthis all is used in real applications.
Key points:
Multi-agent environment
Human in the loop

Prompt Templates, LangChain
Few-shot prompting, LangChain
Week 413.05. Lecture: Virtual Assistants Pt. 2: RAGContinuing the first part, the second part will expand scope of chatbot functionality and will teach it torefer to custom knowledge base to retrieve and use user-specific information. Finally, the most widelyused deployment methods will be briefly introduced.
Key points:
General knowledge vs context
Knowledge indexing, retrieval & ranking
Retrieval tools
Agentic RAG
Core Reading:

01.05.Ausfalltermin
Block 2: Core T opics
Part 1: Business ApplicationsWeek 306.05. Lecture: Virtual Assistants Pt. 1: ChatbotsThe first core topic concerns chatbots. We’ll discuss how chatbots are built, how they (should) handleharmful requests and you can tune it for your use case.
Key points:
LLMs alignment
Memory
Prompting & automated prompt generation
Evaluation
Core Reading:
 Aligning Large Language Models with Human: A Survey (pages 1-14), Huawei Noah’s Ark Lab

12.05.25, 17:28Topics Overview - LLM-based Assistants
Page 8 of 12https://maxschmaltz.github.io/Course-LLM-based-Assistants/infos/topic_overview.html
Week 8: Having Some Rest10.06.Ausfalltermin
12.06.Ausfalltermin
Week 917.06. Pitch: RAG Chatbot
On material of session 06.05 and session 13.05

Built with LangGraph, LangGraph (website page)
Plan-Then-Execute: An Empirical Study of User Trust and Team Performance When Using LLMAgents As A Daily Assistant, Delft University of Technology & The University of Queensland
22.05. Lab: Multi-agent Environment
On material of session 20.05


The user is asking a question. You should answer using the following context:

==========================
Adaptive RAG, LangGraph
Multimodality, LangChain
Week 520.05. Lecture: Virtual Assistants Pt. 3: Multi-agent EnvironmentThis lectures concludes the Virtual Assistants cycle and directs its attention to automating everyday /business operations in a multi-agent environment. We’ll look at how agents communicate with eachother, how their communication can be guided (both with and without involvement of a human), andthis all is used in real applications.
Key points:
Multi-agent environment
Human in the loop

Prompt Templates, LangChain
Few-shot prompting, LangChain
Week 413.05. Lecture: Virtual Assistants Pt. 2: RAGContinuing the first part, the second part will expand scope of chatbot functionality and will teach it torefer to custom knowledge base to retrieve and use user-specific information. Finally, the most widelyused deployment methods will be briefly introduced.
Key points:
General knowledge vs context
Knowledge indexing, retrieval & ranking
Retrieval tools
Agentic RAG
Core Reading:

01.05.Ausfalltermin
Block 2: Core T opics
Part 1: Business ApplicationsWeek 306.05. Lecture: Virtual Assistants Pt. 1: ChatbotsThe first core topic concerns chatbots. We’ll discuss how chatbots are built, how they (should) handleharmful requests and you can tune it for your use case.
Key points:
LLMs alignment
Memory
Prompting & automated prompt generation
Evaluation
Core Reading:
 Aligning Large Language Models with Human: A Survey (pages 1-14), Huawei Noah’s Ark Lab

12.05.25, 17:28Topics Overview - LLM-based Assistants
Page 8 of 12https://maxschmaltz.github.io/Course-LLM-based-Assistants/infos/topic_overview.html
Week 8: Having Some Rest10.06.Ausfalltermin
12.06.Ausfalltermin
Week 917.06. Pitch: RAG Chatbot
On material of session 06.05 and session 13.05

Built with LangGraph, LangGraph (website page)
Plan-Then-Execute: An Empirical Study of User Trust and Team Performance When Using LLMAgents As A Daily Assistant, Delft University of Technology & The University of Queensland
22.05. Lab: Multi-agent Environment
On material of session 20.05
==========================


The user question is:
what sessions do I have about virtual assistants?

================================== Ai Message ==================================

You have three sessions about Virtual Assistants:

1. Week 3: Lecture: Virtual Assistants Pt. 1: Chatbots (06.05)
2. Week 4: Lecture: Virtual Assistants Pt. 2: RAG (13.05)
3. Week 5: Lecture: Virtual Assistants Pt. 3: Multi-agent Environment (20.05)

Additionally, you have a pitch session on RAG Chatbot in Week 9 (17.06) and a lab session on Multi-agent Environment (22.05) which may also be related to Virtual Assistants.


================================ Human Message =================================

what are their dates?


================================ Human Message =================================

Disclaimer: the reading entries are no proper citations; the bibtex references as well as detailed infosabout the authors, publish date etc. can be found under the entry links.

12.05.25, 17:28Topics Overview - LLM-based Assistants
Page 8 of 12https://maxschmaltz.github.io/Course-LLM-based-Assistants/infos/topic_overview.html
Week 8: Having Some Rest10.06.Ausfalltermin
12.06.Ausfalltermin
Week 917.06. Pitch: RAG Chatbot
On material of session 06.05 and session 13.05

12.05.25, 17:28Topics Overview - LLM-based Assistants
Page 1 of 12https://maxschmaltz.github.io/Course-LLM-based-Assistants/infos/topic_overview.html
To p i c s  O v e r v i e wThe schedule is preliminary and subject to changes!
The reading for each lecture is given as references to the sources the respective lectures base on. Youare not obliged to read anything. However, you are strongly encouraged to read references marked bypin emojis

On material of session 01.07
The last pitch will introduce an agent that will have to plan the research, generate hypotheses, find theliterature etc. for a given scientific problem. It will then have to introduce its results in form of a TODOor a guide for the researcher to start off of. Specific requirements will be released on 01.07.
Reading: see session 01.07 and session 03.07
24.07. Debate: Role of AI in Recent Years + Wrap-up
On material of session 17.07

On material of session 06.05 and session 13.05
The first pitch will be dedicated to a custom RAG chatbot that the contractors (the presentingstudents, see the infos about Pitches) will have prepared to present. The RAG chatbot will have to beable to retrieve specific information from the given documents (not from the general knowledge!) anduse it in its responses. Specific requirements will be released on 22.05.
Reading: see session 06.05, session 08.05, session 13.05, and session 15.05
19.06.Ausfalltermin


================================== Ai Message ==================================

Based on the provided context, the dates mentioned are:

1. 06.05 (session on Virtual Assistants Pt. 1: Chatbots)
2. 13.05 (session on Virtual Assistants Pt. 2: RAG)
3. 17.06 (Pitch: RAG Chatbot)
4. 10.06 (Ausfalltermin - Week 8)
5. 12.06 (Ausfalltermin - Week 8)
6. 19.06 (Ausfalltermin)
7. 22.05 (release of specific requirements for the RAG chatbot)
8. 01.07 (session with specific requirements for the last pitch)
9. 03.07 (session related to the last pitch)
10. 17.07 (session related to the debate)
11. 24.07 (Debate: Role of AI in Recent Years + Wrap-up)


================================ Human Message =================================

quit

As you can see, it already works pretty well, but since the retrieval uses the raw user query directly, the previous context of the conversation is not considered. To handle that, let's add a node that reformulates the query, taking the previous interaction into consideration.

For that, we need an additional prompt.

# prompt that reformulates the latest user query
# given the previous conversation
reformulate_query_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "Given the previous conversation, reformulate the user query in the last message to a full question. "
            "Return only the reformulated query, without any other text."
        ),   # system message
        MessagesPlaceholder(variable_name="messages")  # previous messages
    ]
)
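
To get a feeling for what the reformulation does in isolation, we can run the prompt on a tiny made-up history; a sketch (the messages below are invented):

# sketch: reformulate a follow-up question against a made-up two-turn history
from langchain_core.messages import AIMessage

toy_history = [
    HumanMessage(content="what sessions do I have about virtual assistants?"),
    AIMessage(content="You have three sessions about Virtual Assistants (06.05, 13.05, 20.05)."),
    HumanMessage(content="what are their dates?"),
]
print(llm.invoke(reformulate_query_prompt.invoke({"messages": toy_history})).content)
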
class BasicPlusRAGChatbot(BasicRAGChatbot):

    _graph_path = "./graph_basic_plus_rag.png"

    def __init__(self, llm, k=5):
        super().__init__(llm, k)
        self.reformulate_query_prompt = reformulate_query_prompt

    def _build(self):
        # graph builder
        self._graph_builder = StateGraph(SimpleState)
        # add the nodes
        self._graph_builder.add_node("input", self._input_node)
        self._graph_builder.add_node("reformulate_query", self._reformulate_query_node)
        self._graph_builder.add_node("retrieve", self._retrieve_node)
        self._graph_builder.add_node("respond", self._respond_node)
        # define edges
        self._graph_builder.add_edge(START, "input")
        # basic rag: no planning, just always retrieve
        self._graph_builder.add_conditional_edges("input", self._is_quitting_node, {False: "reformulate_query", True: END})
        self._graph_builder.add_edge("reformulate_query", "retrieve")
        self._graph_builder.add_edge("retrieve", "respond")
        self._graph_builder.add_edge("respond", "input")
        # compile the graph
        self._compile()

    def _reformulate_query_node(self, state: SimpleState) -> dict:
        prompt = self.reformulate_query_prompt.invoke(state)
        generated_query = self.llm.invoke(prompt)
        # since we use the generated query instead of the user query,
        # we need to remove the user query from the messages
        state["messages"].pop(-1)
        return {
            "messages": generated_query # append the generated query to the messages
        }
basic_plus_rag_chatbot = BasicPlusRAGChatbot(llm, k=7)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
basic_plus_rag_chatbot.run()
================================ Human Message =================================

what do I have about RAG?


================================== Ai Message ==================================

What information do I have about RAG?


================================ Human Message =================================

On material of session 06.05 and session 13.05
The first pitch will be dedicated to a custom RAG chatbot that the contractors (the presentingstudents, see the infos about Pitches) will have prepared to present. The RAG chatbot will have to beable to retrieve specific information from the given documents (not from the general knowledge!) anduse it in its responses. Specific requirements will be released on 22.05.
Reading: see session 06.05, session 08.05, session 13.05, and session 15.05
19.06.Ausfalltermin

Retrieval tools
Agentic RAG
Core Reading:
 Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and HybridApproach (pages 1-7), Google DeepMind & University of Michigan 
A Survey on Retrieval-Augmented Text Generation for Large Language Models (sections 1-7), York
University 
Additional Reading:
Don’t Do RAG: When Cache-Augmented Generation is All You Need for Knowledge Tasks, National
Chengchi University & Academia Sinica

12.05.25, 17:28Topics Overview - LLM-based Assistants
Page 5 of 12https://maxschmaltz.github.io/Course-LLM-based-Assistants/infos/topic_overview.html
Reading:
How to load PDFs, LangChain
Text splitters, LangChain
Embedding models, LangChain
Vector stores, LangChain
Retrievers, LangChain
 Retrieval augmented generation (RAG), LangChain
 LangGraph Quickstart: Build a Basic Chatbot (part 2), LangGraph
 Agentic RAG, LangGraph
Adaptive RAG, LangGraph
Multimodality, LangChain

Prompt Templates, LangChain
Few-shot prompting, LangChain
Week 413.05. Lecture: Virtual Assistants Pt. 2: RAGContinuing the first part, the second part will expand scope of chatbot functionality and will teach it torefer to custom knowledge base to retrieve and use user-specific information. Finally, the most widelyused deployment methods will be briefly introduced.
Key points:
General knowledge vs context
Knowledge indexing, retrieval & ranking
Retrieval tools
Agentic RAG
Core Reading:

12.05.25, 17:28Topics Overview - LLM-based Assistants
Page 8 of 12https://maxschmaltz.github.io/Course-LLM-based-Assistants/infos/topic_overview.html
Week 8: Having Some Rest10.06.Ausfalltermin
12.06.Ausfalltermin
Week 917.06. Pitch: RAG Chatbot
On material of session 06.05 and session 13.05

Chengchi University & Academia Sinica 
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection, University of
Washington, Allen Institute for AI & IBM Research AI
Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through QuestionComplexity, Korea Advanced Institute of Science and Technology
Auto-RAG: Autonomous Retrieval-Augmented Generation for Large Language Models, Chinese
Academy of Sciences

Disclaimer: the reading entries are no proper citations; the bibtex references as well as detailed infosabout the authors, publish date etc. can be found under the entry links.


================================== Ai Message ==================================

Based on the provided context, here's what I found about RAG (Retrieval Augmented Generation):

1. **Definition and Purpose**: RAG is a technique used in large language models to retrieve specific information from a given knowledge base and use it in its responses.
2. **Requirements**: For the pitch on June 17th, a custom RAG chatbot needs to be prepared, which can retrieve specific information from given documents (not from general knowledge) and use it in its responses. Specific requirements will be released on May 22nd.
3. **Reading Materials**: There are several reading materials available on RAG, including:
	* "Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach" by Google DeepMind & University of Michigan (pages 1-7)
	* "A Survey on Retrieval-Augmented Text Generation for Large Language Models" by York University (sections 1-7)
	* "Don't Do RAG: When Cache-Augmented Generation is All You Need for Knowledge Tasks" by National Chengchi University & Academia Sinica
4. **Types of RAG**: There are different types of RAG, including:
	* Agentic RAG
	* Adaptive RAG
	* Auto-RAG
	* Self-RAG
5. **Tools and Resources**: LangChain provides various tools and resources for RAG, including text splitters, embedding models, vector stores, retrievers, and prompt templates.
6. **Lecture and Pitch**: There will be a lecture on May 13th about Virtual Assistants Pt. 2: RAG, which will expand the scope of chatbot functionality and teach it to refer to a custom knowledge base. The pitch on June 17th will be dedicated to a custom RAG chatbot.

I hope this information helps! Let me know if you have any further questions.


================================ Human Message =================================

what is the name of the course?


================================== Ai Message ==================================

What is the name of the course being referred to in the context of the RAG chatbot and the upcoming pitch on June 17th?


================================ Human Message =================================

On material of session 06.05 and session 13.05
The first pitch will be dedicated to a custom RAG chatbot that the contractors (the presentingstudents, see the infos about Pitches) will have prepared to present. The RAG chatbot will have to beable to retrieve specific information from the given documents (not from the general knowledge!) anduse it in its responses. Specific requirements will be released on 22.05.
Reading: see session 06.05, session 08.05, session 13.05, and session 15.05
19.06.Ausfalltermin

12.05.25, 17:28Topics Overview - LLM-based Assistants
Page 8 of 12https://maxschmaltz.github.io/Course-LLM-based-Assistants/infos/topic_overview.html
Week 8: Having Some Rest10.06.Ausfalltermin
12.06.Ausfalltermin
Week 917.06. Pitch: RAG Chatbot
On material of session 06.05 and session 13.05

On material of session 17.07
The course will be concluded by the final debates, after which a short Q&A session will be held.
Debate topics:
LLM Behavior: Evidence of Awareness or Illusion of Understanding?
Should We Limit the Usage of AI?
Reading: see session 17.07
Copyright © 2025, Maksim ShmaltsMade with Sphinx and @pradyunsg's Furo

Prompt Templates, LangChain
Few-shot prompting, LangChain
Week 413.05. Lecture: Virtual Assistants Pt. 2: RAGContinuing the first part, the second part will expand scope of chatbot functionality and will teach it torefer to custom knowledge base to retrieve and use user-specific information. Finally, the most widelyused deployment methods will be briefly introduced.
Key points:
General knowledge vs context
Knowledge indexing, retrieval & ranking
Retrieval tools
Agentic RAG
Core Reading:

12.05.25, 17:28Topics Overview - LLM-based Assistants
Page 4 of 12https://maxschmaltz.github.io/Course-LLM-based-Assistants/infos/topic_overview.html
08.05. Lab: Basic LLM-based Chatbot
On material of session 06.05
In this lab, we’ll build a chatbot and try different prompts and settings to see how it affects the output.
Reading:
 Build a Chatbot, LangChain
 LangGraph Quickstart: Build a Basic Chatbot (parts 1, 3), LangGraph
 How to add summary of the conversation history, LangGraph

19.06.Ausfalltermin
Week 1024.06. Pitch: Handling Customer Requests in a Multi-agent Environment
On material of session 20.05

Academy of Sciences
Querying Databases with Function Calling, Weaviate, Contextual AI & Morningstar
15.05. Lab: RAG Chatbot
On material of session 13.05
In this lab, we’ll expand the functionality of the chatbot built at the last lab to connect it to user-specificinformation.


================================== Ai Message ==================================

The name of the course being referred to in the context of the RAG chatbot and the upcoming pitch on June 17th is "LLM-based Assistants".


================================ Human Message =================================

quit

3. Advanced RAG 😎

Now we can move to a more complicated implementation. We will now build an iterative RAG chatbot: it will retrieve contexts iteratively and decide at each step whether the chunks retrieved so far are sufficient to answer the question; the answer is generated only once the retrieved contexts are sufficient.

Basically, we have almost everything we need to implement an iterative RAG pipeline. We only need to add three more nodes:

  1. A node to generate search queries: instead of using the user query directly, we will generate dedicated queries for the index.

  2. A decision node, in which the LLM will decide whether the context retrieved so far is enough to proceed to generation of the response.

  3. A query transformer node that will reformulate the query to retrieve further chunks when needed.

As a useful addition, we will also add LLM-based filtering of the retrieved documents to filter out the documents that are semantically similar to the query but are not really relevant for answering the question.

Thus, we need to add four nodes in total.

We will start with the grader that will output a binary relevance score: True (relevant) or False (irrelevant). To implement this functionality, we'll bind the LLM to a true/false structured output.

from pydantic import BaseModel, Field
class YesNoVerdict(BaseModel):
    verdict: bool = Field(..., description="Boolean answer to the given binary question.")
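
As a quick sanity check, binding the LLM to this schema and invoking it returns a parsed YesNoVerdict object; a sketch (the question is made up):

# sketch: bind the LLM to the schema and test it on a made-up question
boolean_llm = llm.with_structured_output(YesNoVerdict)
verdict = boolean_llm.invoke("Is a retriever a part of a RAG pipeline?")
print(verdict.verdict)    # parsed into the Pydantic model, so this is a Python bool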

We will also need to extend the state so that it accumulates the contexts gathered so far.

class AdvancedRAGState(SimpleState): # "messages" is already defined in SimpleState
    contexts: List[List[Document]]    # this is the list of retrieved documents, one list per retrieval

We also need prompts to generate the search query, to filter the documents, to decide whether the contexts gathered so far are sufficient, and to transform the query if they are not.

generate_query_template = """\
The user is asking a question. \
You have access to a knowledge base. \
Your task is to generate a query that will retrieve the most relevant documents \
from the knowledge base to answer the user question. \
Return the query only, without any other text.


The user question is:
{input}
"""

generate_query_prompt = ChatPromptTemplate.from_template(generate_query_template)
context_relevant_template = """\
The user is asking a question. \
For answering the question, your colleague has retrieved the following document:


===========================
{context}
===========================


Your task is to assess whether this document is relevant for answering the user question. \
Relevant means that the document contains specific information that should be used \
directly to answer the user question. \
Return True if the document is relevant, and False otherwise.


The user question is:
{input}
"""

context_relevant_prompt = ChatPromptTemplate.from_template(context_relevant_template)
contexts_sufficient_template = """\
The user is asking a question. \

For answering the question, your colleague has retrieved the following documents:


===========================
{contexts_str}
===========================


Your task is to assess whether the retrieved documents contain an answer to the user question. \
Return True if the documents are sufficient, and False otherwise.


The user question is:
{input}
"""

contexts_sufficient_prompt = ChatPromptTemplate.from_template(contexts_sufficient_template)
transform_query_template = """\
The user is asking a question. \

For answering the question, your colleague has retrieved the following documents:


===========================
{contexts_str}
===========================


To retrieve these documents, the following query has been used:
{query}


However, the query is not very good so the retrieved documents were not helpful. \
Your task is to transform the query into a better one, so that the retrieved documents are more relevant. \
Return the transformed query only, without any other text.


The user question is:
{input}
"""

transform_query_prompt = ChatPromptTemplate.from_template(transform_query_template)
class IterativeRAGChatbot(BasicPlusRAGChatbot):

    _graph_path = "./graph_iterative_rag.png"

    def __init__(self, llm, k=5, max_generations=4):
        super().__init__(llm, k)
        self.max_generations = max_generations
        self.boolean_llm = llm.with_structured_output(YesNoVerdict)
        self.generate_query_prompt = generate_query_prompt
        self.context_relevant_grader = context_relevant_prompt | self.boolean_llm
        self.contexts_sufficient_grader = contexts_sufficient_prompt | self.boolean_llm
        self.transform_query_prompt = transform_query_prompt

    def _build(self):
        # graph builder
        self._graph_builder = StateGraph(AdvancedRAGState)
        # add the nodes
        self._graph_builder.add_node("input", self._input_node)
        self._graph_builder.add_node("reformulate_query", self._reformulate_query_node)
        self._graph_builder.add_node("generate_query", self._generate_query_node)
        self._graph_builder.add_node("retrieve", self._retrieve_node)
        self._graph_builder.add_node("filter_documents", self._filter_documents_node)
        self._graph_builder.add_node("transform_query", self._transform_query_node)
        self._graph_builder.add_node("respond", self._respond_node)
        # define edges
        self._graph_builder.add_edge(START, "input")
        # basic rag: no planning, just always retrieve
        self._graph_builder.add_conditional_edges("input", self._is_quitting_node, {False: "reformulate_query", True: END})
        self._graph_builder.add_edge("reformulate_query", "generate_query")
        self._graph_builder.add_edge("generate_query", "retrieve")
        self._graph_builder.add_edge("retrieve", "filter_documents")
        self._graph_builder.add_conditional_edges(
            "filter_documents",
            self._contexts_sufficient_node,
            {
                False: "transform_query",
                True: "respond",
                None: END   # max generations reached
            }
        )
        self._graph_builder.add_edge("transform_query", "retrieve")
        self._graph_builder.add_edge("respond", "input")
        # compile the graph
        self._compile()

    def _generate_query_node(self, state: AdvancedRAGState) -> dict:    
        user_query = state["messages"][-1].content  # that will be the reformulated user query
        prompt = self.generate_query_prompt.invoke({"input": user_query})
        search_query = self.llm.invoke(prompt)
        return {
            "messages": search_query
        }

    # now store the contexts in the separate field
    def _retrieve_node(self, state: AdvancedRAGState) -> dict:    
        # retrieve the context
        query = state["messages"][-1].content  # that will be the generated query
        # now use the retriever directly to get a list of documents and not a combined string
        contexts = self.retriever.invoke(query)
        # add the new batch of documents to the contexts
        return {
            "contexts": state["contexts"] + [contexts]  # could have also used `Annotated` here
        }
    
    def _filter_documents_node(self, state: AdvancedRAGState) -> dict:
        query = state["messages"][-1].content  # that will be the generated query
        # since the retrieved documents are graded at the same step,
        # we only need to pass the last batch of documents
        contexts = state["contexts"].pop(-1)  # will be replaced with the filtered ones
        # grade each document separately and only keep the relevant ones
        relevant_contexts = []
        for context in contexts:
            print("Grading document:\n\n", context.page_content)
            verdict = self.context_relevant_grader.invoke(
                {
                    "context": context.page_content,    # this is a Document object
                    "input": query
                }
            )
            print(f"Verdict: {verdict.verdict}")
            print(f"\n\n=====================\n\n")
            if verdict.verdict:    # boolean value according to the Pydantic model
                relevant_contexts.append(context)
        return {
            "contexts": state["contexts"] + [relevant_contexts]  # could have also used `Annotated` here
        }
    
    def _contexts_sufficient_node(self, state: AdvancedRAGState):
        # conditional-edge router: returns True / False, or None when the maximum
        # number of retrieval rounds has been reached
        query = state["messages"][-2].content   # that will be the reformulated user query, -1 is the generated search query
        all_contexts = state["contexts"]
        # flatten and transform the list of lists into a single list
        contexts = [context for sublist in all_contexts for context in sublist]
        contexts_str = "\n\n".join([context.page_content for context in contexts])
        print("Deciding whether the documents are sufficient")
        verdict = self.contexts_sufficient_grader.invoke(
                {
                "contexts_str": contexts_str,    # this is a Document object
                "input": query
            }
        )
        print(f"Verdict: {verdict.verdict}")
        print(f"\n\n=====================\n\n")
        if not verdict.verdict and len(all_contexts) == self.max_generations:
            return  # will route to END
        return verdict.verdict
    
    def _transform_query_node(self, state: AdvancedRAGState) -> dict:
        # since we will be replacing the user query with the transformed one,
        # we need to remove the old query
        search_query = state["messages"].pop(-1).content   # this is the generated search query
        # the reformulated user query is now the last message
        user_query = state["messages"][-1].content
        all_contexts = state["contexts"]
        # flatten and transform the list of lists into a single list
        contexts = [context for sublist in all_contexts for context in sublist]
        contexts_str = "\n\n".join([context.page_content for context in contexts])
        prompt = self.transform_query_prompt.invoke(
            {
                "contexts_str": contexts_str,
                "query": search_query,
                "input": user_query
            }
        )
        transformed_search_query = self.llm.invoke(prompt)
        return {
            "messages": transformed_search_query   # this will append the transformed query to the messages
        }
    
    def run(self):
        input = {"messages": [], "contexts": [], "suka": 0}
        for event in self.chatbot.stream(input, stream_mode="values"):
            if event["messages"]:
                event["messages"][-1].pretty_print()
                print("\n")
# make small k to ensure one retrieval is not enough
iterative_rag_chatbot = IterativeRAGChatbot(llm, k=2)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
iterative_rag_chatbot.run()
================================ Human Message =================================

what are the key point of the next lecture after the one on 13.05


================================== Ai Message ==================================

What are the key points of the next lecture after the one scheduled for May 13th?


================================== Ai Message ==================================

lecture schedule May 13th next lecture key points


================================== Ai Message ==================================

lecture schedule May 13th next lecture key points


Grading document:

 12.05.25, 17:28Topics Overview - LLM-based Assistants
Page 1 of 12https://maxschmaltz.github.io/Course-LLM-based-Assistants/infos/topic_overview.html
To p i c s  O v e r v i e wThe schedule is preliminary and subject to changes!
The reading for each lecture is given as references to the sources the respective lectures base on. Youare not obliged to read anything. However, you are strongly encouraged to read references marked bypin emojis
Verdict: False


=====================


Grading document:

 12.05.25, 17:28Topics Overview - LLM-based Assistants
Page 8 of 12https://maxschmaltz.github.io/Course-LLM-based-Assistants/infos/topic_overview.html
Week 8: Having Some Rest10.06.Ausfalltermin
12.06.Ausfalltermin
Week 917.06. Pitch: RAG Chatbot
On material of session 06.05 and session 13.05
Verdict: False


=====================


Deciding whether the documents are sufficient
Verdict: False


=====================


================================== Ai Message ==================================

lecture schedule May 13th next lecture key points


================================== Ai Message ==================================

next lecture after May 13th key points summary


================================== Ai Message ==================================

next lecture after May 13th key points summary


Grading document:

 12.05.25, 17:28Topics Overview - LLM-based Assistants
Page 1 of 12https://maxschmaltz.github.io/Course-LLM-based-Assistants/infos/topic_overview.html
To p i c s  O v e r v i e wThe schedule is preliminary and subject to changes!
The reading for each lecture is given as references to the sources the respective lectures base on. Youare not obliged to read anything. However, you are strongly encouraged to read references marked bypin emojis
Verdict: False


=====================


Grading document:

 : those are comprehensive overviews on the topics or important works that are beneficialfor a better understanding of the key concepts. For the pinned papers, I also specify the pages span foryou to focus on the most important fragments. Some of the sources are also marked with a popcornemoji
Verdict: False


=====================


Deciding whether the documents are sufficient
Verdict: False


=====================


================================== Ai Message ==================================

next lecture after May 13th key points summary


================================== Ai Message ==================================

next lecture after May 13th summary of key points


================================== Ai Message ==================================

next lecture after May 13th summary of key points


Grading document:

 : those are comprehensive overviews on the topics or important works that are beneficialfor a better understanding of the key concepts. For the pinned papers, I also specify the pages span foryou to focus on the most important fragments. Some of the sources are also marked with a popcornemoji
Verdict: False


=====================


Grading document:

 12.05.25, 17:28Topics Overview - LLM-based Assistants
Page 1 of 12https://maxschmaltz.github.io/Course-LLM-based-Assistants/infos/topic_overview.html
To p i c s  O v e r v i e wThe schedule is preliminary and subject to changes!
The reading for each lecture is given as references to the sources the respective lectures base on. Youare not obliged to read anything. However, you are strongly encouraged to read references marked bypin emojis
Verdict: False


=====================


Deciding whether the documents are sufficient
Verdict: False


=====================


================================== Ai Message ==================================

next lecture after May 13th summary of key points


================================== Ai Message ==================================

next lecture after May 13th key points summary notes


================================== Ai Message ==================================

next lecture after May 13th key points summary notes


Grading document:

 12.05.25, 17:28Topics Overview - LLM-based Assistants
Page 1 of 12https://maxschmaltz.github.io/Course-LLM-based-Assistants/infos/topic_overview.html
To p i c s  O v e r v i e wThe schedule is preliminary and subject to changes!
The reading for each lecture is given as references to the sources the respective lectures base on. Youare not obliged to read anything. However, you are strongly encouraged to read references marked bypin emojis
Verdict: False


=====================


Grading document:

 : those are comprehensive overviews on the topics or important works that are beneficialfor a better understanding of the key concepts. For the pinned papers, I also specify the pages span foryou to focus on the most important fragments. Some of the sources are also marked with a popcornemoji
Verdict: False


=====================


Deciding whether the documents are sufficient
Verdict: False


=====================


================================== Ai Message ==================================

next lecture after May 13th key points summary notes