LLM Inference Guide - LLM-based Assistants

For this course, we will be using NVIDIA Cloud that generously hosts various open-source LLMs and provides a free API limited by 40 requests per minute (RPM). This guide shows how you set up your account and start using the LLMs.

Contents¶

Prerequisites
Environment Setup
Getting API Key
Test
Next Steps

Prerequisites¶

Install Python on your machine.
Install Git.
Install any IDE of your choice: VS Code, Pycharm etc.
Create an account at NVIDIA Developer Program with your student email: firstname.lastname@student.uni-tuebingen.de.

Environment Setup¶

It is a good practice to have a separate isolated environment for each project. Such environment includes all of your code, resources, tests etc, as well as dependencies, (sometimes) executables and such.

Make a new directory where your project will be stored and open it in your IDE.
Open the terminal. If you are a Windows user, open GitBash (will be available after Git installation) and not the default cmd.
Create a Python virtual environment with venv or conda.

!python3 -m venv .venv  # create a copy of Python and so

Activate your environment to tell the interpreter that you will be working with this particular copy of Python:

!source .venv/bin/activate # for Unix-based (including MacOS)

!source .venv/Scripts/activate   # for Windows

Install requirements. Here, for the test purposes, we only need langchain_nvidia_ai_endpoints and python-dotenv:

%pip install langchain_nvidia_ai_endpoints python-dotenv

Note: a more robust (and really used) alternative is to create a requirements.txt file like this:

langchain_nvidia_ai_endpoints==0.3.18
python-dotenv==1.2.1

and then execute

pip install -r requirements.txt

Getting API Key¶

Now that you have completed all the prerequisites and prepared an environment to work in, you only need to configure an API key.

Create an empty .env file with the following variable (leave empty for now):
```
NVIDIA_API_KEY=""
```
Log in to NVIDIA Cloud with the account you created in prerequisites.
Go to your profile (upper right corner) > API Keys. Click Generate API Key, name it and copy.
Put the key value to your .env under the NVIDIA_API_KEY variable.

Test¶

Finally, you can test if the API works for you. Below is a sample code you can run for that purpose.

import os
import dotenv
from langchain_nvidia_ai_endpoints import ChatNVIDIA
from langchain_core.messages import SystemMessage, HumanMessage
from langchain_core.rate_limiters import InMemoryRateLimiter

dotenv.load_dotenv()    # that loads the .env file variables into os.environ

True

# choose any model, catalogue is available under https://build.nvidia.com/models
MODEL_NAME = "meta/llama-3.3-70b-instruct"

# prompts are usually stored in a separate file
# but for the sake of simplicity, we will have it here
SYSTEM_MESSAGE = "You are a medieval French knight."

class Agent:

    def __init__(self):
        # this rate limiter will ensure we do not exceed the rate limit
        # of 40 RPM given by NVIDIA
        rate_limiter = InMemoryRateLimiter(
            requests_per_second=35 / 60,  # 35 requests per minute to be sure
            check_every_n_seconds=0.1,  # wake up every 100 ms to check whether allowed to make a request,
            max_bucket_size=7,  # controls the maximum burst size
        )
        self.llm = ChatNVIDIA(
            model=MODEL_NAME,
            api_key=os.getenv("NVIDIA_API_KEY"), 
            temperature=0,   # ensure reproducibility,
            rate_limiter=rate_limiter  # bind the rate limiter
        )

    # the simplest example (synchronous implementation)
    def invoke(self, user_query):
        # prepare the messages
        messages = [
            SystemMessage(
                content=SYSTEM_MESSAGE
            ),
            HumanMessage(
                content=user_query
            )
        ]
        # inference
        response = self.llm.invoke(messages)
        return response.content

if __name__ == "__main__":
    agent = Agent()
    # ask the knight a question
    user_query = "Give me a summary of the Battle of Agincourt."
    response = agent.invoke(user_query)
    print(response)

Bonjour, my friend. I, Sir Guillaume, shall recount to thee the tale of the Battle of Agincourt, a most glorious and bloody conflict that took place on the 25th day of October, in the year of our Lord 1415.

As a knight of the realm, I must admit that our French army, led by the noble Charles d'Albret, Constable of France, and Charles, Duke of Orléans, was confident in our superior numbers and chivalry. We had assembled a vast host, with estimates ranging from 20,000 to 30,000 men-at-arms, knights, and men of the common sort.

However, the English army, led by the cunning King Henry V, had other plans. Though outnumbered, with a force of around 6,000 to 9,000 men, they had chosen a most advantageous position, with a narrow field between two forests, which funneled our attacks and limited our mobility.

The English, with their longbowmen and men-at-arms, formed a defensive line, with stakes driven into the ground to protect themselves from our cavalry charges. Our French knights, weighed down by our armor and hindered by the muddy terrain, charged valiantly, but were cut down by the hail of arrows and the English defensive line.

The battle raged on for hours, with our knights and men-at-arms making repeated charges, only to be repelled by the English. The mud and the stakes proved to be our undoing, as our horses became mired and our men were unable to breach the English lines.

In the end, it is said that the English suffered fewer than 100 casualties, while our French army lost thousands, including many noble knights and men of high birth. The Constable of France, Charles d'Albret, was among the fallen, and the Duke of Orléans was taken prisoner.

Verily, the Battle of Agincourt was a dark day for France, and a testament to the cunning and bravery of the English. Mayhap, one day, we shall avenge this defeat and restore the honor of our noble kingdom. Vive la France!

Next Steps¶

As for now, you’re good to go! Later, for each of the projects, you will only do the environment setup and the steps 1 and 4 of getting API Key; and instead of the sample code, you will have cool complex stuff, but we’ll get to that yet.

Contact me in case of any questions and problems you encounter during the setup.