LLM Inference Guide
For this course, we will be using NVIDIA Cloud, which generously hosts various open-source LLMs and provides a free API limited to 40 requests per minute (RPM). This guide shows how to set up your account and start using the LLMs.
Prerequisites
Install Python on your machine.
Install Git.
Create an account at NVIDIA Developer Program with your student email: firstname.lastname@student.uni-tuebingen.de.
Environment Setup
It is good practice to have a separate, isolated environment for each project. Such an environment includes all of your code, resources, tests, etc., as well as the dependencies and (sometimes) executables.
Make a new directory where your project will be stored and open it in your IDE.
Open the terminal. If you are a Windows user, open Git Bash (available after installing Git) rather than the default cmd.
Create a Python virtual environment with venv or conda. The virtual environment is a directory that contains a copy of Python and pip as well as all the dependencies. An example of creating a virtual environment in the folder .venv with venv is below:
python3 -m venv .venv             # create the virtual environment in .venv
source .venv/bin/activate         # activate on Unix-based systems (including macOS)
source .venv/Scripts/activate     # activate on Windows (Git Bash)
Install the requirements. Here, for the purposes of testing the setup, we only need langchain_nvidia_ai_endpoints and python-dotenv:
pip install langchain_nvidia_ai_endpoints python-dotenv
A more robust alternative (and the one commonly used in practice) is to create a requirements.txt file with pinned versions like this:
langchain_nvidia_ai_endpoints==0.3.9
python-dotenv==1.1.0
and then execute
pip install -r requirements.txt
To keep this guide short, we do not create a git repository.
Getting API Key
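Before moving on, it can help to confirm that the environment is active and the packages were installed into it. The following is a minimal sketch using only the standard library; the helper names in_virtualenv and installed_version are made up for this illustration, and the check on sys.prefix only recognises environments created with venv or virtualenv, not conda.

```python
import sys
from importlib import metadata


def in_virtualenv():
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    # Note: this does not detect conda environments.
    return sys.prefix != sys.base_prefix


def installed_version(package):
    # Return the installed version string, or None if the package is absent.
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    # PyPI distribution names (with hyphens) of the two requirements
    for package in ("langchain-nvidia-ai-endpoints", "python-dotenv"):
        print(package, "->", installed_version(package) or "not installed")
```

If a package shows up as not installed, the most likely cause is that pip was run before the environment was activated.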
Now that you have completed all the prerequisites and prepared an environment to work in, you only need to configure an API key.
Create a .env file in the root of your project directory with the following variable (leave the value empty for now):
NVIDIA_API_KEY=""
Log in to NVIDIA Cloud with the account you created in the prerequisites.
Go to your profile (upper right corner) > API Keys. Click Generate API Key, name the key and copy it.
Put the key value into your .env file as the value of the NVIDIA_API_KEY variable.
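To verify that the key is actually picked up without printing it in full, a small check like the one below can be used. It is only a sketch: describe_api_key is a hypothetical helper written for this guide, and it assumes the .env file has already been loaded into os.environ (for example with dotenv.load_dotenv(), as in the test script of this guide).

```python
import os


def describe_api_key(env=os.environ, name="NVIDIA_API_KEY"):
    """Return a short status that is safe to print; never the full key."""
    # Strip whitespace and stray quotes in case they ended up in the value
    value = env.get(name, "").strip().strip('"')
    if not value:
        return f"{name} is missing or empty"
    return f"{name} is set ({len(value)} characters, ends with ...{value[-4:]})"


if __name__ == "__main__":
    print(describe_api_key())
```

Printing only the length and the last few characters is a deliberate choice here, so the key does not end up in terminal history or screenshots.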
Test
Finally, you can test whether the API works for you. Here is some sample code you can run in a .py file in the root of your project directory:
import os
import dotenv
from langchain_nvidia_ai_endpoints import ChatNVIDIA
from langchain_core.messages import SystemMessage, HumanMessage
from langchain_core.rate_limiters import InMemoryRateLimiter
dotenv.load_dotenv()  # loads the variables from the .env file into os.environ
# choose any model, catalogue is available under https://build.nvidia.com/models
MODEL_NAME = "meta/llama-3.3-70b-instruct"
# prompts are usually stored in a separate file
# but for the sake of simplicity, we will have it here
SYSTEM_MESSAGE = "You are a medieval French knight."
# the simplest example (synchronous implementation)
class Agent:
    def __init__(self):
        # this rate limiter will ensure we do not exceed the rate limit
        # of 40 RPM given by NVIDIA
        rate_limiter = InMemoryRateLimiter(
            requests_per_second=35 / 60,  # 35 requests per minute to be sure
            check_every_n_seconds=0.1,  # wake up every 100 ms to check whether a request is allowed
            max_bucket_size=7,  # controls the maximum burst size
        )
        self.llm = ChatNVIDIA(
            model=MODEL_NAME,
            api_key=os.getenv("NVIDIA_API_KEY"),
            temperature=0,  # reduce randomness for more reproducible output
            rate_limiter=rate_limiter,  # bind the rate limiter
        )

    def invoke(self, user_query):
        # prepare the messages
        messages = [
            SystemMessage(content=SYSTEM_MESSAGE),
            HumanMessage(content=user_query),
        ]
        # inference
        response = self.llm.invoke(messages)
        return response.content
if __name__ == "__main__":
    agent = Agent()
    # ask the knight a question
    user_query = "Give me a summary of the Battle of Agincourt."
    response = agent.invoke(user_query)
    print(response)
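The rate limiter parameters in the sample code are easier to reason about with a picture of a token bucket in mind, which is the scheme InMemoryRateLimiter is based on. The sketch below is a simplified, standard-library illustration of that idea and not LangChain's implementation; the TokenBucket class, its clock parameter, and the choice to start with an empty bucket are assumptions made for this example.

```python
import time


class TokenBucket:
    """Simplified token bucket, for illustration only."""

    def __init__(self, requests_per_second, max_bucket_size, clock=time.monotonic):
        self.rate = requests_per_second   # tokens added per second
        self.capacity = max_bucket_size   # upper bound on saved-up tokens
        self.clock = clock                # injectable so the logic can be tested
        self.tokens = 0.0                 # assumption: the bucket starts empty
        self.last = clock()

    def try_acquire(self):
        # Refill according to the elapsed time, capped at the bucket size
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        # Each request consumes one token; without a token it has to wait
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Same numbers as in the sample code: about 0.58 tokens per second, bursts of up to 7
bucket = TokenBucket(requests_per_second=35 / 60, max_bucket_size=7)
```

With these numbers, a new token becomes available roughly every 1.7 seconds, and after a long idle period at most 7 requests can go out back to back before the limiter starts spacing them again.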
Next Steps
For now, you’re good to go! An example of the directory you should have after going through this guide is stored here (excluding .venv and .env).
Later, for each of the projects, you will only need to repeat the environment setup and steps 1 and 4 of Getting API Key; instead of the sample code, you will have cool, more complex stuff, but we’ll get to that later.
Contact me if you have any questions or run into problems during the setup.