API Calls
This HBS case, about implementing generative AI at Deloitte Canada, is a useful tool for teaching the benefits, risks, and strategies of adopting AI in a corporate environment. Efficiency is, of course, the key benefit, and data security, client trust, and compliance are the key concerns. The general strategy for implementation involves creating a custom channel through which all employees chat with an LLM: a channel that passes relevant corporate standards to the LLM in each chat, that allows monitoring, and that includes a guarantee of data privacy. To complement the case, I show students that they can use Julius to create a mini version of this, namely a custom chatbot that adds whatever context they want to each prompt. This can be done with a custom GPT from OpenAI, but seeing code that produces a custom chatbot makes everything more transparent.
To create this, we need an API key from an LLM provider. As of this writing, OpenAI provides keys even on free accounts. Billing is on a per-use basis, and it is minuscule at demonstration scales. As I will explain, there are also API keys for LLMs that are completely free. Julius provides a facility for storing an API key as a “secret key”; the key is not exposed to the LLM during the coding that creates the custom chatbot.
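Under the hood, Julius’s “secret key” facility works by exposing the stored key to the code as an environment variable, which is why the code below reads it with os.environ.get. A minimal sketch of just that step:

import os

# The platform injects the stored secret as an environment variable;
# the key never appears in the source code itself
api_key = os.environ.get('OPENAI_API_KEY')
if api_key is None:
    raise RuntimeError("OPENAI_API_KEY is not set")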
To start the exercise, it is useful to show students the code that executes an API call without all the surrounding code that creates an app. I gave Julius the following example prompt:¹ “send the following to openAI GPT-4 using my api key that is stored as a secret key - What is the best type of pizza? Answer in pig Latin.” Here is the code that Claude 3.5 wrote in response:
from openai import OpenAI
import os

# Initialize the OpenAI client with the API key
client = OpenAI(api_key=os.environ.get('OPENAI_API_KEY'))

# Send the query
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": "What is the best type of pizza? \
Answer in pig Latin."}
    ]
)

# Print the response
print("Response:", response.choices[0].message.content)
We can see from the code that the messages will be sent to OpenAI’s GPT-4. (Note that the LLM you’re using in Julius will balk if you ask for an API call to a model that appeared after its own training; for example, you can’t get o3 using Claude 3.5. But you can edit the code in Julius to select the model you want.) The “messages” object is a list containing a single dictionary (it can contain many more, as we will see in a moment). The dictionary has two keys: “role” and “content.” The response object contains the LLM’s response, and the content of the response is accessed as response.choices[0].message.content. When the code is executed, the computer executing it sends the messages, gets the response, and prints the content of the response. All of the work is done by the “client” object, which is created by the OpenAI constructor using my API key, read by os.environ.get from its “secret key” storage. What I got from Julius after the code executed was “The GPT-4 model responded with: Response: Epperonipay.”
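As an illustration of that last point, selecting a different model is a one-line edit to the generated code, reusing the client object from above. A minimal sketch, where “gpt-4o” stands in for whatever current model name the provider lists:

# Edit the model argument by hand to target a model that the coding
# LLM doesn't know about; "gpt-4o" is only an illustrative name
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "What is the best type of pizza? \
Answer in pig Latin."}
    ]
)
print("Response:", response.choices[0].message.content)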
To create a custom chatbot, it is enough to say something like “Create a custom chatbot as a streamlit app. It should send prompts to GPT-4 using my openai api key that is stored as a secret key. The following additional context should be provided: You are a business professional. Answer in a business-like fashion.” Of course, you could also do more interesting things, like requesting an answer in Pig Latin. The heart of the code that Claude 3.5 wrote for me is:
messages = [
    {"role": "system", "content": "You are a business professional. \
Answer in a business-like fashion."}
]

while True:
    # Get user input
    user_input = input("You: ").strip()

    # Check for quit command
    if user_input.lower() == 'quit':
        break

    # Add user message to history
    messages.append({"role": "user", "content": user_input})

    # Get response from OpenAI
    response = client.chat.completions.create(
        model="gpt-4",
        messages=messages,
        stream=True
    )

    # Print assistant response
    print("Bot: ", end="", flush=True)
    full_response = ""

    # Stream the response
    for chunk in response:
        if chunk.choices[0].delta.content is not None:
            content = chunk.choices[0].delta.content
            print(content, end="", flush=True)
            full_response += content
    print()  # New line after response

    # Add assistant response to history
    messages.append({"role": "assistant", "content": full_response})
This initializes the message list with the context I provided about being a business professional. Then it sets up a “while” loop that runs until the app is exited. Within the loop, each new prompt is added to the message list, and the now-longer list is sent to the LLM. Each LLM response is printed and also added to the message list. In this way, the LLM receives the entire chat history along with each user prompt.
On the first occasion that I requested this Streamlit app from Julius, I received instead a Python script (a .py file) that, when executed in my terminal, created a chatbot running in the terminal. That is interesting too. If and when you do get a Streamlit app and want to deploy it to the cloud, take advantage of the cloud provider’s secret-key procedure to maintain the security of your API key.
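For reference, here is a minimal sketch of what the Streamlit version might look like; it assumes the key is stored in Streamlit’s secrets under the name OPENAI_API_KEY, and the app Julius generates will differ in its details:

import streamlit as st
from openai import OpenAI

client = OpenAI(api_key=st.secrets["OPENAI_API_KEY"])

# Keep the chat history across Streamlit reruns in session state
if "messages" not in st.session_state:
    st.session_state.messages = [
        {"role": "system", "content": "You are a business professional. \
Answer in a business-like fashion."}
    ]

# Replay prior messages (skipping the system message)
for msg in st.session_state.messages[1:]:
    st.chat_message(msg["role"]).write(msg["content"])

# Each new prompt is appended and the full history is sent
if prompt := st.chat_input("You:"):
    st.session_state.messages.append({"role": "user", "content": prompt})
    st.chat_message("user").write(prompt)
    stream = client.chat.completions.create(
        model="gpt-4",
        messages=st.session_state.messages,
        stream=True
    )
    with st.chat_message("assistant"):
        reply = st.write_stream(stream)
    st.session_state.messages.append({"role": "assistant", "content": reply})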
We can get free access to a number of open-source LLMs via services like Hugging Face and Open Router. Open Router describes itself as “The Unified Interface for LLMs.” It provides access to over 300 LLMs as of this writing. It is worth visiting Open Router in class just to show students how many LLMs have been created.
The essence of the Open Router service is that we get an API key from the provider(s) we want to use, store the key(s) at Open Router, and also get an API key from Open Router. We direct all our API calls to Open Router, specifying an LLM to use, and Open Router forwards the calls to that LLM. This provides a unified syntax for API calls (Open Router uses the OpenAI standard). For someone who is not a heavy user of API calls, the prime benefit of using Open Router is that it provides free access to many open-source LLMs.
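Because Open Router follows the OpenAI standard, the earlier code needs only a different base URL and key. A minimal sketch, assuming the Open Router key is stored as a secret key named OPENROUTER_API_KEY and that the free Llama model ID shown is still listed (model IDs change over time):

import os
from openai import OpenAI

# Reuse the OpenAI client, pointed at Open Router's endpoint
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ.get('OPENROUTER_API_KEY')
)
response = client.chat.completions.create(
    # Any model listed at openrouter.ai; the ":free" suffix selects
    # a free tier where one is offered
    model="meta-llama/llama-4-scout:free",
    messages=[{"role": "user", "content": "What is the best type of pizza?"}]
)
print(response.choices[0].message.content)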
I take this discussion as an opportunity to show students Replit, which is an AI + coding platform focused on app creation. Before class, as an illustration, I created a chatbot app that scrapes headlines from The Guardian, sends them to the free Llama 4 Scout model to determine whether it is a good, bad, or mixed news day, and then allows for additional chatting. It is, of course, just a toy, but it is a concrete example of the chatbot creation process. By the way, I was happy I did this before class rather than during class, because Replit experienced a frustrating amount of confusion between OPENAI_API_KEY and OPEN_ROUTER_API_KEY. LLMs are usually very smart but can sometimes be surprisingly dumb.
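For anyone curious, here is a rough sketch of the scraping-plus-prompt step of such an app. It is not my actual Replit code; the URL and the choice of heading tags are guesses at what works, and The Guardian’s terms of use apply:

import requests
from bs4 import BeautifulSoup

# Grab heading text from the front page as a stand-in for "headlines";
# which tags hold them is an assumption about the page's markup
html = requests.get("https://www.theguardian.com/international",
                    timeout=10).text
soup = BeautifulSoup(html, "html.parser")
headlines = [h.get_text(strip=True) for h in soup.find_all("h3")][:20]

prompt = ("Here are today's headlines:\n" + "\n".join(headlines) +
          "\nIs this a good, bad, or mixed news day? Explain briefly.")
# The prompt would then go to the free Llama model through the Open
# Router client shown above, and the chat loop continues from there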
Also on substack at kerryback.substack.com
Footnotes
Pig Latin is an old English-language word game, mostly of interest to children, that moves any initial consonant sound of a word to the end and adds “ay” to the end of each word. I think I should credit this example to some blog post that I read at some point, but which post escapes me.