How OpenAI Works in a B2B AI Assistant

When an employee asks "What is our parental leave policy?" in Microsoft Teams, a lot happens before OpenAI sees the question. Understanding the full flow helps you evaluate AI vendors, design better internal tools, and ask the right security questions.

OpenAI is layer 4 of 6: auth, permissions, and retrieval all happen before the model sees the question.
RAG grounds answers in your documents: without retrieval, the model answers from general training data only.
The wrapper determines data security: permissions, logging, and key handling are all application-layer concerns.
Model choice matters less than architecture: a well-built wrapper on GPT-4o beats a poorly built one on any model.
The Full Request Flow

What happens when an employee asks a question

Let us trace a single question from a new hire in Google Chat: "Where do I submit my equipment request?"

  1. Message delivery

    Google Chat receives the message and sends it to the registered bot webhook URL. The message payload includes the text, sender identity, and conversation context. This is a raw HTTP POST to the AI tool's backend.

  2. Authentication and permission lookup

    The backend validates that the sender is an authorized user (via SSO token or Google identity). It then looks up which department they belong to - in this case, Onboarding - and which knowledge base and instructions apply to them.

  3. RAG retrieval from the permitted knowledge base

    The question is converted into a vector embedding. The system searches the Onboarding department's vector store for the most relevant document chunks - in this case, sections of the onboarding handbook covering equipment requests. Only documents in the Onboarding store are searched; HR or Finance documents are inaccessible from this query path.

  4. Prompt assembly

    The system constructs the final prompt sent to OpenAI. It combines: the system instructions (tone, format, limitations), the retrieved document chunks as context, the conversation history if relevant, and the user's question. This assembled prompt is what OpenAI actually receives.

  5. OpenAI model call

The assembled prompt is sent to the OpenAI API (via BYOK with your own API key, or with the vendor's key). GPT-4o reads the context and generates a response grounded in the retrieved document content. The model does not have access to documents you have not uploaded - it answers based on what was retrieved in step 3.

  6. Response delivery and logging

    The response is formatted and sent back to Google Chat. The system logs the query, the retrieved sources, the response, and the user identity in the audit log. The employee sees the answer without ever leaving their chat app.
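The six steps above can be sketched in a few dozen lines of Python. Everything here is illustrative: the function names, user emails, and toy word-overlap retriever are assumptions, not ChatGridAI's actual implementation, and step 5 is shown as a comment so the sketch runs without an API key or vector database.

```python
# Illustrative sketch of steps 2-5 of the request flow. All names and
# data are hypothetical; the vector search is replaced by a toy
# word-overlap scorer so the example runs with no external services.

DEPARTMENT_OF_USER = {"new.hire@acme.example": "Onboarding"}

KNOWLEDGE_BASES = {
    "Onboarding": [
        "Equipment requests: submit the IT equipment form in the portal.",
        "Badge access: visit the security desk on your first day.",
    ],
    "HR": ["Parental leave: 16 weeks paid, see the HR handbook."],
}

def authorize(sender: str) -> str:
    """Step 2: map an authenticated sender to their department."""
    dept = DEPARTMENT_OF_USER.get(sender)
    if dept is None:
        raise PermissionError(f"{sender} is not an authorized user")
    return dept

def retrieve(question: str, department: str, k: int = 3) -> list[str]:
    """Step 3: search ONLY this department's store (toy overlap score)."""
    q_words = set(question.lower().split())
    chunks = KNOWLEDGE_BASES[department]  # other stores are unreachable here
    scored = sorted(chunks, key=lambda c: -len(q_words & set(c.lower().split())))
    return scored[:k]

def assemble_prompt(question: str, chunks: list[str]) -> list[dict]:
    """Step 4: build the messages actually sent to the model."""
    context = "\n".join(f"- {c}" for c in chunks)
    system = (
        "You are the internal assistant for Acme Corp. "
        "Answer only from the provided context.\n\nContext:\n" + context
    )
    return [{"role": "system", "content": system},
            {"role": "user", "content": question}]

question = "Where do I submit my equipment request?"
messages = assemble_prompt(question, retrieve(question, authorize(question and "new.hire@acme.example")))
# Step 5 would be roughly:
#   client.chat.completions.create(model="gpt-4o", messages=messages)
print(messages[0]["content"])
```

Note that the HR store is never touched: the department lookup in step 2 decides which store step 3 is even allowed to search, which is the permission model the rest of this article relies on.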

Why the Wrapper Matters

What you lose when the application layer is weak

No auth: anyone can query

Without proper authentication, anyone who discovers the bot endpoint can ask questions and receive answers from your internal documents. Auth is not optional - it is the first security gate.
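A minimal sketch of that first gate, assuming a shared-secret bearer token on the webhook (real deployments would verify an SSO/OIDC token instead, and load the secret from configuration):

```python
# Hypothetical webhook auth check: reject any request that does not
# carry the expected bearer token. Header name and secret are
# illustrative only.
import hmac

WEBHOOK_SECRET = "s3cr3t-shared-with-google-chat"  # from config in practice

def is_authorized(headers: dict) -> bool:
    auth = headers.get("Authorization", "")
    if not auth.startswith("Bearer "):
        return False
    token = auth[len("Bearer "):]
    # constant-time comparison avoids timing side channels
    return hmac.compare_digest(token, WEBHOOK_SECRET)

print(is_authorized({"Authorization": "Bearer s3cr3t-shared-with-google-chat"}))
print(is_authorized({}))
```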

No permissions: wrong documents surface

Without source-level permissions in the retrieval step, a Finance employee can ask a question and receive an answer grounded in HR documents they should not see. The model does not know your org structure - the wrapper does.
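One way to enforce this, assuming chunks carry a department tag (the field names are illustrative), is a hard filter that runs before any similarity scoring, so out-of-scope documents can never reach the ranking step, let alone the prompt:

```python
# Sketch of source-level permissions inside retrieval: filter first,
# rank later. Structure and field names are assumptions for this example.
CHUNKS = [
    {"dept": "HR",      "text": "Parental leave is 16 weeks paid."},
    {"dept": "Finance", "text": "Expense reports are due by the 5th."},
    {"dept": "HR",      "text": "PTO accrues at 1.5 days per month."},
]

def permitted_chunks(user_depts: set[str]) -> list[dict]:
    # Similarity search would operate only on this filtered subset.
    return [c for c in CHUNKS if c["dept"] in user_depts]

finance_view = permitted_chunks({"Finance"})
print([c["text"] for c in finance_view])
```

The design point is ordering: filtering after retrieval risks leaking restricted text through ranking scores or bugs; filtering before it makes the leak structurally impossible.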

No RAG: generic answers only

Without document retrieval, the model answers from its general training data. Ask about your PTO policy and you get a generic answer about typical PTO policies - not your actual handbook. RAG is what makes the AI useful for internal knowledge.

No audit log: no accountability

Without logging, you cannot answer "what did the AI tell employees last month?" or investigate a report that the AI gave incorrect advice. Audit logging is also required by many compliance frameworks once AI touches employee or financial data.
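One plausible shape for such a log entry (all field names are assumptions): enough to reconstruct who asked what, what the AI answered, and which sources grounded it.

```python
# Hypothetical audit-log record, serialized as one JSON line per query.
import json
from datetime import datetime, timezone

def audit_record(user: str, query: str, sources: list[str], answer: str) -> str:
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "query": query,
        "sources": sources,   # which chunks grounded the answer
        "answer": answer,
    }
    return json.dumps(entry)  # append to a write-once log in practice

line = audit_record("new.hire@acme.example",
                    "Where do I submit my equipment request?",
                    ["onboarding-handbook#equipment"],
                    "Use the IT equipment form in the portal.")
print(line)
```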

FAQ

How B2B AI assistants work - common questions

Does the assistant send my entire document library to OpenAI?

No. The retrieval step selects only the most relevant chunks from your document library - typically 3 to 10 passages. Only those chunks are included in the prompt sent to OpenAI. The full library is never sent as context. This is more efficient (fewer tokens, lower cost) and also means the model's answer is grounded in specific retrieved content rather than vague general knowledge.

What goes into the system prompt?

The system prompt typically includes: the persona or role of the assistant (e.g. "You are the HR assistant for Acme Corp"), behavioral instructions (tone, format, length), constraints (e.g. "only answer questions based on the provided context"), and the retrieved document chunks as context. It is assembled by the application layer for each query - not a static one-size-fits-all instruction.

Can the assistant still hallucinate?

Yes, though good system prompt design significantly reduces it. RAG provides grounding context, but the model can still generate statements that go beyond the retrieved text. The best mitigation is a system prompt that instructs the model to answer only from the provided context and to say it does not know when the retrieved documents do not contain a clear answer. Citing the retrieved sources also lets users verify answers.

How is this different from just using ChatGPT?

ChatGPT uses only the model's general training data plus anything you paste into the conversation. A B2B AI assistant adds: automatic retrieval from your private documents, authentication so only authorized employees can use it, per-department isolation, system prompt customization, audit logging, and delivery inside tools like Teams or Google Chat. The model underneath may be the same - the wrapper is what makes it useful and safe for enterprise use.
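To make the hallucination mitigation described above concrete, here is one way the grounding instruction could look. The wording is illustrative, not a quoted ChatGridAI prompt:

```python
# Hypothetical grounding rules prepended to every system prompt.
GROUNDING_RULES = (
    "Answer ONLY from the context below. "
    "If the context does not clearly answer the question, reply: "
    '"I don\'t know based on the available documents." '
    "Cite the source of every claim."
)

def build_system_prompt(persona: str, context: str) -> str:
    return f"{persona}\n\n{GROUNDING_RULES}\n\nContext:\n{context}"

prompt = build_system_prompt(
    "You are the HR assistant for Acme Corp.",
    "- PTO accrues at 1.5 days per month.",
)
print(prompt)
```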

The full application layer, ready to deploy.

ChatGridAI handles auth, RAG, permissions, BYOK, and audit logging. You just upload your documents.

$5/seat/month - 14-day free trial - no credit card required