Protocol specification
HI - Human Interaction Protocol For Agents/Bots
Overview
With the increasing use of AI agents in decision-making workflows, ensuring smooth, transparent, and standardized interactions between agents and human stakeholders is critical. The effective use of AI agents requires humans and AI working together as a team.
This document outlines key recommendations for designing agent-to-human interactions, integrating existing standards and protocols, and includes implementation examples using JavaScript (Node.js).
Scope and principles
HumanInteraction defines a standard way for agents (the LLM/thinking layer) to express what they expect from humans. The protocol specifies who should receive what, what response is expected, the delivery constraints, and the context needed to execute the interaction.
- Supports only Agent-Human interaction; it does not support bots or inter-agent communication.
- The spec must be reusable across delivery channels such as chat, email, and UI.
- The spec must handle privacy, content variants, and intent, but not delivery orchestration.
- Message visibility and personalization are critical, especially when messages fan out to multiple participants.
- The spec should carry enough context to detect when a message has gone stale, whether because of a delayed response or action, a change in system state, or a change in conversation context.
- The spec should describe the response expected from humans and give high-level context on how and when that expectation is considered resolved. In most cases this can be non-deterministic, but the spec should be extensible to make it more deterministic.
- The spec data format must be simple for LLMs to generate, reducing errors due to hallucination.
- It should be easy to apply HumanInteraction as a wrapping layer on top of the current AI input/response structure, making adoption easier.
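One way to read the staleness principle is as a check that runs before a human response is accepted. The sketch below is illustrative only: the `context.expiresAt` and `context.stateVersion` fields and the `isStale` helper are hypothetical and are not part of the message format defined in section 2.2.

```javascript
// Hypothetical staleness check. `context.expiresAt` and `context.stateVersion`
// are assumed fields, not part of the schema in section 2.2.
function isStale(interaction, currentStateVersion, now = new Date()) {
  const context = interaction.context || {};

  // Stale if the response window has closed.
  if (context.expiresAt && now.getTime() > Date.parse(context.expiresAt)) {
    return true;
  }

  // Stale if the system state changed after the request was created.
  if (context.stateVersion !== undefined &&
      context.stateVersion !== currentStateVersion) {
    return true;
  }

  return false;
}

const request = {
  interactionId: 'abc123',
  context: { expiresAt: '2025-06-24T12:00:00Z', stateVersion: 7 },
};

console.log(isStale(request, 7, new Date('2025-06-24T11:00:00Z'))); // false
console.log(isStale(request, 8, new Date('2025-06-24T11:00:00Z'))); // true
console.log(isStale(request, 7, new Date('2025-06-24T13:00:00Z'))); // true
```

A request with no `context` is treated as never stale here; a stricter deployment might reject such requests instead.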
1. Core Design Principles
1.1 Human-in-the-Loop (HITL)
Agents should be able to:
- Pause workflows awaiting human decision.
- Escalate low-confidence outputs.
- Accept feedback and apply corrections.
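The escalation rule can be sketched as a threshold check on the agent's reported confidence. The `decideNextStep` helper and the 0.7 threshold are assumptions chosen for illustration; the `confidence` field matches the `agentOutput` object in the message format of section 2.2.

```javascript
// Assumed threshold; tune per workflow and risk level.
const CONFIDENCE_THRESHOLD = 0.7;

// Returns 'escalate' when a human should review the output, else 'proceed'.
function decideNextStep(agentOutput, threshold = CONFIDENCE_THRESHOLD) {
  const confidence = agentOutput && agentOutput.confidence;
  // A missing or non-numeric confidence is treated as low confidence.
  if (typeof confidence !== 'number' || Number.isNaN(confidence)) {
    return 'escalate';
  }
  return confidence < threshold ? 'escalate' : 'proceed';
}

console.log(decideNextStep({ summary: 'This invoice exceeds the threshold.', confidence: 0.62 })); // escalate
console.log(decideNextStep({ summary: 'Routine invoice.', confidence: 0.93 })); // proceed
```

Treating a missing confidence as a reason to escalate is a conservative default: the workflow pauses for a human rather than proceeding on an unscored output.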
1.2 Transparency & Auditability
Interactions must be:
- Logged with timestamps.
- Traceable (who approved what and why).
- Explainable (why did the agent reach its output?).
1.3 Usability (based on ISO 9241)
- Provide control to users.
- Display clear options (e.g., approve, revise, reject).
- Maintain a consistent and predictable interface.
1.4 Governance & Human Awareness (based on NIST HITL)
- Include metadata for reviewer identification and role-based accountability.
- Support fields that enable tracking of human factors such as trust and cognitive load.
- Accommodate multi-human collaboration, consensus resolution, and traceable roles.
2. Specification: Agent-Human Interaction Protocol
2.1 Purpose
To define a standard message schema and interaction flow between autonomous agents and one or more human reviewers.
2.1.1 We propose HumanInteraction, a specification of an Agent’s request for interaction with humans to deliver the best outcome.
2.1.2 The Agent prescribes one or more HumanInteraction requests, and the Human provides the HumanFeedback.
2.1.3 Agent may prescribe multiple HumanInteraction in a single interaction.
2.1.4 The result of HumanInteraction (the HumanFeedback) can be consumed by the Agent to deliver the best outcome.
2.1.5 These interactions are logged and can be used for accountability and governance.
2.1.6 These interactions can be as simple as a single question to one human or as complex as approval from multiple people in a sequence, notification to others, and optional input from some. These multi-party interactions may lead to conflicts and require resolution.
2.1.7 A HumanInteraction can be a necessary step (e.g., clarification or approval) toward completing the human’s initial request, in which case the Agent generally needs the human feedback to move forward. It can also be the final outcome, e.g., notifying humans about the result of the request.
2.1.8 For simplicity, the scope of HumanInteraction is limited to the immediate interaction after an event (the agent’s output), not the series of interactions with one human. In other words, HumanInteraction represents the smallest unit of interaction prescribed by the Agent.
2.2 Message Format (JSON)
A generic interaction message supporting unified collaboration can be represented in JSON:

    {
      "interactionId": "abc123",
      "groupId": "grp789",
      "coordinationMode": "centralized",
      "resolutionPolicy": "majority",
      "timestamp": "2025-06-24T10:00:00Z",
      "agentInstanceId": "agent-XZ01:exec-5589",
      "participants": [
        { "id": "user-108", "role": "compliance_analyst" },
        { "id": "user-222", "role": "finance_approver" }
      ],
      "agentOutput": {
        "summary": "This invoice exceeds the threshold.",
        "confidence": 0.62
      },
      "type": "approval",
      "urgency": "blocking",
      "userActions": ["approve", "edit", "reject", "submit_text"],
      "status": "awaiting_input",
      "humanFactors": { "trustLevel": 0.8, "fatigueScore": 0.2 },
      "history": []
    }

Field Reference Table
| Field | Type | Purpose |
|---|---|---|
| interactionId | string | Unique ID for the interaction instance |
| groupId | string | Optional. Used to link related interactions (for distributed mode) |
| coordinationMode | string | centralized or distributed; indicates how multiple participants are handled |
| resolutionPolicy | string | first_response, majority, all_required, etc. |
| timestamp | ISO8601 string | Time when interaction was created |
| agentInstanceId | string | Unique ID for the agent instance |
| participants | array | List of participant metadata (ID and role) |
| agentOutput | object | Summary and confidence from the agent |
| type | string | Type of interaction (e.g., approval, notification) |
| urgency | string | Urgency level (e.g., blocking, recommended) |
| userActions | array | Permitted actions for the human participants |
| status | string | Current state of the interaction |
| humanFactors | object | Trust and fatigue levels, if applicable |
| history | array | Logs of participant responses and decisions |
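A receiver might check incoming messages against the field reference before storing them. The following is a minimal sketch, not a normative validator: the `validateInteraction` name and the choice of which fields are required are assumptions, since the table only marks `groupId` as optional.

```javascript
// Assumed required fields and their expected types ('array' is checked with Array.isArray).
const REQUIRED_FIELDS = {
  interactionId: 'string',
  agentInstanceId: 'string',
  timestamp: 'string',
  participants: 'array',
  type: 'string',
  urgency: 'string',
  userActions: 'array',
  status: 'string',
};

const COORDINATION_MODES = ['centralized', 'distributed'];

// Returns a list of human-readable problems; an empty list means the message passed.
function validateInteraction(message) {
  const errors = [];
  for (const [field, expected] of Object.entries(REQUIRED_FIELDS)) {
    const value = message[field];
    const ok = expected === 'array' ? Array.isArray(value) : typeof value === expected;
    if (!ok) errors.push(`${field} must be of type ${expected}`);
  }
  // Date.parse accepts more than ISO 8601, so this is only a loose check.
  if (typeof message.timestamp === 'string' && Number.isNaN(Date.parse(message.timestamp))) {
    errors.push('timestamp must be a parseable date string');
  }
  if (message.coordinationMode !== undefined &&
      !COORDINATION_MODES.includes(message.coordinationMode)) {
    errors.push('coordinationMode must be centralized or distributed');
  }
  return errors;
}

const sample = {
  interactionId: 'abc123',
  groupId: 'grp789',
  coordinationMode: 'centralized',
  resolutionPolicy: 'majority',
  timestamp: '2025-06-24T10:00:00Z',
  agentInstanceId: 'agent-XZ01:exec-5589',
  participants: [{ id: 'user-108', role: 'compliance_analyst' }],
  agentOutput: { summary: 'This invoice exceeds the threshold.', confidence: 0.62 },
  type: 'approval',
  urgency: 'blocking',
  userActions: ['approve', 'edit', 'reject', 'submit_text'],
  status: 'awaiting_input',
  humanFactors: {},
  history: [],
};

console.log(validateInteraction(sample)); // []
console.log(validateInteraction({ ...sample, coordinationMode: 'ring' }));
```

Returning a list of problems rather than throwing on the first one lets the caller log every issue with an LLM-generated message in a single pass.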
2.3 Interaction Workflow
- Agent sends output with metadata, type, and urgency.
- System evaluates whether human input is required.
- If required, each participant may act based on their role.
- Human responses are logged independently in the `history`.
- The system determines resolution logic (e.g., first response, majority, all must act).
- Feedback can also be initiated unsolicited by any user.
- System updates status and logs results.
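The resolution step in this workflow can be sketched as a pure function over `participants` and `history`. This is one possible reading of the three policies named in the field reference, under stated assumptions: only each listed participant's latest response counts, `majority` means strictly more than half of all participants chose the same action, and `all_required` means every participant has responded. `evaluateResolution` and the `conflict` outcome are hypothetical names.

```javascript
// Keep only the most recent response per listed participant
// (history is assumed to be in arrival order).
function latestResponses(interaction) {
  const known = new Set(interaction.participants.map((p) => p.id));
  const latest = new Map();
  for (const entry of interaction.history) {
    if (known.has(entry.participantId)) latest.set(entry.participantId, entry.action);
  }
  return latest;
}

// Returns { resolved, outcome }, where outcome is the winning action or null.
function evaluateResolution(interaction) {
  const unresolved = { resolved: false, outcome: null };
  const latest = latestResponses(interaction);
  const total = interaction.participants.length;
  if (total === 0) return unresolved;

  switch (interaction.resolutionPolicy) {
    case 'first_response': {
      const first = interaction.history.find((h) => latest.has(h.participantId));
      return first ? { resolved: true, outcome: first.action } : unresolved;
    }
    case 'majority': {
      const counts = {};
      for (const action of latest.values()) counts[action] = (counts[action] || 0) + 1;
      const winner = Object.keys(counts).find((a) => counts[a] > total / 2);
      return winner ? { resolved: true, outcome: winner } : unresolved;
    }
    case 'all_required': {
      if (latest.size < total) return unresolved;
      const actions = new Set(latest.values());
      // Unanimous responses give a clear outcome; mixed responses are flagged as a conflict.
      return { resolved: true, outcome: actions.size === 1 ? [...actions][0] : 'conflict' };
    }
    default:
      throw new Error(`Unknown resolutionPolicy: ${interaction.resolutionPolicy}`);
  }
}

const interaction = {
  resolutionPolicy: 'majority',
  participants: [{ id: 'user-108' }, { id: 'user-222' }, { id: 'user-301' }],
  history: [
    { participantId: 'user-108', action: 'approve' },
    { participantId: 'user-222', action: 'reject' },
  ],
};

console.log(evaluateResolution(interaction)); // { resolved: false, outcome: null }
interaction.history.push({ participantId: 'user-301', action: 'approve' });
console.log(evaluateResolution(interaction)); // { resolved: true, outcome: 'approve' }
```

Keeping the evaluation free of side effects means the same function can run on every new response and again during an audit replay of `history`.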
2.4 Error Handling
- Invalid or missing interaction ID (when expected): return 404.
- Unsupported action: return 400.
- Allow for unsolicited feedback with a synthetic interaction ID.
- Detect and manage conflicting inputs.
- Log all interaction attempts for auditing.
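The first three rules above can be sketched as a validation step that runs before feedback is recorded. The sketch is framework-agnostic and returns a status code plus a payload instead of writing to a response object; `checkFeedback`, the `unsolicited` flag, and the `unsolicited-` ID prefix are assumptions, not part of the spec.

```javascript
// Counter used to build synthetic IDs for unsolicited feedback (assumed ID format).
let syntheticCounter = 0;

function checkFeedback(interactions, feedback) {
  // Unsolicited feedback has no interaction to refer to, so it gets a synthetic ID.
  if (feedback.unsolicited) {
    syntheticCounter += 1;
    return { status: 200, interactionId: `unsolicited-${syntheticCounter}` };
  }

  // An interaction ID is expected here: missing or unknown IDs return 404.
  const interaction = interactions.find((i) => i.interactionId === feedback.interactionId);
  if (!feedback.interactionId || !interaction) {
    return { status: 404, error: 'Interaction not found' };
  }

  // Actions outside userActions are unsupported: return 400.
  if (!interaction.userActions.includes(feedback.action)) {
    return { status: 400, error: `Unsupported action: ${feedback.action}` };
  }

  return { status: 200, interactionId: interaction.interactionId };
}

const interactions = [
  { interactionId: 'abc123', userActions: ['approve', 'edit', 'reject'] },
];

console.log(checkFeedback(interactions, { interactionId: 'missing', action: 'approve' }).status); // 404
console.log(checkFeedback(interactions, { interactionId: 'abc123', action: 'delete' }).status); // 400
console.log(checkFeedback(interactions, { interactionId: 'abc123', action: 'approve' }).status); // 200
```

Conflict detection and audit logging are left out of this sketch; they would sit after this check, once the feedback is known to be well-formed.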
3. Interaction Scenarios and State Transitions
This section describes real-world scenarios using the unified interaction spec, showing how state changes throughout the interaction lifecycle.
Scenario 1: Agent-Initiated Single Human Interaction
- Trigger: Agent detects low confidence or high-risk condition.
- Creates: One interaction with a single participant.
- State flow:
  - `status = awaiting_input`
  - Participant submits feedback → `history[]` updated
  - System applies resolutionPolicy (e.g., `first_response`) → `status = resolved`
Scenario 2: Agent-Initiated Multi-Human Interaction
- Trigger: Agent requires input from multiple roles (e.g., compliance + finance).
- Creates: One centralized interaction (or multiple linked ones with `groupId`)
- State flow:
  - `status = awaiting_input`
  - Participants submit feedback independently → `history[]` accumulates
  - Resolution policy applied (e.g., `majority`, `all_required`) → `status = resolved`
Scenario 3: Human-Initiated Agent Interaction
- Trigger: A user wants to override, revise, or raise a new issue not prompted by the agent.
- Creates: A feedback-only interaction where:
  - `agentOutput = null`
  - `type = feedback`, `urgency = optional`
- State flow:
  - System logs feedback as `status = received`
  - Agent (or participant dashboard) can reference this input for future decisions
Each scenario uses the same interaction schema, enabling consistent logging, UI behavior, and downstream auditing.
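Scenario 3 can be sketched as a small factory that builds a feedback-only interaction with the same fields as the agent-initiated ones. The helper name, the timestamp-based `unsolicited-` ID, and the default values for `coordinationMode`, `resolutionPolicy`, and `userActions` are assumptions; only `agentOutput`, `type`, `urgency`, and `status` follow the scenario description.

```javascript
// Builds a human-initiated, feedback-only interaction (Scenario 3).
function createUnsolicitedFeedback({ participant, action, inputText, agentInstanceId }, now = new Date()) {
  const timestamp = now.toISOString();
  return {
    // Timestamp-based ID for illustration only; a real system needs a collision-free ID.
    interactionId: `unsolicited-${now.getTime()}`,
    groupId: null,
    coordinationMode: 'centralized',     // assumed default
    resolutionPolicy: 'first_response',  // assumed default
    timestamp,
    agentInstanceId,
    participants: [participant],
    agentOutput: null,                   // nothing was prompted by the agent
    type: 'feedback',
    urgency: 'optional',
    userActions: [],                     // assumed: no further action is requested
    status: 'received',
    humanFactors: {},
    history: [
      { participantId: participant.id, action, inputText: inputText || null, timestamp },
    ],
  };
}

const feedback = createUnsolicitedFeedback({
  participant: { id: 'user-108', role: 'compliance_analyst' },
  action: 'submit_text',
  inputText: 'The vendor on this invoice was flagged last quarter.',
  agentInstanceId: 'agent-XZ01:exec-5589',
});

console.log(feedback.status, feedback.type, feedback.urgency); // received feedback optional
```

Because the result has the same shape as an agent-initiated interaction, it can go through the same logging and audit path.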
4. Node.js Implementation Example
The following Node.js example demonstrates a full interaction lifecycle:
4.1 Step-by-Step Workflow
- Agent sends a request for human input via `/agent-output`.
- The system stores this interaction and returns it to the agent or UI client.
- One or more human participants submit feedback through `/human-feedback`.
- The system tracks individual responses and evaluates them based on `resolutionPolicy`.
- Once the resolution is met (e.g., majority approval), the system updates the status and notifies the agent or downstream process.
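The first two steps can be sketched as the core logic of an `/agent-output` handler, written as a plain function so it can be wired into any HTTP framework. This is an illustration under assumptions: the `handleAgentOutput` name, the in-memory `interactionLog` array, the required-field check, the `hi-` ID format, and the `recommended` default urgency are not defined by the spec.

```javascript
// In-memory store for illustration; a real system would persist interactions.
const interactionLog = [];
let nextId = 1;

// Stores a new interaction and returns { status, body } for the HTTP layer to send.
function handleAgentOutput(body) {
  const { agentInstanceId, participants, agentOutput, type, urgency } = body;

  // Assumed minimum: the agent must say who it is, who to ask, and what kind of request this is.
  if (!agentInstanceId || !Array.isArray(participants) || participants.length === 0 || !type) {
    return { status: 400, body: { error: 'agentInstanceId, participants, and type are required' } };
  }

  const interaction = {
    interactionId: body.interactionId || `hi-${nextId++}`,
    groupId: body.groupId || null,
    coordinationMode: body.coordinationMode || 'centralized',
    resolutionPolicy: body.resolutionPolicy || 'first_response',
    timestamp: new Date().toISOString(),
    agentInstanceId,
    participants,
    agentOutput: agentOutput || null,
    type,
    urgency: urgency || 'recommended', // assumed default
    userActions: body.userActions || ['approve', 'edit', 'reject'],
    status: 'awaiting_input',
    humanFactors: {},
    history: [],
  };

  interactionLog.push(interaction);
  return { status: 201, body: interaction };
}

const result = handleAgentOutput({
  agentInstanceId: 'agent-XZ01:exec-5589',
  participants: [{ id: 'user-108', role: 'compliance_analyst' }],
  agentOutput: { summary: 'This invoice exceeds the threshold.', confidence: 0.62 },
  type: 'approval',
  urgency: 'blocking',
});

console.log(result.status, result.body.status); // 201 awaiting_input
```

Returning the stored interaction gives the agent the `interactionId` it needs to link its paused state to the request.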
4.2 Agent returns human interaction requirement

    // Agent running...
    // Needs human feedback...
    // Creates a new HumanInteraction request...
    // Pauses execution (saves the state with link to the HI request)...
    const interaction = {
      interactionId,
      groupId: groupId || null,
      coordinationMode: coordinationMode || 'centralized',
      resolutionPolicy: resolutionPolicy || 'first_response',
      timestamp: new Date().toISOString(),
      agentInstanceId,
      participants,
      agentOutput,
      type,
      urgency,
      userActions: ["approve", "edit", "reject"],
      status: 'awaiting_input',
      humanFactors: {},
      history: []
    };

4.3 POST /human-feedback: Human submits feedback

    const express = require('express');

    const app = express();
    app.use(express.json()); // parse JSON request bodies

    // In-memory store for illustration; interactions are added when the agent requests input.
    const interactionLog = [];

    app.post('/human-feedback', (req, res) => {
      const { interactionId, participantId, action, reason, inputText } = req.body;

      const interaction = interactionLog.find(i => i.interactionId === interactionId);
      if (!interaction) return res.status(404).send('Interaction not found');

      // Reject actions the interaction does not permit (see 2.4 Error Handling).
      if (!interaction.userActions.includes(action)) {
        return res.status(400).send('Unsupported action');
      }

      interaction.history.push({
        participantId,
        action,
        reason: reason || null,
        inputText: inputText || null,
        timestamp: new Date().toISOString()
      });

      // Resolution logic example: auto-resolve on first valid response
      if (interaction.resolutionPolicy === 'first_response' && interaction.history.length > 0) {
        interaction.status = 'resolved';
      } else if (interaction.resolutionPolicy === 'majority') {
        const counts = interaction.history.reduce((acc, h) => {
          acc[h.action] = (acc[h.action] || 0) + 1;
          return acc;
        }, {});
        // A majority is strictly more than half of the participants.
        if ((counts['approve'] || 0) > interaction.participants.length / 2) {
          interaction.status = 'resolved';
        }
      }

      res.status(200).json({ message: 'Feedback recorded', interaction });
    });

5. Frontend Integration Example
To use the interaction specification on the frontend, developers should render interaction messages as tasks or notifications with the ability for human reviewers to respond. Here’s a basic React-based example:
5.1 React UI Example (Simplified)

    import React, { useState, useEffect } from 'react';
    import axios from 'axios';

    const InteractionReview = ({ interactionId }) => {
      const [interaction, setInteraction] = useState(null);
      const [action, setAction] = useState('');
      const [inputText, setInputText] = useState('');

      // Load the interaction and preselect its first permitted action,
      // so the submitted value matches what the <select> displays.
      useEffect(() => {
        axios.get(`/api/interactions/${interactionId}`)
          .then(res => {
            setInteraction(res.data);
            setAction(res.data.userActions?.[0] ?? '');
          })
          .catch(console.error);
      }, [interactionId]);

      const submitFeedback = async () => {
        await axios.post('/human-feedback', {
          interactionId,
          participantId: 'user-108', // hard-coded for the example
          action,
          reason: 'Reviewed by analyst',
          inputText
        });
        alert('Feedback submitted.');
      };

      if (!interaction) return <p>Loading...</p>;

      return (
        <div>
          <h2>Interaction Review</h2>
          {/* agentOutput is null for human-initiated feedback, hence the optional chaining */}
          <p><strong>Summary:</strong> {interaction.agentOutput?.summary}</p>
          <p><strong>Confidence:</strong> {interaction.agentOutput?.confidence}</p>
          <label>Action:
            <select value={action} onChange={e => setAction(e.target.value)}>
              {interaction.userActions.map(a => <option key={a}>{a}</option>)}
            </select>
          </label>
          <br />
          <label>Comment:
            <textarea value={inputText} onChange={e => setInputText(e.target.value)} />
          </label>
          <br />
          <button onClick={submitFeedback}>Submit Feedback</button>
        </div>
      );
    };

    export default InteractionReview;

5.2 Notes
- This frontend renders the interaction created through the `/agent-output` API (fetched in the example from `/api/interactions/{interactionId}`) and submits responses to `/human-feedback`.
- You can enhance this with user authentication, reviewer role display, or timeline/history views.
6. Recommendations
- Use structured JSON messages with clearly defined fields.
- Include `agentInstanceId` and a list of `participants` for multi-party accountability.
- Use `groupId`, `coordinationMode`, and `resolutionPolicy` to unify centralized and distributed workflows.
- Track individual participant responses in `history` for transparency and resolution logic.
- Accommodate cognitive metrics (`trustLevel`, `fatigueScore`) to inform adaptive UX.
- Support unsolicited feedback to enhance adaptability.
- Implement secure, API-driven collaboration flows with audit trails.
7. Relevant Standards & Specifications
7.1 NIST HITL Guidelines
- Standardized modular components for interaction.
- Clearly structured data messages for approvals and annotations.
- Governance fields for accountability.
- Human-awareness tracking to prevent overtrust or fatigue.
7.2 Model Context Protocol (MCP)
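- An open protocol for connecting LLM applications to external tools and data sources; proposed here as a basis for interoperable agent-to-human interactions.
- Useful for secure data exchange and context sharing.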
7.3 ISO 9241-210 & 110
- Human-centered design principles for interactive systems.
- Dialogue principles for user interaction models.
8. Conclusion
By adopting these recommendations, developers can create human-centric, reliable, and accountable AI systems. The integration of HITL principles, support for multi-human collaboration, and standardized protocols ensures that human oversight remains effective and scalable in AI-driven workflows.
9. References
- National Institute of Standards and Technology (NIST), Special Publication 1270, Human-Centered Artificial Intelligence (Draft), 2022. https://doi.org/10.6028/NIST.SP.1270-draft
- Model Context Protocol (MCP)
- International Organization for Standardization, ISO 9241-210:2019 – Ergonomics of human-system interaction — Part 210: Human-centred design for interactive systems.
- International Organization for Standardization, ISO 9241-110:2020 – Ergonomics of human-system interaction — Interaction principles.
- Azorus: Commitments over Protocols for BDI Agents
- Chain of Thought Explanation for Dialogue State Tracking
- Agent Commitments and Ranking of Commitment Protocols
Three Abstraction Levels to Choose From
Option A: Minimal Schema (Core Interaction Only)
Focuses only on task-level coordination.
Includes:
- Interaction type, status, participants
- Resolution policies
- Agent output + human feedback
Excludes:
- Message content
- Visibility rules
- Personalization
Pros: Very clean, interoperable.
Cons: Not enough for chat/email personalization or privacy handling.
Option B: Intent-Aware with Privacy Recommendations (Recommended direction)
Includes everything needed for communication-aware interaction, but leaves final delivery formatting to client logic.
Adds:
- `initiatedByAgent`: who started the interaction
- `audience`: who is part of the interaction and in what role
- `messageVariants`: personalized content per recipient
- `visibilityPreferences`: privacy guidance (e.g. “don’t reply all”)
This empowers the LLM/thinking layer to encode:
- Who needs to be contacted
- What content is meant for each person
- Whether threads should be isolated
- Whether observers (e.g. CCs) can see full message content
Pros: Balances structure with flexibility; leaves delivery decisions to infra; enables personalized multi-human collaboration.
Cons: Slightly more complex schema.
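To make Option B concrete, the sketch below shows one possible shape for the four added fields and a helper that picks the content a given recipient should see. The field names come from the list above, but their value shapes (the `approver` and `observer` roles, the `default` variant key, the `isolateThreads` and `observersSeeFullContent` flags) and the `contentFor` helper are assumptions.

```javascript
// Hypothetical Option B message; only the four top-level field names come from the spec.
const interaction = {
  interactionId: 'abc123',
  initiatedByAgent: 'agent-XZ01:exec-5589',
  audience: [
    { id: 'user-108', role: 'approver' },
    { id: 'user-222', role: 'approver' },
    { id: 'user-350', role: 'observer' },
  ],
  messageVariants: {
    default: 'An invoice is awaiting review.',
    'user-108': 'Please check this invoice against the compliance policy.',
    'user-222': 'Please approve or reject this invoice, which exceeds the threshold.',
  },
  visibilityPreferences: { isolateThreads: true, observersSeeFullContent: false },
};

// Returns the text a recipient should receive, or null if they are not in the audience.
function contentFor(message, recipientId) {
  const member = message.audience.find((a) => a.id === recipientId);
  if (!member) return null;

  const { messageVariants, visibilityPreferences } = message;
  // Observers get the generic variant unless full visibility is allowed.
  if (member.role === 'observer' && !visibilityPreferences.observersSeeFullContent) {
    return messageVariants.default;
  }
  return messageVariants[recipientId] || messageVariants.default;
}

console.log(contentFor(interaction, 'user-108')); // Please check this invoice against the compliance policy.
console.log(contentFor(interaction, 'user-350')); // An invoice is awaiting review.
console.log(contentFor(interaction, 'user-999')); // null
```

The helper only selects content; how that content is delivered (chat, email, UI) stays with the client, which is the split Option B is meant to preserve.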
Option C: Delivery-Prescriptive (Too Heavy)
Starts modeling actual channels, delivery modes, and threading (e.g., `"sendAs": "direct_email"`, `"threadId"`).
This creates a communication protocol on top of the interaction spec.
Cons: Oversteps the schema’s domain; inflexible across platforms; risks tight coupling with delivery channels.