# Voice Task 📞

Conducts an AI-powered voice conversation over a phone call.

- **Node type:** `voiceTask`
- **Category:** AI
- **Actor:** `voiceTask` (2 threads)
## Description
The Voice Task initiates or receives a phone call and runs an AI voice conversation. It combines:
- Text-to-Speech (TTS) — converts AI responses to audio
- Speech-to-Text (STT) — transcribes what the caller says
- LLM — the AI brain that decides what to say
- DTMF detection — touch-tone keypad input support
The Voice Task is designed for automated phone agents, IVR replacements, appointment reminders, outbound campaign calls, and voice-based intake flows.
## Properties

| Property | Type | Required | Description |
|---|---|---|---|
| `aiProviderConnectionId` | string | Yes | AI provider for conversation intelligence |
| `smsConnectionId` | string | Yes | SMS/voice provider integration (e.g., Twilio) |
| `toPhoneNumber` | text | Yes | Phone number to call in E.164 format. Supports `{varName}` |
| `voiceProvider` | select | No | Voice infrastructure provider (default: configured integration) |
| `ttsProvider` | select | No | TTS engine: `ELEVENLABS`, `GOOGLE`, `AWS_POLLY`, `OPENAI` |
| `ttsVoice` | text | No | Voice ID or name for the TTS engine |
| `language` | text | No | BCP-47 language code for STT/TTS (e.g., `en-US`, `es-ES`) |
| `transcriptionProvider` | select | No | STT engine: `DEEPGRAM`, `GOOGLE`, `WHISPER` |
| `speechModel` | text | No | Specific speech model ID |
| `interruptible` | checkbox | No | Allow the caller to interrupt the AI while it is speaking (default: false) |
| `interruptSensitivity` | select | No | `low`, `medium`, `high` — how easily speech interrupts TTS |
| `dtmfDetection` | checkbox | No | Enable DTMF (keypad press) detection |
| `systemPrompt` | textarea | No | System context for the AI voice agent |
| `agentName` | text | No | Name the AI agent introduces itself as |
| `agentRole` | text | No | Role description given to the AI as context |
| `greeting` | text | No | First thing the AI says when the call connects |
## Inputs

Example input values (workflow variables in `{braces}` are substituted at runtime):

```text
toPhoneNumber: {customerPhone}
greeting: Hello! This is Alex from {companyName}. I'm calling about your appointment on {appointmentDate}.
systemPrompt: You are Alex, a friendly appointment reminder agent for {companyName}.
  The customer is {customerName}. Their appointment is on {appointmentDate} at {appointmentTime}.
  Your goal: confirm or reschedule the appointment.
agentName: Alex
agentRole: Appointment Reminder Agent
```
## Outputs

When the call completes:

| Variable | Type | Description |
|---|---|---|
| `{callTranscript}` | string | Full transcript of the voice conversation |
| `{callDuration}` | number | Call duration in seconds |
| `{callStatus}` | string | `COMPLETED`, `NO_ANSWER`, `BUSY`, `FAILED` |
| `{callOutcome}` | string | AI-determined outcome (set by the AI's final response/tool call) |
| `{callSid}` | string | Provider call SID for tracking |
The AI can set custom outcome variables by returning structured JSON in its final turn, similar to AI Task structured output.
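For illustration only, a final AI turn that sets a custom outcome might return structured JSON like the following (the field names other than `callOutcome` are hypothetical, not a fixed schema):

```json
{
  "callOutcome": "RESCHEDULED",
  "requestedDate": "2025-03-14",
  "requestedTime": "10:30"
}
```

Per the note above, the extra fields become custom outcome variables available to downstream nodes.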
## Call Flow

```text
[Voice Task node reached]
        ↓
[Engine calls phone provider → initiates call to {toPhoneNumber}]
        ↓
[Call connects → TTS plays greeting]
        ↓
[STT transcribes caller speech → sent to LLM]
        ↓
[LLM generates response → TTS plays response]
        ↓
[Conversation continues until AI ends call or caller hangs up]
        ↓
[Call ends → transcript and outcome stored]
        ↓
[Workflow continues with {callTranscript}, {callStatus}, {callOutcome}]
```
## Connections

| Connection | Description |
|---|---|
| `sequenceFlow` (incoming) | Arrives from the previous node |
| `successFlow` | Call completed (regardless of call outcome) |
| `errorFlow` | Call could not be initiated, or a provider error occurred |
| `timeoutFlow` | Call timed out |
## DTMF Support

When `dtmfDetection` is enabled, the caller can press keypad keys during the call. The AI is informed of DTMF input and can branch the conversation accordingly.
Example prompt handling DTMF:

```text
systemPrompt: If the caller presses 1, confirm the appointment. If they press 2, start the reschedule flow. If they press 0, transfer to a human agent.
```
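A minimal properties fragment pairing that prompt with DTMF detection might look like this (other required properties omitted for brevity):

```json
{
  "dtmfDetection": true,
  "systemPrompt": "If the caller presses 1, confirm the appointment. If they press 2, start the reschedule flow. If they press 0, transfer to a human agent."
}
```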
## Supported Providers
| Type | Providers |
|---|---|
| Voice/Telephony | Twilio, Vonage |
| Text-to-Speech | ElevenLabs, Google Cloud TTS, Amazon Polly, OpenAI TTS |
| Speech-to-Text | Deepgram, Google Cloud STT, OpenAI Whisper |
| AI Brain | OpenAI, Anthropic, Google Gemini (via AI Provider integration) |
Mix and match providers — e.g., use Twilio for telephony + Deepgram for transcription + GPT-4o for AI.
## Example: Appointment Reminder

```json
{
  "nodeId": "voice-reminder-1",
  "name": "Call Patient for Appointment Reminder",
  "nodeType": "voiceTask",
  "properties": {
    "aiProviderConnectionId": "int_openai",
    "smsConnectionId": "int_twilio",
    "toPhoneNumber": "{patientPhone}",
    "greeting": "Hello, may I speak with {patientName}? This is a reminder call from {clinicName}.",
    "systemPrompt": "You are a polite appointment reminder agent for {clinicName}. The patient's name is {patientName}. Their appointment is with Dr. {doctorName} on {appointmentDate} at {appointmentTime}. Confirm the appointment and offer to reschedule if needed. Be brief and professional.",
    "agentName": "Scheduling Assistant",
    "language": "en-US",
    "ttsProvider": "ELEVENLABS",
    "ttsVoice": "rachel",
    "transcriptionProvider": "DEEPGRAM",
    "interruptible": true,
    "dtmfDetection": true
  },
  "timeout": {
    "duration": 5,
    "durationUom": "MINUTES",
    "action": "FAIL"
  }
}
```
## Best Practices

- Always set a timeout — calls can last indefinitely without one
- Enable `interruptible` for a natural conversation feel
- Test in the DEVELOPMENT environment with your own phone number before going live
- Keep `systemPrompt` focused on one goal — multi-purpose voice agents confuse callers
- Capture `{callTranscript}` for compliance and quality review
- Connect `errorFlow` to handle failed calls (busy, no answer) with a follow-up SMS
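As a sketch of the last point, the `errorFlow` could route to a follow-up SMS node. The `smsTask` node type and its properties below are assumptions for illustration, not a documented schema — adapt them to your platform's actual SMS node:

```json
{
  "nodeId": "sms-followup-1",
  "name": "Send Follow-up SMS",
  "nodeType": "smsTask",
  "properties": {
    "smsConnectionId": "int_twilio",
    "toPhoneNumber": "{patientPhone}",
    "message": "We tried to reach you about your appointment on {appointmentDate}. Please call us back to confirm or reschedule."
  }
}
```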