Stream Audio (WebSocket) - Eka Developer Platform APIs

When a session is created with upload_type: stream, Create Session returns a wss:// URL in the upload_url field instead of an HTTPS upload URL. Connect to it and stream audio in real time.

This is the alternative to Upload Audio (HTTP binary). A session uses one path or the other, decided by upload_type:

chunked → HTTP binary upload
stream → WebSocket (this page)

1. Get the WebSocket URL

// POST /voice/v1/sessions  (upload_type: "stream")  → 201
{
  "session_id": "ses_abc123def456",
  "status": "created",
  "upload_url": "wss://api.eka.care/voice/v1/stream/sessions/str_a1b2c3d4e5f6/audio",
  ...
}

The upload_url is a complete wss:// endpoint bound to this session. It embeds an internal stream_id (str_a1b2c3d4e5f6 in the example above), so connect to it exactly as returned rather than building or modifying the URL yourself.
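A small guard when reading the Create Session response can catch the case where a session was accidentally created as chunked (HTTPS upload URL) before you try to open a socket. The sketch below is a hypothetical helper, not part of the Eka API. It reads upload_url from the parsed 201 body, checks the scheme, and returns the URL untouched.

```javascript
// Hypothetical helper: extract the WebSocket URL from a parsed Create Session
// response body (the 201 JSON shown above) without altering it.
function getStreamUrl(body) {
  const url = body && body.upload_url;
  if (typeof url !== "string") {
    throw new Error("Create Session response has no upload_url");
  }
  // Stream sessions get a wss:// URL; chunked sessions get an HTTPS URL.
  if (!url.startsWith("wss://")) {
    throw new Error(`Expected a wss:// upload_url (upload_type: "stream"), got ${url}`);
  }
  // Return the URL exactly as issued: it embeds the internal stream_id.
  return url;
}
```

Usage: `const ws = new WebSocket(getStreamUrl(await res.json()));`, where `res` is your Create Session response.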

2. Connect and stream

The server auto-detects the wire format from your first frame. Use whichever fits your client:

Option A — Raw binary PCM

Send raw 16-bit signed little-endian mono PCM at 16 kHz as binary WebSocket frames:

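const ws = new WebSocket(uploadUrl);   // upload_url from Create Session, unchanged
ws.binaryType = "arraybuffer";
ws.onopen = () => {
  // startCapture is a placeholder for your audio source. It should call the
  // callback once per captured chunk: an ArrayBuffer / Uint8Array of 16-bit
  // little-endian mono PCM at 16 kHz. Each chunk is sent as one binary frame.
  startCapture((pcmChunk) => ws.send(pcmChunk));
};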
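Browser capture (for example the Web Audio API) usually yields Float32 samples in [-1, 1], which must be converted to 16-bit little-endian integers before they go out as binary frames. The sketch below shows one way to do that and to cut the result into fixed-size frames. The function names, the 20 ms frame size, and the assumption that audio is already mono at 16 kHz (no resampling shown) are illustrative choices, not Eka requirements.

```javascript
// Convert Float32 samples in [-1, 1] into 16-bit signed little-endian PCM.
// Assumes the input is already mono at 16 kHz; resampling is not shown.
function floatTo16BitPcm(samples) {
  const buf = new ArrayBuffer(samples.length * 2);
  const view = new DataView(buf);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp out-of-range values
    const v = s < 0 ? s * 0x8000 : s * 0x7fff;       // scale into int16 range
    view.setInt16(i * 2, Math.round(v), true);       // true = little-endian
  }
  return buf;
}

// Split a PCM ArrayBuffer into frames of `frameMs` milliseconds.
// 20 ms is an assumed frame size. At 16 kHz mono 16-bit that is
// 320 samples * 2 bytes = 640 bytes per frame; the last frame may be shorter.
function splitIntoFrames(pcm, frameMs = 20, sampleRate = 16000) {
  const frameBytes = Math.floor((sampleRate * frameMs) / 1000) * 2; // whole samples only
  if (frameBytes <= 0) throw new Error("frameMs too small for this sample rate");
  const frames = [];
  for (let off = 0; off < pcm.byteLength; off += frameBytes) {
    frames.push(pcm.slice(off, off + frameBytes));
  }
  return frames;
}
```

For example, `splitIntoFrames(floatTo16BitPcm(samples)).forEach((f) => ws.send(f));`. `DataView` with `littleEndian = true` is used instead of an `Int16Array` so the byte order does not depend on the host platform.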

Option B — JSON envelope (Twilio / Vobiz Media Streams compatible)

Send text frames using the Media Streams envelope. Audio payloads are base64-encoded PCM:

{ "event": "start", "start": { "mediaFormat": { "encoding": "audio/x-l16", "sampleRate": 16000 } } }
{ "event": "media", "media": { "payload": "<base64-encoded PCM>" } }
{ "event": "media", "media": { "payload": "<base64-encoded PCM>" } }
{ "event": "stop" }
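The envelope frames above can be produced with a few small builders. This is a sketch that runs in Node.js or a browser; the helper names are hypothetical, and only the fields shown in the envelope above are emitted.

```javascript
// Portable base64 for a Uint8Array: Buffer in Node.js, btoa in browsers.
function toBase64(bytes) {
  if (typeof Buffer !== "undefined") return Buffer.from(bytes).toString("base64");
  let bin = "";
  for (const b of bytes) bin += String.fromCharCode(b);
  return btoa(bin);
}

// Each builder returns a string to pass to ws.send() as a text frame.
function startFrame(sampleRate = 16000) {
  return JSON.stringify({
    event: "start",
    start: { mediaFormat: { encoding: "audio/x-l16", sampleRate } },
  });
}

function mediaFrame(pcm) {
  // pcm: ArrayBuffer or any typed array holding 16-bit LE mono PCM.
  // View the underlying bytes so an Int16Array is not truncated element-wise.
  const bytes = ArrayBuffer.isView(pcm)
    ? new Uint8Array(pcm.buffer, pcm.byteOffset, pcm.byteLength)
    : new Uint8Array(pcm);
  return JSON.stringify({ event: "media", media: { payload: toBase64(bytes) } });
}

function stopFrame() {
  return JSON.stringify({ event: "stop" });
}
```

Send `startFrame()` once, then one `mediaFrame()` per captured chunk, then `stopFrame()` when the audio ends.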

If your audio is not at the 16 kHz default, declare its rate in the start event's mediaFormat.sampleRate. The server runs Voice Activity Detection (VAD) on the incoming audio, groups it into speech-boundary-aware chunks of roughly 10–25 s, and writes each chunk to storage automatically.
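For sizing buffers or sanity-checking what you send: 16-bit mono PCM uses 2 bytes per sample, so at 16 kHz it is 16000 × 2 = 32,000 bytes per second, and a 10–25 s server chunk corresponds to about 320,000–800,000 bytes of raw PCM. The hypothetical helpers below just encode that arithmetic.

```javascript
// Bytes of 16-bit mono PCM for a duration: sampleRate samples/s * 2 bytes/sample.
function pcmBytes(seconds, sampleRate = 16000) {
  return Math.round(seconds * sampleRate * 2);
}

// Duration in seconds of a 16-bit mono PCM buffer of `byteLength` bytes.
function pcmSeconds(byteLength, sampleRate = 16000) {
  return byteLength / (sampleRate * 2);
}
```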

3. Finish: close, then end the session

When the audio is done:

Stop streaming

Send a stop event (JSON mode) or simply close the WebSocket. The server flushes any buffered audio to storage.

Call End Session

Call End Session (POST /voice/v1/sessions/{session_id}/end). For WebSocket streaming sessions this is the single, canonical finalize trigger: closing the socket flushes audio to storage but does not start processing on its own.
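The stop-then-end sequence can be sketched as a single async function. This is a hypothetical helper under stated assumptions: `apiBase` is the HTTPS API origin (for example `https://api.eka.care`, inferred from the wss:// host), `headers` carries whatever authentication your integration already sends, and waiting for the socket's close event before calling End Session is a conservative choice rather than a documented requirement.

```javascript
// Hypothetical finalize helper: stop streaming, close the socket, then call End Session.
async function finishStream(ws, { sessionId, apiBase, headers = {}, jsonMode = false, fetchImpl = globalThis.fetch }) {
  // Stop streaming: in JSON mode announce the end, then close the socket.
  if (jsonMode) ws.send(JSON.stringify({ event: "stop" }));
  if (ws.readyState !== 3) { // 3 === WebSocket.CLOSED
    const closed = new Promise((resolve) =>
      ws.addEventListener("close", resolve, { once: true }));
    ws.close();
    await closed; // conservative: let the close handshake finish first
  }

  // End Session: the call that actually starts processing.
  const url = `${apiBase}/voice/v1/sessions/${encodeURIComponent(sessionId)}/end`;
  const res = await fetchImpl(url, { method: "POST", headers });
  if (!res.ok) throw new Error(`End Session failed: HTTP ${res.status}`);
  return res;
}
```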

Poll for results

Poll Get Session at ~1-second intervals until the status is no longer 202.
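A minimal polling loop, assuming `getSession` is a function you supply that performs the Get Session request for this session and resolves to a fetch-style response with a `status` field. The `maxAttempts` cap is an added safeguard, not part of the API.

```javascript
// Hypothetical polling helper: repeat Get Session until the status is no longer 202.
async function pollSession(getSession, { intervalMs = 1000, maxAttempts = 300 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const res = await getSession();
    // Any status other than 202 ends polling; inspect res.ok and the body next.
    if (res.status !== 202) return res;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Session still returning 202 after ${maxAttempts} polls`);
}
```

For example, `const res = await pollSession(() => fetch(getSessionUrl, { headers }));`, where `getSessionUrl` and `headers` are placeholders for your own Get Session request.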

Format summary

Property	Value
Transport	WebSocket (`wss://`)
Audio encoding	16-bit signed PCM, little-endian, mono
Default sample rate	16000 Hz
Frame modes	Raw binary or JSON envelope (`start` / `media` / `stop`)
Chunking	Server-side VAD, ~10–25s speech-aware chunks
Finalize	`stop` / socket close → flush; End Session → process
