This workflow measures end-to-end latency for voice-based AI interactions. Its purpose is to benchmark system response times, not to produce full conversational answers. Incoming audio samples (in Kikuyu or another selected language) are transcribed, passed to an AI assistant whose output is capped at 10 tokens, and then synthesized back to audio. The workflow compares two Kikuyu audio-to-audio (A2A) translation pipelines by reading input samples from a Google Sheet and logging the runtime, price, and output URL for each run. Because both pipelines process the same samples, the logged results allow a like-for-like comparison of the transcription, AI processing, and text-to-speech components, helping teams evaluate latency and cost across different ASR models for Kikuyu.
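The measurement loop described above can be sketched in Python roughly as follows. This is only an illustration of the timing and logging logic, not the workflow's actual implementation: the names `run_pipeline`, `RunRecord`, and the `fake_*` stage functions are hypothetical stand-ins for the real transcription, assistant, and text-to-speech services, the 10-token cap is approximated by whitespace-separated words, and the returned row only mimics what might be written back to the sheet.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List

MAX_OUTPUT_TOKENS = 10  # output cap stated in the workflow description


@dataclass
class RunRecord:
    """One logged benchmark run (hypothetical layout, not the real sheet schema)."""
    pipeline: str
    sample_id: str
    price: float = 0.0
    output_url: str = ""
    total_seconds: float = 0.0
    stage_seconds: Dict[str, float] = field(default_factory=dict)

    def as_row(self) -> List[object]:
        # Flat row in the order: pipeline, sample, runtime, price, output URL.
        return [self.pipeline, self.sample_id, round(self.total_seconds, 4),
                self.price, self.output_url]


def timed(record: RunRecord, stage: str, fn: Callable, *args):
    """Call fn(*args) and store its wall-clock duration under `stage`."""
    start = time.perf_counter()
    result = fn(*args)
    record.stage_seconds[stage] = time.perf_counter() - start
    return result


def run_pipeline(name: str, sample_id: str, audio: bytes,
                 transcribe: Callable[[bytes], str],
                 respond: Callable[[str], str],
                 synthesize: Callable[[str], str],
                 price: float) -> RunRecord:
    """Run ASR -> assistant -> TTS once and record per-stage and total latency."""
    record = RunRecord(pipeline=name, sample_id=sample_id, price=price)
    start = time.perf_counter()
    text = timed(record, "asr", transcribe, audio)
    reply = timed(record, "llm", respond, text)
    # Assumption: approximate the token cap by truncating to 10 words.
    reply = " ".join(reply.split()[:MAX_OUTPUT_TOKENS])
    record.output_url = timed(record, "tts", synthesize, reply)
    record.total_seconds = time.perf_counter() - start
    return record


# Placeholder stages so the sketch runs without any external service.
def fake_transcribe(audio: bytes) -> str:
    return "placeholder transcript"


def fake_respond(text: str) -> str:
    return " ".join(f"tok{i}" for i in range(25))


def fake_synthesize(text: str) -> str:
    return f"https://example.invalid/audio/{len(text.split())}-tokens.wav"
```

Calling `run_pipeline` once per pipeline for each input sample and collecting `as_row()` from every record would give one log row per run, so the two pipelines can be compared sample by sample on both total runtime and per-stage timings.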