On-premise AI: safeguarding privacy in voicemail transcriptions

As a privacy-focused company, we implemented an on-premise transcription service instead of a hosted AI solution. Here's why.

Kaweh Ebrahimi-Far

12 September 2024
6 min read

Artificial Intelligence (AI) is transforming industries, and we’ve taken our first step by introducing AI-powered voicemail transcriptions in our communications platform, Freedom. This feature is just the beginning of a broader AI roadmap, and it serves as a pilot project for the other innovations we plan to roll out. However, unlike many companies, we’re prioritizing privacy over speed. This approach allows us to explore options that provide greater data control, such as on-premise AI solutions.

In this blog post, we’ll share insights from our experience implementing an on-premise transcription service, discuss the pros and cons of hosted vs. on-premise AI solutions, and explain why privacy-focused companies should consider an on-premise approach for their AI needs.

SaaS vs. on-premise AI: which is right for you?

When it comes to integrating AI into your product, businesses generally have two options: Software as a Service (SaaS) or on-premise solutions. Each has its own benefits and drawbacks, depending on your organization’s priorities.

Hosted SaaS (Software as a Service)

Hosted SaaS solutions are widely popular. Major providers offer APIs that process and transcribe audio, making it easy for businesses to add AI features with minimal effort.
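
To make the trade-off concrete: integrating a hosted provider really does take only a few lines. Here's a minimal sketch using OpenAI's hosted transcription endpoint via the openai Python package (the file name is illustrative, and the API key is assumed to be set in the environment):

```python
# Minimal hosted-transcription sketch: note that the audio leaves your
# infrastructure and is processed on the provider's servers.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("voicemail.wav", "rb") as audio:
    result = client.audio.transcriptions.create(model="whisper-1", file=audio)

print(result.text)  # the transcribed voicemail
```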

Advantages:

  • Quick integration: You can start almost immediately with just an API key.
  • Low maintenance: The service provider handles updates and infrastructure.

Disadvantages:

  • Limited control: You have no visibility into or influence over the underlying processes.
  • No customization: There is little room for tailoring the service to your specific needs.
  • Privacy concerns: Your data is processed on third-party servers, raising potential privacy issues.

On-premise AI hosting

On-premise hosting allows you to run AI models on your own infrastructure, giving you complete control over data privacy and processing.

Advantages:

  • Enhanced privacy: You retain full control over your data, ensuring compliance with privacy regulations.
  • Complete data management: You oversee the entire lifecycle of your data, from collection to processing.
  • Guaranteed commitments: You can confidently make specific promises to your customers about how their data is handled.

Disadvantages:

  • Older technology: On-premise solutions may not always have the latest AI advancements.
  • Higher initial costs: You’ll need to invest in hardware, development tools, and ongoing maintenance.

Choosing between SaaS and on-premise depends on your specific needs regarding privacy, control, and costs. For us, privacy was the deciding factor, leading us to select an on-premise solution.

Our on-premise transcription service: a closer look

We implemented our transcription service using Holodeck, our Kubernetes-based microservice platform. This setup allowed us to quickly build and test the new service. We chose to incorporate Whisper from OpenAI due to its strong performance, ease of use, and MIT license.
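
For reference, running the base model locally is similarly compact. A minimal sketch, assuming the openai-whisper package and ffmpeg are installed (the file name is illustrative):

```python
# Minimal local-transcription sketch with the base OpenAI Whisper model.
import whisper

model = whisper.load_model("large-v3")  # downloads the weights on first use
result = model.transcribe("voicemail.wav")
print(result["text"])
```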

However, in practice, we discovered that the base OpenAI Whisper model wasn’t fast enough to meet our needs. To process around 45 minutes of voicemail per minute, the original model required significant GPU resources, leading us to explore faster alternatives.

We found success by transitioning to Faster-Whisper, which provided a 5x speed improvement over the original model. By converting Whisper-large-v3 to a CTranslate2 model, we achieved our performance goals without needing an extensive GPU farm.
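
As a rough sketch (not our exact service code), the switch looks like this. The model is converted ahead of time with CTranslate2's converter, and the output directory name here is illustrative:

```python
# Faster-Whisper sketch. Convert the model once beforehand, e.g.:
#   ct2-transformers-converter --model openai/whisper-large-v3 \
#       --output_dir whisper-large-v3-ct2 --quantization float16
from faster_whisper import WhisperModel

model = WhisperModel("whisper-large-v3-ct2", device="cuda", compute_type="float16")

segments, info = model.transcribe("voicemail.wav")  # info.language = detected language
print(" ".join(segment.text.strip() for segment in segments))
```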

Optimizing the model deployment

To avoid frequent container rebuilds when updating our microservice, we separated the model from the service code and store it in a dedicated dataset repository. This way we can roll out a new model through the dataset repository without constantly rebuilding large containers.

Additionally, we opted for Blob storage over Git LFS to version our model artifacts. Given its cost-effectiveness and how infrequently the model is updated, Blob storage proved the more practical solution.
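
As an illustration of the pattern (the bucket, key, and paths below are hypothetical, not our actual setup), a startup hook can pull a pinned model version from blob storage before the service begins accepting work:

```python
# Hypothetical startup hook: fetch a pinned model version from
# S3-compatible blob storage; all names and paths are illustrative.
import boto3

MODEL_BUCKET = "example-transcriber-models"            # hypothetical bucket
MODEL_KEY = "whisper-large-v3-ct2/2024-09-01.tar.gz"   # pinned model version

s3 = boto3.client("s3")
s3.download_file(MODEL_BUCKET, MODEL_KEY, "/models/model.tar.gz")
# unpack the archive and point the transcriber at it before serving traffic
```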

Although we didn’t train the model ourselves (OpenAI handled that), there are still opportunities for improvement. One potential enhancement is to fine-tune the model with Hugging Face’s transformers library. However, we found that the cost of allocating GPUs for minor Word Error Rate (WER) improvements didn’t justify the investment.

Another potential optimization is refining our tokenizer to give more weight to proper names in voicemail messages. This could help the model more accurately identify names, rather than making incorrect guesses.
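
We haven’t built this yet, and it isn’t the tokenizer refinement itself, but a related decoder-level trick is biasing Whisper with an initial prompt containing expected names. A sketch with faster-whisper (the names are made up, and could for instance come from the recipient’s address book):

```python
# Decoder-biasing sketch: initial_prompt nudges Whisper toward the
# expected spellings of proper names; the names here are hypothetical.
from faster_whisper import WhisperModel

model = WhisperModel("whisper-large-v3-ct2", device="cuda", compute_type="float16")

prompt = "Names that may occur: Kaweh, Janneke, Sjeng."
segments, _ = model.transcribe("voicemail.wav", initial_prompt=prompt)
print(" ".join(segment.text.strip() for segment in segments))
```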

Word Error Rate (WER): measuring speech recognition accuracy

Ever been frustrated when your phone’s voice assistant misunderstands you? The accuracy of speech recognition is measured using Word Error Rate (WER), which calculates the percentage of words that were incorrectly transcribed.

WER is calculated based on:

  • Substitutions (S): Words that were replaced incorrectly (e.g., “beach” instead of “peach”)
  • Deletions (D): Words that were omitted from the transcription
  • Insertions (I): Extra words that were added but not spoken
  • Correct Words (C): Words that were transcribed accurately
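
Combining these counts, with N = S + D + C (the number of words in the reference transcript), the formula is WER = (S + D + I) / N. A minimal sketch of the computation in Python, using word-level edit distance:

```python
# Minimal WER: word-level edit distance divided by the reference length.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edits to turn the first i reference words
    # into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(substitution, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("call me about the peach order", "call me about the beach order"))  # ≈ 0.17
```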

WER provides a general sense of a model’s accuracy, but it’s not perfect. It treats all mistakes equally, whether they’re minor or significant, and doesn’t account for regional accents or dialects. Improving WER remains a goal as we continue refining our service.

Running the AI model on Kubernetes

Our AI model runs on AWS Kubernetes nodes with NVIDIA T4 GPUs, optimized for AI inferencing. To ensure these nodes are dedicated to transcription tasks, we reserved them solely for transcriber workloads.

We use NVIDIA’s k8s-device-plugin to expose the GPUs to the pods, which also detects the number of GPUs available and communicates this information to the Kubernetes scheduler.
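
As a hedged illustration of how that fits together (not our actual manifests; the container image and node label are hypothetical), a transcriber pod claims a GPU through the nvidia.com/gpu resource the plugin advertises:

```python
# Sketch of a GPU-claiming pod via the official kubernetes Python client;
# the container image and node label are illustrative placeholders.
from kubernetes import client, config

config.load_kube_config()

pod = client.V1Pod(
    api_version="v1",
    kind="Pod",
    metadata=client.V1ObjectMeta(name="transcriber"),
    spec=client.V1PodSpec(
        containers=[client.V1Container(
            name="transcriber",
            image="registry.example.com/transcriber:latest",  # hypothetical
            resources=client.V1ResourceRequirements(
                limits={"nvidia.com/gpu": "1"}  # one GPU per worker pod
            ),
        )],
        node_selector={"workload": "transcriber"},  # keep to reserved GPU nodes
    ),
)
client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```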

To optimize performance, we plan to implement a Horizontal Pod Autoscaler that adjusts the number of worker pods based on the queue size of pending voicemail messages. The cluster-autoscaler service will then scale the number of nodes accordingly, ensuring efficient use of resources.
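
A sketch of what that plan could look like with the kubernetes Python client, assuming an external-metrics adapter exposes the voicemail queue depth (the metric name and targets are hypothetical):

```python
# Planned queue-driven autoscaling sketch; the external metric
# "voicemail_queue_depth" is a hypothetical name served by an adapter.
from kubernetes import client, config

config.load_kube_config()

hpa = client.V2HorizontalPodAutoscaler(
    api_version="autoscaling/v2",
    kind="HorizontalPodAutoscaler",
    metadata=client.V1ObjectMeta(name="transcriber"),
    spec=client.V2HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V2CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="transcriber"),
        min_replicas=1,
        max_replicas=8,  # bounded by the reserved GPU nodes
        metrics=[client.V2MetricSpec(
            type="External",
            external=client.V2ExternalMetricSource(
                metric=client.V2MetricIdentifier(name="voicemail_queue_depth"),
                # aim for at most ~10 queued voicemails per worker pod
                target=client.V2MetricTarget(type="AverageValue", average_value="10"),
            ),
        )],
    ),
)
client.AutoscalingV2Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="default", body=hpa)
```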

Cost comparison: on-premise vs. hosted AI solutions

Currently, we are transcribing 65,800 minutes of voicemail per month, with server costs totaling $960.68, which works out to roughly $0.015 per transcribed minute.

While the OpenAI API offers the lowest cost per minute, our on-premise solution still provides a competitive alternative when considering total capacity. However, if you can’t fully utilize on-premise capacity, it can become one of the most expensive options.

Real-world implementation always brings new insights, and our journey with on-premise AI was no exception. Here are a few key learnings:

  • Excess capacity: Services requiring near real-time responses must keep headroom for spikes in demand, which leaves servers idle during quieter periods. To minimize idle time, we’re exploring ways to schedule workloads during off-peak hours or scale back nodes when demand is low.
  • Dialects are difficult: Our average Word Error Rate looks reasonable, but it hides weak performance on specific dialects. For example, our model had difficulty understanding the Limburgish dialect spoken in the Netherlands, leading to less accurate transcriptions.

Conclusion: is on-premise AI worth it?

Given the choice between the OpenAI API and our on-premise solution, the API offers 2.4x lower running costs. However, as developers and entrepreneurs shaping the future of the internet, we have a responsibility to consider the broader implications of our technology choices. Supporting the growth of large AI monopolies may not be in the best interests of our users.

For us, on-premise AI is worth the additional effort and cost because it enables us to prioritize privacy without sacrificing user convenience. While it may have been cheaper to use a hosted API, the long-term benefits of safeguarding user data make on-premise the right choice for our company.

As an independent company facing competitive pressures, it’s tempting to adopt the quickest and cheapest solutions available. However, I urge other companies to consider on-premise AI options that prioritize privacy and provide better long-term outcomes for users. By making these choices today, we can create a more secure and equitable future for all.
