Sunsetting Serverless GPUs
February 1, 2024
Hello everyone, today we’re announcing the sunsetting of the Banana Serverless GPU platform.
On March 31st, two months from now, Banana infrastructure will be shut down at noon PST. Please ensure that your GPU services are migrated to a new provider by then.
Later in this article, I’ll provide a guide for a clean migration.
We wish we could have made it work.
With the advancement of AI, we’re all in the most exciting times of our lives, working with technology that will change the world. It has been an absolute thrill to be a part of it.
Serverless GPUs are inevitable. Given there are workloads that must be run on GPUs, there will be a platform orchestrating those workloads.
The outstanding questions, then, are:
- is it reliable enough?
- is it affordable enough?
- is it fast enough?
- is it easy enough?
In 2022-2023, we saw a taste of product-market fit, and held onto it desperately, afraid to let it slip away. I called it “promise market fit”: the idea that users will absolutely pay for a product once a certain spec is hit. Reliable, cost-effective, fast, and easy. Simple spec.
Unfortunately, the realities of business have reared their head. Given our current runway, traction, retention, shifting AI macro trends, supply-constrained GPU markets, and a deeper understanding of the engineering required, we’ve realized that we do not have the time and resources to hit that spec.
I’d like to write up a much more detailed blog about these business dynamics, but for now, my focus is dedicated to leading Banana through a successful pivot.
Migration.
Yes, it’s a hassle you’d rather avoid.
We’re software engineers too, and we know how annoying it can be to have to spend time reacting to vendor changes, especially a vendor as fundamental as compute. As a result, we delayed this decision until it was absolutely obvious it needed to happen.
Now, you’re looking for alternatives.
Life hack: Can it just be an API?
Often, users on Banana would deploy base models such as Whisper or Stable Diffusion straight from Hugging Face. It’s great to feel a sense of control, but in many cases there are model-as-an-API providers serving those same models in highly optimized, multi-tenant environments, which allows them to be significantly faster and cheaper than running your own deployment on Banana.
If you don’t have a good reason for hosting custom code, such as having a nonstandard finetune or special pre/post processing logic, you may be pleasantly surprised by the quality of managed APIs. In this case, check out:
- OpenAI - an obvious LLM provider, but many don’t know that they have an API for Whisper too!
- Replicate - A well-loved API provider serving the best models across LLMs, Image, Audio, and more. They quickly publish the newest models as APIs, and in some cases, you can swap in your own finetunes for known architectures.
- Anyscale Endpoints, Together.ai, Fireworks.ai - all incredibly cost competitive and fast LLM APIs
If custom code is the way:
Thankfully, products within the “Serverless GPU” market are all subject to the same infra constraints, so code running on one provider looks structurally identical to code on another.
All providers have had to:
- Allow arbitrary container environments. Within the container:
  - Run an init function to “warm up” heavy resources on boot
  - Run an inference/handler function that uses the prewarmed resources at request time
- Allow simple HTTP POST requests, with arbitrary JSON in and out.
- Provide a public endpoint to call your server, which load balances.
- Horizontally autoscale containers, from zero
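That shared shape can be sketched in a few lines of Python. This is an illustrative sketch, not any provider’s actual SDK: `init`, `handler`, and the `_model` lambda are hypothetical stand-ins for loading real weights onto a GPU and running inference.

```python
import json

_model = None  # hypothetical prewarmed resource, loaded once per container


def init():
    """Runs once on container boot: load weights, move the model to the GPU."""
    global _model
    # Stand-in for e.g. loading a Whisper or Stable Diffusion checkpoint.
    _model = lambda prompt: f"output for: {prompt}"


def handler(request_body: str) -> str:
    """Runs per request: arbitrary JSON in, arbitrary JSON out,
    reusing the resources that init() prewarmed."""
    payload = json.loads(request_body)
    result = _model(payload["prompt"])
    return json.dumps({"result": result})


init()
print(handler(json.dumps({"prompt": "a banana in space"})))
```

Migrating between providers is mostly a matter of re-wrapping these two functions in the new provider’s decorators or base classes; the init/handler split itself carries over unchanged.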
Recommended Serverless GPU providers:
- Runpod Serverless - The easiest, most “banana-like” experience. You’d need to change very little to move to them, and for most projects the migration can be done in a day. It’s a Python HTTP server, in a container, autoscaled from zero behind a load balancer, with incredibly competitive prices too. Bonus feature: they also have an on-demand VM cloud.
- Replicate - In addition to their APIs, Replicate also supports custom deployments via Cog.
- Modal - Modal is a python/data platform that’s recently become adored by its users for GPU hosting. It takes heavier lifting to move to Modal from Banana, because there is very much a “Modal way” of doing things through their more opinionated SDK/API, but you’re rewarded with a high replica ceiling, fast cold boots, and fantastic dashboards. If I were to choose any product on this list, it’d be Modal, but you will need to put some effort in.
- Sagemaker - If you’re tired of running on early stage startups and just want some damn stability, go to the tried and true provider: AWS. Despite me running a serverless GPU product, I’m still not totally sure what Sagemaker is… But hey, at least it’s stable!
If you’re ambitious and want to hand-roll infra, check out these Open Source projects:
- KServe - A system built on Kubernetes to run serverless inference workloads. This is the most “banana-like” open source equivalent. Pairs nicely with Coreweave (closed beta) as the VM backend.
- Ray - a framework for distributed compute, built with AI in mind. Built by the team at Anyscale.
Or, if you want to simply stand up one or more always-on VMs, check out our friends at:
- Shadeform.ai - an on-demand cloud marketplace, allowing you to see machine availability across a dozen providers, and launch one as a VM.
- Brev.dev - interactive GPUs for development, great for training, but they also offer HTTPS on exposed ports, so it’s a secure way to serve as well.
The ultimate “quick and dirty” way to get a functional inference endpoint would be to:
- take your existing Banana project
- build the docker image
- run the built image on a Shadeform or Brev instance with the Potassium port exposed
- configure your client-side HTTP client or Banana SDK to point at that instance’s IP:port

To be clear, this is not recommended. An always-on instance will cost significant money, it is not autoscaled, and the container can crash and need manual intervention. But if that’s your vibe, it gets the job done.
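The client side of that last step is just an HTTP POST with JSON. The sketch below is self-contained: the local `http.server` thread is a stand-in for the container you’d actually run on a Shadeform or Brev instance, and the echo logic inside it is hypothetical; in practice you’d replace `endpoint` with your instance’s public `IP:port`.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen


class StubContainer(BaseHTTPRequestHandler):
    """Stand-in for the deployed inference container: JSON in, JSON out."""

    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        out = json.dumps({"result": f"echo: {body['prompt']}"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(out)))
        self.end_headers()
        self.wfile.write(out)

    def log_message(self, *args):  # silence per-request logging
        pass


# Run the stub server locally so the client snippet has something to call.
server = HTTPServer(("127.0.0.1", 0), StubContainer)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: in a real migration, swap this for http://<instance-ip>:<port>/
endpoint = f"http://127.0.0.1:{server.server_port}/"
req = Request(
    endpoint,
    data=json.dumps({"prompt": "hello"}).encode(),
    headers={"Content-Type": "application/json"},
)
response = json.loads(urlopen(req).read())
server.shutdown()
print(response)
```

Because every provider in this post speaks the same plain JSON-over-POST dialect, this client code changes only in its `endpoint` (and any auth header a provider requires) as you move between hosts.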
Final Notes
We’ll try to be responsive in our Discord for quick questions. We cannot promise any level of support beyond pointing you in the right direction, as we’re a small team with many customers, and need to focus on what’s next for us.
Once you’ve migrated, please contact me at erik@banana.dev and we’ll cash out any balance you had remaining in your Banana account (excluding free credit deals).
I am deeply appreciative to you for choosing Banana as a provider, and I wish you the best of luck.
Godspeed, programmers
- Erik Dunteman, Banana cofounder