Changelog #036
October 21, 2023
Follow along on Twitter for detailed updates and improvements made to Banana.
Server Prewarming
Starting with Potassium version >= 0.3.0, servers on Banana come with a built-in endpoint to prewarm the server in advance of latency-critical inference calls.
This is great for applications where you can use an in-app event to predict upcoming inferences on Banana, such as a user login or the completion of a prior inference.
Prewarming is not required, of course. If your GPUs are cold at the time of inference, they'll simply cold boot before handling the job and returning.
To enable it on your server, set the potassium version in your requirements.txt:
potassium>=0.3.0
and push to main (or run banana deploy) to build and deploy.
To call it:
HTTP Client
You can call it with a GET or POST request to your_url.run.banana.dev/_k/warmup. This will return a 200 when the model on the GPU is warm and ready to rock.
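As a sketch, calling the endpoint from Python using only the standard library might look like this (the helper names and the example URL are illustrative, not part of Banana's API; substitute your own deployment URL):

```python
from urllib.request import urlopen

def warmup_url(base_url):
    # Build the built-in prewarm endpoint path from a deployment URL.
    return base_url.rstrip("/") + "/_k/warmup"

def prewarm(base_url, timeout=30):
    # A 200 response means the model on the GPU is warm and ready.
    with urlopen(warmup_url(base_url), timeout=timeout) as resp:
        return resp.status == 200
```

For example, `prewarm("https://your_url.run.banana.dev")` returns `True` once the server is warm.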
Banana Client SDKs:
The Python and Node SDKs also expose prewarming as a .warmup() method on the client. Upgrade your SDK packages to the latest releases and call it like this:
from banana_dev import Client

# Point the client at your deployed model
my_model = Client(url, api_key)
# Spin up the GPU ahead of your inference calls
my_model.warmup()
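Since prewarming is typically triggered by an app event like a login, you may want to tolerate a transient failure rather than block the user. A minimal retry wrapper, assuming `warmup_fn` is a zero-argument callable such as `my_model.warmup` (the `prewarm_with_retry` helper is hypothetical, not part of the SDK):

```python
import time

def prewarm_with_retry(warmup_fn, attempts=3, delay_s=2.0):
    # Call warmup_fn until one call succeeds, sleeping between retries.
    # Returns True on success, False if every attempt raised.
    for attempt in range(attempts):
        try:
            warmup_fn()
            return True
        except Exception:
            if attempt < attempts - 1:
                time.sleep(delay_s)
    return False
```

You might fire this in a background thread on user login so the GPU is warm by the time the first real inference arrives.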
Enjoy!
If you have any feature suggestions, improvements, or bug reports, send us a message or let us know in #support or #feature-requests on Discord.