Manually managing strings can be chaotic. Don’t worry, f-strings are here to help.
Building a Vision API with Magma8B
Vision models describe what’s in an image, but they can’t handle spatial references. Point at an object and ask “What color is this car?” and the model doesn’t know what you’re talking about. In this post we’ll learn about Set-of-Mark prompting and how vision models can see what you’re seeing.
Building a Production Ready Text-to-Speech API
Set yourself apart from other MLEs by learning how to work with audio and serve Text-to-Speech models.
Your AI Agents Are Racing: A Guide to Multi-Agent Concurrency
Many modern APIs (like AsyncOpenAI) have built-in concurrent execution features, but understanding the underlying concepts lets you add your own concurrency controls. You’re not reinventing the wheel; you’re adapting it to your use case.
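The kind of concurrency control the teaser hints at can be sketched with an `asyncio.Semaphore` capping how many calls run at once. This is a minimal illustration, not the post’s actual code: `call_model` is a hypothetical stand-in for a real async API call (e.g. an AsyncOpenAI completion), with `asyncio.sleep` simulating network latency.

```python
import asyncio

async def call_model(prompt: str, sem: asyncio.Semaphore, stats: dict) -> str:
    """Hypothetical stand-in for an async LLM API call."""
    async with sem:                       # blocks once max_concurrent calls are in flight
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        await asyncio.sleep(0.01)         # simulate network latency
        stats["in_flight"] -= 1
        return f"response to {prompt!r}"

async def run_agents(prompts, max_concurrent=3):
    sem = asyncio.Semaphore(max_concurrent)
    stats = {"in_flight": 0, "peak": 0}
    results = await asyncio.gather(*(call_model(p, sem, stats) for p in prompts))
    return results, stats["peak"]

results, peak = asyncio.run(run_agents([f"task {i}" for i in range(10)]))
```

All ten tasks still complete, but the semaphore guarantees no more than three are in flight at any moment.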
Random Sampling Is Sabotaging Your Models!
Did you know answer diversity can help detect hallucinations? Ask an LLM “what’s the capital of France” five times and you’ll get some variation of “Paris” over and over again. But ask that same LLM about the “boiling point of a dragon’s scale” and you’ll get five different made-up answers.
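The idea in this teaser can be sketched in a few lines: sample the model several times and flag prompts whose answers disagree too much. This is an assumed, simplified version of the technique, with hardcoded lists standing in for real sampled completions and naive lowercase normalization.

```python
def diversity_score(answers: list[str]) -> float:
    """Fraction of distinct (case-normalized) answers; 0.0 means total agreement."""
    normalized = [a.strip().lower() for a in answers]
    return (len(set(normalized)) - 1) / max(len(normalized) - 1, 1)

def looks_hallucinated(answers: list[str], threshold: float = 0.5) -> bool:
    """Flag a prompt as suspect when repeated samples disagree too much."""
    return diversity_score(answers) >= threshold

# Stand-ins for five sampled completions per prompt:
capital_answers = ["Paris", "paris", "Paris", "PARIS", "Paris"]
dragon_answers = [
    "3,000 degrees", "It depends on the dragon", "Dragons aren't real",
    "500 kelvin", "Over 9000 degrees",
]
```

Here `diversity_score(capital_answers)` is 0.0 (total agreement) while the dragon answers are all distinct, so only the dragon prompt gets flagged.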
Self-Hosting LLMs with vLLM
Get ready to save some money. In this post, you’ll learn how to set up your own LLM server using vLLM, choose the right models, and build an architecture that fits your use case.
Testing Your LLM Applications (Without Going Broke)
Traditional software is deterministic: same input, same output, every time. LLMs? Non-deterministic by design: same input, different output. Even with temperature=0, you get variations. There are still ways to test your code reliably, though, and in this post you’ll learn them.
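One common way around non-determinism (a sketch under my own assumptions, not necessarily the post’s approach) is to assert on properties of the output rather than exact strings. The `fake_llm_summary` below is a hypothetical stand-in for a model reply:

```python
def check_summary(summary: str, source: str) -> bool:
    """Property-based checks: assert invariants, not exact wording,
    since the same prompt can yield different phrasings each run."""
    return (
        0 < len(summary) < len(source)   # non-empty and shorter than the input
        and summary.strip() != ""        # not just whitespace
    )

source = (
    "LLMs are non-deterministic, so tests should check properties of the "
    "output rather than compare against one exact expected string."
)
fake_llm_summary = "Tests for LLM output should check properties."  # stand-in reply
```

Any reworded summary of roughly the right shape passes, while an empty reply fails, so the test survives run-to-run variation.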
Stop Guessing What Your LLM Returns
If you don’t know what datatypes your LLM function returns, you’re basically playing Russian roulette at runtime.
Your LLM Works in a Notebook. Now What?
Right now, there’s probably a data scientist waking up to a $3,000 OpenAI bill because a bot found their exposed API key and has been making calls to GPT-4 for 12 hours straight.
When State Space Models Learned to See Globally
Mamba promised linear time complexity. Vision benchmarks said “prove it”, and pure Mamba models fumbled badly. Transformers kept winning until MambaVision showed up and rewrote the rules entirely.
