The post discusses scaling Ollama, a wrapper around llama.cpp for local inference, from local development to a cloud environment. It covers the transition from a simple single-machine setup to a distributed cloud system, emphasizing the roles of serverless computing and WebAssembly in managing dependencies and scaling.
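
To make the local-to-cloud transition concrete, here is a minimal sketch of calling Ollama's HTTP API. It assumes a local Ollama daemon on the default port 11434 and a pulled model named "llama3" (both assumptions, not details from the post); pointing `OLLAMA_HOST` at a remote endpoint is the simplest version of the move the post describes.

```python
# Minimal sketch: one non-streaming generation request to an Ollama daemon.
# Assumes Ollama is running on the default port 11434 and that a model
# named "llama3" has been pulled; both are assumptions for illustration.
import json
import os
import urllib.request

# Swapping localhost for a cloud endpoint via OLLAMA_HOST is the simplest
# form of the local-to-cloud transition discussed in the post.
OLLAMA_HOST = os.environ.get("OLLAMA_HOST", "http://localhost:11434")


def generate(prompt: str, model: str = "llama3") -> str:
    """Send a single generation request and return the full response text."""
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # one JSON body instead of a token stream
    }).encode("utf-8")
    req = urllib.request.Request(
        f"{OLLAMA_HOST}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    print(generate("Why move local inference to the cloud?"))
```

Because the client only depends on the host URL, the same code works against a laptop or a cloud deployment without modification, which is the property that makes the scaling path incremental.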

5 min read · From dev.to