Hacker News | segalord's comments

LiteLLM has like a thousand dependencies, so this is expected: https://github.com/BerriAI/litellm/blob/main/requirements.tx...


Oof. What exactly is supposed to be "lite" about this?


I love your poetry-on-a-phone project so muchhhh


this is every data hoarder's dream setup haha


> The thing that I appreciate most is that the company "walks the walk" in terms of distributing the benefits of AI. Cutting-edge models aren't reserved for some enterprise-grade tier with an annual agreement.

That is literally how OpenAI gets data for fine-tuning its models: by testing them on real users and letting those users supply the data and use cases. (Tool calling, computer use, thinking: all of these were championed by people outside, and OpenAI had the data.)


Man, Google's offerings are so inconsistent. Batch processing has been available on Vertex for a while now; I don't really get why they have two different offerings in Vertex and Gemini. Both are equally inaccessible.


It’s because Vertex is the “enterprise” offering that is HIPAA compliant, etc. That is why Vertex only has explicit prompt caching and not implicit, etc. Vertex usage is never used for training or model feedback, but Gemini API usage is. Basically, the Gemini API is Google’s way of being able to move fast like OpenAI and the other foundation model providers while still having an enterprise offering. Go check Anthropic’s documentation: they even say that if you have enterprise or regulatory needs, go use Bedrock or Vertex.
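For reference, explicit caching on Vertex looks roughly like the sketch below, assuming the @google/genai SDK; the project ID, region, model name, and TTL are all placeholders, not anything from this thread:

```ts
import { GoogleGenAI } from '@google/genai';

// Placeholder project/region/model; a sketch of explicit context
// caching on Vertex, not a drop-in config.
const ai = new GoogleGenAI({
  vertexai: true,
  project: 'my-gcp-project',
  location: 'us-central1',
});

// Cache a large shared context once, with an explicit TTL...
const cache = await ai.caches.create({
  model: 'gemini-2.0-flash-001',
  config: {
    contents: [{ role: 'user', parts: [{ text: '...large reference document...' }] }],
    ttl: '3600s',
  },
});

// ...then reference it from later requests instead of resending it.
const res = await ai.models.generateContent({
  model: 'gemini-2.0-flash-001',
  contents: 'Summarize the cached document.',
  config: { cachedContent: cache.name },
});
console.log(res.text);
```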


Vertex's offering of Gemini very much does implicit caching, and that has always been the case [1]. The recently added implicit cache hit discounts also apply on Vertex, as long as you don't use the `global` endpoint and instead hit one of the regional endpoints.

[1]: http://web.archive.org/web/20240517173258/https://cloud.goog..., "By default Google caches a customer's inputs and outputs for Gemini models to accelerate responses to subsequent prompts from the customer. Cached contents are stored for up to 24 hours."
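To make the endpoint point concrete, here's a rough sketch with the @google/genai SDK of pinning a regional Vertex endpoint so the implicit cache hit discounts can apply; the project ID and model are placeholders:

```ts
import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({
  vertexai: true,
  project: 'my-gcp-project',   // placeholder
  location: 'us-central1',     // a regional endpoint, not 'global'
});

// Implicit caching needs no setup: reusing a long shared prefix
// across requests is what makes cache hits likely.
const sharedPrefix = '...long system/context text reused across requests...';
const res = await ai.models.generateContent({
  model: 'gemini-2.0-flash-001',
  contents: sharedPrefix + '\n\nQuestion: what changed since yesterday?',
});
console.log(res.text);
```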


omg, I just realized this is not Vertex AI. facepalm


Xd, gotta love how your first question to test a model is about a “ruliad”. It’s not even in my iOS dictionary.


You’d still be reasoning using symbols; language is inherently an extension of symbols and memes. Think of a person representing a complex concept in their mind with a symbol and then using that symbol for further reasoning.


I use it exclusively for users on my personal website to chat with my data. I've given the setup tools with read access to my files and data.
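A minimal sketch of that kind of read-only file tool, with a hypothetical directory and handler name (the key part is refusing anything outside the allowlisted directory and never exposing writes):

```ts
import { promises as fs } from 'node:fs';
import path from 'node:path';

// Hypothetical location of the files the model may read.
const DATA_DIR = path.resolve('./public-data');

// Tool handler the LLM can call: read a file, but only inside DATA_DIR.
export async function readFileTool(relPath: string): Promise<string> {
  const resolved = path.resolve(DATA_DIR, relPath);
  // Block path traversal: the resolved path must stay inside DATA_DIR.
  if (!resolved.startsWith(DATA_DIR + path.sep)) {
    throw new Error('access denied: outside the allowed directory');
  }
  return fs.readFile(resolved, 'utf8');
}
```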


Is this not something that you can do with hosted LLMs like ChatGPT? If you expose your data, it should be able to access it, IIRC.


You can absolutely do that, but then you pay by the token instead of a big upfront hardware cost. It feels different, I suppose. Sunk cost and all that.


Noticing CF pushing devs to use DOs for everything over Workers these days. Even WebSocket connections on Workers get timed out after ~30s, and the recommended fix is to use DOs for them.


Durable Objects have always been the recommended way to do WebSocket connections on Cloudflare Workers? (As far as I remember, anyway.)

The original chat demo dates back to 2020, using DOs + websockets: https://github.com/cloudflare/workers-chat-demo
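A minimal sketch of that DO + WebSocket pattern using the newer WebSocket hibernation API (the `ChatRoom` name and the naive broadcast are illustrative, not taken from the demo; types come from @cloudflare/workers-types):

```ts
export class ChatRoom {
  constructor(private state: DurableObjectState) {}

  async fetch(request: Request): Promise<Response> {
    if (request.headers.get('Upgrade') !== 'websocket') {
      return new Response('expected websocket', { status: 426 });
    }
    const [client, server] = Object.values(new WebSocketPair());
    // Hand the socket to the runtime so the DO can hibernate between
    // messages instead of running into Worker execution-time limits.
    this.state.acceptWebSocket(server);
    return new Response(null, { status: 101, webSocket: client });
  }

  // Invoked by the runtime for each message on an accepted socket.
  async webSocketMessage(ws: WebSocket, message: string | ArrayBuffer) {
    for (const peer of this.state.getWebSockets()) {
      peer.send(message); // naive broadcast, including back to sender
    }
  }
}
```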


My man’s got a legal notice out against him

