cross-posted from: https://programming.dev/post/51407459

Check which models you can run and at how many tokens per second they would run… It has examples for many models and quantization levels. Huge resource!

  • LobsterJim@slrpnk.net
    13 days ago

    I know I will invite ire with this, but I think a self-hosted model is relatively acceptable. Drop the generative art and stick to things like code and evaluation, run on a model that is not served from a massive data center (plus you get the ability to train a model in a way you may find even more acceptable than the default), and most if not all of the questionable aspects of LLMs fade away.