cross-posted from: https://programming.dev/post/51407459

Check which models you can run and how many tokens per second you would get… It has examples of many models and quantization levels. Huge resource!

    • LobsterJim@slrpnk.net
      13 days ago

      I know I will invite ire with this, but I think a self-hosted model is relatively acceptable. Get rid of the generative art, stick to things like code and evaluation, and run the model somewhere other than a massive data center (with the option to train it in a way you may find even more acceptable than the default), and most if not all of the questionable aspects of LLMs fade away.