The problem is simple: consumer motherboards don’t have that many PCIe slots, and consumer CPUs don’t have enough lanes to run 3+ GPUs at full PCIe gen 3 or gen 4 speeds.

My idea is to buy 3-4 cheap computers, slot a GPU into each, and run them in tandem. I imagine this will require some sort of agent running on each node, with the nodes connected over a 10GbE network; I can get a 10GbE network running for this project.
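
To make the “agent on each node” idea concrete, here’s a toy sketch of naive pipeline parallelism over plain TCP. Everything in it is made up for illustration (the port, the 4096-wide fake “layer”, the use of numpy); a real setup would split actual model layers and use a proper inference framework rather than hand-rolled sockets.

```python
import pickle
import socket
import struct

import numpy as np


def send_msg(sock: socket.socket, obj) -> None:
    """Length-prefix a pickled object so the receiver knows how much to read."""
    payload = pickle.dumps(obj)
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def recv_msg(sock: socket.socket):
    """Read the 4-byte length header, then exactly that many payload bytes."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return pickle.loads(recv_exact(sock, length))


def run_agent(port: int, hidden: int = 4096) -> None:
    """One agent per box: hold this node's slice of the model, transform activations."""
    rng = np.random.default_rng(port)  # stand-in weights, seeded per node
    weight = rng.standard_normal((hidden, hidden)).astype(np.float32) * 0.01
    with socket.create_server(("0.0.0.0", port)) as srv:
        while True:
            conn, _ = srv.accept()
            with conn:
                activations = recv_msg(conn)  # receive from the previous stage
                send_msg(conn, np.tanh(activations @ weight))  # apply fake layer


def forward_through_cluster(x: np.ndarray, nodes: list[tuple[str, int]]) -> np.ndarray:
    """Driver: push activations through each node's layer slice, in order."""
    for host, port in nodes:
        with socket.create_connection((host, port)) as sock:
            send_msg(sock, x)
            x = recv_msg(sock)
    return x
```

You’d run `run_agent(50051)` on each box, then call `forward_through_cluster(...)` from a head node with each node’s address (ports and IPs hypothetical). The sketch also shows the catch with this whole plan: every forward pass ships activations across the network once per node, so interconnect latency matters even on 10GbE.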

Does Ollama or any other local AI project support this? A server motherboard and CPU would get expensive very quickly, so this would be a great alternative.

Thanks

  • @[email protected]
    -15 points · 2 months ago

    Wow, so you want to use inefficient models super cheap. I guarantee nobody has ever thought of this before. Good move coming to Lemmy for tips on how to do so. I bet you’re the next Sam Altman 🤣

    • @[email protected] (OP)
      4 points · 2 months ago

      I don’t understand your point, but I was going to use 4 GPUs (something like used 3090s once they get cheaper, or Arc B580s) to run smaller models like Mistral Small.
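
      For rough sizing, a back-of-envelope estimate (the parameter count and quantization level below are assumptions, not measurements): weights take roughly parameters × bytes per weight.

      ```python
      # Weights-only VRAM estimate; KV cache and runtime overhead come on top.
      # The parameter counts and bit widths below are illustrative assumptions.
      def weight_vram_gb(params_billions: float, bits_per_weight: float) -> float:
          return params_billions * bits_per_weight / 8  # billions of params -> GB

      print(weight_vram_gb(22, 4))   # Mistral Small (~22B) at 4-bit: ~11 GB
      print(weight_vram_gb(22, 16))  # same model at FP16: ~44 GB
      ```

      By that math a 4-bit Mistral Small fits on a single 24 GB card, so the extra GPUs would mostly buy headroom for longer context, batching, or bigger models.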