

Hmm…it runs on a 1060…it’s a MoE not a dense. 24B is even lighter. Worth a shot.
https://www.youtube.com/watch?v=8F_5pdcD3HY
Else, if you’re looking for a coding model (??) something like Sara or fara might suit


I mean…that entirely depends on your use case - and I hate saying that. For me and what I do, the Qwen SLMs (esp Qwen3-4B 2507 instruct and Qwen3.5-2B) are exceptional. But I’m not trying to do Claude at home.
Best bet? Spend $10 on OpenRouter and try different models. In a head to head with ChatGPT 5.4 mini (excellent for coding BTW), I’ve found Qwen 3.5 27B more than able to hold its own for coding tasks…IF you narrowly gate it/confine it. The last batch of Qwens really are something. Dunno about the 3.7 series.
Having said ALL that, I’m really tempted to go back in time and code myself a deterministic expert system, with a user-updatable knowledge cascade, tool calling and a minimal amount of Markov chain word garnish for flavour. I think we used to just call that “a program” lol.
Really tempted actually, because if 50% of LLM use cases are basically Super Google but not shit…well, I can make that myself. I just need to point my autism at it.
PS: this might help


Numbers about 3-4x. The P100 is near 800 GB/s. The 1080 is what… 192GB/s? Hell, even if it were double that, HBM2 simply has larger bandwidth. The 1080 was a gaming card; the P100 is a server / number cruncher.


Just for sake of completion
Pros
Mature project (around since the early 2000s)
Lightweight compared to Immich
Designed as a photo library first, not an AI platform
Albums, tags, metadata, permissions
Huge plugin ecosystem
Runs happily on modest hardware
Can manage very large collections
Doesn’t demand phone-app-centric workflows (though of course it has a phone to computer app / sync)
Cons
Feels more like a traditional photo archive than Google Photos
Mobile experience is functional rather than slick
No fancy AI search or face recognition by default (though you can add them easily enough)
UI is a bit “classic web”


Huh - cheaper than the P40s (though less VRAM) but larger bandwidth due to HBM2. Good looking out


Good tips - thanks!
PS: sad to report the 24GB Tesla P40s are now around $250 USD on eBay, so not quite as cheap as I remembered. P4s are still cheap, though frankly if you’re going to that end of town, a 1080 is about on par, less fussy and probably cheaper - it just won’t fit in a uSFF.


You probably could. A Tesla P4 or P40 (both old data centre cards) is more than up to the job. My Lenovo Tiny hosts a P4 (the card cost $100 on eBay; the Lenovo itself was $200ish) and runs Qwen3.5-35B-A3B at about 20 tok/s. Smaller models are even faster.
https://www.youtube.com/watch?v=8F_5pdcD3HY
If you’re not bound by the one liter shoebox design, then the P40 is still a great and inexpensive card.
I think I mentioned elsewhere but right now I’m trying to figure out if I can use a magic packet from the Raspberry Pi to wake up the Lenovo as needed rather than leaving it on all the time.
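For anyone curious what the magic packet idea looks like in practice, here is a minimal Python sketch that could run on the Pi. It assumes Wake-on-LAN is enabled in the Lenovo's firmware and on its wired NIC, and that both machines sit on the same LAN segment. The function names, the broadcast address and the UDP port 9 default are my own illustrative choices, not anything from a specific tool.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Return the 102-byte Wake-on-LAN payload for a MAC like 'aa:bb:cc:dd:ee:ff'."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"expected 12 hex digits, got {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    # A magic packet is six 0xFF bytes followed by the target MAC repeated 16 times.
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet over UDP (address and port are assumed defaults)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Broadcasting has to be enabled explicitly on the socket.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))
```

Calling `send_magic_packet("00:11:22:33:44:55")` with the Lenovo's real MAC in place of that placeholder would be the whole "wake up" step. Deciding when to send it (on an incoming request, on a schedule) is a separate problem.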


Agree. I know the Pi’s are out of favour these days…but they are a cool little machine. I got mine running DietPi and a bunch o crap (the usuals - JF, arr stack, pi hole, syncthing, yadda yadda) and running headless the footprint (power and memory wise) is tiny.
I joked about the 4xAA batteries thing but iirc, there is actually a Pi-HAT that creates a micro UPS that’ll run the pi for maybe three to five hours just on double A batteries.
Edit: yep
https://pimodules.com/product/ups-pico-hv4-0-advanced
or more sensibly


Agree. And re small models - very agree. In fact I made an ablated version of Qwen 3.5-2B for use with my pi, before thinking a bit harder and realising I can probably code something bespoke that doesn’t need a stochastic parrot as a squawk box at all.
https://huggingface.co/BobbyLLM/polaris-heretic-Q4_K_M-GGUF
Still, as an SLM, it’s perfectly cromulent and does well with tool calling etc which is what I wanted it for.


There’s an argument to be had regarding a MoE versus a small dense model. I guess it depends on what exactly you need doing with it. I would be tempted to run a smaller dense model (like a Qwen 3-14B or a Qwen 3.5 9B) since, at a reasonable quant, it might fit mostly or entirely on the GPU, thereby giving you excellent speeds.
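A rough way to sanity check the "will it fit on the GPU" question is a back-of-envelope sketch like the one below. Everything in it is an assumption for illustration: the nominal parameter counts (14B and 9B), the ~4.5 bits per weight for a 4-bit style quant, the 1.5 GB allowance for KV cache and runtime overhead, and the 8 GB / 24 GB card sizes. Real GGUF files and context lengths will shift the numbers.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantised weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits_on_gpu(params_billion: float, bits_per_weight: float,
                vram_gb: float, overhead_gb: float = 1.5) -> bool:
    """True if weights plus an assumed overhead (KV cache, buffers) fit in VRAM."""
    return weights_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


if __name__ == "__main__":
    # Nominal sizes and bits-per-weight are assumptions, not measured file sizes.
    for name, params in [("14B dense", 14.0), ("9B dense", 9.0)]:
        size = weights_gb(params, 4.5)
        for vram in (8.0, 24.0):
            verdict = "fits" if fits_on_gpu(params, 4.5, vram) else "spills to CPU"
            print(f"{name}: ~{size:.1f} GB weights on a {vram:.0f} GB card -> {verdict}")
```

Under those assumptions, a 14B model is borderline on an 8 GB card while a 9B model fits with room to spare, which is the trade-off being weighed against a MoE that offloads to system RAM.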
PS: I’m actually in the process of designing an expert system (not an LLM) for pretty much the task you described. The intention is that you would still interact with it like a large language model, but the actual brains underneath it would be something more traditional.


Yep. But that would be 100% CPU, 100% of the time? In real life, it’s probably closer to 2 W idle and maybe 5-7 W under typical load.
More interesting…I think that technically means you could make a “UPS” for it using what…4xAA batteries?
Oh man…that would be cool. Stupid but cool.
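Quick sanity check on the AA idea, as a sketch only. The cell figures are assumptions (NiMH AAs at a nominal 1.2 V and 2.0 Ah each), as is the 85% boost-converter efficiency. The load values are just the rough wattages mentioned above.

```python
def runtime_hours(cells: int, volts_per_cell: float, amp_hours: float,
                  load_w: float, efficiency: float = 0.85) -> float:
    """Estimated runtime in hours for a battery pack feeding a constant load."""
    # Usable energy in watt-hours after assumed converter losses.
    usable_wh = cells * volts_per_cell * amp_hours * efficiency
    return usable_wh / load_w


if __name__ == "__main__":
    # 4 x AA NiMH (assumed 1.2 V, 2.0 Ah each) at a few plausible Pi loads.
    for load in (2.0, 3.0, 5.0):
        hours = runtime_hours(4, 1.2, 2.0, load)
        print(f"{load:.0f} W load: roughly {hours:.1f} h")
```

So on those assumptions you are looking at a few hours at idle-ish loads, not days. Enough to ride out a short outage, which is about all a micro UPS needs to do.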


They were, I think. Or we were just younger.


Yeah, same. Though at 3-5W … it really is just a very rough guess. Lemme ShitGPT it. Oh, I was way off
A realistic Pi 4B-only estimate is about A$8–A$12 per year in electricity, assuming it is on 24/7 and used for Jellyfin streaming around 10–12 hours per week.
Pi 4B measurements are typically around 2.7–2.85 W at idle, about 5.1 W under moderate server load, and around 6.4 W under full CPU stress. Using Perth/WA’s Synergy Home Plan A1 energy charge of 32.3719 c/kWh, excluding the daily supply charge, that works out very cheaply because the device uses only about 25–36 kWh/year.
Scenario | Assumed usage | Annual energy | Approx. annual cost
Mostly idle | 3 W 24/7 | 26.3 kWh | A$8.51/year
Idle + 12h/wk Jellyfin | 2.7 W idle, 5.1 W streaming | 25.1 kWh | A$8.14/year
Heavier Jellyfin/server use | 2.7 W idle, 6.4 W streaming | 26.0 kWh | A$8.40/year
Conservative wall-power estimate | 4 W idle, 6.4 W streaming | 36.5 kWh | A$11.83/year
The bigger swing factor is storage, not the Pi. A USB SSD adds very little; a USB-powered 2.5" hard drive might add a few dollars per year; a powered 3.5" external drive left spinning 24/7 could push the total more into the A$15–A$30/year range.
So, for the Raspberry Pi 4B itself as a Jellyfin box: roughly A$10/year is a good mental estimate.
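If you want to plug in your own tariff or usage, the arithmetic behind those figures can be sketched in a few lines of Python. The wattages and the 32.3719 c/kWh rate are the ones quoted above. The function names, and the choice of 52 weeks and a 365-day year, are my own assumptions.

```python
HOURS_PER_YEAR = 24 * 365
TARIFF_AUD_PER_KWH = 0.323719  # energy charge quoted above, excluding supply charge


def annual_kwh(idle_w: float, load_w: float = 0.0,
               load_hours_per_week: float = 0.0) -> float:
    """Yearly energy use for a device that idles except for some weekly load hours."""
    load_hours = load_hours_per_week * 52
    idle_hours = HOURS_PER_YEAR - load_hours
    return (idle_w * idle_hours + load_w * load_hours) / 1000


def annual_cost(kwh: float, tariff: float = TARIFF_AUD_PER_KWH) -> float:
    """Yearly cost in AUD at the given per-kWh tariff."""
    return kwh * tariff


if __name__ == "__main__":
    scenarios = {
        "Mostly idle": annual_kwh(3.0),
        "Idle + 12h/wk Jellyfin": annual_kwh(2.7, 5.1, 12),
        "Heavier Jellyfin/server use": annual_kwh(2.7, 6.4, 12),
        "Conservative wall-power estimate": annual_kwh(4.0, 6.4, 12),
    }
    for name, kwh in scenarios.items():
        print(f"{name}: {kwh:.1f} kWh, A${annual_cost(kwh):.2f}/year")
```

Swap in your own rate for `TARIFF_AUD_PER_KWH` and the numbers follow. The storage drives mentioned above would just be extra watts added to the idle figure.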


I remember it being a touch more …analog…back in the day. ATDT commands and all.
But yeah, Win 3.11+ trumpet winsock and Free Agent were the shit. Rec.martial.arts was home back then (along with mIRC).
Lemmy reminds me a bit of the old Usenet fora.


Torrent cache? As in seedbox?


Used to last me 2-3 months… but my media library is more or less complete now, with little churn. Also, I don’t ever go above 1080p.
I need to check if Radarr / Sonarr works with straight torrents (it must do; I haven’t used them for ages / have been using 1337 manually, but I seem to recall torrents being a source).


Debatable :) Torrents rely on seeders. I’ve downloaded movies and TV shows >5 yrs since initial upload via Usenet. Yes, things expire there too (eventually), but when the getting is good, it’s uniformly good / fast.
OTOH, 1337 has been pretty decent to me of late.
It’s tricky. On one hand, Jellyfin and the arr stack are what got me into self hosting. OTOH…torrents are simpler - I can plug my external SSD directly into my router, which streams to NovaPlayer on any Android device - nothing else needed. Want a new show / movie? Grab the torrent, punt it across to the SSD via samba share. It auto populates.
https://github.com/nova-video-player/aos-AVP
It’s…simpler. Arguably more elegant / fewer moving parts.
Dunno.


Yarrr! But it really mostly is Yarr these days. So don’t go firing up Trumpet winsock to check Forte Agent :)


I was tempted to say $0, but then I thought harder about the problem.
Technically I do have ongoing costs
https://usenet-news.net/index1.php?url=home
Electricity (whatever tiny amount the Raspberry Pi sips). At a guess, maybe $50/yr.
So, amortised over time - very low but not zero. In theory, if I dropped Usenet, it would be even lower. And theoretically, I could run the pi off a single solar panel and a diy solar kit but I’m not busy pretending to be Robinson Crusoe just yet. Though… It might be a cool project.


Respectfully, that’s not really how local LLMs work.
A GGUF model sitting on my hard drive has no ability to “send content back home” any more than a PDF or a JPEG does. If you’re running something like llama.cpp or Ollama entirely locally, the model weights are just data files.
The real privacy concerns are cloud APIs, telemetry in front-ends, browser extensions, analytics, update services, or accidentally exposing a service to the public internet.
“Self-hosted AI” isn’t one thing. There’s a huge difference between:
Firewalling internet-facing services is good advice. Assuming every local model is secretly uploading prompts is not.