Another reason to self host your own AI

SuspiciousCarrot78@aussie.zone · 2 days ago

Respectfully, that’s not really how local LLMs work.

A GGUF model sitting on my hard drive has no ability to “send content back home” any more than a PDF or a JPEG does. If you’re running something like llama.cpp or Ollama entirely locally, the model weights are just data files.

The real privacy concerns are cloud APIs, telemetry in front-ends, browser extensions, analytics, update services, or accidentally exposing a service to the public internet.

“Self-hosted AI” isn’t one thing. There’s a huge difference between:

Running ChatGPT through an API
Running a commercial AI appliance
Running a local Qwen/Mistral/Llama model on your own hardware

Firewalling internet-facing services is good advice. Assuming every local model is secretly uploading prompts is not.
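
A quick way to sanity-check the "accidentally exposed" case is to see whether the service answers on anything other than loopback. This is a minimal sketch, not a security audit: `is_port_open` is a hypothetical helper, and it only tells you whether a TCP connection succeeds from the machine you run it on.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable: treat all of these as "not open".
        return False


# Example use (assumptions: 11434 is Ollama's default port, and
# 192.168.1.50 is a made-up LAN address for the box running it):
#   is_port_open("127.0.0.1", 11434)     -> expected True on the host itself
#   is_port_open("192.168.1.50", 11434)  -> True means other LAN machines can reach it
```

If the LAN address answers and you didn't intend that, the service is bound more widely than loopback. Whether it is reachable from the public internet depends on your router and firewall, which this check cannot see.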

SuspiciousCarrot78@aussie.zone · edit-2 2 days ago

Hmm…it runs on a 1060…it’s a MoE not a dense. 24B is even lighter. Worth a shot.

https://www.youtube.com/watch?v=8F_5pdcD3HY

Else, if you’re looking for a coding model (??) something like Sara or Fara might suit

https://huggingface.co/microsoft/Fara-7B

SuspiciousCarrot78@aussie.zone · edit-2 3 days ago

I mean…that entirely depends on your use case - and I hate saying that. For me and what I do, the Qwen SLMs (esp Qwen3-4B 2507 instruct and Qwen3.5-2B) are exceptional. But I’m not trying to do Claude at home.

Best bet? Spend $10 on OpenRouter and try different models. In a head-to-head with ChatGPT 5.4 mini (excellent for coding BTW), I’ve found Qwen 3.5 27B more than able to hold its own for coding tasks…IF you narrowly gate it/confine it. The last batch of Qwens really are something. Dunno about the 3.7 series.

Having said ALL that, I’m really tempted to go back in time and code myself a deterministic expert system, with a user-updatable knowledge cascade, tool calling and a minimal amount of Markov chain word garnish for flavour. I think we used to just call that “a program” lol.

Really tempted actually, because if 50% of LLM use cases are basically Super Google but not shit…well, I can make that myself. I just need to point my autism at it.

PS: this might help

https://www.youtube.com/watch?v=0AqpaFm11oI

SuspiciousCarrot78@aussie.zone · 4 days ago

The numbers are about 3-4x. The P100 is near 800 GB/s. The 1080 is what… 192 GB/s? Hell, even if it were double that, HBM2 simply has larger bandwidth. The 1080 was a gaming card; the P100 is a server / number cruncher.
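
The arithmetic behind that, as a small sketch. The bandwidth figures are the rough recollections quoted above, not spec-sheet values, and the idea that token generation speed tracks memory bandwidth is a rule of thumb rather than a guarantee. `bandwidth_ratio` is a hypothetical helper name.

```python
def bandwidth_ratio(fast_gb_s: float, slow_gb_s: float) -> float:
    """How many times more memory bandwidth the faster card has."""
    return fast_gb_s / slow_gb_s


# Figures as quoted in the comment (approximate, from memory).
P100_GB_S = 800.0
GTX_1080_GB_S = 192.0

# Using the quoted numbers directly.
print(round(bandwidth_ratio(P100_GB_S, GTX_1080_GB_S), 1))      # 4.2

# The "even if it were double that" case.
print(round(bandwidth_ratio(P100_GB_S, 2 * GTX_1080_GB_S), 1))  # 2.1
```

Either way the HBM2 card comes out ahead; only the size of the gap changes.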

SuspiciousCarrot78@aussie.zone · edit-2 4 days ago

Just for sake of completion

https://piwigo.org/

Pros

Mature project (around since the early 2000s)

Lightweight compared to Immich

Designed as a photo library first, not an AI platform

Albums, tags, metadata, permissions

Huge plugin ecosystem

Runs happily on modest hardware

Can manage very large collections

Doesn’t demand phone-app-centric workflows (though of course it has a phone to computer app / sync)

Cons

Feels more like a traditional photo archive than Google Photos

Mobile experience is functional rather than slick

No fancy AI search or face recognition by default (though it can be added easily enough)

UI is a bit “classic web”

SuspiciousCarrot78@aussie.zone · 4 days ago

Huh - cheaper than the P40s (though less VRAM) but larger bandwidth due to HBM2. Good looking out

SuspiciousCarrot78@aussie.zone · edit-2 5 days ago

Good tips - thanks!

PS: sad to report the 24GB Tesla P40s are now around $250 USD on eBay, so not quite as cheap as I remembered. P4s are still cheap, though frankly if you’re going that end of town, a 1080 is about on par, less fussy and probably cheaper - it just won’t fit in a uSFF.

SuspiciousCarrot78@aussie.zone · 5 days ago

You probably could. A Tesla P4 or P40 (old data centre cards) is more than up to the job. My Lenovo Tiny hosts a P4 (card cost $100 on eBay; the Lenovo itself was $200ish) and runs Qwen3.5-35B-A3B at about 20 tok/s. Smaller models are even faster.

https://www.youtube.com/watch?v=8F_5pdcD3HY

If you’re not bound by the one liter shoebox design, then the P40 is still a great and inexpensive card.

I think I mentioned elsewhere but right now I’m trying to figure out if I can use a magic packet from the Raspberry Pi to wake up the Lenovo as needed rather than leaving it on all the time.
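
For what it's worth, the magic packet itself is simple: 6 bytes of 0xFF followed by the target's MAC address repeated 16 times, usually sent as a UDP broadcast. A minimal sketch of sending one from the Pi follows. The function names are made up for illustration, port 9 and the broadcast address are common defaults rather than requirements, and Wake-on-LAN generally also has to be enabled on the target machine's firmware and network adapter.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet: 6 x 0xFF, then the MAC 16 times."""
    cleaned = mac.replace(":", "").replace("-", "")
    if len(cleaned) != 12:
        raise ValueError(f"expected a 12-hex-digit MAC address, got {mac!r}")
    mac_bytes = bytes.fromhex(cleaned)  # raises ValueError on non-hex input
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str,
                      broadcast: str = "255.255.255.255",
                      port: int = 9) -> None:
    """Broadcast the magic packet over UDP on the local network."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))


# Example (placeholder MAC address, replace with the Lenovo's):
#   send_magic_packet("aa:bb:cc:dd:ee:ff")
```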

SuspiciousCarrot78@aussie.zone · 5 days ago

Agree. I know the Pis are out of favour these days…but they are a cool little machine. I got mine running DietPi and a bunch o’ crap (the usuals - JF, arr stack, Pi-hole, Syncthing, yadda yadda) and, running headless, the footprint (power and memory wise) is tiny.

I joked about the 4xAA batteries thing but iirc, there is actually a Pi HAT that creates a micro UPS that’ll run the Pi for maybe three to five hours just on double A batteries.

Edit: yep

https://pimodules.com/product/ups-pico-hv4-0-advanced

or more sensibly

https://littlebirdelectronics.com.au/collections/raspberry-pi-power-hats/products/raspberry-pi-ups-hat

SuspiciousCarrot78@aussie.zone · edit-2 5 days ago

Agree. And re small models - very agree. In fact I made an ablated version of Qwen 3.5-2B for use with my Pi, before thinking a bit harder and realising I can probably code something bespoke that doesn’t need a stochastic parrot as a squawk box at all.

https://huggingface.co/BobbyLLM/polaris-heretic-Q4_K_M-GGUF

Still, as an SLM, it’s perfectly cromulent and does well with tool calling etc, which is what I wanted it for.

SuspiciousCarrot78@aussie.zone · 5 days ago

There’s an argument to be had regarding an MoE versus a small dense model. I guess it depends on what exactly you need doing with it. I would be tempted to run a smaller dense model (like a Qwen 3-14B or a Qwen 3.5 9B) since, at a reasonable quant, it might fit mostly or entirely on the GPU, thereby giving you excellent speeds.

PS: I’m actually in the process of designing an expert system (not an LLM) for pretty much the task you described. The intention is that you would still interact with it like a large language model, but the actual brains underneath it would be something more traditional.

SuspiciousCarrot78@aussie.zone · edit-2 5 days ago

Another reason to self host your own AI

SuspiciousCarrot78@aussie.zone · 5 days ago

Yep. But that would be 100% CPU, 100% of the time? In real life, it’s probably closer to 2 W idle and maybe 5-7 W under typical load.

More interesting…I think that technically means you could make a “UPS” for it using what…4xAA batteries?

Oh man…that would be cool. Stupid but cool.
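
A back-of-envelope sketch of how long that might last. Everything numeric here is an assumption for illustration: roughly 2.4 Wh per AA cell (about a 2000 mAh NiMH at 1.2 V), 85% conversion efficiency for whatever boosts the pack to 5 V, and the 2-3 W load range guessed above. `runtime_hours` is a hypothetical helper.

```python
def runtime_hours(cells: int, wh_per_cell: float, load_w: float,
                  efficiency: float = 0.85) -> float:
    """Estimated runtime: usable stored energy divided by the load."""
    return cells * wh_per_cell * efficiency / load_w


# Assumed values, not measurements.
CELLS = 4
WH_PER_CELL = 2.4

print(f"{runtime_hours(CELLS, WH_PER_CELL, load_w=3.0):.1f} h at 3 W")  # 2.7 h at 3 W
print(f"{runtime_hours(CELLS, WH_PER_CELL, load_w=2.0):.1f} h at 2 W")  # 4.1 h at 2 W
```

So under those assumptions it is a few hours, not days. Real cells and converters will vary.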

SuspiciousCarrot78@aussie.zone · 5 days ago

They were, I think. Or we were just younger.

SuspiciousCarrot78@aussie.zone · 5 days ago

Yeah, same. Though at 3-5 W … it really is just a very rough guess. Lemme ShitGPT it. Oh, I was way off

A realistic Pi 4B-only estimate is about A$8–A$12 per year in electricity, assuming it is on 24/7 and used for Jellyfin streaming around 10–12 hours per week.

Pi 4B measurements are typically around 2.7–2.85 W at idle, about 5.1 W under moderate server load, and around 6.4 W under full CPU stress. Using Perth/WA’s Synergy Home Plan A1 energy charge of 32.3719 c/kWh, excluding the daily supply charge, that works out very cheaply because the device uses only about 25–36 kWh/year.

| Scenario | Assumed usage | Annual energy | Approx. annual cost |
| --- | --- | --- | --- |
| Mostly idle | 3 W 24/7 | 26.3 kWh | A$8.51/year |
| Idle + 12h/wk Jellyfin | 2.7 W idle, 5.1 W streaming | 25.1 kWh | A$8.14/year |
| Heavier Jellyfin/server use | 2.7 W idle, 6.4 W streaming | 26.0 kWh | A$8.40/year |
| Conservative wall-power estimate | 4 W idle, 6.4 W streaming | 36.5 kWh | A$11.83/year |

The bigger swing factor is storage, not the Pi. A USB SSD adds very little; a USB-powered 2.5" hard drive might add a few dollars per year; a powered 3.5" external drive left spinning 24/7 could push the total more into the A$15–A$30/year range.

So, for the Raspberry Pi 4B itself as a Jellyfin box: roughly A$10/year is a good mental estimate.
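
The scenario figures above can be reproduced with a few lines of arithmetic. This sketch uses the quoted wattages and the quoted 32.3719 c/kWh energy charge. It assumes 52 weeks and 8,760 hours per year, applies the 12 h/week streaming figure to every streaming scenario, and ignores the daily supply charge, as the estimate does. The function names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365          # 8760
TARIFF_AUD_PER_KWH = 0.323719      # quoted energy charge, supply charge excluded


def annual_kwh(idle_w: float, active_w: float = 0.0,
               active_hours_per_week: float = 0.0) -> float:
    """Yearly energy use for a device that idles except for some active hours."""
    active_hours = active_hours_per_week * 52
    idle_hours = HOURS_PER_YEAR - active_hours
    return (idle_w * idle_hours + active_w * active_hours) / 1000


def annual_cost(kwh: float, tariff: float = TARIFF_AUD_PER_KWH) -> float:
    """Yearly cost in A$ at a flat per-kWh tariff."""
    return kwh * tariff


scenarios = {
    "Mostly idle": annual_kwh(3.0),
    "Idle + 12h/wk Jellyfin": annual_kwh(2.7, 5.1, 12),
    "Heavier Jellyfin/server use": annual_kwh(2.7, 6.4, 12),
    "Conservative wall-power estimate": annual_kwh(4.0, 6.4, 12),
}

for name, kwh in scenarios.items():
    print(f"{name}: {kwh:.1f} kWh, A${annual_cost(kwh):.2f}/year")
```

Run as written, this gives 26.3, 25.1, 26.0 and 36.5 kWh, and A$8.51, A$8.14, A$8.40 and A$11.83 per year, matching the scenario figures.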

SuspiciousCarrot78@aussie.zone · 5 days ago

I remember it being a touch more …analog…back in the day. ATDT commands and all.

But yeah, Win 3.11 + Trumpet Winsock and Free Agent were the shit. Rec.martial.arts was home back then (along with mIRC).

Lemmy reminds me a bit of the old Usenet fora.

SuspiciousCarrot78@aussie.zone · 5 days ago

Torrent cache? As in seedbox?

SuspiciousCarrot78@aussie.zone · 5 days ago

Used to last me 2-3 months… but my media library is more or less complete now, with little churn. Also, I don’t ever go above 1080p.

I need to check if Radarr / Sonarr work with straight torrents (they must do; I haven’t used them for ages / have been using 1337 manually, but I seem to recall torrents being a source).

SuspiciousCarrot78@aussie.zone · edit-2 5 days ago

Debatable :) Torrents rely on seeders. I’ve downloaded movies and TV shows >5 yrs since initial upload via Usenet. Yes, things expire there too (eventually), but when the getting is good, it’s uniformly good / fast.

OTOH, 1337 has been pretty decent to me of late.

It’s tricky. On one hand, Jellyfin and the arr stack are what got me into self hosting. OTOH…torrents are simpler - I can plug my external SSD directly into my router, which streams to NovaPlayer on any Android device - nothing else needed. Want a new show / movie? Grab the torrent, punt it across to the SSD via Samba share. It auto populates.

https://github.com/nova-video-player/aos-AVP

It’s…simpler. Arguably more elegant / fewer moving parts.

Dunno.

SuspiciousCarrot78@aussie.zone · edit-2 5 days ago

Yarrr! But it really mostly is Yarr these days. So don’t go firing up Trumpet Winsock to check Forte Agent :)

SuspiciousCarrot78@aussie.zone · edit-2 5 days ago

I was tempted to say $0, but then I thought harder about the problem.

Technically I do have ongoing costs

PAYG costs for Usenet-news (iirc, $22 USD for a 500GB block)

https://usenet-news.net/index1.php?url=home

News indexer (I think…$60 every 5 years?)

https://www.nzbgeek.info/

Electricity (whatever tiny amount the Raspberry Pi sips). At a guess, maybe $50/yr.

So, amortised over time - very low but not zero. In theory, if I dropped Usenet, it would be even lower. And theoretically, I could run the Pi off a single solar panel and a DIY solar kit but I’m not busy pretending to be Robinson Crusoe just yet. Though… It might be a cool project.