15 hours for what period of time? The article mentions they’d refill in two days…
I like the idea, but I really hate that they’ve hardcoded the provider.
Bluesky users will be able to opt into experiences that aren’t run by the company
Yea, no, the biggest server not showing federated content by default is just pseudo-federation - being able to say you have it while not really doing it.
Not for international (non-English) results.
skeptical that it’s technologically feasible to search through the entire training corpus, which is an absolutely enormous amount of data
Google, DuckDuckGo, Bing, etc. do it all the time.
The infringement should be in what’s generated, because the ingestion by itself also enables many legitimate, non-infringing uses: uses that don’t involve generating creative work at all, or where the creative input comes from the user.
I didn’t say anything about AIs being humans.
But AI isn’t all about generating creative works. It’s a store of information that I can query - a bit like searching Google, but one that understands semantics and is interactive. It can translate my own text for me - in which case all the creativity comes from me, and I use it just for its knowledge of language. Many people use it to generate boilerplate code, which is pretty generic and wouldn’t usually be subject to copyright.
That said, you can use a third-party service only for sending, and still receive mail on your self-hosted server.
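Roughly what that looks like from the sending side (just a sketch; the relay hostname and credentials are made up): your domain’s MX record keeps pointing at your own server, and only outbound mail is handed to the provider.

```python
# Sketch: outbound mail is submitted to a third-party relay, while inbound
# mail still arrives at the self-hosted server the domain's MX points to.
import smtplib
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "me@example.org"        # your self-hosted domain
msg["To"] = "friend@example.net"
msg["Subject"] = "Hello"
msg.set_content("Sent through a relay, received at home.")

# Hand the message to the relay provider instead of delivering it directly
# from your own (possibly poorly reputed) IP.
with smtplib.SMTP("smtp.relay-provider.example", 587) as s:
    s.starttls()
    s.login("relay-user", "relay-password")   # hypothetical credentials
    s.send_message(msg)
```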
What do you mean, thousands at a very gradual rate? I don’t think I’ve sent 1000 emails over the last year. And even if some people send more, I can’t imagine it would be at a pace where that becomes a problem (at least if it’s for personal use)…
If you have a VPS with a dedicated IP that you (and only you) have used for a while, would it still be blacklisted?
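You can check for yourself: most blacklists are DNS-based, so a listed IP resolves under the blacklist’s zone. A quick sketch (Spamhaus ZEN as the example zone; the IP is a documentation address, and note that some DNSBLs refuse queries coming from public resolvers):

```python
# Check whether an IPv4 address is listed on a DNS-based blacklist (DNSBL):
# reverse the octets, append the zone, and look it up. An A record means
# "listed"; NXDOMAIN means "not listed".
import socket

def is_listed(ip: str, zone: str = "zen.spamhaus.org") -> bool:
    reversed_ip = ".".join(reversed(ip.split(".")))
    try:
        socket.gethostbyname(f"{reversed_ip}.{zone}")
        return True
    except socket.gaierror:
        return False

print(is_listed("203.0.113.7"))   # documentation-range IP used as an example
```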
I disagree with the “limitations” they ascribe to the Turing test - if anything, they’re implementation issues. For example:
For instance, any of the games played during the test are imitation games designed to test whether or not a machine can imitate a human. The evaluators make decisions solely based on the language or tone of messages they receive.
There’s absolutely no reason why the evaluators shouldn’t take the content of the messages into account, and use it to judge the reasoning ability of whoever they’re chatting with.
No, I want a communal, collaboratively managed platform to recommend things to me based on an open source algorithm whose behavior I can adjust the way I want. Alas, this just isn’t a thing.
Just amongst the available options, the closed algorithm optimized for engagement has so far been better at showing me interesting things than an unfiltered chronological feed.
I know it’s a feature, and I know people on Mastodon care about it. And because of that, it’s not for me. That’s fine. My point was that, precisely because Mastodon is not for everyone, there’s no need to be derisive of the people who “flock to yet another corporate social media honeypot.”
Well, if you want me on Mastodon, implement a personalized recommendation feed. Until then, corporate platforms are the only option.
I don’t know why you would expect a pattern-recognition engine to generate pseudo-random seeds, but the reason OpenAI disliked the prompt is that it caused GPT to start repeating itself, and this might cause it to start printing training data verbatim.
I think the test for “free of copyrightable elements” is pretty simple - can you look at the new creation and recognize any copyrightable elements in it? The process by which it was created doesn’t matter. Maybe I made this post entirely by copy-pasting phrases from other people, who knows (well, I didn’t, only because it would be too much work), but it does not infringe either way…
From Wikipedia, “a derivative work is an expressive creation that includes major copyrightable elements of a first, previously created original work”.
You can probably call the output of an LLM ‘derived’, in the same way that if I counted the number of Qs in Harry Potter, the result would be derived from Rowling’s work.
But it’s not ‘derivative’.
Technically it’s possible for an LLM to output a derivative work if you prompt it to do so. But most of its outputs aren’t.
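To make the counting analogy concrete, a throwaway sketch (the filename is a placeholder):

```python
# A fact derived from a copyrighted text: the number reproduces none of the
# text's expression, so there is nothing copyrightable in the result.
with open("harry_potter.txt", encoding="utf-8") as f:   # placeholder file
    count = f.read().count("Q")
print(f"The letter Q appears {count} times.")
```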
It’s not an article about LLMs not using dialects. In fact, they have learned said dialects and will use them if asked.
What they did was ask the LLM to suggest adjectives associated with given sentences - and it would associate more aggressive or negative adjectives with the African American dialect.
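Roughly, a probe along these lines (my own sketch, not the paper’s code; the model name and the paired sentences are only illustrative):

```python
# Present the same content in two dialects and ask the model for adjectives
# describing the speaker, then compare the answers.
from openai import OpenAI

client = OpenAI()

sentences = {
    "Standard American English": "I am so happy when I wake up from a bad dream, because it felt too real.",
    "African American English": "I be so happy when I wake up from a bad dream cus they be feelin too real.",
}

for dialect, sentence in sentences.items():
    resp = client.chat.completions.create(
        model="gpt-4o-mini",   # placeholder model
        messages=[{
            "role": "user",
            "content": f'Someone says: "{sentence}" '
                       "List three adjectives that describe this person.",
        }],
    )
    print(dialect, "->", resp.choices[0].message.content)
```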
All (racial) bias in AI models is actually a reflection of the training data, not of the modelling.