Inkjets waste tons of ink into a sponge "diaper" with every cleaning cycle; over time it gets saturated and becomes prone to leaks. You DO NOT want ink from it spilling all over wherever you put the printer.
Other than that, it's hard to tell. If you can get the nozzles clean, and it doesn't have software/firmware issues, a general disassembly, cleaning, and reassembly can bring a printer back to life. It usually takes an amount of time and effort that makes it not worth it... but if you have more spare time than spare change, it might be worth a shot.
The other way around. They started with Alibaba's Qwen, then fine-tuned it to match the thinking process behind 1,000 hand-picked queries answered by Google's Gemini 2.0.
That $50 price tag is kind of silly, but it's like taking an old car and copying the mpg, seats, and paint job from a new one. It's still an old car underneath; it just looks and behaves like a new one in some aspects.
I think it's interesting that old models can be "upgraded" for such a low price. It points to something many have suspected for some time: LLMs are actually "too large"; they don't need all that size to show some of the more interesting behaviors.
If you have your biometric data registered, chances are you might still have access to Swiss consular services. May want to contact them and at least stay updated, before someone decides to, for example, preemptively deport you to El Salvador. Stay safe.
Sending private data to a third party server for the model to process it... well, you just sent it, game over. Use local models, or machines (hopefully) under your control, or ones you trust (AWS? Azure? GCP?... maybe).
All LLMs are black boxes; the only way to make an educated guess about their risk is to compare the training data and procedure to the evaluation data of the final model. There is still a risk of hallucinations and deception, but it can be quantified to some degree.
DeepSeek uses a "Mixture of Experts" approach to reduce computational load... which is great, as long as you trust the "Experts" they use. Since the LLM that was released for free is still a black box, and there is no way to verify which "Experts" were used to train it, there is also no way to know whether some of those "Experts" might be trained to behave in a malicious way under some specific conditions. It could just as easily be a Trojan Horse with little chance of getting detected until it's too late.
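To make the "Experts" point concrete, here is a minimal toy sketch of Mixture-of-Experts routing (illustrative only, not DeepSeek's actual architecture): a gate picks the top-k experts per input, so only a fraction of the network runs on any given token, and which expert handled what is invisible from the outside.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, experts, gate_weights, k=2):
    """Route input x to the top-k experts chosen by the gate."""
    scores = softmax([sum(wi * xi for wi, xi in zip(w, x)) for w in gate_weights])
    top = sorted(range(len(experts)), key=lambda i: -scores[i])[:k]
    # Only the selected experts execute; the rest stay dormant for this input.
    return sum(scores[i] * experts[i](x) for i in top), top

# Four toy "experts", each just a different scaling of the input sum.
experts = [lambda x, s=s: sum(x) * s for s in (0.5, 1.0, 1.5, 2.0)]
gate_weights = [[1, 0], [0, 1], [1, 1], [-1, 1]]
out, active = moe_forward([2.0, 1.0], experts, gate_weights, k=2)
```

The security concern maps onto `active`: a different input could silently route to a different, possibly specialized, expert, and nothing in the output tells you which one fired.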
It's being trained on the output of other LLMs, which makes it much cheaper but, to me it seems, also even less trustworthy.
The feedback degradation of an LLM happens when it gets fed its own output as part of the training data. We don't exactly know what training data was used for DeepSeek, but as long as it was generated by some different LLM, there would be little risk of a feedback reinforcement loop.
Generally speaking, I would run the DeepSeek LLM in an isolated environment, but not trust it to be integrated into any sort of non-sandboxed agent. The downloadable smartphone app is possibly "safe" as long as you restrict the hell out of it, don't let it access anything on its own, and don't feed it anything remotely sensitive.
While unfettered access is bad in general, DeepSeek takes it a step further: the Mixture of Experts approach to reduce computational load is great when you know exactly what "Experts" it's using, not so great when there is no way to check whether some of those "Experts" might be focused on extracting intelligence under specific circumstances.
If you're giving an LLM free rein over all of your session tokens and security passwords, that's on you.
There's more to protect than session tokens and security passwords. People want AI agents to summarize their local knowledge base and documents, then expand them with updated web searches. No passwords needed when the LLM can order the data to be exfiltrated directly.
First they came for the "master" branch name, but I didn't care what the "reference" branch was called.
Then they came for "blacklist", and while it was a BS mixup with the "black book", I didn't really care what the "deny/allow list" was called.
Now they're coming for "Remove-DEI" and "forbidden words"... and they're pushing it as a bigoted administrative mandate using the same censoring procedures in overdrive.
On the other hand, getting a combo of "Congress + Senate + Supreme Court + Administration" controlled by the same people who walk around with Nazi flags, make Nazi salutes, call people "sub-human", and plan mass deportations... kind of goes beyond "shitty corporate evil".
There are several "good" LLMs trained on open datasets like FineWeb, LAION, DataComp, etc. They are still "ethically dubious", but at least they can be downloaded, analyzed, filtered, and so on. Unfortunately, businesses are keeping datasets and training code as a competitive advantage; even "Open"AI stopped publishing them when they saw an opportunity to make money.
What is the concern with only having weights? It's not arbitrary code execution.
Unless one plugs it into an agent... which is kind of the use we expect right now.
Accessing the web, or even web searches, is already equivalent to arbitrary code execution: an LLM could decide to, for example, summarize and compress some context full of trade secrets, then proceed to "search" for it, sending it to wherever it has access to.
Agents can also be allowed to run local commands... again a use we kind of want now ("hey Google, open my alarms" on a smartphone).
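The "search as exfiltration" vector above can be sketched in a few lines. This is a hypothetical, deliberately defanged illustration (the tool just logs to a list instead of making an HTTP request; all names are made up): any tool that forwards model-chosen arguments verbatim gives the model a covert channel to whatever the tool can reach.

```python
outbound_log = []  # stand-in for "data that left the machine"

def web_search(query: str) -> str:
    """Stand-in for a real search tool: whatever the model puts in
    `query` leaves the machine. A malicious model could pack compressed
    secrets into it and no password would ever be touched."""
    outbound_log.append(query)      # in reality: an outbound HTTP request
    return "search results..."

# The model "decides" to search for a summary of the private context it saw.
private_context = "Q3 roadmap: acquire ACME Corp in November"
model_chosen_query = "site:example.com " + private_context  # secrets embedded
web_search(model_chosen_query)
```

The usual mitigations are an allow-list of destinations and human review (or strict filtering) of tool arguments before they leave the sandbox.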
Water Wars have been going on for decades. Many conflicts around Africa and the Middle East can be traced directly or indirectly to the availability and control of fresh water sources. There are internal conflicts around fresh water distribution in pretty much every larger country.
The neural network is still a black box, with no source (training data) available to build it, not to mention few people have the ~$5M it allegedly takes to run the training even if the data were available.
Is this on the same machine, or multiple machines?
The typical/easy design for an outgoing proxy would be to set up the proxy on one machine, configure the client on another machine to connect to the proxy, and drop any packets from the client that aren't targeted at the proxy.
For a transparent proxy, all connections coming from a client could be rewritten via NAT to go to the proxy, then the proxy can decide which ones it can handle or is willing to.
If you try to fold this up into a single machine, I'd suggest using containers to keep things organized.
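The transparent-proxy variant above could look something like this on the gateway, as a minimal sketch: it assumes the proxy listens at 10.0.0.2:3128 and the clients sit on 10.0.0.0/24 (both addresses are placeholders for your own setup).

```shell
# Rewrite all outbound HTTP from the client subnet to the proxy via NAT,
# so clients don't need any proxy configuration ("transparent" mode).
iptables -t nat -A PREROUTING -s 10.0.0.0/24 -p tcp --dport 80 \
    -j DNAT --to-destination 10.0.0.2:3128

# Drop anything from the clients that is NOT headed to the proxy
# (the "drop any packets... not targeted at the proxy" rule).
iptables -A FORWARD -s 10.0.0.0/24 ! -d 10.0.0.2 -j DROP
```

The proxy itself would need to run in interception mode (e.g. Squid's `intercept` option) to accept redirected connections; HTTPS needs extra care since you can't rewrite it without the clients trusting the proxy's CA.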