
  • You can see it in the post thumbnail

    /s

  • I had to wait 5-6 seconds to visit that site.

    The internet got so much worse - in two ways, of course.

  • How am I supposed to remember those?

    On word boundaries? But that would be way too predictable!

  • So why do you join them uninvited? /s

  • 3bitswalkintoabarandoneflips

  • So, is the account actually read-only?

  • Winamp Collaborative License

    A "source-available" license that says you can't fork the project, which means it is illegal to click the fork button on GitHub (which is a violation of the GitHub ToS) and also makes it impossible to create any pull requests without push access to the original repo. Source

    Oh yeah, I remember that whole fiasco from a few months back.

  • Who is Kate?

    Kate the editor? Or is there an IDE called Kate?

  • The collaborative sharing nature of these platforms is a big advantage. (Not just VS Code Marketplace. We have this with all extension and lib and program package managers.)

    Current approaches revolve around

    • reporting
    • manual review
    • automated review (checks) for flagging or removal
    • secured namespaces

    The problem with the latter is that namespace ownership is not necessarily proof of trustworthiness; it only proves that the entire namespace is owned by the same entity.

    In my opinion, improvements could be made through

    • better indication of publisher identity (verified legal entities like companies, verified personas, or owned domains)
    • better indication of publisher trustworthiness (how they established themselves as trustworthy: long-running contributions in the specific space or in general, a long-standing online persona, vs. a "random person", etc.)
    • more prominent license and source code linking - it should be easy to access the source code to review it
    • some platforms implement their own build infrastructure to ensure the source code represents the published package

    Maybe there could be more coordinated efforts of review and approval. Like, if the publisher has a trustworthiness indication, and the package has labeled advocates with their own trustworthiness indicated, you could make a better immediate assessment.

    On the more technical side, upstream of the platform, a more restrictive and specific permission system would help. The way browser extensions ask for permissions on install and/or for specific functionality could be applied to app extensions and library packages too. Platform requirements could mandate minimal defaults, with optional capabilities declared as optional rather than "ask for everything by default".
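
    A minimal sketch of what such a declared-permissions model could look like, with a minimal default set and explicit opt-ins (all field and permission names here are hypothetical, not any existing marketplace's schema):

    ```python
    # Hypothetical package manifest: minimal permissions by default,
    # everything else declared optional and granted only on user opt-in.
    manifest = {
        "name": "example-extension",
        "permissions": {
            "default": ["read:workspace-settings"],
            "optional": ["network:api.example.com", "fs:read"],
        },
    }

    def granted(manifest: dict, user_opt_ins: set) -> set:
        """Effective permissions: defaults plus explicitly opted-in optionals."""
        perms = manifest["permissions"]
        return set(perms["default"]) | (user_opt_ins & set(perms["optional"]))

    print(granted(manifest, {"network:api.example.com"}))
    # {'read:workspace-settings', 'network:api.example.com'}
    ```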

  • Minification is a form of obfuscation. It makes the code (much) less readable.

    Of course you could run a formatter over it. But that's already an additional step you have to do. By the same reasoning you could run a deobfuscator over more obfuscated code.
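
    For instance, with the jsbeautifier package that formatter step is a one-liner (a sketch; assumes pip install jsbeautifier):

    ```python
    # Run a formatter over minified JS to undo the readability loss.
    import jsbeautifier

    minified = "function pick(a,b){return a&&b?a:b}var x=pick(1,0);"
    print(jsbeautifier.beautify(minified))
    # Identifiers stay mangled if the source was also uglified - recovering
    # those takes a deobfuscator or source maps, i.e. yet another step.
    ```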

  • What makes you think only GitHub is celebrating?

  • ROCm is an implementation/superset of OpenCL.

    ROCm ships its installable client driver (ICD) loader and an OpenCL implementation bundled together. As of January 2022, ROCm 4.5.2 ships OpenCL 2.2.

    Shaders are computational visual [post-]processing - think pixel-position-based adjustments to rendering.

    OpenCL and CUDA are computation frameworks that let you use the GPU for processing other than rendering - more general-purpose computing.
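
    As a minimal sketch of that general-purpose use - adding two arrays on the GPU via OpenCL (assumes the pyopencl and numpy packages plus a working OpenCL driver/ICD, e.g. the one ROCm ships):

    ```python
    # Vector addition on the GPU through OpenCL - computation, no rendering.
    import numpy as np
    import pyopencl as cl

    a = np.random.rand(1024).astype(np.float32)
    b = np.random.rand(1024).astype(np.float32)

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)
    mf = cl.mem_flags
    a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
    b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
    out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

    prog = cl.Program(ctx, """
    __kernel void add(__global const float *a, __global const float *b,
                      __global float *out) {
        int i = get_global_id(0);
        out[i] = a[i] + b[i];
    }
    """).build()

    prog.add(queue, a.shape, None, a_buf, b_buf, out_buf)
    out = np.empty_like(a)
    cl.enqueue_copy(queue, out, out_buf)
    assert np.allclose(out, a + b)
    ```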

    nVidia has always focused on proprietary technology: introduce a technology, then try to make it a closed market where people are forced to buy and use nVidia for it. AMD has always supported and developed open standards as a counter to that.

  • JWTs are standard authentication tools - who’s the security concern for? ByteDance? Or are you saying the JWTs are from the local machine?

    Yes, I read that as local project JWTs being transmitted to their servers. Since it's raised as a concern, and not labeled as being used for authentication, IMO it clearly implies that they observed JWT tokens and auth data unrelated to any telemetry auth (if they even have that).

    JWT tokens and authentication data observed in multiple communication channels, presenting potential credential exposure concerns
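
    To illustrate the concern: a JWT is a bearer credential whose payload is readable base64url JSON, so anyone observing it in transit can both read it and replay it until it expires. A stdlib-only sketch (the token is built inline with made-up claims):

    ```python
    # A JWT is header.payload.signature, each part base64url-encoded.
    import base64, json

    def b64url(data: bytes) -> str:
        return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

    token = ".".join([
        b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode()),
        b64url(json.dumps({"sub": "user-123", "exp": 1700000000}).encode()),
        "fake-signature",  # demo only, never verified here
    ])

    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped padding
    print(json.loads(base64.urlsafe_b64decode(payload)))
    # {'sub': 'user-123', 'exp': 1700000000} - readable without any key
    ```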

  • The official Anthropic post/announcement

    Very interesting read

    The math guessing game (lol), the bullshitting of "thinking out loud", being able to identify hidden (trained) biases, looking ahead when producing text, following multi-step reasoning, analyzing jailbreak prompts, and the analysis of anti-hallucination training and hallucinations.

    At the same time, we recognize the limitations of our current approach. Even on short, simple prompts, our method only captures a fraction of the total computation performed by Claude, and the mechanisms we do see may have some artifacts based on our tools which don't reflect what is going on in the underlying model. It currently takes a few hours of human effort to understand the circuits we see, even on prompts with only tens of words.

  • I would separate concerns. For the scraping, I would dump the data as JSON onto disk. I would consider the folder structure to put it into: whether as individual files, or as one JSON document per line in bigger files for grouping. If the website has a good URL structure, the URL path could be useful for deriving descriptive author and/or ID identifiers for folders or file names.

    Storing JSON as text is simple. Depending on the amount, storing plain text is wasteful, and simple text compression can significantly reduce storage size. For text-only stories it's unlikely to become significant though, and not compressing keeps the scraping process, and potentially the validation of the scraped data's completeness, simpler.
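
    A minimal sketch of the one-JSON-document-per-line layout with gzip compression (stdlib only; paths and fields are made up):

    ```python
    # Append one JSON document per line into a gzip-compressed file.
    # Appends create multi-member gzip files, which gzip reads back fine.
    import gzip, json, os

    def append_record(path: str, record: dict) -> None:
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with gzip.open(path, "at", encoding="utf-8") as f:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

    append_record("dump/author-42/stories.jsonl.gz",
                  {"id": 7, "title": "Example", "body": "..."})
    ```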

    I would then keep this raw data separate from any prototyping around modifying or extending the data, and from presentation/interfacing.

  • https://ruudvanasseldonk.com/2023/01/11/the-yaml-document-from-hell

    JSON is a much simpler (and consequently safer) format. It's also more universally supported.

    YAML (or TOML) is decent for configuration that is manually read and written. But for scraper output, where storage and follow-up workflows go through code parsing anyway, I would go with JSON.
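
    A tiny illustration of the footgun class the linked article covers - PyYAML implements YAML 1.1, where unquoted scalars silently change type (assumes pip install pyyaml):

    ```python
    # The YAML "Norway problem" and friends vs. JSON's strictness.
    import json
    import yaml

    print(yaml.safe_load("country: no"))    # {'country': False}
    print(yaml.safe_load("version: 3.10"))  # {'version': 3.1} - a float
    print(json.loads('{"country": "no"}'))  # {'country': 'no'} - stays a string
    ```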

  • I think I need an AI to parse these confusing graphs and images for me.

  • Infrastructure configuration that is automatically applied to the cloud infrastructure - like starting and stopping instances and services, changing the connections between them, etc. (I assume, anyway.)
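
    A toy sketch of that declare-then-apply model: desired state is plain data, and a reconciler derives the actions to run (everything here is hypothetical, not any specific tool's API):

    ```python
    # Diff desired infrastructure state against current state into actions.
    desired = {"web": 3, "worker": 2}   # service -> instance count
    current = {"web": 1, "queue": 1}

    def reconcile(desired: dict, current: dict) -> list:
        actions = []
        for svc, n in desired.items():
            delta = n - current.get(svc, 0)
            if delta > 0:
                actions.append(f"start {delta} x {svc}")
            elif delta < 0:
                actions.append(f"stop {-delta} x {svc}")
        for svc in current.keys() - desired.keys():
            actions.append(f"stop all {svc}")
        return actions

    print(reconcile(desired, current))
    # ['start 2 x web', 'start 2 x worker', 'stop all queue']
    ```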