
  • I understand what you're saying—I'm saying that data validation is precisely the purpose of parsers (or deserialization) in statically-typed languages. Type-checking is data validation, and parsing is the process of turning untyped, unvalidated data into typed, validated data. What's more, you can often get this functionality for free, without having to write any code other than your type itself (if the validation is simple enough, anyway; there's a sketch of this below). Pydantic exists to solve a problem of Python's own making and to reproduce what's standard in statically-typed languages.

    In the case of config files, it's even possible to do this at compile time, depending on the language. In other words, you can statically guarantee that a config file exists at a particular location and deserialize and validate it into a native data structure, all without ever running your actual program. At my day job, all of our app's configuration lives in Dhall files which get imported and validated into our codebase as a compile-time step, meaning that misconfiguration is a compiler error.
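
    To make the "for free" part concrete, here's a minimal sketch in Haskell using the aeson library (the Config shape and the config.json path are hypothetical, just for illustration):

    ```haskell
    {-# LANGUAGE DeriveGeneric #-}
    import Data.Aeson (FromJSON, eitherDecodeFileStrict)
    import GHC.Generics (Generic)

    data Config = Config
      { port    :: Int
      , host    :: String
      , verbose :: Bool
      } deriving (Show, Generic)

    -- The parser/validator is derived from the type itself: no extra code.
    instance FromJSON Config

    main :: IO ()
    main = do
      result <- eitherDecodeFileStrict "config.json" :: IO (Either String Config)
      case result of
        Left err  -> putStrLn ("invalid config: " ++ err) -- missing fields, wrong types, ...
        Right cfg -> print cfg                            -- typed and validated
    ```

    The compile-time Dhall setup mentioned above is the same idea pushed one step earlier in the pipeline: the type is the schema.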

  • You're just describing parsing in statically-typed languages, to be honest. Adding all of this stuff to Python is just (poorly) reinventing the wheel.

    Python's a great language for writing small scripts (one of my favorites for the task, in fact), but it's not really suitable for serious, large-scale production usage.

  • No, you divide work so that the majority of it can be done in isolation and in parallel. Testing components together, when necessary, happens on integration branches (which you don't rebase, of course). Branches and MRs should be small and short-lived, with merges into the main branch happening frequently. Collaboration largely occurs through developers branching off that shared main branch, which gets continuously updated.

    Trunk-based development is the industry-standard practice at this point, and for good reason. It's friendlier for CI/CD and devops, allows changes to be tested in isolation before merging, and so on.

  • I just found out about this debate and it's patently absurd. The ISO 80000-2 standard defines ℕ as including 0, and that convention is foundational in basically all of mathematics and computer science. Excluding 0 is a fringe position and shouldn't be taken seriously.
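
    For what it's worth, this is also how proof assistants build ℕ: zero is the base case of the inductive definition. A sketch mirroring Lean 4's built-in Nat (renamed here so it doesn't clash with the real one):

    ```lean
    inductive MyNat where
      | zero : MyNat             -- ℕ starts at zero
      | succ (n : MyNat) : MyNat -- every other natural is a successor
    ```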

  • Sure... That's what libraries are for. No one hand-rolls that stuff. You can do all of that just fine (and, actually, in a lot less code, mostly because Java is so fucking verbose) without using the nightmare that is Spring.

  • I know it's a joke, but just wanted to say that uranium used for fuel is not something you can actually use for weaponry directly. It requires enrichment to increase the concentration of U-235 from the few percent typical of reactor fuel to weapons-grade levels (around 90%).

  • Sure yeah, but like, I work remote and will always work remote (I live in a city with a pretty mediocre tech scene). On top of that, I work in a non-mainstream programming language (Haskell). So it's hard to envision what I could actually do.

    I'm very pro-union btw, it just seems like there are certain things that can make it harder to actually pull off.

  • Generally agree with your points, even though I'm honestly not sure what a union would look like in practice.

    But I just wanted to say that this job is definitely harder than plumbing. I usually do my own plumbing and it's not really that bad. It's not my favorite thing to do and can sometimes be a pain in the ass, but it's way less taxing imo.

    Teaching kids is hard as fuck though and good teachers are priceless. Honestly quality caregiving of any sort is massively underrated.

  • You do not understand how these things actually work. I mean, fair enough, most people don't. But it's a bit foolhardy to propose changes to how something works without understanding how it works now.

    There is no "database". That's a fundamental misunderstanding of the technology. It is entirely impossible to query a model to determine if something is "present" or not (the question doesn't even make sense in that context).

    A model is, to greatly simplify things, a function (like in math) that computes a response based on the input given. What this computation does is entirely opaque (including to its creators); it's what we call a "black box". To create said function, we start with completely random parameters (we'll call them weights from now on; they're just numbers), then iteratively feed training data to the function, measure how close its output is to what we expect, and adjust the weights based on how far off it was. This is a gross simplification of the complexity involved (and doesn't even touch on the structure of the model's network itself), but it should give you a good idea.
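
    If it helps, here's that loop shrunk down to a runnable toy (a sketch in Haskell with a single weight and a made-up target function, nothing like a real LLM):

    ```haskell
    -- Toy "model": output = weight * input. We train it to match y = 3x
    -- by repeatedly nudging the weight downhill on the error.
    trainingData :: [(Double, Double)]
    trainingData = [(x, 3 * x) | x <- [1 .. 10]]

    model :: Double -> Double -> Double
    model w x = w * x

    -- Mean squared error: how far off the model is on the training data.
    loss :: Double -> Double
    loss w = sum [ (model w x - y) ^ (2 :: Int) | (x, y) <- trainingData ]
           / fromIntegral (length trainingData)

    -- One training step: estimate the slope of the loss numerically,
    -- then move the weight a small step in the direction that lowers it.
    step :: Double -> Double
    step w = w - 1e-3 * (loss (w + eps) - loss (w - eps)) / (2 * eps)
      where eps = 1e-6

    main :: IO ()
    main = do
      let trained = iterate step 0.0 !! 1000  -- arbitrary start, 1000 steps
      putStrLn ("learned weight: " ++ show trained ++ " (target was 3.0)")
    ```

    An LLM is this loop with billions of weights and a loss measured over text, but the shape is the same.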

    It's applied statistics: we're effectively creating a probability distribution over natural language itself, where we predict the next word based on how frequently we've seen words in a particular arrangement. This is old technology (it dates back to the 90s) that has hit the mainstream due to increases in computing power (training models is very computationally expensive) and massive increases in the size of the datasets used in training.
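
    The 90s-era version is simple enough to sketch outright. A bigram model (here in Haskell, with a toy corpus) "predicts" the next word purely from counts of what followed what:

    ```haskell
    import qualified Data.Map.Strict as Map
    import Data.List (maximumBy)
    import Data.Ord (comparing)

    -- For each word, how many times each other word followed it.
    type Counts = Map.Map String (Map.Map String Int)

    train :: [String] -> Counts
    train ws = Map.fromListWith (Map.unionWith (+))
      [ (w, Map.singleton next 1) | (w, next) <- zip ws (tail ws) ]

    -- Predict the most frequently seen follower of a word, if any.
    predictNext :: Counts -> String -> Maybe String
    predictNext counts w = do
      followers <- Map.lookup w counts
      pure (fst (maximumBy (comparing snd) (Map.toList followers)))

    main :: IO ()
    main = do
      let model = train (words "the cat sat on the mat and the cat slept")
      print (predictNext model "the")  -- Just "cat" ("cat" followed "the" twice)
      print (predictNext model "dog")  -- Nothing ("dog" never appeared)
    ```

    Modern LLMs swap the count table for a neural network and condition on vastly more context, but it's still the same game: a probability distribution over the next word.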

    Source: senior software engineer with a computer science degree and multiple graduate-level courses on natural language processing and deep learning

    Btw, I have serious issues with both capitalism itself and machine learning as it is applied by corporations, so don't take what I'm saying to mean that I'm in any way an apologist for them. But it's important to direct our criticisms of the system as precisely as possible.

  • It's got nothing to do with capitalism. It's fundamentally a matter of people using it for things it's not actually good at, because ultimately it's just statistics. The words generated are based on a probability distribution derived from its (huge) training dataset. It has no understanding or knowledge. It's mimicry.

    It's why it's incredibly stupid to try using it for the things people are trying to use it for, like as a source of information. It's a model of language, yet people act like it has actual insight or understanding.

  • In my experience, your average software developer has absolutely terrible security hygiene. It's why you see countless instances of private keys copy/pasted into public GitHub repos or the seemingly daily occurrences of massive data breaches.

    My undergrad in CS (which, I should point out, is still by far the most common major for software engineers) did not require a security course, and I'm fairly confident that this is pretty typical. To be honest, I wouldn't have trusted any of my CS professors to know the first thing about security. It's a completely different field and something that generally requires a lot of practical experience. The closest we ever got was an explanation of asymmetric vs. symmetric encryption. There was certainly no discussion of even basic things like how to properly manage secrets or authn best practices.

    Everything I know now as a senior software engineer about software security has come from experience on the job. I've been very fortunate to work at some places that take it very seriously (including a government contractor writing cybersecurity software for the Department of Defense) and learned a lot there. But a lot of shops don't have a culture that promotes good security hygiene, and it shows in the litany of insecure software out in the wild today.

  • It's not a terrible name, since it's derived from the mathematical construct of vectors as n-tuples. In the case of vectors in programming, n corresponds to the size of the underlying array, and the tuple's components are the vector's elements.
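
    You can even make the n-tuple reading literal in a language with a fancy enough type system. A Haskell sketch (toy types, not a production library) where the length n is part of the vector's type:

    ```haskell
    {-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

    -- Type-level natural numbers, so lengths can appear in types.
    data Nat = Zero | Succ Nat

    -- A vector whose length n is tracked statically: literally an n-tuple.
    data Vec (n :: Nat) a where
      Nil  :: Vec 'Zero a
      Cons :: a -> Vec n a -> Vec ('Succ n) a

    toList :: Vec n a -> [a]
    toList Nil         = []
    toList (Cons x xs) = x : toList xs

    -- A vector in ℝ³, i.e. a 3-tuple of Doubles.
    v3 :: Vec ('Succ ('Succ ('Succ 'Zero))) Double
    v3 = Cons 1.0 (Cons 2.0 (Cons 3.0 Nil))

    main :: IO ()
    main = print (toList v3)  -- [1.0,2.0,3.0]
    ```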