Literature review on the benefits of static types, by Dan Luu
HaraldvonBlauzahn @feddit.org
You can scientifically prove that one code snippet has fewer bugs than another though, and there's already mountains of evidence of static typing making code significantly less buggy on average.
Do you mean memory safety here? Because yes, for memory safety, this is proven. E.g. there are reports from Google that wide usage of memory-safe languages for new code reduces the number of bugs.
You can't scientifically prove that something is "better" than another thing because that's not a measurable metric.
Then, first: why don't the claims that statically typed languages are better come with measurable, objective benefits? If they really are significantly better, it should be easy to come up with such measures.
And the second thing: we have at least one large-scale experiment, because Google introduced Go and used it widely within its own company to replace Python.
Now, it is clear that Go programs run with higher performance than Python programs, no question.
But did this lead to productivity increases or better code because Go is a strongly-statically typed language? I have seen no such report - even though they now have 16 years of experience with it.
(And just for fun: Python itself is memory safe, and concurrency bugs in Python code can't lead to undefined behaviour like in C. Go is neither memory safe nor does it have that level of concurrency safety: if you concurrently modify a hash table from two different threads, this will cause a crash.)
If you want to use dynamic typing without any type hints or whatever, go for it.
Oh, I didn't say that I want that, or recommend using only dynamic languages. For my part, I use a mix, and a big part of the reason is that, depending on circumstances, it can be faster to develop something with dynamic languages.
In particular, the statement "there is no scientific and objective proof that using statically typed languages is generally better by some measure (except speed)" does not mean that dynamically typed languages are generally or objectively better.
So, if type checking at compile time is universally good, why are there (to my knowledge) no modern and widely used languages, perhaps with the exception of Pascal and Ada, where all arrays or vectors have a size that is part of their type?
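To illustrate what "a size that is part of the type" means, here is a minimal Rust sketch (the function name is made up for the example):

```rust
// The length of a Rust array is part of its type: [f64; 3] and [f64; 2]
// are different, incompatible types, checked at compile time.
fn dot3(a: [f64; 3], b: [f64; 3]) -> f64 {
    a.iter().zip(b.iter()).map(|(x, y)| x * y).sum()
}

fn main() {
    let v = [1.0, 2.0, 3.0];     // inferred type: [f64; 3]
    println!("{}", dot3(v, v));  // compiles and runs
    // let w = [1.0, 2.0];       // inferred type: [f64; 2]
    // dot3(w, w);               // compile error: expected `[f64; 3]`, found `[f64; 2]`
}
```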
If a language does type checking in advance of execution, it is by definition strongly typed.
Look here, under "typing discipline":
I don't know; I haven't caught up on the research over the past decade. But it's worth noting that this body of evidence is from before the surge in popularity of strongly typed languages such as Swift, Rust, and TypeScript.
Well, Lisp, Scheme, and many more are strongly typed as well. The difference here is that they are dynamically-strongly typed: evaluation behaves as if types are not checked before run time.
This essentially means that the type of a variable can change over the program's run time. And this is less relevant for functional or expression-oriented languages like Scheme, Scala, or Rust, where a variable is in most cases just a label for an expression and does not change its value at all.
In particular, mainstream "statically typed" languages still had null values rather than Option or Maybe.
That again is more a feature of functional languages, where most things evaluate to expressions. Clojure is an example of this: it is dynamically-strongly typed, and even though it runs on the JVM, it does not raise NullPointerExceptions (the exception, so to speak, is when calling into Java).
And in most cases, said languages use type inference and also garbage collection (except Rust, of course). This in turn results in clear ergonomic advantages, but they have little to do with static or dynamic typing.
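To make the Option/Maybe point concrete, here is a small Rust sketch (the names are invented for the example); note also that the local variable types are inferred:

```rust
// Instead of a null value, the possible absence of a result is part of the
// return type, and the caller must handle the None case before using it.
fn first_even(xs: &[i32]) -> Option<i32> {
    xs.iter().copied().find(|&x| x % 2 == 0)
}

fn main() {
    let xs = vec![1, 3, 4, 7]; // element type inferred
    match first_even(&xs) {
        Some(n) => println!("first even number: {n}"),
        None => println!("no even number found"),
    }
}
```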
An indisputable fact is that static typing and compilation virtually eliminate an entire class of runtime bugs that plague dynamically typed languages, and it's not an insignificant class.
Just another data point, for amusement: there is a widely known language that, already in 2007, eliminated not only memory management bugs like "use after free", but also data races and similar concurrency bugs. No, I am not talking about Rust, but about Clojure. It prevents data races by using so-called persistent data structures. These are like Python's strings (they can be inputs to operations but never change themselves), but for all basic collection types in Clojure, namely lists, vectors, dictionaries / hash maps, and sets.
And Clojure is dynamically typed. So, dynamically typed languages had that feature earlier. (To be fair, Rust did adopt its borrow checker from research languages which appeared earlier.)
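The underlying principle, that data which is never mutated can be shared between threads without data races, can also be sketched in Rust (Clojure's persistent collections go much further; this is just the simplest analogue):

```rust
use std::sync::Arc;
use std::thread;

fn main() {
    // An immutable vector, shared via reference counting; no thread can
    // modify it, so there is nothing to race on.
    let data = Arc::new(vec![1, 2, 3, 4]);

    let handles: Vec<_> = (0..4)
        .map(|i| {
            let data = Arc::clone(&data);
            thread::spawn(move || i * data.iter().sum::<i32>())
        })
        .collect();

    for h in handles {
        println!("{}", h.join().unwrap());
    }
}
```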
"But", you could say, "Java was invented in 1995 and had already memory safety! Surely that shows the advantage of statically typed languages?!"
Well, Lisp was invented in 1960, and had garbage collection and was hence memory-safe.
I submit that laboratory-experiment-based understanding being valid in real-world use, in any domain, is itself a belief rather than knowledge
I dunno, man. Did you use a plane recently? A computer? Something that contained electronics, like transistors? GPS? A weather forecast? All of these are based on things like fluid physics, particle physics, quantum physics, electrodynamics, mathematics, and so on. Our modern world would simply not exist without them.
Granted, there are areas where applying the scientific method is harder. But we still do, for example in medicine. Why should this not be possible in software development?
Rust and OCaml use this.
you've re-invented a strongly typed, compiled language without adding any of the low hanging fruit gains of compiling.
This is aside from the main argument around static typing which claims advantages in developer productivity and eventual correctness of code.
Now, I think it is generally accepted that static typing helps with compilation to native code, which leads to faster executables.
Now, did you know that several good Lisp and Scheme implementations, like SBCL, Chez Scheme, or Racket, compile to native code even though they are dynamically typed languages? This is done with the help of type inference.
And the argument is not that these are as fast as C or Rust - the argument is that the difference might be significantly smaller than what many people believe.
So what scientific evidence has emerged in the mean time?
We know with reasonable certainty that memory safety reduces memory bugs. This is valid for dynamically and statically typed languages.
However, under the assumption that dynamically typed programs do have a minimum amount of tests, we can't say that static type checking is generally a better or more efficient approach.
In my experience, tests can be a useful complement to specifications, but they do not substitute for them - especially since specs can give a bigger picture and cover corner cases more succinctly.
And there are many things that tests can check which the respective type systems can't catch. For example, one can easily add assertions in C++ which verify that functions are not called in a thread-unsafe way, and tests can check them.
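To keep all snippets in one language, here is a rough sketch in Rust of the same kind of check (the names are hypothetical): a run-time assertion that a function is only ever called from the thread that first used it, a property that tests can exercise but that is not expressed by the types.

```rust
use std::sync::OnceLock;
use std::thread::{self, ThreadId};

// The thread that first calls `update_state` becomes its "owner";
// any later call from a different thread trips the assertion.
static OWNER: OnceLock<ThreadId> = OnceLock::new();

fn update_state() {
    let owner = OWNER.get_or_init(|| thread::current().id());
    debug_assert_eq!(
        *owner,
        thread::current().id(),
        "update_state called from a foreign thread"
    );
    // ... the actual (not thread-safe) work would go here ...
}

fn main() {
    update_state(); // first call: this thread becomes the owner
    update_state(); // same thread: assertion holds
}
```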
As far as I know, there is still no scientific evidence that static type checking is generally better by any objective measure.
Every time I try to introduce static type hints to Python code at work there are some weirdos that think it's a bad idea.
Saying that there is no objective evidence for something does not mean it is necessarily always a bad idea.
Myself, I've used C, C++, Go, Rust, Java, Python, and Python plus mypy at work, for example in areas like signal processing and research or industrial real-time systems, as well as e.g. Forth, and I have used Clojure, as well as Pascal, Common Lisp, Racket, and Scheme at home, while also trying e.g. Scala and C#.
I think that dynamic languages have their place and that using them can be significantly quicker.
I tend to prefer statically typed languages when performance and / or strict real-time capabilities are an essential concern. But I have more than once combined them with tests in a dynamic language, like Python - because these are far quicker to write. At one point back in 2002, I estimated that porting certain research code from C to Python took 1/14 of the time and lines of code compared to the original algorithm in C.
Also, there are areas which are better covered by tests than by the type systems I know. For example, while some languages like C, Rust, or Pascal do have arrays where the array size is part of the type, many array processing packages like Blitz++, Eigen, or NumPy make array size and dimensions a dynamic property. And Rust does the same with vectors, which is usually the recommended way!
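To make the contrast concrete, a tiny Rust sketch (the functions are made up): the array length is checked by the compiler, while the length of a Vec is only known and checked at run time.

```rust
// Compile-time length: the compiler rejects a call with the wrong size.
fn norm_fixed(v: [f64; 3]) -> f64 {
    v.iter().map(|x| x * x).sum::<f64>().sqrt()
}

// Run-time length: works for any size, but a length error only shows up
// when the program runs (or when a test exercises it).
fn norm_dynamic(v: &[f64]) -> f64 {
    v.iter().map(|x| x * x).sum::<f64>().sqrt()
}

fn main() {
    println!("{}", norm_fixed([3.0, 4.0, 0.0]));
    let v = vec![3.0, 4.0, 0.0, 12.0]; // grown at run time as needed
    println!("{}", norm_dynamic(&v));
}
```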
So, to sum it up, it is probably an area where blanket statements are not helpful.
As far as I know, there is still no scientific evidence that static type checking is generally better (in terms of developer productivity - not performance of a compiled program) by any objective measure.
I don't get the obsession with the scientific method.
Because actually knowing and understanding something is better than believing stuff. There is a lot of stuff that sounds plausible but is wrong.
Java is still interpreted. It compiles to bytecode for a virtual machine, which then executes it for a simulated CPU. The bytecode interpreter has gotten very good, and optimizes the bytecode as it runs.
Modern JVM implementations use just-in-time (JIT) compilation of bytecode to native machine code. That can be faster than C code optimized without profiling (because the optimizer gets relevant additional information), and, for example, in the Debian Computer Language Benchmarks Game, numerically intensive Java programs typically run at about half the speed of the best and most heavily optimized C programs.
And now for the bummer: these most heavily optimized C programs are not written in idiomatic C. They are written with inline assembly, heavy use of compiler intrinsics, CPU-dependent code, manual loop unrolling, and such.
there's a reason why statically typed languages are considered more reliable and fast - it's because they are. There isn't some conspiracy to suppress Lisp.
Then why is the SBCL implementation of Common Lisp about as fast as modern Java? I linked the benchmarks.
An indisputable fact is that static typing and compilation virtually eliminate an entire class of runtime bugs that plague dynamically typed languages, and it's not an insignificant class.
Well, going back to the original article by Dan Luu and the literature he reviews, then why do we not see objective, reproducible advantages from this?
There are also in-between approaches. Python, for example, has optional typing (for type checks with mypy), whereas Lisp/SBCL allows type hints for performance. Racket and Clojure allow adding types as pre-conditions (Typed Racket and Clojure Spec).
And many modern languages like Scala or Rust mostly need types only in function signatures - the rest of the time, types are usually inferred. Even languages which were rigorously explicitly typed in the past, like C++, have added the auto keyword, which activates type inference.
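A small illustration of that style (hypothetical function): only the signature carries explicit types, everything inside is inferred.

```rust
fn squares(xs: &[i64]) -> Vec<i64> {
    // No type annotations needed below: the iterator's item type and the
    // resulting collection type are inferred from the signature.
    let squared = xs.iter().map(|x| x * x);
    squared.collect()
}

fn main() {
    println!("{:?}", squares(&[1, 2, 3]));
}
```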
So, Google was perhaps slightly terrified by the specter of an Internet without advertising, haha.
I did not mean slices or array size checks at run time.
The array size can be part of the static type, that is, one that can be checked at compile time. Rust, Pascal, and Ada have such arrays.
But if static typing is always better, why are these types rarely used?