ChubakPDP11+TakeWithGrainOfSalt @ ChubakPDP11 @programming.dev
Posts: 0 · Comments: 70 · Joined: 1 yr. ago

  • Thanks! You know last night I fed a model several books and asked it a lot of stuff. This did not come up.

    What I do is glance at the book, then ask the model to explain it to me. I call the model 'Urkel the Wiz Kid'.

    I fed it both CS and CE books, like Sipser, A Quantitative Approach, Aho's automata book, Harris and Harris, etc.

  • Thanks a lot, my good man. Is this Monte? https://github.com/monte-language/monte I can't find the source code.

    Can you tell me what you think of the code my implementation emits when you get the time? Is it good, bad, mediocre, etc.?

    So when you're building a language, should you always use a builder for the AST specification? Because I figure you don't need that in a language like Haskell or OCaml, right?
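
    To be clear, by 'builder' I mean something like the constructor functions you end up hand-writing in C (a rough sketch, names made up on the spot), whereas in Haskell or OCaml the data constructors already play that role:

    ```c
    /* Hypothetical C-style AST: a tagged union plus hand-written constructor ("builder") functions. */
    #include <stdlib.h>

    enum expr_kind { EXPR_INT, EXPR_ADD };

    struct expr {
        enum expr_kind kind;
        union {
            long int_val;
            struct { struct expr *lhs, *rhs; } add;
        } u;
    };

    /* Build an integer literal node. */
    static struct expr *mk_int(long v) {
        struct expr *e = malloc(sizeof *e);
        e->kind = EXPR_INT;
        e->u.int_val = v;
        return e;
    }

    /* Build an addition node from two subexpressions. */
    static struct expr *mk_add(struct expr *lhs, struct expr *rhs) {
        struct expr *e = malloc(sizeof *e);
        e->kind = EXPR_ADD;
        e->u.add.lhs = lhs;
        e->u.add.rhs = rhs;
        return e;
    }
    ```

    In the Haskell AST below, `Binary lhs Add rhs` just works; in C I'd have to write and maintain `mk_add` myself.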

    I currently have 3 projects I ping-pong between:

    1. Marsh, a POSIX shell in C;
    2. Awk2X, a translator of AWK to several languages (C, Python, Rust, etc.); this one is in Haskell;
    3. Cephyr, a C compiler in OCaml.

    For Marsh, I am stuck on the job control section. I just can't seem to find a flow for redirecting input and output. I think I am striking a balance, though. I have revised the job control code like 10 times, literally 10 times.

    But for Awk2X and Cephyr, I am kinda stuck on defining the AST. I learned Haskell just last night and made this AST. I am not really 'stuck'; I will move on from it soon.

    ```haskell
    data UnaryOp
      = Prefix PrefixOp
      | Postfix PostfixOp
      deriving (Show)
    
    data PrefixOp
      = PrefixIncr
      | PrefixDecr
      | Plus
      | Minus
      | Not
      deriving (Show)
    
    data PostfixOp
      = PostfixIncr
      | PostfixDecr
      deriving (Show)
    
    data BinaryOp
      = Add
      | Sub
      | Mul
      | Div
      | Mod
      | Eq
      | Ne
      | Gt
      | Ge
      | Le
      | Lt
      | And
      | Or
      deriving (Show)
    
    data Lvalue
      = Unfixed String
      | Fixed String String
      deriving (Show)
    
    data Factor
      = Constant Constant
      | Lvalue Lvalue
      deriving (Show)
    
    data Constant
      = String String
      | Integer Int
      | Float Float
      | Regex String
      deriving (Show)
    
    data Expr
      = Unary UnaryOp Factor
      | Binary Factor BinaryOp Factor
      | Ternary Expr Expr Expr
      | StrCat [Expr]
      | Delete Expr
      | Call String [Expr]
      | Assign Lvalue Expr
      deriving (Show)
    
    data Stmt
      = Express Expr
      | For Expr Expr Expr Stmt
      | ForIn Lvalue Lvalue Stmt
      | DoWhile Stmt Expr
      | While Expr Stmt
      | If Expr Stmt [(Expr, Stmt)] (Maybe (Expr, Stmt))
      | Return (Maybe Expr)
      | Break
      | Continue
      | Compound [Stmt]
      deriving (Show)
    
    data Pattern
      = Begin
      | End
      | Expr Expr
      | ExprPair Expr Expr
    
    data PatternAction
      = WithPattern Pattern [Stmt]
      | JustAction [Stmt]
    
    data FunctionDefn = FunctionDefn
      { name :: String,
        params :: [String],
        body :: [Stmt]
      }
    
    data Element
      = PattAct PatternAction
      | FnDef FunctionDefn
    
    newtype Program = Program [Element]
    ```

    Now for Cephyr; this one is a bit more complex. Before Cephyr, I attempted to make a C compiler several times, and every time I got stuck at the AST.

    The problem is, I wanna use OCaml's modules in the AST, but I don't know how. I just said fuck it and used top-level types. Here is where I am currently:

    ```ocaml
    type const_expr
    type type_decl
    type unary_op
    type binary_op
    type expr
    type stmt
    
    type literal =
        ConstLiteral of const_lit | CompoundLiteral of compound_lit
    
    and const_lit =
      | IntConst of int * int_suffix option
      | FloatConst of float * float_suffix option
      | CharConst of char * charset_prefix option
      | StrConst of string * charset_prefix option
      | IdentConst of string * const_expr
    
    and int_suffix =
        Long | LongLong | Unsigned | UnsignedLong | UnsignedLongLong
    
    and float_suffix =
        Double | LongDouble
    
    and charset_prefix =
        Local | U8 | U16 | U32
    
    and compound_lit =
      | ArrayCompound of init_type * init_item list
      | StructCompound of init_type * init_item list
      | UnionCompound of init_type * init_item list
    
    and init_item =
      | RegularInit of expr
      | DesignatedInit of designated_init * expr
    
    and designated_init =
      | ConstExpr of const_expr
      | ConstExprPair of const_expr * const_expr
      | Ident of string
    
    and init_type = type_decl
    
    
    type const_expr =
      | UnaryConstExpr of { factor: literal; op: unary_op; }
      | BinaryConstExpr of { left_factor: literal; right_factor: literal; op: binary_op; }
    
    
    type primary_factor =
        Identifier of string | NestedExpr of expr | Literal of literal
    
    and expr =
      | Primary of primary_factor
      | Unary of { factor: primary_factor; op: unary_op; }
      | Subscript of { factor: primary_factor; op: subscript_op; }
      | Binary of { left_factor: expr; right_factor: expr; op: binary_op; }
      | Ternary of { cond: expr; if_true: expr; if_false: expr; }
      | Assignment of { lvalue: expr; rvalue: expr; op: assign_op; }
    
    and prefix_unary_op =
      | Negate            
      | LogicalNegate     
      | BitwiseNot        
      | AddressOf         
      | Dereference       
      | PreIncrement      
      | PreDecrement      
    
    and postfix_unary_op =
      | PostIncrement     
      | PostDecrement
    
    and unary_op =
      | PrefixOp of prefix_unary_op
      | PostfixOp of postfix_unary_op
    
    and subscript_op =
      | Index of expr
      | DotMember of expr
      | ArrowMember of expr
      | FunctionCall of expr list
    
    and binary_op =
      | Add               
      | Subtract          
      | Multiply          
      | Divide            
      | Modulo            
      | BitwiseAnd        
      | BitwiseOr         
      | BitwiseXor        
      | ShiftLeft         
      | ShiftRight        
      | LogicalAnd        
      | LogicalOr         
      | Equal             
      | NotEqual          
      | LessThan          
      | LessThanOrEqual   
      | GreaterThan       
      | GreaterThanOrEqual 
    
    and assign_op =
      | Assignment        
      | AddAssign         
      | SubtractAssign    
      | MultiplyAssign    
      | DivideAssign      
      | ModuloAssign      
      | AndAssign         
      | OrAssign          
      | XorAssign         
      | ShiftLeftAssign   
      | ShiftRightAssign
    ```

    It's still incomplete. I just found out I can use .mli files.

    I think Cephyr is the 5th reincarnation of my OCaml C compiler. I just spend hours on the AST and get tired of it.

    I found lcc, by Fraser et al.:

    https://github.com/drh/lcc

    And I have the book too. I like the book, but it's kinda useless for me because I wanna do SSA. This kinda tree-rewriting mumbo jumbo is too '80s for my taste.

    So any help is appreciated. Thanks.

  • He's dumb, but unlike dumb politicians who say retarded shit and whose fans play it off as 'haha, they're being retarded to troll you!' --- this retardation cannot be denied when hard science is at play. I'm not saying political science is not real, but just like front-end development, you don't need a degree in it to be even the highest-ranked member of a political system (jk about front end, I myself am a dropout --- 3 semesters :D ). What I basically mean is, for those with even a basic understanding of his ventures, we all see the emperor with no clothes on, and his fans cannot really do the same thing they do for a stupid politician and play it off as 'trolling'.

  • For anyone seeking to write their own database, I have one recommendation: the Tokyo Cabinet Library.

    Tokyo Cabinet abstracts away all the work of writing your own serializers and deserializers for binary formats. You can have hash table databases, B+ trees, and everything else, all prepared for you under one roof.

    Of course, that is if you have a brain and don't use a text storage format like JSON. If you use shit like JSON and YAML --- and add potentially hundreds of milliseconds of parsing time just to turn text back into machine-readable binary --- then please submit your name and address to my council so we can get rid of you when we own the world.

    Cleansing of the undesirables aside (seriously, give me ONE good thing about text storage formats! They are EXCHANGE formats, not STORAGE formats!), Tokyo Cabinet is written in C, so you can easily bind it with SWIG. But there are probably bindings around already if you look.
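
    To give an idea, the hash-database part of the API looks roughly like this (a from-memory sketch, so double-check tchdb.h for the exact signatures; the file name is just an example):

    ```c
    /* Minimal Tokyo Cabinet hash-database usage (from memory; verify against tchdb.h). */
    #include <stdio.h>
    #include <stdlib.h>
    #include <tchdb.h>

    int main(void) {
        TCHDB *hdb = tchdbnew();

        /* Open the database file, creating it if it does not exist. */
        if (!tchdbopen(hdb, "notes.tch", HDBOWRITER | HDBOCREAT)) {
            fprintf(stderr, "open error: %s\n", tchdberrmsg(tchdbecode(hdb)));
            return 1;
        }

        /* Store and fetch a record; no hand-rolled binary (de)serializer in sight. */
        tchdbput2(hdb, "lang", "C");
        char *val = tchdbget2(hdb, "lang");
        if (val) {
            printf("lang = %s\n", val);
            free(val); /* tchdbget2 returns a malloc'd copy */
        }

        tchdbclose(hdb);
        tchdbdel(hdb);
        return 0;
    }
    ```

    Link against libtokyocabinet and that's the whole round trip.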

  • I remember reading something about recursive languages in Aho's Automata Book. I will have to check it again. So Regular Languages can't be recursive? Is that what 'recursive' language even means? I have to read the book again.

    Thanks for your help man.

  • btw this is the ASDL I wrote:

    ```
    %{
    
    #include "mukette.h"
    
    
    static Arena *ast_scratch = NULL;
    
    
    #define ALLOC(size) arena_alloc(ast_scratch, size)
    
    
    %}
    
    md_linkage = Hyperlink(md_compound? name, string url)
          | Image(string? alt, string path)
          ;
    
    
    md_inline = Italic(md_compound italic)
         | Bold(md_compound bold)
         | BoldItalic(md_compound bold_italic)
         | Strikethrough(md_compound strike_through)
         | InlineCode(string inline_code)
         | Linkage(md_linkage linkage)
         | RegularText(string regular_text)
         ;
    
    
    md_header_level = H1 | H2 | H3 | H4 | H5 | H6 ;
    
    
    md_line = Header(md_compound text, md_header_level level)
        | Indented(md_compound text, usize num_indents)
        | LinkReference(identifier name, string url, string title)
        | HorizontalLine
        ;
    
    
    md_compound = (md_inline* compound) ;
    
    
    md_table_row = (md_compound cells, size num_cell) ;
    
    
    md_table = (md_table_row* rows, size num_rows) ;
    
    
    md_ol_item = (int bullet_num, md_list_item item) ;
    
    
    md_ul_item = (char bullet_char, md_list_item item) ;
    
    
    md_list_item = TextItem(string text)
            | OrderedItem(md_ol_item ordered_item)
            | UnorderedItem(md_ul_item unordered_item)
            | NestedList(md_list nested_list)
            ;
    
    
    
    md_list = (md_list_item* items) ;
    
    
    
    md_block = Paragraph(md_compound* paragraph)
        | BlockQuote(md_compound* block_quote)
        | CodeListing(identifier? label, string code)
        | Table(md_table table)
        | List(md_list list)
        | Line(md_line line)
        ;
    
    
    markdown = (md_block* blocks) ;
    
    
    %%
    
    
    static inline void init_tree_scratch(void) { ast_scratch = arena_init(AST_ARENA_SIZE); }
    static inline void free_tree_scratch(void) { arena_free(ast_scratch); }
    ```

    I had an easier time parsing ASDL with Yacc. I still can't tell whether a grammar is LR, LL, or RE, but I can tell that Markdown is not a CFG.

    I just updated ASDL: https://github.com/Chubek/ZephyrASDL

    Apologies if I am too late on the documentation. I am still trying to improve it by using it myself. I also wish to add an OCaml target.

  • So I'll answer your question and ask one back, for anyone who can help me.

    Re the nesting: when I made it, I was under the impression that they couldn't be combined. Then I read the CommonMark spec and it seems it's possible. It would be miserable to do this with a syntax-directed translation. I used ASDL to write up a tree, and added some features to asdl(1) so they would be handled more properly. I am not sure whether I should use a parser generator for this, but the nesting can be handled by Lex's start conditions --- if I fail to do so, I may use a PEG generator.

    Now my question.

    I think nesting and recursion are a good case for using a push-down parser here --- I will still try and find a solution before I use an LR parser.

    I avoid using Yacc because I honestly have no clue how to use it with a language like Markdown.

    So my thinking is, I would just use a start-condition stack with Lex (I use Flex). It's pretty simple. Let's use a linked list so there are no limits.

    ```c
    struct stack { int state; struct stack *next; };
    
    struct stack *top, *bottom;
    
    void push ...
    
    int pop ...
    ```

    (I usually use typedefs though).
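
    Fleshed out, it would be something like this (just an illustrative sketch, names made up; Flex's own %option stack with yy_push_state()/yy_pop_state() probably does the same job):

    ```c
    /* A linked-list stack of start conditions, so the nesting depth is unbounded. */
    #include <stdlib.h>

    struct state_node {
        int state;                  /* a Flex start condition, e.g. INITIAL */
        struct state_node *next;
    };

    static struct state_node *state_top = NULL;

    /* Save the current start condition before BEGIN-ing a nested one. */
    static void push_state(int state) {
        struct state_node *node = malloc(sizeof *node);
        node->state = state;
        node->next = state_top;
        state_top = node;
    }

    /* Restore and return the enclosing start condition (assumes a non-empty stack). */
    static int pop_state(void) {
        struct state_node *node = state_top;
        int state = node->state;
        state_top = node->next;
        free(node);
        return state;
    }
    ```

    In the rules it would be roughly push_state(YY_START) before BEGIN-ing the nested condition, and BEGIN(pop_state()) on the way back out.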

    So now we have a pseudo-pushdown parser. What are these called?

    I am still a beginner at this, but one thing that worries me is: how would I handle the tree with this method?

    With Yacc or PEG parser generators, it is easy to assign a value to a reduction/closure. But with this method, Flex won't let me, unless I use far too many variables.

    I think I may use peg(1). I can even do the same stack thingy with PEG.

    Any help is welcome.

  • Why did you use a ? as the prompt for E, but a >>> as the prompt for Python? I know CPython uses >>> in its terminal prompt (and I don't know how they brought that to Windows), but why would E have used ??

  • Definitely do. Or just pirate it; it does not matter. I pretty much doubt a cent of what you spend on that print or e-print will be given to any of the authors. Most likely, they were paid beforehand, or were just so passionate about the subject that they did it pro bono. The publisher is not one of those bougie 'boutique' publishers like No Starch Press; it's freaking Cambridge Press. They can take the loss.

    I do often buy books from the aforementioned 'bougie' and 'boutique' pub houses (mind the ASCII 32), and that's hard on me because I am cut off from the international banking system and have to do it through a courier. But it's still worth it. Like 'Crafting Interpreters' is a book I would spend money on. Or any of Pragmatic Bookshelf's books; they are great. But don't waste your money on this.

    I realize reading PDFs could be problematic on e-readers, so I will post a guide about making it easier. Check out my post history some time from now.

    Thanks.

  • Native Martians have their own concept of time; just like a typical white male CIS-gendered Anglo-Saxon Germanic Indo-European Westerner to assume they will be welcoming of your concept of 'time'. But this is a good thought experiment: when, or if --- and I don't really care whether it's 'if' or 'when' --- the subhuman scum that is Homo sapiens goes to another planet to jerk off there, just as God cast us from Eden unto Earth, what would the temporal and calendrical considerations be? I guess the current spacemen have something worked out?

    This book says there are two types of calendars: arithmetic and astronomical. A calendar can be both. For example, this book mentions the astronomical Persian calendar, but there's also an arithmetic one. Arithmetic calendars are based on numbers; astronomical ones are based on the stars.

    I guess humans on other planets would have this eternal sense of 'jetlag', or 'rocketlag' in this case, which would mean all types of astronomical calendars would be out of the question. Our scum ape bodies have adapted to this distance from the star we call 'the Sun' (which I believe earns its name from that great mythological figure, 'Cloudy with a Chance of Meatballs'), so an arithmetic calendar will probably be used on Mars.

    Now, I just read this book, and half-assedly at that, so I am no commentator on the matter of calendars; maybe it will turn out that an astronomical calendar would work best on Mars?

    I shan't see it, as I will hopefully be dead by then.

  • I'm a simp for Alan Kay and I goon to his Snapchat, but I subscribe to Andrew Appel's OnlyFans (along with many others), so I did not look much into Smalltalk when it comes to '70s languages'. I guess I should do that. Thanks. I could have never put two and two together to realize Python uses prototypes. This blows my mind. Funny thing is, just the other day I wrote JavaScript's grammar. https://gist.github.com/Chubek/0ab33e40b01a029a7195326e89646ec5

    I guess I still got a lot to learn, so better get moving. I guess by 'full semantic analysis' I meant doing a full type analysis 'before' you emit the bytecode, not after. What is the protocol here exactly? I have seen several variants and supersets of Python that do an ML-style type analysis. They achieve it via the `NAME [ ':' TYCON ]` syntax, so the regular Python interpreter still works.

    So thanks. I learned something from your post.

  • Well, what you are describing are abstractions on top of these models, so the 'basis' still remains true. Think of it this way:

    • A modern machine is based on the RAM model;
    • The RAM model is based on the register model;
    • The register model is based on several lower-level abstractions, at the end of which we get to a Turing machine.

    AFAIK, Sipser's does not explain more intricate models like the RAM machine or the register machine, so I recommend spending some time prompting ChatGPT or reading some books on them (I don't know any offhand).

    TL;DR: The formal languages described in Sipser's are the lowest levels of abstraction. Higher levels of theoretical abstraction do exist, several of which I described here.

    In the end, remember that theory and practice are to be used in tandem; neither will achieve anything alone. The modern languages we use today once resembled a formal, theoretical language: a PDA made of PDAs. Look at the earliest C compilers, for example: https://github.com/mortdeus/legacy-cc/tree/master/prestruct. It's just a giant state machine.

    I highly recommend picking up an HDL (Hardware Description Language), or implementing one if you've got the time. This will give you a clear picture of how theory differs from practice.

    PS: Modern systems languages also never give a true picture of what programming is, and what it should be about. I recommend picking up a language closer to functional than imperative: OCaml, or Haskell if possible. That is the other side of formality.

  • I think any ML-based language takes the cake. Like SML, OCaml, etc. These languages are really fluid.

    I think Rust would look 'good' if there were an EBNF grammar for it, but all there is are the pseudo-EBNF notations in the spec.

  • You're right, yeah. Hand-implementing lexers and parsers is kind of 'inane'. I'm not saying it's stupid; for a small grammar it makes sense. But for a big grammar, just use a PEG generator, or Yacc/Lex. Rust has LALRPOP and Java has ANTLR. There's truly no need to implement a parser from scratch. But people on the internet really seem to think using lexer and parser generators 'limits' them. There are some hacks involved in most Lex/Yacc or PEG specs, but at the end of the day people should keep in mind that LR parsers MUST be generated!

    Maybe implement the scanner? Even that is kinda stupid. Unless you do what Rob Pike says: https://www.youtube.com/watch?v=HxaD_trXwRE