Lesser known tricks, quirks and features of C

(jorenar.com)

224 points | by rramadass 4 days ago ago

97 comments

  • fuhsnn 4 days ago ago

    My recent favorite is glibc's hack to implement _Static_assert under C99: https://codebrowser.dev/glibc/glibc/misc/sys/cdefs.h.html#56...

    It uses the constant expression to create a bitfield of width -1 when the assertion fails, and leaves the compiler to error out on that as the intended diagnostic. The actual statement is an extern pointer to a function returning a pointer to an array which has the sizeof of the aforementioned bitfield struct as its size.
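
    For reference, the pre-C11 fallback looks roughly like this (paraphrased, so treat the exact spelling as approximate):

        #define _Static_assert(expr, diagnostic) \
            extern int (*__Static_assert_function (void)) \
              [!!sizeof (struct { int __error_if_negative: (expr) ? 2 : -1; })]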

    Another one encountered in Toybox is (0 || "foo") being a const expression that evaluates to 1. Apparently the string literal is assumed to be soundly placed in the data section, so its address can safely be treated as non-zero.

    • lifthrasiir 3 days ago ago

      You have missed one important thing: every passing assertion will define a single extern function pointer with the same signature, so multiple `_Static_assert` invocations can coexist in a single scope. The extern declaration doesn't have to be a function pointer, by the way; I guess it helps the linker have an easier time removing unused symbols.

      • fuhsnn 3 days ago ago

        Oops, too late to edit: that's really a function prototype. So it wouldn't take storage space or emit a symbol unless the user naughtily calls the __Static_assert_function.

  • wolfspaw 4 days ago ago

    Really liked the trick of defining the struct in the return part of the function.

    Array pointers: Array-to-pointer decay is extremely annoying; if it were implemented as array-to-"slice" decay it would be great.

    Static array indices in function parameter declarations: awesome, a shame that C++ (and Tiny C) do not support it >/

    flexible array member: extremely useful, and now there are good compiler flags for ensuring correct flexible array member usage

    X-Macro: nice, no-overhead enum-to-string names. Didn't know the trick (see the sketch at the end of this comment)

    Combining default, named and positional arguments: Named-arguments/default-args, C version xD. It would be cool if this were added to the C language as a native feature, instead of having to use the struct-hiding macro.

    Comma operator: really useful, especially in macros

    Digraphs, trigraphs and alternative tokens: di/trigraphs are rarely useful, but the alternative synonyms from iso646.h are awesome; love using and/or instead of &&/||

    Designated initializer: super awesome; you couldn't use it if you wanted C++ portability. Now C++ supports part of it.

    Compound literals: fantastic, but in C++ it will explode because the temporary is deallocated at the end of the full expression (the same line). C++ should fix this and allow the C idiom >/

    Bit fields: nice for more control over struct layout

    constant string concat: "MultiLine" String, C version xD

    Ad hoc struct declaration in the return type of a function: didn't know this trick, "multi value" return, C version xD

    Cosmopolitan-libc: incredible project. Already knew of it; it's awesome to offer a binary that runs on all OSes at the same time.

    Evaluate sizeof at compile time by causing duplicate case error: ha, nice trick for debugging the size of anything.
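
    Re the X-Macro point, a minimal sketch of the enum-to-string variant (all names here are made up):

        #define COLOR_LIST \
            X(RED)         \
            X(GREEN)       \
            X(BLUE)

        #define X(name) name,
        enum color { COLOR_LIST COLOR_COUNT };
        #undef X

        #define X(name) #name,
        static const char *color_name[] = { COLOR_LIST };
        #undef X

        /* color_name[GREEN] is "GREEN", with no runtime lookup. */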

    • WalterBright 3 days ago ago

      > Array to pointer decay is extremely annoying, if it was implemented as Array to "slice" decay it would be great.

      It's not just annoying, it's the major source of bugs in shipped code. A fix:

      https://www.digitalmars.com/articles/C-biggest-mistake.html

      • wolfspaw 3 days ago ago

        I agree wholeheartedly, I really liked your article and fix.

        (In fact, I already had your article bookmarked xD, and I’m familiar with and truly admire your work)

    • fuhsnn 4 days ago ago

      >Static array indices in function parameter declarations: awesome, a shame that C++ (and Tiny C) do not support it >/

      The first array size is actually always decayed to a pointer, so supporting it in a compiler without analysis passes, like TCC, is just a matter of skipping the "static" token and the size.
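
      A hedged sketch of what the feature looks like (function name made up):

          /* The caller promises at least 5 valid elements; inside the
             function the parameter still decays to plain const int *. */
          int sum5(const int a[static 5]) {
              int s = 0;
              for (int i = 0; i < 5; i++)
                  s += a[i];
              return s;
          }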

    • jcelerier 3 days ago ago

      > Static array indices in function parameter declarations: awesome, a shame that C++ (and Tiny C) do not support it >/

      C++ does?

          void print(const int (&array)[5]) {
            for(size_t i = 0; i < 5; ++i)
              std::cout << array[i] << " ";
          }
      
      will fail at compile time if you pass it anything other than an int[5] array
      • mananaysiempre 3 days ago ago

        Including an int[6] array, unlike with int[static 5]. This is usually not what you want.

        • jcelerier 2 days ago ago

          > This is usually not what you want.

          very interesting comment considering I'm literally fighting with stupid languages with this kind of permissive rule right now, which definitely just creates more bugs (for instance, silently dropped values: an upstream API changed and added an element at the end of the list, you updated, but since you get no error you now have to go through all the calls and check them one by one)

          • mananaysiempre 2 days ago ago

            Remember, in C you cannot use anything but a literal constant for the array size. My reference for how useful strict array-size matching can be under such circumstances is standard Pascal (as opposed to Modula-ish Pascals like Borland’s), and the answer there is that it more or less isn’t. Even in C, I’d expect at least some people would actually use things like int(*array)[5], given this syntax is valid even in C89, but in function signatures I’ve literally never encountered it.

            If the size could be a (type-level) variable, that would be a very different proposition. But variables lead to expressions, expressions lead to functions, functions lead to suffering^W becoming a full-fledged dependently typed programming language—if not an Agda or an Idris then at least an ATS[1]. I'd welcome that, but as far as I can see the ergonomics are very much not there yet for any kind of low-level programming.

            [1] https://ats-lang.sourceforge.net/

            • jcelerier 19 hours ago ago

              > If the size could a (type-level) variable, that would be a very different proposition. But variables lead to expressions, expressions lead to functions, functions lead to suffering^W becoming a full-fledged dependently typed programming language

              I mean, that's just plain old C++. You've been able to have compile-time expressions operating on compile-time numbers since the 90s; it's a widely used feature.

        • skribanto 3 days ago ago

          I believe int[6] can still be passed to static 5 but I would have to double check

    • xeyownt 3 days ago ago

      Pointer decay is not a mistake.

      It is what allows you to write int *p = arr and loop over the array elements with p++.

      If the array type were kept, each increment would jump past the whole array instead of to the next element.
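
      A tiny illustration of the difference (names arbitrary):

          int arr[5];
          int *p = arr;          /* decay: p + 1 points at the next int    */
          int (*ap)[5] = &arr;   /* no decay: ap + 1 points past the array */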

      • kaba0 3 days ago ago

        It is a mistake: an array and a pointer are different types. There could be ways to convert one to a pointer, but it shouldn't happen implicitly in so many places.

        • teo_zero 3 days ago ago

          Correct. A method to extract a pointer from an array already exists:

            int *p = &arr[0];
          
          The mistake is to allow this:

            int *p = arr;
          • uecker 3 days ago ago

            And yet, coding styles do not prohibit it and there is no compiler that has a warning.

  • saagarjha 4 days ago ago

    Mentioning %n without explaining that it is overwhelmingly used for exploits is a little reckless IMO.

    • _kst_ 4 days ago ago

      Background: A %n format specifier in a printf call stores the number of characters written so far into a specified variable. For example:

          #include <stdio.h>
          int main(void) {
              int count;
              printf("%s%n\n", "hello, world", &count);
              printf("count = %d\n", count);
          }
      
      The output is:

          hello, world
          count = 12
      
      %n can be exploited to write data to an arbitrary memory location, but only if the format string is something other than a string literal.

      %n can be exploited, but it's entirely possible to use it safely.

      • lifthrasiir 3 days ago ago

        I think another problem exposed by %n was that you can't easily compose format strings. Sure, `printf(str)` where `str` is a user input would be easy to detect and can be automatically turned into `printf("%s", str)` with some macro hack, but `printf(fmt, ...)` where `fmt` is composed from multiple partial format strings is harder to reason about.

    • greiskul 4 days ago ago

      I'm curious about this, didn't know about %n before. What are the common pitfalls and exploits using this enables?

      • mananaysiempre 4 days ago ago

        You would expect a printf call with a user-controlled format string to be, at worst, an arbitrary read. Thanks to %n, it can be a write as well.

      • lights0123 4 days ago ago

        If the user can control the formatting string, they can write to pointers stored on the stack. It's important to use printf("%s", str) instead of printf(str).

        • rep_lodsb 4 days ago ago

          Useless use of printf; what's wrong with "puts(str)"?

          • shawn_w 3 days ago ago

            puts() adds a newline at the end. gcc will happily turn printf("%s\n", str) into puts(str), though.

            I've never tested to see if printf("%s", str) becomes the equivalent fputs(str, stdout)

  • lifthrasiir 3 days ago ago

    I hate that I know all of them...

    > Backslash line splicing

    One reason trigraphs were removed is that `??/`, the trigraph spelling of `\`, also acted like `\` in this context.
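
    The classic footgun this enabled (assuming trigraph replacement is on):

        // Is this comment finished here??/
        int x = 1;  /* with trigraphs, this line gets spliced into the comment above */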

    > Using `&&` and `||` as conditionals

    Not only is this uncommon, but chaining them is not always correct, because `a && b || c` is not equivalent to `a ? b : c` when `b` evaluates to false (e.g. with a = 1, b = 0, c = 5 the former yields 1, the latter 0).

    > Compile time assumption checking using `enum`s

    Please use `static_assert` already.

    > Matching character classes with `sscanf()`

    This can be combined with `*` (assignment suppression) to ignore certain characters. For example `%*[ \t]` will skip all horizontal whitespace, unlike a plain space in the format, which also skips newlines.
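
    A small sketch of that (input and format made up):

        #include <stdio.h>

        int main(void) {
            char value[32];
            /* %*[ \t] eats spaces and tabs without assigning them, but
               unlike a plain space in the format it will not eat newlines. */
            if (sscanf("key:   value", "key:%*[ \t]%31[^\n]", value) == 1)
                printf("[%s]\n", value);   /* prints "[value]" */
            return 0;
        }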

    > Detecting constant expressions

    This ultimately comes from C's weird way of specifying a null pointer constant, which is defined as an integer constant expression with value zero (optionally cast to void *). So a non-constant expression can be distinguished by multiplying it by a known zero constant and checking whether the result is still treated as a null pointer constant.
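
    For reference, the usual shape of the trick (roughly the Linux kernel's __is_constexpr; the macro name here is mine):

        /* If x is an integer constant expression, (void *)((x) * 0l) is a
           null pointer constant, so the conditional operator yields int *
           and the comparison is true; otherwise the type is void * and
           (as a GNU extension) sizeof(*(void *)...) is 1. */
        #define IS_CONSTEXPR(x) \
            (sizeof(int) == sizeof(*(8 ? ((void *)((long)(x) * 0l)) : (int *)8)))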

    • fsckboy 3 days ago ago

      > `??/`, a trigraph spelling for `\`, also acted like `\` in this context.

      OF COURSE it should do what \ does, otherwise you have no other way to get a \

      the point of trigraphs is to allow characters to be entered that your character-set/terminal keyboard doesn't allow.

      • lifthrasiir 3 days ago ago

        That's technically true, but it could have been designed much better if that was the real intention:

        1. There is no real reason that trigraphs should be expanded inside a comment. The preprocessor can't create any additional comments, so comments could be scanned and discarded as early as possible, yet trigraph replacement somehow precedes that.

        2. And this very behavior of backslash should also have been deferred as much as possible. ISO C already has two sets of doubly-quoted literals `"asdf"`, where one is used for normal string literals and another is used for preprocessing, because `#include "foo\bar.c"` should refer to a file name that contains a backslash, not a backspace (`\b`). Since `#include FILENAME` is also possible, such literals may appear anywhere in the preprocessing line! Therefore we already have to defer processing of some backslashes, so why should the remaining backslashes be processed that early?

        In my ideal design, a backslash is either a part of tokens (`+\<newline>=`, `foo\<newline>bar` or `"asdf\n\<newline>fdsa"`) or a standalone token optionally followed with a newline (`\<newline>`). No backslashes within comments are significant, effectively solving the first point. These tokens are then turned into normal tokens or whitespaces respectively, so they remain transparent to the parser. The trigraph could then have been allowed to replace backslashes in such cases (e.g. `+??/<newline>=`) without affecting remaining cases like comments.

        For the record, the later digraphs were more or less designed as such, but they lack backslashes, even though ISO/IEC 646 still doesn't contain backslashes for all charsets. This hints that the inclusion of trigraphs or digraphs was more due to vendor complaints (known to be IBM) than actual concerns from users who wouldn't have been able to type backslashes if that were true.

        • fsckboy 3 days ago ago

          >1. There is no real reason that trigraphs should be expanded inside a comment.

          trigraphs are 100% substitutes for unrepresentable characters. they absolutely positively ALWAYS should be replaced by the character. Pretend it takes place before the character even arrives inside the comment, because it does.

          it's very much like the #define/#include c-preprocessor step, it happens first, that's what keeps it clean, understandable, manageable. (Sure you can have more complex macro systems, but they are... complex, they can get very ugly)

          if you know how to process a unix shell commandline, you know that there are layers to it. Trigraphs are just like that. If you don't know how a unix shell commandline is processed, learn it, it's worth knowing.

          • lifthrasiir 3 days ago ago

            I'm talking about why trigraphs had to behave in such a way, not how. C and C++ have a concept of source character set and execution character set, which can diverge. Let's say trigraphs are indeed for unrepresentable characters; then in which character set are they unrepresentable? If the answer is the source set, an alternative spelling is sufficient and comments should ideally be unaffected or users will be confused. If the answer is the execution set, why do other characters have no equivalent?

            Also you should be aware that the macro expansion in C/C++ is not like a literal string replacement. `#define FOO bar` doesn't turn `BAREFOOT` into `BAREbarT` or `"OH FOO'S SAKE"` into `"OH bar's SAKE"`. (Some extremely old preprocessors did do so, by the way.) `#define FOO(x) FOO(x)` doesn't make `FOO(bar)` into an infinite recursion because `FOO` is prevented from expansion when `FOO` itself is being already expanded. There are certainly some layers, but they are not what you seem to think.

            • fsckboy 3 days ago ago

              you want to be able to convert source code from one system to another and back again, and you want the rules to be simple so that everybody who writes such a converter gets it right, and you also don't want to think about a zillion edge cases. If the trigraphs exist on the wrong side of the conversion, flag them. Otherwise, it's a very simple process.

              I was not talking about how the preprocessor is implemented, I was talking about the layering. You keep wanting to mix layers because you think you know better; thar be dragons.

              • lifthrasiir 3 days ago ago

                Layering is only valuable when that serves its goals well. I don't see any reason to have an additional layer in the language here. If you are thinking about a strict separation between preprocessor and parser, that is already known to be suboptimal in compilation performance decades ago. (As a related example, a traditional Unixy way to separate archiving and compression is also known to be inefficient; a combined compressing archiver is better in design.)

                • poincaredisk 3 days ago ago

                  I disagree with the downvotes here. C language "layers" are tricky to get right, a source of footguns and backdoor potential (especially the trigraph that started this comment chain), and overall a bandaid invented when there were no better solutions (like modules, or Unicode). Trigraphs are a weird archaic quirk of the C language (and no other modern language), and I'm glad to see them gone.

                  And since we're thinking about layers, character encoding hacks should be entirely outside of a programming language responsibility. Now that would be a proper layering.

  • coreyp_1 4 days ago ago

    That's a nice list!

    I've been digging into cross-platform (Windows and Linux) C for a while, and it has been fascinating. On top of that, I've been writing a JIT-ted scripting (templating) language, and the ABI differences (not just fastcall vs stdcall vs cdecl) are often not easy to find documentation about.

    I've decided that if I ever get to teach a University class on C again, I wanted to cover some of these things that I feel are often left out, and this list is a helpful reference! Thanks!

    • rramadass 3 days ago ago

      This is actually a pretty good list and that's why I submitted it to HN. The Chinese stratagem "Cast a Brick to attract Jade" is relevant here, though I haven't yet seen much "Jade" from others :-) The author's presentation/explanation is also quite succinct and precise, with references pointing to further details, and thus the overall s/n ratio is very good. This is how tech stuff should be written (contrast with meandering articles where one technique is "explained" over five pages).

      Knowing these sort of techniques is important because they force you to think in different ways to solve a problem which expands one's mental design space. C (and C++) is particularly important here since it is the common "lingua-franca" across all system/application software from servers to desktops to itty-bitty MCUs.

      PS: Also see the book Fluent C by Christopher Preschern which while not dealing with "tricks" shows how to use C effectively using a pattern-like approach.

  • winocm 3 days ago ago

    There’s also the use of typedef to help make function declarations.

    Such as:

      typedef void fptr_t(int);
      fptr_t foo;
    
    That would effectively declare a function with the prototype: `void foo(int)'. This pattern is used quite a bit in BSD kernels.
  • guerrilla 3 days ago ago

    These are great. Most posts I read with titles similar to this are just the authors revealing that they don't know C very well, but this one included some interesting things. I didn't know compound literals were lvalues, but if you think about executable formats, it makes a lot of sense.

    • rramadass 3 days ago ago

      The references linked to are also a pretty good source of similar info.

  • jonathrg 4 days ago ago

    Multi-character constants are one of the many things in C that would be nice to use if the language would just choose some well-defined behaviour for them. It doesn't really matter which.

    • mananaysiempre 4 days ago ago

      Mainstream compilers agree on multicharacter literals being big endian; that is, 'AB' is usually 'A' << CHAR_BIT | 'B'. The exception is MSVC, which also works like that as long as you don't use character escapes, but if you do it emits some sort of illogical, undocumented mess that looks like an ancient implementation bug fossilized into a compatibility constraint.
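
      A quick probe for a given compiler (the behaviour is implementation-defined, so this only checks the common convention described above):

          #include <limits.h>
          #include <stdio.h>

          int main(void) {
              printf("%d\n", 'AB' == (('A' << CHAR_BIT) | 'B'));  /* typically prints 1 */
              return 0;
          }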

      • poincaredisk 3 days ago ago

        Your phrasing is a bit confusing. Multicharacter literals being big endian (as you've defined it) means that they'll actually end up little-endian in memory (on little-endian architectures, like x86). So 'ABCD' will end up as 'DCBA' in memory.

        Really interesting to hear about the escaping quirk, I need to test it.

  • golergka 4 days ago ago

        switch (n % 2) {
            case 0:
                do {
                    ++i;
            case 1:
                    ++i;
                } while (--n > 0);
    
        }
    
    Someone really ought to record a "WAT" video about C.

    • mananaysiempre 4 days ago ago

      The switch statement in C is not a very limited pattern match. The switch statement in C is a very ergonomic jump table. Do not think ML’s case-of with only integer literals for patterns; think FORTRAN’s computed GO TO with better syntax. And it will cease to be a WAT. (For a glimpse of the culture before pattern matching was in programmers’ collective consciousness, try the series on designing a CASE statement for Forth that ran for several issues of Forth Dimensions.)

      • russellbeattie 4 days ago ago

        I don't think there's any confusion of how it works, it's the deep horror in discovering that it's possible in the first place, and a morbid curiosity of the chaos it could cause if abused.

        • mananaysiempre 4 days ago ago

          At least for me, the feelings you describe are characteristic of a footgun, not a WAT. A WAT is rather a desperate bewilderment as to who could ever design the thing that way and why, and for switch statements computed gotos are the answer to that question.

          As for the footgun issue, I mean, it could be one in theory, sure. But I don’t think I’ve ever seen it actually fired. And I can’t really appreciate the Javaesque “abuse” thinking—it is to some extent the job of the language designer to prevent the programmer from accidentally doing something bad, but I don’t see how it is their job to prevent a programmer from deliberately doing strange things, as long as the result looks appropriately strange as well.

          (There are reasons to dislike C’s switch statement, I just don’t think the potential for “abuse” is one.)

    • PhilipRoman 4 days ago ago

      Just think of the "case" statements like any other label, despite the misleading indentation. Then it becomes perfectly natural to jump in the middle of a loop.

    • rramadass 3 days ago ago

      This is actually pretty useful in some usecases. One very good example is Simon Tatham's "Coroutines in C" (https://www.chiark.greenend.org.uk/~sgtatham/coroutines.html) to resume execution in a function after the point it returned from in the earlier call.

      The relevant example code is;

        int function(void) {
          static int i, state = 0;
          switch (state) {
              case 0: goto LABEL0;
              case 1: goto LABEL1;
          }
          LABEL0: /* start of function */
          for (i = 0; i < 10; i++) {
              state = 1; /* so we will come back to LABEL1 */
              return i;
              LABEL1:; /* resume control straight after the return */
          }
        }
      
      becomes;

        int function(void) {
          static int i, state = 0;
          switch (state) {
              case 0: /* start of function */
              for (i = 0; i < 10; i++) {
                  state = 1; /* so we will come back to "case 1"*/
                  return i;
                  case 1:; /* resume control straight after the return */
              }
          }
         }
    • tom_ 4 days ago ago

      This sort of thing is pretty handy sometimes. Don't forget you can have code (e.g., start of the loop) before any of the cases too!

    • lifthrasiir 3 days ago ago

      Just turn every single IOCCC winning entry into a video; you now have a year's supply of content.

    • 4 days ago ago
      [deleted]
    • agumonkey 4 days ago ago

      I wonder if there's any other instance (in programming or elsewhere) of intersecting grammar constructs being accepted.

      • 082349872349872 3 days ago ago

        control "structures" in forth, although in this case the notion of "grammar construct" is more in the head of the user than in the implementation...

        • agumonkey 3 days ago ago

          yeah that's too loose :D

    • pjmlp 3 days ago ago

      C definitely belongs in the set of WAT languages.

  • johnklos 3 days ago ago

    Not sure what happened:

    404

    File not found

    The site configured at this address does not contain the requested file.

  • 38 3 days ago ago

        > int (*ap1)[10] = &arr;
    
    Wow that's garbage syntax. With Go it would be

        var ap1 *[10]int = &arr
    • rramadass 3 days ago ago

      I actually find the C syntax easier to read and understand.

      • lifthrasiir 3 days ago ago

        How about the following then? I can read them, but by no means are they intuitive.

            int *x[10]; // How is this different from `ap1` above?
        
            int (*f (void))[10];
        • eMSF 3 days ago ago

          Well, obviously it doesn't have parentheses. It's not like this is the only instance where adding parentheses affects the end result.

          You could write even more complex declarators (but don't have to), but that would not prove that some other syntax is inherently intuitive. Case in point, I cannot parse the Go syntax as I do not know Go.

          In my experience pointers to arrays are rather uncommon and I'm not sure that I've ever written a function returning one, having even less of a need for a pointer to such. (Thus out of all these, only your first example is somewhat common in practice.)

          • lifthrasiir 3 days ago ago

            > I cannot parse the Go syntax as I do not know Go.

            Or you probably never even tried. You should be immediately able to parse it if I provide a hint that `*`, `&` and `[10]` mean roughly the same thing as in C, because there is no reasonable way to parse `*[10]int` as an array of 10 copies of something. You can't say the same of C.

          • rramadass 3 days ago ago

            Right. These are just "old chestnuts" used to scare C noobs, particularly in interviews. IIRC the K&R C book itself had an example program to convert C declarations to English, and there also exists a utility program called "cdecl" that does the same.

            • lifthrasiir 3 days ago ago

              Better to use that English explanation as a model of readable syntax.

              • rramadass 3 days ago ago

                It is but you just have to know how to map it.

        • rramadass 3 days ago ago

          These are all simple (not necessarily intuitive) if you know how operator binding works in C (using braces to highlight the grouping);

            int *x[10]; ---> { int* } x[10];
          
            int (*f (void))[10] ---> int { { (* {f (void) } ) } [10]; }
          
          The point is that once you have had some practice you can work it out and Go's syntax is not necessarily much better.
          • lifthrasiir 3 days ago ago

            Your original comment said it "is easier to read and understand", not "can be worked out after some practice". Of course it is not like you should inline every `typedef` into a single mess of complicated types, but you never said that you believe only the simplest types should be used and that C syntax is easier for them.

            In any case, I think Go is a clear winner here because all logical types are consecutive tokens. For example `f` in my example is (as you correctly parsed) a function that returns a pointer to an array of 10 integers, but that return type is normally written `int [10]` or `int NAME[10]`, while here it is written in two chunks, `int` and `[10]`, with a big parenthesis in between.

            • rramadass 3 days ago ago

              Again, my using "is easier to read and understand" was w.r.t. the parent's claim about Go's syntax. You took it wrongly to mean the absolute general case.

              You need practice for complicated things, and that is what I was pointing out with "can be worked out after some practice", not that everything trivial needs practice.

              "Go is a clear winner here" is your claim and not necessarily one that I agree with since, as mentioned, with knowledge of the binding rules and a little practice complicated declarations are not that big of a deal.

              • lifthrasiir 3 days ago ago

                Agreed that it's not actually a big deal (hence "here"), but it does strengthen a point that the C syntax wasn't designed carefully after all. The current C type syntax was completely accidental and any reasonable design could have avoided that. If that was too late for some reason, one could have defined a new parallel syntax that solves this problem. In fact C++ did so via its new function declaration syntax `auto f(...) -> ...`. Guess why...

                • rramadass 3 days ago ago

                  > The current C type syntax was completely accidental and any reasonable design could have avoided that.

                  Absolutely baseless claim.

                  The C syntax and language are the product of a small group (not a committee) of smart people with the goals of syntactic brevity, closeness to the machine architecture (PDP-7/11), and building along the path BCPL->B->C. Dennis Ritchie himself explains the rationale in his paper The Development of the C Language, so one does not need to make untenable assumptions. The enduring success of the language (even in the face of all the developments since then in computer HW and PLT) is proof of the validity of its design goals. Its "Abstract Machine" is simple and there is no complicated Object Model, with the syntax merely being a thin veneer over a sequence of bytes. Contrast it with most modern languages (which seem to be designed to solve world peace/hunger and everything in between) and C appears more and more relevant these days. C++ used judiciously, without a lot of the "new features" introduced by the standards committee (the bane of the language), takes it to the next-level sweet spot.

                  • 38 3 days ago ago

                    > C appears more and more relevant these days

                    C is not relevant any more; not sure what world you are living in. It only has any relevance because it was the best option at the time, decades ago, and so people are forced to use it when making syscalls. That's it.

                    • rramadass 2 days ago ago

                      What? There are more embedded devices than ever running C/C++ code today. All OSes, system utils etc. are still done in C/C++. All higher-level performance-oriented frameworks/libraries in any domain (eg. AI/ML) are implemented in C/C++, and then an interface to them is given through wrappers in other languages. Also C is the common "lingua-franca" across languages.

                      C is still in the top five in the TIOBE index today.

                      • 38 2 days ago ago

                        > All OSes, System utils etc. are still done in C/C++.

                        First of all, no: plenty of OSes are made in other languages. Also, the big OSes WERE written in C, and only remain so in order to avoid redoing millions of lines of code.

                        > and then a interface to them are given through wrappers in other languages

                        Again, this is only done because the OS is using an outdated language, so people are forced to work with it.

                        > C is still in the top five in the TIOBE index today

                        That doesn't matter; this does:

                        https://madnight.github.io/githut/#/pull_requests/2024/1

                        • rramadass 2 days ago ago

                          [flagged]

                          • 38 2 days ago ago

                            > Github by itself is not enough; there are orders of magnitude more code outside of it and hence your assumption is wrong. Also most C/C++ folks prefer to keep code local (proprietary and personal reasons) and hence are not sampled.

                            "you're wrong man, C totally has a bunch of code being used thats private, I swear". you could say that about every single other language. only thing that matter is what can be measured. C is dead man, you are just in denial. its an old crap language that hasn't been relevant in at least a decade. if you need some evidence, just look to the fact that after decades it still doesn't have a package manager, so many people laughably just vendor code when working with C projects.

                            • rramadass a day ago ago

                              [flagged]

                              • dang 8 hours ago ago

                                You broke the site guidelines egregiously in more than one place in this thread. That's not allowed here and we ban accounts that do it. Moreover, you've done it repeatedly in other places also, e.g.:

                                https://news.ycombinator.com/item?id=41601160

                                https://news.ycombinator.com/item?id=41590528

                                https://news.ycombinator.com/item?id=41563488

                                If you keep doing that, we're going to have to ban you. I don't want to ban you! Therefore if you'd review https://news.ycombinator.com/newsguidelines.html and stick to the rules from now on, we'd appreciate it.

                                Among other things, that means not posting any more personal attacks.

                              • lifthrasiir a day ago ago

                                TIOBE can remain transparent and also be seriously flawed in its methodology, which has been questioned for many years already. No popularity indicator is entirely free from flaws, but anyone who is aware of TIOBE's possible flaws always quotes multiple indicators, including PYPL [1] and RedMonk Top 20 [2]. That's how you can remain convincing even with only possibly flawed data sources.

                                While C/C++ will remain an important language for many years, its continuing decline is also clear from those indicators. In fact even the most recent TIOBE has reported the lowest-ever position (4th) for C, and that fact was already well known from other indicators: the PYPL indicator for C has been roughly in decline for a decade, as has its RedMonk ranking. They both estimate the current use of a given language by looking at the popularity of tutorials or questions, while TIOBE estimates the cumulative use of a given language. All things equal, TIOBE is going to be systematically delayed compared to the others, yet still the TIOBE ranking for C is now falling and there is no reason to believe otherwise.

                                [1] https://pypl.github.io/PYPL.html

                                [2] https://redmonk.com/rstephens/2024/09/12/top20-jun2024/

                                • rramadass 8 hours ago ago

                                  What you are pointing out is nothing revelatory. That a statistical index is dependent on data and methodology is a vacuously true statement and not an argument. That there are multiple indexes trying to measure the same thing is also true and not an argument. The point was to show an index which is well respected (criticisms notwithstanding) as a counter to silly claims.

                                  I personally do not place much stock in these rankings since all of them are flawed in their sampling methodology due to using only publicly accessible indicators like google searches, stack overflow questions, job postings, Github and similar public repositories etc. C programmers on average are more experienced and hence have no need for these. They are already aware of most of the Good/Bad/Ugly about the language and are used to working out problems for themselves and hence don't show up in these metrics. Thus C/C++ rankings might appear to be waning when they are holding steady or rising much more slowly w.r.t. others.

                  • lifthrasiir 3 days ago ago

                    Not exactly, read the exact paragraph in The Development of the C Language:

                    > [...] In all these cases the declaration of a variable resembles its usage in an expression whose type is the one named at the head of the declaration.

                    > The scheme of type composition adopted by C owes considerable debt to Algol 68, although it did not, perhaps, emerge in a form that Algol's adherents would approve of. The central notion I captured from Algol was a type structure based on atomic types (including structures), composed into arrays, pointers (references), and functions (procedures). Algol 68's concept of unions and casts also had an influence that appeared later.

                    Algol 68 by the way had a sensible type syntax that is akin to Go's. Ritchie "explains" the syntax difference by having the same syntax for the type and the expression, but that's not exactly why. For example there are actually two expressions relevant here, `*` dereference and `&` reference. Why was the former used? Why couldn't `*` be made a postfix at this point if the same syntax was a big concern? Ritchie himself never fully elaborated on those points, and I think that's because it was never his primary concern at all.

                    As you have noted, it is very important to realize that C did have its design goals (for which C did a very excellent job). On the other hand it would be very misleading to claim that C had some bigger visions even at that time and that they validate its design! Ritchie and others were clearly smart, but they didn't design C to be something magnificent; it was a rough product of the various trade-offs they had to endure. So why this particular syntax? Well, because B already picked a prefix `*` which Ritchie didn't want to change at all, and he allowed it to infect all other aspects of the type syntax without much consideration. (Or more neutrally, Ritchie couldn't figure out any other option when he had to keep B compatibility. But keep in mind that B had already changed that operator from BCPL.)

                    Technically speaking it was somehow designed, but only by following the path of the least resistance, not a solid reasoning, hence my use of the word "accidental". There are many other examples, such as the logical `&` (etc.) being renamed to `&&` but the bitwise `&` keeping the original precedence order because nothing was done. To be fair to Ritchie though, it is not his fault, but rather a fault of the whole software community that was fixated on this ancient language designed for specific purposes way too long.

                    • rramadass 3 days ago ago

                      The relevant paragraphs from Ritchie's paper are not just what you quoted (much of your comment is not relevant) but much more;

                      For each object of such a composed type, there was already a way to mention the underlying object: index the array, call the function, use the indirection operator on the pointer. Analogical reasoning led to a declaration syntax for names mirroring that of the expression syntax in which the names typically appear. Thus,

                           int i, *pi, **ppi;
                      
                      declare an integer, a pointer to an integer, a pointer to a pointer to an integer. The syntax of these declarations reflects the observation that i, pi, and ppi all yield an int type when used in an expression. Similarly,

                           int f(), *f(), (*f)();
                      
                      declare a function returning an integer, a function returning a pointer to an integer, a pointer to a function returning an integer;

                           int *api[10], (*pai)[10];
                      
                      declare an array of pointers to integers, and a pointer to an array of integers. In all these cases the declaration of a variable resembles its usage in an expression whose type is the one named at the head of the declaration.

                      The above can be summarized as the following two points;

                      1) "Declaration reflects Use" (from K&R C book)

                      which leads us to the corollary,

                      2) Syntax is variable-centric rather than type-centric i.e. You look at how the variable is supposed to be used and then work out its type.

                      > To be fair to Ritchie though, it is not his fault, but rather a fault of the whole software community that was fixated on this ancient language designed for specific purposes way too long.

                      Again, this is completely baseless and merely your opinion. I have already pointed out the main design goals which drove its design and the fact that it is still relevant today is proof of that. The "simplicity" of its very design is its greatest strength. The fact that modern needs (eg. Strong Type Safety, Multi-paradigm, Security, Concurrency etc.) require us to design/seek out more language features is not a reflection on the language itself since it was designed well before these "wants" became "needs". On the other hand the various extensions of C (eg. Objective-C, Concurrent-C, Handel-C etc.) are proof of its versatility and extensibility and hence its enduring relevance.

                      • lifthrasiir a day ago ago

                        Not only did you swiftly dismiss most of my comment without even trying, but you have no backing evidence for why declaration has to reflect use (K&R has no rationale for that AFAIK). And even that principle doesn't imply your claim; it only means that syntax changes to the declaration and the uses have to be synchronized.

                        > On the other hand the various extensions of C (eg. Objective-C, Concurrent-C, Handel-C etc.) are proof of its versatility and extensibility and hence its enduring relevance.

                        Most popular-enough languages will eventually have tons of extensions, most of which would be obscure and not known to the general public though. I'm not even sure which "concurrent" C you are talking about (there have been multiple extensions of the same name in the literature, none of which ever gained traction in the industry). Have you ever actually seen or used the suggested extensions firsthand?

                        Also, extending a language is actually kinda easy and doesn't imply some kind of quality. You seem to value your own decade-long experience, but I was also a programming language researcher decades ago, and I know that because any PL researcher will make new languages all the time. Or let's put it this way: Brainfuck has been extended and adapted so many times [1]. Does that make Brainfuck "versatile and extensible" in your definition? Is enduring relevance even a necessary consequence of such qualities? Think about that.

                        [1] https://esolangs.org/wiki/Brainfuck#Related_languages

                        • rramadass 14 hours ago ago

                          [flagged]

                          • lifthrasiir 5 hours ago ago

                            I already had spent too much time on this discussion so this comment will be the last one for anyone still following this.

                            > Thus Asking questions like "why was * used for pointer syntax" is meaningless since Ritchie himself says he took it from B; [...] Thinking about the evolutionary path BCPL->B->C answers your other questions.

                            It rather means that Ritchie took it from B and didn't feel like it should change. It should be noted that B did change its dereference syntax from BCPL, which eventually settled on `!e` and `e1!e2` for `*e` and `e1[e2]` in C (with everything implicitly being a byte pointer). As far as I'm aware there is no documented reason why Thompson turned `!` into `*`, which by the way would be used in too many places at this point until C moved away from BCPL-style escape sequences in string literals. Maybe Thompson's other language, Bon, has some clue but I have no information about that. In any case Ritchie clearly didn't think far into this particular change, because C was originally conceived as a largely compatible extension to B (then NB) and then stuck. Isn't that accidental enough?

                            > Now you can see how "Declaration reflects Use" and variable-centric syntax makes sense.

                            There are multiple concrete implementations of that driving principle. `int a;` and `a int;` would have been equally okay under this principle, so the only real reason to pick the former is the influence of B (`auto a;`). I wondered whether you were using that term only to mean C's strictest implementation, and whether you actually don't like ANSI/ISO C's newer function declaration syntax, but then your arguments for the general principle would not back the eventual syntax used by C.

                            > Nothing comes close to the "C family of languages" by which i mean not just extensions but anybody who took inspiration from it.

                            There are many different classes to consider. In the broadest sense even Python is said to be inspired by C, even though it would be absurd to consider Python to be a proof of C's inherent quality in addition to its popularity. Some languages are also only syntactically similar to C because its block syntax did have some nice syntactic property (known as "curly-brace languages" nowadays). Those superficial similarities can't imply your claim.

                            > There is only one "official" one, that was designed by Narain Gehani and explained in the book "The Concurrent C Programming Language".

                            It was never standardized, and apparently it wasn't available much outside of AT&T Labs. If a book is what makes it somehow official, I could write and publish my own book with the same title today. By any measure, that Concurrent C language is not as notable as other concurrent languages based on or influenced by C.

                            > By design and accident C turned out to be a pretty good Core/Kernel language for others.

                            There are a lot of extension languages that are NOT based on C or C++. In fact, I believe every single language in the TIOBE Top 20 ranking has at least 10 notable ones on average. C was used for extension only because it was popular for many years, and many such extensions had to work around or bend its limitations for their purposes. (For example, C itself doesn't have types for distinct address spaces and many extensions add reserved keywords, which you can add to any existing language, as I noted earlier.)

    • pavlov 3 days ago ago

      Maybe you missed the part where this is C, you know, the language designed by many of the same people as Go but 35 years earlier.

      It would be a time warp worthy of the Rocky Horror Picture Show if C's design could take syntax ideas from Go.

  • ranger_danger 4 days ago ago

    > quirks and features

    Someone is a fan of Doug DeMuro.

    • randomdata 4 days ago ago

      This... is the 1972 Ritchie C

  • 4 days ago ago
    [deleted]
  • o11c 4 days ago ago

    Bah, those are all well-known.

    What value does the following program return?

        int main()
        {
            int *p = 0;
    
        loop:
            if (p)
                return *p;
    
            int v = 1;
            p = &v;
            v = 2;
            goto loop;
            return 3;
        }
    
    Also, rather than doing `sizeof` via one error at a time, it's better to just emit them to a char array {'0' + sz/10, '0' + sz%10, '\0'}. Generalizing this to signed numbers of arbitrary size is left as an exercise for the reader.
    • _kst_ 3 days ago ago

      It returns 2.

      The only reason that might be surprising is that the "return *p;" statement refers to the value of an object at a point (textually) before its definition. But the lifetime of the object named "v" begins on entry to the innermost compound statement enclosing its definition -- in this case the body of "main".

      Space for "v" is allocated on entry to "main". It's initialized to 1 when its definition is reached. The "return *p;" statement appears before the definition of "v" in the program source, but is executed after its definition was reached at run time, and within its lifetime.

      It's important to remember that scope and lifetime are two different things. The scope of an identifier is the region of program text in which the identifier is visible; for "v" it extends from the definition to the closing "}". The lifetime of an object is the time span during execution in which it exists; for "v" it extends from the time when execution reaches the opening "{" to the time when execution reaches the closing "}". Formally, storage for "v" is allocated at the beginning of its lifetime and deallocated at the end of its lifetime. Compilers can and do optimize allocation and deallocation, as long as the visible behavior is consistent.

      Aside: If "v" were a VLA (variable length array, introduced in C99, made optional in C11) its lifetime would begin when execution reaches its definition.

      • shultays 3 days ago ago

        Can't it reuse v's memory for other things before v is defined? Say there is "int a = 4;" at the beginning of main that is no longer used when it reaches "int v = 1;"; can't a and v share the same memory location?

        • _kst_ 3 days ago ago

          A compiler can reuse memory as much as it likes -- but only if the visible behavior of the program is consistent with the language requirements.

          If you write:

              {
                  int n = 42;
                  printf("%d\n", n);
              }
          
          in the abstract machine, `sizeof (int)` bytes are allocated on entry to the block and deallocated on exit, but a compiler can legally replace the entire block with `puts("42")` and not allocate any memory for `n`.

          Memory for objects defined in nested blocks is logically allocated on entry to the block, but compilers commonly merge the allocation into the function entry code. Even so, objects in parallel blocks can certainly share memory:

              int main(void) {
                  {
                      int a;
                  }
                  {
                      int b; // might share memory with a
                  }
              }
          
          Logically, memory for `a` is allocated on entry to the first inner block, and memory for `b` on entry to the second inner block. Compilers will typically allocate all the memory on entry to `main`, but can use the same address for `a` and `b`.
        • mananaysiempre 3 days ago ago

          As written, without introducing VLAs or additional blocks, no. C23 §6.2.4(5–6):

          > An object whose identifier is declared with no linkage and without the storage-class specifier `static` has automatic storage duration [...].

          > For such an object that does not have a variable length array type, its lifetime extends from entry into the block with which it is associated until execution of that block ends in any way. (Entering an enclosed block or calling a function suspends, but does not end, execution of the current block.) If the block is entered recursively, a new instance of the object is created each time. The initial representation of the object is indeterminate. If an initialization is specified for the object and it is not specified with `constexpr`, it is performed each time the declaration or compound literal is reached in the execution of the block [...]; otherwise, the representation of the object becomes indeterminate each time the declaration is reached.

          That is, a local variable is live from the moment the block that contains its declaration is entered (however and wherever that happens) until it is left (ditto), but is initialized or, for lack of a better word, uninitialized each time execution passes that declaration (however many times that happens, including none). This is despite the fact that at compile time the variable’s name is not in scope until the = introducing its initializer (or the place where such a = would go if there isn’t one). Modulo its smaller feature set, C89 §6.1.2.4(3) stipulates the same.

          In addition to GGP’s deliberately confusing example, this permits the much more reasonable and C89-compatible

            switch (x) {
                int i, j;
            
            case 1:
                /* use i and j */
                break;
            
            case 2:
                /* use i and j */
                break;
            }
          
          The only exception is locals of variably modified type (e.g. variable-length arrays), whose declarations you can’t jump over on pain of undefined behaviour.

          No wonder basically every C compiler allocates a single stack frame at function entry.

    • sweeter 4 days ago ago

      Is it 2? I'm not exactly sure though. I'm interested in hearing the logic