I see a lot about source code being leaked, and I’m wondering why you can’t make something like an exact replica of Super Mario Bros without the source code. Why can’t you take the finished product and run it back through the compilation software?

  • @[email protected]
    143 · 1 year ago

    4+4 is 8. But so is 6+2. And 7+1.

    You can’t guess which two numbers I started with knowing just the answer.

    Code is the same, just with much bigger numbers and more of them.

    • @[email protected]
      8 · edited · 1 year ago

      I would say that it’s more like 4+4=8 but the original could have been (1+1+1+1)+(3+1) or (2+2)+(1+2+1) etc.

      Basically it’s the same thing but if you really want to understand the code and modify it in any meaningful way you have to know how it was intended and not just the results.

      My point being that decompiling does give you something similar to the original. It’s not just a guess that gives you random code with the correct result, but it could be very different from the source code.

      The reason is that the compiler does a lot of things to make the code more efficient. But while 1+1+1+1 can be efficiently written as 4, there can still be a good logical reason for writing 1+1+1+1. For example, if you’re counting something, it would make sense to say 1+1+1+1. But if you’re looking at a specific value, maybe it makes more sense to just say 4.
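To make the arithmetic analogy concrete, here is a minimal Python sketch of constant folding, one of the simplifications an optimizing compiler performs. It is a toy, not how any real compiler is implemented: the `fold` helper and the list of sample expressions are made up for illustration, and it only handles integer `+`, `-` and `*`.

```python
import ast
import operator

# Map AST operator nodes to the arithmetic they perform.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def fold(expr: str) -> int:
    """Collapse a constant integer expression into a single value."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("unsupported expression")

    return walk(ast.parse(expr, mode="eval").body)


# Five different "sources" (hypothetical examples taken from the comments above).
sources = ["4+4", "6+2", "7+1", "(1+1+1+1)+(3+1)", "(2+2)+(1+2+1)"]
folded = {src: fold(src) for src in sources}

# Every source collapses to the same constant, so the constant alone
# cannot tell you which expression the author originally wrote.
print(folded)
```

Going from any of the five strings to the folded value is mechanical. Going back from the folded value to "the" original string is not possible, because all five are equally valid answers.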

  • Trigg
    81 · 1 year ago

    It can be

    What it produces will typically not contain the original names for variables and functions, and will not retain comments. It takes a lot more effort to understand what the intention behind the code was.

    There are also legal issues.

  • Dark Arc
    English
    58 · 1 year ago

    I actually work on a C++ compiler… I think I should weigh in. The general consensus here that things are lossy is correct but perhaps non-obvious if you’re not familiar with the domain.

    When you compile a program you’re taking the source, turning it into a graph that represents every aspect of the program, and then generating some kind of IR (intermediate representation) that then gets turned into machine code.

    You lose things like code comments because the machine doesn’t care about the comments right off the bat.

    Then you lose local variable and function parameter names because the machine doesn’t care about those things.

    Then you lose your class structure … because the machine really just cares about the total size of the thing it’s passing around. You can recover some of this information by looking at the functions, but it’s not always going to be straightforward, because not every constructor initializes everything and things like unions add further complexity … and not every memory allocation uses a constructor. You won’t get any names of any data members/fields though because … again the machine doesn’t care.

    So what you’re left with is basically the mangled names of functions and what you can derive from how instructions access memory.

    The mangled names normally tell you a lot: the namespace, the class (if any), and the argument count and types. Of course that’s not guaranteed either; it’s just how we come up with unique, stable names for the various things in your program. It could function with a bunch of UUIDs if you set up a table on the compiler’s side to associate everything.
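As a rough illustration of how much a mangled name carries, here is a small Python sketch that decodes a tiny subset of the Itanium C++ ABI mangling scheme (the one used by GCC and Clang on most non-Windows platforms). The comment above does not say which scheme is meant, so that choice is an assumption. The `demangle` helper and the sample symbols are invented for this example, and only plain and nested names with a few builtin parameter types are handled.

```python
# Builtin type codes from the Itanium C++ ABI (small subset).
_TYPES = {"v": "void", "b": "bool", "c": "char", "i": "int",
          "j": "unsigned int", "l": "long", "f": "float", "d": "double"}


def demangle(symbol: str) -> str:
    """Decode _Z<len><name><types> and _ZN<len><name>...E<types> only."""
    if not symbol.startswith("_Z"):
        raise ValueError("not an Itanium-style mangled name")
    pos = 2
    nested = symbol.startswith("N", pos)  # N...E wraps namespace/class scopes
    if nested:
        pos += 1
    parts = []
    while pos < len(symbol) and symbol[pos].isdigit():
        end = pos
        while end < len(symbol) and symbol[end].isdigit():
            end += 1
        length = int(symbol[pos:end])           # identifiers are length-prefixed
        parts.append(symbol[end:end + length])
        pos = end + length
        if not nested:
            break
    if nested:
        if not symbol.startswith("E", pos):
            raise ValueError("unterminated nested name")
        pos += 1
    # Unsupported type codes raise KeyError in this sketch.
    params = [_TYPES[code] for code in symbol[pos:]]
    if params == ["void"]:                       # a lone 'v' means "no parameters"
        params = []
    return "::".join(parts) + "(" + ", ".join(params) + ")"


print(demangle("_ZN3foo3Bar3bazEid"))  # foo::Bar::baz(int, double)
print(demangle("_Z3addii"))            # add(int, int)
```

Note what is still missing even in the decoded form: the parameter names, and for ordinary functions like these the return type, are not part of the mangled name at all.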

    But wait! There’s more! The optimizer can do some really wild things in the name of speed… Including combining functions. Those constructors? Gone, now they’re just some more operations in the function bodies. That function you wrote to help improve readability of your code? Gone. That function you wrote to deduplicate code? Gone. That eloquent recursive logic you wrote? Gone, now it’s the moral equivalent of a giant mess of goto statements. That template code that makes use of dozens of instantiated functions? Those functions are gone now too; instead it’s all the instantiated logic puked out into one giant function. That piece of logic computing a value? Well the compiler figured out it’s always 27, so the logic to compute it? Gone.

    Now, all of that doesn’t happen every time; not all of those optimizations are always possible, or worthwhile … But you can see how incredibly difficult it is to reconstruct a program once it’s been compiled and gone through optimization. If you do reconstruct it, there’s a very low chance that it will look anything like what you started with.
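The effect of those optimizations can be sketched by hand. The Python below is not the output of any real optimizer or decompiler: both halves were written manually, and every name in it (`tile_count`, `sum_to`, `score`, `f0`) is hypothetical. It only shows the kind of gap between what an author writes and what might plausibly come back after inlining, constant folding, and turning recursion into a loop.

```python
# --- What the author wrote: named helpers and a recursive function ---
TILE_COLUMNS = 9
TILE_ROWS = 3


def tile_count() -> int:
    # Helper that exists purely for readability.
    return TILE_COLUMNS * TILE_ROWS


def sum_to(n: int) -> int:
    # Recursive definition: n + (n-1) + ... + 1.
    return 0 if n == 0 else n + sum_to(n - 1)


def score(n: int) -> int:
    return tile_count() + sum_to(n)


# --- Roughly what might come back after optimization ---
def f0(a0: int) -> int:
    v0 = 27            # tile_count() was inlined and folded to a constant
    while a0 != 0:     # the recursion became a plain loop
        v0 += a0
        a0 -= 1
    return v0


# Both versions compute the same values for these inputs.
print(all(score(n) == f0(n) for n in range(50)))
```

The second version behaves the same, but the helper functions, the meaning of 27, and the recursive structure are gone, which is the point the comment above is making.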

    • @[email protected]
      13 · 1 year ago

      Just wait until you see the crazy optimizers for embedded systems. They take the complete code of a system into consideration and, over a number of compile passes, reuse code snippets from the app, libraries, and OS layer to create one big tangled mess that is hard to follow even if you have the source code…

      • @[email protected]
        4 · 1 year ago

        Isn’t that still the same exact process as a normal compiler except in the case of embedded systems your OS is like a couple kilobytes large and just compiled along with the rest of your code?

        As in, are those “crazy optimizations” not just standard compiler techniques, except applied to the entire OS+applications?

        • @[email protected]
          English
          4 · 1 year ago

          The main difference is that when you compile a program for Windows, Linux etc., you have an operating system and kernel with their exposed functions/interfaces so even in a compiled program it’s pretty easy to find the function calls for opening a file, moving a window, etc. (as long as the developer doesn’t add specific steps hiding these calls). But in an embedded system, it’s one large mess without any interfaces apart from those directly on the hardware level.

        • @[email protected]
          4 · 1 year ago

          In a way, yes. But it really creates a mess when the linker starts sharing code between your own code, for which you have the source, and system code, for which you don’t, and then jumps into the middle of that system code. And it’s a pain in the whatever to debug.

              • @[email protected]
                1 · 1 year ago

                Does commercial mean closed source in this context though? It seems like a waste of resources not to provide the source code for an RTOS.

                Considering how small in size they tend to be + with their power/computational constraints I can’t imagine they have very effective DRM in place so it shouldn’t take that much to reverse engineer.

                May as well just provide the source under some very restrictive license.

  • @[email protected]
    56 · edited · 1 year ago

    The long answer involves a lot of technical jargon, but the short answer is that the compilation process turns high level source code into something that the machine can read, and that process usually drops a lot of unneeded data and does some low-level optimization to make things more efficient during actual processing.

    One can use a decompiler to take that machine code and attempt to turn it back into something human readable, but the result will usually be missing data on variable names, function calls, comments, etc. and will include compiler-added optimizations, which makes it nearly impossible to reconstruct the original code.

    It’s sort of the code equivalent of putting a sentence into Google translate and then immediately translating it back to the original. You often end up with differences in word choice that give you a good general idea of intent, but it’s impossible to know exactly which words were in the original sentence.

  • @[email protected]
    52 · 1 year ago

    You can. It’s called decompiling. The problem is you lose all the human-friendly metadata that was in the original source code: comments, variable names, and certain code structures are lost forever because they were deleted in the compilation process. There are tools to help you reintroduce that stuff by going through the variables and trying to make sense of what they were for, but it’s super tedious. New AI tech can certainly improve on that by guessing what they were for, but you’ll never get the original metadata back.

    • Hjalmar
      27 · 1 year ago

      Also, if the code was run through an optimizer (which all modern games should be), it is even harder to make sense of, as it doesn’t necessarily have the same structure and the same variables as the original code.

      • @[email protected]
        11 · 1 year ago

        This is also very similar to what happens if they run the source code through an obfuscation tool. Some people do this with Chrome extensions. Since they need to give you the source code for it to work on your machine, they just change the variables to a, b, c, d and route things through unneeded functions so you don’t know why anything is happening.

  • @[email protected]
    16 · 1 year ago

    The best and simplest explanation I’ve seen: The machine code tells the computer what to do while the source code tells the human why it’s doing it.

    Your computer doesn’t need all the “why” information to run the game, so the compilation process gets rid of it. What you’re left with are instructions on exactly what computations to do, and that’s all the computer needs.

    For example, you can see in the machine code that two numbers are being added together. What do those numbers mean and why are we adding them? The source code can tell you that this is code that controls movement, one of the numbers is a velocity, the other is the player’s current position.
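Here is a small Python sketch of that "what" versus "why" split. It uses Python's standard `ast` module as a stand-in for a compiler front end, which is an assumption made for illustration (real machine code is much lower level). The `strip_why` helper and both sample functions are hypothetical. Parsing discards comments, and the transformer replaces every name with a placeholder.

```python
import ast


class _Anonymize(ast.NodeTransformer):
    """Rename the function, its parameters, and its variables to placeholders."""

    def __init__(self):
        self.names = {}

    def _alias(self, name: str) -> str:
        return self.names.setdefault(name, f"v{len(self.names)}")

    def visit_FunctionDef(self, node):
        node.name = "f"
        self.generic_visit(node)
        return node

    def visit_arg(self, node):
        node.arg = self._alias(node.arg)
        return node

    def visit_Name(self, node):
        # Note: this toy would also rename globals and builtins.
        node.id = self._alias(node.id)
        return node


def strip_why(source: str) -> str:
    """Keep what the code does; drop comments and meaningful names."""
    tree = _Anonymize().visit(ast.parse(source))  # ast.parse ignores comments
    return ast.unparse(tree)                       # needs Python 3.9+


movement = '''
def update_position(position, velocity):
    # Move the player by one frame's worth of velocity.
    new_position = position + velocity
    return new_position
'''

pricing = '''
def add_tax(price, tax):
    # Final amount the customer pays.
    total = price + tax
    return total
'''

print(strip_why(movement))
print(strip_why(movement) == strip_why(pricing))
```

Once the "why" is stripped, the movement code and the pricing code are indistinguishable: both are just "add two numbers and return the result".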

    • @[email protected] OP
      4 · 1 year ago

      Okay, I think that is sinking in.

      I was under the impression it wasn’t possible or just complete gibberish but it being just the results or instructions is helpful.

      • Ook the Librarian
        4 · edited · 1 year ago

        Also, you only decompile to the level of basic instructions that the processor understands. When you compile code to add two numbers, well, the processor only adds bytes. There are quite a few steps that the compiler has to fill in.

        Ok, all that is not a big deal. But then you deal with compiler optimization. Optimizing basically tells the compiler to take its time and find some clever ways to save machine steps. So now the “standard way” for a compiler to implement adding numbers may have other stuff rolled into it, because the compiler may see an opportunity to save steps in a seemingly unrelated calculation by inserting steps into the addition it is implementing. Now it’s basically unrecognizable. A human didn’t write, and wouldn’t have written, that mess that the decompiler gives.

        Edit: I would also like to add that when you compile with the debug flag, you are telling the compiler to produce decompilable code: don’t change any steps, and store variable names as written.

  • Rikudou_Sage
    14 · 1 year ago

    As I’ve read somewhere once: it’s easy to make a burger out of a cow. Making a cow out of a burger is slightly harder.

    That means that compiling code is a lossy process - the original code is lost in the process and can never be recovered because it doesn’t exist anywhere anymore.

  • @[email protected]
    11 · 1 year ago

    The compilation process discards information, which makes it many-to-one: many different source programs can compile to the same output. A good decompiler lets you retrieve a program that is functionally equivalent to the source code, but not the exact source code.

  • Kirk I. M.
    10 · edited · 1 year ago

    @Squizzy
    Lots of other people have addressed this, so I won’t repeat the whole thing. You can absolutely do disassembly work, it’s just a pain in the rear.
    But it’s actually been done for Mario, since you brought it up:
    https://github.com/IsoFrieze/SMWDisX
    And also Pokemon.

  • @[email protected]
    English
    8 · 1 year ago

    Well, actually it can be. It just takes a lot more effort to decompile code than to compile it, depending on how accurate the result needs to be.

    Example: the Super Mario 64 Decompilation project. This project used various debug data that was left in the ROM to decompile the game back to source code that compiles to a byte-accurate copy of the ROM. This took about 3 years and a lot of skilled developers to accomplish.

    Side note: Super Mario Bros wasn’t built using a compiled language, but rather Assembly. So technically that would be a Disassembly not a Decompilation.
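To show why disassembly is a more direct mapping than decompilation, here is a toy Python disassembler for a handful of 6502 instructions. The 6502 family is the CPU type used in the NES, which is background added here and not stated in the thread. The opcode table is a tiny subset, and the `disassemble` helper and the sample byte sequence are made up for illustration.

```python
# opcode -> (mnemonic, operand format, total instruction length in bytes)
OPCODES = {
    0xA9: ("LDA", "#${:02X}", 2),  # load accumulator, immediate
    0x69: ("ADC", "#${:02X}", 2),  # add with carry, immediate
    0x8D: ("STA", "${:04X}", 3),   # store accumulator, absolute address
    0x4C: ("JMP", "${:04X}", 3),   # jump, absolute address
    0x18: ("CLC", "", 1),          # clear carry flag
    0x60: ("RTS", "", 1),          # return from subroutine
}


def disassemble(code: bytes) -> list[str]:
    """Turn raw bytes back into assembly text, one instruction per line."""
    lines, pc = [], 0
    while pc < len(code):
        mnemonic, fmt, size = OPCODES[code[pc]]
        # Operands follow the opcode; 16-bit addresses are little-endian.
        operand = int.from_bytes(code[pc + 1:pc + size], "little")
        lines.append(f"{mnemonic} {fmt.format(operand)}".strip())
        pc += size
    return lines


# Hypothetical snippet: compute 4 + 4 and store the result at address $0200.
program = bytes([0x18, 0xA9, 0x04, 0x69, 0x04, 0x8D, 0x00, 0x02, 0x60])
for line in disassemble(program):
    print(line)
# CLC
# LDA #$04
# ADC #$04
# STA $0200
# RTS
```

Each byte pattern maps back to exactly one instruction, so the instructions themselves are recovered faithfully. What does not come back are the labels, symbolic names, and comments the original author used, for example a name for address `$0200`, which is why disassembly projects still involve a lot of manual work.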

  • @[email protected]
    English
    3 · 1 year ago

    You can get close depending on the language by using decompilers. Usually though, they’re rough translations of what the decompiler thinks that the (compiled) machine code does. It’s not a 1:1 deal.

    Basically, a compiler translates the human-readable code to machine code that can actually be recognized and executed by your computer. A decompiler attempts to do the opposite: it translates the machine code back into the original language. But like some “translators”, it’s not always correct. That’s the hard part: once decompiled, you will likely have a lot of blanks to fill in and bugs to fix before anything will be compilable again. You’ll likely never be able to get an exact copy of the original source code via a decompiler.

  • @[email protected]
    3 · 1 year ago

    Code can be decompiled, but generally the end result isn’t human readable. Just having the decompiled version isn’t that valuable. Having the source code as written is more helpful because you get the context of what things were named and how it was organized.

    Decompiled code is a bit like reading a book with all the nouns being random letters and verbs being random numbers.

    • TheVillageGuy
      1 · 1 year ago

      Not completely random, every noun/verb would be translatable to a specific word/name. But also characters, there’d be many characters whose names, intentions and goals, relationships/links would also be in the same unreadable state. The storyline would likely not be chronological, but several actions and decisions by all kinds of actors would intertwine. It would be very hard to translate into a readable story, let alone so that it makes sense

  • @[email protected]
    English
    2 · edited · 1 year ago

    Code can be decompiled into code that can be recompiled, but compilers translate the human readable code into code that is easier to understand for machines. So decompiled code often ends up being nearly undecipherable for humans, and can take a long time to try and decipher.