Advice for New Programmers: Choose Your First Language Wisely
I learned to program when I was 7, on a BBC Model B with a dialect of BASIC and two dialects of Logo installed. The most important thing for me when I was starting was to find something interesting to work on. It's hard to be motivated to learn to do anything without some reason. I was taught to write some simple games (guess the number) and then learned about simple drawing and wrote small graphical games. Back then, the difference in quality between what I could create and commercial games wasn't too huge: most commercial games were written by one or two people and the limitations of the machine were such that you didn't get significantly more complexity by adding more people.
There are a few potential pitfalls when you're learning to program on your own. In particular, it's very easy to pick up bad habits, and the more that you practice them the worse they get. This is most apparent if you try learning a language that is designed for industrial use first, rather than one designed for teaching. This is part of the reason why I advocate trying lots of different programming languages. Languages like Python make terrible first languages. Anything that describes itself as "multi-paradigm" generally means that it badly implements a lot of possible programming models.
If you want to learn object-oriented programming (and you almost certainly do), then the best place to start is Smalltalk. This, in a system like Squeak or Pharo, gives you an environment where you can inspect everything and where everything is an object. The language is simple enough that it takes half an hour or so to learn, but the time you spend trying it will teach you to write good object-oriented code in any language, even if you never touch Smalltalk again. In contrast, learning a language like Java or C++ first will give you a myriad of bad habits that are very hard to break.
I'd also strongly recommend learning an assembly language, or at the very least a low-level language like C. Again, you don't have to ever use them later, but understanding how high-level constructs map to something that the computer can execute is incredibly valuable.
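You don't need to write much assembly to get that benefit; even sketching the translation of a small function by hand is illuminating. As a rough illustration (the commented "assembly" below is invented pseudo-assembly, not any real instruction set), a simple C loop comes down to loads, adds, compares, and branches:

/* Sum an array of integers: the kind of loop every early program contains. */
int sum(const int *values, int count)
{
    int total = 0;
    for (int i = 0; i < count; i++)
        total += values[i];
    return total;
}

/* A compiler turns the loop into something roughly like:
 *
 *       total <- 0
 *       i     <- 0
 * loop: compare i with count; if i >= count, branch to done
 *       load r from memory at address (values + i * sizeof(int))
 *       total <- total + r
 *       i     <- i + 1
 *       branch to loop
 * done: return total
 */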
Understanding the Underlying Theory of Languages
BBC BASIC was not a bad first language, in spite of the overwhelming prejudice against BASIC. It included an inline assembler, so you could get a feel for exactly how things executed on the 6502 processor in the machine, and it let you directly manipulate memory via PEEK and POKE, typically for interacting with memory-mapped I/O. Unlike many contemporary dialects of BASIC, it supported structured programming, with named procedures and functions rather than just line-numbered subroutines.
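In C terms, PEEK and POKE correspond roughly to reading and writing a byte through a pointer to a fixed address. Here's a minimal sketch; the register name and address are invented purely for illustration:

#include <stdint.h>

/* Hypothetical memory-mapped device register.  The address is made up for this
 * example; it isn't a real address on the BBC Micro or any other machine.  The
 * volatile qualifier stops the compiler from optimising the accesses away. */
#define DEVICE_REG ((volatile uint8_t *)0x4000)

uint8_t peek_device(void)
{
    return *DEVICE_REG;     /* PEEK: read the byte at a fixed address */
}

void poke_device(uint8_t value)
{
    *DEVICE_REG = value;    /* POKE: write a byte to a fixed address */
}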
I probably wouldn't recommend the BBC micro to new programmers today, because the sparsity of the development environment would be intimidating, but it's worth remembering that the first language that you learn almost certainly won't be the one that you use for real projects later. If it is, you're probably doing something badly wrong, because you won't ever fully understand the limitations of one language until you've learned a few more. There's a small chance that after learning a dozen languages you find the first one you learned was the best for everything that you need to do, but it's quite unlikely.
One trap that self-taught programmers often fall into is neglecting the importance of the underlying theory. I began learning to program when I was 7, and when I arrived at university I was pretty sure that I already knew all of the programming-related parts of the course. It turned out that my lack of knowledge of graph theory and complexity theory was a serious limitation.
In general, the first year of a computer science program tries to teach you two fundamental concepts. If you understand these two ideas well, then you should have no problem with the rest of the course. If you don't, then you'll struggle. These concepts are induction (recursion) and indirection (pointers).
Induction
Induction is the idea that you can define an infinite series by defining a few simple cases and defining a rule that allows you to reduce a complex sequence to a simpler one. For example, you can define multiplication in terms of addition as:
1) n × 0 = 0
2) n × 1 = n
3) n × m = n × (m - 1) + n
If you then want to evaluate 5 × 3, then you look for the first matching rule. It's the third one, which gives: 5 × (3-1) + 5, or 5 × 2 + 5. Apply it again to that and you get 5 × (2-1) + 5 + 5, or 5 × 1 + 10. Now the second rule applies, so the 5 × 1 part becomes 5, and the result is 15.
This is very powerful, because it's simple pattern matching and then applying rules: something that the computer is very good at. You could implement this in a programming language as something like:
int mult(int x, int y)
{
    if (y == 0) return 0;          /* rule 1: n * 0 = 0 */
    if (y == 1) return x;          /* rule 2: n * 1 = n */
    return mult(x, y - 1) + x;     /* rule 3: n * m = n * (m - 1) + n */
}
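Tracing a call shows the rules being applied one at a time. Here's the 5 × 3 example again, wrapped in a small test harness (hypothetical, just for illustration) around the mult function above:

#include <stdio.h>

int main(void)
{
    /* mult(5, 3) = mult(5, 2) + 5         rule 3
     *            = (mult(5, 1) + 5) + 5   rule 3 again
     *            = (5 + 5) + 5            rule 2: 5 * 1 is just 5
     *            = 15                                           */
    printf("%d\n", mult(5, 3));    /* prints 15 */
    return 0;
}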
Multiplication is a pretty trivial example: most computers have hardware for doing multiplication, and it's a lot faster than this recursive approach. But for more complex things, if you can understand the problem in terms of induction then you can implement it in terms of recursion. This is one of the reasons why most universities teach Prolog to undergraduates: you can't write even simple programs in Prolog unless you understand induction, and once you understand induction you've got a very powerful tool to reason with.
Indirection
The other core concept, indirection, is fundamental to building complex data structures. It's simply the idea that a variable, rather than containing a value, can tell you where to look for a value. In a computer, your data is stored in memory and so a variable name is just a human representation of the address of something in the computer's memory. A pointer is just the address of another bit of memory, stored in memory.
Languages like C allow arbitrary levels of indirection, so you can have pointers to pointers to pointers and so on. To the computer, it's nothing special. Most computers don't distinguish between data and addresses (which has been the source of myriad security vulnerabilities over the years, but that's another story), so you can store an address anywhere you can store data. The computer just loads the value and, if you treat that value as an address rather than data, loads the value from that address.
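Here's a minimal C sketch of that idea: each extra level of indirection is just another address stored in memory, and following it is just another load.

#include <stdio.h>

int main(void)
{
    int value = 42;
    int *ptr = &value;        /* ptr holds the address of value */
    int **pptr = &ptr;        /* pptr holds the address of ptr: a pointer to a pointer */

    printf("%d\n", value);    /* 42 */
    printf("%d\n", *ptr);     /* 42: follow one level of indirection */
    printf("%d\n", **pptr);   /* 42: follow two levels of indirection */

    **pptr = 7;               /* writing through two levels changes the original variable */
    printf("%d\n", value);    /* 7 */
    return 0;
}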
Some IBM mainframes tried to optimise this kind of pointer chasing by having special values that, when loaded, the hardware would automatically follow to fetch the value they referred to. This caused some problems when people created circular chains of these values, sending the computer into an endless loop as it tried to find a result that wasn't just another pointer.
Ignore Theory at Your Own Risk
Pointers are one of those things that are hard to explain, because once you understand them you wonder how you (or anyone else) ever found them complicated, but they're one of the core building blocks of programs. One of the problems with using languages like Java or Python for teaching is that students only learn about references, which are a somewhat reduced version of a pointer, trading generality for ease of use. While Java references are easier to use correctly than C pointers, they're slightly harder to understand, because now you have some things that can have references to them (objects) and lots of other things that can't (references, integers, floating point values), and no obvious reason why not. In contrast, C lets you use the full expressiveness of the underlying machine, even if you then use that to shoot your own feet off.
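To see why that generality matters for building data structures, here's a minimal sketch (not taken from any particular library) of a singly linked list in C. The insertion function takes a pointer to a pointer so that it can update the caller's head pointer in place, which is exactly the kind of thing a Java reference can't express:

#include <stdlib.h>

/* Each node stores the address of the next one: indirection is what lets a
 * fixed-size struct describe an arbitrarily long list. */
struct node {
    int value;
    struct node *next;
};

/* Insert a new value at the head of the list.  The caller passes the address of
 * its own head pointer, so the function can modify it: a pointer to a pointer. */
void push(struct node **head, int value)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return;               /* allocation failed; leave the list unchanged */
    n->value = value;
    n->next = *head;
    *head = n;
}

In Java you'd have to wrap the head in another object, or have the method return the new head, because you can't take a reference to the reference itself.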