Perl - How to: the basics

From Linux - Help

PAGE WORK IN PROGRESS

My First program with Perl

Hello World!

Our first little example program is the traditional incantation to the programming gods.

 #!/usr/bin/perl -w
 
 print "Hello, world.\n";

Let's look at the first line: #!/usr/bin/perl -w. Normally, Perl treats a line starting with # as a comment and ignores it. However, the # and ! characters together at the start of the first line tell Unix how the file should be run. In this case the file should be passed to the perl interpreter, which lives in /usr/bin/perl. The -w switch asks perl to turn on warnings.

Now let's have a look at the second line of our program: print "Hello, world.\n";. The print function tells perl to display the given text without the quotation marks. The text inside the quotes is not interpreted as code (except for some 'special cases') and is called a string. As we'll see later, strings start and end with some sort of quotation mark. The \n at the end of the string is one of these 'special cases': it's a type of escape sequence, which stands for 'new line'. This instructs perl to finish the current line and move the cursor to the start of a new one.

Program Structure

As we saw earlier, a line starting with a hash sign (#) is treated as a comment and ignored. This allows you to add comments explaining what your program is doing, something that becomes extremely useful when you work on long programs or when someone else is looking over your code. For instance, you could make it quite clear what the program above is doing by writing something like this:

 #!/usr/bin/perl
 use warnings;
 # Print a short message
 print "Hello, world.\n";

Statements and Statement Blocks

If functions are the verbs of Perl, then statements are the sentences. Instead of a full stop, a statement in Perl usually ends with a semicolon, as we saw above:

 print "Hello, world.\n";

To print something again, we can add another statement:

 print "Hello, world.\n";
 print "Goodbye, world.\n";

We can also group together a bunch of statements into a block – which is a bit like a paragraph – by surrounding them with braces: {...}. We'll see later how blocks are used to specify a set of statements that must happen at a given time and also how they are used to limit the effects of a statement. Here's an example of a block:

 {
     print "This is ";
     print "a block ";
     print "of statements.\n";
 }

Notice how I've used indentation to separate the block from its surroundings. Unlike paragraphs, blocks can be nested inside other blocks, and indentation makes it easier to see at what level things are happening. This:

 print "Top level\n";
 {
      print "Second level\n"; 
      {
           print "Third level\n";
      }
      print "Where are we?";
 }

is easier to follow than this:

 print "Top level\n";
 {
 print "Second level\n";
 {
 print "Third level\n";  
 }
 print "Where are we?";
 }
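Blocks also limit the effects of statements, as promised above. Here is a small preview, assuming a variable declared with my (variables are only introduced properly later, and the names $message and @seen are made up for this sketch). A variable declared inside a block exists only until the closing brace:

```perl
use strict;
use warnings;

my @seen;                      # records which value was visible where
my $message = "outer";

{
    # This declaration is a new variable that lives only inside the block.
    my $message = "inner";
    push @seen, $message;
    print "Inside the block: $message\n";
}

# The block has ended, so the outer variable is visible again, unchanged.
push @seen, $message;
print "After the block: $message\n";
```

The inner declaration hides the outer one only between the braces. Once the block ends, the outer value is back.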

As well as braces to mark out the territory of a block of statements, you can use parentheses to mark out what you're giving a function. We call the set of things you give to a function the arguments, and we say that we pass the arguments to the function. For instance, you can pass a number of arguments to the print function by separating them with commas:

 print ("here ", "we ", "print ", "several ", "strings.\n");

We can also limit the number of arguments we pass by moving the closing parenthesis:

 print ("here ", "we ", "print "), "several ", "strings.\n";

We only pass three arguments, so they're the ones that get printed:

 here we print

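What happens to the others? Well, we didn't tell perl to do anything with them, so they are simply thrown away. With warnings turned on, perl will grumble that those leftover strings serve no purpose.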
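To check this for ourselves, here is a small sketch that captures what print sends out. It assumes an in-memory filehandle and the built-in select function to redirect print temporarily. The names $captured, $memory and $previous are made up for this example:

```perl
use strict;
use warnings;

# Capture printed text in an in-memory filehandle so we can inspect it.
my $captured = '';
open(my $memory, '>', \$captured) or die "Cannot open in-memory handle: $!";
my $previous = select($memory);   # make $memory the default handle for print

# The parentheses right after print enclose ALL of its arguments.
# With warnings on, perl is expected to complain on STDERR about the
# two leftover strings.
print ("here ", "we ", "print "), "several ", "strings.\n";

select($previous);                # restore the original default handle
close($memory);

print "Captured: [$captured]\n";
```

Only the three strings inside the parentheses end up in $captured. To print all five, keep every argument inside the parentheses, or leave the parentheses out altogether.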


Escape Sequences

 Escape Sequence   Meaning
 \t                Tab
 \n                Start a new line (usually called 'newline')
 \b                Back up one character ('backspace')
 \a                Alarm (rings the system bell)
 \x{1F18}          Unicode character
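A quick way to see what these escape sequences really are is to ask perl for their character codes with the built-in ord function. This is only a sketch: the variable names are invented for the example, the codes shown assume an ASCII-based system, and the binmode line assumes your terminal understands UTF-8:

```perl
use strict;
use warnings;

# Tell perl to encode output as UTF-8, in case we print wide characters.
binmode(STDOUT, ':encoding(UTF-8)');

my $line = "name\tvalue\n";         # \t and \n each count as ONE character
print $line;

my $tab_code     = ord("\t");       # character code of a tab
my $newline_code = ord("\n");       # character code of a newline
my $greek_code   = ord("\x{1F18}"); # decimal value of hex 1F18

print "The string is ", length($line), " characters long\n";
print "tab=$tab_code newline=$newline_code unicode=$greek_code\n";
```

Each escape sequence takes up two or more characters in the source code, but stands for a single character in the string.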

White Space

White space is the name we give to tabs, spaces, and new lines. Perl is very flexible about where you put white space in your program. We have already seen how we're free to use indentation to help show the structure of blocks. You don't need to use any white space at all, if you don't want to. If you prefer, your programs can all look like this:

 print"Top level\n";{print"Second level\n";{print"Third level\n";}print"Where are we?";}

Personally, though, I'd call that a bad idea. White space is another tool we have to make our programs more understandable. Let's use it as such.

Number Systems

Computers count in base 2, the binary system. In the binary system, one digit represents one unit of information: one binary digit, or bit.

As well as binary, there are two more important number systems we need to know about when talking to computers. We don't often get to deal with binary directly, but the following two systems have a logical relationship to base 2 counting. The first is octal, base 8.
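Although we rarely need it, perl can read and write binary numbers too. The sketch below assumes two things not covered in the text above: the 0b prefix for binary literals and the %b format of the built-in sprintf function. The variable names are made up for the example:

```perl
use strict;
use warnings;

my $five = 0b101;                 # binary literal: 4 + 0 + 1
my $bits = sprintf("%b", 10);     # decimal 10 written in binary
my $byte = sprintf("%08b", 255);  # a full byte, padded to eight bits

print "0b101 is $five\n";
print "10 in binary is $bits\n";
print "255 in binary is $byte\n";
```

Eight ones in a row is the largest value a single byte can hold.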

Eight is an important number in computing. Bits are organized in groups of eight to form bytes, giving you the range of 0 to 255 that we saw earlier with ASCII. Each ASCII character can be represented by one byte. As we said in the paragraph before, octal is one way of counting bits, although it has fallen out of fashion these days. Octal numbers all start with 0 (that's a zero, not an oh) so we know they're octal, and proceed as you'd expect: 00, 01, 02, 03, 04, 05, 06, 07, carry one, 010, 011, 012...017, carry one, 020 and so on. Perl recognizes octal numbers as long as you put that zero in front, like this:

 print 01101;

prints out the decimal number:

 577
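We can double-check that answer by hand: 01101 is 1×512 + 1×64 + 0×8 + 1 = 577. The sketch below does the same check in perl, using the built-in oct function to read an octal string and sprintf with %o to go back the other way. The variable names are invented for the example:

```perl
use strict;
use warnings;

my $from_literal = 01101;              # leading zero makes this octal
my $from_string  = oct("1101");        # read a string of octal digits
my $back_again   = sprintf("%o", 577); # decimal 577 written in octal
my $last_before_carry = 017;           # 1*8 + 7

print "01101 is $from_literal\n";
print "oct(\"1101\") is $from_string\n";
print "577 in octal is $back_again\n";
print "017 is $last_before_carry\n";
```

Watch out for the leading zero: writing 010 when you mean ten gives you eight.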

The second is called the hexadecimal system, as mentioned above. Of course, programmers are lazy, so they just call it hex. (They like the wizard image.)

Decimal is base 10, and hexagons have six sides, so this system is base 16. As you might have guessed from the number 1F18 above, digits above 9 are represented by letters, so A is 10, B is 11, and so on, all the way through to F which is 15. We then carry one and start with 10 (which, in decimal, is 16) all the way up through 19, 1A, 1B, 1C, 1D, 1E, 1F, and carry one again to get 20 (which in decimal is 32). The magic number 255, the maximum number we can store in one byte, is FF. Two bytes next to each other can get you up to FFFF, better known as 65535. That is the highest code point in Unicode's Basic Multilingual Plane, so any character in that range can be stored as a pair of bytes. (Unicode as a whole goes further, up to 10FFFF.)

To get perl to recognize hex, place 0x in front of the digits so that:

 print 0xBEEF;

gives the answer:

 48879
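Again, we can verify this by hand: BEEF is 11×4096 + 14×256 + 14×16 + 15 = 48879. As a sketch, the built-in hex function reads a string of hex digits, and sprintf with %X writes a number out in hex. The variable names are made up for the example:

```perl
use strict;
use warnings;

my $beef      = 0xBEEF;              # hex literal
my $parsed    = hex("BEEF");         # read a string of hex digits
my $one_byte  = sprintf("%X", 255);  # largest value in one byte, in hex
my $two_bytes = 0xFFFF;              # largest value in two bytes
my $twenty    = 0x20;                # "carry one again" gives 2*16

print "0xBEEF is $beef\n";
print "hex(\"BEEF\") is $parsed\n";
print "255 in hex is $one_byte\n";
print "0xFFFF is $two_bytes\n";
print "0x20 is $twenty\n";
```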

Working with Simple Values

Types of Data

A lot of programming jargon is about familiar words in an unfamiliar context. We've already seen a string, which was a series of characters. We could also describe that string as a scalar literal constant. What does that mean?

It's a literal, because it's something that means what it says, as opposed to a variable. A variable is more like a pigeonhole for data; the important thing is to look inside it and see what it contains. A variable such as $fish is probably not going to stand for the word 'fish' preceded by a dollar sign; it's more likely to contain 'trout', 42, or -10. A literal such as the string "Hello, world", on the other hand, is the piece of paper that goes into a pigeonhole: it doesn't stand for something else. It represents literally those twelve characters.

It's also a constant, because it can't change. Variables, as their name implies, may change their contents, but constants are written into the text of your program once and for all, and the program can't change that. Another way of expressing this is that the data is hard-coded into the program. We will see later how it's almost always better to avoid hard-coding information.

By calling a value a scalar, we're describing the type of data it is. If you remember your math (and even if you don't), a scalar is a plain, simple, one-dimensional value. In math, the word is used to distinguish it from a vector, which is expressed as several numbers. Velocity, for example, has a pair of co-ordinates (speed and direction), and so must be a vector. In Perl, a scalar is the fundamental, basic unit of data, of which there are two kinds: numbers and strings.
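To make the difference between literals and variables concrete, here is a minimal sketch. It borrows the $fish pigeonhole from above and assumes it is declared with my. The other variable names are made up for the example:

```perl
use strict;
use warnings;

# "Hello, world" is a literal: it stands for exactly these characters.
my $greeting_length = length("Hello, world");
print "The literal has $greeting_length characters\n";

# $fish is a variable: a pigeonhole whose contents can change.
my $fish = "trout";            # a scalar holding a string
print "The pigeonhole contains $fish\n";

$fish = 42;                    # the same scalar now holds a number
my $next = $fish + 1;
print "Now it contains $fish, and one more is $next\n";
```

The same scalar variable happily holds a string at one moment and a number the next, while the literals written into the program never change.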