A simple "hello world" program in Perl. But not just a one-liner: in fact, we're going to put some stuff in this that will provide us with a great start for any Perl program.
Downloading Text And Images - Updated
Note: this will only work on http sites, not https. See previous video.
In "The Social Network", the character based on Mark Zuckerberg uses Perl to download images from the Internet. It's easier than you think, and we'll see how to do it in this tutorial. NOTE: My website has been updated since this tutorial, so please choose a different image to download other than my logo when you try this tutorial. Or, download my logo, but please be aware that the path has changed. You can easily find the new path by right-clicking the image you want to download in your browser, going to "view image" or the equivalent, and noting the path.
Perl is great for moving files around, deleting files, renaming them, etc. Let's take a look at how we can check if a whole batch of files exist, learning to use arrays along the way.
In this tutorial we'll look at how to read text files in Perl. We'll also start looking at regular expressions! Do not be afraid -- soon, with a little practice, regular expressions will be like unto a brother.
Let's take a look at the important topic of writing text files. After all, Perl is great for creating text files, whether you want to record what's going on, scrape stuff from the Internet or just reformat existing text files. We'll also take a look at how we can basically do a "find and replace" with a simple Perl regex (regular expression) command.
The "." wildcard in Perl is the most useful wildcard and can stand for any single character, whether letter, digit or symbol (by default, anything except a newline). Even if you only learn this one regular expression special character, you'll be able to do a whole lot more than if you didn't know it.
Groups are just a way of finding out what your regular expression wildcards actually matched. Which can be very handy.
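As a minimal sketch of groups (the sample text and variable names here are made up for illustration, not taken from the video): each pair of parentheses in the pattern captures whatever that part of the pattern matched.

```perl
use strict;
use warnings;

# Hypothetical sample text, invented for illustration.
my $text = 'I am 117 years old tomorrow.';

my ($number, $word);

# Each pair of parentheses is a group. After a successful match,
# $1 holds what the first group matched and $2 what the second matched.
if ($text =~ /(\d+) (\w+) old/) {
    ($number, $word) = ($1, $2);
    print "Number: $number, word: $word\n";
}
```

Copying $1 and $2 into named variables straight away is a sensible habit, since the next successful match overwrites them.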
Quantifiers enable you to say how many of a given character or wildcard you want to match. They can either match as much as possible, or just enough to gel with the rest of your regular expression.
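A small sketch of the difference between "as much as possible" and "just enough", assuming an invented HTML-like string: "+" on its own is greedy, while "+?" matches as little as it can.

```perl
use strict;
use warnings;

# Hypothetical sample text, invented for illustration.
my $html = '<b>bold</b> and <i>italic</i>';

# Greedy: .+ grabs as much as possible, so the match runs
# from the first "<" all the way to the last ">".
my ($greedy) = $html =~ /<(.+)>/;

# Non-greedy: .+? grabs just enough for the rest of the pattern
# to match, so it stops at the first ">".
my ($lazy) = $html =~ /<(.+?)>/;

print "Greedy: $greedy\n";
print "Lazy:   $lazy\n";
```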
Escape sequences let you match particular special characters or entire classes of characters, like all numbers or all alphanumeric characters and so on.
Numeric quantifiers let you specify exactly how many of a given character or sequence you want to match. You can even specify a range.
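For instance, here is a rough sketch that uses numeric quantifiers to pick out date-like strings; the candidate strings and the date format are assumptions chosen purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical candidate strings, invented for illustration.
my @candidates = ('2024-01-15', '24-1-15', '2024-001-15');

# \d{4}   means exactly four digits;
# \d{1,2} means between one and two digits (a range);
# \d{2}   means exactly two digits.
my @valid = grep { /^\d{4}-\d{1,2}-\d{2}$/ } @candidates;

print "Valid: @valid\n";
```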
It's time for a test of your regex and Perl knowledge. Watch out -- there's a trick question near the end. It's so tricky, I even tricked myself.
More on Reading Files Line By Line: Tips, Tricks and Vital Knowledge
The split function lets you split a string on some delimiter into an array of strings, which is just incredibly useful for reading stuff like comma or tab separated files.
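A minimal sketch of split on a comma-separated line; the sample line is made up, and a real CSV file with quoted fields would need more care than this.

```perl
use strict;
use warnings;

# Hypothetical CSV line, invented for illustration.
my $line = 'Isaac Newton,99.10,15051999';

# split takes a pattern and a string, and returns the list of pieces.
my @fields = split /,/, $line;

print "Name: $fields[0]\n";
print "Number of fields: ", scalar(@fields), "\n";
```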
The opposite of "split" is "join": joining the strings in an array together into one line. This is handy for creating SQL statements and also for debugging arrays; but even handier for debugging is the Data::Dumper module, which can display any data structure, no matter how complex.
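A short sketch of both ideas, using made-up values: join glues an array together with a separator, and Data::Dumper (a standard module) prints the structure of whatever reference you hand it.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical values, invented for illustration.
my @values = ('Isaac Newton', '99.10', '15051999');

# join takes a separator string followed by the list to join.
my $joined = join ' | ', @values;
print "$joined\n";

# Dumper takes a reference and returns a printable description
# of the data structure it points to.
print Dumper(\@values);
```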
We need to deal with the new line at the end of every line; get rid of it somehow. And how about the spaces which are all too often found hanging around commas in CSV files? Fortunately there's an easy answer to all this.
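One possible answer, sketched here with an invented line (the video may take a different route): chomp removes the trailing newline, and splitting on a pattern that includes optional whitespace swallows the spaces around the commas.

```perl
use strict;
use warnings;

# Hypothetical line as it might be read from a CSV file.
my $line = "Isaac Newton , 99.10 ,15051999\n";

# chomp removes the trailing newline, if there is one.
chomp $line;

# \s* matches zero or more whitespace characters, so any spaces
# on either side of a comma are treated as part of the delimiter.
my @fields = split /\s*,\s*/, $line;

print "[$_]\n" for @fields;
```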
Using the important "push" subroutine to add data to arrays.
We can create "multidimensional" arrays in Perl -- arrays where you need more than one index or "dimension" to specify the location of an element. This will help us a lot if we want to store data from a spreadsheet or CSV file, for example.
Hashes are the other complex data type in Perl, along with arrays. They are basically lookup tables or "maps", and are unbelievably useful.
How to iterate over hashes.
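A minimal sketch of one common way to do it, with a made-up hash: loop over the keys and look up each value. Hash keys come back in no particular order, so the keys are sorted here to get predictable output.

```perl
use strict;
use warnings;

# Hypothetical lookup table, invented for illustration.
my %stock = (
    apples  => 3,
    bananas => 12,
    cherries => 40,
);

my @lines;

# keys returns the hash keys; sort puts them in alphabetical order.
for my $fruit (sort keys %stock) {
    push @lines, "$fruit = $stock{$fruit}";
}

print "$_\n" for @lines;
```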
Arrays of hashes might sound like going a bit far, but it'll enable us to build up very powerful and useful data structures.
We can use an array of hashes to hold CSV data (or any kind of database data) in a very convenient manner.
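A rough sketch of the idea, assuming the CSV rows are already in memory and that the column names are "name", "payment" and "date" (all invented for illustration): each row becomes a hash, and a reference to it is pushed onto an array.

```perl
use strict;
use warnings;

# Hypothetical rows and column names, invented for illustration.
my @rows = (
    'Isaac Newton,99.10,15051999',
    'Albert Einstein,13.20,11062012',
);
my @headers = ('name', 'payment', 'date');

my @records;

for my $row (@rows) {
    my @values = split /,/, $row;

    # A hash slice assigns every header its matching value in one go.
    my %record;
    @record{@headers} = @values;

    # Store a reference to this row's hash in the array.
    push @records, \%record;
}

print "$records[1]{name} paid $records[1]{payment}\n";
```

Because the fields are looked up by name rather than by position, the rest of the program keeps working even if the column order changes.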
It's easy and useful to put in a few checks to make sure your CSV data is in order.
Perl makes it very easy to clean up bad data; in fact, it's one of the things that Perl is best at.
Test your knowledge so far with this little challenge ....
Web Scraping and More Regular Expressions
Web scraping "by hand" - using regular expressions - is a great way to improve your Perl and learn a few useful tricks. So let's take a look at how to write a web scraper!
Character classes enable you to specify sets of characters that can be matched, or not matched. They're extremely useful.
Often you want to apply a regular expression repeatedly to the same text; for instance, to extract all images from an HTML page. There are several ways to do this; we'll look at a memory-efficient way here. Note: I've changed my website since this lecture, so where I talk about matching stuff on my site, that's no longer relevant. You can try matching something else on my site or some other site though; the stuff in the video is just an example.
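One way to sketch the repeated-match idea, using an invented HTML snippet rather than a real page: with the /g flag in a while condition, each pass of the loop picks up where the previous match stopped, so only one match is handled at a time.

```perl
use strict;
use warnings;

# Hypothetical HTML snippet, invented for illustration.
my $html = '<img src="logo.png"><p>Hi</p><img src="photo.jpg">';

my $count = 0;

# In a while condition, /g finds the next match on every pass
# and the loop ends when there are no more matches.
while ($html =~ /<img\s+src="([^"]+)"/g) {
    $count++;
    print "Found image: $1\n";
}

print "Total: $count\n";
```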
If you're working with small text documents, you can collect all your matches altogether at once. This involves using a match expression in an array context; something not possible in many languages, but possible in Perl.
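A minimal sketch of matching in array (list) context, again with an invented HTML snippet: assigning a /g match to an array collects what the group captured from every match in one go.

```perl
use strict;
use warnings;

# Hypothetical HTML snippet, invented for illustration.
my $html = '<img src="logo.png"><p>Hi</p><img src="photo.jpg">';

# In list context, a /g match returns the captured text
# from every match, all at once.
my @images = $html =~ /<img\s+src="([^"]+)"/g;

print "Found: @images\n";
```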
Building a Complete Program: Command Line Options
Often you'll want the end user (even if it's you) to be able to run your Perl script with various options. Let's look at how to retrieve options that are specified on the command line.
We've already seen an example of a subroutine in Perl; the "main" subroutine that I like to define in my programs (although it's not obligatory, and doesn't have to be called "main" either). Let's look at defining subroutines in general, and returning values from them.
How to create multi-line strings and comments in Perl. Alas, this is a bit cumbersome -- at least in Perl 5.*, which we're looking at here. But you can do it, and it's very useful.
Passing subroutine arguments works a bit differently in Perl to other languages. Last time I looked, the official Perl documentation said that it was "simplicity itself" (if I remember correctly). This is an overstatement, unless you think that differential calculus is also simplicity itself. However, once you get your head around it, it's really OK. Hopefully this tutorial will help.
We've already seen references to hashes briefly, but it's worth going over them again. It's much easier and more efficient to pass a reference to a hash to a subroutine than the actual hash itself, in the same way that it's easier to give someone your address than to physically take them there.
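A small sketch of passing a hash by reference; the subroutine name describe_options and the option names are hypothetical, chosen just for this example.

```perl
use strict;
use warnings;

# Hypothetical subroutine: takes a reference to a hash of options
# and returns a one-line summary of them.
sub describe_options {
    my ($opts) = @_;    # $opts is a reference, not a copy of the hash

    my @parts;
    for my $key (sort keys %$opts) {
        # The arrow dereferences the reference to reach a value.
        push @parts, "$key=$opts->{$key}";
    }
    return join ', ', @parts;
}

# Hypothetical options, invented for illustration.
my %options = (file => 'data.csv', verbose => 1);

# The backslash creates a reference to the hash.
my $summary = describe_options(\%options);
print "$summary\n";
```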
Finally, for this section, we can now look at checking values in hashes. This will enable us to check the command line options we got earlier.
Parsing XML and Complex Data Structures
Perl's great for working with files, and makes it easy to list all the files in a directory.
No new Perl concepts in this tutorial, but we'll look at putting some structure in place to process files one at a time.
In this tutorial we'll look at how to parse XML using regular expressions alone; this is a really good technique for working with massive files, or for when you just want to pick a few bits out of your XML and don't want to use a heavyweight parser.
In this tutorial we'll look at the XML::Simple parser module and how to extract data from the resulting complex document object model.
In this tutorial we'll get the remaining data out of the complex data structure that resulted from parsing our XML. I highly recommend having a go at this yourself after watching the video, or even better, before! Or just enjoy the video --- but remember that you'll need to practise this stuff to be able to use it.
We've been extracting stuff from complex data structures, but now let's look at the reverse; building up complex data structures from scratch.
Working with Databases
A review of three of the world's most popular databases that you can use with Perl --- all of which come in free versions, and two of which are basically totally free.
Let's create a database to hold our data using the free MySQL Workbench tool.
Perl makes working with databases a joy, most of the time. In this tutorial we'll look at how to connect to a database.
Now we can insert some data into our database using standard SQL. If you don't know SQL and want to learn, there are lots of free tutorials about and SQL is not as hard to learn as a programming language (not by a long chalk!) so don't panic.
We can issue simple SQL commands using the "do" method (a "method" is a subroutine that's part of an "object" -- the database handle object in this case). We'll use it here to clear the database before importing data into it.
If you insert a database row with an auto-increment ID, you'll need to get the ID of that row before you can then insert data that references that row. Fortunately Perl makes it easy.
Perl makes it very easy to run SQL queries. Let's take a look.
This is just a quick reminder that you can easily print data to a file in any format you desire. I was going to get into join(), but then realised it was kind of superfluous here (unless we switch to using fetchrow_arrayref).
Perl One-Liners
Perl allows you to run small one-line programs directly on a terminal or console. It turns out to be quite surprising how much you can actually get done with one line of Perl, if you miss out the bells and whistles and condense it a bit.
Perl allows us to easily read a file, loop through it and print each line, just by giving an extra option on the command line. This allows us to efficiently construct programs that, for example, replace text in a file.
We can use a one-liner to search for and replace text in a whole bunch of files, editing the actual files and making backups of the originals. All in one line!
Modules and OO Perl
Modules allow you to organise your Perl code, packaging subroutines that belong together into reusable units.
Perl modules can be placed in a directory structure, allowing you to organise them hierarchically (much like Java, for instance).
A short introduction to OO programming. If you already understand object-oriented programming, you can skip this video. Stay tuned if you don't know what it is. This is just a little introduction to objects and classes and why we need them. No Perl code here; we'll look at that in the next tutorial.
Now we turn to implementing classes and objects in Perl. These techniques can really help simplify and organise your programs.
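As a minimal sketch of a class in classic Perl 5 style (the class name Person and its methods are hypothetical, not necessarily what the video uses): a class is a package, an object is a blessed reference, and a method is a subroutine that receives the object as its first argument.

```perl
use strict;
use warnings;

{
    # Hypothetical class, invented for illustration.
    package Person;

    # Constructor: builds a hash, then blesses it into the class.
    sub new {
        my ($class, %args) = @_;
        my $self = { name => $args{name} };
        return bless $self, $class;
    }

    # Method: the object itself arrives as the first argument.
    sub greet {
        my ($self) = @_;
        return "Hello, $self->{name}";
    }
}

my $person = Person->new(name => 'Bob');
print $person->greet(), "\n";
```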
Web Application Basics
The Apache HTTP server is the world's most popular web server, and you can run it even on most laptops. Here we'll look at how to install it and configure it for use with Perl "CGI" scripts.
The simplest possible CGI web app just consists of a program that prints a content header followed by some text. We'll create such a program in this tutorial.
The CGI.pm module simplifies a lot of web-related tasks that would otherwise be tricky. Some parts of it are of dubious value, but others are definitely extremely helpful.
It's common to add parameters (name-value pairs) into a URL, which can then be used to output different stuff from your Perl web application. Here I'll show you how to get parameter values from the URL.
Let's finish this little introduction to Perl web apps with a look at processing HTML forms -- the basis of website interactivity.
Basic Sysadmin Tasks
Perl provides standard modules for many common tasks, including copying and moving files, as well as the built-in "unlink" function for deleting files.
Perl makes it easy to execute console commands on your operating system and to get whatever output is returned from those commands.
Conclusion
Conclusion, and some useful websites; in particular,
http://www.cpan.org/
http://perldoc.perl.org/
http://www.perlmonks.org/
http://stackoverflow.com/
Appendix 1: Example Data
Some example XML data.
Appendix 2: Alternate Systems
This video actually isn't about Perl as such; it's about how to run scripts in UNIX-like systems.
Extras
In this video I'll compare initializing and referencing array and hash data side-by-side, just so you have it in one place. Those brackets can be a bit of a jungle.
Let's review references to hashes and arrays side by side, along with casting.