Day 28: awk Basics Complete & /Regex/

I finished covering the basics of awk and worked through a few example problems. I will continue to build on my awk skills as I go, but for now it’s all about the Lecture 4 exercises.

TL;DR

Okay, so here are the highlights of what I did:

  • I finished writing some baseline notes for awk. I covered things like the syntax, examples, built-in variables, etc.
  • I completed a great interactive regular expressions tutorial.
  • I started looking into other programming languages like R, Perl, and C. I have never really learned those three, and so far in the course a lot of the programs have mentioned them. It might be worth looking into them for a bit just for the exposure. C is definitely on my list, but R and Perl have started to pique my interest. I have to stay focused and complete things one at a time for now lol.

My awk Syntax Notes:

Selecting Column Data

Each line of text from an input file given to the awk program is split into fields by the FS (field separator). By default, the FS is whitespace (spaces and tabs). The FS acts as the delimiter between fields. Each field is then stored in a $n variable based on its column position, e.g. $1 selects the first field in a line of input text. $0 selects the entire line, i.e. all of its fields.
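
Here is a minimal sketch of field selection. The sample data and the shell variable names are made up for illustration; only $1, $3, and $0 come from the notes above.

```shell
# Hypothetical sample input: three whitespace-separated columns per line.
sample='alice 30 london
bob 25 paris'

# $1 is the first field of each line.
first_col=$(printf '%s\n' "$sample" | awk '{ print $1 }')

# $0 is the entire line, unchanged.
whole=$(printf '%s\n' "$sample" | awk '{ print $0 }')

# Several fields can be printed at once; the comma puts a space between them by default.
name_city=$(printf '%s\n' "$sample" | awk '{ print $1, $3 }')

printf '%s\n' "$first_col" "$name_city"
```

Capturing each result in a variable is just a convenience so the pieces can be compared side by side; piping the text straight into awk works the same way.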

Note:

  • Delimited -> to have fixed boundaries or limits (adj.)
  • Delimit -> to determine the limit or boundary of (verb)
  • Limited -> restricted in size, amount, or extent (adj.)
  • Limit -> to set or serve as a cap or restriction (verb)

Built-in Variables in awk

awk has many built-in variables that we have access to:

  • NR (number of records) keeps a running count of the input records read so far. Remember that records are usually lines. awk runs its pattern/action statements once for each record in a file.
  • NF (number of fields) keeps a count of the number of fields within the current input record.
  • FS (field separator) contains the field separator, which is used to divide fields on the input line. The default is whitespace, meaning space and tab characters. FS can be reassigned to another character (typically in BEGIN) to change the field separator.
  • RS (record separator) stores the current record separator. Since, by default, an input line is the input record, the default record separator is a newline.
  • OFS (output field separator) stores the output field separator, which separates the fields when awk prints them. The default is a single space. Whenever print has several arguments separated with commas, it prints the value of OFS between each argument.
  • ORS (output record separator) stores the output record separator, which separates the output lines when awk prints them. The default is a newline character. print automatically outputs the contents of ORS at the end of whatever it is given to print.
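
To see the separator variables in action, here is a small sketch. The colon-separated input and the shell variable names are made up for illustration.

```shell
# Hypothetical colon-separated input.
records='root:0:/root
daemon:1:/usr/sbin'

# FS is set in BEGIN so it applies from the first record on;
# OFS is printed between the comma-separated arguments of print.
arrows=$(printf '%s\n' "$records" | awk 'BEGIN { FS = ":"; OFS = " -> " } { print $1, $3 }')

# ORS replaces the newline that print normally appends after each record.
joined=$(printf '%s\n' "$records" | awk 'BEGIN { FS = ":"; ORS = ";" } { print $1 }')

# RS changes what counts as a record: here records are separated by commas.
by_comma=$(printf 'a,b,c' | awk 'BEGIN { RS = "," } { print NR ": " $0 }')

printf '%s\n' "$arrows" "$joined" "$by_comma"
```

Setting the separators in BEGIN matters because BEGIN runs before any input is read, so the first record is already split with the new FS.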

Keep in mind that awk programs are usually run once for each line (input record). So when we use NR or NF we are dealing with the number of records up to and including the current line (record) and the number of fields found in the current line (record).
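
A quick sketch of how NR and NF change from record to record. The sample text and the shell variable names are made up for illustration.

```shell
# Hypothetical input with a different number of fields on each line.
text='one two three
four five
six'

# NR is the record number so far; NF is the field count of the current record.
report=$(printf '%s\n' "$text" | awk '{ print NR, NF }')

# Inside END, NR holds the total number of records read.
total=$(printf '%s\n' "$text" | awk 'END { print NR }')

# Because NF is the number of fields, $NF is the last field of the current record.
last=$(printf '%s\n' "$text" | awk '{ print $NF }')

printf '%s\n' "$report" "$total" "$last"
```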

Conclusion

That’s all for today. If you are interested in the MIT course you can check out the video lecture I’m currently going through. The lecture is helpful but isn’t sufficient by itself. Anyways, until next time PEACE!