Day 25: Manipulate Data with grep & sed

Today was all about Lecture 4 from the MIT Missing Semester course. The topic was data manipulation and handling log files. It was a great opportunity for me to learn more about the grep, awk, and sed programs.

TL;DR

Okay, so here are the highlights of what I did:

  • I continued to play around with some of the styles on the local version of my blog site’s theme while using Sass. I have a few features that I want to add but I haven’t gone through the proper design process. I just wanted to improve my setup for now.
  • I finished watching Lecture 4 and read through the lecture notes. Based on what was covered in the lecture, I thought I should take the time to review and write my own notes on the highlighted programs:
    • grep
    • awk
    • sed
    • less

Lecture 4 Notes so far

Lecture 4 – Data Wrangling

This lecture is part of the MIT Missing Semester course. It covers the general idea of taking data from one source, such as a file, and transforming it into more focused, useful output. The techniques and tools covered in this lecture are often used when filtering through log files from a server.

Efficiency with ssh

Because we are remotely accessing another computer, we want to be efficient with the processes we run and the data we pull to our local machine. Ideally, we limit the amount of data we download to just what we need. For example:

# In this command sequence the entire log is sent over the network, and all of the filtering (the grep commands) runs on our local computer. This is inefficient.
ssh myserver journalctl | grep sshd | grep "Disconnected from" | less

# In this sequence, wrapping the remote pipeline in single quotes ('...') makes the filtering run on the remote server, so we only download the lines we want from the log.
ssh myserver 'journalctl | grep sshd | grep "Disconnected from"' | less
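To see what the two grep filters actually keep, here is a minimal sketch that runs the same pipeline on a few made-up log lines instead of a real journal. The sample lines and the sample_log function are hypothetical; they only exist so the example runs without a server.

```shell
# Hypothetical sample lines standing in for journalctl output.
sample_log() {
  printf '%s\n' \
    'Jan 01 10:00:01 myserver sshd[101]: Disconnected from 192.0.2.10 port 50000 [preauth]' \
    'Jan 01 10:00:02 myserver cron[202]: job started' \
    'Jan 01 10:00:03 myserver sshd[103]: Accepted publickey for alice from 192.0.2.20 port 50001'
}

# First keep only the sshd lines, then only the "Disconnected from" ones.
sample_log | grep sshd | grep "Disconnected from"
```

Only the first sample line passes both filters: the cron line is dropped by the first grep, and the "Accepted publickey" line is dropped by the second.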

We could also save the filtered output to a local .log file once the remote server has finished filtering, so we can inspect it later without connecting again.
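A minimal sketch of saving the result locally, assuming a host alias called myserver (as in the commands above). The function name save_disconnects and the file name ssh.log are placeholders I picked for illustration.

```shell
# Run the filtering on the remote host and save only the matching lines locally.
# Usage: save_disconnects HOST FILE   (both are placeholders)
save_disconnects() {
  # The redirect (>) is outside the quotes, so the file is written on our machine.
  ssh "$1" 'journalctl | grep sshd | grep "Disconnected from"' > "$2"
}

# Example (needs a reachable host, so it is left commented out):
# save_disconnects myserver ssh.log
# less ssh.log
```

Because the redirect sits outside the single quotes, only the filtered lines cross the network and the file ends up on the local machine.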

Other Tools Mentioned

Other tools were mentioned but were not the focus of the lecture. For example:

  • paste
  • The R programming language for statistics (I think)
  • The st program
  • journalctl, which prints the system log (the systemd journal) on the server, I think

Conclusion

That’s all for today. If you are interested in the MIT course you can check out the video lecture I’m currently going through. The lecture is helpful but isn’t sufficient by itself. Anyways, until next time PEACE!
