Today was all about Lecture 4 from the MIT Missing Semester Course. The topic was data manipulation and handling log files. It was a great opportunity for me to learn more about the grep, awk, and sed programs.
TLDR;
Okay, so here are the highlights of what I did:
- I continued to play around with some of the styles on the local version of my blog site’s theme while using Sass. I have a few features that I want to add but I haven’t gone through the proper design process. I just wanted to improve my setup for now.
- I finished watching Lecture 4 and read through the lecture notes. Based on what was covered in the lecture, I thought I should take the time to review and write my own notes on the highlighted programs:
  - grep
  - awk
  - sed
  - less
Lecture 4 Notes so far
Lecture 4 – Data Wrangling
This lecture is part of the MIT Missing Semester Course. It covers the general concept of taking data from one source, such as a file, and manipulating it to output more focused, useful information. The techniques and tools covered in this lecture are often used when filtering through log files from a server.
Efficiency with ssh
Because we are remotely accessing another computer, we want to be efficient with the processes we run and the data we pull to our local machine. Ideally, we want to limit the amount of data we download to just what we need. For example:
# In this command sequence we are downloading the entire log file and then performing all of the data manipulation on our local computer. This is bad.
ssh myserver journalctl | grep sshd | grep "Disconnected from" | less
# In this sequence, by wrapping the pipeline in single quotes ('') we perform all of the data manipulation on the remote server and only download the data we want from the log file.
ssh myserver 'journalctl | grep sshd | grep "Disconnected from"' | less
We could also save the data to a local .log file once the remote server has finished filtering it.
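Here is a minimal sketch of that last step, so I remember how the redirect fits in. The remote_journal function is a hypothetical stand-in for `ssh myserver journalctl` so the example runs without a server, the sample log lines are made up, and the file name disconnects.log is just my own choice.

```shell
#!/bin/sh
# Hypothetical stand-in for `ssh myserver journalctl`.
# The log lines below are invented sample data, not real server output.
remote_journal() {
    printf '%s\n' \
        'Jan 17 03:13:00 myserver sshd[2631]: Disconnected from invalid user admin 192.0.2.10 port 55920 [preauth]' \
        'Jan 17 03:13:05 myserver cron[811]: job started' \
        'Jan 17 03:14:10 myserver sshd[2640]: Accepted publickey for alice from 192.0.2.20 port 40122' \
        'Jan 17 03:15:42 myserver sshd[2655]: Disconnected from authenticating user root 192.0.2.30 port 51234 [preauth]'
}

# Filter first, then redirect the much smaller result into a local file.
# Against a real server this would look something like:
#   ssh myserver 'journalctl | grep sshd | grep "Disconnected from"' > disconnects.log
log_file="disconnects.log"
remote_journal | grep sshd | grep "Disconnected from" > "$log_file"

# Count how many lines survived the filter (tr strips padding some wc versions add).
matches=$(wc -l < "$log_file" | tr -d ' ')
echo "$matches matching lines saved to $log_file"
```

With the sample data above, only the two sshd "Disconnected from" lines end up in the file. After that I can open it with less whenever I want without touching the server again.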
Other Tools Mentioned
There were other tools mentioned that were not the focus of the lecture. For example:
- paste
- The R programming language for statistics (I think)
- The st program
- journalctl to get log files from a server I think…
Conclusion
That’s all for today. If you are interested in the MIT course you can check out the video lecture I’m currently going through. The lecture is helpful but isn’t sufficient by itself. Anyways, until next time PEACE!