Hw04-Data Wrangling
- Learn regex with RegexOne
- Count the words in /usr/share/dict/words that contain at least three a's and do not end in 's.
cat tmp | tr "[:upper:]" "[:lower:]" | grep -E "([^a]*a){3}" | grep -v "'s$" | wc -l
Find the most frequent characters at the end of those words.
cat tmp | tr "[:upper:]" "[:lower:]" | grep -E "([^a]*a){3}" | grep -v "'s$" | sed -E "s/.*(.)$/\1/" | sort | uniq -c | sort -nr | head -n3
Find the two-letter suffix of each of those words and print the three most common.
cat tmp | tr "[:upper:]" "[:lower:]" | grep -E "([^a]*a){3}" | grep -v "'s$" | sed -E "s/.*([a-z]{2})$/\1/" | sort | uniq -c | sort -nk1,1 | tail -n3
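To check the pipeline on something small, here is a rough sketch that runs the same steps on a tiny hand-made word list. The sample words and the variable names (words, matches, count, top) are my own assumptions for illustration; they stand in for /usr/share/dict/words.

```shell
# Hypothetical sample list standing in for /usr/share/dict/words.
words=$(mktemp)
printf '%s\n' "Alabama" "banana" "bandana's" "Panama" "cat" "anagram" "Saharan" > "$words"

# Lowercase everything, keep words with at least three a's,
# then drop the ones that end in 's.
matches=$(tr '[:upper:]' '[:lower:]' < "$words" | grep -E "([^a]*a){3}" | grep -v "'s$")

# Count the surviving words (tr strips the padding some wc versions add).
count=$(printf '%s\n' "$matches" | wc -l | tr -d ' ')
echo "$count"

# Reduce each word to its last two characters and pick the most common pair.
top=$(printf '%s\n' "$matches" | sed -E 's/.*(..)$/\1/' | sort | uniq -c | sort -nr | head -n 1 | awk '{print $2}')
echo "$top"

rm -f "$words"
```

The pattern ([^a]*a){3} avoids lazy quantifiers such as .*?, which POSIX extended regular expressions do not define, and the separate grep -v step makes the 's exclusion explicit instead of hiding it in an optional group.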
- To do in-place substitution it is quite tempting to do something like sed s/REGEX/SUBSTITUTION/ input.txt > input.txt.
However, the file is empty after the command.
Why?
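The shell handles the redirection (> tmp2) first, so tmp2 is truncated to zero length before sed even starts reading it. This is not particular to sed; any command that reads and redirects to the same file behaves this way. Use the -i option to edit the file in place instead (on macOS, BSD sed expects a backup suffix after -i, here the empty string):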
sed -i '' -e 's/foo/bar/g' tmp2
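The truncation is easy to reproduce on a throwaway file. The sketch below is only an illustration, with made-up names (demo, out, size_after, result); it also shows a temp-file-then-mv variant, which I am assuming as a workaround that behaves the same with GNU and BSD sed.

```shell
# Throwaway file so no real data is touched.
demo=$(mktemp)
echo "foo foo" > "$demo"

# The shell opens and truncates the output file before sed reads the input,
# so sed sees an empty file and writes nothing back.
sed 's/foo/bar/g' "$demo" > "$demo"
size_after=$(wc -c < "$demo" | tr -d ' ')
echo "$size_after"

# Alternative: write to a second file, then replace the original.
echo "foo foo" > "$demo"
out=$(mktemp)
sed 's/foo/bar/g' "$demo" > "$out" && mv "$out" "$demo"
result=$(cat "$demo")
echo "$result"

rm -f "$demo"
```

The first echo prints 0 (the file is empty) and the second prints bar bar.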
- Find your average, median, and max system boot time over the last ten boots. Use journalctl on Linux and log show on macOS, and look for log timestamps near the beginning and end of each boot. On Linux, they may look something like:
Logs begin at ...
and
systemd[577]: Startup finished in ...
On macOS, look for:
=== system boot:
and
Previous shutdown cause: 5
OK, this exercise is difficult for me. I don't know why log show didn't report the boot time directly; it seems to give only the start and end timestamps.
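Once the boot durations have been extracted somehow, the statistics part is the easier half. This is a minimal sketch that assumes the durations are already available as one number per line; the ten values below are invented for the example and are not real measurements from journalctl or log show.

```shell
# Hypothetical boot durations in seconds, one per boot (made-up values).
times=$(printf '%s\n' 12 9 15 11 10 14 13 9 16 18)

# Sort numerically so the median and max can be read by position,
# then let awk compute all three statistics in one pass.
stats=$(printf '%s\n' "$times" | sort -n | awk '
  { v[NR] = $1; sum += $1 }
  END {
    if (NR % 2) median = v[(NR + 1) / 2]                    # odd count: middle value
    else        median = (v[NR / 2] + v[NR / 2 + 1]) / 2    # even count: mean of the two middle values
    printf "avg=%.2f median=%.2f max=%.2f\n", sum / NR, median, v[NR]
  }')
echo "$stats"
```

With these sample values it prints avg=12.70 median=12.50 max=18.00.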
- Find an online data set like this one, this one, or maybe one from here. Fetch it using curl and extract out just two columns of numerical data. If you’re fetching HTML data, pup might be helpful. For JSON data, try jq. Find the min and max of one column in a single command, and the difference of the sum of each column in another.
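For the last two steps, here is one possible approach with awk, assuming the two numeric columns have already been extracted and are separated by whitespace. The data below is made up for the example (nothing is fetched with curl), and the variable names are hypothetical.

```shell
# Hypothetical two-column numeric data standing in for the extracted columns.
data=$(printf '%s\n' "3 10" "7 4" "1 8" "9 2")

# Min and max of column 1 in a single awk command.
minmax=$(printf '%s\n' "$data" | awk '
  NR == 1  { min = max = $1 }   # seed both with the first value
  $1 < min { min = $1 }
  $1 > max { max = $1 }
  END      { print min, max }')
echo "$minmax"

# Difference between the sum of column 1 and the sum of column 2.
diff_sums=$(printf '%s\n' "$data" | awk '{ a += $1; b += $2 } END { print a - b }')
echo "$diff_sums"
```

With the sample data this prints 1 9 and then -4 (20 minus 24).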