Hw04-Data Wrangling

  1. Learn regex with the RegexOne tutorial.

  2. Count the words in /usr/share/dict/words that contain at least three a's and do not end in 's (read here from a file named tmp).
cat tmp | tr "[:upper:]" "[:lower:]" | grep -E "^([^a]*a){3}" | grep -v "'s$" | wc -l
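
To check a filter like this by hand, it can help to run it on a tiny word list first. The sketch below assumes a made-up sample in place of the dictionary file; `sample_words` and `count_three_a` are hypothetical helper names, and the pattern `^([^a]*a){3}` is just one way to express "at least three a's".

```shell
# Hypothetical sample standing in for /usr/share/dict/words.
sample_words() {
  printf '%s\n' "Alabama" "banana" "bananas" "banana's" "Canada" "apple" "data"
}

# Count words with at least three a's (case-insensitive) that do not end in 's.
count_three_a() {
  tr "[:upper:]" "[:lower:]" | grep -E "^([^a]*a){3}" | grep -vc "'s\$"
}

count=$(sample_words | count_three_a)
echo "$count"
```

In this sample, "banana's" is dropped by the second grep, while "apple" and "data" never pass the first one.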

Find the most frequent last characters of those words.

cat tmp | tr "[:upper:]" "[:lower:]" | grep -E "^([^a]*a){3}" | grep -v "'s$" | sed -E "s/.*(.)$/\1/" | sort | uniq -c | sort -rn | head -n3

Find the three most common two-letter suffixes of those words.

cat tmp | tr "[:upper:]" "[:lower:]" | grep -E "^([^a]*a){3}" | grep -v "'s$" | sed -E "s/.*([a-z]{2})$/\1/" | sort | uniq -c | sort -nk1,1 | tail -n3
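
The suffix-counting stage can be tried in isolation as well. This is a minimal sketch on an invented word list; `top_suffixes` is a hypothetical helper, and it prints each suffix followed by its count, most frequent first.

```shell
# Print the n most common two-character endings, most frequent first.
top_suffixes() {
  n=$1
  sed -E 's/.*(..)$/\1/' | sort | uniq -c | sort -rn | head -n "$n" |
    awk '{ print $2, $1 }'   # reorder to "suffix count"
}

# Made-up sample words, for illustration only.
suffixes=$(printf '%s\n' alabama panama pajama banana bandana canada | top_suffixes 2)
echo "$suffixes"
```

Here `sort | uniq -c` does the counting: `uniq -c` only merges adjacent duplicates, so the input has to be sorted first.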

  3. To do in-place substitution, it is quite tempting to do something like sed s/REGEX/SUBSTITUTION/ input.txt > input.txt.

However, the file is empty after the command.

Why?

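The shell sets up the redirection (> tmp2) before it runs sed, so tmp2 is truncated to an empty file before sed ever reads it. This is not specific to sed: any command whose output is redirected onto its own input file behaves the same way.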

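To edit the file in place, use sed's -i option instead. The form below, with an empty backup suffix, is the BSD/macOS syntax; GNU sed takes -i with no separate argument.

sed -i '' -e 's/foo/bar/g' tmp2
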
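
The truncation can be reproduced safely in a scratch directory. The sketch below is an illustration with made-up file names (`bad.txt`, `good.txt`); it also shows a portable alternative to -i, which is to write to a temporary file and then move it over the original.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
printf 'foo one\nfoo two\n' > "$workdir/bad.txt"
cp "$workdir/bad.txt" "$workdir/good.txt"

# Unsafe: the shell truncates bad.txt before sed opens it for reading.
sed 's/foo/bar/' "$workdir/bad.txt" > "$workdir/bad.txt"
bad_size=$(wc -c < "$workdir/bad.txt" | tr -d ' ')

# Portable alternative: write to a temporary file, then replace the original.
sed 's/foo/bar/' "$workdir/good.txt" > "$workdir/good.txt.tmp" &&
  mv "$workdir/good.txt.tmp" "$workdir/good.txt"
good_first=$(head -n 1 "$workdir/good.txt")

echo "bad.txt bytes: $bad_size"
echo "good.txt first line: $good_first"
rm -rf "$workdir"
```
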
  4. Find your average, median, and max system boot time over the last ten boots. Use journalctl on Linux and log show on macOS, and look for log timestamps near the beginning and end of each boot. On Linux, they may look something like:
Logs begin at ...

and

systemd[577]: Startup finished in ...

On macOS, look for:

=== system boot:

and

Previous shutdown cause: 5

This exercise was difficult for me. I don't know why log show did not report the boot time directly; it seems to give only the start and end timestamps.
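
The statistics half of the exercise can be worked out separately from the log parsing. The sketch below assumes the boot durations have already been extracted as one number of seconds per line; `boot_stats` is a hypothetical helper and the sample values are made up, not taken from any real log.

```shell
# Read one boot duration (in seconds) per line; print average, median and max.
boot_stats() {
  LC_ALL=C sort -n | awk '
    { v[NR] = $1; sum += $1 }
    END {
      if (NR == 0) exit 1
      # Input is sorted, so the median is the middle value (or the mean of the two middle values).
      median = (NR % 2) ? v[(NR + 1) / 2] : (v[NR / 2] + v[NR / 2 + 1]) / 2
      printf "avg=%.2f median=%.2f max=%.2f\n", sum / NR, median, v[NR]
    }'
}

# Made-up sample durations, for illustration only.
stats=$(printf '%s\n' 12.5 9.8 15.1 11.0 | boot_stats)
echo "$stats"
```

Sorting first keeps the awk part simple: the last element is the max and the middle element(s) give the median.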

  5. Find an online data set like this one, this one, or maybe one from here. Fetch it using curl and extract out just two columns of numerical data. If you’re fetching HTML data, pup might be helpful. For JSON data, try jq. Find the min and max of one column in a single command, and the difference of the sum of each column in another.
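
Once two numeric columns have been extracted, awk is one possible tool for both calculations. This is a minimal sketch on invented whitespace-separated data; `sample_data` is a hypothetical stand-in for the output of the curl and jq (or pup) stage, which is not shown here.

```shell
# Hypothetical two-column numeric data, standing in for a fetched data set.
sample_data() {
  printf '%s\n' "3 10" "7 4" "1 9" "5 2"
}

# Min and max of column 1 in a single command.
min_max=$(sample_data | awk 'NR == 1 { min = max = $1 }
  $1 < min { min = $1 }
  $1 > max { max = $1 }
  END { print min, max }')

# Difference between the sum of column 1 and the sum of column 2.
sum_diff=$(sample_data | awk '{ a += $1; b += $2 } END { print a - b }')

echo "min/max: $min_max"
echo "sum difference: $sum_diff"
```
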
Posted on 2022-07-02 12:19 by Jack404