Publishing to cnblogs and Hexo in Sync

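New blog address: https://gyrojeff.top. Welcome! This post was synced automatically from my blog; for a better reading experience, please read it there 👇

Post title: Publishing to cnblogs and Hexo in Sync

Author: gyro永不抽风

Published: September 15, 2020 - 00:09

Last updated: September 19, 2020 - 00:09

Original link: http://hexo.gyrojeff.moe/2020/09/15/%E5%8D%9A%E5%AE%A2%E5%9B%AD%E4%B8%8EHexo%E5%90%8C%E6%AD%A5%E5%8F%91%E5%B8%83%E7%9A%84%E6%96%B9%E6%B3%95/

License: Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0). Please keep the original link and the author's name when reposting!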

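Introduction

I had been thinking about writing a post once and publishing it in several places, which led to this project: selectively syncing Hexo posts to cnblogs.

GitHub link: https://github.com/JeffersonQin/hexo-cnblogs-sync

Modifying the NexT theme to add a notice in the post header

In themes/next/layout/_macro/post.swig, add the following at a suitable place: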

{% if post.cnblogs %}
  <br>
  <span class="post-meta-item">
    <span class="post-meta-item-icon">
      <i class="fas fa-rss"></i>
    </span>
    <span class="post-meta-item-text" id="cnblogs_sync_text"><a href="https://www.cnblogs.com/jeffersonqin/">博客园</a> 同步已开启</span>
  </span>
{% endif %}

With this in place, whether a post is synced is controlled entirely by its front matter. This post's front matter, for example:

---
title: 博客园与Hexo同步发布的方法
date: 2020-09-15 00:31:57
tags:
- Hexo
- Blog
- cnblogs
- python
- html
- css
categories: Hexo
cnblogs: true
---

Fine-tuning Hexo's generated HTML with BS4

First, scan the output files in the public folder:

import os

orig_dir = "../public/"
repo_dir = "../public_cnblogs/"
ignore_files = ["../public/index.html"]
ignore_dirs = ["../public/page/", "../public/archives"]
index_files = []  # candidate post pages collected by the loop below

# Filtering the files needed to post
for root, dirs, files in os.walk(orig_dir):
    for file in files:
        try:
            file_name = str(file)
            file_path = os.path.join(root, file)
            # Normalize Windows separators so the prefix checks below match
            file_dir = root.replace('\\', '/')
            file_path = file_path.replace('\\', '/')

            flag = True

            for ignore_dir in ignore_dirs:
                if file_dir.startswith(ignore_dir):
                    flag = False
                    break
            for ignore_file in ignore_files:
                if file_path == ignore_file:
                    flag = False
                    break

            if (file_name == 'index.html' and flag):
                index_files.append(file_path)
                print('\033[42m\033[37m[LOGG]\033[0m File Detected:', file_path)

        except Exception as e:
            raise e

print('\033[44m\033[37m[INFO]\033[0m File Detection Ended.')
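The same scan can also be written with pathlib. The following is only a sketch of that alternative, not the code used in the project: the function name find_post_pages and the demo file names are made up for illustration, and it assumes that the pages to skip live under the top-level page and archives directories.

```python
import tempfile
from pathlib import Path


def find_post_pages(public_dir, ignored_top_dirs=("page", "archives")):
    """Return index.html paths (relative, POSIX style) that may be posts."""
    root = Path(public_dir)
    pages = []
    for path in sorted(root.rglob("index.html")):
        rel = path.relative_to(root)
        if rel == Path("index.html"):
            continue  # the site home page is not a post
        if rel.parts[0] in ignored_top_dirs:
            continue  # pagination and archive listings
        pages.append(rel.as_posix())
    return pages


# Small demo on a throwaway directory tree (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    for rel in ("index.html", "page/2/index.html", "archives/index.html",
                "2020/09/15/post/index.html", "2020/09/15/post/cover.png"):
        target = Path(tmp) / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("x", encoding="utf-8")
    found = find_post_pages(tmp)

print(found)
```

Comparing the first path component avoids the string-prefix checks and the manual backslash replacement, at the cost of only supporting ignored directories at the top level.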

Then use BS4 to manipulate the HTML and CSS:

import os
from bs4 import BeautifulSoup

# index_files, orig_dir and repo_dir come from the scan above;
# header is the HTML/CSS preamble extracted from the NexT theme (see the repo).

# Filter the articles that are to be synced

resource_dict = {}

for index_file in index_files:
    post_body = ""
    with open(index_file, 'r', encoding='utf-8') as f:
        html_doc = f.read()
    soup = BeautifulSoup(html_doc, "html.parser")
    # Only pages carrying the sync notice from the theme are synced
    check_msg = soup.select('span[id=cnblogs_sync_text]')
    if (len(check_msg) == 0): continue
    post_body_list = soup.select('div[class=post-body]')
    if (len(post_body_list) == 0): continue
    print('\033[42m\033[37m[LOGG]\033[0m Target Detected:', index_file)
    # Turn site-relative image and link URLs into absolute ones
    for child in soup.descendants:
        if (child.name == 'img'):
            if ('data-original' in child.attrs and 'src' in child.attrs):
                # Lazy-loaded images keep the real path in data-original
                child['src'] = 'https://gyrojeff.moe' + child['data-original']
            elif ('src' in child.attrs):
                child['src'] = 'https://gyrojeff.moe' + child['src']
        if (child.name == 'a'):
            if ('href' in child.attrs):
                if (str(child['href']).startswith('/') and not str(child['href']).startswith('//')):
                    child['href'] = 'https://gyrojeff.moe' + child['href']
    post_body = soup.select('div[class=post-body]')[0]
    # Rewrite MathJax <script> blocks as $$...$$ so cnblogs can render them
    math_blocks = post_body.select('script[type="math/tex; mode=display"]')
    for math_block in math_blocks:
        math_string = str(math_block).replace('<script type="math/tex; mode=display">', '<p>$$').replace('</script>', '\n$$</p>')
        math_block.replace_with(BeautifulSoup(math_string, 'html.parser'))
    save_dir = os.path.join(repo_dir, index_file[len(orig_dir):-len("index.html")])
    if not os.path.exists(save_dir): os.makedirs(save_dir)
    copyright_div = str(soup.select('div[class=my_post_copyright]')[0])
    with open(save_dir + 'index.html', 'w', encoding='utf-8') as f:
        f.write(header + '\n' + copyright_div + '\n' + str(post_body))
    tags = soup.select('div[class=post-tags]')
    tags_text = []
    if len(tags) != 0:
        tags_div = tags[0].select('a')
        if len(tags_div) > 0:
            for tag in tags_div:
                # contents[1] is the text after the tag icon; [1:] drops its leading character
                tags_text.append(tag.contents[1][1:])

    resource_dict[save_dir + 'index.html'] = {
        'tags': tags_text,
        'title': soup.select('meta[property="og:title"]')[0]['content']
    }
    print('\033[44m\033[37m[INFO]\033[0m File Generated:', save_dir + 'index.html')

print('\033[44m\033[37m[INFO]\033[0m File Generation Ended.')

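Most of the code above depends on the specifics of your own setup: you have to compare the HTML that Hexo generates with the output you actually need. Worth noting is the math_blocks part, which neatly works around the fact that MathJax formulas are only rendered into <script> tags. Each block is matched and rewritten as $$...$$, so cnblogs' Markdown renderer can render it directly, since Markdown accepts embedded HTML. The header used here is something I extracted from this NexT theme myself; the source is on my GitHub (link at the end of this post).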
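To make the math conversion easier to try in isolation, here is a minimal sketch that performs the same rewrite with a regular expression instead of BeautifulSoup. This is a substitute technique chosen only to keep the example dependency-free; the function name scripts_to_dollars and the sample HTML are illustrative and do not come from the project.

```python
import re

# Matches the display-math <script> tags that MathJax leaves in Hexo's output.
DISPLAY_MATH = re.compile(
    r'<script type="math/tex; mode=display">(.*?)</script>',
    re.DOTALL,
)


def scripts_to_dollars(html: str) -> str:
    """Rewrite each display-math script tag as a <p>$$ ... $$</p> paragraph."""
    # A function is used as the replacement so that backslashes in the LaTeX
    # source are copied literally instead of being read as regex escapes.
    return DISPLAY_MATH.sub(
        lambda m: "<p>$$" + m.group(1) + "\n$$</p>",
        html,
    )


sample = '<div><script type="math/tex; mode=display">a^2+b^2=c^2</script></div>'
converted = scripts_to_dollars(sample)
print(converted)
```

The output format mirrors the string replacement in the listing above: an opening <p>$$, the untouched LaTeX source, then a newline and the closing $$</p>.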

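Publishing posts via MetaWeblog

cnblogs supports the MetaWeblog interface, although categories are somewhat problematic and not really usable.

Original work: https://github.com/1024th/cnblogs_githook

Its delete call had a problem, however, which I have fixed here.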

import xmlrpc.client as xmlrpclib
import json
import datetime
import time
import getpass

'''
Configuration dictionary:
type | description (example)
str  | url, the MetaWeblog URL from the blog settings ('https://rpc.cnblogs.com/metaweblog/1024th')
str  | appkey, the blog address name ('1024th')
str  | blogid, not entered manually; obtained via getUsersBlogs
str  | usr, login user name
str  | passwd, login password
str  | rootpath, root path where posts are stored (managed with git)
'''

'''
POST:
dateTime dateCreated - Required when posting.
string description - Required when posting.
string title - Required when posting.
array of string categories (optional)
struct Enclosure enclosure (optional)
string link (optional)
string permalink (optional)
any postid (optional)
struct Source source (optional)
string userid (optional)
any mt_allow_comments (optional)
any mt_allow_pings (optional)
any mt_convert_breaks (optional)
string mt_text_more (optional)
string mt_excerpt (optional)
string mt_keywords (optional)
string wp_slug (optional)
'''

class MetaWebBlogClient():
    def __init__(self, configpath):
        '''
        @configpath: path of the configuration file
        '''
        self._configpath = configpath
        self._config = None
        self._server = None
        self._mwb = None

    def createConfig(self):
        '''
        Create the configuration interactively
        '''
        while True:
            cfg = {}
            for item in [("url", "MetaWebBlog URL: "),
                         ("appkey", "blog address name (the user part of the URL): "),
                         ("usr", "login user name: ")]:
                cfg[item[0]] = input("Enter " + item[1])
            cfg['passwd'] = getpass.getpass('Enter login password: ')
            try:
                server = xmlrpclib.ServerProxy(cfg["url"])
                userInfo = server.blogger.getUsersBlogs(cfg["appkey"], cfg["usr"], cfg["passwd"])
                print(userInfo[0])
                # {'blogid': 'xxx', 'url': 'xxx', 'blogName': 'xxx'}
                cfg["blogid"] = userInfo[0]["blogid"]
                break
            except Exception:
                # Wrong URL or credentials: ask again
                print("An error occurred!")
        with open(self._configpath, "w", encoding="utf-8") as f:
            json.dump(cfg, f, indent=4, ensure_ascii=False)

    def existConfig(self):
        '''
        Return whether a configuration exists
        '''
        try:
            with open(self._configpath, "r", encoding="utf-8") as f:
                try:
                    cfg = json.load(f)
                    if cfg == {}:
                        return False
                    else:
                        return True
                except json.decoder.JSONDecodeError:  # the file is empty
                    return False
        except Exception:
            # The file is missing or unreadable: create an empty config
            with open(self._configpath, "w", encoding="utf-8") as f:
                json.dump({}, f)
            return False

    def readConfig(self):
        '''
        Read the configuration
        '''
        if not self.existConfig():
            self.createConfig()

        with open(self._configpath, "r", encoding="utf-8") as f:
            self._config = json.load(f)
            self._server = xmlrpclib.ServerProxy(self._config["url"])
            self._mwb = self._server.metaWeblog

    def getUsersBlogs(self):
        '''
        Get blog information
        @return: {
            string blogid
            string url
            string blogName
        }
        '''
        userInfo = self._server.blogger.getUsersBlogs(self._config["appkey"], self._config["usr"], self._config["passwd"])
        return userInfo

    def getRecentPosts(self, num):
        '''
        Read information about the most recent posts
        '''
        return self._mwb.getRecentPosts(self._config["blogid"], self._config["usr"], self._config["passwd"], num)

    def newPost(self, post, publish):
        '''
        Publish a new post
        @post: content to publish
        @publish: whether to make it public
        '''
        # Retries forever, waiting 5 seconds after each failure.
        # Catching Exception (not a bare except) keeps Ctrl-C working.
        while True:
            try:
                postid = self._mwb.newPost(self._config['blogid'], self._config['usr'], self._config['passwd'], post, publish)
                break
            except Exception:
                time.sleep(5)
        return postid

    def editPost(self, postid, post, publish):
        '''
        Update an existing post
        @postid: ID of the existing post
        @post: content to publish
        @publish: whether to publish it publicly
        '''
        self._mwb.editPost(postid, self._config['usr'], self._config['passwd'], post, publish)

    def deletePost(self, postid, publish):
        '''
        Delete a post
        '''
        return self._server.blogger.deletePost(self._config['appkey'], postid, self._config['usr'], self._config['passwd'], publish)

    def getCategories(self):
        '''
        Get the post categories
        '''
        return self._mwb.getCategories(self._config['blogid'], self._config['usr'], self._config['passwd'])

    def getPost(self, postid):
        '''
        Read a post
        @postid: post ID
        @return: POST
        '''
        return self._mwb.getPost(postid, self._config['usr'], self._config['passwd'])

    def newMediaObject(self, file):
        '''
        Upload a resource file (image, audio, video, ...)
        @file: {
            base64 bits
            string name
            string type
        }
        @return: URL
        '''
        return self._mwb.newMediaObject(self._config['blogid'], self._config['usr'], self._config['passwd'], file)

    def newCategory(self, category):
        '''
        Create a new category
        @category: {
            string name
            string slug (optional)
            integer parent_id
            string description (optional)
        }
        @return: category ID
        '''
        return self._server.wp.newCategory(self._config['blogid'], self._config['usr'], self._config['passwd'], category)
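As a rough illustration of how the client might be fed, the sketch below assembles the post struct from the fields in the POST listing above. The helper name build_post is hypothetical. Sending tags as a comma-separated mt_keywords string is an assumption based on the usual MetaWeblog convention, not something this post confirms for cnblogs. categories is left out because it is described above as unreliable.

```python
import datetime


def build_post(title, html_body, tags):
    """Assemble a MetaWeblog post struct (illustrative helper, not from the repo)."""
    return {
        "title": title,
        "description": html_body,  # the post body goes in "description"
        # xmlrpc.client marshals datetime.datetime values as dateTime
        "dateCreated": datetime.datetime.now(),
        # Assumption: tags are sent as one comma-separated string
        "mt_keywords": ",".join(tags),
    }


post = build_post("Hello", "<p>Hi</p>", ["Hexo", "cnblogs"])

# With real credentials, the struct could then be sent like this (not run here):
# client = MetaWebBlogClient("config.json")  # hypothetical config path
# client.readConfig()
# postid = client.newPost(post, True)
```

Keeping the struct construction separate from the network call makes it possible to inspect what would be sent before touching the live blog.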

Version tracking with GitPython

Before generating the new HTML documents, delete the old ones (everything except the .git directory):

import os
import shutil

# repo_dir is the output directory defined earlier ("../public_cnblogs/")
if os.path.exists(repo_dir):
    for sub_dir in os.listdir(repo_dir):
        # Remove every subdirectory except the git metadata
        if os.path.isdir(os.path.join(repo_dir, sub_dir)) and sub_dir != '.git':
            shutil.rmtree(os.path.join(repo_dir, sub_dir))
        if os.path.isfile(os.path.join(repo_dir, sub_dir)):
            os.remove(os.path.join(repo_dir, sub_dir))

Below is a small wrapper class around GitPython:

import git
import os

class RepoScanner():

    def __init__(self, repopath):
        self._root = repopath
        try:
            self._repo = git.Repo(self._root)
        except:
            # TODO: color log
            print('\033[44m\033[37m[INFO]\033[0m Fail to open git repo at: %s' % (repopath))
            while (True):
                in_content = input('\033[44m\033[37m[INFO]\033[0m Try to create a new repo? [y/n]: ')
                if (in_content == 'y' or in_content == 'Y'):
                    break
                if (in_content == 'n' or in_content == 'N'):
                    # The user declined, so self._repo is never set
                    return
            try:
                self._repo = git.Repo.init(path=self._root)
            except Exception as e:
                raise e

    def getNewFiles(self):
        # Files git does not track yet, i.e. newly generated posts
        return self._repo.untracked_files

    def scan(self):
        # Unstaged differences between the index and the working tree
        diff = [ item.a_path for item in self._repo.index.diff(None) ]
        deleted = []
        changed = []
        for item in diff:
            if not os.path.exists(os.path.join(self._root, item)):
                deleted.append(item)
            else: changed.append(item)
        return {'new': self.getNewFiles(), 'deleted': deleted, 'changed': changed}

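A few details remain, such as storing the post IDs locally, but I will not go into them here. This post only outlines the general approach; see GitHub for the full code.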
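One way the scan result and a local ID store could fit together is sketched below. This is not the repository's implementation: plan_sync, load_post_ids, save_post_ids and the JSON file layout (generated file path mapped to cnblogs post ID) are all assumptions made for illustration.

```python
import json
from pathlib import Path


def load_post_ids(id_file):
    """Load the path -> postid mapping, or an empty dict if none exists yet."""
    path = Path(id_file)
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def save_post_ids(id_file, post_ids):
    """Persist the path -> postid mapping as JSON."""
    Path(id_file).write_text(
        json.dumps(post_ids, indent=4, ensure_ascii=False), encoding="utf-8"
    )


def plan_sync(scan_result, post_ids):
    """Turn a {'new', 'changed', 'deleted'} scan into (action, path, postid) tuples."""
    actions = []
    for path in scan_result["new"]:
        actions.append(("new", path, None))
    for path in scan_result["changed"]:
        postid = post_ids.get(path)
        # A changed file without a stored ID was never published: post it as new.
        actions.append(("edit" if postid else "new", path, postid))
    for path in scan_result["deleted"]:
        if path in post_ids:
            actions.append(("delete", path, post_ids[path]))
    return actions


# Hypothetical scan result and ID store
scan_result = {
    "new": ["a/index.html"],
    "changed": ["b/index.html"],
    "deleted": ["c/index.html"],
}
post_ids = {"b/index.html": "101", "c/index.html": "102"}
plan = plan_sync(scan_result, post_ids)
print(plan)
```

Each tuple would then map onto newPost, editPost or deletePost of the client above, with the ID store updated after every successful call.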

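Note

If Chinese file names come out garbled (by default Git prints non-ASCII path names as quoted octal escapes), remember to run the following command: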

git config --global core.quotepath false

GitHub

https://github.com/JeffersonQin/hexo-cnblogs-sync

posted @ 2020-09-15 09:13  gyro永不抽风