Local profiling

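For Python code that runs locally, profiling is as simple as debugging. The steps are:

On the toolbar, click "Profile 'your project'".
In the Run window, the navigation buttons include a floppy-disk icon that saves the profile information. Clicking it refreshes the data in the current profile stats window (or opens it in a new window) and prints the location of the saved pstat file in the Run window, for example:
Snapshot saved to /tmp/LogTransfer7.pstat
If you accidentally close the profile stats window, you can reopen a pstat file, such as the one just saved, via Tools -> Open CProfile snapshot.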
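
A saved snapshot can also be inspected outside the IDE. The following is a minimal sketch, assuming the .pstat file is in the standard cProfile format that Python's pstats module reads. The busy_work function and the example.pstat file name are made up for illustration; in practice you would pass the path printed in the Run window.

```python
import cProfile
import os
import pstats
import tempfile


def busy_work(n):
    """Hypothetical workload, used only to produce some profile data."""
    return sum(i * i for i in range(n))


# Write a snapshot in the binary format that cProfile produces.
snapshot_path = os.path.join(tempfile.mkdtemp(), "example.pstat")
profiler = cProfile.Profile()
profiler.enable()
busy_work(100000)
profiler.disable()
profiler.dump_stats(snapshot_path)

# Load the snapshot back and print the five entries with the
# largest cumulative time.
stats = pstats.Stats(snapshot_path)
stats.strip_dirs().sort_stats("cumulative").print_stats(5)
```

To read a real snapshot, replace snapshot_path with the saved file, for example /tmp/LogTransfer7.pstat.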

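Remote profiling

A common setup is to edit code on Windows and have it uploaded automatically to a Linux server. The steps are:

Open the menu "Tools" -> "Deployment" -> "Configuration".
In the Deployment dialog, click "+" to add an SFTP server, then click OK.
On the Connection tab, enter the SFTP connection details; on the Mappings tab, map the local code path to the code path on the remote server.
Click "Use as Default" in the upper-left corner to make this server the default.
Select "Tools" -> "Deployment" -> "Automatic Upload (always)" to upload automatically on save.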

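Once automatic upload is configured, set up the Python interpreter used for remote execution. The steps are:

Open "File" -> "Settings" and click "Project Interpreter" in the left tree. Next to the "Project Interpreter" field on the right there is a settings icon; click it and choose "Add Remote" from the drop-down menu.
Select "Deployment configuration" and pick the configuration you created for automatic upload above.
If the configuration succeeds, the Path mappings field shows the mapped paths, and the list below it shows the library versions found on the server.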

Error - module not found

If you see an error like this:

Traceback (most recent call last):
...
ImportError: No module named elasticsearch

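The environment variables of a PyCharm remote run may differ from those of a run made directly on the server. To confirm, print sys.path at the very top of the code:

import sys
print(sys.path)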

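This error can usually be fixed by adding PYTHONPATH. PyCharm does not load the PYTHONPATH environment variable defined on the server. PYTHONPATH is similar to Java's CLASSPATH or Linux's LD_LIBRARY_PATH: each specifies where dependencies are loaded from. See the official answer: https://intellij-support.jetbrains.com/hc/en-us/community/posts/206594065-PyCharm-remote-debugging-doesn-t-respect-PYTHONPATH-settings

To configure it:

Open the Run/Edit Configurations menu and expand Python/<your project name> in the left tree. On the Configuration tab, Python interpreter should be set to the python command on your remote server. Just above that option is the Environment variables setting; open it.
Add the PYTHONPATH environment variable from the remote server, along with its value.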
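
To find out which value to enter, it can help to compare what the interpreter sees in each environment. The sketch below is one way to do that; the helper name missing_pythonpath_entries is hypothetical. Run the script once in a shell on the server and once through PyCharm, then compare the output. You can also pass the server's PYTHONPATH string to the helper by hand.

```python
import os
import sys


def missing_pythonpath_entries(pythonpath, search_path):
    """Return the PYTHONPATH entries that are absent from search_path.

    pythonpath is the raw variable value (entries joined by os.pathsep);
    search_path is a list of directories such as sys.path.
    """
    entries = [p for p in pythonpath.split(os.pathsep) if p]
    known = set(os.path.normpath(p) for p in search_path)
    return [p for p in entries if os.path.normpath(p) not in known]


# Report what the current process sees.
current = os.environ.get("PYTHONPATH", "")
print("PYTHONPATH: %s" % (current or "<not set>"))
print("sys.path: %s" % sys.path)
print("missing: %s" % missing_pythonpath_entries(current, sys.path))
```

If the PyCharm run prints "<not set>" while the server shell prints a value, that value is what belongs in the Environment variables setting.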

Error - incorrect path mapping:

ssh://root@yourhost:22/usr/bin/python -u /root/.pycharm_helpers/pydev/pydevd.py --multiproc --qt-support --client '0.0.0.0' --port 33563 --file /tmp/pycharm_project_431/access.py
pydev debugger: process 16685 is connecting
 
Connected to pydev debugger (build 171.3780.115)
['/tmp/pycharm_project_431', '/root/.pycharm_helpers/pydev', '/home/fantom/party/spark/python', '/usr/lib64/python2.7/site-packages/gtk-2.0', '/usr/lib/python2.7/site-packages']
Traceback (most recent call last):
  File "/root/.pycharm_helpers/pydev/pydevd.py", line 1579, in <module>
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/root/.pycharm_helpers/pydev/pydevd.py", line 1016, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
IOError: [Errno 2] No such file or directory: '/tmp/pycharm_project_431/access.py'
 
Process finished with exit code 1

If you see output like the above, the path is probably not mapped correctly; see the "Path mappings" step earlier.
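
To reason about which remote file a given mapping should produce, the prefix substitution can be written out by hand. This is only an illustrative sketch: to_remote_path is a hypothetical helper, not part of PyCharm, and the local directory in the usage note is invented. It ignores details such as case-insensitive Windows paths.

```python
def to_remote_path(local_path, mappings):
    """Translate a local path to a remote one using prefix mappings.

    mappings is a list of (local_root, remote_root) pairs, mirroring the
    rows of a Mappings tab. Returns None if no mapping covers the file.
    """
    normalized = local_path.replace("\\", "/")
    for local_root, remote_root in mappings:
        root = local_root.replace("\\", "/").rstrip("/")
        if normalized == root or normalized.startswith(root + "/"):
            # Keep the part after the local root and graft it onto
            # the remote root.
            return remote_root.rstrip("/") + normalized[len(root):]
    return None
```

For example, with the assumed mapping ("C:\\work\\project", "/tmp/pycharm_project_431"), the file C:\work\project\access.py should end up at /tmp/pycharm_project_431/access.py. If the file is missing at that location on the server, either the mapping or the upload needs another look.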

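Analyzing the profile data

Compared with Perl's NYTProf, this profiler is coarse-grained: it only resolves down to the function level.

There are two views. One is a table of statistics:

The other shows the call graph:

From the two views we can extract the key information, namely the main factors affecting performance: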

parse_xml2json      143825      time 28.5%  largest total time
parse_frame(w)      102194  own time 20.2%  highest own time
fastkvd.kvdwrite    61      own time 14.4%  second-highest own time
OutInnerRule.search 30657   own time  6.1%  third-highest own time
ip_to_str           6786342     time  2.7%  called too many times
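
The same three rankings (total time, own time, call count) can be reproduced with Python's pstats module. The sketch below assumes that "time" corresponds to cumulative time and "own time" to pstats' internal time; hot_loop and leaf are made-up functions standing in for real code.

```python
import cProfile
import pstats


def leaf(x):
    """Hypothetical cheap function that gets called very often."""
    return x + 1


def hot_loop(n):
    """Hypothetical caller whose time is mostly spent in leaf()."""
    total = 0
    for i in range(n):
        total += leaf(i)
    return total


profiler = cProfile.Profile()
profiler.runcall(hot_loop, 50000)
stats = pstats.Stats(profiler).strip_dirs()

# Total time: the function plus everything it calls.
stats.sort_stats("cumulative").print_stats(3)
# Own time: time spent in the function body itself.
stats.sort_stats("time").print_stats(3)
# Call count: finds functions that are simply called too often.
stats.sort_stats("calls").print_stats(3)

# The raw numbers are available too: each value in stats.stats is
# (primitive calls, total calls, own time, cumulative time, callers).
call_counts = {func[2]: entry[1] for func, entry in stats.stats.items()}
print(call_counts["leaf"])
```

A function that ranks high only by call count, like leaf here, is a candidate for reducing calls; one that ranks high by own time is a candidate for rewriting.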

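When a function's own implementation is slow, the code needs to be analyzed; if it calls a third-party library, consider switching to a different implementation.
During the analysis I found that fastkvd.kvdwrite is an indivisible function implemented in C++. Looking into the C++ code, though, I found that it uses string to concatenate large amounts of JSON data. My experience with Java performance tuning is still fresh: growing the internal buffer during String concatenation is very expensive, and StringBuffer should be used instead. The same applies to the C++ STL, where string can preallocate its memory with reserve().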

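The profile also showed that json.dumps is fairly slow, so I replaced it with ujson, which claims to be faster. To install it:

Download it from https://pypi.python.org/pypi/ujson

Build the package in the build environment: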

python setup.py install --user

This produces:

~/.local/lib/python2.7/site-packages/easy-install.pth

~/.local/lib/python2.7/site-packages/ujson-1.35-py2.7-linux-x86_64.egg

Install into the runtime environment
Copy ujson-1.35-py2.7-linux-x86_64.egg to /home/fantom/share/Python-2.7/lib/site-packages in the runtime environment, and append the file name to easy-install.pth:

echo ./ujson-1.35-py2.7-linux-x86_64.egg >> /home/fantom/share/Python-2.7/lib/site-packages/easy-install.pth

Comparing the two shows that ujson is indeed faster (about 7 times faster when dumping 100,000 records):

2017-09-24 03:55:29.152 15347/MainThread I inserter.py:114 UJSON count 100000, cost 2.01 ms
2017-09-24 03:55:35.763 15347/MainThread I inserter.py:126 JSON count 100000, cost 14.33 ms
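
A comparison along these lines can be reproduced with timeit. This is a rough sketch, not the original test code from inserter.py: the record contents are invented, and ujson is treated as optional so the script still runs where it is not installed.

```python
import json
import timeit

try:
    import ujson  # optional third-party package; may not be installed
except ImportError:
    ujson = None

# Hypothetical record; the payload used in the original test is not shown.
record = {"id": 1, "ip": "10.0.0.1", "tags": ["a", "b"], "payload": {"k": "v" * 50}}
COUNT = 100000


def time_dumps(dumps, count=COUNT):
    """Return the seconds spent serializing record count times."""
    return timeit.timeit(lambda: dumps(record), number=count)


json_seconds = time_dumps(json.dumps)
print("JSON count %d, cost %.2f s" % (COUNT, json_seconds))

if ujson is not None:
    ujson_seconds = time_dumps(ujson.dumps)
    print("UJSON count %d, cost %.2f s" % (COUNT, ujson_seconds))
    print("speedup: %.1fx" % (json_seconds / ujson_seconds))
```

The ratio depends heavily on the shape of the data and on the library versions, so it is worth measuring with records that resemble the real workload.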

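For functions that are called too many times, consider whether the logic should be restructured.
ip_to_str itself takes little time and does not need restructuring. The OutInnerRule.search logic, however, could use a cache layer: when the input arguments are the same, there is no need to search again, which eliminates many calls at once. This does, of course, trade memory for time.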
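
The cache layer could look roughly like the sketch below. The real OutInnerRule.search is not shown in this article, so search here is a hypothetical stand-in with an invented rule; only the memoization pattern is the point. It assumes the arguments are hashable and that results for the same arguments never change.

```python
import functools


def memoize(func):
    """Cache results keyed by positional arguments (must be hashable)."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    wrapper.cache = cache  # exposed so callers can inspect or clear it
    return wrapper


search_calls = []  # records every real (uncached) invocation


@memoize
def search(ip):
    """Hypothetical stand-in for an expensive rule lookup."""
    search_calls.append(ip)
    return "inner" if ip.startswith("10.") else "outer"


results = [search(ip) for ip in ["10.0.0.1", "8.8.8.8", "10.0.0.1", "10.0.0.1"]]
print(results)
print(len(search_calls))
```

Four lookups trigger only two real searches here. The dictionary grows without bound, so for a large or unbounded input space a size-limited cache is the safer choice (on Python 3, functools.lru_cache provides one).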

Twotigers

[Repost] Profiling Python performance with PyCharm
