Daily Operations Notes

I. Copying from CSDN

javascript:document.body.contentEditable='true';document.designMode='on';  // makes the page editable; then copy with Ctrl+X

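II. npm dependency issues

1. Installing npm dependencies
``` sh
yum install npm  # some applications require a specific Node.js version; confirm it first
npm config set registry=https://registry.npm.taobao.org  # note: this domain has been retired in favor of https://registry.npmmirror.com
npm i -g cnpm; cnpm -v; npm install -g cgr  # optional
cnpm i  # or: npm i. Installs the dependencies; plain npm is normally enough, and node_modules is usable once it finishes
```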

Common issues
1) spawn git ENOENT  # git is not installed
2) near heap limit Allocation failed - JavaScript heap out of memory
export NODE_OPTIONS="--max-old-space-size=4096"  # sets the V8 heap limit in MB; size it to the host's memory
3) Cannot find module "node:path"  # upgrade Node.js/npm; the node: import prefix needs a newer Node.js

4) You installed esbuild for another platform than the one you're currently using.
This won't work because esbuild is written with native code and needs to
install a platform-specific binary executable.
The dependencies of the fishx framework are OS-specific, so a node_modules directory built on Windows cannot be reused. Check out the code in a Linux VM, build it there, then copy the dependencies out.

5) cnpm certificate has expired
npm config set strict-ssl false

6) '@ccmany/theme@^0.7.24' is not in the npm registry.  # private dependency
npm config set registry https://registry-cnpm.dayu.work/
npm config set @ccmany:registry=http://127.0.0.1:8081/repository/npm-public/

III. Maven dependencies

Note: delete the "_remote.repositories" and "*.lastUpdated" files first, to rule out problems with the local repository itself.
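
The cleanup mentioned above can be sketched as a small helper. This is only an illustration: the function name clean_maven_markers is hypothetical, and the repository path is whatever your local repository is (commonly ~/.m2/repository, which is an assumption here).

```shell
# Delete Maven bookkeeping files that can block re-resolution of artifacts.
# $1: local repository directory (assumption: usually ~/.m2/repository).
clean_maven_markers() {
    repo_dir="$1"
    [ -d "$repo_dir" ] || { echo "not a directory: $repo_dir" >&2; return 1; }
    # Only the marker files are removed; jars and poms are left untouched.
    find "$repo_dir" -type f \( -name '_remote.repositories' -o -name '*.lastUpdated' \) -delete
}
```

Usage: clean_maven_markers ~/.m2/repository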

Example: a.txt contains the following (the coordinates can be taken from the missing dependencies that Maven reports)
com.alibaba.cloud:spring-cloud-starter-alibaba-nacos-discovery:jar:1.5.0.RELEASE
com.alibaba.cloud:spring-cloud-starter-alibaba-nacos-config:jar:1.5.0.RELEASE
com.alibaba.cloud:spring-cloud-starter-alibaba-nacos-discovery:jar:1.5.0.RELEASE


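Problem: downloading http://0.0.0.0/...
Even with -s specified, Maven still reads conf/settings.xml in its installation directory; comment out the offending entry there.

Generate the install commands
awk -F':' '{print "mvn install:install-file -Dfile="$2"-"$4"."$3" -DgroupId="$1" -DartifactId="$2" -Dversion="$4" -Dpackaging="$3}' a.txt
Note: the packaging (field 3) is not always jar; some artifacts are pom, so the command takes it from the coordinate.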

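Copying dependencies with mvn:
mvn dependency:copy-dependencies -DoutputDirectory=./lib -DincludeScope=runtime
outputDirectory: where the dependencies are stored (absolute or relative path).

includeScope: optional; restricts the dependency scope (runtime covers compile and runtime dependencies).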

IV. Python dependencies

./bin/pip3 install elasticsearch sentence_transformers  einops gunicorn flask  # install the packages
./bin/pip3 freeze > requirements.txt  # write the dependency list
./bin/pip3 download -r requirements.txt -d tmp  # download the packages into tmp
./bin/pip3 install --no-index --find-links=/home/es/tmp -r /home/es/requirements.txt  # install offline on the new host


Error "ModuleNotFoundError: No module named '_ssl'": Python was built without SSL support; re-run configure with the options below and rebuild
./configure --with-ssl-default-suites=openssl --enable-optimizations


V. blackbox_exporter

1. Install blackbox_exporter
https://github.com/prometheus/blackbox_exporter/releases/download/v0.20.0-rc.0/blackbox_exporter-0.20.0-rc.0.linux-amd64.tar.gz

2. Create the target definition file
[root@master1 prometheus]# cat file_sd/file.yml
- targets:
  - "127.0.0.1:3306"
  labels:
    node: "127.0.0.1:3306"
- targets:
  - "127.0.0.1:9090"
  labels:
    node: "127.0.0.1:9090"


3. Edit the Prometheus configuration
  - job_name: 'blackbox_tcp_connect' # check whether the given ports are reachable
    scrape_interval: 10s
    metrics_path: /probe
    params:
      module: [tcp_connect]
    file_sd_configs:
    - files:
      - ./file_sd/*.yml
      refresh_interval: 1m
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.74.128:9115 # host and port where blackbox_exporter runs
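
After the relabeling above, Prometheus no longer scrapes the target itself; it asks the exporter to probe it. A minimal sketch of the resulting URL, useful for checking a target by hand. The helper name probe_url is hypothetical; the exporter address, module, and target are the ones used in this configuration.

```shell
# Build the URL that is requested for one target once relabeling has been applied:
# __address__ becomes the exporter, the original address is passed as ?target=.
probe_url() {
    exporter="$1"; module="$2"; target="$3"
    printf 'http://%s/probe?module=%s&target=%s\n' "$exporter" "$module" "$target"
}

# Manual check (requires a running blackbox_exporter), shown commented out:
# curl -s "$(probe_url 192.168.74.128:9115 tcp_connect 127.0.0.1:3306)" | grep '^probe_success'
```

A probe_success value of 1 in the response means the TCP connection succeeded; 0 is what the alert rule in the next step fires on.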

4. Create the Prometheus alert rule
  - alert: BlackboxProbeFailed
    expr: probe_success == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: Blackbox probe failed (instance {{ $labels.instance }})
      description: "Blackbox detected a problem with Tomcat port 8080 on the host \n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

5. Create the Grafana dashboards
https://grafana.com/grafana/dashboards/9965
https://grafana.com/grafana/dashboards/11543

6. Test the alert channel
curl 'https://oapi.dingtalk.com/robot/send?access_token=xxx' -H 'Content-Type: application/json' -d '{"msgtype": "text","text": {"content":"I am who I am, a firework unlike the rest"}}'

VI. ffmpeg

ffmpeg -i out.mp4 -vf delogo=x=40:y=10:w=360:h=120:show=0,delogo=x=1525:y=10:w=360:h=120:show=0 -c:v libx264 -c:a copy correst.mp4

ffplay -i out.mp4 -ss 57 -vf delogo=x=40:y=10:w=360:h=120:show=1,delogo=x=1525:y=10:w=360:h=120:show=1

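-ss: start time
-t: duration
x: horizontal offset of the box, measured from the top-left corner (0)
y: vertical offset of the box, measured from the top-left corner (0)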
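
The -vf argument in the two commands above is just two delogo expressions joined by a comma. A small sketch that builds it from the coordinates; the helper name delogo_filter is hypothetical. With show=1, ffmpeg draws a green rectangle over the region, which is why the ffplay preview uses it.

```shell
# Build one delogo expression from x, y, width, height and an optional show flag (default 0).
delogo_filter() {
    printf 'delogo=x=%s:y=%s:w=%s:h=%s:show=%s\n' "$1" "$2" "$3" "$4" "${5:-0}"
}

# Two logo regions, chained with a comma as in the commands above.
vf="$(delogo_filter 40 10 360 120),$(delogo_filter 1525 10 360 120)"
echo "$vf"
```

The result can be passed directly, for example: ffmpeg -i out.mp4 -vf "$vf" -c:v libx264 -c:a copy correst.mp4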

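Rename videos by matching their duration:


#!/bin/bash
# Usage: pass a list file whose tokens are section titles (P...), video names, and MM:SS durations.
file=$1

# Parse the list file into ${file}.result, one "name#MM:SS" entry per line.
generaresult(){
: > "${file}.result"
sed -i 's/[[:space:]]//g' "$file"   # strip whitespace inside each line (edits the file in place)
for i in $(cat "$file"); do
	count=$(echo "$i" | wc -c)   # byte length plus the trailing newline
	if [[ $i == P* ]]; then   # tokens starting with P are section titles
		title=$i
	fi

	if [ "$count" -ge 10 ]; then   # long tokens are video names
		name=$i
	fi

	if [ "$count" -ge 4 ] && [ "$count" -le 6 ] && [[ $i != P* ]]; then   # short tokens are durations
		time=$i
		echo "====>$title--->$name-->$time"
		echo "$name#$time" >> "${file}.result"   # write the result file
	fi
done

}
echo "============rename===="
# Rename every .mp4 whose duration is within 1 second of a listed duration.
# (Named rename_videos so it does not shadow the system rename command.)
rename_videos() {
for m in *.mp4; do
	echo "==>$m"
	vtime=$(ffprobe -v error -select_streams v:0 -show_entries stream=duration -of default=noprint_wrappers=1:nokey=1 "$m" | cut -d'.' -f1)
	for n in $(cat "${file}.result"); do
		fname=${n%%#*}
		ftime=${n#*#}
		ftime1=$(echo "$ftime" | cut -d':' -f1 | sed 's/^0//')   # drop a leading zero to avoid octal parsing
		ftime2=$(echo "$ftime" | cut -d':' -f2 | sed 's/^0//')
		ftime3=$(( ${ftime1:-0} * 60 + ${ftime2:-0} ))
		diff=$(( vtime - ftime3 ))
		if [ "${diff#-}" -le 1 ]; then
			mv "$m" "${fname}.mp4"
			break   # stop at the first match; $m has been moved
		fi
	done
done
}

#generaresult   # run this once first to build ${file}.result
rename_videos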
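
The matching rule in the script is easier to check in isolation. Below is a minimal sketch of its two pieces, conversion of MM:SS to seconds and the 1-second tolerance; the function names mmss_to_seconds and durations_match are hypothetical and not part of the script above.

```shell
# Convert "MM:SS" to seconds. One leading zero is stripped from each field so that
# values such as 08 or 09 are not rejected as invalid octal numbers.
mmss_to_seconds() {
    minutes="${1%%:*}"; seconds="${1##*:}"
    minutes="${minutes#0}"; seconds="${seconds#0}"
    echo $(( ${minutes:-0} * 60 + ${seconds:-0} ))
}

# Succeeds when two durations in seconds differ by at most 1 second.
durations_match() {
    diff=$(( $1 - $2 ))
    [ "${diff#-}" -le 1 ]
}
```

For example, durations_match "$(mmss_to_seconds 08:09)" 490 succeeds, because 08:09 is 489 seconds.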


VII. OS issues

Commands fail because of pcre: the error reports that libpcre.so.1 is missing
1. The initramfs contains a copy of libpcre.so.1 that can be copied back and used directly
2. /usr/lib/dracut/skipcpio /boot/initramfs-3.10.0-1160.el7.x86_64.img |zcat | cpio -div  # unpack the initramfs into the current directory
3. dd if=usr/lib64/libpcre.so.1 of=/usr/lib64/libpcre.so.1
dd if=usr/lib64/libz.so.1 of=/usr/lib64/libz.so.1
dd if=usr/lib64/libz.so.1.2.7 of=/usr/lib64/libz.so.1.2.7
yum reinstall zlib pcre 

Problem caused by installing zlib:
[root@localhost tmp]# rpm -qa |grep zlib
rpm: error while loading shared libraries: libz.so.1: cannot open shared object file: No such file or directory
cp -arp  usr/lib64/libz.so.1*  /usr/lib64/

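After growing a disk online, df -h still shows the old size although the block device already has the new size; grow the filesystem:
resize2fs /dev/vdb  # ext2/3/4 only; XFS filesystems are grown with xfs_growfs instead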

cron jobs do not run
Nov 14 11:10:01 ops2 crond[1203048]: (root) PAM ERROR (Module is unknown)
Nov 14 11:10:01 ops2 crond[1203048]: (root) FAILED to authorize user with PAM (Module is unknown)


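Add root to /etc/cron.allow
Edit /etc/pam.d/cron
session    sufficient  pam_loginuid.so

Logs: /var/log/secure, /var/log/cron 
"Module is unknown" means the PAM configuration references a module that is not installed; replace pam_tally.so with pam_faillock.so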

#!/bin/bash
for ip in 10.40.168.224 10.40.168.226 10.40.168.85;
do echo "====>$ip"
ssh root@$ip "date"
done 

VIII. Recording terminal sessions

echo '/usr/local/script.sh' >> /etc/profile 

# The quoted delimiter 'E' keeps $(...) from being expanded while the file is written
cat >> /usr/local/script.sh  << 'E'
#!/bin/bash
LOGPATH="/usr/local/script/"
TRM=$(tty | sed 's/\///g') #user terminal
U=$(whoami) #user 
T=$(date +%Y%m%d_%H%M%S) #time 
FILE="${U}_${TRM}_${T}"
script -t"${LOGPATH}${FILE}.time" -f "${LOGPATH}${FILE}.log"
E

mkdir -p /usr/local/script
chmod 777 -R /usr/local/script
chmod +x /usr/local/script.sh 

Replay a session (pauses made by the user are reproduced during replay):
scriptreplay $TIMING_FILE $LOG_FILE
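
To find the right pair of files to replay, it helps to know how the recorder names them. A minimal sketch of the naming rule, user, then the tty path with slashes removed, then the timestamp; the function name session_log_name is hypothetical.

```shell
# Compose the base name used for a session: <user>_<tty without slashes>_<timestamp>.
# The recorder then appends .time (timing data) and .log (terminal output).
session_log_name() {
    user="$1"; tty_path="$2"; stamp="$3"
    tty_id=$(printf '%s' "$tty_path" | sed 's/\///g')
    printf '%s_%s_%s\n' "$user" "$tty_id" "$stamp"
}

session_log_name root /dev/pts/0 20250422_100400
```

So a session recorded by root on /dev/pts/0 would be replayed from the matching .time and .log files under /usr/local/script/.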

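IX. Java

Checking which HSF interfaces a Java application provides
1. Unpack the war and go into its lib directory
2. Ask the developers which jar provides the feature, unpack it, and check the package path; if the corresponding class file is there, the interface exists
3. Search the logs for matching keywords
4. Check the service start time, i.e. whether the entry appears before or after startup


https://blog.51cto.com/hmtk520/2067043 # Java memory issues

List classes: jar -tf  # lists the class files in an archive


Replacing a single file inside a war: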
[root@localhost tmp]# jar tvf a.war |grep wdk.properties 
  9044 Thu Mar 27 15:20:18 CST 2025 WEB-INF/classes/wdk.properties
[root@localhost tmp]# jar xvf a.war WEB-INF/classes/wdk.properties 
 inflated: WEB-INF/classes/wdk.properties
[root@localhost tmp]# ls -l
total 318048
-rw-r--r-- 1 root root 325676004 Mar 27 16:09 jsdptest-eap.war
drwxr-xr-x 3 root root      4096 Mar 27 16:11 WEB-INF
[root@localhost tmp]# vi WEB-INF/classes/wdk.properties # edit the file
[root@localhost tmp]# jar -uvf a.war WEB-INF/classes/wdk.properties 


X. Oracle issues

Oracle starts its services automatically.
To change passwords later, log in with connect sys/PASSWORD@127.0.0.1/pdborcl as sysdba; if no database is specified, you are logged in to Oracle's default database.

sqlplus / as sysdba  # login method 1
connect sys/PASSWORD@127.0.0.1/pdborcl as sysdba

sqlplus USER/PASSWD@localhost:port/SCHEMA

ORA-01033:
	sqlplus / as sysdba
	startup mount  # reports that the connection is not closed
	shutdown immediate  # close it
	startup mount  # now starts normally
	alter database open;  # the semicolon is required; on ORA-01589 the database must be opened with the RESETLOGS or NORESETLOGS option
	startup  # start
	
	On "invalid username/password; logon denied", reset the password:
	alter user master_oper identified by master_pwd;
	
	Reference: https://blog.csdn.net/beihuanlihe130/article/details/108149030
	
	ORA-28000: the account is locked
	alter user MASTER_OPER account unlock;

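XI. Restarting DB2

Stop approach: 1) close the connections; 2) stop the instance (this also stops the databases)
Start approach: 1) start the instance; 2) activate the database (usually activated automatically with the instance)

Note: switch to the db2inst1 user before doing anything

1. Close the connections
db2 list applications  # list active connections; a restart drops them, so confirm before continuing
db2 force application all  # force all connections off
db2 list applications  # confirm that no connections remain

2. Restart the instance
db2stop  # stop the instance
db2start  # start the instance

3. Activate the database
db2 list active databases  # if the target database is not listed, activate it; if it is already active, nothing more is needed
#db2 activate database $DATABASE  # activate the database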
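
The check in step 3 can be scripted. The sketch below is only an illustration: the function name db_is_active is hypothetical, and it assumes the output of db2 list active databases contains lines of the form "Database name = NAME"; verify that against your DB2 version before relying on it.

```shell
# Succeeds when the database named in $1 appears in the text read from stdin.
# Assumption: the listing has lines like "Database name      = SAMPLE".
db_is_active() {
    awk -F'=' -v want="$1" '
        /^[[:space:]]*Database name/ {
            gsub(/[[:space:]]/, "", $2)          # trim the value after "="
            if (toupper($2) == toupper(want)) found = 1
        }
        END { exit (found ? 0 : 1) }'
}
```

Usage: db2 list active databases | db_is_active SAMPLE || db2 activate database SAMPLE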

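XII. Kafka log cleanup

# Make sure Kafka is stopped. If the disk is completely full, manually delete some logs first, otherwise Kafka will not start
ps -ef |grep kafka

# Edit the configuration
cd /data/kafka/kafka_2.12-2.3.0/
vim config/server.properties  # add the three options below: retention time, log cleaner switch, and how long delete markers are retained
log.retention.hours=1
log.cleaner.enable=true
log.cleaner.delete.retention.ms=500

# Start Kafka; if startup fails with a ZooKeeper connection error, start ZooKeeper first
./bin/kafka-server-start.sh -daemon ./config/server.properties
tail -f logs/kafkaServer.out
# Check that segments are being marked for deletion
find / -name '*.deleted'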

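XIII. Default SLB type on Alibaba Cloud

By default, Kubernetes creates the SLB instance as an Internet-facing (public) one.

https://help.aliyun.com/zh/ack/ack-managed-and-ack-dedicated/user-guide/configure-an-ingress-controller-to-use-an-internal-facing-slb-instance?spm=a2c4g.11186623.help-menu-85222.d_2_3_4_1_5.6b212529HdP1Sc&scm=20140722.H_151506._.OR_help-T_cn~zh-V_1#e2ac121dab926

To get an intranet (private) instance instead, set the address-type annotation on the Service: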


apiVersion: v1
kind: Service
metadata:
  name: nginx-ingress-lb-intranet
  namespace: kube-system
  labels:
    app: nginx-ingress-lb
  annotations:
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-address-type: intranet # makes the load balancer address type private (intranet)
spec:
  type: LoadBalancer
  externalTrafficPolicy: "Cluster"
  ports:
  - port: 80
    name: http
    targetPort: 80
  - port: 443
    name: https
    targetPort: 443
  selector:
    app: ingress-nginx

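XIV. Frequently used database SQL

1. Dump statement
mysqldump --all-databases --single-transaction --master-data=2 --triggers --routines --host=127.0.0.1 --port=3306 --user=root  > dbpaas-metadb.sql 

2. Disk space
1) High disk usage by data files.
	Check the general_log parameter
	SELECT table_schema AS 'database', table_name, SUM(data_length + index_length + data_free)/1024/1024 AS 'table_size_mb', SUM(data_free)/1024/1024 AS 'fragment_size_mb'
FROM information_schema.TABLES
WHERE table_name='general_log'
GROUP BY table_schema, table_name;
	
	TRUNCATE TABLE mysql.general_log;
	

	SELECT file_name, concat(TOTAL_EXTENTS,'M') as 'file_size' FROM INFORMATION_SCHEMA.FILES order by TOTAL_EXTENTS DESC;
	
2) High disk usage by log files. Without a proper log backup policy, large transactions can make the logs grow quickly.
	
3) High disk usage by temporary files.
	This is usually caused by temporary tables created for sorting, grouping, or joins in queries, or by the log cache of large transactions that have not been committed yet.
	show processlist;
	explain select * from alarm group by created_on order by default;
	
4) High disk usage by system files.
	Oversized system files are mostly due to large undo files. When a long-running query on InnoDB tables stays open while the tables change heavily, a large amount of undo information is generated and takes up a lot of storage.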


3. SQL size statistics
https://cloud.tencent.com/developer/article/1776619
# size of each database
select table_schema,
concat(round(sum(data_length/1024/1024/1024),2),'GB') as data_length,
concat(round(sum(index_length/1024/1024/1024),2),'GB') as index_length,
concat(round(sum(data_free/1024/1024/1024),2),'GB') as data_free
from information_schema.tables GROUP by table_schema ;

# size of a single table
select table_schema, table_name,
concat(round(sum(data_length/1024/1024/1024),2),'GB') as data_length,
concat(round(sum(index_length/1024/1024/1024),2),'GB') as index_length,
concat(round(sum(data_free/1024/1024/1024),2),'GB') as data_free
from information_schema.tables where table_schema='DB_Name' and table_name='Table_Name'
group by table_schema, table_name;

# tables of one database sorted by size (ordered by data_free)
select
table_schema as 'database',
table_name as 'table',
table_rows as 'row_count',
truncate(data_length/1024/1024, 2) as 'data_mb',
truncate(index_length/1024/1024, 2) as 'index_mb',
truncate(data_free/1024/1024, 2) as 'fragment_mb'
from information_schema.tables where table_schema='DB_NAME' 
order by data_free desc, data_length desc limit 10;




posted @ 2025-04-22 10:04  MT_IT