PHP: extract n characters starting at a given position from a GBK-encoded file

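cut.php reads t.txt from the beginning, one byte at a time. A byte in the range 0x00 to 0x7F is a one-byte (ASCII) character. Any other byte is treated as the first byte of a two-byte GBK character. Characters before index pos (counted from 0) are skipped, and the next len characters are written to a.txt.

cut.php: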

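#!/usr/bin/php
<?php
// Usage: ./cut.php <pos> <len>
// Copies <len> characters of the GBK file t.txt, starting at
// character index <pos> (0-based), into a.txt.
define('INPUT_FILE', 't.txt');
define('OUTPUT_FILE', 'a.txt');

if ($argc < 3) {
    fwrite(STDERR, "Usage: {$argv[0]} <pos> <len>\n");
    exit(1);
}
$pos = max(intval($argv[1]), 0);
$len = max(intval($argv[2]), 0);

$file_size = filesize(INPUT_FILE);
// $pos counts characters, and a file never has more characters than bytes,
// so a start position at or past the byte count is out of range.
if ($pos >= $file_size) exit;

$fp = fopen(INPUT_FILE, 'rb');
if ($fp === false) exit(1);

$point = 0;   // index of the current character (not byte)
$string = '';
while (ftell($fp) < $file_size) {
    if ($point >= $pos + $len) break;
    $byte = fread($fp, 1);
    if (ord($byte) <= 0x7f) {
        // single-byte character (ASCII)
        if ($point >= $pos) $string .= $byte;
    } else {
        // lead byte of a two-byte GBK character:
        // keep it with its trail byte, or skip the trail byte
        if ($point >= $pos) {
            $string .= $byte . fread($fp, 1);
        } else {
            fseek($fp, 1, SEEK_CUR);
        }
    }
    $point += 1;
}
fclose($fp);
file_put_contents(OUTPUT_FILE, $string);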
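For testing, the same scan can be pulled out into a function that works on an in-memory string instead of t.txt and a.txt. This is a minimal sketch, not the author's code: gbk_cut is a hypothetical name, and it keeps the script's simplification that any byte above 0x7F starts a two-byte character (real GBK lead bytes are 0x81 to 0xFE). The sample string assumes 0xCD 0xF5 as the GBK encoding of 王.

```php
<?php
// Hypothetical helper: the cut.php scan applied to a string in memory.
// Simplification (same as cut.php): bytes 0x00-0x7F are one-byte
// characters, any other byte is the lead byte of a two-byte character.
function gbk_cut($bytes, $pos, $len)
{
    $n = strlen($bytes);
    $i = 0;        // current byte offset
    $point = 0;    // current character index
    $out = '';
    while ($i < $n && $point < $pos + $len) {
        $width = (ord($bytes[$i]) <= 0x7f) ? 1 : 2;
        if ($point >= $pos) {
            $out .= substr($bytes, $i, $width);
        }
        $i += $width;
        $point++;
    }
    return $out;
}

$sample = "dkei20\xCD\xF5nnso";            // "dkei20王nnso" in GBK (assumed bytes)
echo bin2hex(gbk_cut($sample, 6, 1)), "\n"; // cdf5
echo gbk_cut($sample, 7, 4), "\n";          // nnso
```

Scanning always has to start at byte 0: GBK trail bytes can fall in the ASCII range (0x40 to 0x7E), so a single byte taken out of context does not tell you whether it begins a character.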

Contents of the source file t.txt (the file must be saved in GBK encoding):

dkei20王nnso
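Because the script counts bytes under GBK rules, t.txt really has to be GBK-encoded. Saved as UTF-8, 王 takes three bytes and the character count goes wrong. One way to create the test file regardless of the editor's encoding is to write the bytes directly. A sketch, assuming 0xCD 0xF5 as the GBK encoding of 王:

```php
<?php
// Hypothetical setup script: write t.txt as raw GBK bytes.
// "dkei20" and "nnso" are ASCII (one byte each); 王 is assumed to be CD F5.
$data = "dkei20" . "\xCD\xF5" . "nnso";
file_put_contents('t.txt', $data);
echo strlen($data), " bytes\n"; // 12 bytes: 10 ASCII + 2 for 王
```

If the iconv extension is available and the PHP source is saved as UTF-8, iconv('UTF-8', 'GBK', 'dkei20王nnso') should produce the same bytes.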

Test command (take 1 character starting at character index 6, counting from 0):

./cut.php 6 1
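pos and len count characters, not bytes. The sketch below lists where each character starts, using the same one-or-two-byte rule as cut.php (gbk_offsets is a hypothetical helper, and 王 is again assumed to be the bytes 0xCD 0xF5). It shows why index 6 selects 王 and why every character after it starts one byte later than its index.

```php
<?php
// Hypothetical helper: map character index => starting byte offset,
// treating bytes <= 0x7F as one-byte characters and others as two-byte.
function gbk_offsets($bytes)
{
    $map = array();
    $n = strlen($bytes);
    $i = 0;
    while ($i < $n) {
        $map[] = $i;
        $i += (ord($bytes[$i]) <= 0x7f) ? 1 : 2;
    }
    return $map;
}

foreach (gbk_offsets("dkei20\xCD\xF5nnso") as $char => $byte) {
    printf("char %2d -> byte %2d\n", $char, $byte);
}
```

For the sample, characters 0 to 6 start at bytes 0 to 6, character 7 (the first n) starts at byte 8, and the 12-byte file holds 11 characters.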

Check the result (a.txt should contain only the two GBK bytes of 王):

hexdump -C t.txt && hexdump -C a.txt

posted @ 2012-11-29 17:56 呆头鱼