Reading Word Documents with PHP on Linux

Author: 淡蓝海域 Categories: linux, php Published: 2020-11-05 03:15

#wget http://www.winfield.demon.nl/linux/antiword-0.37.tar.gz
#tar zxvf antiword-0.37.tar.gz
#cd antiword-0.37
#make
#make install

antiword
cp /root/bin/*antiword /usr/local/bin/
mkdir /usr/share/antiword
cp -R /root/.antiword/* /usr/share/antiword/
chmod 755 /usr/local/bin/*antiword
chmod 755 /usr/share/antiword/*

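Once the build is installed, if you want to use antiword from a web page you need to run make global_install as root, so that the program and its mapping files are installed system-wide rather than only under root's home directory.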
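Before calling antiword from a web page, it can help to confirm that the PHP process can actually reach the binary and the UTF-8 mapping file. The following is a minimal sketch; the function name antiword_install_problems is hypothetical, and the default paths are simply the ones used in this article.

```php
<?php
// Hypothetical helper for illustration; the default paths are the ones used in this article.
function antiword_install_problems(
    string $binary = '/usr/local/bin/antiword',
    string $mappingDir = '/usr/share/antiword'
): array {
    $problems = [];
    // The web server user must be able to execute the binary.
    if (!is_executable($binary)) {
        $problems[] = "not executable: $binary";
    }
    // The UTF-8 mapping file is needed for the "-m UTF-8.txt" option.
    if (!is_readable($mappingDir . '/UTF-8.txt')) {
        $problems[] = "mapping file not readable: $mappingDir/UTF-8.txt";
    }
    return $problems;
}

// Print one line per detected problem; prints nothing if both checks pass.
foreach (antiword_install_problems() as $problem) {
    echo $problem, "\n";
}
```

Run it through the web server (not only from the root shell), since the two environments may have different permissions.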

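<?php
header("Content-type: text/html; charset=utf-8");

// Path to the Word (.doc) file to convert
$filename = 'test.doc';
// $content = shell_exec('/usr/local/bin/antiword ' . escapeshellarg($filename));
// -m UTF-8.txt selects the UTF-8 character mapping; escapeshellarg() quotes the file name for the shell
$content = shell_exec('antiword -m UTF-8.txt ' . escapeshellarg($filename));

echo '<pre>';
// Escape the extracted text so that it is displayed literally in the page
echo htmlspecialchars((string) $content, ENT_QUOTES, 'UTF-8');
echo '</pre>';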
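If the file name comes from user input, or you want to know when the conversion fails, the call can be wrapped in a small helper. This is only a sketch under stated assumptions: the function names build_antiword_command and read_word_text are hypothetical, and it assumes antiword exits with a non-zero status on failure.

```php
<?php
// Hypothetical helpers for illustration; the names are not from the original article.

/** Build the antiword command line with the document path quoted for the shell. */
function build_antiword_command(string $docPath, string $binary = 'antiword'): string
{
    return $binary . ' -m UTF-8.txt ' . escapeshellarg($docPath);
}

/** Return the extracted text, or null if the file is missing or the command fails. */
function read_word_text(string $docPath, string $binary = 'antiword'): ?string
{
    if (!is_file($docPath)) {
        return null;
    }
    $lines = [];
    $status = 0;
    // exec() collects the output lines and the exit status; error messages are discarded.
    exec(build_antiword_command($docPath, $binary) . ' 2>/dev/null', $lines, $status);
    return $status === 0 ? implode("\n", $lines) : null;
}

// Show the command that would be run for a file name containing a space.
echo build_antiword_command('my report.doc'), "\n";
```

One caveat worth checking on your own system: escapeshellarg() may drop non-ASCII bytes (for example Chinese file names) when the LC_CTYPE locale is not a UTF-8 locale, so calling setlocale(LC_CTYPE, ...) with a UTF-8 locale beforehand may be necessary.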

Testing from the shell

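  /usr/local/bin/antiword your-document.doc
  # If Chinese text comes out garbled, specify the character mapping as well
  /usr/local/bin/antiword -w 0 -m UTF-8.txt your-document.doc
  # Note: if the Word document contains too little text, antiword reports: I'm afraid the text stream of this file is too small to handle.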

Using it from PHP

  $filename = 'your-document.doc';  // path to your Word document
  // -w 0 puts each paragraph on a single line; -m UTF-8.txt selects the UTF-8 character mapping
  $content = shell_exec('/usr/local/bin/antiword -w 0 -m UTF-8.txt ' . escapeshellarg($filename));
  // Convert every full-width character that may occur in the string to its half-width equivalent
  // Full-width characters
  $DBC = array(
      '０' , '１' , '２' , '３' , '４' ,
      '５' , '６' , '７' , '８' , '９' ,
      'Ａ' , 'Ｂ' , 'Ｃ' , 'Ｄ' , 'Ｅ' ,
      'Ｆ' , 'Ｇ' , 'Ｈ' , 'Ｉ' , 'Ｊ' ,
      'Ｋ' , 'Ｌ' , 'Ｍ' , 'Ｎ' , 'Ｏ' ,
      'Ｐ' , 'Ｑ' , 'Ｒ' , 'Ｓ' , 'Ｔ' ,
      'Ｕ' , 'Ｖ' , 'Ｗ' , 'Ｘ' , 'Ｙ' ,
      'Ｚ' , 'ａ' , 'ｂ' , 'ｃ' , 'ｄ' ,
      'ｅ' , 'ｆ' , 'ｇ' , 'ｈ' , 'ｉ' ,
      'ｊ' , 'ｋ' , 'ｌ' , 'ｍ' , 'ｎ' ,
      'ｏ' , 'ｐ' , 'ｑ' , 'ｒ' , 'ｓ' ,
      'ｔ' , 'ｕ' , 'ｖ' , 'ｗ' , 'ｘ' ,
      'ｙ' , 'ｚ' , '－' , '　' , '：' ,
      '．' , '，' , '／' , '％' , '＃' ,
      '！' , '＠' , '＆' , '（' , '）' ,
      '＜' , '＞' , '＂' , '＇' , '？' ,
      '［' , '］' , '｛' , '｝' , '＼' ,
      '｜' , '＋' , '＝' , '＿' , '＾' ,
      '￥' , '￣' , '｀'
  );
  // Half-width characters, in the same order as $DBC
  $SBC = array(
      '0', '1', '2', '3', '4',
      '5', '6', '7', '8', '9',
      'A', 'B', 'C', 'D', 'E',
      'F', 'G', 'H', 'I', 'J',
      'K', 'L', 'M', 'N', 'O',
      'P', 'Q', 'R', 'S', 'T',
      'U', 'V', 'W', 'X', 'Y',
      'Z', 'a', 'b', 'c', 'd',
      'e', 'f', 'g', 'h', 'i',
      'j', 'k', 'l', 'm', 'n',
      'o', 'p', 'q', 'r', 's',
      't', 'u', 'v', 'w', 'x',
      'y', 'z', '-', ' ', ':',
      '.', ',', '/', '%', '#',
      '!', '@', '&', '(', ')',
      '<', '>', '"', '\'', '?',
      '[', ']', '{', '}', '\\',
      '|', '+', '=', '_', '^',
      '$', '~', '`'
  );
  $content = str_replace($DBC, $SBC, (string) $content);  // full-width to half-width
  var_dump($content);
