Elasticsearch: Indexing Documents

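Document upload

Read the uploaded file, Base64-encode it, and index it into docwrite through the ingest pipeline named attachment, which extracts the text: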

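// Read the uploaded file and Base64-encode it (the attachment processor expects Base64)
$file = $request->file('file');
$base64content = base64_encode(file_get_contents($file->getRealPath()));

// elasticsearch-php 8.x client; setSSLVerification(false) skips certificate
// checks and is only acceptable for a local cluster with a self-signed certificate
$client = \Elastic\Elasticsearch\ClientBuilder::create()
    ->setHosts(config('es.hosts'))
    ->setSSLVerification(false)
    ->setApiKey(config('es.api'))
    ->build();

$param = [
    'index' => 'docwrite',
    // typeless API: Elasticsearch 8 has no mapping types, so there is no 'type' key
    'pipeline' => 'attachment', // ingest pipeline that extracts text from "content"
    'body' => [
        'category' => $request->input('category') ?? '',
        'name' => $file->getClientOriginalName(),
        'doc_type' => $file->getClientOriginalExtension(),
        'content' => $base64content,
    ],
];
$response = $client->index($param);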
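The upload code above assumes an ingest pipeline named attachment already exists. A minimal sketch of the request that creates it, assuming the elasticsearch-php 8.x client and the Base64 data in the content field as above; the helper name attachmentPipelineParams is hypothetical. On Elasticsearch releases before 8.4 the attachment processor came from the separately installed ingest-attachment plugin.

```php
<?php
// Hypothetical helper: builds the request for $client->ingest()->putPipeline()
// that defines the "attachment" pipeline used when indexing into docwrite.
function attachmentPipelineParams(): array
{
    return [
        'id'   => 'attachment',
        'body' => [
            'description' => 'Extract text from Base64-encoded files',
            'processors'  => [
                [
                    'attachment' => [
                        'field'         => 'content', // field holding the Base64 data
                        'indexed_chars' => -1,        // no limit (default is 100000 characters)
                    ],
                ],
            ],
        ],
    ];
}

// With a client built as in the upload code (not executed here):
// $client->ingest()->putPipeline(attachmentPipelineParams());
```

The processor's default target_field is attachment, so the extracted text ends up in attachment.content, and that is the field searches should target.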

OCR

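Install Tesseract, the OCR engine FSCrawler uses (tesseract-lang adds the extra language data, including chi_sim for Simplified Chinese):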

brew install tesseract tesseract-lang

Download FSCrawler, then run it with a job name that does not exist yet; it will offer to create the job:

bin/fscrawler test01

Then open the job's settings file (by default ~/.fscrawler/test01/_settings.yaml). Example configuration:

---
name: "test01"
fs:
  url: "/Users/yourname/Desktop/tmp"
  update_rate: "1m"
  includes:
    - "*.pdf"
  json_support: false
  filename_as_id: false
  add_filesize: true
  remove_deleted: true
  add_as_inner_object: false
  store_source: false
  index_content: true
  attributes_support: false
  raw_metadata: false
  xml_support: false
  index_folders: false
  lang_detect: false
  continue_on_error: true
  ocr:
    language: "chi_sim"
    enabled: true
    pdf_strategy: "ocr_and_text"
  follow_symlinks: false
elasticsearch:
  nodes:
    - url: "https://127.0.0.1:9200"
  bulk_size: 100
  flush_interval: "5s"
  byte_size: "10mb"
  ssl_verification: false

  username: "elastic"
  password: "password"

Start the job (restart from scratch and exit after a single run):

bin/fscrawler test01 --loop 1 --restart
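Once the run finishes, the OCR text can be searched with the same PHP client. A minimal sketch, assuming FSCrawler's defaults: the index is named after the job (test01), the extracted text is stored in content and the original file name in file.filename. The helper ocrSearchParams is hypothetical.

```php
<?php
// Hypothetical helper: builds a $client->search() request for documents
// indexed by FSCrawler. Assumes the default index name (the job name) and
// FSCrawler's default fields "content" and "file.filename".
function ocrSearchParams(string $job, string $text, int $size = 10): array
{
    return [
        'index' => $job,
        'body'  => [
            'size'      => $size,
            '_source'   => ['file.filename'],              // return only the file name
            'query'     => ['match' => ['content' => $text]],
            'highlight' => ['fields' => ['content' => new \stdClass()]], // encodes as {}
        ],
    ];
}

// With the client from the upload section (not executed here):
// $hits = $client->search(ocrSearchParams('test01', 'your keyword'))['hits']['hits'];
```

new \stdClass() is used instead of an empty array because json_encode turns [] into a JSON array, while the highlight field options must be an object.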
