解决hexo搭建博客批量修改图片路径问题.md

2024/7/15 hexo__bug解决博客 hexo__bug 已解决bug 该文总阅读量次

解决hexo搭建博客批量修改图片路径问题

背景

我不想用图床，但是一个一个修改处理图片路径太麻烦了，有这时间我都可以去搞个图床了，因此我想用python帮我批量处理一下，同时还有初始化，废话不多说，直接上代码

解决方案

import os
import re
import shutil
from datetime import datetime

# 定义当前目录和 images 目录
current_directory = os.getcwd()
md_directory = os.path.join(current_directory, "source", "_posts").replace("\\", '/')
local_images_directory = os.path.join(md_directory, "images").replace('\\', '/')
github_images_directory = os.path.join(current_directory, "source", "images").replace('\\', '/')
backup_directory = os.path.join(current_directory, "source", "_posts_backup").replace('\\', '/')

print("保存图片路径：", local_images_directory)
print("Markdown 文件路径：", md_directory)
print("备份路径：", backup_directory)

# 如果 images 目录不存在，则创建
if not os.path.exists(local_images_directory):
    os.makedirs(local_images_directory)

# 如果备份目录不存在，则创建
if not os.path.exists(backup_directory):
    os.makedirs(backup_directory)

# 若github_images_directory目录存在，则删除并重新创建
if os.path.exists(github_images_directory):
    shutil.rmtree(github_images_directory)

if not os.path.exists(github_images_directory):
    os.mkdir(github_images_directory)

# 生成以当前时间命名的备份文件夹
current_time_str = datetime.now().strftime("%Y%m%d%H%M%S")
backup_subdir = os.path.join(backup_directory, current_time_str, "_posts")
# 备份 _posts 目录到以当前时间命名的文件夹
shutil.copytree(md_directory, backup_subdir)
print(f"已备份 _posts 目录到 {backup_subdir}")

# 获取当前日期和时间
current_datetime = datetime.now().strftime("%Y-%m-%d %H:%M:%S")

# 定义模板字符串
template = '''---
title: {title}
date: {date}
pinned : 0
tags: 
  - {tags}
  - {tags}
categories: {categories}
description: |
    {title}
---
'''

# 定义匹配图片路径的正则表达式
image_pattern = re.compile(r'!\[(.*?)\]\((.*?)\)')
# 初始化一个集合用于存储需要复制的图片文件名
images_to_copy = set()

def InitTitle_and_UpdateImageurl():
    # 遍历md_directory目录下的所有 Markdown 文件
    for root, _, files in os.walk(md_directory):
        for file in files:
            if file.endswith(".md"):
                print("正在处理文件: ", file)
                file_path = os.path.join(root, file)
                with open(file_path, 'r', encoding='utf-8') as f:
                    content = f.read()

                # 检查文件是否包含模板
                if not content.startswith('---'):
                    # 添加模板
                    title = os.path.splitext(file)[0]
                    new_content = template.format(
                        title=title,
                        date=current_datetime,
                        tags="杂",
                        categories="杂",
                        description=title
                    ) + content
                else:
                    new_content = content

                # 查找并替换本地图片路径
                def replace_image_path(match):
                    alt_text = match.group(1)
                    old_path = match.group(2)
                    filename = os.path.basename(old_path)
                    if old_path.startswith('http'):
                        # 网络图片保持原样
                        return match.group(0)
                    elif '../images' not in old_path:
                        try:
                            shutil.copy2(old_path, local_images_directory)
                            # images / image - 20240715163648965.png
                            md_image_path = f'![{alt_text}](images/{filename})'
                            return md_image_path
                        except FileNotFoundError:
                            # print(f"Local image not found: {old_path}")
                            return match.group(0)
                        except shutil.SameFileError:
                            md_image_path = f'![{alt_text}](images/{filename})'
                            print(md_image_path)
                            return md_image_path
                    else:
                        return match.group(0)
                updated_content = re.sub(image_pattern, replace_image_path, new_content)

                image_paths = re.findall(r'!\[.*?\]\((.*?)\)', updated_content)
                for image_path in image_paths:
                    # 如果图片路径不是网络路径，加入集合
                    image_path = image_path.replace('images/', '')
                    if not image_path.startswith('http'):
                        # 处理可能的相对路径
                        storage_image_path = os.path.join(local_images_directory, image_path)
                        copy_image_path = os.path.join(github_images_directory, image_path)
                        images_to_copy.add((storage_image_path, copy_image_path))
                # 将修改后的内容写回文件
                with open(file_path, 'w', encoding='utf-8') as f:
                    f.write(updated_content)

                # print("1:",os.path.exists(github_images_directory))
    #
    # # 收集 GitHub 目录中的所有图片文件名
    # github_images = set(os.listdir(github_images_directory))
    #
    # print(os.path.exists(github_images_directory))
    #
    # 将本地访问图片路径复制到 GitHub 访问图片路径
    for storage_image_path, copy_image_path in images_to_copy:
        try:
            shutil.copy2(storage_image_path, copy_image_path)
            print(f"复制 {os.path.basename(copy_image_path)} 于 {github_images_directory}")
        except FileNotFoundError:
            print(f"Image not found: {storage_image_path}")
        except Exception as e:
            print(f"Error copying image {storage_image_path}: {e}")
    #     filename = os.path.basename(copy_image_path)
    #
    #     if filename in github_images:
    #         print(f"图片 {filename} 已有，跳过")
    #         # 移除已有图片名以便于之后删除不再使用的图片
    #         github_images.remove(filename)
    #     else:
    #         try:
    #             shutil.copy2(storage_image_path, copy_image_path)
    #             print(f"复制 {filename} 于 {github_images_directory}")
    #         except FileNotFoundError:
    #             print(f"Image not found: {storage_image_path}")
    #         except Exception as e:
    #             print(f"Error copying image {storage_image_path}: {e}")
    #
    # # 删除 GitHub 目录中未在 images_to_copy 集合中的图片
    # for unused_image in github_images:
    #     unused_image_path = os.path.join(github_images_directory, unused_image)
    #     try:
    #         os.remove(unused_image_path)
    #         print(f"Deleted unused image {unused_image} from GitHub directory.")
    #     except Exception as e:
    #         print(f"Error deleting image {unused_image}: {e}")


def Rename_md(index_len = 3, sort_desc = True):
    # 定义提取日期的正则表达式
    date_pattern = re.compile(r'date: (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})')

    # 获取所有 Markdown 文件
    md_files = [file for file in os.listdir(md_directory) if file.endswith(".md")]

    # 函数以提取日期并根据需要返回适合排序的值
    def extract_date(file_path, sort_desc = True):
        with open(file_path, 'r', encoding='utf-8') as f:
            content = f.read()
        match = date_pattern.search(content)
        if match:
            date = datetime.strptime(match.group(1), '%Y-%m-%d %H:%M:%S')
        else:
            date = datetime.now()  # 如果没有日期，则使用当前日期
        return -date.timestamp() if sort_desc else date.timestamp()

    # 用户指定排序方向，True 为降序，False 为升序
      # 修改这里来控制排序方向
    # 对文件进行排序
    md_files.sort(key=lambda file: extract_date(os.path.join(md_directory, file), sort_desc))

    # 遍历排序后的文件，并重命名
    for index, file in enumerate(md_files, start=1):
        file_path = os.path.join(md_directory, file)

        # 检查文件名是否已经有序号
        match = re.match(r'^(\d+)-', file)
        match_len = len(match.group(0)) if match else 0
        if re.match(r'^(\d+)-', file):
            # 如果已有序号，则去掉原有的序号
            new_file_name = f"{index:0{index_len}d}-{file[match_len:]}"  # 假设原序号长度为三位数字加连字符
        else:
            # 如果没有序号，直接添加新序号
            new_file_name = f"{index:0{index_len}d}-{file}"
        print(new_file_name)
        # 消除空格
        new_file_name = new_file_name.replace(' ', '')
        new_file_path = os.path.join(md_directory, new_file_name)
        os.rename(file_path, new_file_path)  # 重命名文件


def Rename_by_date():
    # 定义提取日期的正则表达式
    date_pattern = re.compile(r'date: (\d{4})-(\d{2})-(\d{2}) \d{2}:\d{2}:\d{2}')
    # 定义检查文件名中日期前缀的正则表达式
    file_date_pattern = re.compile(r'^\d{2}-\d{2}-\d{2}-')
    # 获取所有 Markdown 文件
    md_files = [file for file in os.listdir(md_directory) if file.endswith(".md")]
    # 遍历每个 Markdown 文件
    for file in md_files:
        file_path = os.path.join(md_directory, file)
        with open(file_path, 'r', encoding='utf-8') as f:
            content = f.read()
        # 提取文件内容中的日期
        match_content = date_pattern.search(content)
        if match_content:
            year, month, day = match_content.groups()
            # 构建新文件名前缀
            date_prefix = f"{year[2:]}-{month}-{day}"
        else:
            # 如果没有日期，则使用当前日期
            date_prefix = datetime.now().strftime("%y-%m-%d")
        # 如果文件名已经包含日期前缀同时与md文件内容中的日期一致，则跳过重命名,若不一致则删除日期前缀
        match_name = file_date_pattern.match(file)
        if match_name == match_content:
            print(f"跳过文件（已有日期前缀）: {file}")
            continue
        elif match_name:
            file = file.replace(match_name.group(0), '')
        # 构建新文件名
        new_file_name = f"{date_prefix}-{file}"
        # 消除文件名中的空格
        new_file_name = new_file_name.replace(' ', '')
        # 构建新文件路径
        new_file_path = os.path.join(md_directory, new_file_name)
        # 重命名文件
        os.rename(file_path, new_file_path)
        print(f"重命名文件: {file} -> {new_file_name}")


if __name__ == "__main__":
    InitTitle_and_UpdateImageurl()
    # index_len为序号长度例如3为001，4为0001, sort_desc为排序，默认为True降序
    # 已废弃，发现新增之后原博客路径不对，故改为时间戳为名称
    # Rename_md(index_len=3 , sort_desc = False)
    Rename_by_date()

gpt对上面代码的解释，我觉得反正比我写的好：

1.`InitTitle_and_UpdateImageurl` 函数

这个函数负责初始化Markdown文件的标题和更新文件中的图片路径。

功能：
- 遍历指定目录下的所有Markdown文件。
- 检查每个文件是否包含YAML头部模板，如果不存在则添加一个标准模板。
- 检查并更新文件中的本地图片路径，确保图片链接指向正确的位置，并处理网络图片和本地图片的不同情况。
- 复制图片到GitHub可访问的目录，支持在线预览。
图片处理：
- 如果图片是本地存储的，会尝试将图片复制到一个统一的目录下，并更新Markdown文件中的引用路径。
- 对于网络图片，保持原有的URL不变。
- 使用正则表达式来匹配Markdown中的图片链接。
异常处理：
- 处理文件找不到的情况和尝试复制相同文件的问题（shutil.SameFileError）。

2. `Rename_md` 函数（废弃）

由于排序重命名后是001-文件名称、002-文件名称…这样的格式，在插入的时候非常不友好，会把顺序改了，同时之前给别人的链接就失效了，所以我改为了Rename_by_date函数

这个函数负责根据文件中的日期信息对Markdown文件进行排序并重命名，支持按日期升序或降序。

功能：
- 提取每个Markdown文件的日期信息，并根据这些日期对文件进行排序。
- 根据排序结果，重命名文件，文件名包括一个序号，这个序号根据排序结果动态生成。
- 序号的长度可以动态指定，支持未来的扩展（例如，从三位数到四位数）。
日期提取：
- 使用正则表达式从Markdown文件的YAML头部提取日期。
- 如果没有找到日期，则假定为当前日期。

当然还有备份功能，在_posts同级目录下会创建一个_posts_backup的备份文件夹，每启动运行一次脚本就会获取当前时间戳同时以该时间戳命名，再把_posts备份到这里

3.`Rename_by_date` 函数

这个函数负责根据文件中的日期信息对Markdown文件进行排序并重命名，直接将文件名称加上时间前缀，避免Rename_md函数修改的时候前缀改变。

最后还是改为用这个函数了，改完这个函数的效果是这样的：

LOADING

解决hexo搭建博客批量修改图片路径问题.md

解决hexo搭建博客批量修改图片路径问题

背景

解决方案

1.InitTitle_and_UpdateImageurl 函数

2. Rename_md 函数（废弃）

3.Rename_by_date 函数

1.`InitTitle_and_UpdateImageurl` 函数

2. `Rename_md` 函数（废弃）

3.`Rename_by_date` 函数