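Responsible Markdown in Next.js | CSS-Tricks

Markdown truly is a great format. It is close enough to plain text that anyone can learn it quickly, and it is structured enough to be parsed and eventually converted to whatever format you want.

That being said: parsing, processing, enhancing, and converting Markdown requires code. Shipping all that code to the client comes at a cost. It is not huge per se, but it is still a few dozen kilobytes of code used only to handle Markdown and nothing else.

In this article, I want to explain how to keep Markdown out of the client in a Next.js application, using the Unified/Remark ecosystem (I genuinely don't know which name to use, it is all very confusing).

General idea

The idea is to only use Markdown in the getStaticProps functions from Next.js, so that the work is done at build time (or in a Next serverless function if using Vercel's incremental builds), but never on the client. I guess getServerSideProps would also work, but I think getStaticProps is more likely to be the common use case.

This returns an AST (Abstract Syntax Tree, which is to say a big nested object describing our content) resulting from parsing and processing the Markdown content, and the client is only responsible for rendering that AST into React components.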
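
To make "big nested object" a bit more concrete, here is roughly what the tree for a two-line document looks like. This is a hand-written sketch that follows the mdast node shapes (root, heading, paragraph, emphasis, text), not actual parser output: the position data a real parser attaches to every node is left out, and the variable names are only illustrative.

```javascript
// Markdown source (illustrative):
//
//   # Hello
//
//   Some *emphasized* text.

// Hand-written approximation of the tree a Markdown parser such as
// remark-parse would produce, minus the `position` data on every node.
const ast = {
  type: 'root',
  children: [
    {
      type: 'heading',
      depth: 1,
      children: [{ type: 'text', value: 'Hello' }],
    },
    {
      type: 'paragraph',
      children: [
        { type: 'text', value: 'Some ' },
        { type: 'emphasis', children: [{ type: 'text', value: 'emphasized' }] },
        { type: 'text', value: ' text.' },
      ],
    },
  ],
}

// The tree is plain data, so it survives a JSON round trip, which matters
// because props returned from getStaticProps are serialized as JSON.
const roundTripped = JSON.parse(JSON.stringify(ast))
```

Every node has a type, containers have children, and leaves such as text have a value. That is all the renderer needs to know about.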

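I guess we could even render the Markdown as HTML directly in getStaticProps and return that to be rendered with dangerouslySetInnerHTML, but we are not that kind of people. Security matters. So does the flexibility of rendering Markdown the way we want with our own components, instead of as plain HTML. Seriously folks, do not do that. 😅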

export const getStaticProps = async () => {
  // Get the Markdown content from somewhere, like a CMS or whatnot. It doesn’t
  // matter for the sake of this article, really. It could also be read from a
  // file.
  const markdown = await getMarkdownContentFromSomewhere()
  const ast = parseMarkdown(markdown)

  return { props: { ast } }
}

const Page = props => {
  // This would usually have your layout and whatnot as well, but omitted here
  // for sake of simplicity of course.
  return <MarkdownRenderer ast={props.ast} />
}

export default Page

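Parsing Markdown

We are going to use the Unified/Remark ecosystem. We need to install unified and remark-parse, and that's about it. Parsing the Markdown itself is relatively straightforward: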

import { unified } from 'unified'
import markdown from 'remark-parse'

const parseMarkdown = content => unified().use(markdown).parse(content)

export default parseMarkdown

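Now, what took me a long time to understand is why my extra plugins, like remark-prism or remark-slug, did not work like this. This is because the .parse(..) method from Unified does not process the AST with plugins. As its name suggests, it only parses the Markdown string into a tree.

If we want Unified to apply our plugins, we need it to go through what they call the "run" phase. Normally, this is done by using the .process(..) method instead of .parse(..). Unfortunately, .process(..) not only parses Markdown and applies plugins, but also stringifies the AST into another format (such as HTML via remark-html, or JSX via remark-react). That is not what we want: we want to preserve the AST, but after it has been processed by plugins.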

| ........................ process ........................... |
| .......... parse ... | ... run ... | ... stringify ..........|

          +--------+                     +----------+
Input ->- | Parser | ->- Syntax Tree ->- | Compiler | ->- Output
          +--------+          |          +----------+
                              X
                              |
                       +--------------+
                       | Transformers |
                       +--------------+

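So what we need to do is go through the parsing and running phases, but not the stringifying phase. Unified does not provide a single method for two of these three phases, but it does provide an individual method for each phase, so we can do it manually: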

import { unified } from 'unified'
import markdown from 'remark-parse'
import prism from 'remark-prism'

const parseMarkdown = content => {
  const engine = unified().use(markdown).use(prism)
  const ast = engine.parse(content)

  // Unified's *process* contains 3 distinct phases: parsing, running and
  // stringifying. We do not want to go through the stringifying phase, since we
  // want to preserve an AST, so we cannot call `.process(..)`. Calling
  // `.parse(..)` is not enough though as plugins (so Prism) are executed during
  // the running phase. So we need to manually call the run phase (synchronously
  // for simplicity).
  // See: https://github.com/unifiedjs/unified#description
  return engine.runSync(ast)
}

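Tada! We parsed our Markdown into a syntax tree. Then we ran our plugins on that tree (synchronously here for simplicity, but you could use .run(..) to do it asynchronously). But we did not convert the tree into another syntax such as HTML or JSX. We can do that ourselves, in the render.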
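
If the split between the three phases still feels abstract, the following toy processor may help. To be clear, this is not how unified is implemented, and plugins here are plain functions rather than real unified plugins. It only borrows the method names (parse, runSync, stringify, processSync) to show which phase touches what. The shout plugin and the line-based "syntax" are made up for the illustration.

```javascript
// Toy processor, NOT the real unified implementation. It only mirrors the
// shape of the API to show which phase does what.
const createToyProcessor = () => {
  const transformers = []

  const processor = {
    // Register a plugin; here a plugin is simply a function that mutates the tree.
    use(transformer) {
      transformers.push(transformer)
      return processor
    },
    // Phase 1: text in, tree out. Plugins are not involved.
    parse(content) {
      return {
        type: 'root',
        children: content.split('\n').map(value => ({ type: 'line', value })),
      }
    },
    // Phase 2: tree in, transformed tree out. This is where plugins run.
    runSync(tree) {
      transformers.forEach(transform => transform(tree))
      return tree
    },
    // Phase 3: tree in, text out.
    stringify(tree) {
      return tree.children.map(node => node.value).join('\n')
    },
    // All three phases in a row, which is what processing amounts to.
    processSync(content) {
      return processor.stringify(processor.runSync(processor.parse(content)))
    },
  }

  return processor
}

// Hypothetical plugin: uppercase every line.
const shout = tree => {
  tree.children.forEach(node => {
    node.value = node.value.toUpperCase()
  })
}

const engine = createToyProcessor().use(shout)

const parsedOnly = engine.parse('hello\nworld') // plugin has not run
const parsedAndRun = engine.runSync(engine.parse('hello\nworld')) // still a tree
const processed = engine.processSync('hello\nworld') // back to a string
```

Calling parse alone leaves the lines untouched, which mirrors why my plugins seemed to do nothing. Calling parse and then runSync gives a transformed tree, which is the combination we are after.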

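Rendering Markdown

Now that we have our cool tree at the ready, we can render it the way we intend to. Let's have a MarkdownRenderer component that receives the tree as an ast prop and renders it all with React components.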

const getComponent = node => {
  switch (node.type) {
    case 'root':
      return ({ children }) => <>{children}</>

    case 'paragraph':
      return ({ children }) => <p>{children}</p>

    case 'emphasis':
      return ({ children }) => <em>{children}</em>

    case 'heading':
      return ({ children, depth = 2 }) => {
        const Heading = `h${depth}`
        return <Heading>{children}</Heading>
      }

    case 'text':
      return ({ value }) => <>{value}</>

    /* Handle all types here … */

    default:
      console.log('Unhandled node type', node)
      return ({ children }) => <>{children}</>
  }
}

const Node = node => {
  const Component = getComponent(node)
  const { children } = node

  return children ? (
    <Component {...node}>
      {children.map((child, index) => (
        <Node key={index} {...child} />
      ))}
    </Component>
  ) : (
    <Component {...node} />
  )
}

const MarkdownRenderer = props => <Node {...props.ast} />

export default React.memo(MarkdownRenderer)

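Most of the logic of our renderer lives in the Node component. It finds out what to render based on the type key of the AST node (that is our getComponent method, which handles every type of node), and then renders it. If the node has children, it recurses into them; otherwise, it just renders the component as a final leaf.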
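
Since JSX needs a build step, here is a framework-free sketch of the same traversal that can be run in plain Node.js. The h function is a hypothetical stand-in for React.createElement that only records what would be rendered, and the handlers lookup table is one possible alternative to the switch, not something the code above requires. It also shows a link handler, since mdast link nodes carry their target in a url key that has to become href.

```javascript
// Hypothetical stand-in for React.createElement: it just records what would
// be rendered, so the traversal can be inspected in plain Node.js.
const h = (tag, props, children = []) => ({ tag, props, children })

// One handler per mdast node type. Each handler receives the node and its
// already-rendered children.
const handlers = {
  root: (node, children) => h('fragment', {}, children),
  paragraph: (node, children) => h('p', {}, children),
  emphasis: (node, children) => h('em', {}, children),
  strong: (node, children) => h('strong', {}, children),
  heading: (node, children) => h(`h${node.depth}`, {}, children),
  // mdast link nodes carry the target in `url`, which maps to `href`.
  link: (node, children) => h('a', { href: node.url }, children),
  text: node => node.value,
}

const renderNode = node => {
  const children = (node.children || []).map(renderNode)
  const handler = handlers[node.type]

  // Unknown node types fall back to rendering only their children, like the
  // default case of the switch above.
  return handler ? handler(node, children) : h('fragment', {}, children)
}

const output = renderNode({
  type: 'root',
  children: [
    {
      type: 'paragraph',
      children: [
        { type: 'text', value: 'Read ' },
        {
          type: 'link',
          url: 'https://example.com',
          children: [{ type: 'text', value: 'this' }],
        },
      ],
    },
  ],
})
```

A lookup table keeps each node type on one line and makes it easy to see which types are still missing, at the cost of losing the per-case comments a switch allows.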

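Cleaning the tree

Depending on which Remark plugins we use, we might run into the following problem when trying to render our page:

Error: Error serializing .content[0].content.children[3].data.hChildren[0].data.hChildren[0].data.hChildren[0].data.hChildren[0].data.hName returned from getStaticProps in "/". Reason: undefined cannot be serialized as JSON. Please use null or omit this value.

This happens because our AST contains keys whose values are undefined, which is not something that can be safely serialized as JSON. Next gives us the solution: either omit the value entirely, or replace it with null if we need it.

We are not going to fix every path by hand though, so we need to walk the AST recursively and clean it up. I found out that this happens when using remark-prism, a plugin that enables syntax highlighting for code blocks. The plugin does indeed add a [data] object to nodes.

What we can do is walk our AST before returning it to clean up these nodes: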

const cleanNode = node => {
  if (node.value === undefined) delete node.value
  if (node.tagName === undefined) delete node.tagName
  if (node.data) {
    delete node.data.hName
    delete node.data.hChildren
    delete node.data.hProperties
  }

  if (node.children) node.children.forEach(cleanNode)

  return node
}

const parseMarkdown = content => {
  const engine = unified().use(markdown).use(prism)
  const ast = engine.parse(content)
  const processedAst = engine.runSync(ast)

  cleanNode(processedAst)

  return processedAst
}

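One last thing we can do is remove the position object, which exists on every single node and holds the original position in the Markdown string. It is not a big object (it has only two keys), but it adds up quickly when the tree gets big.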

const cleanNode = node => {
  delete node.position
  // … the rest of the cleanup shown above
}
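
For trees where the offending keys are not known in advance, a more generic variant might look like the sketch below. It is an assumption-laden alternative rather than the cleanup used above: cleanTree is a made-up name, it returns a cleaned copy instead of mutating, and it drops only position plus any object key whose value is undefined. The sample node and its values are hand-written for illustration.

```javascript
// Generic cleanup sketch: drop `position` and every object key whose value
// is `undefined`, at any depth, without naming plugin-specific fields.
const cleanTree = value => {
  if (Array.isArray(value)) return value.map(cleanTree)
  if (value === null || typeof value !== 'object') return value

  const cleaned = {}
  for (const [key, child] of Object.entries(value)) {
    if (key === 'position' || child === undefined) continue
    cleaned[key] = cleanTree(child)
  }
  return cleaned
}

// Hand-written node resembling parser output (values are illustrative).
const dirty = {
  type: 'root',
  children: [
    {
      type: 'text',
      value: 'Hi',
      data: { hName: undefined },
      position: {
        start: { line: 1, column: 1, offset: 0 },
        end: { line: 1, column: 3, offset: 2 },
      },
    },
  ],
}

const clean = cleanTree(dirty)

// Rough payload comparison: length of the serialized tree before and after.
const sizeBefore = JSON.stringify(dirty).length
const sizeAfter = JSON.stringify(clean).length
```

Comparing sizeBefore and sizeAfter on a real document is a quick way to check how much the position data actually costs before deciding whether stripping it is worth it.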

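Wrapping up

That's it! We managed to restrict Markdown handling to build-time/server-side code, so we do not ship a Markdown runtime to the browser, which would be an unnecessary cost. We pass a tree of data to the client, which we can walk and convert into whatever React components we want.

Hope this helps. :)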

Titus

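# August 15, 2021

Hi! remark/unified maintainer here! I saw the question about which term to use, so I figured I'd try to explain. It clearly is confusing, but for anyone interested, here is the explanation.

unified is the thing behind all of this: the parse, run, stringify interface. It is also the name people use for everything together (often called the unified collective).

remark is the Markdown ecosystem: so if you have plugins that work on a Markdown AST, that is remark.

In many cases you are also dealing with HTML, which is called rehype.

There are a few other AST ecosystems attached as well, for natural language, JavaScript, XML, each under its own name.

So, if you start with Markdown, you can use remark-parse and other remark plugins.
If you start with HTML, you can use rehype-parse and rehype plugins.
You can stop there and use remark-stringify/rehype-stringify.
Or you can go from one to the other with remark-rehype or rehype-remark, and use plugins from the other ecosystem too!

Example: https://github.com/remarkjs/remark-rehype#use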

Damon Blais

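# August 31, 2021

So… following this guide is not quite as simple as it looks. A few things did not fully work with the latest versions.

First, in the parseMarkdown function, runSync does not work for me. If I turn it into an async function and use run(ast), TypeScript complains very loudly, but the result at least works.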

Argument of type 'import("./node_modules/@types/mdast/index").Root' is not assignable to parameter of type 'import("./node_modules/rehype-format/node_modules/@types/hast/index").Root'.
  Types of property 'children' are incompatible.
    Type 'Content[]' is not assignable to type 'RootContent[]'.
      Type 'Content' is not assignable to type 'RootContent'.
        Type 'Paragraph' is not assignable to type 'RootContent'.
          Property 'value' is missing in type 'Paragraph' but required in type 'Text'.ts(2345)
index.d.ts(75, 5): 'value' is declared here.

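I am not even going to get into how the other types are left unspecified (not everyone uses TypeScript, so that is not a fault of this guide).

For those who want types, this is what I came up with: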

type NodeType = {
  properties: { [key: string]: string }
  tagName?: string
  type: string
  value?: any
}

type keyable = {
  key: Key | null | undefined
}

const getComponent = (node?: NodeType) => {
  if (!node || !node.type) return null

  { ... }
}

// I tried typing node, it's a huge pain, I gave up.
const Node = (node: any) => { ... }

const MarkdownRenderer = ({ ast }: { ast: any }) => <Node {...ast} />

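This also reminds me that getComponent needs to clean the Node before returning Fragment, otherwise anything using React strict mode will scream. Basically, replace the return at the end with the following (and then replace every switch case that returns a Fragment with a break).

  // error: Fragment only accepts 'key' props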
  if (node.tagName != undefined) delete node.tagName
  if (node.type != undefined) delete node.type
  if (node.value != undefined) delete node.value
  return Fragment

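Now that we have all of that out of the way, let's talk about what the expected return value of getComponent is: a ReactElement function/class constructor.

Why does this matter?

Returning a string (e.g. 'a') does not actually do what it should. Instead, it renders an empty a tag without any attributes. So we need to use proper constructors for the supported tags.

I had to replace the switch case with the following: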

  switch (node.type) {
    case '': // the root node is {} on load
    case 'comment':
      return null // don't render comments

    case 'root': // explodes without named root
      // eslint-disable-next-line no-case-declarations
      const root: FC = ({ children }) => <Fragment>{children}</Fragment>
      return root

    case 'text': // all Nodes without a tagName end up being of type 'text' not 'paragraph' -- I wonder what parser you were using that uses 'paragraph' ?
      return function text() {
        // expected text is located in node.value
        return <Fragment>{node.value}</Fragment>
      }

    // and now we come to HTML elements
    case 'element':
      // only render whitelisted elements
      switch (node.tagName) {
        case 'a':      return a
        case 'h1':     return h1
        case 'h2':     return h2
        case 'h3':     return h3
        case 'h4':     return h4
        case 'h5':     return h5
        case 'h6':     return h6
        case 'li':     return li
        case 'ol':     return ol
        case 'ul':     return ul
        case 'code':   return code
        case 'p':      return p
        case 'pre':    return pre
        case 'strong': return strong

        default:
          console.log('unhandled html tag', node)
      }

      console.log('removed unsafe HTML tag', node)
      return null

    default:
      console.log('unhandled node tag', node)
  }

And the components themselves?

  const h1: FC = ({ children }) => {
    return <h1>{children}</h1>
  }
  const h2: FC = ({ children }) => {
    return <h2>{children}</h2>
  }
  const h3: FC = ({ children }) => {
    return <h3>{children}</h3>
  }
  const h4: FC = ({ children }) => {
    return <h4>{children}</h4>
  }
  const h5: FC = ({ children }) => {
    return <h5>{children}</h5>
  }
  const h6: FC = ({ children }) => {
    return <h6>{children}</h6>
  }

  const li: FC<keyable> = ({ key, children }) => {
    return <li key={key}>{children}</li>
  }
  const ol: FC = ({ children }) => {
    return <ol>{children}</ol>
  }
  const ul: FC = ({ children }) => {
    return <ul>{children}</ul>
  }

  const p: FC = ({ children }) => {
    return <p>{children}</p>
  }

  const strong: FC = ({ children }) => {
    return <strong>{children}</strong>
  }

  // note: you need to define `a` inside `getComponent` so it can use the `node` variable from the parent context
  const a: FC = ({ children }) => {
    const classes = []

    // allow attr 'class': string
    if (typeof node.properties?.class === 'string') {
      classes.push(node.properties.class)
    } 

    // allow attr 'className': string
    if (typeof node.properties?.className === 'string') {
      classes.push(node.properties.className)
    }

    // allow attr 'className': string[]
    if (node.properties?.className?.length) {
      classes.push(...node.properties.className)
    }

    // NOTE: You only need to do this for Gatsby, NextJS and other PWA, SSR or SSG frameworks, or React routers that have their own Link component.
    return (
      <Link href={node.properties?.href}>
        <a className={classes.join(' ') || undefined}>{children}</a>
      </Link>
    )
  }

  // likewise, if you're using prism or another syntax highlighting plugin, you'll need to allow the code and pre tags to have a className. You need to define this inside `getComponent` for it to access node.
  const code: FC = ({ children }) => {
    return <code>{children}</code>
  }
  const pre: FC = ({ children }) => {
    return <pre>{children}</pre>
  }

Kitty Giraudel

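# August 31, 2021
Hi Damon, thank you for taking the time to leave a comment. There is a lot in there, so allow me to go through it point by point.
- This article was originally written for unified v9, which I forgot to mention, sorry. That is what causes runSync to fail. I just tried it with v9 and it works fine, so the Unified API must have changed in v10 (which is what major versions are for, so fair enough I guess). I updated the article to mention that Unified should be installed in v9.
- I personally do not use TypeScript and never have, so there is not much I can do on that front. As you said, not everyone uses TypeScript. Sorry you ran into trouble.
- You are absolutely right about fragments. I updated the code to use ({ children }) => <>{children}</>, so no props are passed to the fragment. That way, there is no need to move the cleanup before rendering as you suggested, and it also becomes resilient to fields being added to the AST.
- Returning strings such as p or em from getComponent works fine (just tested). It does not work for links though, since they receive a url key from the AST and need to render an href attribute. I guess using a proper component definition is a little safer, so I updated the article accordingly. As mentioned, all types need to be implemented, since the snippet only shows a few. I also added handling of the text type for clarity.
Thanks again for your feedback! I hope the article is clearer now. :)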
Kitty Giraudel

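# September 5, 2021

Coming back to add something I found out recently: unified v10 seems to work fine too, as long as the import is updated to a named import instead of a default one (import { unified } from 'unified').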
Charlie

# August 12, 2022
TypeScript tip: the node types live in mdast (a dependency of remark).
```
import {
  Content as ContentAST,
  Root as RootAST,
  Heading as HeadingAST,
  Text as TextAST,
  List as ListAST,
} from 'mdast';

type NodeAST = RootAST | ContentAST;
```
I aliased all the types so they do not clash with the UI library I am using.

Will

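# September 6, 2021

I love this article, and I think the approach of authoring in Markdown but shipping an AST to the client is interesting.

I have been experimenting with this myself recently. I first tried rendering to HTML to see what the size would be. I am also rendering math, though, and in the two tests I ran the output was 18 and 35 times larger than the Markdown files. I can only guess the AST would be similar, since these trees can get very large. I think it is a trade-off between network size and the cost of parsing at runtime. For now I have gone with runtime parsing, since I think it reduces my need to split up the pipeline.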

Mosaad

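# May 13, 2022

This article is very detailed!

It helped me implement something similar in a similar situation: I receive HTML rather than Markdown, but still wanted to replace some elements with custom components (like Next.js's Image and Link).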
