How to use the parseDocument function from htmlparser2

Find comprehensive JavaScript htmlparser2.parseDocument code examples handpicked from public code repositories.

htmlparser2.parseDocument is a function exported by the htmlparser2 library for Node.js. It parses an HTML string synchronously and returns a DOM-like tree whose root is a Document node.

    `Input length ${html.length} is above allowed limit of ${maxInputLength}. Truncating without ellipsis.`
  );
  html = html.substring(0, maxInputLength);
}

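// Parse the (possibly truncated) HTML, forwarding the caller's decodeEntities setting.
const document = htmlparser2.parseDocument(html, { decodeEntities: options.decodeEntities });
// findBaseElements, BlockTextBuilder and walk are not htmlparser2 APIs; they come from the surrounding project.
const bases = findBaseElements(document.children);
const builder = new BlockTextBuilder(options, picker, metadata);
walk(bases, builder);
return builder.toString();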

How does htmlparser2.parseDocument work?

htmlparser2.parseDocument is a method in the htmlparser2 library that parses an HTML string and returns a Document node: the root of a tree whose children are element, text, comment and other nodes representing the document's tags, attributes and text content.

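Internally, parseDocument connects htmlparser2's streaming Parser to a DomHandler: as the parser reads the input it emits events such as onopentag, ontext and onclosetag, and the handler turns them into tree nodes. parseDocument itself does not expose these events; to react to them directly (for example, to process a large document without building a tree), use the Parser class with your own callbacks instead.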

Because parseDocument is synchronous, the complete tree is available as soon as it returns, and you can inspect, traverse or modify it to extract information, for example with the DomUtils helpers that htmlparser2 re-exports.

AI Example

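const htmlparser = require("htmlparser2");

const html = "<p>First paragraph</p><p>Second paragraph</p>";

// DomHandler builds the tree from parser events and calls back when parsing ends.
const handler = new htmlparser.DomHandler((err, dom) => {
  if (err) {
    console.error(err);
  } else {
    // dom is the array of top-level nodes; keep only the <p> elements.
    const paragraphs = dom.filter((node) => node.name === "p");
    // Each paragraph's first child is the text node holding its content.
    const textContents = paragraphs.map((p) => p.children[0].data);
    console.log(textContents); // ["First paragraph", "Second paragraph"]
  }
});

// The Parser forwards its events to the handler; parseComplete parses the whole string at once.
const parser = new htmlparser.Parser(handler);
parser.parseComplete(html);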

This example uses htmlparser2's lower-level API rather than parseDocument: a Parser feeds parsing events to a DomHandler, which collects the nodes into a DOM tree and passes the top-level nodes to its callback. The callback filters the p elements, uses map to extract each one's text content into an array, and logs that array to the console. parseDocument performs the same Parser-plus-DomHandler steps and returns the resulting Document directly.