
How To Search For Content In XPath In Multiline Text Using Python?

When I search for the existence of data in text() of an element using contains, it works for plain data but not when there are carriage returns, new lines/tags in the element conte

Solution 1:

Use:

//td[text()[contains(.,'Good bye')]]

Explanation:

The problem is not that a text node's string value spans multiple lines; the real reason is that the td element has more than one text-node child.

In the provided expression:

//td[contains(text(),"Good bye")]

the first argument passed to contains() is a node-set containing more than one text node.

Per the XPath 1.0 specification, when a function that expects a string argument is passed a node-set instead, only the string value of the first node in the node-set (in document order) is used. (In XPath 2.0 this simply raises a type error.)

In this specific case, the first text node of the passed node-set has string value:

 "
                 Hello world "

so the comparison fails and the wanted td element isn't selected.
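The first-node rule can be imitated by hand in Python. The following is only a sketch that uses the standard-library xml.etree.ElementTree rather than a real XPath engine: it collects the td's direct text nodes (ElementTree stores them as .text and as the .tail of each child) and compares "first node only" against "any node". The names XML, text_nodes, first_only and any_node are illustrative and do not come from the question.

```python
import xml.etree.ElementTree as ET

# The first row of the document from the question.
XML = """<table>
  <tr>
    <td>
      Hello world <i> how are you? </i>
      Have a wonderful day.
      Good bye!
    </td>
  </tr>
</table>"""

td = ET.fromstring(XML).find(".//td")

# ElementTree keeps the td's direct text nodes as td.text (text before the
# first child) plus each child's .tail (text that follows that child).
text_nodes = [td.text or ""] + [child.tail or "" for child in td]

# Mimics contains(text(), 'Good bye'): only the first text node is examined.
first_only = "Good bye" in text_nodes[0]

# Mimics text()[contains(., 'Good bye')]: every text node is tested in turn.
any_node = any("Good bye" in t for t in text_nodes)

print(len(text_nodes), first_only, any_node)  # prints: 2 False True
```

The td has two text nodes, and "Good bye" sits in the second one, which is why the original expression never sees it.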

XSLT-based verification:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
  <xsl:copy-of select="//td[text()[contains(.,'Good bye')]]"/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied on the provided XML document:

<table>
      <tr>
        <td>
          Hello world <i> how are you? </i>
          Have a wonderful day.
          Good bye!
        </td>
      </tr>
      <tr>
        <td>
          Hello NJ <i>, how are you?
          Have a wonderful day.</i>
        </td>
      </tr>
</table>

the XPath expression is evaluated and the selected nodes (in this case just one) are copied to the output:

<td>
          Hello world <i> how are you? </i>
          Have a wonderful day.
          Good bye!
        </td>

Solution 2:

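Use . instead of text(), so that contains() tests the string value of the whole td element (the concatenation of all of its descendant text nodes):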

tdouthtml.xpath('//td[contains(.,"Good bye")]')
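The two solutions are not interchangeable, and a short comparison may help. The sketch below assumes the third-party lxml library (the question does not say which library tdouthtml comes from) and runs the original expression and both fixes against the sample document. The helper count() and the variable names are made up for illustration.

```python
from lxml import etree  # assumption: lxml is installed (pip install lxml)

XML = """<table>
  <tr>
    <td>
      Hello world <i> how are you? </i>
      Have a wonderful day.
      Good bye!
    </td>
  </tr>
  <tr>
    <td>
      Hello NJ <i>, how are you?
      Have a wonderful day.</i>
    </td>
  </tr>
</table>"""

root = etree.fromstring(XML)


def count(expr):
    """Return how many nodes the XPath expression selects."""
    return len(root.xpath(expr))


# Original expression: only the first text node of each td is checked.
broken = count('//td[contains(text(), "Good bye")]')

# Solution 1: each direct text-node child of td is checked on its own.
per_text_node = count('//td[text()[contains(., "Good bye")]]')

# Solution 2: the whole string value of td is checked, descendants included.
whole_value = count('//td[contains(., "Good bye")]')

# The two fixes differ when the phrase lives inside a child element such as <i>.
nested_per_text_node = count('//td[text()[contains(., "how are you")]]')
nested_whole_value = count('//td[contains(., "how are you")]')

print(broken, per_text_node, whole_value)
print(nested_per_text_node, nested_whole_value)
```

Solution 1 looks only at text that is a direct child of td, while Solution 2 also matches text nested in child elements such as i. Choose the one that fits the intended search.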
