Extracting Variables From JavaScript Inside HTML

I need all the lines that contain the text '.mp4'. The HTML file has no tag! My code: import urllib.request import demjson url = ('https://myurl') content = urllib.request.urlope

Solution 1:

You could use BeautifulSoup to extract the <script> tag, but you would still need an alternative approach to extract the information inside.
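
As a rough sketch of that first step, the block below collects the contents of every <script> element. It swaps in the standard library's html.parser in place of BeautifulSoup so that it runs without extra installs; the ScriptCollector class, the sample page, and the example.com URL are all made up for illustration and are not taken from the original page.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collects the text content of every <script> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Script content may arrive in several chunks, so append to the last entry.
        if self.in_script:
            self.scripts[-1] += data


# Hypothetical sample page; the variable name and URL are placeholders.
sample_html = """<html><body>
<script type="text/javascript">
var flashvars = { video_url: 'https://example.com/clip.mp4/' };
</script>
</body></html>"""

collector = ScriptCollector()
collector.feed(sample_html)
collector.close()
scripts = collector.scripts
print(scripts[0].strip())
```

This only yields the raw JavaScript text; the values inside it still have to be pulled out separately.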

A little Python string handling can first isolate the flashvars assignment, which is then passed to demjson to convert the JavaScript object literal into a Python dictionary. For example:

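import demjson

# The page source; the body of the <script> block is elided here.
content = """<script type="text/javascript">/* <![CDATA[ */
...
...
</script>"""

# Take everything after the assignment to flashvars ...
script_var = content.split('var flashvars = ')[1]
# ... and cut it off at the first '};', keeping the closing brace.
script_var = script_var[:script_var.find('};') + 1]
# demjson's default non-strict mode accepts JavaScript object literals.
data = demjson.decode(script_var)

print(data['video_url'])
print(data['video_alt_url'])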

This would then display:

https://www.ptrex.com/get_file/4/996a9088fdf801992d24457cd51469f3f7aaaee6a0/33000/33247/33247.mp4/
https://www.ptrex.com/get_file/4/774833c428771edee2cf401ef2264e746a06f9f370/33000/33247/33247_720p.mp4/
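
If only the '.mp4' entries are needed, as the question asks, a regular expression over the raw page text may be enough and avoids decoding the whole JavaScript object. The following is a minimal sketch under that assumption; the sample fragment, its keys, and the example.com URLs are placeholders invented for illustration, and the pattern assumes the URLs sit inside quoted strings that contain no nested quotes.

```python
import re

# Hypothetical page fragment; keys and URLs are placeholders, not the real page.
sample = """<script type="text/javascript">
var flashvars = {
    video_id: '123',
    video_url: 'https://example.com/get_file/1/abc/123.mp4/',
    video_alt_url: 'https://example.com/get_file/1/def/123_720p.mp4/',
    preview_url: 'https://example.com/preview/123.jpg'
};
</script>"""

# Match a single- or double-quoted string whose contents include ".mp4".
MP4_PATTERN = re.compile(r"""(['"])([^'"]*\.mp4[^'"]*)\1""")

# The quoted values themselves, without the surrounding quotes.
mp4_urls = [match.group(2) for match in MP4_PATTERN.finditer(sample)]

# Whole source lines that mention ".mp4".
mp4_lines = [line.strip() for line in sample.splitlines() if ".mp4" in line]

for url in mp4_urls:
    print(url)
```

The trade-off is that this approach ignores the structure of the script, so it cannot tell which key a URL belongs to; the demjson route keeps that information.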

demjson is an alternative JSON decoder that, unlike the standard json module, tolerates JavaScript-style input. It can be installed via pip:

pip install demjson