Extracting Variables From Javascript Inside Html
I need all the lines which contains the text '.mp4'. The Html file has no tag! My code: import urllib.request import demjson url = ('https://myurl') content = urllib.request.urlope
Solution 1:
You could use BeautifulSoup to extract the <script>
tag, but you would still need an alternative approach to extract the information inside.
Some Python can be used to first extract flashvars
and then pass this to demjson
to convert the Javascript dictionary into a Python one. For example:
import demjson
content = """<script type="text/javascript">/* <![CDATA[ */
...
...
</script>"""
script_var = content.split('var flashvars = ')[1]
script_var = script_var[:script_var.find('};') + 1]
data = demjson.decode(script_var)
print(data['video_url'])
print(data['video_alt_url'])
This would then display:
https://www.ptrex.com/get_file/4/996a9088fdf801992d24457cd51469f3f7aaaee6a0/33000/33247/33247.mp4/
https://www.ptrex.com/get_file/4/774833c428771edee2cf401ef2264e746a06f9f370/33000/33247/33247_720p.mp4/
demjson
is an alternative JSON decoder which can be installed via PIP
pip install demjson
Post a Comment for "Extracting Variables From Javascript Inside Html"