Node.js or Ruby for Scraping

[Category: Front end (JavaScript) | Posted by: 店小二03 | Date: 2018 | Author: 红领巾]

I am trying to make an application that requires a lot of data scraping from multiple websites. I tried scraping websites using Ruby, but gems such as Mechanize only seem to scrape static pages and not dynamic content. I have a couple of questions regarding which of these languages, or any other language, I should use for this project (I am considering using Node because quite a few elements in the application have to be in real time).

Is it possible to use Ruby and/or Node to scrape dynamic content? If so, which tools specifically should be used? If multiple users are going to be scraping from multiple sites, which language would you recommend? On a slightly unrelated note, is it possible to combine Node and Rails?

Thanks in advance!

Problem courtesy of: Karan Chitnis

Solution

You can use the Capybara gem to scrape JavaScript-driven sites from Ruby.

This has the advantage of being able to drive actual browsers such as Firefox, Chrome and IE through the Selenium driver. Alternatively, you can use headless browsers such as WebKit (via capybara-webkit) or PhantomJS (via Poltergeist).

When you use Capybara, just be sure to pick a JavaScript-enabled driver, such as Selenium or capybara-webkit. My driver of the day is Poltergeist.

The Capybara README has some instructions on how to use it with remote sites.
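
As a rough illustration of the approach above, here is a minimal sketch. It assumes the capybara and selenium-webdriver gems are installed along with a local browser that Selenium can drive. The helper names build_session and scrape_texts, and the URL and selector in the usage note, are made up for this example. The :selenium driver is used only because Capybara registers it by default; any other JavaScript-enabled driver should slot in the same way.

```ruby
# Sketch only: assumes the capybara and selenium-webdriver gems and a local browser.

# Build a session that drives a real browser against remote sites.
# (Hypothetical helper; the driver name is passed straight to Capybara.)
def build_session(driver = :selenium)
  require 'capybara' # loaded lazily so scrape_texts can be tried without the gem
  Capybara.run_server = false # we visit remote sites, not a local Rack app
  Capybara::Session.new(driver)
end

# Visit a page and return the text of every element matching a CSS selector.
# `session` only needs to respond to #visit and #all, as Capybara::Session does.
def scrape_texts(session, url, selector)
  session.visit(url)
  session.all(selector).map(&:text)
end
```

Something like scrape_texts(build_session, 'https://example.com/', 'h1') would then return the heading text after the browser has run the page's scripts. Content that renders late may need an explicit wait first, for example session.has_css?(selector), which retries until Capybara's wait time runs out.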

Node vs. Ruby is a very open-ended question. I am suggesting Ruby here because that is my experience and preference. "Combining" them could mean many things; they can be used in concert, each playing to its strengths.

Solution courtesy of: Daniel Evans

Discussion

When you say that Mechanize can't scrape dynamic content, you really mean that it's a little more work to figure out which AJAX requests need to be made and then make them. The other side of that is that once you do, you generally get a nice JSON response that's easy to deal with. Mechanize is also much faster than a full browser solution, so my opinion is that it's usually worth the extra work.
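
To make that concrete, here is a small sketch of calling such an endpoint directly. It swaps in Ruby's standard-library Net::HTTP for Mechanize so that it runs without extra gems; with Mechanize you would fetch the same URL through an agent and read the response body in much the same way. The endpoint URL, the "items" and "title" keys, and both helper names are assumptions for illustration; the real ones come from watching the browser's network tab.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Request the JSON that the page's own JavaScript would normally fetch.
# The URL is whatever you find in the browser's network tab (hypothetical here).
def fetch_ajax_json(url)
  uri = URI(url)
  request = Net::HTTP::Get.new(uri)
  request['Accept'] = 'application/json'
  request['X-Requested-With'] = 'XMLHttpRequest' # some servers check this header
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
  raise "HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)
  response.body
end

# Pull one field out of each record in the response.
# The {"items": [{"title": ...}]} layout is assumed, not taken from any real site.
def extract_titles(json_body)
  data = JSON.parse(json_body)
  data.fetch('items', []).map { |item| item['title'] }
end
```

A call such as extract_titles(fetch_ajax_json('https://example.com/api/items')) skips the browser entirely, which is where the speed advantage comes from.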

As far as Node goes, there's potential, and maybe once it's been around for a while some great libraries will become available, but I haven't seen anything yet that would make up for the Ruby things I would miss.

Discussion courtesy of: pguardiario

This recipe can be found in its original form on Stack Overflow.

When reposting, please credit the source: https://www.codesec.net/view/611786.html

