What is a spider?
A spider is an application that extracts data from a web page and converts it into a “more usable” format. Spiders are typically built for a specific web-site and purpose. Properly done, a spider will emulate a user’s behaviour whilst shielding the scraper’s true identity through proxy-servers.
They are also known as data collectors, data extractors, web crawlers, web scrapers and web-site rippers.
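As a rough illustration of "extracting data and converting it into a more usable format", the sketch below parses a small hard-coded HTML fragment into a list of records using only Python's standard library. The markup, the class names (product, name, price) and the helper extract_products are hypothetical and chosen for the example; a real spider is written against the target site's actual page structure and would also fetch pages over the network.

```python
from html.parser import HTMLParser


class ProductParser(HTMLParser):
    """Collects the text of elements whose class attribute names a field."""

    FIELDS = {"name", "price"}  # hypothetical field names for this example

    def __init__(self):
        super().__init__()
        self.records = []
        self._current_field = None

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class == "product":
            # A new product container starts a new record.
            self.records.append({})
        elif css_class in self.FIELDS and self.records:
            # Remember which field the next piece of text belongs to.
            self._current_field = css_class

    def handle_data(self, data):
        if self._current_field:
            self.records[-1][self._current_field] = data.strip()
            self._current_field = None


def extract_products(html):
    """Return one dict per product found in the HTML string."""
    parser = ProductParser()
    parser.feed(html)
    return parser.records


# Hypothetical page fragment standing in for a downloaded web page.
SAMPLE_HTML = """
<div class="product"><span class="name">Widget</span><span class="price">$9.50</span></div>
<div class="product"><span class="name">Gadget</span><span class="price">$12.00</span></div>
"""

records = extract_products(SAMPLE_HTML)
```

Here records ends up as a list of dictionaries, one per product, which can then be written out as CSV, JSON or whatever format has been agreed.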
How much does it cost to write and run a spider?
It depends on numerous factors, including:
- Number of fields to be scraped
- Number of pages, and different page formats
- The complexity of the data manipulation and degree of manual intervention required
- Existence of anti-spider protection mechanisms, or limitations of the rate at which the site can be scraped
- One-off or ongoing project
As a ballpark indication, a very simple scrape costs around AUD $500 whilst a complicated, protected site with millions of pages can cost around AUD $5,000. Prices are discounted for multi-spider projects or long-term ongoing projects.
Fixed Price or Per-hour?
We offer both types of billing structures:
- Fixed price is typically used for simple, accurately specified projects where the job requirements are not expected to change over the life of the project.
- Per-Hour billing is preferred for complex projects, or those on an agile development path where requirements adapt in response to changed circumstances.
Do you offer volume discounts?
Yes. Multi-spider projects and long-term ongoing projects attract volume discounts.
Can I run the spiders from my own server?
Yes, you can. However, you may have to install additional software and obtain access to proxy-servers.
Typical workflow for a data scraping project?
- Contact us with an overview of your project requirements.
- We call (or Skype) you back to discuss your project and better understand your requirements.
- We will review the sites with your intentions in mind and then provide a written quote. This takes approx. 1 day.
- Sign a mutual non-disclosure agreement, if required.
- Once we receive your signed Letter of Engagement, we commence the development of the spider(s).
- Once completed, spiders are tested and reviewed. We also provide you with a sample data extract for your review and approval.
- Once approved, we run the spider, providing feedback at pre-agreed intervals.
- We complete any post-production data processing, including parsing, standardisation, normalisation and de-duplication.
- We submit the data to you in the agreed format.
- Once the deliverables are approved, we submit our invoice, which is due for payment within 14 days.
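The post-production step in the workflow above (standardisation and de-duplication) can be sketched roughly as follows. This is a minimal illustration only: the record layout (name and email fields), the cleaning rules and the choice of email as the de-duplication key are all assumptions made for the example, and the actual rules are agreed per project.

```python
def standardise(record):
    """Trim whitespace, collapse runs of spaces in the name, lower-case the email."""
    name = " ".join(record["name"].split())
    email = record["email"].strip().lower()
    return {"name": name, "email": email}


def deduplicate(records, key="email"):
    """Keep the first record seen for each key value, preserving order."""
    seen = set()
    unique = []
    for record in records:
        if record[key] not in seen:
            seen.add(record[key])
            unique.append(record)
    return unique


# Hypothetical raw scrape output containing one near-duplicate.
raw = [
    {"name": "  Jane   Doe ", "email": "Jane@Example.com"},
    {"name": "Jane Doe", "email": "jane@example.com "},
    {"name": "John Smith", "email": "john@example.com"},
]

clean = deduplicate([standardise(r) for r in raw])
```

Standardising before de-duplicating matters here: the two Jane Doe rows only compare as equal once whitespace and letter case have been cleaned up.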
Payment mechanisms?
Payment is by direct deposit into our bank account. PayPal is accepted; however, we add the (somewhat expensive) PayPal fees to your invoice. Please note that we do not have credit card facilities.
What are the payment terms?
Payments are expected within 14 days from the date of invoice. If payment has not been received within 30 days, a fee of 2% of the value of the invoice will be charged. This fee will be charged every 30 days until the invoice is paid in full.
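To make the late-fee arithmetic concrete, here is a small sketch of one possible reading of these terms. It assumes the 30 days are counted from the invoice date, that the first fee applies from day 31 with a further fee after each additional 30 days, and that each fee is 2% of the original invoice value (not compounding). The function name late_fee is hypothetical, and the invoice itself remains the authoritative statement of what is owed.

```python
from decimal import Decimal

LATE_FEE_RATE = Decimal("0.02")  # 2% of the invoice value per period
FEE_PERIOD_DAYS = 30             # a fee is charged for every 30 days unpaid


def late_fee(invoice_value, days_since_invoice):
    """Total late fees accrued, under the assumptions stated above."""
    # No fee up to and including day 30; one fee from day 31, two from day 61, ...
    periods = max(0, (days_since_invoice - 1) // FEE_PERIOD_DAYS)
    fee = Decimal(invoice_value) * LATE_FEE_RATE * periods
    return fee.quantize(Decimal("0.01"))
```

Under this reading, an AUD $5,000 invoice still unpaid on day 31 would attract a fee of $100, and $200 in total from day 61.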
Can I obtain the spider(s) source code?
Yes, we can provide you with the spider source code at no extra charge. However, you may need to install additional software, and/or obtain access to proxy-servers, before it can be used on your system.
How do I get the scraped data?
Data is provided in the agreed format by email, via a shared Dropbox folder, or uploaded directly to your Amazon S3 account. This includes any extracted images, PDFs and documents where required.
Do you provide a fully managed web harvesting solution?
Yes. We provide a fully managed solution in which we take care of the entire scraping process, so that you receive freshly harvested data without the fuss and hassle.
Do you provide additional data services?
Yes, we are a data-focussed company that provides a full suite of data analysis, data cleansing, data mining and data warehousing services.
Are you related to the Web Scraping Group?
Yes. The Web Scraping Group and the Data Scraping Group are divisions of the same Australian company, Net Assets (Australia) Pty Ltd t/a The Data Group. We have a couple of websites on the internet in order to optimise our traffic from Google. But behind the scenes, it’s the same team.
Is data scraping legal?
In certain situations data scraping is considered unethical or even illegal. Much depends on how it is done, the type of data extracted and for what purpose the extracted data will be used. Each data scraping project thus needs to be assessed on its own merits. If in doubt, please obtain your own legal advice.
We have been doing this since 2005 and have not heard of any legal problems from our clients arising from their use of the scraped data.
We value client confidentiality and discretion. We also scrape websites in a highly anonymous manner that is impossible to trace back to us (and you). We can covertly get the data for you, but what happens thereafter depends on what you do with it.
I’m concerned with the mistreatment of the people in your images
Mr Chimp, aka “The Boss”, would like to address your concerns and confirm that no humans were severely harmed in the making of this web site.