One of my ever-present fears is that somebody may think ill of the scraping that I seem to be involved in. I’m constantly getting worthy requests for scrapers that will examine more commercial things than Jobcentre Plus and its successors. These I’ve largely done, as the organisations do seem to be more-or-less engaged in what I’d regard as altruistic activities, even if they charge for them in some roundabout way. When they run these scrapers in production, I insist that they do it on their own Internet estate, rather than have this activity appear to come from ZOIS. Nevertheless, significant amounts of traffic come from ZOIS as I work up and test these things. I’m paranoid that somehow I’ll acquire some undeserved reputation and encounter some dubious firewall activity as a result. These fears had been unfounded, but now Jobserve, the veteran contractor jobs database, has blocked us. Or so it would appear. They’ve not seen fit to enlighten us and don’t respond to e-mails, and only one IP address appears to be affected. Of course I could be wrong and it may be no more than the misconfiguration of a router, or some such, but if there are others out there with similar problems I wouldn’t mind hearing from you.
What Might Have Gotten Up Jobserve’s Nose?
Might it be the RSS pre-fetch system?
Although useful in its own right, Jobserve’s raw key-word search never really satisfied me. I therefore set up my own weighted system that allowed a vacancy to be scored. The system was sophisticated enough to spot the over-use of key-words and to score other dubious activity negatively. The data in the original mail-list could be used as feedstock for this. Lately, however, the mail-list became unreliable and I’d taken to downloading jobs found on an RSS feed. The jobs are downloaded only once and I try to keep the impact of this on the Jobserve web-site as low as possible. The program that does the downloading is written in Perl, uses LWP and advertises itself in the User-Agent string, with an invitation to get in touch with me if anybody is unhappy about this. The vacancy details went into a database which, in ancient times, was a place for all such things and was private, for it was meant to give me an edge over would-be competitors.
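For illustration only, here is a minimal Python sketch of that kind of weighted scoring. It is not the original Perl, and the keywords, weights, repeat limit and penalty are all made-up assumptions: each keyword that appears earns its weight once, and a keyword repeated too often is treated as stuffing and scored negatively.

```python
import re
from collections import Counter

# Hypothetical keywords and weights; the real system's values are not published.
WEIGHTS = {"perl": 3.0, "unix": 2.0, "sql": 1.0}


def score_vacancy(text, weights=WEIGHTS, max_repeats=3, stuffing_penalty=5.0):
    """Score a vacancy description against a table of weighted keywords.

    A keyword earns its weight once, however often it appears. If it
    appears more than max_repeats times it is assumed to be keyword
    stuffing and stuffing_penalty is subtracted.
    """
    words = Counter(re.findall(r"[a-z0-9+#]+", text.lower()))
    score = 0.0
    for keyword, weight in weights.items():
        count = words[keyword]
        if count == 0:
            continue
        score += weight
        if count > max_repeats:
            score -= stuffing_penalty
    return score
```

A vacancy that mentions each keyword sensibly scores the sum of the weights, while one that merely repeats "perl" five times ends up below zero.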
This database sort-of still exists. These days it’s down to just Jobserve and it concentrates on UK IT jobs I can do. The downloads are kept in the database, so we don’t ask for the same stuff twice, and are indexed and subjected to that weighted scoring system. Those that rate highly are brought to my attention by e-mail and an internal-only web-site. It also used to allow a little trend spotting, and the example I always gave was that it indicated which Java framework was the most popular and needed to be learnt. And then it all got blocked.
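The "never ask twice" idea can be sketched in a few lines of Python. This is an assumption-laden illustration rather than the real Perl/LWP program: the table layout, the User-Agent text and the contact address are hypothetical, and SQLite stands in for whatever database is actually used.

```python
import sqlite3
import urllib.request

# Hypothetical User-Agent; the point is that it says who to contact.
USER_AGENT = "example-job-fetcher/0.1 (contact: webmaster@example.org)"


def open_store(path=":memory:"):
    """Open (or create) the store of vacancies already downloaded."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS vacancy (url TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    return db


def http_get(url):
    """Fetch a URL, announcing ourselves honestly in the User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_once(db, url, fetch=http_get):
    """Download url only if it is not already stored.

    Returns True if a download happened, False if we already had it.
    """
    if db.execute("SELECT 1 FROM vacancy WHERE url = ?", (url,)).fetchone():
        return False
    body = fetch(url)
    db.execute("INSERT INTO vacancy (url, body) VALUES (?, ?)", (url, body))
    db.commit()
    return True
```

Passing the fetch function in as a parameter means the logic can be exercised without touching anybody's web-site.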
Although I’m on a health sabbatical, I wouldn’t mind doing stuff again. This would be part-time and limited in geography. There’d be an appropriate compromise on compensation. So, when a job appears that I can do, frankly in my sleep, it gets an ‘interest’ letter to say I’d be up for it, if a little less full-time than they’d like. The silence is deafening, but it keeps my hand in and I can show that Barkis is willing, even if Clara is not.
So far investigations have been confined to scanning port ranges with nmap(1), and using traceroute(8) and similar tools to investigate the problem lightly. I’ve also asked acquaintances on the Internet at large to see what they can see.
Since the beginning of the year, when we changed Internet Service Provider (ISP), we’ve been running just about everything off a single IP address. This fixed address has a sensible and believable set of names in the Domain Name System (DNS), which you can look up using a whois(1) command. Its number is 188.8.131.52. Nmap suggests that all the addresses in Jobserve’s IP address range 184.108.40.206/24 are blocked, including port 25 on machines which advertise themselves as Mail Exchange machines, and port 80 on web-servers. The ports do not reply to SYN-flagged packets, but rather simply drop the inbound connection. It would be more polite to return an RST packet, but firewalls tend not to do this. Dropping the packet leaves the machine that issued the original SYN sitting in a SYN-SENT state for that connection until it times out, tying up a small amount of resources. The behaviour is usually intentional, and was originally thought of as a way of making large naïve scans costly.
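The difference between a dropped SYN and a polite reset can be seen from an ordinary TCP connect, without nmap. The following Python sketch is only an approximation of what a scanner reports, and the labels and the timeout value are assumptions: a refused connection means a reset came back, while a timeout means nothing came back at all.

```python
import socket


def probe(host, port, timeout=3.0):
    """Classify a TCP port by attempting an ordinary connection.

    'open'        the three-way handshake completed
    'closed'      the far end answered with a reset (connection refused)
    'filtered'    nothing came back before the timeout, as with a
                  firewall that silently drops the SYN
    'unreachable' any other failure, such as a routing or name error
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "closed"
    except socket.timeout:
        return "filtered"
    except OSError:
        return "unreachable"
```

Run from a blocked address against a web-server, something like `probe("www.example.org", 80)` would be expected to sit there for the whole timeout and then report a filtered port, which is the small cost the dropping firewall imposes on the caller.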
Traceroute indicates that the failure is beyond 220.127.116.11.intl.telstra.net (18.104.22.168), which suggests that if this is indeed firewall activity then it’s most likely occurring at the edge of the Jobserve estate. All Jobserve’s Mail Exchange machines are similarly blocked, bar one, smtp.jobserve.us, which appears to be managed by Global Crossing, or at least that’s what the IP address suggests.
Use a Proxy
I can indeed get to the site through a number of proxies. A little light googling suggests that strange IP re-write rules are de rigueur at Jobserve. For example, if you’re in India with an Indian IP address and you try to look at jobserve.co.uk, you get jobserve.in, or whatever it is. Or so several ranty posts would have it. In these instances using a proxy is indicated too. I’ll leave it as an exercise for the reader to use a well-formed search term and find these things for themselves.
Since I can use proxies I can use Jobserve’s fairly useless UI and search, if I so wish. I’ve not checked the policy for these proxies, but my LWP code will run through a proxy. In fact it used to; I’ve Squid on the box in the cellar. So, if this firewalling was meant to stop me pre-emptively downloading jobs from an RSS feed, it was a pretty lame attempt. And there’s always the local wireless community.
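Pointing an HTTP client at a proxy is usually a one-line change. As a rough Python equivalent of what the LWP code would do, and assuming a Squid instance listening on its default port 3128 on the local machine (the address and the User-Agent text here are hypothetical):

```python
import urllib.request

# Assumed proxy location: Squid's default port on the local machine.
PROXY_URL = "http://127.0.0.1:3128"
# Hypothetical User-Agent carrying a contact address.
USER_AGENT = "example-job-fetcher/0.1 (contact: webmaster@example.org)"


def build_proxy_opener(proxy_url=PROXY_URL, user_agent=USER_AGENT):
    """Return an opener that sends HTTP and HTTPS requests via proxy_url."""
    proxy = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(proxy)
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


# Usage (not run here, since it needs a live proxy):
#   opener = build_proxy_opener()
#   with opener.open("http://www.example.org/feed.rss", timeout=30) as reply:
#       feed = reply.read()
```

Whether a given third-party proxy permits this sort of automated use is a separate question from whether it works.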
Would You Like Some of This?
Jobserve, from at least the turn of the century, has been a pretty crap place to find contracts. That I had to write some clever code to try and get around some of its short-comings says it all to me. Now that I’m only interested in short-term, part-time and pro-bono stuff, it’s time to stop. It does leave a legacy of code and data, and if you’re sufficiently interested drop me a line and I’ll share it with you. But be careful: you may hack Jobserve off.