How does this happen?
Let’s start with what a job board is and how it operates. Job boards make a profit by promoting employers’ job postings on their websites. The sites are generally free for job seekers, who flock to the boards by the thousands to fire off applications as easily and broadly as possible. The more open jobs these sites appear to offer, the more applicants they attract, and a larger applicant pool, in turn, secures more investment from hiring managers.
This feedback loop generates unintended consequences. When it comes to data pollution, three misbegotten offspring of the job boards skip to the front of the line: fraud, duplication and expired listings.
There’s no bigger fan of the low barrier to posting on job boards than the con artist. Shielded by the apparent legitimacy of good company (Fortune 100 companies post to job boards, as do most reputable companies), fraudsters spin out B.S. postings to phish for personal information: credit card numbers, Social Security numbers, any crumbs of data that might lead to a successful theft from job seekers. Forbes reports that job fraud is on the rise, with 14 million people exposed to job scams in the first quarter of 2022 alone. These jobs obviously do not correspond to the real job market, but when providers sell job board data, they don’t separate the wheat from the chaff. Millions of “jobs” that were never intended to lead to a hire are thus folded into the data, making it look as though job listings don’t actually correspond to hires.
Duplicate jobs may not threaten job seekers as directly as fraud does, but they are by far the more pervasive threat to data quality at large, simply because there are so many of them. Millions of jobs are aggregated from other job boards and then syndicated to a network of other job sites or reposted elsewhere to drive traffic and satisfy the marketing priorities of hiring managers. This rapid multiplication of a single job into many duplicate listings gluts the data with false signals.
Another flaw in the job ad model is the duration of listings on job boards. Companies looking to advertise an opening often purchase ad space on job boards for a fixed term, say 30 days. Even if the job is filled on day 7, the listing continues to appear on boards for the remaining three weeks of the term. These expired jobs are yet another thorn in the side of accurate, timely data.
In a series of posts over the next few weeks, we’ll walk you through these sources of data pollution in greater depth. To give you a comprehensive view of the problem, we’ll explain how fake and duplicated jobs are generated and by whom, how these warped elements disrupt accurate analysis, the kinds of misguided conclusions they often point to and, crucially, how to detect and avoid the job board data trap.
It’s difficult enough to make actionable predictions about the ever-shifting job market when the dataset is pristine. Doing it with corrupt data is a shot in the dark. Better job data means better predictions. That’s why LinkUp never sources from job boards. Instead, it pulls jobs directly from over 60,000 company websites, across the country and around the globe, every day.