Our Data

Extraordinary Lengths

Private company intelligence requires a unique approach to data. Given the thousands of data sources and their varying degrees of quality, SourceScrub built a system that combines advanced data acquisition technologies with a human approach to data quality. The result is the highest-quality data set available.

4-Dimensional Data

Our approach to data starts with a deep understanding of our customers' goals. We built a four-dimensional data model to help customers quickly find, research and connect with privately held companies. This purpose-built approach gives users exactly what they need to be successful.

The Company dimension captures core details about each company, such as employee count, revenue, job postings and website ranking.

The Sources dimension captures where companies appear on the web, including buyers' guides, best-of lists, conference attendance and industry associations.

(added Q1 2020)

The People dimension captures contact details and the professional backgrounds of the people associated with each company.

(coming Q4 2020)

The Investor dimension captures information on the investors behind each company, including transaction details, portfolio companies and deal history.

9 core signals

While there are hundreds of signals to choose from, we've built unique data processes around nine core signals. These signals are surfaced in our web platform as filters and pivots to accelerate your "time to insight".

Trended Data

Historical data capture

A unique advantage of the SourceScrub data set is that we've consistently captured data points over time. Customers can see trended employee counts, job postings, website traffic and other key signals.

Our data process

We've designed our system to optimize for private company intelligence. A combination of systems, processes and a team of over 450 analysts works around the clock to maximize accuracy and precision. This improves M&A pipeline management and provides a foundation for private equity deal sourcing.

Data Collection

Privately held company data is challenging to source and digitize. Crawlers and researchers work methodically across 50,000 web sources to find and ingest data into the SourceScrub database. Our data operations team is organized by data dimension to ensure deep knowledge of each data type.

Structuring the Data

With a complex data set spanning four dimensions and hundreds of data fields, normalizing and structuring the data is critical to getting the data model right. Structured data lets users make the right connections across companies and across verticals. We use the dimensional data model to link data across the four dimensions: companies, sources, investors and people.

Data Quality Operations

SourceScrub has built a world-class Data Operations team of over 450 data analysts who work 24/7 to normalize, edit and QA our data. It is the combination of web technologies and human editorial review that gives SourceScrub a unique advantage. Some of the quality processes we have in place include:

Hand-written company descriptions, which ensure an accurate understanding of each company and enable richer search and discovery.

Cross-referencing critical data points, such as employee counts, to ensure the most accurate representation of the data.

Outlier data QA: data from disparate sources is often inconsistent. Our human QA process cleans data in ways that machine learning cannot.

Delivering Data to Customers

Once our data is collected, organized and QA'd, customers can access it in whichever way suits them best. Through data exports, the web interface or API access, SourceScrub delivers the data the way you need it.

Transform Your Deal Team

Discover more opportunities and drive more prospect engagement. Level up your deal team with the most accurate data set available.
