Documentation
Transparency in data sourcing and quality control is foundational to institutional trust. This page outlines how Signal Cat builds, validates, and delivers its datasets.
Data Sourcing Philosophy
All Signal Cat datasets are built from primary sources — direct collection from the systems and filings that generate the data. We never rely on third-party aggregators, data brokers, or re-syndicated feeds. Collecting at the source preserves provenance, avoids the quality issues intermediaries can introduce, and gives our clients confidence in the lineage of every record.
Point-in-Time Integrity
Every observation is recorded as-of the moment of collection. We do not retroactively revise historical data. When source data changes — a job posting is edited or removed, for example — the change is recorded as a new observation, preserving the full history. This ensures your backtests reflect information that was actually available at each point in time. No look-ahead bias.
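The append-only model above can be sketched as a log of observations queried with an as-of cutoff. This is an illustrative sketch, not Signal Cat's actual schema — the field names (`posting_id`, `observed_at`, `status`) and the records are hypothetical:

```python
from datetime import date

# Hypothetical observations for one job posting. Each row is recorded as-of
# its collection date; edits and removals append new rows, never overwrite.
observations = [
    {"posting_id": "J-100", "observed_at": date(2024, 1, 5), "status": "open",    "title": "Data Engineer"},
    {"posting_id": "J-100", "observed_at": date(2024, 2, 2), "status": "open",    "title": "Senior Data Engineer"},  # title edited at source
    {"posting_id": "J-100", "observed_at": date(2024, 3, 1), "status": "removed", "title": "Senior Data Engineer"},
]

def as_of(rows, cutoff):
    """Return the latest observation visible at `cutoff`, excluding
    anything collected afterward (i.e., no look-ahead)."""
    visible = [r for r in rows if r["observed_at"] <= cutoff]
    return max(visible, key=lambda r: r["observed_at"]) if visible else None

# A backtest dated mid-February sees the edited title, but not the removal.
snapshot = as_of(observations, date(2024, 2, 15))
```

Because history is append-only, the same query run today and a year from now returns identical results for any past cutoff.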
Entity Mapping
Companies are mapped to equity tickers using a combination of automated matching and manual review. We classify entities using BLS, GICS, and NAICS codes to support sector-level analysis. Ambiguous mappings are flagged for human review. Both public and private companies are covered where applicable.
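The automated half of that matching can be sketched with stdlib string similarity: map a raw company name to its nearest known entity, and flag low-confidence matches for human review. The entity table, threshold, and ticker values here are hypothetical; the production pipeline is more sophisticated:

```python
import difflib

# Hypothetical entity table: normalized legal name -> ticker.
KNOWN_ENTITIES = {
    "international business machines corp": "IBM",
    "microsoft corporation": "MSFT",
    "alphabet inc": "GOOGL",
}

REVIEW_THRESHOLD = 0.85  # below this similarity, route to manual review

def map_to_ticker(raw_name):
    """Return (ticker, status); ambiguous names come back as needs_review."""
    name = " ".join(raw_name.lower().split())
    # cutoff=0.0 so we always get the single best candidate to score
    best = difflib.get_close_matches(name, KNOWN_ENTITIES, n=1, cutoff=0.0)[0]
    score = difflib.SequenceMatcher(None, name, best).ratio()
    if score < REVIEW_THRESHOLD:
        return None, "needs_review"  # ambiguous: flag for human review
    return KNOWN_ENTITIES[best], "mapped"
```

The threshold trades coverage against precision: a higher value sends more names to reviewers but admits fewer wrong mappings into the dataset.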
Quality Assurance
Automated validation checks run on each collection cycle. Volume anomaly detection flags sudden drops or spikes for review. Schema consistency is enforced across all historical snapshots. Manual spot checks are conducted on a rotating sample to catch issues that automated systems may miss.
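The volume anomaly check described above can be sketched as a trailing-window comparison of record counts per collection cycle. The window size and tolerance are illustrative placeholders, not Signal Cat's production thresholds:

```python
def volume_anomalies(counts, window=5, tolerance=0.5):
    """Flag cycle indices whose record count deviates from the trailing
    `window`-cycle average by more than `tolerance` (0.5 = +/-50%)."""
    flagged = []
    for i in range(window, len(counts)):
        baseline = sum(counts[i - window:i]) / window
        if baseline and abs(counts[i] - baseline) / baseline > tolerance:
            flagged.append(i)
    return flagged
```

A sudden drop (a broken collector) and a sudden spike (duplicated records) both trip the same check, which is why flagged cycles go to review rather than being auto-corrected.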
Infrastructure & Operational Resilience
Signal Cat's data pipelines are fully automated and designed for institutional-grade reliability. Collection runs on dedicated infrastructure with monitoring, alerting, and automated recovery for source changes and collection failures. Redundancy and retry logic ensure consistency across collection cycles. All pipelines are version-controlled and tested through 30+ dbt models with automated data quality checks before delivery. The system is built to operate independently of any single person — uptime and data freshness are not dependent on manual intervention.
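Retry logic of the kind mentioned above is commonly implemented as exponential backoff around a collection step. This is a generic sketch, not Signal Cat's actual implementation; `fn`, the attempt count, and the delay are placeholders:

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0):
    """Run `fn`, retrying transient failures with exponential backoff
    (base_delay, 2*base_delay, 4*base_delay, ...)."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # exhausted retries: surface the failure for alerting
            time.sleep(base_delay * 2 ** attempt)
```

Failures that survive all retries are the ones that should page an operator; everything transient is absorbed without affecting the delivery schedule.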
Data Dictionaries
Each product includes a comprehensive data dictionary with field definitions, types, and example values. Schema previews are available on each product page.
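A data dictionary of this shape also supports programmatic validation on the client side. The entries below are hypothetical examples of the field-name/type/description/example structure, not an actual Signal Cat schema:

```python
# Hypothetical dictionary entries: field name -> (type, description, example)
DATA_DICTIONARY = {
    "posting_id":  (str,   "Stable identifier for a job posting", "J-100"),
    "observed_at": (str,   "Collection date, ISO 8601",           "2024-02-02"),
    "salary_min":  (float, "Lower bound of posted salary, USD",   85000.0),
}

def validate_record(record):
    """Return the names of fields whose values don't match the
    type declared in the data dictionary."""
    return [field for field, (typ, _desc, _example) in DATA_DICTIONARY.items()
            if field in record and not isinstance(record[field], typ)]
```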
Delivery & Integration
All Signal Cat products are available through three delivery methods:
Snowflake Data Share
Live data share — no ETL required. Query directly from your Snowflake account. Updates appear automatically at the product's refresh cadence. Setup requires only a Snowflake account and accepting the data share.
AWS S3 (Parquet/CSV)
Parquet or CSV files delivered to your S3 bucket via cross-account access. New files are written at each update cycle. Compatible with any data warehouse.
Flat File (CSV/JSON)
Downloadable CSV or JSON exports via secure link. Suitable for one-time analysis, academic research, or environments without cloud data warehouse access.
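Flat-file exports need only the standard library to load. The file contents below are hypothetical stand-ins for a downloaded export; a real file would come from the secure link:

```python
import csv
import io
import json

# Hypothetical CSV export content (in practice, read from the downloaded file).
csv_text = (
    "posting_id,observed_at,status\n"
    "J-100,2024-02-02,open\n"
    "J-101,2024-02-02,removed\n"
)

# Each CSV row becomes a dict keyed by the header fields.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# A JSON export of the same records round-trips to the same structure.
round_tripped = json.loads(json.dumps(rows))
```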
All delivery methods include schema change notifications and a full data dictionary.
MNPI & Compliance
All Signal Cat data is derived exclusively from publicly available sources. We do not source data from private or confidential channels, insider relationships, or any non-public information.
A Due Diligence Questionnaire (DDQ) is available upon request for compliance review. Contact us at data@signalcat.co for compliance-related inquiries.
For additional detail on methodology for a specific dataset, see the individual product pages or contact us directly.