Data Comparison in Data Capturing and its Obstacles

Data quality is a central pillar of digital and online marketing. We therefore strive for perfection and compare systems against each other for plausibility, so that we can be sure the data captured on a website is trustworthy and we can fully base our decisions on it. That data gives us the intelligence to guide marketing campaigns.

Often the comparison between systems is complex and the outcome is not plausible. This has several causes, among them the very complex structure of dependencies in client- and server-side code. In this article we take a deep dive into common issues and caveats to be aware of when comparing data, because we want to compare apples with apples.

What are the challenges?

Comparing online marketing and intelligence systems, such as tag managers, data capturing platforms, customer databases, etc., is a very tricky beast. For this article we assume that the compared systems are the JENTIS Data Capturing Platform and a secondary system that is not client-side based: a server or database that handles orders or other customer data and is accepted to be “complete”. By definition this is not a fair comparison, as we will see further on, but for the sake of completeness we deliberately pick the most challenging comparison.

Data capturing and tag management platforms generally rely on client-side JavaScript code, and JENTIS follows this approach as well. It creates rich data, but the method also has built-in caveats.

Dependencies and JavaScript Collisions

Dealing with dependencies and potential conflicts in JavaScript, especially in the context of tag management software that interacts with other third-party tools like consent management platforms, can introduce complex challenges. JavaScript environments on websites are highly dynamic and can be prone to collisions and unexpected behavior due to the inclusion of multiple scripts from different sources, managed by entirely different teams and organizations. Here are strategies to handle such scenarios effectively on your website, where JENTIS is implemented.

Be Aware of Dependencies

Essential Dependencies: By design JENTIS respects consent, through native integrations with all popular Consent Management Platform (CMP) solutions. This is one of many possible dependencies that are required for the data to be processed correctly. If the CMP fails to load or throws an error, the data captured with JENTIS is affected.

Handling Third-Party Script Conflicts: Web design and JavaScript form an open and collaborative system, which has the downside of possible conflicts. Each script can influence another when they share the global namespace or global functions. Conflicts can be mitigated, but there is no guarantee that a script will always execute correctly.

One answer to dependencies and conflicts is custom, independent monitoring. Tools like Sentry provide generic observation of whether the website's operation is impaired or whether errors arise in certain situations. These errors are monitored so that website administrators can follow up on them. However, Sentry is itself client-side JavaScript software and has the same limitations as any other software on the client side: it can be blocked by ITP/ETP or by ad-blocking browser software, as this monitoring might be considered tracking.

Browser Security Policies

Content Security Policy (CSP): Website owners might implement CSPs that restrict the sources from which scripts can be loaded. If a script’s (dynamically) generated paths or hosts don’t match the website’s CSP directives, the browser will block them.

The JENTIS JS code is hosted on the first-party domain; the host is static, but the path is dynamic. In rare cases this can produce CSP issues. Please use tools like https://csp-evaluator.withgoogle.com/ to identify possible issues.
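To illustrate why a dynamic path can clash with a CSP, the following minimal Python sketch checks a script URL against the `script-src` source list of a policy. It is a simplified approximation of the browser's matching rules, not a full CSP implementation: it handles only `'self'` and sources written with a scheme, and it ignores nonces, hashes, wildcards and `script-src-elem`. All hosts, paths and function names are hypothetical.

```python
from urllib.parse import urlsplit


def script_src_sources(csp_header):
    """Return the script-src source list, falling back to default-src."""
    directives = {}
    for part in csp_header.split(";"):
        tokens = part.split()
        if tokens:
            # The first occurrence of a directive wins; later duplicates are ignored.
            directives.setdefault(tokens[0].lower(), tokens[1:])
    return directives.get("script-src", directives.get("default-src", []))


def source_allows(source, script_url, page_origin):
    """Simplified check of one source expression against a script URL."""
    url = urlsplit(script_url)
    if source == "'self'":
        origin = urlsplit(page_origin)
        return (url.scheme, url.netloc) == (origin.scheme, origin.netloc)
    if "://" not in source:
        return False  # nonces, hashes, keywords, bare hosts: out of scope here
    src = urlsplit(source)
    if (src.scheme, src.netloc) != (url.scheme, url.netloc):
        return False
    if src.path in ("", "/"):
        return True  # no path restriction
    if src.path.endswith("/"):
        return url.path.startswith(src.path)  # directory: prefix match
    return url.path == src.path  # file: exact match


def is_script_allowed(csp_header, script_url, page_origin):
    sources = script_src_sources(csp_header)
    return any(source_allows(s, script_url, page_origin) for s in sources)


# Hypothetical policy that pins scripts on a tag host to one fixed directory.
policy = "default-src 'self'; script-src 'self' https://tags.example.com/static/"
page = "https://www.example.com"
print(is_script_allowed(policy, "https://tags.example.com/static/v1/app.js", page))
print(is_script_allowed(policy, "https://tags.example.com/x9f2/app.js", page))
```

In this sketch a policy that pins a fixed directory rejects a script served from a changing path, whereas a source without a path restriction would accept it.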

Same-origin Policy: While the JENTIS implementation approach minimizes cross-origin issues by using the first-party domain, drastic changes in path or subdomain could still occasionally trigger same-origin policy concerns, especially if the software attempts to interact with resources or APIs that expect a more stable origin. Possible implications must be evaluated individually by the website's backend and DevOps teams.

Privacy Tools and Browsers

Enhanced Tracking Prevention (ETP) and Intelligent Tracking Prevention (ITP): Browsers like Firefox and Safari have implemented sophisticated tracking protections that can limit the capabilities of cookies and storage, potentially affecting how a software like JENTIS operates or stores data.

Privacy-focused Browsers: Browsers such as Brave or certain configurations of other major browsers may block or restrict scripts they interpret as tracking or unnecessary, even if they're hosted on the first-party domain.

Use your analytics tool's capabilities to check whether certain browsers and versions look suspicious or are missing entirely. To do so, compare the data to independent sources (e.g. public analytics and statistics, or a server-side analytics system that evaluates server logs) that allow a comparison broken down by browser, operating system and version.
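Such a breakdown can be sketched in a few lines of Python. The example assumes you already have page-view counts per browser from both the client-side capture and an independent server-side source. The numbers, the function names and the tolerance of 0.15 are purely illustrative assumptions.

```python
import statistics


def capture_ratio_by_browser(client_counts, server_counts):
    """Share of server-side observations that also arrived client-side."""
    ratios = {}
    for browser, server_n in server_counts.items():
        if server_n > 0:
            ratios[browser] = client_counts.get(browser, 0) / server_n
    return ratios


def suspicious_browsers(ratios, tolerance=0.15):
    """Browsers whose capture ratio lies clearly below the median ratio."""
    baseline = statistics.median(ratios.values())
    return sorted(b for b, r in ratios.items() if r < baseline - tolerance)


# Hypothetical page-view counts for the same period and the same pages.
server = {"Chrome": 1000, "Safari": 400, "Edge": 300, "Firefox": 200, "Brave": 50}
client = {"Chrome": 950, "Safari": 372, "Edge": 285, "Firefox": 120}

ratios = capture_ratio_by_browser(client, server)
print(suspicious_browsers(ratios))
```

Comparing against the median rather than a fixed target keeps the check independent of the overall capture level, so only browsers that deviate from the rest stand out.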

Network-Level Blocking

Corporate Firewalls and Network Policies: Some organizations implement network-level blocking of certain types of traffic, including scripts that are not whitelisted or are detected as tracking or analytics tools.

ISP-Level Filtering: In rare cases, Internet Service Providers (ISPs) might have policies or filtering technologies in place that could impact the delivery or functionality of your software.

Both affect the network requests that are a mandatory component of data capturing with client-side JavaScript.

Browser Extensions

Apart from traditional ad-blockers, there are numerous browser extensions that users might install for privacy protection, script blocking, or more specific purposes like JavaScript control. These can interfere with or outright block a script from executing.

Browser Configuration and Features

Do Not Track (DNT) Settings: While not directly blocking scripts, browsers with DNT enabled might be used in conjunction with other tools or settings that limit tracking capabilities.

JavaScript Disabled: A small fraction of users might have JavaScript disabled in their browser settings for security or privacy reasons, which would prevent a client-side software from running.

Technical Issues and Compatibility

Browser Compatibility: With JENTIS we ensure that our script is compatible with all major browsers and their versions. However, we cannot rule out that browser updates introduce breaking changes, or that very outdated browsers interpret the code incorrectly because they do not support state-of-the-art syntax and methods.

Network Issues and Latency

DNS resolution problems, CDN outages, or other network-level issues could prevent a script from loading. Such problems can affect individual users as well as take the form of broader service outages. Either way, they impact the data captured client-side.

Slow Network Speeds: On mobile networks, especially in areas with poor coverage or during peak usage times, bandwidth can be severely limited. This can delay the loading of scripts and cause timeouts, particularly for larger JavaScript files or when trying to fetch resources from the network. Network requests are a building block of any data capturing infrastructure, as each event (from page_view to click) must generate a data stream from the user's device to the receiving endpoint (the server where JENTIS is hosted).
High Latency: High latency can delay the initiation of script requests, affecting the perceived load time of the website and the responsiveness of your data capturing software on your website.
Intermittent Connectivity: Mobile users might experience intermittent connectivity, where the connection drops and reconnects frequently. This could interrupt the loading or execution of scripts and potentially lead to incomplete or failed operations.

Browser and Device Limitations

Limited Processing Power: Mobile devices, especially older models, may have limited CPU or GPU power, which can slow down the execution of complex JavaScript operations, affecting the performance of the client-side software.
Memory Constraints: Devices with limited RAM might struggle to run memory-intensive scripts, leading to slow performance or, in extreme cases, crashing the tab or browser.
Battery Saving Modes: Many mobile devices enter battery saving modes that can limit background processing and reduce the priority of certain tasks, potentially affecting the operation of JavaScript applications.

Content Loading Strategy

Resource Competition: The client-side JENTIS script might be competing with other resources for bandwidth and browser processing time. On a congested network or a device with limited resources, this can lead to delays in execution.

How to overcome and identify gaps in data comparisons

We are now aware of a broad range of possible hindrances that make comparing data a very complex endeavor. With this in mind, we want to point out best practices for a comparison that sheds as much light as possible on gaps and deviations between the compared systems.

Data Collection Methodology

Understand Differences: Be aware of how each system collects data. For example, your tag management software might track user interactions in real-time on a website, while a BI system could be aggregating data from various sources, including offline sales or purchases made through different channels.

So make absolutely sure that the metric under comparison describes the exact same observation in both systems. Some purchases have time- or source-related limitations or dependencies, which is why a purchase can appear under one date in one system and under a different date in the other. Ensure that the data from both sources aligns in terms of the timing of events; differences in time zones or in the way timestamps are recorded can lead to discrepancies. The sources may also differ: one system may cover only the website, while the other sums app and web.
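The time zone effect is easy to reproduce. The following Python sketch normalizes ISO 8601 timestamps with an explicit UTC offset to UTC before grouping them by day. It assumes both systems can export timestamps in that format; the function names and sample values are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def utc_date(timestamp_iso):
    """Convert an ISO 8601 timestamp with UTC offset to its UTC calendar date."""
    parsed = datetime.fromisoformat(timestamp_iso)
    if parsed.tzinfo is None:
        # A timestamp without offset is ambiguous; refuse to guess.
        raise ValueError(f"timestamp has no UTC offset: {timestamp_iso}")
    return parsed.astimezone(timezone.utc).date().isoformat()


def orders_per_utc_day(timestamps):
    """Count events per UTC day."""
    return Counter(utc_date(ts) for ts in timestamps)


# The same two orders, recorded in local time (+01:00) by one system
# and in UTC by the other.
local_export = ["2024-03-02T00:30:00+01:00", "2024-03-02T09:00:00+01:00"]
utc_export = ["2024-03-01T23:30:00+00:00", "2024-03-02T08:00:00+00:00"]

print(orders_per_utc_day(local_export) == orders_per_utc_day(utc_export))
```

Grouped by the raw local date, the first export would report two orders on 2 March, while the second reports one order on each day. After normalization both exports agree.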

Data Processing and Integrity

Data Cleansing: Different systems may have varying levels of data cleansing and validation. Ensure that the data from both sources is processed to remove duplicates, correct errors, and validate entries to improve accuracy.
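As a minimal sketch of such a cleansing step, the following Python function removes duplicate purchase events before counting. It assumes every event carries an `order_id` field and that identifiers differing only in case or surrounding whitespace refer to the same order. Both are assumptions to verify against your own data.

```python
def deduplicate_orders(events):
    """Keep the first event per normalized order ID, preserving order."""
    seen = set()
    unique = []
    for event in events:
        key = event["order_id"].strip().upper()  # assumed normalization rule
        if key in seen:
            continue  # e.g. a reloaded confirmation page fired the event again
        seen.add(key)
        unique.append(event)
    return unique


# Hypothetical captured events; the second one is a repeat of the first.
captured = [
    {"order_id": "A-1001", "value": 59.90},
    {"order_id": "a-1001 ", "value": 59.90},
    {"order_id": "A-1002", "value": 12.00},
]
print(len(deduplicate_orders(captured)))
```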

Data that is captured with JENTIS and then reported in a tool such as GA4 is subject to that tool's own processing. GA4, for example, filters out data that it assumes to be spam or bot traffic.

Filtering and Segmentation: Ensure that the data segments you are comparing are equivalent. For instance, if one system filters out bot traffic or certain user demographics, the other system should apply the same filters.

Metric Definitions

Ensure Consistent Definitions: Different systems might define key metrics differently. For example, what one system considers a "purchase" might include all completed transactions, whereas another might only count transactions over a certain value or those that have been delivered and not returned.

Ideally, do not use metrics that rely on conditions or assumptions, but report observations 1:1 instead. Users and sessions are time-constrained metrics that are computed over a complete and finite dataset. This makes them poor metrics for comparisons, because the implications are complex and hard to trace back to exact technical specifications. Favor metrics that observe events one by one to reduce complexity.
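An event-by-event comparison can be as simple as comparing the sets of order IDs known to each system. The Python sketch below assumes both systems expose the same order identifier for the same period; the function name and sample IDs are hypothetical.

```python
def compare_order_ids(backend_ids, captured_ids):
    """Compare two systems order by order instead of by aggregated totals."""
    backend, captured = set(backend_ids), set(captured_ids)
    matched = backend & captured
    return {
        "matched": len(matched),
        "missing_in_capture": sorted(backend - captured),
        "unknown_to_backend": sorted(captured - backend),
        "capture_rate": len(matched) / len(backend) if backend else None,
    }


# Hypothetical order IDs for the same day from both systems.
backend_orders = ["A1", "A2", "A3", "A4"]
captured_orders = ["A1", "A3", "A4", "X9"]

print(compare_order_ids(backend_orders, captured_orders))
```

Unlike a comparison of totals, the lists of unmatched IDs give you concrete orders to investigate, for example by looking up the browser, consent state or time of day of the missing ones.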

Conversion Attribution: Understand how each system attributes conversions. Some systems may use last-click attribution, while others may use first-click or even multi-touch attribution models. These differences can significantly impact how sales or conversions are counted.
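To make the effect of the attribution model concrete, here is a small Python sketch that credits the same conversion under three common models. The channel names and the function are hypothetical, and real tools apply additional rules such as lookback windows that are not modelled here.

```python
def attribute(touchpoints, model):
    """Distribute one conversion across the channels of a customer journey."""
    if not touchpoints:
        return {}
    if model == "last_click":
        return {touchpoints[-1]: 1.0}
    if model == "first_click":
        return {touchpoints[0]: 1.0}
    if model == "linear":
        share = 1.0 / len(touchpoints)
        credit = {}
        for channel in touchpoints:
            credit[channel] = credit.get(channel, 0.0) + share
        return credit
    raise ValueError(f"unknown attribution model: {model}")


# One hypothetical journey, ordered from first to last touchpoint.
journey = ["newsletter", "paid_search", "direct"]
for model in ("last_click", "first_click", "linear"):
    print(model, attribute(journey, model))
```

The total number of conversions stays the same in all three cases, but the per-channel figures differ, which is why channel-level numbers from two systems only match if both use the same model.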

Sampling and Extrapolation

Be Aware of Sampling: Some BI tools use data sampling to process large datasets quickly, which can lead to approximations. Ensure that if one dataset is based on sampling, the comparison takes this into account.
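How much uncertainty sampling introduces can be estimated roughly. The Python sketch below scales a sampled count up to an estimated total and adds an approximate 95% interval. It assumes every event is included in the sample independently with the same probability and uses a normal approximation; actual BI tools may sample differently, so treat the interval as an indication only.

```python
import math


def estimate_from_sample(sample_count, sampling_rate, z=1.96):
    """Scale a sampled count to a total with an approximate interval.

    Assumes independent sampling of each event with probability
    `sampling_rate` and a normal approximation (z=1.96 for about 95%).
    """
    if not 0 < sampling_rate <= 1:
        raise ValueError("sampling_rate must be in (0, 1]")
    estimate = sample_count / sampling_rate
    std_error = math.sqrt(sample_count * (1 - sampling_rate)) / sampling_rate
    return estimate, estimate - z * std_error, estimate + z * std_error


# Hypothetical report: 100 purchases counted in a 10% sample.
estimate, low, high = estimate_from_sample(100, 0.10)
print(f"estimated total: {estimate:.0f} (roughly {low:.0f} to {high:.0f})")
```

If the figure from the unsampled system falls inside this interval, the difference between the two systems may be explained by sampling alone.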

Consider the Impact of Extrapolation: If any extrapolation or predictive modeling is used in generating the datasets, understand the assumptions and methodologies behind these processes to ensure they are compatible.