Israel Karasek, Author at Kochava

What is SKAdNetwork, and how do you use it?
https://s34035.pcdn.co/blog/what-is-skadnetwork-and-how-do-you-use-it/
Tue, 02 Mar 2021 00:06:03 +0000

In this episode, we interview Vivian Watt, a product manager at Kochava who has been working intensely with teams to prepare for SKAdNetwork and the forthcoming changes with iOS 14.

You can learn more about available SKAdNetwork Solutions here.

In This Episode


Vivian Watt
Product Manager at Kochava

The Questions No One is Asking: Three Keys to Actionable Data
https://www.kochava.com/blog/questions-no-one-asking-three-keys-actionable-data/
Tue, 02 May 2017 17:39:41 +0000

A blue-tinted man reading a newspaper, symbolizing old data, and a green-tinted man reading a smartphone, symbolizing real-time data.

We live in a world with ever-expanding volumes of data, data types, and use cases for how data can become actionable. With each generation of datasets, a corresponding set of tools becomes available, but those tools often aren’t assembled holistically around how the data as a whole should be understood.

Any experienced marketer can identify the difference between a toolset created for the sake of tool creation and a product that is engineered holistically to empower the marketer. At Kochava, we talk a lot about how we build our platform with “interlocking” capabilities. This means that as new data sets become available, or new needs arise from our customers, we don’t simply bolt a new toolset on top of the system only to find we’ve created a Frankenstein’s monster over time. Instead, our platform exposes interlocking capabilities so that features across the platform can all take advantage of the data available. This is what enables Kochava to deliver an unfair advantage for our customers.

Designing a product that creates this unfair advantage means we are solving real world problems—not only in concept. For us, this is not just a goal but an evolving reality. What many of our customers don’t know is that when we’re brought in to help understand performance, deep-dive into data, help unpack potential fraud indicators, or even identify under- (or over-) performing segments, we use the same tools internally to analyze data that we provide directly to our customers, and they’re pretty great!

When you have true row-level data at your disposal, not only is there complete transparency in all your system decisions, but you can quickly innovate other products and offerings on top of that dataset.

Let’s unpack three essential questions you should ask about your data and what the answers should tell you about your measurement platform. Along the way, I’ll spell out some of the nerdy things you can do with your data inside the Kochava platform.

How fast can I access my data?

When I was younger, I spent a fair share of time playing the original “Age of Empires.” It’s a great example of engaging multiplayer built on rudimentary networking frameworks. At the time, partially because a certain 12-year-old couldn’t afford much of a PC, and partially because that same 12-year-old had set up his own home network, I had less-than-ideal latency while playing. This meant that I would move a swath of archers or build reinforcements to defend a nearby castle, only to realize, once the multiplayer engine caught up, that the enemy had flanked around the river and was now burning down a defenseless resource village.

We ask our customers to make sure we receive a real-time feed of their data, enabling us to make split-second decisions and empowering them to do the same through our products. This gives you the edge to make decisions each second, minute, or hour, instead of day or week. Programmatic or not, having a delayed picture of the world is not only blinding but infuriating.

Receiving real-time data in this manner is crucial for fighting fraud, which is becoming increasingly important. Keeping row-level data allows us to identify that n-factor that may be a significant fraud indicator but would be lost if we only kept aggregates. For instance, we have run fraud audits for non-customers who use other measurement providers. When we perform these audits, we identify which parameters, keys, header values, etc., when scored, pinpoint potential fraud indicators. If that data is not provided to us at row level, we only have what the measurement provider deemed “useful” at the time of ingestion. The result is that the world looks nice and fluffy because they aren’t keeping the correct metrics. Or, it looks deceptively like action is being taken against fraudulent traffic because that provider only keeps the metrics that “can” be used by an inferior fraud detection tool.

To say that “real time” is a buzzword is probably an understatement. But without true real-time data, you make delayed or blind decisions while trying to capture an audience with an ever-decreasing attention span, and you end up with false perceptions about acquired or reached users.

The working relationship between real-time and row-level data makes itself known in some of our real-time fraud abatement tools. Take Traffic Verification, a system that validates in real time that any click or impression meets a set of marketer-defined criteria. The criteria may include validation against a hot cache (a.k.a. the Global Fraud Blocklist) of known site IDs, IP addresses, and device IDs. The Blocklist consists of our curated list of bad actors, which are added manually, added programmatically (yes, an API is available), or flagged via the Fraud Console. As with all of our real-time decisions, we add the results and decision flags for anything that passes or fails verification to the common Kochava object.
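To make the idea concrete, here is a minimal sketch of a blocklist check that annotates a transaction with pass/fail flags. It is not Kochava's implementation: the `Blocklist` class, the `verify_traffic` function, and every field name are hypothetical, chosen only to illustrate the pattern described above.

```python
from dataclasses import dataclass, field


@dataclass
class Blocklist:
    """Hot cache of known-bad identifiers (all names here are hypothetical)."""
    site_ids: set = field(default_factory=set)
    ip_addresses: set = field(default_factory=set)
    device_ids: set = field(default_factory=set)


def verify_traffic(transaction: dict, blocklist: Blocklist) -> dict:
    """Return a copy of the transaction annotated with verification flags."""
    failed = []
    if transaction.get("site_id") in blocklist.site_ids:
        failed.append("site_id")
    if transaction.get("ip_address") in blocklist.ip_addresses:
        failed.append("ip_address")
    if transaction.get("device_id") in blocklist.device_ids:
        failed.append("device_id")

    annotated = dict(transaction)  # leave the incoming row untouched
    annotated["traffic_verified"] = not failed
    annotated["traffic_verification_failed_on"] = failed
    return annotated


blocklist = Blocklist(site_ids={"site-123"}, ip_addresses={"203.0.113.7"})
click = {
    "type": "click",
    "site_id": "site-123",
    "ip_address": "198.51.100.4",
    "device_id": "abc",
}
result = verify_traffic(click, blocklist)
print(result["traffic_verified"], result["traffic_verification_failed_on"])
```

Because the flags travel with the row instead of living in a separate report, any downstream consumer can filter or alert on them without a second lookup.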

Further, once we’ve made any decision, a series of elements is added to each transaction to denote exactly what action was taken on that traffic. Remember I said that building in this fashion allows us to properly interlock capabilities? As soon as we added this feature to the platform, our customers could write Kochava queries to alert them on any number of Traffic Verification states or indicators. What does this look like? Send me a text when a new site ID has a 50% increase in fraud detected in one hour. Or, send me a Slack message if a tracker has over 10% of its traffic fail verification in 24 hours. That’s the power of building on true row-level (and real-time) data.
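The second alert above (a tracker with over 10% of its traffic failing verification in 24 hours) could be expressed roughly like this. This is a plain-Python sketch over in-memory rows, not a Kochava query; the row fields `tracker`, `timestamp`, and `traffic_verified` are assumptions made for the example.

```python
from collections import Counter
from datetime import datetime, timedelta


def failure_rate_by_tracker(rows, now, window=timedelta(hours=24)):
    """Share of each tracker's traffic that failed verification inside the window."""
    total, failed = Counter(), Counter()
    for row in rows:
        if now - row["timestamp"] <= window:
            total[row["tracker"]] += 1
            if not row["traffic_verified"]:
                failed[row["tracker"]] += 1
    return {tracker: failed[tracker] / count for tracker, count in total.items()}


def trackers_to_alert(rows, now, threshold=0.10):
    """Trackers whose failure rate is strictly above the threshold."""
    rates = failure_rate_by_tracker(rows, now)
    return sorted(tracker for tracker, rate in rates.items() if rate > threshold)


now = datetime(2017, 5, 2, 12, 0)
recent = now - timedelta(hours=1)
rows = (
    # tracker "a": 2 of 10 recent rows failed (20%)
    [{"tracker": "a", "timestamp": recent, "traffic_verified": i >= 2} for i in range(10)]
    # tracker "b": 1 of 10 recent rows failed (exactly 10%, so not "over 10%")
    + [{"tracker": "b", "timestamp": recent, "traffic_verified": i >= 1} for i in range(10)]
    # an old failure for "b" that falls outside the 24-hour window
    + [{"tracker": "b", "timestamp": now - timedelta(days=2), "traffic_verified": False}]
)
print(trackers_to_alert(rows, now))
```

The alert delivery itself (text, Slack, and so on) is left out; the point is that the rule is only computable because each row carries its own verification result.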

How much of my data is available to access?

The short answer? All of it. To make our data immediately available to customers or internal downstream services in new and developing ways, we’ve created a common structure among our ingestion services. This allows all our customers to plug directly into our core processing engine as though they had internal access to our processing pipelines. It means that our ingestion service can accept a native Kochava object that represents a real-time transaction processing through our system. It takes transparency to an entirely new level.

This value is demonstrated as we introduce percent-weighted waterfalls for impression tracking of viewed videos. This configurable setting allows you to weight impressions, not just by the last impression but also across the waterfall based on the percentage of video completed. This inter-waterfall weighting model is new to Kochava but will be immediately available to any of our consuming services, and even our customers subscribing to our common object via our postback engine. What’s beautiful about this is that because we also adhere to this common module within our attribution modules, we’ve automatically leveled the landscape among SAN and Kochava network partners.

{
  "attribution_influence": "priority_imp_progress",
  "video_progress": 0.7
}
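One way to picture progress-based weighting is a selection rule that prefers the impression with the most video completed and falls back to recency on ties. This is an illustrative assumption, not the documented Kochava model: only the `video_progress` field comes from the object above, while `pick_winning_impression`, `network`, and `timestamp` are hypothetical.

```python
def pick_winning_impression(impressions):
    """Pick the impression with the most video completed; ties go to the latest one."""
    if not impressions:
        return None
    return max(
        impressions,
        key=lambda imp: (imp.get("video_progress", 0.0), imp["timestamp"]),
    )


impressions = [
    {"network": "A", "timestamp": 100, "video_progress": 0.7},
    {"network": "B", "timestamp": 200, "video_progress": 0.25},
    {"network": "C", "timestamp": 300, "video_progress": 0.25},
]
winner = pick_winning_impression(impressions)
print(winner["network"])  # A; a pure last-impression rule would have picked C
```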

We’ve also innovated methods to present our customers with a familiar way to access their data in an ad-hoc fashion. One of our most recent forays into this arena resulted in the BI tool, Kochava Query. It’s a full SQL engine that allows access directly into every single (yes, no joke) transaction we process on behalf of customers within seconds of being received. This means you can custom-query your clicks, impressions, sessions and purchases from this massive processing warehouse. Like anything else we build, this isn’t a fragile service. Have billions of impressions per day? Select “from impressions.” Do it. I dare you!
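To give a feel for ad-hoc SQL over row-level data, here is a small sketch that uses Python's built-in `sqlite3` module as a stand-in. It is not Kochava Query, and the `impressions` table layout and column names are invented for the example.

```python
import sqlite3

# An in-memory database standing in for a row-level transaction store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE impressions (site_id TEXT, device_id TEXT, received_at TEXT)"
)
conn.executemany(
    "INSERT INTO impressions VALUES (?, ?, ?)",
    [
        ("site-1", "dev-a", "2017-05-02T17:00:01"),
        ("site-1", "dev-b", "2017-05-02T17:00:02"),
        ("site-2", "dev-a", "2017-05-02T17:00:03"),
    ],
)

# Because every row is kept, any grouping can be asked for after the fact.
per_site = conn.execute(
    "SELECT site_id, COUNT(*) AS impressions, COUNT(DISTINCT device_id) AS devices "
    "FROM impressions GROUP BY site_id ORDER BY site_id"
).fetchall()
print(per_site)
```

The same rows could just as easily be regrouped by device or by hour, which is exactly what an aggregate-only store cannot offer.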

Is my data really row level? And, what is “log level”?

Row-level data is key not only to ongoing transparency but also to any critical question that has to be answered or need that must be fulfilled at a moment’s notice. You’ll never see an attribution decision or picture of the world presented in aggregate that isn’t represented with row-level data underneath. We make no claim to have a visionary understanding of each customer’s goals, unique funnel characteristics, or future needs based on the data we’re processing. This is why it’s so important to maintain row-level data. You can pivot data for any question.

And then there’s log-level data.

This one makes me chuckle. I hear “We keep log-level data.” The part you don’t hear is, “No, you can’t access it on your own, and yes, it will take us a while to retrieve/parse/present it to you.” There’s a big difference between log-level and row-level. Let’s take a look at some log-level data (we keep both). This was pulled directly from one of our NGINX proxies.

A block of garbled data feedback considered

Now, I could write a parser to look through that log-level data, join it back to an internal user ID, and set up an ETL pipeline to ensure this data is in a data warehouse or feeds an internal service. But, the offensive part? You can’t analyze it. You can’t ask it questions. How log-level data is stored and made available is key to proving its value. As noted above, we’ve made row-level data directly actionable and attainable for our customers. Check out how much more useful structured, row-level data is.

An organized list of data considered
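The parser mentioned above might start out like the sketch below, which turns one line in NGINX's default "combined" access log format into a structured row. The sample log line is made up for illustration (it is not the data shown above), and `parse_line` and the output field names are hypothetical.

```python
import re
from urllib.parse import parse_qs, urlsplit

# NGINX "combined" format:
# $remote_addr - $remote_user [$time_local] "$request" $status
# $body_bytes_sent "$http_referer" "$http_user_agent"
COMBINED = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<uri>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line):
    """Turn one access-log line into a dict, or return None if it does not match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    row = match.groupdict()
    row["status"] = int(row["status"])
    row["body_bytes_sent"] = int(row["body_bytes_sent"])
    parts = urlsplit(row.pop("uri"))
    row["path"] = parts.path
    # Keep the first value of each query parameter as its own column.
    row["params"] = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return row


sample = (
    '203.0.113.7 - - [02/May/2017:17:39:41 +0000] '
    '"GET /click?campaign_id=demo&site_id=site-1 HTTP/1.1" 302 0 "-" "ExampleAgent/1.0"'
)
row = parse_line(sample)
print(row["path"], row["params"], row["status"])
```

Even with a parser like this, the output still has to be joined to user IDs and loaded somewhere queryable before anyone can ask it a question, which is the gap between log level and row level.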

A critical difference for marketers between actionable row-level and log-level data is install deduplication over large periods of time. We use key indicators (primarily device IDs but also custom persistent Kochava IDs) to prevent a device that may be unaware it has already reported its install (e.g., after a reinstall or device restore) from reporting again and creating another install profile. If this data is aggregated, we lose the ability to look up, over days, months, years, really for all time, which devices have installed and which identifiers were present on that first install. That lookup is what lets us build a history of a device or user moving among devices, restores, and reinstalls.

Simply understanding whether a device or user has installed an application doesn’t, in all scenarios, require a full lookup of device history. But populating caches like Redis or Memcached with billions of devices requires a historical store that can be used to repopulate them after an update or cache bust.
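A toy version of this deduplication, with a hot cache backed by a historical store, might look like the following. Everything here is an assumption for illustration: `InstallLedger` is a hypothetical class, a plain `dict` stands in for the durable row-level store, and a `set` stands in for a cache such as Redis.

```python
class InstallLedger:
    """Deduplicate installs by identifier, with a cache rebuilt from history."""

    def __init__(self):
        self.history = {}   # durable store: identifier -> first-install record
        self.cache = set()  # hot cache of identifiers already seen

    def record_install(self, identifiers, timestamp):
        """Return True for a first install, False for a reinstall or restore."""
        known = next(
            (i for i in identifiers if i in self.cache or i in self.history),
            None,
        )
        if known is None:
            record = {"first_seen": timestamp, "identifiers": []}
        else:
            record = self.history[known]
        # Link every identifier on this report to the original install record.
        for identifier in identifiers:
            if identifier not in record["identifiers"]:
                record["identifiers"].append(identifier)
            self.history[identifier] = record
            self.cache.add(identifier)
        return known is None

    def rebuild_cache(self):
        """Repopulate the hot cache from the historical store."""
        self.cache = set(self.history)


ledger = InstallLedger()
print(ledger.record_install(["device-1", "kochava-9"], timestamp=1))  # first install
ledger.cache.clear()  # simulate a cache bust
# A restored device reports a new device ID but the same persistent ID.
print(ledger.record_install(["device-2", "kochava-9"], timestamp=2))
print(ledger.history["device-2"]["first_seen"])
```

The second report is recognized as a repeat only because the historical store survived the cache bust; with aggregates alone there would be nothing to rebuild from.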

In the end, we at Kochava live and breathe data. We think in row level, we dream in real time. The ability to peer inside of a data feed, look past the noise, and ask that data questions is incredibly powerful, if the right tools are used.


About the Author

Eric Mann, Director of Product Engineering

Eric Mann is the Director of Product Engineering at Kochava, where he spends his days both hands-on with the codebase and alongside fellow developers to architect, build, and sustain Kochava products and services. In addition to his role inside of Product/Development, Eric enjoys working directly with clients on implementations of the Kochava platform.

The Business of Real Time
https://www.kochava.com/blog/building-real-time-system-measurement/
Mon, 24 Apr 2017 17:57:44 +0000

Eric Mann, Director of Product Engineering at Kochava, authors “The Code We Live By,” a new blog series that gives us a behind-the-scenes look at what makes a real-time measurement data provider work smoothly.

This first post introduces us to the complex world of recording data globally in the most efficient manner. Mann and his team have designed a fault-tolerant system that constantly improves itself. As you can imagine, building an international data system is no small feat. It is the challenge of solving a million-piece jigsaw puzzle with multiple players within fractions of a second. Welcome to the world of Kochava developers.


a metronome that ticks in milliseconds where each tick represents a user action

There are certain challenges when building real-time systems that require detailed coordination and infrastructure definition when compared to other services in ad tech. In our world, we’re bound by a requirement of FIFO at the millisecond level. I liken this to putting together a million-piece puzzle with 100 colleagues where each colleague is required to attach the next piece while keeping in time with a metronome ticking every millisecond. Everyone still with me? Let’s continue!

Often the question of application architecture, even service architecture, starts with the definition of “real time.” I’ve heard tales of companies willing to take cloud-based solutions off the table, claiming that self-hosting hardware gives the advantage of resiliency at a lower cost. Others, alternatively, take self-hosted solutions off the table and hand their entire system over to a single provider.

We’ve architected Kochava with the best of both worlds. This time-correct puzzle, which we call real-time attribution, is assembled correctly and in the right order on infrastructure that is spread across the globe with cloud-based providers and co-located in data center hubs on our own hardware. Let’s unpack some advantages and considerations of building a high-volume system with this hybrid approach.

Data ingestion on a global scale

“The cloud” has become a buzzword for either the solution to or cause of (depending on your perspective) many issues with scaling tech infrastructure. However, it’s important to remember that “the cloud” is just somebody else’s computer. Don’t be tricked into thinking that either a) cloud computing is better or, b) on-premise computing is more flexible. Both can be true, both can be false. It’s kind of like the choice between renting or buying a home. If you buy, you are the master of your destiny, but you’re also responsible for all maintenance and upkeep costs. If you rent, the infrastructure is someone else’s problem, but you probably can’t knock down the wall between the kitchen and living room. Make sure you’re not creating a false dichotomy when considering different architectural approaches.

Using a cloud provider with a private backhaul, such as Google, allows us to house clusters of ingestion systems, all automatically deployed, in different regions of the globe, all sitting on Google’s privately owned (and pretty quick) fiber. For example, under normal load, we float around 65 ingestion point instances, regionally distributed, primarily via Google Compute Engine.

Why is this important?

  • Lower SDK network footprint
  • Snappy click redirects
  • Reduced latency for our partners
  • Global network redundancy

Picture, for instance, that we decided to use a single datacenter (or region) as our ingestion center for global traffic. I’ve taken an average sample of the latency profile (how long a system takes to respond) for click redirects and/or SDK communication in this scenario, which I’ve set up in Frankfurt.

[64 bytes from 178.162.216.219: ttl=40 time=283.805 ms]

Let’s run the same test but instead resolve the Kochava ingestion system.

[64 bytes from 107.178.254.148: ttl=54 time=16.572 ms]

Going from 283 ms to 16 ms is a latency improvement of roughly 17x. This is made possible because our endpoint auto-resolves to the lowest-latency ingestion point. In this scenario, it resolved to one of our ingestion points in Oregon. At scale, this creates a measurable impact, not only on user experiences like click redirection but also on our SDK’s network footprint and on external partner communication latency. For our customers using our S2S offerings, having an endpoint respond 17x faster also saves significant infrastructure, as applications are able to complete a request to Kochava and move on to their next task.
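For anyone who wants to check the arithmetic, the ratio can be computed straight from the two ping lines quoted above. The helper name `ping_ms` is just for this example.

```python
import re


def ping_ms(line):
    """Extract the round-trip time in milliseconds from one line of ping output."""
    match = re.search(r"time=([\d.]+) ms", line)
    if match is None:
        raise ValueError(f"no round-trip time in: {line!r}")
    return float(match.group(1))


single_region = ping_ms("64 bytes from 178.162.216.219: ttl=40 time=283.805 ms")
nearest_point = ping_ms("64 bytes from 107.178.254.148: ttl=54 time=16.572 ms")

speedup = single_region / nearest_point
saved = single_region - nearest_point
print(f"{speedup:.1f}x faster, {saved:.3f} ms saved per round trip")
```

A single ping sample is only indicative; a fair comparison would average many samples from several client locations.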

This model of geographic load balancing becomes even more advantageous to Kochava customers when there is a large-scale internet disruption in a certain region or with certain backhaul providers. Our load balancers automatically reroute traffic to compute instances in regions that are operating normally, allowing an uninterrupted traffic flow and a maintained level of latency. Interrupted responses have impacts beyond click redirects: more network latency means potential SDK interruption and external partner impact. We’ve packaged internal queues and retry logic into our SDKs for scenarios when disruption is unavoidable. Minimizing the opportunity for traffic interruptions is critical to maintaining top-notch syndication to partner systems and real-time decisions in ad buys.
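The rerouting behavior can be pictured with a simplified selection rule: send traffic to the lowest-latency ingestion point that is currently healthy. This is only a sketch of the idea, not how Kochava's load balancers are implemented; the region names and latency figures are illustrative, and `choose_ingestion_point` is a hypothetical function.

```python
def choose_ingestion_point(latencies_ms, healthy):
    """Pick the healthy ingestion point with the lowest measured latency."""
    candidates = {
        name: ms for name, ms in latencies_ms.items() if name in healthy
    }
    if not candidates:
        raise RuntimeError("no healthy ingestion point available")
    return min(candidates, key=candidates.get)


# Illustrative latencies as seen from one client location.
latencies = {"oregon": 16.6, "iowa": 48.0, "frankfurt": 283.8}

normal = choose_ingestion_point(latencies, healthy={"oregon", "iowa", "frankfurt"})
# During a regional disruption, the nearest point drops out of the healthy set.
degraded = choose_ingestion_point(latencies, healthy={"iowa", "frankfurt"})
print(normal, degraded)
```

Traffic still flows in the degraded case, just at the next-best latency, which is the outcome the paragraph above describes.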

Variable traffic throttling

A challenge of any distributed computing system is how each instance should react when there are interruptions in peered systems across multiple datacenters. We’ve designed our ingestion system to be aware of network connectivity issues that may be sending larger swaths of traffic to certain regions, and to adjust traffic processing so that we maintain our processing order across regions.

Each of our ingestion points communicates with all peer instances by sending an encrypted representation of its entire processing state ten times per second. This state includes things such as the size of each traffic queue, a representation of time-sorted traffic, network latency, incoming traffic volumes, and over 100 additional data points that keep these systems in sync across the globe. The result is that at any time, each ingestion point knows how to behave in relation to all other ingestion peers.

With 65 ingestion points each reporting over 100 data points every 100 milliseconds, our incoming traffic queue profile is evaluated globally over 65,000 times per second for FIFO consistency. The result? We clear each second of traffic before moving on to the next, across the globe. These ingestion systems unfold the traffic into our core processing systems for a true FIFO result and help maintain strong data consistency.
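Two things in this paragraph are easy to sketch. First, one reading of the arithmetic: 65 ingestion points times 100 data points times 10 exchanges per second gives the 65,000 figure. Second, the FIFO idea itself: if each ingestion point holds a queue already sorted by timestamp, a k-way merge yields one globally ordered stream. The merge below uses Python's `heapq.merge` purely as an illustration; the event tuples and region names are made up, and this is not a description of Kochava's actual pipeline.

```python
import heapq

INGESTION_POINTS = 65
DATA_POINTS_PER_STATE = 100   # the text says "over 100"
EXCHANGES_PER_SECOND = 10     # one state exchange every 100 ms

evaluations_per_second = (
    INGESTION_POINTS * DATA_POINTS_PER_STATE * EXCHANGES_PER_SECOND
)


def merge_fifo(*queues):
    """Merge per-point queues, each sorted by timestamp, into one FIFO stream."""
    return list(heapq.merge(*queues, key=lambda event: event[0]))


# (timestamp in ms, event) pairs as each ingestion point received them.
oregon = [(1000, "click-a"), (1004, "install-c")]
frankfurt = [(1001, "click-b"), (1003, "impression-d")]

ordered = merge_fifo(oregon, frankfurt)
print(evaluations_per_second)
print(ordered)
```

The merge only works if every input queue is itself time-sorted and no point is lagging behind, which is why the peers need to share their queue state so frequently.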

Commodity hardware and systems: Processing billions of records

Over the past two years, we’ve moved away from platforms and languages such as Node.js and PHP (and a few others) to Go. Go has allowed our team to create a rich repository of lightning-fast applications and packages for quick and agile development of new services. What used to take 100+ cores to run, we can now optimize down to running on the equivalent of a couple of Raspberry Pis.

Take our Global Fraud Blocklist, a new system at Kochava that processes billions of real-time transactions and validates them against our library of known fraud metrics and blocklists. During a normal traffic load, this system runs on just a fraction of the cores it would have previously required and scales up to 100x capacity at a moment’s notice.

These types of optimizations allow us to set up commodity clusters of machines to run dockerized instances of our application stack, either behind a dynamic proxy or as a consumer service, any of which scale with the traffic load. Our applications run with a command as simple as “docker run traffic-verification” and scale with a simple POST request to, or a click in, our clustering service.

The majority of our data processing (once external ingestion has been completed by our ingestion cloud) is done in a co-located data center. These clusters are an extremely cost-effective way to process cloud-scale transactions on owned hardware. Our move to containerization and a “run local” model means we can move these services to any machine, cloud or owned, at a moment’s notice or in failover scenarios.

We believe strongly that continuously improving our system is critical to maintaining our technological advantage over others in the space. It ensures that our customers are interacting with state-of-the-art tech—data that’s valid and consistent against an ever-changing industry—and systems that don’t buckle under pressure, even as new marketers move to mobile at a global scale.
