Analytics | Microsoft Open Source Blog

October 22, 2024 • 8 min read

Detect and react intelligently to changes in data with Drasi

This introductory post focuses on the core concepts of Drasi and its major components, such as Sources.

November 12, 2020 • 2 min read

Cloudera Data Platform’s integration with Azure delivers enterprise security and governance

Modern analytics and the resulting business insights unlock new opportunities to optimize company performance and open new revenue streams. Since these initiatives also heighten the need for greater security and governance of company data, Identity and Access Management (IAM) needs to be a foundational component of any corporate security plan that covers company data.

June 30, 2020 • 1 min read

Hyperspace, an indexing subsystem for Apache Spark™, is now open source

For Microsoft’s internal teams and external customers, we store datasets that span from a few GBs to 100s of PBs in our data lake. The scope of analytics on these datasets ranges from traditional batch-style queries (e.g., OLAP) to explorative “finding the needle in a haystack” type of queries (e.g., point-lookups, summarization).

June 23, 2020 • 2 min read

What’s new in SandDance 3

SandDance, the open source data visualization tool from Microsoft Research, is launching several new features in version 3. Facets on all chart types: we’ve added much more control to faceted data, and all chart types now have the Facet By column feature.

October 10, 2019 • 2 min read

Microsoft open sources SandDance, a visual data exploration tool

SandDance, the beloved data visualization tool from Microsoft Research, has been re-released as an open source project on GitHub. This new version of SandDance has been re-written from the ground up as an embeddable component that works with modern JavaScript toolchains.

August 13, 2019 • 8 min read

Trill 103: Ingress, Egress, and Trill’s notion of time

Congratulations! You’ve made it to the next installment of our overview of Trill, Microsoft’s open source streaming data engine. As noted in our previous posts about basic queries and joins, Trill is a temporal query processor. Trill works with data that has some intrinsic notion of time.

July 1, 2019 • 4 min read

AzureR now available: Create, manage, and monitor Azure services with R

AzureR, a family of packages that provides tools to manage Azure resources from the open source R language, is now available. If you code in Python, C#, Java or JavaScript, you already have a rich selection of SDKs to choose from to interact with Azure.

May 1, 2019 • 5 min read

Trill 102: Temporal Joins

This post is the second in a series introducing developers to the Trill streaming query engine, its programming model, and its capabilities. In the previous post, we introduced the concept of snapshot semantics for temporal query processing.

April 16, 2019 • 4 min read

Microsoft open sources Data Accelerator for Apache Spark

Welcome to Data Accelerator! Data Accelerator for Apache Spark simplifies streaming big data using Spark. Data Accelerator has been used for two years within Microsoft for processing streamed data across many internal deployments handling data volumes at Microsoft scale.

March 28, 2019 • 7 min read

Trill 101: how to add temporal queries to your applications

Last December, we released Trill, an open source .NET library designed to process one trillion events a day. Trill provides a temporal query language enabling you to embed real-time analytics in your own application. In this blog post, we spend some time introducing how to get started using Trill.

December 17, 2018 • 1 min read

Microsoft open sources Trill, a powerful query processor for analytics at incredible speeds

In today’s demanding business environment, processing massive amounts of data each millisecond is becoming a common business requirement. We are excited to announce that an internal Microsoft project known as Trill, named for processing “a trillion events per day,” is now being open sourced.

July 9, 2018 • 22 min read

How to process streams of data with Apache Kafka and Spark

Data is produced every second; it comes from millions of sources and is constantly growing. Have you ever thought about how much data you personally generate every day? Some of it is a direct result of our actions and activities, but that is not all of it.