Apache Kylin’s Performance Boost from Apache HBase
HBaseCon
Hongbin Ma and Luke Han (Kyligence)
Apache Kylin is an open source distributed analytics engine that provides a SQL interface and multi-dimensional analysis on Hadoop, supporting extremely large datasets. In the forthcoming Kylin release, we optimized query performance by exploiting the potential of parallel storage on top of HBase. This talk explains how that work was done.
Apache Kylin Extreme OLAP Engine for Big Data
Luke Han
This document provides an overview of Apache Kylin, an open source distributed analytics engine that provides SQL interface and multi-dimensional analysis (OLAP) on Hadoop supporting extremely large datasets. It discusses Kylin's features such as fast query performance, SQL interface, seamless integration with BI tools, and job management capabilities. It also describes Kylin's technical architecture including its use of MapReduce for cube building, storage of cubes in HBase, and routing of SQL queries to the query engine. The document outlines Kylin's roadmap including plans to improve cube building algorithms and support real-time analysis using streaming and Spark.
Apache Kylin is an open source distributed analytics engine that provides SQL interface and multi-dimensional analysis (OLAP) on extremely large datasets in Hadoop. It allows for interactive queries on datasets with billions of rows in seconds by pre-building a data cube. Kylin provides ANSI SQL support, seamless integration with BI tools, and scale-out architecture to support thousands of concurrent users. It aims to balance query performance and data storage size through techniques like partial cube modeling, dictionary encoding, and incremental cube builds.
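The idea of pre-building a data cube can be illustrated with a minimal Python sketch. This is not Kylin's implementation (per the summaries on this page, Kylin builds cubes with MapReduce and stores them in HBase); the function name `build_cube`, the sample rows, and the column names are all hypothetical. The sketch pre-aggregates a SUM measure for every combination of dimensions (each combination is one cuboid), so a query becomes a dictionary lookup instead of a scan. A full cube over n dimensions has 2^n cuboids, which is why techniques such as partial cube modeling matter for storage size.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(rows, dimensions, measure):
    """Pre-aggregate SUM(measure) for every subset of dimensions (every cuboid)."""
    cube = {}
    for k in range(len(dimensions) + 1):
        for cuboid in combinations(dimensions, k):
            totals = defaultdict(int)
            for row in rows:
                # The group-by key is the tuple of this row's values
                # for the dimensions in the current cuboid.
                key = tuple(row[d] for d in cuboid)
                totals[key] += row[measure]
            cube[cuboid] = dict(totals)
    return cube


# Hypothetical sample fact table.
rows = [
    {"region": "US", "category": "toys", "year": 2015, "sales": 10},
    {"region": "US", "category": "books", "year": 2015, "sales": 5},
    {"region": "EU", "category": "toys", "year": 2016, "sales": 7},
]
dimensions = ("region", "category", "year")
cube = build_cube(rows, dimensions, "sales")

# Answering "SELECT region, SUM(sales) ... GROUP BY region" is now a lookup.
sales_by_region = cube[("region",)]
us_toys_sales = cube[("region", "category")][("US", "toys")]
grand_total = cube[()][()]
```

In this toy version every cuboid is materialized; a partial cube would simply skip some of the dimension combinations and answer those queries from a larger cuboid instead.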
Apache Kylin - Balance between space and time - Hadoop Summit 2015
Debashis Saha
This document provides an overview of Apache Kylin, an open source distributed analytics engine that provides SQL interface and multi-dimensional analysis (OLAP) on Hadoop for extremely large datasets. The document discusses Kylin's features such as fast OLAP capabilities, integration with BI tools, job management and monitoring. It also covers Kylin's technical aspects including its architecture, cube building process, storage in HBase and query optimization techniques like building partial cubes and incremental builds. The presenters provide an update on Kylin's adoption and roadmap for further improvements.
Apache Kylin is an open source Distributed Analytics Engine from eBay Inc. that provides a SQL interface and multi-dimensional analysis (OLAP) on Hadoop, supporting extremely large datasets with sub-second query latency.
Talks about best practices and patterns on how to design an efficient cube in Kylin. Covers concepts like mandatory dimension, hierarchy dimension, derived dimension, incremental build, aggregation group etc.
This document provides an overview of Apache Kylin, an open source distributed analytics engine that provides SQL interface and multi-dimensional analysis (OLAP) capabilities on Hadoop for extremely large datasets. The presentation covers what Kylin is, its key features and technical highlights including performance, its roadmap, and concludes with a Q&A section.
Apache Kylin 2.0: From Classic OLAP to Real-Time Data Warehouse
Yang Li
Apache Kylin, which started as a big data OLAP engine, is reaching its v2.0. Yang Li explains how, armed with snowflake schema support, a full SQL interface, spark cubing, and the ability to consume real-time streaming data, Apache Kylin is closing the gap to becoming a real-time data warehouse.
Kylin is an open source Distributed Analytics Engine from eBay Inc. that provides SQL interface and multi-dimensional analysis (OLAP) on Hadoop supporting extremely large datasets
Kylin Open Source Web Site: http://kylin.io
This presentation provides an overview of installing and using Apache Kylin, an open source distributed analytics engine for extremely large datasets. It discusses prerequisites, downloading Kylin, starting the Kylin service, building a sample cube by loading sample sales data into Hive and HBase, and querying the cube using SQL through the Kylin Web interface. The presentation is intended for using Kylin on Hortonworks Sandbox HDP 2.2.
The document discusses new features in Apache Kylin, an open source distributed analytics engine. It introduces a plugin architecture that allows Kylin to use different engines, sources, and storage. A new MapReduce cube engine uses in-memory cubing to build cubes 1.5x faster. Parallel scans in the HBase storage improve query performance 2x. Streaming cubing enables near real-time analysis by ingesting from Kafka. User-defined aggregations and integrations with Excel, PowerBI and Zeppelin are also covered. The plugin-based design provides freedom, extensibility and flexibility to the system.
Apache Kylin - OLAP Cubes for SQL on Hadoop
Ted Dunning
Apache Kylin (incubating) is a new project to bring OLAP cubes to Hadoop. I walk through the project and describe how it works and how users see the project.
Apache Kylin on HBase: Extreme OLAP engine for big data
Shi Shao Feng
Apache Kylin is an open source Distributed Analytics Engine designed to provide SQL interface and multi-dimensional analysis (OLAP) on Hadoop/Spark supporting extremely large datasets.
A general introduction to Apache Kylin, covering background, business needs and technical challenges, theory and architecture, features, and some technical detail, followed by performance and benchmarks and, finally, ecosystem and roadmap.
For more detail, please visit http://kylin.io or follow @ApacheKylin.
1. Apache Kylin Deep Dive - Streaming and Plugin Architecture - Apache Kylin ...
Luke Han
Apache Kylin is an open source distributed analytics engine designed to provide SQL interface and multi-dimensional analysis (OLAP) on Hadoop supporting extremely large datasets. The presentation discusses Kylin's plugin architecture which allows it to integrate with different data engines and storage systems. It also describes Kylin's fast cubing approach which calculates the entire cube in one MapReduce job to minimize overhead compared to the earlier layered cubing approach. Streaming cubing is also discussed which allows incremental updates to the cube in micro-batches from data sources like Kafka.
This document discusses data cubes in Apache Hive. It provides background on Hive and why it is used at Inmobi for analytics. It describes how data cubes are modeled and stored in Hive, including facts, dimensions, and storage. Examples of cube queries in Hive Query Language (HQL) are shown. The document also introduces Grill, Inmobi's analytics platform that utilizes Hive and provides additional capabilities like query scheduling and multiple execution engines.
6. Apache Kylin Roadmap and Community - Apache Kylin Meetup @Shanghai
Luke Han
The document outlines the roadmap and community growth of Apache Kylin, an open source analytics platform. It discusses Kylin's evolution from an initial prototype in 2013 to adding features like streaming and real-time capabilities. The roadmap also details future plans for advanced OLAP functions, in-memory analysis, and more. Additionally, the document summarizes Kylin's expanding community through meetups, conferences, and new committers from various companies. It concludes by encouraging collaboration to advance the project.
1. Apache Kylin is an open source distributed analytics engine that provides SQL interface and multi-dimensional analysis (OLAP) on Hadoop, supporting extremely large datasets and sub-second response times.
2. It originated from a project at eBay and was later donated to the Apache Software Foundation. Major versions were released in 2014, 2015, and 2016 with new features like a plugin architecture, improved cube building engines, and support for additional BI tools.
3. The architecture is designed for flexibility and extensibility with pluggable components for sources, storage, and engines as well as a focus on performance improvements through techniques like parallel scanning of cubes.
The document discusses Apache Kylin, an open source distributed analytics engine that provides SQL interface and multi-dimensional analysis (OLAP) on Hadoop for extremely large datasets. It provides an overview of Kylin's features such as sub-second query latency, ANSI SQL support, and seamless integration with BI tools. The document also covers Kylin's architecture, cube storage in HBase, query processing using Calcite, and optimization techniques for cube building.
During Kylin OLAP development, we set up many engineering principles in the team. These principles are very important for delivering Kylin with high quality and on schedule.
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Xu Jiang
Kylin is an open source Distributed Analytics Engine from eBay Inc. that provides SQL interface and multi-dimensional analysis (OLAP) on Hadoop supporting extremely large datasets.
If you want to do multi-dimensional analysis on large data sets (a billion rows or more) with low query latency (sub-second), Kylin is a good option. Kylin also provides seamless integration with existing BI tools (e.g., Tableau).
Apache Kylin - Big Data Technology Conference 2014 Beijing
Luke Han
This document provides an overview of Apache Kylin, an open source distributed analytics engine from eBay that provides SQL interface and multi-dimensional analysis (OLAP) on Hadoop supporting extremely large datasets. Key points include: Kylin is designed to accelerate analytics queries on billions of rows of data on Hadoop; it provides ANSI SQL, full OLAP capability, and seamless integration with BI tools; and performance tests show it can return results over 100x faster than Hive for both high-level and drill-down queries.
This presentation contains the following slides:
Introduction To OLAP
Data Warehousing Architecture
The OLAP Cube
OLTP Vs. OLAP
Types Of OLAP
ROLAP vs. MOLAP
Benefits Of OLAP
Introduction - Apache Kylin
Kylin - Architecture
Kylin - Advantages and Limitations
Introduction - Druid
Druid - Architecture
Druid vs Apache Kylin
References
For any queries, contact us: argonauts007@gmail.com
The document discusses the MapR Big Data platform and Apache Drill. It provides an overview of MapR's M7 which makes HBase enterprise-grade by eliminating compactions and enabling a unified namespace. It also describes Apache Drill, an interactive query engine inspired by Google's Dremel that supports ad-hoc queries across different data sources at scale through its logical and physical query planning. The document demonstrates simple queries and provides details on contributing to and using Apache Drill.
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
Chester Chen
Machine Learning at the Limit
John Canny, UC Berkeley
How fast can machine learning and graph algorithms be? In "roofline" design, every kernel is driven toward the limits imposed by CPU, memory, network, etc. This can lead to dramatic improvements: BIDMach is a toolkit for machine learning that uses rooflined design and GPUs to achieve two to three orders of magnitude improvement over other toolkits on single machines. These speedups are larger than have been reported for *cluster* systems (e.g. Spark/MLlib, PowerGraph) running on hundreds of nodes, and BIDMach with a GPU outperforms these systems for most common machine learning tasks. For algorithms (e.g. graph algorithms) which do require cluster computing, we have developed a rooflined network primitive called "Kylix". We can show that Kylix approaches the roofline limits for sparse Allreduce, and empirically holds the record for distributed PageRank. Beyond rooflining, we believe there are great opportunities from deep algorithm/hardware codesign. Gibbs Sampling (GS) is a very general tool for inference, but is typically much slower than alternatives. SAME (State Augmentation for Marginal Estimation) is a variation of GS which was developed for marginal parameter estimation. We show that it has high parallelism and a fast GPU implementation. Using SAME, we developed a GS implementation of Latent Dirichlet Allocation whose running time is 100x faster than other samplers, and within 3x of the fastest symbolic methods. We are extending this approach to general graphical models, an area where there is currently a void of (practically) fast tools. It seems at least plausible that a general-purpose solution based on these techniques can closely approach the performance of custom algorithms.
Bio
John Canny is a professor in computer science at UC Berkeley. He is an ACM dissertation award winner and a Packard Fellow. He is currently a Data Science Senior Fellow in Berkeley's new Institute for Data Science and holds an INRIA (France) International Chair. Since 2002, he has been developing and deploying large-scale behavioral modeling systems. He designed and prototyped production systems for Overstock.com, Yahoo, eBay, Quantcast and Microsoft. He currently works on several applications of data mining for human learning (MOOCs and early language learning), health and well-being, and applications in the sciences.
Scio - Moving to Google Cloud, A Spotify Story
Neville Li
Talk at Philly ETE Apr 28 2017
We will talk about Spotify’s story of migrating our big data infrastructure to Google Cloud. Over the past year or so we moved away from maintaining our own 2500+ node Hadoop cluster to managed services in the cloud. We replaced two key components in our data processing stack, Hive and Scalding, with BigQuery and Scio and are able to iterate at a much faster speed. We will focus on the technical aspects of Scio, a Scala API for Apache Beam and Google Cloud Dataflow, and how it changed the way we process data.
Deep learning and streaming in Apache Spark 2.2 by Matei Zaharia
GoDataDriven
Matei Zaharia is an assistant professor of computer science at Stanford University, Chief Technologist and Co-founder of Databricks. He started the Spark project at UC Berkeley and continues to serve as its vice president at Apache. Matei also co-started the Apache Mesos project and is a committer on Apache Hadoop. Matei’s research work on datacenter systems was recognized through two Best Paper awards and the 2014 ACM Doctoral Dissertation Award.
A brief introduction to Azure Data Explorer, with many examples using the Kusto query dialect and the C# client.
With a particular focus on IIoT contexts and process control data, let's discover how to implement time series analysis in terms of pattern recognition and trend correlation.
Big Data Day LA 2015 - The AWS Big Data Platform by Michael Limcaco of Amazon
Data Con LA
Introduction to the AWS Big Data platform, including a discussion of popular use cases and reference architectures (e.g., streaming, real-time intelligence, and analytics). We will review the AWS big data portfolio of services including Amazon S3, Kinesis, DynamoDB, Elastic MapReduce (EMR), Redshift, Aurora and Machine Learning, and learn how customers leverage the platform to manage massive volumes of data from a variety of sources while containing costs.
This document discusses trends in cutting edge technologies, including the evolution of platforms and programming languages. It covers the shift from mainframes and client-server architectures to modern mobile and cloud platforms driven by technologies like JavaScript, HTML5, and cloud computing. Key areas like big data, IoT, and DevOps are also summarized.
This document discusses using open source tools and data science to drive business value. It provides an overview of Pivotal's data science toolkit, which includes tools like PostgreSQL, Hadoop, MADlib, R, Python, and more. The document discusses how MADlib can be used for machine learning and analytics directly in the database, and how R and Python can also interface with MADlib via tools like PivotalR and pyMADlib. This allows performing advanced analytics without moving large amounts of data.
A Hands-on Intro to Data Science and R Presentation.ppt
Sanket Shikhar
Using popular data science tools such as Python and R, the book offers many examples of real-life applications, with practice ranging from small to big data.
Running Presto and Spark on the Netflix Big Data Platform
Eva Tse
This document summarizes Netflix's big data platform, which uses Presto and Spark on Amazon EMR and S3. Key points:
- Netflix processes over 50 billion hours of streaming per quarter from 65+ million members across over 1000 devices.
- Their data warehouse contains over 25PB stored on S3. They read 10% daily and write 10% of reads.
- They use Presto for interactive queries and Spark for both batch and iterative jobs.
- They have customized Presto and Spark for better performance on S3 and Parquet, and contributed code back to open source projects.
- Their architecture leverages dynamic EMR clusters with Presto and Spark deployed via bootstrap actions for scalability.
Scio - A Scala API for Google Cloud Dataflow & Apache Beam
Neville Li
This document summarizes Scio, a Scala API for Google Cloud Dataflow and Apache Beam. Scio provides a DSL for writing pipelines in Scala to process large datasets. It originated from Scalding and was moved to use Dataflow/Beam for its managed service, integration with Google Cloud Platform services, and unified batch and streaming model. Scio aims to make Beam concepts accessible from Scala and provides features like type-safe BigQuery and Bigtable access, distributed caching, and future-based job orchestration to make Scala pipelines on Dataflow/Beam more productive.
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ...
Adrian Cockcroft
Flowcon keynote was a few days before CMG, a few tweaks and some extra content added at the start and end. Opening Keynote talk for both conferences on how Speed Wins and how Netflix is doing Continuous Delivery
This document summarizes a presentation on Apache SystemML, an open source machine learning framework that provides scalable machine learning capabilities. It discusses SystemML's support for deep learning algorithms like convolutional neural networks and its ability to optimize machine learning workloads through techniques like operator fusion. It also demonstrates SystemML running image classification and medical image segmentation deep learning models on IBM Power systems and provides performance comparisons between Power and x86 architectures.
Serverless SQL provides a serverless analytics platform that allows users to analyze data stored in object storage without having to manage infrastructure. Key features include seamless elasticity, pay-per-query consumption, and the ability to analyze data directly in object storage without having to move it. The platform includes serverless storage, data ingest, data transformation, analytics, and automation capabilities. It aims to create a sharing economy for analytics by allowing various users like developers, data engineers, and analysts flexible access to data and analytics.
Managing Changing Data with FME: Part 2 – Flexible Approaches to Tracking Cha...
Safe Software
Your data is always changing – but are you tracking it efficiently? By using change detection methods in FME, you can streamline your workflows, reduce manual effort, and boost productivity.
In Part 1, we explored a basic method for detecting changes using the ChangeDetector transformer. But what if your use case requires a more tailored approach?
In this webinar, we’ll go beyond basic comparison and explore more flexible, customizable methods for tracking data changes.
Join us as we explore these three methods for tracking data changes:
- Filtering by modification date to instantly pull updated records.
- Using database triggers in shadow tables to capture changes at the column level.
- Storing all changes in a transaction log to maintain a history of all changes with transactional databases.
Whether you’re handling a simple dataset or managing large-scale data updates, learn how FME provides the adaptable solutions to track changes with ease.
Manufacturing organizations are under constant pressure to streamline operations, improve agility, and make better use of the data they already have. Yet, many teams still struggle with disconnected systems and fragmented information that slow decision-making and reduce productivity. This webinar explores how AI-powered search and structured metadata can address these challenges by making enterprise data more accessible, actionable, and aligned with business needs.
Participants will gain practical insights into how modern search technologies are being applied to unify data across platforms, improve findability, and surface hidden insights—all without replacing core systems. Whether you're responsible for IT infrastructure, operations, or digital transformation, this session offers strategies to reduce friction and get more value from your existing information ecosystem.
Key Topics Covered:
The realities of managing disparate data in manufacturing and business operations
Leveraging AI to improve data discoverability and support better decision-making
Using structured metadata to unlock insights from existing platforms
Strategies for deploying intelligent search solutions across enterprise systems
"It's not magic, folks. It really does need that data. Now, what we can do is we can accelerate this. We can accelerate the derivation of an information architecture product, data architecture, content architecture, knowledge architecture, and apply it to the content, to the product data, to whatever it is." - Seth Earley
"You can have the best systems in the world, but if your teams are still spending hours finding specs and product data, that investment all just sits there idle." - Crys Black
Introduction to LLM Post-Training - MIT 6.S191 2025
Maxime Labonne
In this talk, we will cover the fundamentals of modern LLM post-training at various scales with concrete examples. High-quality data generation is at the core of this process, focusing on the accuracy, diversity, and complexity of the training samples. We will explore key training techniques, including supervised fine-tuning, preference alignment, and model merging. The lecture will delve into evaluation frameworks with their pros and cons for measuring model performance. We will conclude with an overview of emerging trends in post-training methodologies and their implications for the future of LLM development.
Introducing Agnetic AI: Redefining Intelligent Customer Engagement for the Future of Business
In a world where data is abundant but actionable insights are scarce, Agnetic AI emerges as a transformative force in AI-powered customer engagement and predictive intelligence solutions. Our cutting-edge platform harnesses the power of machine learning, natural language processing, and real-time analytics to help businesses drive deeper connections, streamline operations, and unlock unprecedented growth.
Whether you're a forward-thinking startup or an enterprise scaling globally, Agnetic AI is designed to automate customer journeys, personalize interactions at scale, and deliver insights that move the needle. Built for performance, agility, and results, this AI solution isn’t just another tool—it’s your competitive advantage in the age of intelligent automation.
The Gold Jacket Journey - How I passed 12 AWS Certs without Burning Out (and ...
VictorSzoltysek
Only a few hundred people on the planet have done this — and even fewer have documented the journey like this.
In just one year, I passed all 12 AWS certifications and earned the ultra-rare AWS Gold Jacket — without burning out, without quitting my job, and without wasting hours on fluff.
My secret? A completely AI-powered study workflow using ChatGPT, custom prompts, and a technique I call DeepResearch — a strategy that pulls high-signal insights from Reddit, blogs, and real-world exam feedback to shortcut the noise and fast-track what actually matters.
This is the slide deck from my live talk — it breaks down everything:
✅ How I used ChatGPT to quiz, explain, and guide me
✅ How DeepResearch helped me prioritize the right content
✅ My top 80/20 study tips, service-specific rules of thumb, and real-world exam traps
✅ The surprising things that still trip up even experienced cloud teams
If you’re considering AWS certifications — or want to learn how to study smarter using AI — this is your blueprint.
UiPath Automation Developer Associate 2025 Series - Career Office Hours
DianaGray10
This event is scheduled to check on your progress with your self-paced study curriculum. We will be here to answer any questions you have about the training and the next steps for your career.
You know you need to invest in a CRM platform, you just need to invest in the right one for your business.
It sounds easy enough but, with the onslaught of information out there, the decision-making process can be quite convoluted.
In a recent webinar we compared two options – HubSpot’s Sales Hub and Salesforce’s Sales Cloud – and explored ways to help you determine which CRM is better for your business.
A11y Webinar Series - Level Up Your Accessibility Game_ A11y Audit, WCAG, and...Julia Undeutsch
Are you ready to level up your accessibility knowledge? In this session, we’ll walk through my A11y audit template, learn how it’s helped me understand WCAG guidelines, and discover how you can use it to make impactful changes. I'll take a quick detour into how A11y can help you become active in open source, and how open source can help you improve your a11y skills.
Laura Wissiak will also join the session, and together we’ll dive deep into the POUR principles (Perceivable, Operable, Understandable, Robust) and discuss how to turn audit results into meaningful, actionable tickets that improve accessibility.
With her Pokédex of accessibility you will understand why aiming for AAA accessibility standards isn’t just a goal—it’s about striving for the best, just like in video games. Why play to just pass when you can master the game? Let’s elevate our accessibility efforts together!
Focus: A11y Audit, WCAG 2.1, POUR, Ticketing, Open Source
Target audience: Everyone (Project Managers, Designers, Developers, Testers, and Pokémon Lovers)
Beginners: Radio Frequency, Band and Spectrum (V3)3G4G
Welcome to this tutorial where we break down the complex topic of radio spectrum in a clear and accessible way.
In this video, we explore:
✅ What is spectrum, frequency, and bandwidth?
✅ How does wavelength affect antenna design?
✅ The difference between FDD and TDD
✅ 5G spectrum ranges – FR1 and FR2
✅ The role of mmWave, and why it's misunderstood
✅ What makes 5G Non-Standalone (NSA) different from 5G Standalone (SA)
✅ Concepts like Carrier Aggregation, Dual Connectivity, and Dynamic Spectrum Sharing (DSS)
✅ Why spectrum refarming is critical for modern mobile networks
✅ Evolution of antennas from legacy networks to Massive MIMO
Whether you're just getting started with wireless technology or brushing up on the latest in 5G and beyond, this video is designed to help you learn and stay up to date.
👍 Like the video if you find it helpful
🔔 Subscribe for more tutorials on 5G, 6G, and mobile technology
💬 Drop your questions or comments below—we’d love to hear from you!
All our #3G4G5G slides, videos, blogs and tutorials are available at:
Tutorials: https://www.3g4g.co.uk/Training/
Videos: https://www.youtube.com/3G4G5G
Slides: https://www.slideshare.net/3G4GLtd
Our channels:
3G4G Website – https://www.3g4g.co.uk/
The 3G4G Blog – https://blog.3g4g.co.uk/
Telecoms Infrastructure Blog – https://www.telecomsinfrastructure.com/
Operator Watch Blog – https://www.operatorwatch.com/
Connectivity Technology Blog – https://www.connectivity.technology/
Free 5G Training – https://www.free5gtraining.com/
Free 6G Training – https://www.free6gtraining.com/
Private Networks Technology Blog - https://blog.privatenetworks.technology/
Are you spending too much time pulling data, fixing AP delays, and manually processing reports in QuickBooks?
You’re not alone. Many finance teams hit a point where QuickBooks holds them back more than it helps.
The good news – there’s a better way.
Those who moved off QuickBooks instantly noticed how fast they can close the books, automate their cumbersome practices, and be able to create custom reports.
Join us for a 30-minute virtual Lunch & Learn where we’ll break down what it really means to outgrow QuickBooks, and how to take the next step with confidence.
During this session, you’ll learn:
The top signs it’s time to graduate from QuickBooks
Common challenges finance teams face and how modern ERPs solve them
Tips to evaluate and select a more comprehensive ERP system
QuickBooks vs. Cloud ERPs: A side-by-side look
Live Q&A to get all your questions answered
Ready to take the first step toward more automation, faster close, and better reporting?
Automation Dreamin' 2022: Sharing Some Gratitude with Your UsersLynda Kane
Slide Deck from Automation Dreamin'2022 presentation Sharing Some Gratitude with Your Users on creating a Flow to present a random statement of Gratitude to a User in Salesforce.
The History of Artificial Intelligence: From Ancient Ideas to Modern Algorithmsisoftreview8
The history of Artificial Intelligence: From Ancient Ideas to Modern Algorithms is a remarkable journey through time—one that blends human curiosity with technological breakthroughs. While the dream of intelligent machines dates back to ancient civilizations, it wasn’t until the 20th century that the idea began to take scientific shape.
In 1950, British mathematician Alan Turing introduced a revolutionary concept: that machines could imitate human thought. His creation of the "Turing Test" provided a framework for measuring machine intelligence. This milestone was one of the first major chapters in the history of Artificial Intelligence: From Ancient Ideas to Modern Algorithms.
By 1956, the term "Artificial Intelligence" had been officially coined during the Dartmouth Conference, igniting decades of innovation. From symbolic AI in the 1960s to expert systems in the 1980s, and the rise of machine learning and neural networks in the 1990s and 2000s, each era brought us closer to what we now recognize as modern AI. Technologies like deep learning, real-time automation, and natural language processing have turned AI into a powerful tool used in everyday life.
The ongoing evolution in the history of Artificial Intelligence: From Ancient Ideas to Modern Algorithms reveals how ancient visions are becoming today’s realities—and tomorrow’s possibilities.
1. The Evolution of Apache Kylin
Realtime & Plugin Architecture in Kylin 1.5
Li, Yang | 李扬
2. Agenda
What’s Apache Kylin?
New Features in Kylin 1.5
Plugin Architecture
Fast Cubing
Parallel Scan
Streaming Cubing
User Defined Aggregation
Summary
3. What’s Kylin
Extreme OLAP Engine for Big Data
Kylin is an open source Distributed Analytics Engine from eBay that provides a SQL interface and multi-dimensional analysis (OLAP) on Hadoop, supporting extremely large datasets.
kylin / ˈkiːˈlɪn / 麒麟
--n. (in Chinese art) a mythical animal of composite form
• Open Sourced on Oct 1st, 2014
• Accepted as Apache Incubator Project on Nov 25th, 2014
10. (Architecture diagram)
Clients: 3rd Party App (Web App, Mobile…) via REST API; SQL-Based Tool (BI Tools: Tableau…) via JDBC/ODBC
Kylin: REST Server, Query Engine, Routing, Metadata, Cube Builder (MapReduce…)
Online Analysis Data Flow: SQL to the Query Engine, answered from OLAP Cubes (HBase, Key Value Data), Low Latency - Seconds
Offline Data Flow: Hadoop Hive (Star Schema Data) to the Cube Builder to OLAP Cubes (HBase)
Clients/Users interact with Kylin via SQL
OLAP Cube is transparent to users
Plugin Architecture Overview: DataSource Abstraction, Engine Abstraction, Storage Abstraction
14. The Freedom, Extensibility, Flexibility
Freedom
Zoo break, not bound to Hadoop any more
Free to go to a better engine or storage
Extensibility
Accept any input, e.g. Kafka
Embrace next-gen distributed platform, e.g. Spark
Flexibility
Choose different engine for different data set
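The three abstractions named on the architecture slides can be pictured as three small interfaces that a build job wires together. The following is only a minimal sketch under that assumption: every class and method name here (DataSource, CubeEngine, CubeStorage, build_cube and the three toy implementations) is hypothetical and is not Kylin's actual plugin API.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class DataSource(ABC):
    """Hypothetical input plugin, e.g. Hive or Kafka."""

    @abstractmethod
    def read_rows(self):
        """Return an iterable of (dimension_value, measure) pairs."""


class CubeEngine(ABC):
    """Hypothetical build plugin, e.g. MapReduce or Spark."""

    @abstractmethod
    def build(self, rows):
        """Turn raw rows into pre-aggregated cube data."""


class CubeStorage(ABC):
    """Hypothetical storage plugin, e.g. HBase."""

    @abstractmethod
    def save(self, cube):
        """Persist the cube."""

    @abstractmethod
    def load(self):
        """Return the persisted cube."""


class ListSource(DataSource):
    """Toy source backed by an in-memory list."""

    def __init__(self, rows):
        self.rows = list(rows)

    def read_rows(self):
        return self.rows


class SumEngine(CubeEngine):
    """Toy engine that sums the measure per dimension value."""

    def build(self, rows):
        cube = defaultdict(int)
        for dimension_value, measure in rows:
            cube[dimension_value] += measure
        return dict(cube)


class MemoryStorage(CubeStorage):
    """Toy storage that keeps the cube in a Python dict."""

    def __init__(self):
        self.cube = {}

    def save(self, cube):
        self.cube = dict(cube)

    def load(self):
        return self.cube


def build_cube(source, engine, storage):
    """The build job only talks to the three interfaces."""
    storage.save(engine.build(source.read_rows()))
```

Because build_cube depends only on the abstract classes, swapping ListSource for a streaming source or MemoryStorage for another store would not change the build job, which is the flexibility the slide describes.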
15. Layered Cubing (MR Engine V1)
(Diagram: Full Data → 4-D Cuboid (A,B,C,D) → 3-D Cuboids (A,B,C; A,B,D; A,C,D; B,C,D) → 2-D Cuboid → 1-D Cuboid → 0-D Cuboid, one MR round per layer)
Pros
Simple implementation, depends on MR shuffle to merge sort and then aggregate
Little requirement on memory
Cons
Aggregation happens at reducer side
Mapper outputs raw data thus shuffle is huge
Multiple rounds of MR overhead
Shuffle can be 100x of cube size, big I/O pressure
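The layer-by-layer idea can be sketched in a few lines: aggregate the full data into the base cuboid, then derive each smaller cuboid from a parent that has one more dimension. This is an in-memory illustration only, assuming rows of the form (dimension tuple, measure) and a SUM measure; the function name layered_cubing is hypothetical, and each loop pass stands in for what would be one MapReduce round.

```python
from collections import defaultdict
from itertools import combinations


def layered_cubing(rows, n_dims):
    """Build all 2**n_dims cuboids, one layer at a time.

    rows: iterable of (dims, measure), dims being a tuple of n_dims values.
    Returns {cuboid: {key: sum}}, where cuboid is a tuple of dimension indexes.
    """
    # First round: aggregate the full data into the base (N-D) cuboid.
    base = tuple(range(n_dims))
    base_agg = defaultdict(int)
    for dims, measure in rows:
        base_agg[tuple(dims)] += measure
    cube = {base: dict(base_agg)}

    # Later rounds: each (N-1)-D, (N-2)-D, ... cuboid is computed from a parent.
    for size in range(n_dims - 1, -1, -1):
        for child in combinations(range(n_dims), size):
            parent = next(
                p for p in cube
                if len(p) == size + 1 and set(child) <= set(p)
            )
            positions = [parent.index(d) for d in child]
            agg = defaultdict(int)
            for key, value in cube[parent].items():
                agg[tuple(key[i] for i in positions)] += value
            cube[child] = dict(agg)
    return cube
```

With three dimensions this produces eight cuboids in four layers, and the 0-D cuboid holds the grand total under the empty key.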
16. Fast Cubing
(Diagram: Data Split, Data Split, Data Split, …… → mapper, mapper, mapper → Merge Sort (Shuffle) → reducer → Final Cube)
Pros
In-mem cubing algorithm that can be reused by Streaming, Spark etc.
Mapper side aggregation
Less shuffling given the right data split
One round MR
Cons
Code complexity
High mapper CPU/Mem consumption
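A rough way to picture mapper-side aggregation: each mapper builds every cuboid for its own data split in memory, and the shuffle then only carries pre-aggregated records that a single merge step sums into the final cube. The sketch below is a naive illustration under those assumptions (SUM measure, rows as (dimension tuple, measure)); the function names are hypothetical, and it enumerates cuboids by brute force rather than reproducing Kylin's actual in-memory algorithm.

```python
from collections import defaultdict
from itertools import combinations


def cube_one_split(rows, n_dims):
    """Mapper side: aggregate one data split into all cuboids in memory."""
    cuboids = [
        c for size in range(n_dims + 1)
        for c in combinations(range(n_dims), size)
    ]
    partial = defaultdict(int)
    for dims, measure in rows:
        for cuboid in cuboids:
            key = tuple(dims[i] for i in cuboid)
            partial[(cuboid, key)] += measure
    return dict(partial)


def merge_partials(partials):
    """Reducer side: sum the pre-aggregated records from every mapper."""
    final = defaultdict(int)
    for partial in partials:
        for cuboid_key, value in partial.items():
            final[cuboid_key] += value
    return dict(final)


def shuffle_size(partials):
    """Number of records that would cross the shuffle."""
    return sum(len(partial) for partial in partials)
```

In this sketch a split whose rows repeat the same dimension values collapses to a few records before the shuffle, which is where the saving over emitting raw rows comes from; the price is that each mapper holds all of its partial cuboids in memory.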
17. Fast Cubing (MR Engine V2)
If data splits are unique, Fast cubing wins
If data splits are common, Layered cubing wins
New cube engine chooses the right algorithm based on data sampling.
Overall build time is 1.5x faster, summed over results from 500 jobs.
18. Parallel Scan
Slow queries are 5-10x faster.
New HBase storage enables partition on cuboids that are big enough.
Overall query time is 2x faster than before, summed over results from 10,000+ queries.
(Diagram: a Query against Cuboid A, Cuboid B and Cuboid C on Server 1, Server 2 and Server 3, next to the partitioned layout in which Server 1 holds A1 and B1, Server 2 holds A2 and B2, and Server 3 holds A3 and C)
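The partitioning idea can be illustrated with a toy model: split one cuboid's rows into shards, scan the shards concurrently, and add up the partial results. This is a sketch under stated assumptions, not HBase client code: threads stand in for region servers, the CRC32-based placement is an arbitrary choice, and the names shard_cuboid, scan_shard and parallel_scan are hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def shard_cuboid(rows, n_shards):
    """Spread (key, value) rows of one cuboid over n_shards partitions."""
    shards = [[] for _ in range(n_shards)]
    for key, value in rows:
        digest = zlib.crc32("|".join(key).encode("utf-8"))
        shards[digest % n_shards].append((key, value))
    return shards


def scan_shard(shard, predicate):
    """Scan one partition and return its partial sum."""
    return sum(value for key, value in shard if predicate(key))


def parallel_scan(shards, predicate):
    """Scan all partitions concurrently and merge the partial sums."""
    workers = max(1, len(shards))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda s: scan_shard(s, predicate), shards))
    return sum(partials)
```

For example, parallel_scan(shard_cuboid(rows, 3), lambda key: key[1] == "CN") sums the measure for loc = 'CN' across three shards. A small cuboid gains little from this, which matches the slide's point that only cuboids that are big enough get partitioned.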
20. Future Lambda Architecture for Realtime
(Diagram: streaming data from Kafka feeds a Real-time In-Mem Store (Inverted Index, latest second) and, in minute batches, Cube Storage (Cube); a Hybrid Storage Interface sits over both and serves the SQL Query)
21. Use Case: SEO Operational Dashboard
Dimensions
eBay Site: ebay.com, ebay.co.uk, ebay.de
Buyer Country: US, CN, RU
Search Engine: Google, Bing, Yahoo!
Referrer: google.com, google.co.uk
Page: Search, View Item, Product
User Experience: Desktop, Mobile APP, mWeb
Measurements
• Visits, GMB $, GMB share, conversion rate, bounce rate, # of view items, # of bought items etc.
22. User Defined Aggregation Types
HyperLogLog Count Distinct
TopN
BitMap Precise Count Distinct, from Sun, Yerui (netease.com)
Raw Records, from Wang, Xiaoyu (jd.com)
Domain specific aggregations now become easy
aggregate user events to detect time series or access patterns
draw a sketch of certain user groups
pre-calculate clusters of data points
histogram…
23. TopN Support
TopN as a measure
select dt, loc, item, sum(gmv)
from test_kylin_fact
where dt='2015-10-1' and loc='CN'
group by dt, loc, item
order by 4 desc
limit 100
Cube pre-calculation stores a TopN value per DT,LOC key:
DT,LOC | TopN
2015-10-1,CN | Item A, $500; Item B, $300; …
Approximate algorithm
SpaceSaving TopN
Ahmed Metwally, et al. “Efficient computation of frequent and top-k elements in data streams”. Proceedings of the 10th International Conference on Database Theory (ICDT'05), 2005.
A parallel version
Massimo Cafaro, et al. “A parallel space saving algorithm for frequent items and the Hurwitz zeta distribution”. arXiv:1401.0702v12 [cs.DS], 19 Sep 2015.
Answer TopN queries directly from pre-calculation
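The Space-Saving idea behind the TopN measure can be sketched as a fixed-size set of counters: a monitored item is incremented, and an unmonitored item replaces the current minimum counter and inherits its count as a possible overestimate. The class below is a minimal single-threaded sketch of that published algorithm with weighted updates, assuming positive weights; the class and method names are hypothetical and this is not Kylin's implementation.

```python
class SpaceSavingTopN:
    """Approximate top-N by total weight using at most `capacity` counters."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.counts = {}  # item -> estimated total weight
        self.errors = {}  # item -> maximum possible overestimate

    def offer(self, item, weight=1):
        """Add `weight` (assumed positive) for `item`."""
        if item in self.counts:
            self.counts[item] += weight
            return
        if len(self.counts) < self.capacity:
            self.counts[item] = weight
            self.errors[item] = 0
            return
        # Evict the smallest counter; the newcomer inherits its count.
        victim = min(self.counts, key=self.counts.get)
        floor = self.counts.pop(victim)
        self.errors.pop(victim)
        self.counts[item] = floor + weight
        self.errors[item] = floor

    def top(self, n):
        """Return the n items with the largest estimated weight."""
        ranked = sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[:n]
```

As long as the number of distinct items stays within the capacity the result is exact; beyond that, an estimate can exceed the true total by at most the recorded error, which is why the slide calls the algorithm approximate.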
24. ODBC Enhancement
Works with Tableau 9.1
Works with MS Excel
Works with MS Power BI
27. Summary
New in Apache Kylin 1.5
Plugin-able architecture
New MR Cube Engine with fast cubing (1.5x faster)
New HBase Storage with parallel scan (2x faster)
Near real-time analysis (experimental)
User defined aggregations
Excel / PowerBI / Zeppelin integration
#13: A high-level architecture for Kylin, which is a standard MOLAP architecture built on Hadoop.
Data sources to build your MOLAP cubes, primarily Hive. We have a fantastic project in the works for a Storage Abstraction Layer to support other NoSQL stores such as Cassandra/Couchbase.
An Engine Abstraction which maintains the cube metadata and a Cube Builder, today a set of MapReduce jobs that build the cubes.
A storage layer to store the cubes in HBase, primarily through a bulk load of the aggregates into HBase.
We are looking for active community participation to build out additional Data Source, Engine and Storage plugins for Kylin.
A Query Engine that directly indexes into the multi-dimensional arrays built in HBase.