CdapIO provides I/O transforms for CDAP plugins.
CDAP is an application platform for building and managing data applications in hybrid and multi-cloud environments. It enables developers, business analysts, and data scientists to use a visual rapid development environment and utilize common patterns, data, and application abstractions to accelerate the development of data applications, addressing a broader range of real-time and batch use cases.
CDAP plugin types:
- Batch source
- Batch sink
- Streaming source
To learn more about CDAP plugins, please see io.cdap.cdap.api.annotation.Plugin and the Data Integrations plugins repository.
CdapIO supports CDAP Batch plugins based on Hadoop InputFormat and OutputFormat. CDAP batch plugins support is implemented using HadoopFormatIO.
CdapIO currently supports the following CDAP Batch plugins by referencing the CDAP plugin class:
- Hubspot Batch Source
- Hubspot Batch Sink
- Salesforce Batch Source
- Salesforce Batch Sink
- ServiceNow Batch Source
- Zendesk Batch Source
This means that any of these plugins can be used like this:
CdapIO.withCdapPluginClass(HubspotBatchSource.class)
A CDAP Batch plugin should be based on a HadoopFormat implementation.
To add CdapIO support for a new CDAP Batch plugin, perform the following steps:
- Find the CDAP plugin artifacts in the Maven Central repository. Example: Hubspot plugin Maven repository. Note: To add a custom CDAP plugin, please follow the Sonatype publishing guidelines.
- Add the CDAP plugin Maven dependency to the build.gradle file. Example: implementation "io.cdap:hubspot-plugins:1.0.0".
- Use the CDAP Batch plugin with CdapIO in one of two ways:
  - Using the Plugin.createBatch() method. Pass the CDAP plugin class, the correct InputFormat (or OutputFormat) class, and the InputFormatProvider (or OutputFormatProvider) class to CdapIO. Example:
    CdapIO.withCdapPlugin(
        Plugin.createBatch(
            EmployeeBatchSource.class,
            EmployeeInputFormat.class,
            EmployeeInputFormatProvider.class));
  - Using MappingUtils.
    - Navigate to the MappingUtils class.
    - Modify the getPluginClassByName() method: add the code that maps the CDAP plugin class name to its Input/Output Format and FormatProvider classes. Example:
      if (pluginClass.equals(EmployeeBatchSource.class)) {
        return Plugin.createBatch(
            pluginClass, EmployeeInputFormat.class, EmployeeInputFormatProvider.class);
      }
    - After these steps, you will be able to use the CDAP plugin by class name like this:
      CdapIO.withCdapPluginClass(EmployeeBatchSource.class)
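The class-name lookup described in the MappingUtils option can be pictured as a small registry. The following is a minimal, self-contained sketch of that idea only: every type in it (EmployeeBatchSource, EmployeeInputFormat, EmployeeInputFormatProvider, BatchPluginDescriptor, PluginRegistry) is a hypothetical stand-in defined in the block itself, not the CDAP or Beam class of the same or similar name, and the real MappingUtils may be organized differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-ins for a plugin and its Hadoop format classes.
// These are NOT the real CDAP/Beam types; they exist only for illustration.
class EmployeeBatchSource {}
class EmployeeInputFormat {}
class EmployeeInputFormatProvider {}

/** Bundles a plugin class with the format classes it depends on (assumed shape). */
final class BatchPluginDescriptor {
  final Class<?> pluginClass;
  final Class<?> formatClass;
  final Class<?> formatProviderClass;

  BatchPluginDescriptor(
      Class<?> pluginClass, Class<?> formatClass, Class<?> formatProviderClass) {
    this.pluginClass = pluginClass;
    this.formatClass = formatClass;
    this.formatProviderClass = formatProviderClass;
  }
}

/** Toy registry: resolves a plugin class to its descriptor, if one was registered. */
final class PluginRegistry {
  private static final Map<Class<?>, BatchPluginDescriptor> KNOWN = new LinkedHashMap<>();

  static {
    // Registering a plugin plays the role of adding a branch to the lookup method.
    register(
        EmployeeBatchSource.class,
        EmployeeInputFormat.class,
        EmployeeInputFormatProvider.class);
  }

  static void register(Class<?> plugin, Class<?> format, Class<?> provider) {
    KNOWN.put(plugin, new BatchPluginDescriptor(plugin, format, provider));
  }

  /** Returns an empty Optional for plugins that have no mapping yet. */
  static Optional<BatchPluginDescriptor> lookup(Class<?> pluginClass) {
    return Optional.ofNullable(KNOWN.get(pluginClass));
  }
}
```

In this sketch, a plugin that was never registered resolves to an empty Optional. That mirrors why a new plugin needs its own mapping before it can be referenced by class name alone.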
To learn more, please check out complete examples.
CdapIO supports CDAP Streaming plugins based on Apache Spark Receiver. CDAP streaming plugins support is implemented using SparkReceiverIO.
- The CDAP Streaming plugin should be based on Spark Receiver.
- The CDAP Streaming plugin should support working with offsets.
- The corresponding Spark Receiver should implement the HasOffset interface.
- Records should have a numeric field that represents the record offset. Example: the RecordId field for Salesforce and the vid field for Hubspot plugins. For more details, please see the GetOffsetUtils class from the examples.
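To make the offset requirement concrete, here is a minimal sketch of a receiver that can be told where to resume. The OffsetAware interface and ToyReceiver class are hypothetical stand-ins written for this example. They are loosely modeled on the idea of an offset-aware receiver, and their method names are assumptions, not a statement of the actual HasOffset contract.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical offset-aware contract (assumed shape, for illustration only). */
interface OffsetAware {
  /** Tells the receiver which offset to start reading from. */
  void setStartOffset(Long offset);

  /** Reports the exclusive upper bound of offsets currently available. */
  Long getEndOffset();
}

/** Toy receiver whose records are identified by consecutive numeric offsets. */
final class ToyReceiver implements OffsetAware {
  private final long totalRecords;
  private long startOffset = 0L;

  ToyReceiver(long totalRecords) {
    this.totalRecords = totalRecords;
  }

  @Override
  public void setStartOffset(Long offset) {
    // A null offset is treated as "keep the current start position".
    if (offset != null) {
      this.startOffset = offset;
    }
  }

  @Override
  public Long getEndOffset() {
    return totalRecords;
  }

  /** Returns the offsets of all records in [startOffset, endOffset). */
  List<Long> receive() {
    List<Long> offsets = new ArrayList<>();
    for (long i = startOffset; i < totalRecords; i++) {
      offsets.add(i);
    }
    return offsets;
  }
}
```

With five records and a start offset of 3, this toy receiver yields only offsets 3 and 4. Skipping already-processed records in this way is the behavior that resuming from an offset relies on.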
To add CdapIO support for a new CDAP Streaming SparkReceiver Plugin, perform the following steps:
- Find the CDAP plugin artifacts in the Maven Central repository. Example: Hubspot plugin Maven repository. Note: To add a custom CDAP plugin, please follow the Sonatype publishing guidelines.
- Add the CDAP plugin Maven dependency to the build.gradle file. Example: implementation "io.cdap:hubspot-plugins:1.0.0".
- Implement a function that defines how to get a Long offset from a record of the CDAP plugin. Example: see the GetOffsetUtils class from the examples.
- Use the CDAP Streaming plugin with CdapIO in one of two ways:
  - Using the Plugin.createStreaming() method. Pass the CDAP plugin class, the correct getOffsetFn (from the previous step), and the SparkReceiver class to CdapIO. Example:
    CdapIO.withCdapPlugin(
        Plugin.createStreaming(
            HubspotStreamingSource.class, offsetFnForHubspot, HubspotReceiver.class));
  - Using MappingUtils.
    - Navigate to the MappingUtils class.
    - Modify the getPluginClassByName() method: add the code that maps the CDAP plugin class name to its getOffsetFn function and SparkReceiver class. Example:
      if (pluginClass.equals(HubspotStreamingSource.class)) {
        return Plugin.createStreaming(
            pluginClass, getOffsetFnForHubspot(), HubspotReceiverClass.class);
      }
    - After these steps, you will be able to use the CDAP plugin by class name like this:
      CdapIO.withCdapPluginClass(HubspotStreamingSource.class)
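The offset function mentioned in the steps above could look roughly like the following sketch. It assumes, purely for illustration, that a record is modeled as a Map from field names to values. The OffsetFunctions class and numericFieldOffset method are hypothetical names, and the GetOffsetUtils class in the examples may take a different approach (for instance, parsing a JSON string).

```java
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper for building offset functions (illustrative only). */
final class OffsetFunctions {
  private OffsetFunctions() {}

  /**
   * Builds a function that reads the named field from a record and returns it
   * as a Long offset. Falls back to 0 when the field is missing or not numeric
   * (an assumption made for this sketch).
   */
  static Function<Map<String, Object>, Long> numericFieldOffset(String fieldName) {
    return record -> {
      Object value = record.get(fieldName);
      if (value instanceof Number) {
        return ((Number) value).longValue();
      }
      if (value instanceof String) {
        try {
          return Long.parseLong((String) value);
        } catch (NumberFormatException e) {
          return 0L;
        }
      }
      return 0L;
    };
  }
}
```

For a Hubspot-style record, the field name would be vid; for a Salesforce-style record, RecordId. Keeping the field name as a parameter lets one helper cover both cases.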
To learn more, please check out complete examples.
To use CdapIO, please add a dependency on beam-sdks-java-io-cdap.
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-sdks-java-io-cdap</artifactId>
  <version>...</version>
</dependency>
The documentation and usage examples are maintained in the JavaDoc for CdapIO.java.