ContentConnector Guide

Overview
File Processing
Upload Products
Create the DataSource Database
ContentConnector Download
ContentConnector Configuration
External Datasets
Secure FTP
ContentConnector Operation
Troubleshooting
Log files
Database Connection
Database Verification
All files in Schema
Missed Increments
Full Download

Overview

The ContentConnector will dramatically reduce the complexity of database creation, data retrieval, and database maintenance processes. The products upload process is also automated for those customers that request content on specific products and utilize CNET’s product mapping services.

In order to use the ContentConnector, the server that will host the application will need the following environment:

  1. Enough disk space for your entire uncompressed catalog
  2. Java Runtime Environment (JRE)

The ContentConnector requires JRE version 1.8.

  1. Database Server Application

The ContentConnector currently supports most common installations: 

  • SQL Server on Windows
  • MySQL on Windows and Linux
  • Oracle on Windows and Linux
  • PostgreSQL on Windows and Linux
  • Microsoft Azure (using SQL Server Settings)
  • MariaDB (using MySQL Settings)

File Processing

It is recommended to schedule the ContentConnector to run at least twice a day at different times as the download.zip file delivery times vary for each distribution cycle . The ContentConnector automatically detects new data, and therefore only will perform the database maintenance and any related tasks as necessary.


Upload Products

Customers who maintain a defined catalog and will be requesting specific SKUs from CNET (a.k.a. DataSource Pull Mode), are required to send specific product information to CNET in the form of Manufacturer (MFC) and Product (PRC) files.

The ContentConnector automatically generates and transfers these files to the customer’s FTP account using a table called cds_sku_request, which is structured as follows:

Ref Column Name Type Description
1 CustomerPartNumber VARCHAR(40) Customer part number
2 GTIN VARCHAR(128) UPC/EAN code (optional)
3 ManufacturerCode VARCHAR(40) Customer manufacturer code
4 ManufacturerName VARCHAR(255) Customer manufacturer name
5 ManufacturerPartNumber VARCHAR(40) Manufacturer part number
6 ProductDescription VARCHAR(255) Product description
7 IsActive BIT SKU status (1=active; 0=inactive)
8 RequestStatus BIT SKU request status (0=not yet requested; -1=requested)
9 FirstInserted DATETIME Date/time SKU was first inserted
10 Checksum VARCHAR(32) A hash code used to detect changes to columns 2-6
11 FirstRequestDate DATETIME Date/time SKU was first requested
12 LastRequestDate DATETIME Date/time SKU was last requested

The cds_sku_request table is only generated if the UploadProducts option is enabled in ContentConnector.xml, and the application is run in CREATE mode. This option is disabled by default.

The customer is responsible for inserting new records and updating existing records, and including values in fields 1-9. RequestStatus should be set to 0. FirstInserted should have the system timestamp. All other fields are managed by the ContentConnector. The ContentConnector automatically detects new records and changes to existing records using the Checksum column, and then generates and transfers the PRC/MFC file set as necessary. By default, the PRC/MFC file set extension starts with “001” and increments to “999”, before resetting to “001”. The extension information itself is persisted in the .prcmfcCounter text file and can be manipulated if necessary. This file is automatically generated the first time the ContentConnector generates a PRC/MFC file set.

The PRC/MFC files are stored locally in the directory specified in the LocalUploadPath setting (ContentConnector.xml). By default, this value is set to "data/DataSource/Upload".

The ContentConnector will generate and upload a MFC/PRC file set when running with IMPORT command or legacy FULL or INCREMENT commands (not with IMPORTNODOWNLOAD, FULLNODOWNLOAD or INCREMENTNODOWNLOAD commands). Uploads can be done explicitly by calling the ContentConnector in UPLOAD mode.

 

Create the DataSource Database

Create a blank database on the server, which will host the DataSource data. The database will contain both the repository as well as staging tables. The following examples reference a database named “DataSource”.

Once the database is created you must create a user for the DataSource database or add access privileges for an existing user. The database user will need table creation, read, write, and delete privileges. Depending on the database, the user may also need special privileges to truncate tables and perform bulk inserts.

 

ContentConnector Download

  1. Download the latest ContentConnector version.
  2. Uncompress the file using either WinZip (Windows) or Zip (Linux) to any directory. By default, all configuration files use relative paths, so any directory is acceptable. Once extracted, the directory structure should look similar to the following:

connector_files_structure

 

ContentConnector Configuration

For the DataSource service, the ContentConnector requires two configuration files: ConnectConnector.xml and DataSource\DataSource-DATABASE.xml.

The DataSource-DATABASE.xml file contains platform dependent SQL instructions:

  • DataSource-SQLServer.xml
    • Use this option if using the Microsoft Azure Platform
  • DataSource-MySQL.xml
    • Use this option if using the MariaDB Platform
  • DataSource-Oracle.xml
  • DataSource-PostgreSQL.xml

The configuration/ContentConnector.xml file contains the setup information for the ContentConnector. Locate and edit the ContentConnector.xml file and make the following edits to align the behavior of the application to your needs:

 

1. ContentConnector Options:

Option

Option Overview

Comments

Related Tables

All Services

KeepArchivedZipsFor days="30" Delete archived files 'download.zip' after they have been kept for the specified amount of time If the tag is not present (or the value of the attribute 'days' is less than 0) the archived downloads will be kept.  
IndexCompression type="ROW" Sets data compression type ('ROW', 'PAGE', or 'NONE') for all indexes that do not have their own compressionType attribute Only supported by SQL Sever 2008+ Enterprise Edition or Developer Edition  
TableCompression type="PAGE" Sets data compression type ('ROW', 'PAGE', or 'NONE') for all tables that do not have their own compressionType attribute Only supported by SQL Sever 2008+ Enterprise Edition or Developer Edition  
UseTruncate TRUNCATE will be used instead of DELETE to clear tables TRUNCATE is faster, takes less transaction log space, but breaks replication  

UpdateStatisticsOnBulkInsert

Update table statistics after a bulk insert

Can cleanup table structures after a bulk insert

 

ConnectorStatus

This option is used for providing the status of each ContentConnector run

Keep enabled

cds_ConnectorStatus
cds_ConnectorStatusCode

JVMProps

Display JVM properties in the log file

For debugging purposes

 

DisplayInsertRecordCount

Display the number of records that were bulk inserted in the log file

For debugging purposes. This causes an extra select.

 

NoMySQLCharacterSetSupport

Do not send character set property to MySQL

Certain older versions of MySQL did not support character sets for bulk inserts for multiple languages

 

DisableTabLock

Do not use TABLOCK during SQL Server bulk inserts

This can prevent table locking, but also slows down inserts

 

OracleUseUnixNewLine

Use Unix new lines for Oracle bulk inserts

Can be necessary if bulk inserts fail because of lines

 

DataSource Options

KeepACK

Triggers the delivery of the new increment download.zip file automatically once the current file is processed successfully by the ContentConnector

Enable only if the delivery of the new increment file should not be triggered automatically. It can be done manually by deleting the dataout.trxt file from the ack folder on your account's FTP.

 

SearchableAttributes

Loads searchable attributes either the full set or Product Line and Model only in case the full set is not a part of the contract

Keep enabled

cds_Atr

NewUNSPSC

Loads UNSPSC data

Enable only when your account is set up to receive this data.

cds_UNSPSC_Versioned_Commodities
cds_UNSPSC_Versioned_Links
cds_UNSPSC_Versions

UNSPSC

Legacy option for UNSPSC data.

Keep disabled

cds_UNSPSC
cds_UNSPSCCommodity
cds_UNSPSCVersion

Metamap

Loads the data which defines a link between Customer SKU and CNET ProdID as well as UPC codes and Distributors' Part Numbers when the account is set up to receive these options

Keep enabled

cds_Metamap
cds_Distivoc

NVARCHAR This option should be set for SQLServer UTF8/UTF16 encoding for double bytes support Enable when needed  

RelatedProducts

Legacy option for ContentConnector 2.2.3.4 or later.

Loads Related Products data (RP Links and/or RP Rules)

Enable only when your account is set up to receive Related Products Links and/or Related Products Rules

cds_Acc_Updates

RelatedProductsLinks

Loads Related Products Links, i.e. a link between an Accessory SKU and a Parent SKU

Enable only when your account is set up to receive OEM Related Products

cds_Acc_Links

RelatedProductsRules

Loads Related Products Rules, i.e. a link between an Accessory SKU and a Parent Product Line and Model

Enable only when your account is set up to receive Compatibility Rules

cds_AccCompat
cds_AccCompat_Complement_xx

DigitalContent

Loads digital content. The list of the digital content options is available in the config file in the MediaType section

Enable only when your account is set up to receive this data.

cds_DigContent
cds_DigContent_Lang_Links
cds_DigContent_Langs
cds_DigContent_Links
cds_DigContent_Media_Types
cds_DigContent_Prod
cds_DigContent_Region_Links
cds_DigContent_Regions

DigitalContentMeta

Loads Images Meta Attributes data, i.e. Image Size, Image Type, Image Angle, etc.

Enable only when your account is set up to receive this data.

cds_DigContent_Meta
cds_DigContent_Meta_AtrVoc
cds_DigContent_Meta_ValVoc

DigitalContentText

Loads digital content text; an alternative delivery method for all text components existing in Digital Content, including:

  • Marketing Description
  • Key Selling Points (KSP)
  • Product Features
  • What’s In The Box
Enable only when your account is set up to receive this data. cds_DigContent_Text

DownloadDigitalContent

This option is used for downloading the digital content.

Enable if you would like the digital content to be downloaded. You may choose to disable this option if you would like to use the direct links.

 

ProcessThumbnails

This option is used for thumbnail generation. Thumbnails are defined in the config file in the Thumbnails section.
By default, it is configured to create a 75x75 image from the 200x150 CCS image.

Disable if the image reducing to thumbnails is not needed

 

MainSpecs

Loads Main Specifications

Enable only when your account is set up to receive this data.

cds_Mspecxx
cds_Mvocxx

ExtendedSpecs

Loads Extended Specifications

Enable only when your account is set up to receive this data.

cds_Especxx
cds_Evocxx

MarketingText

This is a legacy option for when the marketing text was delivered not in digital content

Keep disabled

cds_Mktxx

MarketingTextDigContent

Loads the marketing text which is delivered in digital content

Keep enabled

The digital content tables

Atomic

Loads Atomic Data

Keep disabled

cds_Atr_full
cds_Vocxx_full

CatalogInfo

Loads the Catalog information which includes all SKUs with their current status, last delivery date, etc.

Keep enabled

cds_Catalog

Lifecycle

Loads Products Lifecycle data (Mfg Release Date, Mfg Discontinue Date)

Keep enabled

cds_Lifecycle

Categorization

Loads Alternative Categorization

Keep enabled

cds_Cct_Categories
cds_Languages
cds_Cct_Products
cds_Cct_Version
cds_Cct_Vocabulary

CheckMissedIncrement

This option is used for checking that there is no missing data. If any data is missing the ContentConnector will exit with an  error "An increment has been missed"

Keep enabled

 

CheckIsFullDownload

Legacy option for ContentConnector 2.2.3.4 or later.

This option is for checking if the delivered file is full when the ContentConnector runs in increment mode. If the Connector detects that it is a full file then it switches automatically to processing it in full mode

Keep enabled

 

SecureFTP

This option is for downloading  the files via SFTP (FTP over SSH)

Enable if SFTP is used

 

External:AlternativeSearchableAttributes

This option is used for loading the external files with Alternative Searchable Attributes

Enable if your account is set up with the full set of searchable attributes and if would like to get Alternative Searchable Attributes

cds_Asa_Cat
cds_Asa_Cat_Alt
cds_Asa_Cat_Alt_Updates
cds_Asa_Cat_Updates
cds_Asa_Metric
cds_Asa_Ranges_Imperial
cds_Asa_Ranges_Metric
cds_Asa_Voc

External:AttributeDisplayOrder

Loads a file that contains the logical ordering of searchable attributes, including those that are not part of the alternative set

Enable if needed along with the AlternativeSearchableAttributes option

cds_Atr_Logical_Order

External:LanguageMapping

Loads a file that maps CNET language codes to ISO codes

Enable if needed

cds_ISO_Languages

UploadProducts

This option is used for uploading the products

Enable only when your account is set up for requesting the SKUs and downloading from the same account

 

StageOnly

Only load the staging tables for an increment.

Does not apply the update policies. Production tables will be empty.

 

KeepStage

Do not truncate the staging tables at the end of the Connector run.

Staging tables will still be truncated at the beginning.

 
PIMFeed Loads PIM feed Enable only when your account is set up to receive this data.

cds_Pim_Atr
cds_Pim_Atr_Composable
cds_Pim_Atr_Voc
cds_Pim_Atr_Standard_Units
cds_Pim_Atr_Order
cds_Pim_Atr_Model

 

2. Language Settings

Uncomment the lines containing the required languages.

For example, if your DataSource account is set up to receive the data in English language with the Imperial Units of measure, the following language parameter would need to be uncommented:

<Language CNETFileCode="ez" ISOCode="" TableCode="ez" Create="True" Load="True"/>

If your account is set up with English and French languages with Metrics units of measure then the following parameters would need to be enabled:

<Language CNETFileCode="fr" ISOCode="" TableCode="fr" Create="True" Load="True"/>
<Language CNETFileCode="en" ISOCode="" TableCode="en" Create="True" Load="True"/>

 

3. Data Exchanges

Locate the element <DataExchanges> in the configuration file and uncomment the block that corresponds to your database and character encoding.

If your account is set up with SQL Server and Windows encoding, and your code page is other than 1252, add the Encoding attribute with the corresponding value. You may find the code per language in the Encoding/Codepage Documentation section in the configuration file.

For example for Japanese language:

<DataExchanges ServiceName="DataSource" LocalDownloadPath="data/DataSource/zipped/download.zip"
  LocalUnzippedPath="data/DataSource/unzipped" ImageRepository="data/DataSource/digitalcontent"
  LocalUploadPath="data/DataSource/Upload"
  ThumbnailRepository="data/DataSource/digitalcontent" CNETFTPMinutesBetweenAttempts="5" CNETFTPMaxAttempts="3" Encoding="932" > <!-- Not UTF -->
  ...
  ...
</DataExchanges>

Please customize the following:

Attribute Replace with
LocalDownloadPath The location where you would like the files to be downloaded to (example: C:\DataSource\data\zipped\)
LocalUnzippedPath The location where you would like the files to be extracted/uncompressed to (example: C:\DataSource\data\unziped for example)
LocalUploadPath The local directory where PRC/MFC files will be stored before being transferred to CNET. This setting is intended for customers who request specific SKUs from CNET (i.e. “PULL mode” subscribers).
ImageRepository The location where you would like the images to be copied to. This can be a folder that is directly visible by your web-application through a virtual directory. (example: C:\DataSource\data\images)
ThumbnailRepository The location where any thumbnails will be stored. Often, this is the same as ImageRepository.
<CNET …Login Your DataSource Account FTP username.
<CNET … Password Your DataSource Account FTP password.

 

4. Digital Content

The DigitalContent section provides all available digital content types.

        <DigitalContent Download="NewOrChangedSinceLastRun" NbThreads="10" ConnectionAttempts="20">
            <MediaTypeRestriction>
                <MediaType ID="1" Directory="JPG_200x150" Description="Standard image (200x150)">
                    <Thumbnails>
                        <Thumbnail ID="1"/>
                    </Thumbnails>
                </MediaType>
                <MediaType ID="2" Directory="JPG_400x300" Description="Large image (400x300)"/>
                <MediaType ID="3" Directory="SSA" Description="Multi-angle images"/>
                <MediaType ID="4" Directory="MARKETING_TEXT" Description="Localized marketing text"/>
                <MediaType ID="5" Directory="KEY_SELLING_POINTS" Description="Key selling points"/>
                <MediaType ID="10" Directory="WHATS_IN_THE_BOX" Description="What's in the Box"/>
                <MediaType ID="11" Directory="PRODUCT_DATA_SHEET" Description="Product data sheet"/>
                <MediaType ID="12" Directory="USER_MANUAL" Description="User manual"/>
                <MediaType ID="13" Directory="QUICK_START_GUID" Description="Quick start guide"/>
                <MediaType ID="14" Directory="PRODUCT_FEATURES" Description="Product features"/>
                <MediaType ID="15" Directory="CCS" Description="CCS Product Images">
                    <Thumbnails>
                        <Thumbnail ID="1" Width="200" Height="150"/>
                    </Thumbnails>
                </MediaType>
                <MediaType ID="17" Directory="MANUFACTURER_LOGOS" Description="Manufacturer Logos"/>
            </MediaTypeRestriction>
            <MimeTypeRestriction>
                <MimeType MimeID="image/gif" Extension="gif"/>
                <MimeType MimeID="image/jpeg" Extension="jpg"/>
                <MimeType MimeID="image/x-photoshop" Extension="psd"/>
                <MimeType MimeID="application/x-photoshop" Extension="psd"/>
                <MimeType MimeID="application/xml" Extension="xml"/>
                <MimeType MimeID="application/pdf" Extension="pdf"/>
            </MimeTypeRestriction>
        </DigitalContent>
        <Thumbnails>
            <Thumbnail ID="1" Directory="JPG_75x75" Width="75" Height="75"/>
        </Thumbnails>

The MediaTypeRestriction lists the media types that will be downloaded. MediaType ID matches the MediaTypeID in the database.

The Directory is where each media type is stored and it is going to be created in \data\DataSource\digitalcontent folder once the ContentConnector runs. Each directory is relative to the ImageRepository directory.

ContentConnector also supports additional thumbnail generation. The thumbnails the size 75x75 is provided for all images as part of the CCS Product Image media type.

The additional other than 75x75 size thumbnails can be generated by including Thumbnails section within the original media type. Multiple thumbnail sections can be defined as well. Each Thumbnails section will reference an actual thumbnail size and directory by ID. By default, Thumbnails section is configured to create a 75x75 image from the 200x150 image (as shown above).

MimeTypeRestriction is similar to MediaTypeRestriction. It limits the mime types the ContentConnector will download.

 

5. Database Location

The Schema element references the database-specific configuration file located in the configuration/DataSource directory. Uncomment the schema that corresponds to your database and comment out all other schemas as necessary.

        <!-- Enable one of the following options -->
        <Schema definition="configuration/DataSource/DataSource-SQLServer.xml"/>
        <!--Schema definition="configuration/DataSource/DataSource-MySQL.xml"/-->
        <!--Schema definition="configuration/DataSource/DataSource-Oracle.xml"/-->
        <!--Schema definition="configuration/DataSource/DataSource-PostGreSql.xml"/-->

Locate the ServerURL field. Depending on your database, you may need to specify your server address, port number, and database name. Replace all fields that are in all capital letters with the information for your own environment.

If you are running multiple instances of SQL Server, your server address may be something like “MYSERVER\MYINSTANCE”.

Next, locate the ProductDataDatabase field. Replace DATABASE with the name of your database.

The stagingTablePrefix attribute tells the application how to prefix the tables that will store increment data. Those tables are then synchronized with the main tables (which have no prefix). You may change this entry to suit your naming conventions.

 

6. Database Credentials

Locate the Login section and enter the user ID and password for a user with table creation/deletion/update rights on the database created.

 

7. Validation (Oracle only)

In order to speed up the update in Oracle, there is an option to turn validation off. When constraints are added or enabled, this option will tell Oracle not to verify the existing data. However, any future data operations will be affected by the constraints. To enable this option, add the line <NoValidate>true</NoValidate> just after the Loader element in the Databases section.

 

8. Monitoring

The ContentConnector uses a library called Log4j which allows for application events to be distributed in several ways (email alerts, HTML report, Windows event log, web-service, database etc….) and for a configurable set of levels (FATAL, ERROR, WARN, INFO,DEBUG). The monitoring configuration file, which can be found at logs/log4j-DataSource.properties, specifies where the notifications should be sent and establishes minimal levels for the notifications events.

The defaults are as follows:

  • A detailed (debugging level and above) log of all events goes to an text report
  • A report (info level and above) goes into an HTML file

You may change those settings to suit your particular logging needs and in any case will need to modify log4j-DataSource.properties to reference local file paths.

 

9. Optional: Network Proxy Configuration

When running the ContentConnector through a proxy server, you will need to configure extra optional settings in ContentConnector.xml.

Insert the following text at the very end of the <DataExchanges> section and modify to fit your proxy definition:

<Proxy Host="proxy.mycompany.com” Port=”3000” Username=”” Password=””/>

 

10. Remote Loading

If ContentConnector and the database are located on different servers follow the recommendations below.

MySQL, SQLServer, and Oracle support loading the database from a remote workstation or server.

MySQL

1.      Loader should be Local (local refers to client file).

<!-- MySQL -->
Databases Type="MySQL4.0CompatibilityMode">
    <Driver>com.mysql.jdbc.Driver</Driver>
    <ServerURL>jdbc:mysql://testserver:11533/ConnectorTest</ServerURL>
    <ProductDataDatabase name="ConnectorTest" repositoryTablePrefix="" stagingTablePrefix="in_"/>
    <Login User="xxx" Password="xxx"/>
    <TableType>MyISAM</TableType>
    <Loader>Local</Loader>
</Databases>

SQLServer

1.      Loader should be BCP

2.      BCP utility must exist on client machine

3.      JDBC string must contain hostname, port and database name

<Databases Type="SQLServer2000">
    <Driver>com.microsoft.sqlserver.jdbc.SQLServerDriver</Driver>
    <ServerURL>jdbc:sqlserver://testserver:1433;database=ConnectorTest</ServerURL>
    <ProductDataDatabase name="ConnectorTest" repositoryTablePrefix="" stagingTablePrefix="in_"/>
    <Login User="connector" Password="connector2"/>
    <Loader>BCP</Loader>
</Databases>

4.      Local user must be created on a server machine

5.      SQL Server Services - SQL Server Browser must be running

6.      SQL Server Network Configuration - Protocols - TCP/IP - IP Addresses - TCP Port should be static (1433 for example).

Oracle

1.      SQLLDR utility must exist on client machine

2.      tnsnames.ora must contain valid remote service string with ADDRESS and SID correctly configured to remote database

3.      ProductDataDatabase element must be configured to use tnsnames.ora service entry

<Databases Type="Oracle9i">      
    <Driver>oracle.jdbc.driver.OracleDriver</Driver>
    <ServerURL>jdbc:oracle:thin:@remotehost:1521:ConnectorTest</ServerURL>
    <ProductDataDatabase name="ConnectorTest" repositoryTablePrefix="" stagingTablePrefix="in_"/>
    <Login User="ConnectorTest" Password="xxx"/>     
    <Loader>Native</Loader>
    <NoValidate>true</NoValidate>
</Databases>

ConnectorTest =
 (DESCRIPTION =
   (ADDRESS_LIST =
     (ADDRESS = (PROTOCOL = TCP)(HOST = remotehost)(PORT = 1521))
   )
 (CONNECT_DATA =
   (SERVICE_NAME = ConnectorTest)
 )
)

To test tnsnames.ora, log into sqlplus using sid. It should work so the contentConnector could work as well.

sqlplus ConnectorTest@XE

 

External Datasets

The ContentConnector has the ability to load external files during the maintenance process. "External files" are data files that are not delivered within download.zip file. They can originate from CNET or your own organization.

Preset External Datasets

Preset External DataSets are defined and can be activated in the configuration/ContentConnector.xml file.The following External Datasets are a part of the standard release but not activated by default:

 

Option to activate

Description

External:AlternativeSearchableAttributes

Loads the alternative searchable attributes tables (cds_asa*).

External:AttributeDisplayOrder

Loads a file that contains the logical ordering of searchable attributes, including those that are not part of the alternative set (cds_Atr_Logical_Order table).

External:LanguageMapping

Loads a file that maps CNET language codes to ISO codes (cds_ISO_Languages table).

Defining a new External Dataset in DataSource-XXX.xml

Three steps need to be taken to use external datasets:

  1. Define the location and protocol for downloading the external file
  2. Define tables that will be loaded with the external dataset data
  3. Define maintenance policies that will sync those tables between runs

Define the location and protocol

In the configuration/ContentConnector.xml file you will notice an <ExternalDataSet> tag with <ExternalDataSetFile> children for preset datasets. You will need to define a new <ExternalDataSetFile> to load your data.

The example below illustrates how to load CNET's Alternative Searchable Attributes:

<ExternalDataSet>

  <ExternalDataSetFile

     description="CNET Alternative attributes"

     remoteFileName="cds_asa.zip"

     compressed="true"

     localFolder="cds_asa"

     tiedToOption="External:AlternativeSearchableAttributes">

   <ExternalDataSetFileHTTP  HTTPServerURL=  "http://www.cnetcontentsolutions.com/ support/datasource/asa/cds_asa.zip">

   </ExternalDataSetFileHTTP>

  </ExternalDataSetFile>

</ExternalDataSet>

 

ExternalDataSetFile definition

description

The dataset name as it will appear in the connector logs

remoteFileName

Name of the remote file - this will also be the name of the local file

compressed

"true" if the file needs to be unzipped after download

"false" otherwise

Note that you will need to set compressed to true if you intend to download a zip file that itself contains multiple files

localFolder

Location where the file will be downloaded locally. This is relative to the location defined in ContentConnector.xml by the <DataExchanges> LocalDownloadPath attribute.

tiedToOption

This external dataset will be active only if the according option is set in ContentConnector.xml, for example

<Option>External:AlternativeSearchableAttributes</Option>

<ExternalDataSetFileHTTP>

If the remote file is accessible via a public HTTP URL, use this tag and populate the attribute described below

HTTPServerURL

Public URL from where the file can be downloaded

<ExternalDataSetFileFTP>

If the remote file is accessible via FTP or SFTP, use this tag and populate the attributes described below

FTPServer

Server name or IP address

FTPLogin

User name

FTPPassword

Password

FTPRemoteFolder

Remote folder in which file resides

FTPSecure

"true" if SFTP access needs to be used to connect. Note that the remote server will need to be added to your SSH KnowHosts file

"false" if standard FTP access is to be used to connect

Create tables

Each file that needs to be loaded in the database needs to have a corresponding table created. Follow the syntax that is used to create classic DataSource tables to do this in DataSource-XXX.xml. Note that the fileName attribute needs to be prefixed with the "external/" folder name as this is the location under the LocalUnzippedPath, the value given in LocalFolder in the ExternalDataSet should follow.
that all external files will be placed/unzipped into. Use the same tiedToOption value that was defined in the ExternalDataSet.

Example:

<DBTable name="cds_Asa_Cat"

        localized="false"

        fileName="external/cds_asa/cds_Asa_Cat.txt"

        tiedToOption="External:AlternativeSearchableAttributes>

...

</DBTable>

Define maintenance policies

Maintenance policies are located at the end of the  DataSource-XXX.xml file. They define how data is synchronized between the staging tables and the main tables and end up being translated into SQL statements in all supported database dialects.

If the data that is loaded is always a full dataset (not incremental) then just add a policy that does a truncate/insert all between staging and main:

<UpdatePolicyMethod tableName="cds_Asa_Cat" methodName="truncateThenCopy"

                   description="Full reload of cds_Asa_Cat table ">

  <UpdatePolicyArg name="DUMMY" value="REMOVE"/>

</UpdatePolicyMethod>

If more complex logic needs to be implemented, other policies exist but they are more complicated to use. You may want to consult CNET if you have the intention of using such policies.

 

updateExistingAndInsertNew

Updates records in main that also exist in stage and that have changed (changes concern all columns except the idenfiying colums), inserts records from stage that do not exist in main.

On some databases (Oracle, SQL Server), the SQL that is generated leverages the MERGE keyword, on others it will use comparisons between all column values to identify changes.

 

Specify the column(s) that are used to uniquely identify records:

<UpdatePolicyArg name="column" value="ID"/>

deleteExistingAndInsertNew

Deletes all records from main that are present in stage and re-inserts them from stage.

mergeTablesWithSwap

This method is functionally identical to deleteExistingAndInsertNew but it instead relies on table swaps to speed up the synchronization of record-heavy tables. Table swap operations create temporary copies of destination tables to perform only insert operations (no deletes, no updates) restricted to the data that needs to be kept. This data can come from the existing main table or the staging table.

After all inserts are performed, the swap table replaces the main table, replicating foreign keys and indexes in the process.

Since the original main table is dropped,  mergeTablesWithSwap cannot be used on tables that are part of materialized views.

 

Specify the column(s) that are used to uniquely identify records:

<UpdatePolicyArg name="column" value="ID"/>

insertNew

Inserts records in main that are only present in staging

 

Specify the column(s) that are used to uniquely identify records:

<UpdatePolicyArg name="column" value="ID"/>

replaceTableWithSwap

Performs a reload of the main table by using table swaps. This should be used when foreign keys point to the main table.

truncateThenCopy

Performs a reload of the main table by truncating and reloading it. This should be used when no foreign keys point to the main table.

deleteObsoleteContent

Deletes all content from the repository which we know will not be used.

The method has 4 arguments:

  1. deleteType:

    • "WhenInBoth" means that records from the repository with an existing key in staging are deleted.

    • "WhenOnlyInProd" means that only records from the repository which don't exist in the  staging are deleted

  2. column: the column from the repository's "tableName" table which will be used as a key

  3. stageTable: the table from the staging dataset which. when it doens't contains a record (keyed by "stageColumn"), lets us know that the content is obsolete

  4. stageColumn: the key column for stagetable

 

Example:

<UpdatePolicyMethod

             tableName="cds_Distivoc"

             methodName="deleteObsoleteContent"

             description="Remove stale metamapping contributor information">

  <UpdatePolicyArg name="deleteType" value="WhenOnlyInProd"/>

  <UpdatePolicyArg name="column" value="ID"/>

  <UpdatePolicyArg name="stageTable" value="cds_Distivoc"/>

  <UpdatePolicyArg name="stageColumn" value="ID"/>

</UpdatePolicyMethod>

deleteOrphanedContent

Deletes all content from the repository when records aren't being used by another table in the repository.

 

Example: remove all records from cds_DigContent_Regions that cds_DigContent_Region_Links aren't using anymore:

 

<UpdatePolicyMethod

            tableName="cds_DigContent_Regions"

            methodName="deleteOrphanedContent"

            description="Remove unused regions">

  <UpdatePolicyArg name="column" value="RegionCode"/>

  <UpdatePolicyArg name="repositoryTable" value="cds_DigContent_Region_Links"/>

  <UpdatePolicyArg name="repositoryColumn" value="RegionCode"/>

</UpdatePolicyMethod>

 

Finalizing changes

Once the three steps above have been completed, you will need to re-run the connector in create mode to get it to create the missing tables, including staging tables (prefixed with "in_").
It is also important to know that external datasets are fetched, even when running the connector in "nodownload" mode (fullnodownload or incrementnodownload keywords).  

Secure FTP

 

Since release 2.2.0.x, the ContentConnector has the ability to download files via SFTP (FTP over SSH). This protocol allows for encrypted transfers; note that it is different from FTPS which isn't supported.

To activate SFTP transfers, you will need to activate the option and point the connector to the CNET SFTP server instead of the regular FTP server. You will also need to register the server as being trusted '("Known Host") 

ContentConnector.xml changes

  1. Uncomment or add <Option>SecureFTP</Option>
  2. Modify the element DataExchanges/CNET's FTPURL to sftp.cnetcontentsolutions.com instead of the janus.cnetdata.com standard value.

Adding server to your trusted list

 

SFTP support is a standard Unix/Linux/OSX feature while it isn't part of Windows.

 

Unix/Linux/OSX

The ContentConnector will first look for the known_hosts file in the ContentConnector's main folder. If it can't be found, it will be searched for under the usual key register (~/.ssh/known_hosts). 

Adding the CNET server to your list of trusted SSH endpoints is easy: SFTP to the server and confirm that you trust it:

$sftp USERID@sftp.cnetcontentsolutions.com

The authenticity of host 'sftp.cnetcontentsolutions.com (40.79.74.37)' can't be established.

RSA key fingerprint is 73:8c:c8:f8:52:73:7e:c9:ec:6d:a6:14:63:62:50:75.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'sftp.cnetcontentsolutions.com,40.79.74.37' (RSA) to the list of known hosts.

 

Windows

You first need to install an SSH client - many exist and some such as SSHWindows are free (http://sshwindows.sourceforge.net). 

Once the SSH client is installed, create a known_hosts file under the ContentConnector main directory:
ssh-keyscan -t rsa sftp.cnetcontentsolutions.com > known_hosts 

 

Making sure SFTP is being used

 

The connector log files will show that SFTP is activated:

2013-11-18 17:51:26,542 INFO   {unknown} Connected to sftp.cnetcontentsolutions.com using secured FTP [FileTransfer] (FileTransferSFTP.java:144)

 

ContentConnector Operation

This section gives the syntax of the calls which can be issued to the ContentConnector. These commands are performed at the command prompt.

1. Basic Command Structure

The basic command structure for calling the ContentConnector is as follows:

java -jar contentConnector-2.2.3.x.jar <service> <configuration> <action>

Example:

java -jar contentConnector-2.2.3.4.jar datasource configuration\ContentConnector.xml create

2. Available Action Calls

import

New command for ContentConnector 2.2.3.4 and later.

  • Downloads download.zip file from FTP
  • Identifies the delivery type (full or increment) and processes accordingly.

Full file - truncates all existing tables and loads new data into the DataSource tables. 
Increment file - performs a series of deletes and inserts to update existing rows and insert new records. 

  • Upload products if applicable.
importnodownload

New command for ContentConnector 2.2.3.4 and later.

Same as “import” but uses local data instead of retrieving new data from FTP. A download.zip file may also be specified to extract data from a location other than the default download location.

create

Builds the table structure and relationships for DataSource

Note: The DataSource database must be manually created prior to running the create action.

full

Legacy command. Use it with ContentConnector 2.2.3.3 and older. Use 'import' for ContentConnector 2.2.3.4 and later.

Downloads download.zip file from FTP. Truncates all existing tables and loads new data into the DataSource Tables. The upload action is also performed, if applicable.

fullnodownload

Legacy command. Use it with ContentConnector 2.2.3.3 and older. Use 'importnodownload' for ContentConnector 2.2.3.4 and later.

Same as "full" but uses local data instead of retrieving new data from CNET. A download.zip file may also be specified to extract data from a location other the default download location.

increment

Legacy command. Use it with ContentConnector 2.2.3.3 and older. Use 'import' for ContentConnector 2.2.3.4 and later.

Downloads download.zip file from FTP. Performs a series of delete and inserts to update existing rows and insert new records. (Note: This is the command used on a daily basis). The upload action is also performed, if applicable

incrementnodownload

Legacy command. Use it with ContentConnector 2.2.3.3 and older. Use 'importnodownload' for ContentConnector 2.2.3.4 and later.

Same as "increment" but uses local data instead of retrieving new data from CNET. A download.zip file may also be specified to extract data from a location other the default download location.

digitalcontentonly Does not update the database. Only downloads the digital content files and creates thumbnails to what is currently in the database.
rebuildthumbnails Does not download any files or update the database. Only recreates thumbnails for existing digital content files to what is in the database.
upload Uploads MFC/PRC files to FTP. No other actions are performed.

NOTE: If you are running ContentConnector 2.2.3.3 and older, the full mode should be used on the initial load only and after that the ContentConnector should be scheduled to run in increment mode. If the full refresh is required the full file delivery should be requested. The ContentConnector may still run in increment mode on a full file as long as the option CheckIsFullDownload is enabled in the ContentConnector.xml file.

 

Troubleshooting

There are several options available to help troubleshoot ContentConnector. In the ContentConnector.xml file, uncomment the JVMProps option to print out a listing of the current Java environment in which the ContentConnector is executing.

In the Monitoring section of the ContentConnector.xml file, add the following attribute: “traceSQL=true”. This setting will print out all SQL that is generated by the ContentConnector. The SQL output will be placed in the scripts directory and will be organized by database and action.

In configuration\log4j.xml file replace <priority value="info"/> with <priority value="debug"/> With this option enabled ContentConnector will log additional debug information to logs\log.txt file.

 

Log files

The log files contain detailed and useful information. In order to ensure that the database connection is working properly and that the data is correct, ContentConnector performs several checks. If any of these checks fail, the ContentConnector will log the results and exit before the main tables are altered.

 

Database Connection

ContentConnector will make sure that the JDBC driver can be loaded. It will also make sure that the login information can be found. If a connection cannot be made to the database, it will log the error message and exit.

 

Database Verification

ContentConnector will print out the database product name, database product version, driver name, driver version, and catalog. It will make sure that the database type in the ContentConnector.xml file (e.g. SQLServer2000”) matches the database that ContentConnector has actually connected to. It will also make sure that the versions of the database and driver are compatible. In the case of MySQL, if the JDBC MySQL driver is prior to version 3.1, then it also makes sure that ISO-8859-1 is not being used.

 

All files in Schema

Before loading any data, ContentConnector will look at DataSource-xxx.xml and make sure that the files specified in the schema configuration match the files contained in the current download. If they do not match, ContentConnector will exit.

 

Missed Increments

In the ContentConnector.xml file, the CheckMissedIncrement option is turned on by default. This tells ContentConnector to load the data into staging and to then check the catalog to see if an increment has been missed. If an increment has been missed, ContentConnector will exit before updating the main tables.

 

Full Download

In the ContentConnector.xml file, the CheckFullDownload option is turned on by default. This tells ContentConnector to load the data into staging and to then check the data to see if it is a full dump of the catalog. The ContentConnector will automatically switch to full mode when full dumps are detected and perform a complete refresh of the main tables. Although full dumps can be processed in increment mode, full mode offers the best performance.