ContentConnector Guide
Overview
The ContentConnector will dramatically reduce the complexity of database creation, data retrieval, and database maintenance processes. The products upload process is also automated for those customers that request content on specific products and utilize CNET’s product mapping services.
In order to use the ContentConnector, the server that will host the application will need the following environment:
- Enough disk space for your entire uncompressed catalog
- Java Runtime Environment (JRE)
The ContentConnector requires JRE version 1.8.
- Database Server Application
The ContentConnector currently supports most common installations:
- SQL Server on Windows
- MySQL on Windows and Linux
- Oracle on Windows and Linux
- PostgreSQL on Windows and Linux
- Microsoft Azure (using SQL Server Settings)
- MariaDB (using MySQL Settings)
File Processing
It is recommended to schedule the ContentConnector to run at least twice a day at different times as the download.zip file delivery times vary for each distribution cycle . The ContentConnector automatically detects new data, and therefore only will perform the database maintenance and any related tasks as necessary.
Upload Products
Customers who maintain a defined catalog and will be requesting specific SKUs from CNET (a.k.a. DataSource Pull Mode), are required to send specific product information to CNET in the form of Manufacturer (MFC) and Product (PRC) files.
The ContentConnector automatically generates and transfers these files to the customer’s FTP account using a table called cds_sku_request, which is structured as follows:
Ref | Column Name | Type | Description |
1 | CustomerPartNumber | VARCHAR(40) | Customer part number |
2 | GTIN | VARCHAR(128) | UPC/EAN code (optional) |
3 | ManufacturerCode | VARCHAR(40) | Customer manufacturer code |
4 | ManufacturerName | VARCHAR(255) | Customer manufacturer name |
5 | ManufacturerPartNumber | VARCHAR(40) | Manufacturer part number |
6 | ProductDescription | VARCHAR(255) | Product description |
7 | IsActive | BIT | SKU status (1=active; 0=inactive) |
8 | RequestStatus | BIT | SKU request status (0=not yet requested; -1=requested) |
9 | FirstInserted | DATETIME | Date/time SKU was first inserted |
10 | Checksum | VARCHAR(32) | A hash code used to detect changes to columns 2-6 |
11 | FirstRequestDate | DATETIME | Date/time SKU was first requested |
12 | LastRequestDate | DATETIME | Date/time SKU was last requested |
The cds_sku_request table is only generated if the UploadProducts option is enabled in ContentConnector.xml, and the application is run in CREATE mode. This option is disabled by default.
The customer is responsible for inserting new records and updating existing records, and including values in fields 1-9. RequestStatus should be set to 0. FirstInserted should have the system timestamp. All other fields are managed by the ContentConnector. The ContentConnector automatically detects new records and changes to existing records using the Checksum column, and then generates and transfers the PRC/MFC file set as necessary. By default, the PRC/MFC file set extension starts with “001” and increments to “999”, before resetting to “001”. The extension information itself is persisted in the .prcmfcCounter text file and can be manipulated if necessary. This file is automatically generated the first time the ContentConnector generates a PRC/MFC file set.
The PRC/MFC files are stored locally in the directory specified in the LocalUploadPath setting (ContentConnector.xml). By default, this value is set to "data/DataSource/Upload".
The ContentConnector will generate and upload a MFC/PRC file set when running with IMPORT command or legacy FULL or INCREMENT commands (not with IMPORTNODOWNLOAD, FULLNODOWNLOAD or INCREMENTNODOWNLOAD commands). Uploads can be done explicitly by calling the ContentConnector in UPLOAD mode.
Create the DataSource Database
Create a blank database on the server, which will host the DataSource data. The database will contain both the repository as well as staging tables. The following examples reference a database named “DataSource”.
Once the database is created you must create a user for the DataSource database or add access privileges for an existing user. The database user will need table creation, read, write, and delete privileges. Depending on the database, the user may also need special privileges to truncate tables and perform bulk inserts.
ContentConnector Download
- Download the latest ContentConnector version.
- Uncompress the file using either WinZip (Windows) or Zip (Linux) to any directory. By default, all configuration files use relative paths, so any directory is acceptable. Once extracted, the directory structure should look similar to the following:
ContentConnector Configuration
For the DataSource service, the ContentConnector requires two configuration files: ConnectConnector.xml and DataSource\DataSource-DATABASE.xml.
The DataSource-DATABASE.xml file contains platform dependent SQL instructions:
- DataSource-SQLServer.xml
- Use this option if using the Microsoft Azure Platform
- DataSource-MySQL.xml
- Use this option if using the MariaDB Platform
- DataSource-Oracle.xml
- DataSource-PostgreSQL.xml
The configuration/ContentConnector.xml file contains the setup information for the ContentConnector. Locate and edit the ContentConnector.xml file and make the following edits to align the behavior of the application to your needs:
1. ContentConnector Options:
Option |
Option Overview |
Comments |
Related Tables |
All Services |
|||
KeepArchivedZipsFor days="30" | Delete archived files 'download.zip' after they have been kept for the specified amount of time | If the tag is not present (or the value of the attribute 'days' is less than 0) the archived downloads will be kept. | |
IndexCompression type="ROW" | Sets data compression type ('ROW', 'PAGE', or 'NONE') for all indexes that do not have their own compressionType attribute | Only supported by SQL Sever 2008+ Enterprise Edition or Developer Edition | |
TableCompression type="PAGE" | Sets data compression type ('ROW', 'PAGE', or 'NONE') for all tables that do not have their own compressionType attribute | Only supported by SQL Sever 2008+ Enterprise Edition or Developer Edition | |
UseTruncate | TRUNCATE will be used instead of DELETE to clear tables | TRUNCATE is faster, takes less transaction log space, but breaks replication | |
UpdateStatisticsOnBulkInsert |
Update table statistics after a bulk insert |
Can cleanup table structures after a bulk insert |
|
ConnectorStatus |
This option is used for providing the status of each ContentConnector run |
Keep enabled |
cds_ConnectorStatus
|
JVMProps |
Display JVM properties in the log file |
For debugging purposes |
|
DisplayInsertRecordCount |
Display the number of records that were bulk inserted in the log file |
For debugging purposes. This causes an extra select. |
|
NoMySQLCharacterSetSupport |
Do not send character set property to MySQL |
Certain older versions of MySQL did not support character sets for bulk inserts for multiple languages |
|
DisableTabLock |
Do not use TABLOCK during SQL Server bulk inserts |
This can prevent table locking, but also slows down inserts |
|
OracleUseUnixNewLine |
Use Unix new lines for Oracle bulk inserts |
Can be necessary if bulk inserts fail because of lines |
|
DataSource Options |
|||
KeepACK |
Triggers the delivery of the new increment download.zip file automatically once the current file is processed successfully by the ContentConnector |
Enable only if the delivery of the new increment file should not be triggered automatically. It can be done manually by deleting the dataout.trxt file from the ack folder on your account's FTP. |
|
SearchableAttributes |
Loads searchable attributes either the full set or Product Line and Model only in case the full set is not a part of the contract |
Keep enabled |
cds_Atr |
NewUNSPSC |
Loads UNSPSC data |
Enable only when your account is set up to receive this data. |
cds_UNSPSC_Versioned_Commodities
|
UNSPSC |
Legacy option for UNSPSC data. |
Keep disabled |
cds_UNSPSC
|
Metamap |
Loads the data which defines a link between Customer SKU and CNET ProdID as well as UPC codes and Distributors' Part Numbers when the account is set up to receive these options |
Keep enabled |
cds_Metamap
|
NVARCHAR | This option should be set for SQLServer UTF8/UTF16 encoding for double bytes support | Enable when needed | |
RelatedProducts |
Legacy option for ContentConnector 2.2.3.4 or later. Loads Related Products data (RP Links and/or RP Rules) |
Enable only when your account is set up to receive Related Products Links and/or Related Products Rules |
cds_Acc_Updates |
RelatedProductsLinks |
Loads Related Products Links, i.e. a link between an Accessory SKU and a Parent SKU |
Enable only when your account is set up to receive OEM Related Products |
cds_Acc_Links |
RelatedProductsRules |
Loads Related Products Rules, i.e. a link between an Accessory SKU and a Parent Product Line and Model |
Enable only when your account is set up to receive Compatibility Rules |
cds_AccCompat
|
DigitalContent |
Loads digital content. The list of the digital content options is available in the config file in the MediaType section |
Enable only when your account is set up to receive this data. |
cds_DigContent
|
DigitalContentMeta |
Loads Images Meta Attributes data, i.e. Image Size, Image Type, Image Angle, etc. |
Enable only when your account is set up to receive this data. |
cds_DigContent_Meta
|
DigitalContentText |
Loads digital content text; an alternative delivery method for all text components existing in Digital Content, including:
|
Enable only when your account is set up to receive this data. | cds_DigContent_Text |
DownloadDigitalContent |
This option is used for downloading the digital content. |
Enable if you would like the digital content to be downloaded. You may choose to disable this option if you would like to use the direct links. |
|
ProcessThumbnails |
This option is used for thumbnail generation. Thumbnails are defined in the config file in the Thumbnails section.
|
Disable if the image reducing to thumbnails is not needed |
|
MainSpecs |
Loads Main Specifications |
Enable only when your account is set up to receive this data. |
cds_Mspecxx
|
ExtendedSpecs |
Loads Extended Specifications |
Enable only when your account is set up to receive this data. |
cds_Especxx
|
MarketingText |
This is a legacy option for when the marketing text was delivered not in digital content |
Keep disabled |
cds_Mktxx |
MarketingTextDigContent |
Loads the marketing text which is delivered in digital content |
Keep enabled |
The digital content tables |
Atomic |
Loads Atomic Data |
Keep disabled |
cds_Atr_full
|
CatalogInfo |
Loads the Catalog information which includes all SKUs with their current status, last delivery date, etc. |
Keep enabled |
cds_Catalog |
Lifecycle |
Loads Products Lifecycle data (Mfg Release Date, Mfg Discontinue Date) |
Keep enabled |
cds_Lifecycle |
Categorization |
Loads Alternative Categorization |
Keep enabled |
cds_Cct_Categories
|
CheckMissedIncrement |
This option is used for checking that there is no missing data. If any data is missing the ContentConnector will exit with an error "An increment has been missed" |
Keep enabled |
|
CheckIsFullDownload |
Legacy option for ContentConnector 2.2.3.4 or later. This option is for checking if the delivered file is full when the ContentConnector runs in increment mode. If the Connector detects that it is a full file then it switches automatically to processing it in full mode |
Keep enabled |
|
SecureFTP |
This option is for downloading the files via SFTP (FTP over SSH) |
Enable if SFTP is used |
|
External:AlternativeSearchableAttributes |
This option is used for loading the external files with Alternative Searchable Attributes |
Enable if your account is set up with the full set of searchable attributes and if would like to get Alternative Searchable Attributes |
cds_Asa_Cat
|
External:AttributeDisplayOrder |
Loads a file that contains the logical ordering of searchable attributes, including those that are not part of the alternative set |
Enable if needed along with the AlternativeSearchableAttributes option |
cds_Atr_Logical_Order |
External:LanguageMapping |
Loads a file that maps CNET language codes to ISO codes |
Enable if needed |
cds_ISO_Languages |
UploadProducts |
This option is used for uploading the products |
Enable only when your account is set up for requesting the SKUs and downloading from the same account |
|
StageOnly |
Only load the staging tables for an increment. |
Does not apply the update policies. Production tables will be empty. |
|
KeepStage |
Do not truncate the staging tables at the end of the Connector run. |
Staging tables will still be truncated at the beginning. |
|
PIMFeed | Loads PIM feed | Enable only when your account is set up to receive this data. |
cds_Pim_Atr
|
2. Language Settings
Uncomment the lines containing the required languages.
For example, if your DataSource account is set up to receive the data in English language with the Imperial Units of measure, the following language parameter would need to be uncommented:
<Language CNETFileCode="ez" ISOCode="" TableCode="ez" Create="True" Load="True"/>
If your account is set up with English and French languages with Metrics units of measure then the following parameters would need to be enabled:
<Language CNETFileCode="fr" ISOCode="" TableCode="fr" Create="True" Load="True"/>
<Language CNETFileCode="en" ISOCode="" TableCode="en" Create="True" Load="True"/>
3. Data Exchanges
Locate the element <DataExchanges> in the configuration file and uncomment the block that corresponds to your database and character encoding.
If your account is set up with SQL Server and Windows encoding, and your code page is other than 1252, add the Encoding attribute with the corresponding value. You may find the code per language in the Encoding/Codepage Documentation section in the configuration file.
For example for Japanese language:
<DataExchanges ServiceName="DataSource" LocalDownloadPath="data/DataSource/zipped/download.zip"
LocalUnzippedPath="data/DataSource/unzipped" ImageRepository="data/DataSource/digitalcontent"
LocalUploadPath="data/DataSource/Upload"
ThumbnailRepository="data/DataSource/digitalcontent" CNETFTPMinutesBetweenAttempts="5" CNETFTPMaxAttempts="3" Encoding="932" > <!-- Not UTF -->
...
...
</DataExchanges>
Please customize the following:
Attribute | Replace with |
LocalDownloadPath | The location where you would like the files to be downloaded to (example: C:\DataSource\data\zipped\) |
LocalUnzippedPath | The location where you would like the files to be extracted/uncompressed to (example: C:\DataSource\data\unziped for example) |
LocalUploadPath | The local directory where PRC/MFC files will be stored before being transferred to CNET. This setting is intended for customers who request specific SKUs from CNET (i.e. “PULL mode” subscribers). |
ImageRepository | The location where you would like the images to be copied to. This can be a folder that is directly visible by your web-application through a virtual directory. (example: C:\DataSource\data\images) |
ThumbnailRepository | The location where any thumbnails will be stored. Often, this is the same as ImageRepository. |
<CNET …Login | Your DataSource Account FTP username. |
<CNET … Password | Your DataSource Account FTP password. |
4. Digital Content
The DigitalContent section provides all available digital content types.
<DigitalContent Download="NewOrChangedSinceLastRun" NbThreads="10" ConnectionAttempts="20">
<MediaTypeRestriction>
<MediaType ID="1" Directory="JPG_200x150" Description="Standard image (200x150)">
<Thumbnails>
<Thumbnail ID="1"/>
</Thumbnails>
</MediaType>
<MediaType ID="2" Directory="JPG_400x300" Description="Large image (400x300)"/>
<MediaType ID="3" Directory="SSA" Description="Multi-angle images"/>
<MediaType ID="4" Directory="MARKETING_TEXT" Description="Localized marketing text"/>
<MediaType ID="5" Directory="KEY_SELLING_POINTS" Description="Key selling points"/>
<MediaType ID="10" Directory="WHATS_IN_THE_BOX" Description="What's in the Box"/>
<MediaType ID="11" Directory="PRODUCT_DATA_SHEET" Description="Product data sheet"/>
<MediaType ID="12" Directory="USER_MANUAL" Description="User manual"/>
<MediaType ID="13" Directory="QUICK_START_GUID" Description="Quick start guide"/>
<MediaType ID="14" Directory="PRODUCT_FEATURES" Description="Product features"/>
<MediaType ID="15" Directory="CCS" Description="CCS Product Images">
<Thumbnails>
<Thumbnail ID="1" Width="200" Height="150"/>
</Thumbnails>
</MediaType>
<MediaType ID="17" Directory="MANUFACTURER_LOGOS" Description="Manufacturer Logos"/>
</MediaTypeRestriction>
<MimeTypeRestriction>
<MimeType MimeID="image/gif" Extension="gif"/>
<MimeType MimeID="image/jpeg" Extension="jpg"/>
<MimeType MimeID="image/x-photoshop" Extension="psd"/>
<MimeType MimeID="application/x-photoshop" Extension="psd"/>
<MimeType MimeID="application/xml" Extension="xml"/>
<MimeType MimeID="application/pdf" Extension="pdf"/>
</MimeTypeRestriction>
</DigitalContent>
<Thumbnails>
<Thumbnail ID="1" Directory="JPG_75x75" Width="75" Height="75"/>
</Thumbnails>
The MediaTypeRestriction lists the media types that will be downloaded. MediaType ID matches the MediaTypeID in the database.
The Directory is where each media type is stored and it is going to be created in \data\DataSource\digitalcontent folder once the ContentConnector runs. Each directory is relative to the ImageRepository directory.
ContentConnector also supports additional thumbnail generation. The thumbnails the size 75x75 is provided for all images as part of the CCS Product Image media type.
The additional other than 75x75 size thumbnails can be generated by including Thumbnails section within the original media type. Multiple thumbnail sections can be defined as well. Each Thumbnails section will reference an actual thumbnail size and directory by ID. By default, Thumbnails section is configured to create a 75x75 image from the 200x150 image (as shown above).
MimeTypeRestriction is similar to MediaTypeRestriction. It limits the mime types the ContentConnector will download.
5. Database Location
The Schema element references the database-specific configuration file located in the configuration/DataSource directory. Uncomment the schema that corresponds to your database and comment out all other schemas as necessary.
<!-- Enable one of the following options -->
<Schema definition="configuration/DataSource/DataSource-SQLServer.xml"/>
<!--Schema definition="configuration/DataSource/DataSource-MySQL.xml"/-->
<!--Schema definition="configuration/DataSource/DataSource-Oracle.xml"/-->
<!--Schema definition="configuration/DataSource/DataSource-PostGreSql.xml"/-->
Locate the ServerURL field. Depending on your database, you may need to specify your server address, port number, and database name. Replace all fields that are in all capital letters with the information for your own environment.
If you are running multiple instances of SQL Server, your server address may be something like “MYSERVER\MYINSTANCE”.
Next, locate the ProductDataDatabase field. Replace DATABASE with the name of your database.
The stagingTablePrefix attribute tells the application how to prefix the tables that will store increment data. Those tables are then synchronized with the main tables (which have no prefix). You may change this entry to suit your naming conventions.
6. Database Credentials
Locate the Login section and enter the user ID and password for a user with table creation/deletion/update rights on the database created.
7. Validation (Oracle only)
In order to speed up the update in Oracle, there is an option to turn validation off. When constraints are added or enabled, this option will tell Oracle not to verify the existing data. However, any future data operations will be affected by the constraints. To enable this option, add the line <NoValidate>true</NoValidate> just after the Loader element in the Databases section.
8. Monitoring
The ContentConnector uses a library called Log4j which allows for application events to be distributed in several ways (email alerts, HTML report, Windows event log, web-service, database etc….) and for a configurable set of levels (FATAL, ERROR, WARN, INFO,DEBUG). The monitoring configuration file, which can be found at logs/log4j-DataSource.properties, specifies where the notifications should be sent and establishes minimal levels for the notifications events.
The defaults are as follows:
- A detailed (debugging level and above) log of all events goes to an text report
- A report (info level and above) goes into an HTML file
You may change those settings to suit your particular logging needs and in any case will need to modify log4j-DataSource.properties to reference local file paths.
9. Optional: Network Proxy Configuration
When running the ContentConnector through a proxy server, you will need to configure extra optional settings in ContentConnector.xml.
Insert the following text at the very end of the <DataExchanges> section and modify to fit your proxy definition:
<Proxy Host="proxy.mycompany.com” Port=”3000” Username=”” Password=””/>
10. Remote Loading
If ContentConnector and the database are located on different servers follow the recommendations below.
MySQL, SQLServer, and Oracle support loading the database from a remote workstation or server.
MySQL
1. Loader should be Local (local refers to client file).
<!-- MySQL -->
Databases Type="MySQL4.0CompatibilityMode">
<Driver>com.mysql.jdbc.Driver</Driver>
<ServerURL>jdbc:mysql://testserver:11533/ConnectorTest</ServerURL>
<ProductDataDatabase name="ConnectorTest" repositoryTablePrefix="" stagingTablePrefix="in_"/>
<Login User="xxx" Password="xxx"/>
<TableType>MyISAM</TableType>
<Loader>Local</Loader>
</Databases>
SQLServer
1. Loader should be BCP
2. BCP utility must exist on client machine
3. JDBC string must contain hostname, port and database name
<Databases Type="SQLServer2000">
<Driver>com.microsoft.sqlserver.jdbc.SQLServerDriver</Driver>
<ServerURL>jdbc:sqlserver://testserver:1433;database=ConnectorTest</ServerURL>
<ProductDataDatabase name="ConnectorTest" repositoryTablePrefix="" stagingTablePrefix="in_"/>
<Login User="connector" Password="connector2"/>
<Loader>BCP</Loader>
</Databases>
4. Local user must be created on a server machine
5. SQL Server Services - SQL Server Browser must be running
6. SQL Server Network Configuration - Protocols - TCP/IP - IP Addresses - TCP Port should be static (1433 for example).
Oracle
1. SQLLDR utility must exist on client machine
2. tnsnames.ora must contain valid remote service string with ADDRESS and SID correctly configured to remote database
3. ProductDataDatabase element must be configured to use tnsnames.ora service entry
<Databases Type="Oracle9i">
<Driver>oracle.jdbc.driver.OracleDriver</Driver>
<ServerURL>jdbc:oracle:thin:@remotehost:1521:ConnectorTest</ServerURL>
<ProductDataDatabase name="ConnectorTest" repositoryTablePrefix="" stagingTablePrefix="in_"/>
<Login User="ConnectorTest" Password="xxx"/>
<Loader>Native</Loader>
<NoValidate>true</NoValidate>
</Databases>
ConnectorTest =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = remotehost)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = ConnectorTest)
)
)
To test tnsnames.ora, log into sqlplus using sid. It should work so the contentConnector could work as well.
sqlplus ConnectorTest@XE
External Datasets
The ContentConnector has the ability to load external files during the maintenance process. "External files" are data files that are not delivered within download.zip file. They can originate from CNET or your own organization.
Preset External Datasets
Preset External DataSets are defined and can be activated in the configuration/ContentConnector.xml file.The following External Datasets are a part of the standard release but not activated by default:
Option to activate |
Description |
External:AlternativeSearchableAttributes |
Loads the alternative searchable attributes tables (cds_asa*). |
External:AttributeDisplayOrder |
Loads a file that contains the logical ordering of searchable attributes, including those that are not part of the alternative set (cds_Atr_Logical_Order table). |
External:LanguageMapping |
Loads a file that maps CNET language codes to ISO codes (cds_ISO_Languages table). |
Defining a new External Dataset in DataSource-XXX.xml
Three steps need to be taken to use external datasets:
- Define the location and protocol for downloading the external file
- Define tables that will be loaded with the external dataset data
- Define maintenance policies that will sync those tables between runs
Define the location and protocol
In the configuration/ContentConnector.xml file you will notice an <ExternalDataSet> tag with <ExternalDataSetFile> children for preset datasets. You will need to define a new <ExternalDataSetFile> to load your data.
The example below illustrates how to load CNET's Alternative Searchable Attributes:
<ExternalDataSet>
<ExternalDataSetFile
description="CNET Alternative attributes"
remoteFileName="cds_asa.zip"
compressed="true"
localFolder="cds_asa"
tiedToOption="External:AlternativeSearchableAttributes">
<ExternalDataSetFileHTTP HTTPServerURL= "http://www.cnetcontentsolutions.com/ support/datasource/asa/cds_asa.zip">
</ExternalDataSetFileHTTP>
</ExternalDataSetFile>
</ExternalDataSet>
ExternalDataSetFile definition
description |
The dataset name as it will appear in the connector logs |
remoteFileName |
Name of the remote file - this will also be the name of the local file |
compressed |
"true" if the file needs to be unzipped after download "false" otherwise Note that you will need to set compressed to true if you intend to download a zip file that itself contains multiple files |
localFolder |
Location where the file will be downloaded locally. This is relative to the location defined in ContentConnector.xml by the <DataExchanges> LocalDownloadPath attribute. |
tiedToOption |
This external dataset will be active only if the according option is set in ContentConnector.xml, for example <Option>External:AlternativeSearchableAttributes</Option> |
<ExternalDataSetFileHTTP> |
If the remote file is accessible via a public HTTP URL, use this tag and populate the attribute described below |
HTTPServerURL |
Public URL from where the file can be downloaded |
<ExternalDataSetFileFTP> |
If the remote file is accessible via FTP or SFTP, use this tag and populate the attributes described below |
FTPServer |
Server name or IP address |
FTPLogin |
User name |
FTPPassword |
Password |
FTPRemoteFolder |
Remote folder in which file resides |
FTPSecure |
"true" if SFTP access needs to be used to connect. Note that the remote server will need to be added to your SSH KnowHosts file "false" if standard FTP access is to be used to connect |
Create tables
Each file that needs to be loaded in the database needs to have a corresponding table created. Follow the syntax that is used to create classic DataSource
tables to do this in DataSource-XXX.xml. Note that the fileName attribute needs to be prefixed with the "external/" folder name as this is the
location under the LocalUnzippedPath, the value given in LocalFolder in the ExternalDataSet should follow.
that all external files will be placed/unzipped into. Use the same tiedToOption value that was defined in the ExternalDataSet.
Example:
<DBTable name="cds_Asa_Cat"
localized="false"
fileName="external/cds_asa/cds_Asa_Cat.txt"
tiedToOption="External:AlternativeSearchableAttributes>
...
</DBTable>
Define maintenance policies
Maintenance policies are located at the end of the DataSource-XXX.xml file. They define how data is synchronized between the staging tables and the main tables and end up being translated into SQL statements in all supported database dialects.
If the data that is loaded is always a full dataset (not incremental) then just add a policy that does a truncate/insert all between staging and main:
<UpdatePolicyMethod tableName="cds_Asa_Cat" methodName="truncateThenCopy"
description="Full reload of cds_Asa_Cat table ">
<UpdatePolicyArg name="DUMMY" value="REMOVE"/>
</UpdatePolicyMethod>
If more complex logic needs to be implemented, other policies exist but they are more complicated to use. You may want to consult CNET if you have the intention of using such policies.
updateExistingAndInsertNew |
Updates records in main that also exist in stage and that have changed (changes concern all columns except the idenfiying colums), inserts records from stage that do not exist in main. On some databases (Oracle, SQL Server), the SQL that is generated leverages the MERGE keyword, on others it will use comparisons between all column values to identify changes.
Specify the column(s) that are used to uniquely identify records: <UpdatePolicyArg name="column" value="ID"/> |
deleteExistingAndInsertNew |
Deletes all records from main that are present in stage and re-inserts them from stage. |
mergeTablesWithSwap |
This method is functionally identical to deleteExistingAndInsertNew but it instead relies on table swaps to speed up the synchronization of record-heavy tables. Table swap operations create temporary copies of destination tables to perform only insert operations (no deletes, no updates) restricted to the data that needs to be kept. This data can come from the existing main table or the staging table. After all inserts are performed, the swap table replaces the main table, replicating foreign keys and indexes in the process. Since the original main table is dropped, mergeTablesWithSwap cannot be used on tables that are part of materialized views.
Specify the column(s) that are used to uniquely identify records: <UpdatePolicyArg name="column" value="ID"/> |
insertNew |
Inserts records in main that are only present in staging
Specify the column(s) that are used to uniquely identify records: <UpdatePolicyArg name="column" value="ID"/> |
replaceTableWithSwap |
Performs a reload of the main table by using table swaps. This should be used when foreign keys point to the main table. |
truncateThenCopy |
Performs a reload of the main table by truncating and reloading it. This should be used when no foreign keys point to the main table. |
deleteObsoleteContent |
Deletes all content from the repository which we know will not be used. The method has 4 arguments:
Example: <UpdatePolicyMethod tableName="cds_Distivoc" methodName="deleteObsoleteContent" description="Remove stale metamapping contributor information"> <UpdatePolicyArg name="deleteType" value="WhenOnlyInProd"/> <UpdatePolicyArg name="column" value="ID"/> <UpdatePolicyArg name="stageTable" value="cds_Distivoc"/> <UpdatePolicyArg name="stageColumn" value="ID"/> </UpdatePolicyMethod> |
deleteOrphanedContent |
Deletes all content from the repository when records aren't being used by another table in the repository.
Example: remove all records from cds_DigContent_Regions that cds_DigContent_Region_Links aren't using anymore:
<UpdatePolicyMethod tableName="cds_DigContent_Regions" methodName="deleteOrphanedContent" description="Remove unused regions"> <UpdatePolicyArg name="column" value="RegionCode"/> <UpdatePolicyArg name="repositoryTable" value="cds_DigContent_Region_Links"/> <UpdatePolicyArg name="repositoryColumn" value="RegionCode"/> </UpdatePolicyMethod> |
Finalizing changes
Once the three steps above have been completed, you will need to re-run the connector in create mode to get it to create the missing tables, including
staging tables (prefixed with "in_").
It is also important to know that external datasets are fetched, even when running the connector in "nodownload" mode (fullnodownload or incrementnodownload
keywords).
Secure FTP
Since release 2.2.0.x, the ContentConnector has the ability to download files via SFTP (FTP over SSH). This protocol allows for encrypted transfers; note that it is different from FTPS which isn't supported.
To activate SFTP transfers, you will need to activate the option and point the connector to the CNET SFTP server instead of the regular FTP server. You will also need to register the server as being trusted '("Known Host")
ContentConnector.xml changes
- Uncomment or add <Option>SecureFTP</Option>
- Modify the element DataExchanges/CNET's FTPURL to sftp.cnetcontentsolutions.com instead of the janus.cnetdata.com standard value.
Adding server to your trusted list
SFTP support is a standard Unix/Linux/OSX feature while it isn't part of Windows.
Unix/Linux/OSX
The ContentConnector will first look for the known_hosts file in the ContentConnector's main folder. If it can't be found, it will be searched for under the usual key register (~/.ssh/known_hosts).
Adding the CNET server to your list of trusted SSH endpoints is easy: SFTP to the server and confirm that you trust it:
$sftp USERID@sftp.cnetcontentsolutions.com
The authenticity of host 'sftp.cnetcontentsolutions.com (40.79.74.37)' can't be established.
RSA key fingerprint is 73:8c:c8:f8:52:73:7e:c9:ec:6d:a6:14:63:62:50:75.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'sftp.cnetcontentsolutions.com,40.79.74.37' (RSA) to the list of known hosts.
Windows
You first need to install an SSH client - many exist and some such as SSHWindows are free (http://sshwindows.sourceforge.net).
Once the SSH client is installed, create a known_hosts file under the ContentConnector main directory:
ssh-keyscan -t rsa sftp.cnetcontentsolutions.com > known_hosts
Making sure SFTP is being used
The connector log files will show that SFTP is activated:
2013-11-18 17:51:26,542 INFO {unknown} Connected to sftp.cnetcontentsolutions.com using secured FTP [FileTransfer] (FileTransferSFTP.java:144)
ContentConnector Operation
This section gives the syntax of the calls which can be issued to the ContentConnector. These commands are performed at the command prompt.
1. Basic Command Structure
The basic command structure for calling the ContentConnector is as follows:
java -jar contentConnector-2.2.3.x.jar <service> <configuration> <action>
Example:
java -jar contentConnector-2.2.3.4.jar datasource configuration\ContentConnector.xml create
2. Available Action Calls
import |
New command for ContentConnector 2.2.3.4 and later.
Full file - truncates all existing tables and loads new data into the DataSource tables.
|
importnodownload |
New command for ContentConnector 2.2.3.4 and later. Same as “import” but uses local data instead of retrieving new data from FTP. A download.zip file may also be specified to extract data from a location other than the default download location. |
create |
Builds the table structure and relationships for DataSource Note: The DataSource database must be manually created prior to running the create action. |
full |
Legacy command. Use it with ContentConnector 2.2.3.3 and older. Use 'import' for ContentConnector 2.2.3.4 and later. Downloads download.zip file from FTP. Truncates all existing tables and loads new data into the DataSource Tables. The upload action is also performed, if applicable. |
fullnodownload |
Legacy command. Use it with ContentConnector 2.2.3.3 and older. Use 'importnodownload' for ContentConnector 2.2.3.4 and later. Same as "full" but uses local data instead of retrieving new data from CNET. A download.zip file may also be specified to extract data from a location other the default download location. |
increment |
Legacy command. Use it with ContentConnector 2.2.3.3 and older. Use 'import' for ContentConnector 2.2.3.4 and later. Downloads download.zip file from FTP. Performs a series of delete and inserts to update existing rows and insert new records. (Note: This is the command used on a daily basis). The upload action is also performed, if applicable |
incrementnodownload |
Legacy command. Use it with ContentConnector 2.2.3.3 and older. Use 'importnodownload' for ContentConnector 2.2.3.4 and later. Same as "increment" but uses local data instead of retrieving new data from CNET. A download.zip file may also be specified to extract data from a location other the default download location. |
digitalcontentonly | Does not update the database. Only downloads the digital content files and creates thumbnails to what is currently in the database. |
rebuildthumbnails | Does not download any files or update the database. Only recreates thumbnails for existing digital content files to what is in the database. |
upload | Uploads MFC/PRC files to FTP. No other actions are performed. |
NOTE: If you are running ContentConnector 2.2.3.3 and older, the full mode should be used on the initial load only and after that the ContentConnector should be scheduled to run in increment mode. If the full refresh is required the full file delivery should be requested. The ContentConnector may still run in increment mode on a full file as long as the option CheckIsFullDownload is enabled in the ContentConnector.xml file.
Troubleshooting
There are several options available to help troubleshoot ContentConnector. In the ContentConnector.xml file, uncomment the JVMProps option to print out a listing of the current Java environment in which the ContentConnector is executing.
In the Monitoring section of the ContentConnector.xml file, add the following attribute: “traceSQL=true”. This setting will print out all SQL that is generated by the ContentConnector. The SQL output will be placed in the scripts directory and will be organized by database and action.
In configuration\log4j.xml file replace <priority value="info"/> with <priority value="debug"/> With this option enabled ContentConnector will log additional debug information to logs\log.txt file.
Log files
The log files contain detailed and useful information. In order to ensure that the database connection is working properly and that the data is correct, ContentConnector performs several checks. If any of these checks fail, the ContentConnector will log the results and exit before the main tables are altered.
Database Connection
ContentConnector will make sure that the JDBC driver can be loaded. It will also make sure that the login information can be found. If a connection cannot be made to the database, it will log the error message and exit.
Database Verification
ContentConnector will print out the database product name, database product version, driver name, driver version, and catalog. It will make sure that the database type in the ContentConnector.xml file (e.g. SQLServer2000”) matches the database that ContentConnector has actually connected to. It will also make sure that the versions of the database and driver are compatible. In the case of MySQL, if the JDBC MySQL driver is prior to version 3.1, then it also makes sure that ISO-8859-1 is not being used.
All files in Schema
Before loading any data, ContentConnector will look at DataSource-xxx.xml and make sure that the files specified in the schema configuration match the files contained in the current download. If they do not match, ContentConnector will exit.
Missed Increments
In the ContentConnector.xml file, the CheckMissedIncrement option is turned on by default. This tells ContentConnector to load the data into staging and to then check the catalog to see if an increment has been missed. If an increment has been missed, ContentConnector will exit before updating the main tables.
Full Download
In the ContentConnector.xml file, the CheckFullDownload option is turned on by default. This tells ContentConnector to load the data into staging and to then check the data to see if it is a full dump of the catalog. The ContentConnector will automatically switch to full mode when full dumps are detected and perform a complete refresh of the main tables. Although full dumps can be processed in increment mode, full mode offers the best performance.