Once you have FairCom DB integrated into your application, you will be faced with a real-world problem: loading massive amounts of data into your database. This can be a time-consuming task.
The first temptation might be to write a script that inserts your data into the database one record at a time. Fortunately, FairCom DB offers many features that streamline this process. Instead of records trickling in one at a time, your FairCom DB database can be gulping multiple records simultaneously.
In one customer case where we’ve used this process, the time to load billions of records—with several indices—went from approximately two weeks to less than two days.
The following tips can be used to speed up the process of inserting data into your FairCom DB database:
1. Turn off transaction processing
Transaction processing control can be turned off during the data load. As long as you have the source data preserved so you can start over if a problem occurs, you don't need transaction processing control during the migration.
If you want transaction processing control down the road, make sure you create the data and index files with the TRNLOG file mode active. Once the file has been created with TRNLOG enabled, you can speed up the load by disabling TRNLOG programmatically, as described in the FairCom DB Programmer Reference Guide topic Transaction Processing On/Off.
Or you can run the cttrnmod program, described in the cttrnmod—Change Transaction Mode Utility topic.
Don’t forget to turn transaction processing back on after you have completed the data load. (The topics cited above explain how to do this.)
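If you plan to enable full transaction control later, the key is to create the files with TRNLOG up front and keep logging off only for the load itself. Here is a minimal sketch of the create step; it assumes the CTDB C API calls (ctdbAllocTable, ctdbAddField, ctdbCreateTable) and the CTCREATE_TRNLOG constant as they appear in FairCom's samples, so confirm the exact names and signatures against the Programmer Reference Guide.

```c
/* Sketch: create a table with TRNLOG so it can be transaction-controlled
 * later, while the bulk load itself runs without logging. Function and
 * constant names assume the FairCom CTDB C API; verify them against the
 * current SDK documentation. Error handling is reduced for brevity. */
#include "ctdbsdk.h"

int create_load_table(CTHANDLE hDatabase)
{
    CTHANDLE hTable = ctdbAllocTable(hDatabase);
    if (hTable == NULL)
        return -1;

    /* Placeholder field definitions; replace with your real schema. */
    if (ctdbAddField(hTable, "id", CT_INT4, 4) != CTDBRET_OK)
        return -1;

    /* Create the file as a transaction-controlled (TRNLOG) file so full
     * transaction processing can be enabled after the load completes. */
    if (ctdbCreateTable(hTable, "bulk_load", CTCREATE_TRNLOG) != CTDBRET_OK)
        return -1;

    /* For the load itself, switch transaction logging off as described in
     * the "Transaction Processing On/Off" topic (or run cttrnmod), then
     * turn it back on once the data is in place. */
    return 0;
}
```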
2. Use SHARED MEMORY protocol
If possible, run the data load program on the same machine hosting the FairCom DB data. This will allow the FairCom DB Server to use the shared memory communication protocol, which is much faster than TCP/IP.
If you need to use TCP/IP, increase the number of threads loading data to multiple threads per CPU core to compensate for the network latency.
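Shared memory is chosen at connection time: a client that logs on to a server running on the same machine can use the shared memory protocol, while a remote host forces TCP/IP. The sketch below assumes the CTDB C API session calls and FairCom's default server name and credentials (FAIRCOMS, ADMIN/ADMIN); substitute your own server name and account.

```c
/* Sketch: connect the load program to a local FairCom DB Server so the
 * shared memory protocol can be used instead of TCP/IP. The session calls
 * and the default FAIRCOMS/ADMIN credentials are assumptions based on
 * FairCom's standard samples; adjust for your installation. */
#include "ctdbsdk.h"
#include <stdio.h>

int connect_local(CTHANDLE *phSession)
{
    CTHANDLE hSession = ctdbAllocSession(CTSESSION_CTDB);
    if (hSession == NULL)
        return -1;

    /* Logging on to a server on this same machine lets the client library
     * pick the shared memory protocol; a remote host would force TCP/IP. */
    if (ctdbLogon(hSession, "FAIRCOMS", "ADMIN", "ADMIN") != CTDBRET_OK) {
        fprintf(stderr, "logon failed: %d\n", ctdbGetError(hSession));
        ctdbFreeSession(hSession);
        return -1;
    }

    *phSession = hSession;
    return 0;
}
```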
3. Use Direct I/O (v11 and later only)
When using FairCom DB v11 and later, review the Direct I/O support. Direct I/O can improve performance when building and working with larger files.
4. Multithread the inserts
The next way to boost performance is to use one of the non-relational FairCom DB APIs, such as the ISAM API or the FairCom DB C API.
If you can break the incoming data into multiple chunks, these APIs allow you to insert with multiple threads, as sketched below. A good rule of thumb is one to two threads per virtual CPU core.
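As a sketch of the chunking, the pthreads skeleton below splits the input into one range per thread and lets each thread insert its own slice over its own connection. The insert_range() helper is hypothetical; it stands in for whatever ISAM or CTDB insert loop your program uses (for example, repeated ctdbWriteRecord() calls on a per-thread record handle).

```c
/* Sketch: divide the input into chunks and insert each chunk from its own
 * thread. insert_range() is a hypothetical helper representing your
 * per-thread FairCom DB connection and insert loop. */
#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 8   /* roughly 1-2 threads per virtual CPU core */

typedef struct {
    long first;   /* first input record this thread owns */
    long count;   /* how many records it should insert   */
} chunk_t;

/* Hypothetical: opens its own session/table handles, loops over the
 * assigned records, and inserts them (e.g. with ctdbWriteRecord()). */
extern int insert_range(long first, long count);

static void *worker(void *arg)
{
    chunk_t *c = (chunk_t *)arg;
    if (insert_range(c->first, c->count) != 0)
        fprintf(stderr, "chunk starting at %ld failed\n", c->first);
    return NULL;
}

int load_in_parallel(long total_records)
{
    pthread_t threads[NUM_THREADS];
    chunk_t   chunks[NUM_THREADS];
    long per_thread = (total_records + NUM_THREADS - 1) / NUM_THREADS;

    for (int i = 0; i < NUM_THREADS; i++) {
        chunks[i].first = (long)i * per_thread;
        chunks[i].count = per_thread;
        if (chunks[i].first + chunks[i].count > total_records)
            chunks[i].count = total_records - chunks[i].first;
        if (chunks[i].count < 0)
            chunks[i].count = 0;
        pthread_create(&threads[i], NULL, worker, &chunks[i]);
    }
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);
    return 0;
}
```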
5. Disable indices using CTOPEN_DATAONLY file mode
You can drop index support while you are doing the data load. This gets the data into the data file as quickly as possible and avoids the time it takes to update your indices on the fly.
See Opening a Table in the FairCom DB C API Developer's Guide.
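Here is a minimal sketch of a data-only open. It assumes the ctdbOpenTable call and the CTOPEN_DATAONLY mode described in the C API Developer's Guide, with error handling trimmed for brevity.

```c
/* Sketch: open the table in data-only mode so inserts skip index
 * maintenance during the bulk load. Assumes the CTDB C API; verify the
 * open-mode constants against the Developer's Guide. */
#include "ctdbsdk.h"

int open_data_only(CTHANDLE hDatabase, CTHANDLE *phTable)
{
    CTHANDLE hTable = ctdbAllocTable(hDatabase);
    if (hTable == NULL)
        return -1;

    /* CTOPEN_DATAONLY opens only the data file; the indices are not
     * touched, so they must be rebuilt after the load (see tip 7). */
    if (ctdbOpenTable(hTable, "bulk_load", CTOPEN_DATAONLY) != CTDBRET_OK) {
        ctdbFreeTable(hTable);
        return -1;
    }

    *phTable = hTable;
    return 0;
}
```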
6. Insert in batches
With FairCom DB v10 and newer, you can use batch inserts. This is quicker than individual inserts because the OS packet size can be maximized, feeding the maximum amount of data into the FairCom DB Server process with each batch call.
See the FairCom DB documentation on batch operations.
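The idea is to accumulate a group of records on the client and hand them to the server in one call. The sketch below is deliberately generic: flush_batch() is a hypothetical placeholder for FairCom's actual batch insert call, whose name and parameters you should take from the batch operations documentation; only the buffering pattern is the point.

```c
/* Sketch: group records into batches and send each batch in one call.
 * flush_batch() is a hypothetical placeholder for the real FairCom DB
 * batch insert API documented by FairCom. */
#include <string.h>

#define BATCH_SIZE   1000   /* records per batch call (tune for your data) */
#define RECORD_BYTES 256    /* fixed record image size in this sketch      */

/* Hypothetical: sends 'count' packed record images to the server in a
 * single batch operation and returns 0 on success. */
extern int flush_batch(const char *images, int count);

static char batch_buf[BATCH_SIZE * RECORD_BYTES];
static int  batch_count = 0;

/* Queue one record image; send the batch when it fills up. */
int queue_record(const char *image)
{
    memcpy(batch_buf + (size_t)batch_count * RECORD_BYTES, image, RECORD_BYTES);
    if (++batch_count == BATCH_SIZE) {
        if (flush_batch(batch_buf, batch_count) != 0)
            return -1;
        batch_count = 0;
    }
    return 0;
}

/* Call once at the end of the load to send any partially filled batch. */
int finish_batches(void)
{
    if (batch_count > 0 && flush_batch(batch_buf, batch_count) != 0)
        return -1;
    batch_count = 0;
    return 0;
}
```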
7. 'Rebuild' to create indices
Once you have all of the data loaded into the data files, do a rebuild to generate the indices. This is the fastest way to build the indices because you now have all the data in the FairCom data files, so the indices can be built from scratch with a known set of data. To generate your indices, use the function call ctdbRebuildTable.
Or you can call the ctrbldif program.
To improve the performance of an index rebuild through the Server, increase the two rebuild-related settings in your ctsrvr.cfg file (see the Server configuration documentation for the keywords).
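Here is a sketch of the rebuild step using ctdbRebuildTable. The CTREBUILD_FULL mode constant shown is an assumption, so check the C API reference for the rebuild mode that matches your files.

```c
/* Sketch: rebuild a table's indices after the bulk load. ctdbRebuildTable
 * is named in the FairCom documentation; the CTREBUILD_FULL mode constant
 * here is an assumption to verify against the C API reference. */
#include "ctdbsdk.h"
#include <stdio.h>

int rebuild_indices(CTHANDLE hTable)
{
    /* Rebuilds the indices from the loaded data files in one pass, which
     * is far faster than maintaining them record by record during the load. */
    if (ctdbRebuildTable(hTable, CTREBUILD_FULL) != CTDBRET_OK) {
        fprintf(stderr, "rebuild failed: %d\n", ctdbGetError(hTable));
        return -1;
    }
    return 0;
}
```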
These tips should help you complete the data load process in much less time than a single-threaded program using ctdbWriteRecord() inserts.