The article below contains suggestions that should help you improve the performance of migration-center and set up an adequate environment for your migration project.
Client and Database:
The performance of the client depends heavily on database performance, so it is recommended that the client and the database be in the same network segment. If possible, delete older objects that were migrated in the past in order to reduce the number of source objects and migsets. Please be aware that doing so makes delta migration impossible for those old objects.
Make sure the Oracle statistics are up to date, especially for the table source_objects and its indexes.
Monitor the slowest queries reported by Oracle Enterprise Manager. In some cases the right indexes are not used. A common scenario is described in this article: https://migrationcenter.zendesk.com/hc/en-us/articles/209865893. It may apply to scanners other than the filesystem scanner as well.
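As a minimal sketch of how the statistics could be refreshed, the Python helper below builds a PL/SQL block that calls Oracle's DBMS_STATS.GATHER_TABLE_STATS with cascade => TRUE, so that index statistics are gathered together with the table. The schema owner name MC_OWNER used in the example is hypothetical; replace it with the schema that owns the migration-center tables in your installation, and have your DBA review the statement before running it.

```python
def gather_stats_block(owner: str, table: str = "SOURCE_OBJECTS") -> str:
    """Build a PL/SQL block that refreshes optimizer statistics for one table.

    cascade => TRUE asks Oracle to gather statistics for the table's
    indexes as well. The owner value is supplied by the caller; the
    name used in the example below is hypothetical.
    """
    for name in (owner, table):
        # Accept only simple identifiers (letters, digits, underscores).
        if not name.replace("_", "").isalnum():
            raise ValueError(f"not a simple identifier: {name!r}")
    return (
        "BEGIN\n"
        "  DBMS_STATS.GATHER_TABLE_STATS(\n"
        f"    ownname => '{owner.upper()}',\n"
        f"    tabname => '{table.upper()}',\n"
        "    cascade => TRUE\n"
        "  );\n"
        "END;"
    )


if __name__ == "__main__":
    # MC_OWNER is a placeholder schema name, not taken from the product.
    print(gather_stats_block("MC_OWNER"))
```

The printed block can be run from any SQL client by a user with sufficient privileges on that schema.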
Be aware that if the Jobserver runs as a 32-bit Java process, it can effectively use only about 1.5 GB of RAM. If it runs on 64-bit Java, the physical RAM of the machine is the limit.
Initially, the heap size is set to 1 GB, but you can increase it by changing the following parameter in wrapper.conf:
# Maximum Java Heap Size (in MB)
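The comment above matches the standard Java Service Wrapper configuration template, where it precedes the wrapper.java.maxmemory property. Assuming the Jobserver's wrapper.conf follows that template (please verify this in your own file), the sketch below shows how the limits described above could be applied when choosing a value. The 1536 MB ceiling is simply the "about 1.5 GB" figure for 32-bit Java; the function and parameter names are illustrative only.

```python
# Approximate usable heap for a 32-bit JVM, per the ~1.5 GB figure above.
MAX_HEAP_32BIT_MB = 1536


def max_heap_line(requested_mb: int, is_64bit: bool, physical_ram_mb: int) -> str:
    """Return a wrapper.conf line for the maximum Java heap size.

    The property name wrapper.java.maxmemory is an assumption based on
    the standard Java Service Wrapper template. The value returned is an
    upper bound only; in practice leave memory for the operating system
    and other processes on the machine.
    """
    if requested_mb <= 0:
        raise ValueError("requested_mb must be positive")
    if is_64bit:
        # 64-bit Java: physical RAM is the limit.
        ceiling = physical_ram_mb
    else:
        # 32-bit Java: roughly 1.5 GB, or less on a smaller machine.
        ceiling = min(MAX_HEAP_32BIT_MB, physical_ram_mb)
    return f"wrapper.java.maxmemory={min(requested_mb, ceiling)}"


if __name__ == "__main__":
    print(max_heap_line(4096, is_64bit=False, physical_ram_mb=8192))
    print(max_heap_line(4096, is_64bit=True, physical_ram_mb=8192))
```

A changed heap setting typically takes effect only after the Jobserver service is restarted.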
Every environment is different, not just in terms of performance but also in terms of the people working on that system (end users, project managers and IT managers see and react differently to the migration and its requirements and implications). You therefore need to consider not just the technical factors but also the needs and possibilities of the people working with or managing the systems involved.
1) Make sure the jobservers and the export location are deployed in the same network segment as the source and target repositories. The performance of the communication between the jobserver and the source and target repositories is very important for overall performance.
2) Consider running multiple scanners in parallel on multiple jobservers deployed on several machines. Make sure you don't scan the same objects in multiple parallel jobs. In this way you can scale performance up to the limit of the source and target Content Servers.
3) Since scanning performance depends heavily on Content Server performance, make sure the source and target Content Servers perform well, especially when querying data.
There are no hard limits on the number of jobs you can run in parallel. The limitation comes mostly from the throughput you can get from your source system (while scanning) and from the target system (while importing). You can start with 2-4 jobs and work your way up from there: if performance is not satisfactory with 2-4 jobs, try launching 1-2 more jobs and observe whether throughput increases accordingly. It may not, because several other factors lie between the jobserver and the source/target system: network throughput and latency, storage system throughput, repository throughput, repository database performance, migration-center database performance, etc.
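The ramp-up approach described above can be sketched as a small loop: start with a few jobs, add more, and stop as soon as the extra jobs no longer improve throughput noticeably. This is only an illustration. The measure_throughput callback, the 10% gain threshold and the upper bound of 16 jobs are assumptions made for the example, not product settings; in practice each measurement is a test run that you evaluate yourself.

```python
from typing import Callable


def pick_job_count(
    measure_throughput: Callable[[int], float],
    start_jobs: int = 2,
    step: int = 1,
    max_jobs: int = 16,
    min_gain: float = 0.10,
) -> int:
    """Increase the number of parallel jobs until throughput stops improving.

    measure_throughput(n) is a caller-supplied function returning the
    observed objects per hour with n parallel jobs. The loop stops when
    adding `step` jobs improves throughput by less than `min_gain`
    (10% by default) or when `max_jobs` would be exceeded.
    """
    jobs = start_jobs
    best = measure_throughput(jobs)
    while jobs + step <= max_jobs:
        candidate = measure_throughput(jobs + step)
        if candidate < best * (1 + min_gain):
            break  # the extra jobs no longer pay off
        jobs += step
        best = candidate
    return jobs


if __name__ == "__main__":
    # Simulated system that saturates at 5 parallel jobs.
    def simulated(n: int) -> float:
        return min(n, 5) * 1000.0

    print(pick_job_count(simulated))
```

With the simulated system in the example, throughput stops growing after five jobs, so that is where the loop settles.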
Again, there is no hard limit regarding the number of objects you can scan in one go. Objects will be scanned in succession, and are not held in memory, but committed to the migration-center database (metadata) and the export location specified in the scanner (content files). This means you can safely start to scan a repository with several million objects with only 1-2 scanners (e.g. scan half of the cabinets with one scanner, the other half with a second scanner). As above, you can create more scanners if the source system and underlying network/storage infrastructure can handle the load.
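To illustrate splitting a repository between scanners without scanning any object twice, here is a minimal sketch that distributes a list of top-level cabinets round-robin over a given number of scanners. The cabinet paths are made up for the example, and the helper is not part of migration-center; the resulting lists would still have to be entered into each scanner's configuration by hand.

```python
from typing import List, Sequence


def split_scan_scopes(cabinets: Sequence[str], scanner_count: int) -> List[List[str]]:
    """Distribute cabinets over scanners so that no cabinet is assigned twice.

    Duplicate entries are dropped first (keeping the original order),
    then cabinets are dealt out round-robin.
    """
    if scanner_count < 1:
        raise ValueError("scanner_count must be at least 1")
    unique = list(dict.fromkeys(cabinets))  # de-duplicate, preserve order
    return [unique[i::scanner_count] for i in range(scanner_count)]


if __name__ == "__main__":
    # Hypothetical cabinet paths; "/Finance" is listed twice on purpose.
    cabinets = ["/Finance", "/HR", "/Legal", "/Engineering", "/Finance"]
    for number, scope in enumerate(split_scan_scopes(cabinets, 2), start=1):
        print(f"scanner {number}: {scope}")
```

Note that this only guards against the same path being listed twice. Nested paths (a cabinet and one of its subfolders) would still overlap, so keep all entries at the same folder level.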
The more there is to scan, the more time it will take, and you cannot work on the scanned data until the job is finished. You may therefore want to run smaller scans that finish quickly, so that you can start processing this data while the rest is still being scanned. Whether such an approach makes sense for your current project is up to you to determine. If the source/target system can give/take more than a single jobserver can scan or import, then adding more jobservers on additional machines will increase performance roughly in proportion to the number of jobservers. But as always with parallelism, the many factors affecting the complex chain of hardware and software systems involved in a migration will at some point put a stop to adding more and more processes, because either throughput or latency will become a problem if you intend to run massive numbers of migration jobs.
Do not assume that massive numbers of parallel jobs will take care of performance automatically. If everything works well, parallelism does boost throughput during migration, but it also increases overhead significantly. A fast repository, with migration-center also running in the same environment on fast network and storage links, can easily take in 20k or 30k regular office documents per hour.
No downtime is needed during scanning, because access to the source system is read-only. Depending on the performance of the source system and its load, the scanning process could be anywhere from disruptive to not even noticeable. This needs to be determined during test runs in the preparatory phases of the migration project. An old, heavily loaded and badly maintained system would most likely be a pain to work with during scanning, from both the end users' and the migration user's perspective.
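For rough planning, the 20k to 30k documents per hour quoted above can be turned into a duration estimate. The sketch below is simple arithmetic only; the object count of two million is an example value, and real throughput has to be measured in your own test runs.

```python
def estimate_hours(object_count: int, docs_per_hour: float) -> float:
    """Estimate how many hours a migration takes at a constant throughput."""
    if docs_per_hour <= 0:
        raise ValueError("docs_per_hour must be positive")
    return object_count / docs_per_hour


if __name__ == "__main__":
    objects = 2_000_000  # example repository size, not a measured value
    slow = estimate_hours(objects, 20_000)
    fast = estimate_hours(objects, 30_000)
    print(f"{objects} objects: between {fast:.1f} and {slow:.1f} hours")
```

At these rates, two million documents would take roughly 67 to 100 hours of pure import time, before any allowance for retries, delta runs or slower document types.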
Conclusions: you need to run extensive tests (both functional and performance related) before planning the production phase. The more complex the migration requirements and the infrastructure, the more important the test phase is.