Tuesday, November 27, 2012

Comparison of GoldenGate vs Streams

The purpose of this note is to present a high-level comparison of the operation of Oracle Streams version 11.2.0.3 and Oracle GoldenGate from version 11.1.1 onwards.

The note is only applicable to operation against an Oracle database. No new Streams features will be added after 11.2.0.3. This is outlined in Oracle's statement of direction for GoldenGate, which can be found here. Oracle Streams will continue to be supported and maintained in future releases. The strategic direction for logical replication within Oracle is now Oracle GoldenGate. The intention is to integrate the 'best' features of Streams into Oracle GoldenGate.

In future releases, Streams will continue to support exactly what it supports today in 11.2. New replication features relating to the capture of additional data types, etc. will not be supported by Streams, as Streams will not be enhanced. Customers who wish to take advantage of new replication related features will need to license GoldenGate.

Oracle GoldenGate is a separate product from Oracle Streams. Oracle GoldenGate Extract runs externally to the Oracle database and is licensed separately from the Oracle Database. Oracle Streams Capture runs as a process within the Oracle instance. Both products effectively offer the same level of functionality by different means. Hereafter, Streams will refer to Oracle Streams and GoldenGate will refer to Oracle GoldenGate (OGG).
Each product deals with activity which appears in the Oracle redo logs. There are certain objects which neither product is able to capture. Large and long running transactions (LRTs) need special consideration in both products. DDL operations will also introduce new tables, change existing tables and may result in others being dropped - this also presents a challenge in determining exactly what the details in the Oracle redo logs relate to.

This is not an exhaustive reference on the subject. Always consult the Oracle product documentation for more detail on any questions which may arise. Most of this information has been in the public domain for a long time. Considering the details, both products are working towards the same goal, albeit in slightly different ways.
Staging
Staging is the idea of temporarily storing only the changes a process is interested in so they can be processed by the next activity. Where relevant, staging can be employed at both the Source and the Remote databases. Streams and GoldenGate do things differently in this area as a result of different design considerations. In this document, the focus is only on the Capture / Extract process staging changes from the Oracle redo logs.
A significant difference between the two products is that Streams stages all committed and uncommitted transaction changes. GoldenGate only stages committed changes. This has a significant implication in the way the products work.

Whilst it is not possible to view the content of an Oracle buffered queue, as it is an in memory structure, changes which GoldenGate Extract has staged can be viewed with the Logdump utility.
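By way of illustration, a trail file from a trail named 'ab' could be opened and stepped through as follows (the trail name is hypothetical; the parenthesised annotations are not part of the Logdump syntax):

    $ ./logdump
    Logdump> open ./dirdat/ab000000    (open the first file of the trail)
    Logdump> ghdr on                   (display the record header for each change)
    Logdump> detail on                 (display column-level detail)
    Logdump> next                      (step through the staged changes one by one)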
Streams: Streams may store or stage changes in a database buffered queue. The benefit is that changes are stored in memory, which has a fast access time. Whilst a queue is created, it is not managed by Streams AQ. In 11.2 with Streams operating in CCA mode (the default mode, where possible), Capture interacts directly with Apply. As such, there is no real staging involved.
GoldenGate: When processing a one-time activity (an initial load or a batch run), GoldenGate stages extracted changes in an extract file. During normal operation, GoldenGate will stage changes in an extract trail. The difference is that no checkpoint information is maintained for an extract file whereas it is for a trail. The OGG trail is comprised of multiple, unstructured files characterized by filenames with a common two-character prefix and an ascending six-digit suffix. Thus, we refer to trail 'ab', which is composed of trail files 'ab000000', 'ab000001', 'ab000002', etc.
How Changes are Staged
Streams: Streams creates a queue but no longer uses Streams AQ. The queue only serves as a means to record subscriber information - who has an interest in changes. Enqueue relates to storing a change in memory whilst dequeue is the corresponding action of referencing that change - AQ is not used. In CCA mode, if propagation is configured, the Capture process communicates directly with a propagation receiver on the remote database. If CCA mode is not used, a separate propagation activity dequeues the message and sends it to the destination.
The propagation receiver enqueues the message onto the destination queue. An Apply process reader will dequeue messages connected with committed transactions from this queue.
CCA mode means that the Capture process fulfils both activities: capturing changes and, where relevant, propagating them. There is no separate background propagation activity involved as propagation is carried out by the Capture process itself. The CCA optimisation is not available for all Streams configurations but it can be seen that this removes some of the complexity at the local side.
CCA mode also means that users should monitor the gv$propagation_sender and gv$propagation_receiver views to understand what the Capture process is doing, and not refer to the dba_queue_schedules view. There is no queue schedule propagation job - propagation is being done by Capture itself.
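For example, a simple check of both ends might look like this (a sketch; the exact columns available vary by version, so check the view definitions):

    -- on the Capture side database
    select queue_name, dblink, state, total_msgs
    from   gv$propagation_sender;

    -- on the Apply side database
    select src_queue_name, src_dbname, state, total_msgs
    from   gv$propagation_receiver;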
CCA (the default operation) is more efficient than non-CCA operation. It removes the possibility of queue spill, which was possible in versions below 11g. In view of this, this document does not expand upon queue spill as it is anticipated that users adopt the relevant conditions to ensure CCA operation.

GoldenGate: GoldenGate stages changes outside the database in its own cache within process memory. Once transactions are committed, they are written to the trail file from the cache.
Memory Allocation
Both Streams and GoldenGate work more effectively if transactions are small and short lived.
Streams: Streams requires memory to be allocated on the source and destination databases to cache changes. Staging these changes involves caching transaction information in the Streams pool, which is located within the SGA. Users therefore need to be aware of the Streams pool related parameter: STREAMS_POOL_SIZE. As a starting measure, it is useful to preallocate at least 200MB. This is recommended for the Capture and Apply side databases. The Streams pool, as well as holding staged changes in the related buffered queue, also accommodates Logminer's work area for a Capture process. If required, changes are only written to a database table at the Apply side.
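For example, to preallocate the suggested minimum on each database (a sketch; size to suit the workload):

    -- preallocate 200MB for the Streams pool; repeat on the Apply side database
    alter system set streams_pool_size = 200M scope=both;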
GoldenGate: Extract caches transaction information in OS memory. For the Oracle database, caching is associated with the CACHEMGR parameter, which is relevant to the source Extract process.
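As an illustration, the cache size and overflow directory might be set in the Extract parameter file as follows (values are examples only):

    -- Extract parameter file entry: limit the cache and direct overflow to dirtmp
    CACHEMGR CACHESIZE 2GB, CACHEDIRECTORY ./dirtmp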
Transaction Capture / Extract
Streams: Streams Capture stages all changes connected with an object, whether committed or uncommitted. It is the responsibility of the Apply side to manage what to do with these. Once the Apply reader sees a commit, it can pass the transaction to the Apply coordinator process, which can then assign an Apply server to do the work of processing the transaction. If a rollback is observed, the transaction is discarded by the Apply reader. Large or long running transactions which go uncommitted may need to be written to a database table.
GoldenGate: GoldenGate only stages committed transactions. Whilst the Extract process is reading redo information, it caches change data in process memory. Once a commit is observed, the transaction can be written to the Extract trail or file. If a rollback is observed, the transaction can be discarded by Extract. Very large transactions which may have an impact on memory usage may need to be flushed out to disk to make space for other transaction details; long running transactions can also be written out to disk. GoldenGate can also use Bounded Recovery for large and long running transactions to influence Extract restart activity.
Large Transactions
Streams: Large transactions are also referenced under 'Transaction Spill' below. These are managed at the Apply side and written to a partitioned table. Large transactions are written out to the Apply spill table once the LCR count reaches the Apply parameter TXN_LCR_SPILL_THRESHOLD. Streams Capture also reports, in the alert log file, the transaction id of a large transaction as it is being mined.
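For example, the threshold could be adjusted as follows (the Apply name 'APPLY1' and the value are hypothetical):

    begin
      dbms_apply_adm.set_parameter(
        apply_name => 'APPLY1',                   -- hypothetical Apply process name
        parameter  => 'txn_lcr_spill_threshold',
        value      => '10000');                   -- spill once a txn exceeds 10,000 LCRs
    end;
    /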
GoldenGate: Large transactions may be flushed from the Extract cache to disk to avoid affecting smaller transactions in the cache. The size of the cache is controlled by the CACHESIZE option of CACHEMGR. Unlike Streams, there is no transaction size (number of rows affected) at which GoldenGate writes large transactions out to disk.
Transaction Spill
This is also referred to as Apply spill in a Streams environment. Apply spill, where possible, is written out to a partitioned table.
Streams: It is the responsibility of the Apply side to handle the spill of transactions from memory to a partitioned spill table under the following conditions:

- transactions are large; or
- transactions are long running;

Additionally:
- error transactions may also be written out to the Apply spill table if they had already been spilled prior to a later error taking place.

An effect of Apply spill is to allow checkpoint activity to advance past the point these transactions have currently reached. In short, it means that Apply spilled long running transaction data will not have an effect on where Capture restarts from.
GoldenGate: GoldenGate only stages committed transactions. Whilst the Extract process is reading redo information, it caches uncommitted information. Given that only committed transactions are staged, the Extract process also needs to take into consideration large and long running activity. Again, large transactions may affect the cache of transactions, with the same consequences that Streams would experience, and therefore Extract may flush large transactions out to disk under dirtmp. Long running transactions can be envisaged as causing a problem on restart and GoldenGate addresses this by performing bounded recovery. This is discussed further under the section 'Bounded Recovery'.
Long Running transactions (LRTs)
Streams: Streams captures the committed and uncommitted changes it is interested in. All transaction changes - committed or uncommitted - are sent to the Apply process. The Apply process will then write to the Apply spill table any transaction which has been open for a duration exceeding the Apply parameter TXN_AGE_SPILL_THRESHOLD (in seconds). Streams Capture also reports, in the alert log file, the transaction id of a long running transaction as it is being mined.
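Transactions which have been spilled can be observed at the Apply side; for example (columns abbreviated - check the view definition in your version):

    -- list transactions currently spilled to the Apply spill table
    select apply_name, xidusn, xidslt, xidsqn, first_scn, message_count
    from   dba_apply_spill_txn;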
GoldenGate: Transaction activity is cached. Until Extract processes a commit or rollback, the transaction is considered open and its information continues to be collected in memory. The management of these open long running transactions is handled within the bounded recovery mechanism where, as with Streams, long running transactions will be written to disk. GoldenGate writes transactions to the trail in transaction commit order. This means that transactions may appear 'out of order' when viewed from the perspective of when they were started.
GoldenGate Extract will also warn users that long running transactions are taking place if there is an attempt to shut down Extract; Streams doesn't do this. If there are long running transactions, Extract will also provide checkpoint information showing how far back in the Oracle redo the Extract process will go if it is forced to shut down at this point - as a bounded recovery checkpoint may not have taken place recently. The user is therefore able to make an informed choice as to whether it is reasonable to perform a forced shutdown of the Extract process. With Bounded Recovery taking place, even with long running transactions present, Extract should not need to go too far back in the redo logs on restart. Refer to 'Bounded Recovery' below.
Log Mining
Logminer might be used in a replication environment to verify the original transaction changes in the Oracle redo logs in the event of any uncertainty. In this situation, the relevant redo logs containing the transaction would be added to an ad-hoc Logminer session and v$logmnr_contents inspected to determine the details.
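A minimal ad-hoc session might look as follows (the logfile name and transaction id are hypothetical):

    begin
      -- add the archived log believed to contain the transaction
      dbms_logmnr.add_logfile(
        logfilename => '/u01/arch/1_123_456.arc',
        options     => dbms_logmnr.new);
      -- start mining, using the online catalog as the dictionary source
      dbms_logmnr.start_logmnr(
        options => dbms_logmnr.dict_from_online_catalog);
    end;
    /

    select scn, operation, sql_redo
    from   v$logmnr_contents
    where  xidusn = 7 and xidslt = 21 and xidsqn = 1234;

    exec dbms_logmnr.end_logmnr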
Streams: Streams uses logminer to retrieve changes from Oracle redo logs. The Streams capture process is a client of logminer.
GoldenGate: GoldenGate needs to interpret the content of the redo log files directly. With GoldenGate version 11.2.1, however, there is also the option to use integrated Capture, which uses Logminer. Integrated Capture (as opposed to classic Capture - refer to the Oracle Installation and Setup Guide for 11.2.1) supports more data types and storage types as well as making it easier to work with RAC and ASM; other benefits are also provided with integrated Capture.
Supplemental Logging
Both Streams and GoldenGate require supplemental changes to be logged in Oracle's redo stream. Refer to the product documentation for more details. This is an important consideration as it can affect the way changes are applied and therefore the integrity of the data.
The purpose of supplemental logging relates to the following:
- detail needs to be written to the redo stream for a key column (primary key, unique key or substitute key) which will be passed across to the remote database to allow the equivalent row on the destination table to be accessed by index access and not a Full Table Scan (FTS);
- detail also needs to be logged for those columns which will be used in conflict resolution; these may be columns which were not specified in the operation.
The supplemental logging requirements are different between Streams and GoldenGate and the command appropriate to each product should be used.
Streams: Supplemental logging is added at the point the prepare command is issued. If additional supplemental logging is required, this needs to be added explicitly with the relevant 'ADD SUPPLEMENTAL LOG' command.
GoldenGate: GoldenGate supplemental logging is added with the ADD TRANDATA command. Where DDL is handled, DDLOPTIONS ADDTRANDATA would also need to be used as an Extract parameter. Columns specified will be unconditionally logged in the redo stream.
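By way of example, the two approaches look as follows (schema, table and column names are hypothetical):

    -- Streams: explicitly log extra (e.g. conflict resolution) columns
    alter table scott.emp
      add supplemental log group emp_cr (sal, comm) always;

    -- GoldenGate: from GGSCI, after connecting to the database
    GGSCI> DBLOGIN USERID ggadmin, PASSWORD ********
    GGSCI> ADD TRANDATA scott.emp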
For more detail and examples of the way supplemental logging can affect data integrity, refer to 'Data Convergence and Conflict Resolution'.
Extract / Database Persistence
Streams: Streams processes reside within the Oracle instance. Streams has its own internal state information which is maintained within Oracle database tables. All related processes are part of the Oracle instance. Streams is therefore protected by Oracle's recovery mechanism. Capture of changes in memory requires periodic checkpointing as memory is not permanent. Checkpoint information is written to the database.
GoldenGate: The Extract process resides apart from the Oracle database and writes no state information into the Oracle database. Again, checkpointing is required as GoldenGate, like Streams, attempts to process as much as possible in memory. Checkpoint information is written outside the database.

Refer to 'Checkpoint Operation' below.
Transaction Ordering
Streams: Transaction SCN commit time ordering is managed implicitly. Furthermore, Apply parallelism takes transaction SCN commit ordering into consideration.
GoldenGate: GoldenGate enforces commit time ordering but does not offer a parallel Replicat. Separate Extract / Replicat processes can be used. Again, like Streams, each Extract / Replicat pair enforces transaction ordering based on commit sequence.

Where multiple Capture / Extract processes are defined for a source database, the table groups should be logically partitioned or separated, i.e. there should not be referential integrity across tables associated with different processes. Furthermore, any application which uses this data should be aware that the tables connected with one process may not be at the same point in time as the tables connected with a different process - since the processes may be processing at different rates.
Handling DDL Activity
Streams: The Multi Version Data Dictionary (MVDD) is used by Logminer to mine redo. This allows Logminer to go back to a point in the past, provided the redo logs exist, and mine forward. The MVDD records changes to an object over time so that exact object information is available at the SCN being mined. Some object information is also sent to the Apply database for replicated objects, again so that the object detail is known at any point in time.

If there is DDL activity (such as tables being dropped or truncated) taking place against the database, these operations will be recorded in the MVDD. Also, if columns are being dropped or added, the MVDD will hold this information to give precise detail of what the object is at a point in time.
GoldenGate: GoldenGate does not have an MVDD as Streams does. However, GoldenGate still faces the same challenge of how to translate object ids, column ids, partition numbers, etc. back into high level meaningful information. If tables are different on source and destination, DEFGEN may need to be run to allow these differences to be reconciled.

If an Extract process is stopped, transaction changes occur to an object, and the object is then dropped and recreated, GoldenGate is still able to capture those transaction changes after it is restarted. In order to do this, DDL support needs to be installed via the ddl_setup.sql script; the tables created are used by Extract to determine what the related object was at the time. Oracle's data dictionary only reflects the current state of the database and does not show the history of DDL activity that has taken place. At the time a statement is processed by Extract, the object may actually have changed or have been dropped altogether. This, however, should not affect the ability to process all activity on an object up until the point it was removed on the source database.
In order to maintain the above information, GoldenGate creates a 'before ddl on database' trigger which records details at the time the operation starts.
Instantiating Objects
Instantiation in a replication environment is basically the same process for Streams and GoldenGate. The aim is to have an exact copy of the source tables at the destination database before the Capture or Extract process is started.

A brief comparison of the activity is detailed as follows. The details have been expanded so that it is evident what needs to be considered in the process, and allowance is made for an active / busy database.
Streams: The Streams Administration guide is the definitive reference for this activity as significantly more detail is presented there; a condensed sketch follows the list below.

- perform a build and/or create the Capture process to create the MVDD;
- prepare all the relevant objects which are involved in replication, which will result in an MVDD entry on the Apply side database being populated correctly. This step has to come after the creation of the Capture process. The prepare call also adds supplemental logging for the object;
- if there are transactions active against the object being prepared, the operation will hang until the transaction has completed;
- once completed, the current system SCN or flashback SCN can be determined and with this a copy of the objects can be moved to the Apply database, as follows:
- select dbms_flashback.get_system_change_number from dual;
- perform an export, specifying the flashback scn, and import into the destination database; or
- use the flashback scn restore operation and subsequent incomplete recovery at the destination;
- specify the flashback scn as the instantiation scn for the Apply side tables and then start the replication processes.
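A condensed sketch of the above (object names, database name and SCN are hypothetical; the Administration guide remains the definitive reference):

    declare
      l_first_scn number;
    begin
      -- build the MVDD in the redo stream
      dbms_capture_adm.build(first_scn => l_first_scn);
      -- prepare the object; this also adds supplemental logging
      dbms_capture_adm.prepare_table_instantiation('scott.emp');
    end;
    /

    -- determine the flashback SCN once the object is quiet
    select dbms_flashback.get_system_change_number from dual;

    -- export/import using that SCN (e.g. expdp ... flashback_scn=<scn>),
    -- then record it as the instantiation SCN at the Apply side
    begin
      dbms_apply_adm.set_table_instantiation_scn(
        source_object_name   => 'scott.emp',
        source_database_name => 'SRC.EXAMPLE.COM',
        instantiation_scn    => 1234567);
    end;
    /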
GoldenGate: The equivalent to the above would be as follows (a condensed sketch follows the list):

- there is no MVDD, therefore there is no build operation;
- prepare is not relevant as there is no MVDD;
- add supplemental logging for all tables on the source using ADD TRANDATA; adding supplemental logging requires that there are no active transactions running against the table, therefore should this hang for an extended duration it must mean that there is an active transaction; these need to be killed off if this is the case - as would also be the case with Streams;
- record the active transactions just before the Extract is created (gv$transaction) - just as a precaution (as mentioned in Note:1276058.1);
- create the Extract process.

Note: as is the case with Streams Capture, we want to make sure that there are no active transactions running on the system at the time we create the Extract process and then determine the flashback SCN. Once we have the flashback SCN we can commence with the instantiation process. It would be impossible for the Extract process to capture, in the trail file, complete changes connected with active transactions which occurred before it existed!

- once transactions have been completed (or killed off), the data can be instantiated:
- get the current system change number (SCN);
- select dbms_flashback.get_system_change_number from dual;
- export specifying the flashback scn and import into the destination database; or
- use the flashback scn in the incomplete recovery restore at the destination;
- the Replicat process can then be started using AFTERCSN, which maps to the flashback SCN.
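A condensed GGSCI sketch of the above (process names, trail and CSN are hypothetical):

    GGSCI> ADD EXTRACT ext1, TRANLOG, BEGIN NOW
    GGSCI> ADD EXTTRAIL ./dirdat/ab, EXTRACT ext1
    GGSCI> START EXTRACT ext1

    (determine the flashback SCN and export/import the data as with Streams)

    GGSCI> START REPLICAT rep1, AFTERCSN 1234567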

More details are outlined in Note:1276058.1.
Checkpoint Operation
Both Streams and GoldenGate perform checkpoint operations. The objective of checkpointing is to avoid the Capture or Extract process having to go back to an old logfile and remine perhaps a number of days' worth of redo; worse still, if this does happen, a redo logfile may no longer be available.

Streams and GoldenGate do things slightly differently in this area but the intention is the same: to record the state of activity at periodic intervals.
Streams: As discussed in Note:418755.1, Streams uses Logminer checkpointing. Checkpointing can be controlled as discussed in the Oracle product documentation as well as in Note:335516.1 - Master Note for Streams Performance Recommendations. Checkpointing is done by the Capture process and is not done on a time basis but is driven by the amount of redo processed; therefore, if the system is relatively inactive, checkpoints will be taken relatively infrequently.

The redo log which the Capture process will restart on is indicated by the required_checkpoint_scn column in the dba_capture view. This value is also printed out in the Streams Healthcheck report - refer to Note:273674.1.
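For example:

    -- the redo log containing this SCN must still be available on restart
    select capture_name, required_checkpoint_scn
    from   dba_capture;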
GoldenGate: GoldenGate does not use Logminer. It maintains a number of separate checkpoint files which record where a process has reached. For more information refer to the GoldenGate administration guide.

The different OGG checkpoint files are located under the dirchk/ location, e.g.

- extract related: *.cpe;
- extract pump (datapump) related: *.cpe;
- bounded recovery related: *.cpb - refer to 'Bounded Recovery' below;
- replicat related: *.cpr.

Note: the above are binary files - they should not be modified; they are detailed merely to help understand what is going on in this area of discussion.

Looking at the above in relation to the flow of transactions through GoldenGate, first of all we have the Extract checkpoint file. Extract checkpointing is extended by Bounded Recovery - refer to 'Bounded Recovery', discussed separately below.

In the Extract checkpoint file, the read position in the Oracle redo stream is recorded together with the write position in the trail. The read position relates to the start of the oldest active transaction in the Oracle redo logs which the Extract process is interested in. There will always be active transactions, but if these are also long running, the read position will be held back and remain unchanged.
The Extract pump checkpoint file will record the read position in the trail file as well as the write position in the remote trail.

The Replicat checkpoint file will record the read position in the trail.
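The positions recorded in these files can be inspected from GGSCI; for example (process names hypothetical):

    GGSCI> INFO EXTRACT ext1, SHOWCH     (shows redo read and trail write positions)
    GGSCI> INFO REPLICAT rep1, SHOWCH    (shows the trail read position)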
Bounded Recovery
There is detailed information in the GoldenGate reference guide relating to Bounded Recovery.
Streams: The idea of bounded recovery is to write out long running transactions to disk. This is done at the Apply side database. Therefore, Streams effectively does the same as GoldenGate.
GoldenGate: Bounded Recovery adds to the standard Extract checkpointing mechanism in GoldenGate. Its purpose is to write out long running transactions from the Extract cache so that the bounded recovery read checkpoint can advance.

Bounded recovery writes out to disk transactions which are older than one bounded recovery interval. When a bounded recovery checkpoint takes place, as well as writing long running transactions to disk, the mechanism also remembers the RBA in the Oracle redo of the oldest or earliest active transaction in the last bounded recovery interval. It is this which Extract will use as a restart point in the Oracle redo.
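The interval, and the directory under which persisted transactions are written, are controlled by the BR parameter in the Extract parameter file; for example (values illustrative - the default interval is four hours):

    -- Extract parameter file entry
    BR BRDIR ./BR, BRINTERVAL 4H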

In the Bounded Recovery checkpoint file, a record is kept of the long running transactions which have been written out to disk. The read position in the Oracle redo of the oldest active transaction in the last bounded recovery period is also recorded; this will be the point Extract needs to restart from.
Capture / Extract Restart
A consequence of the checkpoint mechanisms used by both Streams and GoldenGate is that redo will inevitably have to be reprocessed. Both Streams Capture and GoldenGate Extract can reprocess transaction data more than once; these transactions will not be reapplied, however. Refer to 'Apply / Replicat Commit'.
Streams: As Streams uses Logminer checkpointing, on restart it will go back to the redo log file associated with the required_checkpoint_scn. This value basically means that committed and uncommitted transaction data in the redo logs below this SCN does not need to be reprocessed; this also takes account of the fact that open transactions may have been written to the Apply spill table.

The required_checkpoint_scn is associated with Logminer. Streams will likely restart capturing changes at a later SCN and will filter out changes which it has already processed. This has to take into account open and long running Apply spilled transactions as well as transactions which have been applied. Where changes are recaptured, they will be discarded by the Apply process as it maintains a record of what has been applied.
GoldenGate: When Extract restarts, it will need to go back to the read position in the Oracle redo logs found in the Bounded Recovery checkpoint file or that in the Extract checkpoint file.

When Extract starts, it will first of all look at the Bounded Recovery checkpoint details to see if they can be used. If details are present, this will avoid having to go further back in the redo to the start of the logfile containing the oldest active long running transaction. The long running transaction will already be partially collected and can be added to as Extract advances - Streams will do much the same and add to Apply spilled transaction data if additional changes are found for the transaction.
Apply / Replicat Commit
Both Streams and GoldenGate know what transactions have been applied to the database.
Streams: Streams Apply records the transaction ids of committed transactions on the Apply database as they occur so that if a transaction is retransmitted, in the event of a restart or recovery, it does not need to be reapplied - it can be discarded. The committed transaction ids are recorded within the database.
Streams Apply can run in parallel. Therefore, there may be numerous transactions which have no dependencies executing at the same time across the Apply servers. When these commit, the transaction ids are recorded for recovery purposes.
GoldenGate: GoldenGate Replicat executes serially. The checkpoint table in the database is updated once the transaction has been applied to the database, prior to the commit. This guarantees that the transaction and the pointer to the Replicat's read location in the trail (pointing to the next transaction) are updated at the time of the commit. The use of a checkpoint table ensures that Replicat will not attempt to reapply a transaction which has already been applied.
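For example, a checkpoint table is created from GGSCI and associated with the Replicat when it is added (names hypothetical):

    GGSCI> DBLOGIN USERID ggadmin, PASSWORD ********
    GGSCI> ADD CHECKPOINTTABLE ggadmin.ggschkpt
    GGSCI> ADD REPLICAT rep1, EXTTRAIL ./dirdat/ab, CHECKPOINTTABLE ggadmin.ggschkpt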
Purging Activity
Streams and GoldenGate both purge details in the background periodically. The main activities are summarised below. As well as the details outlined, redo logs can be removed (after backup) once they have been accessed and are no longer required by Streams or GoldenGate.
Streams: Due to the checkpoint mechanism, Streams has to periodically purge out checkpoint data. This activity is performed by the Capture process.
GoldenGate: GoldenGate needs to purge trail files (PURGEOLDEXTRACTS). This activity is normally managed by the Manager process but can be done by Extracts and Replicats (a practice that is discouraged). Checkpointing in GoldenGate amounts to ensuring that pointers to relevant locations are up to date. GoldenGate may also need to purge discard files if there are errors.
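For example, in the Manager parameter file (trail name and retention are illustrative):

    -- purge trail files only once all readers have checkpointed past them,
    -- keeping at least three days of trail data
    PURGEOLDEXTRACTS ./dirdat/ab*, USECHECKPOINTS, MINKEEPDAYS 3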
Data Convergence and Conflict Resolution
Even with the best planned strategy for avoiding errors, conflict errors may eventually occur. If there are multiple locations where data is being changed, one site should own the ability to make changes at any one time. If multiple sites can change the same rows at the same time, conflict resolution should be adopted.
It is important to recognise exactly what happens when changes are made to tables configured in replication and what can be expected. Streams and GoldenGate behave slightly differently in this area, as will be shown below. The examples which follow are based on a simple table, with replication taking place from source -> destination. The following highlights how data can diverge and the relevance of supplemental logging in the case of an update operation.
