Channel: Jimmy He – OracleBlog

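Sharing an old book: "Making the World Work Better: The Ideas That Shaped a Century and a Company" (a century of IBM technology innovation)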

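While tidying up my Dropbox folder today, I found that I still have the special publication my former employer IBM issued for its centennial, titled "Making the World Work Better: The Ideas That Shaped a Century and a Company".

A century-old company naturally has its reasons for surviving a hundred years. When the book was handed out to employees, whether for cost control or for employee motivation, not every employee received a paper copy. I was only "lucky" enough to get the electronic version. Below are the Chinese and English editions, shared with all IBM fans.

Download the Chinese edition: 『2011_09_02_2301_让世界更美好_PDF』 (file size: 53 MB)

Download the English edition: 『2011_06_12_3144_Centennial_book__eBook_format』 (file size: 31 MB)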


Index block corruption caused by a physical Data Guard switchover

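From 11.2.0.2 onwards there is a very important Data Guard patch. In a physical Data Guard environment (including ADG), a switchover can leave index blocks corrupted with invalid SCNs.

Several large customers in different industries in China have already hit this problem.

An invalid ITL commit SCN in an index block violates the SCN dependency check and may raise errors such as the following: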

ORA-1555
ORA-600 [2663]
ORA-600 [kdsgrp1]
ORA-600 [ktbdchk1: bad dscn]

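Note: this bug affects only index blocks, not data blocks, so it does not cause data loss. It can be repaired by rebuilding the index.

If the index is large, however, the rebuild may take some time.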
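To see whether a database may be affected, one option is to scan the alert log for the error signatures listed above. The following is a minimal sketch, not an Oracle tool: the function name and the sample lines are hypothetical, and the patterns assume that the log may zero-pad error numbers (for example ORA-01555 or ORA-00600).

```python
import re

# Error signatures named in this post. Leading zeros in the ORA number
# are treated as optional, since logs often print ORA-01555 / ORA-00600.
SIGNATURES = {
    "ORA-1555": re.compile(r"ORA-0*1555\b"),
    "ORA-600 [2663]": re.compile(r"ORA-0*600\b.*\[2663\]"),
    "ORA-600 [kdsgrp1]": re.compile(r"ORA-0*600\b.*\[kdsgrp1\]"),
    "ORA-600 [ktbdchk1: bad dscn]": re.compile(
        r"ORA-0*600\b.*\[ktbdchk1: bad dscn\]"
    ),
}


def count_signatures(lines):
    """Count how many lines match each signature (hypothetical helper)."""
    counts = {name: 0 for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Made-up sample lines, only to show the call.
sample = [
    "ORA-00600: internal error code, arguments: [kdsgrp1], [], [], []",
    "ORA-01555 caused by SQL statement below",
    "ORA-16000: database open for read-only access",
]
result = count_signatures(sample)
```

A hit only means the signature appeared. These errors have other causes as well, so the MOS note referenced at the end of this post is still the place to confirm the diagnosis.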

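The fix has gone through several patches: first Patch 8895202, then Patch 13513004, and now Patch 22241601.

Once Patch 22241601 is applied, you no longer need to set _ktb_debug_flags=8 by hand, because the patch sets _ktb_debug_flags=8 automatically. During index block cleanout, Oracle then repairs the invalid SCNs in the index block by itself. You can also see the corresponding repair messages in the alert log: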

Healing Corrupt DLC ITL objd:%d objn:%d tsn:%d rdba:<rdba> itl:%d
 option:%d xid:<xid> cmtscn:<scn> curscn:<scn>
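To count how many blocks were repaired, the healing messages can be pulled out of the alert log. This is a minimal sketch that relies only on the `key:value` layout of the template above; how the values themselves are printed is not shown in the post, so the sample values below are made up and the helper name is hypothetical.

```python
import re

MARKER = "Healing Corrupt DLC ITL"
PAIR = re.compile(r"(\w+):(\S+)")  # generic key:value token


def parse_healing_messages(lines):
    """Return one dict per healing message (hypothetical helper)."""
    lines = list(lines)
    records = []
    for i, line in enumerate(lines):
        if MARKER not in line:
            continue
        text = line.split(MARKER, 1)[1]
        # The template continues on a second line that starts with "option:".
        if i + 1 < len(lines) and lines[i + 1].lstrip().startswith("option:"):
            text += " " + lines[i + 1]
        records.append(dict(PAIR.findall(text)))
    return records


# Made-up values, only to show the shape of the result.
sample = [
    "Healing Corrupt DLC ITL objd:12345 objn:12340 tsn:7 rdba:0x01c00083 itl:2",
    " option:0 xid:0x0005.01f.00001234 cmtscn:0x0000.0012d687 curscn:0x0000.0012d690",
]
records = parse_healing_messages(sample)
```

Grouping the records by `objn` would show which indexes were touched, assuming that field identifies the index object as its name suggests.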

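If your PSU level is later than 11.2.0.4.8 but you have not yet reached 11.2.0.4.161018, the patch is currently available only for Linux x86 and Linux x86-64. If you have already applied PSU 11.2.0.4.161018, the patch is currently available for essentially all platforms.

For platforms that have no patch, the recommendation is to set the hidden parameter _ktb_debug_flags=8 instead.

Reference: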
ALERT Bug 22241601 ORA-600 [kdsgrp1] / ORA-1555 / ORA-600 [ktbdchk1: bad dscn] / ORA-600 [2663] due to Invalid Commit SCN in INDEX (Doc ID 1608167.1)

Oracle now supports running Oracle Database on Docker

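Oracle has started to support running Oracle Database on Docker. Note that this covers single instance only; RAC is not supported. The operating system for Docker must be Oracle Linux 7 or RHEL 7.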

Oracle will support customers running Oracle Database (single instance) in Docker containers. Oracle will only provide support when running the database in Docker containers running on Oracle Linux and Red Hat RHEL. Supported versions of these Linux distributions are

Oracle Linux 7
Red Hat Enterprise Linux 7 (RHEL)

Oracle does not support Oracle Database running in a Real Application Clusters (RAC) configuration in Docker containers.
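A quick way to check a host against this statement is to read /etc/os-release. The sketch below is only an illustration: the helper names are hypothetical, and it assumes that Oracle Linux reports `ID="ol"` and Red Hat Enterprise Linux reports `ID="rhel"`, which you should verify on your own systems.

```python
# Distribution IDs assumed for the two supported platforms (an assumption,
# not taken from the support note).
SUPPORTED_IDS = {"ol": "Oracle Linux", "rhel": "Red Hat Enterprise Linux"}


def parse_os_release(text):
    """Parse KEY=value lines in the style of /etc/os-release."""
    info = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"').strip("'")
    return info


def is_supported_docker_host(os_release_text):
    """True if the text describes Oracle Linux 7 or RHEL 7."""
    info = parse_os_release(os_release_text)
    distro = info.get("ID", "").lower()
    major = info.get("VERSION_ID", "").split(".")[0]
    return distro in SUPPORTED_IDS and major == "7"


# Usage on a real host (not executed here):
#   with open("/etc/os-release") as f:
#       print(is_supported_docker_host(f.read()))
```

The check says nothing about RAC; per the note, a RAC configuration in containers is unsupported whatever the host OS is.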


In addition, there is an unofficial guide on GitHub for building an Oracle Database on Docker image.

Reference:
Oracle Support for Database Running on Docker (Doc ID 2216342.1)

Next stop: DJI

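Many friends may already know that I have resigned. Yes, I am no longer at Oracle; I have joined DJI.

Joining DJI came about through quite a coincidence.

On September 27, 2016, DJI launched its new drone in New York: the Mavic Pro.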



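Yes, it was this little thing that gave me the same feeling I had when I first laid eyes on the original iPhone (I was an original iPhone user too): it stunned me, and I simply had to have one. I also enjoy photography, and the god's-eye view from a drone was irresistible, so I shouted "shut up and take my money" and placed my order on the spot.

If you were among the first customers to order a Mavic, you know what a long shopping experience it was: because production capacity was limited, the first batch took 6 to 8 weeks to arrive. While waiting, I followed DJI's official WeChat account, browsed the forum now and then, read the manual, and swapped experiences with fellow pilots.

After I had waited a month and a half, DJI's official WeChat account announced that November 19 would be DJI's open day. Visitors could join the activities, hear about the company culture, have some snacks, chat with the engineers, and also submit a résumé to join DJI. I was still waiting for my Mavic, so with the attitude of "I might as well go and ask when my Mavic will arrive", I signed up for the event and submitted my résumé. I also posted the news on the DJI forum at the time to ask whether anyone else was going.

When I got there, to my surprise the recruiting started that very morning. I went through four rounds of interviews, everything went smoothly, and DJI and I thought well of each other. They asked why I wanted to change jobs, and I told the plain truth: I think it is remarkable that DJI can build such an impressive drone, and I hope to contribute to a company that makes such a remarkable product.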

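I am also very grateful to DJI's DBA for recommending me so strongly, which helped me stand out.

So I went from "chasing my order" to being "recruited".

Even more of a coincidence, another good friend of mine at Ping An was going to DJI as well. One day at noon he heard that I was about to resign and asked which company I was going to next. I said I would keep it secret for now and announce it later. He asked whether I could at least say which industry. I thought about it and said that if I named the industry, he could basically guess the company. He asked, "Is it DJI?" I was startled that he got it on the first guess. He pulled me aside, gave me a mysterious smile, and said that he was going to DJI too.

We never became colleagues at Ping An, yet we became colleagues at DJI. What a coincidence! I am really looking forward to working with him again.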

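A person may get only a few chances for change in a lifetime, and once missed they are gone. Leaving Oracle for DJI is a fairly big change for me: I am no longer confined to Oracle products and will face more challenges from open-source technologies; I no longer fight alone but work with a team toward shared goals; and I am no longer in the advisory role of a vendor or consultant but a true owner, fully responsible for my own systems.

Thinking of these changes and challenges, I feel both uneasy and full of enthusiasm. I thank Oracle for honing and developing me over the past four-plus years, and I thank Ping An, especially Mr. Wang, for the support and recognition; you have always been my role model. Next stop, DJI. Here I come!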

Known Issues for Database and Query Performance Reported in 12C

LAST UPDATE:

Dec 13, 2016

APPLIES TO:

Oracle Database - Enterprise Edition - Version 12.1.0.1 to 12.1.0.2 [Release 12.1]
Information in this document applies to any platform.

PURPOSE:

In recent quarters, during non-code closure bug review for Performance, it was found that more than 50% of duplicates (36,96) closed by Sustaining Engineering (SE) belonged to version 12.1. As a result, this document was created to list the most commonly reported bugs and share them with Tier-1 engineers so that they can check the list before logging any new bugs. Thus the goal of this document is to provide a list of known bugs reported in 12C for database and SQL performance for support analysts to check before logging new bugs.

In addition to the bugs listed in this article, there is also a useful list of Documented Database Bugs With High "Solved SR" Count, where a high number of Service Requests means that there are more than 50 "Solved SR" entries for the bug (that is, all bugs in that document have > 50 "Solved SR" entries):

Document 2099231.1 Documented Database Bugs With High "Solved SR" Count



DETAILS:
Query Optimizer / SQL Execution

Bug 20271226 - QUERY INVOLVING VIRTUAL COLUMN AND PRIMARY KEY CONSTRAINT CRASHES
Document 19046459.8 - Inconsistent behavior of OJPPD
Document 22077191.8 - Create Table As Select of Insert Select Statement on 12.1 is Much Slower Than on 11.2
Bug 20597568 - PJE INCORRECTLY APPLIED TO A CONNECT BY QUERY
Document 18456944.8 - Suboptimal plan for query fetching ROWID from nested VIEW with fix 10129357
Bug 20355502 - QUERY PARSE NEVER ENDS ( BURNING CPU ALL THE TIME)
Document 20355502.8 Limit number of branches on non-cost based OR expansion
Document 20636003.8 - Slow Parsing caused by Dynamic Sampling (DS_SRV) queries in 12.1.0.2
Bug 22513913 - QUERY FAILS WITH ORA-07445: [KKOSJT()+281] WHEN USING DATABASE LINK
Bug 22020067 - UPGRADE 11.2.0.2 TO 11.2.0.4 SCALAR SUBQUERY UNNEST DISABLED
Bug 21099502 - JPPD Not Happening In UNION ALL View Having Group-by and Aggregates
Bug 22862828 - Regression with 22706363, JPPD Does Not Occur Despite the Fix Control 9380298 is ON
Bug 19295003 - High CPU While Parsing Huge SQL [kkqtutlTravOptAndReplaceOJNNCols]
Document 21091518.8 - Suboptimal plan for SQLs using UNION-ALL with bug fix 18304693 enabled
Document 22113854.8 - Query Against ALL_SYNONYMS Runs Slow in 12C
Document 21871902.8 - SELECT query fails with ORA-7445 [qerixRestoreFetchState2] - superseded by
Document 22255113.8 - High parse time with high memory and CPU usage
Document 22339954.8 - Bug 22339954 - High Parse Time for Query Using UNPIVOT Operator
Document 19490852.8 - Excessive "library cache lock" waits for DML with PARALLEL hint when parallel DML is disabled
Bug 20226806 - QUERY AGAINST ALL_CONSTRAINTS AND ALL_CONS_COLUMNS RUNS SLOWER IN 12.1.0.2
duplicate of Bug 20355502 - QUERY PARSE NEVER ENDS ( BURNING CPU ALL THE TIME)
Document 20118383.8 - Long query parse time in 12.1 when many histograms are involved - superseded by
Document 18795224.8 - Hard parse time increases with fix 12341619 enabled, fixed in 12.2
Document 19475484.8 - Cardinality of 0 for range predicate with fix for bug 6062266 present (default), fixed in 12.2
Document 2182951.1 Higher Elapsed Time after Updating from 11.2.0.4 to 12.1.0.2
Document 18498878.8 - medium size tables do not cached consistently causing unnecessary waits on direct path read, fixed in 12.2
Bug 23516956 - DYNAMIC SAMPLING QUERIES CONTAINS INCORRECT HINTS
duplicate of Document 19631234.8 - Suboptimal execution plan for Dynamic Sampling Queries
Bug 20503656 - Remote SQLs having group by with bind variables throw ORA-00979, fixed in 12.2
Bug 19047578 - Optimizer No Longer Uses Function Based Index When CURSOR_SHARING=FORCE
Document 22734628.8 - Wrong results from UNION ALL using OJPPD with cost based transformation disabled, fixed in 12.2
Bug 21839477 - ORA-7445 [QESDCF_DFB_RESET] WITH FIX FOR BUG:20118383 ON, fixed in 12.2
Document 20774515.8 - Wrong results with partial join evaluation, fixed in 12.2
Bug 21303294 - Wrong Result Due to Lost OR Predicate in Bitmap Plan, fixed in 12.2
Document 22951825.8 - Wrong Results with JPPD, Concatenation and Projection Pushdown, fixed in 12.2
Bug 19847091 - HUGE INLIST SQL HANGS DURING PARSE
superseded by Bug 20384335 - CPU REGRESSION DUE TO PLAN CHANGE IN ONE SELECT SQL WITH IN PREDICATE, fixed in 12.2


SQL Plan Management (SPM)

Document 20877664.8 - SQL Plan Management Slow with High Shared Pool Allocations
Document 21075138.8 - SPM does not reproduce plan with SORT UNIQUE, affected from 11.2.0.4 and fixed in 12.2 only
Document 20978266.8 - SPM: SQL not using plan in plan baselines and plans showing as not reproducible - superseded, fixed in 12.2
Document 19141838.8 - ORA-600 [qksanGetTextStr:1] from SQL Plan Management after Upgrade to 12.1
Document 18961555.8 - Static PL/SQL baseline reproduction broken by fix for bug 18020394, fixed in 12.2
Document 21463894.8 - Failure to reproduce plan with fix for bug 20978266, fixed in 12.2
Document 2039379.1 ORA-04024: Self-deadlock detected while trying to mutex pin cursor" on AUD_OBJECT_OPT$ With SQL Plan Management (SPM) Enabled


Wrong Results

Document 20214168.8 - Wrong Results using aggregations of CASE expression with fix of Bug 20003240 present
Bug 21971099 - 12C WRONG CARDINALITY FROM SQL ANALYTIC WINDOWS FUNCTIONS
Document 16191689.8 - Wrong results from projection pruning with XML and NESTED LOOPS
Bug 18302923 - WRONG ESTIMATE CARDINALITY IN HASH GROUP BY OR HASH JOIN
Bug 21220620 - WRONG RESULT ON 12C WHEN QUERYING PARTITIONED TABLE WITH DISTINCT CLAUSE
Document 22173980.8 - Wrong results (number of rows) from HASH join when "_rowsets_enabled" = true in 12c (default)
Bug 20871556 - WRONG RESULTS WITH OPTIMIZER_FEATURES_ENABLE SET TO 12.1.0.1 OR 12.1.0.2
Bug 21387771 - 12C WRONG RESULT WITH OR CONDITION EVEN AFTER PATCHING FOR BUG:20871556
Bug 22660003 - WRONG RESULTS WHEN UNNESTING SET SUBQUERY WITH CORRELATED SUBQUERY IN A BRANCH
Bug 22373397 - WRONG RESULTS DUE TO MISSING PREDICATE DURING BITMAP OR EXPANSION
Bug 22365117 - SQL QUERY COMBINING TABLE FUNCTION WITH JOIN YIELDS WRONG JOIN RESULTS
Bug 20176675 - Wrong results for SQLs using ROWNUM<=N when Scalar Subquery is Unnested (VW_SSQ_1)
Document 19072979.8 - Wrong results with HASH JOIN and parameter "_rowsets_enable"
Document 22338374.8 - ORA-7445 [kkoiqb] or ORA-979 or Wrong Results when using scalar sub-queries
Document 22365117.8 - Wrong Results / ORA-7445 from Query with Table Function
Bug 19318508 - WRONG RESULT IN 12.1 WITH NULL ACCEPTING SEMIJOIN - fixed in 12.2
Bug 21962459 - WRONG RESULTS WITH FIXED CHAR COLUMN AFTER 12C MIGRATION WITH PARTIAL JOIN EVAL
Bug 23019286 - CARDINALITY ESTIMATE WRONG WITH HISTOGRAM ON COLUMN GROUP ON CHAR/NCHAR TYPES, fixed in 12.2
Bug 23253821 - Wrong Results While Running Merge Statement in Parallel, fixed in 12.2
Document 18485835.8 - Wrong results from semi-join elimination if fix 18115594 enabled, fixed in 12.2
Document 18650065.8 - Wrong Results on Query with Subquery Using OR EXISTS or Null Accepting Semijoin, fixed in 12.2
Document 19567916.8 - Wrong results when GROUP BY uses nested queries in 12.1.0.2 - superseded by
Document 20508819.8 - Wrong results/dump/ora-932 from GROUP BY query when "_optimizer_aggr_groupby_elim"=true - superseded by
Document 21826068.8 - Wrong Results when _optimizer_aggr_groupby_elim=true
Document 20634449.8 - Wrong results from OUTER JOIN with a bind variable and a GROUP BY clause in 12.1.0.2
Bug 20162495 - WRONG RESULTS FROM NULL ACCEPTING SEMI-JOIN, fixed in 12.2.


Statistics

Bug 22081245 - COPY_TABLE_STATS DOES NOT WORK PROPERLY FOR TABLES WITH SUB PARTITIONS
Bug 22276972 - DBMS_STATS.COPY_TABLE_STATS INCORRECTLY ADJUST MIN/MAX FOR PARTITION COLUMNS
Document 19450139.8 Slow gather table stats with incremental stats enabled
Bug 21258096 - UNNECESSARY INCREMENTAL STATISTICS GATHERED FOR UNCHANGED PARTITIONS DUE TO HISTOGRAMS
Bug 23100700 - PERFORMANCE ISSUE WITH RECLAIM_SYNOPSIS_SPACE, fixed in 12.2
Document 21171382.8 - Enhancement: AUTO_STAT_EXTENSIONS preference on DBMS_STATS - Enhancement To turn off automatic creation of extended statistics in 12c
Document 21498770.8 - automatic incremental statistics job taking more time with fix 16851194, fixed in 12.2
Document 22984335.8 - Unnecessary Incremental Partition Gathers/Histogram Regathers


Errors

Document 20804063.8 ORA-1499 as REGEXP_REPLACE is allowed to be used in Function-based indexes (FBI)
Document 17609164.8 ORA-600 [kkqctmdcq: Query Block Could] from ANSI JOIN with no a join predicate
Document 21482099.8 ORA-7445 [opitca] or ORA-932 errors from aggregate GROUP BY elimination
Document 21756734.8 - ORA-600 [kkqcsnlopcbkint : 1] from cost based query transformation
Bug 22566555 - ORA-00600 [KKQCSCPOPNWITHMAP: 0] FOR SUBQUERY FACTORING QUERY WITH DISTINCT
Bug 21377051 - ORA-600 [QKAFFSINDEX1] FROM DELETE STATEMENT
Document 21188532.8 - Unexpected ORA-979 with fix for bug 20177858 - superseded by
Bug 23343584 - GATHER_TABLE_STATS FAILS WITH ORA-6502/ORA-6512
duplicate of Bug 22928015 - GATHER DATABASE STATS IS RUNNING VERY SLOWLY ON A RAC ENVIRONMENT
Document 21038926.8 - ORA-7445 [qesdcf_dfb_reset] with fix for bug 20118383 present, fixed in 12.2
Document 18405192.8 - ORA-7445 under qervwFetch() or qervwRestoreViewBufPtrs() when deleting from view with an INSTEAD OF trigger
Document 21968539.8 - ORA-600 [kkqcscpopnwithmap: 0]
Document 18201352.8 - ORA-7445 [qertbStart] or similar errors from query with DISTINCT and partial join evaluation (PJE) transformation
Document 19472320.8 - ORA-600 [kkqtSetOp.1] from join factorization on encrypted column
Bug 22394273 - ORA-600 [QESMMCVALSTAT4] FROM SELECT STATEMENT
Document 21529241.8 - DBMS_STATS ORA-6502: "PL/SQL: numeric or value error"
Document 20505851.8 - ORA-600 / ORA-7445 qksqbApplyToQbcLoc from WITH query that has duplicate predicates
Bug 19952510 - ORA 600 [QERSCALLOCATE: BUFFER SIZE LIMIT] AND ORA 600 [KEUUXS_1]
Bug 21274291 - ORA-600 :[17147] AND [ORA-600 :[KGHALF] DURING AN INSERT
Bug 21524270 - ORA-28115 ON ORACLE 12C WITH VPD IN MERGE STMT, BUT INSERT SUCCEEDS
Bug 20660917 - INSTEAD OF TRIGGER IS FIRED TWICE INSTEAD OF ONCE WITH FIX TO 13812807, Causes ORA-07445 [kxtifir] in 12.1.0.2 - fixed in 12.2.


Adaptive Query Optimization

Bug 18745156 - ADAPTIVE DYNAMIC SAMPLING CAUSES BAD CARDINALITY ESTIMATE
Bug 16033838 - TST&PERF: ADAPTIVE JOINS LEADS TO WORSE PLAN FOR QUERY B4TKVSMPFSP85
Document 20465582.8 High parse time in 12c for multi-table join SQL with SQL plan directives enabled - superseded
Document 20807398.8 ORA-600 [kgl-hash-collision] with fix to Bug 20465582 installed
Document 18430870.8 Adaptive Plan and Left Join Give Wrong Result
Bug 21912039 : GSIAS: PERF REGRESSION: ADAPTIVE PLAN SQL_ID DN50XQR69FJ0F GETS WORSE
  duplicate of Bug 20243268 : EM QUERY WITH SQL_ID 4RQ83FNXTF39U PERFORMS POORLY ON ORACLE 12C RELATIVE TO 11G
Bug 19731829 - ISSUES WITH PACK AND UNPACK OF SQL PLAN DIRECTIVES
Document 20370037.8 - KGLH0 growth leading to ORA-4031 by cardinality feedback, fixed in 12.2
Document 20413540.8 - Excessive executions of SQL frjd8zfy2jfdq


Library cache / Rowcache / Shared Cursors / Dictionary / Buffer cache

Document 19450314.8 Unnecessary invalidations in 12c
Bug 21153142 - ROW CACHE LOCK SELF DEADLOCK ACCESSING SEED PDB
Bug 22081947 - ORA-4023 MAY OCCUR IN ACTIVE DATA GUARD ENVIRONMENTS
Bug 12387079 - THE CHILD CURSOR IS INCREASED WHEN THE CURSOR IS KEPT AND EXECUTED DDL
  superseded by Bug 19239846 - FIX FOR Bug 12387079 NEEDS TO BE RE-WORKED
Bug 12320556 - HIGH VERSION COUNTS OCCUR DUE TO AUTH_CHECK_MISMATCH, INSUFF_PRIVS_REM=Y
  superseded by Bug 21515534 - QUERY USING 2 REMOTE DB NOT SHARED - AUTH_CHECK_MISMATCH ,INSUFF_PRIVS_REM
Document 21515534.8 Query referencing multiple remote databases not shared with reason AUTH_CHECK_MISMATCH INSUFF_PRIVS_REM
Document 20907061.8 High number of executions for recursive call on col$
Document 20476175.8 High VERSION_COUNT (in V$SQLAREA) for query with OPT_PARAM('_fix_control') hint
Bug 21799609 - ORA-04024 DEADLOCK ON STARTUP ON SYS.ECOL$ ON LOAD HISTOGRAMS
  duplicate of Bug 19469538 - LOADING OF DATA IN ECOL$ TABLE IS NON-OPTIMAL FOR QUERIES
Bug 22586498 - HUGE M000 TRACEFILE DUE TO OBSOLETE CURSOR DUMPS
Document 20906941.8 - DBW0 spins with high CPU
Bug 17700290 - TST&PERF: LIBRARY CACHE MUTEX WAIT TIME INCREASED HUGELY IN MAIN - affected 12.1.0.2, fixed in 12.2 only
Bug 22733141 - GATHERING STATS ON X$KQLFBC HANGS
Bug 23098370 - DBWR CAN ISSUE MORE THAN SSTMXIOB IO'S
Bug 23103188 - Incorrect ORA-1031 or ORA-942 during Grant due to 22677790, fixed in 12.2
Bug 19340498 - CDB:NO ROWS RETURNED WHEN QUERYING V$SQL_SHARED_MEMORY INSIDE A PDB
Bug 23514710 - CROSS-CONTAINER FIXED TABLE NOT OBSOLETED WHEN PARENT IS MARKED OBSOLETE in ADG Env
Document 19392364.8 - Process spin in kkscsFindMatchingRange() with adaptive cursor sharing, fixed in 12.2
Document 2096561.1 - High Amount Of Shared Memory Allocated Into KGLH0 Heap In 12.1.0.2
Document 2119923.1 - Shared Pool from KGLH0 constantly growing causing ORA-04031 and Latch contention
Document 23168642.8 - Sporadic ORA-600 [kglunpin-bad-pin] / ORA-14403 / high 'library cache: mutex x' using partitions and deferred seg creation / ORA-600 [qesmascTrimLRU_1] with fix 21759047, fixed in 12.2
Document 21759047.8 - High 'library cache: mutex x' Contention when Many Sessions are Executing Cursors Concurrently Against a Partitioned Table - superseded, fixed in 12.2
Document 14380605.8 - High "library cache lock", "cursor: pin S wait on X" and "library cache: mutex X" waits, fixed in 12.2
Document 19822816.8 - High parse time for SQL with PIVOT and binds (can block LGWR on "library cache lock"), fixed in 12.2
Document 13542050.8 - 'library cache: mutex X' waits : A mutex related hang with holder around 65534 (0xfffe), fixed in 12.2
Document 21834574.8 - Mutex contention and increased hard parse time due to ILM (segment-access tracking) checking with partitioned objects, fixed in 12.2
Document 19790972.8 - "library cache lock" waits due to DBMS_STATS gather of stats for a subpartition, fixed in 12.2


Server Manageability (SVRMAN)

Bug 21521882 - SQLT CAUSES ORA-00600: [KGHSTACK_UNDERFLOW_INTERNAL_1]
  duplicate of Bug 19593445 - SR12.2CDBCONC3 - TRC - KPDBSWITCH
Document 18148383.8 AWR snapshots stopped , MMON hung with "EMON to process ntnfs" - affected 12.1.0.1, fixed in 12.1.0.2
Bug 20976392 - AWR REPORT: NEGATIVE OR WRONG VALUES IN %TOTAL OF UNOPTIMIZED READS
Document 2016112.1 - Replaying a PRE-12C Capture On 12.1.0.2.0 Encounters Unexpected ORA-1000 Errors
Bug 21117072 - DBREPLAY PATCH BUNDLE 1 FOR 12.1.0.2 , RAT Patch Bundle 1 for 12.1.0.2 is mandatory LGWR
Bug 23501117 - WORKLOAD REPLAY SKIPS USERCALLS FOR WORKLOAD CAPTURED UNDER 11.2.0.4
Document 22345045.8 - ORA-600 [kewrspbr_2: wrong last partition] in AWR internal tables after upgrading to 12.1.0.2


Multiple LGWR

Document 1968563.1 Hang Waiting on 'LGWR worker group ordering' with Deadlock Between LGWR (Log Writer) Slaves (LG0n) when Using Multiple LGWR Processes
Document 1957710.1 ALERT: Bug 21915719 Database hang or may fail to OPEN in 12c IBM AIX or HPUX Itanium - ORA-742, DEADLOCK or ORA-600 [kcrfrgv_nextlwn_scn] ORA-600


Exadata/In-memory/Multitenant

Document 21553476.8 Wrong Results using aggregations of CASE expression with fix of Bug 20003240 present in Exadata
Bug 19130972 - EXTREMELY LONG PARSE TIME FOR STAR QUERY (For In-memory DB, fixed in 12.1.0.2 DBBP 3)
Bug 17439841 - IMC DYNAMIC SAMPLING CAUSE "CURSOR: PIN S WAIT ON X" ON PARALLEL QUERY - affected 12.1.0.2, fixed in 12.1.0.2
Bug 21445204 - PARSING OF A QUERY TAKES LONG TIME WITH IMC
Document 21153142.8 - Row cache lock self deadlock accessing seed PDB, fixed in 12.2.
Bug 20766944 - QUERIES INVOLVING INT$DBA_CONSTRAINTS TAKE A LOT OF TIME, fixed in 12.2.
Document 17805926.8 - Parameter changes at PDB level affect other PDBs, fixed in 12.2


Popular Documents for Known issues/Bugs

Document 2034610.1 Things to Consider Before Upgrading to 12.1.0.2 to Avoid Poor Performance or Wrong Results
Document 2035898.1 Patches to Consider for 12.1.0.2 to Avoid Problems with SQL Plan Management (SPM)
Document 2107602.1 Things to Consider When Using Incremental Statistics
Document 1683802.1 12.1.0.2 Patch Set - List of Bug Fixes by Problem Type
Document 1924126.1 12.1.0.2 Patch Set Updates - List of Fixes in each PSU


Port-Specific

Document 1970525.1 Things to Consider to Avoid RDBMS Performance Problems on SPARC Platforms Using Oracle 12.1.0.2


Reference documents covering 12c new features

The following documents cover many of the features that have been introduced or enhanced in 12c and are provided to aid re-discovery:

Document 2031605.1 Adaptive Query Optimization
Document 2002108.1 Dynamic Sampling Level Is Changed Automatically in 12C
Document 2033658.1 Dictionary Queries Running Slow in 12C PDBs
Document 2002089.1 High Latch Free Waits on 'Result Cache: RC Latch' In 12C when RESULT_CACHE_MODE = MANUAL
Document 2004828.1 ORA-20002: Unable To Export Table Stats Using DBMS_STATS Package In 12c Database
Document 2051004.1 ORA-12012 ORA-20000 During Automatic Statistics Collection Job in 12C With Concurrent Option Enabled
Document 1955319.1 Huge Trace Files Created Containing "----- Cursor Obsoletion Dump sql_id=%s -----" From 12.1.0.2
Document 2053877.1 Wrong Results (0 Rows) Returned for Query that Includes Subquery Containing AND ROWNUM when OPTIMIZER_ADAPTIVE_FEATURES = TRUE
Document 2041541.1 Gather_Database_Stats_Job_Proc Taking More Time in 12.1.0.2 Than 11.2.0.4


Documentation

Database SQL Tuning Guide 12.1: http://docs.oracle.com/database/121/TGSQL/toc.htm
Database Performance Tuning Guide: http://docs.oracle.com/database/121/TGDBA/toc.htm
Database Testing Guide (RAT): http://docs.oracle.com/database/121/RATUG/toc.htm



Reference:
Commonly Reported Known Issues for Database and Query Performance Reported in 12C (Doc ID 2097793.1 INTERNAL)
Documented Database Bugs With High “Solved SR” Count (Doc ID 2099231.1 INTERNAL)
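Since the point of the list is to look something up before logging a new bug, it can help to load the entries into a structure that can be searched. The following is a minimal sketch that parses lines in the style used above ("Bug N - title", "Document N.8 title", optionally prefixed by "duplicate of" or "superseded by"). The function and field names are hypothetical.

```python
import re

ENTRY = re.compile(
    r"^\s*(?:(duplicate of|superseded by)\s+)?"  # optional relation prefix
    r"(Bug|Document)\s+(\d+(?:\.\d+)?)"          # kind and number
    r"\s*(?:[-:]\s*)?(.*)$"                      # optional separator, title
)


def parse_entry(line):
    """Return a dict for a list entry, or None for headings and blanks."""
    match = ENTRY.match(line)
    if match is None:
        return None
    relation, kind, number, title = match.groups()
    return {
        "relation": relation,  # None, "duplicate of" or "superseded by"
        "kind": kind,
        "number": number,
        "title": title.strip(),
        "fixed_in_12_2": "fixed in 12.2" in title.lower(),
    }


def search(lines, keyword):
    """Case-insensitive keyword search over parsed entry titles."""
    entries = (parse_entry(line) for line in lines)
    return [e for e in entries if e and keyword.lower() in e["title"].lower()]
```

For example, `search(lines, "library cache")` would return the entries whose titles mention library cache waits, each flagged with whether its title says it is fixed in 12.2.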

Thoughts prompted by drone footage of planes landing at Xiaoshan Airport

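Freedom is meant for people who exercise self-discipline, yet a few who lack it often ruin the freedom of the majority.

On January 15, a drone video of planes landing at Hangzhou Xiaoshan Airport caused a strong reaction in the drone pilot community.

Soon afterwards, the public security authorities stepped in to investigate.

Very soon, the Ministry of Public Security released the "Public Security Administration Punishments Law (revised draft for public comment)". Barring surprises, it will be passed and take effect in a month.

The "one-size-fits-all" regulation that pilots feared will soon arrive. One-size-fits-all means that you may not fly without permission, except in the following cases:
1. Drones operated indoors;
2. Micro drones (empty weight ≤ 7 kg) operated within visual line of sight (radius ≤ 500 m; relative height ≤ 120 m);
3. Drones being tested in open, sparsely populated areas that are not densely inhabited.

These conditions take away much of the fun of flying a drone. Take the height limit of 120 meters: many buildings in the city are about that tall, so during an automatic return-to-home a drone flying below 120 meters will often hit a building.

At its core, a one-size-fits-all approach is lazy governance.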

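I tried discussing this with some friends in a WeChat group. I believe drones are something new and progressive that deserves protection. Yet when the rules are subsequently drawn up, the traditional civil aviation authorities tend to hold the say. That is unreasonable. New things should be protected, and they should take a more active part and play a larger role in rule-making.

The light bulb should not be sent to the gallows by the candle.

Thinking in reverse: why should my drone give way to your airliner? Has civil aviation told me its climb and descent altitudes? Of course, under normal circumstances it is civil aviation that says this kind of thing.

My view is that, in this incident, we should not merely blame the pilot who filmed the landings at Xiaoshan Airport:
1. Why fly a drone near the airport?
2. Even if you fly near the airport, why film planes landing on the approach path?
3. Even if you filmed planes landing on the approach path, why post the footage online?

The logic of this blame is really the same as blaming Edison Chen for becoming notorious over his photos: why did you take them; if you took them, you should not have kept them on your computer; and even if you kept them on your computer, you should not have taken the computer in for repair.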

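I think the right approach is for the drone industry and civil aviation to cooperate actively, draw up the relevant rules together, and have them certified at the national level. For example: no flying within so many kilometers of an airport, and no flying within so many kilometers along the extended runway direction, rather than DJI simply drawing a 5-kilometer circle. The rules should also say how low aircraft may descend within a given number of kilometers, and what altitude drones may not exceed within a given number of kilometers.

Speaking of certification, certification in China today is a "Three Kingdoms" situation. All three are industry certificates, and none is a national standard:
1. The AOPA certificate issued by the Civil Aviation Administration. It mainly targets drones heavier than 7 kg, flown above 120 meters in height or beyond 500 meters in distance.
2. The ASFC certificate issued by the General Administration of Sport. It mainly targets small drones, FPV racing drones, model aircraft, and so on; I hear it is fairly well recognized in the Shanghai area.
3. The UTC certificate issued by Huifei, a DJI subsidiary. At present it only covers crop-protection drones; a certificate for camera drones will be added later.

Because all three are merely industry certificates, the issuers squabble and refuse to recognize one another. Each certificate has plenty of training providers behind it, and it is no longer unusual for them to disparage one another to gain an edge in making money.

Fortunately, DJI will make technical improvements and launch a brand-new ADS-B broadcast warning system to help operators of aerial-photography drones steer clear of civil airliners. But that is only the technical side; when it comes to setting standards, active cooperation with civil aviation and other departments is still needed. Yet cooperating with short-sighted bureaucracies is, as you know, very difficult in our country. "Please prove that your mother is your mother." Thank you!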

About oradebug -prelim

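When an Oracle database hangs, we can log in with sqlplus -prelim / as sysdba to collect diagnostic information, and also to shut down the database. A few points to note:

1. You can log in with sqlplus -prelim / as sysdba even when the processes limit has been reached.

2. Starting with 11.2.0.2, sqlplus -prelim / as sysdba cannot collect hanganalyze information. Even if the hanganalyze command appears to run successfully, the trace file contains none of the expected information, only the following error: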

ERROR: Can not perform hang analysis dump without a process state object and a session state object.
( process=(nil), sess=(nil) )

3. sqlplus -prelim / as sysdba can collect a process state dump, a system state dump, an errorstack dump, and a short_stack.
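The three points above can be sketched as a small helper. It builds the oradebug commands for a system state dump (which works in a prelim connection, unlike hanganalyze) and checks a trace file for the error shown in point 2. This is only a sketch under assumptions: the function names are hypothetical, dump level 266 is a commonly used level rather than something this post prescribes, and `run_prelim` assumes sqlplus is on the PATH with OS authentication available.

```python
import subprocess

HANGANALYZE_ERROR = (
    "Can not perform hang analysis dump without a process state object"
)


def build_prelim_script(systemstate_level=266):
    """SQL*Plus input for a system state dump (level is an assumption)."""
    commands = [
        "oradebug setmypid",
        "oradebug unlimit",
        f"oradebug dump systemstate {systemstate_level}",
        "oradebug tracefile_name",
        "exit",
    ]
    return "\n".join(commands) + "\n"


def hanganalyze_was_skipped(trace_text):
    """True if the trace shows the prelim-mode hanganalyze error."""
    return HANGANALYZE_ERROR in trace_text


def run_prelim(script, sqlplus="sqlplus"):
    """Feed the script to 'sqlplus -prelim / as sysdba' (not run here)."""
    return subprocess.run(
        [sqlplus, "-prelim", "/", "as", "sysdba"],
        input=script, text=True, capture_output=True, check=False,
    )
```

Calling `run_prelim(build_prelim_script())` on the database host would write the dump to the trace file whose name oradebug prints; the Doc ID in the reference below describes the supported collection procedure.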

Reference: How to Collect Diagnostics for Database Hanging Issues (Doc ID 452358.1)

Known Issues for Database and Query Performance Reported in 12C

$
0
0

LAST UPDATE:

Dec 13, 2016

APPLIES TO:

Oracle Database - Enterprise Edition - Version 12.1.0.1 to 12.1.0.2 [Release 12.1]
Information in this document applies to any platform.

PURPOSE:

In recent quarters, during non-code closure bug review for Performance, it was found that more than 50% of duplicates (36,96) closed by Sustaining Engineering (SE) belonged to version 12.1. As a result this document was created to list most commonly reported and share this with Tier-1 engineers so that they could check this before logging any new bugs. Thus the Goal of this document is to provide a list of known bugs reported in 12C for database & SQL performance for support analysts to check before logging new bugs.

In addition to the bugs listed in this article, there is also a useful list of Documented Database Bugs With High "Solved SR" Count where a high number of Service Requests means that there are more than 50 "Solved SR" entries for the bug - ie: all bugs in this doc have > 50 "Solved SR" entries.” :

Document 2099231.1 Documented Database Bugs With High "Solved SR" Count



DETAILS:
Query Optimizer / SQL Execution

Bug 20271226 - QUERY INVOLVING VIRTUAL COLUMN AND PRIMARY KEY CONSTRAINT CRASHES
Document 19046459.8 - Inconsistent behavior of OJPPD
Document 22077191.8 - Create Table As Select of Insert Select Statement on 12.1 is Much Slower Than on 11.2
Bug 20597568 - PJE INCORRECTLY APPLIED TO A CONNECT BY QUERY
Document 18456944.8 - Suboptimal plan for query fetching ROWID from nested VIEW with fix 10129357
Bug 20355502 - QUERY PARSE NEVER ENDS ( BURNING CPU ALL THE TIME)
Document 20355502.8 Limit number of branches on non-cost based OR expansion
Document 20636003.8 - Slow Parsing caused by Dynamic Sampling (DS_SRV) queries in 12.1.0.2
Bug 22513913 - QUERY FAILS WITH ORA-07445: [KKOSJT()+281] WHEN USING DATABASE LINK
Bug 22020067 - UPGRADE 11.2.0.2 TO 11.2.0.4 SCALAR SUBQUERY UNNEST DISABLED
Bug 21099502 - JPPD Not Happening In UNION ALL View Having Group-by and Aggregates
Bug 22862828 - Regression with 22706363, JPPD Does Not Occur Despite the Fix Control 9380298 is ON
Bug 19295003 - High CPU While Parsing Huge SQL [kkqtutlTravOptAndReplaceOJNNCols]
Document 21091518.8 - Suboptimal plan for SQLs using UNION-ALL with bug fix 18304693 enabled
Document 22113854.8 - Query Against ALL_SYNONYMS Runs Slow in 12C
Document 21871902.8 - SELECT query fails with ORA-7445 [qerixRestoreFetchState2] - superseded by
Document 22255113.8 - High parse time with high memory and CPU usage
Document 22339954.8 - Bug 22339954 - High Parse Time for Query Using UNPIVOT Operator
Document 19490852.8 - Excessive "library cache lock" waits for DML with PARALLEL hint when parallel DML is disabled
Bug 20226806 - QUERY AGAINST ALL_CONSTRAINTS AND ALL_CONS_COLUMNS RUNS SLOWER IN 12.1.0.2
duplicate of Bug 20355502 - QUERY PARSE NEVER ENDS ( BURNING CPU ALL THE TIME)
Document 20118383.8 - Long query parse time in 12.1 when many histograms are involved - superseded by
Document 18795224.8 - Hard parse time increases with fix 12341619 enabled, fixed in 12.2
Document 19475484.8 - Cardinality of 0 for range predicate with fix for bug 6062266 present (default), fixed in 12.2
Document 2182951.1 Higher Elapsed Time after Updating from 11.2.0.4 to 12.1.0.2
Document 18498878.8 - medium size tables do not cached consistently causing unnecessary waits on direct path read, fixed in 12.2
Bug 23516956 - DYNAMIC SAMPLING QUERIES CONTAINS INCORRECT HINTS
duplicate of Document 19631234.8 - Suboptimal execution plan for Dynamic Sampling Queries
Bug 20503656 - Remote SQLs having group by with bind variables throw ORA-00979, fixed in 12.2
Bug 19047578 - Optimizer No Longer Uses Function Based Index When CURSOR_SHARING=FORCE
Document 22734628.8 - Wrong results from UNION ALL using OJPPD with cost based transformation disabled, fixed in 12.2
Bug 21839477 - ORA-7445 [QESDCF_DFB_RESET] WITH FIX FOR BUG:20118383 ON, fixed in 12.2
Document 20774515.8 - Wrong results with partial join evaluation, fixed in 12.2
Bug 21303294 - Wrong Result Due to Lost OR Predicate in Bitmap Plan, fixed in 12.2
Document 22951825.8 - Wrong Results with JPPD, Concatenation and Projection Pushdown, fixed in 12.2
Bug 19847091 - HUGE INLIST SQL HANGS DURING PARSE
superseded by Bug 20384335 - CPU REGRESSION DUE TO PLAN CHANGE IN ONE SELECT SQL WITH IN PREDICATE, fixed in 12.2


SQL Plan Management (SPM)

Document 20877664.8 - SQL Plan Management Slow with High Shared Pool Allocations
Document 21075138.8 - SPM does not reproduce plan with SORT UNIQUE, affected from 11.2.0.4 and fixed in 12.2 only
Document 20978266.8 - SPM: SQL not using plan in plan baselines and plans showing as not reproducible - superseded, fixed in 12.2
Document 19141838.8 - ORA-600 [qksanGetTextStr:1] from SQL Plan Management after Upgrade to 12.1
Document 18961555.8 - Static PL/SQL baseline reproduction broken by fix for bug 18020394, fixed in 12.2
Document 21463894.8 - Failure to reproduce plan with fix for bug 20978266, fixed in 12.2
Document 2039379.1 ORA-04024: Self-deadlock detected while trying to mutex pin cursor" on AUD_OBJECT_OPT$ With SQL Plan Management (SPM) Enabled


Wrong Results

Document 20214168.8 - Wrong Results using aggregations of CASE expression with fix of Bug 20003240 present
Bug 21971099 - 12C WRONG CARDINALITY FROM SQL ANALYTIC WINDOWS FUNCTIONS
Document 16191689.8 - Wrong resilts from projection pruning with XML and NESTED LOOPS
Bug 18302923 - WRONG ESTIMATE CARDINALITY IN HASH GROUP BY OR HASH JOIN
Bug 21220620 - WRONG RESULT ON 12C WHEN QUERYING PARTITIONED TABLE WITH DISTINCT CLAUSE
Document 22173980.8 - Wrong results (number of rows) from HASH join when "_rowsets_enabled" = true in 12c (default)
Bug 20871556 - WRONG RESULTS WITH OPTIMIZER_FEATURES_ENABLE SET TO 12.1.0.1 OR 12.1.0.2
Bug 21387771 - 12C WRONG RESULT WITH OR CONDITION EVEN AFTER PATCHING FOR BUG:20871556
Bug 22660003 - WRONG RESULTS WHEN UNNESTING SET SUBQUERY WITH CORRELATED SUBQUERY IN A BRANCH
Bug 22373397 - WRONG RESULTS DUE TO MISSING PREDICATE DURING BITMAP OR EXPANSION
Bug 22365117 - SQL QUERY COMBINING TABLE FUNCTION WITH JOIN YIELDS WRONG JOIN RESULTS
Bug 20176675 - Wrong results for SQLs using ROWNUM<=N when Scalar Subquery is Unnested (VW_SSQ_1)
Document 19072979.8 - Wrong results with HASH JOIN and parameter "_rowsets_enable"
Document 22338374.8 - ORA-7445 [kkoiqb] or ORA-979 or Wrong Results when using scalar sub-queries
Document 22365117.8 - Wrong Results / ORA-7445 from Query with Table Function
Bug 19318508 - WRONG RESULT IN 12.1 WITH NULL ACCEPTING SEMIJOIN - fixed in 12.2
Bug 21962459 - WRONG RESULTS WITH FIXED CHAR COLUMN AFTER 12C MIGRATION WITH PARTIAL JOIN EVAL
Bug 23019286 - CARDINALITY ESTIMATE WRONG WITH HISTOGRAM ON COLUMN GROUP ON CHAR/NCHAR TYPES, fixed in 12.2
Bug 23253821 - Wrong Results While Running Merge Statement in Parallel, fixed in 12.2
Document 18485835.8 - Wrong results from semi-join elimination if fix 18115594 enabled, fixed in 12.2
Document 18650065.8 - Wrong Results on Query with Subquery Using OR EXISTS or Null Accepting Semijoin, fixed in 12.2
Document 19567916.8 - Wrong results when GROUP BY uses nested queries in 12.1.0.2 - superseded by
Document 20508819.8 - Wrong results/dump/ora-932 from GROUP BY query when "_optimizer_aggr_groupby_elim"=true - superseded by
Document 21826068.8 - Wrong Results when _optimizer_aggr_groupby_elim=true
Document 20634449.8 - Wrong results from OUTER JOIN with a bind variable and a GROUP BY clause in 12.1.0.2
Bug 20162495 - WRONG RESULTS FROM NULL ACCEPTING SEMI-JOIN, fixed in 12.2.


Statistics

Bug 22081245 - COPY_TABLE_STATS DOES NOT WORK PROPERLY FOR TABLES WITH SUB PARTITIONS
Bug 22276972 - DBMS_STATS.COPY_TABLE_STATS INCORRECTLY ADJUST MIN/MAX FOR PARTITION COLUMNS
Document 19450139.8 Slow gather table stats with incremental stats enabled
Bug 21258096 - UNNECESSARY INCREMENTAL STATISTICS GATHERED FOR UNCHANGED PARTITIONS DUE TO HISTOGRAMS
Bug 23100700 - PERFORMANCE ISSUE WITH RECLAIM_SYNOPSIS_SPACE, fixed in 12.2
Document 21171382.8 - Enhancement: AUTO_STAT_EXTENSIONS preference on DBMS_STATS - Enhancement To turn off automatic creation of extended statistics in 12c
Document 21498770.8 - automatic incremental statistics job taking more time with fix 16851194, fixed in 12.2
Document 22984335.8 - Unnecessary Incremental Partition Gathers/Histogram Regathers


Errors

Document 20804063.8 ORA-1499 as REGEXP_REPLACE is allowed to be used in Function-based indexes (FBI)
Document 17609164.8 ORA-600 [kkqctmdcq: Query Block Could] from ANSI JOIN with no join predicate
Document 21482099.8 ORA-7445 [opitca] or ORA-932 errors from aggregate GROUP BY elimination
Document 21756734.8 - ORA-600 [kkqcsnlopcbkint : 1] from cost based query transformation
Bug 22566555 - ORA-00600 [KKQCSCPOPNWITHMAP: 0] FOR SUBQUERY FACTORING QUERY WITH DISTINCT
Bug 21377051 - ORA-600 [QKAFFSINDEX1] FROM DELETE STATEMENT
Document 21188532.8 - Unexpected ORA-979 with fix for bug 20177858 - superseded by
Bug 23343584 - GATHER_TABLE_STATS FAILS WITH ORA-6502/ORA-6512
  duplicate of Bug 22928015 - GATHER DATABASE STATS IS RUNNING VERY SLOWLY ON A RAC ENVIRONMENT
Document 21038926.8 - ORA-7445 [qesdcf_dfb_reset] with fix for bug 20118383 present, fixed in 12.2
Document 18405192.8 - ORA-7445 under qervwFetch() or qervwRestoreViewBufPtrs() when deleting from view with an INSTEAD OF trigger
Document 21968539.8 - ORA-600 [kkqcscpopnwithmap: 0]
Document 18201352.8 - ORA-7445 [qertbStart] or similar errors from query with DISTINCT and partial join evaluation (PJE) transformation
Document 19472320.8 - ORA-600 [kkqtSetOp.1] from join factorization on encrypted column
Bug 22394273 - ORA-600 [QESMMCVALSTAT4] FROM SELECT STATEMENT
Document 21529241.8 - DBMS_STATS ORA-6502: "PL/SQL: numeric or value error"
Document 20505851.8 - ORA-600 / ORA-7445 qksqbApplyToQbcLoc from WITH query that has duplicate predicates
Bug 19952510 - ORA 600 [QERSCALLOCATE: BUFFER SIZE LIMIT] AND ORA 600 [KEUUXS_1]
Bug 21274291 - ORA-600 :[17147] AND [ORA-600 :[KGHALF] DURING AN INSERT
Bug 21524270 - ORA-28115 ON ORACLE 12C WITH VPD IN MERGE STMT, BUT INSERT SUCCEEDS
Bug 20660917 - INSTEAD OF TRIGGER IS FIRED TWICE INSTEAD OF ONCE WITH FIX TO 13812807, Causes ORA-07445 [kxtifir] in 12.1.0.2 - fixed in 12.2.


Adaptive Query Optimization

Bug 18745156 - ADAPTIVE DYNAMIC SAMPLING CAUSES BAD CARDINALITY ESTIMATE
Bug 16033838 - TST&PERF: ADAPTIVE JOINS LEADS TO WORSE PLAN FOR QUERY B4TKVSMPFSP85
Document 20465582.8 High parse time in 12c for multi-table join SQL with SQL plan directives enabled - superseded
Document 20807398.8 ORA-600 [kgl-hash-collision] with fix to Bug 20465582 installed
Document 18430870.8 Adaptive Plan and Left Join Give Wrong Result
Bug 21912039 : GSIAS: PERF REGRESSION: ADAPTIVE PLAN SQL_ID DN50XQR69FJ0F GETS WORSE
  duplicate of Bug 20243268 : EM QUERY WITH SQL_ID 4RQ83FNXTF39U PERFORMS POORLY ON ORACLE 12C RELATIVE TO 11G
Bug 19731829 - ISSUES WITH PACK AND UNPACK OF SQL PLAN DIRECTIVES
Document 20370037.8 - KGLH0 growth leading to ORA-4031 by cardinality feedback, fixed in 12.2
Document 20413540.8 - Excessive executions of SQL frjd8zfy2jfdq


Library cache / Rowcache / Shared Cursors / Dictionary / Buffer cache

Document 19450314.8 Unnecessary invalidations in 12c
Bug 21153142 - ROW CACHE LOCK SELF DEADLOCK ACCESSING SEED PDB
Bug 22081947 - ORA-4023 MAY OCCUR IN ACTIVE DATA GUARD ENVIRONMENTS
Bug 12387079 - THE CHILD CURSOR IS INCREASED WHEN THE CURSOR IS KEPT AND EXECUTED DDL
  superseded by Bug 19239846 - FIX FOR Bug 12387079 NEEDS TO BE RE-WORKED
Bug 12320556 - HIGH VERSION COUNTS OCCUR DUE TO AUTH_CHECK_MISMATCH, INSUFF_PRIVS_REM=Y
  superseded by Bug 21515534 - QUERY USING 2 REMOTE DB NOT SHARED - AUTH_CHECK_MISMATCH ,INSUFF_PRIVS_REM
Document 21515534.8 Query referencing multiple remote databases not shared with reason AUTH_CHECK_MISMATCH INSUFF_PRIVS_REM
Document 20907061.8 High number of executions for recursive call on col$
Document 20476175.8 High VERSION_COUNT (in V$SQLAREA) for query with OPT_PARAM('_fix_control') hint
Bug 21799609 - ORA-04024 DEADLOCK ON STARTUP ON SYS.ECOL$ ON LOAD HISTOGRAMS
  duplicate of Bug 19469538 - LOADING OF DATA IN ECOL$ TABLE IS NON-OPTIMAL FOR QUERIES
Bug 22586498 - HUGE M000 TRACEFILE DUE TO OBSOLETE CURSOR DUMPS
Document 20906941.8 - DBW0 spins with high CPU
Bug 17700290 - TST&PERF: LIBRARY CACHE MUTEX WAIT TIME INCREASED HUGELY IN MAIN - affected 12.1.0.2, fixed in 12.2 only
Bug 22733141 - GATHERING STATS ON X$KQLFBC HANGS
Bug 23098370 - DBWR CAN ISSUE MORE THAN SSTMXIOB IO'S
Bug 23103188 - Incorrect ORA-1031 or ORA-942 during Grant due to 22677790, fixed in 12.2
Bug 19340498 - CDB:NO ROWS RETURNED WHEN QUERYING V$SQL_SHARED_MEMORY INSIDE A PDB
Bug 23514710 - CROSS-CONTAINER FIXED TABLE NOT OBSOLETED WHEN PARENT IS MARKED OBSOLETE in ADG Env
Document 19392364.8 - Process spin in kkscsFindMatchingRange() with adaptive cursor sharing, fixed in 12.2
Document 2096561.1 - High Amount Of Shared Memory Allocated Into KGLH0 Heap In 12.1.0.2
Document 2119923.1 - Shared Pool from KGLH0 constantly growing causing ORA-04031 and Latch contention
Document 23168642.8 - Sporadic ORA-600 [kglunpin-bad-pin] / ORA-14403 / high 'library cache: mutex x' using partitions and deferred seg creation / ORA-600 [qesmascTrimLRU_1] with fix 21759047, fixed in 12.2
Document 21759047.8 - High 'library cache: mutex x' Contention when Many Sessions are Executing Cursors Concurrently Against a Partitioned Table - superseded, fixed in 12.2
Document 14380605.8 - High "library cache lock", "cursor: pin S wait on X" and "library cache: mutex X" waits, fixed in 12.2
Document 19822816.8 - High parse time for SQL with PIVOT and binds (can block LGWR on "library cache lock"), fixed in 12.2
Document 13542050.8 - 'library cache: mutex X' waits : A mutex related hang with holder around 65534 (0xfffe), fixed in 12.2
Document 21834574.8 - Mutex contention and increased hard parse time due to ILM (segment-access tracking) checking with partitioned objects, fixed in 12.2
Document 19790972.8 - "library cache lock" waits due to DBMS_STATS gather of stats for a subpartition, fixed in 12.2


Server Manageability (SVRMAN)

Bug 21521882 - SQLT CAUSES ORA-00600: [KGHSTACK_UNDERFLOW_INTERNAL_1]
  duplicate of Bug 19593445 - SR12.2CDBCONC3 - TRC - KPDBSWITCH
Document 18148383.8 AWR snapshots stopped , MMON hung with "EMON to process ntnfs" - affected 12.1.0.1, fixed in 12.1.0.2
Bug 20976392 - AWR REPORT: NEGATIVE OR WRONG VALUES IN %TOTAL OF UNOPTIMIZED READS
Document 2016112.1 - Replaying a PRE-12C Capture On 12.1.0.2.0 Encounters Unexpected ORA-1000 Errors
Bug 21117072 - DBREPLAY PATCH BUNDLE 1 FOR 12.1.0.2, RAT Patch Bundle 1 for 12.1.0.2 is mandatory
Bug 23501117 - WORKLOAD REPLAY SKIPS USERCALLS FOR WORKLOAD CAPTURED UNDER 11.2.0.4
Document 22345045.8 - ORA-600 [kewrspbr_2: wrong last partition] in AWR internal tables after upgrading to 12.1.0.2


Multiple LGWR

Document 1968563.1 Hang Waiting on 'LGWR worker group ordering' with Deadlock Between LGWR (Log Writer) Slaves (LG0n) when Using Multiple LGWR Processes
Document 1957710.1 ALERT: Bug 21915719 Database hang or may fail to OPEN in 12c IBM AIX or HPUX Itanium - ORA-742, DEADLOCK or ORA-600 [kcrfrgv_nextlwn_scn] ORA-600


Exadata/In-memory/Multitenant

Document 21553476.8 Wrong Results using aggregations of CASE expression with fix of Bug 20003240 present in Exadata
Bug 19130972 - EXTREMELY LONG PARSE TIME FOR STAR QUERY (For In-memory DB, fixed in 12.1.0.2 DBBP 3)
Bug 17439841 - IMC DYNAMIC SAMPLING CAUSE "CURSOR: PIN S WAIT ON X" ON PARALLEL QUERY - affected 12.1.0.2, fixed in 12.1.0.2
Bug 21445204 - PARSING OF A QUERY TAKES LONG TIME WITH IMC
Document 21153142.8 - Row cache lock self deadlock accessing seed PDB, fixed in 12.2.
Bug 20766944 - QUERIES INVOLVING INT$DBA_CONSTRAINTS TAKE A LOT OF TIME, fixed in 12.2.
Document 17805926.8 - Parameter changes at PDB level affect other PDBs, fixed in 12.2


Popular Documents for Known issues/Bugs

Document 2034610.1 Things to Consider Before Upgrading to 12.1.0.2 to Avoid Poor Performance or Wrong Results
Document 2035898.1 Patches to Consider for 12.1.0.2 to Avoid Problems with SQL Plan Management (SPM)
Document 2107602.1 Things to Consider When Using Incremental Statistics
Document 1683802.1 12.1.0.2 Patch Set - List of Bug Fixes by Problem Type
Document 1924126.1 12.1.0.2 Patch Set Updates - List of Fixes in each PSU


Port-Specific

Document 1970525.1 Things to Consider to Avoid RDBMS Performance Problems on SPARC Platforms Using Oracle 12.1.0.2


Reference documents covering 12c new features

The following documents cover many of the features that have been introduced or enhanced in 12c and are provided to aid re-discovery:

Document 2031605.1 Adaptive Query Optimization
Document 2002108.1 Dynamic Sampling Level Is Changed Automatically in 12C
Document 2033658.1 Dictionary Queries Running Slow in 12C PDBs
Document 2002089.1 High Latch Free Waits on 'Result Cache: RC Latch' In 12C when RESULT_CACHE_MODE = MANUAL
Document 2004828.1 ORA-20002: Unable To Export Table Stats Using DBMS_STATS Package In 12c Database
Document 2051004.1 ORA-12012 ORA-20000 During Automatic Statistics Collection Job in 12C With Concurrent Option Enabled
Document 1955319.1 Huge Trace Files Created Containing "----- Cursor Obsoletion Dump sql_id=%s -----" From 12.1.0.2
Document 2053877.1 Wrong Results (0 Rows) Returned for Query that Includes Subquery Containing AND ROWNUM when OPTIMIZER_ADAPTIVE_FEATURES = TRUE
Document 2041541.1 Gather_Database_Stats_Job_Proc Taking More Time in 12.1.0.2 Than 11.2.0.4


Documentation

Database SQL Tuning Guide 12.1: http://docs.oracle.com/database/121/TGSQL/toc.htm
Database Performance Tuning Guide: http://docs.oracle.com/database/121/TGDBA/toc.htm
Database Testing Guide (RAT): http://docs.oracle.com/database/121/RATUG/toc.htm



References:
Commonly Reported Known Issues for Database and Query Performance Reported in 12C (Doc ID 2097793.1 INTERNAL)
Documented Database Bugs With High “Solved SR” Count (Doc ID 2099231.1 INTERNAL)


Documented Database Bugs With High “Solved SR” Count

APPLIES TO:

Oracle Database - Enterprise Edition - Version 9.2.0.8 and later

PURPOSE:

This note lists "documented" database bugs that have been marked as the solution to a "high number of Service Requests".
For the purpose of this listing:
A "documented" bug means one with a bugno.8 KM document created from BugTag data.
A "high number of Service Requests" means that there are more than 50 "Solved SR" entries for the bug, i.e. all bugs in this doc have > 50 "Solved SR" entries from the SR closure data.
It is suggested to restrict the list based on the version of interest, and then use the radio buttons to check on particular features / symptoms / facts linked to the bug descriptions.

The list includes some bugs with a "D" in the "NB" column - this is used to denote a bug fix that is DISABLED by default and so a customer may encounter that issue in DB versions where the bug is marked as fixed.
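
When working from a plain-text copy of the list rather than the interactive note, the same kind of filtering can be approximated with a small script. The sketch below is only an illustration: it assumes each row follows the `NB Prob Bug Fixed Description` layout as flattened in this post (optional NB flags, a run of `I` characters, the bug number, a comma-separated list of fix levels, then the description), and the names `BugRow`, `parse_row` and `fixed_in_release` are hypothetical helpers, not part of any Oracle tool.

```python
import re
from typing import List, NamedTuple, Optional

# One "Fixed" token as it appears in the rows of this post, e.g.
# 11.2.0.4, 11.2.0.4.6, 11.2.0.4.BP13, 12.1.0.2.DBBP12, 12.1.0.2.161018
VERSION = r"\d+\.\d+\.\d+\.\d+(?:\.[A-Z]*\d+)?"

# Assumed row layout: [NB flags] IIII <bug number> [fix levels] <description>
ROW = re.compile(
    r"^(?:(?P<nb>[*+PED]+)\s+)?"   # optional NB flags such as P, D, E, *, +
    r"(?P<prob>I+)\s+"             # "Prob" column, shown as a run of I's
    r"(?P<bug>\d+)\s+"             # bug number
    rf"(?P<fixed>{VERSION}(?:,\s*{VERSION})*)?\s*"  # optional fix levels
    r"(?P<desc>.*)$"               # remainder of the line is the description
)


class BugRow(NamedTuple):
    nb: str
    bug: int
    fixed: List[str]
    description: str


def parse_row(line: str) -> Optional[BugRow]:
    """Parse one flattened table row; return None if it is not a bug row."""
    m = ROW.match(line.strip())
    if m is None:
        return None
    fixed = re.findall(VERSION, m["fixed"] or "")
    return BugRow(m["nb"] or "", int(m["bug"]), fixed, m["desc"])


def fixed_in_release(row: BugRow, release: str) -> List[str]:
    """Fix levels listed for one base release, e.g. '11.2.0.4'."""
    return [v for v in row.fixed if v == release or v.startswith(release + ".")]


if __name__ == "__main__":
    sample = [
        "NB Prob Bug Fixed Description",
        "D IIII 20636003 12.2.0.0 Slow Parsing caused by Dynamic Sampling (DS_SRV) queries",
        "IIII 17951233 11.2.0.4.4, 11.2.0.4.BP11, 12.1.0.2, 12.2.0.0 ORA-600 [kcblin_3] [103]",
    ]
    for line in sample:
        row = parse_row(line)
        if row is None:
            continue  # header or unrecognised line
        note = " (fix DISABLED by default)" if "D" in row.nb else ""
        print(row.bug, fixed_in_release(row, "11.2.0.4"), row.description + note)
```

Rows whose NB column contains `D` are flagged separately because, as noted above, the fix is disabled by default and the issue can still be hit in a version where the bug is listed as fixed.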


KNOWN BUGS:

NB Prob Bug Fixed Description
P IIII 7272646 Linux-x86_64: ORA-27103 on startup when MEMORY_TARGET > 3g
IIII 18384537 11.2.0.4.6, 11.2.0.4.BP13, 12.1.0.2, 12.2.0.0 Process spin in opipls() / ORA-4030 for “kgh stack” memory
IIII 18148383 11.2.0.4.BP16, 12.1.0.1.5, 12.1.0.2, 12.2.0.0 AWR snapshots stopped , MMON hung with “EMON to process ntnfs”
IIII 17951233 11.2.0.4.4, 11.2.0.4.BP11, 12.1.0.2, 12.2.0.0 ORA-600 [kcblin_3] [103] after setting _pga_max_size > 2Gb
IIII 17867137 12.1.0.1.4, 12.1.0.2, 12.2.0.0 ORA-700 [Offload issue job timed out] on Exadata storage with fix of bug 16173738 present
IIII 17831758 12.1.0.2, 12.2.0.0 ORA-600 [kwqitnmphe:ltbagi] in Qnnn background process
IIII 17469624 11.2.0.4.BP15, 12.1.0.2, 12.2.0.0 ORA-600 [kcfis_finalize_cached_sessions_2] / ORA-600 [kcfis_update_recoverable_val_3] during session cleanup
IIII 17339455 12.1.0.2, 12.2.0.0 ORA-7445 [kkorminl] or similar can occur when running Automatic tuning tasks / DBMS_SQLTUNE Index Advisor
IIII 17306264 12.1.0.2, 12.2.0.0 Frequent ORA-1628 max # extents (32765) reached for rollback segment
IIII 17079301 11.2.0.4.BP14, 12.1.0.2, 12.2.0.0 ORA-6525 length mismatch from MMON job
IIII 17042658 11.2.0.4.4, 11.2.0.4.BP09, 12.1.0.1.3, 12.1.0.2, 12.2.0.0 ORA-600 [kewrsp_split_partition_2] during AWR purge
IIII 17037130 11.2.0.4.4, 11.2.0.4.BP10, 12.1.0.2, 12.2.0.0 Excess shared pool “PRTMV” memory use / ORA-4031 with partitioned tables
IIII 16989630 12.1.0.2, 12.2.0.0 Intermittent ORA-7445 [kglHandleParent] / [kglGetMutex] / [kglic0]
IIII 16667538 11.2.0.4.BP15, 12.1.0.2, 12.2.0.0 SGA memory corruption possible (single bit 0x1000 cleared)
IIII 16621589 12.1.0.2, 12.2.0.0 ORA-1426 “numeric overflow” from AUTO_SPACE_ADVISOR_JOB_PROC
IIII 16268425 11.2.0.4.3, 11.2.0.4.BP04, 12.1.0.2, 12.2.0.0 Memory corruption / ORA-7445 / ORA-600 gathering statistics in parallel for table with virtual column/s
IIII 16002686 11.2.0.4, 12.1.0.2, 12.2.0.0 ORA-7445 [kglbrk] or ORA-7445 [kxsSqlId] in shared server process
IIII 14764829 11.2.0.4.4, 11.2.0.4.BP11, 12.1.0.2, 12.2.0.0 ORA-600[kwqicgpc:cursta] can occur using AQ
IIII 14084247 11.2.0.3.BP24, 11.2.0.4.4, 11.2.0.4.BP07, 12.1.0.2, 12.2.0.0 ORA-1555 or ORA-12571 Failed AWR purge can lead to continued SYSAUX space use
IIII 13814203 11.2.0.4, 12.1.0.2, 12.2.0.0 ORA-600 [ktsapsblk-1] from SQL Tuning
IIII 16472780 11.2.0.4, 12.1.0.2 PGA memory leak ORA-600 [723] for “Fixed Uga” memory
IIII 15993436 12.1.0.2 Intermittent ORA-7445 [kokacau] errors
IIII 16477664 11.2.0.4.BP09, 12.1.0.1 ORA-600 [kokuxpout3] can occur querying some V$ views
IIII 16166364 12.1.0.1 ORA-600 [rworofprFastUnpackRowsets:oobp] or similar from Parallel Query
IIII 14843189 12.1.0.1 Select list pruning with subqueries / ORA-7445 [msqcol]
IIII 14602788 11.2.0.4.3, 11.2.0.4.BP07, 12.1.0.1 Q00* process spin when buffered messages spill
IIII 14275161 11.2.0.4.BP16, 12.1.0.1 ORA-600 [rwoirw: check ret val] on CTAS with predicate move around
IIII 14201252 11.2.0.4, 12.1.0.1 Stack corruption within kponPurgeUnreachLoc
IIII 14119856 11.2.0.4, 12.1.0.1 ORA-4030 occurs at 16gb of PGA even if it could grow much larger
IIII 14040124 11.2.0.4, 12.1.0.1 ORA-7445 [ktspsrch_reset] during commit on ASSM segment
IIII 14034426 11.2.0.3.BP25, 11.2.0.4.5, 11.2.0.4.BP09, 12.1.0.1 ORA-600 [kjbrfixres:stalew] in LMS in RAC
IIII 14024668 11.2.0.4, 12.1.0.1 ORA-7445 [ksuklms] from ‘alter system kill session (non-existent)’
IIII 13914613 11.2.0.3.6, 11.2.0.3.BP12, 11.2.0.4, 12.1.0.1 Excessive time holding shared pool latch in kghfrunp with auto memory management
IIII 13872868 11.2.0.3.10, 11.2.0.3.BP23, 11.2.0.4, 12.1.0.1 ORA-600[keomnReadGlobalInfoFromStream:magic] from V$SQL_MONITOR
IIII 13863932 11.2.0.3.BP22, 11.2.0.4, 12.1.0.1 ORA-600[12259] using JDBC application with PL/SQL
IIII 13680405 11.2.0.3.6, 11.2.0.3.BP16, 11.2.0.4, 12.1.0.1 PGA consumption keeps growing in DIA0 process
IIII 13608792 11.2.0.3.BP16, 11.2.0.4, 12.1.0.1 ORA-600 [15713] from Ctrl-C/interrupt of PQ
IIII 12656350 11.2.0.4, 12.1.0.1 Small parse overhead with fix for bug 12534597 present
IIII 12537316 11.2.0.4, 12.1.0.1 Assorted ORA-600 / ORA-7445 for SQL with merged subquery
IIII 11837095 11.2.0.3.BP10, 11.2.0.4, 12.1.0.1 “time drift detected” appears intermittently in alert log
P IIII 11801934 11.2.0.4, 12.1.0.1 AIX: Wrong page-in and page-out OS VM stats in V$OSSTAT on AIX
IIII 11769185 11.2.0.4, 12.1.0.1 ORA-600 / ORA-7445 from SQL performance Analyzer for SQL with UNION and fake binds
IIII 10110625 11.2.0.3.11, 11.2.0.3.BP24, 11.2.0.4, 12.1.0.1 DBSNMP.BSLN_INTERNAL reports ORA-6502 running BSLN_MAINTAIN_STATS_JOB
P+ IIII 10194190 11.2.0.4.BP09 Solaris: Process spin and/or ASM and DB crash if RAC instance up for > 248 days
IIII 14076523 11.2.0.2.9, 11.2.0.2.BP18, 11.2.0.3.4, 11.2.0.3.BP11, 11.2.0.4 ORA-600 [kgxRelease-bad-holder] can occur in rare cases
IIII 9137871 11.2.0.2 ORA-600 [15851] using function based index on DATE column
IIII 22243719 11.2.0.4.161018, 12.1.0.2.161018, 12.2.0.0 Several Internal Errors due to Shared Pool Memory Corruptions in 11.2.0.4 and later. Instance may Crash
* IIII 22241601 12.2.0.0 ORA-600 [kdsgrp1] / ORA-1555 / ORA-600 [ktbdchk1: bad dscn] due to Invalid Commit SCN in INDEX block
*P IIII 21915719 12.1.0.2.160419, 12.2.0.0 12c Hang: LGWR waiting for ‘lgwr any worker group’ or ORA-600 [kcrfrgv_nextlwn_scn] ORA-600 [krr_process_read_error_2] on IBM AIX / HPIA
IIII 21749315 12.2.0.0 ORA-600 [keomnReadBindsFromStream:magic]
IIII 21373473 12.1.0.2.160719, 12.1.0.2.DBBP12, 12.2.0.0 Excess “ges resource dynamic” memory use / ORA-4031 / instance crash in RAC with many distributed transactions / XA
IIII 21286665 11.2.0.4.160419, 12.2.0.0 “Streams AQ: enqueue blocked on low memory” waits with fix 18828868 – superseded
IIII 21283337 12.2.0.0 ORA-600 [kghstack_underflow_internal_1] or similar if fix for bug 19052685 present
IIII 21260431 12.1.0.2.160419, 12.2.0.0 Excessive “ges resource dynamic” memory use in shared pool in RAC (ORA-4031)
IIII 20987661 12.2.0.0 QMON slave processes reporting ORA-600 [kwqitnmphe:ltbagi]
E IIII 20907061 12.1.0.2.161018, 12.2.0.0 High number of executions for recursive call on col$
IIII 20877664 12.1.0.2.160119, 12.2.0.0 SQL Plan Management Slow with High Shared Pool Allocations
IIII 20844426 12.1.0.2.161018, 12.2.0.0 ORA-600 [kkzdgdefq] from DBMS_REFRESH.refresh
D IIII 20636003 12.2.0.0 Slow Parsing caused by Dynamic Sampling (DS_SRV) queries (side effects possible ORA-12751/ ORA-29771)
IIII 20547245 12.2.0.0 ORA-7445 [lxregsergop] from query using REGEXP on Exadata
IIII 20505778 12.2.0.0 Private memory corruption / ORA-7445 [kfuhRemove] / ORA-7445 [kghstack_underflow_internal] / ORA-600[17147] from DBA_TABLESPACE_USAGE_METRICS
IIII 20476175 11.2.0.4.BP20, 12.1.0.2.5, 12.1.0.2.DBBP08, 12.2.0.0 High VERSION_COUNT (in V$SQLAREA) for query with OPT_PARAM(‘_fix_control’) hint
IIII 20387265 12.1.0.2.4, 12.1.0.2.DBBP07, 12.2.0.0 ORA-600 [Cursor not typechecked] errors on cursor executed from PLSQL
IIII 20186278 11.2.0.4.GIPSU07, 12.1.0.2.GIPSU04, 12.2.0.0 crfclust.bdb Becomes Huge Size Due to Sudden Retention Change
IIII 19942889 12.2.0.0 ORA-600 [kpdbSwitchPreRestore: txn] from SQL autotune of a remote (dblink) query – duplicate of bug 19052685 – superseded
IIII 19689979 12.1.0.2.160119, 12.1.0.2.DBBP07, 12.2.0.0 ORA-8103 or ORA-600 [ktecgsc:kcbz_objdchk] or Wrong Results on PARTITION table after TRUNCATE in 11.2.0.4 or above
IIII 19621704 12.2.0.0 PGA memory leak / ORA-600 [723] with large allocations of “mbr node memory” when using Spatial
IIII 19614585 11.2.0.4.BP17, 12.1.0.2.DBBP03, 12.2.0.0 Wrong Results / ORA-600 [kksgaGetNoAlloc_Int0] / ORA-7445 / ORA-8103 / ORA-1555 from query on RAC ADG Physical Standby Database
IIII 19509982 12.1.0.2.DBBP02, 12.2.0.0 Disable raising of ORA-1792 by default
IIII 19475971 12.2.0.0 ORA-600 [17285] from PLSQL packages
IIII 19450314 12.1.0.2.160419, 12.2.0.0 Unnecessary compiled PL/SQL invalidations in 12c
IIII 19366669 12.2.0.0 CRS-8503: oracle clusterware osysmond process experienced fatal signal or exception 11 – superseded
IIII 18899974 12.1.0.2.161018, 12.2.0.0 ORA-600 [kcbgtcr_13] on ADG for SPACE metadata blocks and UNDO blocks
IIII 18841764 12.2.0.0 Network related error like ORA-12592 or ORA-3137 or ORA-3106 may be signaled
IIII 18828868 11.2.0.4.5, 11.2.0.4.BP12, 12.1.0.2, 12.2.0.0 Too Many Qxxx Processes Maxing Out the Number of Processes with fix for bug 14602788 present
IIII 18758878 12.2.0.0 Automatic SQL tuning advisor fails with ORA-7445 [apaneg]
* IIII 18607546 11.2.0.4.6, 11.2.0.4.BP16, 12.1.0.2.3, 12.1.0.2.DBBP06, 12.2.0.0 ORA-600 [kdblkcheckerror]..[6266] corruption with self-referenced chained row. ORA-600 [kdsgrp1] / Wrong Results / ORA-8102
IIII 18536720 12.1.0.2, 12.2.0.0 ORA-600 [kwqitnmphe:ltbagi] processing History IOT in AQ
IIII 18280813 11.2.0.4.4, 11.2.0.4.BP10, 12.1.0.2, 12.2.0.0 Process hangs in ‘gc current request’ in RAC
IIII 18199537 11.2.0.4.4, 11.2.0.4.BP10, 12.1.0.2, 12.2.0.0 RAC database becomes almost hung when large amount of row cache are used in shared pool
IIII 18189036 11.2.0.4.5, 11.2.0.4.BP14, 12.1.0.2, 12.2.0.0 ORA-600 [qkaffsindex3] from SQL Tuning task / advisor
IIII 17890099 12.1.0.2.160119, 12.1.0.2.DBBP07, 12.2.0.0 Wrong cardinality estimation for “is NULL” predicate on a remote table
IIII 17722075 12.1.0.2.160719, 12.2.0.0 ORA-7445 [kkcnrli] in Qnnn process during ALTER DATABASE OPEN
IIII 17551261 12.1.0.2.DBBP12, 12.2.0.0 ORA-942 / ORA-904 “from$_subquery$_###”.<column_name> with query rewrite
IIII 17365043 12.1.0.2.5, 12.1.0.2.DBBP10, 12.2.0.0 Session hangs on “Streams AQ: enqueue blocked on low memory”
IIII 17274537 11.2.0.4.5, 11.2.0.4.BP13, 12.1.0.2.3, 12.1.0.2.DBBP05, 12.2.0.0 ASM disk group force dismounted due to slow I/Os
IIII 17220460 12.1.0.2, 12.2.0.0 Parallel query hangs with ‘PX Deq: Execute Reply’
D IIII 17018214 11.2.0.4, 12.1.0.2, 12.2.0.0 ORA-600 [krdrsb_end_qscn_2] ORA-4021 in Active Dataguard Standby Database with fix for bug 16717701 present – Instance may crash
IIII 16817656 12.2.0.0 ORA-7445[_int_malloc] on shared server process / IO failures on ASM leading to unnecessary Disk Offline in Exadata
IIII 16756406 12.2.0.0 ORA-600 [kpp_concatq:2] or hang when NCHAR/NVARCHAR2 AL16UTF16 characters are included in a SQL statement
D IIII 16717701 11.2.0.4, 12.1.0.2, 12.2.0.0 Active Dataguard Hangs waiting for library cache lock on DBINSTANCE namespace with possible deadlock – Superseded
IIII 16504613 11.2.0.4, 12.1.0.2, 12.2.0.0 ORA-600[12337] from SQL with predicate of the form “NVL() [not] IN (inlist)”
IIII 15883525 11.2.0.3.9, 11.2.0.3.BP22, 11.2.0.4, 12.1.0.2, 12.2.0.0 ORA-600 [kcbo_switch_cq_1] can occur causing an instance crash
IIII 14572561 11.2.0.4, 12.1.0.2, 12.2.0.0 ORA-7445 [pevm_SUBSTR] crash on packages with constants
IIII 13542050 12.1.0.2.160419, 12.2.0.0 A mutex related hang with holder around 65534 (0xfffe) – superseded
IIII 24316947 11.2.0.4.161018, 12.1.0.2.161018 ORA-07445 and ORA-00600 after applying 11.2.0.4/12.1.0.1/12.1.0.2 April DB PSU or DB Bundle Patch
IIII 19730508 11.2.0.4.5, 11.2.0.4.BP14, 12.1.0.2.3, 12.1.0.2.DBBP05 Orphan subscribers / ORA-600 [kwqdlprochstentry:ltbagi] on SYS$SERVICE_METRICS_TAB in RAC with fix of bug 14054676 present
IIII 17437634 11.2.0.3.9, 11.2.0.3.BP22, 11.2.0.4.2, 11.2.0.4.BP03, 12.1.0.1.3, 12.1.0.2 ORA-1578 or ORA-600 [6856] transient in-memory corruption on TEMP segment during transaction recovery / ROLLBACK (eg: after Ctrl-C) – superseded
IIII 17325413 11.2.0.3.BP23, 11.2.0.4.2, 11.2.0.4.BP04, 12.1.0.1.3, 12.1.0.2 Drop column with DEFAULT value and NOT NULL definition ends up with Dropped Column Data still on Disk leading to Corruption
IIII 18973907 11.2.0.4.4, 11.2.0.4.BP11, 12.1.0.1.5, 12.1.0.2 Memory corruption / various ORA-600/ORA-7445 using database links between 11.2.0.4/12.1.0.1 and earlier versions – superseded
IIII 12578873 11.2.0.4.BP15, 12.1.0.2 ORA-7445 [opiaba] when using more than 65535 bind variables
P IIII 20675347 12.1.0.1 AIX: ORA-7445 [kghstack_overflow_internal] or ORA-600 [kghstack_underflow_internal_2] in 11.2.0.4 on IBM AIX
IIII 18235390 11.2.0.4.5, 11.2.0.4.BP14, 12.1.0.1 ORA-600 [kghstack_underflow_internal_3] … [kttets_cb – autoextfiles_kttetsrow] after applying patch 17897511
IIII 17897511 12.1.0.1 ORA-1000 from query on DBA_TABLESPACE_USAGE_METRICS after upgrade to 11.2.0.4 – superseded
IIII 17586955 11.2.0.4.4, 11.2.0.4.BP11, 12.1.0.1 ORA-600 [ktspfmdb:objdchk_kcbnew_3] in RAC
IIII 17501296 11.2.0.4.BP09, 12.1.0.1 ORA-604 / PLS-306 attempting to delete rows from table with Text index after upgrade to 11.2.0.4
IIII 16392079 11.2.0.4, 12.1.0.1 Sessions hang waiting for ‘resmgr:cpu quantum’ with Resource Manager
IIII 15881004 11.2.0.3.BP24, 11.2.0.4, 12.1.0.1 Excessive SGA memory usage with Extended Cursor Sharing
IIII 14791477 11.2.0.3.8, 11.2.0.3.BP17, 11.2.0.4, 12.1.0.1 Instance eviction in RAC due to lock element shortage (Pseudo Reconfiguration reason 3)
IIII 14657740 11.2.0.3.11, 11.2.0.3.BP24, 11.2.0.4.4, 11.2.0.4.BP09, 12.1.0.1 ORA-600 [510] … [cache buffers chains]
IIII 14601231 11.2.0.3.BP16, 11.2.0.4, 12.1.0.1 ORA-7445 [kpughndlarr] / assorted ORA-600
IIII 14588746 11.2.0.3.11, 11.2.0.3.BP23, 11.2.0.4, 12.1.0.1 ORA-600 [kjbmprlst:shadow] in LMS in RAC – crashes the instance
IIII 14489591 11.2.0.3.11, 11.2.0.3.BP24, 11.2.0.4, 12.1.0.1 ORA-3137 [3149] on server due to bad bind attempt in client
IIII 14409183 11.2.0.3.4, 11.2.0.3.BP10, 11.2.0.4, 12.1.0.1 ORA-600 [kjblpkeydrmqscchk:pkey] or similar / session hangs on “gc buffer busy acquire”
IIII 14373728 11.2.0.4, 12.1.0.1 Old statistics not purged from SYSAUX tablespace
IIII 14192178 11.2.0.4, 12.1.0.1 EXPDP of partitioned table can be slow
IIII 14091984 11.2.0.4, 12.1.0.1 dump on kkoatsamppred
P IIII 13940331 11.2.0.4, 12.1.0.1 AIX: OCSSD threads are not set to the correct priority
IIII 13931044 11.2.0.3.11, 11.2.0.3.BP24, 11.2.0.4, 12.1.0.1 ORA-600 [13009] / ORA-600 [13030] with Nested Loop Batching
IIII 13869978 11.2.0.3.GIPSU04, 11.2.0.4, 12.1.0.1 OCSSD reports that the voting file is offline without reporting the reason
IIII 13860201 11.2.0.3.6, 11.2.0.3.BP12, 11.2.0.4, 12.1.0.1 Dump on kkspbd0
IIII 13840704 11.2.0.3.11, 11.2.0.3.BP24, 11.2.0.4, 12.1.0.1 ORA-12012 / ORA-6502 from Segment Advisor (DBMS_SPACE/DBMS_ADVISOR) for LOB segments
IIII 13737746 11.2.0.2.8, 11.2.0.2.BP16, 11.2.0.3.4, 11.2.0.3.BP05, 11.2.0.4, 12.1.0.1 Recovery fails with ORA-600 [krr_assemble_cv_12] or ORA-600[krr_assemble_cv_3]
IIII 13718279 11.2.0.3.4, 11.2.0.3.BP10, 11.2.0.4, 12.1.0.1 DB instance terminated due to ORA-29770 in RAC
IIII 13616375 11.2.0.2.11, 11.2.0.2.BP21, 11.2.0.3.6, 11.2.0.3.BP15, 11.2.0.4, 12.1.0.1 ORA-600 [qkaffsindex5] on a query with ORDER BY DESC and functional index on DESC column from SQL tuning index advisor job
IIII 13583663 11.2.0.4, 12.1.0.1 ORA-7445[opipls] from EXECUTE IMMEDIATE in PLSQL – superseded
IIII 13555112 11.2.0.4, 12.1.0.1 ORA-600 [kkopmCheckSmbUpdate:2] using plan baseline
+ IIII 13550185 11.2.0.2.9, 11.2.0.2.BP17, 11.2.0.3.4, 11.2.0.3.BP06, 11.2.0.4, 12.1.0.1 Hang / SGA memory corruption / ORA-7445 [kglic0] when using multiple shared pool subpools – superseded
IIII 13527323 11.2.0.3.3, 11.2.0.3.BP07, 11.2.0.4, 12.1.0.1 ORA-6502 generating HTML AWR report using awrrpt.sql in Multibyte characterset database
IIII 13493847 11.2.0.3.7, 11.2.0.3.BP14, 11.2.0.4, 12.1.0.1 ORA-600 [15709] can occur with Parallel Query
IIII 13477790 11.2.0.3.10, 11.2.0.3.BP04, 11.2.0.4, 12.1.0.1 ORA-7445 [kghalo] / memory errors / ORA-4030 from XMLForest / XMLElement
IIII 13464002 11.2.0.2.BP16, 11.2.0.3.4, 11.2.0.3.BP06, 11.2.0.4, 12.1.0.1 ORA-600 [kcbchg1_12] or ORA-600 [kdifind:kcbget_24]
IIII 13463131 11.2.0.3.BP23, 11.2.0.4, 12.1.0.1 Dump (kgghash) from bind peeking
IIII 13456573 11.2.0.3.BP23, 11.2.0.4, 12.1.0.1 Many child cursors / ORA-4031 with large allocation in KGLH0 using extended cursor sharing
IIII 13397104 11.2.0.3.4, 11.2.0.3.BP09, 12.1.0.1 Instance crash with ORA-600 [kjblpkeydrmqscchk:pkey] or similar – superseded
IIII 13257247 10.2.0.5.7, 11.2.0.2.6, 11.2.0.2.BP15, 11.2.0.3.4, 11.2.0.3.BP04, 11.2.0.4, 12.1.0.1 AWR Snapshot collection hangs due to slow inserts into WRH$_TEMPSTATXS.
IIII 13250244 11.2.0.2.8, 11.2.0.2.BP18, 11.2.0.3.4, 11.2.0.3.BP10, 11.2.0.4, 12.1.0.1 Shared pool leak of “KGLHD” memory when using multiple subpools
IIII 13099577 11.2.0.2.7, 11.2.0.2.BP16, 11.2.0.3.4, 11.2.0.3.BP05, 11.2.0.4, 12.1.0.1 ORA-12801 and ORA-1460 with parallel query
IIII 13072654 11.2.0.3.8, 11.2.0.3.BP21, 11.2.0.4, 12.1.0.1 Unnecessary ORA-4031 for “large pool”, “PX msg pool” from PQ slaves
IIII 13000553 11.2.0.3.BP11, 11.2.0.4, 12.1.0.1 RMAN backup fails with RMAN-20999 error at standby database
IIII 12971242 11.2.0.4, 12.1.0.1 Dumps occur around kpofcr with STAR transformation
IIII 12919564 11.2.0.3.2, 11.2.0.3.BP04, 11.2.0.4, 12.1.0.1 ORA-600 [ktbesc_plugged] executing SQL against a Plugged in (transported) tablespace
IIII 12899768 11.2.0.2.8, 11.2.0.2.BP18, 11.2.0.3.8, 11.2.0.3.BP11, 11.2.0.4, 12.1.0.1 Processed messages remain in Queue causing space issues
IIII 12880299 10.2.0.4.13, 10.2.0.5.8, 11.1.0.7.12, 11.2.0.2.7, 11.2.0.2.BP17, 11.2.0.3.3, 11.2.0.3.BP24, 11.2.0.4, 12.1.0.1 TCP handlers block if listener registration is restricted to IPC with COST
IIII 12865902 11.2.0.2.BP13, 11.2.0.3.8, 11.2.0.3.BP03, 11.2.0.4, 12.1.0.1 NOWAIT lock requests could hang (like Parallel Queries may hang “enq: TS – contention”) in RAC
IIII 12848798 11.2.0.2.9, 11.2.0.2.BP19, 11.2.0.3.BP13, 11.2.0.4, 12.1.0.1 OERI:kcbgtcr_13 on active dataguard
IIII 12834800 11.2.0.4, 12.1.0.1 ORA-7445 [qkxrPXformUnm] from SQL with positional ORDER BY or GROUP BY and function based index
IIII 12834027 11.2.0.2.8, 11.2.0.2.BP13, 11.2.0.3.1, 11.2.0.3.BP02, 11.2.0.4, 12.1.0.1 ORA-600 [kjbmprlst:shadow] / ORA-600 [kjbrasr:pkey] with RAC read mostly locking
IIII 12815057 11.2.0.3.8, 11.2.0.3.BP21, 11.2.0.4, 12.1.0.1 ORA-600/ORA-7445/ UGA memory corruptions using PLSQL callouts (such as SYS_CONTEXT)
IIII 12794305 11.2.0.2.8, 11.2.0.2.BP18, 11.2.0.3.4, 11.2.0.3.BP09, 11.2.0.4, 12.1.0.1 ORA-600 [krsr_pic_complete.8] on standby database
IIII 12747437 11.2.0.3.8, 11.2.0.3.BP21, 11.2.0.4, 12.1.0.1 ORA-600 [ktspfmdb:objdchk_kcbnew_3] after purging single consumer queue table
IIII 12738119 11.2.0.3.BP22, 11.2.0.4, 12.1.0.1 RAC slow / repeat diagnostic dumps
IIII 12714511 11.2.0.4, 12.1.0.1 ORA-600 [17114] / memory corruption optimizing ANSI queries with FIRST_ROWS(K)
IIII 12683462 11.2.0.4, 12.1.0.1 Internal Errors / Wrong results from “PX SEND RANGE”
IIII 12680491 11.2.0.2.GIPSU04, 11.2.0.3.GIPSU03, 11.2.0.4, 12.1.0.1 Intermittent hiccup in network CHECK action can fail over vip, bring listener offline briefly
IIII 12672969 11.2.0.1.BP12, 11.2.0.2.BP12, 11.2.0.3.BP01, 11.2.0.4, 12.1.0.1 Assorted Dumps with aggregate expression in ORDER BY
IIII 12637294 11.2.0.3.BP11, 11.2.0.4, 12.1.0.1 Deadlock of PS and BF locks during parallel query operations
IIII 12552578 11.2.0.4, 12.1.0.1 ORA-1790 / ORA-600 [kkqtsetop.1] / ORA-1789 during SET operation query with redundant WHERE conditions
E IIII 12534597 11.2.0.4, 12.1.0.1 Bind Peeking is disabled for remote queries
IIII 12531263 11.1.0.7.10, 11.2.0.2.5, 11.2.0.2.BP13, 11.2.0.2.GIPSU05, 11.2.0.3, 12.1.0.1 ORA-4020 on object $BUILD$.{Hexadecimal Number}
IIII 12340939 11.2.0.2.4, 11.2.0.2.BP10, 11.2.0.3, 12.1.0.1 ORA-7445 [kglic0] can occur capturing cursor stats for V$SQLSTATS
IIII 12312133 11.2.0.2.BP17, 11.2.0.3.8, 11.2.0.3.BP09, 11.2.0.4, 12.1.0.1 Standby DB crashes with ORA-600 [krcccb_busy] / ORA-600 [krccckp_scn] with block change tracking
IIII 11902008 11.2.0.4, 12.1.0.1 SMON may crash with ORA-600 [kcbgcur_3] or ORA-600 [kcbnew_3] during Transaction recovery
IIII 11872103 11.2.0.2.7, 11.2.0.2.BP16, 11.2.0.3, 12.1.0.1 RMAN RESYNC CATALOG very slow / V$RMAN_STATUS incorrectly shows RUNNING
E IIII 11869207 12.1.0.1 Improvements to archived statistics purging / SYSAUX tablespace grows – superseded
IIII 11744544 12.1.0.1 Set newname for database does not apply to block change tracking file
+ IIII 11666959 11.2.0.3, 12.1.0.1 ORA-7445 / LPX-200 / wrong results etc. from new XML parser
IIII 11068682 11.2.0.4, 12.1.0.1 ORA-7445 [ph2csql_analyze] in active dataguard – superseded
E IIII 10411618 11.1.0.7.9, 11.2.0.1.BP12, 11.2.0.2.2, 11.2.0.2.BP06, 11.2.0.3, 12.1.0.1 Enhancement to add different “Mutex” wait schemes
IIII 10314054 12.1.0.1 ORA-600 [13001] or similar from DELETE/UPDATE/MERGE SQL with non-deterministic WHERE clause
IIII 10279045 11.2.0.3, 12.1.0.1 Slow Statistics purging (SYSAUX grows)
+ IIII 10259620 11.2.0.2.BP12, 11.2.0.3, 12.1.0.1 Wrong results / ORA-7445 with DESC indexes and OR expansion
IIII 10237773 11.2.0.2.4, 11.2.0.2.BP12, 11.2.0.3, 12.1.0.1 ORA-600 [kcbz_check_objd_typ] / ORA-600 [ktecgsc:kcbz_objdchk]
E IIII 10220118 11.2.0.2.BP02, 11.2.0.3, 12.1.0.1 Print warning to alert log when system is swapping
IIII 10204505 11.2.0.3, 12.1.0.1 SGA autotune can cause row cache misses, library cache reloads and parsing
E IIII 10187168 11.1.0.7.7, 11.2.0.1.BP12, 11.2.0.2.2, 11.2.0.2.BP06, 11.2.0.3, 12.1.0.1 Enhancement to obsolete parent cursors if VERSION_COUNT exceeds a threshold
IIII 10155684 11.2.0.3, 12.1.0.1 ORA-600 [17099] / dump after session migration using trusted callout
IIII 10089333 11.2.0.2.6, 11.2.0.2.BP15, 11.2.0.3, 12.1.0.1 “init_heap_kfsg” memory leaks in SGA of db instance using ASM
IIII 10082277 11.2.0.1.BP12, 11.2.0.2.3, 11.2.0.2.BP04, 11.2.0.3, 12.1.0.1 Excessive allocation in PCUR or KGLH0 heap of “kkscsAddChildNo” (ORA-4031)
IIII 10018789 11.2.0.1.BP07, 11.2.0.2.2, 11.2.0.2.BP01, 11.2.0.3, 12.1.0.1 Spin in kgllock / DB hang with high library cache lock waits on ADG
IIII 10013177 11.2.0.2.6, 11.2.0.2.BP16, 11.2.0.3, 12.1.0.1 Wrong Results (truncate values) / dumps and internal errors with Functional based indexes of expressions used in Aggregations
IIII 10010310 10.2.0.5.3, 11.2.0.3, 12.1.0.1 ORA-27300 / ORA-27302 killing a non existing session
IIII 9964177 11.2.0.3, 12.1.0.1 ORA-7445 in/under LpxFSMSaxSE parsing an XML file due to the fact that lpxfsm_getattr_name() evaluates a string incorrectly
* IIII 9877980 11.2.0.2.8, 11.2.0.2.BP18, 11.2.0.3, 12.1.0.1 ORA-7445[kkslMarkLiteralBinds] / Assorted Errors on 11.2.0.2 if cursor sharing is enabled – Affects RMAN
P IIII 9871302 11.2.0.3, 12.1.0.1 Windows: Cannot make new connection to database on Windows platforms with TNS-12560
IIII 9829397 11.2.0.2.5, 11.2.0.2.BP13, 11.2.0.2.GIPSU05, 11.2.0.3, 12.1.0.1 Excessive CPU and many “asynch descriptor resize” waits for SQL using Async IO
IIII 9795214 11.1.0.7.7, 11.2.0.1.BP12, 11.2.0.2.4, 11.2.0.2.BP08, 11.2.0.3, 12.1.0.1 Library Cache Memory Corruption / ORA-600 [17074] may result in Instance crash
IIII 9772888 10.2.0.5.2, 11.2.0.2, 12.1.0.1 Needless “WARNING:Could not lower the asynch I/O limit to .. for SQL direct I/O It is set to -1” messages
IIII 9746210 11.2.0.2.4, 11.2.0.2.BP12, 11.2.0.3, 12.1.0.1 ORA-7445 [qsmmixComputeClusteringFactor] from SQL tuning
E IIII 9735536 11.2.0.4, 12.1.0.1 Enhancement request which allows ability to selectively remove slow clients from Emon Notification mechanism
+ IIII 9735237 10.2.0.5.5, 11.2.0.2.1, 11.2.0.2.BP02, 11.2.0.3, 12.1.0.1 Dump [under kxspoac] / ORA-1722 as SQL uses child with mismatched BIND metadata
P IIII 9728806 11.2.0.2, 12.1.0.1 ORA-7445 [kggibr()+52] during recovery on IBM AIX POWER Systems
IIII 9727147 11.2.0.2.5, 11.2.0.2.BP13, 11.2.0.2.GIPSU05, 11.2.0.3, 12.1.0.1 ORA-7445 [qksvcProcessVirtualColumn] using SQL Tuning / Index advisor
IIII 9706792 11.2.0.3.6, 11.2.0.3.BP07, 11.2.0.4, 12.1.0.1 ORA-600 [kcrpdv_noent] during STARTUP in Crash Recovery with Parallelism
IIII 9703463 11.1.0.7.8, 11.2.0.1.BP12, 11.2.0.2, 12.1.0.1 ORA-3137 [12333] or ORA-600 [kpobav-1] When Using Bind Peeking – superseded
IIII 9689310 10.2.0.5.7, 11.1.0.7.7, 11.2.0.1.BP08, 11.2.0.2, 12.1.0.1 Excessive child cursors / high VERSION_COUNT / ORA-600 [17059] due to bind mismatch
IIII 9651350 11.2.0.2.2, 11.2.0.2.BP05, 11.2.0.3, 12.1.0.1 Large redo dump and ORA-308 might be raised due to ORA-8103
IIII 9594372 11.2.0.2, 12.1.0.1 A dump can occur in (kokscold)
+ IIII 9577583 11.2.0.1.BP08, 11.2.0.2, 12.1.0.1 False ORA-942 or other errors when multiple schemas have identical object names
IIII 9478199 11.2.0.2.8, 11.2.0.2.BP18, 11.2.0.3, 12.1.0.1 Memory corruption / ORA-600 from PLSQL anonymous blocks
+ IIII 9399991 11.1.0.7.5, 11.2.0.1.3, 11.2.0.1.BP04, 11.2.0.2, 12.1.0.1 Assorted Internal Errors and Dumps (mostly under kkpa*/kcb*) from SQL against partitioned tables
IIII 9395500 11.2.0.1.BP07, 11.2.0.2, 12.1.0.1 Dump [kupfuDecompress] importing large table from compressed DMP file
IIII 9390347 11.2.0.2, 12.1.0.1 ADR purge may dump (DIA-48457 [11])
IIII 9373370 11.2.0.2.8, 11.2.0.2.BP18, 11.2.0.3, 12.1.0.1 The wrong cursor may be executed by JDBC thin following a query timeout / ORA-3137 [12333]
IIII 9316980 11.2.0.3, 12.1.0.1 ORA-600 [723] UGA leak of “KPON Callback A” memory in QMNC slave (Qnnn process)
IIII 9243912 11.2.0.2, 12.1.0.1 Additional diagnostics for ORA-3137 [12333] / OERI:12333
IIII 9233544 11.2.0.2.9, 11.2.0.2.BP19, 11.2.0.3, 12.1.0.1 ORA-600 [15709] during parallel rollback
IIII 9073910 11.2.0.4, 12.1.0.1 Direct path creates bad functional index on LOB column
IIII 9067282 11.2.0.1.2, 11.2.0.1.BP01, 11.2.0.3, 12.1.0.1 ORA-600 [kksfbc-wrong-kkscsflgs] can occur
IIII 9066130 10.2.0.5, 11.1.0.7.2, 11.2.0.2, 12.1.0.1 OERI [kksfbc-wrong-kkscsflgs] / spin with multiple children
IIII 9061785 11.2.0.1.BP04, 11.2.0.2, 12.1.0.1 Assorted Dumps from JPPD on distributed query with UNION ALL or OUTER JOIN
IIII 9050716 11.2.0.1.BP12, 11.2.0.2, 12.1.0.1 Dumps on kkqstcrf with ANSI joins and Join Elimination
*D IIII 8895202 11.2.0.2, 12.1.0.1 ORA-1555 / ORA-600 [ktbdchk1: bad dscn] ORA-600 [2663] in Physical Standby after switchover – superseded
IIII 8865718 10.2.0.5.3, 11.1.0.7.4, 11.2.0.1.2, 11.2.0.1.BP06, 11.2.0.2, 12.1.0.1 Recursive cursors for MV refresh not shared
IIII 8797501 11.2.0.2, 12.1.0.1 OERI [qksdsInitSample:2] from SQL Tuning
IIII 8771916 10.2.0.5.3, 11.1.0.7.6, 11.2.0.1.BP12, 11.2.0.2, 12.1.0.1 OERI [kdsgrp1] during CR read
IIII 8763922 11.1.0.7.5, 11.2.0.1.BP09, 11.2.0.2, 12.1.0.1 Dump (kgghash) from bind peeking for RAW data types
IIII 8730312 11.2.0.1.BP04, 11.2.0.2, 12.1.0.1 wrong Null ASH data may cause dumps and kew* messages in alert.log
IIII 8666117 10.2.0.5.5, 11.2.0.2, 12.1.0.1 High row cache latch contention in RAC
IIII 8553944 11.2.0.2, 12.1.0.1 SYSAUX tablespace grows
IIII 8547978 11.2.0.2.9, 11.2.0.2.BP19, 11.2.0.3.6, 11.2.0.3.BP13, 11.2.0.4, 12.1.0.1 Online redefinition corrupts dictionary / ORA-600[kqd-objerror$] from DROP USER
IIII 8496830 11.1.0.7.3, 11.2.0.1.1, 11.2.0.1.BP03, 11.2.0.2, 12.1.0.1 ORA-8176 while inserting into global temp table
IIII 8477973 11.2.0.2, 12.1.0.1 Multiple open DB links / ORA-2020 / distributed deadlock / ORA-600 possible using DB Links
IIII 8434467 11.2.0.2, 12.1.0.1 SubOptimal Execution Plan for queries over V$RMAN_BACKUP_JOB_DETAILS
IIII 8223165 11.2.0.1.BP11, 11.2.0.2.3, 11.2.0.2.BP07, 11.2.0.3, 12.1.0.1 ORA-600 [ktsxtffs2] During Startup When Using Temporary Tablespace Group
IIII 8211733 10.2.0.5.3, 11.1.0.7.8, 11.2.0.2, 12.1.0.1 Shared pool latch contention when shared pool is shrinking
IIII 5702977 11.2.0.4, 12.1.0.1 Wrong cardinality estimation for “is NULL” predicate on a remote table – withdrawn
P IIII 13604285 11.2.0.4, 12.1.0.0 Solaris: ora.net1.network keeps failing on Solaris 11
E IIII 8857940 12.1.0.0 Enhancement to group durations to help reduce chance of ORA-4031
IIII 14313519 11.2.0.4 ORA-7445 [ktspsrch_reset] / ORA-7445 [ktspsrch_cbk] can occur (11g fix for bug 14040124)
IIII 12979199 11.2.0.2.BP15, 11.2.0.3.BP03, 11.2.0.4 ORA-1466 querying a Global Temporary Table in a READ ONLY transaction
IIII 12633340 11.2.0.2.6, 11.2.0.2.BP13, 11.2.0.3 Heavy “library cache lock” and “library cache: mutex X” contention for a “$BUILD$.xx” lock
IIII 10378005 11.2.0.2.3, 11.2.0.2.BP08, 11.2.0.3 ORA-600 [kolrarfc: invalid lob type] from LOB garbage collection
IIII 8579188 10.2.0.5, 11.2.0.2 CATALOG BACKUPIECE introduces invalid DATE (ORA-1861 produced by RMAN)
IIII 3934729 10.1.0.5, 10.2.0.3, 11.2.0.2, 9.2.0.7 Random dumps (nstimexp) using DCD
P IIII 10190759 PSEONLY AIX: Processes consuming additional memory due to “Work USLA Heap”
IIII 14508968 11.2.0.3.10, 11.2.0.3.BP12, 11.2.0.4, 12.1.0.1 ORA-600 [504] [ges process parent latch] during logon in RAC
IIII 13004894 11.2.0.3.BP02, 11.2.0.4, 12.1.0.1 Wrong results with SQL_TRACE or 10046 or STATISTICS_LEVEL=ALL / Slow Parse
IIII 14013094 11.2.0.3.BP10, 11.2.0.4 DBMS_STATS places statistics in the wrong index partition
IIII 17230530 11.2.0.3.8, 11.2.0.3.BP21, 11.2.0.4 ORA-600 [kkzqid2fro] after apply 11.2.0.3.7 DB PSU
IIII 6904068 High CPU usage when there are “cursor: pin S” waits
IIII 9593134 11.2.0.2 DNS or NIS mis-configuration can cause slow database connects
IIII 9267837 11.1.0.7.8, 11.2.0.2 Auto-SGA policy may see larger resizes than needed
IIII 9002336 11.2.0.1.BP05, 11.2.0.2 Assorted Dumps with DISTINCT & WITH clause
IIII 8554900 11.2.0.2 PMON can crash the instance with OERI [ksnwait:nsevwait]
IIII 8625762 11.1.0.7.3, 11.2.0.1 ORA-3137 [12333] due to bind data not read from wire
* IIII 8199533 10.2.0.5, 11.2.0.1 NUMA enabled by default can cause high CPU / OERI
IIII 7686855 11.2.0.1 ORA-600[kjucnl(pmon):!dead] from PMON cleaning dead distributed transaction
* IIII 7662491 10.2.0.4.2, 10.2.0.5, 11.1.0.7.4, 11.2.0.1 Array Update can corrupt a row. Errors OERI[kghstack_free1] or OERI[kddummy_blkchk][6110]
+ IIII 7653579 11.1.0.7.2, 11.2.0.1 IPC send timeout in RAC after only short period
IIII 7648406 10.2.0.5, 11.1.0.7.4, 11.2.0.1 Child cursors not shared for “table_…” cursors (that show as “SQL Text Not Available”) when NLS_LENGTH_SEMANTICS = CHAR
IIII 7643188 10.2.0.5, 11.1.0.7.2, 11.2.0.1 Invalid / corrupt AWR SQL statistics
IIII 7626014 11.1.0.7.5, 11.2.0.1 OERI[kksfbc-new-child-thresh-exceeded] can occur / unnecessary child cursors
IIII 7523755 10.2.0.5, 11.2.0.1 “WARNING:Oracle process running out of OS kernel I/O resources” messages
IIII 7411568 10.2.0.5, 11.2.0.1 ORA-600[kcbbpibr_waitall_2] can occur
IIII 7385253 10.2.0.4.1, 10.2.0.5, 11.1.0.7.3, 11.2.0.1 Slow Truncate / DBWR uses high CPU / CKPT blocks on RO enqueue
IIII 7312791 10.2.0.5, 11.2.0.1 Dump (kokacau) if AQ client aborts with an active TX for dequeued message
IIII 7291739 10.2.0.4.4, 10.2.0.5, 11.2.0.1 Contention with auto-tuned undo retention or high TUNED_UNDORETENTION
IIII 7189722 10.2.0.5, 11.2.0.1 Frequent grow/shrink SGA resize operations
IIII 7039896 10.2.0.4.1, 10.2.0.5, 11.2.0.1 Spin under kghquiesce_regular_extent holding shared pool latch with AMM
IIII 6960699 10.2.0.5, 11.1.0.7, 11.2.0.1 “latch: cache buffers chains” contention/ORA-481/kjfcdrmrfg: SYNC TIMEOUT/ OERI[kjbldrmrpst:!master]
IIII 6918493 11.2.0.1 Net DCD (sqlnet.expire_time>0) can cause OS level mutex hang, possible to get PMON failed to acquired latch
IIII 6851110 10.2.0.5, 11.1.0.7.1, 11.2.0.1 ASMB process memory leak
IIII 6471770 10.2.0.5, 11.1.0.7, 11.2.0.1 ora-32690/OERI [32695] [hash aggregation can’t be done] from Hash GROUP BY
D IIII 6376915 10.2.0.4, 11.1.0.7, 11.2.0.1 HW enqueue contention for ASSM LOB segments
IIII 6196748 10.2.0.5, 11.1.0.7.3, 11.2.0.1 Dump in ksxpmprp() during logoff with multiple sessions in one process
IIII 6139856 10.2.0.5, 11.1.0.7, 11.2.0.1 Memory corruption in Net nsevrec leading to dump
IIII 6113783 11.2.0.1 Arch processes can hang indefinitely on network
C IIII 6085625 10.2.0.4, 11.1.0.7, 11.2.0.1 Wrong child cursor may be executed which has mismatching bind information
IIII 6034072 11.2.0.1 ORA-7445 [kgxMutexHng] / ORA-600 [ksdhng_callcbk: bad session 1] from hang analysis
D IIII 6795880 10.2.0.5, 11.1.0.7 Session spins / OERI after ‘kksfbc child completion’ wait – superseded
IIII 6122696 10.2.0.5 ORA-7445 [osnsgl] after ORA-3115 error
IIII 8449495 ORA-600 [17280] if client killed when fetching from pipelined PLSQL function
IIII 5939230 10.2.0.5, 11.1.0.6 Dump [kkeidc] / memory corruption from query over database link
IIII 5890966 10.2.0.4, 11.1.0.6 Intermittent ORA-6502 with package level associative array
+ IIII 5868257 10.2.0.4.1, 10.2.0.5, 11.1.0.6 Dump / memory corruption from DMLs
IIII 5736850 10.2.0.4, 11.1.0.6 SGA corruption / crash from PQO bloom filter
IIII 5655419 11.1.0.6 Distributed transaction hits ORA-600:[1265] or ORA-600:[k2gget: downgrade] in 10.2
* IIII 5605370 10.2.0.4, 11.1.0.6 Various dumps / instance crash possible
IIII 5508574 10.2.0.4, 11.1.0.6 OERI[504] / OERI[99999] / Dump [kgscdump] with > 31 CPUs
IIII 5497611 10.2.0.4, 11.1.0.6 OERI[qctVCO:csform] from Xquery using XMLType constructor
PI IIII 5496862 10.2.0.3, 11.1.0.6 AIX: Mandatory patch to use Oracle with IBM Technology Level 5 (5300-5)
IIII 4937225 10.2.0.3, 11.1.0.6 ORA-22 from OCIStmtExecute after OCISessionBegin
IIII 4483084 11.1.0.6 ORA-600 [LibraryCacheNotEmptyOnClose] on shutdown
IIII 14076510 10.2.0.5.8 ORA-600 [ktrgcm_3] in 10.2.0.5.3 – 10.2.0.5.7
IIII 7612454 10.2.0.5.4 More “direct path read” operations / OERI:kcblasm_1
IIII 7706062 10.2.0.5 OERI [17087] following concurrent hard parses on same cursor
* IIII 7190270 10.2.0.4.1, 10.2.0.5 Various ORA-600 errors / dictionary inconsistency from CTAS / DROP
IIII 6852598 10.2.0.4.4, 10.2.0.5 Dump / corrupt library cache lock free list
IIII 4518443 10.2.0.3 Listener hang under load
+ IIII 7038750 10.2.0.4.1, 10.2.0.5 Dump (ksuklms) / instance crash
IIII 8575528 Missing entries in V$MUTEX_SLEEP.location
IIII 4359111 9.2.0.8, 10.1.0.5, 10.2.0.2 OERI [17281][1001] can occur on session switch in UPI mode
IIII 5671074 ORA-4052/ORA-3106 on create / refresh of materialized view

'*' indicates that an alert exists for that issue.
'+' indicates a particularly notable issue / bug.
See Note:1944526.1 for details of other symbols used

Reference:
Documented Database Bugs With High “Solved SR” Count (Doc ID 2099231.1 INTERNAL)

Gathering Statistics Concurrently in Oracle (CONCURRENT)

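When gathering statistics on large tables, you can pass the degree parameter so that each large table is scanned in parallel, which speeds up the scan.
However, the tables are still processed one after another; statistics are not gathered on several tables at the same time. Starting with Oracle 11.2.0.2, the CONCURRENT preference lets DBMS_STATS gather statistics on multiple tables (or partitions) concurrently. You can check its current value with: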

SELECT DBMS_STATS.get_prefs('CONCURRENT') FROM dual;

This shows whether concurrent statistics gathering is enabled in your database.

To enable it:

SQL> begin
  2   dbms_stats.set_global_prefs('CONCURRENT','TRUE');
  3  end;
  4  /

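Once concurrent is enabled, statistics are gathered concurrently, and multiple job processes are spawned to do the work.
[Figure: how concurrent statistics gathering works]

Test results comparing statistics-gathering times with and without concurrent:
Schema level: XXX_SCHEMA contains more than 400 segments, a little over 20 GB in total:
Default: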

exec dbms_stats.gather_schema_stats(ownname => 'XXX_SCHEMA');
-- 263 seconds

With a parallel degree of 8:

exec dbms_stats.gather_schema_stats(ownname => 'XXX_SCHEMA',degree => 8);
-- 95 seconds

With concurrent enabled plus a parallel degree of 8:

begin
dbms_stats.set_global_prefs('CONCURRENT','TRUE');
end;
/

exec dbms_stats.gather_schema_stats(ownname => 'XXX_SCHEMA',degree => 8);
-- 61 seconds

Database level (more than 600 GB of data, over 90,000 segments):
Default:

exec sys.dbms_stats.gather_database_stats;
-- 9 hours

With concurrent enabled plus a parallel degree of 8:

begin
dbms_stats.set_global_prefs('CONCURRENT','TRUE');
end;
/

exec sys.dbms_stats.gather_database_stats(degree => 8);
-- 4 hours
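
To put the timings above side by side, here is a small shell sketch that turns them into speed-up factors. The `speedup` helper is a hypothetical name introduced only for this illustration; the inputs are the elapsed times reported in the tests (263 s, 95 s and 61 s for the schema, 9 h and 4 h for the database).

```shell
#!/bin/sh
# speedup BASELINE ELAPSED: prints BASELINE / ELAPSED with two decimals.
# (Hypothetical helper; both arguments must use the same time unit.)
speedup() {
    awk -v base="$1" -v t="$2" 'BEGIN { printf "%.2f\n", base / t }'
}

echo "schema, degree 8:                $(speedup 263 95)x"
echo "schema, concurrent + degree 8:   $(speedup 263 61)x"
echo "database, concurrent + degree 8: $(speedup 9 4)x"
```

On this test system the gain from concurrent comes on top of the gain from parallel scans; the ratios are specific to this data set and should not be read as general figures.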

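Points to note:
1. To gather statistics with concurrent, the user gathering the statistics needs the following privileges:
CREATE JOB
MANAGE SCHEDULER
MANAGE ANY QUEUE

Even if the user already has the DBA role, these privileges must still be granted explicitly.
Otherwise, the jobs may fail with
ORA-27486 insufficient privileges and ORA-20000: Statistics collection failed for 32235 objects in the database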
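
The explicit grants from point 1 can be generated with a short shell sketch like the one below. `grant_sql` and the account name `STATS_USER` are hypothetical names used for illustration; the three privileges are the ones listed above.

```shell
#!/bin/sh
# grant_sql USER: prints the GRANT statements needed for concurrent
# statistics gathering (hypothetical helper, it only prints SQL text).
grant_sql() {
    stats_user="$1"
    for priv in "CREATE JOB" "MANAGE SCHEDULER" "MANAGE ANY QUEUE"; do
        printf 'GRANT %s TO %s;\n' "$priv" "$stats_user"
    done
}

# STATS_USER is a placeholder account name; replace it with your own.
grant_sql STATS_USER
# To apply the grants, pipe the output into SQL*Plus as a privileged user:
#   grant_sql STATS_USER | sqlplus -s / as sysdba
```

Printing the statements first makes it easy to review them before they are run against the database.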

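2. Concurrent offers no way to set the number of concurrent jobs directly; the number is bounded by the job_queue_processes initialization parameter. If that parameter is set too high, the jobs can saturate the server. (Note that from 11.2.0.3 onward its default is 1000, so up to 1000 jobs may be spawned.)
For example, on a test database with job_queue_processes set to 60, 60 jobs were spawned to gather statistics. The top output taken at that time shows the user portion of CPU already above 90%.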

top - 11:31:08 up 118 days, 19:28,  2 users,  load average: 30.65, 28.13, 25.64
Tasks: 728 total,  50 running, 678 sleeping,   0 stopped,   0 zombie
Cpu(s): 91.7%us,  7.5%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.7%si,  0.0%st
Mem:  16467504k total, 16375356k used,    92148k free,   119896k buffers
Swap:  6094844k total,  2106168k used,  3988676k free,  8952852k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
19295 ora       20   0 8856m 154m 119m R 22.9  1.0   0:01.44 ora_j030_mydb12
18503 ora       20   0 8856m 583m 548m R 21.0  3.6   0:25.02 ora_j032_mydb12
19042 ora       20   0 8856m 332m 297m R 21.0  2.1   0:09.21 ora_j026_mydb12
19162 ora       20   0 8856m 273m 238m R 21.0  1.7   0:05.51 ora_j020_mydb12
19203 ora       20   0 8856m 198m 164m R 21.0  1.2   0:02.66 ora_j035_mydb12
19211 ora       20   0 8856m 243m 208m R 21.0  1.5   0:04.03 ora_j024_mydb12
18550 ora       20   0 8856m 526m 491m R 20.0  3.3   0:21.06 ora_j033_mydb12
19009 ora       20   0 8856m 305m 271m R 20.0  1.9   0:07.84 ora_j031_mydb12
18792 ora       20   0 8857m 502m 467m R 19.6  3.1   0:18.23 ora_j022_mydb12
19199 ora       20   0 8856m 204m 169m R 19.3  1.3   0:03.31 ora_j025_mydb12
19137 ora       20   0 8857m 401m 367m R 19.0  2.5   0:06.67 ora_j011_mydb12
14518 ora       20   0 8857m 3.7g 3.6g R 18.3 23.3   1:25.49 ora_j003_mydb12
...
19128 ora       20   0 8857m 257m 222m R 17.0  1.6   0:04.57 ora_j034_mydb12
19255 ora       20   0 8856m 208m 173m R 17.0  1.3   0:02.79 ora_j000_mydb12
19065 ora       20   0 8856m 437m 402m R 16.7  2.7   0:09.31 ora_j001_mydb12
19073 ora       20   0 8856m 262m 227m R 16.7  1.6   0:05.53 ora_j038_mydb12
19195 ora       20   0 8848m 246m 215m R 16.7  1.5   0:04.21 ora_j004_mydb12
19112 ora       20   0 8857m 297m 262m D 16.4  1.9   0:06.68 ora_j017_mydb12
19299 ora       20   0 8856m 155m 120m R 16.4  1.0   0:01.21 ora_j037_mydb12
12088 ora       20   0 8872m 1.4g 1.3g R 16.0  8.8   6:59.12 ora_j021_mydb12
19108 ora       20   0 8856m 310m 275m R 16.0  1.9   0:06.90 ora_j006_mydb12
19191 ora       20   0 8856m 233m 198m R 16.0  1.5   0:04.01 ora_j016_mydb12
19259 ora       20   0 8829m 174m 163m R 15.7  1.1   0:02.84 ora_j008_mydb12
18536 ora       20   0 8857m 516m 481m R 15.4  3.2   0:19.72 ora_j040_mydb12
18939 ora       20   0 8856m 322m 287m R 15.4  2.0   0:07.44 ora_j039_mydb12

So another recommendation when enabling concurrent is to use Resource Manager.
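
Besides Resource Manager, the most direct lever is job_queue_processes itself, since it bounds how many jobs can run at once. The sketch below prints an ALTER SYSTEM statement that caps it relative to the CPU count. The helper name `job_queue_sql` and the default multiplier of 2 jobs per CPU are assumptions chosen as a starting point, not values taken from the test above; tune them for your own system.

```shell
#!/bin/sh
# job_queue_sql CORES [PER_CORE]: prints an ALTER SYSTEM statement that sets
# job_queue_processes to CORES * PER_CORE.
# PER_CORE defaults to 2, which is an assumption, not a measured value.
job_queue_sql() {
    cores="$1"
    per_core="${2:-2}"
    printf 'ALTER SYSTEM SET job_queue_processes = %d;\n' "$((cores * per_core))"
}

# Use the number of online CPUs on this host as input.
job_queue_sql "$(getconf _NPROCESSORS_ONLN)"
```

Keep in mind that job_queue_processes also limits other scheduler jobs, so lowering it affects more than statistics gathering.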

3. Monitoring the progress of a concurrent gather:

select job_name, state, comments
from dba_scheduler_jobs
where job_class like 'CONC%';

select state,count(*)
from dba_scheduler_jobs
where job_class like 'CONC%'
group by state;

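4. When concurrent is enabled together with parallel execution, it is recommended to set PARALLEL_ADAPTIVE_MULTI_USER to false to turn off adaptive adjustment of the degree of parallelism.
The default is true. With the default, an adaptive algorithm automatically reduces the requested degree of parallelism at query start, based on the system load. The effective degree of parallelism is the default degree, or the degree from the table or hints, divided by a reduction factor. The algorithm assumes that the system has been optimally tuned in a single-user environment.

5. Oracle E-Business Suite (EBS) gathers statistics through its own concurrent manager (FND_STATS), and the users gathering statistics usually have not been explicitly granted CREATE JOB, MANAGE SCHEDULER and MANAGE ANY QUEUE. EBS also has a large number of users, so granting these privileges explicitly to every application user is impractical.
Therefore the concurrent preference cannot be enabled in EBS. The EBS installation note (Doc ID 396009.1) likewise says to turn off the database's automatic statistics gathering (_optimizer_autostats_job=false).

References: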
https://blogs.oracle.com/optimizer/entry/gathering_optimizer_statistics_is_one
http://blog.csdn.net/lukeUnique/article/details/51705922
Doc ID 1555451.1 – FAQ: Gathering Concurrent Statistics Using DBMS_STATS Frequently Asked Questions
Doc ID 396009.1 – Database Initialization Parameters for Oracle E-Business Suite Release 12

Adding an Arbiter Node to MongoDB

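When the MongoDB replica set was created, it was set up as one primary and two secondaries rather than one primary, one secondary and one arbiter. Here we remove one node from the replica set and add it back as an arbiter:

1. Initial state:

shard1ReplSet:PRIMARY> rs.status();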
{
        "set" : "shard1ReplSet",
        "date" : ISODate("2017-02-21T07:48:03.058Z"),
        "myState" : 1,
        "term" : NumberLong(1),
        "heartbeatIntervalMillis" : NumberLong(2000),
        "optimes" : {
                "lastCommittedOpTime" : {
                        "ts" : Timestamp(0, 0),
                        "t" : NumberLong(-1)
                },
                "appliedOpTime" : {
                        "ts" : Timestamp(1487663274, 1),
                        "t" : NumberLong(1)
                },
                "durableOpTime" : {
                        "ts" : Timestamp(1487587982, 1),
                        "t" : NumberLong(-1)
                }
        },
        "members" : [
                {
                        "_id" : 0,
                        "name" : "10.13.0.130:22001",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "uptime" : 76672,
                        "optime" : {
                                "ts" : Timestamp(1487663274, 1),
                                "t" : NumberLong(1)
                        },
                        "optimeDate" : ISODate("2017-02-21T07:47:54Z"),
                        "electionTime" : Timestamp(1487587993, 1),
                        "electionDate" : ISODate("2017-02-20T10:53:13Z"),
                        "configVersion" : 1,
                        "self" : true
                },
                {
                        "_id" : 1,
                        "name" : "10.13.0.131:22001",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 75300,
                        "optime" : {
                                "ts" : Timestamp(1487663274, 1),
                                "t" : NumberLong(1)
                        },
                        "optimeDurable" : {
                                "ts" : Timestamp(1487587982, 1),
                                "t" : NumberLong(-1)
                        },
                        "optimeDate" : ISODate("2017-02-21T07:47:54Z"),
                        "optimeDurableDate" : ISODate("2017-02-20T10:53:02Z"),
                        "lastHeartbeat" : ISODate("2017-02-21T07:48:02.150Z"),
                        "lastHeartbeatRecv" : ISODate("2017-02-21T07:48:02.215Z"),
                        "pingMs" : NumberLong(0),
                        "syncingTo" : "10.13.0.132:22001",
                        "configVersion" : 1
                },
                {
                        "_id" : 2,
                        "name" : "10.13.0.132:22001",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 75300,
                        "optime" : {
                                "ts" : Timestamp(1487663274, 1),
                                "t" : NumberLong(1)
                        },
                        "optimeDurable" : {
                                "ts" : Timestamp(1487587982, 1),
                                "t" : NumberLong(-1)
                        },
                        "optimeDate" : ISODate("2017-02-21T07:47:54Z"),
                        "optimeDurableDate" : ISODate("2017-02-20T10:53:02Z"),
                        "lastHeartbeat" : ISODate("2017-02-21T07:48:02.889Z"),
                        "lastHeartbeatRecv" : ISODate("2017-02-21T07:48:01.503Z"),
                        "pingMs" : NumberLong(0),
                        "syncingTo" : "10.13.0.130:22001",
                        "configVersion" : 1
                }
        ],
        "ok" : 1
}
shard1ReplSet:PRIMARY>

2. Remove the node:

shard1ReplSet:PRIMARY> rs.remove("10.13.0.132:22001");
{ "ok" : 1 }
shard1ReplSet:PRIMARY> rs.status();
{
        "set" : "shard1ReplSet",
        "date" : ISODate("2017-02-21T07:50:52.934Z"),
        "myState" : 1,
        "term" : NumberLong(1),
        "heartbeatIntervalMillis" : NumberLong(2000),
        "optimes" : {
                "lastCommittedOpTime" : {
                        "ts" : Timestamp(0, 0),
                        "t" : NumberLong(-1)
                },
                "appliedOpTime" : {
                        "ts" : Timestamp(1487663447, 1),
                        "t" : NumberLong(1)
                },
                "durableOpTime" : {
                        "ts" : Timestamp(1487587982, 1),
                        "t" : NumberLong(-1)
                }
        },
        "members" : [
                {
                        "_id" : 0,
                        "name" : "10.13.0.130:22001",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "uptime" : 76841,
                        "optime" : {
                                "ts" : Timestamp(1487663447, 1),
                                "t" : NumberLong(1)
                        },
                        "optimeDate" : ISODate("2017-02-21T07:50:47Z"),
                        "electionTime" : Timestamp(1487587993, 1),
                        "electionDate" : ISODate("2017-02-20T10:53:13Z"),
                        "configVersion" : 2,
                        "self" : true
                },
                {
                        "_id" : 1,
                        "name" : "10.13.0.131:22001",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 75470,
                        "optime" : {
                                "ts" : Timestamp(1487663447, 1),
                                "t" : NumberLong(1)
                        },
                        "optimeDurable" : {
                                "ts" : Timestamp(1487587982, 1),
                                "t" : NumberLong(-1)
                        },
                        "optimeDate" : ISODate("2017-02-21T07:50:47Z"),
                        "optimeDurableDate" : ISODate("2017-02-20T10:53:02Z"),
                        "lastHeartbeat" : ISODate("2017-02-21T07:50:51.182Z"),
                        "lastHeartbeatRecv" : ISODate("2017-02-21T07:50:52.212Z"),
                        "pingMs" : NumberLong(0),
                        "configVersion" : 2
                }
        ],
        "ok" : 1
}
shard1ReplSet:PRIMARY>

3. Add it back as an arbiter:

shard1ReplSet:PRIMARY> rs.addArb("10.13.0.132:22001");
{ "ok" : 1 }
shard1ReplSet:PRIMARY>
shard1ReplSet:PRIMARY> rs.status();
{
        "set" : "shard1ReplSet",
        "date" : ISODate("2017-02-21T07:54:05.161Z"),
        "myState" : 1,
        "term" : NumberLong(1),
        "heartbeatIntervalMillis" : NumberLong(2000),
        "optimes" : {
                "lastCommittedOpTime" : {
                        "ts" : Timestamp(0, 0),
                        "t" : NumberLong(-1)
                },
                "appliedOpTime" : {
                        "ts" : Timestamp(1487663637, 1),
                        "t" : NumberLong(1)
                },
                "durableOpTime" : {
                        "ts" : Timestamp(1487587982, 1),
                        "t" : NumberLong(-1)
                }
        },
        "members" : [
                {
                        "_id" : 0,
                        "name" : "10.13.0.130:22001",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "uptime" : 77034,
                        "optime" : {
                                "ts" : Timestamp(1487663637, 1),
                                "t" : NumberLong(1)
                        },
                        "optimeDate" : ISODate("2017-02-21T07:53:57Z"),
                        "electionTime" : Timestamp(1487587993, 1),
                        "electionDate" : ISODate("2017-02-20T10:53:13Z"),
                        "configVersion" : 3,
                        "self" : true
                },
                {
                        "_id" : 1,
                        "name" : "10.13.0.131:22001",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 75662,
                        "optime" : {
                                "ts" : Timestamp(1487663637, 1),
                                "t" : NumberLong(1)
                        },
                        "optimeDurable" : {
                                "ts" : Timestamp(1487587982, 1),
                                "t" : NumberLong(-1)
                        },
                        "optimeDate" : ISODate("2017-02-21T07:53:57Z"),
                        "optimeDurableDate" : ISODate("2017-02-20T10:53:02Z"),
                        "lastHeartbeat" : ISODate("2017-02-21T07:54:03.210Z"),
                        "lastHeartbeatRecv" : ISODate("2017-02-21T07:54:02.211Z"),
                        "pingMs" : NumberLong(0),
                        "configVersion" : 3
                },
                {
                        "_id" : 2,
                        "name" : "10.13.0.132:22001",
                        "health" : 1,
                        "state" : 7,
                        "stateStr" : "ARBITER",
                        "uptime" : 5,
                        "lastHeartbeat" : ISODate("2017-02-21T07:54:03.214Z"),
                        "lastHeartbeatRecv" : ISODate("2017-02-21T07:54:02.274Z"),
                        "pingMs" : NumberLong(0),
                        "configVersion" : 3
                }
        ],
        "ok" : 1
}
shard1ReplSet:PRIMARY>

Note 1: Starting with MongoDB 3.4, config servers must be deployed as a replica set, but that replica set does not support arbiters. Trying to add one fails with:

cfgReplSet:PRIMARY> rs.addArb("10.13.0.132:21000");
{
        "ok" : 0,
        "errmsg" : "Arbiters are not allowed in replica set configurations being used for config servers",
        "code" : 103,
        "codeName" : "NewReplicaSetConfigurationIncompatible"
}
cfgReplSet:PRIMARY>

Note 2: rs.reconfig() can also be used for this, with an effect similar to rs.remove + rs.addArb. For details, see the official MongoDB documentation: Remove Members from Replica Set.
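
As a rough sketch of the rs.reconfig() route mentioned in note 2, the shell function below prints mongo shell JavaScript that drops one member from the current configuration and applies the result. `remove_member_js` is a hypothetical helper name; `rs.conf()` and `rs.reconfig()` are the standard shell helpers. This covers only the removal half, so rs.addArb() is still needed afterwards to bring the node back as an arbiter.

```shell
#!/bin/sh
# remove_member_js HOST:PORT: prints mongo shell JavaScript that removes the
# given member from the replica set configuration via rs.reconfig().
# (Hypothetical helper; it only prints the script, it does not connect.)
remove_member_js() {
    cat <<EOF
var cfg = rs.conf();
cfg.members = cfg.members.filter(function (m) { return m.host !== "$1"; });
rs.reconfig(cfg);
EOF
}

remove_member_js "10.13.0.132:22001"
# To apply, feed the script to a mongo shell connected to the primary:
#   remove_member_js "10.13.0.132:22001" | mongo --host 10.13.0.130 --port 22001
```

Filtering by host name avoids depending on the member's position in the members array, which can change after earlier reconfigurations.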

Oracle Changed Its Patching Strategy Starting with 12.1.0.2

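Starting with 12.1.0.2, Oracle changed its patching strategy. Before 12.1.0.2, that is, for 12.1.0.1, 11.2.0.4 and earlier releases, the recommended approach was PSU. From 12.1.0.2 onward, Oracle instead recommends Database Proactive Bundle Patches (DPBP for short).

A DPBP includes the content of the PSU as well as the content of the SPU, so it is an all-in-one patch. Taking 170418 as an example, the download is 2.1 GB, it unpacks to nearly 7 GB, and it requires 13 GB of free system space.

DPBP applies to Engineered Systems and DB In-Memory, and also to non-Engineered Systems, both RAC and non-RAC. The patches it contains cover GI and DB, as well as ACFS.

Note that DPBP, PSU and SPU are separate tracks: once you choose one, you cannot simply switch to another. In other words, if you have been applying PSUs and want to move to DPBP, you must first roll back the PSUs you applied.

As for release frequency, starting with 12.1.0.2.160719 a DPBP is also released once per quarter.

What we need to do:
1. Decide on the patching strategy going forward: PSU or DPBP
2. Upgrade opatch to the latest version
3. opatchauto apply
4. Check with opatch lsinventory
5. datapatch -verbose (previously the catbundle.sql psu apply script)
6. select * from dba_registry_sqlpatch (previously dba_registry_history)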
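
The checklist above can be laid out as a command sequence. The sketch below only prints the commands instead of running them, so it is safe to try anywhere. The ORACLE_HOME and PATCH_DIR defaults are hypothetical paths used for illustration, and `patch_steps` is a made-up helper name.

```shell
#!/bin/sh
# Paths are assumptions for illustration; adjust them to your environment.
GRID_HOME="${GRID_HOME:-/u01/app/12.1.0.2/grid}"
ORACLE_HOME="${ORACLE_HOME:-/u01/app/oracle/product/12.1.0.2/db_1}"  # hypothetical
PATCH_DIR="${PATCH_DIR:-/tmp/patch}"                                 # hypothetical

# patch_steps: prints the commands in the order of the checklist above.
patch_steps() {
    echo "$GRID_HOME/OPatch/opatch version"
    echo "$GRID_HOME/OPatch/opatchauto apply $PATCH_DIR"   # as root, one node at a time
    echo "$ORACLE_HOME/OPatch/opatch lsinventory"
    echo "$ORACLE_HOME/OPatch/datapatch -verbose"          # once, from one node
    echo "echo 'select * from dba_registry_sqlpatch;' | sqlplus -s / as sysdba"
}

patch_steps
```

Step 1 of the checklist, choosing between PSU and DPBP, is a decision rather than a command, so it has no line in the output.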

References:
Oracle Database – Overview of Database Patch Delivery Methods (Doc ID 1962125.1)
12.1.0.2 Database Proactive Bundle Patches / Bundle Patches for Engineered Systems and DB In-Memory – List of Fixes in each Bundle (Doc ID 1937782.1)
Quick Reference to Patch Numbers for Database/GI PSU, SPU(CPU), Bundle Patches and Patchsets (Doc ID 1454618.1)

Applying the DPBP 170418 Patch

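As mentioned in the previous post, starting with Oracle 12.1.0.2, Oracle recommends applying Database Proactive Bundle Patches (DPBP for short; see Oracle Database – Overview of Database Patch Delivery Methods (Doc ID 1962125.1)).

The patching procedure is described in the patch's readme.html; this post is a brief record of applying the DPBP 170418 patch.
Overall:
DPBP 170418 has patch number 25433352 and contains four large sub-patches.

25397136 This is really the DB patch set. It has to be applied not only to the db home but also to the grid home.
25481150 This is really the grid patch set. OCW stands for Oracle Clusterware, so the name already suggests that it is meant for grid. It has to be applied not only to the grid home but also to the db home.
25363750 This is the patch set for ACFS.
21436941 This is DBWLM, the Database Workload Management component.

The steps are:

1. Upgrade opatch to the latest version. Note: with opatch versions earlier than 12.2.0.1.5, opatchauto must be run with the -ocmrf [ocm response file] parameter; from that version onward the response file parameter is no longer needed. In addition, DPBP 170418 requires opatch 12.2.0.1.7 or later.

2. [GRID_HOME]/OPatch/opatchauto apply [UNZIPPED_PATCH_LOCATION]/25433352. Note: this command must be run on each node in turn (not in parallel). While it runs, it brings down CRS and the database and patches both the grid home and the oracle home. Patching the nodes one at a time also reduces downtime.

3. datapatch -verbose. Note: although patching the nodes in turn reduces downtime, some downtime is still needed, namely the time it takes to run datapatch here. This step upgrades the data dictionary for the whole database, so it only needs to be run on one node. Keep in mind that in a CDB you must first run alter pluggable database all open so that all PDBs are open, and then run datapatch.

4. After patching, it is a good idea to run orachk as a check.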
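
The minimum opatch version from step 1 can be checked before starting. The sketch below compares two dotted version strings with `sort -V`, which assumes GNU coreutils. `version_ge` is a hypothetical helper, and the `current` value is a made-up example; in practice it would be taken from the output of opatch version.

```shell
#!/bin/sh
# version_ge A B: succeeds when dotted version A is greater than or equal to B.
# Relies on "sort -V" (version sort) from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="12.2.0.1.7"   # minimum stated for DPBP 170418
current="12.2.0.1.9"    # made-up example value

if version_ge "$current" "$required"; then
    echo "opatch $current is new enough for DPBP 170418"
else
    echo "opatch $current is too old; need $required or later"
fi
```

A plain string comparison would get this wrong once a component reaches two digits (for example 12.2.0.1.10), which is why a version-aware sort is used here.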

The log follows.
Information before the run:

======Mon Jun 12 00:11:09 CST 2017========
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
               ONLINE  ONLINE       12102-rac1               STABLE
               ONLINE  ONLINE       12102-rac2               STABLE
               ONLINE  ONLINE       12102-rac3               STABLE
ora.DG_DATA.dg
               ONLINE  ONLINE       12102-rac1               STABLE
               ONLINE  ONLINE       12102-rac2               STABLE
               ONLINE  ONLINE       12102-rac3               STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       12102-rac1               STABLE
               ONLINE  ONLINE       12102-rac2               STABLE
               ONLINE  ONLINE       12102-rac3               STABLE
ora.net1.network
               ONLINE  ONLINE       12102-rac1               STABLE
               ONLINE  ONLINE       12102-rac2               STABLE
               ONLINE  ONLINE       12102-rac3               STABLE
ora.ons
               ONLINE  ONLINE       12102-rac1               STABLE
               ONLINE  ONLINE       12102-rac2               STABLE
               ONLINE  ONLINE       12102-rac3               STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.12102-rac1.vip
      1        ONLINE  ONLINE       12102-rac1               STABLE
ora.12102-rac2.vip
      1        ONLINE  ONLINE       12102-rac2               STABLE
ora.12102-rac3.vip
      1        ONLINE  ONLINE       12102-rac3               STABLE
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       12102-rac2               STABLE
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       12102-rac3               STABLE
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       12102-rac1               STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       12102-rac1               169.254.161.44 192.1
                                                             68.57.34,STABLE
ora.asm
      1        ONLINE  ONLINE       12102-rac1               Started,STABLE
      2        ONLINE  ONLINE       12102-rac2               Started,STABLE
      3        ONLINE  ONLINE       12102-rac3               Started,STABLE
ora.cdbrac.db
      1        ONLINE  ONLINE       12102-rac3               Open,STABLE
      2        ONLINE  ONLINE       12102-rac1               Open,STABLE
      3        ONLINE  ONLINE       12102-rac2               Open,STABLE
ora.cvu
      1        ONLINE  ONLINE       12102-rac2               STABLE
ora.gns
      1        ONLINE  ONLINE       12102-rac1               STABLE
ora.gns.vip
      1        ONLINE  ONLINE       12102-rac1               STABLE
ora.mgmtdb
      1        ONLINE  ONLINE       12102-rac1               Open,STABLE
ora.oc4j
      1        ONLINE  ONLINE       12102-rac2               STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       12102-rac2               STABLE
ora.scan2.vip
      1        ONLINE  ONLINE       12102-rac3               STABLE
ora.scan3.vip
      1        ONLINE  ONLINE       12102-rac1               STABLE
--------------------------------------------------------------------------------

SQL>  select * from dba_registry_sqlpatch;

no rows selected

SQL>

OPatch version check on node 1; node 2 and node 3 are checked the same way.

[root@12102-rac1 ~]# /u01/app/12.1.0.2/grid/OPatch/opatch version
OPatch Version: 12.2.0.1.9

OPatch succeeded.
[root@12102-rac1 ~]#

opatchauto on node 1; node 2 and node 3 are patched the same way.

[root@12102-rac1 ~]# /u01/app/12.1.0.2/grid/OPatch/opatch version
OPatch Version: 12.2.0.1.9

OPatch succeeded.
[root@12102-rac1 ~]# /u01/app/12.1.0.2/grid/OPatch/opatchauto apply /u01/ora_inst/25433352

OPatchauto session is initiated at Sun Jun 11 21:43:31 2017

System initialization log file is /u01/app/12.1.0.2/grid/cfgtoollogs/opatchautodb/systemconfig2017-06-11_09-43-37PM.log.

Session log file is /u01/app/12.1.0.2/grid/cfgtoollogs/opatchauto/opatchauto2017-06-11_09-44-37PM.log
The id for this session is R7M1

Executing OPatch prereq operations to verify patch applicability on home /u01/app/12.1.0.2/grid

Executing OPatch prereq operations to verify patch applicability on home /u01/app/oracle/product/12.1.0.2/db_1
 Patch applicability verified successfully on home /u01/app/12.1.0.2/grid

 Patch applicability verified successfully on home /u01/app/oracle/product/12.1.0.2/db_1


Verifying SQL patch applicability on home /u01/app/oracle/product/12.1.0.2/db_1
SQL patch applicability verified successfully on home /u01/app/oracle/product/12.1.0.2/db_1


Preparing to bring down database service on home /u01/app/oracle/product/12.1.0.2/db_1
Successfully prepared home /u01/app/oracle/product/12.1.0.2/db_1 to bring down database service


Bringing down CRS service on home /u01/app/12.1.0.2/grid
  Prepatch operation log file location: /u01/app/12.1.0.2/grid/cfgtoollogs/crsconfig/crspatch_12102-rac1_2017-06-11_09-49-08PM.log
CRS service brought down successfully on home /u01/app/12.1.0.2/grid


Performing prepatch operation on home /u01/app/oracle/product/12.1.0.2/db_1
Perpatch operation completed successfully on home /u01/app/oracle/product/12.1.0.2/db_1


Start applying binary patch on home /u01/app/oracle/product/12.1.0.2/db_1
  Binary patch applied successfully on home /u01/app/oracle/product/12.1.0.2/db_1


Performing postpatch operation on home /u01/app/oracle/product/12.1.0.2/db_1
Postpatch operation completed successfully on home /u01/app/oracle/product/12.1.0.2/db_1


Start applying binary patch on home /u01/app/12.1.0.2/grid
   Binary patch applied successfully on home /u01/app/12.1.0.2/grid


Starting CRS service on home /u01/app/12.1.0.2/grid
   Postpatch operation log file location: /u01/app/12.1.0.2/grid/cfgtoollogs/crsconfig/crspatch_12102-rac1_2017-06-11_10-04-00PM.log
CRS service started successfully on home /u01/app/12.1.0.2/grid


Preparing home /u01/app/oracle/product/12.1.0.2/db_1 after database service restarted
No step execution required.........
Prepared home /u01/app/oracle/product/12.1.0.2/db_1 successfully after database service restarted


Trying to apply SQL patch on home /u01/app/oracle/product/12.1.0.2/db_1
  SQL patch applied successfully on home /u01/app/oracle/product/12.1.0.2/db_1

OPatchAuto successful.

--------------------------------Summary--------------------------------

Patching is completed successfully. Please find the summary as follows:

Host:12102-rac1
RAC Home:/u01/app/oracle/product/12.1.0.2/db_1
Summary:

==Following patches were SKIPPED:

Patch: /u01/ora_inst/25433352/21436941
Reason: This patch is not applicable to this specified target type - "rac_database"

Patch: /u01/ora_inst/25433352/25363750
Reason: This patch is not applicable to this specified target type - "rac_database"


==Following patches were SUCCESSFULLY applied:

Patch: /u01/ora_inst/25433352/25397136
Log: /u01/app/oracle/product/12.1.0.2/db_1/cfgtoollogs/opatchauto/core/opatch/opatch2017-06-11_21-52-02PM_1.log

Patch: /u01/ora_inst/25433352/25481150
Log: /u01/app/oracle/product/12.1.0.2/db_1/cfgtoollogs/opatchauto/core/opatch/opatch2017-06-11_21-52-02PM_1.log


Host:12102-rac1
CRS Home:/u01/app/12.1.0.2/grid
Summary:

==Following patches were SUCCESSFULLY applied:

Patch: /u01/ora_inst/25433352/21436941
Log: /u01/app/12.1.0.2/grid/cfgtoollogs/opatchauto/core/opatch/opatch2017-06-11_21-57-02PM_1.log

Patch: /u01/ora_inst/25433352/25363750
Log: /u01/app/12.1.0.2/grid/cfgtoollogs/opatchauto/core/opatch/opatch2017-06-11_21-57-02PM_1.log

Patch: /u01/ora_inst/25433352/25397136
Log: /u01/app/12.1.0.2/grid/cfgtoollogs/opatchauto/core/opatch/opatch2017-06-11_21-57-02PM_1.log

Patch: /u01/ora_inst/25433352/25481150
Log: /u01/app/12.1.0.2/grid/cfgtoollogs/opatchauto/core/opatch/opatch2017-06-11_21-57-02PM_1.log



OPatchauto session completed at Sun Jun 11 22:13:40 2017
Time taken to complete the session 30 minutes, 9 seconds
[root@12102-rac1 ~]#
[root@12102-rac1 ~]#

datapatch on node 1; nothing needs to be run on node 2 and node 3.

[oracle@12102-rac1 OPatch]$ sqlplus "/ as sysdba"

SQL*Plus: Release 12.1.0.2.0 Production on Mon Jun 12 00:19:54 2017

Copyright (c) 1982, 2014, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options

SQL> alter pluggable database all open;

Pluggable database altered.

SQL> exit
Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Advanced Analytics and Real Application Testing options
[oracle@12102-rac1 OPatch]$ cd $ORACLE_HOME/OPatc
[oracle@12102-rac1 OPatch]$ ./datapatch -verbose
SQL Patching tool version 12.1.0.2.0 Production on Mon Jun 12 00:20:42 2017
Copyright (c) 2012, 2017, Oracle.  All rights reserved.

Log file for this invocation: /u01/app/oracle/cfgtoollogs/sqlpatch/sqlpatch_26172_2017_06_12_00_20_42/sqlpatch_invocation.log

Connecting to database...OK
Note:  Datapatch will only apply or rollback SQL fixes for PDBs
       that are in an open state, no patches will be applied to closed PDBs.
       Please refer to Note: Datapatch: Database 12c Post Patch SQL Automation
       (Doc ID 1585822.1)
Bootstrapping registry and package to current versions...done
Determining current state... done

Current state of SQL patches:
Bundle series DBBP:
  ID 170418 in the binary registry and not installed in any PDB

Adding patches to installation queue and performing prereq checks...
Installation queue:
  For the following PDBs: CDB$ROOT PDB$SEED PDBRAC1 PDBRAC2
    Nothing to roll back
    The following patches will be applied:
      25397136 (DATABASE BUNDLE PATCH 12.1.0.2.170418)

Installing patches...



Patch installation complete.  Total patches installed: 4

Validating logfiles...
Patch 25397136 apply (pdb CDB$ROOT): SUCCESS
  logfile: /u01/app/oracle/cfgtoollogs/sqlpatch/25397136/21145057/25397136_apply_CDBRAC_CDBROOT_2017Jun12_00_23_03.log (no errors)
Patch 25397136 apply (pdb PDB$SEED): SUCCESS
  logfile: /u01/app/oracle/cfgtoollogs/sqlpatch/25397136/21145057/25397136_apply_CDBRAC_PDBSEED_2017Jun12_00_42_12.log (no errors)
Patch 25397136 apply (pdb PDBRAC1): SUCCESS
  logfile: /u01/app/oracle/cfgtoollogs/sqlpatch/25397136/21145057/25397136_apply_CDBRAC_PDBRAC1_2017Jun12_00_42_12.log (no errors)
Patch 25397136 apply (pdb PDBRAC2): SUCCESS
  logfile: /u01/app/oracle/cfgtoollogs/sqlpatch/25397136/21145057/25397136_apply_CDBRAC_PDBRAC2_2017Jun12_00_42_10.log (no errors)
SQL Patching tool complete on Mon Jun 12 00:59:05 2017


SQL> select PATCH_ID,PATCH_UID,VERSION,ACTION,STATUS,DESCRIPTION,BUNDLE_SERIES,BUNDLE_ID from dba_registry_sqlpatch
  2
SQL> /

  PATCH_ID  PATCH_UID VERSION              ACTION          STATUS          DESCRIPTION                                                                                          BUNDLE_SERIES                   BUNDLE_ID
---------- ---------- -------------------- --------------- --------------- ---------------------------------------------------------------------------------------------------- ------------------------------ ----------
  25397136   21145057 12.1.0.2             APPLY           SUCCESS         DATABASE BUNDLE PATCH 12.1.0.2.170418                                                                DBBP                               170418

SQL>
[oracle@12102-rac1 OPatch]$

Finally, run a check with orachk.

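As an aside, you may find this odd: the database home received two patches, 25397136 and 25481150, so why does the registry show only PATCH_ID 25397136 and nothing for 25481150?

The reason is that 25481150 ships no sqlpatch script that needs to be run. First, you can see that $ORACLE_HOME/sqlpatch contains no script directory for 25481150.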

[oracle@12102-rac1 sqlpatch]$ pwd
/u01/app/oracle/product/12.1.0.2/db_1/sqlpatch
[oracle@12102-rac1 sqlpatch]$ ls
20243804  20594149  20950328  21359749  21694919  22806133  24340679  25397136  sqlpatch      sqlpatch_bootstrap_driver.sql  sqlpatch.pl
20415006  20788771  21125181  21527488  21949015  23144544  24732088  lib       sqlpatch.bat  sqlpatch_bootstrap.sql         sqlpatch.pm
[oracle@12102-rac1 sqlpatch]$

As for why nothing was generated under $ORACLE_HOME/sqlpatch: patch 25481150 simply has no SQL script to deliver. We can confirm this by checking actions.xml.

[oracle@12102-rac1 config]$ pwd
/u01/ora_inst/25433352/25481150/etc/config
[oracle@12102-rac1 config]$
[oracle@12102-rac1 config]$ ls -l
total 168
-rw-r--r--. 1 oracle oinstall 102444 Mar 28 21:00 actions.xml
-rw-rw-r--. 1 oracle oinstall  63645 Mar 28 21:01 inventory.xml
[oracle@12102-rac1 config]$
[oracle@12102-rac1 config]$
[oracle@12102-rac1 config]$ cat actions.xml |grep -i sqlpatch
[oracle@12102-rac1 config]$

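We can see that it contains no action that places files into the sqlpatch directory. (Normally there would be a onewaycopy of the specific scripts.)

Note: every operation a patch performs is written in this actions.xml file, such as backing up files, copying files, deleting files and compiling files. Also note that you need the latest OPatch to execute the latest actions.xml.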

The misleading ORA-16047: DGID mismatch between destination setting and target database


Today, after I built a Data Guard standby for a RAC One Node primary, no redo was being shipped. The primary's alert log showed these errors:

Thu Jun 29 14:55:34 2017
ALTER SYSTEM SET log_archive_dest_state_2='DEFER' SCOPE=BOTH;
Thu Jun 29 14:55:44 2017
ALTER SYSTEM SET log_archive_dest_state_2='ENABLE' SCOPE=BOTH;
Thu Jun 29 14:55:44 2017
Errors in file /data/prd/oracle/diag/rdbms/payroll/payroll_2/trace/payroll_2_tt00_27554.trc:
ORA-16047: DGID mismatch between destination setting and target database
Thu Jun 29 14:55:44 2017
Errors in file /data/prd/oracle/diag/rdbms/payroll/payroll_2/trace/payroll_2_tt00_27554.trc:
ORA-16047: DGID mismatch between destination setting and target database
Thu Jun 29 14:55:44 2017
Errors in file /data/prd/oracle/diag/rdbms/payroll/payroll_2/trace/payroll_2_tt00_27554.trc:
ORA-16047: DGID mismatch between destination setting and target database
Thu Jun 29 14:55:45 2017
Thread 2 advanced to log sequence 632 (LGWR switch)
  Current log# 7 seq# 632 mem# 0: +DATA/PAYROLL/ONLINELOG/group_7.303.946836271
  Current log# 7 seq# 632 mem# 1: +FRA/PAYROLL/ONLINELOG/group_7.791.946836271
Thu Jun 29 14:55:45 2017

According to the oerr text for this error:

SQL> !oerr ora 16047
16047, 00000, "DGID mismatch between destination setting and target database"
// *Cause:  The DB_UNIQUE_NAME specified for the destination did not match
//          the DB_UNIQUE_NAME at the target database.
// *Action: Make sure the DB_UNIQUE_NAME specified in the LOG_ARCHIVE_DEST_n
//          parameter matches the DB_UNIQUE_NAME parameter defined at the
//          destination.

SQL>

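I checked db_unique_name on both the primary and the standby and found nothing wrong.

Checking v$dataguard_status further showed:

There is an ORA-16062 error there. Let's look at what ORA-16062 means: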

SQL> !oerr ora 16062
16062, 00000, "standby database not in Data Guard configuration"
// *Cause:  The standby database was not found in the Data Guard configuration
//          of the server.
// *Action: Add the database unique name of the standby database to the
//          DG_CONFIG attribute of the LOG_ARCHIVE_CONFIG database
//          initialization parameter.

SQL>

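Checking LOG_ARCHIVE_CONFIG on both sides showed that the primary had it configured but the standby did not. So ORA-16047 was only the symptom; the real cause of the redo not being shipped was ORA-16062.

After setting LOG_ARCHIVE_CONFIG with its DG_CONFIG attribute on the standby, and then deferring and re-enabling log_archive_dest_state_2 on the primary, the problem was resolved.

References: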
MOSC – archive not getting shipped to standby

Redis Study Notes


The official Redis website is https://redis.io/, and there is also a Chinese site at http://www.redis.cn/
The current stable Redis version is 3.2 (specifically 3.2.9); the latest version is 4.0.

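In this article you will see:
1. Redis basics, such as the Redis data types, installation and configuration, and the main parameter settings.
2. Redis master-slave replication, and the high-availability architecture with automatic master-slave failover (Sentinel).
3. The clustered high-availability architecture, Redis Cluster (automatic failover plus data sharding).
4. Redis monitoring.
5. Running Redis in Docker.
6. New features in Redis 4.0.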

I. Redis basics

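1. Redis is an in-memory database that stores data as key-value pairs. Redis is single-process and single-threaded, so it uses only one CPU. When monitoring a multi-CPU host, the average CPU usage can therefore look low while the one CPU running the redis process is already saturated.
The main command-line tool is redis-cli, which provides an interactive prompt, similar to sqlplus, for working with the data.

Redis has five main data types (there are others such as bitmap and hyperloglog, which are not discussed here):
1.1 string type:
• set: insert or modify (Note 1: the same string cannot be stored twice; Note 2: unordered, no left or right end)
• get: fetch
• del: delete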

127.0.0.1:6379> set name oracleblog
OK
127.0.0.1:6379> get name
"oracleblog"
127.0.0.1:6379> set name oracleblog
OK
127.0.0.1:6379> get name
"oracleblog"
127.0.0.1:6379>
# As you can see, even after two sets there is still only one value

Use case: ordinary key-value storage. Note: a single value can hold a string of up to 512 MB.

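1.2 list type
• lpush/rpush: insert a value at the left/right end (Note: a list can store the same string more than once)
• lrange: fetch the values in a given range of the list
• lindex: fetch a single value from the list by index
• lpop: pop a value off the left end of the list (Note: after the pop, the value is no longer in the list)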

127.0.0.1:6379> lpush lname oracle
(integer) 1
127.0.0.1:6379> lpush lname mysql
(integer) 2
127.0.0.1:6379> lpush lname oracle
(integer) 3
127.0.0.1:6379> lpush lname mssql
(integer) 4
127.0.0.1:6379> lrange lname 0 100
1) "mssql"
2) "oracle"
3) "mysql"
4) "oracle"
127.0.0.1:6379>
127.0.0.1:6379> lindex lname 2
"mysql"
127.0.0.1:6379>

Note: a list can contain up to 2^32 - 1 elements.

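1.3 set type
• sadd: insert (a set uses a hash table to guarantee that every string it stores is distinct; unordered, no left or right end)
• smembers: list all members
• sismember: test whether a value is a member
• srem: remove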

127.0.0.1:6379> sadd sname s-mysql
(integer) 1
127.0.0.1:6379> sadd sname s-oracle
(integer) 1
127.0.0.1:6379> sadd sname s-mssql
(integer) 1
127.0.0.1:6379> sadd sname s-oracle
(integer) 0
127.0.0.1:6379> sadd sname s-mssql
(integer) 0
127.0.0.1:6379> sadd sname s-redis
(integer) 1
127.0.0.1:6379>
127.0.0.1:6379> sadd sname s-mango s-postgres
(integer) 2
127.0.0.1:6379>
127.0.0.1:6379> smembers sname
1) "s-redis"
2) "s-mssql"
3) "s-oracle"
4) "s-mango"
5) "s-postgres"
6) "s-mysql"
127.0.0.1:6379>

Use cases: mutual friends, shared interests and so on.

1.4 hash type
• hset: insert
• hget: fetch the given hash field
• hgetall: fetch all fields and values of the hash
• hdel: delete the given field if it exists in the hash.

127.0.0.1:6379> hset hname passwd1 dji123
(integer) 1
127.0.0.1:6379> hset hname passwd1 dji123
(integer) 0
127.0.0.1:6379> hset hname passwd1 dji124
(integer) 0
127.0.0.1:6379> hgetall hname
1) "passwd1"
2) "dji124"
127.0.0.1:6379>
# Note: the later value replaced the earlier one.
127.0.0.1:6379> hset hname passwd2 dji222 passwd3 dji333 passwd4 dji444
(error) ERR wrong number of arguments for 'hset' command
127.0.0.1:6379>
127.0.0.1:6379> hmset hname passwd2 dji222 passwd3 dji333 passwd4 dji444
OK
127.0.0.1:6379> hgetall hname
1) "passwd1"
2) "dji124"
3) "passwd2"
4) "dji222"
5) "passwd3"
6) "dji333"
7) "passwd4"
8) "dji444"
127.0.0.1:6379>
# Note: to set several hash fields in one call, use hmset

1.5 zset type (sorted set)
• zadd
• zrange
• zrangebyscore
• zrem

127.0.0.1:6379> zadd zname 1 oracle
(integer) 1
127.0.0.1:6379> zadd zname 2 mysql
(integer) 1
127.0.0.1:6379> zadd zname 3 mssql
(integer) 1
127.0.0.1:6379> zadd zname 3 redis
(integer) 1
127.0.0.1:6379> zrangebyscore zname 0 1000
1) "oracle"
2) "mysql"
3) "mssql"
4) "redis"
127.0.0.1:6379>
127.0.0.1:6379>
127.0.0.1:6379>
127.0.0.1:6379> zrange zname 0 1000
1) "oracle"
2) "mysql"
3) "mssql"
4) "redis"
127.0.0.1:6379>

Use cases: leaderboards, voting and so on.
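The ordering in the zrange output above can be modelled in a few lines of Python. This is only a sketch of the ordering rule (ascending score, ties broken by member order), not of how Redis stores sorted sets internally. The class name MiniZSet is made up for illustration, and plain string comparison stands in for Redis's byte-wise comparison, which is the same for ASCII members.

```python
class MiniZSet:
    """Toy model of a Redis sorted set: unique members, each with a score."""

    def __init__(self):
        self._scores = {}

    def zadd(self, score, member):
        """Return 1 if the member is new, 0 if only its score was updated."""
        is_new = member not in self._scores
        self._scores[member] = float(score)
        return 1 if is_new else 0

    def zrange(self, start, stop):
        """Members by ascending score; equal scores fall back to member order.

        Like ZRANGE, `stop` is inclusive and negative indexes count from the end.
        """
        ordered = sorted(self._scores, key=lambda m: (self._scores[m], m))
        n = len(ordered)
        if start < 0:
            start = max(n + start, 0)
        if stop < 0:
            stop = n + stop
        if stop < 0:
            return []
        return ordered[start:stop + 1]


if __name__ == "__main__":
    z = MiniZSet()
    for score, member in [(1, "oracle"), (2, "mysql"), (3, "mssql"), (3, "redis")]:
        z.zadd(score, member)
    print(z.zrange(0, 1000))
```

Both mssql and redis carry score 3 here, which is why the tie-break on the member name decides their relative order.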

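2. Persistence
2.1 RDB, similar to a snapshot.
When certain conditions are met, redis forks a child process and, relying on copy-on-write, automatically writes a copy of all in-memory data to disk.
Process:
Walk every DB, walk each DB's dict, and fetch every dictEntry
After fetching a key, look up its expire; discard the key if it has expired
Write the key, value, expire time and so on to the file
Compute the checksum, then replace the old rdb file with the new one.

A snapshot is taken when:
1) An automatic snapshot rule is configured and its condition is met
2) The user runs the SAVE or BGSAVE command
3) The FLUSHALL command is run
4) Replication is performed
Drawback: if the redis process exits, all data changed since the last snapshot is lost.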

Related parameters:
save 60 100
stop-writes-on-bgsave-error no
rdbcompression yes
dbfilename dump.rdb

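Notes on bgsave:
If redis runs in a virtual machine, bgsave may take longer.
For every 1 GB of memory the redis process uses, the time bgsave needs to create the child process grows by 10~20 ms.
Difference between save and bgsave: save blocks until the snapshot has been written, while bgsave is carried out by a child process.

RDB file layout:
Taking a db0 that contains only set msg "hello" as the example: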

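2.2 AOF, similar to archived logs: changes are appended.
Note: every write operation calls flushAppendOnlyFile to flush the AOF; when every operation has to fsync, the foreground thread blocks.
Note: using SSDs noticeably improves AOF performance.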

Related parameters:
appendonly yes
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
dir ~/

AOF file layout:

[root@redis01 6399]# cat  appendonly.aof
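*2         # 2 arguments
$6         # the first argument is 6 bytes long
SELECT     # the first argument is SELECT
$1         # the second argument is 1 byte long
0          # the second argument is 0
*3         # 3 arguments
$3         # the first argument is 3 bytes long
SET        # the first argument is SET
$4         # the second argument is 4 bytes long
col2       # the second argument is col2
$2         # the third argument is 2 bytes long
v2         # the third argument is v2

In other words:
select 0 ## select db0
set col2 v2 ## insert the key-value pair col2-v2.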
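To make the layout concrete, here is a minimal Python sketch that turns AOF text like the sample above back into commands. It assumes the file contains only `*<argc>` headers followed by `$<length>` / value pairs and that no argument contains a line break. A real AOF is binary-safe, so this is an illustration rather than a general parser, and parse_aof is a hypothetical helper name.

```python
def parse_aof(text):
    """Parse AOF-style text into a list of commands (each a list of strings)."""
    lines = text.splitlines()
    commands = []
    i = 0
    while i < len(lines):
        header = lines[i]
        if not header.startswith("*"):
            raise ValueError(f"line {i + 1}: expected '*<argc>', got {header!r}")
        argc = int(header[1:])          # number of arguments in this command
        i += 1
        args = []
        for _ in range(argc):
            length_line = lines[i]
            if not length_line.startswith("$"):
                raise ValueError(f"line {i + 1}: expected '$<length>'")
            length = int(length_line[1:])
            value = lines[i + 1]
            # The declared length is in bytes, so compare against the encoded value.
            if len(value.encode("utf-8")) != length:
                raise ValueError(f"line {i + 2}: length mismatch for {value!r}")
            args.append(value)
            i += 2
        commands.append(args)
    return commands


if __name__ == "__main__":
    sample = "*2\r\n$6\r\nSELECT\r\n$1\r\n0\r\n*3\r\n$3\r\nSET\r\n$4\r\ncol2\r\n$2\r\nv2\r\n"
    for command in parse_aof(sample):
        print(" ".join(command))
```

Run against the sample, the two parsed commands are the same select 0 and set col2 v2 shown above.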

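AOF rewrite (BGREWRITEAOF):
Purpose: shrink the AOF file
Triggers:
1. Issuing the bgrewriteaof command
2. The AOF file has grown by more than a given percentage, and its actual size exceeds a given minimum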

# Specify a percentage of zero in order to disable the automatic AOF
# rewrite feature.

auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
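The second trigger combines both settings, which can be sketched as a small predicate. This is an approximation of the check under the assumption that growth is measured against the AOF size right after the previous rewrite; the function and parameter names are made up for illustration and are not Redis APIs.

```python
MB = 1024 * 1024


def should_rewrite_aof(current_size, base_size, percentage=100, min_size=64 * MB):
    """Decide whether an automatic AOF rewrite would be triggered.

    current_size: AOF size now, in bytes
    base_size:    AOF size after the last rewrite (or at startup), in bytes
    percentage:   auto-aof-rewrite-percentage (0 disables automatic rewrite)
    min_size:     auto-aof-rewrite-min-size, in bytes
    """
    if percentage == 0:
        return False                      # feature disabled
    if current_size <= min_size:
        return False                      # file still too small to bother
    base = base_size if base_size > 0 else 1
    growth = current_size * 100 // base - 100   # growth in percent
    return growth >= percentage


if __name__ == "__main__":
    print(should_rewrite_aof(130 * MB, 64 * MB))   # grown by more than 100%
    print(should_rewrite_aof(100 * MB, 64 * MB))   # grown by only about 56%
```

With the values shown above (100 and 64mb), a rewrite starts only once the file is both larger than 64 MB and at least twice its size after the previous rewrite.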

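Note: a rewritten command currently carries at most 64 items of a key; a key with more than 64 items is split across several commands.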

/* Static server configuration */
……
#define LOG_MAX_LEN    1024 /* Default maximum length of syslog messages */
#define AOF_REWRITE_PERC  100
#define AOF_REWRITE_MIN_SIZE (64*1024*1024)
#define AOF_REWRITE_ITEMS_PER_CMD 64
#define CONFIG_DEFAULT_SLOWLOG_LOG_SLOWER_THAN 10000
#define CONFIG_DEFAULT_SLOWLOG_MAX_LEN 128
……

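2.3 Data integrity
If losing a few minutes of data is acceptable, use RDB; if changes must be recorded continuously, use AOF. From a performance angle, since AOF writes continuously, you can enable AOF on the slave and run the master with RDB only.

Note: when the redis server restarts after a crash, it loads data with the following priority:
If only AOF is configured, the AOF is loaded at startup
If both AOF and RDB are configured, only the AOF is loaded at startup
If only RDB is configured, the RDB dump file is loaded at startup.

Note: on Linux 6 (CentOS 6, Red Hat 6, OEL 6) redis can be restarted with /etc/init.d/redis-server restart, but this command does not save on restart. Without AOF, the data written since the last save is therefore lost.
The correct way is to connect with redis-cli and use the shutdown command (which saves by default) or shutdown save. Do not use shutdown nosave.

If you need to turn on AOF on a running instance, a good approach is:
a. Change it dynamically with CONFIG SET appendonly yes. This generates the appendonly.aof file, which contains the data from before the change as well as what is written after it.
b. Set appendonly yes in redis.conf
c. Restart redis when a maintenance window is available.

Tools for repairing an AOF or RDB file damaged by a power loss:
redis-check-aof: check and repair an AOF (removes the faulty command and every command after it)
redis-check-dump: check and repair an RDB (named redis-check-rdb from Redis 3.2 on)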

3. Key expiration in Redis (expire).
We can set a key to expire:

127.0.0.1:6379> get testexpire
(nil)
127.0.0.1:6379> set testexpire value1
OK
127.0.0.1:6379> get testexpire
"value1"
127.0.0.1:6379>
127.0.0.1:6379>
127.0.0.1:6379> expire testexpire 60
(integer) 1
127.0.0.1:6379>
127.0.0.1:6379> ttl testexpire
(integer) 56
127.0.0.1:6379>
127.0.0.1:6379> set testexpire value2
OK
127.0.0.1:6379> get testexpire
"value2"
127.0.0.1:6379>
127.0.0.1:6379> ttl testexpire
(integer) -1
127.0.0.1:6379> ttl testexpire
(integer) -1
127.0.0.1:6379> get testexpire
"value2"
127.0.0.1:6379> ttl testexpire
(integer) -1
127.0.0.1:6379> ttl testexpire
(integer) -1
127.0.0.1:6379>
127.0.0.1:6379>
127.0.0.1:6379> expire testexpire 10
(integer) 1
127.0.0.1:6379> ttl testexpire
(integer) 8
127.0.0.1:6379> ttl testexpire
(integer) 5
127.0.0.1:6379> ttl testexpire
(integer) 1
127.0.0.1:6379> ttl testexpire
(integer) -2
127.0.0.1:6379> get testexpire
(nil)
127.0.0.1:6379>

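In the test above, ttl key returns -1 when the key will not expire and -2 when it has already expired. A number greater than 0 is the remaining time in seconds. You can also see that once set overwrites the old value with a new one, the expire that was set on the key is cleared as well.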
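The behaviour seen in this transcript can be reproduced with a small in-memory model. The sketch below is an assumption-laden toy (the MiniStore class and its injectable clock are invented for illustration, and rounding of the remaining time may differ slightly from Redis), but it shows the three rules: ttl is -2 for a missing key, -1 for a key without an expire, and a plain set drops any existing expire.

```python
import time


class MiniStore:
    """Toy key-value store with SET / GET / EXPIRE / TTL semantics."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._expire_at = {}      # key -> absolute deadline in clock seconds
        self._clock = clock

    def _purge_if_expired(self, key):
        deadline = self._expire_at.get(key)
        if deadline is not None and self._clock() >= deadline:
            self._data.pop(key, None)
            self._expire_at.pop(key, None)

    def set(self, key, value):
        self._data[key] = value
        self._expire_at.pop(key, None)   # a plain SET discards any existing TTL
        return "OK"

    def get(self, key):
        self._purge_if_expired(key)
        return self._data.get(key)

    def expire(self, key, seconds):
        self._purge_if_expired(key)
        if key not in self._data:
            return 0
        self._expire_at[key] = self._clock() + seconds
        return 1

    def ttl(self, key):
        self._purge_if_expired(key)
        if key not in self._data:
            return -2                    # key does not exist (or has expired)
        if key not in self._expire_at:
            return -1                    # key exists but has no expire
        return int(self._expire_at[key] - self._clock() + 0.5)
```

Passing a fake clock, for example `MiniStore(clock=lambda: now[0])` with a mutable `now` list, makes it possible to step time forward without sleeping.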

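Note 1. Keys that have an expire set are called volatile (the term comes up again later with maxmemory-policy). Expired keys are deleted in two ways: passively and actively.

Passive: when a client tries to access a key and the key is found to have timed out, it is expired on the spot.

That alone is not enough, of course, because some expired keys will never be accessed again.
So, 10 times per second, Redis does the following (the active way):
1. Test 20 random keys from the set of keys that have an expire.
2. Delete all the keys found to be expired.
3. If more than 25% of the keys were expired, start again from step 1.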
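The sampling loop described above can be sketched as follows. This is a simplified model under stated assumptions: the dictionary `expire_at` (key to deadline) and the function name are hypothetical, and the real server also bounds how long each cycle may run, which is left out here.

```python
import random


def active_expire_cycle(expire_at, now, sample_size=20, threshold=0.25, rng=random):
    """Run one active-expiration cycle and return how many keys were removed.

    expire_at: dict mapping key -> deadline; only keys with an expire belong here
    now:       current time, in the same unit as the deadlines
    rng:       the random module or a random.Random instance (for repeatable runs)
    """
    removed = 0
    while expire_at:
        # Step 1: test a random sample of keys that carry an expire.
        sample = rng.sample(list(expire_at), min(sample_size, len(expire_at)))
        # Step 2: delete every sampled key that has already expired.
        expired = [key for key in sample if expire_at[key] <= now]
        for key in expired:
            del expire_at[key]
        removed += len(expired)
        # Step 3: repeat only while more than 25% of the sample was expired.
        if len(expired) <= threshold * len(sample):
            break
    return removed
```

Because each extra round happens only when the sample was rich in expired keys, the loop does little work when few keys are stale and keeps going when many are.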

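Note 2. A limitation of expire: it applies only to a whole key, not to part of the data inside a key. In other words, you can expire an entire key such as a hash, but not a single field inside it.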

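Note 3. How RDB handles expired keys: expired keys have no effect on the RDB

Persisting from memory to the RDB file:
Each key is checked for expiry before it is persisted; expired keys do not go into the RDB file
Restoring from the RDB file into memory:
Each key is checked for expiry before it is loaded; expired keys are not loaded (on a master)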

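Note 4. How AOF handles expired keys: expired keys have no effect on the AOF

Persisting from memory to the AOF file:
When a key has expired but has not yet been deleted, persistence that runs at that moment writes nothing for it to the AOF, because no modifying command has occurred
When an expired key is actually deleted, a del command is appended to the AOF (so the key is also removed when data is later restored from the AOF)
AOF rewrite:
During a rewrite each key is checked for expiry first; expired keys are not written to the new AOF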

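If keys never expire, then once maxmemory is reached every operation that would use more memory returns an error. On 64-bit systems maxmemory defaults to 0, which means Redis memory use is unlimited; on 32-bit systems there is an implicit limit of 3 GB. So on 64-bit systems the default is a dangerous value.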

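Once memory use reaches maxmemory, memory is reclaimed according to the configured maxmemory-policy.
maxmemory-policy can be set to:
1. noeviction: return an error when the memory limit has been reached and the client tries to run a command that would use more memory (most write commands; DEL and a few others are exceptions)
2. allkeys-lru: evict the least recently used keys (LRU) first, to make room for newly added data.
3. volatile-lru: evict the least recently used keys (LRU) first, but only among keys that have an expire set, to make room for newly added data.
4. allkeys-random: evict random keys to make room for newly added data.
5. volatile-random: evict random keys to make room for newly added data, but only among keys that have an expire set.
6. volatile-ttl: evict keys that have an expire set, preferring those with the shortest remaining time to live (TTL), to make room for newly added data.

Note: the LRU algorithm redis uses is an approximate LRU. Its sampling rate is set with, for example, maxmemory-samples 5. With the same maxmemory-samples, the approximate LRU in newer redis versions performs much better than in older ones.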
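A rough Python sketch of sampling-based eviction, to show what maxmemory-samples controls. It assumes a hypothetical `last_access` dictionary (key to last access time) and models only the basic idea of "sample a few keys, evict the one idle longest"; refinements in newer versions, such as keeping good candidates between samplings, are not modelled.

```python
import random


def evict_one_approx_lru(last_access, samples=5, rng=random):
    """Evict one key using approximate LRU and return it (None if empty).

    last_access: dict mapping key -> last access time (larger = more recent)
    samples:     how many random keys to compare, like maxmemory-samples
    rng:         the random module or a random.Random instance
    """
    if not last_access:
        return None
    candidates = rng.sample(list(last_access), min(samples, len(last_access)))
    # Among the sampled keys only, pick the one that was accessed longest ago.
    victim = min(candidates, key=lambda key: last_access[key])
    del last_access[victim]
    return victim


if __name__ == "__main__":
    access = {"a": 5, "b": 1, "c": 9}
    # With samples >= number of keys the choice equals exact LRU.
    print(evict_one_approx_lru(access, samples=10))
```

A larger sample makes the victim closer to the true least recently used key at the cost of more comparisons per eviction, which is the trade-off the parameter exposes.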

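4. Installing redis.
4.1 Host parameter settings (note: for Linux 7):
4.1.1. Use at least ext4 as the file system; xfs is better.

4.1.2. Disable NUMA, and disable the atime option on the file system/partition that holds redis.

4.1.3. On non-SSD storage set the file system I/O scheduler to deadline; on SSD set it to noop.

4.1.4. Tune the kernel.
4.1.4.1 Check the tuned profile the operating system is currently using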
cat /etc/tuned/active_profile
virtual-guest

4.1.4.2. Create a directory to hold the tuned profile for redis
mkdir /etc/tuned/for_redis

4.1.4.3. Copy the system's current default tuned profile into for_redis:
cp /usr/lib/tuned/virtual-guest/tuned.conf /etc/tuned/for_redis/

4.1.4.4. Edit /etc/tuned/for_redis/tuned.conf
[main]
include=throughput-performance

[vm]
transparent_hugepages=never

[sysctl]
vm.dirty_ratio = 30
vm.swappiness = 30
vm.overcommit_memory = 1

net.core.somaxconn = 65535

4.1.4.5. Switch the tuned profile to for_redis
tuned-adm profile for_redis

4.1.4.6. Reboot the host.

4.2 Download, unpack and build redis:
mkdir /root/redis_install
cd /root/redis_install
wget http://download.redis.io/releases/redis-3.2.9.tar.gz
tar -zxvf /root/redis_install/redis-3.2.9.tar.gz
cd /root/redis_install/redis-3.2.9
make
make test
Note: if make test fails with You need tcl 8.5 or newer in order to run the Redis test, run yum install tcl. Normally, when make test passes, it shows the following:
make install
mkdir -p /etc/redis
mkdir -p /var/redis
mkdir -p /var/redis/6379
cp /root/redis_install/redis-3.2.9/redis.conf /etc/redis/redis_6379.conf
Edit redis_6379.conf

daemonize yes
pidfile /var/run/redis_6379.pid
logfile /var/log/redis_6379.log
dir /var/redis/6379
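## Comment out the IP binding so that clients on other hosts can also connect to redis
# bind 127.0.0.1
## Set the password for remote access. This setup allows remote access: the bind 127.0.0.1 and protected-mode yes settings have been removed
requirepass "oracleblog"
## Rename the dangerous commands so that the original names return an error.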
rename-command FLUSHDB "FLUSHDB_ORACLE_MASK"
rename-command FLUSHALL "FLUSHALL_ORACLE_MASK"
rename-command CONFIG "CONFIG_ORACLE_MASK"
## If the save values differ from the ones below, change the configuration
save 900 1
save 300 10
save 60 10000
appendonly yes
## Note: tcp-backlog must be smaller than the somaxconn value set in the operating system
tcp-backlog 511
## Set maxmemory to at most 40% of RAM (40% for redis, 40% for bgsave, 20% for the system)
maxmemory 838860800
maxmemory-policy allkeys-lru
maxmemory-samples 5

Start: redis-server /etc/redis/redis_6379.conf
Connect: redis-cli -a oracleblog -h 192.168.56.108 -p 6379 (note: oracleblog is the password set in requirepass)
Shut down: 192.168.56.108:6379> shutdown save

5. A redis instance contains 16 dbs by default. Use select to switch between them; move transfers a key to another db.

127.0.0.1:6399> keys *
1) "col6"
2) "col2"
3) "col1"
4) "col5"
5) "col3"
6) "col4"
127.0.0.1:6399>
127.0.0.1:6399> select 1
OK
127.0.0.1:6399[1]> keys *
(empty list or set)
127.0.0.1:6399[1]>
127.0.0.1:6399[1]> set db1_col1 v1;
OK
127.0.0.1:6399[1]>
127.0.0.1:6399[1]>
127.0.0.1:6399[1]> keys *
1) "db1_col1"
127.0.0.1:6399[1]>
127.0.0.1:6399[1]>
127.0.0.1:6399[1]> select 0
OK
127.0.0.1:6399> move col2 1
(integer) 1
127.0.0.1:6399>
127.0.0.1:6399> select 1
OK
127.0.0.1:6399[1]> keys *
1) "col2"
2) "db1_col1"
127.0.0.1:6399[1]>

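II. Redis master-slave replication and sentinel

1. Master-slave replication setup:
Let's configure redis with 1 master and 2 slaves, on 3 hosts and 3 ports.
Master: 192.168.56.108 port 6379 --> slave 1: 192.168.56.109 port 6380 --> slave 2: 192.168.56.110 port 6381

Configuration on each host:

/etc/redis/redis_6379.conf
No change needed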


Slave 1, on the running instance:
slaveof 192.168.56.108 6379
CONFIG_ORACLE_MASK set masterauth oracleblog
And edit /etc/redis/redis_6380.conf
slaveof 192.168.56.108 6379
masterauth oracleblog


Slave 2, on the running instance:
slaveof 192.168.56.109 6380
CONFIG_ORACLE_MASK set masterauth oracleblog
And edit /etc/redis/redis_6381.conf
slaveof 192.168.56.109 6380
masterauth oracleblog

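Note 1: when replication starts, the slave discards its old data (if any)
Note 2: in practice it is best to let the redis master use only 50%~60% of memory and leave 30%~45% for bgsave.
Note 3: redis does not support master-master replication
Note 4: redis supports cascaded master-slave replication
Note 5: in the output of the info command, check whether aof_pending_bio_fsync is 0; 0 indicates that master-slave synchronization is normal. aof_pending_bio_fsync is the number of fsync pending jobs in the background I/O queue
Note 6: according to tests published online, TPS drops somewhat with replication enabled compared with no replication: by about 30% at 100 concurrent clients, with response time going from 0.8 ms to 1.2 ms.

How to replace a failed master: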

(1)A -----------> B
(2)A(crash)     B
(3)A(crash)     B (run the save command to generate an rdb)
(4)A(crash)     B -----(transfer the rdb to host C)-----  C
(5)A(crash)     B-----(slaveof C port)------> C

2. High-availability architecture: sentinel setup:
sentinel is a special mode of a redis instance and can be started in either of the following two ways:
redis-sentinel /path/to/sentinel.conf

redis-server /path/to/sentinel.conf --sentinel

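How Sentinel works:

1. The Sentinel cluster discovers the master from the given configuration file and starts monitoring it at startup. By sending info to the master it learns all the slaves under that master.
2. The Sentinel cluster sends hello messages (once per second) over the command connection to the monitored masters and slaves. The message contains the Sentinel's own IP, port, id and so on, announcing its existence to the other Sentinels.
3. The Sentinel cluster receives the hello messages sent by other Sentinels over the subscription connection, and so discovers the other Sentinels that monitor the same master. Sentinels create command connections to one another for communication; because the masters and slaves already act as the relay for sending and receiving hello messages, Sentinels do not create subscription connections to one another.
4. The Sentinel cluster uses the ping command to check the state of each instance. If an instance does not reply within the configured time (down-after-milliseconds), or replies with an error, it is judged to be down.
5. When a failover is triggered it does not start immediately; it must first be authorized by a majority of the Sentinels. The master enters the ODOWN state once the number of Sentinels given by quorum agree that it is down. For example, with quorum set to 2 among 5 Sentinels, the master is marked ODOWN when 2 Sentinels consider it dead, and the failover proceeds once the majority has authorized it.
6. Sentinel sends the SLAVEOF NO ONE command to the slave chosen as the new master. To choose it, Sentinel first sorts the slaves by priority; the smaller the priority, the higher the rank. If priorities are equal, it looks at the replication offset: whichever slave has received more replication data from the master ranks higher. If both priority and offset are equal, the slave with the smaller run ID is chosen.
7. Once authorized, the Sentinel obtains a new configuration version number (config-epoch) for the failed master. When the failover finishes, this version number is used for the new configuration, which is broadcast to the other Sentinels; they then update their configuration for that master.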
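The selection rule in item 6 amounts to a three-level sort, sketched below in Python. The Replica dataclass and select_replica function are hypothetical names for illustration. The sketch compares run IDs as strings for the final tie-break and skips slaves with priority 0, which Redis treats as never to be promoted; other eligibility checks, such as how long a slave has been disconnected, are left out.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    priority: int       # slave-priority; lower wins, 0 means never promote
    repl_offset: int    # slave-repl-offset; higher means more data received
    run_id: str


def select_replica(replicas):
    """Pick the slave to promote, or None if no slave is eligible."""
    eligible = [r for r in replicas if r.priority > 0]
    if not eligible:
        return None
    # Lower priority first, then larger offset, then smaller run ID.
    return min(eligible, key=lambda r: (r.priority, -r.repl_offset, r.run_id))


if __name__ == "__main__":
    candidates = [
        Replica("192.168.56.110:6381", 100, 274239,
                "aaa02dfaf12a59bba2b84c0b2b1bc6fb13ae76e2"),
        Replica("192.168.56.109:6380", 100, 274239,
                "860e99210956e311ad522e7f5c582e7ba50d09bd"),
    ]
    print(select_replica(candidates).name)
```

The sample values are the ones reported by sentinel slaves mymaster later in this article, where priority and offset are equal and only the run ID differs.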

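●Subjectively down (SDOWN) is the down judgment made by a single Sentinel instance about a server.
●Objectively down (ODOWN) is the down judgment reached when multiple Sentinel instances have judged the same server SDOWN and have exchanged that view with one another through the SENTINEL is-master-down-by-addr command.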

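Note: the objectively-down condition applies only to masters.
Items 1 to 3 are the auto-discovery mechanism:
Every 10 seconds, send the info command to the monitored master and read its current state from the reply.
Every 1 second, send the PING command to all redis servers, Sentinels included, and judge from the reply whether each one is online.
Every 2 seconds, publish a message with the current Sentinel and master information to all monitored masters and slaves.
Item 4 is the detection mechanism, items 5 and 6 are the failover mechanism, and item 7 is the configuration-update mechanism.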
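The two thresholds involved in detection and failover are easy to mix up, so here is a small Python sketch of both. The function names are invented for illustration. The assumption is that ODOWN needs at least quorum SDOWN reports, while the Sentinel leading the failover needs votes from a majority of all Sentinels and from no fewer than quorum.

```python
def is_odown(sdown_reports, quorum):
    """The master is objectively down once enough Sentinels report SDOWN."""
    return sdown_reports >= quorum


def failover_authorized(votes, total_sentinels, quorum):
    """The leading Sentinel needs a majority of all Sentinels and at least quorum votes."""
    majority = total_sentinels // 2 + 1
    return votes >= max(majority, quorum)


if __name__ == "__main__":
    # 5 Sentinels, quorum 2: two reports are enough to flag ODOWN ...
    print(is_odown(2, quorum=2))
    # ... but two votes are not enough to run the failover; three are.
    print(failover_authorized(2, total_sentinels=5, quorum=2))
    print(failover_authorized(3, total_sentinels=5, quorum=2))
```

This is why a low quorum makes failure detection more sensitive without allowing a minority of Sentinels to carry out a failover on their own.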

# This line means sentinel listens on port 26379, the default sentinel port; it can be changed.
port 26379
# This line means: the monitored master is named mymaster (a name of your choosing) and its address is 192.168.56.108:6379. The trailing 2 is how many sentinels in the sentinel cluster must consider the master dead before it is really treated as unavailable.
sentinel monitor mymaster 192.168.56.108 6379 2
# This line means sentinel sends heartbeat PINGs to the master to confirm that it is alive. If within a "certain time window" the master does not answer PONG, or answers with an error, this sentinel subjectively (unilaterally) considers the master unavailable (subjectively down, SDOWN for short). down-after-milliseconds specifies that time window in milliseconds; the default is 30 seconds.
sentinel down-after-milliseconds mymaster 60000
# This line sets the failover timeout in milliseconds (180000 here, that is 3 minutes).
sentinel failover-timeout mymaster 180000
# This line specifies how many slaves at most may resynchronize with the new master at the same time during a failover. The smaller the number, the longer the failover takes to complete; the larger the number, the more slaves are unavailable because of replication. Setting it to 1 guarantees that only one slave at a time is unable to serve command requests.
sentinel parallel-syncs mymaster 1
sentinel auth-pass mymaster oracleblog
## Following parameter add by Jimmy
daemonize yes
logfile /var/log/redis_sentinel_6379.log
pidfile /var/run/redis_sentinel_6379.pid
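# bind 127.0.0.1 Note: do not add bind, because the sentinels communicate with one another and need to arbitrate
dbfilename dump_sentinel.rdb
# Note: every sentinel must have a different myid, otherwise they ignore one another
sentinel myid be167fc5c77a14ef53996d367e237d3cc33a53b6
# Note: the rename-command settings must be reverted, because sentinel uses these commands; otherwise it can detect a node failure but cannot perform the switch.
#rename-command FLUSHDB "FLUSHDB_ORACLE_MASK"
#rename-command FLUSHALL "FLUSHALL_ORACLE_MASK"
#rename-command CONFIG "CONFIG_ORACLE_MASK"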

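Note: running a single sentinel is not recommended, because:
1: with several sentinels, a master-slave switch of the redis cluster can still happen even if some sentinel processes are down;
2: with only one sentinel process, if that process fails or the network is congested, the master-slave switch cannot happen (single point of failure);
3: with several sentinels, a redis client can connect to any one of them to get information about the redis cluster

Implementation steps:
Let's configure redis with 1 master and 2 slaves, on 3 hosts and 3 ports.

Master: 192.168.56.108 port 6379 --> slave 1: 192.168.56.109 port 6380
       |
       |
       L--> slave 2: 192.168.56.110 port 6381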

1. cp /root/redis_install/redis-3.2.9/sentinel.conf /etc/redis/sentinel_6379.conf
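2. Edit it according to the configuration items described above
3. redis-sentinel /etc/redis/sentinel_6379.conf
4. Repeat the steps above on the slave nodes
5. On the master run redis-cli -p 26379 ping, which normally returns pong; on slave 1 run redis-cli -p 26380 ping, which normally returns pong; on slave 2 run redis-cli -p 26381 ping, which normally returns pong
6. On slave 1 run redis-cli -p 26380 sentinel masters. The output is shown below; note the num-slaves and num-other-sentinels entries (items 31 to 34), which show that 2 slaves and 2 other sentinels were detected: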

127.0.0.1:26379> sentinel masters
1)  1) "name"
    2) "mymaster"
    3) "ip"
    4) "192.168.56.108"
    5) "port"
    6) "6379"
    7) "runid"
    8) "357e8a88b9ae5aeb325f16239aac20ec965ae167"
    9) "flags"
   10) "master"
   11) "link-pending-commands"
   12) "0"
   13) "link-refcount"
   14) "1"
   15) "last-ping-sent"
   16) "0"
   17) "last-ok-ping-reply"
   18) "970"
   19) "last-ping-reply"
   20) "970"
   21) "down-after-milliseconds"
   22) "30000"
   23) "info-refresh"
   24) "9971"
   25) "role-reported"
   26) "master"
   27) "role-reported-time"
   28) "713185"
   29) "config-epoch"
   30) "0"
   31) "num-slaves"
   32) "2"
   33) "num-other-sentinels"
   34) "2"
   35) "quorum"
   36) "2"
   37) "failover-timeout"
   38) "180000"
   39) "parallel-syncs"
   40) "1"
127.0.0.1:26379>

7. On slave 1 run sentinel slaves mymaster:

[root@redis02 ~]# redis-cli -p 26380 sentinel slaves mymaster
1)  1) "name"
    2) "192.168.56.110:6381"
    3) "ip"
    4) "192.168.56.110"
    5) "port"
    6) "6381"
    7) "runid"
    8) "aaa02dfaf12a59bba2b84c0b2b1bc6fb13ae76e2"
    9) "flags"
   10) "slave"
   11) "link-pending-commands"
   12) "0"
   13) "link-refcount"
   14) "1"
   15) "last-ping-sent"
   16) "0"
   17) "last-ok-ping-reply"
   18) "373"
   19) "last-ping-reply"
   20) "373"
   21) "down-after-milliseconds"
   22) "30000"
   23) "info-refresh"
   24) "8707"
   25) "role-reported"
   26) "slave"
   27) "role-reported-time"
   28) "500482"
   29) "master-link-down-time"
   30) "0"
   31) "master-link-status"
   32) "ok"
   33) "master-host"
   34) "192.168.56.108"
   35) "master-port"
   36) "6379"
   37) "slave-priority"
   38) "100"
   39) "slave-repl-offset"
   40) "274239"
2)  1) "name"
    2) "192.168.56.109:6380"
    3) "ip"
    4) "192.168.56.109"
    5) "port"
    6) "6380"
    7) "runid"
    8) "860e99210956e311ad522e7f5c582e7ba50d09bd"
    9) "flags"
   10) "slave"
   11) "link-pending-commands"
   12) "0"
   13) "link-refcount"
   14) "1"
   15) "last-ping-sent"
   16) "0"
   17) "last-ok-ping-reply"
   18) "373"
   19) "last-ping-reply"
   20) "373"
   21) "down-after-milliseconds"
   22) "30000"
   23) "info-refresh"
   24) "8707"
   25) "role-reported"
   26) "slave"
   27) "role-reported-time"
   28) "500482"
   29) "master-link-down-time"
   30) "0"
   31) "master-link-status"
   32) "ok"
   33) "master-host"
   34) "192.168.56.108"
   35) "master-port"
   36) "6379"
   37) "slave-priority"
   38) "100"
   39) "slave-repl-offset"
   40) "274239"
[root@redis02 ~]#

8. On slave 1 run sentinel get-master-addr-by-name:

[root@redis02 ~]# redis-cli -p 26380 sentinel get-master-addr-by-name mymaster
1) "192.168.56.108"
2) "6379"
[root@redis02 ~]#

9. Kill the redis process, or run the sentinel failover command:

In the log on node 1 you can see:

2824:X 27 Jun 05:58:05.551 # +new-epoch 17
2824:X 27 Jun 05:58:05.551 # +config-update-from sentinel be167fc5c77a14ef53996d367e237d3cc33a53b6 192.168.56.109 26380 @ mymaster 192.168.56.108 6379
2824:X 27 Jun 05:58:05.551 # +switch-master mymaster 192.168.56.108 6379 192.168.56.109 6380
2824:X 27 Jun 05:58:05.551 * +slave slave 192.168.56.110:6381 192.168.56.110 6381 @ mymaster 192.168.56.109 6380
2824:X 27 Jun 05:58:05.551 * +slave slave 192.168.56.108:6379 192.168.56.108 6379 @ mymaster 192.168.56.109 6380
2824:X 27 Jun 05:58:15.594 * +convert-to-slave slave 192.168.56.108:6379 192.168.56.108 6379 @ mymaster 192.168.56.109 6380

In the log on node 2 you can see:

3047:X 27 Jun 05:58:05.272 # Executing user requested FAILOVER of 'mymaster'
3047:X 27 Jun 05:58:05.273 # +new-epoch 17
3047:X 27 Jun 05:58:05.273 # +try-failover master mymaster 192.168.56.108 6379
3047:X 27 Jun 05:58:05.285 # +vote-for-leader be167fc5c77a14ef53996d367e237d3cc33a53b6 17
3047:X 27 Jun 05:58:05.285 # +elected-leader master mymaster 192.168.56.108 6379
3047:X 27 Jun 05:58:05.285 # +failover-state-select-slave master mymaster 192.168.56.108 6379
3047:X 27 Jun 05:58:05.362 # +selected-slave slave 192.168.56.109:6380 192.168.56.109 6380 @ mymaster 192.168.56.108 6379
3047:X 27 Jun 05:58:05.362 * +failover-state-send-slaveof-noone slave 192.168.56.109:6380 192.168.56.109 6380 @ mymaster 192.168.56.108 6379
3047:X 27 Jun 05:58:05.424 * +failover-state-wait-promotion slave 192.168.56.109:6380 192.168.56.109 6380 @ mymaster 192.168.56.108 6379
3047:X 27 Jun 05:58:06.316 # +promoted-slave slave 192.168.56.109:6380 192.168.56.109 6380 @ mymaster 192.168.56.108 6379
3047:X 27 Jun 05:58:06.316 # +failover-state-reconf-slaves master mymaster 192.168.56.108 6379
3047:X 27 Jun 05:58:06.363 * +slave-reconf-sent slave 192.168.56.110:6381 192.168.56.110 6381 @ mymaster 192.168.56.108 6379
3047:X 27 Jun 05:58:07.026 * +slave-reconf-inprog slave 192.168.56.110:6381 192.168.56.110 6381 @ mymaster 192.168.56.108 6379
3047:X 27 Jun 05:58:08.050 * +slave-reconf-done slave 192.168.56.110:6381 192.168.56.110 6381 @ mymaster 192.168.56.108 6379
3047:X 27 Jun 05:58:08.117 # +failover-end master mymaster 192.168.56.108 6379
3047:X 27 Jun 05:58:08.117 # +switch-master mymaster 192.168.56.108 6379 192.168.56.109 6380
3047:X 27 Jun 05:58:08.117 * +slave slave 192.168.56.110:6381 192.168.56.110 6381 @ mymaster 192.168.56.109 6380
3047:X 27 Jun 05:58:08.117 * +slave slave 192.168.56.108:6379 192.168.56.108 6379 @ mymaster 192.168.56.109 6380

In the log on node 3 you can see:

3018:X 27 Jun 05:58:06.395 # +new-epoch 17
3018:X 27 Jun 05:58:06.395 # +config-update-from sentinel be167fc5c77a14ef53996d367e237d3cc33a53b6 192.168.56.109 26380 @ mymaster 192.168.56.108 6379
3018:X 27 Jun 05:58:06.395 # +switch-master mymaster 192.168.56.108 6379 192.168.56.109 6380
3018:X 27 Jun 05:58:06.396 * +slave slave 192.168.56.110:6381 192.168.56.110 6381 @ mymaster 192.168.56.109 6380
3018:X 27 Jun 05:58:06.396 * +slave slave 192.168.56.108:6379 192.168.56.108 6379 @ mymaster 192.168.56.109 6380
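
To spot a failover in these logs without reading every line, it can help to pull out only the +switch-master events. The following is a minimal sketch that assumes the line layout shown in the logs above; the function name find_switches and the regular expression are illustrative and not part of redis.

```python
import re

# Matches sentinel log lines such as:
# 3047:X 27 Jun 05:58:08.117 # +switch-master mymaster 192.168.56.108 6379 192.168.56.109 6380
SWITCH_RE = re.compile(
    r"\+switch-master (?P<name>\S+) (?P<old_ip>\S+) (?P<old_port>\d+) "
    r"(?P<new_ip>\S+) (?P<new_port>\d+)"
)

def find_switches(log_text):
    """Return (master_name, old_addr, new_addr) for every +switch-master event."""
    events = []
    for line in log_text.splitlines():
        m = SWITCH_RE.search(line)
        if m:
            events.append((
                m.group("name"),
                f"{m.group('old_ip')}:{m.group('old_port')}",
                f"{m.group('new_ip')}:{m.group('new_port')}",
            ))
    return events

# Two lines copied from the node 3 log above
sample = """\
3018:X 27 Jun 05:58:06.395 # +new-epoch 17
3018:X 27 Jun 05:58:06.395 # +switch-master mymaster 192.168.56.108 6379 192.168.56.109 6380
"""
print(find_switches(sample))
```

Each tuple gives the master name, the old master address, and the newly promoted address.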

III. Redis sharding and cluster high-availability architecture

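In redis, high availability plus sharding is provided by redis cluster. When a single instance or a master-slave setup can no longer carry the workload, for example because a single CPU core is saturated or memory usage is too high, we usually split redis into several shards.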

1. Redis sharding approaches can be divided into:
1.1 redis native sharding:


1.2 proxy sharding:


1.3 application-side sharding:

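What we discuss here is redis native sharding, that is, redis cluster. It supports a multi-master, multi-slave architecture; the slaves exist so that one of them can take over when its master goes down. This failover is provided inside the cluster, so sentinel is not needed on top of it. Note: with one slave per master, a redis cluster needs at least 6 nodes, ideally on six servers. If you do not have enough servers, you can run several nodes on each one, for example 2 servers with 3 nodes each.

Also, because the redis-trib.rb tool that ships with redis cluster does not support passwords, you cannot set a password until the cluster configuration is finished.

Redis cluster does not guarantee strong consistency. The first reason writes can be lost is that masters replicate to their slaves asynchronously.

A minimal cluster needs at least 3 master nodes. The recommended setup is at least 6 nodes: 3 masters and 3 slaves.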

2. Install redis cluster:
The management tool for redis cluster is redis-trib.
2.1. To run redis-trib, first install the ruby runtime:
yum -y install ruby

2.2. Next install ruby gems, which is used to find, install, upgrade and uninstall ruby packages:
yum -y install rubygems

2.3. Then use gem to install the redis client for ruby:
gem install redis

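This step may fail, mostly because the official gem repository cannot be reached from mainland China. In that case, point gem at a domestic source such as Taobao's RubyGems mirror.
Switch the source as follows: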

# gem source -l
# gem source --remove http://rubygems.org/
# gem sources -a http://ruby.taobao.org/
# gem source -l

# gem install redis

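2.4. Edit redis.conf on each host and add the cluster options:

Cluster configuration:
cluster-enabled <yes/no>: if set to "yes", cluster support is enabled and this redis instance runs as a node of the cluster; otherwise it is an ordinary standalone redis instance.
cluster-config-file : note that although this option is called a "cluster configuration file", the file must not be edited by hand. The cluster node maintains it automatically and uses it to record which nodes are in the cluster, their state, and some persistent variables, so that the state can be restored on restart. The file is typically rewritten after the node receives a message.
cluster-node-timeout : the maximum time a node in the cluster may be unreachable; beyond it, the node is considered failed. If a master is still unreachable after this time, its slaves start a failover and one of them is promoted to master. Note that any node that cannot reach the majority of masters within this time stops accepting requests.
cluster-slave-validity-factor : if set to 0, a slave always tries to fail over its master, no matter how long the link between them has been down. If set to a positive number, cluster-node-timeout multiplied by cluster-slave-validity-factor is the longest time the slave's data is considered valid after it loses its master; beyond that, the slave will not start a failover. For example, with cluster-node-timeout=5 (seconds) and cluster-slave-validity-factor=10, a slave that has been disconnected from its master for more than 50 seconds cannot become master. Note that with a non-zero value, a master can fail with no slave able to take over, which stops the cluster from working; in that case the cluster only resumes when the original master rejoins.
cluster-migration-barrier : the minimum number of slaves a master must keep. Only when a master has more slaves than this may one of them migrate to a master that has been left without any working slave. For details, see the replica migration section of the Redis cluster tutorial.
cluster-require-full-coverage <yes/no>: when the nodes holding part of the key space are unavailable, the whole cluster stops accepting operations if this is set to "yes" (the default); if set to "no", the cluster keeps serving queries for the keys on reachable nodes.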
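
The cluster-slave-validity-factor arithmetic can be checked with a few lines. This is a sketch of the simplified rule stated in the list above (timeout multiplied by factor); the function name is hypothetical, and a real redis node may apply further adjustments that are not modeled here.

```python
def max_slave_disconnect_seconds(node_timeout_s, validity_factor):
    """Longest master-link outage after which a slave may still fail over.

    Returns None when the factor is 0, which means there is no limit.
    """
    if validity_factor == 0:
        return None
    return node_timeout_s * validity_factor

# The example from the text: timeout 5 seconds, factor 10
print(max_slave_disconnect_seconds(5, 10))
```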

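cluster-enabled yes
## Note: this file name is different on each host
cluster-config-file nodes-6379.conf
## Timeout of 5000 ms, after which the cluster considers the node down. In large clusters (close to 1000 redis instances) inter-node communication uses a lot of bandwidth; tuning cluster-node-timeout can reduce it effectively.
cluster-node-timeout 5000
## Changed to allow connections from all networks
#bind 127.0.0.1
## Turn off protected mode
protected-mode no
## Remove the passwords:
#masterauth oracleblog
#requirepass "oracleblog"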

2.5. We start 6 instances across the 3 hosts:
192.168.56.108 : redis_6379.conf + redis_6389.conf
192.168.56.109 : redis_6380.conf + redis_6390.conf
192.168.56.110 : redis_6381.conf + redis_6391.conf
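
Since the six instances differ only in port and in the name of the cluster-config-file, the cluster stanza can be generated instead of edited by hand. The sketch below is one way to do that; the template, the INSTANCES table and both function names are illustrative assumptions, with the host/port pairs taken from the redis-trib.rb create command used in step 2.6.

```python
# Cluster-related lines from step 2.4, parameterised by port
CLUSTER_TEMPLATE = """\
port {port}
cluster-enabled yes
cluster-config-file nodes-{port}.conf
cluster-node-timeout 5000
protected-mode no
"""

# host -> [first instance port, second instance port]
INSTANCES = {
    "192.168.56.108": [6379, 6389],
    "192.168.56.109": [6380, 6390],
    "192.168.56.110": [6381, 6391],
}

def render_cluster_conf(port):
    """Return the cluster stanza for one instance."""
    return CLUSTER_TEMPLATE.format(port=port)

def create_args(instances):
    """List host:port pairs, first instance of every host before the second."""
    return [
        f"{host}:{ports[i]}"
        for i in range(2)
        for host, ports in instances.items()
    ]

print(render_cluster_conf(6390))
print(" ".join(create_args(INSTANCES)))
```

The second line printed is the node list in the same order as passed to redis-trib.rb create.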

2.6. Create the cluster:

cp -p /root/redis_install/redis-3.2.9/src/redis-trib.rb /usr/local/bin/
[root@redis01 6379]# cd /usr/local/bin/
[root@redis01 bin]#
[root@redis01 bin]# ./redis-trib.rb create --replicas 1 192.168.56.108:6379 192.168.56.109:6380 192.168.56.110:6381 192.168.56.108:6389 192.168.56.109:6390 192.168.56.110:6391
>>> Creating cluster
>>> Performing hash slots allocation on 6 nodes...
Using 3 masters:
192.168.56.108:6379
192.168.56.109:6380
192.168.56.110:6381
Adding replica 192.168.56.109:6390 to 192.168.56.108:6379
Adding replica 192.168.56.108:6389 to 192.168.56.109:6380
Adding replica 192.168.56.110:6391 to 192.168.56.110:6381
M: 50df897b5bad63a525a8d46998b30d47698d9cd9 192.168.56.108:6379
   slots:0-5460 (5461 slots) master
M: 00974c9c1acede227f1ef25fd56460a1a19818a0 192.168.56.109:6380
   slots:5461-10922 (5462 slots) master
M: 632f31d57d6fcbf48e277ed8cd34299188d2c675 192.168.56.110:6381
   slots:10923-16383 (5461 slots) master
S: e73dc8caf474076fdbeb4da346333bc8410c8486 192.168.56.108:6389
   replicates 00974c9c1acede227f1ef25fd56460a1a19818a0
S: 9e112eed7f8a5830e907d97792c50a2171d9f13b 192.168.56.109:6390
   replicates 50df897b5bad63a525a8d46998b30d47698d9cd9
S: 3005c5adea38cc21cf47fb86cbe1e8cfd1cbfce7 192.168.56.110:6391
   replicates 632f31d57d6fcbf48e277ed8cd34299188d2c675
Can I set the above configuration? (type 'yes' to accept): yes
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join...
>>> Performing Cluster Check (using node 192.168.56.108:6379)
M: 50df897b5bad63a525a8d46998b30d47698d9cd9 192.168.56.108:6379
   slots:0-5460 (5461 slots) master
   1 additional replica(s)
S: 9e112eed7f8a5830e907d97792c50a2171d9f13b 192.168.56.109:6390
   slots: (0 slots) slave
   replicates 50df897b5bad63a525a8d46998b30d47698d9cd9
M: 00974c9c1acede227f1ef25fd56460a1a19818a0 192.168.56.109:6380
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
S: e73dc8caf474076fdbeb4da346333bc8410c8486 192.168.56.108:6389
   slots: (0 slots) slave
   replicates 00974c9c1acede227f1ef25fd56460a1a19818a0
S: 3005c5adea38cc21cf47fb86cbe1e8cfd1cbfce7 192.168.56.110:6391
   slots: (0 slots) slave
   replicates 632f31d57d6fcbf48e277ed8cd34299188d2c675
M: 632f31d57d6fcbf48e277ed8cd34299188d2c675 192.168.56.110:6381
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
[root@redis01 bin]#

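--replicas sets how many slaves each master gets: --replicas 1 means one slave is assigned automatically to every master. With the 6 nodes above, the tool follows its own rules and creates 3 masters and 3 slaves.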

Note: if you hit the error ERR Slot xxxx is already busy (Redis::CommandError), fix it as follows:

1. Delete all nodes-xxxx.conf files
2. redis-cli -p xxxx flushall
3. redis-cli -p xxxx cluster reset soft

Now let's insert some data:
1. First check which nodes are masters:

[root@redis01 bin]# ./redis-trib.rb check 192.168.56.108:6379
>>> Performing Cluster Check (using node 192.168.56.108:6379)
S: 50df897b5bad63a525a8d46998b30d47698d9cd9 192.168.56.108:6379
   slots: (0 slots) slave
   replicates 9e112eed7f8a5830e907d97792c50a2171d9f13b
M: 632f31d57d6fcbf48e277ed8cd34299188d2c675 192.168.56.110:6381
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
M: 00974c9c1acede227f1ef25fd56460a1a19818a0 192.168.56.109:6380
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
S: 3005c5adea38cc21cf47fb86cbe1e8cfd1cbfce7 192.168.56.110:6391
   slots: (0 slots) slave
   replicates 632f31d57d6fcbf48e277ed8cd34299188d2c675
M: 9e112eed7f8a5830e907d97792c50a2171d9f13b 192.168.56.109:6390
   slots:0-5460 (5461 slots) master
   1 additional replica(s)
S: e73dc8caf474076fdbeb4da346333bc8410c8486 192.168.56.108:6389
   slots: (0 slots) slave
   replicates 00974c9c1acede227f1ef25fd56460a1a19818a0
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
[root@redis01 bin]#

The following command works as well:

[root@redis01 bin]# redis-cli -p 6389 cluster nodes
9e112eed7f8a5830e907d97792c50a2171d9f13b 192.168.56.109:6390 master - 0 1498756230880 7 connected 0-5460
00974c9c1acede227f1ef25fd56460a1a19818a0 192.168.56.109:6380 master - 0 1498756231884 2 connected 5461-10922
50df897b5bad63a525a8d46998b30d47698d9cd9 192.168.56.108:6379 slave 9e112eed7f8a5830e907d97792c50a2171d9f13b 0 1498756231381 7 connected
632f31d57d6fcbf48e277ed8cd34299188d2c675 192.168.56.110:6381 master - 0 1498756230377 3 connected 10923-16383
e73dc8caf474076fdbeb4da346333bc8410c8486 192.168.56.108:6389 myself,slave 00974c9c1acede227f1ef25fd56460a1a19818a0 0 0 4 connected
3005c5adea38cc21cf47fb86cbe1e8cfd1cbfce7 192.168.56.110:6391 slave 632f31d57d6fcbf48e277ed8cd34299188d2c675 0 1498756231381 6 connected
[root@redis01 bin]#
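
The cluster nodes output is space-separated, so finding the masters can be scripted. The sketch below assumes the redis 3.2 field order seen above (id, address, flags, master id, ping-sent, pong-recv, config-epoch, link-state, slots); the function names are hypothetical.

```python
def parse_cluster_nodes(text):
    """Parse `cluster nodes` output into a list of dicts."""
    nodes = []
    for line in text.strip().splitlines():
        parts = line.split()
        if len(parts) < 8:
            continue  # not a node line
        node_id, addr, flags, master_id = parts[:4]
        nodes.append({
            "id": node_id,
            "addr": addr,
            "flags": flags.split(","),
            "master_id": None if master_id == "-" else master_id,
            "link_state": parts[7],
            "slots": parts[8:],
        })
    return nodes

def masters(nodes):
    """Addresses of all nodes flagged as master."""
    return [n["addr"] for n in nodes if "master" in n["flags"]]

# Three lines copied from the output above
sample = """\
9e112eed7f8a5830e907d97792c50a2171d9f13b 192.168.56.109:6390 master - 0 1498756230880 7 connected 0-5460
50df897b5bad63a525a8d46998b30d47698d9cd9 192.168.56.108:6379 slave 9e112eed7f8a5830e907d97792c50a2171d9f13b 0 1498756231381 7 connected
e73dc8caf474076fdbeb4da346333bc8410c8486 192.168.56.108:6389 myself,slave 00974c9c1acede227f1ef25fd56460a1a19818a0 0 0 4 connected
"""
nodes = parse_cluster_nodes(sample)
print(masters(nodes))
```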

2. We connect to 192.168.56.109:6390 and work from there:

[root@redis01 bin]# redis-cli -h 192.168.56.109 -p 6390 -c
192.168.56.109:6390> keys *
(empty list or set)
192.168.56.109:6390> set name myname1
-> Redirected to slot [5798] located at 192.168.56.109:6380
OK
192.168.56.109:6380>

Note that redis-cli must be started with the -c option here. Without it, the command fails with an error:

192.168.56.108:6379> set name myname1
(error) MOVED 5798 192.168.56.109:6380
192.168.56.108:6379>
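
The redirect to slot 5798 follows from how redis cluster maps keys to slots: CRC16 of the key, modulo 16384. Below is a minimal sketch of that mapping, including the hash tag rule (only the text between the first "{" and the following "}" is hashed when it is non-empty). The function names are my own; the CRC variant is the XMODEM one.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 with polynomial 0x1021 and initial value 0 (XMODEM variant)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Hash slot (0..16383) that a key belongs to."""
    data = key.encode()
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Use the hash tag only when there is something between the braces
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    return crc16_xmodem(data) % 16384

print(key_slot("name"))  # 5798, the slot shown in the redirect above
```

Slot 5798 falls in the range 5461-10922, which is why the write was sent to 192.168.56.109:6380.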

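3. Let's try adding nodes:
3.1. First start 2 redis instances, with parameters modeled on the instances that are already running.
3.2. Add one instance to the cluster. Note that it joins as a master node.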

[root@redis01 bin]# pwd
/usr/local/bin
[root@redis01 bin]# ./redis-trib.rb add-node 192.168.56.108:6399 192.168.56.108:6379
>>> Adding node 192.168.56.108:6399 to cluster 192.168.56.108:6379
>>> Performing Cluster Check (using node 192.168.56.108:6379)
S: 50df897b5bad63a525a8d46998b30d47698d9cd9 192.168.56.108:6379
   slots: (0 slots) slave
   replicates 9e112eed7f8a5830e907d97792c50a2171d9f13b
S: e73dc8caf474076fdbeb4da346333bc8410c8486 192.168.56.108:6389
   slots: (0 slots) slave
   replicates 00974c9c1acede227f1ef25fd56460a1a19818a0
M: 00974c9c1acede227f1ef25fd56460a1a19818a0 192.168.56.109:6380
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
S: 3005c5adea38cc21cf47fb86cbe1e8cfd1cbfce7 192.168.56.110:6391
   slots: (0 slots) slave
   replicates 632f31d57d6fcbf48e277ed8cd34299188d2c675
M: 632f31d57d6fcbf48e277ed8cd34299188d2c675 192.168.56.110:6381
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
M: 9e112eed7f8a5830e907d97792c50a2171d9f13b 192.168.56.109:6390
   slots:0-5460 (5461 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Send CLUSTER MEET to node 192.168.56.108:6399 to make it join the cluster.
[OK] New node added correctly.
[root@redis01 bin]#

[root@redis01 bin]#   redis-cli -p 6389 cluster nodes |grep 6399
df5bf5d030453acddd4db106fda76a1d1687a22f 192.168.56.108:6399 master - 0 1498814734265 0 connected
[root@redis01 bin]#

Note the id df5bf5d030453acddd4db106fda76a1d1687a22f here; we will use it in a moment.

Add the slave node. Note that we pass the master id of the master node we just added:

[root@redis01 bin]# ./redis-trib.rb add-node --slave --master-id  df5bf5d030453acddd4db106fda76a1d1687a22f 192.168.56.109:6370 192.168.56.108:6399
>>> Adding node 192.168.56.109:6370 to cluster 192.168.56.108:6399
>>> Performing Cluster Check (using node 192.168.56.108:6399)
M: df5bf5d030453acddd4db106fda76a1d1687a22f 192.168.56.108:6399
   slots: (0 slots) master
   0 additional replica(s)
M: 632f31d57d6fcbf48e277ed8cd34299188d2c675 192.168.56.110:6381
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
S: 50df897b5bad63a525a8d46998b30d47698d9cd9 192.168.56.108:6379
   slots: (0 slots) slave
   replicates 9e112eed7f8a5830e907d97792c50a2171d9f13b
S: e73dc8caf474076fdbeb4da346333bc8410c8486 192.168.56.108:6389
   slots: (0 slots) slave
   replicates 00974c9c1acede227f1ef25fd56460a1a19818a0
S: 3005c5adea38cc21cf47fb86cbe1e8cfd1cbfce7 192.168.56.110:6391
   slots: (0 slots) slave
   replicates 632f31d57d6fcbf48e277ed8cd34299188d2c675
M: 9e112eed7f8a5830e907d97792c50a2171d9f13b 192.168.56.109:6390
   slots:0-5460 (5461 slots) master
   1 additional replica(s)
M: 00974c9c1acede227f1ef25fd56460a1a19818a0 192.168.56.109:6380
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Send CLUSTER MEET to node 192.168.56.109:6370 to make it join the cluster.
Waiting for the cluster to join...
>>> Configure node as replica of 192.168.56.108:6399.
[OK] New node added correctly.
[root@redis01 bin]#

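4. After the nodes are added, the data has not been redistributed; we need to reshard.

Reshard command:
Interactive:
./redis-trib.rb reshard [host]:[port]
Non-interactive:
./redis-trib.rb reshard --from [node-id] --to [node-id] --slots [number of slots] --yes [host]:[port]

Note the master with 0 slots in the output below: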

[root@redis01 bin]# ./redis-trib.rb check 192.168.56.108:6379
>>> Performing Cluster Check (using node 192.168.56.108:6379)
S: 50df897b5bad63a525a8d46998b30d47698d9cd9 192.168.56.108:6379
   slots: (0 slots) slave
   replicates 9e112eed7f8a5830e907d97792c50a2171d9f13b
M: df5bf5d030453acddd4db106fda76a1d1687a22f 192.168.56.108:6399
   slots: (0 slots) master
   1 additional replica(s)
S: e73dc8caf474076fdbeb4da346333bc8410c8486 192.168.56.108:6389
   slots: (0 slots) slave
   replicates 00974c9c1acede227f1ef25fd56460a1a19818a0
M: 00974c9c1acede227f1ef25fd56460a1a19818a0 192.168.56.109:6380
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
S: 3005c5adea38cc21cf47fb86cbe1e8cfd1cbfce7 192.168.56.110:6391
   slots: (0 slots) slave
   replicates 632f31d57d6fcbf48e277ed8cd34299188d2c675
M: 632f31d57d6fcbf48e277ed8cd34299188d2c675 192.168.56.110:6381
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
M: 9e112eed7f8a5830e907d97792c50a2171d9f13b 192.168.56.109:6390
   slots:0-5460 (5461 slots) master
   1 additional replica(s)
S: a724660e17bf5dbfd7266f33ed37d5eb952dd3d0 192.168.56.109:6370
   slots: (0 slots) slave
   replicates df5bf5d030453acddd4db106fda76a1d1687a22f
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
[root@redis01 bin]#

[root@redis01 bin]# ./redis-trib.rb reshard 192.168.56.108:6399
>>> Performing Cluster Check (using node 192.168.56.108:6399)
M: df5bf5d030453acddd4db106fda76a1d1687a22f 192.168.56.108:6399
   slots: (0 slots) master
   1 additional replica(s)
M: 632f31d57d6fcbf48e277ed8cd34299188d2c675 192.168.56.110:6381
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
S: 50df897b5bad63a525a8d46998b30d47698d9cd9 192.168.56.108:6379
   slots: (0 slots) slave
   replicates 9e112eed7f8a5830e907d97792c50a2171d9f13b
S: e73dc8caf474076fdbeb4da346333bc8410c8486 192.168.56.108:6389
   slots: (0 slots) slave
   replicates 00974c9c1acede227f1ef25fd56460a1a19818a0
S: 3005c5adea38cc21cf47fb86cbe1e8cfd1cbfce7 192.168.56.110:6391
   slots: (0 slots) slave
   replicates 632f31d57d6fcbf48e277ed8cd34299188d2c675
M: 9e112eed7f8a5830e907d97792c50a2171d9f13b 192.168.56.109:6390
   slots:0-5460 (5461 slots) master
   1 additional replica(s)
M: 00974c9c1acede227f1ef25fd56460a1a19818a0 192.168.56.109:6380
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
S: a724660e17bf5dbfd7266f33ed37d5eb952dd3d0 192.168.56.109:6370
   slots: (0 slots) slave
   replicates df5bf5d030453acddd4db106fda76a1d1687a22f
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
How many slots do you want to move (from 1 to 16384)? 4096
What is the receiving node ID? df5bf5d030453acddd4db106fda76a1d1687a22f
Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
Source node #1:all

Ready to move 4096 slots.
  Source nodes:
    M: 632f31d57d6fcbf48e277ed8cd34299188d2c675 192.168.56.110:6381
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
    M: 9e112eed7f8a5830e907d97792c50a2171d9f13b 192.168.56.109:6390
   slots:0-5460 (5461 slots) master
   1 additional replica(s)
    M: 00974c9c1acede227f1ef25fd56460a1a19818a0 192.168.56.109:6380
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
  Destination node:
    M: df5bf5d030453acddd4db106fda76a1d1687a22f 192.168.56.108:6399
   slots: (0 slots) master
   1 additional replica(s)
  Resharding plan:
    Moving slot 5461 from 00974c9c1acede227f1ef25fd56460a1a19818a0
    Moving slot 5462 from 00974c9c1acede227f1ef25fd56460a1a19818a0
    Moving slot 5463 from 00974c9c1acede227f1ef25fd56460a1a19818a0
    Moving slot 5464 from 00974c9c1acede227f1ef25fd56460a1a19818a0
    Moving slot 5465 from 00974c9c1acede227f1ef25fd56460a1a19818a0
    Moving slot 5466 from 00974c9c1acede227f1ef25fd56460a1a19818a0
    ……
    Moving slot 12285 from 192.168.56.110:6381 to 192.168.56.108:6399:
    Moving slot 12286 from 192.168.56.110:6381 to 192.168.56.108:6399:
    Moving slot 12287 from 192.168.56.110:6381 to 192.168.56.108:6399:
[root@redis01 bin]#

Checking again after the reshard, every master now holds 4096 slots. (There are 16384 slots in total and now 4 masters, so an even split is 4096 slots each.)

[root@redis01 bin]# ./redis-trib.rb check 192.168.56.108:6379
>>> Performing Cluster Check (using node 192.168.56.108:6379)
S: 50df897b5bad63a525a8d46998b30d47698d9cd9 192.168.56.108:6379
   slots: (0 slots) slave
   replicates 9e112eed7f8a5830e907d97792c50a2171d9f13b
M: df5bf5d030453acddd4db106fda76a1d1687a22f 192.168.56.108:6399
   slots:0-1364,5461-6826,10923-12287 (4096 slots) master
   1 additional replica(s)
S: e73dc8caf474076fdbeb4da346333bc8410c8486 192.168.56.108:6389
   slots: (0 slots) slave
   replicates 00974c9c1acede227f1ef25fd56460a1a19818a0
M: 00974c9c1acede227f1ef25fd56460a1a19818a0 192.168.56.109:6380
   slots:6827-10922 (4096 slots) master
   1 additional replica(s)
S: 3005c5adea38cc21cf47fb86cbe1e8cfd1cbfce7 192.168.56.110:6391
   slots: (0 slots) slave
   replicates 632f31d57d6fcbf48e277ed8cd34299188d2c675
M: 632f31d57d6fcbf48e277ed8cd34299188d2c675 192.168.56.110:6381
   slots:12288-16383 (4096 slots) master
   1 additional replica(s)
M: 9e112eed7f8a5830e907d97792c50a2171d9f13b 192.168.56.109:6390
   slots:1365-5460 (4096 slots) master
   1 additional replica(s)
S: a724660e17bf5dbfd7266f33ed37d5eb952dd3d0 192.168.56.109:6370
   slots: (0 slots) slave
   replicates df5bf5d030453acddd4db106fda76a1d1687a22f
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
[root@redis01 bin]#
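
In the check output above, the new master received 1365, 1366 and 1365 slots from the three old masters, which adds up to 4096. That is consistent with a proportional split in which the largest source rounds up and the others round down. The sketch below reproduces those numbers under that assumption; it is an illustration and not the redis-trib.rb code itself.

```python
import math

def reshard_shares(source_slot_counts, slots_to_move):
    """Split slots_to_move across sources in proportion to the slots they hold.

    Assumption: sources are taken largest first, the first share is rounded
    up and the rest are rounded down.
    """
    ordered = sorted(source_slot_counts, reverse=True)
    total = sum(ordered)
    shares = []
    for i, count in enumerate(ordered):
        share = slots_to_move / total * count
        shares.append(math.ceil(share) if i == 0 else math.floor(share))
    return shares

# Three masters with 5461, 5462 and 5461 slots give up 4096 slots in total
shares = reshard_shares([5461, 5462, 5461], 16384 // 4)
print(shares, sum(shares))
```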

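5. Remove nodes. The steps run in reverse: remove the slave node, reshard the data onto the nodes that will stay, then remove the master node.
Remove the slave node: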

[root@redis01 bin]# ./redis-trib.rb del-node 192.168.56.109:6370 'a724660e17bf5dbfd7266f33ed37d5eb952dd3d0'
>>> Removing node a724660e17bf5dbfd7266f33ed37d5eb952dd3d0 from cluster 192.168.56.109:6370
>>> Sending CLUSTER FORGET messages to the cluster...
>>> SHUTDOWN the node.
[root@redis01 bin]#
[root@redis01 bin]#

Reshard the data:

[root@redis01 bin]# ./redis-trib.rb check 192.168.56.108:6379
>>> Performing Cluster Check (using node 192.168.56.108:6379)
S: 50df897b5bad63a525a8d46998b30d47698d9cd9 192.168.56.108:6379
   slots: (0 slots) slave
   replicates 9e112eed7f8a5830e907d97792c50a2171d9f13b
M: df5bf5d030453acddd4db106fda76a1d1687a22f 192.168.56.108:6399
   slots:0-1364,5461-6826,10923-12287 (4096 slots) master
   0 additional replica(s)
S: e73dc8caf474076fdbeb4da346333bc8410c8486 192.168.56.108:6389
   slots: (0 slots) slave
   replicates 00974c9c1acede227f1ef25fd56460a1a19818a0
M: 00974c9c1acede227f1ef25fd56460a1a19818a0 192.168.56.109:6380
   slots:6827-10922 (4096 slots) master
   1 additional replica(s)
S: 3005c5adea38cc21cf47fb86cbe1e8cfd1cbfce7 192.168.56.110:6391
   slots: (0 slots) slave
   replicates 632f31d57d6fcbf48e277ed8cd34299188d2c675
M: 632f31d57d6fcbf48e277ed8cd34299188d2c675 192.168.56.110:6381
   slots:12288-16383 (4096 slots) master
   1 additional replica(s)
M: 9e112eed7f8a5830e907d97792c50a2171d9f13b 192.168.56.109:6390
   slots:1365-5460 (4096 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
[root@redis01 bin]#
[root@redis01 bin]#
[root@redis01 bin]# ./redis-trib.rb reshard 192.168.56.108:6399
>>> Performing Cluster Check (using node 192.168.56.108:6399)
M: df5bf5d030453acddd4db106fda76a1d1687a22f 192.168.56.108:6399
   slots:0-1364,5461-6826,10923-12287 (4096 slots) master
   0 additional replica(s)
M: 632f31d57d6fcbf48e277ed8cd34299188d2c675 192.168.56.110:6381
   slots:12288-16383 (4096 slots) master
   1 additional replica(s)
S: 50df897b5bad63a525a8d46998b30d47698d9cd9 192.168.56.108:6379
   slots: (0 slots) slave
   replicates 9e112eed7f8a5830e907d97792c50a2171d9f13b
S: e73dc8caf474076fdbeb4da346333bc8410c8486 192.168.56.108:6389
   slots: (0 slots) slave
   replicates 00974c9c1acede227f1ef25fd56460a1a19818a0
S: 3005c5adea38cc21cf47fb86cbe1e8cfd1cbfce7 192.168.56.110:6391
   slots: (0 slots) slave
   replicates 632f31d57d6fcbf48e277ed8cd34299188d2c675
M: 9e112eed7f8a5830e907d97792c50a2171d9f13b 192.168.56.109:6390
   slots:1365-5460 (4096 slots) master
   1 additional replica(s)
M: 00974c9c1acede227f1ef25fd56460a1a19818a0 192.168.56.109:6380
   slots:6827-10922 (4096 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
How many slots do you want to move (from 1 to 16384)? 4096
What is the receiving node ID? 9e112eed7f8a5830e907d97792c50a2171d9f13b //master id of the receiving node
Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
Source node #1:df5bf5d030453acddd4db106fda76a1d1687a22f //master id of the node to be removed
Source node #2:done

Check that the node now holds 0 slots:

[root@redis01 bin]# ./redis-trib.rb check 192.168.56.108:6379
>>> Performing Cluster Check (using node 192.168.56.108:6379)
S: 50df897b5bad63a525a8d46998b30d47698d9cd9 192.168.56.108:6379
   slots: (0 slots) slave
   replicates 9e112eed7f8a5830e907d97792c50a2171d9f13b
M: df5bf5d030453acddd4db106fda76a1d1687a22f 192.168.56.108:6399
   slots: (0 slots) master
   0 additional replica(s)
S: e73dc8caf474076fdbeb4da346333bc8410c8486 192.168.56.108:6389
   slots: (0 slots) slave
   replicates 00974c9c1acede227f1ef25fd56460a1a19818a0
M: 00974c9c1acede227f1ef25fd56460a1a19818a0 192.168.56.109:6380
   slots:8192-10922 (2731 slots) master
   1 additional replica(s)
S: 3005c5adea38cc21cf47fb86cbe1e8cfd1cbfce7 192.168.56.110:6391
   slots: (0 slots) slave
   replicates 632f31d57d6fcbf48e277ed8cd34299188d2c675
M: 632f31d57d6fcbf48e277ed8cd34299188d2c675 192.168.56.110:6381
   slots:13654-16383 (2730 slots) master
   1 additional replica(s)
M: 9e112eed7f8a5830e907d97792c50a2171d9f13b 192.168.56.109:6390
   slots:0-8191,10923-13653 (10923 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
[root@redis01 bin]#

Remove the master node once it is confirmed to hold no slots:

[root@redis01 bin]#
[root@redis01 bin]# ./redis-trib.rb del-node 192.168.56.108:6399 'df5bf5d030453acddd4db106fda76a1d1687a22f'
>>> Removing node df5bf5d030453acddd4db106fda76a1d1687a22f from cluster 192.168.56.108:6399
>>> Sending CLUSTER FORGET messages to the cluster...
>>> SHUTDOWN the node.
[root@redis01 bin]#

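IV. Redis monitoring:

Redis monitoring relies mainly on the output of the info command.

1. Memory usage
If Redis uses more memory than the physical memory available, it will very likely be killed by the system's OOM Killer. To guard against this, monitor used_memory and used_memory_peak from the info command, set a threshold on memory usage, and wire it to an alert. Alerting is only a means, of course. What matters is deciding in advance what to do when memory usage grows too large: purge cold data that is no longer needed, or move Redis to a more powerful machine.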

# Memory
used_memory:822504
used_memory_human:803.23K
used_memory_rss:3960832
used_memory_rss_human:3.78M
used_memory_peak:822504
used_memory_peak_human:803.23K
total_system_memory:16803835904
total_system_memory_human:15.65G
used_memory_lua:37888
used_memory_lua_human:37.00K
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
mem_fragmentation_ratio:4.82
mem_allocator:jemalloc-4.0.3
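
A threshold check on used_memory can be sketched as follows. The parser assumes the key:value layout of the info output above; parse_info, memory_alert and the limit values are illustrative, and how you fetch the info text (for example with redis-cli info memory) is left out.

```python
def parse_info(text):
    """Turn `info` output into a dict, skipping blank lines and # section headers."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        info[key] = value
    return info

def memory_alert(info, limit_bytes):
    """True when used_memory is above the chosen limit."""
    return int(info["used_memory"]) > limit_bytes

sample = """\
# Memory
used_memory:822504
used_memory_peak:822504
maxmemory:0
"""
info = parse_info(sample)
print(memory_alert(info, limit_bytes=1_000_000))
```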

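2. Persistence
If Redis crashes because of a problem with the machine or with Redis itself, the dumped rdb file may be your only lifeline, so monitoring the Redis dump is important too. Monitor rdb_last_save_time to know when the last dump happened, and monitor rdb_changes_since_last_save to know how many changes you would lose if a failure happened right now.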

# Persistence
loading:0
rdb_changes_since_last_save:35
rdb_bgsave_in_progress:0
rdb_last_save_time:1498833577
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
aof_enabled:1
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_current_size:1621
aof_base_size:1621
aof_pending_rewrite:0
aof_buffer_length:0
aof_rewrite_buffer_length:0
aof_pending_bio_fsync:0
aof_delayed_fsync:0
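
The two persistence fields can be turned into a small report. This sketch takes the fields as a dict of strings, as they appear in the info output; persistence_report and the one-hour default are assumptions made for illustration.

```python
import time

def persistence_report(info, now=None, max_age_s=3600):
    """Summarise how old the last rdb dump is and how much is unsaved."""
    now = time.time() if now is None else now
    age = now - int(info["rdb_last_save_time"])
    return {
        "seconds_since_last_save": age,
        "unsaved_changes": int(info["rdb_changes_since_last_save"]),
        "stale": age > max_age_s,
        "last_bgsave_ok": info.get("rdb_last_bgsave_status") == "ok",
    }

# Values from the info output above, checked two hours after the last save
info = {
    "rdb_last_save_time": "1498833577",
    "rdb_changes_since_last_save": "35",
    "rdb_last_bgsave_status": "ok",
}
report = persistence_report(info, now=1498833577 + 7200)
print(report)
```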

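3. Master-slave replication
If you use master-slave replication, you should monitor whether replication is healthy, mainly through master_link_status in the info output. If the value is up, replication is working; if it is down, look at the other diagnostic fields in the output.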

# Replication
role:slave
master_host:192.168.56.109
master_port:6380
master_link_status:up
master_last_io_seconds_ago:3
master_sync_in_progress:0
slave_repl_offset:3011
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
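
A replication check on a slave could look like the sketch below. It reads the fields shown above from a dict; the function name and the 10-second threshold are assumptions, not redis defaults.

```python
def replication_problems(info, max_io_gap_s=10):
    """Return a list of human-readable problems; empty when all looks fine."""
    if info.get("role") != "slave":
        return []
    problems = []
    if info.get("master_link_status") != "up":
        problems.append("master_link_status is not up")
    if int(info.get("master_last_io_seconds_ago", "0")) > max_io_gap_s:
        problems.append("no data from master recently")
    return problems

# Values from the info output above
healthy = {
    "role": "slave",
    "master_link_status": "up",
    "master_last_io_seconds_ago": "3",
}
print(replication_problems(healthy))
```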

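4. Fork performance
When Redis persists data to disk it performs a fork, relying on fork's copy-on-write memory semantics to get a memory snapshot as cheaply as possible. Although the memory itself is copy-on-write, the page table has to be copied at the moment of the fork, so a fork stalls the main thread briefly (all reads and writes stop), and the length of the stall depends on how much memory Redis is using. For a Redis instance holding gigabytes of data, a fork typically takes on the order of milliseconds. Monitor latest_fork_usec in the info output to see how long the most recent fork stalled.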

# Stats
total_connections_received:1
total_commands_processed:16
instantaneous_ops_per_sec:0
total_net_input_bytes:477
total_net_output_bytes:6000613
instantaneous_input_kbps:0.00
instantaneous_output_kbps:0.00
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
evicted_keys:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:206
migrate_cached_sockets:0

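5. Configuration consistency
Redis supports changing configuration at runtime with CONFIG SET. This is convenient, but it has a drawback: settings changed this way are not written back to your configuration file, so when Redis is restarted for any reason, the changes made with CONFIG SET are lost. It is best to update the configuration file every time you use CONFIG SET. To guard against human error, monitor the configuration as well: use CONFIG GET to read the current runtime settings, compare them with the values in redis.conf, and raise an alert when the two do not match.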
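
The comparison between redis.conf and the runtime values can be sketched as below. The runtime settings are passed in as a plain dict, as you might build from CONFIG GET replies; both function names are hypothetical, and directives that may appear several times (such as save) are not handled.

```python
def parse_redis_conf(text):
    """Read `name value` lines from redis.conf text, ignoring comments."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition(" ")
        conf[name.lower()] = value.strip().strip('"')
    return conf

def config_drift(file_conf, runtime_conf):
    """Map each differing setting to (value in file, value at runtime)."""
    drift = {}
    for name, file_value in file_conf.items():
        live = runtime_conf.get(name)
        if live is not None and live != file_value:
            drift[name] = (file_value, live)
    return drift

conf_text = """\
## cluster settings
cluster-node-timeout 5000
protected-mode no
"""
# Pretend someone ran: CONFIG SET cluster-node-timeout 15000
runtime = {"cluster-node-timeout": "15000", "protected-mode": "no"}
print(config_drift(parse_redis_conf(conf_text), runtime))
```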

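6. Monitoring services
-Sentinel
Sentinel is a tool that ships with Redis. It monitors Redis master-slave replication and performs automatic failover when the master goes down. During the failover it can also be configured to run a user-defined script, in which we can implement alert notifications and similar actions.

-Redis Live
Redis Live is a more general Redis monitoring solution. It works by periodically running the MONITOR command on Redis to capture the commands currently being executed, then uses statistical analysis to produce visual reports as web pages.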

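7. Data distribution
Working out how data is distributed inside Redis is hard, for example when you want to know which kind of key uses the most memory. The tools below can help you analyze a Redis data set.

-Redis-sampler
Redis-sampler is a tool written by the author of Redis. It uses sampling to give you a rough picture of the data types in the current Redis instance and how the data is distributed.

-Redis-audit
Redis-audit is a script that shows how much memory each class of keys uses. It can report how often a class of keys is accessed, how many values have an expiry set, and how much memory a class of keys uses, which makes it easy to find keys that are rarely or never used.

-Redis-rdb-tools
Redis-rdb-tools is similar to Redis-audit, except that it obtains its statistics by analyzing the rdb file.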

V. Dockerizing Redis:

1. Install redis on docker
First search for the available redis images:

LoveHousedeiMac:~ lovehouse$ docker search redis
NAME                      DESCRIPTION                                     STARS     OFFICIAL   AUTOMATED
redis                     Redis is an open source key-value store th...   3866      [OK]
sameersbn/redis                                                           54                   [OK]
bitnami/redis             Bitnami Redis Docker Image                      50                   [OK]
torusware/speedus-redis   Always updated official Redis docker image...   32                   [OK]
webhippie/redis           Docker images for redis                         7                    [OK]
anapsix/redis             11MB Redis server image over AlpineLinux        6                    [OK]
williamyeh/redis          Redis image for Docker                          3                    [OK]
clue/redis-benchmark      A minimal docker image to ease running the...   3                    [OK]
unblibraries/redis        Leverages phusion/baseimage to deploy a ba...   2                    [OK]
abzcoding/tomcat-redis    a tomcat container with redis as session m...   2                    [OK]
miko2u/redis              Redis                                           1                    [OK]
greytip/redis             redis 3.0.3                                     1                    [OK]
frodenas/redis            A Docker Image for Redis                        1                    [OK]
xataz/redis               Light redis image                               1                    [OK]
nanobox/redis             Redis service for nanobox.io                    0                    [OK]
maestrano/redis           Redis is an open source key-value store th...   0                    [OK]
cloudposse/redis          Standalone redis service                        0                    [OK]
watsco/redis              Watsco redis base                               0                    [OK]
appelgriebsch/redis       Configurable redis container based on Alpi...   0                    [OK]
maxird/redis              Redis                                           0                    [OK]
trelllis/redis            Redis Primary                                   0                    [OK]
drupaldocker/redis        Redis for Drupal                                0                    [OK]
yfix/redis                Yfix docker redis                               0                    [OK]
higebu/redis-commander    Redis Commander Docker image. https://gith...   0                    [OK]
continuouspipe/redis      Redis                                           0                    [OK]
LoveHousedeiMac:~ lovehouse$

Pull the image:

LoveHousedeiMac:~ lovehouse$ docker pull redis:latest
latest: Pulling from library/redis
23e3d0773492: Pull complete
bc8f870e2eab: Pull complete
9fb63685a3db: Pull complete
7d5f2d3e9188: Pull complete
4b386c0238f4: Pull complete
33c08d492082: Pull complete
Digest: sha256:6022356f9d729c858000fc10fc1b09d1624ba099227a0c5d314f7461c2fe6020
Status: Downloaded newer image for redis:latest
LoveHousedeiMac:~ lovehouse$

When pulling, it is best to use a network connection that reliably gets past the GFW; otherwise you keep hitting errors like this:

error pulling image configuration: Get https://dseasb33srnrn.cloudfront.net/registry-v2/docker/registry/v2/blobs/sha256/83/83744227b191fbc32e3bcb293c1b90ecdb86b3636d02b1a0db009effb3a5b8de/data?Expires=1497887535&Signature=aVLHPVuNv4zjReHDu8ZLum23CgZrSJkmU1~WZzy1mOQdcYu1gVvepxZeV4j44DCCfvM56VCewGzl7FFdNxev4Mtm~KpmKJjHFNQtavJNmu1nqx4MEhdjJNKWX8KNeFuL-euTU7hCwVzrzUs8OIeGO3RKhiva7w0KIFc7ql-xHC8_&Key-Pair-Id=APKAJECH5M7VWIS5YZ6Q: net/http: TLS handshake timeout

Install and start Redis with appendonly set to yes. Note that we map the container's /data directory to the local directory /Users/[username]/redisdata so that the data is persisted:

LoveHousedeiMac:~ lovehouse$ docker run -p 6379:6379 -v /Users/lovehouse/redisdata:/data  -d redis:latest redis-server --appendonly yes
1fa497550b7e232eee63e050ff5e0f12c530aee992c158138af75b9442c7403f
LoveHousedeiMac:~ lovehouse$
LoveHousedeiMac:~ lovehouse$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                    NAMES
1fa497550b7e        redis:latest        "docker-entrypoint..."   5 seconds ago       Up 6 seconds        0.0.0.0:6379->6379/tcp   pensive_sinoussi
LoveHousedeiMac:~ lovehouse$
LoveHousedeiMac:~ lovehouse$ docker ps -a
CONTAINER ID        IMAGE                         COMMAND                  CREATED             STATUS                        PORTS                    NAMES
1fa497550b7e        redis:latest                  "docker-entrypoint..."   15 seconds ago      Up 16 seconds                 0.0.0.0:6379->6379/tcp   pensive_sinoussi
c9f09116cc83        oracle/database:12.2.0.1-ee   "/bin/sh -c 'exec ..."   4 weeks ago         Exited (137) 42 minutes ago                            oracle
LoveHousedeiMac:~ lovehouse$

Open a shell in the Redis container and run redis-cli:

LoveHousedeiMac:~ lovehouse$ docker exec -it 1fa497550b7e /bin/bash
root@1fa497550b7e:/data#
root@1fa497550b7e:/data#
root@1fa497550b7e:/data# redis-cli
127.0.0.1:6379>
127.0.0.1:6379>
127.0.0.1:6379> info
# Server
redis_version:3.2.9
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:d837dd4aae3a6933
redis_mode:standalone
os:Linux 4.9.27-moby x86_64
arch_bits:64
multiplexing_api:epoll
gcc_version:4.9.2
process_id:1
run_id:a490709296a4a15606af8650b4ae2eb922de81ff
tcp_port:6379
uptime_in_seconds:135
uptime_in_days:0
hz:10
lru_clock:4716607
executable:/data/redis-server
config_file:

# Clients
connected_clients:1
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0

# Memory
used_memory:822232
used_memory_human:802.96K
used_memory_rss:4005888
used_memory_rss_human:3.82M
used_memory_peak:822232
used_memory_peak_human:802.96K
total_system_memory:16803835904
total_system_memory_human:15.65G
used_memory_lua:37888
used_memory_lua_human:37.00K
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
mem_fragmentation_ratio:4.87
mem_allocator:jemalloc-4.0.3

# Persistence
loading:0
rdb_changes_since_last_save:0
rdb_bgsave_in_progress:0
rdb_last_save_time:1497888696
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
aof_enabled:1
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_current_size:0
aof_base_size:0
aof_pending_rewrite:0
aof_buffer_length:0
aof_rewrite_buffer_length:0
aof_pending_bio_fsync:0
aof_delayed_fsync:0

# Stats
total_connections_received:1
total_commands_processed:1
instantaneous_ops_per_sec:0
total_net_input_bytes:31
total_net_output_bytes:6005118
instantaneous_input_kbps:0.00
instantaneous_output_kbps:0.00
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
evicted_keys:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:0
migrate_cached_sockets:0

# Replication
role:master
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

# CPU
used_cpu_sys:0.09
used_cpu_user:0.03
used_cpu_sys_children:0.00
used_cpu_user_children:0.00

# Cluster
cluster_enabled:0

# Keyspace
127.0.0.1:6379>

Alternatively, docker run -it redis:latest redis-cli also works (here pointed at the host IP with -h):

LoveHousedeiMac:~ lovehouse$ docker run -it redis:latest redis-cli -h 192.168.1.207
192.168.1.207:6379> info
# Server
redis_version:3.2.9
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:d837dd4aae3a6933
redis_mode:standalone
os:Linux 4.9.27-moby x86_64
arch_bits:64
multiplexing_api:epoll
gcc_version:4.9.2
process_id:1
run_id:a490709296a4a15606af8650b4ae2eb922de81ff
tcp_port:6379
uptime_in_seconds:262
uptime_in_days:0
hz:10
lru_clock:4716734
executable:/data/redis-server
config_file:

# Clients
connected_clients:1
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0

# Memory
used_memory:822232
used_memory_human:802.96K
used_memory_rss:4005888
used_memory_rss_human:3.82M
used_memory_peak:822232
used_memory_peak_human:802.96K
total_system_memory:16803835904
total_system_memory_human:15.65G
used_memory_lua:37888
used_memory_lua_human:37.00K
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
mem_fragmentation_ratio:4.87
mem_allocator:jemalloc-4.0.3

# Persistence
loading:0
rdb_changes_since_last_save:0
rdb_bgsave_in_progress:0
rdb_last_save_time:1497888696
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
aof_enabled:1
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_current_size:0
aof_base_size:0
aof_pending_rewrite:0
aof_buffer_length:0
aof_rewrite_buffer_length:0
aof_pending_bio_fsync:0
aof_delayed_fsync:0

# Stats
total_connections_received:2
total_commands_processed:5
instantaneous_ops_per_sec:0
total_net_input_bytes:122
total_net_output_bytes:11980083
instantaneous_input_kbps:0.00
instantaneous_output_kbps:0.00
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
evicted_keys:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:0
migrate_cached_sockets:0

# Replication
role:master
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

# CPU
used_cpu_sys:0.16
used_cpu_user:0.05
used_cpu_sys_children:0.00
used_cpu_user_children:0.00

# Cluster
cluster_enabled:0

# Keyspace
192.168.1.207:6379>

2. Back up, migrate, and clone Docker images:
2.1 Check the existing containers:

LoveHousedeiMac:~ lovehouse$ docker ps -a
CONTAINER ID        IMAGE                         COMMAND                  CREATED             STATUS                     PORTS               NAMES
f98eeeebda7e        redis:latest                  "docker-entrypoint..."   2 weeks ago         Exited (137) 2 weeks ago                       quizzical_torvalds
1a6e061b7233        redis:latest                  "docker-entrypoint..."   2 weeks ago         Exited (0) 2 weeks ago                         hungry_spence
1fa497550b7e        redis:latest                  "docker-entrypoint..."   2 weeks ago         Exited (0) 3 days ago                          pensive_sinoussi
c9f09116cc83        oracle/database:12.2.0.1-ee   "/bin/sh -c 'exec ..."   6 weeks ago         Exited (137) 10 days ago                       oracle
LoveHousedeiMac:~ lovehouse$

2.2 Stop the container and commit it to an image:

LoveHousedeiMac:~ lovehouse$ docker stop pensive_sinoussi
pensive_sinoussi
LoveHousedeiMac:~ lovehouse$
LoveHousedeiMac:~ lovehouse$
LoveHousedeiMac:~ lovehouse$ docker commit -p 1fa497550b7e container-backup
sha256:b5dfe58c6528f02c7652f3261e1e60ea45c52aadf3f004d0dbd01acb0236f884
LoveHousedeiMac:~ lovehouse$
LoveHousedeiMac:~ lovehouse$

2.3 Check that the image has been created:

LoveHousedeiMac:~ lovehouse$ docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
container-backup    latest              b5dfe58c6528        9 seconds ago       98.9MB
redis               latest              83744227b191        3 weeks ago         98.9MB
oracle/database     12.2.0.1-ee         4f9df5f46a19        6 weeks ago         14.8GB
oraclelinux         7-slim              442ebf722584        2 months ago        114MB
LoveHousedeiMac:~ lovehouse$

2.4 Save the container-backup image to a tar file:

LoveHousedeiMac:idocker lovehouse$ docker save -o ./container-backup.tar container-backup
LoveHousedeiMac:idocker lovehouse$
LoveHousedeiMac:idocker lovehouse$ ls -l
total 200792
-rw-------   1 lovehouse  staff  102801920  7  5 00:34 container-backup.tar
drwxr-xr-x@ 19 lovehouse  staff        646  5 20 20:04 docker-images-master
drwxr-xr-x@  7 lovehouse  staff        238  6  2  2016 docker-redis-cluster-master
LoveHousedeiMac:idocker lovehouse$

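2.5 Prepare the backup so it can become redis_2. We copy the data directory and confirm that the container-backup image is available (on a different host, you would first import the tar file with docker load -i container-backup.tar):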

LoveHousedeiMac:~ lovehouse$ cp -pR redisdata redisdata_2
LoveHousedeiMac:~ lovehouse$ cd idocker
LoveHousedeiMac:idocker lovehouse$ ls
container-backup.tar		docker-images-master		docker-redis-cluster-master
LoveHousedeiMac:idocker lovehouse$
LoveHousedeiMac:idocker lovehouse$
LoveHousedeiMac:idocker lovehouse$ docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
container-backup    latest              b5dfe58c6528        8 minutes ago       98.9MB
redis               latest              83744227b191        3 weeks ago         98.9MB
oracle/database     12.2.0.1-ee         4f9df5f46a19        6 weeks ago         14.8GB
oraclelinux         7-slim              442ebf722584        2 months ago        114MB
LoveHousedeiMac:idocker lovehouse$
LoveHousedeiMac:idocker lovehouse$

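2.6 Create the second Redis with docker run. Note that the second Redis is mapped to host port 26379; if the port is left unchanged, it conflicts with the first Redis. (The command below mounts /Users/lovehouse/redis_2 rather than the redisdata_2 copy made in step 2.5; point -v at redisdata_2 if you want the copied data.)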

LoveHousedeiMac:idocker lovehouse$ docker run --name redis_2 -p 26379:6379 -v /Users/lovehouse/redis_2:/data container-backup:latest
                _._
           _.-``__ ''-._
      _.-``    `.  `_.  ''-._           Redis 3.2.9 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._
 (    '      ,       .-`  | `,    )     Running in standalone mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6379
 |    `-._   `._    /     _.-'    |     PID: 1
  `-._    `-._  `-./  _.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |           http://redis.io
  `-._    `-._`-.__.-'_.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |
  `-._    `-._`-.__.-'_.-'    _.-'
      `-._    `-.__.-'    _.-'
          `-._        _.-'
              `-.__.-'

1:M 04 Jul 16:43:54.942 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 04 Jul 16:43:54.942 # Server started, Redis version 3.2.9
1:M 04 Jul 16:43:54.942 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
1:M 04 Jul 16:43:54.943 * The server is now ready to accept connections on port 6379

2.7 Start the second Redis:

LoveHousedeiMac:idocker lovehouse$ docker start redis_2
redis_2
LoveHousedeiMac:idocker lovehouse$

2.8 Check that both Redis instances are deployed:

LoveHousedeiMac:idocker lovehouse$ docker ps
CONTAINER ID        IMAGE                     COMMAND                  CREATED              STATUS              PORTS                     NAMES
240382527b36        container-backup:latest   "docker-entrypoint..."   About a minute ago   Up 17 seconds       0.0.0.0:26379->6379/tcp   redis_2
1fa497550b7e        redis:latest              "docker-entrypoint..."   2 weeks ago          Up 3 minutes        0.0.0.0:6379->6379/tcp    pensive_sinoussi
LoveHousedeiMac:idocker lovehouse$
LoveHousedeiMac:idocker lovehouse$ docker run -it redis:latest redis-cli -h 192.168.1.207 -p 26379 info server
# Server
redis_version:3.2.9
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:d837dd4aae3a6933
redis_mode:standalone
os:Linux 4.9.31-moby x86_64
arch_bits:64
multiplexing_api:epoll
gcc_version:4.9.2
process_id:1
run_id:a326152e6689deb5bcf507354e92c53e13bbeaeb
tcp_port:6379
uptime_in_seconds:300
uptime_in_days:0
hz:10
lru_clock:6014761
executable:/data/redis-server
config_file:
LoveHousedeiMac:idocker lovehouse$
LoveHousedeiMac:idocker lovehouse$ docker run -it redis:latest redis-cli -h 192.168.1.207 -p 6379 info server
# Server
redis_version:3.2.9
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:d837dd4aae3a6933
redis_mode:standalone
os:Linux 4.9.31-moby x86_64
arch_bits:64
multiplexing_api:epoll
gcc_version:4.9.2
process_id:1
run_id:dd00a257e902c91c5316ae3aafb1d67f6e30270b
tcp_port:6379
uptime_in_seconds:478
uptime_in_days:0
hz:10
lru_clock:6014771
executable:/data/redis-server
config_file:
LoveHousedeiMac:idocker lovehouse$

VI. New features in Redis 4.0

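1. Module support.
Modules load external code through a high-level abstract API without changing the main branch of the Redis source code, which lets Redis offer more functionality. As I understand it, this is similar to hooks in PostgreSQL.

2. PSYNC v2
PSYNC (partial replication, i.e. incremental sync) has been improved.
Previously, the replica tried sending a psync command to the master, and the master decided whether the psync conditions were met. If they were, it returned +CONTINUE and did an incremental sync; otherwise it returned +FULLRESYNC runid offset.

Although psync handles a short replication disconnect followed by a reconnect, the following scenarios still needed a full resync:
a). The master or replica was restarted. The runid is lost on restart, so the old mechanism cannot do an incremental sync.
b). A replica was promoted to master. The other replicas switching to the new master all had to do a full resync, because the new master's runid differs from the old master's.

psync v2 adds a replid2 that records which master the instance was replicating from; replid2 is inherited from that master's replid. If the two instances once belonged to the same master (multi-level chains count too), the new master's replid2 is the previous master's replid. As long as they shared a master and the new master's replication progress is ahead of the replica's, an incremental sync is allowed.
So in scenario b) above, incremental sync still works after a replica is promoted to master.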
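
The decision described above can be sketched in Python. This is a simplified model and not Redis's actual implementation: the class and field names (Master, second_replid_offset, backlog_start, backlog_end) are assumptions made for illustration, and the replication backlog is reduced to a plain offset range.

```python
from dataclasses import dataclass


@dataclass
class Master:
    replid: str                 # current replication ID
    replid2: str                # replication ID inherited from the previous master
    second_replid_offset: int   # highest offset still valid under replid2
    backlog_start: int          # first offset still held in the backlog
    backlog_end: int            # current replication offset of this master


def psync_reply(master: Master, replica_replid: str, replica_offset: int) -> str:
    """Return '+CONTINUE' for an incremental sync, '+FULLRESYNC' otherwise."""
    # The replica must share history with this master, either directly
    # (same replid) or through the previous master (replid2).
    same_history = replica_replid == master.replid or (
        replica_replid == master.replid2
        and replica_offset <= master.second_replid_offset
    )
    # The data the replica is missing must still be in the backlog.
    in_backlog = master.backlog_start <= replica_offset <= master.backlog_end
    return "+CONTINUE" if same_history and in_backlog else "+FULLRESYNC"


# A replica promoted to master keeps the old master's ID as replid2.
new_master = Master("new-id", "old-id", 1000, 500, 1200)
print(psync_reply(new_master, "old-id", 900))    # shared history, offset in backlog
print(psync_reply(new_master, "other-id", 900))  # unrelated history
```

The first call models scenario b): the other replica still carries the old master's ID, so the promoted master can continue incrementally.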

3. Improved cache eviction policy.
An LFU (Least Frequently Used) eviction policy was added, which evicts the least frequently used cached data.
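
To make the idea concrete, here is a tiny exact LFU cache in Python. It is a conceptual sketch only: Redis itself tracks access frequency approximately rather than with exact per-key counters, and the LFUCache class below is a hypothetical name, not part of any Redis client.

```python
from collections import Counter


class LFUCache:
    """Tiny exact LFU cache: evicts the key with the fewest accesses."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.data = {}
        self.hits = Counter()

    def get(self, key):
        if key not in self.data:
            return None
        self.hits[key] += 1
        return self.data[key]

    def set(self, key, value) -> None:
        # Make room first if a new key would exceed the capacity.
        if key not in self.data and len(self.data) >= self.capacity:
            victim = min(self.data, key=lambda k: self.hits[k])
            del self.data[victim]
            del self.hits[victim]
        self.data[key] = value
        self.hits[key] += 1


cache = LFUCache(capacity=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")        # "a" is now used more often than "b"
cache.set("c", 3)     # evicts "b", the least frequently used key
print(sorted(cache.data))
```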

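4. Non-blocking DEL and FLUSHALL/FLUSHDB
Before Redis 4.0, deleting a large key with DEL, or deleting a database with a huge number of keys with FLUSHDB or FLUSHALL, could block the server.
Redis 4.0 provides the unlink command as an alternative to del. It removes keys asynchronously without blocking. (Note: del is kept for backward compatibility.)
Redis 4.0 also adds an async option to both flush commands, flushdb async and flushall async, to delete large numbers of keys asynchronously.

Redis 4.0 also provides swapdb, a command for swapping databases. For example, swapdb 0 1 swaps db0 and db1: every key that was in db0 ends up in db1.

5. Support for a mixed RDB-AOF persistence mode.
With it, Redis gets the advantages of both RDB persistence and AOF persistence: the rewrite file is generated quickly, and the data loads quickly when something goes wrong.

With mixed mode enabled, an aof file is loaded as follows:
a). If the aof file starts in rdb format, load the rdb content first and then the remaining aof.
b). If the aof file does not start in rdb format, load the whole file as aof.
To decide whether the front of the aof file is in rdb format, it is enough to check whether the first 5 characters are REDIS. That works because an rdb dump always starts with REDIS, while an aof file never does (in earlier versions the file always starts with *).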
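
The header check can be sketched in a few lines of Python. The helper names (starts_with_rdb_preamble, encode_command, write_sample) are made up for this example, and the sample "mixed" file contains a fake payload rather than a real RDB dump; only the 5-byte REDIS prefix and the leading * of a plain AOF come from the description above.

```python
import os
import tempfile

RDB_MAGIC = b"REDIS"


def starts_with_rdb_preamble(path: str) -> bool:
    """True if the file begins with the 5-byte RDB magic string."""
    with open(path, "rb") as f:
        return f.read(len(RDB_MAGIC)) == RDB_MAGIC


def encode_command(*args: str) -> bytes:
    """Encode a command as a plain AOF stores it: an array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        raw = arg.encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(raw), raw))
    return b"".join(parts)


def write_sample(content: bytes) -> str:
    """Write content to a temporary file and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(content)
    return path


# A plain AOF starts with '*'; a mixed file starts with the RDB magic.
plain_aof = write_sample(encode_command("SET", "k", "v"))
mixed_aof = write_sample(b"REDIS" + b"<fake rdb payload>" + encode_command("SET", "k", "v"))
print(starts_with_rdb_preamble(plain_aof), starts_with_rdb_preamble(mixed_aof))
os.remove(plain_aof)
os.remove(mixed_aof)
```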

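6. A new memory inspection command, memory, with subcommands such as memory stats, memory usage, and memory purge.

7. Added NAT support (mainly to address problems running Redis Cluster on Docker).

For more information, see: Redis 4.0 release notes; The first release candidate of Redis 4.0 is out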


Real-time materialized view: a 12.2 new feature for developers

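First, why do we need real-time MVs at all?

Before 12.2, if you wanted real-time data through query rewrite, you had to refresh the materialized view ON COMMIT. But ON COMMIT refresh has many restrictions, such as limits on SQL complexity and the load that frequent refreshes put on the system. So we often had to fall back to ON DEMAND refresh (complete or fast). ON DEMAND refresh needs a job that refreshes on a schedule, so if the base table changes after one job run and before the next, the data in the MV is not up to date.

Real-time MVs exist to solve exactly this problem. They give you real-time data without having to refresh the MV frequently.

Let's see how this works.

Creating a traditional MV: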

SQL> create table t1 (x not null primary key, y not null) as
  2    select rownum x, mod(rownum, 10) y from dual connect by level <= 1000000;

Table created.

SQL> create materialized view log on t1 with rowid (x, y) including new values;

Materialized view log created.

SQL>
SQL> create materialized view mv_old
  2  refresh fast on demand
  3  enable query rewrite
  4  as
  5    select y , count(*) c1
  6    from t1
  7    group by y;

Materialized view created.

SQL>
SQL>

Creating a real-time MV:
Note the keyword in the create statement: enable on query computation

SQL> create table t2 (x not null primary key, y not null) as
  2    select rownum x, mod(rownum, 10) y from dual connect by level <= 1000000;

Table created.

SQL> create materialized view log on t2 with rowid (x, y) including new values;

Materialized view log created.

SQL>
SQL> create materialized view mv_new
  2  refresh fast on demand
  3  enable on query computation
  4  enable query rewrite
  5  as
  6    select y , count(*) c1
  7    from t2
  8    group by y;

Materialized view created.

SQL>
SQL>

Let's compare the traditional MV and the real-time MV.
Relevant parameters:

SQL> show parameter rewr

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
query_rewrite_enabled                string      TRUE
query_rewrite_integrity              string      enforced
SQL>
SQL>
SQL>
SQL>
SQL> set autotrace on explain stat
SQL>

Initial state:
Traditional MV:

SQL> select  y as y_new_parse1, count(*) from t1
  2  group by y;

Y_NEW_PARSE1   COUNT(*)
------------ ----------
           1     100000
           6     100000
           2     100000
           4     100000
           5     100000
           8     100000
           3     100000
           7     100000
           9     100000
           0     100000

10 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 2738786661

---------------------------------------------------------------------------------------
| Id  | Operation                    | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |        |    10 |    60 |     3   (0)| 00:00:01 |
|   1 |  MAT_VIEW REWRITE ACCESS FULL| MV_OLD |    10 |    60 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
       1029  recursive calls
          2  db block gets
       1587  consistent gets
         76  physical reads
          0  redo size
        739  bytes sent via SQL*Net to client
        608  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
         86  sorts (memory)
          0  sorts (disk)
         10  rows processed

SQL>
SQL>

Real time mv:

SQL> select  y as y_new_parse1, count(*) from t2
  2  group by y;

Y_NEW_PARSE1   COUNT(*)
------------ ----------
           1     100000
           6     100000
           2     100000
           4     100000
           5     100000
           8     100000
           3     100000
           7     100000
           9     100000
           0     100000

10 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 496717744

---------------------------------------------------------------------------------------
| Id  | Operation                    | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |        |    10 |    60 |     3   (0)| 00:00:01 |
|   1 |  MAT_VIEW REWRITE ACCESS FULL| MV_NEW |    10 |    60 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
        170  recursive calls
         13  db block gets
        248  consistent gets
          7  physical reads
       2008  redo size
        739  bytes sent via SQL*Net to client
        608  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
         21  sorts (memory)
          0  sorts (disk)
         10  rows processed

SQL>

At this point both materialized views are up to date; staleness shows FRESH:

SQL> select mview_name,staleness,on_query_computation from user_mviews;

MVIEW_NAME                               STALENESS           O
---------------------------------------- ------------------- -
MV_OLD                                   FRESH               N
MV_NEW                                   FRESH               Y

SQL>

The materialized view logs contain no rows either:

SQL> select count(*) from MLOG$_T1;

  COUNT(*)
----------
         0

SQL> select count(*) from MLOG$_T2;

  COUNT(*)
----------
         0

SQL>

Now we insert rows into the base tables:

SQL> insert into t1
  2  select 1000000+rownum, mod(rownum, 3) from dual connect by level <= 999;

999 rows created.

SQL>
SQL> insert into t2
  2  select 1000000+rownum, mod(rownum, 3) from dual connect by level <= 999;

999 rows created.

SQL> commit;

Commit complete.

SQL>

The staleness of both MVs has changed to NEEDS_COMPILE, and the materialized view log tables now contain log rows:

SQL> select mview_name,staleness,on_query_computation from user_mviews;

MVIEW_NAME                               STALENESS           O
---------------------------------------- ------------------- -
MV_OLD                                   NEEDS_COMPILE       N
MV_NEW                                   NEEDS_COMPILE       Y

SQL>
SQL> select count(*) from MLOG$_T1;

  COUNT(*)
----------
       999

SQL> select count(*) from MLOG$_T2;

  COUNT(*)
----------
       999

SQL>

Now for the moment of truth. First we repeat the first query above. Because the data is stale and query_rewrite_integrity is not set to stale_tolerated, the traditional MV is not used for query rewrite.

SQL> select  y as y_new_parse1, count(*) from t1
  2  group by y;

Y_NEW_PARSE1   COUNT(*)
------------ ----------
           1     100333
           6     100000
           2     100333
           4     100000
           5     100000
           8     100000
           3     100000
           7     100000
           9     100000
           0     100333

10 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 136660032

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    10 |    30 |   515   (4)| 00:00:01 |
|   1 |  HASH GROUP BY     |      |    10 |    30 |   515   (4)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T1   |  1000K|  2929K|   498   (1)| 00:00:01 |
---------------------------------------------------------------------------


Statistics
----------------------------------------------------------
       1975  recursive calls
         30  db block gets
       4167  consistent gets
       1786  physical reads
       5440  redo size
        754  bytes sent via SQL*Net to client
        608  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
        131  sorts (memory)
          0  sorts (disk)
         10  rows processed

SQL>
SQL>

The real-time MV, on the other hand, is used for query rewrite, and the result is the latest, real-time data!

SQL> select  y as y_new_parse1, count(*) from t2
  2  group by y;

Y_NEW_PARSE1   COUNT(*)
------------ ----------
           6     100000
           4     100000
           5     100000
           8     100000
           3     100000
           7     100000
           9     100000
           1     100333
           2     100333
           0     100333

10 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 542978159

------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                |    12 |   312 |    22  (14)| 00:00:01 |
|   1 |  VIEW                               |                |    12 |   312 |    22  (14)| 00:00:01 |
|   2 |   UNION-ALL                         |                |       |       |            |          |
|*  3 |    VIEW                             | VW_FOJ_0       |    10 |   290 |     9  (12)| 00:00:01 |
|*  4 |     HASH JOIN FULL OUTER            |                |    10 |   240 |     9  (12)| 00:00:01 |
|   5 |      VIEW                           |                |     1 |     7 |     6  (17)| 00:00:01 |
|   6 |       HASH GROUP BY                 |                |     1 |    22 |     6  (17)| 00:00:01 |
|*  7 |        TABLE ACCESS FULL            | MLOG$_T2       |   999 | 21978 |     5   (0)| 00:00:01 |
|   8 |      VIEW                           |                |    10 |   170 |     3   (0)| 00:00:01 |
|   9 |       MAT_VIEW ACCESS FULL          | MV_NEW         |    10 |    60 |     3   (0)| 00:00:01 |
|  10 |    VIEW                             |                |     2 |    52 |    13  (16)| 00:00:01 |
|  11 |     UNION-ALL                       |                |       |       |            |          |
|* 12 |      FILTER                         |                |       |       |            |          |
|  13 |       NESTED LOOPS OUTER            |                |     1 |    32 |     6  (17)| 00:00:01 |
|  14 |        VIEW                         |                |     1 |    26 |     6  (17)| 00:00:01 |
|* 15 |         FILTER                      |                |       |       |            |          |
|  16 |          HASH GROUP BY              |                |     1 |    22 |     6  (17)| 00:00:01 |
|* 17 |           TABLE ACCESS FULL         | MLOG$_T2       |   999 | 21978 |     5   (0)| 00:00:01 |
|* 18 |        INDEX UNIQUE SCAN            | I_SNAP$_MV_NEW |     1 |     6 |     0   (0)| 00:00:01 |
|  19 |      NESTED LOOPS                   |                |     1 |    35 |     7  (15)| 00:00:01 |
|  20 |       VIEW                          |                |     1 |    29 |     6  (17)| 00:00:01 |
|  21 |        HASH GROUP BY                |                |     1 |    22 |     6  (17)| 00:00:01 |
|* 22 |         TABLE ACCESS FULL           | MLOG$_T2       |   999 | 21978 |     5   (0)| 00:00:01 |
|* 23 |       MAT_VIEW ACCESS BY INDEX ROWID| MV_NEW         |     1 |     6 |     1   (0)| 00:00:01 |
|* 24 |        INDEX UNIQUE SCAN            | I_SNAP$_MV_NEW |     1 |       |     0   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("AV$0"."OJ_MARK" IS NULL)
   4 - access(SYS_OP_MAP_NONNULL("SNA$0"."Y")=SYS_OP_MAP_NONNULL("AV$0"."GB0"))
   7 - filter("MAS$"."SNAPTIME$$">TO_DATE(' 2017-07-12 14:35:01', 'syyyy-mm-dd hh24:mi:ss'))
  12 - filter(CASE  WHEN ROWID IS NOT NULL THEN 1 ELSE NULL END  IS NULL)
  15 - filter(SUM(1)>0)
  17 - filter("MAS$"."SNAPTIME$$">TO_DATE(' 2017-07-12 14:35:01', 'syyyy-mm-dd hh24:mi:ss'))
  18 - access("MV_NEW"."SYS_NC00003$"(+)=SYS_OP_MAP_NONNULL("AV$0"."GB0"))
  22 - filter("MAS$"."SNAPTIME$$">TO_DATE(' 2017-07-12 14:35:01', 'syyyy-mm-dd hh24:mi:ss'))
  23 - filter("MV_NEW"."C1"+"AV$0"."D0">0)
  24 - access(SYS_OP_MAP_NONNULL("Y")=SYS_OP_MAP_NONNULL("AV$0"."GB0"))

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - this is an adaptive plan


Statistics
----------------------------------------------------------
        906  recursive calls
         64  db block gets
       1232  consistent gets
         14  physical reads
      10548  redo size
        744  bytes sent via SQL*Net to client
        608  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
         64  sorts (memory)
          0  sorts (disk)
         10  rows processed

SQL>

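When querying t2, the optimizer decides based on cost whether to use query rewrite.
In this example the CBO chose query rewrite. After the rewrite to the materialized view, the result is not the stale values stored in the MV but the latest values. The execution plan shows how: it combines the stale materialized view with the materialized view log through UNION ALL and a HASH JOIN FULL OUTER to produce the latest result.

The plan also shows that only the materialized view log rows after "MAS$"."SNAPTIME$$">TO_DATE(' 2017-07-12 14:35:01', 'syyyy-mm-dd hh24:mi:ss') are used.

Compared with reading directly from the table, going through the real-time materialized view brought consistent gets down from 4167 to 1232.

Note that the MV log has still not been cleared by a refresh; a periodic refresh job is still needed: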

SQL> select count(*) from MLOG$_T1;

  COUNT(*)
----------
       999

SQL> select count(*) from MLOG$_T2;

  COUNT(*)
----------
       999

SQL>

One more thing: the /*+ fresh_mv */ hint lets you query the real-time result directly from a real-time MV:

SQL> select * from mv_new;

         Y         C1
---------- ----------
         1     100000
         6     100000
         2     100000
         4     100000
         5     100000
         8     100000
         3     100000
         7     100000
         9     100000
         0     100000

10 rows selected.

SQL>
SQL> select /*+ fresh_mv */* from mv_new;

         Y         C1
---------- ----------
         6     100000
         4     100000
         5     100000
         8     100000
         3     100000
         7     100000
         9     100000
         1     100333
         2     100333
         0     100333

10 rows selected.

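In short, a real-time MV takes the existing, already stale materialized view, combines it with the MV log, and computes the real-time data for you. You get real-time data without having to refresh the MV so often.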
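
The computation can be mimicked in a few lines of Python. This is only a conceptual model of the idea, not what Oracle executes: it handles inserts only, and the names mv_new, mlog_t2 and fresh_counts are stand-ins chosen to mirror the example above.

```python
from collections import Counter

# Stale MV contents: y -> count(*), as stored at the last refresh.
mv_new = {y: 100_000 for y in range(10)}

# MV log entries written since the last refresh: the y value of each inserted
# row. This mirrors the 999-row insert from the example (y = mod(rownum, 3)).
mlog_t2 = [rownum % 3 for rownum in range(1, 1000)]


def fresh_counts(stale_mv: dict, log_rows: list) -> dict:
    """Combine the stale aggregate with the per-group delta from the log."""
    delta = Counter(log_rows)
    groups = set(stale_mv) | set(delta)   # full outer join on the group key
    return {y: stale_mv.get(y, 0) + delta.get(y, 0) for y in sorted(groups)}


result = fresh_counts(mv_new, mlog_t2)
print(result[0], result[1], result[2], result[3])  # 100333 100333 100333 100000
```

The small log is aggregated and merged into ten stored rows, which is why this is so much cheaper than scanning the million-row base table again.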

References:
https://blogs.oracle.com/sql/12-things-developers-will-love-about-oracle-database-12c-release-2#real-time-mv
https://blog.dbi-services.com/12cr2-real-time-materialized-view-on-query-computation/
https://uhesse.com/2017/01/05/real-time-materialized-views-in-oracle-12c/
https://docs.oracle.com/database/122/SQLRF/CREATE-MATERIALIZED-VIEW.htm#SQLRF01302

Best practices for adding disks to ASM

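When the FRA or DATA disk group runs out of space, we need to add disks to ASM.
The high-level steps for adding disks are:

1. The SA allocates shared disks, which must be visible from all nodes.
2. Partition the shared disks and create ASM disks from the partitions.
3. Add the ASM disks to the ASM disk group.

The detailed steps are as follows:
(1). The SA allocates shared disks, which must be visible from all nodes.
1. Before the SA adds the disks, note the /dev/sd* device names and which letter they have reached, so that the next letters can be recognized as the newly added disks.
To find which device each existing ASM disk corresponds to, first list the ASM disks already created with oracleasm listdisks, then check the physical path with oracleasm querydisk -p.

2. After the SA adds the disks, they must be visible from all nodes; ls -l /dev/sd* should show the new disks.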
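
Comparing the device list before and after is easy to script. A minimal Python sketch, assuming you capture the list of /dev/sd* names on each node before and after the SA's change; the function names and the sample device names are made up for illustration.

```python
import glob


def new_devices(before: list, after: list) -> list:
    """Return device names present in `after` but not in `before`, sorted."""
    return sorted(set(after) - set(before))


def list_sd_devices() -> list:
    """Whole-disk /dev/sd* devices currently visible on this node."""
    return sorted(d for d in glob.glob("/dev/sd*") if not d[-1].isdigit())


# Example with made-up snapshots taken before and after the SA adds disks.
before = ["/dev/sda", "/dev/sdb", "/dev/sdm"]
after = ["/dev/sda", "/dev/sdb", "/dev/sdm", "/dev/sdn", "/dev/sdo"]
print(new_devices(before, after))  # ['/dev/sdn', '/dev/sdo']
```

Running the same comparison on every node and checking that the results match confirms that all nodes see the same new disks.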

(2). Partition the shared disks and create ASM disks from the partitions.
1. On one node, partition the new shared disk with fdisk:
fdisk /dev/sdn
Command (m for help): n
Partition type:
p primary (0 primary, 0 extended, 4 free)
e extended
Select (default p):
Using default response p
Partition number (1-4, default 1):
First sector (32768-25165823, default 32768):
Using default value 32768
Last sector, +sectors or +size{K,M,G} (32768-25165823, default 25165823): +10G
Partition 1 of type Linux and of size 10 GiB is set
Command (m for help): w
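Note: the partition size must strictly match the size of the existing disks. Disks of different sizes in the same disk group lead to an unbalanced rebalance and cause performance problems.

2. After partitioning on one node, check on the other nodes that the partition is visible there too, i.e. that sd*1 exists. If it does not, run fdisk /dev/sd on that node, enter p (p stands for print), and quit. The partitioned disk sd1 then shows up.

3. On one node, create the corresponding ASM disks:
oracleasm createdisk FRA05 /dev/sdn1
oracleasm createdisk FRA06 /dev/sdo1
oracleasm createdisk FRA07 /dev/sdp1
oracleasm createdisk FRA08 /dev/sdq1

4. On every node, run oracleasm listdisks and check that the new ASM disks appear (compare with the list taken before the disks were added). If they do not, run oracleasm scandisks once and then oracleasm listdisks again. If they are still missing, something went wrong in the earlier creation steps: stop here and analyze how those steps went.

5. Log in with sqlplus '/ as sysasm':
select GROUP_NUMBER,DISK_NUMBER,MOUNT_STATUS,HEADER_STATUS,MODE_STATUS,name,path from v$asm_disk;
Check the HEADER_STATUS of the newly added ASM disks above; it should be PROVISIONED.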
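
These two checks (header status and matching size) can be automated once the rows are fetched. A rough Python sketch, assuming the query is extended with a size column such as OS_MB and the rows are already in memory; the AsmDisk tuple, the check_new_disks function and the sample values are hypothetical.

```python
from typing import NamedTuple


class AsmDisk(NamedTuple):
    path: str            # e.g. 'ORCL:FRA05'
    header_status: str   # HEADER_STATUS from v$asm_disk
    size_mb: int         # assumed extra column, e.g. OS_MB


def check_new_disks(new: list, existing_size_mb: int) -> list:
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    for d in new:
        if d.header_status != "PROVISIONED":
            problems.append(f"{d.path}: header_status is {d.header_status}")
        if d.size_mb != existing_size_mb:
            problems.append(f"{d.path}: size {d.size_mb} MB != {existing_size_mb} MB")
    return problems


# Made-up sample rows: one good disk, one with the wrong size.
rows = [
    AsmDisk("ORCL:FRA05", "PROVISIONED", 10240),
    AsmDisk("ORCL:FRA06", "PROVISIONED", 51200),
]
for line in check_new_disks(rows, existing_size_mb=10240):
    print(line)
```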

(3). Add the ASM disks to the ASM disk group.
1. First create a test disk group on one node. Note that the disks are referenced by path, not by name.
CREATE DISKGROUP TEST EXTERNAL REDUNDANCY DISK 'ORCL:FRA05','ORCL:FRA06';
ALTER DISKGROUP TEST ADD DISK 'ORCL:FRA07','ORCL:FRA08';

2. Mount the disk group on the other nodes. Before mounting, its state should be DISMOUNTED:
SELECT STATE, NAME FROM V$ASM_DISKGROUP where name='TEST';

3. Mount the disk group and check for errors:
ALTER DISKGROUP TEST MOUNT;

4. After mounting, the state should be MOUNTED:
SELECT STATE, NAME FROM V$ASM_DISKGROUP where name='TEST';

5. Once the steps above are confirmed to have succeeded, remove the test disk group. First dismount it on the other nodes:
alter diskgroup test dismount;

6. Drop the disk group on the first node:
DROP DISKGROUP TEST;

7. Add the disks on the first node:
ALTER DISKGROUP FRA ADD DISK 'ORCL:FRA05','ORCL:FRA06','ORCL:FRA07','ORCL:FRA08';

8. Adjust the rebalance power as appropriate. (Note: during daytime business peak hours, do not use a power higher than 3.)
alter diskgroup fra rebalance power 8;

9. Watch the ASM rebalance; the change is complete only when v$asm_operation returns 0 rows.
select * from v$asm_operation;
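
Waiting for the rebalance to finish is a plain polling loop. A minimal Python sketch, assuming you supply a function that runs the query above and returns its rows; wait_for_rebalance and the fake result set are illustrative names, and no real database driver is used here.

```python
import time
from typing import Callable, Sequence


def wait_for_rebalance(fetch_operations: Callable[[], Sequence],
                       interval_s: float = 30.0,
                       sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll until fetch_operations() returns no rows; return the number of polls."""
    polls = 0
    while True:
        polls += 1
        if not fetch_operations():
            return polls
        sleep(interval_s)


# Stand-in for "select * from v$asm_operation": two polls with a running
# rebalance, then an empty result.
fake_results = iter([[("REBAL", "RUN")], [("REBAL", "RUN")], []])
print(wait_for_rebalance(lambda: next(fake_results), sleep=lambda s: None))  # 3
```

Passing the sleep function in as a parameter keeps the loop testable without actually waiting.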

Sending email from an Oracle stored procedure

/** Configure the ACL **/
begin
dbms_network_acl_admin.create_acl (
acl => 'smtp_permissions.xml', -- or any other name
description => 'SMTP Access',
principal => 'DBMGR', -- the user name trying to access the network resource
is_grant => TRUE,
privilege => 'connect',
start_date => null,
end_date => null
);
end;
/
commit;
begin
DBMS_NETWORK_ACL_ADMIN.ADD_PRIVILEGE(acl => 'smtp_permissions.xml',
principal => 'DBMGR',
is_grant => true,
privilege => 'connect');
end;
/
commit;

BEGIN
dbms_network_acl_admin.assign_acl (
acl => 'smtp_permissions.xml',
host => '10.10.8.1', /* can be a computer name or IP; wildcards are accepted as well, for example '*.us.oracle.com' */
lower_port => 25,
upper_port => null
);
END;
/
commit;

/** Create the stored procedure that sends mail **/
CREATE OR REPLACE PROCEDURE send_mail(
p_recipient VARCHAR2, -- mail recipient
p_subject VARCHAR2, -- mail subject
p_message VARCHAR2 -- mail body
)
IS
-- Set the four variables below to match your actual mail server
v_mailhost VARCHAR2(30) := '10.10.8.1'; -- SMTP server address
v_user VARCHAR2(30) := 'mymailuser'; -- user name for logging in to the SMTP server
v_pass VARCHAR2(20) := 'mypasswd'; -- password for logging in to the SMTP server
v_sender VARCHAR2(50) := 'mymailuser@dji.com'; -- sender address, usually corresponds to v_user
v_conn UTL_SMTP.connection; -- connection to the mail server
v_msg varchar2(4000); -- mail content
BEGIN
v_conn := UTL_SMTP.open_connection(v_mailhost, 25);
UTL_SMTP.ehlo(v_conn, v_mailhost); -- use ehlo() rather than helo()
-- otherwise you get: ORA-29279: SMTP permanent error: 503 5.5.2 Send hello first.
UTL_SMTP.command(v_conn, 'AUTH LOGIN'); -- SMTP server login
UTL_SMTP.command(v_conn,UTL_RAW.cast_to_varchar2(UTL_ENCODE.base64_encode(UTL_RAW.cast_to_raw(v_user))));
UTL_SMTP.command(v_conn,UTL_RAW.cast_to_varchar2(UTL_ENCODE.base64_encode(UTL_RAW.cast_to_raw(v_pass))));
UTL_SMTP.mail(v_conn, '<'||v_sender||'>'); -- set the sender
UTL_SMTP.rcpt(v_conn, '<'||p_recipient||'>'); -- set the recipient
-- Build the mail content; note the blank line required between the headers and the body
v_msg := 'Date:'|| TO_CHAR(SYSDATE, 'yyyy mm dd hh24:mi:ss')
|| UTL_TCP.CRLF || 'From: '|| v_sender || ''
|| UTL_TCP.CRLF || 'To: ' || p_recipient || ''
|| UTL_TCP.CRLF || 'Subject: ' || p_subject
|| UTL_TCP.CRLF || UTL_TCP.CRLF -- everything before this is header information
|| p_message; -- this is the mail body
UTL_SMTP.open_data(v_conn); -- open the data stream
UTL_SMTP.write_raw_data(v_conn, UTL_RAW.cast_to_raw(v_msg)); -- writing it this way allows Chinese in both subject and body
UTL_SMTP.close_data(v_conn); -- close the data stream
UTL_SMTP.quit(v_conn); -- close the connection
EXCEPTION
WHEN OTHERS THEN
DBMS_OUTPUT.put_line(DBMS_UTILITY.format_error_stack);
DBMS_OUTPUT.put_line(DBMS_UTILITY.format_call_stack);
END send_mail;
/

/** Send an email **/
begin
  send_mail('xxxxx@dji.com','Tablespace XX is full.','Tablespace XXX is NN full, please add more space.');
end;
/
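
To make it easier to see what the procedure actually puts on the wire, here is a small Python sketch of its two less obvious steps: the base64-encoded replies that follow AUTH LOGIN, and the header block separated from the body by one empty line. It is an illustration only. The function names are hypothetical, nothing is sent to a server, and the headers use a space after every colon.

```python
import base64

CRLF = "\r\n"


def auth_login_token(text: str) -> str:
    """Base64-encode a user name or password the way AUTH LOGIN expects."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def build_message(sender: str, recipient: str, subject: str,
                  body: str, date_text: str) -> bytes:
    """Assemble headers and body; an empty line must separate the two."""
    headers = [
        "Date: " + date_text,
        "From: " + sender,
        "To: " + recipient,
        "Subject: " + subject,
    ]
    # Encoding the whole message as raw bytes mirrors write_raw_data,
    # which is what lets non-ASCII text through.
    return (CRLF.join(headers) + CRLF + CRLF + body).encode("utf-8")


if __name__ == "__main__":
    print(auth_login_token("mymailuser"))
    print(build_message("mymailuser@dji.com", "xxxxx@dji.com",
                        "Tablespace XX is full.",
                        "Tablespace XXX is NN full, please add more space.",
                        "2017 01 01 00:00:00"))
```

If the empty line is missing, the mail client treats the first lines of the body as headers, which is why the procedure concatenates UTL_TCP.CRLF twice before p_message.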

Yes, the DJI DBA team needs you


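Yes, we are hiring.

The DJI DBA team is expanding. We currently have 6 open headcount, and talented people from every background are welcome to join.

Responsibilities:
1. Respond to database incidents on the internal network and in the cloud (AWS and Alibaba Cloud).
2. Handle database installation, deployment, SQL tuning, and root-cause analysis of database failures across the company.
3. Develop automation for database operations and drive the database automation effort forward.
4. Design data synchronization, high-availability, backup, and security solutions according to each project's requirements.

Requirements:
1. Bachelor's degree or above, with at least 2 years of relevant work experience.
2. Expert in at least two of Oracle, SQL Server, MySQL, PostgreSQL, Redis, and MongoDB, including installation, backup and recovery, troubleshooting, high-availability architecture, disaster-recovery architecture, performance tuning, and code optimization. Familiar with troubleshooting databases on public clouds (AWS or Alibaba Cloud).
3. Familiar with database principles and with the differences between the databases above, including their transaction isolation, locking, and version control mechanisms. Familiar with the common monitoring metrics for these databases, their ranges, and their impact.
4. Familiar with Linux performance tuning and with at least one scripting language, shell or Python.
5. Knowledge of database middleware such as mycat, atlas, and codis.
6. Knowledge of storage, virtualization, and networking.

Send your résumé to: jimmy.he[at]dji.com.

Other operations engineer positions are also open at the same time: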

Cross-database queries in PostgreSQL


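In MySQL and SQL Server, a cross-database query basically only needs a qualified name (dbname.table_name in MySQL, dbname.schema.table_name in SQL Server). In PostgreSQL, as in Oracle, a cross-database query has to go through something like a dblink. Before PostgreSQL 9.3, dblink is the recommended approach; from 9.3 onward, postgres_fdw (a foreign-data wrapper) is recommended.
Suppose there is a database mydb001 with two users, mydb001_rw and mydb001_r, which are the read-write and read-only users respectively. Another database, dbprd2, likewise has two users, dbprd2_rw and dbprd2_r.
From the mydb001 database, as user mydb001_rw, we want read-only access to the tb_orad_mutex table in dbprd2.

1. Install the extension as a superuser. (Note: the extension has to be installed once in every database that needs it. Alternatively, install it in template1 up front, so that databases created afterwards include it.)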

psql -U dbmgr -d mydb001
--drop extension postgres_fdw;
create extension postgres_fdw;

mydb001=> \dx
                               List of installed extensions
     Name     | Version |   Schema   |                    Description
--------------+---------+------------+----------------------------------------------------
 plpgsql      | 1.0     | pg_catalog | PL/pgSQL procedural language
 postgres_fdw | 1.0     | public     | foreign-data wrapper for remote PostgreSQL servers
 uuid-ossp    | 1.1     | public     | generate universally unique identifiers (UUIDs)
(3 rows)

mydb001=>

2. Still as the superuser, create the remote server that is used to connect to the remote database.

--drop server remote_db;
create server remote_db foreign data wrapper postgres_fdw options(host '127.0.0.1',port '5432',dbname 'dbprd2');
mydb001=> \des
         List of foreign servers
   Name    | Owner | Foreign-data wrapper
-----------+-------+----------------------
 remote_db | dbmgr | postgres_fdw
(1 row)

mydb001=>
GRANT USAGE ON FOREIGN SERVER remote_db TO mydb001_rw;
GRANT USAGE ON FOREIGN SERVER remote_db TO mydb001_r;
\q

At this point, edit pg_hba.conf to allow the connection (and reload the server configuration so the change takes effect).

# TYPE DATABASE  USER   ADDRESS     METHOD
……
host all all 127.0.0.1/32 md5

3. Connect as the application user and create the user mapping:

psql -U mydb001_rw -d mydb001
--drop user mapping for mydb001_rw server remote_db;
create user mapping for mydb001_rw server remote_db options(user 'dbprd2_r',password 'WTDw2#@e');

4. As the application user, create the foreign table:

--drop FOREIGN TABLE  db_dbprd2_tb_orad_mutex;
CREATE FOREIGN TABLE
db_dbprd2_tb_orad_mutex(appid integer,appkey character varying(40),appindex character varying(40) ,status integer)
server remote_db
options (schema_name 'dbprd2_rw',table_name 'tb_orad_mutex');

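5. Test a query, and check whether an update works. (Note: if the user mapping had been created with the read-write user, updates would work as well.)

-- Query the table in the dbprd2 database as user mydb001_rw.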
-bash-4.2$ psql -U mydb001_rw -d mydb001
psql (9.6.2)
Type "help" for help.

mydb001=> select * from db_dbprd2_tb_orad_mutex limit 2;
 appid  |      appkey      |    appindex    | status
--------+------------------+----------------+--------
 123456 | AAAAAAAAAAAAAAAA | lm             |    999
 654321 | BBBBBBBBBBBBBBB  | abcdefghijklm  |    999
(2 rows)


-- The user mapping connects as the read-only user, so the update fails:
mydb001=> begin;
BEGIN
mydb001=> update db_dbprd2_tb_orad_mutex set appindex='zxsaqwerre' where appid='654321' and appkey='BBBBBBBBBBBBBBB';
ERROR:  permission denied for relation tb_orad_mutex
CONTEXT:  Remote SQL command: UPDATE dbprd2_rw.tb_orad_mutex SET appindex = 'zxsaqwerre'::character varying(40) WHERE ((appid = 654321)) AND ((appkey = 'BBBBBBBBBBBBBBB'::text))
mydb001=> rollback;
ROLLBACK
mydb001=>
mydb001=>
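
When the same setup has to be repeated across several databases, the statements from steps 1 to 3 can be generated from a handful of parameters. The Python sketch below is only an illustration under assumptions: `fdw_setup_statements` and `sql_literal` are hypothetical helpers, the identifiers are assumed to be plain unquoted names, and `IF NOT EXISTS` is added so the extension step can be rerun. It only builds the SQL text; running it with psql is left to the reader.

```python
from typing import List


def sql_literal(value: str) -> str:
    """Quote a string as an SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def fdw_setup_statements(server: str, host: str, port: int, dbname: str,
                         local_user: str, remote_user: str,
                         remote_password: str) -> List[str]:
    """Return the postgres_fdw setup statements in the order they are run."""
    return [
        # Steps 1 and 2 are run by a superuser.
        "CREATE EXTENSION IF NOT EXISTS postgres_fdw;",
        "CREATE SERVER {0} FOREIGN DATA WRAPPER postgres_fdw "
        "OPTIONS (host {1}, port {2}, dbname {3});".format(
            server, sql_literal(host), sql_literal(str(port)),
            sql_literal(dbname)),
        "GRANT USAGE ON FOREIGN SERVER {0} TO {1};".format(server, local_user),
        # Step 3 is run by the application user itself.
        "CREATE USER MAPPING FOR {0} SERVER {1} "
        "OPTIONS (user {2}, password {3});".format(
            local_user, server, sql_literal(remote_user),
            sql_literal(remote_password)),
    ]


if __name__ == "__main__":
    for statement in fdw_setup_statements(
            "remote_db", "127.0.0.1", 5432, "dbprd2",
            "mydb001_rw", "dbprd2_r", "example-password"):
        print(statement)
```

Option values are passed as quoted strings because postgres_fdw options, including the port, are string literals in the OPTIONS clause.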
