Hadoop java.io.IOException: Gap in transactions

This is a problem I ran into just today in one of our Hadoop environments (BTW, we use Hortonworks): it prevented the primary NameNode from starting, so the whole cluster was down.

Problem

We found this error in Hadoop's hdfs.log:
java.io.IOException: Gap in transactions. Expected to be able to read up until at least txid 57555113 but unable to find any edit logs containing txid 57321255

Solution

Let's recover the node. Run the following as the hdfs user and answer 'a' so the broken edits get discarded as they come up:

% ./bin/hadoop namenode -recover

18/07/24 08:34:02 ERROR namenode.MetaRecoveryContext: Failed to apply edit log operation CloseOp [length=0, inodeId=0, path=/user/kosys/streaming/topics/+tmp/consent_content_updates/partition=201807/91c3b2fe-0b63-44c5-a24c-bb9d2f22300f_tmp.orc, replication=3, mtime=1532347563607, atime=1532347495094, blockSize=268435456, blocks=[blk_1077417863_3677675], permissions=kosys:hdfs:rw-r--r--, aclEntries=null, clientName=, clientMachine=, overwrite=false, storagePolicyId=0, opCode=OP_CLOSE, txid=57554877]: error File does not exist: /user/kosys/streaming/topics/+tmp/consent_content_updates/partition=201807/91c3b2fe-0b63-44c5-a24c-bb9d2f22300f_tmp.orc

Enter 'c' to continue, applying edits
Enter 's' to stop reading the edit log here, abandoning any later edits
Enter 'q' to quit without saving
Enter 'a' to always select the first choice in the future without prompting. (c/s/q/a)
a
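Before running `-recover` (which discards edits irreversibly), it's worth taking a backup of the NameNode metadata directory. A minimal sketch, assuming the metadata lives in /hadoop/hdfs/namenode (the path that shows up later in our logs); check dfs.namenode.name.dir in hdfs-site.xml on your cluster:

```shell
# Hedged sketch: back up the NameNode metadata dir before a destructive recovery.
backup_namenode() {
  nn_dir="$1"
  backup="/tmp/namenode-backup-$(date +%Y%m%d-%H%M%S).tar.gz"
  # Archive the whole tree: fsimage, edit logs and the VERSION file.
  tar -czf "$backup" -C "$(dirname "$nn_dir")" "$(basename "$nn_dir")" \
    && echo "Backed up $nn_dir to $backup"
}

# On the NameNode host, as root or the hdfs user:
# backup_namenode /hadoop/hdfs/namenode
```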

Try starting the NameNode again. If it comes up, great; if not, check whether it fails with this error:

java.io.FileNotFoundException: /hadoop/hdfs/namenode/current/VERSION (Permission denied)

Permission problems. We fix them by running chown -R hdfs:hdfs /hadoop/hdfs/namenode and try again…
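The fix above can be sketched as a small helper that applies the ownership change and immediately verifies it (the path and the hdfs:hdfs owner come from the error above; `stat -c` assumes GNU coreutils):

```shell
# Hedged sketch: fix ownership on the NameNode dir and verify the result.
fix_owner() {
  dir="$1"; owner="$2"
  chown -R "$owner" "$dir" || return 1
  # Print owner:group per file so a wrong entry is easy to spot.
  find "$dir" -exec stat -c '%U:%G %n' {} +
}

# As root, for the error above:
# fix_owner /hadoop/hdfs/namenode hdfs:hdfs
```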

2018-07-24 08:44:21,855 INFO  blockmanagement.BlockReportLeaseManager (BlockReportLeaseManager.java:registerNode(205)) - Registered DN fd18f58c-2329-4b63-b540-f557006e4d73 (10.25.162.42:50010).
2018-07-24 08:44:21,856 INFO hdfs.StateChange (FSNamesystem.java:reportStatus(5908)) - STATE* Safe mode ON.
The reported blocks 0 needs additional 73478 blocks to reach the threshold 1.0000 of total blocks 73477.
The number of live datanodes 2 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
2018-07-24 08:44:21,944 INFO blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateHeartbeatState(401)) - Number of failed storage changes from 0 to 0
2018-07-24 08:44:21,944 INFO blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateStorage(833)) - Adding new storage ID DS-c9e00683-05a3-479a-90fc-de32760fc505 for DN 10.25.162.42:50010
2018-07-24 08:44:21,944 INFO blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateStorage(833)) - Adding new storage ID DS-f06f6ac8-f8ac-4601-ba41-cac3de955799 for DN 10.25.162.42:50010

Now a Safe Mode error: the NameNode is still waiting for block reports from the DataNodes. Once HDFS is working again, we turn Safe Mode off as the hdfs user:

<<< QA >>> root@hadoop-1:/var/log/hadoop/hdfs# su hdfs dfsadmin -safemode leave
su: failed to execute afemode: No such file or directory
<<< QA >>> root@hadoop-1:/var/log/hadoop/hdfs# su hdfs
<<< QA >>> hdfs@hadoop-1:/var/log/hadoop/hdfs$ hdfs dfsadmin -safemode leave
Safe mode is OFF
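(Note the first attempt failed because `su hdfs command` is not valid su syntax; switching to the hdfs shell first, as above, or `su - hdfs -c '…'` works.) Instead of forcing Safe Mode off, you can also wait until the block reports arrive and the NameNode would leave it on its own. A hedged polling sketch; the HDFS_CMD indirection is only there so the loop can be exercised without a cluster:

```shell
# Hedged sketch: poll "hdfs dfsadmin -safemode get" until Safe Mode is OFF.
# HDFS_CMD defaults to the real CLI; override it to test without a cluster.
HDFS_CMD="${HDFS_CMD:-hdfs}"

wait_safemode_off() {
  tries="${1:-30}"
  while [ "$tries" -gt 0 ]; do
    status="$("$HDFS_CMD" dfsadmin -safemode get 2>/dev/null)"
    case "$status" in
      *OFF*) echo "Safe mode is OFF"; return 0 ;;
    esac
    tries=$((tries - 1))
    sleep 2
  done
  echo "Safe mode still ON after timeout" >&2
  return 1
}

# As the hdfs user: wait_safemode_off 30
```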

Now everything should work :)

Bonus:

In our case, on top of all this, one DataNode was also rotten; I'll explain how to fix that in another post, but at least the NameNodes start and we can bring the other DataNodes and services back up.
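Related to that: after a forced recovery it is worth auditing HDFS for files damaged by the discarded edits. A hedged sketch using `hdfs fsck`, again with the command parameterized only so it can be dry-run; on the cluster it is plain `hdfs`:

```shell
# Hedged sketch: audit HDFS health after a forced NameNode recovery.
HDFS_CMD="${HDFS_CMD:-hdfs}"

audit_hdfs() {
  # Summary of filesystem status and block counts.
  "$HDFS_CMD" fsck / | grep -E 'Status|blocks'
  # Explicit list of files with corrupt blocks, to remove or restore them.
  "$HDFS_CMD" fsck / -list-corruptfileblocks
}

# As the hdfs user: audit_hdfs
```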

Done!


link 1 (Unable to Start the Name Node in hadoop - Stack Overflow)
link 2 (Permission denied error during NameNode start - Hortonworks)
