What are the building blocks?
03 Jan 2024
So with a lot of data, Hadoop needs a way to store it reliably across a fleet of machines. That's what HDFS does.
HDFS is a distributed filesystem, optimised for read-heavy workloads: write once, read many times.
Blocks
Blocks are the minimum unit for reading and writing data. Disk blocks are usually 512 bytes and filesystem blocks are usually a few KB, but HDFS blocks are usually 128 MB. HDFS blocks are big so that the time spent seeking to the start of a block is small compared to the time spent transferring it.
How to list blocks:
hdfs fsck / -files -blocks
Namenodes and Datanodes
HDFS has two types of nodes operating in a master-worker pattern: namenodes are the masters, datanodes are the workers.
Namenode
- Manages the filesystem namespace
- Maintains the filesystem tree and metadata for all files and directories
- Knows which datanodes hold which blocks (kept in memory and rebuilt from datanode block reports, not persisted)
- The namespace is persisted to disk as the namespace image and the edit log
Datanode
- Store and retrieve blocks
- Periodically report the list of blocks they store back to the namenode
When the namenode is lost, we’re fucked
- So we can write the namenode's persistent state to multiple filesystems
- And have a secondary namenode, which just periodically merges the namespace image with the edit log. Fancy way of saying it does checkpointing.
We can run multiple namenodes that each manage a namespace volume (HDFS federation)
- Just partitioning of the namespace
We can also run multiple namenodes for higher availability
- One is on standby
- The two namenodes need highly available shared storage for the edit log, usually the Quorum Journal Manager, which runs a group of journal nodes; each edit is committed to a majority of them. Similar to ZooKeeper.
- Having multiple namenodes is a pain when one goes down. A failover controller manages this; in an ungraceful handover it still has to go to great lengths to stop the misbehaving namenode at all costs (known as fencing).
The CLI
Pretty much the same as Unix.
- Just add -ls or -mkdir after hadoop fs
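For example (the paths are just placeholders):
hadoop fs -mkdir /user/me/books
hadoop fs -ls /user/me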
Hadoop filesystems
HDFS is one implementation of the Hadoop filesystem abstraction, org.apache.hadoop.fs.FileSystem.
How to talk to other implementations:
hadoop fs -ls s3a:///
How important is data locality really?
HDFS can only write to the end of a file (append); it cannot do random modifications of a file.
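A minimal sketch of appending in Java, assuming an existing file (the URI and path are placeholders):

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(URI.create("hdfs://namenode/"), new Configuration());
    // append returns a stream positioned at the end of the file;
    // there is no API to overwrite bytes in the middle
    try (FSDataOutputStream out = fs.append(new Path("/user/me/log.txt"))) {
      out.write("another line\n".getBytes(StandardCharsets.UTF_8));
    }
  }
}
```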
What is FUSE? Filesystem in Userspace allows filesystems implemented in user space to be integrated as Unix filesystems, so HDFS can be mounted and used with standard Unix tools.
The Java Interface
You can read data with URLs by registering Hadoop's FsUrlStreamHandlerFactory on java.net.URL, then opening the URL as an input stream.
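A minimal sketch (the hdfs:// URI is a placeholder); the factory can only be set once per JVM, hence the static block:

```java
import java.io.InputStream;
import java.net.URL;
import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
import org.apache.hadoop.io.IOUtils;

public class URLCat {
  static {
    // can only be called once per JVM
    URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
  }

  public static void main(String[] args) throws Exception {
    InputStream in = null;
    try {
      in = new URL("hdfs://namenode/user/me/file.txt").openStream();
      IOUtils.copyBytes(in, System.out, 4096, false);
    } finally {
      IOUtils.closeStream(in);
    }
  }
}
```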
Get a FileSystem; then you can open, seek, read, create, and append to files and directories.
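A minimal sketch of open and seek (URI and path are placeholders):

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class FileSystemCat {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create("hdfs://namenode/"), conf);
    try (FSDataInputStream in = fs.open(new Path("/user/me/file.txt"))) {
      IOUtils.copyBytes(in, System.out, 4096, false);
      in.seek(0); // FSDataInputStream supports random-access reads
      IOUtils.copyBytes(in, System.out, 4096, false); // prints the file twice
    }
  }
}
```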
From the FileSystem you can get a FileStatus, from which you can get all the file and directory metadata.
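Reusing the fs handle from the sketch above (path is a placeholder), a sketch of what FileStatus exposes:

```java
// import org.apache.hadoop.fs.FileStatus;
FileStatus stat = fs.getFileStatus(new Path("/user/me/file.txt"));
System.out.println(stat.getPath());             // full qualified path
System.out.println(stat.getLen());              // length in bytes
System.out.println(stat.isDirectory());
System.out.println(stat.getReplication());      // replication factor
System.out.println(stat.getBlockSize());
System.out.println(stat.getModificationTime()); // epoch millis
System.out.println(stat.getPermission());
```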
Data Flow
Read
Pretty simple: opening a file with HDFS talks to the namenode first to get the block locations (which datanodes hold each block), then reads directly from the relevant datanodes.
Write
Talks to the namenode, then starts writing to an FSDataOutputStream that sends packets down a pipeline of datanodes. Each datanode passes the data on to the next datanode in the replication pipeline. When all datanodes have acknowledged a packet, the packet is complete.
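A minimal sketch of the write path from the client side (URI and path are placeholders); the stream below is what feeds the datanode pipeline described above:

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CreateFile {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create("hdfs://namenode/"), conf);
    // create() overwrites an existing file by default
    try (FSDataOutputStream out = fs.create(new Path("/user/me/out.txt"))) {
      out.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
    }
  }
}
```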
Huh
Content written to a file is not guaranteed to be visible to readers even if the stream is flushed. A full block needs to be written, or you need to call hflush or hsync, for it to be persisted. hflush guarantees the data has reached all datanodes in the pipeline (visible to new readers, but possibly only in memory); hsync additionally forces it to disk, like a POSIX fsync.
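A sketch of the difference (URI and path are placeholders):

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FlushExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create("hdfs://namenode/"), conf);
    try (FSDataOutputStream out = fs.create(new Path("/user/me/log.txt"))) {
      out.write("event 1\n".getBytes(StandardCharsets.UTF_8));
      out.hflush(); // now visible to new readers, but may not be on disk yet
      out.write("event 2\n".getBytes(StandardCharsets.UTF_8));
      out.hsync();  // like fsync: datanodes flush this data to disk
    }
  }
}
```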
Parallel Copying with distcp
hadoop distcp file1 file2
This is pretty cool: it's just a MapReduce job with no reducers, where each map copies a subset of the files in parallel.
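It's handy for syncing between clusters; with the -update flag, only files that have changed are copied (the hostnames here are placeholders):
hadoop distcp -update hdfs://nn1/foo hdfs://nn2/foo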