Tuesday, 26 February 2019

vSAN Architecture Components

We are going to discuss vSAN architecture components in this post. Generally, vSAN is easy to work with using management tools such as the vSphere Client, the Ruby vSphere Console (RVC), or the Web Client, and it greatly simplifies administrative tasks. From a troubleshooting point of view, however, it is important to know how things work under the hood. vSAN operations are managed behind the scenes by the following components.

vSAN Architecture Components:

Image: VMware

  • Cluster Membership, Monitoring, and Directory Services (CMMDS):
    • CMMDS provides topology and object configuration information to CLOM and DOM.
    • CMMDS discovers, establishes, and maintains a cluster of networked node members.
    • CMMDS defines the cluster roles: Master, Backup, and Agent.
    • CMMDS selects the owners of objects.
    • CMMDS inventories all items, such as hosts, networks, and devices.
    • CMMDS stores object metadata, such as policy-related information, in an in-memory database.
  • Cluster-Level Object Manager (CLOM):
    • The CLOM process runs on each ESXi host in the vSAN cluster.
    • CLOM validates whether objects can be created, based on the assigned policies and the resources available in the vSAN cluster.
    • CLOM also directs the creation and migration of objects.
    • CLOM distributes load evenly across vSAN nodes.
    • CLOM manages both proactive and reactive rebalancing.
    • CLOM is responsible for object compliance.
  • Distributed Object Manager (DOM):
    • DOM runs on each ESXi host in the cluster.
    • DOM receives instructions from CLOM and from the DOMs running on other hosts in the cluster.
    • DOM communicates with LSOM and instructs it to create the local components of an object.
    • Each object in a vSAN cluster has a DOM owner and a DOM client.
    • Exactly one DOM owner exists per object; it determines which processes are allowed to send I/O to the object.
    • The DOM client performs I/O to an object on behalf of a particular virtual machine and runs on every node that holds the object's components.
    • DOM services on the ESXi hosts in a vSAN cluster communicate with each other to coordinate the creation of components.
    • DOM re-synchronizes objects during a recovery.
  • Local Log Structured Object Manager (LSOM):
    • LSOM creates the local components as instructed by the DOM.
    • LSOM performs the encryption process for the vSAN datastore when enabled.
    • LSOM interacts directly with the solid-state and magnetic devices.
    • LSOM performs solid-state drive log recovery when the vSAN node boots up.
    • LSOM reports unhealthy storage and network devices.
    • LSOM performs I/O retries on failing devices.
    • LSOM provides read and write buffering.
  • Reliable Datagram Transport (RDT):
    • RDT is the network protocol used to transmit vSAN traffic over the vSAN network.
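To make CLOM's admission check concrete, the sketch below models the well-known vSAN placement rule for RAID-1 (mirroring) objects: tolerating FTT failures requires FTT + 1 data replicas plus witness components, spread across 2 * FTT + 1 fault domains. This is a simplified illustration of the kind of validation CLOM performs, not vSAN's actual implementation.

```python
def clom_can_place(ftt: int, fault_domains: int) -> bool:
    """Simplified CLOM-style admission check for a RAID-1 (mirroring) object.

    An object with failuresToTolerate = ftt needs ftt + 1 data replicas
    plus witness components, spread across 2 * ftt + 1 fault domains.
    """
    required_fault_domains = 2 * ftt + 1
    return fault_domains >= required_fault_domains

# A 3-node cluster can satisfy FTT=1 but not FTT=2:
print(clom_can_place(ftt=1, fault_domains=3))  # True
print(clom_can_place(ftt=2, fault_domains=3))  # False
```

This is why a policy with FTT=2 fails compliance on a three-node cluster even when plenty of raw capacity is free: CLOM cannot find the five fault domains the placement rule demands.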


Tuesday, 8 January 2019

vROPS 6.6 architecture components

In vROPS 6.0, VMware introduced a new platform design to meet the goals listed below.

  • Treat all solutions equally and manage both VMware and third party solutions.
  • Highly scalable platform with minimal reconfigurations and redesign requirements
  • Monitoring solution with native high availability
The following diagram shows the components of vRealize Operations Manager 6.6.

Let's talk about each of these components in detail.

Watchdog:
  • Watchdog maintains the vROPS services/daemons.
  • Watchdog attempts to restart any vROPS daemon that is in a failed state.
  • The vcops-watchdog Python script runs every 5 minutes to check the vROPS services.
  • Watchdog service checks include:
    • PID file of service
    • Service status
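The PID-file and status checks above can be sketched as a minimal watchdog-style health probe. The function and file paths here are illustrative only; the real vcops-watchdog script ships with vROPS and does considerably more.

```python
import os

def service_is_healthy(pid_file: str) -> bool:
    """Watchdog-style check: a service is healthy when its PID file exists
    and the recorded PID refers to a running process. (Illustrative sketch,
    not the real vcops-watchdog internals.)"""
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return False           # missing or corrupt PID file
    try:
        os.kill(pid, 0)        # signal 0 only probes process existence
    except ProcessLookupError:
        return False           # stale PID: no such process
    except PermissionError:
        return True            # process exists but belongs to another user
    return True
```

A watchdog that fails this check would then attempt a service restart, which is exactly the recovery behavior described above.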
Apache2 HTTPD:
  • Provides the back-end platform for the Tomcat instances that serve the vROPS user interfaces.
User Interfaces:
  • In vROPS 6.6, the user interface is split into two components:
    • Product UI
    • Admin UI
Product UI:
  • Hosted by Pivotal tc Server (a Tomcat-based application server)
  • Can be accessed at https://<NodeName>/ui/login.action
  • Except for the remote collector role, the Product UI is present on all node roles (Master, Replica, Data)
  • The primary purpose of the Product UI is to make GemFire calls to the controller API to access data and create views, dashboards, or reports.
Admin UI:
  • Hosted by Pivotal tc Server (a Tomcat-based application server)
  • Used to perform administrative tasks via HTTP REST calls to the admin API
  • Can be accessed at https://<NodeName>/admin
Suite API:
  • A public-facing API.
  • Used for automation or scripting of common tasks.
  • Also used by vROPS itself for administrative tasks.
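As an example of scripting against the Suite API, the helper below builds a request URL under the public API root (/suite-api/api). The node name and the pageSize parameter are illustrative assumptions; consult the Suite API reference served by the node itself for exact endpoints and parameters.

```python
from urllib.parse import urlencode

def suite_api_url(node: str, path: str, **params) -> str:
    """Build a vROPS Suite API URL (the public API is rooted at /suite-api/api).

    The endpoint path and query parameters passed in are illustrative;
    check the node's own Suite API documentation for the real ones.
    """
    url = f"https://{node}/suite-api/api/{path.lstrip('/')}"
    if params:
        url += "?" + urlencode(params)
    return url

# e.g. list resources, one page of 100:
print(suite_api_url("vrops01.lab.local", "resources", pageSize=100))
# https://vrops01.lab.local/suite-api/api/resources?pageSize=100
```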
Collector:
  • Responsible for pulling inventory and metric data from configured sources using data adapters.
  • After collecting data, the collector contacts the GemFire locator to locate one or more controller cache servers.
  • The collector then connects to one or more controller cache servers and sends the collected data.
  • The collector sends a heartbeat to the controller every 30 seconds via the HeartbeatThread process (with a maximum of 25 data collection threads) that runs on the collector.
GemFire Locator:
  • Runs on the Master and Replica nodes.
  • On data and remote collector nodes, GemFire runs as a client process.
GemFire:
  • VMware vFabric GemFire is an in-memory, low-latency data grid.
  • Runs in the same JVM as the controller and analytics processes.
  • Scales as nodes are added to the cluster.
  • GemFire allows caching, processing, and retrieval of metrics.
  • Dependent on the GemFire locator.
Controller:
  • The controller is a sub-process of the analytics process.
  • Monitors collector status every minute.
  • The controller node runs the HeartbeatServer thread, which processes heartbeats from collectors.
  • Responsible for coordinating activities between cluster members.
  • Manages the storage and retrieval of inventory objects within the system.
  • Leverages a MapReduce function (the same model Google uses for search results) for selective queries and faster results.
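The MapReduce-style query pattern can be sketched as follows: each node computes a partial result over its local data slice (the map step), and the partials are merged into a single answer (the reduce step). The shard layout and metric key below are invented for illustration and do not reflect vROPS internals.

```python
from functools import reduce

# Hypothetical shard-local metric slices, as the controller might see them:
shards = [
    {"cpu|usage": [40, 55]},
    {"cpu|usage": [62], "mem|usage": [70]},
    {"cpu|usage": [48]},
]

def map_phase(shard, key):
    """Each node scans only its local slice (the 'map' step)."""
    values = shard.get(key, [])
    return (sum(values), len(values))

def reduce_phase(a, b):
    """Partial sums/counts are merged into one result (the 'reduce' step)."""
    return (a[0] + b[0], a[1] + b[1])

total, count = reduce(reduce_phase, (map_phase(s, "cpu|usage") for s in shards))
print(total / count)  # 51.25 -- average CPU usage across all shards
```

Because each map runs only against local data, adding nodes spreads the scan work instead of funneling every query through one database, which is the "faster results" property the controller relies on.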
Analytics:
  • Analytics layer is responsible for:
    • Metric calculations
    • Dynamic threshold
    • Alerts and Alarms
    • Storage and retrieval of metrics from Persistence layer
    • Root Cause Analysis
    • HIS (historical inventory service) metadata calculations and object relationship data.
  • Analytics works with GemFire, the controller, and the persistence layer.
  • Responsible for generating SMTP/SNMP alerts on the Master and Replica nodes.
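Dynamic thresholds can be illustrated with a toy calculation: learn a band from a metric's own history and flag values that fall outside it. vROPS's actual analytics are far more sophisticated (it fits multiple statistical models per metric); this sketch only conveys the idea of per-metric learned bounds.

```python
from statistics import mean, stdev

def dynamic_threshold(history, k=2.0):
    """Toy dynamic-threshold calculation: the 'normal' band is
    mean +/- k standard deviations of the metric's own history.
    Purely illustrative; not the vROPS algorithm."""
    mu, sigma = mean(history), stdev(history)
    return mu - k * sigma, mu + k * sigma

lo, hi = dynamic_threshold([50, 52, 49, 51, 50, 48])
print(lo < 50 < hi)  # True -- a typical value falls inside the learned band
```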
Persistence layer:
  • Also known as the database layer.
  • This layer consists of a series of databases, each performing a different function depending on the node role.
  • There are five primary database services:
    • Cassandra DB:
      • Introduced in vROPS 6.1
      • An Apache Cassandra DB
      • Replaces the Global xDB used in earlier versions
      • Stores all settings that are applied globally (the CONTENT folder).
      • Designed to handle large structured data sets across multiple nodes.
      • Provides HA with no single point of failure
      • Highly scalable.
      • No sharding is used by this DB
      • Stores the content below:
        • User preferences and configuration
        • Alert definitions
        • Customizations
        • Dashboards, policies, and views
        • Reports and licensing
        • Shard maps
        • Activities
    • Central DB:
      • A PostgreSQL DB
      • Also called REPL
      • Sharding is used by this DB
      • Exists only on the Master node (and on the Replica node when HA is enabled)
      • Accessible via port 5433
      • Located at /storage/db/vcops/vpostgres/repl
      • Stores resource inventory information only.
    • Alerts/HIS DB:
      • Also called the Data DB
      • A PostgreSQL DB
      • Stores the alert and alarm history, the history of resource property data, and the history of resource relationships.
      • Exists on the Master, Replica, and Data nodes
      • Accessible via port 5432
      • Sharding is used by this DB.
      • Located at /storage/db/vcops/vpostgres/data
    • FSDB:
      • FSDB is a GemFire server that runs inside the analytics JVM.
      • FSDB contains all raw time-series metrics and super-metric data for resources.
      • It stores the data collected by data adapters.
      • Also stores data calculated or generated after analysis.
      • Sharding is used by this DB; FSDB uses it to distribute data for new objects.
      • FSDB is available on the Master, Replica, and Data nodes in vROPS.
    • CaSa DB:
      • Also called HSQL
      • A small, flat, JSON-based, in-memory DB
      • Used by CaSA for cluster administration
      • Sharding is not used by CaSA DB
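The shard maps stored in the Cassandra DB decide which node's shard owns a given object's data in the sharded databases (FSDB and the Alerts/HIS DB). A hash-based assignment like the sketch below conveys the distribution idea; it is not vROPS's actual shard-mapping algorithm.

```python
import hashlib

def shard_for(resource_id: str, num_nodes: int) -> int:
    """Illustrative sharding: hash a resource id to pick the node whose
    shard will own that object's data. The real shard map lives in the
    Cassandra DB; this only shows how sharding spreads objects evenly."""
    digest = hashlib.md5(resource_id.encode()).hexdigest()
    return int(digest, 16) % num_nodes

# The same resource always maps to the same shard, so reads and writes
# for one object consistently land on one node:
print(shard_for("vm-1001", 4) == shard_for("vm-1001", 4))  # True
```

The non-sharded databases (Cassandra and CaSA) skip this step entirely: their data is either replicated to every node or local to one node, so no shard lookup is needed.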
