PowerScale Failed Authentication
- PowerScale Failed Authentication
- Problem Summary
- Problem Details
- Expected Behavior
- Reproduction
- Cluster Setup
- Code for Testing Authentication Mechanisms
- Concepts
Problem Summary
PowerScale OneFS misleadingly reports an authentication failure (HTTP 401) when the concurrent session limit is exceeded.
Problem Details
If a concurrent session limit is set, for example with isi auth settings global modify --concurrent-session-limit=15, and that limit is exceeded, the logs will say:
HTTP Error Log
tail -f /var/log/apache2/webui_httpd_error.log
2023-11-28T17:39:39.572651+00:00 <18.3> grantcluster-1(id1) httpd[98700]: [auth_isilon:error] [pid 98700:tid 34421640960] [client 172.16.5.155:62570] (STATUS_ACCESS_DENIED (0xC0000022) HTTP error: 401) Failed issuing a new JWT from the JWT service., referer: https://10.10.25.80:8080
2023-11-28T17:39:39.572673+00:00 <18.3> grantcluster-1(id1) httpd[98700]: [auth_isilon:error] [pid 98700:tid 34421640960] [client 172.16.5.155:62570] (401) Unable to create session., referer: https://10.10.25.80:8080
...SNIP...
2023-11-28T17:39:39.603718+00:00 <18.3> grantcluster-1(id1) httpd[98700]: [auth_isilon:error] [pid 98700:tid 34422848768] [client 172.16.5.155:62559] (STATUS_ACCESS_DENIED (0xC0000022) HTTP error: 401) Failed issuing a new JWT from the JWT service., referer: https://10.10.25.80:8080
2023-11-28T17:39:39.603728+00:00 <18.3> grantcluster-1(id1) httpd[98700]: [auth_isilon:error] [pid 98700:tid 34422848768] [client 172.16.5.155:62559] (401) Unable to create session., referer: https://10.10.25.80:8080
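When triaging a log like the one above, it can help to tally these two messages per client rather than read them line by line. The following is a minimal sketch, assuming the line layout matches the samples shown here; `count_session_failures` and the regular expression are my own illustration, not part of OneFS, and other releases may format these lines differently.

```python
import re
from collections import Counter

# Patterns inferred from the sample log lines in this document (assumption).
CLIENT_RE = re.compile(r"\[client (?P<ip>[0-9a-fA-F.:]+):\d+\]")
JWT_FAILURE = "Failed issuing a new JWT from the JWT service."
SESSION_FAILURE = "Unable to create session."


def count_session_failures(lines):
    """Count JWT and session-creation failures per client IP address."""
    counts = Counter()
    for line in lines:
        match = CLIENT_RE.search(line)
        if match is None:
            continue  # not a per-client error line
        if JWT_FAILURE in line:
            counts[(match.group("ip"), "jwt")] += 1
        elif SESSION_FAILURE in line:
            counts[(match.group("ip"), "session")] += 1
    return counts


# Two of the lines from the log excerpt above, split for readability.
PREFIX = (
    "2023-11-28T17:39:39.572651+00:00 <18.3> grantcluster-1(id1) httpd[98700]: "
    "[auth_isilon:error] [pid 98700:tid 34421640960] [client 172.16.5.155:62570] "
)
sample = [
    PREFIX + "(STATUS_ACCESS_DENIED (0xC0000022) HTTP error: 401) "
    "Failed issuing a new JWT from the JWT service., referer: https://10.10.25.80:8080",
    PREFIX + "(401) Unable to create session., referer: https://10.10.25.80:8080",
]
counts = count_session_failures(sample)
for (ip, kind), total in sorted(counts.items()):
    print(ip, kind, total)
```

A tally like this cannot prove that the session limit was the cause, but many paired failures from one client in a short window are a reason to check the configured limit before investigating credentials.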
HTTP Access Log
tail -f /var/log/apache2/webui_httpd_access.log
2023-11-28T17:41:43.101276+00:00 <19.6> grantcluster-1(id1) httpd[98697]: 172.16.5.155 - - [28/Nov/2023:17:41:43 +0000] "POST /session/1/session HTTP/1.1" 401 40 "https://10.10.25.80:8080" "python-requests/2.28.1"
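The access log records only the 401, so the most it can show is how many session requests were rejected and when. As a rough sketch, assuming the Apache-style layout of the sample line above, the hypothetical helper `rejected_session_requests` below counts 401 responses to POST /session/1/session per client and per second. The regular expression is inferred from that one sample and is not an official format definition.

```python
import re
from collections import Counter

# Layout inferred from the sample access log line in this document (assumption).
ACCESS_RE = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)


def rejected_session_requests(lines):
    """Count 401 responses to POST /session/1/session per (client, second)."""
    counts = Counter()
    for line in lines:
        match = ACCESS_RE.search(line)
        if match is None:
            continue  # line does not look like an access log entry
        if (
            match.group("method") == "POST"
            and match.group("path") == "/session/1/session"
            and match.group("status") == "401"
        ):
            counts[(match.group("client"), match.group("when"))] += 1
    return counts


sample_line = (
    "2023-11-28T17:41:43.101276+00:00 <19.6> grantcluster-1(id1) httpd[98697]: "
    '172.16.5.155 - - [28/Nov/2023:17:41:43 +0000] "POST /session/1/session HTTP/1.1" '
    '401 40 "https://10.10.25.80:8080" "python-requests/2.28.1"'
)
rejections = rejected_session_requests([sample_line])
print(rejections)
```

Comparing the per-second counts with the configured concurrent session limit is a quick sanity check before treating the 401s as bad credentials.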
REST API Response
Total Successful Sessions: 0
Authentication Failed: Status Code 401, Error: Unable to create session.
While this is technically accurate, it is extremely misleading and has led to a substantial waste of resources investigating authentication failures when the real problem was that the concurrent session limit had been exceeded.
Expected Behavior
The errors in Problem Details mislead both technicians and users. When the concurrent session limit is exceeded, both the logs and the API response should say so instead of reporting an authentication error, even if the failure to issue a JWT is the direct result of exceeding the limit.
The error message should let technicians resolve the problem without relying on developer support.
Reproduction
The Python script below reproduces the problem. Replace the credentials with your PowerScale credentials and run it. It starts 30 threads, each of which holds a session open for 10 seconds. If the concurrent session limit is below 30, session creation will fail.
import requests
import threading
import time
import urllib3

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)


def session_cookie_authentication(ip_address, username, password):
    """
    Authenticate using Session Cookie and return the session cookies if successful.
    """
    base_url = f"https://{ip_address}:8080"
    session_url = f"{base_url}/session/1/session"
    credentials = {"username": username, "password": password, "services": ["platform", "namespace"]}
    headers = {"Content-Type": "application/json", "Referer": base_url}
    try:
        response = requests.post(session_url, headers=headers, json=credentials, verify=False)
        if response.status_code == 201 and 'isisessid' in response.cookies:
            return response.cookies
        else:
            error_message = response.json().get('message', 'No detailed error message provided.')
            return f"Authentication Failed: Status Code {response.status_code}, Error: {error_message}"
    except Exception as e:
        return f"Error during Session Cookie Authentication: {e}"


def create_and_hold_session(ip_address, username, password, hold_time, results):
    """
    Create a session and hold it open for a specified duration.
    """
    session_result = session_cookie_authentication(ip_address, username, password)
    if isinstance(session_result, requests.cookies.RequestsCookieJar):
        time.sleep(hold_time)  # Hold the session
        results.append("Session created and held successfully.")
    else:
        results.append(session_result)


def main():
    ip_address = "10.10.25.80"  # Replace with the actual IP address of the PowerScale
    username = "root"
    password = "YOUR_PASSWORD"
    hold_time = 10  # Hold time in seconds
    session_threads = []
    results = []

    # Create 30 concurrent sessions
    for _ in range(30):
        thread = threading.Thread(target=create_and_hold_session, args=(ip_address, username, password, hold_time, results))
        thread.start()
        session_threads.append(thread)

    # Wait for all threads to complete
    for thread in session_threads:
        thread.join()

    # Analyze results and print summary
    success_count = results.count("Session created and held successfully.")
    print(f"Total Successful Sessions: {success_count}")
    error_messages = {result for result in results if result != "Session created and held successfully."}
    for error in error_messages:
        print(error)


if __name__ == "__main__":
    main()
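To get a feel for what a concurrent session limit means without touching a cluster, the idea can be modeled locally. The sketch below is only a model built on my own assumptions, not OneFS code: `simulate_session_limit` is a hypothetical helper that treats the limit as a pool of slots and rejects any attempt that finds the pool empty. It sends no requests and makes no claim about how OneFS counts or enforces sessions. The cluster run recorded in this document reported zero successful sessions at a limit of 15, where this simple model admits 15.

```python
import threading


def simulate_session_limit(limit, attempts):
    """Return how many of `attempts` simultaneous sessions a limit admits.

    Purely local model: nothing is sent to a cluster.
    """
    slots = threading.BoundedSemaphore(limit)  # one slot per permitted session
    all_tried = threading.Barrier(attempts)    # opens once every thread has tried
    admitted = [False] * attempts

    def worker(index):
        # Non-blocking acquire: reject immediately when no slot is free.
        admitted[index] = slots.acquire(blocking=False)
        # Keep the simulated session open until every thread has attempted.
        all_tried.wait()
        if admitted[index]:
            slots.release()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(attempts)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sum(admitted)


print(simulate_session_limit(15, 30))  # 15
print(simulate_session_limit(31, 30))  # 30
```

The barrier keeps every simulated session open until all threads have made their attempt, so the result does not depend on thread timing.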
Demonstration
Confirm that the concurrent session limit is set to 15:
grantcluster-1# isi auth settings global view
Send NTLMv2: No
Space Replacement:
Workgroup: WORKGROUP
Provider Hostname Lookup: disabled
Alloc Retries: 5
User Object Cache Size: 47.68M
On Disk Identity: native
RPC Block Time: Now
RPC Max Requests: 64
RPC Timeout: 30s
Default LDAP TLS Revocation Check Level: none
System GID Threshold: 80
System UID Threshold: 80
Min Mapped Rid: 2147483648
Group UID: 4294967292
Null GID: 4294967293
Null UID: 4294967293
Unknown GID: 4294967294
Unknown UID: 4294967294
Failed Login Delay Time: Now
Concurrent Session Limit: 15
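To read the limit from a script instead of by eye, the view output can be parsed as "Key: Value" lines. This is a minimal sketch, assuming the output keeps the layout shown above; `parse_isi_view` is a hypothetical helper, and the sample text is copied from that output rather than fetched from a cluster.

```python
def parse_isi_view(output):
    """Parse 'Key: Value' lines from `isi ... view` output into a dict."""
    settings = {}
    for line in output.splitlines():
        key, separator, value = line.partition(":")
        if separator:  # skip lines that contain no colon
            settings[key.strip()] = value.strip()
    return settings


# A few lines copied from the output above.
sample_output = """\
Space Replacement:
Workgroup: WORKGROUP
Failed Login Delay Time: Now
Concurrent Session Limit: 15
"""
settings = parse_isi_view(sample_output)
limit = int(settings["Concurrent Session Limit"])
print(limit)  # 15
```

Splitting on the first colon only keeps any value that itself contains a colon intact.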
Now we run the above Python script:
C:\Users\grant\AppData\Local\Programs\Python\Python310\python.exe "C:\Users\grant\Documents\code\grantcurell.github.io\docs\PowerScale Failed Authentication\multiple_sessions_test.py"
Total Successful Sessions: 0
Authentication Failed: Status Code 401, Error: Unable to create session.
Change the concurrent session limit to 31:
grantcluster-1# isi auth settings global modify --concurrent-session-limit=31
grantcluster-1# isi auth settings global view
Send NTLMv2: No
Space Replacement:
Workgroup: WORKGROUP
Provider Hostname Lookup: disabled
Alloc Retries: 5
User Object Cache Size: 47.68M
On Disk Identity: native
RPC Block Time: Now
RPC Max Requests: 64
RPC Timeout: 30s
Default LDAP TLS Revocation Check Level: none
System GID Threshold: 80
System UID Threshold: 80
Min Mapped Rid: 2147483648
Group UID: 4294967292
Null GID: 4294967293
Null UID: 4294967293
Unknown GID: 4294967294
Unknown UID: 4294967294
Failed Login Delay Time: Now
Concurrent Session Limit: 31
Rerun the script:
C:\Users\grant\AppData\Local\Programs\Python\Python310\python.exe "C:\Users\grant\Documents\code\grantcurell.github.io\docs\PowerScale Failed Authentication\multiple_sessions_test.py"
Total Successful Sessions: 30
Process finished with exit code 0
Cluster Setup
Rebuild
I hopped on an old cluster I used for testing and ran isi_reformat_node.
Initial Setup
These are the settings I used for my build. Because this was a lab build, I told it to use the internal IP addresses for the external network as well instead of assigning separate ranges.
Configuration Item | Value |
---|---|
Cluster name | grantcluster |
Encoding | utf-8 |
int-a netmask | 255.255.255.0 |
int-a IP ranges | { 10.10.25.80-10.10.25.89 } |
int-a IP range | { 10.10.25.80-10.10.25.89 } |
int-a gateway | 10.10.25.1 |
SmartConnect zone name | onefs |
DNS servers | { 10.10.25.120 } |
Search domains | { grant.lan, lan } |
After I joined the nodes together I confirmed they had a quorum:
grantcluster-1# sysctl efs.gmp.has_quorum
efs.gmp.has_quorum: 1
grantcluster-1# sysctl efs.gmp.has_super_block_quorum
efs.gmp.has_super_block_quorum: 1
A value of 1 indicates that quorum is present, whereas 0 indicates that there is no quorum. Super block quorum is described under Concepts.
Code for Testing Authentication Mechanisms
I used this code to test the different authentication mechanisms to confirm valid credentials.
Concepts
Super Block Quorum
Super block quorum, reported by the sysctl efs.gmp.has_super_block_quorum, is a property that protects the file system's integrity by requiring more than half of the nodes in the cluster to be available and in agreement over the internal network. This quorum prevents data conflicts, such as conflicting versions of the same file arising when two groups of nodes become unsynchronized. If a node is unreachable, OneFS separates it from the cluster, an action known as splitting. Operations can continue as long as a quorum of nodes remains connected. If the split nodes can reconnect and re-synchronize, they rejoin the majority group in a process known as merging. To check the super block quorum status, connect to a node via SSH and run sysctl efs.gmp.has_super_block_quorum as root.
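The "more than half" rule is easy to get wrong for even node counts, so here is the arithmetic as a small sketch. `has_majority` is a hypothetical helper that illustrates only the strict-majority condition described above. It is not how OneFS computes quorum internally.

```python
def has_majority(reachable_nodes, total_nodes):
    """Return True when strictly more than half of the nodes are reachable."""
    if total_nodes <= 0 or not 0 <= reachable_nodes <= total_nodes:
        raise ValueError("expected 0 <= reachable_nodes <= total_nodes and total_nodes > 0")
    # Integer form of reachable_nodes > total_nodes / 2.
    return 2 * reachable_nodes > total_nodes


print(has_majority(2, 3))  # True: 2 of 3 is more than half
print(has_majority(2, 4))  # False: exactly half is not a majority
print(has_majority(3, 4))  # True
```

Under this rule, a four-node cluster split evenly into two groups leaves neither group with a majority.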