Headline
GHSA-q764-g6fm-555v: Path traversal in spotipy
Summary
If a malicious URI is passed to the library, the library can be tricked into performing an operation on a different API endpoint than intended.
Details
The code Spotipy uses to parse URIs and URLs accepts user data too liberally which allows a malicious user to insert arbitrary characters into the path that is used for API requests. Because it is possible to include ..
, an attacker can redirect for example a track lookup via spotifyApi.track()
to an arbitrary API endpoint like playlists, but this is possible for other endpoints as well.
Before the security advisory feature was enabled on GitHub, I was already in contact with Stéphane Bruckert via e-mail, and he asked me to look into a potential fix.
My recommendation is to perform stricter parsing of URLs and URIs, which I implemented in the patch included at the end of the report. If you prefer, I can also invite you to a private fork of the repository.
PoC
The POC expects SPOTIFY_CLIENT_ID
and SPOTIFY_CLIENT_SECRET
environment variables to be set to authenticate against the API.
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
def main():
spotifyApi = spotipy.Spotify(client_credentials_manager=SpotifyClientCredentials())
# This URL contains the example playlist ID from the spotify docs, a malicious
# playlist could instead contain a XSS payload in their title. A playlist with
# such a title was also included in the initial report via mail to maintainer.
malicious_spotify_url = 'spotify:track:../playlists/3cEYpjA9oz9GiPac4AsH4n'
# Usage of the track function, expecting to get a non-user-controllable track name
# e.g. for displaying in a website.
# Our modified track uri however makes the library return the name of a playlist which
# may be created by anyone containing anything.
track = spotifyApi.track(malicious_spotify_url)
# Prints:
# 'Name of the track: Spotify Web API Testing playlist'
# A malicious playlist could also have an XSS payload as title, which would result in:
# 'Name of the track: <img src=x onerror=prompt(1)>'
print(f"Name of the track: {track['name']}")
if __name__ == '__main__':
main()
Impact
The impact of this vulnerability depends heavily on what operations a client application performs when it handles a URI from a user and how it uses the responses it receives from the API.
Possible Patch
Caviats of this patch
- The ID parsing functionality now newly raises
ValueError
if it cannot parse an ID, instead of logging a warning or silently passing back whatever it received as input.- WARNING I only adjusted unit tests to expect
ValueError
that didn’t require a valid user session, other tests may also need adjustment
- WARNING I only adjusted unit tests to expect
- Unfortunately, I could not find conclusive documentation on what constitutes a valid Spotify username, but apparently some exist that contain alphanumeric characters, mine just contains numbers and the ones of newly created accounts seem to follow the base-62 scheme. You as developers probably have deeper insight into this, otherwise it probably will have to be discovered via bug reports if additional characters are valid as well.
From 30cf29b16e893dcac974dbd7481fb58a073b853c Mon Sep 17 00:00:00 2001
From: Shaderbug <[email protected]>
Date: Tue, 10 Jan 2023 19:26:18 +0100
Subject: [PATCH] Improve URL and URI handling
---
spotipy/client.py | 61 +++++++++++++++-----
tests/integration/non_user_endpoints/test.py | 6 +-
2 files changed, 49 insertions(+), 18 deletions(-)
diff --git a/spotipy/client.py b/spotipy/client.py
index d7025a9..b094947 100644
--- a/spotipy/client.py
+++ b/spotipy/client.py
@@ -6,6 +6,7 @@ __all__ = ["Spotify", "SpotifyException"]
import json
import logging
+import re
import warnings
import requests
@@ -96,6 +97,29 @@ class Spotify(object):
"US",
"UY"]
+ # Spotify URI scheme defined in [1], and the ID format as base-62 in [2].
+ #
+ # Unfortunately the IANA specification is out of date and doesn't include the new types
+ # show and episode. Additionally, for the user URI, it does not specify which characters
+ # are valid for usernames, so the assumption is alphanumeric which coincidentially are also
+ # the same ones base-62 uses.
+ # In limited manual exploration this seems to hold true, as newly accounts are assigned an
+ # identifier that looks like the base-62 of all other IDs, but some older accounts only have
+ # numbers and even older ones seemed to have been allowed to freely pick this name.
+ #
+ # [1] https://www.iana.org/assignments/uri-schemes/prov/spotify
+ # [2] https://developer.spotify.com/documentation/web-api/#spotify-uris-and-ids
+ _regex_spotify_uri = r'^spotify:(?P<type>track|artist|album|playlist|show|episode|user):(?P<id>[0-9A-Za-z]+)$'
+
+ # Spotify URLs are defined at [1]. The assumption is made that they are all
+ # pointing to open.spotify.com, so a regex is used to parse them as well,
+ # instead of a more complex URL parsing function.
+ #
+ # [1] https://developer.spotify.com/documentation/web-api/#spotify-uris-and-ids
+ _regex_spotify_url = r'^(http[s]?:\/\/)?open.spotify.com\/(?P<type>track|artist|album|playlist|show|episode|user)\/(?P<id>[0-9A-Za-z]+)(\?.*)?$'
+
+ _regex_base62 = r'^[0-9A-Za-z]+$'
+
def __init__(
self,
auth=None,
@@ -1940,20 +1964,27 @@ class Spotify(object):
return path
def _get_id(self, type, id):
- fields = id.split(":")
- if len(fields) >= 3:
- if type != fields[-2]:
- logger.warning('Expected id of type %s but found type %s %s',
- type, fields[-2], id)
- return fields[-1].split("?")[0]
- fields = id.split("/")
- if len(fields) >= 3:
- itype = fields[-2]
- if type != itype:
- logger.warning('Expected id of type %s but found type %s %s',
- type, itype, id)
- return fields[-1].split("?")[0]
- return id
+ uri_match = re.search(Spotify._regex_spotify_uri, id)
+ if uri_match is not None:
+ uri_match_groups = uri_match.groupdict()
+ if uri_match_groups['type'] != type:
+ raise ValueError("Unexpected Spotify URI type.")
+ else:
+ return uri_match_groups['id']
+
+ url_match = re.search(Spotify._regex_spotify_url, id)
+ if url_match is not None:
+ url_match_groups = url_match.groupdict()
+ if url_match_groups['type'] != type:
+ raise ValueError("Unexpected Spotify URL type.")
+ else:
+ return url_match_groups['id']
+
+ # Raw identifiers might be passed, ensure they are also base-62
+ if re.search(Spotify._regex_base62, id) is not None:
+ return id
+
+ raise ValueError("Unsupported URL / URI")
def _get_uri(self, type, id):
if self._is_uri(id):
@@ -1962,7 +1993,7 @@ class Spotify(object):
return "spotify:" + type + ":" + self._get_id(type, id)
def _is_uri(self, uri):
- return uri.startswith("spotify:") and len(uri.split(':')) == 3
+ return re.search(Spotify._regex_spotify_uri, uri) is not None
def _search_multiple_markets(self, q, limit, offset, type, markets, total):
if total and limit > total:
diff --git a/tests/integration/non_user_endpoints/test.py b/tests/integration/non_user_endpoints/test.py
index 96ee4da..116e1d9 100644
--- a/tests/integration/non_user_endpoints/test.py
+++ b/tests/integration/non_user_endpoints/test.py
@@ -280,7 +280,7 @@ class AuthTestSpotipy(unittest.TestCase):
try:
self.spotify.track(self.bad_id)
self.assertTrue(False)
- except SpotifyException:
+ except ValueError:
self.assertTrue(True)
def test_show_urn(self):
@@ -296,7 +296,7 @@ class AuthTestSpotipy(unittest.TestCase):
self.assertTrue(show['name'] == 'Heavyweight')
def test_show_bad_urn(self):
- with self.assertRaises(SpotifyException):
+ with self.assertRaises(ValueError):
self.spotify.show("bogus_urn", market="US")
def test_shows(self):
@@ -333,7 +333,7 @@ class AuthTestSpotipy(unittest.TestCase):
self.assertTrue(episode['name'] == '#1 Buzz')
def test_episode_bad_urn(self):
- with self.assertRaises(SpotifyException):
+ with self.assertRaises(ValueError):
self.spotify.episode("bogus_urn", market="US")
def test_episodes(self):
--
2.34.1
Summary
If a malicious URI is passed to the library, the library can be tricked into performing an operation on a different API endpoint than intended.
Details
The code Spotipy uses to parse URIs and URLs accepts user data too liberally which allows a malicious user to insert arbitrary characters into the path that is used for API requests. Because it is possible to include …, an attacker can redirect for example a track lookup via spotifyApi.track() to an arbitrary API endpoint like playlists, but this is possible for other endpoints as well.
Before the security advisory feature was enabled on GitHub, I was already in contact with Stéphane Bruckert via e-mail, and he asked me to look into a potential fix.
My recommendation is to perform stricter parsing of URLs and URIs, which I implemented in the patch included at the end of the report. If you prefer, I can also invite you to a private fork of the repository.
PoC
The POC expects SPOTIFY_CLIENT_ID and SPOTIFY_CLIENT_SECRET environment variables to be set to authenticate against the API.
import spotipy from spotipy.oauth2 import SpotifyClientCredentials
def main(): spotifyApi = spotipy.Spotify(client_credentials_manager=SpotifyClientCredentials())
\# This URL contains the example playlist ID from the spotify docs, a malicious
\# playlist could instead contain a XSS payload in their title. A playlist with
\# such a title was also included in the initial report via mail to maintainer.
malicious\_spotify\_url \= 'spotify:track:../playlists/3cEYpjA9oz9GiPac4AsH4n'
\# Usage of the track function, expecting to get a non-user-controllable track name
\# e.g. for displaying in a website.
\# Our modified track uri however makes the library return the name of a playlist which
\# may be created by anyone containing anything.
track \= spotifyApi.track(malicious\_spotify\_url)
\# Prints:
\# 'Name of the track: Spotify Web API Testing playlist'
\# A malicious playlist could also have an XSS payload as title, which would result in:
\# 'Name of the track: <img src=x onerror=prompt(1)>'
print(f"Name of the track: {track\['name'\]}")
if __name__ == '__main__’: main()
Impact
The impact of this vulnerability depends heavily on what operations a client application performs when it handles a URI from a user and how it uses the responses it receives from the API.
Possible Patch****Caviats of this patch
The ID parsing functionality now newly raises ValueError if it cannot parse an ID, instead of logging a warning or silently passing back whatever it received as input.
- WARNING I only adjusted unit tests to expect ValueError that didn’t require a valid user session, other tests may also need adjustment
Unfortunately, I could not find conclusive documentation on what constitutes a valid Spotify username, but apparently some exist that contain alphanumeric characters, mine just contains numbers and the ones of newly created accounts seem to follow the base-62 scheme. You as developers probably have deeper insight into this, otherwise it probably will have to be discovered via bug reports if additional characters are valid as well.
From 30cf29b16e893dcac974dbd7481fb58a073b853c Mon Sep 17 00:00:00 2001 From: Shaderbug [email protected] Date: Tue, 10 Jan 2023 19:26:18 +0100 Subject: [PATCH] Improve URL and URI handling
spotipy/client.py | 61 ++++++++++++++±---- tests/integration/non_user_endpoints/test.py | 6 ± 2 files changed, 49 insertions(+), 18 deletions(-)
diff --git a/spotipy/client.py b/spotipy/client.py index d7025a9…b094947 100644 — a/spotipy/client.py +++ b/spotipy/client.py @@ -6,6 +6,7 @@ all = ["Spotify", “SpotifyException”]
import json import logging +import re import warnings
import requests @@ -96,6 +97,29 @@ class Spotify(object): "US", “UY”]
Spotify URI scheme defined in [1], and the ID format as base-62 in [2].
Unfortunately the IANA specification is out of date and doesn’t include the new types
show and episode. Additionally, for the user URI, it does not specify which characters
are valid for usernames, so the assumption is alphanumeric which coincidentially are also
the same ones base-62 uses.
In limited manual exploration this seems to hold true, as newly accounts are assigned an
identifier that looks like the base-62 of all other IDs, but some older accounts only have
numbers and even older ones seemed to have been allowed to freely pick this name.
[1] https://www.iana.org/assignments/uri-schemes/prov/spotify
[2] https://developer.spotify.com/documentation/web-api/#spotify-uris-and-ids
_regex_spotify_uri = r’^spotify:(?P<type>track|artist|album|playlist|show|episode|user):(?P<id>[0-9A-Za-z]+)$’
Spotify URLs are defined at [1]. The assumption is made that they are all
pointing to open.spotify.com, so a regex is used to parse them as well,
instead of a more complex URL parsing function.
[1] https://developer.spotify.com/documentation/web-api/#spotify-uris-and-ids
_regex_spotify_url = r’^(http[s]?://)?open.spotify.com/(?P<type>track|artist|album|playlist|show|episode|user)/(?P<id>[0-9A-Za-z]+)(?.*)?$’
_regex_base62 = r’^[0-9A-Za-z]+$’
def init( self, auth=None, @@ -1940,20 +1964,27 @@ class Spotify(object): return path
def _get_id(self, type, id):
fields = id.split(":")
if len(fields) >= 3:
if type != fields[-2]:
logger.warning('Expected id of type %s but found type %s %s',
type, fields[-2], id)
return fields[-1].split("?")[0]
fields = id.split("/")
if len(fields) >= 3:
itype = fields[-2]
if type != itype:
logger.warning('Expected id of type %s but found type %s %s',
type, itype, id)
return fields[-1].split("?")[0]
return id
uri_match = re.search(Spotify._regex_spotify_uri, id)
if uri_match is not None:
uri_match_groups = uri_match.groupdict()
if uri_match_groups['type'] != type:
raise ValueError("Unexpected Spotify URI type.")
else:
return uri_match_groups['id']
url_match = re.search(Spotify._regex_spotify_url, id)
if url_match is not None:
url_match_groups = url_match.groupdict()
if url_match_groups['type'] != type:
raise ValueError("Unexpected Spotify URL type.")
else:
return url_match_groups['id']
# Raw identifiers might be passed, ensure they are also base-62
if re.search(Spotify._regex_base62, id) is not None:
return id
raise ValueError("Unsupported URL / URI")
def _get_uri(self, type, id): if self._is_uri(id): @@ -1962,7 +1993,7 @@ class Spotify(object): return “spotify:” + type + “:” + self._get_id(type, id)
def _is_uri(self, uri):
return uri.startswith("spotify:") and len(uri.split(':')) == 3
return re.search(Spotify._regex_spotify_uri, uri) is not None
def _search_multiple_markets(self, q, limit, offset, type, markets, total): if total and limit > total: diff --git a/tests/integration/non_user_endpoints/test.py b/tests/integration/non_user_endpoints/test.py index 96ee4da…116e1d9 100644 — a/tests/integration/non_user_endpoints/test.py +++ b/tests/integration/non_user_endpoints/test.py @@ -280,7 +280,7 @@ class AuthTestSpotipy(unittest.TestCase): try: self.spotify.track(self.bad_id) self.assertTrue(False)
except SpotifyException:
except ValueError: self.assertTrue(True)
def test_show_urn(self): @@ -296,7 +296,7 @@ class AuthTestSpotipy(unittest.TestCase): self.assertTrue(show[‘name’] == ‘Heavyweight’)
def test_show_bad_urn(self):
with self.assertRaises(SpotifyException):
with self.assertRaises(ValueError): self.spotify.show("bogus_urn", market="US")
def test_shows(self): @@ -333,7 +333,7 @@ class AuthTestSpotipy(unittest.TestCase): self.assertTrue(episode[‘name’] == ‘#1 Buzz’)
def test_episode_bad_urn(self):
with self.assertRaises(SpotifyException):
with self.assertRaises(ValueError): self.spotify.episode("bogus_urn", market="US")
def test_episodes(self): – 2.34.1
References
- GHSA-q764-g6fm-555v
Related news
Spotipy is a light weight Python library for the Spotify Web API. In versions prior to 2.22.1, if a malicious URI is passed to the library, the library can be tricked into performing an operation on a different API endpoint than intended. The code Spotipy uses to parse URIs and URLs allows an attacker to insert arbitrary characters into the path that is used for API requests. Because it is possible to include "..", an attacker can redirect for example a track lookup via spotifyApi.track() to an arbitrary API endpoint like playlists, but this is possible for other endpoints as well. The impact of this vulnerability depends heavily on what operations a client application performs when it handles a URI from a user and how it uses the responses it receives from the API. This issue is patched in version 2.22.1.