Headline

CVE-2023-46229: Add security note to recursive url loader (#11934) · langchain-ai/langchain@9ecb724

LangChain before 0.0.317 allows SSRF via document_loaders/recursive_url_loader.py because crawling can proceed from an external server to an internal server.

1 year ago

CVE

Open in Source

#web #ssrf

Expand Up

@@ -49,7 +49,36 @@ def _metadata_extractor(raw_html: str, url: str) -> dict:

class RecursiveUrlLoader(BaseLoader):

“""Load all child links from a URL page.""”

"""Load all child links from a URL page.

**Security Note**: This loader is a crawler that will start crawling

at a given URL and then expand to crawl child links recursively.

Web crawlers should generally NOT be deployed with network access

to any internal servers.

Control access to who can submit crawling requests and what network access

the crawler has.

While crawling, the crawler may encounter malicious URLs that would lead to a

server-side request forgery (SSRF) attack.

To mitigate risks, the crawler by default will only load URLs from the same

domain as the start URL (controlled via prevent_outside named argument).

This will mitigate the risk of SSRF attacks, but will not eliminate it.

For example, if crawling a host which hosts several sites:

https://some_host/alice_site/

https://some_host/bob_site/

A malicious URL on Alice’s site could cause the crawler to make a malicious

GET request to an endpoint on Bob’s site. Both sites are hosted on the

same host, so such a request would not be prevented by default.

See https://python.langchain.com/docs/security

“"”

def __init__(

self,

Expand All

@@ -60,12 +89,13 @@ def __init__(

metadata_extractor: Optional[Callable[[str, str], str]] = None,

exclude_dirs: Optional[Sequence[str]] = (),

timeout: Optional[int] = 10,

prevent_outside: Optional[bool] = True,

prevent_outside: bool = True,

link_regex: Union[str, re.Pattern, None] = None,

headers: Optional[dict] = None,

check_response_status: bool = False,

) -> None:

"""Initialize with URL to crawl and any subdirectories to exclude.

Args:

url: The URL to crawl.

max_depth: The max depth of the recursive loading.

Expand Down

Cybersecurity researchers are warning about the security risks in the machine learning (ML) software supply chain following the discovery of more than 20 vulnerabilities that could be exploited to target MLOps platforms. These vulnerabilities, which are described as inherent- and implementation-based flaws, could have severe consequences, ranging from arbitrary code execution to loading

10 months ago

The Hacker News

Open in Source

#xss #vulnerability #web #mac #dos #java #intel #auth #The Hacker News