Extractors

Extractor components are responsible for parsing subdomain addresses from any Content object.

The extractor components already implemented in Subscan are as follows:

HTMLExtractor

Extracts subdomain addresses from the inner text of elements selected by a given XPath or CSS selector.

JSONExtractor

Extracts subdomain addresses from JSON content. A JSON parsing function must be supplied to this extractor.

RegexExtractor

Generates a subdomain pattern from the given domain address and extracts the subdomains that match this pattern.
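To make the pattern-based idea concrete, here is a minimal sketch that uses only the standard library. It is not Subscan's RegexExtractor: the function names extract_subdomains and is_host_char are hypothetical, and a hand-written scan stands in for a real regular expression. It only illustrates the principle of deriving a match rule from the target domain and collecting every hostname that ends with it.

```rust
use std::collections::BTreeSet;

/// Returns true for characters that may appear inside a hostname.
fn is_host_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '-' || c == '.'
}

/// Collects every hostname in `text` that ends with `.{domain}`.
/// Hypothetical helper for illustration, not part of Subscan's API.
fn extract_subdomains(text: &str, domain: &str) -> BTreeSet<String> {
    // The "pattern" derived from the domain: a required `.domain` suffix
    let suffix = format!(".{}", domain.to_ascii_lowercase());

    text.split(|c: char| !is_host_char(c))
        // Drop stray dots or hyphens left over from surrounding punctuation
        .map(|token| {
            token
                .trim_matches(|c: char| c == '.' || c == '-')
                .to_ascii_lowercase()
        })
        // Keep only names that are strictly longer than the suffix and end with it
        .filter(|token| token.len() > suffix.len() && token.ends_with(&suffix))
        // Reject names with empty labels such as `a..example.com`
        .filter(|token| token.split('.').all(|label| !label.is_empty()))
        .collect()
}

fn main() {
    let body = "See api.example.com, https://dev.mail.example.com/login and example.org";

    for subdomain in extract_subdomains(body, "example.com") {
        println!("{subdomain}");
    }
}
```

Because the suffix starts with a dot, look-alike names such as notexample.com are not matched, and the BTreeSet removes duplicates while keeping the results sorted.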

Create Your Custom Extractor

Each extractor component should implement the interface below. For a better understanding, you can explore the docs.rs page and review the crates behind the attributes used in it, async_trait and enum_dispatch.

#[async_trait]
#[enum_dispatch]
pub trait SubdomainExtractorInterface: Send + Sync {
    // Generic extract method, it should extract subdomain addresses
    // from given Content
    async fn extract(&self, content: Content, domain: &str) -> Result<BTreeSet<Subdomain>>;
}

Below is a simple example of a custom extractor. For more examples, you can check the examples/ folder on the project's GitHub page. You can also refer to the source code of the predefined extractor implementations for a better understanding.

pub struct CustomExtractor {}

#[async_trait]
impl SubdomainExtractorInterface for CustomExtractor {
    async fn extract(&self, content: Content, _domain: &str) -> Result<BTreeSet<Subdomain>> {
        // Toy logic: treat the whole content as a single subdomain
        // after removing every hyphen from it
        let subdomain = content.as_string().replace("-", "");

        Ok([subdomain].into())
    }
}
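To try the logic of this example without an async runtime or the Subscan crate, the following standard-library sketch mirrors it synchronously. Everything except the extractor body is an assumption made for illustration: Content and Subdomain here are stand-in types, SyncExtractor is a hypothetical synchronous copy of the trait, and the error type is a plain String rather than Subscan's Result alias.

```rust
use std::collections::BTreeSet;

// Stand-in types (assumptions for this sketch only); the real
// `Content` and `Subdomain` types come from the Subscan crate
type Subdomain = String;

struct Content(String);

impl Content {
    fn as_string(&self) -> String {
        self.0.clone()
    }
}

// Hypothetical synchronous mirror of `SubdomainExtractorInterface`
trait SyncExtractor {
    fn extract(&self, content: Content, domain: &str) -> Result<BTreeSet<Subdomain>, String>;
}

struct CustomExtractor {}

impl SyncExtractor for CustomExtractor {
    fn extract(&self, content: Content, _domain: &str) -> Result<BTreeSet<Subdomain>, String> {
        // Same body as the example above: strip hyphens and return
        // the whole content as a single subdomain
        let subdomain = content.as_string().replace("-", "");

        Ok([subdomain].into())
    }
}

fn main() {
    let extractor = CustomExtractor {};
    let content = Content("foo-bar.example.com".to_string());

    let found = extractor.extract(content, "example.com").unwrap();

    println!("{found:?}");
}
```

The returned set always holds exactly one entry here, which shows that this example is a template for the trait's shape rather than a practical extraction rule.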
