license-analyzer

SPDX license identification using hashes, fingerprints, and semantic similarity.
Supports command-line usage as well as Python module integration.

📦 Installation

pip install license-analyzer

To install from source (e.g., for development):

git clone https://github.com/envolution/license-analyzer.git
cd license-analyzer
pip install .

🚀 Command-Line Usage

Once installed, the CLI tool is available as:

license-analyzer [OPTIONS] FILE [FILE...]

🔧 Common Options

| Option | Description |
| --- | --- |
| `--top-n N` | Return the top N matches per file. If omitted, returns all matches tied for the highest score. |
| `--format {text,json,csv}` | Output format. Default: `text`. |
| `--min-score FLOAT` | Discard matches scoring below this threshold (default: `0.0`). |
| `--spdx-dir DIR` | Path to the SPDX license text files. Default: `~/.cache/license-analyzer/spdx/text`. |
| `--cache-dir DIR` | Path to the cache directory for the license database. |
| `--embedding-model NAME` | SentenceTransformer model to use (default: `all-MiniLM-L6-v2`). |
| `--update, -u` | Force an update of the SPDX license data from GitHub. |
| `--verbose, -v` | Show progress and debug logs. |

📄 Examples

Basic usage

license-analyzer LICENSE

Multiple files

license-analyzer license1.txt license2.txt

JSON output with top 3 matches

license-analyzer --format json --top-n 3 LICENSE

Force SPDX update

license-analyzer --update

🐍 Python Module Usage

You can also use license-analyzer directly in your Python code:

from license_analyzer.core import LicenseAnalyzer

analyzer = LicenseAnalyzer()
matches = analyzer.analyze_file("LICENSE")

for match in matches:
    print(match.name, match.score, match.method)

Or, to analyze a string rather than a file:

with open("LICENSE", encoding="utf-8") as f:
    text = f.read()
matches = analyzer.analyze_text(text)

for match in matches:
    print(match.name, match.score, match.method)

Use top_n=None to get all tied top-scoring matches:

matches = analyzer.analyze_text(text, top_n=None)
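The selection rules for `top_n` and `--min-score` can be sketched as follows. This is a minimal illustration, not the library's code: the `Match` dataclass and `select_matches` helper are hypothetical, and applying the score threshold before the top-N cut is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Match:
    # Hypothetical stand-in for a match result (name, score, method as printed above).
    name: str
    score: float
    method: str


def select_matches(
    matches: List[Match], top_n: Optional[int] = None, min_score: float = 0.0
) -> List[Match]:
    """Hypothetical re-implementation of the documented selection rules."""
    # Drop anything below the threshold (assumed to be applied first).
    kept = [m for m in matches if m.score >= min_score]
    # Highest score first; sort is stable, so equal scores keep their input order.
    kept.sort(key=lambda m: m.score, reverse=True)
    if top_n is not None:
        return kept[:top_n]
    if not kept:
        return []
    # top_n=None: return every match tied with the best score.
    best = kept[0].score
    return [m for m in kept if m.score == best]
```

Ties are detected with exact float equality here, which suits scores such as the `1.0` produced by a hash match; a real implementation might compare within a tolerance.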

📈 Output Format (CLI)

Text (default)

Analysis results for: LICENSE
------------------------------------------------------------
MIT                            score: 1.0000  method: sha256

JSON

{
  "LICENSE": [
    {
      "name": "MIT",
      "score": 1.0,
      "method": "sha256"
    }
  ]
}
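To consume the JSON output from another script, you can run the CLI with `subprocess` and load its stdout. A sketch, assuming the JSON document is the only thing written to stdout (progress output from `--verbose` may not be); `run_analyzer` and `best_per_file` are hypothetical helper names.

```python
import json
import subprocess
from typing import Dict, List


def run_analyzer(paths: List[str]) -> dict:
    # Calls the documented CLI with --format json and parses stdout.
    # Assumption: nothing but the JSON document is printed to stdout.
    result = subprocess.run(
        ["license-analyzer", "--format", "json", *paths],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


def best_per_file(report: dict) -> Dict[str, str]:
    # Map each analyzed file to the name of its highest-scoring match,
    # skipping files that produced no matches.
    return {
        path: max(matches, key=lambda m: m["score"])["name"]
        for path, matches in report.items()
        if matches
    }
```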

CSV

file_path,license_name,score,method
"LICENSE","MIT",1.0,"sha256"
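The CSV form loads directly with the standard library's `csv.DictReader`. In this sketch `read_csv_report` is a hypothetical helper, and the column names are taken from the sample header above.

```python
import csv
import io
from typing import Dict, List, Union

SAMPLE = 'file_path,license_name,score,method\n"LICENSE","MIT",1.0,"sha256"\n'


def read_csv_report(text: str) -> List[Dict[str, Union[str, float]]]:
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # csv yields strings; convert the score column to a float.
        row["score"] = float(row["score"])
        rows.append(row)
    return rows
```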

🔄 Updating SPDX License Data

By default, license data is stored under:

~/.cache/license-analyzer/spdx

To update the SPDX license texts (from GitHub):

license-analyzer --update

This refreshes the cached license texts and triggers a database rebuild if needed.
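If you want to trigger the update only when no license texts are cached yet, a small wrapper can check the default directory first. A sketch under two assumptions: the texts live in the default `--spdx-dir` location, and an empty or missing directory means an update is needed. `spdx_cache_present` and `ensure_spdx_cache` are hypothetical names.

```python
import subprocess
from pathlib import Path

# Default documented for --spdx-dir.
DEFAULT_SPDX_TEXT_DIR = Path.home() / ".cache" / "license-analyzer" / "spdx" / "text"


def spdx_cache_present(text_dir: Path = DEFAULT_SPDX_TEXT_DIR) -> bool:
    # True if the directory exists and contains at least one entry.
    return text_dir.is_dir() and any(text_dir.iterdir())


def ensure_spdx_cache() -> None:
    # Assumption: an empty or missing default directory means the data must be fetched.
    if not spdx_cache_present():
        subprocess.run(["license-analyzer", "--update"], check=True)
```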

🧠 Matching Strategies

✅ SHA256 Hash Match
✅ Canonical Fingerprint Match
✅ Semantic Embedding Match (via sentence-transformers)
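The three strategies can be read as a cascade from exact to fuzzy matching. The sketch below is an illustrative assumption about how such a cascade might be ordered, not license-analyzer's implementation: the canonicalization rules are invented for the example, the `fingerprint` and `similarity` method labels are hypothetical, and a bag-of-words cosine similarity stands in for sentence-transformers embeddings so the code runs with the standard library alone.

```python
import hashlib
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Match:
    # Hypothetical result type mirroring the name/score/method fields above.
    name: str
    score: float
    method: str


def sha256_of(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def canonicalize(text: str) -> str:
    # Hypothetical normalization: lowercase, replace punctuation with spaces,
    # collapse runs of whitespace.
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def cosine_bow(a: str, b: str) -> float:
    # Stand-in for embedding similarity: cosine similarity of word counts.
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def match_license(text: str, licenses: Dict[str, str]) -> List[Match]:
    # Tier 1: byte-identical text, detected by comparing SHA256 digests.
    digest = sha256_of(text)
    exact = [Match(n, 1.0, "sha256") for n, t in licenses.items() if sha256_of(t) == digest]
    if exact:
        return exact
    # Tier 2: identical after canonicalization (a simple "fingerprint").
    canon = canonicalize(text)
    fp = sha256_of(canon)
    fingerprint = [
        Match(n, 1.0, "fingerprint")
        for n, t in licenses.items()
        if sha256_of(canonicalize(t)) == fp
    ]
    if fingerprint:
        return fingerprint
    # Tier 3: score every candidate by similarity, best first.
    scored = [Match(n, cosine_bow(canon, canonicalize(t)), "similarity") for n, t in licenses.items()]
    return sorted(scored, key=lambda m: m.score, reverse=True)
```

Ordering the tiers this way means the cheap hash checks settle exact and near-exact copies before any similarity scoring is attempted.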

📝 License

SPDX-License-Identifier: Apache-2.0
