import torch
import torch.nn as nn
import torch.nn.functional as F

class SAE(nn.Module):
    def __init__(self, D, K):
        super().__init__()
        self.encoder = nn.Linear(D, K)             # holds W (K x D) and b
        self.decoder = nn.Linear(K, D, bias=True)  # holds V (D x K) and c

    def forward(self, x):
        z = F.relu(self.encoder(x))  # z = g(Wx + b)
        x_hat = self.decoder(z)      # x_hat = Vz + c
        return x_hat, z

Sparse Autoencoders
Readings (up to “Feature Surveys”) Code, Conda Env, Helpers
Items marked \(^{\dagger}\) will not be tested.
Setup
Goal. Expand the dimensionality of an LLM’s activation space to disentangle learned representations into interpretable components. For any observation’s activations \(x_n\) (at a given layer), we want the decomposition, \[ x_n \approx c + \sum_{\text{few } k} z_{nk}\, v_k \] where \(v_1, \dots, v_K\) are called atoms (dictionary elements) and \(z_{nk} \geq 0\) are sparse mixing weights.
Requirements. The method has to scale to realistic LLM models and datasets, and it shouldn’t involve too much manual labeling effort.
Motivation. Concept activation vectors require curated examples for each concept, so they are a confirmatory technique. We want an exploratory method that automatically discovers features. For example, in fairness and safety audits, the most important features aren’t known in advance. Further, since LLMs are generative, we need to do more than explain predictions: we need to “steer” model outputs towards (or away from) certain concepts.
Preview.
The technique discovers non-obvious features. One learned atom activates strongly on Golden Gate Bridge content, across languages and modalities.
One of the features has high score \(z_{nk}\) on tokens related to the Golden Gate Bridge. Those features can steer generation.
Sparse Autoencoder Model
Individual neurons. Earlier work described individual neurons by finding their highest-activating examples. E.g., some lower-layer neurons act as edge detectors.
Neurons in a CNN that activate for particular edge orientations, from Zeiler and Fergus (2014).
However, in higher layers, individual neurons can play multiple roles depending on context (the “superposition hypothesis”), so single-neuron analysis fails.
Dimensionality reduction. We could apply dimensionality reduction methods (e.g., PCA or UMAP) to the learned activations and then interpret the resulting plots.
Johnson et al. (2016) applied dimensionality reduction to a translation model’s activations. Each color is one sentence translated into many languages. The model places sentences with the same meaning next to one another regardless of language.
Dimensionality reduction represents \(x_n\) using a lower-dimensional \(z_n\). Unfortunately, this is not so useful at the scale of modern LLMs. The number of distinguishable concepts in the training data is much larger than the embedding dimension (e.g., the layer in GPT-3 similar to the one studied in the reading has 12K dimensions, but the internet contains many more than 12K concepts…). We need to increase, not reduce, dimensionality.
Notation. Fix a layer \(l\). The SAE is trained on token activations. Each training example is the activation of a single token within a particular context; index these by \(n = 1, \dots, N\).
- \(x_n \in \mathbb{R}^{D}\): The token activation at index \(n\).
- \(v_1, \dots, v_K \in \mathbb{R}^D\): learned atoms, representing the underlying features. Typically \(K \gg D\). We will stack these columnwise into \(V \in \mathbb{R}^{D \times K}\).
- \(z_n \in \mathbb{R}^{K}\): sparse, nonnegative weights giving the contribution of feature \(k\) to token \(n\).
The SAE model represents \(x_n\) using a higher-dimensional sparse \(z_n\).
Architecture. The sparse autoencoder (SAE) has two components,
- Encoder: \(\quad z_n = g(Wx_n + b)\), where \(g(u) = u \cdot \mathbf{1}[u > 0]\) is the ReLU.
- Decoder: \(\quad \hat{x}_n = Vz_n + c\).
Matrix representation of the encoder.
Matrix representation of the decoder.
Each activation \(x_n\) is a sparse, nonnegative mixture of learned atoms \(v_k\), \[ x_n \approx \hat{x}_n = c + \sum_{k=1}^{K} z_{nk}\, v_k \] (\(z_{nk}\) is scalar, \(v_k \in \mathbb{R}^D\)).
Exercise: In the reading, what choices of \(K\) are considered?
Exercise: Why are the weights \(z_{nk}\) nonnegative?
A minimal implementation.
Training objective. The parameters \(W \in \mathbb{R}^{K \times D}\), \(V \in \mathbb{R}^{D \times K}\), \(b \in \mathbb{R}^{K}\), \(c \in \mathbb{R}^{D}\) are learned by minimizing
\[ \frac{1}{N}\sum_{n = 1}^{N}\left[\|x_n - \hat{x}_n\|_{2}^{2} + \lambda \sum_{k} z_{nk} \|v_k\|_{2}\right]. \tag{1}\]
Note that the reconstruction term expands as \[ \|x_n - \hat{x}_n\|_{2}^{2} = \|x_n - Vz_n - c\|_{2}^{2} = \|x_n - V g\left(W x_n + b\right) - c\|_2^2, \] so it depends on \(W, V, b\), and \(c\).
def sae_loss(x, x_hat, z, decoder, lam):
    # F.mse_loss averages over all N * D entries, so `recon` equals the
    # reconstruction term of Equation 1 divided by D.
    recon = F.mse_loss(x, x_hat)
    col_norms = decoder.weight.norm(dim=0)  # ||v_k||_2 for each k
    penalty = (z * col_norms).sum(dim=1).mean()
    return recon + lam * penalty

The penalty is essentially an \(\ell^{1}\) norm. Since \(z_{nk} \geq 0\) and \(\|v_k\|_2 \geq 0\), setting \(\alpha_k = z_{nk}\|v_k\|_2\) gives \[ \lambda \sum_k z_{nk}\|v_k\|_2 = \lambda \sum_k |\alpha_k|. \] This is a lasso-type penalty on the product of feature length and mixing weight. It will encourage most \(z_{nk}\) to be exactly zero, so each \(\sum_k z_{nk} v_k\) involves only a few nonzero terms even though \(K\) is large.
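One reason to weight each \(z_{nk}\) by \(\|v_k\|_2\) is that the product is unchanged when an atom is rescaled by \(a_k > 0\) and its weights by \(1/a_k\), so the penalty cannot be lowered just by inflating the atoms. The sketch below checks this numerically with NumPy. The array sizes and scale factors are arbitrary choices for illustration, not values from the reading.

```python
import numpy as np

rng = np.random.default_rng(0)
D, K, N = 4, 6, 5                      # illustrative sizes (assumed)
V = rng.normal(size=(D, K))            # atoms v_k stacked as columns
Z = np.abs(rng.normal(size=(N, K)))    # nonnegative weights z_nk

def penalty(Z, V):
    """Average over n of sum_k z_nk * ||v_k||_2 (Equation 1 without lambda)."""
    return (Z * np.linalg.norm(V, axis=0)).sum(axis=1).mean()

# Rescale atom k by a_k > 0 and its weights by 1 / a_k.
a = rng.uniform(0.5, 5.0, size=K)
V_scaled, Z_scaled = V * a, Z / a

same_reconstruction = np.allclose(Z @ V.T, Z_scaled @ V_scaled.T)
same_penalty = np.isclose(penalty(Z, V), penalty(Z_scaled, V_scaled))
plain_l1_changes = not np.isclose(Z.sum(), Z_scaled.sum())
print(same_reconstruction, same_penalty, plain_l1_changes)
```

A plain \(\ell^1\) penalty on \(z_{nk}\) alone does change under this rescaling, which is what the last flag checks.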
Overcomplete basis. When \(K > D\), the dictionary is called overcomplete. Sparse combinations of its atoms can represent geometric structure that no \(D\)-dimensional linear subspace can capture.
Data lying on a low-dimensional subspace.
An overcomplete basis supports more complex data structure.
Exercise: Compare and contrast the atoms \(v_k\) here with the concept vectors \(v_c^l\) from the Concept Activation Vectors notes.
Exercise: What is the pattern of zeros in \(z_n\) for the tokens labeled \(A\), \(B\), and \(C\)? What is the relationship between \(z_n\) for point \(B\) vs. \(C\)?
Synthetic Example
- We give an example using synthetic data from two ground-truth atoms \(v_k\).
import numpy as np
from sae_helpers import generate_v_true, generate_z_true

N = 500
v_true = generate_v_true()
K_true = v_true.shape[1]
z_true = generate_z_true(K_true=K_true, N=N)
X = v_true @ z_true + np.random.randn(2, N) * 0.05  # 2 x N

We deliberately misspecify \(K\) (3 instead of 2) to test robustness.
D, K = 2, 3
model = SAE(D, K)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
X_tensor = torch.tensor(X.T, dtype=torch.float32)  # N x 2
lam = 0.5
for epoch in range(1000):
    x_hat, z = model(X_tensor)
    loss = sae_loss(X_tensor, x_hat, z, model.decoder, lam)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

We extract the learned atoms and compare with ground truth.

V_learned = model.decoder.weight.detach().numpy()  # D x K

Exercise: How would you compare the true and learned \(z_{nk}\)?
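Because the ordering and scale of learned atoms are arbitrary, one way to compare the learned atoms with the ground truth is through cosine similarities between columns. The sketch below is self-contained and uses invented arrays in place of `v_true` and `V_learned`. The helper name `cosine_matrix` is hypothetical.

```python
import numpy as np

def cosine_matrix(A, B):
    """Cosine similarity between every column of A (D x Ka) and every column of B (D x Kb)."""
    A_unit = A / np.linalg.norm(A, axis=0, keepdims=True)
    B_unit = B / np.linalg.norm(B, axis=0, keepdims=True)
    return A_unit.T @ B_unit

# Stand-ins for v_true (D x 2) and V_learned (D x 3); the values are invented.
v_true_demo = np.array([[1.0, 0.0],
                        [0.0, 1.0]])
V_learned_demo = np.array([[0.1, 2.0, -0.5],
                           [3.0, 0.2, -0.5]])

sim = cosine_matrix(v_true_demo, V_learned_demo)   # 2 x 3
best_match = sim.argmax(axis=1)                    # closest learned atom per true atom
print(np.round(sim, 2))
print(best_match)
```

A learned atom with low similarity to every true atom (the third column in this toy input) may be a byproduct of setting \(K\) larger than the true number of atoms.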
Scaling Laws
Training large SAEs is computationally expensive. Fixing all other training hyperparameters, compute cost scales like \(C = \eta T K\) where \(T\) is the number of gradient steps, \(K\) is the number of atoms, and \(\eta\) is a factor absorbing all other hyperparameter effects. Two questions,
- For a fixed \(C\), how should we allocate budget across \(T\) and \(K\)?
- For a proposed increase in \(C\), how much would the training loss (Equation 1) drop?
A scaling law analysis sweeps \(\left(T, K\right)\) across smaller compute budgets to answer both.
Consider budgets \(C_1 < \dots < C_M\). For each \(m\), evaluate \(L\) pairs \(\left(T_m^{l}, K_m^{l}\right)_{l = 1}^{L}\) along the IsoFLOP contour \(\eta T_{m}^{l}K_m^l = C_m\). For a given \(m\), denote the loss-minimizing pair \(\left(T^\ast\left(C_m\right), K^\ast\left(C_m\right)\right)\).
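A sketch of how such a sweep grid could be built under the cost model \(C = \eta T K\): for each budget, choose several values of \(K\) on a log scale and set \(T\) so that each pair sits (up to rounding) on the contour. The budgets, the range of \(K\), the choice \(\eta = 1\), and the function name `isoflop_pairs` are all placeholders, not values from the reading.

```python
import numpy as np

def isoflop_pairs(C, K_min, K_max, L, eta=1.0):
    """Return up to L pairs (T, K) with eta * T * K approximately equal to C."""
    Ks = np.unique(np.round(np.geomspace(K_min, K_max, L)).astype(int))
    return [(max(1, int(round(C / (eta * K)))), int(K)) for K in Ks]

budgets = [1e6, 1e7, 1e8]                      # hypothetical C_1 < C_2 < C_3
grid = {C: isoflop_pairs(C, K_min=100, K_max=10_000, L=5) for C in budgets}
for C, pairs in grid.items():
    print(f"C = {C:.0e}: {pairs}")
```

Each pair would then be trained and its loss recorded, and the minimizer within each budget gives \(\left(T^\ast(C_m), K^\ast(C_m)\right)\).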
The reading doesn’t label the axes of its scaling law plots (unfortunately) but we can see the loss against \(C_m\) (across curves) and each IsoFLOP contour (within a curve).
Loss against compute budget in the scaling law experiments.
The optimal allocations are modeled well using power laws (i.e., linear in log-log space),
\[ \log T^\ast(C) \approx \beta_0 + \beta_1 \log C, \qquad \log K^\ast(C) \approx \alpha_0 + \alpha_1 \log C. \]
Given a new target \(C\), train the SAE with \(\hat{T}\left(C\right)\) steps and \(\hat{K}\left(C\right)\) features. Note that this approach doesn’t enforce \(C = \eta\,\hat{T}\left(C\right)\hat{K}\left(C\right)\). For a constrained alternative, see Approach 3 in Hoffmann et al. (2022).
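The two power-law fits are ordinary least squares in log-log space, so they can be sketched with `np.polyfit`. All numbers below are invented for illustration and do not come from the reading. They stand in for the loss-minimizing \(T^\ast(C_m)\) and \(K^\ast(C_m)\) found in a sweep.

```python
import numpy as np

# Hypothetical sweep results: budgets and the estimated optima at each budget.
C = np.array([1e6, 1e7, 1e8, 1e9])
K_star = np.array([2.0e2, 7.0e2, 2.4e3, 8.1e3])
T_star = np.array([5.2e3, 1.4e4, 4.3e4, 1.2e5])

# Fit log K* = alpha_0 + alpha_1 log C and log T* = beta_0 + beta_1 log C.
alpha_1, alpha_0 = np.polyfit(np.log(C), np.log(K_star), deg=1)
beta_1, beta_0 = np.polyfit(np.log(C), np.log(T_star), deg=1)

def predict(C_new):
    """Extrapolated (T_hat, K_hat) at a new budget."""
    T_hat = np.exp(beta_0 + beta_1 * np.log(C_new))
    K_hat = np.exp(alpha_0 + alpha_1 * np.log(C_new))
    return T_hat, K_hat

T_hat, K_hat = predict(1e10)
print(f"alpha_1 = {alpha_1:.3f}, beta_1 = {beta_1:.3f}")
print(f"T_hat = {T_hat:.3g}, K_hat = {K_hat:.3g}")
```

Since the two regressions are fit separately, nothing ties the product of the predictions back to the target budget.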
Automatic Interpretability
Manual annotation isn’t possible with millions of features. Instead, the current best practice trains an explainer LLM, then evaluates it.
Explainer training. For each feature \(k\),
- Bin the positive activations \(\{z_{nk} : z_{nk} > 0\}\) across quantiles, and randomly sample the indices \(n\) from each bin. Call the result \(D_k^{\text{explain}}\), with source contexts \(y_n\).
- Prompt the explainer LLM using \((y_n, z_{nk})_{n \in D_k^{\text{explain}}}\) and ask for a natural-language description \(d_k\).
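The binning step above can be sketched as follows. This is a minimal illustration, assuming ten quantile bins and five samples per bin; neither number is taken from the reading, and the function name `sample_by_quantile` and the activations are invented.

```python
import numpy as np

def sample_by_quantile(z_k, n_bins=10, per_bin=5, seed=0):
    """Sample indices n with z_nk > 0, spread evenly across quantile bins of the activation."""
    rng = np.random.default_rng(seed)
    active = np.flatnonzero(z_k > 0)
    edges = np.quantile(z_k[active], np.linspace(0, 1, n_bins + 1))
    # Assign each active index to a bin 0..n_bins-1 using the interior edges.
    bin_id = np.searchsorted(edges[1:-1], z_k[active], side="right")
    chosen = []
    for b in range(n_bins):
        members = active[bin_id == b]
        take = min(per_bin, members.size)
        chosen.extend(rng.choice(members, size=take, replace=False))
    return np.array(chosen, dtype=int)

# Invented activations for one feature: mostly zero, with a skewed positive part.
rng = np.random.default_rng(1)
z_k = np.where(rng.random(5000) < 0.1, rng.exponential(2.0, 5000), 0.0)
explain_idx = sample_by_quantile(z_k)
print(explain_idx.size, bool((z_k[explain_idx] > 0).all()))
```

Sampling within bins, instead of taking only the top activations, shows the explainer weak and moderate activations as well as the strongest ones.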
The paper doesn’t publish its prompt, but related peer-reviewed papers do. For example, from Paulo et al. (2025) Appendix B,
You are a meticulous AI researcher conducting an important investigation into patterns found in language. Your task is to analyze text and provide an interpretation that thoroughly encapsulates possible patterns found in it.
Guidelines: You will be given a list of text examples on which special words are selected and between delimiters like << this >>. If a sequence of consecutive tokens all are important, the entire sequence of tokens will be contained between delimiters <<just like this>>. How important each token is for the behavior is listed after each example in parentheses.
- Try to produce a concise final description. Simply describe the text features that are common in the examples, and what patterns you found.
- If the examples are uninformative, you don’t need to mention them. Don’t focus on giving examples of important tokens, but try to summarize the patterns found in the examples.
- Do not mention the marker tokens (<< >>) in your interpretation.
- Do not make lists of possible interpretations. Keep your interpretations short and concise.
- The last line of your response must be the formatted interpretation, using [interpretation]:

Simon and Zou (2025) Appendix E1, for SAE features from a protein language model,
Analyze this protein dataset to determine what predicts the ‘Maximum activation value’ and ‘Amino acids of highest activated indices in protein’ columns. This description should be as concise as possible but sufficient to predict these two columns on held-out data given only the description and the rest of the protein metadata provided. The feature could be specific to a protein family, a structural motif, a sequence motif, a functional role, etc. These WILL be used to predict how much unseen proteins are activated by the feature so only highlight relevant factors for this. Focus on:
* Properties of proteins from the metadata that are associated with high vs medium vs low activation.
* Where in the protein sequence activation occurs (in relation to the protein sequence, length, structure, or other properties)
* What functional annotations (binding sites, domains, etc.) and amino acids are present at or near the activated positions
* This description that will be used to help predict missing activation values should start with “The activation patterns are characterized by:”
Then, in 1 sentence, summarize what biological feature or pattern this neural network activation is detecting. This concise summary should start with “The feature activates on”
Protein record: `Insert table with Swiss-Prot metadata and activation levels`

Explainer evaluation. Gather new samples \(D_{k}^{\text{eval}}\). Prompt the LLM with \(y_n\) and \(d_k\) and ask it to predict \(z_{nk}\) (or \(\mathbb{1}\{z_{nk} > 0\}\)). Overall quality is summarized with \(\mathrm{Cor}\left(z_{nk}, \hat{z}_{nk}\right)\) over \(n \in D_{k}^{\text{eval}}\). Here are the evaluation prompts.
Paulo et al. (2025),
You are an intelligent and meticulous linguistics researcher. You will be given a certain feature of text, such as "male pronouns" or "text with negative sentiment". You will then be given several text examples. Your task is to determine which examples possess the feature. For each example in turn, return 1 if the sentence is correctly labeled or 0 if the tokens are mislabeled. You must return your response in a valid Python list. Do not return anything else besides a Python list.

Simon and Zou (2025),
Given this protein metadata record, feature description, and empty table with query proteins, fill out the query table indicating the maximum feature activation value within in each protein (0.0-1.0). Base activation value on how well the protein matches the described patterns. There could be 0, 1 or multiple separate instances of activation in a protein and each activation could span 1 or many amino acids. Output only these values in the provided table starting with ”Entry,Maximum activation value”. Respond with nothing but this table. Protein record: Insert table with Swiss-Prot metadata Table to fill out with query proteins: Insert empty table of IDs to fill out with predictions The activation patterns are characterized by: Insert LLM description
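The correlation summary used in explainer evaluation is a Pearson correlation over the held-out samples. A minimal sketch follows, with invented numbers standing in for \(z_{nk}\) and the predictions \(\hat{z}_{nk}\) on \(D_k^{\text{eval}}\). The function name `explanation_score` is hypothetical.

```python
import numpy as np

def explanation_score(z_true, z_pred):
    """Pearson correlation between true and predicted activations for one feature."""
    z_true = np.asarray(z_true, dtype=float)
    z_pred = np.asarray(z_pred, dtype=float)
    if z_true.std() == 0 or z_pred.std() == 0:
        return float("nan")          # correlation is undefined for constant inputs
    return float(np.corrcoef(z_true, z_pred)[0, 1])

# Invented values for one feature k on its evaluation set.
z_eval = [0.0, 0.0, 1.2, 3.4, 0.0, 5.1, 0.3]
z_hat = [0.1, 0.0, 1.0, 2.5, 0.4, 4.0, 0.0]
score = explanation_score(z_eval, z_hat)
print(round(score, 3))
```

The guard for constant inputs matters in practice: if the evaluation set for a feature happens to contain no variation in the true or predicted activations, the score is undefined.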
Evaluation, Visualization, and Control
The SAE gives features \(k\) and automated interpretability gives descriptions \(d_k\). Neither tells us whether the features are good.
- Specificity: When \(z_{nk}\) is large, is the token context related to \(d_k\)?
- Completeness: For a finite set of concepts of interest, how many are reflected by an SAE feature?
Specificity. For a fixed \(k\),
- Bin \(\{z_{nk} : z_{nk} > 0\}\) as before and randomly sample \(n\) across bins.
- Prompt an LLM with \(y_n\) and \(d_k\), asking for a Likert score \(s_{nk} \in \{0, 1, 2, 3\}\), where 0 = irrelevant, \(\dots\), 3 = clearly related.
- Plot histograms of \(z_{nk}\), colored by \(s_{nk}\).
Now we can understand where the colors came from.
Completeness. Given a concept \(c\) in text (e.g., \(c = \text{"London Borough of Southwark"}\)),
- Pass \(c\) through the LLM and get the activations \(x_{n}\) from its final token.
- Compute \(z_{nk}\) and let \(\mathcal{S} \subset \{1, \dots, K\}\) index the largest of them (the reading uses \(\left|\mathcal{S}\right| = 5\)).
- Ask a human rater to judge whether any of the \(\left(d_{k}\right)_{k \in \mathcal{S}}\) are related to \(c\).
We can apply this to finite concept sets (e.g., the boroughs of London or the elements of the periodic table) and see what fraction are covered. Empirically, coverage is lower for less frequent concepts: those that rarely appear in the training data require larger \(K\) to discover.
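The completeness procedure reduces to a top-\(|\mathcal{S}|\) selection per concept followed by a coverage fraction. The sketch below uses an invented activation vector and invented rater judgments. The function names `top_features` and `coverage` are hypothetical.

```python
import numpy as np

def top_features(z_n, size=5):
    """Indices of the `size` largest entries of z_n, in decreasing order of activation."""
    order = np.argsort(z_n)[::-1]
    return order[:size]

def coverage(judgments):
    """Fraction of concepts for which a rater marked at least one top feature as related."""
    return sum(bool(j) for j in judgments.values()) / len(judgments)

# Invented activation vector for the final token of one concept (K = 8 here).
z_n = np.array([0.0, 2.1, 0.0, 0.4, 7.3, 0.0, 1.5, 0.9])
S = top_features(z_n)              # feature indices whose descriptions d_k go to the rater
print(S)

# Hypothetical rater judgments for three concepts.
judgments = {"Southwark": True, "Camden": False, "Hackney": True}
print(round(coverage(judgments), 2))
```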
Coverage of the periodic table concept set.
This also shows that a concept can be present in an LLM (e.g., it can be prompted to describe a particular London borough) even if it isn’t associated with a feature in the SAE.
Visualization. To survey the landscape of features, apply UMAP to the columns of \(V \in \mathbb{R}^{D \times K}\). Embedding all \(K\) columns at once would result in overplotting. Instead, fix a feature \(k\), take a neighborhood \(B\left(k\right)\) of \(v_k\), and embed only \(V_{B(k)}\).
For other feature neighborhoods, see this app.
Feature splitting is the finding that when \(K\) grows, a single feature can split into coherent sub-features. A San Francisco feature in the \(K = 1\mathrm{M}\) model splits into several more fine-grained features at \(K = 34\mathrm{M}\).
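The neighborhood step in the visualization can be sketched with plain NumPy. The notes do not say how \(B(k)\) is defined, so this sketch assumes cosine similarity between decoder columns, and it uses a random matrix in place of a trained \(V\). The function name `neighborhood` is hypothetical.

```python
import numpy as np

def neighborhood(V, k, size=20):
    """Indices of the `size` atoms most cosine-similar to v_k (including k itself)."""
    V_unit = V / np.linalg.norm(V, axis=0, keepdims=True)
    sims = V_unit.T @ V_unit[:, k]
    return np.argsort(sims)[::-1][:size]

rng = np.random.default_rng(0)
V = rng.normal(size=(16, 200))     # random stand-in for the D x K decoder matrix
B_k = neighborhood(V, k=3, size=10)
V_sub = V[:, B_k]                  # the columns that would be passed to the embedding step
print(B_k[0], V_sub.shape)
```

Only the `size` columns in `V_sub` are then embedded, which avoids the overplotting that comes from embedding all \(K\) atoms at once.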
Code Example
We’ll use an SAE to analyze how the final hidden layer of a GPTNeo model organizes articles from the fineweb-edu dataset. This hidden layer is 768-dimensional, but analyzing individual neurons is not an efficient way to work. We will find that the learned dictionary atoms associated with this layer’s activations are much more interesting to look at.
The libraries below link to data and models on Hugging Face. They are already included in the iisa312 environment, defined in this yaml file, which can be installed with

conda env create -f environment-iisa312.yaml

after downloading.
from datasets import load_dataset
from transformers import AutoTokenizer, GPTNeoModel
import torch
import numpy as np
np.random.seed(20241230)

- This defines a data loader for the fineweb-edu dataset. This is a 7.5TB dataset, so we’ll only try working with a streaming version, which allows us to read a few articles at a time (we’ll be looking at a tiny fraction of the original data, but it will be enough to see some interesting structure).

fw = load_dataset("HuggingFaceFW/fineweb-edu", name="CC-MAIN-2024-10", split="train", streaming=True)

- Let’s save 2500 articles on which to extract activations. You can see the first 200 characters of the raw text from a few articles below. They are all somewhat academic in style, but they range quite dramatically in the topics they discuss.
n_stream = 2500
texts = []
for x in fw:
    texts.append(x["text"])
    if len(texts) >= n_stream:  # stop once exactly n_stream articles are stored
        break
[f"{s[:200]}..." for s in texts[:10]]

['- It means objects are Garbage Collected more quickly. (Incorrect).\n- Its a good way to make sure all your references are set to null. (Not necessary).\n- Its good practice to implement all the time. (...',
 'CANUSWEST and CANUSWEST North were developed to assist Federal, State/Provincial, local, and Tribal/Aboriginal responders to mitigate the effects of oil and hazardous materials spills on human health ...',
 '– Computer viruses are parasitic programs which are able to replicate themselves, attach themselves to other executables in the computer, and perform some unwanted and often malicious actions. A virus...',
 'For those unfamiliar with Cornish, it is classed as a p-Celtic member of the family of Celtic languages, which was once spoken across much of Europe, and is now restricted to the insular world and Bri...',
 'Democracy is in trouble. No matter what index you look at, the number of countries rated as being fully democratic has declined dramatically over the last twenty years. Worryingly, this trend shows no...',
 'Our cultural identity: Experience the culture and heritage of Cyprus Course Description Culture has the power to transform entire societies, strengthen local communities and forge a sense of identity ...',
 '“The more you empower kids, the more they can do,” said one Providence actor after working with Rhode Island public school students in the Arts/ Literacy Project, based at Brown University’s education...',
 'Rhetorical analysis is not for the faint of heart. It’s for teachers and instructors who don’t mind students feeling uncomfortable enough to take a risk. Rhetorical analysis has changed everything for...',
 'Sport plays an important role in the educational process since the TPS when the child’s need for movement is answered by daily activities within the school.\nTPS and PS practice sport with their teache...',
 'There are a large number of students who have difficulty learning material using traditional teaching methods. Learning disabilities vary from mild forms such as attention deficit disorder to more sev...']
- The block below extracts embeddings from the final hidden state (.hidden_states[-1]) in a GPTNeo model. Notice that we’re averaging the hidden dimension across all tokens in the text. In theory, we could analyze activations within smaller stretches of text, but we are aiming more for simplicity than completeness.
def extract_embeddings(text, model, tokenizer):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=False)
    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)
    # Average the final hidden state over the batch and token dimensions.
    return outputs.hidden_states[-1].mean(dim=(0, 1))

# Load pre-trained model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125m")
model = GPTNeoModel.from_pretrained("EleutherAI/gpt-neo-125m")
model = model.eval()

- The block below applies extract_embeddings to all the articles we downloaded above. If the result has already been saved, it is loaded from save_path instead.
import torch
from pathlib import Path
from tqdm import tqdm

save_path = Path("../data/fineweb_embeddings.pt")
if save_path.exists():
    X_tensor = torch.load(save_path, map_location="cpu")
else:
    embeddings = []
    for text in tqdm(texts):
        embeddings.append(extract_embeddings(text, model, tokenizer).cpu())
    X_tensor = torch.stack(embeddings).float()
    torch.save(X_tensor, save_path)

- Next, fit the SAE defined earlier in the notebook. The encoder activations \(z_{nk}\) allow us to study the high-activation articles for each feature \(k\).
K = 500
N, D = X_tensor.shape
sae = SAE(D, K)
optimizer = torch.optim.Adam(sae.parameters(), lr=1e-3)
lam = 0.0005
for epoch in tqdm(range(1000)):
    x_hat, z = sae(X_tensor)
    loss = sae_loss(X_tensor, x_hat, z, sae.decoder, lam)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
with torch.no_grad():
    _, Z = sae(X_tensor)
Z = Z.detach().cpu().numpy().T  # K x N: one row per feature
print(Z[0].max())

26.120123
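Before reading individual features, it can help to check how sparse the fitted activations are. The sketch below computes a few summaries for a \(K \times N\) activation matrix laid out like `Z`. It runs on a small invented matrix, and the function name `sparsity_summary` is hypothetical.

```python
import numpy as np

def sparsity_summary(Z):
    """Summaries for a K x N activation matrix with rows indexed by feature."""
    active = Z > 0
    return {
        "mean_active_per_input": float(active.sum(axis=0).mean()),  # nonzero z_nk per n
        "dead_features": int((~active.any(axis=1)).sum()),          # features that never fire
        "fraction_nonzero": float(active.mean()),
    }

# Invented stand-in for Z (K = 4 features, N = 5 inputs).
Z_demo = np.array([[0.0, 1.2, 0.0, 0.0, 3.0],
                   [0.0, 0.0, 0.0, 0.0, 0.0],
                   [0.5, 0.0, 0.0, 2.0, 0.0],
                   [0.0, 0.0, 0.7, 0.0, 0.0]])
summary = sparsity_summary(Z_demo)
print(summary)
```

A large number of dead features or a high fraction of nonzero entries would suggest revisiting \(\lambda\) or \(K\) before interpreting the atoms.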
- Finally, we can look at the articles with especially high activations on each feature. For example, in the output below, the first dictionary atom (Feature 0) is mainly associated with religious texts.

for k in range(10):
    print(f"\nFeature {k}")
    top_ix = np.argsort(Z[k])[-10:][::-1]  # ten highest-activation articles, largest first
    print("\n".join([" ".join(texts[i][:200].split()) + "..." for i in top_ix]))
Feature 0
We recognize that the Sacraments have a visible and invisible reality, a reality open to all the human senses but grasped in its God-given depths with the eyes of faith. When parents hug their childre...
Book traversal links for General Remarks On The Prophetic Word 1. Old Testament. There is at the outset a great distinction to make between the prophets. Some wrote before the captivity and called the...
Roman Catholics and Lutherans are both Christian faiths. Lutheranism falls under Protestants, a branch of Christianity, Just like Roman Catholicism. The history of the split between Roman Catholicism...
From the moment God gave directions about the building and the furnishing of the Tabernacle and Tent of Meeting, architecture and theology were married forever within the context of worship. Over time...
To teach Justification, Law and Gospel must both be taught and be properly divided Although it is likely a composite of many related quotations, Luther is credited with stating the following: “Justifi...
“And the LORD said to Moses, ‘Is the LORD’s hand shortened? Now you shall see whether My word will come true for you or not.’” (Exodus 11:23) On the journey to Canaan, God’s promised land, the Israeli...
What is a Book of God? To the various Prophets that God sent for people's guidance, He revealed His teachings. The Prophets made this revelation public, and their followers learnt it and passed it dow...
Your cart is currently empty! How Tall I Jesus The name Jesus refers to the Son of God, also known as Jesus Christ or Yeshua. Jesus is considered the biggest figure in religion, with over two billion...
Hollywood's depiction of the "end of the world" has been portrayed as a nuclear winter, an asteroid striking the earth or a cataclysmic natural disaster. In Christian theology, the subject of End Time...
To understand the “one and only Son”, we need to keep turning to The Father. In the John 3:16 Podcast, two New Testament experts explore the facets and fascinating detail of the Most Famous Verse in t...
Feature 1
Education & Career Trends: September 24, 2022 Curated by the Knowledge Team of ICS Career GPS Conflict can be a healthy part of personal and professional relationships. Extensive research has demonstr...
Founded by 30 members in December 1989, the current number of members has reached to about 1000. The need for the formation of the Biological Society of Ethiopia was recognized as far back as the earl...
Inflation & Your Money "If the current annual inflation rate is 3 percent, why do my bills seem like they're 10 percent higher than last year?"1 Many of us ask ourselves that question, and it illustra...
Thirteen centuries before Christ, just before the fall of Troy, and in the age of King Tutankhamun, a royal ship sailing through the eastern Mediterranean collided with a promontory now known as Ulubu...
We are excited to introduce our new “Kids Corner” blog series, a once-a-month blog post geared entirely towards kids! Building curiosity, compassion, and connection to animals among children is one of...
Housing Blog Original Source: In Ottawa, Ontario, the issue of housing affordability in Canada has worsened significantly during Justin Trudeau's eight-year tenure as the Prime Minister. In the past,...
Capital budgeting is a cornerstone in any organization’s financial decision-making process. It involves evaluating, selecting, and managing long-term investments that can significantly impact a compan...
Level 1 East 50 Grenfell St, Adelaide SA 5000 With digital media and liquid modernity being co-opted into a totalizing world network, Generation Alpha has become the first generation to be brought up...
The focal length is a measure of how a lens converges light. It can be used to know the magnification factor of the lens and given the size of the sensor, calculate the angle of view. A standard refer...
Scientists fear coronavirus may trigger diabetes in previously healthy people - here's what that means As we continue to learn new things about the novel coronavirus still wreaking havoc on the planet...
Feature 2
You are responsible for fire prevention at work for your safety and that of your co-workers. Be aware of and on the lookout for potential fire hazards. Report hazardous situations to the supervisor. K...
LAKE TAHOE, Calif./Nev., Oct. 12, 2021– Under the coordination of the Tahoe Fire & Fuels Team (TFFT), the Lake Tahoe Basin fall prescribed fire program may begin as early as November, weather and cond...
A Smokey and Hazy Day in Wyoming Monday Thanks to Wildfires Wildfires burning in Montana and Colorado and across the western United States are blanketing parts of Wyoming in a smokey haze Monday, July...
In the Northwest, as across the United States, political giving is an elite affair, heavily concentrated among one percenters and residents of affluent, white neighborhoods. Even in Seattle, which has...
Oak forests are valuable ecosystems that provide numerous ecological benefits. To preserve them, sustainable logging practices, effective forest fire management, wildlife conservation measures, and in...
Winter is a magical season with its snowy landscapes and the joy of the holiday season. However, it also brings challenges, especially for seniors. Cold weather, slippery sidewalks, and shorter daylig...
Brushing up on Missouri's rules of the road is a simple way to help keep yourself and other drivers safe. It may also help you avoid a ticket from a police officer. In this article, you can learn more...
BB guns and pellet guns act very similar to normal firearms, but are known for being significantly less powerful, leading you to ask, “Are BB guns and pellet guns considered firearms?” Strictly speaki...
Severe bleeding, penetrating wounds, hypothermia, and other life-threatening trauma can occur at any time, and require immediate medical attention. To better empower the general population of Prince G...
Mosquitoes are unwelcome guests that can disrupt our indoor spaces and make our lives uncomfortable. While we often associate mosquitoes with outdoor environments, they can still find their way indoor...
Feature 3
Sub Read ( hObject As Object [ , sKey As String, vDefault As Variant ] ) Initializes the specified object from the settings file. The settings must have been written with the This method can handle Wi...
COOKIES AND LOCAL STORAGE POLICY Navigating through this website with cookies and local storage mechanisms activated, implies in an essential way the acceptance of the use of same according to this po...
Ad reach is an estimate of the number of people within a location target, based on signed-in users. You can use the provided reach numbers to get a rough idea of how many people your ads could reach w...
What are cookies Cookies are small text files which a website may put on your computer or mobile device when you first visit a site or page. The cookie will help the website, or another website, to re...
In the realm of advertising, music plays a critical role in crafting a memorable message that resonates with audiences. The power of a catchy jingle or a well-chosen song can elevate an advertisement...
Have you ever found yourself mystified by the power of Google and wondered how search engines work? We are here to explain! In this guide, we will give you an introduction to how search engines work a...
INFORMATION ABOUT COOKIES Cookies are small text files that are stored on your computer or mobile phone when you use our webpage. WHAT IS A COOKIE? A cookie is a small text file that a website request...
In this constantly online world, we are continuously being bombarded with all forms of content – be it in the form of text, images, or videos. The type of content we are exposed to shapes our minds an...
This method is obsolete. See the Remarks section below. | Use the BuiltIn.SetKeyboardLayout method to set the desired keyboard layout for any process running in the operating system. The method is a s...
What are cookies? A cookie is a small amount of data sent to your computer or mobile phone from a website. This means the website can recognise your device (your computer or mobile phone) if you retur...
Feature 4
Education & Career Trends: September 24, 2022 Curated by the Knowledge Team of ICS Career GPS Conflict can be a healthy part of personal and professional relationships. Extensive research has demonstr...
Founded by 30 members in December 1989, the current number of members has reached to about 1000. The need for the formation of the Biological Society of Ethiopia was recognized as far back as the earl...
Inflation & Your Money "If the current annual inflation rate is 3 percent, why do my bills seem like they're 10 percent higher than last year?"1 Many of us ask ourselves that question, and it illustra...
Thirteen centuries before Christ, just before the fall of Troy, and in the age of King Tutankhamun, a royal ship sailing through the eastern Mediterranean collided with a promontory now known as Ulubu...
We are excited to introduce our new “Kids Corner” blog series, a once-a-month blog post geared entirely towards kids! Building curiosity, compassion, and connection to animals among children is one of...
Housing Blog Original Source: In Ottawa, Ontario, the issue of housing affordability in Canada has worsened significantly during Justin Trudeau's eight-year tenure as the Prime Minister. In the past,...
Capital budgeting is a cornerstone in any organization’s financial decision-making process. It involves evaluating, selecting, and managing long-term investments that can significantly impact a compan...
Level 1 East 50 Grenfell St, Adelaide SA 5000 With digital media and liquid modernity being co-opted into a totalizing world network, Generation Alpha has become the first generation to be brought up...
The focal length is a measure of how a lens converges light. It can be used to know the magnification factor of the lens and given the size of the sensor, calculate the angle of view. A standard refer...
Scientists fear coronavirus may trigger diabetes in previously healthy people - here's what that means As we continue to learn new things about the novel coronavirus still wreaking havoc on the planet...
Feature 5
ζ Canis Minoris (zeta Canis Minoris) ζ Canis Minoris is a bright giant star in the constellation of Canis Minor. ζ Canis Minoris visual magnitude is 5.14. Because of its reltive faintness, ζ Canis Min...
Australia is a country in the Southern hemisphere between the Pacific Ocean and the Indian Ocean. Its official name is the Commonwealth of Australia. Australia is the sixth biggest country in the worl...
An aspirator is a device that makes vacuum, because of the Venturi effect. In the aspirator, fluid flows through it. The tube gets more thin, making the fluid flow faster, and making the pressure smal...
During pregnancy, the fetus can be positioned in many different ways inside the mother's uterus. The fetus may be head up or down or facing the mother's back or front. At first, the fetus can move aro...
A Pragmatic Alliance Discusses the political cooperation between Jews and Lithuanians in the Tsarist Empire from the last decades of the 19th century until the early 1920s. These years saw the transfo...
Bacteriophages determine the composition of microbial populations by killing some bacteria and sparing others. Bacteriophages are typically host specific, a property that is largely determined at the...
Amazon Web Services (AWS) is a collection of remote computing services (also called web services) that together make up a cloud computing platform, offered over the Internet by Amazon.com. The most ce...
The constellation Piscis Austrinus - Other names / Symbolism - Southern Fisch - Southern hemisphere - July - September - 245 deg² - Brightest star - Formalhaut (HIP number 113368) The Piscis Austrinus...
For the next 50 years, the village of Stillwater was mostly German. The community was mostly centered on a union church shared by Lutheran and German Reformed congregations. The German population bega...
Racial and Ethnic Groups & Associated Definitions Due to an alarming increase in racial hatred and violence against Asians Americans in particular, arising from the Covid-19 pandemic, President Biden...
Feature 6
The Unseen Potential of AI: Paralyzed Individual Regains Mobility with Innovative Tech 💡🚶♂️ TL;DR: 🎯 Utilizing advanced artificial intelligence (AI), a man paralyzed for more than a decade has begun t...
We all know how important great posture and balance are for dancers, but not everyone is necessarily aware of the role of proprioception (kinesthesia) training in achieving these qualities. Before we...
A position bowl can be placed deliberately or come to rest accidentally in a place that prevents the opponent from drawing to the jack. A position bowl can also be a “stand by” bowl, for instance in c...
Q: Why does my dog eat poop? Is there something missing from his diet? A: It is baffling to most people why their sweet precious pup would stoop to the level of eating his own poop, especially when he...
This is mainly a video lecture course with additional reading material which explains the importance of core strength development for young children particularly for those involved in sports. The athl...
Dancing Shapes: Ballet and Body Awareness for Young Dancers by Once Upon a Dance Dancing Shapes is a beautiful book created during the pandemic by an award-winning dance teacher and her ballerina daug...
We often get asked ‘Why Us’ when we chat with new pet parents. One reason is very simple, we are pet parents too! Which really helps us get to grips with our clients needs. The Home Pet People team re...
Flexibility training is an essential component of a healthy running routine. Regularly carving out time for flexibility exercises can help keep muscles flexible and maintain joint range of motion. If...
As aware individuals, we all understand the importance of being mindful of our health and wellness. But the question remains- how do we truly assess how fit and healthy we are? Although we may have a...
What is the difference between potential and kinetic energy What is the difference between potential and kinetic energy? The main difference between potential and kinetic energy is that one is the ene...
Feature 7
A record number of students from disadvantaged areas are securing places at Scottish universities, but the increase has not been at the expense of entrants from less deprived areas, according to new f...
WE’VE ALL been there: quickly checking our smartphones; drumming our fingers on the steering wheel; willing the red light to change to green. If ever you’ve felt that traffic lights seem to be taking...
The 2020s seem to be the era of housing misery (at least for the two thirds of us who don’t own their homes outright). Mortgage costs are rising, as are rents for private tenants and in social housing...
There are few boasts that Conservative politicians enjoy making more than “income inequality has fallen”. It suggests that austerity, far from widening the gap between the rich and the poor, has reduc...
Who enforces the Data Protection Act? Who enforces UK GDPR? And who can you ask about data protection legislation? Who enforces the UK’s Data Protection Act? The Information Commissioner’s Office (ICO...
The Badgers Act 1991 and Protection of Badgers Act 1992 This Act provides comprehensive protection for badgers and their setts in England and Wales. Under this Act, It is illegal to kill or harm badge...
The below are local and national organisations that can offer further support and advice on Bullying and Harassment. National Bullying Helpline is a national helpline for adults and children experienc...
11 February 2022 Illuminating the future of roads Trials have been carried out by National Highways, which have seen the future of roads explored. The research has focused on how intelligent street li...
National Strategy Group for Hunger Prevention in Schools The National Strategy Group for Hunger Prevention in Schools, established by the Educational Disadvantage Centre in 2013-2014, is composed of t...
TEN new heritage panels have been installed across the city, helping to bring Leicester’s extensive history to life. The colourful information panels have been commissioned to give residents and visit...
Feature 8
Education & Career Trends: September 24, 2022 Curated by the Knowledge Team of ICS Career GPS Conflict can be a healthy part of personal and professional relationships. Extensive research has demonstr...
Founded by 30 members in December 1989, the current number of members has reached to about 1000. The need for the formation of the Biological Society of Ethiopia was recognized as far back as the earl...
Inflation & Your Money "If the current annual inflation rate is 3 percent, why do my bills seem like they're 10 percent higher than last year?"1 Many of us ask ourselves that question, and it illustra...
Thirteen centuries before Christ, just before the fall of Troy, and in the age of King Tutankhamun, a royal ship sailing through the eastern Mediterranean collided with a promontory now known as Ulubu...
We are excited to introduce our new “Kids Corner” blog series, a once-a-month blog post geared entirely towards kids! Building curiosity, compassion, and connection to animals among children is one of...
Housing Blog Original Source: In Ottawa, Ontario, the issue of housing affordability in Canada has worsened significantly during Justin Trudeau's eight-year tenure as the Prime Minister. In the past,...
Capital budgeting is a cornerstone in any organization’s financial decision-making process. It involves evaluating, selecting, and managing long-term investments that can significantly impact a compan...
Level 1 East 50 Grenfell St, Adelaide SA 5000 With digital media and liquid modernity being co-opted into a totalizing world network, Generation Alpha has become the first generation to be brought up...
The focal length is a measure of how a lens converges light. It can be used to know the magnification factor of the lens and given the size of the sensor, calculate the angle of view. A standard refer...
Scientists fear coronavirus may trigger diabetes in previously healthy people - here's what that means As we continue to learn new things about the novel coronavirus still wreaking havoc on the planet...
Feature 9
Fishing spiders and wolf spiders are two common types of spiders that are often mistaken for one another. They share similarities in size, shape, and coloration, which can make it difficult for the ca...
Welcome to Mountain View Safe routes to Schools! In 2011, the City launched a Safe Routes to School (SRTS) program to promote walking and bicycling to school for Mountain View students and families. W...
Stuttering, also known as stammering, is a speech disorder in which the flow of speech is disrupted by involuntary repetitions and prolongations of sounds, syllables, words or phrases as well as invol...
Discovering Billions of Ancestors & Automating Family Research. Researching one’s own genealogy and ancestry can provide numerous benefits and fulfill various personal, emotional, and practical purpos...
The sight of gross growth on groceries can be disconcerting and raises concerns about food safety and hygiene. In this blog post, we delve into the causes, potential risks, and preventive measures to...
12 Feb SPATIAL COMPUTING EXPERT GUIDE Spatial computing speaks to interactions between humans, machines and the environment across both digital and physical spaces. It builds on extended reality (XR)...
Geomembrane welding process Geomembrane welding is a crucial process used in the installation and construction of geosynthetic liners for various applications such as landfill liners, mining ponds, ag...
Students will develop an understanding of the eat well guide and a healthy, balanced diet. They will then understand food preparation skills with a knowledge of food hygiene and safe practice. Student...
This week, we continue with our four-part series on Alzheimer’s and look at the signs and symptoms of Alzheimer’s, and how to get a diagnosis. As we know, Alzheimer’s disease is a type of brain disord...
Computing technology continues to advance at an astounding rate, with new breakthroughs and innovations regularly emerging from the field of computer science. From artificial intelligence (AI) to quan...
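A listing like the one above can be produced by ranking documents within each feature column of the activation matrix. The sketch below is a minimal illustration, not the notes' own helper code: the function names `top_activating_examples` and `dead_features`, the `max_chars` truncation, and the assumption that `z` holds one pooled activation per document are all hypothetical.

```python
import torch

def top_activating_examples(z, texts, k=10, max_chars=200):
    """Return, for each feature, the k documents with the largest activation.

    z:     (N, K) tensor of nonnegative SAE activations, one row per document
           (assumed already pooled over tokens, e.g. by a max over positions).
    texts: list of N strings aligned with the rows of z.
    """
    k = min(k, z.shape[0])
    # topk along dim=0 ranks documents separately within each feature column.
    scores, idx = torch.topk(z, k=k, dim=0)
    listing = {}
    for j in range(z.shape[1]):
        listing[j] = [
            (scores[i, j].item(), texts[idx[i, j].item()][:max_chars])
            for i in range(k)
        ]
    return listing

def dead_features(z):
    """Indices of features whose activation is zero on every document."""
    return torch.nonzero(z.max(dim=0).values == 0).flatten().tolist()

# Toy example with hypothetical activations: 4 documents, 3 features.
texts = ["doc a", "doc b", "doc c", "doc d"]
z = torch.tensor([[0.0, 0.0, 2.0],
                  [3.0, 0.0, 0.5],
                  [1.0, 0.0, 0.0],
                  [0.2, 0.0, 4.0]])
listing = top_activating_examples(z, texts, k=2)
dead = dead_features(z)
```

When a feature is zero on every document, all of its scores tie and the ranking falls back on an arbitrary order, so several such features can end up with the same "top" documents. Identical listings across features, as for Features 1, 4 and 8 above, would be consistent with that, though this is only one possible explanation; a check like `dead_features` is a cheap way to find out.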